Input Token Cost of RAG-Focused AI

Andrej Karpathy has identified an issue with the way many of us use AI to interact with content we have collected (RAG, retrieval-augmented generation). Each time we submit a prompt to query our designated content, the AI starts over and reprocesses that content to address our query. As I will demonstrate, this repeated input processing can be demanding, and if you pay by the token or have a plan with a token limit, it can be quite expensive. Karpathy proposed that if you have a reasonable understanding of what you generally want to get from your curated content, you can find and export this content once and then focus on an organized wiki (Karpathy’s term) of the types of information you anticipate will be of value. Karpathy’s concept and its multiple variations have generated considerable attention, including interest in running them on personal computers. It is not my intent to focus on the how-tos of the LLM-wiki process. I am intrigued, though, and I did purchase a Mac Mini I can dedicate to exploring this process. The popularity of this idea has made the Mac Mini a popular purchase, and the machine I ordered in April is not scheduled for delivery until mid-July. I take this as an indication of the potential usefulness of this approach.

What I want to document here is the inefficiency of RAG. Unless you are a heavy AI user and unless you make frequent use of AI to explore a large pool of content that you designate, you are unlikely to encounter this challenge. I don’t presently have a $20 a month account. Instead, I have a lower-max account (Abacus.AI) and pay for several accounts that allow me to upload content for exploration (Mem.ai, Recall.ai). The situation that allowed me to carefully analyze token use was the API plugin I used to apply AI to content I have stored and organized in Obsidian. This plugin accesses Claude via an API and a “pay as you use” plan. Anthropic provides detailed usage data, enabling interesting analyses.

Claude via Obsidian

I have used Obsidian for several years to store the notes and highlights I generate from my reading, and I have collected a sizeable body of content. The folder on note-taking I use in the following demonstration contains 132 notes totaling 38,208 words. The usage data available from my use of Claude to query this folder is reported by the day, so for purposes of this little experiment, I issued one prompt a day.

Prompts by the day:

Day 1: Under what circumstances is notetaking a generative activity?

Day 2: What have research studies shown regarding the relative effectiveness of taking notes by hand or using a keyboard?

Day 3: Using my notes, generate separate responses to the following prompts. Under what circumstances is notetaking a generative activity? What have research studies shown regarding the relative effectiveness of taking notes by hand or using a keyboard?

Logic of this approach. The usage data Anthropic retains for Claude (4.6 in this case) is totaled daily. In addition to the charge to my account (one chart), a daily token total is provided, along with a running total of Tokens In and Tokens Out. By tracking the totals in combination with the charts provided, it is possible to demonstrate the problem others have described with AI and RAG: the content must be processed as input each time the system is used. Hence, asking for responses to the same two requests in a single prompt should save a considerable number of tokens (i.e., day 3 token use should be nearly the same as that of either day 1 or day 2). This is what I found. The cost and token usage on day 3 were only slightly higher than on either day 1 or day 2.
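The arithmetic behind this logic can be sketched in a few lines of Python. This is a rough illustration and not a reproduction of my Anthropic usage data. It assumes the plugin sends the entire folder as input with every prompt, and it uses a rule-of-thumb ratio of 1.3 tokens per word, which is an assumption (real tokenizers vary by text and model). The function names are hypothetical.

```python
# Rough sketch: input tokens for two separate prompts vs. one combined prompt.
# ASSUMPTIONS: the whole folder is sent as input with every prompt, and
# TOKENS_PER_WORD is a rule of thumb, not a measured value.

TOKENS_PER_WORD = 1.3   # assumed ratio; actual tokenization differs by model
FOLDER_WORDS = 38_208   # word count of the note-taking folder used in this post

PROMPTS = [
    "Under what circumstances is notetaking a generative activity?",
    "What have research studies shown regarding the relative effectiveness "
    "of taking notes by hand or using a keyboard?",
]


def estimate_tokens(word_count: int) -> int:
    """Convert a word count into an approximate token count."""
    return round(word_count * TOKENS_PER_WORD)


def input_tokens(content_words: int, prompts: list[str], combined: bool) -> int:
    """Estimate total input tokens for a set of prompts against the same content."""
    content = estimate_tokens(content_words)
    prompt_tokens = [estimate_tokens(len(p.split())) for p in prompts]
    if combined:
        # The content is processed once for all prompts.
        return content + sum(prompt_tokens)
    # The content is processed again for every prompt.
    return sum(content + t for t in prompt_tokens)


separate = input_tokens(FOLDER_WORDS, PROMPTS, combined=False)
combined = input_tokens(FOLDER_WORDS, PROMPTS, combined=True)

print(f"Two separate prompts: ~{separate:,} input tokens")
print(f"One combined prompt:  ~{combined:,} input tokens")
print(f"Estimated saving:     ~{separate - combined:,} input tokens")
```

Under these assumptions the saving equals one full pass over the folder, because the prompts themselves are tiny compared with the content. The sketch ignores the short instruction sentence added to the combined prompt and says nothing about output tokens.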

Summary

I must admit I was surprised at the cost of asking an AI tool to focus on content a user provides instead of asking for a similar question to be answered from that tool’s own stored “knowledge”. These seemed like basic requests someone might make many times a day in exploring with AI, but the focus on content the tool must first tokenize adds substantially to the cost. Given this cost, the Karpathy wiki strategy makes a lot of sense. At present, as a temporary strategy, I store the output from my Obsidian-related AI queries in a separate Obsidian folder, and I can then target this folder for a lower Input Token cost.
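To get a sense of how much input a given folder represents before querying it, a small script can count the notes and words in it. This is a minimal sketch under stated assumptions: notes are Markdown files ending in .md, the 1.3 tokens-per-word ratio is a rule of thumb and not a measured value, and the function name folder_stats is hypothetical.

```python
from pathlib import Path

TOKENS_PER_WORD = 1.3  # assumed rule of thumb; real tokenizers vary


def folder_stats(folder: Path) -> dict:
    """Count Markdown notes and words in a folder, including subfolders,
    and give a rough estimate of the input tokens the folder represents."""
    notes = sorted(folder.rglob("*.md"))
    words = sum(
        len(note.read_text(encoding="utf-8", errors="ignore").split())
        for note in notes
    )
    return {
        "notes": len(notes),
        "words": words,
        "est_input_tokens": round(words * TOKENS_PER_WORD),
    }


# Example use with placeholder paths (substitute your own vault folders):
# print(folder_stats(Path("MyVault/Notetaking")))
# print(folder_stats(Path("MyVault/AI-Responses")))
```

Running it on both the original folder and the folder of saved responses gives a rough comparison of what each would cost to submit as input.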

One more thing. Claude’s response to my query was great and would have been quite helpful had I been focused on creating a post. The structure of the output was, I would say, insightful. I have appended the response to my query about taking notes by hand versus on a digital device (see below).

*****

Notetaking – Hand vs Keyboard

Hand vs. Keyboard Note-Taking: What Your Notes Show

Your notes contain extensive coverage of this topic across multiple studies. Here is a synthesis of the key findings:

The Foundational Study

The landmark study by Mueller & Oppenheimer (2014), documented in Pen is mightier than the keyboard, found that keyboard users recorded more content but that longhand note-takers performed better, particularly on conceptual understanding items. In Study 3, which included a week delay with note review, differences favoring handwritten notes emerged specifically when review was allowed.

The Generative Processing Explanation

Several of your notes describe the primary theoretical explanation for a longhand advantage. Because typing speed (~33 wpm) exceeds handwriting speed (~22 wpm), as noted in transcription speeds, longhand note-takers are forced to summarize and paraphrase rather than transcribe verbatim. This engages generative processing—connecting new information to prior knowledge and reorganizing ideas—which promotes deeper learning. As described in Luo, Linlin, Kenneth A. Kiewra, Abraham E. Flanigan, and Markeya S. Peteranetz2018:

“Laptop note takers had a transcription orientation and recorded mainly verbal information… whereas longhand note takers had a generative orientation and used a combination of verbal and spatial note-taking strategies.”

The Luo et al. (2018) study further found that keyboard users recorded fewer visual notes (charts, graphs) and that the process function was more important for keyboard notes while the product function (review) was more important for longhand notes.

Contradictory and Nuanced Findings

Your notes reveal that the research is far from settled:

The 2022 Meta-Analysis (2022 Notetaking Meta-Analysis) by Voyer et al. analyzed 36 articles and found no overall effect for notetaking approach. The authors speculated that studies showing a longhand benefit may have been confounded by distraction opportunities on digital devices.
Urry et al. (2021) (Urry.et.al) replicated Mueller & Oppenheimer and found inconsistent results. Laptop users took more notes, and more notes were related to better test performance. Importantly, Urry differentiated quantity from verbatim overlap—quantity was positively related to performance while proportion of verbatim overlap was negatively correlated.
Note completeness research (NoteCompleteness) challenges the longhand advantage by showing that the volume and completeness of notes influence retention. Bui et al. (2013) found that students with access to more complete notes (as produced by keyboards) demonstrated better performance, directly contradicting the Mueller & Oppenheimer interpretation.

The Revision Factor

Your notes highlight that the comparison changes substantially when revision is introduced:

Flanigan et al. (2023) (Flaniganetal2023) found that among note revisers, longhand note-takers added three times as many complete ideas during revision pauses as computer note-takers, and scored more than half a letter grade higher on posttests. However, the advantage emerged primarily through the revision process rather than initial note-taking alone.
Cojean & Grand (2024) (Cojean&Grand2024) found that computer note-takers were more likely to reformat their notes after class, suggesting they compensate for shallow initial encoding through later revision—a finding that complicates simple hand-vs-keyboard comparisons.

Research with Children

Your note childrennotes describes Horbury & Edmonds (2021), which is notable as one of the few studies with children (ages 10-11). They found conceptual understanding was superior when taking notes by hand, but unlike adult studies, there were no significant differences in the amount of content recorded. The authors speculated that children’s lesser keyboarding experience may have produced greater cognitive load when typing.

Critical Perspectives from Your Notes

Your notes contain several important critiques of this research area:

The generative argument has a logical flaw: As you note in transcription speeds, the logic assumes “a strange lack of awareness on the part of the learner. It would seem that the keyboarding learner is oblivious to the benefits of generative processing and goes with what is easier, but less effective.”
Modification can equalize outcomes: Your AI response to my question challenging the Luo study that assumed the notes themselves could not result in generative activity argues persuasively that generative processing can be shifted to the review phase, that laptop notes provide a richer base for modification, and that laptop affordances enable superior reorganization. The conclusion: “The medium matters less than the strategy.”
Ecological validity concerns: Multiple notes flag that most studies use brief delays and immediate testing, which differs significantly from actual classroom conditions where review may occur days or weeks later (Luo, et al 2018, Flaniganetal2023).

Summary

Finding	Support
Longhand produces more generative/paraphrased notes	Consistent across studies
Keyboard produces more complete/voluminous notes	Consistent across studies
Longhand leads to better learning outcomes	Mixed—meta-analysis shows no overall effect
Note completeness predicts achievement	Supported, which favors keyboard
Longhand advantage strongest with note review	Supported in some studies, not replicated in others
Revision can compensate for shallow keyboard encoding	Supported by modification research

The overall picture from your notes is that the hand-vs-keyboard question is far more nuanced than popular accounts suggest. The medium matters less than the strategy the learner employs, and the opportunity for revision and post-processing may be the most important factor in determining which approach is more effective.
