Learning Aloud

Karpathy Plugin for Obsidian

I have spent a significant amount of time over the past week or so developing a Karpathy wiki based on a large portion of my Obsidian notes. This process began in April when I decided to purchase a Mac mini that I intended to devote exclusively to exploring AI on the desktop. I was a bit slow in making this purchase, and the computer did not arrive until last week. My tardiness cost me several hundred dollars more than the same purchase would have a few months ago.

I was motivated to invest and explore this area for two reasons. First, my main interest in AI continues to focus on retrieval-augmented generation (RAG) based on the notes and highlights I have collected to serve as a foundation for my writing projects. As I have used AI plugins to interact with the content I have stored and organized in Obsidian, I discovered that the API-based services for interacting with these notes are relatively expensive because the process of first feeding the notes to the AI service must be repeated each time a session is initiated. Karpathy proposed that AI could be used to create a wiki based on the concepts and connections in a collection of source material, either once (when the collection was created) or as each new item of content was added, and this wiki could then be the focus of future explorations, reducing the cost due to the repeated input of the same content to the AI service.
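The cost argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the token counts, the price per million input tokens, and the function name are hypothetical placeholders, not figures from any AI service or from actual bills. It also ignores output tokens and any extra work the AI does while building the wiki.

```python
def input_cost(tokens_per_session: int, sessions: int, price_per_million: float) -> float:
    """Dollar cost of the input tokens sent across a number of sessions."""
    return tokens_per_session * sessions * price_per_million / 1_000_000


# All numbers below are hypothetical placeholders, not real prices or measurements.
PRICE_PER_MILLION = 3.00        # assumed dollars per million input tokens
NOTE_TOKENS = 400_000           # assumed size of the full note collection
WIKI_TOKENS_PER_QUERY = 20_000  # assumed size of the wiki pages one query needs
SESSIONS = 50                   # assumed number of sessions

# Approach 1: feed the full note collection to the service in every session.
resend_everything = input_cost(NOTE_TOKENS, SESSIONS, PRICE_PER_MILLION)

# Approach 2: read the notes once to build the wiki, then send only the
# relevant wiki pages in each session.
build_once = input_cost(NOTE_TOKENS, 1, PRICE_PER_MILLION)
query_wiki = input_cost(WIKI_TOKENS_PER_QUERY, SESSIONS, PRICE_PER_MILLION)

print(f"Resend all notes every session: ${resend_everything:.2f}")
print(f"Build wiki once, then query it: ${build_once + query_wiki:.2f}")
```

With these made-up numbers the first approach comes to $60.00 and the second to $4.20. The specific values do not matter; the point is that the repeated-input cost grows with the number of sessions, while the wiki build is paid once.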

My second motivator was personal curiosity, sparked by the many posts promoting the potential of AI tools and models that could run on personal hardware, avoiding the costs and scrutiny associated with using online services from major AI companies. The proposal was that many common uses of AI no longer required access to $20- or $200-a-month subscription services.

I understand that another tutorial or “how I did it” post may be required at this point, but I read a post explaining that getting started with self-hosting LLMs will not be as easy as the posts newbies are likely to read make it sound, and a good deal of exploration and personalization will be required. The message was intended not to be discouraging but to communicate “don’t give up, you should be able to get it to work.” This was pretty much my experience, and I thought it worthwhile to explain the issues I encountered and why I had to make adjustments to my specific situation. My experiences with tech since the mid-1980s have kind of gone this way.

So, I have two Mac minis now, and the first challenge was how to connect both to the same large monitor so I can switch back and forth as required by a general-use and a specific-use computer arrangement. I knew I would have to purchase a KVM (keyboard, video, mouse) switch, but I had not considered that my current setup uses a Bluetooth mouse and keyboard. More specifically, Apple’s Magic Keyboard and Magic Mouse are not intended to be linked to more than one device. You charge your Magic Keyboard with a USB cable, so the cable can be used as it has long been used to connect to a computer. You also charge your Magic Mouse, but the cable is inserted on the bottom of the mouse, preventing it from being used while it is being charged. Solution – purchase a mouse with a cable. The first challenge is overcome.

My plan was to use the Obsidian Karpathy LLM wiki plugin because this seemed the most efficient way to create a working system. The plugin’s setup allows selecting multiple AI sources, including subscription services. I did use Anthropic’s Claude API when I was having difficulty getting either of the two local options (Ollama or LM Studio) to work. Claude worked great, but adding one new source document cost 70 cents. My present collection is close to 300 note files, and the work the AI does increases as the complexity of the wiki increases, so I treated the success as a sign that the struggles I was experiencing could eventually be overcome.
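To put the 70-cent figure in perspective, here is a rough projection for the whole collection. It assumes a flat rate per note, which is a simplification on my part: since the work grows as the wiki becomes more complex, the real total would likely be higher, so treat the result as a floor rather than an estimate.

```python
# Figures from the post: about 70 cents for one source document, roughly 300 note files.
COST_PER_NOTE_CENTS = 70
NOTE_COUNT = 300


def projected_cost_dollars(note_count: int, cents_per_note: int) -> float:
    """Flat-rate projection; working in whole cents avoids floating-point drift."""
    return note_count * cents_per_note / 100


print(f"${projected_cost_dollars(NOTE_COUNT, COST_PER_NOTE_CENTS):.2f}")  # prints $210.00
```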

When using Ollama, I was experiencing a consistent problem with some, but not all, of the note files the AI was ingesting to build the wiki. I spent a considerable amount of time over several days comparing the files that could and could not be processed, and I never did find a difference. It wasn’t the length, the presence of specific markdown tags, the tool I had used to create the original markdown file, or any other variable I could imagine. Nothing. However, the problem was consistent. The same files, time after time, would either work or fail.
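For anyone facing the same puzzle, a comparison like this can be made less tedious with a short script. The sketch below is hypothetical and was not part of the original troubleshooting: the function names and the chosen features (length, line count, non-ASCII characters, front matter, wiki links) are illustrative guesses at what might differ, and it assumes you have copied the working and failing notes into two separate folders.

```python
from pathlib import Path


def note_features(text: str) -> dict:
    """Collect a few simple properties of one markdown note."""
    lines = text.splitlines()
    return {
        "characters": len(text),
        "lines": len(lines),
        "longest_line": max((len(line) for line in lines), default=0),
        "non_ascii": sum(1 for ch in text if ord(ch) > 127),
        "has_front_matter": text.startswith("---"),
        "wiki_links": text.count("[["),
    }


def summarize_folder(folder: Path) -> dict:
    """Map each .md file name in a folder to its features."""
    return {
        path.name: note_features(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(folder.glob("*.md"))
    }


# Example use (folder names are hypothetical):
#   for name, features in summarize_folder(Path("failing_notes")).items():
#       print(name, features)
```

Printing the results for both folders side by side makes it easier to spot a property that every failing file shares. If nothing stands out, as happened here, that at least rules out the obvious candidates.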

My typical strategy in such situations is to ask questions of the Internet. One proposal was that the JSON history had become corrupted. The solution was to reveal the invisible files (the .files and folders) and delete these files. New files would be generated when the Obsidian app was next launched. This was done without consequence.

One issue I encountered was that the models displayed as options within Ollama did not include the model (qwen2.5) I had found recommended when I read the descriptions of others. I searched how to add other models to Ollama and found it could be done with a terminal command (ollama pull <model name>). Now qwen2.5 appeared. Qwen3.6 was originally listed and I assumed there would be little difference, but for some reason, I was wrong, and the system worked with qwen2.5.
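Before pointing the plugin at a model, it can help to confirm that Ollama actually has it installed. The sketch below assumes Ollama's local REST API is running at its default address (http://localhost:11434) and that the /api/tags endpoint returns a JSON object with a "models" list whose entries have a "name" field; check this against your own installation. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names such as 'qwen2.5:latest' from a tags response."""
    return [model["name"] for model in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if 'wanted' matches a full name or the part before the ':' tag."""
    return any(
        name == wanted or name.split(":")[0] == wanted
        for name in installed_models(tags_payload)
    )


def fetch_tags(url: str = OLLAMA_TAGS_URL) -> dict:
    """Ask the local Ollama server which models it has (requires Ollama to be running)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


# Example use, with Ollama running:
#   if not has_model(fetch_tags(), "qwen2.5"):
#       print("Run: ollama pull qwen2.5")
```

Matching on the part of the name before the colon is a design choice for convenience, since Ollama appends a tag such as ":latest" to the names it lists.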

Without going into details because others have already provided tutorials, you first add and install the Karpathy LLM Wiki plugin for Obsidian. The gear icon associated with this community plugin provides a “fill in the blank” form where you enter information linking Obsidian to the AI online service or local option you want to use.

The wiki construction process is controlled by the commands that appear in the Obsidian command list when the Karpathy plugin is installed.

So, you start Ollama and select the model you will use in Obsidian. Start Obsidian and select the command to ingest a file or folder and be patient. Eventually, your wiki will be generated, and you can query the wiki rather than the source files. The right-hand column displays a response to a prompt.

So, I was able to generate a wiki based on more than 150 of my notes. In examining some of the components of the wiki, I did find some weird artifacts. There were some with Chinese characters. I happened to be listening to Leo Laporte talking about different AI models, and he said that qwen originated in China. It then made sense to me that the model might translate some of the Chinese names in my article summaries and include their Chinese translations (no idea if that is actually what happened). I also found some md pages with titles, but no content.

When I used the Karpathy command to submit queries, I found the quality of the responses to vary. Some made sense and some ignored sources I knew existed and were central to what I expected. Rewording of the query in ways I thought the model should have understood as equivalent sometimes resulted in the response I expected.

The following comparison (the first response from querying my new wiki, the second from using Claude with my original source files) shows how much the results differed. Even a quick skim reveals significant differences in the eloquence and completeness of the responses. The sample responses are presented as an Appendix.

Summary

Each user is likely to have a unique experience when applying a local AI system to their content to generate a Karpathy-style wiki. Expecting that it will take some tinkering is good advice even when following instructions or “how to do it” videos on YouTube. My effort here was to describe what this tinkering experience looked like.

After at least getting the system to work, I was able to compare my first successful effort with what I generated for 44 cents using Claude. The differences were substantial.

I assume there are many more issues to explore. I am currently repeating the process with a different script that instructs the AI on what to do when creating the wiki. If that effort produces substantial improvements, I will have to write another post.

*******

Response of local model to wiki

Summary of Research Comparing Handwritten Notes vs. Computer Notes

According to the literature summarized from various studies, there are several key findings regarding the comparison between handwritten notes and notes taken on computers:

Repetition in Note-Taking:

– A study by Bui et al. 2015 suggests that repeated exposure to material enhances learning outcomes, especially when coupled with effective note-taking strategies.

Handwritten Notes and Memory Retention:

– Longhand notes are described as more paraphrased and containing more signals and images, which can lead to higher text-related achievement when reviewed (from Longhand Notes).

– Lecture Note-Taking Strategies indicate that students who take notes by hand outperform those who use digital devices, as handwritten notes require a higher level of processing and engagement.

Editing and Revisions:

– Research by Mueller and Oppenheimer (2014) found that while typing is faster, it leads to less cognitive effort and thus less retention due to the lack of engagement required in handwriting notes.

– Follow-up revisions are crucial as they enhance memory retention. According to a study cited by AI response, revising digital notes can mitigate some of the disadvantages of digital note-taking, improving long-term recall and academic performance (from AI Response to Challenging Luo Study).

Professional Note-Taking:

– Additionally, in the context of professional training, effective note-taking is vital for future practitioners (from “Notetaking in Professions”).

References

Longhand Notes — Core mechanism explaining why handwritten notes are more beneficial.
Bui et al. 2015 — Effects of Repetition on Note-Taking and Recall Strategies in College Lectures.
“Notetaking in Professions”. Stacy and Cain, American Journal of Pharmaceutical Education (2015) — Importance of note-taking skills for professional training.
Mueller and Oppenheimer (2014) — Pen and Pad versus Laptop for Note-Taking.
AI Response to Challenging Luo Study — How Note Modification Research Might Counter Luo et al. (2018)

*******

Paste from Claude plugin within Obsidian

Handwritten vs. Computer Note-Taking: What Your Notes Say

Your vault contains extensive research on this topic, and the picture that emerges is more nuanced and contested than the popular narrative suggests. Here is a synthesis of what your notes reveal:

The Foundational Study

The most influential study is Mueller & Oppenheimer (2014), documented in Pen is mightier than the keyboard. Their key finding was that faster keyboard transcription led to more verbatim, less generative notes, while the slower speed of handwriting forced summarization and paraphrasing. Notably, their Study 3 found no differences when note review was not allowed, but longhand was superior when review was allowed—a surprising result.

The Generative Processing Argument

Several of your notes outline the theoretical rationale:

Translation hypothesis: Writing adds a distinctive encoding channel beyond just listening (Luo, Linlin, Kenneth A. Kiewra, Abraham E. Flanigan, and Markeya S. Peteranetz2018)
Generative hypothesis: Handwriting’s slower speed forces paraphrasing, organization, and integration with prior knowledge, leading to deeper processing (Depth and Generative, EncodingEffect)
As noted in transcription speeds, adults type at ~33 wpm and write longhand at ~22 wpm, meaning handwriters must be more selective

Your note in transcription speeds raises a sharp critique of this logic: it “suggests a strange lack of awareness on the part of the learner. It would seem that the keyboarding learner is oblivious to the benefits of generative processing and goes with what is easier, but less effective.”

Contradictory and Inconsistent Findings

Your notes consistently emphasize that research findings in this area are contradictory:

Urry.et.al documents a major replication of Mueller & Oppenheimer that found inconsistent results. Urry found laptop users took more notes and more notes related to better performance. Critically, they differentiated quantity from verbatim overlap—quantity was positively related to performance while verbatim proportion was negatively correlated.
Luo, et al 2018 found that when notes were recorded but not reviewed (process function), laptop was actually better for image-related learning. Only when notes were reviewed (product function) did longhand show an advantage. The authors themselves note that “findings in this area of research are often contradictory and uncontrolled variables may be responsible.”
Flaniganetal2023 found that, surprisingly, longhand and computer note takers recorded a similar number of ideas during the lesson, contradicting the expected computer advantage. However, longhand revisers added three times as many complete ideas during revision pauses.

The Note Completeness Counter-Argument

Your notes in NoteCompleteness and Colllaborative Notes document an important counter-narrative: note completeness predicts achievement, and computers produce more complete notes. As noted:

“The studies of note volume and completeness would seem to challenge the argument that taking notes on a computer is a liability because more notes are taken implying less thinking about the content.”

Bui et al. (2013) found that students with access to their more complete (computer-generated) notes demonstrated better performance, directly contradicting Mueller & Oppenheimer’s interpretation.

The Revision Factor

Multiple notes highlight that the handwriting vs. computer debate may miss the most important stage entirely—revision:

Three stage model of note-taking proposes that effective note-taking is a three-stage process: recording → revision → review, and most studies ignore the revision stage.
Cojeanatal2024 found that computer note-takers were more likely to reformat and revise their notes after class, potentially compensating for less generative initial encoding.
Flaniganetal2023 found longhand revisers added more ideas during revision pauses, but this study used interspersed pauses—an artificial design unlikely to reflect real practice.
Cohen2013ActiveSummary demonstrated that note restructuring significantly improved exam scores regardless of medium, suggesting what you do after taking notes matters more than how you take them.

The Children’s Research

Your note childrennotes documents a unique study (Horbury & Edmonds, 2021) with 10–11-year-olds that found:

Conceptual understanding was superior with handwritten notes
Unlike adult studies, there were no significant differences in note quantity between conditions
The authors speculated that children’s limited keyboarding experience may have increased cognitive load when typing, a different mechanism than proposed for adults

Your AI-Generated Synthesis

Your note AI response to my question challenging the Luo study that assumed the notes themselves could not result in generative activity offers a compelling reframing:

Luo et al. Conclusion	Modification Research Counter
Laptop notes are transcription-oriented	Generative processing can be added during revision
Longhand notes are better review products	Laptop notes can be transformed into superior products through revision
Laptop notes lack images and signals	These can be added easily using laptop affordances
Reviewing longhand notes yields higher achievement	Studies only tested passive review, not active modification
Longhand is generally preferable	The medium matters less than the strategy

Overall Conclusions From Your Notes

The popular narrative favoring handwriting is overstated. Replication attempts and meta-analyses show inconsistent results.
The mechanism matters: Quantity of notes is positively related to achievement; verbatim overlap is negatively related. These are separate variables that should not be conflated.
Most studies use artificial conditions (immediate testing, brief delays, no real revision opportunities) that don’t reflect actual academic practice.
Revision and post-processing may be more important than the initial recording medium, and computers offer superior affordances for revision.
The real issue is strategy, not medium. Untrained students default to transcription on keyboards and forced summarization by hand—but training could equalize outcomes.
Note completeness matters for the storage function, and the storage function is generally more important than the encoding function (ExternalStorage).

Digital Devices and Effective Studying and Long-Term Note Use

Lately, I seem to encounter a great deal of advice warning against reading and note-taking on a device. Paper and a pencil are argued to be better. In a recent post, I argued that I found these complaints ill-advised. This is a second attempt to present the same argument and a different way to justify my position.

Most simply put, I don’t think the researchers and I are talking about the same thing, even though it seems the anti-tech types would make the case that we are. The research frequently cited on reading and taking notes tends to focus on narrower and shorter-term processes than what I have in mind. My focus, and I think the actual focus of most learners, is more on what I might describe as studying or the use of notes as imagined by the PKM/Second Brain aficionados. When I read a fiction book for pleasure, I might have a better understanding of that book if I read the content on paper. This is very different from how I would best get from reading a textbook to doing well on an exam a couple of months from now. A similar comparison might be made with taking notes during a lecture. I would seldom be taking notes for an exam the same day, but more likely for one weeks away.

Process Models

I write a lot about process models relevant to understanding and developing both learning skills and the knowledge that results from applying those skills. The process model of writing (Flower & Hayes) makes a good example. These researchers proposed that a process model of writing was useful to researchers because the model identified subskills that could be studied to see how these contributing skills might explain the performance of more and less effective writers and to educators trying to understand what contributing behaviors might be isolated for practice and development.

I suggest that the same type of process model would be helpful for developing study skills and for taking a different look at the possible advantages of using digital devices for reading and note-taking. The argument in this second case is that research comparing tech vs. traditional approaches has overlooked important processes in studying and note-taking applications.

Processes in tasks involving the collection and eventual application of information

There have been efforts to identify the processes involved in translating a presentation (e.g., a lecture or book chapter) into intended applications. Several researchers (e.g., Cojean & Grand, 2024; Flanigan et al., 2023; Luo et al., 2016) have extended the original two-stage model (note-taking and external storage) to emphasize the importance of revision. Returning to the importance of multi-process models in understanding the potential issue of whether it matters if one takes notes on paper or using a device, the studies differ. Flanigan and colleagues engaged in the unusual practice of inserting pauses during a presentation to allow for revision and found that those taking notes by hand made more revisions. In contrast, Cojean and Grand found that after class those taking notes on a device made more revisions. Systems of taking notes, for example Cornell Notes (Pauk & Owens, 2011) and recent PKM systems (Ahrens, 2022; Forte, 2022), differentiate revision as a separate process in the use of notes.

In the spirit of the writing process model, I have created my own identification of note-taking and note-using processes listed as a sequence with the recognition that notetakers frequently revisit earlier processes after finding a limitation in what a later process makes available. The sequence of descriptors for these processes follows.

Collecting
Considering
Elaborating
Exporting

Collecting – creating a representation of content (presentations, videos, text material) for use in the future

Examples – creating annotations, notes, highlights

Considering – offline processing of the information collected for personal understanding and to evaluate gaps in understanding

Examples – rewrite existing notes based on comparison of personal collection with that of peers, return to source material to fill in gaps

Elaborating – speculation based on personal understanding of original information for fit within existing knowledge and potential application

Examples – links to existing notes on similar topics, Internet searches to locate and augment existing notes with additional examples of key concepts

Exporting – use of cumulative stored content to meet personal or assigned goals

Examples – test performance, assigned writing tasks, personal writing projects

The Processes and The Question of Handwriting vs. Digital

My contention is that when tasks involve all of the processes I have identified, digital tools offer advantages in the efficiency of collection, storage, search, and manipulation. These advantages are magnified when the task’s time frame is extended and initial goals are unclear. I have written at length on these topics, and I have listed some of these posts, organized by process, below. I have avoided considering how AI might be used in these processes, but such engagement would be far easier if working in a digital environment.

Collecting

Take digital notes for best lecture performance

Note and highlight extraction for efficient review and storage (Readwise for books, Highlights for PDFs)

Considering

Note and highlight extraction for efficient review and storage (Readwise for books, Highlights for PDFs)

The Power of Collaboration: Enhancing Your Note-Taking Experience

Preserving context in digital writing

Elaborating

Smart Connections finds note connections

Highlighting in the age of digital content

Notes and the Translation Process

The Space Between Encountering Information and Application

Digital for serious reading tasks

School and Professional Note-Taking

Exporting

School and Professional Note-Taking

Resources

Ahrens, S. (2022). How to take smart notes: One simple technique to boost writing, learning and thinking.

Cojean, S., & Grand, M. (2024). Note-taking by university students on paper or a computer: Strategies during initial note-taking and revision. British Journal of Educational Psychology, 94, 557–570. https://doi.org/10.1111/bjep.12663

Flanigan, A. E., Kiewra, K. A., Lu, J., & Dzhuraev, D. (2023). Computer versus longhand note-taking: Influence of revision. Instructional Science, 51(2), 251-284.

Forte, T. (2022). Building a second brain: A proven method to organize your digital life and unlock your creative potential. Simon and Schuster.

Luo, L., Kiewra, K. A., & Samuelson, L. (2016). Revising lecture notes: How revision, pauses, and partners affect note-taking and achievement. Instructional Science, 44(1), 45-67.

Pauk, W., & Owens, R. (2011). How to study in college. Boston, MA: Wadsworth, Cengage Learning.

Cooperative Learning When AI Is Your Partner

Various terms have been used to describe AI and human partnerships in attempting to accomplish a goal defined by the human. For example, a recent book by Ethan Mollick was titled “Co-intelligence”. Recently, I have been reading about the work of several authors who have described different ways in which learners might interact with AI, some being more successful than others. These analyses also consider various types of collaboration. I will return to this material after my preliminary remarks.

As I read the most recent set of papers, I flashed back on research I encountered in the early 1990s. In this case, there was no AI partner, but educational researchers and instructional designers were investigating ways in which student peers could collaborate (e.g., Johnson, et al., 1991; Slavin, 1995). If you were a preservice or practicing teacher at that time, you likely heard a lot about cooperative learning. There were multiple proposed benefits of cooperative learning. Social interactions were motivating. Multiple individuals have unique experiences and skills and combining these resources benefits all who are exposed. Interaction, whether it be a form of teaching, working through differences of opinion or error-checking each other, or simply sharing experiences, augments an individual’s cognitive activity. More individuals, theoretically, allow more to be accomplished in less time.

There were also concerns about cooperative learning activities, and if the connection to learning with an AI partner is not obvious, identifying educators’ and researchers’ concerns should clarify the similarities. The first was called the freeloader effect – in a pair version of cooperative learning, one participant might do all of the work and the other would do and learn very little. Perhaps one student would simply be more motivated or more capable and find that working alone was more efficient. Other concerns included a lack of experience and skill in cooperative planning or in effectively using the talents of multiple individuals. A few remedies from that era I recall included individual accountability (e.g., individual tests on content), positive interdependence (e.g., clear identification of tasks whose outputs will be combined in the final product), and a structured or scaffolded process.

Scaffolding will come up again when discussing the potentially effective use of AI, so perhaps an example of what this might look like would be helpful. In building construction, a scaffold provides temporary support for workers. In education, a scaffold provides a structure that supports a learning task by guiding how work is done. Consider a version of a common strategy for a cooperative task – i.e., think, pair, share. Students are given a task. First, each student writes down (or just thinks of) a proposed solution (think). Then, students consider each other’s proposals (pair) and integrate them to arrive at a solution (share). The imposed structure in this case ensures that each individual participates and provides a record of their work, should the teacher want to hold each accountable for participating. The jigsaw cooperative learning technique offers another method for scaffolding a cooperative learning activity to promote participation and individual accountability. A project is selected that requires identifiable roles or tasks. For example, when creating a brochure describing the butterflies one might most likely encounter in a local garden, individual students could be assigned to research different butterflies and then asked to combine their research into a single document.

Before moving on to the cooperation between a student and AI, I want to point out that a body of research has examined peer interaction in educational settings, and that the issues and remedies it identified might be useful to those now focused on AI.

Differentiating Ways Learners Use AI In Attempting to Identify Productive Approaches

First, I would refer readers to a previous post in which I examined the classification scheme of AI learning strategies. The following is my previous explanation of the levels in this classification scheme.

The three levels have the following characteristics:

Zone 1: No AI Involvement

In this level, learning occurs without any AI assistance. While learning happens, it is often “capacity-constrained” because the learner must spend significant time and effort on execution and task completion, leaving less bandwidth for higher-order reflection.

Zone 2: Scattered, Half-Hearted Use

This is characterized by using AI for minor tasks like fixing sentences, checking facts, or tidying paragraphs. It often produces the worst learning outcomes. The learner still carries nearly the full cognitive load but adds the overhead of managing AI interactions without gaining significant cognitive savings. Note: this summary paraphrases the description of the authors. My version would add having the student use the AI tool to perform the task based on simplistic instructions.

Zone 3: Committed, Strategic Delegation

This level involves offloading entire categories of substantive work to AI to free up genuine cognitive capacity. This freed bandwidth is then redirected toward tasks AI cannot do, such as critiquing frameworks, questioning assumptions, and making complex judgment calls. This zone is where “transformative learning” is thought to live, provided the course design is intentional about how and why tasks are delegated.

My attempt to summarize this scheme suggested that the use or nonuse of AI and its relationship to successful learning experiences could be explained by investigating how certain conditions of student motivation, metacognitive proficiency, and working memory interact. For motivation, was the learner focused on developing personal knowledge and/or skills beyond task completion and on receiving a positive evaluation? Working memory reflects the capacity for meaningful learning beyond underlying task demands. The notion here is that AI might, in some situations, handle nonessential or untargeted tasks, allowing the learner to devote their attention and processing capacity to accomplishing targeted goals. Metacognitive proficiency suggests that more sophisticated learners with sufficient available attentional capacity are more likely to make sound decisions about when to use AI to free cognitive capacity to accomplish goal-related knowledge and skills.

I hope it makes some sense how these factors might interact. Allow me an argument based on a personal perspective. I would suggest that my own learning offers motivational advantages. I learn to accomplish personal goals rather than being subject to external goals and reward structures in a classroom setting. I am also more metacognitively sophisticated than secondary school students. I understand the tasks I want to accomplish well, having explored them countless times over the decades. Such experiences offer me useful insights, and they also mean that I have background knowledge and cognitive skills that reduce the demands on my working memory. My use of AI could enable Zone 3 processing. I am not saying that this is always the case, but it makes sense that I have a greater opportunity to function at this level.

Does this analysis then suggest that less-experienced learners, even secondary students, should not work with an AI partner? Not necessarily. This is where scaffolding comes in. It was found to be important in cooperative learning and has been proposed as a way to make Zone 3 functioning practical. Scaffolding bridges the gap, allowing tasks to be accomplished before all necessary conditions are in place and offering a mechanism for introducing required skills.

The mention of design in the Zone 3 description is another way of suggesting the value of scaffolding. I have included multiple references I found that explain the three-zone model and offer suggestions for scaffolding. My typical reaction is that the examples never seem to include the tasks educators might most need assistance in addressing. For example, writing tasks are commonly described, but not general “homework” tasks.

I think the best advice is to focus on processes and discuss assignment goals with students, differentiating the processes students are free to use AI to accomplish from those they are expected to complete themselves. Consider how students might document their activity in completing each. For example, again using a writing example, use AI to identify content you will use in your writing project (available for submission). Submit your draft based on this content. Have AI evaluate the quality of your draft (available for submission). Submit your final version. Students might also be given a general topic on which a written product will be required. Students would bring the relevant content they have found to class and then receive specific instructions on the product to be fashioned during that class period. Both the resources and the product would be submitted.

General Resources:

Mollick, E. (2024). Co-intelligence: Living and working with AI. Penguin.

Johnson, D., Johnson, R., & Holubec, E. (1991). Cooperation in the classroom (rev. ed.). Edina, MN: Interaction.

Slavin, R. (1995). Cooperative learning: Theory, research, and practice (2nd ed.). New York: Merrill.

Sources for AI Level Analysis

Hardman – The cognitive offloading paradox

Lodge and Lobel. Artificial intelligence, cognitive offloading and implications for education

Means. Strategic Cognitive Offloading: What the Research Says, and Why Higher Education Isn’t Ready for It

Wang, S., Zhang, H. Pedagogical partnerships with generative AI in higher education: how dual cognitive pathways paradoxically enable transformative learning. Int J Educ Technol High Educ 23, 11 (2026). https://doi.org/10.1186/s41239-026-00585-x

The “you are doing it wrong” excuse and Classroom AI

There is a common perspective on the practice of education, intended as a criticism, I think, that proposes that if a visitor from the past were to time-travel to the present, he or she would be amazed and bewildered by so many areas of civilization (travel, medicine, farming), but would feel completely at home in K12 or university classrooms. As an educational researcher, I admit this claim has always troubled me. Was the process of passing on knowledge and developing important skills optimized centuries ago despite all of the folks like me who study how people learn and how the processes supporting learning might be improved? If I disagreed, what would I identify as a counterexample, or how, at the very least, would I justify the time, effort, and resources people like me have invested in changing the status quo?

My Interest in Individualization

A general topic that has long been at the core of my personal research interests has been individualization. It seems obvious that learners differ on important variables that impact learning. Some have greater aptitude than others. Some, due to an endless list of differences in life experiences, at a given point in time have significant gaps in relevant background knowledge and prerequisite skills. For economic and historical reasons, our approach to assisting student learning largely ignores these differences. Our system, despite claims, fails to actually meet students where they are in order to move them forward most efficiently. Our sense of where students are also ignores differences in goals, interests, and whatever else might come under the general heading of motivation.

Those who have followed my posts over the years will likely recognize that much of what I have done has focused on evaluating approaches that make use of technology to expand the flexibility educational systems can practically offer. I will identify two such topics for those who might want to explore my past posts – mastery learning and technology-supported tutoring. I admit that these seemingly logical opportunities have not yet yielded the benefits in application I had hoped, and this is the topic I want to examine in this post.

Research in the social sciences, which would include applied research in education (e.g., classroom learning), has notorious weaknesses, but also unique challenges. For example, recent criticism offered to the public notes the high rate at which published research cannot be replicated. We seem to be in an era in which funding for science in general has been questioned, so with cutbacks those of us working in more challenging areas have reason to be concerned. Yes, I said more challenging. I agree the “basic sciences” are of great importance and deserve support, but think of the claim I used to offer when I taught the research section of Introduction to Psychology – the chemicals in the test tube, the electrons in the circuit, or the planets in space don’t think about how they feel like reacting today. The rules that explain such behaviors may be intricate and difficult to ascertain, but at least most are reliable. The challenges social scientists face are simply different, but the general trend has been toward greater understanding.

Back to the thought experiment about the visitor from the past

If one assumes progress should have happened, and it either hasn’t or at least not to the degree that seems reasonable, is there reason for optimism? Are optimists delusional? What are optimists up against when it comes to criticizing present practices and seeking funding and attention for new approaches?

Changing a massive system with highly ingrained beliefs and behaviors is tremendously difficult. New ideas struggle to take hold and mature within this environment. An “intellectual pessimism” is used to resist deep exploration of changes that are theoretically logical and justified by basic research, and this pessimism dismisses what I have decided to describe as the “you are doing it wrong” plea for continued experimentation. I don’t think you can search for additional references to this phrase expecting a lot of success, but it is a phrase I have decided captures the attitude I think typifies the resistance others have identified.

The phrase implies criticism of researchers who insulate themselves from scrutiny of their “big ideas” by attributing poor outcomes to implementation failures rather than to flaws in the ideas themselves. In other words, why do “big ideas” continue to resurface repeatedly over time when attempts to apply these ideas have not previously been successful? Perhaps most simply put, it is about excuses.

The “you are doing it wrong” explanation works like this. When a widely adopted educational innovation – learning styles, discovery learning, whole language reading, AI tutoring, open classrooms, etc. – produces disappointing or mixed results, proponents rarely concede the theory is wrong. Instead, they argue: the idea is sound, but practitioners didn’t execute it faithfully or well enough. The failure belongs to the implementers, not the framework.

This rationale functions as an unfalsifiable escape hatch. Any negative evidence gets reframed as a measurement of implementation quality rather than a test of the underlying idea. The theory can never lose, because every failure is a fidelity problem.

In education, common variants are typified by the following:

“Teachers didn’t receive adequate training” – used with constructivism, project-based learning, differentiated instruction, AI

“It wasn’t implemented with fidelity” – the research or theoretical components were not followed with sufficient care.

“The conditions weren’t right” – class sizes, demographics, resources, culture

“It was a watered-down version” – the pure form was never really tried

With these excuses, it is the grain of truth that makes them plausible. Of course, there is always the possibility that the excuse is valid. This pattern is worth naming clearly in writing about learning science, because it explains a lot about why education cycles through fashions without accumulating settled knowledge the way other applied fields do.

Are classroom uses of AI the most recent examples of “you are doing it wrong”?

AI applications in classrooms represent recent examples of promising innovations, but also potentially of an impotent fad (e.g., Gerlich). Claims of cheating instead of learning abound, and while theory and carefully controlled research point to logical and demonstrated benefits, it would seem fair to argue that educators are concerned about most student use.

Recently, I have encountered multiple accounts that propose “you are doing it wrong”. I intend to develop an extended analysis of the core ideas of these claims in a future, more analytical post, but I might quickly summarize here by explaining that true success in the use of AI is most likely when certain conditions of student motivation, metacognitive proficiency, and working memory capacity are met. In many cases, these variables are not functioning at desirable levels.

Several writers have backed a three-level AI use model proposing that some levels (the lowest and the highest) are likely to be successful and the middle level, which is presently most common, is less likely to produce satisfactory results.

The three levels have the following characteristics:

Zone 1: No AI Involvement

Zone 2: Scattered, Half-Hearted Use

Zone 3: Committed, Strategic Delegation

My suggestion for making sense of these differences is to take a familiar task and work through what these different zones might look like in practice. I think learning to write makes a good case.

In this example, Zone 1 is easy – no use of AI. Zone 2 might include simply asking the AI tool to complete an assignment for you or perhaps using a tool such as Grammarly to check spelling and grammar. What would Zone 3 look like? Perhaps you might use an AI tool to suggest a list of topics you might address based on your general goal, or perhaps create a draft and then ask the AI for a critique based on concerns you have about your initial effort.

Summary

I hope it is obvious how “you are doing it wrong” would apply to how educators may allow students to use AI. The challenge then is to evaluate whether such uses are an example of a typical educational fad or actually are limited because the learner is not doing it right.

Source

Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. Societies, 15(1). https://doi.org/10.3390/soc15010006

The U.S. and China AI Competition

The very recent summit involving Presidents Trump and Xi Jinping dealt with many political controversies of the day, which included AI and related issues such as intellectual property. The mention of AI brought to mind a book by Kai-Fu Lee, which I think I read in 2019. I remembered some of the comments Lee made about China, computer science, and AI at that time. Lee, who has held both U.S. and Taiwanese citizenship, wrote that China would have important advantages in the development and application of technology, which surprised me at the time but made some sense given what I knew about China. Lee was educated in the U.S. (Carnegie Mellon Ph.D.), worked for Apple, then returned to Taiwan and later worked for Google in China. I explored my notes and highlights from that book and also from The Big Nine. My interest in the role of AI in education and its application across different countries led me to another article in my personal archive (Hao, 2019). The following comments are mostly based on Lee’s ideas, with some expansion using the other two references I have mentioned. All sources are a bit dated, given the rapid pace of AI developments, but I still find the core ideas worth considering.

According to Lee, China’s advantages in AI come from scale, data, industrial capacity, talent, and state coordination.

Scale equals more data

Lee argues that China’s 1.4 billion people give it control of “the largest, and possibly most important, natural resource in the era of AI: human data” and that its huge number of internet users gives it both data quantity and quality for training models. This resource is roughly the equivalent of the combined resources of the United States and Europe. Lee offered this perspective some years ago, when finding content seemed more of a priority for U.S. companies, which encountered pushback when scraping the web and books without permission.

Industry integration

Chinese companies share. For example, Tencent’s ecosystem is noted as perhaps the single richest data ecosystem of all the giants and combines multiple services, say, in contrast to X and Amazon. Concentration of data and services in a few massive platforms offers a related quantity and quality advantage.

Quantity of Talent

There is a Thomas Friedman quote I have always remembered: “Remember in China if you are a one in a million talent, there are 1400 others just like you.” Lee offers a different assessment of the talent situation specific to AI. He claims that the U.S. has more superstars, but China has the advantage in the number of engineers and computer scientists working on AI and related fields. Aside from the great difference in population, engineering, programming, and science are simply fields of advanced study that are seen as more of an opportunity in China. My own way of thinking about this difference is that in the U.S., business and finance attract many, and in China these fields are less of a draw.

State Coordination and Standards

A “big advantage for China: it doesn’t have the privacy and security restrictions that might hinder progress in the United States”. The commitment to the massive surveillance of its own population is a known focus of the Chinese government and a means of control and manipulation of that population. We rightfully consider the use of technology to probe personal lives and values a violation of basic human rights and bristle internally at the collection of information about us by companies and the government. Simply put, China doesn’t have the privacy and security restrictions that might hinder progress in the United States. Despite tolerated abuses, the commitment to collecting and analyzing this type of information is a source of funding and a focus of experimentation in China.

“Move fast and break things” was the original Facebook creed, but a value system that has come under increasing criticism in China. Without the pressure to curb potential negative aspects of AI, China moves faster. Related to this is the greater top-down decision making of the Chinese system. In the U.S., you have multiple businesses that are trying to raise huge sums of money and are often isolated from each other, often duplicating similar approaches. We historically value competition and assume the motivation has advantages. While true, I wonder about the “business model” sucking up a large share of the available investment money in this sector in the U.S. The amount of money required has to a great degree squeezed out university researchers, who either leave universities or work around the edges of AI innovation. While AI research is a high priority in China, the U.S. has cut NSF funding for AI and cybersecurity.

AI in China and Education

My own interest in AI has been driven by potential opportunities in education. This has been a messy issue in this country, with pushback due to legitimate concerns about cheating, failure to address skill development, and lack of interest in instruction presented by a computer. China has committed to exploring AI-facilitated education.

Academic competition in China is intense. Millions of students a year take the college entrance exam, the gaokao. Your score determines whether and where you can study for a degree, and it’s seen as the biggest determinant of success for the rest of your life. Parents willingly pay for tutoring or anything else that helps their children get ahead. The options tech can provide outside of classrooms offer opportunities to sell experiences to well-meaning parents. (Hao)

Two companies that are likely unfamiliar to most U.S. educators, Squirrel AI and Alo7, make good examples. Since the Hao article was published, both services have become available in the U.S.

Squirrel AI uses an “adaptive learning” model that breaks subjects into thousands of “knowledge points”, far more granular than traditional textbooks. The system diagnoses a student’s specific gaps and provides targeted video lectures and practice problems. The teachers are intended to act like “pilots,” stepping in only for emotional support or complex issues while the algorithm handles the core instruction. Educators will likely recognize similarities to the Khan Academy.

In contrast, Alo7 emphasizes a “quality-oriented education” focusing on creativity and the liberal arts. This “intelligent classroom” uses AI to analyze student engagement, pronunciation, and even “joy” through facial and vocal recognition.

The interest in AI in education seems to be a combination of the emphasis on standardized test performance for advancement and opportunity, the larger population, and the greater risk tolerance within the context of exploration for improvement.

Summary

This post is not a value judgment comparing U.S. and Chinese AI policies, but rather an attempt to summarize what some experts have said about the differences. My personal issue concerns the economic pressure in the U.S. based on our trust in competition among corporations to drive innovation. While this is an approach that has worked in many areas, the huge investments that are required have to this point sucked a great deal of capital from the economy and seem largely and unnecessarily redundant. I personally also find the focus of interest in AI in education (personalized and adaptive instruction) interesting, as this emphasis has appealed to me based on my interest in mastery learning.

Sources

Hao, K. (2019). China has started a grand experiment in AI education. It could reshape how the world learns. MIT Technology Review, 123(1), 1-9.

Lee, K.-F. (2018). AI superpowers: China, Silicon Valley, and the new world order. Houghton Mifflin.

Webb, A. (2019). The big nine: How the tech titans and their thinking machines could warp humanity. PublicAffairs.

Ignoring The Instruction Option Of EdTech

When I first began writing professionally about K-12 use of technology in the mid-1990s, a popular approach was to organize content around the tutor, tool, tutee model. This model proposed that technology in the hands of students could deliver instruction (tutor), facilitate the activities of being a student (tool), and program/code (tutee). While AI now blurs the lines between these roles, this simple organizational scheme still seems useful.

This post was prompted by what I sense to be dissatisfaction with the instructional component of this model and a recent paper entitled the “5% problem”. This paper challenged the claimed benefits of commercial instructional offerings (e.g., Khan Academy, CK-12), arguing that the claims misrepresent what the achievement data these providers have collected actually demonstrate. Ignore my descriptor of such programs as commercial when I know you can use at least many of the features of such offerings at no cost. How these efforts are funded is a different issue. The relevance of “5%” lies in the hidden expectation that only those who use the learning system as intended are included in the analyzed data. Some studies reporting high effectiveness are based on 5% of those provided access, and this important factor is not highlighted in the reporting of results.
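To make the arithmetic behind this concern concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from the “5% problem” paper or from any provider’s data; the group names and the helper function average_gain are my own hypothetical labels. It simply shows how an average computed only on the students who used a system as intended can differ from the average across everyone who was given access.

```python
# Hypothetical illustration only: every number below is invented,
# not taken from the "5% problem" paper or any provider's data.

def average_gain(groups):
    """Student-weighted mean gain across (number_of_students, mean_gain) pairs."""
    total_students = sum(count for count, _ in groups)
    total_gain = sum(count * gain for count, gain in groups)
    return total_gain / total_students

# Out of 1,000 students given access (assumed split):
used_as_intended = (50, 10.0)   # the "5%" who met the recommended usage
used_a_little = (350, 2.0)      # some use, below the recommended level
never_used = (600, 0.0)         # no use at all

reported = average_gain([used_as_intended])
everyone = average_gain([used_as_intended, used_a_little, never_used])

print(f"Gain among intended users only: {reported:.1f}")
print(f"Gain across all students given access: {everyone:.1f}")
```

Even this sketch is generous to the software, because it treats the gain among the intended users as caused by the tool. Those students chose to use it, so some of their gain may reflect who they are rather than what the system did.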

Such assertions make me uncomfortable. I already sense a backlash against screen time, cautions that AI allows learners to offload the experiences intended by learning tasks, and concerns that classroom circumstances associated with technology have limited meaningful social contact between educators and students and among students. Now I feel I must also question the studies I have explored on the benefits of AI tutoring and the personalization of the rate of progress through instructional materials allowed by computer-supported instruction (e.g., Kulik & Fletcher).

Teacher Commitment

As I have considered this recent challenge, it has occurred to me that I have encountered a variant of it throughout my career. In 2019, I wrote a blog post titled “There is a reason teachers don’t use the software provided by their districts.” At the time, this issue caught my attention because my wife and I were serving on an advisory group for our local school district, and the tech director reported on monitoring software used to track the use of software the district had purchased. The goal was to decide which license packages could be dropped so funds could be reallocated to other requests. I noticed some researchers were using what seemed like a similar system to examine the use of instructional technology and to consider why it was underutilized. These scholars reached a conclusion nearly identical to that of the more recent, in-depth examination of online instructional tools. “One of the other primary findings of this report is that usage of apps is generally lower than might be expected. Most apps are used only for a limited time, and most purchased by districts go unused. This has an impact on efficacy – an app cannot be effective if it is not used” (p.25).

At that time, it seemed the issue was explaining teacher commitment. Thomas Arnett has weighed in on the issue of school-funded software being seriously underutilized, speculating, based on his Jobs to be Done Theory, that educators simply don’t perceive that the software they have access to helps them satisfy the jobs they perceive as expected of them, relative to more traditional approaches. These jobs are described as 1) Help me lead the way in improving my school, 2) Help me find practical ways to engage and challenge more students, and 3) Help me replace a broken instructional model so I can help each student. From my perspective, many technology-based instruction systems seem purposefully designed to address individual learning speeds and existing knowledge, but perhaps this is not how these resources are perceived by educators. In a more detailed, online-only version of this description, these authors propose that educators might respond if a greater effort were made to engage educators with data and anecdotal accounts of the success of peer educators.

What about the learners?

As I explored this history, I saw what seems a frustrating pattern for those of us who have been influenced by the seeming promise of personalized progress systems and intelligent tutoring systems in carefully controlled contexts: the promise fades when these systems are turned loose in the complexity of schools and classrooms. The challenge of matching, in applied settings, the key elements of the controlled setting in which concepts were developed is termed fidelity and is an issue in many fields (e.g., Trutschel and colleagues). I have struggled with this challenge in my own research, which has often focused on creating technology-facilitated study environments for college students enrolled in large introductory classes.

Cognitive research has accumulated a massive amount of evidence demonstrating the effectiveness of retrieval practice. It has also documented that less capable learners are often much less aware of their specific knowledge gaps and hold a false sense of understanding (i.e., poor metacomprehension). In other words, less capable learners often don’t know what they don’t know and thus are very inefficient at remediating their problem areas. One way to provide retrieval practice and address poor metacomprehension is to provide practice tests. More sophisticated applications that make use of technology can also track weak areas so that these areas can be emphasized, link the student to remedial content when individual elements of information are not known or misunderstood, and even ask students to predict the accuracy of their performance in an effort to increase awareness of strengths and weaknesses.
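For readers who like to see the logic spelled out, here is a minimal sketch in Python of the two ideas just described: tracking weak areas and comparing students’ predictions with their actual performance. This is my own illustration under simple assumptions, not the system used in any study cited in this post. The function names, the topic labels, and the sample responses are all hypothetical.

```python
from collections import defaultdict

def topic_report(responses):
    """Summarize practice-test results per topic.

    Each response is (topic, predicted_correct, was_correct), where
    predicted_correct is the student's own guess that the answer is right.
    """
    tallies = defaultdict(lambda: {"n": 0, "predicted": 0, "correct": 0})
    for topic, predicted_correct, was_correct in responses:
        tally = tallies[topic]
        tally["n"] += 1
        tally["predicted"] += int(predicted_correct)
        tally["correct"] += int(was_correct)

    report = {}
    for topic, tally in tallies.items():
        accuracy = tally["correct"] / tally["n"]
        confidence = tally["predicted"] / tally["n"]
        report[topic] = {
            "accuracy": accuracy,
            "confidence": confidence,
            # Positive values mean the student expected to do better than they did.
            "overconfidence": confidence - accuracy,
        }
    return report

def weakest_first(report):
    """Order topics so the lowest-accuracy topics can be emphasized first."""
    return sorted(report, key=lambda topic: report[topic]["accuracy"])

# Invented sample data: one student, two topics, four questions each.
responses = [
    ("memory", True, True), ("memory", True, False),
    ("memory", True, True), ("memory", True, False),
    ("motivation", True, True), ("motivation", True, True),
    ("motivation", False, False), ("motivation", True, True),
]

report = topic_report(responses)
for topic in weakest_first(report):
    print(topic, report[topic])
```

In this invented data, the student expected to get every memory question right but answered only half correctly, which is the kind of gap between confidence and performance that a study tool could bring to the learner’s attention.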

If you are interested in the details of one of my studies on this topic, I have provided a citation below. The relevance of this study for the present post concerns the willingness of learners, college students in this case, to take advantage of a resource designed to improve their performance. The following graph is an easy way for me to make my point. Learners were divided into three groups based on course performance. For each of the three exams, the graph identifies the percentage of learners in each performance group who satisfied the stated goal of the study task, who used the task but did not meet this standard, or who did not use the study task. There is a clear pattern: those performing the worst do not meet the study goal. Most persuasive, and in keeping with the other data reported in this post, are the data on those who made no effort to use the system. It is possible that trying but failing to reach the stated standard is related to understanding or aptitude, but failing to try, when trying should still be beneficial, is not.

As was the case in the 5% paper, those less in need of assistance participated more in a likely beneficial activity. In fairness, the proposed “perceived suitability” of a learning opportunity, while vague, offers a second possible explanation.

Summary

In this post, I consider the persistent “underutilization gap” in educational technology, where instructional tools, from commercial platforms to AI tutors, frequently fail to achieve their promised impact because they are either ignored by teachers or avoided by the students who need them most. The “5% problem” highlights how efficacy data is often skewed by only including the small fraction of users who follow the system as intended, while struggling learners consistently participate the least in these personalized systems. Ultimately, I suggest that EdTech’s potential for personalized progress remains stalled by a lack of “fidelity” in real-world settings and a failure to align software with the practical “jobs” educators and students actually prioritize.

Citations:

Grabe, M., & Flannery, K. (2010). A Preliminary Exploration of Online Study Question Performance and Response Certitude as Predictors of Future Examination Performance. Journal of Educational Technology Systems, 38(4), 457-472.

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of educational research, 86(1), 42-78.

Trutschel, D., Blatter, C., Simon, M. et al. (2023). The unrecognized role of fidelity in effectiveness-implementation hybrid trials: simulation study and guidance for implementation researchers. BMC Medical Research Methodology, 23, 116. https://doi.org/10.1186/s12874-023-01943-3

Searching the Scientific Literature

My work has always required that I locate, read, and keep track of the content of scholarly papers – mostly journal articles. This is typical of those of us whose academic interests combine research with teaching the core ideas of a science-related field of study. My personal focus was educational psychology and, even more specifically, reading skills and study behavior. Over the years, with this foundation, I became interested in the role technology could play in these same topics and, most recently, in how technology can effectively be employed in the reading, processing, and application of information by independent learners (i.e., learners who guide their own learning outside of formal classroom settings).

Over the course of 50+ years, those of us with such interests have experienced many changes in how we locate, read, and keep track of the content that forms the basis and sometimes the outlet for our work. We usually purchased the journals we could afford and perused others in our local library. We once had postcard-sized forms we used to send requests to researchers to see if they had free copies of papers they would return as a professional courtesy. When you published a paper, the journal at one time would provide you 50 or so individual copies you would use to participate in this exchange. Libraries have always had limited budgets, and some of the less popular journals might be purchased as microfilm or microfiche that could be used to guide personal notetaking or perhaps be connected to a coin-fed “xerox” machine. Now, there are many more journals, and libraries, which still have limited budgets, may buy access to digital collections of journals that allow patrons to download PDFs.

A challenge then and now in this process is how one goes about finding the specific articles and chapters to read and collect. Libraries used to subscribe to services that provided intricately organized periodicals that would attempt to label research studies. If you didn’t peruse the journals in the “just arrived” section or on the shelves, you would try to use these periodicals to guess what labels had been used to identify the content you might want to find in the stacks of your library or send for. Whichever articles you found, you would use the “reference” section to identify related work that seemed promising. We still do this, but it only works to find documents that are older than the one you happen to be reading at the time. As technology played a more and more important role in organizing content, large databases were developed that could be searched, first by matching key words and now with AI capabilities that can respond to prompts that do not have to rely on exact matches to specific words or phrases.

This brings me to my goal in this post. There are now many tools available to both academically affiliated and independent learners to find what they hope will be useful resources. Some of these tools will now go further in summarizing what is found and even attempt to apply what was found in the creation of papers for different purposes. I am most interested in the location. I want to read the documents for a variety of reasons that I think are important, but that I do not intend to discuss here. I also have access to a research library that allows me to download PDFs of documents, so I don’t need a service that will do that for me.

So, to summarize where this leaves me personally: I am now retired, but retain online access to library resources. I do not have an easy way to work with library personnel or to use the most powerful tools that would be available if I could work directly from a library. I do not want to spend a great deal of money on what I guess I would call “search tools”, but I have spent a good deal of time exploring a variety of free or inexpensive tools. I want to share insights related to my own experiences.

Here is one issue that may not be obvious to those with access to more expensive tools or those with no reason to explore as I have. Most of the literature I am interested in is behind a paywall. Many probably have been exposed to issues related to this reality. Why can’t citizens who, in a way, pay for much of this research through their taxes, read what the research looks like and what it concludes? Who makes the money from this component of academic scholarship? The researchers don’t get paid by journals for their papers. They are expected to review submitted papers for publication to identify high-quality work without compensation. Where do the huge fees libraries pay for access to scientific journals go?

These issues aside, most search engines that scour the Internet typically cannot access content protected by paywalls. My personal issue is this: how can I efficiently identify useful sources to read? Others have an even greater challenge. How can those without a “faculty pass” learn what recent research has to offer?

My current approach

I currently make use of the following tools/services: Google Scholar, Semantic Scholar, Research Rabbit, SciSpace, and Perplexity.

SciSpace is the only one of these options I pay a subscription to use, so the rest have a free level or do not charge for any of the services provided. Again, I only need to locate citations, as I have full access to a research library and I am not under the immediate pressure of working on a thesis or dissertation.

Comments

For articles behind paywalls, Google Scholar is usually my best starting point. It provides citations (sometimes incomplete in my experience). It also lists other publications that have cited the item you have targeted, which can be very useful. The citations include links to the journals in which the articles are published, which provide the full abstract and may or may not allow downloading the full article, depending on the individual journal’s policy.

Long-time Google Scholar users who have not explored the Google Labs option for Scholar should take a look. Rather than search terms, you can ask research questions much as you would with an AI tool. This approach allows a user to identify key topics and related issues. So, to stay focused on searching for journal articles on cyberbullying, I could request articles that examine school programs to combat it. After evaluating the results, the system identifies relevant papers and explains how each paper addresses your request.

Semantic Scholar provides features similar to Google Scholar (see below), but I have found it less effective in identifying sources I know exist. Given the overlap with Google Scholar, I use this service much less frequently.

I use Research Rabbit once I have identified a source I find valuable. Research Rabbit will then surface other sources from this entry point and show the citation map of how these sources are connected. This is also somewhat redundant, but the interconnection graphs are interesting.

SciSpace is useful for semantic searching and for summarizing the papers it locates. My impression is that it is hit-and-miss at locating documents in paywalled journals, and I would not depend on it for this purpose.

The following sequence of images shows the return from the prompt “What is the average daily writing time for K12 students?”. The tool responds with a summary based on the best sources found and provides access to specific information for the sources it identified. Often, a PDF is not available for paywalled sources, but a citation is available, allowing me to try to find that paper in some cases.

Perplexity can help you find references and surface source links, but it is a general web answer engine, so it is usually not the best choice for systematically searching scholarly journal literature. I do use it to offer insights into how I might address topics for which references are less important.

When access to a journal is not available

When you have identified an article that looks good but is paywalled, there are still things you can try. Scholars may post prepublication versions of papers elsewhere. Just try a traditional search using the title of the article you want.
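For those who like to script this kind of lookup, here is a minimal sketch of the title search described above. It only builds the search links; it does not fetch or scrape anything. The function name and the placeholder title are my own inventions for illustration, and the "filetype:pdf" addition is simply one way I am assuming you might narrow a general web search to posted copies.

```python
from urllib.parse import urlencode


def title_search_urls(title: str) -> dict[str, str]:
    """Build exact-phrase search links for an article title (hypothetical helper)."""
    # Wrapping the title in double quotes asks the search engine for an exact-phrase match.
    phrase = f'"{title.strip()}"'
    return {
        # Google Scholar search on the quoted title.
        "google_scholar": "https://scholar.google.com/scholar?" + urlencode({"q": phrase}),
        # General web search on the quoted title, narrowed to PDF files (an assumption, not a requirement).
        "google_pdf": "https://www.google.com/search?" + urlencode({"q": phrase + " filetype:pdf"}),
    }


# Placeholder title; substitute the article you are actually looking for.
urls = title_search_urls("An example article title")
for name, url in urls.items():
    print(name, url)
```

Paste either link into a browser. The quoted title keeps the results focused on the one paper rather than on the general topic.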

Some repositories that host alternative versions can be identified through Google Scholar. After identifying an article of interest, check whether the search result indicates that alternative versions are available.

In this case, one of the alternatives (see following image) is hosted on ResearchGate, and this repository offers a full PDF of the article the journal protects. These are not illegal copies, so you do not have to hesitate to make use of this option.

Summary

For my purposes, which involve paywalled content, Google Scholar is usually the best starting point because it is broad and often surfaces publisher pages, institutional copies, and free versions when they exist. It also indexes paywalled articles themselves, so you can still discover the citation even when the full text is inaccessible.

Semantic Scholar is also strong for discovery, but it focuses on open-access options where available and is less oriented toward paywalled content than Google Scholar.

Research Rabbit is very good once you already have one paper or author and want related literature through citation chaining, but it is less of a primary search engine for broad paywalled journal discovery.

SciSpace is useful for semantic searching and paper summaries, but it is better as a literature-review assistant than as the main tool for hunting down paywalled journal records.

Perplexity can help you find references and surface source links, but it is a general web answer engine, so it is usually not the best first choice for systematically searching scholarly journal literature.