
Claude, Excel, and a 1991 Masterpiece
Last Updated on April 29, 2025 by Editorial Team
Author(s): Han Qi
Originally published on Towards AI.

The gist
- Transform a scanned PDF into a user-friendly Excel sheet.
- Validate extracted data by overlaying it onto the PDF.
- Debug issues using conditional breakpoints.
- Gain insights into your learning habits and those of others.
What’s the MSLQ?
The Motivated Strategies for Learning Questionnaire (MSLQ) is a 1991 gem — a five-year effort distilling 81 questions into 15 scales measuring motivation and cognitive habits.
As an edtech enthusiast, I wanted to use it to better understand myself and support my colleagues. But all I found was a grainy scanned PDF from ERIC (Education Resources Information Center: https://eric.ed.gov/?id=ED338122), stamped “BEST COPY AVAILABLE.”
Follow Along
This process takes about 15 minutes.
Pick some memorable courses, then start answering the 81 questions on page 36 of the PDF.
Avoid reading earlier pages to keep your responses unbiased.
I began jotting down answers like “14331 34456” on paper, but quickly realized this wasn’t scalable.
Ditching old habits, I turned to Claude, the best free AI model I know, to test its OCR prowess and streamline the process.
Extraction prompt
There are 81 questions in the MSLQ. Create an excel sheet where a user can fill in his responses for the 81 questions by typing numbers, and the 9 scales will be automatically calculated. The 9 scales each link to a set of non-overlapping, non-consecutive questions. I don’t have many respondents, so want 1 row per question, and columns represent respondents. I also want a readable description of the question for each row as the leftmost column. The results of the 9 scales for each respondent should appear in a new worksheet (tab) in excel.
Extraction code
Here’s the final Python code after tweaks: MSLQ_excel_template to create excel after OCR on MSLQ.pdf.
Generating 600 lines of near-perfect code in one go is very impressive!
Though the MSLQ has 15 scales, I focused on the 9 with sample feedback in the PDF. Recovering the other 6 is as simple as editing selected_scales.
The 15 scales group 81 questions in a mutually exclusive manner.



Give Claude a round of applause
- Extracted all 81 questions in order, with correct numbering (1-based, not Python’s 0-based), saving the time it would take to find each question and hardcode the strings.
- Found the 8 reverse-coded questions. Reverse coding means lower answers yield higher scores: a reverse-coded answer is subtracted from 8, mapping 1–7 onto 7–1 (see the sketch after the code block below).
- Grouped the questions into 9 scales, saving hours of manual scrolling through the 75-page PDF. Scrolling is unavoidable because, by design, questionnaires shuffle their questions so that items from the same scale do not appear consecutively.
- Good UX: instructions at the top left make the Excel self-explanatory, and headers are colored.
- Good UX: included a CLI for the output path and respondent count, which I later hardcoded for speed.
- Complex Excel formulas (AVERAGE, COUNTIF, IF, and cross-sheet references) are all correct.
- Set up the data structures, implicitly doing the data modelling:
mslq_questions = [
# Format: [question_number, question_text, scale, is_reversed]
[
1,
"In a class like this, I prefer course material that really challenges me so I can learn new things.",
"Intrinsic Goal Orientation",
False,
],
[
2,
"If I study in appropriate ways, then I will be able to learn the material in this course.",
"Control of Learning Beliefs",
False,
],
##################### TRUNCATED ########################
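To make the scoring concrete, here’s a minimal sketch of the rules above in plain Python (my own recap, not Claude’s generated code; score_scale and its arguments are illustrative names):

def score_scale(responses, items, reversed_items):
    """responses: {question_number: answer 1..7}; items: the scale's questions."""
    # Reverse-coded answers are subtracted from 8, mapping 1-7 onto 7-1;
    # each scale score is the mean of its items.
    vals = [8 - responses[q] if q in reversed_items else responses[q]
            for q in items]
    return sum(vals) / len(vals)

# Example: an answer of 1 on a reverse-coded item counts as 7.
print(score_scale({33: 1, 1: 5}, items=[33, 1], reversed_items={33}))  # 6.0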
Try It
Run the code to generate the Excel, or copy this template: https://docs.google.com/spreadsheets/d/1B9suxNdatBROsPIz8OrdKYoIla-VDR46
Minor Quirks
- A missing “=” in formulas was an easy fix (changed f"IF… to f"=IF…). It was easy to spot since the cells displayed the formula as a literal string.
- The code used fitz, the old name for pymupdf. Swapping all instances of fitz to pymupdf worked seamlessly.
- Initially, scale and reverse-coding info appeared on the Questions tab, potentially priming respondents. A follow-up prompt moved these to a new Metadata tab:
Scale and Reversed? should not be in the Questions tab because it will prime the respondent. show them in another tab
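The second quirk’s fix is mechanical, since pymupdf is the same package under a new top-level name; a minimal sketch:

import pymupdf  # previously: import fitz (the legacy module name)

doc = pymupdf.open("MSLQ.pdf")  # identical API under the new name
print(doc.page_count)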
Now what?
With the Excel ready, I needed to ensure Claude’s extraction was accurate. Could I trust the data pulled from a fuzzy PDF?
Validation with Visualization
I asked Claude to highlight where it found each question, reverse-coding status, and scale associations in the PDF, just like validating a translation by translating back to the source language.
Validation Prompt
The source MSLQ.pdf is attached.
I want to see where in the pdf is the information in code extracted from. Write python to highlight the pdf, and create a new pdf.
Information i want to check: 1. question text 2. whether each question is reverse coded 3. Associations between questions and scale
Validation Code
See the code here: Read MSLQ.pdf and add highlights and summary page to create MSLQ_highlighted.pdf.
It produced MSLQ_highlighted.pdf with a new page appended, and a validation report, MSLQ_highlighted_validation.xlsx:
Processing PDF: MSLQ.pdf
Scanning document for relevant information...
Questions not found:
Q1: I prefer class work that is challenging so I can learn new things.
Q50: When studying for this course, I often set aside time to discuss course material with a group of students from the class.
Q51: I treat the course material as a starting point and try to develop my own ideas about it.
Q61: I try to think through a topic and decide what I am supposed to learn from it rather than just reading it over when studying for this course.
Validation data saved to: MSLQ_highlighted_validation.xlsx
Highlighted PDF saved to: MSLQ_highlighted.pdf
Analysis Statistics:
- Questions found: 77
- Questions not found: 4
- Reversed items found: 8
- Scales found: 0
Process completed successfully!
I had to add REVERSED into search_terms to help the program highlight reversed items.
We can still see that 4 questions, and all of the scale associations, could not be found. Let’s investigate.
Debugging Missing Highlights
Question 1


We use conditional breakpoints to stop at page 12 (page_num == 11) and question 1 (q_num == 1), then compare the search term (q_text) against the PDF text (page.get_text()).
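If your editor doesn’t support conditional breakpoints, an inline guard achieves the same stop; a sketch assuming doc comes from pymupdf.open("MSLQ.pdf") and mslq_questions is the list shown earlier:

for page_num, page in enumerate(doc):
    for q_num, q_text, scale, is_reversed in mslq_questions:
        if page_num == 11 and q_num == 1:   # page 12, question 1
            breakpoint()                    # drop into pdb here
            # inside pdb: p q_text   /   p page.get_text()
        hits = page.search_for(q_text)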
You can see Claude has rephrased the question, thus breaking the match.
Why can Question 16 be found?
Sharp readers would have realized the text is split across 2 lines in the pdf, but the search term is on 1 line.
This still works because page.search_for is designed to search across PDF lines by default.
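For example, one search_for call returns a rectangle per matched line segment, so highlighting wrapped text needs no special handling (the needle string and output filename below are placeholders):

import pymupdf

doc = pymupdf.open("MSLQ.pdf")
page = doc[11]                                   # page 12, zero-based
needle = "question text that wraps onto a second pdf line"  # placeholder
for rect in page.search_for(needle):             # one rect per line segment
    page.add_highlight_annot(rect)
doc.save("MSLQ_highlighted_demo.pdf")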
Why is Question 24 not highlighted, yet not reported as not found?
- The PDF repeats each question twice: first under its associated scale, then in the ordered list of questionnaire items. If at least one of the two appearances matches, the question counts as found. Even when both match, question numbers are deduplicated in a set before the statistics are reported.
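In code terms, that bookkeeping might look like this (a sketch; doc and mslq_questions as before):

found = set()
for page in doc:
    for q_num, q_text, _, _ in mslq_questions:
        if page.search_for(q_text):   # non-empty hit list on this page
            found.add(q_num)          # a second appearance dedupes here
print(f"Questions found: {len(found)}")
print(f"Questions not found: {81 - len(found)}")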
Question 50

Following the same strategy, q_text was missing the “the” before “course material”.
q_text: When studying for this course, I often set aside time to discuss course material with a group of students from the class
page.get_text: When studying for this course, I often set aside time to discuss the course material with a group of students from the class
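A word-level diff makes this kind of discrepancy jump out immediately; a sketch using Python’s standard difflib:

import difflib

extracted = ("When studying for this course, I often set aside time to "
             "discuss course material with a group of students from the class")
pdf_text = ("When studying for this course, I often set aside time to "
            "discuss the course material with a group of students from the class")

# Only insertions/deletions are printed; here, a single "+ the".
for token in difflib.ndiff(extracted.split(), pdf_text.split()):
    if token[0] in "+-":
        print(token)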
Question 51

print(page.get_text()) saw “try” as “ery” due to bad scan quality: “treat the course material as a starting point and ery to develop my own ideas about it”.
Question 61

q_text had 3 extra words appended, the trailing “for this course”: “I try to think through a topic and decide what I am supposed to learn from it rather than just reading it over when studying for this course.”
Fixing the Bugs
In the LLM era, perfection isn’t the goal — empowerment is. Potential fixes:
- Rephrasing (Q1): Prompt Claude to avoid grammatical changes.
- Missing/extra words (Q50, Q61): Emphasize exact word count in prompts.
- Scan quality (Q51): Apply spellcheck tools or fuzzy matching (see the sketch below).
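A cheap alternative to a full spellcheck pass is fuzzy matching with the standard library; a sketch where fuzzy_found is a hypothetical helper, not part of the generated scripts:

import difflib

def fuzzy_found(q_text: str, page_text: str, cutoff: float = 0.85) -> bool:
    """Tolerate OCR noise like 'try' read as 'ery' by accepting near matches."""
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    # Join adjacent lines so questions split across two pdf lines still
    # form a single candidate string.
    candidates = lines + [a + " " + b for a, b in zip(lines, lines[1:])]
    return bool(difflib.get_close_matches(q_text, candidates, n=1, cutoff=cutoff))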
Claude’s Wins
- Consistent variable names (mslq_questions) across the extraction and highlighting scripts.
- Consistent question phrasing within mslq_questions across the scripts (the extraction errors in Q2 and Q4 appear in both).
- Clear data structures indexed by question number (reversed_items and scale_associations).
- Bonus: a validation Excel (MSLQ_highlighted_validation.xlsx) for non-technical teammates, and a new summary page 76 in the highlighted PDF (though pagination overflow hid some text).


The new page 76 lists each question number, the page where it was first found, and its description.
Possible Improvements
- Optimize the highlighting code’s 4 nested loops (e.g., convert O(n) list searches to O(1) set lookups, as sketched below, or use parallel processing).
- Set Excel column widths to the longest question text to avoid truncation.
- Address user impatience with lengthy questionnaires, the real bottleneck.
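The first improvement is a one-line change per lookup; a sketch with illustrative stand-ins for the scripts’ data structures:

# Hypothetical subsets of the real data; reversed_items was a list.
reversed_items = [33, 37, 40]
scale_associations = {"Rehearsal": [39, 46, 59, 72]}

# Build sets once; membership tests drop from O(n) to O(1).
reversed_set = set(reversed_items)
scale_sets = {scale: set(qs) for scale, qs in scale_associations.items()}

print(33 in reversed_set)   # True, constant-time lookup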
Interpreting the scores
You may want to use this feedback to do something about changing your study skills or motivation. All of the motivational and study skills mentioned on your feedback sheet are learnable.
If your scores are above 3, then you are doing well. If you are below 3 on more than six of the nine scales, you may want to seek help from your instructor or the counseling services at your institution.
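That rule is simple to automate once the scale scores are computed; a minimal sketch (the scores dict is illustrative):

scores = {"Test Anxiety": 2.6, "Rehearsal": 2.25, "Task Value": 5.0}  # etc.
low = [scale for scale, value in scores.items() if value < 3]
if len(low) > 6:
    print("Below 3 on more than six scales: consider seeking help.")
else:
    print(f"Scales below 3: {low or 'none'}; otherwise doing well.")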
What did I learn?
I completed the questionnaire by reflecting on how I behaved during university, averaging across a broad range of courses.

- Task Value 5
- Self-Efficacy for Learning & Performance 4.375
- Test Anxiety 2.6
- Rehearsal 2.25
- Elaboration 3.5
- Organization 4.5
- Metacognitive Self-Regulation 3.9
- Time & Study Environment 5.125
- Effort Regulation 6.25
Effort Regulation
My effort regulation score was surprisingly high, indicating that when I set goals, I tend to complete them. The real challenge arises when I either don’t set goals or set the wrong ones.
Metacognitive Self-Regulation
An average score here makes sense. In my first two years of university, I spent a lot of time obsessively planning, monitoring, and regulating my study sessions — sometimes down to 5-minute granularities over 3-hour blocks.
While it made me efficient, it also drained any sense of inspiration or spontaneity. I went through a phase of consuming self-help content, chasing constant productivity and brilliance, only to become jaded.
Later, after taking a leadership position in AIESEC, I shifted 20 hours per week into leadership development. My grades suffered, but the life skills I gained were worth it.
Elaboration
My average elaboration score reflects that I didn’t take particularly good notes. Studying two degrees in four years forced me into an unsustainable pace — I had little time to slow down, deeply understand concepts, or consolidate ideas.
I often rushed through recorded webcasts at 2–3× speed and had just 1.5 days of revision time per module to cover 13 weeks of material.
Still, the sheer intensity of that experience raised my learning ceiling to a level I haven’t quite reached since.
Rehearsal
My low rehearsal score isn’t surprising. I never understood the point of rote memorization.
I believed time was better spent engaging in higher-order thinking and connecting ideas rather than copying notes verbatim.
I did explore memory techniques for fun — once memorizing 30 random items within 90 seconds and recalling them after a 10-minute gap, but ultimately decided against investing the immense amount of brain rewiring needed to become truly proficient.
Moving Forward
Looking ahead, I want to tap into that high-performance mode from university once or twice a year, in a sustainable rhythm.
Now that I’m beyond university, I can choose the work I want to pour effort into, rather than being forced through arbitrary modules. As a result, strict effort regulation feels less critical.
Instead, my focus will be on sharpening Metacognitive Self-Regulation — particularly planning, pivoting effectively, and becoming more comfortable with uncertainty and change.
The unexamined life is not worth living — Socrates
Reigniting dormant wisdom
The MSLQ, like many research treasures, has its insights buried away in dusty PDFs. Decades-old frameworks still hold answers to today’s challenges — attention, focus, and growth.
Imagine the forgotten gems waiting to be rediscovered.
Pick one, dust it off, and let AI help you bring it to life to empower communities, schools, and workplaces.
We’re not just using LLMs to generate the future — we can use them to reclaim the best of the past.
Resources
- Pintrich, P. R., Smith, D. A. F., Garcia, T., & McKeachie, W. J. (1991). A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ) (Technical Report №91-B-004). University of Michigan, Ann Arbor, MI. Available from https://eric.ed.gov/?id=ED338122
- Philip Guo’s Computational Pedagogy Research: https://pg.ucsd.edu/
- Proquest Education Resources: https://about.proquest.com/en/products-services/pq_ed_journals
- Aggregator of Education databases: https://www.bu.edu/library/pickering-educational/research/collections/databases
- Aggregator of Subject-based databases: https://browse.welch.jhmi.edu/searching/databases-by-subject