
Claude Code as a Data Analyst: From Zero to First Report
Author(s): Freddie Robinson
Originally published on Towards AI.
As data analysts, we’ve all been there: the dreaded request for the monthly/yearly [insert topic] report, an essential task that’s also a massive time sink.
My thoughts for the last week? “Can’t AI just… do this?” Surely, it can whip up a simple data analysis report. Right?
So I set out on a quest to see if I could truly automate a detailed customer contacts analysis report end to end using Claude Code. Here’s the story of how it went from a chaotic mess to a surprisingly competent first draft.

The prerequisites
Before we dive in, a word on the setup. Running Claude Code in your local environment is a must, but to give it true data analyst powers you’ll also need a tool that lets it securely read your database and run queries.
There are plenty of great open-source examples of this online: just search for “MCP SQL server” and find one that works for your setup.
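If you’d rather roll your own, here’s a minimal sketch of what such a tool can look like, built with the official MCP Python SDK. The database path, the table it serves, and the naive read-only check are all illustrative; a real version would point at your actual warehouse and enforce proper read-only credentials.

```python
# Sketch of a read-only SQL tool server for Claude Code, using the official
# MCP Python SDK (pip install "mcp"). analytics.db is a hypothetical local
# SQLite file standing in for your warehouse.
import sqlite3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sql-readonly")

DB_PATH = "analytics.db"  # hypothetical; swap in your own connection


@mcp.tool()
def run_query(sql: str) -> str:
    """Execute a read-only SELECT statement and return rows as text."""
    # Naive guard for the sketch; use real read-only credentials in practice.
    if not sql.lstrip().lower().startswith("select"):
        return "Refused: only SELECT statements are allowed."
    conn = sqlite3.connect(DB_PATH)
    try:
        cursor = conn.execute(sql)
        header = ", ".join(col[0] for col in cursor.description)
        rows = [", ".join(map(str, row)) for row in cursor.fetchall()]
        return "\n".join([header] + rows)
    finally:
        conn.close()


if __name__ == "__main__":
    mcp.run()  # serves over stdio, which Claude Code can connect to
```

Register it with something like `claude mcp add sql-readonly -- python server.py` and Claude Code gains a run_query tool it can call during the session. Now let’s get to the fun part.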
Attempt 1: The no-context data analyst
My first attempt was an optimistic one: I gave Claude Code full decision-making power on its own. I threw a simple prompt into the chat:
## Role
You are an expert data analyst and your job is to provide a monthly analysis of why users are contacting us.
## Task
You need to perform an analysis and write a report on contact trends up to the end of July 2025. You should use the last 12 months of contact data to analyse trends, but index more heavily on the change in contact volume/contact rate between June and July 2025. Focus your analysis on which types of contacts and what contact rates are increasing or decreasing the most, and why those trends are happening.
Unsurprisingly, Claude Code struggled:
- The SQL queries kept erroring. Claude Code had no context on which tables to reference, what the columns meant, or what the values represented. It guessed everywhere and was forced to constantly fix its own errors.
- It wrote inaccurate queries. Even when a query ran, it was often wrong. For example, it guessed at the best way to calculate ‘active users’ and ended up counting on the wrong column.
Clearly, we needed to refine both the context we provided and the instructions we asked it to follow.
Improving the context
The key is to think about the context you’d give a brand-new data analyst who just joined your team and was asked to write this report. What would you share with them? Information about the tables, a few SQL queries to get going, and an example structure of the report.
Here’s the info we’ll pass it:
- Table documentation. I gave Claude a list of all the relevant tables and a brief, human-readable description for every single column. If you don’t tell it what contact_reason_id means, it’s just going to guess.
- A list of the main queries to run. If you already know which queries you want, listing them in plain English massively reduces the chance of the agent missing context and keeps it focused on what matters.
- A few example SQL queries. Most queries you run on a single table are very similar, often just swapping out a condition or column. A few example queries boosted the query quality significantly (one is sketched after the table docs below).
- A chart style guide. I gave it instructions for the charts: a colour palette to use, which chart types to pick, and how to handle the axes.
- An analysis blueprint. I provided an example of a previous report to use as a structure to follow.
Table docs
contact_id — The unique id of the contact. You can count contacts with this column.
contact_channel — The channel the customer contacted us through. This can be either phone, email or chat.
created_at_ts — The timestamp the contact was created at. You can split contacts into time series with this column.
…
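For illustration, here’s the shape of one of the example queries, written the way Claude Code would execute it in the notebook. The contacts table name and the SQLite date syntax are assumptions for the sketch; adapt both to your warehouse’s dialect.

```python
# Hypothetical example query: monthly contact volume by channel, using only
# the documented columns above. 'contacts' is an assumed table name.
import sqlite3

import pandas as pd

EXAMPLE_QUERY = """
SELECT
    strftime('%Y-%m', created_at_ts) AS contact_month,
    contact_channel,
    COUNT(contact_id)                AS contacts
FROM contacts
WHERE created_at_ts >= '2024-08-01'  -- last 12 months up to July 2025
GROUP BY contact_month, contact_channel
ORDER BY contact_month, contact_channel;
"""

with sqlite3.connect("analytics.db") as conn:
    monthly_contacts = pd.read_sql(EXAMPLE_QUERY, conn)
```

Swapping the channel split for a contact-reason split, or the 12-month window for a June-vs-July comparison, is exactly the kind of near-duplicate query Claude handles well once it has one working template.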
This was a game changer for its output: we got correct queries, clear visualisations and a consistent document structure.
But we still had a problem.
Even with all this context and the list of queries written out, Claude Code still went a little rogue. It would run a query, then a second, and then start writing the analysis before it had gathered all the necessary data.
- It tried to execute multiple tasks at once and became overwhelmed. It decided to create and execute all the SQL queries in one go inside a notebook, which caused all sorts of pain in terms of query quality, observability and timeouts.
- It regularly drifted away from the tasks it should follow. It decided to write the report after generating only a couple of SQL queries, which was unlikely to be enough for a good analysis.
Improving the instructions to follow
We needed a way to control Claude Code’s attention so that it focused on the right task, step by step. It was time to enforce Claude’s todo list. For longer workflows it’s also likely that you’ll have to break the work across multiple Claude Code sessions, so we want a todo list that can pick up where the last session left off.
The prompt I created was therefore a step-by-step workflow that baked in an iterative loop over the ‘pending tasks’: once a pending task was done, Claude moved it to ‘completed tasks’ in the todo list.
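Concretely, after the first couple of tasks the todo.txt looks something like this (a hypothetical mid-run snapshot; the full initial list is in the prompt below):

```
## PENDING TASKS
3. **QUERY 1:** Create an SQL query for 'monthly contact rate per active user'. ...
4. **QUERY 2:** ...

## COMPLETED TASKS
1. Read `[docs_name]` to understand the data schema.
2. Create an empty Jupyter notebook named `july_2025_analysis/contacts_analysis_july_2025.ipynb`.
```

Because every session starts by reading this file, a fresh session picks up exactly where the previous one stopped.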

A further improvement was to split major tasks into separate prompts. Creating queries and writing reports are very distinct tasks that use separate skill sets, so we split the workflow into two prompts within one Claude Code conversation.
By forcing it to follow this logical flow, I finally got it to stop improvising and stick to the plan.
Attempt 2: A much more detailed prompt
So, after all that trial and error, what did the final prompt look like?
First message — query writer
<master_prompt>
<role_and_task>
You are an expert data analyst, operating autonomously. Your goal is to create SQL queries, execute them and create charts based off the results. You should continue without stopping.
</role_and_task>
<resources>
<resource name="[table_name]">Primary table for contacts data.</resource>
<resource name="[docs_name]">Documentation explaining the data tables. You MUST read this first.</resource>
</resources>
<style_guides>
The charts should use colours based off this palette: ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
</style_guides>
<workflow>
<step_1_setup>
First, check if the file `july_2025_analysis/todo.txt` exists. If it does not, create the `july_2025_analysis` folder and then create `july_2025_analysis/todo.txt` with the content from the <initial_todo_list> below.
</step_1_setup>
<step_2_loop>
Once the `todo.txt` file is ready, you will begin an execution loop. You MUST continue this process until the `PENDING TASKS` list is empty. DO NOT STOP between tasks. For each loop cycle:
a. Read the `todo.txt` file to identify the next task.
b. If the `PENDING TASKS` list is empty, proceed to the final step and output a message confirming the entire process is complete.
c. Execute the FIRST task from the `PENDING TASKS` list.
d. After the task is successfully completed, immediately update the `july_2025_analysis/todo.txt` file by moving the task description from the `PENDING TASKS` section to the `COMPLETED TASKS` section.
e. Immediately begin the next loop cycle without stopping.
</step_2_loop>
</workflow>
<initial_todo_list>
## PENDING TASKS
1. Read `[docs_name]` to understand the data schema.
2. Create an empty Jupyter notebook named `july_2025_analysis/contacts_analysis_july_2025.ipynb`.
3. **QUERY 1:** Create an SQL query for ‘monthly contact rate per active user’. Execute the query in the notebook. Create a dual-axis chart (bar for users, line for rate). Save as `july_2025_analysis/01_contact_rate_per_user.png`.
4. **QUERY 2:** [repeat the steps for query 1 for the remaining queries]
## COMPLETED TASKS
</initial_todo_list>
</master_prompt>
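For QUERY 1, the notebook cell Claude Code generated looked roughly like this. It’s a reconstructed sketch: the dataframe values are placeholders purely so the snippet runs, standing in for the real query results.

```python
# Sketch of the QUERY 1 chart: bars for active users, line for contact rate.
# Placeholder data stands in for the real query results.
import matplotlib.pyplot as plt
import pandas as pd

PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
           "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]

df = pd.DataFrame({
    "month": ["2025-05", "2025-06", "2025-07"],  # illustrative values only
    "active_users": [12000, 12400, 12900],
    "contact_rate": [0.042, 0.044, 0.051],
})

fig, ax_users = plt.subplots(figsize=(10, 5))

# Left axis: monthly active users as bars
ax_users.bar(df["month"], df["active_users"], color=PALETTE[0], label="Active users")
ax_users.set_ylabel("Active users")

# Right axis: contacts per active user as a line
ax_rate = ax_users.twinx()
ax_rate.plot(df["month"], df["contact_rate"], color=PALETTE[1], marker="o",
             label="Contact rate")
ax_rate.set_ylabel("Contacts per active user")

ax_users.set_title("Monthly contact rate per active user")
fig.tight_layout()
# Assumes the july_2025_analysis folder from step 1 already exists
fig.savefig("july_2025_analysis/01_contact_rate_per_user.png", dpi=150)
```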
Second round — writing the analysis
<master_prompt>
<role_and_task>
You are an expert data analyst, operating autonomously. Your goal is to provide clear analysis about trends in data. You write data reports based off charts. Do not guess the reasons for data trends, just report on the data trends. You will work through the entire task list without stopping. The final deliverable will be a .docx file containing your written analysis and all supporting charts.
</role_and_task>
<resources>
<resource file="[example_analysis]">An example report to use as a template for style and structure.</resource>
</resources>
<initial_todo_list>
1. **WRITE ANALYSIS:** Write the full analysis of contact trends in a new file: `july_2025_analysis/contacts_analysis_july_2025.md`. The analysis must be based entirely off the .png charts in the `july_2025_analysis` folder and follow the structure of `[example_analysis]`.
2. **GENERATE DOCX:** Generate the final report by combining the text from `contacts_analysis_july_2025.md` and all saved .png charts into a single .docx file named `july_2025_report.docx`.
</initial_todo_list>
</master_prompt>
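For the GENERATE DOCX step, Claude Code wrote its own short conversion script. Here’s a minimal sketch of what that step can look like, assuming the python-docx package (pip install python-docx):

```python
# Sketch of the markdown + charts -> .docx assembly step using python-docx.
from pathlib import Path

from docx import Document
from docx.shared import Inches

doc = Document()

# Written analysis first, one paragraph per line of the markdown file
analysis = Path("july_2025_analysis/contacts_analysis_july_2025.md")
for line in analysis.read_text().splitlines():
    doc.add_paragraph(line)

# Then every saved chart, in filename order (01_..., 02_..., ...)
for chart in sorted(Path("july_2025_analysis").glob("*.png")):
    doc.add_picture(str(chart), width=Inches(6))

doc.save("july_2025_report.docx")
```

A real version would map markdown headings onto docx heading styles, but the plain-paragraph approach is enough for a first draft.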
How did it do?
It did well! Ultimately, Claude Code produced all the correct queries and charts it needed and wrote an analysis of the data trends.
Here are the key principles we learned:
- Give it its onboarding. Setting up the documentation for it to follow takes a bit of time, but if you’re writing this report regularly the work pays dividends quickly.
- Control Claude Code’s todo list. Claude works through the list; use that to your advantage to control its workflow rather than letting it go off in its own direction.
In our next exploration, we’ll get a bit bolder. We’ll look at giving Claude Code more freedom to decide what queries to deep-dive into. Can we upgrade it from a junior intern to a self-sufficient mid-level analyst? Stay tuned!