
Create an Adaptive Customer Behavior Analytics Dashboard with Claude AI and Python
Last Updated on February 6, 2025 by Editorial Team
Author(s): Adnan Siddiqi
Originally published on Towards AI.
In my previous post, I introduced OpenAI's image APIs and used them to create avatars. Today, I'm diving into another LLM service, Claude AI, which has gained traction for its speed and sharp analytical responses. I asked Claude to brainstorm ideas for my next blog post about itself. Among many fantastic suggestions, I chose Customer Behavior Analysis.
Finding a suitable dataset on Kaggle was easy, but the real challenge began when I started working on it. I didn't expect it to take 4-5 hours to achieve my goal. Initially, I had no clear idea how to use Claude for this, but Claude itself guided me, step by step, on the best approach. As I got deeper, the problem turned into an exciting puzzle: challenging yet fun, thanks to Claude's capabilities.
Curious why it took so long and how it all worked? Keep reading, or jump straight to the demo video if you're in a hurry:
Before discussing the project, let me tell you what Customer Behaviour Analysis is.
Customer Behaviour Analytics
Customer Behavior Analysis (CBA) involves collecting and studying data to understand how customers interact with a business, including their engagement with products, services, marketing, and website features.
Simply put, CBA is a kind of exploratory data analysis in which you use shopping data to find insights about your business.
How it works
Before proceeding, let's clarify what we're building. The title mentions "adaptive", which means our system doesn't rely on a fixed data schema. Instead, it analyzes the data and generates a dashboard dynamically. For example, websites A and B might store orders and products differently in their databases. When both CSVs are uploaded, our system uses Claude to interpret the schemas, generates Python code, executes it, and returns computed data to Claude. Claude then creates the HTML and JS for the dashboard at runtime. The steps are:
- Upload the CSV file via the web interface.
- Convert the CSV to JSON for Claude.
- Write the prompt.
- Send JSON with a prompt to Claude to generate Python code dynamically.
- Execute the generated Python code to create a partially pre-defined JSON structure.
- Send the resulting JSON to Claude, which generates and renders the HTML and JS for the dashboard.
I am using Flask for the web app, which is ideal for building prototypes like this. Now, let's get into development.
Development
Web Interface for CSV Upload
It's a basic web interface for uploading CSV files. It sends an AJAX request to the /upload endpoint, where the file is renamed and saved. The process happens entirely in the /upload endpoint, which we'll break down step by step.
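The body of the /upload handler isn't shown here, so the following is only a minimal sketch of the rename-and-save step. The helper name `save_upload` and the UUID-based file name are assumptions made for illustration; in a Flask view, the bytes would typically come from reading the uploaded file object.

```python
import uuid
from pathlib import Path


def save_upload(data: bytes, original_name: str, upload_dir: str) -> Path:
    """Save uploaded bytes under a fresh, collision-free name and return the path.

    Hypothetical helper: the naming scheme is an assumption, not the article's.
    """
    # Accept only files whose original name ends in .csv (case-insensitive).
    suffix = Path(original_name).suffix.lower()
    if suffix != ".csv":
        raise ValueError(f"Expected a .csv file, got: {original_name!r}")

    target_dir = Path(upload_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Never reuse the client-supplied name on disk; generate a new one instead.
    target = target_dir / f"{uuid.uuid4().hex}.csv"
    target.write_bytes(data)
    return target
```

Generating the stored name on the server side avoids both collisions between uploads and any surprises from user-controlled file names.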
Converting CSV to JSON for Claude
To inform our first prompt about the data structure, we don't need to send the entire file. The goal is to provide Claude with the column names and their types so it can generate the relevant Python code. As of January 2025, Claude cannot execute code directly from data input. To address this, we'll create a function called extract_sample_data. By the way, I used this Kaggle dataset and another one (randomly generated by GPT).
import gc

import pandas as pd


def extract_sample_data(file_path, num_rows=5):
    try:
        # Load the CSV file
        df = pd.read_csv(file_path)
        # Extract the first few rows
        sample_data = df.head(num_rows).to_dict(orient='records')
        # Get column data types
        column_types = {col: str(df[col].dtype) for col in df.columns}
        # Release the DataFrame early; only the small sample is needed
        del df
        gc.collect()
        return {
            "sample_data": sample_data,
            "column_types": column_types
        }
    except Exception as e:
        print(f"Error: {e}")
        return None
This function extracts the first n rows along with the column data types and returns a dict:
{
'sample_data': [
{
'Customer ID': 1,
'Age': 55,
'Gender': 'Male',
'Item Purchased': 'Blouse',
'Category': 'Clothing',
'Purchase Amount (USD)': 53,
'Location': 'Kentucky',
'Size': 'L',
'Color': 'Gray',
'Season': 'Winter',
'Review Rating': 3.1,
'Subscription Status': 'Yes',
'Shipping Type': 'Express',
'Discount Applied': 'Yes',
'Promo Code Used': 'Yes',
'Previous Purchases': 14,
'Payment Method': 'Venmo',
'Frequency of Purchases': 'Fortnightly'
},
{
'Customer ID': 2,
'Age': 19,
'Gender': 'Male',
'Item Purchased': 'Sweater',
'Category': 'Clothing',
'Purchase Amount (USD)': 64,
'Location': 'Maine',
'Size': 'L',
'Color': 'Maroon',
'Season': 'Winter',
'Review Rating': 3.1,
'Subscription Status': 'Yes',
'Shipping Type': 'Express',
'Discount Applied': 'Yes',
'Promo Code Used': 'Yes',
'Previous Purchases': 2,
'Payment Method': 'Cash',
'Frequency of Purchases': 'Fortnightly'
}
],
'column_types': {
'Customer ID': 'int64',
'Age': 'int64',
'Gender': 'object',
'Item Purchased': 'object',
'Category': 'object',
'Purchase Amount (USD)': 'int64',
'Location': 'object',
'Size': 'object',
'Color': 'object',
'Season': 'object',
'Review Rating': 'float64',
'Subscription Status': 'object',
'Shipping Type': 'object',
'Discount Applied': 'object',
'Promo Code Used': 'object',
'Previous Purchases': 'int64',
'Payment Method': 'object',
'Frequency of Purchases': 'object'
}
}
As you can see, it returned two records, which are sufficient for Claude to identify the structure and data types. Knowing the data types is essential for generating Python code tailored to the available data.
Prompt for Python Code Generation
Now we have to come up with a prompt. If you do not know what prompt engineering is, check it out!
The goal is a Claude prompt that accepts the schema we generated above and produces Python code capable of outputting both text and graphs. I used both GPT and Claude to help write the first prompt, which is:
You are a Python data analyst tasked with creating dynamic analysis code. I will provide you with a JSON structure containing field names and their data types. Your task is to generate a Python script that reads a CSV file, performs flexible analysis based on the dataset, and returns the output strictly in a JSON payload format.
{FIELDS_AND_TYPES}
### **Output Requirements**
1. **If Required Fields Are Missing:**
- The generated script should validate the presence of required fields in the CSV file.
- If any required fields are missing, the script should return:
```json
{
"error": "true",
"message": "<Explanation of missing fields or sections>"
}
```
2. **If Code Is Successfully Generated:**
- Return a JSON payload with:
```json
{
"error": "false",
"message": "<Entire Python code including imports, main function, and execution block>"
}
```
3. **Additional Requirements:**
- Do not include any explanatory text, comments, or preambles in the response.
- Avoid any prefixes like "Here's the code" or suffixes explaining the output.
- Only output the final JSON payload.
### **Code Requirements**
- **Input Handling:**
- The generated code should accept a CSV file as input and load it into a Pandas DataFrame.
- Include error handling for missing or malformed CSV files.
- **Field Validation:**
- Validate the presence of required fields in the CSV file:
- Demographic Analysis: Requires `age`, `gender`, `location`.
- Behavioral Analysis: Requires `frequency`, `previous purchases`, `subscription status`.
- Purchase Patterns: Requires `amount`, `date`, `category`, `discount`.
- Product Preferences: Requires `item`, `size`, `color`, `season`.
- Response Analysis: Requires `rating`, `reviews`.
- If required fields are missing, the script should log the missing fields and skip the related analysis.
- **Modular Analysis:**
- The script must perform modular analysis based on available fields:
- Summary statistics
- Segmentation and profiling
- Behavioral and purchase pattern analysis
- Product preference and response analysis
- Each module should check for the required fields before execution and log skipped modules.
- **Visualization:**
- The script must generate visualizations dynamically based on the dataset.
- Save visualizations as base64-encoded strings for JSON compatibility.
- **Output:**
- The script should return analysis results in a JSON-compatible dictionary with the following structure:
```json
{
"error": false,
"results": {
"summary_statistics": {...},
"segmentation_results": {...},
"behavioral_analysis": {...},
"purchase_patterns": {...},
"product_preferences": {...},
"visualizations": [...]
}
}
```
### **Response Examples**
#### **Error Output:**
```json
{
"error": "true",
"message": "Missing required fields: ['age', 'gender', 'location']"
}
```
I pass the previously generated schema, along with instructions and sample output, to the prompt. The generated code must produce output similar to the example below, where the inner fields may vary but the main structure remains consistent. If required fields are missing from the CSV file, the prompt asks for a relevant JSON error message instead.
{
"error": false,
"results": {
"summary_statistics": {
...
},
"segmentation_results": {
...
},
"behavioral_analysis": {
...
},
"purchase_patterns": {
...
},
"product_preferences": {
...
},
"visualizations": [
...
]
}
}
The code accepts a CSV file, loads it into Pandas, and generates text and graphs.
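Passing the schema into the prompt can be sketched as below. The template here is a shortened stand-in for the prompt above, and `fill_prompt` is a hypothetical helper name. The sketch uses `str.replace` rather than `str.format` because the prompt contains literal JSON braces, which `str.format` would try to interpret as placeholders.

```python
import json

# Shortened stand-in for the real prompt; only the placeholder matters here.
PROMPT_TEMPLATE = (
    "You are a Python data analyst tasked with creating dynamic analysis code.\n"
    "{FIELDS_AND_TYPES}\n"
    'Return only: {"error": "false", "message": "<Entire Python code>"}\n'
)


def fill_prompt(template: str, schema: dict) -> str:
    """Insert the schema dict into the template as pretty-printed JSON."""
    # default=str is a crude fallback for values json cannot encode natively
    # (for example, timestamps in the sampled rows).
    schema_json = json.dumps(schema, indent=2, default=str)
    # Replace only the named placeholder; other braces are left untouched.
    return template.replace("{FIELDS_AND_TYPES}", schema_json)


if __name__ == "__main__":
    schema = {
        "sample_data": [{"Age": 55, "Gender": "Male"}],
        "column_types": {"Age": "int64", "Gender": "object"},
    }
    print(fill_prompt(PROMPT_TEMPLATE, schema))
```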
An early output of the prompt was:
{
'error': 'false',
'message': "import pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport base64\nfrom io import BytesIO\n\ndef encode_plot(plt):\n buffer = BytesIO()\n plt.savefig(buffer, format='png', bbox_inches='tight')\n buffer.seek(0)\n image_png = buffer.getvalue()\n buffer.close()\n plt.close()\n return base64.b64encode(image_png).decode()\n\ndef analyze_data(csv_file):\n try:\n df = pd.read_csv(csv_file)\n results = {\n 'error': False,\n 'results': {\n 'summary_statistics': {},\n 'segmentation_results': {},\n 'behavioral_analysis': {},\n 'purchase_patterns': {},\n 'product_preferences': {},\n 'visualizations': []\n }\n }\n \n # Validate required fields\n required_fields = {\n 'demographic': ['Age', 'Gender', 'Location'],\n 'behavioral': ['Frequency of Purchases', 'Previous Purchases', 'Subscription Status'],\n 'purchase': ['Purchase Amount (USD)', 'Category', 'Discount Applied'],\n 'product': ['Item Purchased', 'Size', 'Color', 'Season'],\n 'response': ['Review Rating']\n }\n \n for category, fields in required_fields.items():\n if not all(field in df.columns for field in fields):\n return {\n 'error': True,\n 'message': f'Missing required fields for {category} analysis: {[f for f in fields if f not in df.columns]}'\n }\n \n # Summary Statistics\n results['results']['summary_statistics'] = {\n 'total_customers': len(df),\n 'average_purchase': float(df['Purchase Amount (USD)'].mean()),\n 'average_rating': float(df['Review Rating'].mean()),\n 'total_revenue': float(df['Purchase Amount (USD)'].sum())\n }\n \n # Segmentation Results\n age_segments = df.groupby(pd.cut(df['Age'], bins=[0, 25, 50, 75, 100])).size().to_dict()\n results['results']['segmentation_results'] = {\n 'age_distribution': {str(k): int(v) for k, v in age_segments.items()},\n 'gender_distribution': df['Gender'].value_counts().to_dict()\n }\n \n # Behavioral Analysis\n results['results']['behavioral_analysis'] = {\n 'purchase_frequency': df['Frequency of 
Purchases'].value_counts().to_dict(),\n 'avg_previous_purchases': float(df['Previous Purchases'].mean())\n }\n \n # Purchase Patterns\n results['results']['purchase_patterns'] = {\n 'category_distribution': df['Category'].value_counts().to_dict(),\n 'avg_amount_by_category': df.groupby('Category')['Purchase Amount (USD)'].mean().to_dict()\n }\n \n # Product Preferences\n results['results']['product_preferences'] = {\n 'popular_colors': df['Color'].value_counts().to_dict(),\n 'size_distribution': df['Size'].value_counts().to_dict(),\n 'seasonal_preference': df['Season'].value_counts().to_dict()\n }\n \n # Visualizations\n plt.figure(figsize=(10, 6))\n sns.histplot(data=df, x='Age', bins=20)\n plt.title('Age Distribution')\n results['results']['visualizations'].append({\n 'title': 'age_distribution',\n 'image': encode_plot(plt)\n })\n \n plt.figure(figsize=(10, 6))\n sns.boxplot(data=df, x='Category', y='Purchase Amount (USD)')\n plt.xticks(rotation=45)\n plt.title('Purchase Amount by Category')\n results['results']['visualizations'].append({\n 'title': 'purchase_by_category',\n 'image': encode_plot(plt)\n })\n \n return results\n \n except Exception as e:\n return {\n 'error': True,\n 'message': f'Error analyzing data: {str(e)}'\n }\n\ndef main(csv_file):\n return analyze_data(csv_file)\n\nif __name__ == '__main__':\n import sys\n if len(sys.argv) > 1:\n result = main(sys.argv[1])\n print(result)\n else:\n print({'error': True, 'message': 'No CSV file provided'})"
}
When you run the above code, it produces output like this:
[
TextBlock(text='{\n "error": false,\n "message": "Analysis code generated successfully",\n "code": {\n "imports": """\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport json\nimport base64\nfrom io import BytesIO\nimport os\n""",\n "main_analysis": """\ndef analyze_retail_data(data_json):\n # Convert JSON to DataFrame\n df = pd.DataFrame(data_json[\'sample_data\'])\n \n # Create results dictionary\n results = {\n "error": False,\n "message": "Analysis completed successfully",\n "results": {}\n }\n \n # Create visualization directory if it doesn\'t exist\n os.makedirs(\'visualizations\', exist_ok=True)\n \n # Summary Statistics\n numerical_cols = [\'Age\', \'Purchase Amount (USD)\', \'Review Rating\', \'Previous Purchases\']\n results[\'results\'][\'summary_statistics\'] = {\n col: df[col].describe().to_dict() for col in numerical_cols\n }\n \n # Segmentation Results\n results[\'results\'][\'segmentation_results\'] = {\n \'gender_distribution\': df[\'Gender\'].value_counts().to_dict(),\n \'age_segments\': pd.qcut(df[\'Age\'], q=3, labels=[\'Young\', \'Middle\', \'Senior\']).value_counts().to_dict()\n }\n \n # Behavioral Analysis\n results[\'results\'][\'behavioral_analysis\'] = {\n \'subscription_status\': df[\'Subscription Status\'].value_counts().to_dict(),\n \'purchase_frequency\': df[\'Frequency of Purchases\'].value_counts().to_dict()\n }\n \n # Purchase Patterns\n results[\'results\'][\'purchase_patterns\'] = {\n \'seasonal_distribution\': df[\'Season\'].value_counts().to_dict(),\n \'payment_methods\': df[\'Payment Method\'].value_counts().to_dict(),\n \'shipping_preferences\': df[\'Shipping Type\'].value_counts().to_dict()\n }\n \n # Product Preferences\n results[\'results\'][\'product_preferences\'] = {\n \'popular_items\': df[\'Item Purchased\'].value_counts().to_dict(),\n \'size_distribution\': df[\'Size\'].value_counts().to_dict(),\n \'color_preferences\': df[\'Color\'].value_counts().to_dict()\n }\n 
\n # Visualizations\n results[\'results\'][\'visualizations\'] = []\n \n # Age vs Purchase Amount\n plt.figure(figsize=(10, 6))\n sns.scatterplot(data=df, x=\'Age\', y=\'Purchase Amount (USD)\')\n plt.title(\'Age vs Purchase Amount\')\n \n # Save plot as base64\n buffer = BytesIO()\n plt.savefig(buffer, format=\'png\')\n buffer.seek(0)\n image_png = buffer.getvalue()\n buffer.close()\n \n # Encode to base64\n graphic = base64.b64encode(image_png).decode(\'utf-8\')\n results[\'results\'][\'visualizations\'].append({\n \'type\': \'scatter_plot\',\n \'title\': \'Age vs Purchase Amount\',\n \'base64_image\': graphic\n })\n \n # Category Distribution\n plt.figure(figsize=(10, 6))\n sns.countplot(data=df, x=\'Category\')\n plt.title(\'Category Distribution\')\n plt.xticks(rotation=45)\n \n # Save plot as base64\n buffer = BytesIO()\n plt.savefig(buffer, format=\'png\')\n buffer.seek(0)\n image_png = buffer.getvalue()\n buffer.close()\n \n # Encode to base64\n graphic = base64.b64encode(image_png).decode(\'utf-8\')\n results[\'results\'][\'visualizations\'].append({\n \'type\': \'bar_plot\',\n \'title\': \'Category Distribution\',\n \'base64_image\': graphic\n })\n \n return results\n""",\n "execution": """\nif __name__ == "__main__":\n # Rea',
type='text')
]
Another variation of the generated code output is shown below:
{
'error': False,
'results': {
'summary_statistics': {
'total_customers': 3900,
'average_purchase': 59.76435897435898,
'average_rating': 3.7499487179487176,
'total_revenue': 233081.0
},
'segmentation_results': {
'age_distribution': {
'(0, 25]': 571,
'(25, 50]': 1853,
'(50, 75]': 1476,
'(75, 100]': 0
},
'gender_distribution': {
'Male': 2652,
'Female': 1248
}
},
'behavioral_analysis': {
'purchase_frequency': {
'Every 3 Months': 584,
'Annually': 572,
'Quarterly': 563,
'Monthly': 553,
'Bi-Weekly': 547,
'Fortnightly': 542,
'Weekly': 539
},
'avg_previous_purchases': 25.35153846153846
},
'purchase_patterns': {
'category_distribution': {
'Clothing': 1737,
'Accessories': 1240,
'Footwear': 599,
'Outerwear': 324
},
'avg_purchase_by_category': {
'Accessories': 59.83870967741935,
'Clothing': 60.025331030512376,
'Footwear': 60.25542570951586,
'Outerwear': 57.17283950617284
}
},
'product_preferences': {
'popular_colors': {
'Olive': 177,
'Yellow': 174,
'Silver': 173,
'Teal': 172,
'Green': 169,
'Black': 167,
'Cyan': 166,
'Violet': 166,
'Gray': 159,
'Maroon': 158,
'Orange': 154,
'Charcoal': 153,
'Pink': 153,
'Magenta': 152,
'Blue': 152,
'Purple': 151,
'Peach': 149,
'Red': 148,
'Beige': 147,
'Indigo': 147,
'Lavender': 147,
'Turquoise': 145,
'White': 142,
'Brown': 141,
'Gold': 138
},
'size_distribution': {
'M': 1755,
'L': 1053,
'S': 663,
'XL': 429
},
'seasonal_preference': {
'Spring': 999,
'Fall': 975,
'Winter': 971,
'Summer': 955
}
},
'visualizations': [
{
'title': 'age_distribution',
'image': ''
},
{
'title': 'purchase_by_category',
'image': ''
}
]
}
}
If you ask me, it almost achieved what I wanted; it generated stats and charts in base64 format. However, I wasn't completely satisfied. I thought, why not have the code produce just the data and let JavaScript handle the chart generation?
Another crucial issue I faced was that the generated code didn't accept CSV files with different column names due to the following clause:
**Field Validation:**
- Validate the presence of required fields in the CSV file:
- Demographic Analysis: Requires `age`, `gender`, `location`.
This caused the Python code to explicitly look for specific fields, defeating the purpose of the project. I wanted Claude to infer field names and adapt accordingly. After many iterations, the following prompt worked as intended:
You are a Python data analyst tasked with creating dynamic analysis code. I will provide you with a dataset in CSV format. Your task is to generate a Python script that:
1. Dynamically identifies and categorizes fields based on their context and meaning in the dataset.
2. Performs modular analysis for each category based on the available fields.
3. Returns the output strictly in a JSON payload format.
{FIELDS_AND_TYPES}
### **Output Requirements**
1. **If Required Fields Are Missing:**
- If no fields can be inferred for certain categories, return:
```json
{
"error": "true",
"message": "Unable to infer required fields for <category> analysis."
}
```
2. **If Code Is Successfully Generated:**
- Return a JSON payload with:
```json
{
"error": "false",
"message": "<Entire Python code, escaped for valid JSON format>"
}
```
3. **Code Escaping:**
- Ensure the `message` field is a valid JSON string:
- Escape all special characters, including newlines (`\n`) and quotes (`\"`).
- Use `json.dumps()` in Python or equivalent methods to serialize the code as a JSON-compatible string.
- Avoid using multi-line string blocks (`"""`) in the JSON output.
4. **Additional Requirements:**
- Do not include any explanatory text, comments, or preambles in the response.
- Avoid any prefixes like "Here's the code" or suffixes explaining the output.
- Only output the final JSON payload.
### **Code Requirements**
1. **Input Handling:**
- Accept a CSV file as a parameter in the `main()` function. The parameter must be named `{CSV_FILE_PATH}` and dynamically passed when calling the function.
- Validate the `csv_file` parameter:
- Check if `csv_file` is a non-empty string.
- Check if the file exists and is readable before proceeding with analysis.
- Raise a clear error (e.g., `FileNotFoundError`) if the file does not exist.
2. **No `if __name__ == '__main__':` Block:**
- Do not include the `if __name__ == '__main__':` block in the generated code.
3. **Output Restrictions:**
- Do not include any `print()` statements in the generated code.
- Return results exclusively via the `main()` function, as a Python dictionary.
- Avoid any direct output, logging, or side effects.
4. **Invoke `main` at the End of the Script:**
- Ensure the script includes a call to `main("{CSV_FILE_PATH}")` at the end, passing the `{CSV_FILE_PATH}` placeholder dynamically.
- Replace `{CSV_FILE_PATH}` with the actual file path during runtime.
- Do not print or log the results in the script.
5. **Output Example for the Generated Code:**
```python
def main(csv_file: str) -> Dict[str, Any]:
    try:
        if not csv_file:
            return {'error': True, 'message': 'No file provided'}
        if not os.path.exists(csv_file):
            return {'error': True, 'message': f'File not found: {csv_file}'}
        # Load and analyze the file
        df = pd.read_csv(csv_file)
        if df.empty:
            return {'error': True, 'message': 'Empty CSV file'}
        # Perform analysis
        results = {'message': 'File processed successfully'}
        return {'error': False, 'results': results}
    except Exception as e:
        return {'error': True, 'message': str(e)}

results = main("{CSV_FILE_PATH}")
```
### **Dynamic Analysis**
- Dynamically analyze the column names and sample data to:
- Infer field types (e.g., numeric, categorical, boolean)
- Determine the relevance of fields for analysis
**Field Validation**
- Dynamically categorize fields into:
- **Demographic Analysis:** Fields describing age, gender, or geographic distribution.
- **Behavioral Analysis:** Fields related to customer behavior, such as frequency, purchase history, or subscriptions.
- **Purchase Patterns:** Fields related to purchase amounts, categories, or discounts.
- **Product Preferences:** Fields describing product details, such as size, color, and seasonality.
- **Response Analysis:** Fields indicating customer feedback, such as ratings or reviews.
1. **Dynamic Field Identification:**
- Implement logic to identify and classify fields dynamically based on:
- Substring matches (e.g., `AgeGroup` → Age, `Region` → Location).
- Value patterns (e.g., numerical, categorical, boolean).
2. **Modular Analysis:**
- Perform analysis for each identified category.
- Skip categories where no relevant fields can be inferred.
3. **Output:**
- Return results in JSON-compatible format, structured as:
```json
{
"error": false,
"results": {
"summary_statistics": {...},
"segmentation_results": {...},
"behavioral_analysis": {...},
"purchase_patterns": {...},
"product_preferences": {...},
"visualizations": [...]
}
}
```
4. **Escape Code in JSON Output:**
- The Python script should be included in the `message` field as a single JSON-compatible string, with all newlines (`\n`) and special characters escaped.
Below is the generated code from one of the recent prompts:
import pandas as pd
import json
from typing import Dict, Any
import numpy as np


def categorize_fields(df):
    categories = {
        'demographic': [],
        'behavioral': [],
        'purchase': [],
        'response': []
    }
    for column in df.columns:
        col_lower = column.lower()
        if any(term in col_lower for term in ['age', 'gender', 'region']):
            categories['demographic'].append(column)
        elif any(term in col_lower for term in ['frequency', 'membership', 'preferred']):
            categories['behavioral'].append(column)
        elif any(term in col_lower for term in ['purchase', 'promo', 'value']):
            categories['purchase'].append(column)
        elif any(term in col_lower for term in ['review', 'score', 'rating']):
            categories['response'].append(column)
    return categories


def analyze_demographics(df, demographic_fields):
    results = {}
    for field in demographic_fields:
        results[field] = df[field].value_counts().to_dict()
    return results


def analyze_behavioral(df, behavioral_fields):
    results = {}
    for field in behavioral_fields:
        results[field] = df[field].value_counts().to_dict()
    return results


def analyze_purchases(df, purchase_fields):
    results = {}
    numeric_fields = df[purchase_fields].select_dtypes(include=[np.number]).columns
    for field in numeric_fields:
        results[field] = {
            'mean': float(df[field].mean()),
            'median': float(df[field].median()),
            'std': float(df[field].std())
        }
    categorical_fields = list(set(purchase_fields) - set(numeric_fields))
    for field in categorical_fields:
        results[field] = df[field].value_counts().to_dict()
    return results


def analyze_response(df, response_fields):
    results = {}
    for field in response_fields:
        if df[field].dtype in [np.float64, np.int64]:
            results[field] = {
                'mean': float(df[field].mean()),
                'median': float(df[field].median())
            }
        else:
            results[field] = df[field].value_counts().to_dict()
    return results


def main(csv_file: str) -> Dict[str, Any]:
    try:
        df = pd.read_csv(csv_file)
        categories = categorize_fields(df)
        results = {
            'error': False,
            'results': {
                'demographic_analysis': analyze_demographics(df, categories['demographic']),
                'behavioral_analysis': analyze_behavioral(df, categories['behavioral']),
                'purchase_analysis': analyze_purchases(df, categories['purchase']),
                'response_analysis': analyze_response(df, categories['response'])
            }
        }
        return results
    except Exception as e:
        return {
            'error': True,
            'message': f'Analysis failed: {str(e)}'
        }


results = main('shopping_behavior_test.csv')
print(json.dumps(results, indent=2))
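Getting from Claude's reply to code like this means pulling the `message` field out of the JSON payload the prompt asks for. Here is a minimal sketch of that step, assuming the reply text (the `text` of the TextBlock shown earlier) contains exactly one JSON object; `extract_generated_code` is a hypothetical helper name, not part of any SDK.

```python
import json


def extract_generated_code(reply_text: str) -> str:
    """Return the Python source from a reply shaped like {"error": ..., "message": ...}."""
    # Tolerate stray text around the payload by slicing from the first "{"
    # to the last "}".
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in reply")

    payload = json.loads(reply_text[start:end + 1])

    # The prompt asks for the strings "true"/"false", but a model may emit
    # real booleans, so normalise both forms.
    if str(payload.get("error")).lower() == "true":
        raise ValueError(payload.get("message", "Model reported an error"))

    code = payload.get("message", "")
    # Syntax check only; compile() does not run the code.
    compile(code, "<generated>", "exec")
    return code
```

A reply that uses triple-quoted blocks, like the truncated TextBlock above, fails at `json.loads`, which is why the later prompt insists on escaped single-line strings.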
Send JSON to Claude for Python code generation
Below is the code that executes the generated Python code and then converts NumPy data types into built-in Python types so the results can be serialized to JSON:
import numpy as np


# To execute generated Python code
def execute_runtime_code(generated_code):
    try:
        # A plain dict used as globals for exec(). It keeps generated names
        # out of this module, but it is not a security sandbox.
        sandbox_namespace = {}
        # Execute the generated code in that namespace
        exec(generated_code, sandbox_namespace)
        if "main" not in sandbox_namespace:
            return {
                "error": True,
                "message": "main function not found",
                "results": None
            }
        if "results" in sandbox_namespace:
            return {
                "error": False,
                "message": "Execution completed successfully.",
                "results": sandbox_namespace["results"]
            }
        else:
            return {
                "error": True,
                "message": "No 'results' variable found in the generated script.",
                "results": None
            }
    except Exception as e:
        return {
            "error": True,
            "message": f"An error occurred during execution: {str(e)}",
            "results": None
        }


# Casting numpy datatypes to Python built-in types
def convert_to_serializable(obj):
    if isinstance(obj, dict):
        return {k: convert_to_serializable(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_to_serializable(item) for item in obj]
    elif isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    return obj
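Since `exec` runs the generated code inside the web app's own process, one alternative is to run it in a separate interpreter with a timeout. The sketch below is an assumption about how that could look, not what this project does; `run_generated_code` and `RUNNER_SUFFIX` are hypothetical names. A child process is still not a security sandbox, but a crash or an endless loop no longer takes the Flask app down with it.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Appended to the generated script so the child prints `results` as one JSON line.
RUNNER_SUFFIX = """
import json as _json
print(_json.dumps(results, default=str))
"""


def run_generated_code(generated_code: str, csv_path: str, timeout: int = 30) -> dict:
    """Run generated code in a child interpreter and return its `results`."""
    # Fill in the placeholder the prompt asks Claude to leave in the script.
    # Assumes the path contains no quote characters.
    source = generated_code.replace("{CSV_FILE_PATH}", Path(csv_path).as_posix())
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "generated.py"
        script.write_text(source + RUNNER_SUFFIX, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"error": True, "message": "Generated code timed out", "results": None}

    if proc.returncode != 0:
        return {"error": True, "message": proc.stderr.strip(), "results": None}

    lines = proc.stdout.strip().splitlines()
    try:
        # Use the last line so stray prints in the generated code are ignored.
        results = json.loads(lines[-1]) if lines else None
    except json.JSONDecodeError:
        return {"error": True, "message": "Output was not valid JSON", "results": None}
    return {"error": False, "message": "Execution completed successfully.", "results": results}
```

Note that `default=str` turns any value `json` cannot encode into a string, which is cruder than the type-by-type conversion in `convert_to_serializable`.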
Alright. The next step was to access this generated data in JavaScript and show both charts and stats on the webpage.
Execute Python to create JSON structure
The plan was to send the Python-generated JSON output via an AJAX request and access it in the JavaScript code. However, this approach failed because, while the parent JSON fields remained consistent, the inner fields varied. Hard-coding the JavaScript wasn't feasible, as files with different schemas caused errors for undefined fields. The solution? Create a new prompt that generates both HTML and JavaScript by inferring the JSON structure produced by the Python code.
For Dashboard's JavaScript and HTML Generation
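To make the mismatch concrete, the sketch below flattens two result payloads into dotted key paths and compares them. Both payloads are invented for illustration and `leaf_paths` is a hypothetical helper; the point is only that a path hard-coded for one dataset may not exist in another.

```python
def leaf_paths(node, prefix=""):
    """Return the set of dotted paths to every non-dict leaf in a nested dict."""
    if not isinstance(node, dict):
        return {prefix}
    paths = set()
    for key, child in node.items():
        child_prefix = f"{prefix}.{key}" if prefix else str(key)
        paths |= leaf_paths(child, child_prefix)
    return paths


# Hypothetical outputs for two differently shaped CSV files.
site_a = {"results": {"demographic_analysis": {"Age": {"mean": 44.06}}}}
site_b = {"results": {"demographic_analysis": {"AgeGroup": {"18-25": 10}}}}

paths_a = leaf_paths(site_a)
paths_b = leaf_paths(site_b)

# The top-level categories match, but no leaf path is shared, so JavaScript
# written against site_a's paths would read undefined values for site_b.
shared = paths_a & paths_b

if __name__ == "__main__":
    print(sorted(paths_a))
    print(sorted(paths_b))
    print(sorted(shared))
```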
The very first prompt was:
You are a front-end developer tasked with creating dynamic dashboards based on JSON data. The JSON data includes fixed top-level categories but variable inner keys. The JSON Data is given below:
```json
{DASHBOARD_JSON}
```
Generate:
1. **HTML Structure:**
- Only the relevant sections of the HTML (not the entire `<html>` page).
- Use Bootstrap for layout and styling.
- Include summary cards for key metrics.
- Each card must have a **dynamic `id`** for future updates.
- Dynamically generate **chart containers** for data visualization.
2. **JavaScript Code:**
- Write JavaScript using **jQuery** to:
- Dynamically populate the summary cards.
- Generate charts based on the JSON data.
- Assume that the JSON data will be provided as a variable named `{DASHBOARD_JSON}`.
- Focus on generating dynamic JS for `demographic_analysis`, `behavioral_analysis`, `purchase_analysis`, and `response_analysis`.
3. **Output Requirements:**
- Output a JSON object with two fields:
- `html`: The Bootstrap-compatible HTML structure as a JSON string.
- `javascript`: The jQuery code to dynamically update the dashboard.
4. **JSON Input Example:**
```json
{
"error": false,
"message": "Execution completed successfully.",
"results": {
"error": false,
"results": {
"demographic_analysis": {
"age_stats": {
"mean": 44.06,
"median": 44.0,
"std": 15.2
},
"gender_distribution": {
"Male": 2652,
"Female": 1248
}
},
"behavioral_analysis": {
"subscription_rate": 0.27
},
"purchase_analysis": {
"amount_stats": {
"mean": 59.76,
"total": 233081
}
},
"response_analysis": {
"rating_stats": {
"mean": 3.75
}
}
}
}
}
```
5. **HTML Section Example:**
```html
<section class="row mb-5" id="demographic-analysis">
<div class="col-md-4">
<div class="card text-white bg-primary mb-3">
<div class="card-body">
<h5 class="card-title">Average Age</h5>
<p class="card-text" id="avg-age">0.00</p>
</div>
</div>
</div>
</section>
```
6. **Expected Output Format:**
```json
{
"html": "<section class='row mb-5' id='demographic-analysis'> ... </section>",
"javascript": "$('#avg-age').text({DASHBOARD_JSON}.results.results.demographic_analysis.age_stats.mean); ... // Full JS code"
}
```
7. **Output Strictness:**
Do not include any preamble or suffix like "Here's the solution" or "I'll help you with this."
The response must contain only the final JSON payload, nothing else.
This prompt accepts the JSON structure produced by the generated code (yeah, kind of "inception", you know).
It also asked Claude to return both the generated HTML and JS inside a JSON payload:
6. **Expected Output Format:**
```json
{
"html": "<section class='row mb-5' id='demographic-analysis'> ... </section>",
"javascript": "$('#avg-age').text({DASHBOARD_JSON}.results.results.demographic_analysis.age_stats.mean); ... // Full JS code"
}
```
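Embedding HTML and JavaScript inside a JSON string is fragile, and the failure is easy to reproduce. The sketch below uses two invented replies: one where markup is pasted into JSON without escaping, and one where the same kind of content is wrapped in XML CDATA sections. This is one plausible failure mode, not necessarily the exact one encountered here, and the helper names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical reply: unescaped double quotes and a raw newline inside JSON strings.
bad_json_reply = '{"html": "<p id="avg-age">0</p>", "javascript": "var a = 1;\nvar b = 2;"}'

# Hypothetical reply: the same kind of content inside CDATA, no escaping needed.
xml_reply = """<dashboard>
<html><![CDATA[<p id="avg-age">0</p>]]></html>
<javascript><![CDATA[var a = 1;
var b = 2 < 3;]]></javascript>
</dashboard>"""


def parse_json_reply(text):
    """Return the parsed payload, or None if the reply is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def parse_xml_reply(text):
    """Extract the html and javascript sections from a <dashboard> document."""
    root = ET.fromstring(text)
    return {
        "html": root.findtext("html", default=""),
        "javascript": root.findtext("javascript", default=""),
    }


if __name__ == "__main__":
    print(parse_json_reply(bad_json_reply))  # None: the JSON cannot be parsed
    print(parse_xml_reply(xml_reply))
```

CDATA has its own limit: the content must not contain the sequence `]]>`, which would end the section early.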
It didn't work due to encoding and escaping issues with the JSON, so I decided to return the data in XML format instead. It was cleaner too. Below is one version of the dashboard prompt:
You are a front-end developer tasked with creating dynamic dashboards based on JSON data. The JSON data includes fixed top-level categories but variable inner keys. The JSON Data is given below:
```json
{DASHBOARD_JSON}
```
Generate:
1. **HTML Structure:**
- Only the relevant sections of the HTML (not the entire `<html>` page).
- Use Bootstrap for layout and styling.
- Include summary cards for key metrics.
- Each card must have a **dynamic `id`** for future updates.
- Dynamically generate **chart containers** for data visualization.
2. **JavaScript Code:**
- Write JavaScript using **jQuery** to:
**Variable Declaration:**
- Define a variable rawData containing the JSON string from {DASHBOARD_JSON}.
- Ensure `rawData` is a properly escaped and valid JSON string for use in JavaScript.
**Parsing Logic**
- Parse `rawData` into a JavaScript object using `JSON.parse`.
**Function Definition:**
- Create a function named `generateDashboard` that accepts a single parameter `dashboardData`.
- The function must:
- Dynamically populate summary cards.
- Dynamically generate charts based on the data in `dashboardData`.
2. **Function Invocation:**
- Define a variable `rawData` containing the JSON string from `{DASHBOARD_JSON}`.
4. **Code Structure:**
- Do not include `$(document).ready` in the generated code.
- Only provide:
1. The function definition for `generateDashboard`.
2. The variable `rawData` with `{DASHBOARD_JSON}` as the placeholder.
3. The `JSON.parse` logic.
4. The `generateDashboard` function call.
5. **Dynamic Logic:**
- Assume the input JSON follows this structure:
```json
{
"results": {
"demographic_analysis": { ... },
"behavioral_analysis": { ... },
"purchase_analysis": { ... },
"response_analysis": { ... }
}
}
```
- Use `dashboardData.results` as the base for extracting and visualizing data.
3. **Output Requirements:**
- Generate an XML response with the following format:
1. `<dashboard>`: Root element containing:
- `<html>`: Encapsulates the HTML structure needed to display the dashboard.
- Use a `<![CDATA[]]>` section for HTML content.
- `<javascript>`: Encapsulates the JavaScript code required to populate and render the dashboard.
- Use a `<![CDATA[]]>` section for JavaScript content.
2. Avoid any prefixes, suffixes, or extra text outside the XML structure. Ensure the generated XML is valid and can be directly parsed.
4. **JSON Input Example:**
```json
{
"error": false,
"message": "Execution completed successfully.",
"results": {
"error": false,
"results": {
"demographic_analysis": {
"age_stats": {
"mean": 44.06,
"median": 44.0,
"std": 15.2
},
"gender_distribution": {
"Male": 2652,
"Female": 1248
}
},
"behavioral_analysis": {
"subscription_rate": 0.27
},
"purchase_analysis": {
"amount_stats": {
"mean": 59.76,
"total": 233081
}
},
"response_analysis": {
"rating_stats": {
"mean": 3.75
}
}
}
}
}
```
5. **HTML Summary Card Example:**
```html
<div style="margin-top:20px;" class="row">
<div class="col-md-12">
<!-- Summary Section -->
<section class="row mb-5">
<div class="col-md-4">
<div class="card text-white bg-primary mb-3">
<div class="card-body">
<h5 class="card-title">Average Purchase Amount</h5>
<p class="card-text" id="average-purchase-amount">$0.00</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card text-white bg-success mb-3">
<div class="card-body">
<h5 class="card-title">Total Revenue</h5>
<p class="card-text" id="total-revenue">$0.00</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card text-white bg-warning mb-3">
<div class="card-body">
<h5 class="card-title">Subscription Rate</h5>
<p class="card-text" id="subscription-rate">0%</p>
</div>
</div>
</div>
</section>
</div>
</div>
```
**HTML Charts Example:**
```html
<section>
<style>
.chart-container {
display: flex;
justify-content: center; /* Center horizontally */
align-items: center; /* Center vertically (if needed) */
}
.charthing {
width: 400px !important;
height: 300px !important;
}
</style>
<!-- Age Distribution Chart -->
<div class="chart-container">
<div>
<h5 class="text-center">Age Distribution</h5>
<canvas class="charthing" id="ageDistributionChart" width="800" height="400" style="display: block; box-sizing: border-box; height: 200px; width: 400px;"></canvas>
</div>
</div>
<!-- Gender Distribution Chart -->
<div class="chart-container">
<h5>Gender Distribution</h5>
<canvas class="charthing" id="genderDistributionChart" width="600" height="600" style="display: block; box-sizing: border-box; height: 300px; width: 300px;"></canvas>
</div>
<!-- Color Preferences Chart -->
<div class="chart-container">
<h5>Color Preferences</h5>
<canvas class="charthing" id="colorPreferencesChart" width="1200" height="600" style="display: block; box-sizing: border-box; height: 300px; width: 600px;"></canvas>
</div>
</section>
```
6. **Expected Output Format:**
```json
{
"html": "<section class='row mb-5' id='demographic-analysis'> ... </section>",
"javascript": "$('#avg-age').text({DASHBOARD_JSON}.results.results.demographic_analysis.age_stats.mean); ... // Full JS code"
}
```
7. **Output Strictness:**
Do not include any preamble or suffix like "Here's the solution" or "I'll help you with this."
The response must contain only the final JSON payload, nothing else.
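The `{DASHBOARD_JSON}` placeholder in the prompt above has to be filled with the analysis output before the prompt is sent. Below is a minimal sketch of that step, assuming the prompt is held in a Python string. `PROMPT_TEMPLATE` is a shortened, hypothetical stand-in for the real prompt, and `build_dashboard_prompt` is a name introduced here for illustration. It uses `str.replace` rather than `str.format`, because the prompt contains literal braces in its JSON examples and `str.format` would try to treat them as replacement fields and raise an error.

```python
import json

# Hypothetical, shortened stand-in for the dashboard prompt shown above.
PROMPT_TEMPLATE = """You are a front-end developer tasked with creating dynamic dashboards based on JSON data.
The JSON Data is given below:
{DASHBOARD_JSON}
- Assume the input JSON follows this structure:
{ "results": { "demographic_analysis": { ... } } }
"""


def build_dashboard_prompt(template: str, analysis: dict) -> str:
    """Insert the analysis JSON into the prompt text.

    Only the exact token {DASHBOARD_JSON} is substituted, so the literal
    braces in the prompt's JSON examples are left untouched.
    """
    payload = json.dumps(analysis, ensure_ascii=False)
    return template.replace("{DASHBOARD_JSON}", payload)


# Example analysis output (values taken from the JSON input example above).
analysis = {"error": False, "results": {"behavioral_analysis": {"subscription_rate": 0.27}}}
prompt = build_dashboard_prompt(PROMPT_TEMPLATE, analysis)
print(prompt)
```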
It almost worked, but there was a glitch. In the section responsible for generating the JS code, Claude was somehow not assigning the JSON returned for the dashboard to a variable:
2. **JavaScript Code:**
- Write JavaScript using **jQuery** to:
**Variable Declaration:**
- Define a variable rawData containing the JSON string from {DASHBOARD_JSON}.
- Ensure `rawData` is a properly escaped and valid JSON string for use in JavaScript.
**Parsing Logic**
- Parse `rawData` into a JavaScript object using `JSON.parse`.
**Function Definition:**
- Create a function named `generateDashboard` that accepts a single parameter `dashboardData`.
- The function must:
- Dynamically populate summary cards.
- Dynamically generate charts based on the data in `dashboardData`.
It took a few iterations to come up with an output like the one below:
<?xml version="1.0" encoding="UTF-8"?>
<dashboard>
<html>
<![CDATA[
<div class="container-fluid">
<!-- Summary Cards -->
<div class="row mt-4">
<div class="col-md-3">
<div class="card bg-primary text-white">
<div class="card-body">
<h5 class="card-title">Average Age</h5>
<p class="card-text" id="avg-age">0</p>
</div>
</div>
</div>
<div class="col-md-3">
<div class="card bg-success text-white">
<div class="card-body">
<h5 class="card-title">Total Sales</h5>
<p class="card-text" id="total-sales">$0</p>
</div>
</div>
</div>
<div class="col-md-3">
<div class="card bg-info text-white">
<div class="card-body">
<h5 class="card-title">Average Rating</h5>
<p class="card-text" id="avg-rating">0</p>
</div>
</div>
</div>
<div class="col-md-3">
<div class="card bg-warning text-white">
<div class="card-body">
<h5 class="card-title">Subscription Rate</h5>
<p class="card-text" id="sub-rate">0%</p>
</div>
</div>
</div>
</div>
<!-- Charts -->
<div class="row mt-4">
<div class="col-md-6">
<div class="card">
<div class="card-body">
<h5 class="card-title">Gender Distribution</h5>
<canvas id="genderChart"></canvas>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card">
<div class="card-body">
<h5 class="card-title">Size Distribution</h5>
<canvas id="sizeChart"></canvas>
</div>
</div>
</div>
</div>
<div class="row mt-4">
<div class="col-md-6">
<div class="card">
<div class="card-body">
<h5 class="card-title">Category Distribution</h5>
<canvas id="categoryChart"></canvas>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card">
<div class="card-body">
<h5 class="card-title">Seasonal Preferences</h5>
<canvas id="seasonalChart"></canvas>
</div>
</div>
</div>
</div>
</div>
]]>
</html>
<javascript>
<![CDATA[
const rawData = `{"error":false,"message":"Execution completed successfully.","results":{"error":false,"results":{"demographic_analysis":{"age_stats":{"mean":44.06846153846154,"median":44.0,"std":15.207589127162382},"gender_distribution":{"Male":2652,"Female":1248},"location_distribution":{"Montana":96,"California":95,"Idaho":93,"Illinois":92,"Alabama":89,"Minnesota":88,"Nebraska":87,"New York":87,"Nevada":87,"Maryland":86,"Delaware":86,"Vermont":85,"Louisiana":84,"North Dakota":83,"Missouri":81,"West Virginia":81,"New Mexico":81,"Mississippi":80,"Indiana":79,"Georgia":79,"Kentucky":79,"Arkansas":79,"North Carolina":78,"Connecticut":78,"Virginia":77,"Ohio":77,"Tennessee":77,"Texas":77,"Maine":77,"South Carolina":76,"Colorado":75,"Oklahoma":75,"Wisconsin":75,"Oregon":74,"Pennsylvania":74,"Washington":73,"Michigan":73,"Alaska":72,"Massachusetts":72,"Wyoming":71,"Utah":71,"New Hampshire":71,"South Dakota":70,"Iowa":69,"Florida":68,"New Jersey":67,"Hawaii":65,"Arizona":65,"Kansas":63,"Rhode Island":63}},"purchase_patterns":{"amount_stats":{"mean":59.76435897435898,"median":60.0,"total":233081},"category_distribution":{"Clothing":1737,"Accessories":1240,"Footwear":599,"Outerwear":324},"discount_usage":0.43,"promo_usage":0.43},"product_preferences":{"size_distribution":{"M":1755,"L":1053,"S":663,"XL":429},"color_preferences":{"Olive":177,"Yellow":174,"Silver":173,"Teal":172,"Green":169,"Black":167,"Cyan":166,"Violet":166,"Gray":159,"Maroon":158,"Orange":154,"Charcoal":153,"Pink":153,"Magenta":152,"Blue":152,"Purple":151,"Peach":149,"Red":148,"Beige":147,"Indigo":147,"Lavender":147,"Turquoise":145,"White":142,"Brown":141,"Gold":138},"seasonal_preferences":{"Spring":999,"Fall":975,"Winter":971,"Summer":955},"item_popularity":{"Blouse":171,"Jewelry":171,"Pants":171,"Shirt":169,"Dress":166,"Sweater":164,"Jacket":163,"Belt":161,"Sunglasses":161,"Coat":161,"Sandals":160,"Socks":159,"Skirt":158,"Shorts":157,"Scarf":157,"Hat":154,"Handbag":153,"Hoodie":151,"Shoes":150,"T-shirt":147,"Sneakers":145,"Boots":144,"Backpack":143,"Gloves":140,"Jeans":124}},"customer_behavior":{"avg_review_rating":3.7499487179487176,"subscription_rate":0.27,"shipping_preferences":{"Free Shipping":675,"Standard":654,"Store Pickup":650,"Next Day Air":648,"Express":646,"2-Day Shipping":627},"payment_methods":{"PayPal":677,"Credit Card":671,"Cash":670,"Debit Card":636,"Venmo":634,"Bank Transfer":612},"purchase_frequency":{"Every 3 Months":584,"Annually":572,"Quarterly":563,"Monthly":553,"Bi-Weekly":547,"Fortnightly":542,"Weekly":539},"avg_previous_purchases":25.35153846153846}}}}`;
const dashboardData = JSON.parse(rawData);
const data = dashboardData.results.results;
function generateDashboard(data) {
// Update Summary Cards
$('#avg-age').text(data.demographic_analysis.age_stats.mean.toFixed(2));
$('#total-sales').text(`$${data.purchase_patterns.amount_stats.total.toLocaleString()}`);
$('#avg-rating').text(data.customer_behavior.avg_review_rating.toFixed(2));
$('#sub-rate').text(`${(data.customer_behavior.subscription_rate * 100).toFixed(1)}%`);
// Gender Distribution Chart
new Chart(document.getElementById('genderChart'), {
type: 'pie',
data: {
labels: Object.keys(data.demographic_analysis.gender_distribution),
datasets: [{
data: Object.values(data.demographic_analysis.gender_distribution),
backgroundColor: ['#36A2EB', '#FF6384']
}]
}
});
// Size Distribution Chart
new Chart(document.getElementById('sizeChart'), {
type: 'bar',
data: {
labels: Object.keys(data.product_preferences.size_distribution),
datasets: [{
label: 'Size Distribution',
data: Object.values(data.product_preferences.size_distribution),
backgroundColor: '#4BC0C0'
}]
}
});
// Category Distribution Chart
new Chart(document.getElementById('categoryChart'), {
type: 'bar',
data: {
labels: Object.keys(data.purchase_patterns.category_distribution),
datasets: [{
label: 'Category Distribution',
data: Object.values(data.purchase_patterns.category_distribution),
backgroundColor: '#FFCE56'
}]
}
});
// Seasonal Preferences Chart
new Chart(document.getElementById('seasonalChart'), {
type: 'bar',
data: {
labels: Object.keys(data.product_preferences.seasonal_preferences),
datasets: [{
label: 'Seasonal Preferences',
data: Object.values(data.product_preferences.seasonal_preferences),
backgroundColor: '#FF9F40'
}]
}
});
}
generateDashboard(data);
]]>
</javascript>
</dashboard>
As you can see, the rawData variable holds the JSON output generated by the Python code. The generated code produces summary cards and charts by inferring the structure of the incoming JSON; the entire HTML and JS is derived from that JSON output.
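On the Python side, an XML response in this shape has to be split back into its HTML and JS parts before they reach the browser. The sketch below shows one way to do that with the standard library, assuming the response follows the `<dashboard>` format requested in the prompt. `claude_response` is a heavily shortened, made-up response and `split_dashboard_response` is a hypothetical helper, not code from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened response in the <dashboard> format shown above.
claude_response = """<?xml version="1.0" encoding="UTF-8"?>
<dashboard>
<html>
<![CDATA[
<div class="card"><p class="card-text" id="avg-age">0</p></div>
]]>
</html>
<javascript>
<![CDATA[
$('#avg-age').text("44.07");
]]>
</javascript>
</dashboard>"""


def split_dashboard_response(xml_text: str) -> dict:
    """Return the HTML and JS payloads from a <dashboard> XML document."""
    # Strip stray whitespace and pass bytes so the XML declaration is accepted.
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    if root.tag != "dashboard":
        raise ValueError(f"unexpected root element: {root.tag}")
    parts = {}
    for tag in ("html", "javascript"):
        node = root.find(tag)
        if node is None or node.text is None:
            raise ValueError(f"missing <{tag}> section")
        # CDATA content arrives as plain text, so markup inside it is not parsed.
        parts[tag] = node.text.strip()
    return parts


parts = split_dashboard_response(claude_response)
print(parts["html"])
print(parts["javascript"])
```

Because the HTML and JS sit inside CDATA sections, quotes and angle brackets in the generated code need no extra escaping, which is what made XML easier to handle here than a JSON payload.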
One more thing
No, no, I'm not trying to imitate Steve Jobs. When I was preparing the demo for this project, I initially planned to create a CSV file unrelated to behavior analysis just for testing. I uploaded the file, and to my surprise, Claude accepted it and generated the damn code.
A,B,C
LOL,LOL,LOL
Obviously, the generated JS code didn't work at all. The code produced a dictionary like this:
{
'error': False,
'message': 'Execution completed successfully.',
'results': {
'error': True,
'results': {
.......
So, while the error inside results was correctly returning True with the relevant text in the message field, the outer error was still False. This was happening due to the following code:
results = sandbox_namespace['results']
if results['error']:
    return {
        "error": True,
        "message": "Execution failed",
        "results": sandbox_namespace["results"]
    }
else:
    return {
        "error": False,
        "message": "Execution completed successfully.",
        "results": sandbox_namespace["results"]
    }
As you can see, I was only testing for the existence of the results key. I fixed it by doing the following:
results = sandbox_namespace['results']
if results['error']:
    return {
        "error": True,
        "message": "Execution failed",
        "results": sandbox_namespace["results"]
    }
else:
    return {
        "error": False,
        "message": "Execution completed successfully.",
        "results": sandbox_namespace["results"]
    }
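To make the difference between "the results key exists" and "the results report an error" concrete, here is a small self-contained sketch of the same idea. It is an illustration under assumptions, not the project's actual function: `wrap_execution_result` is a hypothetical name, and the inner `message` key is assumed to carry the error text described above.

```python
def wrap_execution_result(sandbox_namespace: dict) -> dict:
    """Build the outer response so that it mirrors the inner error flag."""
    results = sandbox_namespace.get("results")

    # Case 1: the generated code never produced a results dictionary.
    if not isinstance(results, dict):
        return {
            "error": True,
            "message": "Execution produced no results.",
            "results": None,
        }

    # Case 2: results exist, but the generated code flagged a failure.
    if results.get("error"):
        return {
            "error": True,
            # Assumed: the inner dictionary carries its own message text.
            "message": results.get("message", "Execution failed"),
            "results": results,
        }

    # Case 3: results exist and report success.
    return {
        "error": False,
        "message": "Execution completed successfully.",
        "results": results,
    }


# A junk CSV leads to an inner error, which now surfaces in the outer flag too.
failed = wrap_execution_result({"results": {"error": True, "message": "No usable columns"}})
print(failed["error"], failed["message"])
```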
OK, this was fixed, but I also needed to make changes on the JS side in the index.html file.
if(response.error) {
console.log(response.message)
$("#errorMessage").show()
$("#errorMessage").html(`<div class="alert alert-danger">${response.message}</div>`)
$("#wait").hide()
return true
}
OK, everything was fine but then I started getting the error:
Uncaught SyntaxError: Identifier 'rawData' has already been declared
Somehow, the rawData variable reference was being retained. I tried the following, but it still didn't work:
$("#runtimeJs").remove()
// Dynamically execute the JavaScript
const script = document.createElement('script')
script.id = "runtimeJs"
script.type = 'text/javascript'
script.text = jsContent
document.body.appendChild(script);
The relevant prompt section was:
2. **JavaScript Code:**
- Write JavaScript using **jQuery** to:
**Variable Declaration:**
- Define a variable rawData containing the JSON string from {DASHBOARD_JSON}.
Example:
```javascript
const rawData = `{DASHBOARD_JSON}`;
```
- Ensure `rawData` is a properly escaped and valid JSON string for use in JavaScript.
**Parsing Logic**
- Parse the JSON string into a JavaScript object using JSON.parse
Example:
```javascript
const dashboardData = JSON.parse(rawData);
```
- Extract the main data object for rendering charts and cards:
```javascript
const data = dashboardData.results.results;
```
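The problem was that rawData was declared with const. A top-level const stays in the page's global scope even after its script tag is removed, so the next injected script cannot declare it again. The final changes are: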
2. **JavaScript Code:**
- Write JavaScript using **jQuery** to:
**Variable Cleanup (Before Declaration)**
- Ensure `rawData` is removed before defining it again.
- Ensure `dashboardData` is removed before defining it again.
Example:
```javascript
delete window.rawData
delete window.dashboardData
```
**Variable Declaration:**
- Define a variable `rawData` using `var` (instead of `const`) to avoid redeclaration errors.
Example:
```javascript
var rawData = `{DASHBOARD_JSON}`;
```
- Ensure `rawData` is a properly escaped and valid JSON string for use in JavaScript.
**Parsing Logic**
- Parse the JSON string into a JavaScript object using JSON.parse
Example:
```javascript
var dashboardData = JSON.parse(rawData);
```
- Extract the main data object for rendering charts and cards:
```javascript
var data = dashboardData.results.results;
```
Now, it generated the JS code as follows:
// Clean up existing variables
delete window.rawData;
delete window.dashboardData;
// Define raw data
var rawData = `....`
As you can see, the generated code now clears all context related to both rawData and dashboardData from the existing browser window. To be on the safe side, I added the following before injecting and executing the runtime JS code in the current window:
// Cleaning previous JS execution
// Remove existing one
$("#runtimeJs").remove()
window.rawData = undefined
delete window.rawData
window.dashboardData = undefined
delete window.dashboardData
if (window.Chart) {
Chart.helpers.each(Chart.instances, function(instance) {
instance.destroy();
});
}
.....
Now it was removing all existing traces of both variables before injecting and executing the incoming JS code.
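A different way to avoid the redeclaration error, not the one used in this post, is to wrap the generated JS in an immediately invoked function expression (IIFE) on the server before sending it to the browser. Variables declared inside a function are local to that call, so each upload gets a fresh `rawData` even when it is declared with `const`. The sketch below is only an illustration; `wrap_in_iife` is a hypothetical helper and `generated_js` is a made-up fragment.

```python
def wrap_in_iife(js_code: str) -> str:
    """Wrap generated JS so its declarations stay out of the global scope."""
    # Everything declared inside the function body is scoped to this one call,
    # so re-injecting the script later cannot clash with earlier declarations.
    return "(function () {\n" + js_code.strip() + "\n})();"


# Made-up fragment standing in for the JS returned by Claude.
generated_js = """
const rawData = `{"results": {}}`;
const dashboardData = JSON.parse(rawData);
"""

print(wrap_in_iife(generated_js))
```

The trade-off is that nothing inside the wrapper is reachable from the browser console or from other scripts, so this only fits if the generated code is self-contained, as it is when it both defines and calls `generateDashboard`.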
And while I was running the final tests, the app suddenly stopped working. I then received the following email:
Ouch! My entire $5 free credit evaporated. I had to purchase credits just to run the demo.
Conclusion
Alright, so you saw how powerful Claude is at this kind of task. Claude is good not only at code generation but also at data analysis. I don't know whether you got as excited as I was while working on this project, but I can tell you that it took hours and multiple sessions over several days to make it happen. I have not covered all the "toy scripts" I wrote for the different components of this project, as that would make this post unnecessarily long and might bore you. However, I will be putting all the prompt iterations in a separate folder for your learning. Hope you'll enjoy it. As always, the code is available on GitHub.
Looking to create something similar or even more exciting? Schedule a meeting or email me at kadnan @ gmail.com.
Love What You're Learning Here?
If my posts have sparked ideas or saved you time, consider supporting my journey of learning and sharing. Even a small contribution helps me keep this blog alive and thriving.
Originally published at https://blog.adnansiddiqi.me on January 30, 2025.
Published via Towards AI