Speech-To-SQL Using OpenAI’s Whisper and Ollama (Llama3)

Last Updated on October 31, 2024 by Editorial Team

Author(s): Tapan Babbar

Originally published on Towards AI.

Source: 10points Gryffindor Richard Harris GIF by blue_elephant on Tenor

I was rewatching the Harry Potter series and kept wondering: teachers and prefects were deducting points from houses left and right! But do they keep track of these point changes across so many classes? What about data integrity? Scalability? Write conflicts? Surely they’d need something scalable, like a pub-sub system for house point updates. Besides the scale, how well does the speech recognition have to be?

On a slightly serious note, that got me thinking — could we recreate some of it with AI? What if we could go from speech directly to SQL? That’s how I ended up diving into this fun little experiment: Speech-to-SQL using OpenAI’s Whisper for transcription and Meta’s Llama3 for converting text into SQL queries.

Here’s how I did it, and you can too in just four simple steps:

Step 1 — Record Audio

We kick things off by capturing audio using a simple Python setup. Using the sounddevice library, we record audio directly from your mic, then temporarily save it as a .wav file to transcribe later.

import sounddevice as sd
import tempfile
import wave

# Function to record audio from the microphone and save it as a WAV file
def record_audio(duration, sample_rate=16000):
 print("Recording...")
 audio_data = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1, dtype='float32')
 sd.wait() # Wait for the recording to finish
 print("Recording finished.")
 
 # Save the audio to a temporary WAV file
 temp_wav = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
 with wave.open(temp_wav.name, 'wb') as wf:
 wf.setnchannels(1) # Mono channel
 wf.setsampwidth(2) # 16-bit audio
 wf.setframerate(sample_rate)
 wf.writeframes(np.int16(audio_data * 32767)) # Convert float32 to int16
 
 return temp_wav.name

This is where it all starts: a humble voice command ready to transform into SQL!

Step 2: Speech-to-Text with Whisper

Next, we transcribe the audio using OpenAI’s Whisper model. This model is excellent for converting speech into text. It’s almost like having a personal assistant who listens to your commands and writes them down — just more reliable and scalable.

import whisper
import os

# Function to transcribe audio from the microphone using Whisper
def audio_to_text_from_mic(duration=5):
 # Record audio from the microphone
 audio_file = record_audio(duration)

 # Load Whisper model
 model = whisper.load_model("turbo") # You can use "turbo", "small", etc.
 
 # Transcribe the recorded audio file
 result = model.transcribe(audio_file)

 # Delete the temporary audio file after transcription
 os.remove(audio_file)

 return result['text']

# Example usage
text = audio_to_text_from_mic(duration=3) # Record for 5 seconds
print("Transcription:", text)

Transcription: 10 points to Gryffindor

Your natural language command is now in text form, ready for the next level of transformation.

Step 3: Text-to-SQL using Llama 3

Now for the real magic — turning that transcribed text into SQL commands. Using the Llama 3 model, we feed the natural language command (“Add 10 points to Gryffindor”) and out a valid SQL query.

We start by building a prompt that gives context about the database schema. In our case, the house_points table has two columns: house_name (the name of the house) and points (the current point total). The prompt clearly explains this structure and instructs the model to return a well-formed SQL UPDATE query without unnecessary explanations.

Here’s what’s happening step-by-step:

1. Define the Table Schema: We provide the table's structure, making it clear to the model what it looks like. The schema specifies that the table contains house_name and points.

 table_schemas = """
 house_points(house_name TEXT PRIMARY KEY, points INTEGER)
 """

2. Create a Prompt: We generate a prompt that asks Llama 3 to convert the natural language command into an SQL UPDATE query. It explicitly asks for a response in JSON format with only the query, ensuring that the output is clean and usable.

 prompt = f"""
 You are a SQL expert.
 
 Please help to convert the following natural language command into a valid UPDATE SQL query. Your response should ONLY be based on the given context and follow the response guidelines and format instructions.

 ===Tables
 {table_schemas}

 ===Response Guidelines
 1. If the provided context is sufficient, please generate a valid query WITHOUT any explanations for the question.
 2. Please format the query before responding.
 3. Please always respond with a valid well-formed JSON object with the following format
 4. There are only UPDATE queries and points are either added or deducted from a house

 ===Response Format
 {{
 "query": "A valid UPDATE SQL query when context is sufficient.",
 }}

 ===command
 {natural_language_text}
 """

3. Send Request to Llama 3: The text is then sent to the LLM using Ollama API. The model processes the request and returns a JSON object with the SQL query. We parse the model’s JSON response to extract the SQL query. If something goes wrong, such as a failure to parse the response, an error is returned. This ensures robustness in the code.

import ollama
import json

response = ollama.chat(
 model="llama3",
 messages=[{"role": "user", "content": prompt}]
 )
# Directly return the content as it should now be only the SQL query
# Parse the JSON response and return the SQL query if provided
response_content = response['message']['content']
# Directly return the content as it should now be only the SQL query
 # Parse the JSON response and return the SQL query if provided
 response_content = response['message']['content']
 
 try:
 response_json = json.loads(response_content)
 if "query" in response_json:
 return response_json["query"]
 else:
 return f"Error: {response_json.get('explanation', 'No explanation provided.')}"
 except json.JSONDecodeError:
 return "Error: Failed to parse response as JSON."

With this, your “10 points to Gryffindor” becomes an SQL query like:

UPDATE house_points SET points = points + 10 WHERE house_name = 'Gryffindor';

Step 4 — Running the SQL Query

Finally, we take the generated SQL query and execute it on our database to update the house points. But before diving into query execution, let’s make sure the initial setup is in place.

First, you’ll need a table to track the points for each Hogwarts house. Here’s a simple table structure that does the job:

CREATE TABLE house_points (
 house_name VARCHAR(50) PRIMARY KEY,
 points INT
);

Now, populate the table with the initial points for each house. Here’s a quick SQL command to give each house 100 points to start with:

INSERT INTO house_points (house_name, points)
VALUES ('Gryffindor', 100), ('Hufflepuff', 100), ('Ravenclaw', 100), ('Slytherin', 100);

Once your database is ready, you’ll need to establish a connection to run queries. Using SQLAlchemy makes this super easy. Here’s how you set up the connection:

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://db_user:db_password@localhost/db_name')
def run_sql_query(query):
 with engine.connect() as conn:
 conn.execute(text(query))
 conn.commit()

Replace 'db_user', 'db_password', and 'db_name' with your actual PostgreSQL credentials and database name.

This function takes the SQL query produced by our Speech-to-SQL script and executes it on your database. Every time a new voice command updates the points, this function will run the corresponding SQL and commit the changes, ensuring real-time updates to the house points table.

You’ve just updated your Hogwarts house points through voice commands — how’s that for magical?

Bonus: Bringing It All Together in Streamlit

To make your project even more user-friendly, why not wrap the entire solution into a web app? With Streamlit, you can build an interactive interface where users can record their voice commands, see live updates of house points, and view visuals like house logos.

I’ve already included the complete code for this in my Github repo, which you can check out and customize to suit your needs.

Conclusion

With just a few lines of code, you’ve built a voice-powered SQL executor. Perfect for those moments when you’re channeling your inner Professor McGonagall, deducting points with flair. So, whether you’re managing Hogwarts or any dataset, Speech-to-SQL might just be the spell you need to save time.

The full source code and the Jupyter Notebook are available in the GitHub repository. Feel free to reach out if you have ideas for improvements or any other observations.

GitHub Repo

Source: Snape Clap GIF by TaylorLopez on Tenor

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Frequently Used, Contextual References

Resources

Publication

Speech-To-SQL Using OpenAI’s Whisper and Ollama (Llama3)

Author(s): Tapan Babbar

Step 1 — Record Audio

Step 2: Speech-to-Text with Whisper

Step 3: Text-to-SQL using Llama 3

Step 4 — Running the SQL Query

Bonus: Bringing It All Together in Streamlit

Conclusion

Feedback ↓ Cancel reply

Popular posts

Best Laptops for Deep Learning, Machine Learning (ML), and Data Science for 2023

Best Workstations for Deep Learning, Data Science, and Machine Learning (ML) for 2022

Descriptive Statistics for Data-driven Decision Making with Python

Best Machine Learning (ML) Books - Free and Paid - Editorial Recommendations for 2022

Best Data Science Books - Free and Paid - Editorial Recommendations for 2022

Updates

Recent Posts

The Secret to Unlocking Deeper SWOT Analysis with AI (The Code That Started It All — and How I Took It to the Next Level)

Evaluating and Monitoring LLM Agents: Tools, Metrics, and Best Practices

Building Multi-Agent AI Systems From Scratch: OpenAI vs. Ollama

Web-LLM Assistant: Bridging Local AI Models With Real-Time Web Intelligence

ChatGPT Gets Windows App

The World’s Leading AI and Technology Publication.

Company

CONTACT US

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

Speech-To-SQL Using OpenAI’s Whisper and Ollama (Llama3)

Author(s): Tapan Babbar

Step 1 — Record Audio

Step 2: Speech-to-Text with Whisper

Step 3: Text-to-SQL using Llama 3

Step 4 — Running the SQL Query

Bonus: Bringing It All Together in Streamlit

Conclusion

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement