Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: [email protected]
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

Publication

Speech-To-SQL Using OpenAI’s Whisper and Ollama (Llama3)
Latest   Machine Learning

Speech-To-SQL Using OpenAI’s Whisper and Ollama (Llama3)

Last Updated on October 31, 2024 by Editorial Team

Author(s): Tapan Babbar

Originally published on Towards AI.

Source: 10points Gryffindor Richard Harris GIF by blue_elephant on Tenor

I was rewatching the Harry Potter series and kept wondering: teachers and prefects were deducting points from houses left and right! But do they keep track of these point changes across so many classes? What about data integrity? Scalability? Write conflicts? Surely they’d need something scalable, like a pub-sub system for house point updates. Besides the scale, how well does the speech recognition have to be?

On a slightly serious note, that got me thinking — could we recreate some of it with AI? What if we could go from speech directly to SQL? That’s how I ended up diving into this fun little experiment: Speech-to-SQL using OpenAI’s Whisper for transcription and Meta’s Llama3 for converting text into SQL queries.

Source: Image by the author.

Here’s how I did it, and you can too in just four simple steps:

Step 1 — Record Audio

We kick things off by capturing audio using a simple Python setup. Using the sounddevice library, we record audio directly from your mic, then temporarily save it as a .wav file to transcribe later.

import sounddevice as sd
import tempfile
import wave

# Function to record audio from the microphone and save it as a WAV file
def record_audio(duration, sample_rate=16000):
print("Recording...")
audio_data = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1, dtype='float32')
sd.wait() # Wait for the recording to finish
print("Recording finished.")

# Save the audio to a temporary WAV file
temp_wav = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
with wave.open(temp_wav.name, 'wb') as wf:
wf.setnchannels(1) # Mono channel
wf.setsampwidth(2) # 16-bit audio
wf.setframerate(sample_rate)
wf.writeframes(np.int16(audio_data * 32767)) # Convert float32 to int16

return temp_wav.name

This is where it all starts: a humble voice command ready to transform into SQL!

Step 2: Speech-to-Text with Whisper

Next, we transcribe the audio using OpenAI’s Whisper model. This model is excellent for converting speech into text. It’s almost like having a personal assistant who listens to your commands and writes them down — just more reliable and scalable.

import whisper
import os

# Function to transcribe audio from the microphone using Whisper
def audio_to_text_from_mic(duration=5):
# Record audio from the microphone
audio_file = record_audio(duration)

# Load Whisper model
model = whisper.load_model("turbo") # You can use "turbo", "small", etc.

# Transcribe the recorded audio file
result = model.transcribe(audio_file)

# Delete the temporary audio file after transcription
os.remove(audio_file)

return result['text']

# Example usage
text = audio_to_text_from_mic(duration=3) # Record for 5 seconds
print("Transcription:", text)
Transcription: 10 points to Gryffindor

Your natural language command is now in text form, ready for the next level of transformation.

Step 3: Text-to-SQL using Llama 3

Now for the real magic — turning that transcribed text into SQL commands. Using the Llama 3 model, we feed the natural language command (“Add 10 points to Gryffindor”) and out a valid SQL query.

We start by building a prompt that gives context about the database schema. In our case, the house_points table has two columns: house_name (the name of the house) and points (the current point total). The prompt clearly explains this structure and instructs the model to return a well-formed SQL UPDATE query without unnecessary explanations.

Here’s what’s happening step-by-step:

1. Define the Table Schema: We provide the table's structure, making it clear to the model what it looks like. The schema specifies that the table contains house_name and points.

 table_schemas = """
house_points(house_name TEXT PRIMARY KEY, points INTEGER)
"""

2. Create a Prompt: We generate a prompt that asks Llama 3 to convert the natural language command into an SQL UPDATE query. It explicitly asks for a response in JSON format with only the query, ensuring that the output is clean and usable.

 prompt = f"""
You are a SQL expert.

Please help to convert the following natural language command into a valid UPDATE SQL query. Your response should ONLY be based on the given context and follow the response guidelines and format instructions.

===Tables
{table_schemas}

===Response Guidelines
1. If the provided context is sufficient, please generate a valid query WITHOUT any explanations for the question.
2. Please format the query before responding.
3. Please always respond with a valid well-formed JSON object with the following format
4. There are only UPDATE queries and points are either added or deducted from a house

===Response Format
{{
"query": "A valid UPDATE SQL query when context is sufficient.",
}}

===command
{natural_language_text}
"""

3. Send Request to Llama 3: The text is then sent to the LLM using Ollama API. The model processes the request and returns a JSON object with the SQL query. We parse the model’s JSON response to extract the SQL query. If something goes wrong, such as a failure to parse the response, an error is returned. This ensures robustness in the code.

import ollama
import json

response = ollama.chat(
model="llama3",
messages=[{"role": "user", "content": prompt}]
)
# Directly return the content as it should now be only the SQL query
# Parse the JSON response and return the SQL query if provided
response_content = response['message']['content']
# Directly return the content as it should now be only the SQL query
# Parse the JSON response and return the SQL query if provided
response_content = response['message']['content']

try:
response_json = json.loads(response_content)
if "query" in response_json:
return response_json["query"]
else:
return f"Error: {response_json.get('explanation', 'No explanation provided.')}"
except json.JSONDecodeError:
return "Error: Failed to parse response as JSON."

With this, your “10 points to Gryffindor” becomes an SQL query like:

UPDATE house_points SET points = points + 10 WHERE house_name = 'Gryffindor';

Step 4 — Running the SQL Query

Finally, we take the generated SQL query and execute it on our database to update the house points. But before diving into query execution, let’s make sure the initial setup is in place.

First, you’ll need a table to track the points for each Hogwarts house. Here’s a simple table structure that does the job:

CREATE TABLE house_points (
house_name VARCHAR(50) PRIMARY KEY,
points INT
);

Now, populate the table with the initial points for each house. Here’s a quick SQL command to give each house 100 points to start with:

INSERT INTO house_points (house_name, points)
VALUES ('Gryffindor', 100), ('Hufflepuff', 100), ('Ravenclaw', 100), ('Slytherin', 100);

Once your database is ready, you’ll need to establish a connection to run queries. Using SQLAlchemy makes this super easy. Here’s how you set up the connection:

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://db_user:db_password@localhost/db_name')
def run_sql_query(query):
with engine.connect() as conn:
conn.execute(text(query))
conn.commit()

Replace 'db_user', 'db_password', and 'db_name' with your actual PostgreSQL credentials and database name.

This function takes the SQL query produced by our Speech-to-SQL script and executes it on your database. Every time a new voice command updates the points, this function will run the corresponding SQL and commit the changes, ensuring real-time updates to the house points table.

You’ve just updated your Hogwarts house points through voice commands — how’s that for magical?

Bonus: Bringing It All Together in Streamlit

To make your project even more user-friendly, why not wrap the entire solution into a web app? With Streamlit, you can build an interactive interface where users can record their voice commands, see live updates of house points, and view visuals like house logos.

I’ve already included the complete code for this in my Github repo, which you can check out and customize to suit your needs.

Source: Image by the author.

Conclusion

With just a few lines of code, you’ve built a voice-powered SQL executor. Perfect for those moments when you’re channeling your inner Professor McGonagall, deducting points with flair. So, whether you’re managing Hogwarts or any dataset, Speech-to-SQL might just be the spell you need to save time.

The full source code and the Jupyter Notebook are available in the GitHub repository. Feel free to reach out if you have ideas for improvements or any other observations.

GitHub Repo

Source: Snape Clap GIF by TaylorLopez on Tenor

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓