How To Set up a NL2SQL System With Azure OpenAI Studio
Last Updated on November 10, 2023 by Editorial Team
Author(s): Pere Martra
Originally published on Towards AI.
This article is part of a free course about Large Language Models available on GitHub.
In the previous article, we learned how to set up a prompt able to generate SQL commands from the user requests.
Now, we will see how to use Azure OpenAI Studio to create an inference endpoint that we can call to generate SQL commands.
Although if youβre reading these lines, you probably already meet the requirements, Iβll remind you that itβs necessary to have:
- You will need an Azure subscription, which is free and comes with initial free credits if you sign up for the first time. Create an Azure subscription.
- You will also need to get access to OpenAI resources. Currently, Microsoft have reduced the requirements and waiting times. The application form may seem more intimidating than it actually is. I requested access using an email from a registered domain and mentioned that I wanted information for a course. They granted me access almost immediately. Request access to Azure OpenAI Studio.
It is highly likely that the need to request access will disappear soon. So, I would try anyway, even if you havenβt applied for permission.
Setting up Azure OpenAI Studio.
We can access it directly from the Azure portal: https://portal.azure.com/#home
Once inside, in services, we need to select Azure AI Services.
We click on Azure OpenAI Account.
In this screen, we need to choose the Subscription. If you have a student subscription, itβs likely that it wonβt work. Create a different one.
For the resource group, I would create a new one. You can give it any name you like. Itβs just a logical grouping that keeps all the resources under one group. This way, when you delete the group, you remove all the created resources without affecting any other projects or groups. Since this is just a temporary project, itβs a good practice to delete it, at the end, to avoid incurring any costs.
For the Region, I would choose the one closest to your location. Azure will only show you regions where OpenAI Services are available.
For the Name, pick a unique name for your project
For the Pricing Tier, select Standard S0. This is likely the only option available to you.
With this, you can press the Review+Create button, verify the data, and you can start enjoying the service.
The configuration will take a few minutes to complete, and when itβs done, youβll see a screen similar to this one:
Click on Go to Resource.
Now, you can create a new deployment from the Model Deployments section. Just give it a name and choose the model: GPT-3.5-Turbo.
In just a few seconds, Azure OpenAI Studio wthe deployment will be ready.
Jusct clicking on the Deployment name we can start working.
In the PlayGround section, select Chat, and you can start inputting your prompt.
We have created the prompt following the paper βHow to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings.β
create table employees(
ID_Usr INT primary key,
name VARCHAR);
/*3 example rows
select * from employees limit 3;
ID_Usr name
1344 George StPierre
2122 Jon jones
1265 Anderson Silva
*/
create table salary(
ID_Usr INT,
year DATE,
salary FLOAT,
foreign key (ID_Usr) references employees(ID_Usr));
/*3 example rows
select * from salary limit 3
ID_Usr date salary
1344 01/01/2023 61000
1344 01/01/2022 60000
1265 01/01/2023 55000
*/
create table studies(
ID_study INT,
ID_Usr INT,
educational_level INT, /* 5=phd, 4=Master, 3=Bachelor */
Institution VARCHAR,
Years DATE,
Speciality VARCHAR,
primary key (ID_study, ID_Usr),
foreign key(ID_Usr) references employees (ID_Usr));
/*3 example rows
select * from studies limit 3
ID_Study ID_Usr educational_level Institution Years Speciality
2782 1344 3 UC San Diego 01/01/2010 Bachelor of Science in Marketing
2334 1344 5 MIT 01/01/2023 Phd. Data Science.
2782 2122 3 UC San Diego 01/01/2010 Bachelor of Science in Marketing
*/
-Maintain the SQL order simple and efficient as you can, using valid SQL Lite, answer the following questions for the table provided above.
Question: How Many employes we have with a salary bigger than 50000?
SELECT COUNT(*) AS total_employees
FROM employees e
INNER JOIN salary s ON e.ID_Usr = s.ID_Usr
WHERE s.salary > 50000;
Question: Return the names of the three people who have had the highest salary increase in the last three years.
SELECT e.name
FROM employees e
JOIN salary s ON e.ID_usr = s.ID_usr
WHERE s.year >= DATE_SUB(CURDATE(), INTERVAL 3 YEAR)
GROUP BY e.name
ORDER BY (MAX(s.salary) - MIN(s.salary)) DESC
LIMIT 3;
Question: The name of the best paid employee.
This is our complete prompt. We will need to split it into parts to place each section in its corresponding box.
In the System message box, we will put the entire prompt except for the examples:
create table employees(
ID_Usr INT primary key,
name VARCHAR);
/*3 example rows
select * from employees limit 3;
ID_Usr name
1344 George StPierre
2122 Jon jones
1265 Anderson Silva
*/
create table salary(
ID_Usr INT,
year DATE,
salary FLOAT,
foreign key (ID_Usr) references employees(ID_Usr));
/*3 example rows
select * from salary limit 3
ID_Usr date salary
1344 01/01/2023 61000
1344 01/01/2022 60000
1265 01/01/2023 55000
*/
create table studies(
ID_study INT,
ID_Usr INT,
educational_level INT, /* 5=phd, 4=Master, 3=Bachelor */
Institution VARCHAR,
Years DATE,
Speciality VARCHAR,
primary key (ID_study, ID_Usr),
foreign key(ID_Usr) references employees (ID_Usr));
/*3 example rows
select * from studies limit 3
ID_Study ID_Usr educational_level Institution Years Speciality
2782 1344 3 UC San Diego 01/01/2010 Bachelor of Science in Marketing
2334 1344 5 MIT 01/01/2023 Phd. Data Science.
2782 2122 3 UC San Diego 01/01/2010 Bachelor of Science in Marketing
*/
-Maintain the SQL order simple and efficient as you can, using valid SQL Lite, answer the following questions for the table provided above.
In this section, we provided the database structure, some examples of its content, and instructions for the model to generate SQL queries.
Now itβs time to fill the Examples section with the examples from our prompt. In the original prompt, we have two of them:
Question: How Many employes we have with a salary bigger than 50000?
SELECT COUNT(*) AS total_employees
FROM employees e
INNER JOIN salary s ON e.ID_Usr = s.ID_Usr
WHERE s.salary > 50000;
Question: Return the names of the three people who have had the highest salary increase in the last three years.
SELECT e.name
FROM employees e
JOIN salary s ON e.ID_usr = s.ID_usr
WHERE s.year >= DATE_SUB(CURDATE(), INTERVAL 3 YEAR)
GROUP BY e.name
ORDER BY (MAX(s.salary) - MIN(s.salary)) DESC
LIMIT 3;
Question: The name of the best paid employee.
As you can see, we have created one example for each query. This technique is known as few-shot sampling and allows the model to understand how it should respond to user queries.
The optimal number of examples typically ranges from one to six; adding more examples may not have a significant positive impact on the modelβs response.
We can also modify the hyperparameter settings.
For code generation, I recommend using a Temperature of 0. Thereβs no need for the model to be overly imaginative with its responses.
Time to start playing, and testing the prompt.
From here, if we like the responses, we can do two things:
- Export the configuration. It will save a file that we can use to reconfigure the Playground.
- Click on See Code, and it will provide an example code to call our model.
Calling Azure OpenAI Services from a Notebook.
The notebook is available on the Large Language Models Course repository on Github.
Large-Language-Model-Notebooks-Course/P1-NL2SQL/NL2SQL_OpenAI_Azure.ipynb at main Β·β¦
Practical course about Large Language Models. . Contribute to peremartra/Large-Language-Model-Notebooks-Courseβ¦
github.com
Letβs begin by installing and importing the OpenAI library.
#Install openai
!pip install -q "openai<1.0.0"
import os
import openai
Now we need to set up our Azure access.
#Azure configuration
openai.api_type = "azure"
#Here the inference point that you can get from Azure
openai.api_base = "https://openaicourseperemartra.openai.azure.com/"
openai.api_version = "2023-07-01-preview"
openai.api_key = "your-azure-openai-key"
Inform the prompt:
context = [ {'role':'system', 'content':"""
create table employees(
ID_Usr INT primary key,
name VARCHAR
);
/* 3 example rows
select * from employees limit 3;
ID_Usr name
1344 George StPierre
2122 Jon jones
1265 Anderson Silva
*/
create table salary(
ID_Usr INT,
year DATE,
salary FLOAT,
foreign key (ID_Usr) references employees(ID_Usr)
);
/* 3 example rows
select * from salary limit 3
ID_Usr date salary
1344 01/01/2023 61000
1344 01/01/2022 60000
1265 01/01/2023 55000
*/
create table studies(
ID_study INT,
ID_Usr INT,
educational_level INT, /* 5=phd, 4=Master, 3=Bachelor */
Institution VARCHAR,
Years DATE,
Speciality VARCHAR,
primary key (ID_study, ID_Usr),
foreign key(ID_Usr) references employees (ID_Usr)
);
/* 3 example rows
select * from studies limit 3
ID_Study ID_Usr educational_level Institution Years Speciality
2782 1344 3 UC San Diego 01/01/2010 Bachelor of Science in Marketing
2334 1344 5 MIT 01/01/2023 Phd. Data Science.
2782 2122 3 UC San Diego 01/01/2010 Bachelor of Science in Marketing
*/
-Maintain the SQL order simple and efficient as you can, using valid SQL Lite, answer the following questions for the table provided above.
Question: How Many employees do we have with a salary bigger than 50000?
SELECT COUNT(*) AS total_employees
FROM employees e
INNER JOIN salary s ON e.ID_Usr = s.ID_Usr
WHERE s.salary > 50000;
Question: Return the names of the three people who have had the highest salary increase in the last three years.
SELECT e.name
FROM employees e
JOIN salary s ON e.ID_Usr = s.ID_Usr
WHERE s.year >= DATE_SUB(CURDATE(), INTERVAL 3 YEAR)
GROUP BY e.name
ORDER BY (MAX(s.salary) - MIN(s.salary)) DESC
LIMIT 3;
"""} ]
Iβm going to create a functiΓ³n to encapsulate the call to the OpenAI API.
#Functio to call the model.
def return_CCRMSQL(user_message, context):
newcontext = context.copy()
newcontext.append({'role':'user', 'content':"question: " + user_message})
response = openai.ChatCompletion.create(
engine="GPT3NL2SQLSample1Deployment", #Our deployment
messages = newcontext,
temperature=0,
max_tokens=800)
return (response.choices[0].message["content"])
Now we can proceed with the necessary tests and obtain the SQL code to query our database:
context_user = context.copy()
print(return_CCRMSQL("The name of the employee best paid", context_user))
SELECT e.name FROM employees e JOIN salary s ON e.ID_usr = s.ID_usr WHERE s.salary = (SELECT MAX(salary) FROM salary); This query will return the name of the employee with the highest salary.
print(return_CCRMSQL("Return the Institution with a higher average salary", context_user))
SELECT st.Institution, AVG(sa.salary) AS avg_salary FROM studies st JOIN employees e ON st.ID_Usr = e.ID_Usr JOIN salary sa ON e.ID_Usr = sa.ID_Usr GROUP BY st.Institution ORDER BY avg_salary DESC LIMIT 1;
Conclusions.
When youβve determined which prompt to use, configuring Azure OpenAI Services and utilizing the OpenAI models hosted on Azure instead of the OpenAI API is a straightforward task.
The choice between using one or the other depends on your companyβs decisions. The OpenAI models used through Azure Open Services do not reside within OpenAI. In other words, the data you provide never leaves your Azure subscription and will never be used to train future OpenAI models.
From a technical standpoint, the differences are minimal, and complexity is virtually non-existent.
Resources.
The full course about Large Language Models is available at Github. To stay updated on new articles, please consider following the repository or starring it. This way, youβll receive notifications whenever new content is added.
GitHub – peremartra/Large-Language-Model-Notebooks-Course: Practical course about Large Languageβ¦
Practical course about Large Language Models. . Contribute to peremartra/Large-Language-Model-Notebooks-Courseβ¦
github.com
The prompt used in the solutions is based on the paper βHow to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings.β
This article is part of a series where we explore the practical applications of Large Language Models. You can find the rest of the articles in the following list:
Large Language Models Practical Course
View list12 stories
I write about Deep Learning and AI regularly. Consider following me on Medium to get updates about new articles. And, of course, You are welcome to connect with me on LinkedIn, and twitter.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI