How to Perform Speech Synthesis in Python
Last Updated on July 24, 2023 by Editorial Team
Author(s): Tommaso De Ponti
Originally published on Towards AI.
An introduction to Text-To-Speech (TTS) in Python for useful tasks
Text-to-speech (TTS) is the use of software to create audio output in the form of a spoken voice. The program that converts the text on a page into spoken audio is usually called a text-to-speech engine. TTS engines are needed, for example, to produce spoken output from machine translation results.
TTS software is widely used by major companies such as Google, Apple, Microsoft, and Amazon: Google developed Google Assistant, Apple developed Siri, Microsoft developed Cortana, and Amazon developed Alexa. Among the many machine learning techniques and algorithms these advanced assistants rely on is text-to-speech.
When you ask Siri something, it uses machine learning to produce an answer and then uses TTS to speak that answer aloud.
Today we will not build a voice assistant. Instead, we will first introduce TTS in Python and then show how to create a program that reads a text file aloud.
Finally, we'll embed our basic Text-To-Speech functionality in a GUI made with Kivy.
Pyttsx3
To perform speech synthesis, we will use the pyttsx3 Python package. To install it with pip, open a terminal and type:
pip install pyttsx3
With this package installed, we can start performing Text-To-Speech.
First Speech Synthesis Program
In this section, we will create a very simple TTS script that performs Text-To-Speech on a given input:
Here we import pyttsx3 (line 1) and create the engine object using the pyttsx3 module (line 2). On lines 5 and 6, we perform speech synthesis on the string.
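The original script is not reproduced here, so the following is a minimal sketch of what it might look like, assuming pyttsx3's standard `init()`, `say()` and `runAndWait()` calls; its line numbers do not match the ones cited above. The sample sentence is made up, and the import sits inside a hypothetical `main()` only so the file can be loaded without pyttsx3 installed.

```python
def main():
    # Imported inside main() only so this file can be loaded without pyttsx3;
    # a plain top-level `import pyttsx3` works just as well in a real script.
    import pyttsx3

    engine = pyttsx3.init()  # create the engine with the platform's default driver
    engine.say("Hello! This is my first speech synthesis program.")  # queue the text
    engine.runAndWait()  # block until the queued text has been spoken


if __name__ == "__main__":
    main()
```

Note that `say()` only queues the text; nothing is spoken until `runAndWait()` processes the queue.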
Pretty simple! Now, before building another TTS program, let's make some improvements to this code.
Here we define a function so we can use TTS whenever we want; in this case, we call it inside a loop, so the program performs speech synthesis on each text we give it. Cool, but still not very useful.
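A sketch of the improved version, assuming a reusable `say()` helper (the name the rest of this article uses) and a loop that reads text from the keyboard. The `get_engine()` helper and the `read` parameter are hypothetical additions: the first creates the engine only once, the second makes the loop easy to test.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_engine():
    """Create the pyttsx3 engine on first use and reuse it afterwards."""
    import pyttsx3  # imported lazily, only when speech is first needed

    return pyttsx3.init()


def say(text):
    """Speak `text` aloud and wait until it has been spoken."""
    engine = get_engine()
    engine.say(text)
    engine.runAndWait()


def main(read=input):
    # Read texts one at a time; an empty line ends the loop
    while True:
        text = read("Text to speak (empty line to quit): ")
        if not text:
            break
        say(text)


if __name__ == "__main__":
    main()
```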
Let's see how we can apply speech synthesis to some useful tasks.
Useful App
Have you ever had to read a long, boring document by the next day? Well, I have.
We are programmers: if there is a problem, we solve it!
The first thing that came to mind after I learned to use TTS was that this problem could be solved. Listening can be easier and more relaxing than reading, especially when the text is boring or long.
So, for a given text file, I chose to use TTS to listen to it instead of reading it:
In this simple script, we read the content of the example.txt file and then use the say function we defined earlier to apply Text-To-Speech to it.
Important
When you use this script, make sure to replace example.txt (line 3) with the file you want read aloud.
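A minimal sketch of such a file reader, assuming the file is UTF-8 encoded. `read_aloud()` and its optional `engine` parameter are hypothetical names, not the author's code; the engine argument exists only so a stand-in can be passed during testing.

```python
from pathlib import Path


def read_aloud(path, engine=None):
    """Read the whole text file at `path` aloud and return its content."""
    text = Path(path).read_text(encoding="utf-8")  # assumes a UTF-8 encoded file
    if engine is None:
        import pyttsx3  # imported lazily so tests can pass a stand-in engine

        engine = pyttsx3.init()
    engine.say(text)  # queue the entire document as one utterance
    engine.runAndWait()  # returns only after the whole file has been read
    return text


if __name__ == "__main__":
    read_aloud("example.txt")  # replace with the file you want read aloud
```

Because the whole file is queued at once, `runAndWait()` returns only after the entire document has been spoken.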
Embed basic TTS functionality in a GUI
This step is very important: knowing how to embed your program's functionality in a GUI, even a simple one, can really make a difference. As I said in the introduction, we will use Kivy, an open-source Python library that lets us quickly develop GUI apps with innovative user interfaces. More about Kivy can be found in the official Kivy documentation.
First, let's create a hello-world app with it:
As you can see, the code is really simple:
Lines 1-2: imported kivy.app and Kivy's Button widget
Lines 4-6: created the TestApp class, which returns a GUI with a hello-world button
Line 8: ran the app
Running this simple script opens a window containing a single hello-world button.
Embedding
Now we can embed our TTS functionality in a GUI: we want a GUI app that performs speech synthesis on a given text.
In this simple 30-line script, we:
- Imported the needed modules (lines 1-5)
- Pasted the say() function we created in the previous steps (lines 8-13)
- Built our GUI:
  - Created a layout using Kivy's BoxLayout (line 18)
  - Generated a TextInput object, where we will enter our text (line 19)
  - Generated a Button that, when pressed, calls the function defined on line 25 (line 20)
  - Added the generated objects to the layout (lines 21-22)
  - Returned the layout with the TextInput and Button objects (line 23)
  - Defined the self.perform() method, which runs when the button is pressed: it grabs the text of our TextInput object and performs TTS on it using the say() function defined on line 8 (lines 25-27)
- Executed the GUI app
After executing this code, you should see a window containing a text input box and a Perform Speech Synthesis button.
Enter your text there and click the Perform Speech Synthesis button, and the app will speak the given text aloud.
Conclusion
Today we have seen how to perform speech synthesis in Python, and we implemented Text-To-Speech in a useful app that reads documents aloud. TTS applications have grown significantly in recent years, and learning how to build this type of app is a good way to improve your programming skills. Knowing how to implement speech synthesis is also useful in everyday code: for example, you can use TTS while testing your code to get a spoken notification of what is happening during execution.
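As an illustration of the notification idea, here is a hypothetical `spoken_status()` context manager (not from the original article) that announces aloud when a block of code starts, finishes, or fails. The `speak` parameter defaults to a pyttsx3-based helper, `speak_aloud()`, and can be swapped for any function that takes a string.

```python
from contextlib import contextmanager


def speak_aloud(text):
    """Speak `text` with pyttsx3 and wait until it has been spoken."""
    import pyttsx3  # imported lazily so another `speak` callable can be used instead

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


@contextmanager
def spoken_status(task_name, speak=speak_aloud):
    """Announce aloud when a block of code starts, finishes, or fails."""
    speak(f"Starting {task_name}")
    try:
        yield
    except Exception:
        speak(f"{task_name} failed")
        raise  # re-raise so the error is not swallowed
    else:
        speak(f"{task_name} finished")


if __name__ == "__main__":
    with spoken_status("the long computation"):
        total = sum(i * i for i in range(5_000_000))
    print(total)
```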
Published via Towards AI