Python: Text to Speech

Text-to-Speech (TTS) is a form of speech synthesis that converts written text into audible, human-like speech.

There are several speech synthesizers that can be used with Python. In this tutorial, we look at three of them: pyttsx3, Google Text-to-Speech (gTTS) and Amazon Polly.


We first install pip, the package installer for Python.

				
				$ curl -O https://bootstrap.pypa.io/get-pip.py
				$ sudo python3 get-pip.py
				
			

If you have already installed it, upgrade it.

				
				$ sudo pip install --upgrade pip
				
			

We will start with pyttsx3, a Text-to-Speech (TTS) conversion library compatible with both Python 2 and 3. The best thing about pyttsx3 is that it works offline, so it does not need an internet connection to speak. Install it via pip.

				
				$ sudo pip install pyttsx3
				
			

By default, the pyttsx3 library loads the best driver available in an operating system: nsss on Mac, sapi5 on Windows and espeak on Linux and any other platform.

Import the installed pyttsx3 into your program.

Here is the basic program which shows how to use it.

				
				import pyttsx3
				engine = pyttsx3.init()
				engine.say('Hello, World!')
				engine.runAndWait()
				
			
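As described above, pyttsx3 chooses a driver for your operating system on its own, but pyttsx3.init() also accepts a driverName argument if you want to be explicit. The sketch below spells out that mapping; default_driver_name is a hypothetical helper that mirrors the defaults listed earlier, not part of pyttsx3.

```python
import sys


def default_driver_name(platform: str = sys.platform) -> str:
    """Return a pyttsx3 driver name for a sys.platform string.

    Hypothetical helper mirroring the defaults described in the text:
    nsss on macOS, sapi5 on Windows, espeak everywhere else.
    """
    if platform == "darwin":
        return "nsss"
    if platform == "win32":
        return "sapi5"
    return "espeak"


if __name__ == "__main__":
    import pyttsx3  # imported here so the helper above works without pyttsx3

    # Passing driverName explicitly is optional; init() picks a default itself.
    engine = pyttsx3.init(driverName=default_driver_name())
    engine.say("Hello, World!")
    engine.runAndWait()
```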

pyttsx3 Female Voices

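Now let us change the voice in pyttsx3 from male to female. The available voices, and their order, depend on your operating system and the voices installed on it. On the author's system, voices[10] and voices[17] in the engine's voices property were female voices with accents that were easy to make out; check the list on your own machine before relying on an index.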

				
				import pyttsx3
				engine = pyttsx3.init()
				voices = engine.getProperty('voices')
				# Index 17 was a female voice on the author's system; yours may differ
				engine.setProperty('voice', voices[17].id)
				engine.say('Hello, World!')
				engine.runAndWait()
				
			
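Voice indices such as 10 and 17 depend on which voices your system has installed, so the same index can select a different voice, or none at all, on another machine. A sketch of a more portable approach that searches the voice list by name: find_voice is a hypothetical helper, and "zira" is only a placeholder for part of a voice name you saw on your own system.

```python
from typing import Optional, Sequence


def find_voice(voices: Sequence, keyword: str) -> Optional[str]:
    """Return the id of the first voice whose name contains keyword.

    Hypothetical helper: matching is case-insensitive, and None is
    returned when no installed voice matches.
    """
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice.id
    return None


if __name__ == "__main__":
    import pyttsx3

    engine = pyttsx3.init()
    wanted = "zira"  # placeholder: use part of a voice name from your own list
    voice_id = find_voice(engine.getProperty("voices"), wanted)
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    engine.say("Hello, World!")
    engine.runAndWait()
```

If no voice matches, the engine simply keeps its default voice instead of failing with an IndexError.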

You can loop through all the available voices, hear each one and note the index of the voice you want.

				
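				import pyttsx3
				engine = pyttsx3.init()
				voices = engine.getProperty('voices')
				# Print each voice's index and name, and queue a sample sentence in that voice
				for index, voice in enumerate(voices):
				    print(index, voice.name)
				    engine.setProperty('voice', voice.id)
				    engine.say('Here we go round the mulberry bush.')
				# Speak all the queued sentences
				engine.runAndWait()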
				
			
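Besides the voice, the engine exposes rate (speaking speed in words per minute) and volume (a float from 0.0 to 1.0) through the same getProperty and setProperty calls. A minimal sketch, assuming a hypothetical speak helper; the values 150 and 0.9 are arbitrary examples.

```python
def speak(text: str, rate: int = 150, volume: float = 0.9) -> None:
    """Speak text aloud with pyttsx3 at the given rate and volume.

    Hypothetical helper: rate is in words per minute and volume must
    lie between 0.0 and 1.0 inclusive.
    """
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    if rate <= 0:
        raise ValueError("rate must be a positive number of words per minute")

    import pyttsx3  # deferred so bad arguments fail before any audio setup

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)      # words per minute
    engine.setProperty("volume", volume)  # 0.0 (silent) to 1.0 (full)
    engine.say(text)
    engine.runAndWait()


if __name__ == "__main__":
    speak("Here we go round the mulberry bush.")
```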

Google Text to Speech (gTTS)

Google has also developed Google Text-to-Speech, a screen reader application for its Android operating system. It was first released on November 6, 2013.


gTTS is an unofficial Python library and CLI tool for interfacing with the Google Translate text-to-speech API.

We first install gTTS via pip.

				
				sudo pip install gTTS
				
			

gTTS sends text to the Google Translate text-to-speech API and saves the spoken result as an mp3 file, so unlike pyttsx3 it needs an internet connection.

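To play the generated mp3 files from the command line, we install mpg321. The commands below install Homebrew, the macOS package manager, and then mpg321; on Debian or Ubuntu, sudo apt-get install mpg321 should work instead.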

				
				/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
				brew install mpg321
				
			

Using gtts-cli, we convert the text 'Hello, World!' into speech and save it as an mp3 file.

				
				gtts-cli 'Hello, World!' --output hello.mp3
				
			

We now start the interactive Python shell.

				
				$ python3
				
			

You will see the prompt consisting of three greater-than signs (>>>), which is known as the Python REPL prompt.

Import the os module and play the created hello.mp3 file.

				
				>>> import os
				>>> os.system("mpg321 hello.mp3")
				
			
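os.system works, but it runs the command through a shell and only hands back an exit status. A sketch of the same step using subprocess.run instead, which passes the file name straight to the player without shell quoting and raises an error if playback fails; play_mp3 is a hypothetical helper and assumes mpg321 is on your PATH.

```python
import shutil
import subprocess


def play_mp3(path: str, player: str = "mpg321") -> None:
    """Play an mp3 file with a command-line player (mpg321 by default).

    Hypothetical helper: raises FileNotFoundError if the player is not
    installed and CalledProcessError if it exits with an error.
    """
    if shutil.which(player) is None:
        raise FileNotFoundError(f"{player} not found on PATH")
    # The argument list goes directly to the program; no shell is involved.
    subprocess.run([player, path], check=True)


if __name__ == "__main__":
    play_mp3("hello.mp3")
```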

Here is everything put together in a single .py file.

				
				from gtts import gTTS
				import os
				tts = gTTS(text='Hello, World!', lang='en')
				tts.save("hello.mp3")
				os.system("mpg321 hello.mp3")
				
			

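Because "hello.mp3" is a relative path, the file is saved in the current working directory, that is, the directory you ran the program from, which is not necessarily where the .py file is.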

gTTS supports many languages. You can print the full list of supported language codes with gtts-cli --all.

The following command creates an mp3 file that reads the Chinese text "你好" ("hello") at a slower speed.

				
				gtts-cli "你好" -l 'zh-cn' -o hello.mp3 --slow
				
			

The following program creates an mp3 file from the Korean text "안녕하세요" ("hello") and plays it.

				
				from gtts import gTTS
				import os
				tts = gTTS(text='안녕하세요', lang='ko')
				tts.save("hello.mp3")
				os.system("mpg321 hello.mp3")
				
			
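The same pattern extends to several languages at once. A sketch that saves one mp3 per language code: save_greetings and its hello_LANG.mp3 naming scheme are hypothetical, and the tts_factory parameter exists only so the function can be tried without network access (by default it uses gTTS, which needs an internet connection).

```python
from typing import Callable, Dict, List, Optional


def save_greetings(greetings: Dict[str, str],
                   tts_factory: Optional[Callable] = None) -> List[str]:
    """Save each greeting as hello_<lang>.mp3 and return the file names.

    greetings maps a gTTS language code (such as 'ko') to the text to speak.
    """
    if tts_factory is None:
        from gtts import gTTS  # saving contacts Google, so this needs network
        tts_factory = gTTS
    saved = []
    for lang, text in greetings.items():
        filename = f"hello_{lang}.mp3"
        tts_factory(text=text, lang=lang).save(filename)
        saved.append(filename)
    return saved


if __name__ == "__main__":
    print(save_greetings({"en": "Hello, World!", "ko": "안녕하세요"}))
```

Each saved file can then be played with mpg321 exactly as before.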

Amazon Polly

Amazon also has a cloud-based text-to-speech service called Amazon Polly.

If you have an AWS account, you can access and try out the Amazon Polly console here:

https://console.aws.amazon.com/polly/

The interface looks as follows.


The console has a Language and Region dropdown for choosing the language, along with several male and female voices to pick from. Pressing the Listen to speech button reads out the text typed into the text box, and the speech can also be downloaded in several formats: MP3, OGG, PCM and Speech Marks.

To use Polly from a Python program, we need an SDK. The AWS SDK for Python is called Boto3.

We first install it.

				
				pip install boto3				
				
			

To create a boto3 session, we need two more ingredients: an Access Key ID and a Secret Access Key.

Log in to your AWS account and expand the dropdown menu next to your user name at the top right of the page. Then select My Security Credentials from the menu.


A pop-up appears. Click on the Continue to Security Credentials button on the left.


Expand the Access keys tab and click on the Create New Access Key button.


As soon as you click the Create New Access Key button, AWS generates a key pair: an Access Key ID, a 20-character alphanumeric string, and a Secret Access Key, a 40-character string. The Secret Access Key can only be viewed or downloaded at this point, so store it somewhere safe.


Now that we have the two keys, here is a basic Python program that sends a block of text to Polly, saves the returned speech as an mp3 file and plays it with mpg321.

				
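				import boto3
				import os
				# Replace the asterisks with your own keys. Hard-coded keys are easy
				# to leak, so avoid committing them to version control.
				polly_client = boto3.Session(
				    aws_access_key_id='********************',
				    aws_secret_access_key='****************************************',
				    region_name='us-west-2').client('polly')
				# Ask Polly to read the text with the Joanna voice and return mp3 audio
				response = polly_client.synthesize_speech(VoiceId='Joanna',
				                                          OutputFormat='mp3',
				                                          Text='There is a river called the river of no return.')
				# Save the returned audio stream to a file, then play it
				with open('speech.mp3', 'wb') as file:
				    file.write(response['AudioStream'].read())
				os.system("mpg321 speech.mp3")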
				
			

There is another way to configure the Access Key ID and Secret Access Key that keeps them out of your code. You can install awscli, the universal command-line environment for AWS,

				
				pip install awscli
				
			

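and configure them by running the following command, which prompts for the Access Key ID, the Secret Access Key, a default region name and a default output format.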

				
				aws configure
				
			
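Once aws configure has stored the keys (in ~/.aws/credentials by default), boto3 finds them automatically, so they no longer need to appear in the script. A minimal sketch, assuming the us-west-2 region and the Joanna voice from the earlier example; synthesize_to_file is a hypothetical helper, and its optional client parameter is there only so a stand-in object can be passed in for testing.

```python
from typing import Optional


def synthesize_to_file(text: str, path: str, voice_id: str = "Joanna",
                       client: Optional[object] = None) -> int:
    """Ask Amazon Polly to read text aloud and save the mp3 to path.

    Returns the number of bytes written. Without an explicit client, boto3
    loads credentials from its default chain (e.g. ~/.aws/credentials).
    """
    if client is None:
        import boto3
        client = boto3.client("polly", region_name="us-west-2")
    response = client.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id)
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)


if __name__ == "__main__":
    size = synthesize_to_file(
        "There is a river called the river of no return.", "speech.mp3")
    print(f"wrote {size} bytes to speech.mp3")
```

The resulting speech.mp3 can be played with mpg321 as in the previous example.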

Notes

  • The latest documentation on pyttsx3 is available on Read the Docs.
  • The updated documentation on gTTS, including the full gtts-cli options, is also available on Read the Docs.