Introduction to Azure Text-to-Speech API - Voice Synthesis

What is Azure Text-to-Speech API?

Azure Text-to-Speech API is a cloud-based service from Microsoft Azure that converts text into spoken audio. It lets you integrate natural-sounding voice synthesis into your applications for scenarios such as accessibility features and voice assistants.

Getting Started

To use the Azure Text-to-Speech API, you'll need an Azure account and an API key. Here are the basic steps to get started:

Sign in to your Azure Portal.
Create a new Speech resource (the Azure resource type that provides Text-to-Speech).
Retrieve your API key and region (or endpoint) from the resource's Keys and Endpoint page.
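
Once you have the key and region, it helps to keep them out of the source code. The sketch below is one possible way to do that, not an official pattern: it reads both values from environment variables and builds the regional REST endpoint. The variable names AZURE_SPEECH_KEY and AZURE_SPEECH_REGION and the helper names are assumptions made for this example; the URL pattern assumes the standard public-cloud regional endpoint.

```python
import os
from typing import Mapping, Tuple


def build_tts_endpoint(region: str) -> str:
    """Return the regional Text-to-Speech REST endpoint (public-cloud URL pattern assumed)."""
    region = region.strip().lower()
    if not region:
        raise ValueError("region must not be empty")
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


def load_speech_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the key and region from the environment and return (key, endpoint).

    AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are hypothetical names chosen for this sketch.
    """
    key = env.get("AZURE_SPEECH_KEY", "")
    region = env.get("AZURE_SPEECH_REGION", "")
    if not key or not region:
        raise RuntimeError("Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION first")
    return key, build_tts_endpoint(region)
```

With both variables set, subscription_key, endpoint = load_speech_config() yields the two values the sample code expects.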

Sample Code

Here's a simple example of how to use the Azure Text-to-Speech API in Python:

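import requests
from xml.sax.saxutils import escape

# Key from the Azure Portal; in real code, load it from an environment variable.
subscription_key = 'YOUR_SUBSCRIPTION_KEY'
# Regional REST endpoint, e.g. https://<region>.tts.speech.microsoft.com/cognitiveservices/v1
endpoint = 'YOUR_ENDPOINT'
text_to_speak = 'Hello, this is a sample text to be synthesized.'

headers = {
    'Content-Type': 'application/ssml+xml',
    'X-Microsoft-OutputFormat': 'audio-16khz-128kbitrate-mono-mp3',
    # The key goes in this header; 'Authorization: Bearer' expects an access token, not the key.
    'Ocp-Apim-Subscription-Key': subscription_key,
}

# Escape the text so characters such as & and < do not break the SSML document.
# en-US-GuyNeural is a neural voice; older standard voices such as
# en-US-Guy24kRUS may no longer be available on your resource.
ssml = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    f'<voice name="en-US-GuyNeural">{escape(text_to_speak)}</voice>'
    '</speak>'
)

# Send the SSML as UTF-8 bytes and fail fast if the service does not answer.
response = requests.post(endpoint, headers=headers, data=ssml.encode('utf-8'), timeout=30)
if response.status_code == 200:
    # The response body is the raw MP3 audio.
    with open('output.mp3', 'wb') as audio_file:
        audio_file.write(response.content)
    print('Audio file created.')
else:
    print('Error:', response.status_code, response.text)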
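
The service also accepts an Authorization: Bearer header, but its value must be a short-lived access token rather than the subscription key itself. The following is a minimal sketch of that exchange, assuming the regional issueToken endpoint of the public cloud. The helper names (token_url, build_token_request, fetch_token, bearer_headers) are made up for this example, and it uses the standard library's urllib so the block has no extra dependencies.

```python
import urllib.request
from typing import Dict


def token_url(region: str) -> str:
    """Return the token-issuing endpoint for a region (public-cloud URL pattern assumed)."""
    return f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"


def build_token_request(subscription_key: str, region: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that trades the key for a token."""
    return urllib.request.Request(
        token_url(region),
        data=b"",  # empty body; the key travels in the header
        method="POST",
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )


def fetch_token(subscription_key: str, region: str, timeout: float = 10.0) -> str:
    """Send the request and return the access token, which arrives as plain text."""
    request = build_token_request(subscription_key, region)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def bearer_headers(token: str) -> Dict[str, str]:
    """Build synthesis request headers that authenticate with a token instead of the key."""
    return {
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
        "Authorization": "Bearer " + token,
    }
```

A caller would run token = fetch_token(key, region) once, pass bearer_headers(token) to the synthesis request, and refresh the token periodically, since tokens expire after a short time.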

Conclusion

The Azure Text-to-Speech API provides cloud-based voice synthesis that you can call from any application able to send an HTTP request. With an Azure account, an API key, and a short script like the one above, you can add natural-sounding speech to your projects.