Beginner Tutorial

Text-to-Speech with the Speech Component

15 minutes

Posted on: November 21, 2017



Giving a character or entity the ability to speak can bring a scene to life, but manually recording audio clips slows down your workflow. In this tutorial, we’ll show you how to use text files and the State Machine to make an entity speak with the Speech component. Behind the scenes, the Speech component uses the Amazon Polly API to synthesize speech from the text you provide.

You’ll learn about:

  • AWS Configuration
  • Cognito Identity Pool ID
  • Speech Component
  • Hosts
  • Physics
  • State Machine


Before you begin, be sure you complete the following tasks and tutorials:

Step 1: AWS Prerequisites

Create an Amazon Cognito identity pool with an unauthenticated role that has the AmazonPollyReadOnlyAccess IAM policy attached. To learn more, see the AWS Configuration tutorial.
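
For reference, the role described above has two parts: a trust policy that lets unauthenticated identities from your pool assume the role, and the AmazonPollyReadOnlyAccess managed policy attached to it. The following Python sketch builds such a trust policy document. It assumes the standard Amazon Cognito trust-policy layout; the function name and the placeholder pool ID are illustrative only, and the AWS Configuration tutorial remains the authoritative walkthrough.

```python
import json

# Managed policy named in this step (standard AWS-managed policy ARN format).
POLLY_READ_ONLY_ARN = "arn:aws:iam::aws:policy/AmazonPollyReadOnlyAccess"


def unauth_trust_policy(identity_pool_id: str) -> dict:
    """Build a trust policy for the unauthenticated role of one identity pool.

    Hypothetical helper for illustration; it only builds the JSON document
    and does not call AWS.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": "cognito-identity.amazonaws.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Only identities issued by this pool may assume the role.
                    "StringEquals": {
                        "cognito-identity.amazonaws.com:aud": identity_pool_id
                    },
                    # Restrict the role to unauthenticated (guest) identities.
                    "ForAnyValue:StringLike": {
                        "cognito-identity.amazonaws.com:amr": "unauthenticated"
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder pool ID, not a real one.
    policy = unauth_trust_policy("us-east-1:00000000-0000-0000-0000-000000000000")
    print(json.dumps(policy, indent=2))
```

Because the role is available to anyone who loads the scene, keeping its permissions limited to the read-only Polly policy is the safer choice.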

Step 2: Create a New Scene

From the Dashboard, create a new empty scene. Name the scene “Speech Component Tutorial”.

Step 3: Configure the New Scene for AWS

For the scene to be able to connect with the Amazon Polly APIs, you need to configure it with the previously created Amazon Cognito identity pool.

  1. Click the scene’s name, Speech Component Tutorial. This is the root entity, located in the Entities panel on the left side of the interface.
  2. In the Inspector panel on the right, provide your Amazon Cognito identity pool ID in the AWS configuration panel.
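
A common mistake at this step is pasting the wrong kind of ID, such as a user pool ID. The Python sketch below checks that a string has the usual identity pool ID shape, a region followed by a colon and a UUID. The pattern and the function name are assumptions made for illustration, not an official validation rule.

```python
import re

# Assumed shape: "<region>:<UUID>", for example
# "us-east-1:12345678-1234-1234-1234-123456789abc".
_POOL_ID = re.compile(
    r"[a-z]{2}(?:-[a-z]+)+-\d:"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def looks_like_identity_pool_id(value: str) -> bool:
    """Return True if value has the region:UUID shape of an identity pool ID."""
    return _POOL_ID.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    print(looks_like_identity_pool_id("us-east-1:12345678-1234-1234-1234-123456789abc"))
    print(looks_like_identity_pool_id("us-east-1_AbCdEfGhI"))  # user pool ID shape
```

A passing check only means the string is well formed; it does not confirm that the pool exists or that its role has the right policy.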

Step 4: Create an Entity

  1. Choose Create Entity at the top of the canvas.
  2. Choose Box under the 3D Primitives category.

Step 5: Add the Speech Component

  1. Select the new Box entity. You can select an entity by clicking it in the Entities panel or by clicking it on the canvas.
  2. Navigate to the Inspector panel, and then click Add Component. Choose the Speech component.

Step 6: Add a Speech

You can add a speech to the Speech component in two ways:

  1. Create a new speech file and enter the content in the Sumerian script editor.
  2. Upload a plain text or SSML file.

We’ll go through both options in this tutorial.

Option 1: Create a New Speech File

  1. In the Inspector panel on the right, locate the Speech component.
  2. Select Amy (or your voice of choice) in the Voice drop-down list. This tells the Amazon Polly API which voice and language to use when synthesizing speech. To learn more, check out the full list of voices and their associated languages.
  3. Below Voice and Volume, click the + button to create a new speech file. This opens the Text Editor in a new browser tab or window.

  4. Type the text you want your Box entity to speak between the <speak> and </speak> tags. (The Amazon Polly API limits this text to 1,500 characters, excluding the XML-like tags.)
  5. Choose Save, and then close the text editor tab or window.
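
If you prepare speech text outside the editor, it can help to check it against the character limit before pasting it in. The following Python sketch wraps plain text in <speak> tags and counts the characters outside of tags. The helper names are hypothetical, and the count is only an approximation of how the limit is applied (for example, an escaped "&amp;" is counted as five characters here).

```python
import re
from xml.sax.saxutils import escape

MAX_SPOKEN_CHARS = 1500  # limit quoted in this tutorial

_TAG = re.compile(r"<[^>]+>")


def to_ssml(text: str) -> str:
    """Wrap plain text in <speak> tags, escaping &, < and >."""
    return "<speak>" + escape(text) + "</speak>"


def spoken_length(ssml: str) -> int:
    """Count the characters that remain after removing all tags."""
    return len(_TAG.sub("", ssml))


def within_limit(ssml: str, limit: int = MAX_SPOKEN_CHARS) -> bool:
    """Return True if the text outside tags fits within the limit."""
    return spoken_length(ssml) <= limit


if __name__ == "__main__":
    speech = to_ssml("Hello & welcome to the scene.")
    print(speech)
    print(spoken_length(speech), within_limit(speech))
```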

Option 2: Upload a Speech File

  1. Download the following text file to your computer: welcome.txt.
  2. Drag the downloaded welcome.txt file to the Speech component’s Drop Speech File input.

Step 7: Add Speech Marks to a Speech File

If you look at the Speech component, you will see two different speech files.

You can use the Speech component to automatically add speech marks to a plain text file, which converts it to Amazon Polly-compatible SSML. During scene playback, the Speech component can then trigger events based on certain XML tags within the speech file. You can use these events to drive lip-syncing or other animations. We won’t go into the details of implementing those animations in this tutorial, but we will convert the speech file we uploaded to SSML.

  1. In the Speech panel, locate the second speech file (welcome.txt).
  2. Next to the speech name, you should see four icons (from left to right):
    • Play speech file
    • Edit speech file
    • Autogenerate speech marks
    • Remove speech file
  3. Choose the third icon to add speech marks to your speech file.
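
To get a feel for what this conversion produces, here is a rough Python illustration. This tutorial does not describe how Sumerian chooses where to place marks, so the sketch simply assumes one mark before each sentence; the function name and the "sentenceN" mark names are hypothetical.

```python
import re
from xml.sax.saxutils import escape


def add_sentence_marks(text: str) -> str:
    """Wrap text in <speak> tags with one <mark/> before each sentence.

    Illustrative only: this is not Sumerian's actual mark-placement logic.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    parts = [
        f'<mark name="sentence{i}"/>{escape(s)}'
        for i, s in enumerate(sentences, start=1)
    ]
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    print(add_sentence_marks("Hello there. Welcome!"))
```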

Step 8: Edit a Speech File

Now that we’ve added speech marks to our file, let’s open the modified file in the editor.

  1. Click the edit button (pencil icon) to open the speech file in the editor.

    You should now see the content wrapped in <speak>...</speak> tags, with a few <mark name="..."/> tags throughout the content.

  2. Close the editor.
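
If you keep a copy of the converted file, you can also list its marks outside the editor. The following Python sketch parses the SSML with the standard library and returns the mark names in document order. It assumes the file is well-formed XML with a plain <speak> root and no XML namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def list_marks(ssml: str) -> List[str]:
    """Return the name attribute of every <mark/> tag, in document order."""
    root = ET.fromstring(ssml)
    if root.tag != "speak":
        raise ValueError("expected a <speak> root element")
    return [mark.get("name", "") for mark in root.iter("mark")]


if __name__ == "__main__":
    sample = '<speak><mark name="a"/>Hi <mark name="b"/>there</speak>'
    print(list_marks(sample))
```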

Step 9: Play a Speech File

Before we play the speech files in the scene, let’s preview the output of a speech file.

In the Speech component, click the play button (not the scene play button!) that’s next to one of your speech files.

You should begin to hear the speech file read back to you. Depending on the length of the text, this can take a few seconds. If there are speech marks in the text, you might hear some emphasis added to certain words, or even pauses in the audio.
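
For context, the preview corresponds roughly to a single Amazon Polly SynthesizeSpeech request. The Python sketch below builds the parameters for such a request using the parameter names from the AWS SDK for Python (boto3). The tutorial does not show how Sumerian itself issues the request, so treat this as an approximation; the helper name is hypothetical, and the actual call is left in comments because it needs boto3 and AWS credentials.

```python
def build_polly_request(ssml: str, voice_id: str = "Amy") -> dict:
    """Build keyword arguments for a Polly SynthesizeSpeech request."""
    return {
        "Text": ssml,
        "TextType": "ssml",     # the text contains <speak> markup
        "VoiceId": voice_id,    # voice chosen in the Speech component
        "OutputFormat": "mp3",  # audio output
    }


if __name__ == "__main__":
    request = build_polly_request("<speak>Welcome to the scene.</speak>")
    print(request)
    # With boto3 installed and credentials configured, the request could be sent as:
    #   import boto3
    #   response = boto3.client("polly").synthesize_speech(**request)
    #   audio_bytes = response["AudioStream"].read()
```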

Step 10: Play the Speech in the Scene by Using the State Machine

Now that we have two speech files loaded (Speech and welcome.txt), let’s test them out.

Setting up the State Machine

If you haven’t already, go through the State Machine Basics tutorial to learn how the State Machine works.

  1. With the Box entity still selected, choose Add Component, and then State Machine.
  2. In the State Machine component, click the + button to add a new behavior.

  3. Click the Edit button (pencil icon) next to the new behavior to open it for editing.

Adding State and Actions

The State Machine should include one state by default.