Beginner Tutorial

Text-to-Speech with the Speech Component

15 minutes

Posted on: November 21, 2017

Giving a character or entity the ability to speak can bring a scene to life, but manually recording audio clips slows down your workflow. In this tutorial, we’ll show you how to use text files and the State Machine to give an entity the ability to speak with the Speech component. Behind the scenes, the Speech component uses the Amazon Polly API to synthesize speech from the text you provide.

You’ll learn about:

  • AWS Configuration
  • Cognito Identity Pool ID
  • Speech Component
  • Hosts
  • Physics
  • State Machine


Before you begin, be sure you complete the following tasks and tutorials:

Step 1: AWS Prerequisites

Create an Amazon Cognito identity pool with an unauthenticated role that has the AmazonPollyReadOnlyAccess IAM policy attached. To learn more, see the AWS Configuration tutorial.

Step 2: Create a New Scene

From the Dashboard, create a new empty scene. Name the scene “Speech Component Tutorial”.

Step 3: Configure the New Scene for AWS

To let the scene connect to the Amazon Polly API, configure it with the Amazon Cognito identity pool you created in Step 1.

  1. Click the scene’s name, Speech Component Tutorial. This is the root entity, located in the Entities panel on the left side of the interface.
  2. In the Inspector panel on the right, provide your Amazon Cognito identity pool ID in the AWS configuration panel.
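A common mistake at this step is pasting a user pool ID or a role ARN instead of the identity pool ID. The sketch below is a quick offline sanity check, assuming the usual `region:UUID` shape of an identity pool ID. The function name and the sample IDs are hypothetical and not part of Sumerian or any AWS SDK, and passing the check does not prove that the pool exists.

```python
import re

# Assumed shape of an identity pool ID: "<aws-region>:<UUID>",
# for example "us-east-1:12345678-1234-1234-1234-123456789abc".
_POOL_ID_PATTERN = re.compile(
    r"[a-z]{2}(?:-[a-z]+)+-\d:"          # region, e.g. us-east-1
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"  # UUID
)


def looks_like_identity_pool_id(value: str) -> bool:
    """Return True if the value has the region:UUID shape (hypothetical helper)."""
    return _POOL_ID_PATTERN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    print(looks_like_identity_pool_id("us-east-1:12345678-1234-1234-1234-123456789abc"))
    print(looks_like_identity_pool_id("us-east-1_AbCdEfGhI"))
```

The second sample has the shape of a user pool ID rather than an identity pool ID, so the check rejects it.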

Step 4: Create an Entity

  1. Choose Create Entity at the top of the canvas.
  2. Choose Box under the 3D Primitives category.

Step 5: Add the Speech Component

  1. Select the new Box entity. You can select an entity by clicking it in the Entities panel or by clicking it on the canvas.
  2. Navigate to the Inspector panel, and then click Add Component. Choose the Speech component.

Step 6: Add a Speech

You can add a speech to the Speech component in two ways:

  1. Create a new speech file and enter the content in the Sumerian Text Editor.
  2. Upload a plain text or SSML file.

We’ll go through both options in this tutorial.

Option 1: Create a New Speech File

  1. In the Inspector panel on the right, locate the Speech component.
  2. Select Amy (or your voice of choice) in the Voice drop-down list. This tells the Amazon Polly API which voice and language to use when synthesizing speech. To learn more, check out the full list of voices and their associated languages.
  3. Below Voice and Volume, click the + button to create a new speech file. This opens the Text Editor in a new browser tab or window.

  4. Type the text you want your Box entity to speak between the <speak> and </speak> tags. (The Amazon Polly API limits this text to 1,500 characters, excluding the XML-like tags.)
  5. Choose Save, and then close the text editor tab or window.
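If you prepare speech text outside the editor, you can check it against the character limit from step 4 before pasting it in. This is a minimal sketch that uses the 1,500-character figure quoted in this tutorial. The helper names are hypothetical, and the regular expression only approximates how the Amazon Polly API counts characters.

```python
import re
from xml.sax.saxutils import escape

# Limit quoted in this tutorial; verify against the current Amazon Polly limits.
CHAR_LIMIT = 1500


def wrap_in_speak(text: str) -> str:
    """Wrap plain text in <speak> tags, escaping &, < and > so the result is valid XML."""
    return f"<speak>{escape(text)}</speak>"


def spoken_length(ssml: str) -> int:
    """Count characters outside of tags (an approximation; entities count as written)."""
    return len(re.sub(r"<[^>]+>", "", ssml))


def fits_limit(ssml: str) -> bool:
    """Return True if the text between tags is within CHAR_LIMIT characters."""
    return spoken_length(ssml) <= CHAR_LIMIT


if __name__ == "__main__":
    speech = wrap_in_speak("Hello, world!")
    print(speech, spoken_length(speech), fits_limit(speech))
```

Because escaped entities such as `&amp;` are counted as written, the check is slightly conservative for text that contains reserved characters.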

Option 2: Upload a Speech File

  1. Download the following text file to your computer: welcome.txt.
  2. Drag the downloaded welcome.txt file to the Speech component’s Drop Speech File input. You can

Step 7: Add Speech Marks to a Speech File

If you look at the Speech component, you will see two different speech files.

You can use the Speech component to automatically add speech marks to your plain text file, converting it to Amazon Polly-compatible SSML. This enables the Speech component to trigger events during scene playback based on specific XML tags within the speech file. You can use these events to drive lip-syncing or other animations. Although we won’t go into the details of implementing those animations in this tutorial, we will convert the speech file we uploaded to SSML.

  1. Under the Speech component, choose the + button to add a Gesture Map.

    This opens the DefaultGestureMap in the Sumerian Text Editor. A Gesture Map is required for auto-generating speech marks in the following steps. Gestures themselves are covered in the Using a Host and Speech Components tutorial.

  2. Return to the Sumerian editor.

  3. Under the Speech component, locate the second speech file (welcome.txt).

  4. Next to the speech name, you should see four icons (from left to right):
    • Play speech file
    • Edit speech file
    • Autogenerate gesture marks
    • Remove speech file
  5. Choose the third icon to add Gesture marks. Though we will not be covering Gestures in this tutorial, this action will add the necessary SSML speech marks to get us started.
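To see what the conversion added, you can open the speech file and look for SSML `<mark>` tags. The sketch below lists the mark names in an SSML string using only the Python standard library. The sample text and the mark name `gesture:wave` are made up for illustration, and the marks Sumerian generates for your file may be named differently.

```python
import xml.etree.ElementTree as ET


def list_marks(ssml: str) -> list:
    """Return the name attribute of every <mark> element, in document order."""
    root = ET.fromstring(ssml)
    return [mark.get("name", "") for mark in root.iter("mark")]


if __name__ == "__main__":
    # Hypothetical SSML; the mark name is an assumption, not generated output.
    sample = '<speak>Hello <mark name="gesture:wave"/> and welcome.</speak>'
    print(list_marks(sample))
```

A plain text file that has not been converted yet contains no `<mark>` elements, so once it is wrapped in `<speak>` tags the function returns an empty list for it.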