Dynamic Speech During Runtime

By Saira Shaik | Posted March 1, 2019

The Speech component is often used to define the text that an Amazon Sumerian Host can speak. It relies on a State Machine behavior to trigger the Host to start speaking. While this is useful in many situations, it has limits. The State Machine is usually triggered by user input, and the text defined in the Speech component is static, predefined by the developer. For a speech to carry even slight variations, an entirely different speech would be required.

Dynamic speech is therefore useful for situations that require changes to a speech. For example, a Host can address the user directly by retrieving their name from a database, or comment on the number of visits to the application. If you are using a Lex bot, dynamic speech allows the Host to address the user if they haven’t taken action after a specified amount of time. Dynamic speech can also be useful for addressing mistakes, errors, or incorrect responses, or for confirming correct input from a user. You might also want the Host to greet the user with speech that’s based on the time of day, like “Good morning”, “Good afternoon”, or “Good evening”. Or you might want the Host speech to be based on the current weather, saying “Sunny day” or “Rainy day”.

This article is the first in a series I am writing. In this exercise, we will start with how to use the SpeechController to create dynamic speech. The SpeechController is a wrapper around the Sumerian Speech component. You can define the text you want the Host to speak and initiate the Host speech based on programming logic, without needing user interaction. The script in this article will also greet a user based on their local time of day.
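To make the time-of-day idea concrete, here is a minimal sketch of how a greeting could be chosen from the user's local clock. It is not the script from this article: the function names greetingForHour and buildGreeting are hypothetical, and the hour boundaries (noon and 6 PM) are assumptions you can adjust.

```javascript
// Hypothetical helper, not part of the Sumerian API.
// Assumed boundaries: before 12:00 is morning, before 18:00 is afternoon.
function greetingForHour(hour) {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError("hour must be an integer from 0 to 23");
  }
  if (hour < 12) return "Good morning";
  if (hour < 18) return "Good afternoon";
  return "Good evening";
}

// Picks the greeting for a given Date, defaulting to the current local time.
function buildGreeting(date = new Date()) {
  return greetingForHour(date.getHours());
}
```

The string returned by buildGreeting() is the kind of text you would hand to the Host as its speech body, in place of text typed into the Speech component ahead of time.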

You’ll learn about:

  • AWS configuration
  • Amazon Cognito identity pool IDs
  • The Speech component
  • How to play dynamic speech during runtime


Before you begin, be sure you complete the following tasks and tutorials:

Step 1: Create a New Scene and Configure It with AWS

In this step, we create an Amazon Cognito identity pool ID (Amazon Cognito ID), create a scene, and add the Amazon Cognito ID to the scene to configure it with AWS. More specifically, we create an AWS CloudFormation stack that contains an Amazon Cognito Federated Identity pool, which carries the permissions the scene needs to use AWS functionality. We also attach the IAM policy required for this exercise.

  1. Create an AWS CloudFormation stack using the AWS Configuration tutorial. After the stack is created, open its Outputs tab and make a note of the Cognito Identity Pool ID. We’ll insert this into the Sumerian scene later.

  2. Staying in the AWS Management Console, open the IAM console.

  3. Choose Roles, and then find the roles that the stack just created.

  4. Open the new unauthenticated role. Choose Attach Policy, and then search for and add the AmazonPollyReadOnlyAccess policy.

  5. Navigate to the Sumerian Dashboard and create a new, empty scene. Name the scene “Dynamic Speech Tutorial”.

  6. Click on the top entity, which carries the scene’s name, Dynamic Speech Tutorial. This is the root entity, located in the Entities panel on the left side of the user interface.

  7. In the Inspector panel on the right, provide your Cognito Identity Pool ID in the AWS configuration component.
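If the scene later fails to reach AWS, one thing worth checking is that the ID was pasted completely. The sketch below assumes the usual shape of a Cognito identity pool ID, a region followed by a colon and a GUID (for example, us-east-1:12345678-1234-1234-1234-123456789abc). The helper name parseIdentityPoolId and the pattern are hypothetical and are not part of Sumerian or the AWS SDK.

```javascript
// Assumed format: "<region>:<GUID>", e.g. "us-east-1:12345678-1234-1234-1234-123456789abc".
const POOL_ID_PATTERN =
  /^([a-z]{2}(?:-[a-z]+)+-\d+):[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns { region, poolId } when the value matches the
// assumed format, or null when it does not.
function parseIdentityPoolId(value) {
  const match = POOL_ID_PATTERN.exec(String(value).trim());
  if (!match) return null;
  return { region: match[1].toLowerCase(), poolId: match[0] };
}
```

A null result only means the value does not match the assumed format. It does not tell you whether the pool exists or whether its roles carry the right policies.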

Step 2: Set Up the Scene

In this step, we set up a basic scene and add a Sumerian Host and an empty entity, which will store our scripts.

  1. At the top of the canvas, choose Import Assets.

  2. Search for “Host”, and select any Host. Then choose Add in the lower-right corner. We will be selecting Cristine.

  3. Navigate to the Assets panel, and under the Host asset pack, find the Host entity (hexagon icon). Drag the entity into your scene. The screen should look like the following image.

  4. At the top of the canvas, choose Create Entity.

  5. Under the Others category, choose Entity.