Virtual Concierge Starter Pack

By Aiko Nakano | Posted July 27, 2018



In this tutorial you will learn about:

Face detection
Dialogue Component
Amazon Lex
Speech Component
Amazon Polly
Web Audio
Amazon Rekognition
Amazon DynamoDB
Amazon S3

Our customers at the AWS Loft event in New York and in our Slack channel have been asking us to make the Amazon Sumerian Concierge Experience scene available so they can quickly start creating similar experiences. Now we’re making this code available. It’s a simplified version of the demo in which our Host introduces users to the Amazon Sumerian team and shows a floor plan of the team’s space. The Host can respond to users’ movements in front of the screen and change her greeting based on emotion analysis from Amazon Rekognition. For more details on the user experience, see Amazon Sumerian Concierge Experience.

Minimum Hardware Requirements

Similar to the demo, the minimum hardware requirement for this experience is a laptop with a webcam, microphone, and speaker. For a kiosk installation, we additionally recommend a touch screen and an external webcam. We go into detail about the hardware setup in the Hardware Setup section below.


Before you begin, you should have completed the following tasks and tutorials:

Technologies Used in the Scene

As we mentioned previously, this is a modified scene from the Amazon Sumerian Concierge Experience demo. Similar to the demo, this scene uses the Amazon Sumerian Speech component for text-to-speech, and Amazon Lex to filter a user’s speech into different request types. The Amazon Sumerian Host component’s Point of Interest system is coupled with computer vision so that our Host can maintain eye contact with the user and turn her body toward the user to increase user engagement.
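As a rough sketch of how the user’s transcribed speech can be routed to a Lex bot for classification, the snippet below uses the AWS SDK for JavaScript’s `LexRuntime.postText` operation. The bot name, alias, and user ID are placeholders, not the starter pack’s actual configuration; see Lex.js in the scene for the real implementation.

```javascript
// Sketch: classify a user's utterance with an Amazon Lex bot.
// Assumes the AWS SDK namespace is available, as it is in a Sumerian
// scene with credentials configured. All names below are placeholders.
function buildLexParams(inputText) {
  return {
    botName: 'ConciergeBot',     // placeholder bot name
    botAlias: '$LATEST',         // placeholder alias
    userId: 'concierge-visitor', // any stable per-session identifier
    inputText: inputText,
  };
}

function classifyRequest(inputText, callback) {
  const lexRuntime = new AWS.LexRuntime();
  lexRuntime.postText(buildLexParams(inputText), (err, data) => {
    if (err) return callback(err);
    // data.intentName is the request type Lex matched,
    // e.g. a greeting intent vs. a show-me-the-map intent.
    callback(null, data.intentName);
  });
}
```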

One difference between this starter pack and the Amazon Sumerian Concierge Experience demo is that here we use the JavaScript computer vision library JSFeat for face detection, and Amazon Rekognition for emotion analysis. See the Amazon Rekognition documentation for information about implementing facial recognition.
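For the emotion analysis, webcam frames go to Rekognition’s DetectFaces operation. Below is a hedged sketch (not the starter pack’s exact code) of building the request and picking the highest-confidence emotion from the response; note that `Attributes: ['ALL']` is what makes Rekognition include the `Emotions` field.

```javascript
// Sketch: emotion analysis with Amazon Rekognition DetectFaces.
// Assumes the AWS SDK namespace available in a Sumerian scene with
// credentials configured. imageBytes would come from a webcam frame
// captured to a canvas.
function buildDetectFacesParams(imageBytes) {
  return {
    Image: { Bytes: imageBytes },
    Attributes: ['ALL'], // required for the Emotions field in the response
  };
}

// Pick the most confident emotion from one FaceDetail in the response.
function dominantEmotion(faceDetail) {
  return faceDetail.Emotions.reduce(
    (best, e) => (e.Confidence > best.Confidence ? e : best)
  );
}

function analyzeEmotion(imageBytes, callback) {
  const rekognition = new AWS.Rekognition();
  rekognition.detectFaces(buildDetectFacesParams(imageBytes), (err, data) => {
    if (err) return callback(err);
    if (data.FaceDetails.length === 0) return callback(null, null);
    callback(null, dominantEmotion(data.FaceDetails[0]).Type); // e.g. 'HAPPY'
  });
}
```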

Learn more about other AWS services used in this experience, including pricing and limits:

Note that you can set limits for AWS services. For example, you can limit the rate of API requests made to Amazon Rekognition’s DetectFaces image operation used in this starter pack (as described in detail below, the default is 2 requests per second during the Host’s initial greeting).
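One simple way to enforce such a client-side rate limit is to drop webcam frames that arrive too soon after the last API call. The sketch below is illustrative (the names are not from the starter pack’s code):

```javascript
// Minimal client-side throttle: allow at most maxPerSecond calls
// through, dropping extra frames rather than queueing them.
function makeThrottle(maxPerSecond) {
  const minIntervalMs = 1000 / maxPerSecond;
  let lastCall = -Infinity;
  return function shouldCall(nowMs) {
    if (nowMs - lastCall >= minIntervalMs) {
      lastCall = nowMs;
      return true; // enough time has passed; permit this call
    }
    return false;  // too soon; skip this frame
  };
}

// Example: limit DetectFaces requests to 2 per second. In a render loop:
//   if (allowDetectFaces(performance.now())) { /* call Rekognition */ }
const allowDetectFaces = makeThrottle(2);
```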


The Hardware Setup section below provides information about setting up your scene. The scene’s main script is MainScript.js on the MainScript entity. Other scripts are organized by their tasks, such as controlling the microphone (MicrophoneInput.js), making calls to Amazon Lex (Lex.js), customizing the Speech component to provide closed captioning (Speech.js), and showing teammate information when the user interacts with the map (Tooltip.js and Tooltip.html). We describe the main concepts below. See the individual script files for more information on what they do and their dependencies on other script files.
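For orientation, a Sumerian script file follows a fixed shape: lifecycle functions such as `setup` and `update` that receive `(args, ctx)`, plus a `parameters` array that surfaces configurable values in the Inspector panel. The skeleton below is illustrative only; the parameter key and default are placeholders standing in for values like the reset timeout, not MainScript.js’s actual contents.

```javascript
'use strict';

// Illustrative Sumerian script skeleton (placeholder values throughout).
function setup(args, ctx) {
  // args holds the Inspector-panel parameter values,
  // e.g. args.resetTimeout for the parameter declared below.
  ctx.elapsed = 0;
}

function update(args, ctx) {
  // Called every frame while the scene is playing.
}

function cleanup(args, ctx) {
  // Called when the scene stops.
}

// Parameters appear in the Inspector panel on the script's entity.
var parameters = [
  {
    name: 'Reset Timeout',  // label shown in the Inspector panel
    key: 'resetTimeout',    // read in code as args.resetTimeout
    type: 'float',
    default: 120,           // placeholder value, in seconds
  },
];
```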

Scene States

There are five states: welcome screen, greeting screen, map screen, info screen, and idle screen.

  1. Welcome screen: This is the title screen with key UI components. The interaction starts when the user greets Cristine by saying “Hi, Cristine” or a similar phrase, as configured in the Amazon Lex bot. The scene resets to the welcome screen after a set amount of time; this reset timeout is configurable in the Inspector panel for MainScript.js.

  2. Greeting screen: This is where the Host delivers her scripted greeting. Her second sentence, “I hope your day is going great” or “Nice to see you”, is chosen based on Amazon Rekognition emotion analysis. The greeting is customizable in MainScript.js.

  3. Map screen: Here, the Host shows a floor plan when the user requests “Show me map” or something similar. When the user clicks an interactive desk, the corresponding Amazon Sumerian teammate’s name and photo are displayed.