With the growing popularity of smart devices like Amazon Echo, the voice user interface (VUI) has become a mainstream mode of human-computer interaction. You can use Amazon Sumerian Hosts to bring your VUI to life and create interactive virtual concierge experiences. The Host can personalize a greeting for each user based on facial recognition, walk your users through your company’s services and offerings, and answer commonly asked questions.
In this Sumerian Concierge demo, our Host, Cristine, introduces a user to the Sumerian team. When she recognizes a teammate through a webcam, she greets them by name and shows them where their desk is. She can also help visitors learn more about the Sumerian office space.
The kiosk supports both voice and touch interactions, so it can be used in both quiet and noisy settings. Powered by AWS artificial intelligence services, the Host recognizes different phrasings of the same request and responds accordingly. For example, Cristine treats “Open floor plan” and “Show me map” as the same intent. We added emotional intelligence to the Host by changing the tone and content of her greeting based on how her underlying AI interprets the user’s facial expression. She also varies her responses from one interaction to the next.
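To make the emotion-aware greeting more concrete, here is a minimal Python sketch of how a greeting could be chosen from a facial-expression estimate. It is not the demo's implementation: the emotion labels, the list-of-dictionaries input shape (a "Type" and a "Confidence" score per emotion), the greeting templates, and the pick_greeting helper are all assumptions made for illustration. Choosing randomly among several templates is one simple way to vary the response between interactions.

```python
import random

# Hypothetical greeting templates keyed by an emotion label. The labels and
# wording are illustrative assumptions, not taken from the demo.
GREETINGS = {
    "HAPPY": [
        "Hi {name}, you look cheerful today!",
        "Great to see you smiling, {name}!",
    ],
    "SAD": [
        "Hi {name}, I hope your day gets better.",
        "Hello {name}, let me know how I can help.",
    ],
}
DEFAULT_GREETINGS = [
    "Hello {name}, welcome back.",
    "Hi {name}, good to see you.",
]


def pick_greeting(name, emotions, min_confidence=50.0, rng=random):
    """Choose a greeting based on the most confident emotion estimate.

    `emotions` is assumed to be a list of dicts such as
    {"Type": "HAPPY", "Confidence": 93.1}. Falls back to a neutral
    greeting when the list is empty, the top estimate is below
    `min_confidence`, or the label has no dedicated templates.
    """
    top = max(emotions, key=lambda e: e["Confidence"], default=None)
    if top is not None and top["Confidence"] >= min_confidence:
        templates = GREETINGS.get(top["Type"], DEFAULT_GREETINGS)
    else:
        templates = DEFAULT_GREETINGS
    # A random choice among templates varies the wording between visits.
    return rng.choice(templates).format(name=name)
```

For example, pick_greeting("Ana", [{"Type": "HAPPY", "Confidence": 93.1}]) returns one of the two cheerful templates with the name filled in, while an empty emotion list yields one of the neutral greetings.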
Additionally, the Sumerian Host component’s Point of Interest system is coupled with computer vision, which lets Cristine maintain eye contact with a user and increases engagement. Although this demo is built for touch screens and laptops, you can easily extend it to mobile or virtual reality applications so that your users can continue to interact with the Sumerian Host even when they are not on site.
The minimum hardware requirement for this experience is a laptop with a webcam, microphone, and speaker. For a kiosk installation, we additionally recommend a touch screen and an external webcam.
Technologies Used in the Scene
At the core of this experience is the ability to converse with the Sumerian Host. This is enabled by the Sumerian Dialogue (chatbot) and Speech (text-to-speech) components, as well as the animation system that synchronizes gestures and lip movement with the speech. You can use the Host component’s Point of Interest system in conjunction with OpenCV’s face detection to track the position of the user’s face in the webcam’s screen space.
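The face-following idea can be sketched as a small coordinate conversion. The Python example below assumes the face detector reports a bounding box as (x, y, width, height) in pixels, which is the rectangle format OpenCV's cascade detectors return, and maps the box center to normalized coordinates that a look-at target could consume. The face_to_look_target helper, the [-1, 1] output range, and the mirroring option are illustrative assumptions, not the Sumerian API or the demo's actual code.

```python
def face_to_look_target(box, frame_width, frame_height, mirror=True):
    """Map a face bounding box to normalized screen-space coordinates.

    `box` is (x, y, w, h) in pixels with the origin at the top-left of
    the webcam frame. Returns (nx, ny), each in [-1, 1], where (0, 0) is
    the frame center and positive ny points up.

    `mirror` flips the horizontal axis. Whether that is needed depends on
    how the camera and screen are arranged, so treat it as an assumption
    to verify on the actual kiosk.
    """
    x, y, w, h = box
    # Center of the detected face in pixel coordinates.
    cx = x + w / 2.0
    cy = y + h / 2.0
    # Rescale from [0, size] to [-1, 1]; image y grows downward, so invert it.
    nx = (cx / frame_width) * 2.0 - 1.0
    ny = 1.0 - (cy / frame_height) * 2.0
    if mirror:
        nx = -nx
    return nx, ny
```

A face centered in a 640 by 480 frame maps to (0, 0), so the Host would look straight ahead; as the face moves toward an edge of the frame, the target moves toward -1 or 1 on that axis. In a scene, these values would then be scaled to world units and applied to whatever entity the Point of Interest system is set to follow.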