Capturing and Visualizing Microphone Input, Part 1

By Aiko Nakano | Posted February 27, 2018



Tags

web audio API
html
events
scripting

In this tutorial, you will learn how to capture microphone input with the Web Audio API and visualize the resulting audio buffer in an Amazon Sumerian scene.

As voice becomes a common mode of interaction between applications and their users, audio visualization becomes increasingly important. Similar to Alexa’s blue ring on the Amazon Echo, adding audio visualization improves the user experience.

Using Amazon Sumerian, you can create interactive experiences such as a music visualizer and synthesizer, or have conversations with Amazon Lex chatbots, using the Web Audio API. If this is your first time working with the Web Audio API, take a look at the Mozilla Developer Network (MDN) Web Audio examples for ideas on the kinds of experiences you can build.

In this article, we show you how to capture microphone input and visualize the audio buffer in Sumerian. Each step explains a different technique, and the completed set of scripts is provided in the source folder. We also provide references on how to prepare your audio buffer for Amazon Lex chatbots (see the “Note on Sending your Voice Recording to Amazon Lex” section below). Please refer to the Mozilla Developer Network (MDN) Web Docs’ Web Audio API page for browser compatibility and future updates. Please note that this experience was created for Firefox Quantum 57.0.

This is the scene we will create. For best results, use Firefox Quantum 58.0 as your web browser. Make sure you have enabled your microphone.

Prerequisites

Before you begin, you should have completed the following tasks and tutorials:

Microphone Controller

After creating a new scene, add an empty entity from the Create Entity menu and rename it to something like “Microphone Input”. Next, add a Script component to the Microphone Input entity and rename the new script “microphoneController”. Copy and paste the contents of the microphoneController.js script from the source files you downloaded.

The Microphone class handles the mic recording and creates the audio buffer. We use a common sample rate of 44,100 Hz and specify the buffer length, which determines how often an audio buffer is produced and sent to the visualizer. Since we are using the ScriptProcessorNode interface, the buffer length needs to be a power of 2 in the range 256 – 16384. To foreshadow a bit, a shorter buffer length results in a more responsive visualization.

'use strict';
class Microphone {
  constructor(sampleRate = 44100, bufferLength = 4096) {
    this._sampleRate = sampleRate;
    // Shorter buffer length results in a more responsive visualization
    this._bufferLength = bufferLength;

    this._audioContext = new AudioContext();
    this._bufferSource = null;
    this._streamSource = null;
    this._scriptNode = null;

    this._realtimeBuffer = [];
    this._audioBuffer = [];
    this._audioBufferSize = 0;

    this._isRecording = false;

    this._setup(this._bufferLength, this._isRecording);
  };

  get realtimeBuffer() {
    return this._realtimeBuffer;
  }

  get isRecording() {
    return this._isRecording;
  }

  _validateSettings() {
    if (!Number.isInteger(this._sampleRate) || this._sampleRate < 22050 || this._sampleRate > 96000) {
      throw "Please input an integer samplerate value between 22050 to 96000";
    }

    this._validateBufferLength();
  }

  _validateBufferLength() {
    const acceptedBufferLength = [256, 512, 1024, 2048, 4096, 8192, 16384];
    if (!acceptedBufferLength.includes(this._bufferLength)) {
      throw "Please ensure that the buffer length is one of the following values: " + acceptedBufferLength;
    }
  }

  _setup(bufferLength, isRecording) {
    this._validateSettings();

    // Get microphone access
    if (navigator.mediaDevices) {
      navigator.mediaDevices.getUserMedia({audio: true}).then((stream) => {
        this._streamSource = this._audioContext.createMediaStreamSource(stream);
        this._scriptNode = this._audioContext.createScriptProcessor(bufferLength, 1, 1);      
        this._bufferSource = this._audioContext.createBufferSource();

        this._streamSource.connect(this._scriptNode);
        this._bufferSource.connect(this._audioContext.destination);
      }).catch ((e) => {
        throw "Microphone: " + e.name + ". " + e.message;
      })
    } else {
      throw "MediaDevices are not supported in this browser";
    }
  }

  processAudio() {
    // Whenever onaudioprocess event is dispatched it creates a buffer array with the length bufferLength
    this._scriptNode.onaudioprocess = (audioProcessingEvent) => {
      if (!this._isRecording) return;

      this._realtimeBuffer = audioProcessingEvent.inputBuffer.getChannelData(0);  

      // Create an array of buffer array until the user finishes recording
      this._audioBuffer.push(this._realtimeBuffer);
      this._audioBufferSize += this._bufferLength;
    }
  }

  playback() {
    this._setBuffer().then((bufferSource) => {
      bufferSource.start();
    }).catch((e) => {
      throw "Error playing back audio: " + e.name + ". " + e.message;
    })
  }

  _setBuffer() {
    return new Promise((resolve, reject) => {
      // New AudioBufferSourceNode needs to be created after each call to start()
      this._bufferSource = this._audioContext.createBufferSource();
      this._bufferSource.connect(this._audioContext.destination);

      let mergedBuffer = this._mergeBuffers(this._audioBuffer, this._audioBufferSize);
      let arrayBuffer = this._audioContext.createBuffer(1, mergedBuffer.length, this._sampleRate);
      let buffer = arrayBuffer.getChannelData(0);

      for (let i = 0, len = mergedBuffer.length; i < len; i++) {
        buffer[i] = mergedBuffer[i];
      }

      this._bufferSource.buffer = arrayBuffer;

      resolve(this._bufferSource);
    })
  }

  _mergeBuffers(bufferArray, bufferSize) {
    // Nothing to merge if no audio buffers were captured from the onaudioprocess event
    if (bufferSize < 2) return;

    let result = new Float32Array(bufferSize);

    for (let i = 0, len = bufferArray.length, offset = 0; i < len; i++) {
      result.set(bufferArray[i], offset);
      offset += bufferArray[i].length;
    }
    return result;
  }

  startRecording() {
    if (this._isRecording) return;

    this._clearBuffer();
    this._isRecording = true;
  }

  stopRecording() {
    if (!this._isRecording) {
      this._clearBuffer();
      return;
    }

    this._isRecording = false;

  }

  _clearBuffer() {
    this._audioBuffer = [];
    this._audioBufferSize = 0;
  }

  cleanup() {
    this._streamSource.disconnect(this._scriptNode);
    this._bufferSource.disconnect(this._audioContext.destination);
    this._audioContext.close();
  }
}

We prompt the user for permission to use their microphone with navigator.mediaDevices.getUserMedia. We create a MediaStreamSource from the resulting media stream and connect it to the ScriptProcessorNode inside the AudioContext to create an audio buffer. Each time the onaudioprocess event fires, the buffer is sent to the Visualizer (covered in detail in the following steps), and a separate audio buffer (a two-dimensional array) is built up until the microphone recording is stopped.
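The downloaded microphoneController.js also wires the Microphone class into Sumerian’s script lifecycle. The snippet below is only a rough sketch of one possible wiring; the setup and cleanup entry points follow Sumerian’s scripting convention, while the readiness polling and exact call order are assumptions rather than the contents of the source file.

// Sketch only: one way to wire the Microphone class into the Sumerian script lifecycle.
function setup(args, ctx) {
  ctx.mic = new Microphone(); // defaults: 44100 Hz sample rate, 4096 buffer length

  // getUserMedia resolves asynchronously, so wait until the script node exists
  // before registering the onaudioprocess handler (a simple readiness guard that
  // peeks at an internal field for brevity).
  ctx.micReadyTimer = setInterval(() => {
    if (ctx.mic._scriptNode) {
      ctx.mic.processAudio();
      clearInterval(ctx.micReadyTimer);
    }
  }, 100);
}

function cleanup(args, ctx) {
  clearInterval(ctx.micReadyTimer);
  ctx.mic.cleanup(); // disconnect the audio nodes and close the AudioContext
}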

We’ve used the ScriptProcessorNode interface in this article because it can both generate and process audio buffers, which is useful for real-time visualization. Unfortunately, this feature is marked as deprecated and has limited browser support. Please refer to MDN’s ScriptProcessorNode page for future updates. Keep in mind that you can also record microphone input using the MediaStream Recording API, which we cover in Part 2 of this article.
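For comparison, a minimal sketch of the MediaStream Recording API approach looks like the following. This is not the code used in this tutorial, just an illustration of the alternative covered in Part 2.

// Sketch only: record the microphone with MediaRecorder instead of ScriptProcessorNode.
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const recorder = new MediaRecorder(stream);
  const chunks = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.onstop = () => {
    // The recording arrives as encoded Blobs rather than raw sample buffers
    const blob = new Blob(chunks, { type: recorder.mimeType });
    new Audio(URL.createObjectURL(blob)).play();
  };

  recorder.start();
  // Stop after five seconds, for example
  setTimeout(() => recorder.stop(), 5000);
});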

Microphone UI Elements

We will use a single button to start and stop recording. The microphone channel is open while the user is holding down the mic button. Add an HTML entity from the Create Entity menu, then open the HTML file from the Tools menu (top left, above the canvas). Choose Text Editor, and copy and paste the contents of the index.html file you downloaded.

Optional: Adding a Button

You can assign an image to use as a button. You can use the mic icon found below or any image of your choice. If you are using the icon below, save it to your computer.

Drag and drop the image on the Images section of the Text Editor panel. Note that an <img> tag appears in your HTML automatically.

Move this <img> tag into the <button> element of your HTML file:

<button id="recordMic" class="noBackground alignBottomCenter">
  <!-- Insert the button icon here -->
  Mic button
</button>

The microphone icon should appear on the bottom center inside the canvas.

Add Event Listeners to Connect the Mic with the Button

As you may have noticed in the HTML above, we use the EventTarget interface to communicate between the HTML entity and the scripts on the Script component. We add event listeners to start and stop the microphone recording.

function startListening(mic, args, ctx) {
  // Single button for recording using the mic icon
  if (document.getElementById("recordMic")) {
    ctx.recordMicButton = document.getElementById("recordMic");

    // Keep references to the handlers so stopListening() can remove them later
    ctx.onMicButtonDown = (event) => { startRecordingWithMicButton(mic, ctx); };
    ctx.onMicButtonUp = (event) => { stopRecordingWithMicButton(mic, args, ctx); };

    ctx.recordMicButton.addEventListener("mousedown", ctx.onMicButtonDown);
    ctx.recordMicButton.addEventListener("mouseup", ctx.onMicButtonUp);
  }
}

function stopListening(ctx) {
  if (ctx.recordMicButton) {
    ctx.recordMicButton.removeEventListener("mousedown", ctx.onMicButtonDown);
    ctx.recordMicButton.removeEventListener("mouseup", ctx.onMicButtonUp);
  }
}

function startRecordingWithMicButton(mic, ctx) {
  mic.startRecording();

  if (ctx.recordMicButton) {
    ctx.recordMicButton.style.opacity = 0.3;
  }
}

function stopRecordingWithMicButton(mic, args, ctx) {
  mic.stopRecording();

  // Play back the audio immediately after it stops recording
  if (args.isPlayingMicInput) {
    mic.playback();
  }

  if (ctx.recordMicButton) {
    ctx.recordMicButton.style.opacity = 1.0;
  }
}

Add Functionality to Play Back Audio when the Recording is Stopped

As you may have noticed in the stopRecordingWithMicButton(mic, args, ctx) function above, you can play back your audio recording to help you debug and make sure the audio buffer was captured properly.

In _setBuffer() we create an empty AudioBuffer, populate it with the audio buffer (_audioBuffer) created inside the ScriptProcessorNode during processAudio(), and send it to the AudioDestinationNode, such as your device’s speakers. Even though the AudioBufferSourceNode is initialized in Microphone._setup(), we need to recreate it each time bufferSource.start() is called. The _audioBuffer is a two-dimensional array containing the buffer arrays created each time ScriptProcessorNode.onaudioprocess fires in processAudio(); it is merged into a one-dimensional array before it is copied into the AudioBuffer.

// From Microphone class
playback() {
  this._setBuffer().then((bufferSource) => {
    bufferSource.start();
  }).catch((e) => {
    throw "Error playing back audio: " + e.name + ". " + e.message;
  })
}

_setBuffer() {
  return new Promise((resolve, reject) => {
    // New AudioBufferSourceNode needs to be created after each call to start()
    this._bufferSource = this._audioContext.createBufferSource();
    this._bufferSource.connect(this._audioContext.destination);

    let mergedBuffer = this._mergeBuffers(this._audioBuffer, this._audioBufferSize);
    let arrayBuffer = this._audioContext.createBuffer(1, mergedBuffer.length, this._sampleRate);
    let buffer = arrayBuffer.getChannelData(0);

    for (let i = 0, len = mergedBuffer.length; i < len; i++) {
      buffer[i] = mergedBuffer[i];
    }

    this._bufferSource.buffer = arrayBuffer;

    resolve(this._bufferSource);
  })
}

_mergeBuffers(bufferArray, bufferSize) {
  // Nothing to merge if no audio buffers were captured from the onaudioprocess event
  if (bufferSize < 2) return;

  let result = new Float32Array(bufferSize);

  for (let i = 0, len = bufferArray.length, offset = 0; i < len; i++) {
    result.set(bufferArray[i], offset);
    offset += bufferArray[i].length;
  }
  return result;
}

Check the Test Mic Input toggle in the Script component and press Play. Your recorded voice will be played back when you stop recording.
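The toggle corresponds to the isPlayingMicInput script parameter read in stopRecordingWithMicButton(). Its declaration in microphoneController.js presumably looks something like the sketch below; the exact key and display name are assumptions based on the steps above.

// Sketch: the boolean script parameter behind the Test Mic Input toggle.
// The key must match args.isPlayingMicInput used in stopRecordingWithMicButton().
var parameters = [
  { key: 'isPlayingMicInput', type: 'boolean', default: false, name: 'Test Mic Input' }
];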

Mic Input Visualizer

We can use the audio buffer to move entities in your scene to visualize your mic input.

Create an empty entity from the Create Entity menu and nest it under the “Microphone Input” entity to keep the scene organized. Rename this entity “Visualizer”. Add five spheres from the Create Entity dropdown under the “Visualizer” hierarchy. It should look like the image below. We will reposition the spheres in code in Step 8.

Set up a Custom Event for the Visualizer

We use a Custom Event to send the audio buffer to the visualizer each time a new buffer is available. The event listener is added and removed in visualizer.js.

// From microphoneController.js
function onVisualizeBuffer(buffer, ctx) {
  ctx.worldData.onVisualizeBuffer = new CustomEvent("VisualizeBuffer", { detail: buffer });

  if (ctx.worldData.micInputVisualizer) {
    ctx.worldData.micInputVisualizer.dispatchEvent(ctx.worldData.onVisualizeBuffer);
  }
}
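On the receiving side, visualizer.js listens for the VisualizeBuffer event. The sketch below shows one way that could look; exactly which object the source file uses as the shared event target is an assumption, since any EventTarget stored on ctx.worldData works with the dispatch code above.

// From visualizer.js (sketch): register and remove the VisualizeBuffer listener.
function setup(args, ctx) {
  // Share an event target with microphoneController.js through worldData.
  // Any EventTarget works; a detached element is used here for illustration.
  ctx.worldData.micInputVisualizer = document.createElement("div");

  ctx.onVisualizeBuffer = (event) => {
    // event.detail carries the realtime buffer dispatched by onVisualizeBuffer() above.
    // ctx.visualizationElements is assumed to be populated elsewhere in visualizer.js.
    visualizeBuffer(event.detail, args.maxHeight, ctx);
  };
  ctx.worldData.micInputVisualizer.addEventListener("VisualizeBuffer", ctx.onVisualizeBuffer);
}

function cleanup(args, ctx) {
  ctx.worldData.micInputVisualizer.removeEventListener("VisualizeBuffer", ctx.onVisualizeBuffer);
  delete ctx.worldData.micInputVisualizer;
}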

Move entities using the audio buffer

Add a custom script to the empty Visualizer entity and name it “Visualizer”. Copy and paste the contents of the visualizer.js file you downloaded.

We “downsample” the audio buffer to just five numbers and assign them to the y positions of the five spheres in the scene.

function visualizeBuffer(buffer, maxHeight, ctx) {
  let vizBuffer = downsampledBuffer(buffer, maxHeight, ctx.visualizationElements.length);

  assignBufferValuesToVizElements(ctx.visualizationElements, vizBuffer);
}

function downsampledBuffer(buffer, maxHeight, numElements) {
  let increment = Math.round(buffer.length / numElements);
  let downsampledBuffer = new Float32Array(numElements).fill(0);

  for (let i = 0; i < numElements; i++) {
    // Scale each raw sample (between -1 and 1) by maxHeight
    downsampledBuffer[i] = buffer[i * increment] * maxHeight;
  }

  return downsampledBuffer;
}

function assignBufferValuesToVizElements(vizElementsArray, values) {
  if (values.length != vizElementsArray.length) {
    throw "The number of visualization elements does not match the length of buffer array used for visualization";
  }

  if (values.includes(NaN)) return;

  for (let i = 0, numElements = vizElementsArray.length; i < numElements; i++) {
    let x = vizElementsArray[i].getTranslation().x;
    let z = vizElementsArray[i].getTranslation().z;
    vizElementsArray[i].setTranslation(new sumerian.Vector3(x, values[i], z));
  }
}

You can customize the visualization from the Inspector panel in the Script component, using the script parameters defined below. We scale the audio buffer (values between -1 and 1) by maxHeight, and spacing determines the distance between the spheres. Change these parameter values in the Inspector panel to your liking. Remember, a shorter buffer length in the “Microphone Input” script results in a more responsive visualization.

var parameters = [
  { key: 'spacing', type: 'float', default: 2.0, name: 'Spheres Spacing' },
  { key: 'maxHeight', type: 'float', default: 4.0, name: 'Spheres Height' }
];
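The sphere layout itself happens when visualizer.js initializes. Below is a minimal sketch of how the spacing parameter could be applied, assuming ctx.visualizationElements already holds the five sphere entities (as in visualizeBuffer() above); the helper name is illustrative, not taken from the source file.

// Sketch only: spread the spheres along the x axis, centered on the Visualizer entity,
// using the Spheres Spacing parameter.
function positionVizElements(vizElementsArray, spacing) {
  const center = (vizElementsArray.length - 1) / 2;

  for (let i = 0, numElements = vizElementsArray.length; i < numElements; i++) {
    vizElementsArray[i].setTranslation(new sumerian.Vector3((i - center) * spacing, 0, 0));
  }
}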

When you play the scene and start mic recording, you should see the spheres moving up and down. The spheres move higher when you speak louder.

Now publish and share your scene!

Now that you understand how to record microphone input as well as create and play back audio buffers using the Web Audio API, you can start creating interactive sound experiences. To learn more, check out the next part of this article, Creating Interactive Experiences Using Microphone Input, Part 2.

Note on Sending your Voice Recording to Amazon Lex

Please refer to the Amazon Lex Developer Guide and the AWS blog article Capturing Voice Input in a Browser and Sending it to Amazon Lex (especially the “Preparing to export the recording to Amazon Lex” section) for audio buffer post-processing, including merging, downsampling, and encoding.
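For orientation, post-processing along the lines those references describe might look roughly like the sketch below. The 16,000 Hz target rate and 16-bit little-endian PCM encoding match what Amazon Lex’s PostContent API accepts for raw audio input; the helper names are illustrative, not taken from the linked articles.

// Sketch only: reduce the merged 44,100 Hz buffer to 16,000 Hz and encode it as
// 16-bit little-endian PCM before sending it to Amazon Lex.
function downsampleTo16k(mergedBuffer, sourceSampleRate) {
  const targetSampleRate = 16000;
  const ratio = sourceSampleRate / targetSampleRate;
  const result = new Float32Array(Math.floor(mergedBuffer.length / ratio));

  for (let i = 0; i < result.length; i++) {
    // Nearest-sample decimation; the linked article discusses more careful resampling
    result[i] = mergedBuffer[Math.floor(i * ratio)];
  }
  return result;
}

function encodePCM16(buffer) {
  const view = new DataView(new ArrayBuffer(buffer.length * 2));

  for (let i = 0; i < buffer.length; i++) {
    // Clamp each float sample to [-1, 1] and scale it to a signed 16-bit integer
    const s = Math.max(-1, Math.min(1, buffer[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
  return view.buffer;
}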

Aiko Nakano

Aiko is a Design Technologist on the Amazon Sumerian team. She focuses on XR interactions and creates experiences that bring artificial intelligence features into Sumerian.