Capturing and Visualizing Microphone Input, Part 2

By Aiko Nakano | Posted February 27, 2018



Tags

web audio API
html
events
scripting

In this tutorial, you will learn how to capture microphone input with the MediaRecorder interface, visualize the audio buffer with the Sumerian LineRenderSystem, and add playback and download controls for the recording.

In the Capturing and Visualizing Microphone Input, Part 1 article, we covered how to use the ScriptProcessorNode to capture microphone input and visualize it in real time. This post covers another technique for capturing microphone audio using the MediaRecorder interface, and for visualizing the audio buffer using the Amazon Sumerian LineRenderSystem. We’ve also added audio input playback with a user interface for playback control, as well as a download feature. There is some overlap with the Part 1 post, which we point out in the sections below. Note: We provide the full scripts in the source folder.

Using the techniques explained in this article, you can send your microphone recording to Amazon Lex to have conversations with chatbots, and create other interactive audio experiences using the Web Audio API. Please refer to the Mozilla Developer Network (MDN) Web Docs’ Web Audio API page for browser compatibility and future updates.

This is the scene we will create. For best results, use Firefox Quantum 58.0 as your web browser. Make sure you have enabled your microphone.

Prerequisites

Before you begin, you should have completed the following tasks and tutorials:

Overview of Techniques Used

The MicrophoneController script (see source folder) controls the microphone recording and prepares the user interface and visualization. Audio playback and download are handled by the <audio> (ctx.audioElement) and <a> (ctx.downloadElement) elements in the HTML component, respectively. We use Sumerian’s LineRenderSystem to draw the audio buffer as a line graph.

In the next sections we will go into the details of microphone audio recording, playback, and download as well as audio buffer visualization.

function setup(args, ctx) {
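  // Create a new instance of the Microphone class, which is defined in this script
  // (see Complete Code) and shared via ctx.worldData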
  ctx.mic = new ctx.worldData.mic();

  // <audio> element used for audio playback
  ctx.audioElement = document.getElementById("audio");
  // Set download link for the audio download button
  ctx.downloadElement = document.getElementById("downloadLink");

  // For visualization
  ctx.lineRenderSystem = ctx.world.sumerianRunner.lineRenderSystem;
  ctx.lineColor = new sumerian.Vector3(args.lineColor);
  // "Resampled" buffer used for visualization
  ctx.vizBuffer = null;
  // Flag used to ensure that the resampling is done only once when a new recording is created
  ctx.isBufferProcessed = false;

  startListening(ctx.mic, ctx);
};

Microphone Recording

The Microphone class controls the mic recording. As in Part 1, we create an AudioContext to handle audio operations and use navigator.mediaDevices.getUserMedia to prompt users for permission to access their microphone. Instead of the ScriptProcessorNode used before, we use MediaRecorder in combination with FileReader to record the microphone input and create an audio buffer. The MediaRecorder captures the audio stream from the microphone, which is then saved as a Blob. The FileReader object converts the audio blob created by the MediaRecorder interface into an audio buffer.

class Microphone {
  constructor(fileType = "audio/mp3") {
    // Using mp3 format as default
    // Supported audio formats: https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats
    // Especially note OS and browser compatibilities
    // PCM format x-l16 is one of the audio formats supported by Amazon Lex
    // See https://docs.aws.amazon.com/lex/latest/dg/API_runtime_PostContent.html
    this._audioType = fileType;

    this._audioContext = new AudioContext();
    this._recorder = null;
    this._fileReader = new FileReader();

    this._recordedBlob = [];
    this._audioBlob = null;

    this._audioBuffer = [];
    this._bufferReady = false;

    this._setup();
  }

  get audioBlob() {
    return this._audioBlob;
  }

  get audioBuffer() {
    return this._audioBuffer;
  }

  get bufferReady() {
    return this._bufferReady;
  }

  _setup() {
    // Get access to microphone
    if (navigator.mediaDevices) {
      navigator.mediaDevices.getUserMedia({audio: true}).then((stream) => {
        this._recorder = new MediaRecorder(stream);

        this._recorder.ondataavailable = (e) => {
          this._recordedBlob.push(e.data);
        }

        this._recorder.onerror = (e) => {
          throw e.error || new Error(e.name);
        }

        this._recorder.onstart = (e) => {
          this._clearBuffer();
        }

        this._recorder.onstop = (e) => {
          this._createAudioBlob().then((blob) => {
            this._onBlobReady(blob);
            this._convertBlobToBuffer(blob);
          })
        }
      }).catch ((e) => {
        throw "Microphone: " + e.name + ". " + e.message;
      })
    } else {
      throw "MediaDevices are not supported in this browser";
    }
  }
}
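
The class above only wires up the MediaRecorder event handlers. The full Microphone class in the source folder also needs a way to start and stop a recording from the user interface; a minimal sketch of what such methods might look like (the method names are illustrative) is shown here.

// Minimal sketch: methods on the Microphone class that wrap the underlying MediaRecorder.
// The method names are illustrative; see the source folder for the actual implementation.
startRecording() {
  if (this._recorder && this._recorder.state === "inactive") {
    this._bufferReady = false;
    this._recorder.start();
  }
}

stopRecording() {
  if (this._recorder && this._recorder.state === "recording") {
    this._recorder.stop();
  }
}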

Going back to the class definition above, take a look at the recorder.onstop event. When the recording is finished, we convert the blob created by the MediaRecorder into the media type you specified (see the _createAudioBlob() function). We use the mp3 file format as the default; keep in mind the OS and browser compatibilities mentioned in MDN’s supported media formats documentation. This conversion is needed because the blob created by the MediaRecorder is untyped: we convert it into an audio format that the <audio> element in the HTML Component can use for playback, and into a typed array, via FileReader, for visualization.

As an aside, if you use the PCM format supported by Amazon Lex, as noted in the code snippet above, you can send the audio buffer to Amazon Lex after resampling and processing it (described below). You can also send the buffer to AudioNodes (in this case no processing is needed). Additionally, you can use the recorded audio file with the Sound Component if you use a supported format (see Music and Sound Basics).

After the new audio blob is created, we dispatch a micRecordingReady event to the MicrophoneController script (see the _onBlobReady(blob) function), and then decode the audio blob into a buffer for visualization in the _convertBlobToBuffer(blob) function.

this._recorder.onstop = (e) => {
  this._createAudioBlob().then((blob) => {
    this._onBlobReady(blob);
    this._convertBlobToBuffer(blob);
  })
}

// Event fired when the audio blob is ready
_onBlobReady(blob) {
  const blobReady = new CustomEvent("micRecordingReady", { detail: blob });
  window.dispatchEvent(blobReady);
}

_createAudioBlob() {
  return new Promise((resolve, reject) => {
    this._audioBlob = new Blob(this._recordedBlob, {type: this._audioType});
    resolve(this._audioBlob);
  })
}

_convertBlobToBuffer(blob) {
  this._fileReader.readAsArrayBuffer(blob);

  this._fileReader.onload = () => {
    this._audioContext.decodeAudioData(this._fileReader.result).then((decodedData) => {
      this._audioBuffer = decodedData.getChannelData(0);
      this._bufferReady = true;
    }).catch((e) => {
      throw "Could not decode audio data: " + e.name + ". " + e.message;
    });
  }
}

In _convertBlobToBuffer(blob), we use the FileReader interface’s readAsArrayBuffer(blob) method to read the content of the Blob, and pass the result to the audioContext.decodeAudioData() method to asynchronously decode the audio data. Setting this._bufferReady = true serves as a flag to draw the buffer in the update() function, as we explain later.

We take a closer look at how this Microphone class relates to the user interface in the following sections.

Mic Capture Playback

When microphone recording is stopped, we set the source of the <audio> element to the audio blob, using a URL dynamically generated with URL.createObjectURL(blobData). Before we start a new recording (onstart), we clean up the audio buffer and the URL used for audio playback and download so they can be garbage collected.

<!-- In HTML Component -->
<!-- The `controls` attribute provides the audio playback control. -->
<audio id="audio" controls></audio>

function setAudioSource(audioElement, downloadElement, blobData) {
  if (blobData) {
    audioElement.src = URL.createObjectURL(blobData);

    setDownloadLink(downloadElement, audioElement.src);
  } else {
    throw "Could not set the audio source and download link with the mic recording";
  }
}

function releaseAudioURL(audioElement) {
  if (audioElement.src) {
    window.URL.revokeObjectURL(audioElement.src);
  }
}
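
The MicrophoneController script hooks these helpers up to the micRecordingReady event dispatched by the Microphone class. A minimal sketch of what the startListening() function called from setup() might do is shown below, assuming it simply registers a window event listener; the actual wiring lives in MicrophoneController.js in the source folder.

// Minimal sketch (assumed implementation) of the startListening() call made in setup()
function startListening(mic, ctx) {
  ctx.onRecordingReady = (e) => {
    // e.detail is the audio blob dispatched by Microphone._onBlobReady()
    // Release any previous object URL before assigning a new one
    releaseAudioURL(ctx.audioElement);
    setAudioSource(ctx.audioElement, ctx.downloadElement, e.detail);
    // Let update() resample the new buffer for visualization
    ctx.isBufferProcessed = false;
  };
  window.addEventListener("micRecordingReady", ctx.onRecordingReady);
}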

Downloading the Recorded Audio File

After we set the <audio> source for playback, we also set the link for downloading the audio file using the setDownloadLink(downloadElement, audioElement.src) function. Here we use the <a> element in the HTML Component. The download attribute instructs the browser to download the linked URL when it is clicked.

<!-- In HTML Component -->
<a id="downloadLink" download="micRecording">
  <button id="saveButton">Save mic recording</button>
</a>

function setDownloadLink(downloadElement, audioSource) {
  if (!downloadElement) return;

  if (audioSource) {
    downloadElement.href = audioSource;
  } else {
    console.log("No audio source available to download");
  }
}

Drawing the Graph to Visualize Audio Buffer Values

We use Sumerian’s LineRenderSystem to draw line segments for the graph. You can customize the graph settings on the Inspector panel of the Script component. The inputs to the calculatePointsForVisualization() function shown below include the number of points used to draw the graph (steps), the position of the graph on the x-axis (xMin and xMax), and the scale, which scales the graph on the y-axis, since audio buffer values are in the range between -1 and 1. You can find the maximum and minimum buffer values of your microphone inputs using the findMin() and findMax() functions provided in the MicrophoneController script to help determine an appropriate scale for your graph.
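
The findMin() and findMax() helpers live in MicrophoneController.js in the source folder; a minimal sketch of what they might look like is shown here. They simply scan the recorded buffer for its extreme values.

// Minimal sketch of the helpers mentioned above; see the source folder for the originals.
function findMin(buffer) {
  var min = Infinity;
  for (var i = 0, len = buffer.length; i < len; i++) {
    if (buffer[i] < min) min = buffer[i];
  }
  return min;
}

function findMax(buffer) {
  var max = -Infinity;
  for (var i = 0, len = buffer.length; i < len; i++) {
    if (buffer[i] > max) max = buffer[i];
  }
  return max;
}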

For each line segment, we provide the start and end positions to LineRenderSystem.drawLine(start, end, color) method. The line color is also customizable on the Inspector panel.

function calculatePointsForVisualization(buffer, steps, scale, xMin, xMax, ctx) {
  var newBuffer = [];

  const width = (xMax - xMin) / steps;

  var bufferLength = buffer.length;
  var increment = Math.round(bufferLength / (steps + 1));

  for (var i = 0; i < steps; i++) {
    newBuffer[i] = new sumerian.Vector3(i * width + xMin, scale * buffer[i * increment], 0);
  }

  return newBuffer;
}

function drawLineGraph(pointsArray, ctx) {
  for (var i = 0, len = pointsArray.length; i < len - 1; i++) {
    ctx.lineRenderSystem.drawLine(pointsArray[i], pointsArray[i+1], ctx.lineColor);
  }
}

Note that the color types need to be converted to Vector3 type as shown below.

var setup = function(args, ctx) {
  //... More code here
  ctx.lineColor = new sumerian.Vector3(args.lineColor);
}

var parameters = [
  //... More code here
  {
    type: 'vec3',
    control: 'color',
    key: 'lineColor',
    'default': [1.0, 1.0, 1.0],
    description: 'RGB color input'
  },
]
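
The update() function below also reads args.numPoints, args.scale, args.xMin, and args.xMax, whose declarations are elided above. If you are recreating the script yourself, entries along the following lines added to the parameters array would expose them in the Inspector panel; the default values here are only illustrative.

// Illustrative entries for the graph settings read in update(); the defaults are assumptions.
{ type: 'int', key: 'numPoints', 'default': 200, description: 'Number of points used to draw the graph' },
{ type: 'float', key: 'scale', 'default': 1.0, description: 'Scale applied to buffer values on the y-axis' },
{ type: 'float', key: 'xMin', 'default': -5.0, description: 'Left edge of the graph on the x-axis' },
{ type: 'float', key: 'xMax', 'default': 5.0, description: 'Right edge of the graph on the x-axis' },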

Now take a look at the update() function in the MicrophoneController script. This is where we call the calculatePointsForVisualization() and drawLineGraph() functions explained above. We draw a new graph when a new recording is created and the audio buffer is ready (ctx.mic.bufferReady). The first time the new buffer is available, we “resample” the buffer array for visualization purposes and store it as a new array, ctx.vizBuffer. It is created in the script context ctx so that the array is available on each frame update following this calculation. The ctx.isBufferProcessed flag ensures that this calculation happens only once, which improves the performance of your scene.

var update = function(args, ctx) {
  if (ctx.mic.bufferReady) {
    if (!ctx.isBufferProcessed) {
      ctx.vizBuffer = calculatePointsForVisualization(ctx.mic.audioBuffer, args.numPoints, args.scale, args.xMin, args.xMax, ctx);
      ctx.isBufferProcessed = true;
    }

    if (ctx.vizBuffer) {
      drawLineGraph(ctx.vizBuffer, ctx);
    }
  }
}

If you haven’t already, press the microphone button on the published scene above to record your microphone input, and try using the recording in your own scenes with the Sound Component.

Now that you understand how to record microphone input and create an audio buffer, you can start creating interactive sound and voice-controlled experiences. To learn more, check out the following:

Note on Sending your Voice Recording to Amazon Lex

You can use ctx.mic.audioBuffer to have conversations with Amazon Lex bots after a bit of processing. Please refer to the Amazon Lex Developer Guide and the “Preparing to export the recording to Amazon Lex” section of the AWS blog Capturing Voice Input in a Browser and Sending it to Amazon Lex for more details on audio buffer post-processing, including merging, downsampling, and encoding.
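
As an illustration of one of those post-processing steps, the sketch below downsamples the recorded buffer to the 16 kHz sample rate accepted by Amazon Lex. It is a simplified, nearest-sample version of the approach described in the referenced blog post; the function name and strategy are illustrative.

// Minimal downsampling sketch (illustrative; see the referenced blog post for the full pipeline).
// buffer is the Float32Array from ctx.mic.audioBuffer; sourceRate is the AudioContext sample rate.
function downsampleBuffer(buffer, sourceRate, targetRate) {
  if (targetRate >= sourceRate) {
    return buffer;
  }
  const ratio = sourceRate / targetRate;
  const result = new Float32Array(Math.floor(buffer.length / ratio));
  for (let i = 0; i < result.length; i++) {
    // Pick the nearest earlier source sample for each target sample
    result[i] = buffer[Math.floor(i * ratio)];
  }
  return result;
}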

Complete Code

The complete code blocks for this exercise are included in the source folder.

Microphone Controller

This script handles the UI and audio buffer visualization, and also defines the Microphone class. After creating a new scene, add an empty entity from the Create Entity menu. Rename the entity to something like “Microphone Input”. Next, add a Script component to the Microphone Input entity.

See MicrophoneController.js file in the source folder.

HTML

See Part 1 of this post to learn how to use an image as your microphone recording button.

See index.html file in the source folder.

Aiko Nakano

Aiko is a Design Technologist on the Amazon Sumerian team. She focuses on XR interactions and creates experiences that bring artificial intelligence features into Sumerian.