Expert Tutorial

Introduction to Amazon SageMaker and Reinforcement Learning, part 1

30 minutes

Posted on: February 18, 2019





Amazon SageMaker gives every developer and data scientist the ability to build, train, and deploy machine learning (ML) models. Reinforcement learning (RL) is a branch of ML concerned with how software agents ought to take actions in an environment so as to maximize a cumulative reward, such as a numerical score in a simulated game. In this two-part tutorial series, you will learn how to use SageMaker for RL, using Amazon Sumerian as the simulation environment.

Before we dive into using Amazon Sumerian as a simulator for RL, we’ll run through the SageMaker RL workflow using the canonical Cart-Pole example.

You’ll learn about the following:

  • Amazon SageMaker
  • Machine learning
  • RL


Before you begin, make sure you can read Python scripts and have some familiarity with reinforcement learning concepts.

Watch a recent AWS Twitch stream to learn more.

Step 1: Sign In to the AWS Management Console and Open the Amazon SageMaker Console

  1. Sign in to the AWS Management Console and open the Amazon SageMaker console.

  2. To get started, choose Create notebook instance.

    The main workflow for Amazon SageMaker involves Jupyter notebooks. A notebook is a webpage with embedded Python cells that execute code. It’s a nice mashup of documentation and Python interpreter. Choosing Create notebook instance begins the process of creating an Amazon EC2 instance running an environment that hosts the notebook.

  3. Name your new instance. We used “sumerianRL”.

  4. Choose your instance type. This selects the EC2 instance to run on. The default is ml.t2.medium, which is fine for this tutorial.

  5. Create an AWS Identity and Access Management (IAM) role. Amazon SageMaker uses this IAM role to create resources in your account on your behalf, through the AmazonSageMakerFullAccess IAM policy. The default values for the role, VPC, Lifecycle configuration, Encryption key, and Volume size are all fine for this tutorial.

  6. In the Git repository section, we’ll add the sagemaker-examples repository at sagemaker-examples. Choose Clone a public Git repository to this notebook instance only from the dropdown menu. For the Git repository URL, enter the following:

  7. Choose Create notebook instance to create the instance.

Note: This creates resources in your AWS account. Depending on your usage, you might exceed the free tier and be billed for the resources used in this tutorial.

Step 2: Use the Notebook Instance

Amazon SageMaker takes a few moments to generate the resources for your instance. The Notebook instance page on the Amazon SageMaker console shows a list of notebooks you’ve created. Wait for Status to change from Pending to InService. You can use this page to manage your notebook. Typically, we start the instance when we are actively using it, then stop it when finished.

  1. When the Status is InService, choose Open Jupyter to start using your notebook instance. A new webpage opens with the Jupyter file browser.

  2. The starting point of your notebook instance is an interface for browsing the files on your instance. The example we’ll use is in the reinforcement_learning folder. Select the reinforcement_learning link.

  3. We’ll use the rl_cartpole_coach example. Select the rl_cartpole_coach link at the top of the list of folders.

  4. The example has a folder of common scripts provided by the Amazon SageMaker team to simplify RL workflows, and an src folder with the specific code needed to complete this example. The file has a brief overview of the example, while the rl_cartpole_coach_gymEnv.ipynb notebook file is the entry point to the example. Select the rl_cartpole_coach_gymEnv.ipynb link to open the notebook file.

    The notebook opens in a new webpage.

  5. This notebook is set up to use the conda_python3 environment. Conda is a package and dependency manager for Python. The Amazon SageMaker notebook instances use Conda to offer several different Python environments. Choose Kernel, Change kernel, conda_tensorflow_p36 to choose the environment with Python 3.6 and TensorFlow.


  6. By default, this notebook is in a Not Trusted state. Choose Not Trusted to allow the notebook to execute the Python code embedded within.


Step 3: Work through the Notebook

The notebook is a tutorial/code mashup. To use the notebook, simply follow along, reading from top to bottom, while executing the code blocks in order.

  1. Imports. The standard way of importing Python modules into your working environment. You could do the same in a Python script or the command-line interpreter. Press the play button next to the code block to execute the code within, importing the modules into the notebook’s context.


  2. Setup S3 bucket. Creates an Amazon S3 bucket to store the results of your training.


  3. Define variables. Creates a prefix that’s used to identify this particular job when creating Amazon SageMaker training jobs or generating file or folder names in S3.


    Decide whether to perform training on this instance or to launch a new instance to do the training. You’ll often work locally, iterating toward a solution. Then, as you solidify your approach, you’ll move to larger training jobs running on more powerful and more expensive hardware. For example, you might initially train for 1,000 steps to test that your agent is converging on a solution. Once you’re confident in the training, you might run a 25,000-step job on a GPU-accelerated instance.
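
The two decisions in this step, naming the job and choosing where to train, can be sketched in plain Python. This is a minimal sketch and not the notebook's own code: the helper names, the timestamp format, the `rl-cartpole` prefix, and the `output` folder are hypothetical. The value `"local"` is the instance type the SageMaker Python SDK accepts for local mode, and `ml.m4.4xlarge` is the remote instance type used later in this tutorial.

```python
import time


def make_job_name(prefix, now=None):
    """Return a job name built from a prefix and a UTC timestamp.

    `now` is seconds since the epoch; None means the current time.
    """
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime(now))
    return f"{prefix}-{stamp}"


def choose_instance_type(local_mode, remote_type="ml.m4.4xlarge"):
    """Train on the notebook instance itself, or on a separate instance."""
    return "local" if local_mode else remote_type


def s3_output_path(bucket, prefix):
    """Return an S3 location under which training output could be stored."""
    return f"s3://{bucket}/{prefix}/output"
```

Keeping the local/remote switch in one place makes it easy to iterate locally with a short run and then flip a single flag for the longer job.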


  4. Create an IAM Role. Allows this notebook to perform AWS operations on your behalf, such as creating S3 buckets or launching new instances.


  5. Install docker for local mode. Ensures that the notebook instance has Docker installed, because the training happens inside a Docker container. Docker is a good way of packaging all the environment requirements and dependencies. Amazon SageMaker provides a variety of containers to give customers a broad selection of machine learning frameworks. For more information, see Amazon SageMaker Containers.


  6. Set up the environment. Doesn’t have a code section in this notebook because Cart-Pole is a built-in environment for OpenAI’s Gym framework. If you use Sumerian to create a custom simulation environment, you’d set up the environment in this section.
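
To give a feel for what an environment provides, here is a rough, dependency-free sketch of cart-pole dynamics. It is not Gym's code. The physical constants, the 12-degree and 2.4-unit termination limits, and the `lean_policy` heuristic are assumptions based on the commonly used formulation of this benchmark; the 200-step cap and the one-point-per-step reward are the figures this tutorial quotes.

```python
import math

GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
PUSH_FORCE = 10.0
TIME_STEP = 0.02                  # seconds per simulation step
MAX_ANGLE = 12 * math.pi / 180    # pole angle (radians) that ends the episode
MAX_POSITION = 2.4                # cart position that ends the episode
MAX_STEPS = 200


def step(state, action):
    """Advance one step. action: 0 = push left, 1 = push right."""
    x, x_dot, theta, theta_dot = state
    force = PUSH_FORCE if action == 1 else -PUSH_FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    # Simple Euler integration of positions and velocities.
    x += TIME_STEP * x_dot
    x_dot += TIME_STEP * x_acc
    theta += TIME_STEP * theta_dot
    theta_dot += TIME_STEP * theta_acc

    done = abs(x) > MAX_POSITION or abs(theta) > MAX_ANGLE
    return (x, x_dot, theta, theta_dot), done


def run_episode(policy, initial_state=(0.0, 0.0, 0.05, 0.0)):
    """Run one episode and return the total reward (one point per step)."""
    state, total_reward = initial_state, 0
    for _ in range(MAX_STEPS):
        state, done = step(state, policy(state))
        total_reward += 1
        if done:
            break
    return total_reward


def lean_policy(state):
    """Hand-written heuristic: push toward the side the pole is falling."""
    _, _, theta, theta_dot = state
    return 1 if theta + theta_dot > 0 else 0
```

Calling `run_episode(lean_policy)` returns the episode's score. A trained RL agent plays the role of `policy` here, mapping the four-number state to an action.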

  7. Configure preset for RL algorithm. Where you link the notebook to the Python code that guides the learning. For this notebook, the preset is stored in the src folder. The preset’s file name shows that we’re using the cartpole environment with the clipped PPO algorithm. RL is a rapidly evolving space, and new algorithms are invented and refined constantly. Clipped PPO is a good default choice for many RL problems.

    This and the following code cells that call pygmentize aren’t executing the code in the Python file. Instead, they are reprinting the source code in a pretty format to display as HTML in this notebook.
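
For readers unfamiliar with the algorithm named in this step, the "clipped" part of clipped PPO refers to its objective, which limits how far a single update can move the policy. The following is a minimal sketch of that objective for illustration only. It is not the preset's code: the function names are hypothetical, and the default `epsilon` of 0.2 is an assumed, commonly quoted value.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Clipped surrogate term for one sample.

    ratio: new-policy probability of the action divided by the old-policy
           probability of the same action.
    advantage: how much better the action was than expected.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio
    # outside the [1 - epsilon, 1 + epsilon] band.
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_clipped_objective(ratios, advantages, epsilon=0.2):
    """Average the clipped term over a batch; training maximizes this value."""
    terms = [clipped_surrogate(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)
```

With a positive advantage, a ratio of 1.5 earns no more credit than a ratio of 1.2, which is what keeps each policy update small.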

  8. Write the training code. The Python script contains the code that kicks off and manages the training.

  9. Train the model using Python SDK Script mode. Starts the training script, overriding some of the parameters. Pressing the play button on this section runs a training job either locally or as a new Amazon SageMaker job, depending on the values set earlier in the notebook. This is where the magic happens. Press the play button and wait for the output to finish.


    For this tutorial, we chose to train with a new Amazon SageMaker job running on an ml.m4.4xlarge instance. You can watch the progress of your training job on the Amazon SageMaker console. Select the training job to see extra information about it.


    When the job status changes from InProgress to Completed, return to the notebook instance page to continue the example.
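
If you would rather wait for the job from code than watch the console, the status check can be sketched as a small polling loop. This is an illustrative sketch, not part of the notebook: `wait_for_status` and its parameters are hypothetical, and the status source is passed in as a function so the loop itself stays independent of AWS.

```python
import time


def wait_for_status(get_status, target="Completed", failed=("Failed", "Stopped"),
                    poll_seconds=30.0, max_polls=240, sleep=time.sleep):
    """Poll get_status() until it returns `target`.

    Raises RuntimeError if a failure status is seen, and TimeoutError if
    `target` is not reached within `max_polls` checks.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"job ended with status {status!r}")
        sleep(poll_seconds)
    raise TimeoutError(f"status was not {target!r} after {max_polls} polls")


# One possible way to wire this to a real job (assumes boto3 is installed,
# credentials are configured, and job_name holds your training job's name):
#
#   import boto3
#   client = boto3.client("sagemaker")
#   wait_for_status(lambda: client.describe_training_job(
#       TrainingJobName=job_name)["TrainingJobStatus"])
```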

  10. Store the intermediate training output and model checkpoints. Where the training job writes data to the S3 bucket. The intermediate folder contains extra debug information that’s useful to check on the progress of your training. The code in this cell copies the S3 data to a folder on this instance, so that we can display images and plots in this notebook.

  11. Plot metrics for the training job. Matplotlib is a Python module commonly used by data scientists to inspect and plot data. In this cell, we plot the reward given in each episode. For cart-pole, the agent receives one point for each step of the simulation that the pole remains balanced, up to 200 steps. For this particular training session, you can see that in the first 100 or so episodes, the agent is trying different strategies to balance the pole, and mostly failing after around 20 steps. There are a few spikes of more successful episodes with 50 or 60 steps. Eventually these more successful episodes accumulate and the agent rapidly learns to balance, with episodes exceeding 100 steps and eventually reaching the winning condition of 200 steps.

    Running the training job again by going back to step 9 and playing the training cell again results in a new training job with different results per episode. It should still follow a similar pattern: unsuccessful episodes with small spikes of more successful ones, eventually converging to a series of successful episodes after around 100 to 150 episodes.
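
Per-episode rewards are noisy, so a trailing average can make the convergence pattern described in this step easier to see. The helpers below are a minimal sketch under assumed names, with a hypothetical default window of 10 episodes; they operate on a plain list of rewards rather than on the notebook's own data structures.

```python
def moving_average(rewards, window=10):
    """Trailing average of the last `window` rewards at each episode.

    Early episodes are averaged over however many rewards exist so far.
    """
    averages = []
    running_total = 0.0
    for i, reward in enumerate(rewards):
        running_total += reward
        if i >= window:
            running_total -= rewards[i - window]
        averages.append(running_total / min(i + 1, window))
    return averages


def first_episode_reaching(rewards, target, window=10):
    """Index of the first episode whose full-window average meets `target`.

    Returns None if the target is never reached.
    """
    for i, average in enumerate(moving_average(rewards, window)):
        if i + 1 >= window and average >= target:
            return i
    return None
```

Comparing the value of `first_episode_reaching` across repeated training runs gives a rough, single-number view of how quickly each run converged.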


  12. Visualize the rendered GIFs. The OpenAI Gym environments contain routines to draw the results of the simulation using Python. During training, a .gif image is rendered for each step of an episode. This section gathers up all the rendered images and generates a new animated .gif for the episode.

    Here is the result of episode 194 that had a successful score of 200.

    animated gif

  13. Evaluation of the RL Models. Loads the model that’s been trained and evaluates it. This runs either locally or on a new instance. You can monitor the progress of the new instance on the Training jobs page on the Amazon SageMaker console.

    Looking at the results here, we see that the current model is pretty good, with an average total reward of 172.17. The usual definition of a solved model for this version of cart-pole is an average reward of at least 195 over 100 consecutive episodes, so this model could probably use a bit more training.
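
The check described in this step amounts to averaging episode rewards and comparing the result to a threshold. A minimal sketch follows, with hypothetical function names. The default threshold of 195 is the figure quoted in this step, and the number of episodes averaged over is left as a required parameter rather than assumed.

```python
def average_reward(episode_rewards):
    """Mean total reward across a list of evaluation episodes."""
    return sum(episode_rewards) / len(episode_rewards)


def is_solved(episode_rewards, window, threshold=195.0):
    """True if any run of `window` consecutive episodes averages >= threshold."""
    if len(episode_rewards) < window:
        return False
    return any(
        average_reward(episode_rewards[start:start + window]) >= threshold
        for start in range(len(episode_rewards) - window + 1)
    )
```

An evaluation that averages 172.17, as in this tutorial's run, falls short of the threshold, which matches the conclusion that the model needs more training.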


  14. Model deployment. Creates a hosted endpoint that can be used to provide access to your model. This section deploys the model and demonstrates how you would use it. You give the endpoint the state of the simulation, and it will return the action to take next.


    The endpoint returns the action as probabilities. In the first test, with the system in the state of the cart at (0,0) and the pole tipped to the right and moving to the right, the endpoint returns an 84% probability that moving right is the correct action, versus a 16% probability that moving left is the correct action.

    The second example reverses the state. With the pole tipped and moving to the left, the returned probabilities are reversed, with 88% probability to move left versus 12% probability to move right.

    You can inspect your endpoint on the Endpoints page of the Amazon SageMaker console.
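
Because the endpoint returns probabilities rather than a single action, the caller still has to pick one. A simple approach, sketched here, is to take the most probable action. The function name is hypothetical, and the ordering of the list (left first, then right) is an assumption that follows the usual cart-pole action numbering; check the actual response format of your endpoint before relying on it.

```python
ACTIONS = ("left", "right")  # assumed order: index 0 = left, index 1 = right


def choose_action(probabilities, actions=ACTIONS):
    """Return the action with the highest probability."""
    if len(probabilities) != len(actions):
        raise ValueError("expected one probability per action")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return actions[best_index]


# The two endpoint tests described in this step, under the assumed ordering:
first_test = choose_action([0.16, 0.84])   # pole tipped and moving right
second_test = choose_action([0.88, 0.12])  # pole tipped and moving left
```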


  15. Clean up endpoint. Removes the deployed endpoint. Because this is a toy example with little real-world utility, we don’t want to keep the endpoint running as an ongoing service that incurs charges.


  16. Now that we’ve explored the example, go back to the Amazon SageMaker console and Stop the notebook. To explore the other examples, start the notebook again. Stopping the notebook between uses will prevent billing for resources that aren’t actively used.


In this tutorial, you learned how to use Amazon SageMaker for reinforcement learning using the cart-pole example provided by

Now you should know how to start and stop notebook instances from the Amazon SageMaker console, how to use a Jupyter notebook to orchestrate your RL case, and how to evaluate and deploy your trained model.

Now that you’ve completed the first part of this tutorial, try the next one: Using Amazon Sumerian with Amazon SageMaker as a Simulation for Reinforcement Learning, part 2.


© 2019 Amazon Web Services, Inc or its affiliates. All rights reserved.