Inside AWS DeepRacer Garage: Fine-tuning the performance of your model
How cool is it to watch your AWS DeepRacer car autonomously navigate every corner and turn of the circuit, guided by your Reinforcement Learning model? Now it's time to fine-tune that model and try to clock the best time.
If you are new to AWS DeepRacer, visit my previous article on “Get Started with AWS DeepRacer: Create, Train, Race your first model”. Let’s get our hands greasy now!
In this tutorial, we will dive deeper and discuss the Action Space, the Reward Function, and Hyperparameter Tuning.
Action Space
The Action Space is the set of actions available to the agent, defined by the Maximum Speed, Speed Granularity, Maximum Steering Angle, and Steering Angle Granularity. For each input image from the sensor (the front-facing camera), the model picks one of these actions and then evaluates the reward obtained for that particular action.
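To make this concrete, here is a rough sketch of how those four settings expand into a discrete list of (steering angle, speed) actions. The function name and exact enumeration are illustrative assumptions, not the console's internals, but they mirror the pattern you see in the Action list in the Garage.

```python
# Hypothetical sketch of how the four Action space settings expand into
# a discrete action list (illustrative only, not the console's code).
def build_action_space(max_speed, speed_granularity,
                       max_steering_angle, steering_granularity):
    """Return every (steering_angle, speed) pair the agent can choose."""
    # Speeds are evenly spaced fractions of the maximum speed.
    speeds = [max_speed * (i + 1) / speed_granularity
              for i in range(speed_granularity)]
    # Steering angles are spread symmetrically from -max to +max.
    steps = steering_granularity - 1
    angles = [-max_steering_angle + 2 * max_steering_angle * i / steps
              for i in range(steering_granularity)]
    return [(angle, speed) for angle in angles for speed in speeds]

actions = build_action_space(max_speed=3.0, speed_granularity=2,
                             max_steering_angle=30.0, steering_granularity=3)
# 3 steering angles x 2 speeds = 6 discrete actions
```

Notice how quickly the action list grows: higher granularity gives the agent finer control but also more actions to explore during training.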
In the last tutorial, we went with the default DeepRacer vehicle. To define our own action space, we will create a new DeepRacer vehicle.
On your left navigation pane, under Reinforcement Learning Tab, click Your Garage. You will be redirected to another page where we can build our own vehicle. Click Build New Vehicle.
Step 1: Mod specifications
Since we are only focusing on Time Trials, the only sensor needed is the front-facing camera. Select Camera and then proceed to the next step.
Step 2: Action space
Granularity refers to the degrees of freedom of the DeepRacer model. Change the maximum and granularity values and notice how the action list gets updated.
This action list defines the behaviour of the model on the track.
Step 3: Personalization
You can personalize your vehicle by giving it a unique name and choosing the body colour of the car.
Reward Function
The reward function and the action space go hand in hand. The reward function must be compatible with the values specified in the action list.
Last time, we selected one of the default reward functions and trained the model. Though that's a good way to start, to clock the best time we need to understand the vehicle's input parameters and design our own reward function.
The argument passed to the reward function is ‘params’, a Python dictionary object.
I recommend reading the official documentation to understand what each parameter means.
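As a concrete starting point, here is a minimal reward function in the shape DeepRacer expects: it takes the `params` dictionary and returns a float. This variant rewards the car for staying close to the center line, similar in spirit to the console's "follow the center line" sample; the band widths are illustrative values you should tune.

```python
# Minimal DeepRacer-style reward function: rewards staying close to
# the center line. Band widths (0.1, 0.25, 0.5) are illustrative.
def reward_function(params):
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three bands around the center line, narrowest first.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0          # very close to the center line
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3         # likely about to go off track

    return float(reward)
```

When you write your own, combine several `params` entries (speed, steering angle, progress, waypoints) rather than relying on a single one; the reward function and the action space must agree, e.g. don't reward speeds your action list cannot produce.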
Hyperparameter Tuning
Hyperparameters are variables to control your reinforcement learning training. They can be tuned to optimize the training time and model performance.
We can fine-tune seven hyperparameters. Let's look at how each one influences model training. Hyperparameter tuning is an iterative process of improvement through trial and error.
Gradient descent batch size: The batch is a subset of an experience buffer that is composed of images captured by the camera mounted on the AWS DeepRacer vehicle and actions taken by the vehicle.
Number of epochs: The number of passes through the training data to update the neural network weights during gradient descent.
Learning rate: The learning rate controls how much a gradient-descent (or ascent) update contributes to the network weights.
Entropy: The added uncertainty helps the AWS DeepRacer vehicle explore the action space more broadly.
Discount factor: The discount factor of 0 means the current state is independent of future steps, whereas the discount factor 1 means that contributions from all of the future steps are included.
Loss type: The type of objective function to update the network weights.
The number of experience episodes between each policy-updating iteration: The size of the experience buffer used to draw training data from for learning policy network weights.
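Putting the seven together, here is an illustrative configuration to use as a baseline. The values are close to the console's defaults at the time of writing, but treat them as assumptions and check your console for the current defaults and allowed ranges; the dictionary keys here are my own labels, not an official API.

```python
# Illustrative baseline for the seven tunable hyperparameters.
# Values approximate the console defaults; keys are informal labels.
hyperparameters = {
    "batch_size": 64,            # gradient descent batch size
    "num_epochs": 10,            # passes over the training data per update
    "learning_rate": 0.0003,     # how much each gradient step moves the weights
    "entropy": 0.01,             # added uncertainty to encourage exploration
    "discount_factor": 0.999,    # 0 = ignore future steps, 1 = count them all
    "loss_type": "huber",        # or "mean squared error"
    "num_episodes_between_training": 20,  # experience buffer size per update
}
```

A practical tuning loop: change one hyperparameter at a time (for example, lower the learning rate if the reward graph oscillates, or lower the discount factor if the car over-plans for distant corners), retrain, and compare the reward graphs between iterations.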
Training and Evaluating your model
We need to specify a stopping time for model training, but there is a catch here. If we train the model for too little time, there is a high chance of underfitting: the car may simply not perform well.
If we train the model for too long, it may overfit: it may perform really well on the track it was trained on, but poorly on other tracks.
To achieve generalization, we need to set an optimal number of training hours so that the model converges. Fellow developers report that around 4 hours of training yields good results.
Keep an eye on the evaluation simulation and the model's logs, so that you can improve your future models.
Now we are all set to conquer the checkered flag!
Additional Resources:
AWS DeepRacer e-learning course for free: https://www.aws.training/Details/eLearning?id=32143
For source-code and get started pack, see https://github.com/Vivek0712/aws-deepracer
For more information about AWS DeepRacer, see https://aws.amazon.com/deepracer/.
For more information about AWS Training and Certification, see https://aws.amazon.com/training/.
To troubleshoot and collaborate with other AWS DeepRacer developers, see https://forums.aws.amazon.com/forum.jspa?forumID=318.