Reinforcement learning in self-driving cars competition

How our student team achieved a top 1% ranking in an autonomous racing competition

By Daniel Gonzalez, Natalia Korchagina & Marc Cervera

In this article, we will look at how our Master in Business Analytics prepared us for this project, how an autonomous car works internally, how we took part in the competition, and how we managed to strengthen Esade's brand in the data science field.

Before proceeding further, let's briefly explain our capstone project.

Our capstone project

AWS DeepRacer is an autonomous racing car that can be trained on a virtual track in Amazon Web Services (AWS) using a machine learning technique called reinforcement learning, which we explain below. For readers unfamiliar with it, AWS is the largest provider of cloud services in the world. Once a model is fully trained, it can be entered in virtual races against developer teams from around the world.

Reinforcement learning as a machine learning paradigm has become well known for its successful applications in robotics, gaming (AlphaGo is one of the best-known examples) and self-driving cars. Its range of applications keeps widening, which is drawing increasing attention from the expert community, and there are already various industrial applications (such as energy savings at Google).

AWS DeepRacer car (Image credits: Amazon)

Given the growing importance of reinforcement learning, we were motivated to learn more. Creating and training models in AWS DeepRacer, experimenting with the algorithm and its hyperparameters, and entering virtual races as part of our capstone project was a great opportunity to apply the knowledge and skills we gained during our master's programme at Esade.

Our team took part in a special event organised in May 2020 by AWS in collaboration with Formula One, during which developers from around the world competed against F1 professionals. The competition took place on a 1:18 scale virtual replica of the official Spanish Grand Prix track.

How Esade prepared us for the capstone project

It might be hard to believe, but before we started our Esade Master's in Business Analytics, none of us knew how to code, nor did we have any experience in machine learning or deep learning. Our year at Esade was fundamental for building our expertise in the field. The variety of courses in the programme provided a perfect combination of theory and practical assignments, and we learned Python, artificial intelligence (AI) and data analytics concepts in less than nine months.

The first AI course, taught by Esteve Almirall, gave us an understanding of the basic tools of supervised and unsupervised learning, data pre-processing and Python. More advanced algorithms and the application of deep learning were covered in the second AI course, led by Marc Torrens. This knowledge helped us develop our DeepRacer models, especially in writing the algorithm for the model, choosing the hyperparameters and the neural network structure, analysing model performance and detecting bottlenecks.

The Esade team competing during the virtual race (Image: Daniel Gonzalez)

AI was complemented by courses on cloud computing and infrastructure. These focused mainly on the basic concepts of data science and cloud infrastructure, as well as on web scraping tools, which we found helpful for automating some tasks (such as entering the car in races). The cloud infrastructure course gave us the opportunity to explore some AWS services in depth and to learn how to build AWS infrastructure. This was very useful because the entire project was centred on AWS, where we created and trained our models, ran our experiments and entered the competitions.

The capstone project also required considerable work and study, as reinforcement learning is not covered in detail in the master's programme, so we had to learn it from scratch. But we were not alone in this challenge: throughout the project we were supported by Esade professors, whom we consulted whenever we had doubts. Francesc Rossel and Marc Torrens gave us insightful feedback on our work in progress that guided our next steps. Jordi Nin Guerro and his colleague from the University of Barcelona, Santi Segui, advised us on reinforcement learning and reviewed our technical papers.

We are thankful for the support and academic experience that Esade gave us; without it, we would probably not have considered entering the AWS DeepRacer competition, let alone achieved these results.

How reinforcement learning works in autonomous racing

To understand how we competed in the autonomous driving competition, we first need a brief introduction to the inner workings of the car. What makes a car autonomous is an algorithm that "tells" the car which speed and direction to choose at each location on the track. The algorithm can be static or dynamic. A static algorithm has pre-made decisions for the car to execute at each location on the circuit.


The limitation of this approach is that it does not generalise to other racing circuits and relies heavily on the developer's racing knowledge. That is why researchers have developed dynamic algorithms that automatically learn how to drive in changing conditions. Learning-based algorithms of this kind are also used in self-driving cars such as Tesla's.
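
The static approach described above can be sketched as a simple lookup table. This is a minimal illustration, assuming a hypothetical circuit described by numbered waypoints; `STATIC_PLAN` and `static_action` are made-up names and the numbers are arbitrary.

```python
# A minimal sketch of a *static* driving algorithm, assuming a hypothetical
# circuit described by numbered waypoints. Every decision is written in advance.

# Hypothetical plan: waypoint index -> (steering angle in degrees, speed in m/s)
STATIC_PLAN = {
    0: (0.0, 3.0),   # straight: full speed
    1: (0.0, 3.0),
    2: (15.0, 1.5),  # curve: steer and slow down
    3: (15.0, 1.5),
    4: (0.0, 3.0),   # back on the straight
}


def static_action(plan, waypoint):
    """Look up the pre-made action for a waypoint; None if the plan never covered it."""
    return plan.get(waypoint)


print(static_action(STATIC_PLAN, 2))  # (15.0, 1.5)
print(static_action(STATIC_PLAN, 7))  # None: a longer circuit falls outside the plan
```

The second call shows the generalisation problem in miniature: the plan only knows the circuit it was written for.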

For the competition, we used a reinforcement learning algorithm. To explain the basics of how this approach works, consider the four steps to success described by Anthony Robbins:

  1. Decide what you want
  2. Take action
  3. Notice what is working or not
  4. Change your approach until you achieve what you want

Reinforcement learning works similarly.

1. Decide what you want

The first step is to set the objective that you want the car to accomplish. In this case, the goal is to finish the lap as quickly as possible. To achieve it, the algorithm chooses among different actions (combinations of steering and speed). Now imagine putting a novice driver at the wheel and telling her to finish the lap as quickly as possible. Because she has never driven before, she will not know where to start. To overcome this, data scientists assign each trial a score that measures how good it was relative to the end goal. This score is calculated by the reward function and can be, for example, the percentage of the track completed.
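
A reward function of the kind described above might look like the sketch below. AWS DeepRacer reward functions are written in Python as a `reward_function(params)` that returns a number, and `progress` and `all_wheels_on_track` are among the input parameters DeepRacer provides. The scoring rule itself is only an illustrative assumption.

```python
def reward_function(params):
    """Score each step by the share of the lap completed so far.

    `params` is the dictionary AWS DeepRacer passes to a reward function;
    this sketch only reads two of its keys:
      - "progress": percentage of the track completed (0 to 100)
      - "all_wheels_on_track": False once the car leaves the track
    """
    if not params["all_wheels_on_track"]:
        # Near-zero (rather than exactly zero) reward for leaving the track.
        return 1e-3
    # Map 0-100 % of the lap to a reward between 0.0 and 1.0.
    return float(params["progress"]) / 100.0
```

For example, a car that is still on the track halfway round the lap, `{"all_wheels_on_track": True, "progress": 50.0}`, receives a reward of 0.5.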

2. Take action

After these preliminary steps, the algorithm (our driver) tries to drive the circuit many times. Starting from the same place, the car chooses one action after another until it finishes the lap or goes off the track. Because these are its first trials, the algorithm may drive the car in seemingly random ways. Did you not behave in the same way when you started learning to drive?
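
To make a single trial concrete, here is a toy simulation under loudly simplified assumptions: a hypothetical circuit made of straights and curves, two actions, and a rule that taking a curve at "fast" sends the car off the track. `TRACK`, `ACTIONS`, `run_trial` and `random_policy` are made-up names; this is not DeepRacer's simulator.

```python
import random

# Hypothetical toy circuit: each segment is either a straight or a curve.
TRACK = ["straight", "straight", "curve", "straight", "curve", "straight"]
# Hypothetical actions: name -> seconds needed to cross one segment.
ACTIONS = {"fast": 1.0, "slow": 2.0}


def run_trial(policy, rng):
    """Drive from the start line until the lap is finished or the car leaves the track.

    Returns the list of (segment, action) pairs taken and whether the lap was completed.
    In this toy model, taking a curve at "fast" sends the car off the track.
    """
    history = []
    for segment in TRACK:
        action = policy(segment, rng)
        history.append((segment, action))
        if segment == "curve" and action == "fast":
            return history, False  # off the track: the trial ends early
    return history, True


def random_policy(segment, rng):
    """An untrained driver: pick any action, ignoring the segment."""
    return rng.choice(list(ACTIONS))


rng = random.Random(0)
for trial in range(3):
    history, finished = run_trial(random_policy, rng)
    print(f"trial {trial}: {len(history)} segments driven, lap finished: {finished}")
```

An untrained driver like `random_policy` will often leave the track at the first curve, which is exactly the behaviour the next step learns from.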

3. Notice what is working or not

After a batch of hundreds of trials, the algorithm stops and compiles the experience it has gathered. In other words, it looks at the score of each trial and "learns" which actions lead to better results. For example, if the car is in a curve, the algorithm learns that slowing down is the best option because past trials showed that taking curves at high speed leads to crashes.
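
One simple way to "compile the experience" is to credit each action with the total score collected after it, then average over trials (a Monte Carlo estimate). The sketch below assumes trials are logged as hypothetical `(situation, action, reward)` steps. The two trials and their reward values are invented purely for illustration.

```python
from collections import defaultdict


def compile_experience(trials):
    """Average, for each (situation, action), the score collected from that point on.

    Each trial is a list of (situation, action, reward) steps. Crediting an action
    with everything that followed it is a simple Monte Carlo estimate.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for trial in trials:
        score_to_go = 0.0
        # Walk backwards so each step is credited with the rewards from that step on.
        for situation, action, reward in reversed(trial):
            score_to_go += reward
            totals[(situation, action)] += score_to_go
            counts[(situation, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def best_action(values, situation, actions):
    """Pick the action with the highest average score in this situation."""
    return max(actions, key=lambda a: values.get((situation, a), float("-inf")))


# Hypothetical logged trials (illustrative numbers, not competition data).
trials = [
    [("straight", "fast", 1.0), ("curve", "fast", 0.0)],  # took the curve at speed and crashed
    [("straight", "fast", 1.0), ("curve", "slow", 0.5), ("straight", "fast", 1.0)],  # slowed down
]
values = compile_experience(trials)
print(values[("curve", "fast")], values[("curve", "slow")])  # 0.0 1.5
print(best_action(values, "curve", ["fast", "slow"]))        # slow
```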

4. Change your approach until you achieve what you want

The car repeats steps 2 and 3 many times, learning what worked and what did not and applying these insights in future trials. As a result, it drives better and increasingly chooses actions that lead to higher scores. Eventually, after some hours of training, the car is hopefully able to complete a fast lap.

This is a general overview of the reinforcement learning technique applied in AWS DeepRacer. Our challenge was to train the dynamic algorithm to make the model fast and reliable.

The Esade team ready to compete in the autonomous virtual race (Image: Daniel Gonzalez)

Competing in the autonomous virtual race

In the time trial category, our team finished 12th out of nearly 1,300 participants.

So how did a team with a business background manage to beat so many professional software developers? We used what we had learned in our Esade classes and set up a rigorous trial-and-error approach to continuously improve our results. We identified several areas that needed to be optimised to train a fast and reliable racing car model:

  1. Calculating the optimal racing line and speed
  2. Defining possible actions that the car can take
  3. Rewarding the car for good behaviour
  4. Tweaking the inner workings of the reinforcement learning algorithm
  5. Analysing training logs to learn from past mistakes

This placed us in the top 1% of the F1 time trial category. In addition, we entered our model in other races with cash prizes and won $900 in total. That is a lot of money for students like us!

Articles published to share our insights

We decided to share our insights with the community in two articles in Towards Data Science, one of the most prestigious online publications in the data science field, with about as many daily readers as La Vanguardia, one of the main newspapers in Catalonia.

The first article explains reinforcement learning and its application in autonomous racing in an easy-to-understand way. It also covers the specific neural-network-based algorithms involved, which makes the article appealing to both novices and experts in autonomous racing.

The second article is an advanced guide to AWS DeepRacer and summarises the insights we gathered during the competition. This article was well received in the DeepRacer community and helped many developers improve their strategies.

We are convinced that these articles strengthened the personal brands of our team members and Esade's brand, helping us find better jobs in the tech industry and helping Esade attract more international talent to its Master in Business Analytics.


Our experience with AWS DeepRacer turned out to be exciting and valuable for our team. We successfully combined coding skills and knowledge of machine learning (which we acquired during the MSc in Business Analytics at Esade) with essential theory about reinforcement learning. We iteratively experimented with all the components of DeepRacer, accumulating 2,950 hours of training, and combined a wide range of technical and analytical tools (some of which we designed ourselves). This approach helped us achieve top positions on the leaderboards of all the competitions we entered, especially the F1 event, in which we achieved a top 1% ranking.

We are delighted to have been part of this project, as we learnt many valuable lessons, helped other developers around the world, and strengthened Esade's brand in the data science field.

All written content is licensed under a Creative Commons Attribution 4.0 International license.