Reinforcement Learning: A Guide To Training Intelligent Agents

Reinforcement Learning is a machine learning methodology for training 'intelligent' agents by exposing them to an environment and rewarding desired actions. RL algorithms use different approaches to optimize the agent's policy or value function and improve its decision-making capabilities over time.

Rahana A Kadir
May 31 2023 – 6 min read

Reinforcement learning is one of the three basic methodologies for training machine learning models, alongside supervised learning and unsupervised learning.

It is a branch of machine learning concerned with intelligent agents and how they learn. In other words, it is a feedback-based machine learning technique.

The agents are 'intelligent' in the sense that they have some knowledge about the task they are asked to perform. So, when they are exposed to an environment, they explore it and take actions.

Next, they observe the outcomes, retaining actions that led to good results and avoiding those that led to bad ones. This is because the training method rewards desired actions and penalizes undesired ones.

In this way, the model learns how to perform a task more efficiently and effectively, thus improving its performance over time. It is typically used for sequential, long-term decision-making tasks such as gaming.

In this post, we will discuss:

  • Examples of reinforcement learning
  • Achievements of reinforcement learning
  • Concept of reinforcement learning
  • Types of reinforcement learning

Before we proceed further, let’s go through the history of reinforcement learning in brief.

History of Reinforcement Learning

Reinforcement learning for machines dates back to the 1950s. The idea was derived from the reinforcement memory systems of animals, which remember tasks that were beneficial or rewarding as well as those that were harmful or costly.

So, animals could be tamed or taught to perform desired tasks by rewarding correct actions and punishing incorrect ones. In this way, animals could perform complex tasks and learn to do them better over time.

Similarly, the intelligent agents of machines are trained to perform complex tasks by doing them repeatedly. In the process, they discover right and wrong ways, finally settling on the optimal way themselves.

Example of Reinforcement Learning

Suppose an agent knows how to play chess; that is, it is aware of all the chess pieces' names, the rules, and the legal moves. When the agent is put to play against a human chess expert, it makes moves and notices the next move made by the human expert. It also takes into account whether it lost one of its chess pieces after that action.

If it lost a chess piece, that was a ‘penalizing’ move and the agent learns not to do it again in the same situation. On the contrary, if it did not lose a chess piece, that was a ‘rewarding’ move and in the future, the agent will use it in similar situations.

Like this, iterations of the game help the agent learn the best moves over time. Thus, the agent utilizes reinforcement learning to become an expert in the game of chess.

Another excellent example of reinforcement learning is the interaction of a robot with humans to learn how to walk and talk. It happens by experiential learning – performing actions and noting the error.

Achievements of Reinforcement Learning

In 1992, Gerald Tesauro's TD-Gammon, a reinforcement learning agent, played backgammon at nearly world-champion level. It was among the first systems to successfully integrate neural networks into reinforcement learning.

Later, in 2013, DeepMind (a company acquired by Google in 2014) introduced its DQN algorithm, which outperformed existing approaches on several Atari 2600 games. In 2016, DeepMind's AlphaGo system beat world champion Lee Sedol at the game of Go.

All this demonstrated the great capabilities of reinforcement learning and captured the AI industry's attention. Companies started investing heavily in the area, and studies and experiments on applying it to day-to-day problems went into full swing.

The result? It was impressive.

Reinforcement learning models were managing the cooling systems in Google’s data centers. They adjusted and controlled the temperature automatically based on environmental conditions.

Besides this, reinforcement learning agents have also been helpful in understanding protein folding. The problem is quite complex, and learning-based methods make it easier to determine the folded structures or the stages of folding.

Today, reinforcement learning is a key component in autonomous vehicles, traffic control systems, robotics, industrial automation, targeted advertisements, and much more.

Concept of Reinforcement Learning

There are 7 main elements of Reinforcement Learning algorithms:

  1. Environment
  2. Agent
  3. Episodic Task
  4. Continuing Task
  5. Policy
  6. Discount Factor
  7. Value

The Environment refers to everything that affects the system, its inputs, and its outputs. It can be virtual, such as a gaming setup, or real, such as the building a robot walks through.

The Agent is the machine learning model with prior knowledge about the environment and behavior rules. When placed in the environment, it becomes the learner and decision-maker. It interacts with the environment by taking actions in it and in return getting back an observation and a reward. 

The cycle starts with the agent receiving an initial observation from the environment. Drawing on its prior knowledge, it chooses an action and performs it in the environment. The environment responds with a new observation and a reward, which the agent processes before taking its next action.

With every new observation and reward from the environment, the agent processes them and acts again. In this way, the cycle keeps on repeating.
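To make the cycle concrete, here is a minimal Python sketch of the loop. The `ToyEnvironment` and `random_policy` below are hypothetical stand-ins invented for illustration, not part of any real library:

```python
import random

class ToyEnvironment:
    """A made-up one-dimensional world: the agent steps left or right,
    earning a reward of +1 when it reaches position 3, which ends the task."""

    def reset(self):
        self.position = 0
        return self.position  # the initial observation

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.position += action
        done = self.position == 3
        reward = 1.0 if done else 0.0
        return self.position, reward, done  # observation, reward, finished?

def random_policy(observation):
    """A placeholder policy that explores by acting randomly."""
    return random.choice([-1, 1])

def run_cycle(env, policy, num_steps):
    """The interaction cycle: observe, act, receive a reward, repeat."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(num_steps):
        action = policy(observation)                  # agent decides
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward                        # agent collects the reward
        if done:
            observation = env.reset()                 # start over once finished
    return total_reward
```

A policy that always steps right, `run_cycle(ToyEnvironment(), lambda obs: 1, num_steps=9)`, finishes the task three times and collects a total reward of 3.0.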

If this cycle stops at a particular point, it is an episodic task. For example, a board game ends when you win, so each complete game the agent plays is one episode of the task. The agent has to repeat many episodes to become proficient.

If the cycle goes on forever, it is a continuing task, for example, controlling an oil refinery. The agent keeps on learning and performing without needing to restart from the beginning.

In the entire reinforcement learning process, the agent’s goal is to collect as many rewards as possible. So, it takes the most favorable actions in the environment based on its prior knowledge and recent learning. 

This reward-oriented behavior of an agent in an environment becomes its Policy. It helps the agent to maximize the rewards by delivering the desired performance.

Besides this, reinforcement learning uses a discount factor. When you play chess, you can look three or four moves ahead and judge how good or bad a move is right now. The same idea applies here: the discount factor weights rewards further in the future less heavily, effectively limiting how far ahead the agent looks.
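As a concrete illustration (an assumption here, since the article gives no formulas): the discount factor gamma multiplies a reward t steps in the future by gamma to the power t, so distant rewards count for less. A gamma near 0 makes the agent short-sighted; a gamma near 1 makes it far-sighted.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum a sequence of rewards, weighting a reward t steps ahead by gamma**t."""
    return sum(reward * gamma ** t for t, reward in enumerate(rewards))

# With gamma = 0.5, a reward two steps away counts only a quarter as much:
# 1 + 1 * 0.5 + 1 * 0.25 = 1.75
```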

The agent learns the actual return only after making its moves, yet to choose a move it must estimate that return in advance. It does so by examining the environment and taking exploratory moves in it.

In this way, the agent learns what the rewards will be and can estimate them in the future (based on the observation). This estimated return that the agent expects to receive is the Value.

Let us assume that an agent receives an observation. There are several actions the agent can take, and those actions can lead to different observations. The policy determines which actions are the most promising.

Next, the agent estimates the value or expected returns for each of these leading observations. Based on the rewards, it chooses the action and performs it. This cyclic process continues till the end. 

The goal of the agent, as already mentioned, is to collect as many rewards as possible. With the rewards it has gathered, it can calculate the return for every observation and perform even better in the environment.

During the reinforcement learning process, whenever it finds a value deviating from the actual return, the agent updates the value. The agent continues to test observations again and again to optimize its performance. It keeps learning the values and updating the policy. This is the fundamental principle of reinforcement learning.
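This update step can be sketched as a simple incremental correction that nudges the stored value toward each newly observed return; the rule and learning rate below are illustrative assumptions, not something the article specifies:

```python
def update_value(value, observed_return, learning_rate=0.1):
    """Nudge the stored value estimate toward the return actually observed.
    If the estimate already matches the return, the update changes nothing."""
    return value + learning_rate * (observed_return - value)

# Hypothetical illustration: an estimate that starts at 0 and repeatedly
# observes a return of 10 converges toward 10.
v = 0.0
for _ in range(200):
    v = update_value(v, observed_return=10.0)
```

Repeated over many visits to the same observation, the estimate settles near the average return actually received, which is exactly the "keep testing and keep updating" behavior described above.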

Types of Reinforcement Learning

Reinforcement learning is classified into two major types:

  1. Positive Reinforcement
  2. Negative Reinforcement

Positive Reinforcement

Positive reinforcement learning rewards actions so that the desired behavior is repeated in the future. It has a favorable effect on the agent's behavior and raises the value of those actions.

The benefit of using positive reinforcement is that it can sustain long-lasting changes. However, too much positive reinforcement may cause the agent to over-commit to rewarded behaviors, which hampers further learning, somewhat like overfitting.

Negative Reinforcement

Negative reinforcement learning increases the likelihood that a desired behavior will recur by steering the agent away from undesirable outcomes. The agent notes the actions that are penalized and avoids them in the future.

Depending on the circumstance and behavior, it may be more effective than positive reinforcement. For the observations in the environment, the agent deliberately performs actions that lead to positive rewards.

Popular Algorithms of Reinforcement Learning

Some of the popular reinforcement learning algorithms are:

  1. Q-learning
  2. State-Action-Reward-State-Action (SARSA)
  3. Deep Q-Network (DQN)
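As an illustration of the first of these, here is a minimal tabular Q-learning sketch. The five-state `ChainEnv` and the hyperparameter values are made up for demonstration:

```python
import random
from collections import defaultdict

class ChainEnv:
    """A made-up five-state chain: start at state 0; reaching state 4 pays +1."""
    actions = (1, -1)  # step right or left

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, self.state + action)  # a wall blocks the left end
        done = self.state == 4
        return self.state, (1.0 if done else 0.0), done

def q_learning(env, episodes=300, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: learn a value for every (state, action) pair."""
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: usually exploit the best known action, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Move Q(s, a) toward the reward plus the discounted best next value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q
```

After training, the greedy action in each state is typically to step right, and the learned value of stepping right from state 3 approaches the true value of 1.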

Conclusion

Reinforcement Learning is a very interesting and useful machine learning method for a variety of use cases including gaming, business strategy planning, trading, self-driving cars, etc. 

It places the agent (machine learning model) in the environment to interact, explore, and learn along the way. Observations lead to actions in the environment and rewards from the environment. These rewards determine the value of the actions and help the agent estimate the reward for similar observations in the future.

The process doesn't involve any human intervention and is beneficial for tasks where training data is scarce. However, the model takes time to learn via reinforcement learning because of the number of observations required and the delays in reward estimation.

At AnuBrain, we work on AI and machine learning projects. If you have any requirements or ideas, contact us today.

 

I am a passionate Machine Learning Engineer with two Master's degrees in Computer Science: MCA and M.Sc. (ML/AI). I am proficient in image processing using computer vision, machine learning, and statistical modelling algorithms and techniques for identifying patterns and extracting valuable insights.
