WHAT IS REINFORCEMENT LEARNING?

Reinforcement learning is a type of machine learning, closely related to dynamic programming, that trains algorithms using a system of reward and punishment.
A reinforcement learning algorithm, or agent, learns by interacting with its environment. The agent receives rewards for performing correctly and penalties for performing incorrectly. It learns without human intervention by trying to maximize its total reward and minimize its penalties.
As an agent, such as a self-driving car or a program playing chess, interacts with its environment, it receives a reward depending on how well it performs, for example for driving to the destination safely or winning a game. Conversely, the agent receives a penalty for performing incorrectly, such as going off the road or being checkmated.
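The reward-and-penalty loop described above can be sketched with tabular Q-learning, one standard RL algorithm (the text itself does not name a specific one). This is a minimal illustration under invented assumptions: the seven-cell "road", the +1/-1 rewards, the names `step`, `train` and `greedy_policy`, and the hyperparameters are all hypothetical and untuned. The agent is never told which way to drive; it only sees rewards and penalties and adjusts its estimates accordingly.

```python
import random

# Hypothetical toy "road" with cells 0..6; the agent starts in cell 3.
# Reaching cell 6 (the destination) earns a reward of +1; leaving the road
# at cell 0 earns a penalty of -1. Either outcome ends the episode.
N_CELLS = 7
START, GOAL, OFF_ROAD = 3, 6, 0
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == OFF_ROAD:
        return nxt, -1.0, True
    return nxt, 0.0, False


def argmax_random_ties(values, rng):
    """Index of the largest value, breaking ties at random."""
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2,
          max_steps=100, seed=0):
    """Tabular Q-learning: the agent learns only from rewards and penalties."""
    rng = random.Random(seed)
    # q[s][i]: estimated discounted return of taking ACTIONS[i] in cell s.
    q = [[0.0, 0.0] for _ in range(N_CELLS)]
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                i = rng.randrange(len(ACTIONS))
            else:
                i = argmax_random_ties(q[state], rng)
            nxt, reward, done = step(state, ACTIONS[i])
            # Move the estimate toward reward + discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][i] += alpha * (target - q[state][i])
            if done:
                break
            state = nxt
    return q


def greedy_policy(q):
    """Best learned action for every non-terminal cell."""
    return {s: ACTIONS[0] if q[s][0] > q[s][1] else ACTIONS[1]
            for s in range(1, N_CELLS - 1)}
```

Calling `greedy_policy(train())` should give a policy that moves right (+1) from every cell, because heading toward the destination is the only way to collect the reward and avoid the penalty. Q-learning was picked here for brevity; other RL methods, such as SARSA or policy-gradient approaches, follow the same interact-then-update pattern.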


Uses for Reinforcement Learning:

A variety of problems can be solved using reinforcement learning (RL). Because RL agents can learn without expert supervision, the problems best suited to RL are complex ones with no obvious or easily programmable solution. Two of the main application areas are:

Game playing - the number of possible states in a game is usually very large, and the best move often depends on many different factors. Covering this many states with a standard rule-based approach would mean specifying a similarly large number of hard-coded rules. RL removes the need to specify rules manually: agents learn simply by playing the game. For two-player games such as backgammon, agents can be trained by playing against human players or even against other RL agents.
Control problems - such as elevator scheduling. Again, it is not obvious which strategies would provide the best, most timely elevator service. For control problems like this, RL agents can be left to learn in a simulated environment, where they can eventually arrive at good control policies. Advantages of using RL for control problems include that an agent can easily be retrained to adapt to changes in the environment, and that it can be trained continuously while the system is online, improving its performance over time.
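The self-play idea from the game-playing item can be made concrete with a minimal sketch. It swaps in a hypothetical toy game, single-heap Nim (players take 1 to 3 sticks; whoever takes the last stick wins), because backgammon's state space is far too large for a lookup table. One Q-table is shared by both sides, and each move's value is learned as the negative of the opponent's best reply, a common zero-sum formulation. No rules about good play are written in; the game, the names `self_play_train` and `best_move`, and the hyperparameters are illustrative assumptions.

```python
import random

# Hypothetical toy game: single-heap Nim. Players alternately remove
# 1-3 sticks; whoever takes the last stick wins.
START_STICKS = 10
MOVES = (1, 2, 3)


def legal_moves(sticks):
    return [m for m in MOVES if m <= sticks]


def self_play_train(episodes=5000, alpha=0.5, epsilon=0.3, seed=0):
    """Q-learning by self-play: the agent is its own opponent."""
    rng = random.Random(seed)
    # q[(sticks, move)]: value of the move for the player making it
    # (+1 = forced win, -1 = forced loss). Both sides share this table.
    q = {(s, m): 0.0
         for s in range(1, START_STICKS + 1) for m in legal_moves(s)}
    for _ in range(episodes):
        sticks = START_STICKS
        while sticks > 0:
            moves = legal_moves(sticks)
            # Epsilon-greedy move selection for whichever side is to play.
            if rng.random() < epsilon:
                move = rng.choice(moves)
            else:
                move = max(moves, key=lambda m: q[(sticks, m)])
            remaining = sticks - move
            if remaining == 0:
                target = 1.0  # taking the last stick wins
            else:
                # Zero-sum: my value is minus the opponent's best value.
                target = -max(q[(remaining, m)] for m in legal_moves(remaining))
            q[(sticks, move)] += alpha * (target - q[(sticks, move)])
            sticks = remaining  # the other side moves next, same table
    return q


def best_move(q, sticks):
    """Greedy move according to the learned table."""
    return max(legal_moves(sticks), key=lambda m: q[(sticks, m)])
```

After training, `best_move(q, s)` should choose `s % 4` sticks whenever `s` is not a multiple of 4, which is the known winning strategy for this game; the agent reaches it purely from wins and losses. Practical game-playing systems replace the table with a function approximator such as a neural network, since enumerating every state is infeasible.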
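The control item's point about continuous online training can be illustrated with a deliberately simplified stand-in: an epsilon-greedy learner on a multi-armed bandit whose payoffs change partway through the run. This is not an elevator scheduler; the options, payoffs and the name `run_online_agent` are hypothetical. The key assumption is a constant step size `alpha`, which weights recent rewards more heavily, so the agent keeps adapting after the environment shifts.

```python
import random


def run_online_agent(means_before, means_after, steps_each=2000,
                     alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy action-value learner that never stops updating.

    means_before / means_after: hypothetical payoff of each option before
    and after the environment changes halfway through the run.
    Returns the preferred option at the end of each phase.
    """
    rng = random.Random(seed)
    n = len(means_before)
    q = [0.0] * n  # estimated payoff of each option
    preferred = []
    for means in (means_before, means_after):
        for _ in range(steps_each):
            if rng.random() < epsilon:
                a = rng.randrange(n)  # explore
            else:
                a = max(range(n), key=lambda i: q[i])  # exploit
            reward = means[a]  # deterministic payoff keeps the sketch simple
            # Constant step size: old rewards fade geometrically, so the
            # estimates keep tracking a changing environment.
            q[a] += alpha * (reward - q[a])
        preferred.append(max(range(n), key=lambda i: q[i]))
    return preferred
```

`run_online_agent([1.0, 0.2, 0.5], [0.0, 0.2, 0.9])` returns `[0, 2]`: the agent abandons its old favourite once it stops paying off and settles on the new best option without being restarted. With a sample-average step size (1/n) instead, the old estimate would be anchored by thousands of past rewards and the switch would come much later.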


