Reinforcement learning is a type of machine learning that trains algorithms
using a system of reward and punishment.
A reinforcement learning algorithm, or agent, learns by interacting with its
environment. The agent receives rewards for performing correctly and penalties
for performing incorrectly. It learns without human intervention by maximizing
its reward and minimizing its penalty.
As an agent, which could be a self-driving car or a program playing chess,
interacts with its environment, it receives a reward depending on how it
performs, for example for driving to the destination safely or winning a game.
Conversely, the agent receives a penalty for performing incorrectly, such as
going off the road or being checkmated.
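The reward-and-penalty loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the one-dimensional "road", its reward values, and the names `step` and `train` are hypothetical, and tabular Q-learning is just one common way to learn from rewards, not a method the text prescribes.

```python
import random

N_STATES = 6                  # positions 0..5 on a hypothetical one-dimensional road
START, GOAL, CRASH = 2, 5, 0  # assumed layout: destination at 5, off the road at 0
ACTIONS = (-1, +1)            # step left or step right

def step(state, action):
    """Return (next_state, reward, done) for the toy environment."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True     # reward: reached the destination
    if nxt == CRASH:
        return nxt, -1.0, True    # penalty: went off the road
    return nxt, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Explore occasionally (or on ties); otherwise take the best-known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q_table = train()
# Greedy action for each non-terminal position 1..4.
policy = ["left" if q[0] > q[1] else "right" for q in q_table[1:GOAL]]
print(policy)
```

No rule such as "drive towards the destination" is ever written down here; the preference for moving right emerges only from the rewards and penalties the agent collects.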
A variety of problems can be solved using reinforcement learning (RL). Because
RL agents can learn without expert supervision, the problems best suited to RL
are complex ones with no obvious or easily programmable solution. Two of the
main ones are:
Game playing - determining the best move to make in a game often depends on
many different factors, so the number of possible states in a particular game
is usually very large. Covering this many states with a standard rule-based
approach would mean specifying a similarly large number of hard-coded rules. RL
removes the need to specify rules manually; agents learn simply by playing the
game. For two-player games such as backgammon, agents can be trained by playing
against human players or even other RL agents.
Control problems - such as elevator scheduling. Again, it is not obvious which
strategies would provide the best, most timely elevator service. For control
problems like this, RL agents can be left to learn in a simulated environment,
and eventually they will arrive at good control policies. Two advantages of
using RL for control problems are that an agent can be retrained easily to
adapt to changes in the environment, and that it can be trained continuously
while the system is online, so its performance can keep improving.
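To make the point about adapting online concrete, here is a deliberately tiny sketch. Everything in it is an assumption made for illustration: the agent only decides where an idle elevator should park (lobby or top floor), all calls in a phase come from a single floor, and the reward is the negative travel distance. A real elevator scheduler would be far richer. The agent uses a constant step size, so recent experience outweighs old experience and its estimates can follow a change in the environment.

```python
import random

FLOORS = (0, 9)   # hypothetical parking options: lobby or top floor

def run_phase(q, call_floor, steps, rng, alpha=0.1, epsilon=0.1):
    """Keep learning online while calls arrive from `call_floor`."""
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(FLOORS))                    # explore
        else:
            a = max(range(len(FLOORS)), key=lambda i: q[i])   # exploit
        reward = -abs(FLOORS[a] - call_floor)   # shorter trip = smaller penalty
        q[a] += alpha * (reward - q[a])         # constant step size tracks change
    return q

def best_floor(q):
    """Parking floor with the highest estimated value."""
    return FLOORS[max(range(len(FLOORS)), key=lambda i: q[i])]

rng = random.Random(0)
q = [0.0, 0.0]                                   # value estimate per parking floor

run_phase(q, call_floor=0, steps=300, rng=rng)   # assumed "morning": lobby calls
best_morning = best_floor(q)

run_phase(q, call_floor=9, steps=300, rng=rng)   # assumed "evening": top-floor calls
best_evening = best_floor(q)

print(best_morning, best_evening)
```

The same value table is carried from one phase to the next; nothing is reset or reprogrammed when the call pattern changes, which is the kind of continuous retraining the paragraph above describes.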