What Is Reinforcement Learning?
People are said to make about 35,000 choices a day. The world is a continuous stream of decisions, full of problems about which choice or action to take at each moment. In other words, it is full of sequential (time-series) optimization problems.
If you can prepare a large amount of labeled training data containing the optimal choice or action at each moment, you might be able to build an AI that outputs optimal behavior within the supervised-learning framework. For most real-world problems, however, the optimal choice or action is not known in advance.
In many cases, though, you can evaluate whether the outcome of a sequence of choices and actions was good or bad. This evaluation is called a "reward." Reinforcement learning relies on this reward signal to learn, through repeated trial and error, which actions are best. For example, in Go AI it is difficult to provide the optimal move for every board position as training data, but it is relatively easy to feed back the final win/loss result.
The way people learn to ride a bicycle is also similar to reinforcement learning. Rather than being taught how hard to push the pedals at every moment, you try pedaling with different amounts of force, experience falling (negative reward) or moving forward smoothly (positive reward), and gradually acquire the appropriate way to pedal.
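As a rough illustration of learning from reward alone, the sketch below runs tabular Q-learning, one common reinforcement learning algorithm, on a made-up five-cell corridor. The corridor, the function name `train_corridor`, and all parameter values are assumptions chosen for illustration; nothing here comes from a specific product or library. The agent is never told that "right" is the correct move. It only receives a reward of +1 when it reaches the last cell.

```python
import random

def train_corridor(episodes=500, n_states=5, alpha=0.5, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in cell 0. Action 0 moves left, action 1 moves right.
    Reaching the last cell gives reward +1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Explore at random sometimes, and whenever the estimates are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # Q-learning update: move the estimate toward
            # reward + discounted best estimate of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train_corridor()
# Read off the greedy action for every non-goal cell.
policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
print(policy)
```

After training, the greedy policy read from the table should point toward the goal in every cell, even though the only feedback was the reward at the end.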
Why Time-Series / Sequential Optimization Problems Are Hard
The difficulty of this type of problem lies in the fact that deciding the optimal action at each moment requires looking ahead to some extent. Even if you chain together choices that seem best “right now,” it does not necessarily lead to the optimal overall result.
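A tiny made-up example shows the gap between "best right now" and "best overall." The reward numbers and the names `greedy_total` and `lookahead_total` are hypothetical, picked only to make the effect visible.

```python
# Hypothetical two-step problem:
# first choice -> (immediate reward, {second choice: reward})
rewards = {
    "A": (5, {"A1": 0, "A2": 1}),
    "B": (1, {"B1": 10, "B2": 2}),
}

def greedy_total(rewards):
    """Pick the first choice with the largest immediate reward."""
    first = max(rewards, key=lambda k: rewards[k][0])
    immediate, followups = rewards[first]
    return first, immediate + max(followups.values())

def lookahead_total(rewards):
    """Pick the first choice with the largest total over both steps."""
    first = max(rewards, key=lambda k: rewards[k][0] + max(rewards[k][1].values()))
    immediate, followups = rewards[first]
    return first, immediate + max(followups.values())

print(greedy_total(rewards))     # ('A', 6)
print(lookahead_total(rewards))  # ('B', 11)
```

The greedy choice takes the larger immediate reward and ends with a total of 6, while looking one step further ahead accepts a smaller first reward and ends with 11.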
A simple and intuitive way to make foresighted decisions is to try every possible sequence of choices and actions and pick the best one. However, few problems allow this within a realistic amount of time. (In Go, for example, there are more than 100 candidate points for placing a stone through much of the game, so the number of variations over about 100 moves is astronomical.)
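The scale of that search is easy to check with a back-of-the-envelope calculation. The sketch below assumes, as a simplification, a fixed 100 options per move for 100 moves; the function name `count_sequences` is made up for this illustration.

```python
def count_sequences(branching_factor: int, depth: int) -> int:
    """Number of distinct action sequences when every step offers
    `branching_factor` options and the sequence is `depth` steps long."""
    return branching_factor ** depth

n = count_sequences(100, 100)
exponent = len(str(n)) - 1  # n is an exact power of 10, so this is its exponent
print(f"about 10^{exponent} sequences")  # about 10^200 sequences
```

Even at a billion sequences per second, enumerating on the order of 10^200 sequences is far beyond any realistic time budget, which is why exhaustive search is not an option here.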
Reinforcement learning is a powerful approach that researchers have developed over many years to tackle these difficult problems. With the recent rise of deep learning, deep reinforcement learning (= reinforcement learning + deep learning) is being applied to an increasingly wide variety of problems.
Engineering Matters in Reinforcement Learning Modeling
Reinforcement learning can be applied to many challenges, from Go AI to generative AI. To build a good reinforcement learning model, however, you need a deep understanding of the target problem. Based on that understanding, you must consider what information (state, reward) and what options (actions) to provide to the reinforcement learning agent.
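To make the three design decisions concrete, here is a minimal sketch of how state, actions, and reward might be written down for a toy problem. The `ToyBatteryEnv` class, its price list, and its reward rule are entirely hypothetical and heavily simplified. They are not a description of any real system, only an illustration of where each design choice appears in code.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    state: tuple   # (battery level, current price): what the agent observes
    reward: float  # feedback for the action just taken
    done: bool     # True when the episode is over

class ToyBatteryEnv:
    """Hypothetical toy problem: charge or discharge a battery as prices change."""

    ACTIONS = ("charge", "idle", "discharge")  # the options given to the agent

    def __init__(self, prices, capacity=3):
        self.prices = list(prices)
        self.capacity = capacity
        self.reset()

    def reset(self):
        self.t = 0
        self.level = 0
        return (self.level, self.prices[self.t])

    def step(self, action):
        price = self.prices[self.t]
        reward = 0.0
        if action == "charge" and self.level < self.capacity:
            self.level += 1
            reward = -price  # buying energy costs money
        elif action == "discharge" and self.level > 0:
            self.level -= 1
            reward = price   # selling energy earns money
        self.t += 1
        done = self.t >= len(self.prices)
        next_price = 0.0 if done else self.prices[self.t]
        return StepResult((self.level, next_price), reward, done)

env = ToyBatteryEnv(prices=[1.0, 3.0])
env.reset()
first = env.step("charge")      # buy at price 1.0
second = env.step("discharge")  # sell at price 3.0
print(first.reward + second.reward)  # 2.0
```

Each choice in this sketch is a modeling decision. Leaving the price out of the state, for example, would make it impossible for the agent to learn when to buy and when to sell, which is why understanding the target problem comes first.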
At ESTECH, we leverage engineering knowledge cultivated over many years to support the construction of reinforcement learning models tailored to customers’ challenges and to help solve those challenges.
Examples of Our Work
- Automated Trailer Parking
- Energy Management
- Parameter Planning for Mobile Machines
- Automation of CAE Software Operations