Control of Cart Pole with a Policy Search
Published:
The general form of the cart-pole is shown in the following figure.
The same figure also shows the state variables: four features describe each state. We assume that the policy is a linear function of these state features, and the goal is to find the optimal policy.
So the policy is as follows:
The goal is therefore to find the optimal weight vector $W$.
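As a rough illustration of what a linear policy can look like in code, here is a minimal sketch. The thresholding rule (push right when the weighted sum is positive, otherwise push left), the function name `linear_policy`, and the feature order (cart position, cart velocity, pole angle, pole angular velocity) are assumptions made for this example, not necessarily the exact form used in the project.

```python
def linear_policy(weights, state):
    """Map a state to an action using a linear score.

    Assumed convention: action 1 pushes the cart right, action 0 pushes it left.
    """
    # Weighted sum of the four state features (the W^T s term of a linear policy).
    score = sum(w * s for w, s in zip(weights, state))
    return 1 if score > 0 else 0


# Assumed feature order: cart position, cart velocity, pole angle, pole angular velocity.
state = [0.0, 0.0, 0.05, 0.0]    # pole leaning slightly to the right
weights = [0.0, 0.0, 1.0, 0.0]   # hypothetical weights that react only to the pole angle
action = linear_policy(weights, state)
print(action)
```

With these hypothetical weights the policy simply pushes the cart toward the side the pole is leaning to.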
A Random Policy
We first run a simple random policy. Of course, we expect it to fail almost immediately!
We can see that this policy quickly fails.
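A single run of this kind could be sketched as follows, assuming "random policy" means a linear policy with randomly drawn weights. The simulator below is a small stand-in written for this example: the physical constants, the failure limits (cart beyond 2.4 units, pole beyond 12 degrees), and the 200-step cap follow the values commonly used for the classic cart-pole benchmark and are assumptions, not details taken from the project.

```python
import math
import random

# Assumed constants, following the classic cart-pole benchmark.
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
HALF_LENGTH, FORCE, DT = 0.5, 10.0, 0.02
X_LIMIT, THETA_LIMIT = 2.4, 12 * math.pi / 180


def step(state, action):
    """Advance the cart-pole by one Euler step; action 1 pushes right, 0 pushes left."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    total = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * HALF_LENGTH * theta_dot ** 2 * sin_t) / total
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total))
    x_acc = temp - POLE_MASS * HALF_LENGTH * theta_acc * cos_t / total
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def run_episode(weights, rng, max_steps=200):
    """Return the number of steps the linear policy survives before failing."""
    # Start near the upright equilibrium with a small random perturbation.
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    for t in range(max_steps):
        score = sum(w * s for w, s in zip(weights, state))
        state = step(state, 1 if score > 0 else 0)
        if abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT:
            return t + 1
    return max_steps


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # one random policy
steps = run_episode(weights, rng)
print(f"random policy survived {steps} steps")
```

Most randomly drawn weight vectors let the pole fall within a few dozen steps, which is why a single random policy is not expected to last.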
Optimal Policy
To find the optimal policy, we search over many random policies and keep the one with the longest average lifetime, that is, the one that survives longest on average before failing. The following figure shows the performance of the random policies we explored.
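The search described above can be sketched as a plain random search. In this sketch, `episode_length` is a hypothetical callable that runs one episode with the given weights and returns how long the policy survived. The sampling range of [-1, 1], the number of candidates, and the number of episodes per candidate are all assumptions. The toy scoring function at the end only stands in for a real cart-pole rollout so that the block runs on its own.

```python
import random


def random_search(episode_length, n_policies=100, n_episodes=10, n_features=4, seed=0):
    """Sample random weight vectors and keep the one with the best average lifetime."""
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(n_policies):
        # Draw one candidate policy.
        weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        # Average over several episodes to smooth out lucky or unlucky starts.
        avg = sum(episode_length(weights, rng) for _ in range(n_episodes)) / n_episodes
        if avg > best_score:
            best_weights, best_score = weights, avg
    return best_weights, best_score


def toy_episode_length(weights, rng):
    """Toy stand-in for a rollout: rewards a large third weight, plus noise."""
    return max(0.0, 100.0 * weights[2] + rng.uniform(-1.0, 1.0))


best_weights, best_score = random_search(toy_episode_length)
print(best_weights, best_score)
```

In practice, a function that simulates the cart-pole and returns the number of steps survived would be passed in place of `toy_episode_length`.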
The winning policy is tested in the following figure:
The code for this project is available in the GitHub repository.