Control of Cart Pole with a Policy Search

Published: September 03, 2023

The general form of the cart-pole is shown in the following figure.

The state variables are also shown in this figure. Each state is described by four features: the cart position, the cart velocity, the pole angle, and the pole's angular velocity. We assume that the policy is a linear function of these state features, and our goal is to find the optimal policy.

So the policy is as follows:

The goal is therefore to find the optimal weights $W$.
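A common way to turn a linear function of the state into a discrete cart-pole action is to threshold it at zero. The sketch below assumes two actions (0 = push left, 1 = push right) and the rule $a = 1$ if $W^\top s > 0$, else $a = 0$. The threshold rule and the function name `linear_policy` are illustrative assumptions, not necessarily the exact form used in this project.

```python
import numpy as np


def linear_policy(W: np.ndarray, state: np.ndarray) -> int:
    """Hypothetical linear threshold policy for cart-pole (assumed form).

    state = [cart position, cart velocity, pole angle, pole angular velocity]
    Returns 1 (push right) if W . state > 0, otherwise 0 (push left).
    """
    return 1 if float(np.dot(W, state)) > 0.0 else 0


# Example: weights that look only at the pole angle.
W = np.array([0.0, 0.0, 1.0, 0.0])
print(linear_policy(W, np.array([0.0, 0.0, 0.05, 0.0])))   # positive angle -> 1
print(linear_policy(W, np.array([0.0, 0.0, -0.05, 0.0])))  # negative angle -> 0
```

Finding the optimal policy then reduces to choosing the four numbers in $W$.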

A Random Policy

Now, we first run a simple random policy. Of course, we expect it to fail immediately!

We can see that this policy quickly fails.
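To make the experiment concrete, here is a minimal, self-contained sketch that simulates the cart-pole and runs one randomly drawn linear policy until it fails. It assumes the classic cart-pole dynamics and constants (the values used by OpenAI Gym's `CartPole` environment, with failure when the cart leaves $\pm 2.4$ or the pole tilts more than 12 degrees), weights sampled uniformly from $[-1, 1]$, and a 500-step cap. The helper names `step` and `run_episode` are hypothetical.

```python
import math

import numpy as np

# Assumed constants from the classic cart-pole benchmark (as in Gym's CartPole).
GRAVITY, M_CART, M_POLE, HALF_LEN = 9.8, 1.0, 0.1, 0.5
FORCE, TAU = 10.0, 0.02
X_LIMIT, THETA_LIMIT = 2.4, 12 * 2 * math.pi / 360


def step(state, action):
    """Advance the cart-pole one time step (explicit Euler); return (new_state, failed)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    total = M_CART + M_POLE
    temp = (force + M_POLE * HALF_LEN * theta_dot**2 * sin_t) / total
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t**2 / total))
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total
    new_state = np.array([x + TAU * x_dot, x_dot + TAU * x_acc,
                          theta + TAU * theta_dot, theta_dot + TAU * theta_acc])
    failed = abs(new_state[0]) > X_LIMIT or abs(new_state[2]) > THETA_LIMIT
    return new_state, failed


def run_episode(W, rng, max_steps=500):
    """Return how many steps the linear policy W keeps the pole up (its 'life')."""
    state = rng.uniform(-0.05, 0.05, size=4)  # small random initial state
    for t in range(1, max_steps + 1):
        action = 1 if np.dot(W, state) > 0 else 0
        state, failed = step(state, action)
        if failed:
            return t
    return max_steps


rng = np.random.default_rng(0)
W_random = rng.uniform(-1.0, 1.0, size=4)  # one randomly drawn weight vector
print("life of the random policy:", run_episode(W_random, rng))
```

The returned number is the policy's life in time steps; averaging it over several episodes gives a score that can be compared across policies.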

Optimal Policy

To find the optimal policy, we search over many random policies (that is, randomly drawn weights $W$) and keep the one with the longest average life, i.e., the largest average number of time steps the pole stays balanced. The following figure shows the performance of the random policies we explored.

The winning policy from this search is tested in the following figure:
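The search and the test of the winner can be sketched as plain random search: draw many weight vectors, score each by its average life over a few episodes, keep the best, and then re-evaluate it on fresh episodes. This sketch assumes the classic cart-pole dynamics and constants (as in OpenAI Gym's `CartPole` environment), weights drawn uniformly from $[-1, 1]$, and a 500-step cap per episode. The numbers of candidate policies and episodes, and the names `episode_length` and `random_search`, are hypothetical choices rather than this project's settings.

```python
import math

import numpy as np


def step(s, action, g=9.8, m_c=1.0, m_p=0.1, l=0.5, f_mag=10.0, tau=0.02):
    """One explicit-Euler step of the classic cart-pole (assumed Gym-style constants)."""
    x, x_dot, th, th_dot = s
    f = f_mag if action == 1 else -f_mag
    c, sn, m = math.cos(th), math.sin(th), m_c + m_p
    temp = (f + m_p * l * th_dot**2 * sn) / m
    th_acc = (g * sn - c * temp) / (l * (4.0 / 3.0 - m_p * c**2 / m))
    x_acc = temp - m_p * l * th_acc * c / m
    return np.array([x + tau * x_dot, x_dot + tau * x_acc,
                     th + tau * th_dot, th_dot + tau * th_acc])


def episode_length(W, rng, max_steps=500):
    """Steps survived by the linear policy a = [W . s > 0] before the pole falls."""
    s = rng.uniform(-0.05, 0.05, size=4)
    for t in range(1, max_steps + 1):
        s = step(s, 1 if np.dot(W, s) > 0 else 0)
        if abs(s[0]) > 2.4 or abs(s[2]) > 12 * math.pi / 180:
            return t
    return max_steps


def random_search(n_policies=100, n_episodes=5, seed=0):
    """Sample random W, score each by its mean episode length, keep the best."""
    rng = np.random.default_rng(seed)
    best_W, best_score, scores = None, -1.0, []
    for _ in range(n_policies):
        W = rng.uniform(-1.0, 1.0, size=4)
        score = np.mean([episode_length(W, rng) for _ in range(n_episodes)])
        scores.append(score)
        if score > best_score:
            best_W, best_score = W, score
    return best_W, best_score, scores


best_W, best_score, scores = random_search()
print("best W:", best_W, "average life during search:", best_score)

# Test the winner on fresh episodes it was not selected on.
rng_test = np.random.default_rng(123)
test_lives = [episode_length(best_W, rng_test) for _ in range(20)]
print("winner's average life on 20 new episodes:", np.mean(test_lives))
```

Scoring each candidate by an average over several episodes, rather than a single run, reduces the chance of picking a policy that only got lucky with its initial state.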

The code for this project is available in its GitHub repository.


Hossein Khazaei, PhD
