Cart-pole system

The cart-pole system is a well-known and widely used reinforcement learning testbed. It was introduced in Barto, Sutton, and Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 834–846, Sept.–Oct. 1983, and in Sutton, "Temporal Aspects of Credit Assignment in Reinforcement Learning," PhD dissertation, Department of Computer and Information Science, University of Massachusetts, Amherst, 1984.

The original C version is available at the reinforcement learning repository.

The drawback of the old C code is that the reinforcement learning code is tightly coupled to the simulation. To make further experiments and comparisons between methods easier (using the same system simulation), I have rewritten the code in Java and separated the simulation (CartPole.java) from the Barto, Sutton, and Anderson reinforcement learning (RL.java).

I have also created a new visualization tool. It can optionally be used to inspect the system's behavior.

Simulation

The cart-pole system simulation is contained in the CartPole class, whose interface makes the system easy to manipulate.
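For readers who want to see what such a simulation computes, the sketch below reproduces the classic cart-pole update: the standard equations of motion followed by one explicit Euler step per action. The constants (1 kg cart, 0.1 kg pole, pole half-length 0.5 m, force of 10 N, time step 0.02 s, failure at 2.4 m or 12 degrees) are those of the widely circulated C benchmark code. The class CartPoleDynamicsSketch and its methods are hypothetical and are not the interface of CartPole.java.

```java
/**
 * Hypothetical sketch of the classic cart-pole dynamics (not the CartPole.java API).
 * Constants follow the widely circulated C benchmark code.
 */
final class CartPoleDynamicsSketch {
    static final double GRAVITY = 9.8;                  // m/s^2
    static final double MASS_CART = 1.0;                // kg
    static final double MASS_POLE = 0.1;                // kg
    static final double TOTAL_MASS = MASS_CART + MASS_POLE;
    static final double HALF_LENGTH = 0.5;              // m, half of the pole length
    static final double POLEMASS_LENGTH = MASS_POLE * HALF_LENGTH;
    static final double FORCE_MAG = 10.0;               // N
    static final double TAU = 0.02;                     // s, integration time step

    // Cart position and velocity, pole angle (rad) and angular velocity (rad/s).
    double x, xDot, theta, thetaDot;

    /** Pushes the cart right (true) or left (false) and advances one Euler step. */
    void step(boolean pushRight) {
        double force = pushRight ? FORCE_MAG : -FORCE_MAG;
        double cosTheta = Math.cos(theta);
        double sinTheta = Math.sin(theta);
        double temp = (force + POLEMASS_LENGTH * thetaDot * thetaDot * sinTheta) / TOTAL_MASS;
        double thetaAcc = (GRAVITY * sinTheta - cosTheta * temp)
                / (HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cosTheta * cosTheta / TOTAL_MASS));
        double xAcc = temp - POLEMASS_LENGTH * thetaAcc * cosTheta / TOTAL_MASS;
        // Explicit Euler: positions are updated with the velocities from before this step.
        x += TAU * xDot;
        xDot += TAU * xAcc;
        theta += TAU * thetaDot;
        thetaDot += TAU * thetaAcc;
    }

    /** Failure: the cart leaves the +/-2.4 m track or the pole tilts past +/-12 degrees. */
    boolean failed() {
        double twelveDegrees = Math.toRadians(12.0);
        return x < -2.4 || x > 2.4 || theta < -twelveDegrees || theta > twelveDegrees;
    }
}
```

Calling step(true) repeatedly keeps pushing the cart to the right; the pole then falls to the left until failed() returns true.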

Reinforcement learning

The RL class contains the original Barto, Sutton, and Anderson reinforcement learning algorithm. It also serves as an example of how to interact directly with the CartPole system.
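The 1983 learner does not work on the continuous state directly: the classic C implementation quantizes it into 162 boxes (3 cart positions × 3 cart velocities × 6 pole angles × 3 angular velocities) and returns -1 for a failure state. The sketch below reproduces that decoder with the thresholds of the C code, assuming RL.java keeps the same scheme. The class BoxDecoderSketch and its method are hypothetical names, not the API of RL.java.

```java
/**
 * Hypothetical sketch of the 162-box state decoder (3 x 3 x 6 x 3 regions)
 * used by the classic C implementation; not the API of RL.java.
 */
final class BoxDecoderSketch {
    private static final double ONE_DEGREE = Math.toRadians(1.0);
    private static final double SIX_DEGREES = Math.toRadians(6.0);
    private static final double TWELVE_DEGREES = Math.toRadians(12.0);
    private static final double FIFTY_DEGREES = Math.toRadians(50.0);

    /** Returns a box index in [0, 161], or -1 if the state counts as a failure. */
    static int getBox(double x, double xDot, double theta, double thetaDot) {
        if (x < -2.4 || x > 2.4 || theta < -TWELVE_DEGREES || theta > TWELVE_DEGREES) {
            return -1;                                   // failure signal
        }
        int box;
        if (x < -0.8) box = 0;                           // cart position: 3 regions, stride 1
        else if (x < 0.8) box = 1;
        else box = 2;

        if (xDot >= 0.5) box += 6;                       // cart velocity: 3 regions, stride 3
        else if (xDot >= -0.5) box += 3;

        if (theta >= SIX_DEGREES) box += 45;             // pole angle: 6 regions, stride 9
        else if (theta >= ONE_DEGREE) box += 36;
        else if (theta >= 0) box += 27;
        else if (theta >= -ONE_DEGREE) box += 18;
        else if (theta >= -SIX_DEGREES) box += 9;

        if (thetaDot >= FIFTY_DEGREES) box += 108;       // angular velocity: 3 regions, stride 54
        else if (thetaDot >= -FIFTY_DEGREES) box += 54;
        return box;
    }
}
```

For example, the upright resting state (all four variables zero) falls into box 85, while a cart at x = 3.0 m yields -1.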

The RL controller can be started as a standalone application:

java RL 1234

where 1234 is a mandatory random seed value.
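A fixed seed makes a learning run repeatable. The sketch below assumes, without confirmation from the source, that the seed feeds a java.util.Random; equal seeds then produce identical pseudo-random sequences. SeedSketch and randomFromArgs are hypothetical names.

```java
import java.util.Random;

/**
 * Hypothetical sketch: turning the mandatory seed argument into a java.util.Random.
 * Assumes the RL code draws its random numbers from such a generator.
 */
final class SeedSketch {
    static Random randomFromArgs(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: java RL <seed>");
        }
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException) on bad input.
        return new Random(Long.parseLong(args[0]));
    }
}
```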

General controller / RandomController

To decouple the simulation from the controller even further, a simple message-passing scheme is used. The CartPole system prints its current configuration to standard output and reads actions from standard input. The Controller class implements the counterpart: it reads the cart-pole configuration from standard input and prints actions to standard output. RandomController shows how easy it is to implement a controller: it simply overrides the abstract act() method of the Controller class. The CartPole simulation and the Controller need to be connected using a tool such as socat:

socat EXEC:'java CartPole' EXEC:'java RandomController'
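The controller side of this protocol can be sketched as a read-decide-reply loop with a single abstract act() hook. The message format here is an assumption, not necessarily the one CartPole.java uses: one configuration per line as whitespace-separated numbers, answered by one line containing an integer action (0 for left, 1 for right). ControllerSketch and RandomControllerSketch are hypothetical stand-ins for Controller and RandomController.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

/**
 * Hypothetical sketch of a line-based controller. The message format
 * (one state per line, one integer action per reply) is an assumption.
 */
abstract class ControllerSketch {
    /** Chooses an action for the given state; subclasses override only this. */
    protected abstract int act(double[] state);

    /** Reads states until end of input and answers each one with an action. */
    public final void run(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;               // ignore blank lines
            String[] fields = line.split("\\s+");
            double[] state = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                state[i] = Double.parseDouble(fields[i]);
            }
            out.println(act(state));
            out.flush();                                // the simulator blocks until it sees the reply
        }
    }
}

/** Picks left (0) or right (1) uniformly at random. */
class RandomControllerSketch extends ControllerSketch {
    private final Random random;

    RandomControllerSketch(long seed) { random = new Random(seed); }

    @Override
    protected int act(double[] state) { return random.nextInt(2); }
}
```

When connected through socat, it would be started as `new RandomControllerSketch(42).run(new BufferedReader(new InputStreamReader(System.in)), new PrintWriter(System.out))` (which also needs java.io.InputStreamReader). Flushing after every reply matters: both processes wait on each other, so a reply stuck in an output buffer would stall the simulation.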

Visualization

The CartPoleFrame reads the system configuration from standard input and draws the system state. It can be connected either to the RL controller:

java RL 1234 | java CartPoleFrame 12

or to the general system:

socat SYSTEM:'java CartPole | java CartPoleFrame 12' EXEC:'java RandomController'

where the parameter 12 is an optional visualization delay: the number of milliseconds (0 by default) that the visualization waits between two consecutive frames.
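The delay semantics can be sketched as a replay loop that sleeps between consecutive frames but not before the first one. FrameReplaySketch, parseDelay and replay are hypothetical names; the real CartPoleFrame draws into a window, whereas this sketch hands each configuration line to a callback.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of an optional frame delay: the first argument is the
 * delay in milliseconds (0 if absent), applied between consecutive frames.
 */
final class FrameReplaySketch {
    static long parseDelay(String[] args) {
        if (args.length == 0) return 0L;                // default: no delay
        long delay = Long.parseLong(args[0]);
        if (delay < 0) throw new IllegalArgumentException("delay must be >= 0 ms");
        return delay;
    }

    /** Passes every input line to draw, pausing delayMillis between frames; returns the frame count. */
    static int replay(BufferedReader in, Consumer<String> draw, long delayMillis)
            throws IOException, InterruptedException {
        int frames = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (frames > 0 && delayMillis > 0) {
                Thread.sleep(delayMillis);              // wait between two consecutive frames
            }
            draw.accept(line);
            frames++;
        }
        return frames;
    }
}
```

Because the loop only consumes lines, the same code serves both live piping and replaying a saved log.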

Logging

The RL output can be logged directly:

java RL 1234 | tee rl.log

and replayed:

java CartPoleFrame 12 < rl.log

Because the visualization needs only the system configurations, it is sufficient to log the CartPole output:

socat SYSTEM:'java CartPole | tee cp.log' EXEC:'java RandomController'

and replay by:

java CartPoleFrame 12 < cp.log

Compilation

It should be possible to compile the code with any reasonably recent version of the Sun Java compiler (javac).

Download

The source code is available under the GNU General Public License (GPL).

RL performance

An illustrative graph of the RL performance:

RL performance graph