Cart-pole system
The cart-pole system is a well-known, widely used reinforcement learning testbed. It was introduced in Barto, Sutton, and Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems," IEEE Trans. Syst., Man, Cybern., Vol. SMC-13, pp. 834-846, Sept.-Oct. 1983, and in Sutton, "Temporal Aspects of Credit Assignment in Reinforcement Learning," PhD dissertation, Department of Computer and Information Science, University of Massachusetts, Amherst, 1984.
The original C version is available at the reinforcement learning repository.
The drawback of the old C code is that the reinforcement learning code is tightly coupled to the simulation. To support further experiments and an easier comparison of methods on the same system simulation, I have rewritten the code in Java and separated the simulation (CartPole.java) from the Barto, Sutton, and Anderson reinforcement learning (RL.java).
I have also created a new visualization tool, which can optionally be used to inspect the system behavior.
Simulation
The cart-pole system simulation is contained in the CartPole class, whose interface makes the system easy to manipulate.
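To give an idea of what the simulation computes, here is a minimal sketch of the classic cart-pole dynamics from the 1983 paper (cart mass 1.0 kg, pole mass 0.1 kg, pole half-length 0.5 m, force of 10 N to the left or right, time step 0.02 s, Euler integration), with the paper's friction terms omitted. The class CartPoleSketch and its fields and methods are hypothetical; they do not reproduce the interface of CartPole.java.

```java
/**
 * Hypothetical sketch of the classic cart-pole dynamics (Barto, Sutton and
 * Anderson, 1983), frictionless, integrated with a simple Euler step.
 * Not the interface of CartPole.java.
 */
public class CartPoleSketch {
    static final double GRAVITY = 9.8;            // m/s^2
    static final double MASS_CART = 1.0;          // kg
    static final double MASS_POLE = 0.1;          // kg
    static final double TOTAL_MASS = MASS_CART + MASS_POLE;
    static final double HALF_POLE_LENGTH = 0.5;   // m
    static final double POLE_MASS_LENGTH = MASS_POLE * HALF_POLE_LENGTH;
    static final double FORCE_MAG = 10.0;         // N
    static final double TAU = 0.02;               // seconds per time step
    static final double TWELVE_DEGREES = Math.toRadians(12.0);

    // State: cart position and velocity, pole angle and angular velocity.
    double x, xDot, theta, thetaDot;

    /** Applies +FORCE_MAG (push right) or -FORCE_MAG (push left) for one time step. */
    void step(boolean pushRight) {
        double force = pushRight ? FORCE_MAG : -FORCE_MAG;
        double cos = Math.cos(theta);
        double sin = Math.sin(theta);
        double temp = (force + POLE_MASS_LENGTH * thetaDot * thetaDot * sin) / TOTAL_MASS;
        double thetaAcc = (GRAVITY * sin - cos * temp)
                / (HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos * cos / TOTAL_MASS));
        double xAcc = temp - POLE_MASS_LENGTH * thetaAcc * cos / TOTAL_MASS;
        // Euler integration: positions are advanced with the old velocities.
        x += TAU * xDot;
        xDot += TAU * xAcc;
        theta += TAU * thetaDot;
        thetaDot += TAU * thetaAcc;
    }

    /** Failure: the cart leaves the track or the pole tilts more than 12 degrees. */
    boolean failed() {
        return Math.abs(x) > 2.4 || Math.abs(theta) > TWELVE_DEGREES;
    }

    public static void main(String[] args) {
        CartPoleSketch cp = new CartPoleSketch();
        int steps = 0;
        // Always pushing right makes the pole fall to the left sooner or later.
        while (!cp.failed()) {
            cp.step(true);
            steps++;
        }
        System.out.println("failed after " + steps + " steps");
    }
}
```

The failure condition (cart position beyond 2.4 m or pole angle beyond 12 degrees) is the one used in the original experiments.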
Reinforcement learning
The RL class contains the original Barto, Sutton, and Anderson reinforcement learning algorithm.
It shows how a controller can interact with the CartPole system directly.
The RL controller can be started as a standalone application:
java RL 1234
where 1234 is a mandatory random seed value.
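The Barto, Sutton, and Anderson learner does not work on the continuous state directly: the 1983 paper partitions it into 162 boxes (3 cart positions, 3 cart velocities, 6 pole angles, 3 angular velocities) and learns one weight per box. A minimal sketch of that partition follows, assuming the thresholds of the original experiments; the class BoxesSketch and its box() method are hypothetical and not taken from RL.java.

```java
/**
 * Hypothetical sketch of the 162-box state partition used by the
 * Barto, Sutton and Anderson learner:
 * 3 cart positions x 3 cart velocities x 6 pole angles x 3 angular velocities.
 */
public class BoxesSketch {
    static final double ONE_DEGREE = Math.toRadians(1.0);
    static final double SIX_DEGREES = Math.toRadians(6.0);
    static final double TWELVE_DEGREES = Math.toRadians(12.0);
    static final double FIFTY_DEGREES = Math.toRadians(50.0);

    /** Returns a box index in [0, 161], or -1 if the state is a failure. */
    static int box(double x, double xDot, double theta, double thetaDot) {
        if (Math.abs(x) > 2.4 || Math.abs(theta) > TWELVE_DEGREES) {
            return -1; // signals failure to the learner
        }
        int box;
        // Cart position: below -0.8, between -0.8 and 0.8, above 0.8 (m).
        if (x < -0.8) box = 0;
        else if (x < 0.8) box = 1;
        else box = 2;

        // Cart velocity: below -0.5, between -0.5 and 0.5, above 0.5 (m/s).
        if (xDot < -0.5) box += 0;
        else if (xDot < 0.5) box += 3;
        else box += 6;

        // Pole angle: thresholds at -6, -1, 0, 1 and 6 degrees.
        if (theta < -SIX_DEGREES) box += 0;
        else if (theta < -ONE_DEGREE) box += 9;
        else if (theta < 0) box += 18;
        else if (theta < ONE_DEGREE) box += 27;
        else if (theta < SIX_DEGREES) box += 36;
        else box += 45;

        // Pole angular velocity: thresholds at -50 and 50 degrees per second.
        if (thetaDot < -FIFTY_DEGREES) box += 0;
        else if (thetaDot < FIFTY_DEGREES) box += 54;
        else box += 108;
        return box;
    }

    public static void main(String[] args) {
        System.out.println(box(0, 0, 0, 0));   // upright and at rest: prints 85
        System.out.println(box(3.0, 0, 0, 0)); // cart off the track: prints -1
    }
}
```

In the original scheme the returned index selects the weights that the learner reads and updates at each time step, and -1 ends the trial.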
General controller / RandomController
To decouple the simulation from the controller even further, a simple message-passing scheme is used.
The CartPole system prints its current configuration to standard output and reads actions from standard input.
The Controller class implements the counterpart: it reads the cart-pole configuration from standard input
and prints actions to standard output. The RandomController shows how easily controller functionality can be
implemented by overriding the abstract act() method of the Controller class.
The CartPole simulation and the Controller need to be connected using a tool such as socat:
socat EXEC:'java CartPole' EXEC:'java RandomController'
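The controller side of this message passing can be sketched as a loop that reads one configuration per line, asks act() for an action, and prints it. The classes SketchController and RandomSketchController are hypothetical stand-ins for Controller and RandomController, and the line format is an assumption: each configuration is taken to be whitespace-separated numbers, and each action is printed as 0 (push left) or 1 (push right).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.Random;

/**
 * Hypothetical controller skeleton. Assumed protocol: one configuration per
 * input line (whitespace-separated numbers), one action per output line
 * (0 = push left, 1 = push right).
 */
abstract class SketchController {
    /** Chooses an action for the given configuration. */
    abstract int act(double[] state);

    /** Message loop: one configuration in, one action out, until end of input. */
    void run(BufferedReader in, PrintStream out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue; // skip blank lines
            String[] fields = line.split("\\s+");
            double[] state = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                state[i] = Double.parseDouble(fields[i]);
            }
            out.println(act(state));
            out.flush(); // the simulation waits for each action, so do not buffer it
        }
    }
}

public class RandomSketchController extends SketchController {
    private final Random random = new Random();

    @Override
    int act(double[] state) {
        return random.nextInt(2); // ignores the state: push left or right at random
    }

    public static void main(String[] args) throws IOException {
        new RandomSketchController().run(
                new BufferedReader(new InputStreamReader(System.in)), System.out);
    }
}
```

If the assumed line format matches what CartPole prints, such a class could take the place of RandomController in the socat command, e.g. socat EXEC:'java CartPole' EXEC:'java RandomSketchController'. Flushing after every action matters here, because each side blocks until the other one answers.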
Visualization
The CartPoleFrame reads the system configuration from the standard input and draws the system state.
It can be connected either to the RL controller:
java RL 1234 | java CartPoleFrame 12
or to the general system:
socat SYSTEM:'java CartPole | java CartPoleFrame 12' EXEC:'java RandomController'
where the optional parameter 12 is the visualization delay: the number of milliseconds (0 by default)
that the visualization waits between two consecutive frames.
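The reading loop and the delay parameter can be illustrated with a text-mode stand-in for CartPoleFrame. This is a hypothetical sketch, not the actual Swing/AWT drawing code: the class TextFrameSketch is invented for illustration, and it assumes that the first number on each input line is the cart position x on a track from -2.4 m to 2.4 m.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Hypothetical text-mode viewer. Assumes each input line is a configuration
 * whose first field is the cart position x on a track from -2.4 to 2.4.
 */
public class TextFrameSketch {
    static final double TRACK_HALF_LENGTH = 2.4;

    /** Draws the track as dots with the cart as 'C'; positions off the track are clamped. */
    static String render(double x, int width) {
        double clamped = Math.max(-TRACK_HALF_LENGTH, Math.min(TRACK_HALF_LENGTH, x));
        int col = (int) Math.round(
                (clamped + TRACK_HALF_LENGTH) / (2 * TRACK_HALF_LENGTH) * (width - 1));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < width; i++) {
            sb.append(i == col ? 'C' : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Optional first argument: delay between frames in milliseconds (0 by default).
        long delayMs = args.length > 0 ? Long.parseLong(args[0]) : 0;
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            double x = Double.parseDouble(line.split("\\s+")[0]);
            System.out.println(render(x, 41));
            if (delayMs > 0) Thread.sleep(delayMs); // wait before the next frame
        }
    }
}
```

Provided the assumed line format holds, it would be used the same way as CartPoleFrame:
java RL 1234 | java TextFrameSketch 12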
Logging
The RL output can be logged directly:
java RL 1234 | tee rl.log
and replayed:
java CartPoleFrame 12 < rl.log
Because the visualization needs only the system configurations, it is sufficient to log the CartPole output:
socat SYSTEM:'java CartPole | tee cp.log' EXEC:'java RandomController'
and replay it with:
java CartPoleFrame 12 < cp.log
Compilation
It should be possible to compile the code with any decent version of the Sun Java compiler.
Download
The source code is available under the GNU General Public License:
RL performance
An illustrative graph of the RL performance: