Lunar Lander is a reinforcement learning problem provided by OpenAI's Gym environment. In this project, I used the environment to train a model that learns which action to take in a given state, using a neural network together with the typical reinforcement learning loop: observe the state, take an action, observe the reward, and move to the new state.
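As a rough illustration of that loop, here is a minimal sketch of stepping through the Gym environment with a placeholder random policy. The environment name and the classic `gym` API are assumptions; newer Gymnasium releases return slightly different values from `reset` and `step`.

```python
import gym

# Minimal sketch of the observe -> act -> reward -> next state loop,
# assuming the classic Gym API and the LunarLander-v2 environment.
env = gym.make("LunarLander-v2")

state = env.reset()          # observe the initial state
done = False
total_reward = 0.0

while not done:
    action = env.action_space.sample()                   # placeholder policy: random action
    next_state, reward, done, info = env.step(action)    # take action, observe reward
    total_reward += reward
    state = next_state                                    # move to the new state

print(f"Episode finished with total reward {total_reward:.1f}")
```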
Lunar Lander poses a challenge because the state space is continuous: the lander's position in the world is given by x, y coordinates, and it drifts left, right, up, and down with continuous velocities. A traditional (tabular) reinforcement learning model cannot account for this, because it essentially memorizes which action is best for every state it encounters. In Lunar Lander, however, the possible states are effectively infinite, making it impossible to simply memorize the best action per state.
Enter the Deep Q-Network (DQN), a model created by DeepMind to play Atari games using the pixels on the screen as input. In this case, however, the input is the state of the lunar lander: its x, y coordinates and velocities, along with its angle and angular velocity.
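The project doesn't show the exact network, but a minimal sketch of a Q-network over this state vector might look like the following. PyTorch, the layer sizes, and the 8-dimensional state / 4 discrete actions are assumptions based on the standard LunarLander setup.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps the lander's state vector to one Q-value per discrete action."""

    def __init__(self, state_dim: int = 8, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),   # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```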
The general approach for this project was the standard DQN training loop, built around two Q-networks: one used to provide the training targets during training, the other to learn.
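A sketch of how the two networks might interact during a single update is shown below. The inline network shape, optimizer, discount factor, and sync schedule are assumptions for illustration, not the project's exact settings; the batch is assumed to come from a replay buffer of (state, action, reward, next state, done) transitions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed shapes: 8-dimensional state, 4 discrete actions (as in LunarLander-v2).
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))   # learning network
target_net = copy.deepcopy(q_net)                                       # provides training targets
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99  # assumed discount factor

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step on a sampled batch of transitions."""
    # Q-values of the actions actually taken, from the learning network.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Targets come from the frozen target network; no gradients flow through it.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1 - dones)

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Every so often, copy the learning network's weights into the target network:
# target_net.load_state_dict(q_net.state_dict())
```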
In this project I also tuned certain parameters to find the model that learned the problem best. This involved weighing different choices, such as prioritizing immediate reward early in training and then decaying that emphasis as training continued, shifting the focus toward long-term reward instead.
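One way to read that trade-off is as annealing the discount factor upward over training, so early updates weight immediate reward and later updates weight long-term return. The sketch below is hypothetical; the schedule shape and all numeric values are assumptions, not the project's settings.

```python
# Hypothetical schedule: start with a low discount factor so immediate reward
# dominates early on, then anneal toward 0.99 so long-term reward matters more
# as training continues. All values here are illustrative assumptions.
GAMMA_START = 0.90
GAMMA_END = 0.99
GAMMA_DECAY = 0.995   # per-episode decay of the remaining gap

def discount_for_episode(episode: int) -> float:
    """Discount factor to use on the given training episode."""
    gap = (GAMMA_END - GAMMA_START) * (GAMMA_DECAY ** episode)
    return GAMMA_END - gap

if __name__ == "__main__":
    for ep in (0, 100, 500, 1000):
        print(ep, round(discount_for_episode(ep), 4))
```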
The source code can be found in a private GitHub repo; please DM me for access (johnstonpaul801 [at] gmail).