Pacman Q-Learning Agent - University of Pennsylvania
For this project, I developed and trained a Reinforcement Learning (RL) agent to play Pacman using Q-learning and Approximate Q-learning. The goal was to enable Pacman to learn an optimal policy through trial-and-error interactions with the game environment, maximizing rewards while avoiding ghosts.
1. Q-Learning Agent Implementation
Designed a Q-learning agent that updates a Q-table based on state-action pairs, learning from rewards to optimize future decisions.
Implemented ε-greedy action selection, balancing exploration and exploitation for effective learning.
Optimized Q-value updates using the sample-based Bellman update rule (sketched in the code below):
Q(s,a) ← (1−α)·Q(s,a) + α·(R + γ·V(s′)), where V(s′) is the maximum Q-value over the legal actions in the successor state s′.
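A minimal sketch of how the tabular update and ε-greedy selection fit together, assuming a dictionary-backed Q-table and a hypothetical legal_actions callback supplied by the environment; the actual project code builds on the course's Pacman framework, which is not shown here:

    import random
    from collections import defaultdict

    class QLearningAgent:
        """Tabular Q-learning with epsilon-greedy action selection (illustrative sketch)."""

        def __init__(self, legal_actions, alpha=0.2, gamma=0.8, epsilon=0.05):
            self.alpha = alpha                   # learning rate
            self.gamma = gamma                   # discount factor
            self.epsilon = epsilon               # exploration probability
            self.legal_actions = legal_actions   # hypothetical callback: state -> list of actions
            self.q_values = defaultdict(float)   # (state, action) -> Q-value, defaults to 0.0

        def get_value(self, state):
            """V(s) = max over legal actions of Q(s, a); 0.0 for terminal states."""
            actions = self.legal_actions(state)
            return max((self.q_values[(state, a)] for a in actions), default=0.0)

        def choose_action(self, state):
            """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
            actions = self.legal_actions(state)
            if not actions:
                return None
            if random.random() < self.epsilon:
                return random.choice(actions)
            return max(actions, key=lambda a: self.q_values[(state, a)])

        def update(self, state, action, next_state, reward):
            """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (R + gamma * V(s'))."""
            sample = reward + self.gamma * self.get_value(next_state)
            self.q_values[(state, action)] = (
                (1 - self.alpha) * self.q_values[(state, action)] + self.alpha * sample
            )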
2. Approximate Q-Learning for Generalization
Extended the Q-learning agent to an Approximate Q-learning agent using feature-based function approximation instead of a discrete Q-table.
Implemented feature extraction to represent the game state, improving learning efficiency for larger environments.
Used a weighted sum of features to compute Q-values: Q(s,a) = ∑ᵢ wᵢ·fᵢ(s,a)
where each weight wᵢ is updated using: wᵢ ← wᵢ + α·(R + γ·V(s′) − Q(s,a))·fᵢ(s,a)
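A compact sketch of the feature-weighted variant, assuming a hypothetical feature_extractor callable that maps a (state, action) pair to a dict of named feature values; this is illustrative only and not the project's exact interface:

    from collections import defaultdict

    class ApproximateQAgent:
        """Feature-based Q-learning: Q(s,a) = sum_i w_i * f_i(s,a) (illustrative sketch)."""

        def __init__(self, feature_extractor, alpha=0.2, gamma=0.8):
            self.feature_extractor = feature_extractor  # hypothetical: (state, action) -> {name: value}
            self.alpha = alpha
            self.gamma = gamma
            self.weights = defaultdict(float)           # feature name -> weight

        def get_q_value(self, state, action):
            """Weighted sum of features in place of a table lookup."""
            features = self.feature_extractor(state, action)
            return sum(self.weights[name] * value for name, value in features.items())

        def update(self, state, action, next_state, next_actions, reward):
            """w_i <- w_i + alpha * difference * f_i(s,a),
            where difference = (R + gamma * max_a' Q(s',a')) - Q(s,a)."""
            next_value = max(
                (self.get_q_value(next_state, a) for a in next_actions), default=0.0
            )
            difference = (reward + self.gamma * next_value) - self.get_q_value(state, action)
            features = self.feature_extractor(state, action)
            for name, value in features.items():
                self.weights[name] += self.alpha * difference * value

Because states sharing the same feature values share the same weights, the agent generalizes across positions it has never visited, which is why it needs fewer training episodes than the tabular version.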
3. Training and Performance Optimization
Trained Pacman for 2000 episodes in a noiseless environment to refine its policy.
Implemented a state-value function to prioritize high-reward paths and avoid negative states (ghosts).
Improved policy stability by introducing a learning rate decay mechanism (see the training-loop sketch at the end of this section).
The Q-learning agent initially struggled with random exploration but gradually converged to an optimal policy after hundreds of episodes.
The Approximate Q-learning agent, leveraging feature extraction, generalized better and was able to win consistently with fewer training episodes.
Final performance: Pacman achieved an 80-90% win rate on small grids and successfully adapted to larger environments using feature-based learning.
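For reference, a simplified training loop showing how the learning rate decay mentioned above might be wired in; env here is a hypothetical environment with reset() and step() methods, standing in for the actual Pacman game harness:

    def train(agent, env, episodes=2000, alpha_decay=0.999, min_alpha=0.01):
        """Run repeated episodes, updating the agent and decaying its learning rate."""
        for episode in range(episodes):
            state = env.reset()
            done = False
            while not done:
                action = agent.choose_action(state)
                # Assumed step() contract: returns (next_state, reward, done).
                next_state, reward, done = env.step(action)
                agent.update(state, action, next_state, reward)
                state = next_state
            # Decay the learning rate gradually so late-training updates
            # perturb the learned policy less and the policy stabilizes.
            agent.alpha = max(min_alpha, agent.alpha * alpha_decay)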