Autonomous Quadrotor: SE(3) Control, Trajectory Optimization & Path Planning
University of Pennsylvania — MEAM 6200
For this project, I developed and deployed a fully autonomous quadrotor navigation system on a Crazyflie platform, implementing the complete stack from trajectory planning through real-time flight control. The goal was to enable autonomous maze navigation using onboard sensing and VICON motion capture for state estimation, with no human intervention during flight.
Key Features:
1. SE(3) Geometric Flight Controller
Implemented a hierarchical position/attitude controller using SE(3) geometric control: an outer-loop PD controller computes desired accelerations from position and velocity errors, which are converted to a desired thrust and attitude via a geometric formulation.
Inner-loop attitude controller tracks the desired orientation and outputs thrust and torque commands from the attitude error e_R and the angular velocity error e_ω.
Tuned control gains Kp and Kd through Ziegler-Nichols-style initialization followed by iterative hardware tuning (a minimal control-law sketch follows below).
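Below is a minimal NumPy sketch of this control structure, assuming a zero desired angular velocity and omitting gyroscopic/inertia coupling terms; the function name, gain arguments, and frame conventions are illustrative rather than the exact project implementation.

```python
import numpy as np

def vee(S):
    """Map a skew-symmetric matrix to its vector (so(3) -> R^3)."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

def se3_control(x, v, R, omega, x_des, v_des, a_des, yaw_des,
                kp, kd, kR, komega, m, g=9.81):
    """One step of a hierarchical SE(3) controller (illustrative sketch).
    Returns the commanded collective thrust (scalar) and body torque (3-vector)."""
    # Outer loop: PD on position/velocity error gives a commanded acceleration.
    a_cmd = a_des - kp * (x - x_des) - kd * (v - v_des)

    # Desired force in the world frame; thrust is its projection onto body z.
    f_des = m * (a_cmd + np.array([0.0, 0.0, g]))
    thrust = f_des @ R[:, 2]

    # Desired attitude: align body z with f_des, resolve heading with yaw_des.
    b3_des = f_des / np.linalg.norm(f_des)
    c_yaw = np.array([np.cos(yaw_des), np.sin(yaw_des), 0.0])
    b2_des = np.cross(b3_des, c_yaw)
    b2_des /= np.linalg.norm(b2_des)
    R_des = np.column_stack((np.cross(b2_des, b3_des), b2_des, b3_des))

    # Inner loop: geometric attitude error e_R and angular-velocity error e_omega
    # (desired angular velocity taken as zero; gyroscopic terms omitted).
    e_R = 0.5 * vee(R_des.T @ R - R.T @ R_des)
    e_omega = omega
    torque = -kR * e_R - komega * e_omega

    return thrust, torque
```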
2. Minimum-Snap Trajectory Generator
Generated smooth, dynamically feasible trajectories from piecewise quintic polynomial segments, with continuity of position, velocity, and acceleration enforced at the waypoints.
Allocated segment times proportionally to inter-waypoint distance, with a minimum time constraint to prevent aggressive motion over short segments (a per-axis sketch of the time allocation and quintic fit follows after this list).
Applied Ramer-Douglas-Peucker (RDP) simplification to reduce dense A* waypoint sequences while preserving path shape and collision safety.
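A minimal per-axis sketch of the two steps above (distance-proportional time allocation and the quintic boundary-value fit) is given below; v_nom, t_min, and the zero end-derivative defaults are illustrative assumptions.

```python
import numpy as np

def allocate_times(waypoints, v_nom=0.75, t_min=1.0):
    """Segment durations proportional to inter-waypoint distance, floored at t_min.
    `waypoints` is an (N, 3) array of positions."""
    dists = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
    return np.maximum(dists / v_nom, t_min)

def quintic_segment(p0, p1, T, v0=0.0, v1=0.0, a0=0.0, a1=0.0):
    """Coefficients c_0..c_5 of p(t) = sum_k c_k t^k on [0, T] matching
    position, velocity, and acceleration at both segment endpoints."""
    A = np.array([
        [1, 0, 0,    0,       0,        0],
        [0, 1, 0,    0,       0,        0],
        [0, 0, 2,    0,       0,        0],
        [1, T, T**2, T**3,    T**4,     T**5],
        [0, 1, 2*T,  3*T**2,  4*T**3,   5*T**4],
        [0, 0, 2,    6*T,     12*T**2,  20*T**3],
    ])
    b = np.array([p0, v0, a0, p1, v1, a1])
    return np.linalg.solve(A, b)
```

Continuity across waypoints follows from giving adjacent segments matching endpoint derivatives (taken as zero here for simplicity).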
3. A* Graph-Search Path Planner
Formulated path planning as a shortest-path problem over a collision-free voxel graph with an obstacle inflation margin m = 0.20 m.
Implemented A* search to find minimum-cost paths from start to goal, followed by RDP waypoint simplification with tolerance ε = 0.10 m and collision re-verification on the simplified segments.
Tuned planner parameters: map resolution [0.1, 0.1, 0.1] m, nominal speed v = 0.75 m/s, minimum segment time T_min = 1.0 s (a minimal sketch of the A* search over the voxel grid follows below).
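The sketch below illustrates the search step, assuming the map has already been voxelized and inflated into a boolean occupancy array; the 26-connected neighborhood, identifiers, and return format are illustrative rather than the project's exact code.

```python
import heapq
import numpy as np
from itertools import product

def astar(occ, start, goal):
    """A* over a 3D occupancy (voxel) grid: 26-connected, Euclidean edge costs
    and heuristic. `occ` is a boolean array (True = blocked, after inflation);
    `start` and `goal` are integer voxel index tuples. Returns a voxel path."""
    moves = [m for m in product((-1, 0, 1), repeat=3) if any(m)]
    heuristic = lambda u: float(np.linalg.norm(np.subtract(u, goal)))
    open_set = [(heuristic(start), start)]
    came_from = {start: None}
    g_cost = {start: 0.0}
    while open_set:
        _, u = heapq.heappop(open_set)
        if u == goal:                       # reconstruct path back to start
            path = [u]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for m in moves:
            v = tuple(np.add(u, m))
            if any(i < 0 or i >= s for i, s in zip(v, occ.shape)) or occ[v]:
                continue                    # out of bounds or inside an inflated obstacle
            g_new = g_cost[u] + float(np.linalg.norm(m))
            if g_new < g_cost.get(v, np.inf):
                g_cost[v] = g_new
                came_from[v] = u
                heapq.heappush(open_set, (g_new + heuristic(v), v))
    return []                               # no collision-free path found
```

The returned voxel indices are converted back to metric waypoints, then simplified with RDP and re-checked for collisions as described above.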
Results
Successfully navigated all three maze environments autonomously in hardware deployment using VICON motion capture for state estimation
Achieved position RMSE < 0.21 m across all runs; peak z-axis tracking error reduced from 0.35 m to 0.17 m (47% reduction) through targeted Kd gain tuning
Managed sim-to-real transfer by reducing all gains to approximately 30% of simulation values to account for sensor noise, communication latency, and actuator limits
Pacman Q-Learning Agent - University of Pennsylvania
For this project, I developed and trained a Reinforcement Learning (RL) agent to play Pacman using Q-learning and Approximate Q-learning. The goal was to enable Pacman to learn an optimal policy through trial-and-error interactions with the game environment, maximizing rewards while avoiding ghosts.
1. Q-Learning Agent Implementation
Designed a Q-learning agent that updates a Q-table based on state-action pairs, learning from rewards to optimize future decisions.
Implemented ε-greedy action selection, balancing exploration and exploitation for effective learning.
Optimized Q-value updates using the Bellman update rule (shown together with ε-greedy selection in the sketch below):
Q(s,a) ← (1 − α)·Q(s,a) + α·(R + γ·V(s′)), where V(s′) = max_a′ Q(s′, a′)
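A generic tabular sketch of this agent is shown below; the interface (an actions(state) callable and explicit update calls) is an illustrative assumption, not the Pacman framework's actual API.

```python
import random
from collections import defaultdict

class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch).
    `actions(state)` should return the list of legal actions for a state."""

    def __init__(self, actions, alpha=0.2, gamma=0.8, epsilon=0.05):
        self.q = defaultdict(float)              # Q-table keyed by (state, action)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def value(self, state):
        """V(s) = max_a Q(s, a); 0.0 when there are no legal actions (terminal)."""
        return max((self.q[(state, a)] for a in self.actions(state)), default=0.0)

    def choose_action(self, state):
        """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
        legal = self.actions(state)
        if not legal:
            return None
        if random.random() < self.epsilon:
            return random.choice(legal)
        return max(legal, key=lambda a: self.q[(state, a)])

    def update(self, state, action, next_state, reward):
        """Q(s,a) <- (1 - alpha)*Q(s,a) + alpha*(R + gamma*V(s'))."""
        sample = reward + self.gamma * self.value(next_state)
        self.q[(state, action)] = ((1 - self.alpha) * self.q[(state, action)]
                                   + self.alpha * sample)
```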
2. Approximate Q-Learning for Generalization
Extended the Q-learning agent to an Approximate Q-learning agent using feature-based function approximation instead of a discrete Q-table.
Implemented feature extraction to represent the game state, improving learning efficiency for larger environments.
Used a weighted sum of features to compute Q-values: Q(s,a) = ∑_i w_i · f_i(s,a)
where each weight w_i is updated using: w_i ← w_i + α · (R + γ·V(s′) − Q(s,a)) · f_i(s,a) (a standalone sketch of this agent follows below)
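A standalone sketch of the approximate agent under the same assumed interface, plus a features(state, action) callable returning a feature dictionary; names and defaults are illustrative.

```python
import random
from collections import defaultdict

class ApproximateQAgent:
    """Q-learning with linear function approximation (illustrative sketch).
    `actions(state)` returns legal actions; `features(state, action)` returns
    a dict mapping feature names to values."""

    def __init__(self, actions, features, alpha=0.2, gamma=0.8, epsilon=0.05):
        self.actions, self.features = actions, features
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.w = defaultdict(float)              # one weight per feature

    def q_value(self, state, action):
        """Q(s,a) = sum_i w_i * f_i(s,a)."""
        return sum(self.w[k] * v for k, v in self.features(state, action).items())

    def value(self, state):
        """V(s) = max_a Q(s,a) over legal actions; 0.0 for terminal states."""
        return max((self.q_value(state, a) for a in self.actions(state)), default=0.0)

    def choose_action(self, state):
        """Epsilon-greedy action selection over approximate Q-values."""
        legal = self.actions(state)
        if not legal:
            return None
        if random.random() < self.epsilon:
            return random.choice(legal)
        return max(legal, key=lambda a: self.q_value(state, a))

    def update(self, state, action, next_state, reward):
        """w_i <- w_i + alpha * (R + gamma*V(s') - Q(s,a)) * f_i(s,a)."""
        diff = reward + self.gamma * self.value(next_state) - self.q_value(state, action)
        for k, v in self.features(state, action).items():
            self.w[k] += self.alpha * diff * v
```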
3. Training and Performance Optimization
Trained Pacman for 2000 episodes in a noiseless environment to refine its policy.
Implemented a state-value function to prioritize high-reward paths and avoid negative states (ghosts).
Improved policy stability by introducing a learning rate decay mechanism (a small decay-schedule sketch appears at the end of this section).
The Q-learning agent initially struggled with random exploration but gradually converged to an optimal policy after hundreds of episodes.
The Approximate Q-learning agent, leveraging feature extraction, generalized better and was able to win consistently with fewer training episodes.
Final performance: Pacman achieved an 80-90% win rate on small grids and successfully adapted to larger environments using feature-based learning.
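For the learning rate decay mentioned above, a small illustrative schedule; the constants are assumptions, not the tuned values used in training.

```python
def decayed_rates(episode, alpha0=0.5, eps0=0.8, decay=0.999,
                  alpha_min=0.05, eps_min=0.01):
    """Per-episode exponential decay of the learning rate (alpha) and the
    exploration rate (epsilon), floored at minimum values.
    All constants here are illustrative assumptions."""
    alpha = max(alpha_min, alpha0 * decay ** episode)
    epsilon = max(eps_min, eps0 * decay ** episode)
    return alpha, epsilon
```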