
What is it?
Rabbit Habit is a 2D grid-based survival simulation in which an AI-controlled rabbit learns optimal survival strategies using Tabular Q-Learning.
The main focus of the project was to explore Reinforcement Learning (RL) through Q-Learning, along with the components of object-oriented programming in Python.
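At its core, a tabular Q-Learning agent reduces to an epsilon-greedy action choice plus a Bellman update over a lookup table. The sketch below is a minimal illustration of that loop; the hyperparameter values, action names, and function names are placeholders for illustration, not the project's actual code.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (not necessarily the values used in the project)
ALPHA = 0.1     # learning rate
GAMMA = 0.95    # discount factor
EPSILON = 0.1   # exploration rate
ACTIONS = ["up", "down", "left", "right"]

# Q-table: maps (state, action) -> expected long-term reward
q_table = defaultdict(float)

def choose_action(state):
    """Epsilon-greedy action selection over the tabular Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    """One tabular Q-Learning (Bellman) update step."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
```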

Level 1
Level 1 is a fresh brain with no training.

Level 2
Level 2 uses a Q-table trained over 150,000 runs.
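The practical difference between the two levels is which Q-table the rabbit starts with: an empty one, or one persisted after training. A minimal sketch, assuming the table is stored with pickle; the file name and helper are hypothetical, not the project's actual persistence code.

```python
import pickle
from collections import defaultdict

def load_brain(level):
    """Return the rabbit's Q-table for the chosen level.

    Level 1: a fresh, empty table (the rabbit starts out acting randomly).
    Level 2: a table saved after 150,000 training runs.
    The file name below is hypothetical.
    """
    if level == 1:
        return defaultdict(float)
    with open("q_table_150k.pkl", "rb") as f:
        return defaultdict(float, pickle.load(f))
```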
Abstract: ‘The Rabbit Habit’
This project explores the integration of Reinforcement Learning (RL) and classical pathfinding algorithms to create an adaptive and challenging environment for game AI research. The core objective was to train a Rabbit agent using Tabular Q-Learning to maximize its survival and resource consumption in a 2D grid-based world, while being pursued by a computationally optimal Hunter agent utilizing the A* search algorithm (with Manhattan distance heuristic). The Rabbit agent's state representation was kept small and manageable (approximately 2 × 10⁴ states) by using a relative "Radar" vision system rather than absolute coordinates. Its behavior was shaped by a reward function that heavily penalized capture (−100), slightly penalized movement (−1), and highly rewarded carrot consumption (+50), which acted as a reset-free goal to prioritize long-term efficiency and survival. Training results, analyzed via custom metrics visualization, demonstrated the agent's successful learning progression. The Success Rate (survival vs. capture) stabilized at 40-55%, while the Efficiency metric (steps taken per carrot) sharply decreased from initial values of 50-70 steps to an average of 15-20 steps. This confirms the Q-Learning agent successfully learned to navigate the maze efficiently and flee from the deterministic, near-optimal A* adversary, proving the viability of RL-based agents as dynamic game characters. The project utilized Python 3 and Tkinter for visualization, with key assistance from Generative AI for logic verification and crucial system optimizations, such as decoupling the simulation logic from the rendering process for high-speed training.
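The Hunter's pursuit described above is standard A* search on a grid with a Manhattan-distance heuristic. The snippet below is a self-contained sketch of that algorithm on a walled grid; the grid representation, the `walls` set, and the function names are assumptions for illustration, not the project's code.

```python
import heapq

def manhattan(a, b):
    """Manhattan-distance heuristic between two grid cells (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, walls, width, height):
    """Return a shortest path from start to goal as a list of cells, or []."""
    open_heap = [(manhattan(start, goal), 0, start)]  # (f, g, cell)
    came_from = {}
    g_score = {start: 0}

    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            # Reconstruct the path by walking predecessors back to the start
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]

        x, y = current
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = neighbor
            if not (0 <= nx < width and 0 <= ny < height) or neighbor in walls:
                continue
            tentative_g = g + 1
            if tentative_g < g_score.get(neighbor, float("inf")):
                came_from[neighbor] = current
                g_score[neighbor] = tentative_g
                f = tentative_g + manhattan(neighbor, goal)
                heapq.heappush(open_heap, (f, tentative_g, neighbor))
    return []  # no path exists
```

Because the step cost is uniform (1 per move) and the Manhattan heuristic never overestimates on a 4-connected grid, this search is admissible and the Hunter's path is guaranteed shortest, which is what makes it a near-optimal adversary for the learning rabbit.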
Would you like to know more?
These projects offer just a glimpse into the creative process behind them, not full case studies. To learn more, feel free to drop me an email; I'm usually quick to respond, typically within a day.