A comprehensive simulation system that combines a React frontend with a Python reinforcement learning backend to solve the drone routing problem with battery constraints and recharging stations.
- Interactive 2D grid visualization
- Click-to-place customer nodes and recharging stations
- Real-time drone position and path tracking
- Live battery level indicator
- Step-by-step decision-making logs
- Simulation statistics dashboard
- 2D grid graph environment representation
- Battery consumption (1 unit per second of travel)
- Constraint handling:
  - Customers must be visited exactly once
  - No direct travel between recharge stations
  - Episode fails if battery depletes before reaching a recharge station
- Q-learning algorithm implementation (see the sketch after this list)
- State space: current position, remaining battery, visited customers
- Action space: move to adjacent valid nodes
- Reward function:
  - +100 for visiting all customers
  - +50 for visiting a new customer
  - -100 for battery depletion
  - -1 per move (efficiency incentive)
  - Distance-based penalties
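A minimal sketch of that tabular Q-learning loop, assuming a dictionary-backed Q-table keyed by (state, action) pairs; the names and structure here are illustrative rather than copied from the backend:

```python
import random
from collections import defaultdict

# Illustrative hyperparameters matching the defaults documented below.
ALPHA = 0.1    # learning rate
GAMMA = 0.95   # discount factor

q_table = defaultdict(float)  # maps (state, action) -> estimated return

def choose_action(state, valid_actions, epsilon):
    """Epsilon-greedy selection over the actions that are legal in this state."""
    if random.random() < epsilon:
        return random.choice(valid_actions)                            # explore
    return max(valid_actions, key=lambda a: q_table[(state, a)])       # exploit

def update(state, action, reward, next_state, next_valid_actions):
    """One-step Q-learning (Bellman) update."""
    best_next = max((q_table[(next_state, a)] for a in next_valid_actions), default=0.0)
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
```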
- Frontend: React.js with HTML5 Canvas for grid visualization
- Backend: Python Flask with CORS support
- RL Algorithm: Q-learning with epsilon-greedy exploration
- Communication: REST API between frontend and backend
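The backend exposes its endpoints as a small Flask application with CORS enabled so the React dev server on port 3000 can call it across origins. A minimal sketch of that wiring, assuming the flask-cors package; the route name is illustrative, since exact routes are not listed in this README:

```python
from flask import Flask, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # allow the React frontend (http://localhost:3000) to call this API

@app.route("/health", methods=["GET"])
def health():
    # Simple liveness probe the frontend can hit before starting a simulation.
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```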
- Node.js (v14 or higher)
- Python 3.8 or higher
- npm or yarn
- Install dependencies:
  ```bash
  npm install
  ```
- Start the React development server:
  ```bash
  npm start
  ```

The frontend will be available at http://localhost:3000.
- Navigate to the backend directory:
  ```bash
  cd backend
  ```
- Create a virtual environment (recommended):
  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install Python dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Start the Flask backend:
  ```bash
  python app.py
  ```

The backend will be available at http://localhost:5000.
```bash
npm run dev
```

This command starts both the React frontend and the Python backend concurrently.
- Configure Parameters: Set grid size, number of customers, recharge stations, and battery capacity
- Place Customer Nodes: Click "Place Customers" and click on grid cells to place customer locations
- Place Recharge Stations: Click "Place Recharge Stations" and click on grid cells to place charging stations
- Set Start Position: Click "Set Start Position" and click on a recharge station to set the drone's starting point
- Click "Start Simulation" to begin the RL-powered routing
- Watch the drone navigate the grid in real-time
- Monitor battery levels, visited customers, and decision logs
- View simulation statistics including steps taken, efficiency, and completion rate
- Red Circles: Unvisited customers (C1, C2, etc.)
- Green Circles: Visited customers
- Yellow Squares: Recharge stations (R1, R2, etc.)
- Green Square Border: Start position
- Blue Circle: Current drone position
- Blue Line: Drone's path history
Start a new simulation with RL agent training and pathfinding.
Request Body:
```json
{
  "grid_size": 10,
  "customers": [[2, 3], [7, 8], [1, 9]],
  "recharge_stations": [[0, 0], [9, 9]],
  "start_position": [0, 0],
  "battery_capacity": 20
}
```

Response:
```json
{
  "success": true,
  "path": [[0, 0], [2, 3], [7, 8], [1, 9]],
  "total_reward": 245.5,
  "steps_taken": 15,
  "training_complete": true
}
```
Train the RL agent for additional episodes.
Retrieve the current Q-table for analysis.
Health check endpoint.
- Learning Rate: 0.1 (how much new information overrides old)
- Discount Factor: 0.95 (importance of future rewards)
- Epsilon: 1.0 → 0.01 (exploration vs exploitation balance)
- Epsilon Decay: 0.995 (gradual shift from exploration to exploitation)
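With these defaults, exploration fades gradually: multiplying by 0.995 each episode takes epsilon from 1.0 down to the 0.01 floor after roughly 920 episodes. A quick sketch of that schedule, using the default values listed above:

```python
# Replay the epsilon schedule implied by the defaults above.
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995

for episode in range(1, 1001):
    epsilon = max(epsilon_min, epsilon * epsilon_decay)
    if episode % 200 == 0:
        print(f"episode {episode}: epsilon = {epsilon:.3f}")
# epsilon hits the 0.01 floor after roughly 920 episodes
```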
States are represented as tuples containing:
- Current X coordinate
- Current Y coordinate
- Remaining battery level
- Visited customers bitmask
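A minimal sketch of how such a state key could be assembled, with the visited set packed into a bitmask; the helper name is illustrative, not taken from the repository:

```python
def encode_state(x, y, battery, visited_indices):
    """Pack the drone's situation into a hashable Q-table key.

    visited_indices: indices of customers already served, e.g. {0, 2}.
    """
    visited_mask = 0
    for i in visited_indices:
        visited_mask |= 1 << i       # set bit i once customer i is visited
    return (x, y, battery, visited_mask)

# Example: drone at (2, 3) with 14 battery left, customers 0 and 2 already visited
state = encode_state(2, 3, 14, {0, 2})   # -> (2, 3, 14, 5), since 0b101 == 5
```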
Actions represent moving to any valid position (customer or recharge station) that:
- Is reachable with current battery
- Doesn't violate movement constraints
- Follows the "no recharge-to-recharge" rule
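In code, this amounts to enumerating candidate target nodes and discarding any that break a rule. A hedged sketch, assuming battery cost proportional to Manhattan distance; the function and parameter names are illustrative:

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def valid_actions(position, battery, customers, stations, visited_mask, at_station):
    """Return the target nodes the drone may legally fly to next."""
    candidates = [("customer", i, pos) for i, pos in enumerate(customers)]
    candidates += [("station", i, pos) for i, pos in enumerate(stations)]

    actions = []
    for kind, i, pos in candidates:
        if kind == "customer" and visited_mask & (1 << i):
            continue                       # each customer is visited exactly once
        if kind == "station" and at_station:
            continue                       # no direct recharge-to-recharge travel
        if manhattan(position, pos) > battery:
            continue                       # not reachable on the remaining charge
        actions.append((kind, i, pos))
    return actions
```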
The reward function balances multiple objectives:
- Task Completion: Large positive reward for visiting all customers
- Progress: Medium reward for visiting new customers
- Efficiency: Small penalty per move to encourage shorter paths
- Safety: Large penalty for battery depletion
- Distance: Penalty proportional to travel distance
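Put together, the shaping could look like the sketch below. The magnitudes mirror the list above, but the distance coefficient is assumed; the actual logic lives in the calculate_reward method of backend/environment.py:

```python
def calculate_reward(moved_distance, visited_new_customer, all_customers_visited,
                     battery_depleted):
    """Illustrative reward shaping using the magnitudes documented above."""
    reward = -1.0                     # per-move cost: prefer shorter routes
    reward -= 0.5 * moved_distance    # distance-based penalty (coefficient assumed)
    if battery_depleted:
        return reward - 100.0         # episode failure
    if visited_new_customer:
        reward += 50.0                # progress toward the goal
    if all_customers_visited:
        reward += 100.0               # task complete
    return reward
```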
Change the GRID_SIZE constant in src/App.js or use the UI controls.
Modify hyperparameters in backend/drone_rl_agent.py:
- learning_rate: How quickly the agent learns
- discount_factor: How much future rewards matter
- epsilon_decay: How quickly exploration decreases
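A hedged illustration of what tuning these might look like, assuming the hyperparameters are grouped into a simple config object; the real attribute layout in backend/drone_rl_agent.py may differ:

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    # Hypothetical config; the actual hyperparameters live as attributes or
    # constructor arguments in backend/drone_rl_agent.py.
    learning_rate: float = 0.1
    discount_factor: float = 0.95
    epsilon: float = 1.0
    epsilon_min: float = 0.01
    epsilon_decay: float = 0.995

# Example tweak: explore for longer and weigh distant customers more heavily.
config = AgentConfig(discount_factor=0.99, epsilon_decay=0.999)
```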
Edit the calculate_reward method in backend/environment.py to implement different reward strategies.
- Ensure Python backend is running on port 5000
- Check the CORS configuration in backend/app.py
- Verify all Python dependencies are installed
- Clear browser cache and reload
- Check browser console for JavaScript errors
- Ensure all npm dependencies are installed
- Verify start position is set at a recharge station
- Ensure at least one customer is placed
- Check that battery capacity is sufficient for basic moves
- Deep Q-Network (DQN) implementation
- Policy Gradient methods (PPO, A3C)
- Multi-drone coordination
- Dynamic obstacles and weather conditions
- 3D visualization
- Export simulation data and trained models
- Performance benchmarking tools
- Advanced pathfinding algorithms comparison
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is open source and available under the MIT License.