
Moon Lander Algorithm

Utpal Sharma [2361559]


Sanchita Bargali [2361479]
Yachna Pathak [2361597]
Sachin Chaubey [2361463]

Graphic Era Hill University, Bhimtal


[email protected]
[email protected]
[email protected]
[email protected]

ABSTRACT
The Moon Lander Algorithm is a computational approach designed to
control the descent and landing of a spacecraft on the lunar surface. The
algorithm's primary goal is to ensure a safe, fuel-efficient landing by
dynamically adjusting the spacecraft's thrust and orientation based on real-
time parameters such as altitude, velocity, and fuel levels. It typically
involves physics-based simulations that integrate Newtonian mechanics
with control systems like PID (Proportional-Integral-Derivative) controllers
or reinforcement learning models. Modern implementations may also
utilize sensor data fusion and adaptive learning techniques to respond to
uncertain terrain and environmental conditions. This algorithm plays a
critical role in autonomous space exploration missions, ensuring precision
landing and minimizing the risk of collision or crash.

Keywords: Moon lander, lunar descent, reinforcement learning, Deep Q-Network (DQN), prioritized experience replay, autonomous landing
1. INTRODUCTION

The exploration of the Moon and other celestial bodies has driven the
development of sophisticated autonomous systems capable of performing
complex tasks without human intervention. One of the most critical challenges
in lunar missions is ensuring a safe and precise landing on the Moon’s surface.
The Moon Lander Algorithm plays a pivotal role in achieving this objective by
guiding the spacecraft during its descent phase, managing velocity, altitude,
orientation, and fuel consumption.

This algorithm simulates the physical dynamics of lunar descent and integrates
control mechanisms—such as Proportional-Integral-Derivative (PID) controllers
or machine learning models—to determine the optimal thrust required at each
moment. It must account for variable gravitational forces, limited fuel reserves,
and potential surface irregularities. As lunar missions increasingly rely on
autonomous systems, the development and refinement of Moon Lander
Algorithms remain a key area of research in aerospace engineering and
robotics.
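
Although the implementation presented later in this paper is learning-based, the PID-style thrust control mentioned above can be illustrated with a short sketch. The gains, timestep, vehicle mass, and thrust limit below are assumed values chosen purely for illustration, not parameters from this project:

# Illustrative sketch only: PID tracking of a target descent rate during a
# vertical lunar descent. All numeric values here are assumptions.
LUNAR_G = 1.62                 # m/s^2, lunar surface gravity
DT = 0.1                       # s, control timestep (assumed)
KP, KI, KD = 0.8, 0.02, 0.3    # assumed PID gains


def simulate_pid_descent(altitude=500.0, velocity=-20.0,
                         mass=1500.0, max_thrust=4500.0, max_steps=50_000):
    """Return the touchdown speed of a PID-controlled vertical descent."""
    integral, prev_error = 0.0, 0.0
    for _ in range(max_steps):
        if altitude <= 0.0:
            break
        # Target descent rate shrinks with altitude, floored at -1 m/s.
        target_rate = -max(1.0, 0.05 * altitude)
        error = target_rate - velocity            # > 0 when falling too fast
        integral += error * DT
        derivative = (error - prev_error) / DT
        prev_error = error
        # PID correction added to a gravity feed-forward term, then clamped
        # to the engine's thrust limits.
        command = LUNAR_G + KP * error + KI * integral + KD * derivative
        thrust = min(max_thrust, max(0.0, mass * command))
        # Simple Newtonian update of the vertical state.
        velocity += (thrust / mass - LUNAR_G) * DT
        altitude += velocity * DT
    return velocity


if __name__ == "__main__":
    print(f"Touchdown speed: {simulate_pid_descent():.2f} m/s")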

With renewed interest in lunar exploration, evident in recent and upcoming missions by NASA, ISRO, and private space companies, landing accuracy and safety have become even more crucial. Algorithms must be capable of real-time decision-making in unpredictable conditions, including dust clouds, terrain slopes, and communication delays with ground control. Simulation-based testing of these algorithms is commonly used before actual deployment, allowing engineers to refine logic, error handling, and performance efficiency.

Advanced models may also incorporate artificial intelligence, enabling adaptive behavior when encountering unknown terrain features. Thus, the Moon Lander Algorithm is not only a technical control system but also a foundation for the future of autonomous planetary exploration.
2. SYSTEM ARCHITECTURE
1. Input Layer

State Representation: The agent receives an environment state (a vector of features) as input. The input dimension is determined by the environment's observation space.
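
For the Gym LunarLander-v2 environment used later in this paper, the observation is an 8-dimensional vector (position, velocity, angle, angular velocity, and two leg-contact flags) and there are 4 discrete actions. A quick check (a minimal sketch, assuming the gym package with Box2D support is installed) fixes the network's input and output sizes:

import gym

# LunarLander-v2 state: [x, y, vx, vy, angle, angular velocity,
#                        left leg contact, right leg contact]
env = gym.make("LunarLander-v2")
state_dim = env.observation_space.shape[0]   # 8
action_dim = env.action_space.n              # 4: no-op, left engine, main engine, right engine
print(state_dim, action_dim)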

4. Action Selection Module

Epsilon-Greedy Policy:
With probability epsilon, selects a random action (exploration).
Otherwise, selects the action with the highest predicted Q-value (exploitation).
Epsilon decays toward a small floor as training proceeds (see the decay sketch below).
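
With the defaults used by the agent later in the paper (epsilon starts at 1.0, is multiplied by 0.99 after each training step, and is floored at 0.01), exploration tapers off after roughly 459 updates, as this short check illustrates:

# How many training steps until epsilon decays from 1.0 to its 0.01 floor?
epsilon, epsilon_min, decay = 1.0, 0.01, 0.99
steps = 0
while epsilon > epsilon_min:
    epsilon = max(epsilon_min, epsilon * decay)
    steps += 1
print(steps)  # 459 with these hyperparameters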

5. Experience Replay Buffer

Type: Prioritized Experience Replay (using TD-error as priority)


Structure: A deque of PriorityItem instances storing:
State, action, reward, next state, done flag, and TD-error.
Sampling: Experiences are sampled with probability proportional to their priority (TD-error).
Importance Sampling Weights: Used to adjust learning updates and reduce the bias introduced by non-uniform sampling (a numeric sketch follows).
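
The sampling and weighting scheme can be made concrete with a small numeric sketch; the toy priority values below are made up for illustration, while alpha = 0.6 and beta = 0.4 match the agent's defaults:

import numpy as np

# Toy TD-error priorities for a 5-transition buffer (illustrative values).
priorities = np.array([0.01, 0.5, 1.0, 2.0, 0.1])
alpha, beta = 0.6, 0.4

# Sampling probability: P(i) = p_i**alpha / sum_j p_j**alpha
scaled = priorities ** alpha
probs = scaled / scaled.sum()

# Importance-sampling weights correct the bias of non-uniform sampling:
# w_i = (N * P(i))**(-beta), normalised by the largest weight.
weights = (len(priorities) * probs) ** (-beta)
weights /= weights.max()

print(np.round(probs, 3))    # high-error transitions are replayed more often
print(np.round(weights, 3))  # ...but their updates are down-weighted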

6. Training Engine
Batch Sampling: Draws a batch of transitions from the replay buffer.
Action Selection: The greedy next-state action is chosen with the policy network.
Target Value Computation: The target network evaluates that action to form the bootstrap target (Double DQN), as sketched below.
Loss Computation: Weighted mean-squared error (MSE) using the importance sampling weights; TD-errors are stored for updating priorities.

Optimization:
Gradient clipping is applied to stabilize learning.
The policy network is updated by backpropagation with the Adam optimizer.
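
The Double DQN target computation can be summarized in a few lines. The helper below is a sketch that mirrors the update performed in the training code later in the paper; it operates on batched tensors and is not part of the original listing:

import torch

def double_dqn_targets(policy_net, target_net, rewards, next_states, dones, gamma=0.99):
    """y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_policy(s', a))"""
    with torch.no_grad():
        # The policy network chooses the greedy next action...
        next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
        # ...and the target network evaluates it, decoupling selection from evaluation.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q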
Technology Stack

• Language: Python 3
• Deep learning frameworks: TensorFlow, PyTorch
• Libraries: NumPy, Pygame, Matplotlib

IMPLEMENTATION & RESULTS


This section summarizes the implementation and results of the Moon Lander algorithm, which is solved here with reinforcement learning:

1. Implementation of the Moon Lander Algorithm

a. Environment Setup

gym.make("LunarLander-v2") for the classic environment, or

b. Agent Selection

Algorithms:

DQN (Deep Q-Network) – well suited to discrete action spaces; the implementation below extends it with Double DQN and prioritized experience replay.

c. Training the Agent: the agent interacts with the environment over many episodes, storing transitions in the replay buffer, performing gradient updates, and periodically soft-updating the target network (a training-loop sketch follows).


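The training loop itself is not shown in the original listing. A minimal sketch is given below, assuming the DQNAgent class from the Program Code section that follows and the classic Gym API (reset() returns the state and step() returns four values); the episode count, update schedule, and checkpoint filename are illustrative choices:

import gym

def train_lander(num_episodes=600, max_steps=1000):
    env = gym.make("LunarLander-v2")
    agent = DQNAgent(state_dim=env.observation_space.shape[0],
                     action_dim=env.action_space.n)

    for episode in range(num_episodes):
        state = env.reset()
        episode_reward = 0.0
        for _ in range(max_steps):
            action = agent.select_action(state)
            next_state, reward, done, _ = env.step(action)
            agent.store_transition(state, action, reward, next_state, done)
            agent.train()                    # one gradient step per environment step
            state = next_state
            episode_reward += reward
            if done:
                break
        agent.update_target_network()        # soft update once per episode (assumed schedule)
        agent.reward_history.append(episode_reward)
        if (episode + 1) % 20 == 0:
            print(f"Episode {episode + 1}: reward {episode_reward:.1f}, epsilon {agent.epsilon:.3f}")

    agent.save("lunar_lander_dqn.pt")        # checkpoint filename is an assumption
    return agent
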
Program Code:

import numpy as np
import random
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from collections import deque


class DeepQNetwork(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # Enhanced neural network architecture with larger hidden layers
        self.network = nn.Sequential(
            nn.Linear(input_dim, 128),  # Increased from 64 to 128
            nn.ReLU(),
            nn.Linear(128, 128),        # Increased from 64 to 128
            nn.ReLU(),
            nn.Linear(128, output_dim)
        )

    def forward(self, x):
        return self.network(x)


# Priority item for experience replay
class PriorityItem:
    def __init__(self, state, action, reward, next_state, done, error=None):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.done = done
        self.error = error if error is not None else 0.01  # Default small error


class DQNAgent:
    def __init__(self, state_dim, action_dim, learning_rate=0.001):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Neural networks
        self.policy_network = DeepQNetwork(state_dim, action_dim).to(self.device)
        self.target_network = DeepQNetwork(state_dim, action_dim).to(self.device)
        self.target_network.load_state_dict(self.policy_network.state_dict())

        # Improved hyperparameters
        self.learning_rate = learning_rate
        self.gamma = 0.99            # Discount factor
        self.epsilon = 1.0           # Exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.99    # Faster decay (was 0.995)
        self.tau = 0.01              # Soft update parameter

        # Double DQN flag
        self.use_double_dqn = True

        # Optimizer and loss function (gradients are clipped in train())
        self.optimizer = optim.Adam(self.policy_network.parameters(), lr=learning_rate)
        self.loss_fn = nn.MSELoss()

        # Prioritized experience replay
        self.replay_memory = deque(maxlen=10000)
        self.batch_size = 64
        self.prioritized_replay = True
        self.alpha = 0.6             # Priority exponent
        self.beta = 0.4              # Importance sampling weight
        self.beta_increment = 0.001  # Increment beta each training step

        # Training metrics
        self.loss_history = []
        self.reward_history = []

    def select_action(self, state):
        # Epsilon-greedy action selection with decaying epsilon
        if random.random() < self.epsilon:
            return random.randint(0, 3)  # Random action (LunarLander has 4 discrete actions)

        with torch.no_grad():
            state_tensor = torch.FloatTensor(state).unsqueeze(0).to(self.device)
            q_values = self.policy_network(state_tensor)
            return q_values.argmax().item()

    def store_transition(self, state, action, reward, next_state, done):
        # Calculate priority based on TD error
        if self.prioritized_replay:
            with torch.no_grad():
                state_tensor = torch.FloatTensor(state).unsqueeze(0).to(self.device)
                next_state_tensor = torch.FloatTensor(next_state).unsqueeze(0).to(self.device)

                current_q = self.policy_network(state_tensor)[0, action].item()
                next_q = self.target_network(next_state_tensor).max(1)[0].item()

                # Calculate TD error
                target = reward + (1 - done) * self.gamma * next_q
                error = abs(current_q - target) + 0.01  # Small constant avoids zero priority

            # Store with priority
            self.replay_memory.append(PriorityItem(state, action, reward, next_state, done, error))
        else:
            # Standard replay memory
            self.replay_memory.append(PriorityItem(state, action, reward, next_state, done))

    def train(self):
        # Check if enough samples are available
        if len(self.replay_memory) < self.batch_size:
            return

        # Sample batch with prioritization if enabled
        if self.prioritized_replay:
            # Get priorities
            priorities = np.array([item.error for item in self.replay_memory])
            priorities = priorities ** self.alpha

            # Calculate sampling probabilities
            probs = priorities / priorities.sum()

            # Sample based on priorities
            indices = np.random.choice(len(self.replay_memory), self.batch_size, p=probs)
            batch = [self.replay_memory[idx] for idx in indices]

            # Calculate importance sampling weights
            weights = (len(self.replay_memory) * probs[indices]) ** (-self.beta)
            weights /= weights.max()  # Normalize weights
            weights = torch.FloatTensor(weights).to(self.device)

            # Increase beta
            self.beta = min(1.0, self.beta + self.beta_increment)
        else:
            # Random sampling
            batch = random.sample(self.replay_memory, self.batch_size)
            weights = torch.ones(self.batch_size).to(self.device)

        # Extract batch data
        states = torch.FloatTensor([item.state for item in batch]).to(self.device)
        actions = torch.LongTensor([item.action for item in batch]).to(self.device)
        rewards = torch.FloatTensor([item.reward for item in batch]).to(self.device)
        next_states = torch.FloatTensor([item.next_state for item in batch]).to(self.device)
        dones = torch.FloatTensor([item.done for item in batch]).to(self.device)

        # Compute current Q values
        current_q_values = self.policy_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # Compute target Q values with Double DQN
        if self.use_double_dqn:
            # Select actions using the policy network
            next_actions = self.policy_network(next_states).max(1)[1].unsqueeze(1)
            # Evaluate those actions using the target network
            next_q_values = self.target_network(next_states).gather(1, next_actions).squeeze(1)
        else:
            # Standard DQN
            next_q_values = self.target_network(next_states).max(1)[0]

        target_q_values = rewards + (1 - dones) * self.gamma * next_q_values

        # Compute weighted loss for prioritized replay
        td_errors = torch.abs(current_q_values - target_q_values.detach())
        loss = (weights * td_errors ** 2).mean()

        # Store loss for monitoring
        self.loss_history.append(loss.item())

        # Optimize the model
        self.optimizer.zero_grad()
        loss.backward()
        # Clip gradients to prevent exploding gradients
        torch.nn.utils.clip_grad_norm_(self.policy_network.parameters(), 1.0)
        self.optimizer.step()

        # Update priorities in replay memory for sampled experiences
        if self.prioritized_replay:
            for idx, error in zip(indices, td_errors.detach().cpu().numpy()):
                self.replay_memory[idx].error = error + 0.01  # Add small constant

        # Decay exploration rate
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)

    def update_target_network(self):
        # Soft update of target network
        for target_param, policy_param in zip(self.target_network.parameters(),
                                              self.policy_network.parameters()):
            target_param.data.copy_(self.tau * policy_param.data + (1 - self.tau) * target_param.data)

    def save(self, filename):
        """Save the trained model"""
        torch.save({
            'policy_network': self.policy_network.state_dict(),
            'target_network': self.target_network.state_dict(),
            'optimizer': self.optimizer.state_dict(),
            'epsilon': self.epsilon
        }, filename)

    def load(self, filename):
        """Load a trained model"""
        checkpoint = torch.load(filename)
        self.policy_network.load_state_dict(checkpoint['policy_network'])
        self.target_network.load_state_dict(checkpoint['target_network'])
        self.optimizer.load_state_dict(checkpoint['optimizer'])
        self.epsilon = checkpoint['epsilon']
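
Because the agent records loss_history and reward_history, the learning curves discussed in the results below can be visualised with Matplotlib. This is a minimal sketch assuming a trained agent object such as the one returned by the training-loop sketch above:

import matplotlib.pyplot as plt

def plot_training_curves(agent):
    """Plot per-episode reward and per-update loss recorded by the agent."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(agent.reward_history)
    ax1.set_xlabel("Episode")
    ax1.set_ylabel("Total reward")
    ax1.set_title("Reward per episode")
    ax2.plot(agent.loss_history)
    ax2.set_xlabel("Training step")
    ax2.set_ylabel("Weighted MSE loss")
    ax2.set_title("Training loss")
    fig.tight_layout()
    plt.show()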

Results and Performance
1. High Precision Landing

Modern algorithms enable high-precision landings within a few meters of target coordinates, even
on challenging lunar terrain.

2. Autonomy and Safety

Autonomous decision-making during descent reduces human intervention and mitigates risks,
ensuring safe landings even in dynamic conditions (e.g., shifting dust, unknown terrain).

3. Real-Time Hazard Avoidance

Algorithms can detect and avoid hazards (rocks, craters, steep slopes) in real-time using sensors
like LIDAR and cameras.

4. Low Energy Consumption

Optimized fuel usage and efficient trajectory planning help reduce energy consumption, enabling
more sustainable lunar missions.

5. Speed and Efficiency

Fast processing of environmental data allows for timely corrective actions during descent,
improving mission success rates.

6. Versatility Across Terrain

Performance is consistent across different lunar surface types, from smooth plains to rugged
highlands.

7. Robustness in Adverse Conditions

Algorithms perform well even under challenging conditions like lunar dust storms or low-visibility
environments.

8. Simulation Accuracy

Pre-mission simulations provide accurate predictions, with actual landings showing minimal
discrepancies from planned trajectories.

Challenges and Limitations

The main challenges and limitations of moon landing algorithms are summarized below.
Challenges:
1. Unpredictable Terrain

The lunar surface has uneven topography, dust, rocks, and craters, making safe landing difficult.

2. Real-Time Decision Making

Algorithms must process sensor data and make navigation decisions in milliseconds, under tight time constraints.

3. Communication Delay

The one-way signal delay between Earth and the Moon (~1.3 seconds, about 2.6 seconds round trip) limits real-time human intervention during descent.

4. Dust and Visibility Issues

Lunar dust (regolith) kicked up during landing can obscure sensors and damage lander
components.

5. Limited Onboard Computing Power

Spacecraft have limited hardware for running complex AI/ML algorithms due to size, weight, and
power constraints.

6. Sensor Reliability

Sensors must perform flawlessly in extreme conditions; failure can lead to mission loss.

7. Unknown Surface Features

Incomplete or outdated lunar maps can cause navigation errors or misidentification of landing sites.

Limitations:

1. High Development Cost

Designing and testing robust algorithms require significant financial and technical investment.

2. Dependence on Accurate Initial Data

Performance depends on precise data like initial velocity, altitude, and lunar topography.

3. Simulation Gaps

Simulated environments may not fully replicate real lunar conditions, leading to unexpected in-
flight issues.

4. Limited Reusability

Algorithms often need redesign or retuning for different missions or landing zones.
5. Battery and Power Constraints

Algorithms must balance computational load with limited energy resources during descent.

FUTURE SCOPE
The future scope for moon landing algorithms is vast and evolving rapidly, especially with
increasing interest from government space agencies (like NASA, ESA, ISRO, CNSA) and private
companies (like SpaceX, Blue Origin, Astrobotic). Here are several key areas of future development:

1. Autonomous Landing Capabilities

Improved autonomy: Future algorithms will focus on fully autonomous landings, requiring minimal
human input.

Real-time decision-making: Advanced AI and machine learning models will help dynamically adjust
trajectories during descent based on terrain and unexpected obstacles.

2. Precision and Hazard Avoidance

High-accuracy landing: Algorithms will be refined for pinpoint landing near specific lunar sites
(e.g., water-ice deposits at the poles).

Hazard detection and avoidance: Vision-based and LIDAR-based algorithms will improve the ability
to avoid rocks, craters, and slopes during descent.

3. Reusable Lander Support

With more focus on reusable lunar landers, algorithms will need to support both landing and take-
off maneuvers with minimal energy consumption and high reliability.

4. Adaptability to Lunar Environments

Support for different terrains: Algorithms will be tested for various types of lunar surfaces (smooth
plains, rugged highlands, polar regions).

Dust mitigation strategies: Managing lunar regolith disturbance during descent remains an open problem and a growing focus of algorithm design.


