Overview of GPS Clustering Code
1. Loading and Cleaning Data
You first load a CSV file named `Druidtest_sufan.csv` using Pandas and handle cases where the
file might not be found:
import pandas as pd

data_file = 'Druidtest_sufan.csv'  # Path to the data file
try:
    data = pd.read_csv(data_file)
    print("Data loaded successfully!")
except FileNotFoundError:
    print(f"File '{data_file}' not found. Please check the file path.")
    raise SystemExit(1)  # Stop the script with a non-zero exit code
This attempts to load the data and, if the file doesn't exist, it prints an error message and exits the
program.
After the data is loaded, you inspect its structure:
print("Data Info:")
data.info()  # Prints column names, dtypes, and non-null counts (returns None, so no print() is needed)
print("\nFirst few rows:")
print(data.head()) # First few rows for a quick overview
You then clean the data by removing rows with NaN values in the critical columns (`location*lat`
and `location*long`) since these are needed to calculate distances and map locations:
data = data.dropna(subset=['location*lat', 'location*long'])
print("\nCleaned Data Info:")
print(data[['location*lat', 'location*long']].describe()) # Summary stats
2. Calculating Speed and Distance
To calculate speed and distance between consecutive GPS coordinates, you:
* Convert the `timestamp` column to datetime format so that time differences can be computed:
data['timestamp'] = pd.to_datetime(data['timestamp'])
* Sort the data by timestamp to ensure the calculations are in chronological order:
data = data.sort_values(by='timestamp')
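To see why the sort matters, here is a minimal standard-library sketch (no Pandas) with three made-up timestamps. The sample values and the `time_diffs` helper are illustrative assumptions, not part of the original script. Out-of-order rows produce a negative time difference, which the speed calculation would then report as zero.

```python
from datetime import datetime

def time_diffs(stamps):
    """Seconds between each consecutive pair of datetimes."""
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Made-up timestamps, deliberately out of order
raw = ["2024-01-01 10:00:10", "2024-01-01 10:00:00", "2024-01-01 10:00:25"]
stamps = [datetime.fromisoformat(s) for s in raw]

unsorted_diffs = time_diffs(stamps)        # first difference is negative
sorted_diffs = time_diffs(sorted(stamps))  # all differences are positive
print(unsorted_diffs)
print(sorted_diffs)
```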
You then iterate over each consecutive pair of GPS points to calculate:
* Distance: Using Geopy's `geodesic` function.
* Speed: Using the formula: `speed = distance / time`.
Here's the code:
from geopy.distance import geodesic

distances = []
speeds = []
# Stop one row early: each step compares row i with row i + 1
for i in range(len(data) - 1):
    loc1 = (data.iloc[i]['location*lat'], data.iloc[i]['location*long'])
    loc2 = (data.iloc[i + 1]['location*lat'], data.iloc[i + 1]['location*long'])
    distance = geodesic(loc1, loc2).meters
    time_diff = (data.iloc[i + 1]['timestamp'] - data.iloc[i]['timestamp']).total_seconds()
    distances.append(distance)
    speeds.append(distance / time_diff if time_diff > 0 else 0)
# Final row has no following point, so pad with zeros
distances.append(0)
speeds.append(0)
data['distance_m'] = distances
data['speed_m_s'] = speeds
data['cumulative_distance_m'] = data['distance_m'].cumsum()
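As a sanity check on the loop above, the sketch below computes distance and speed for a single pair of points using only the standard library. It swaps in the haversine great-circle formula for Geopy's `geodesic`, so results will differ slightly from the ellipsoidal distance. The coordinates, the 6,371,000 m Earth radius, and the `haversine_m` and `speed_m_s` helper names are assumptions for illustration.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def speed_m_s(distance_m, t1, t2):
    """Speed in m/s; returns 0 when the time difference is not positive."""
    dt = (t2 - t1).total_seconds()
    return distance_m / dt if dt > 0 else 0

# Two made-up fixes 0.001 degrees of latitude apart, recorded 10 seconds apart
t1 = datetime(2024, 1, 1, 10, 0, 0)
t2 = datetime(2024, 1, 1, 10, 0, 10)
d = haversine_m(0.0, 0.0, 0.001, 0.0)
v = speed_m_s(d, t1, t2)
print(f"{d:.1f} m, {v:.2f} m/s")
```

The `dt > 0` guard mirrors the `time_diff > 0` check in the loop: duplicate or out-of-order timestamps yield a speed of 0 instead of a division error or a negative value.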
3. Clustering GPS Points Using KMeans
You apply KMeans clustering to group the GPS points into 5 clusters based on latitude and
longitude:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5, random_state=42)
data['cluster'] = kmeans.fit_predict(data[['location*lat', 'location*long']])
* `n_clusters=5` specifies you want 5 clusters.
* `fit_predict()` computes the clusters and assigns each data point a cluster label.
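To make concrete what `fit_predict()` returns, here is a deliberately tiny, pure-Python sketch of the basic k-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not scikit-learn's implementation, which uses smarter initialization and other refinements. The `simple_kmeans` function, the sample points, and the fixed starting centroids are assumptions for illustration. Like the original call, it treats latitude and longitude as plain x/y values.

```python
def simple_kmeans(points, centroids, iterations=10):
    """Return (labels, centroids) after a fixed number of assign/update rounds."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(len(centroids)),
                key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

# Made-up (lat, long) pairs forming two obvious groups
points = [(10.0, 20.0), (10.1, 20.1), (10.2, 20.0), (50.0, 60.0), (50.1, 60.2)]
labels, centers = simple_kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The returned `labels` list has one entry per input point, in the same order, which is why the result of `fit_predict()` can be assigned directly to a new `cluster` column.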
4. Creating a Folium Map with Clustered Markers
You create a Folium map to visualize the GPS data points. First, you calculate the map center:
map_center = [data['location*lat'].mean(), data['location*long'].mean()]
Then, you create a Folium map with MarkerCluster:
import folium
from folium.plugins import MarkerCluster
gps_map = folium.Map(location=map_center, zoom_start=15, tiles='CartoDB positron')
marker_cluster = MarkerCluster().add_to(gps_map)
You loop through each row of data and add a Marker with a popup:
for _, row in data.iterrows():
    popup_text = (
        f"Event ID: {row['event*id']}<br>"
        f"Timestamp: {row['timestamp']}<br>"
        f"Speed: {row['speed_m_s']:.2f} m/s<br>"
        f"Distance: {row['distance_m']:.2f} m<br>"
        f"Cumulative Distance: {row['cumulative_distance_m']:.2f} m<br>"
        f"Cluster: {row['cluster']}"
    )
    folium.Marker(
        location=[row['location*lat'], row['location*long']],
        popup=popup_text,
    ).add_to(marker_cluster)
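The popup string is plain HTML, so it can be built and checked separately from the map. The sketch below wraps the same f-string in a hypothetical `build_popup` helper and feeds it a made-up row as a plain dict. The helper name and the sample values are assumptions, not part of the original script.

```python
def build_popup(row):
    """Build the HTML popup text for one row (any mapping with these keys)."""
    return (
        f"Event ID: {row['event*id']}<br>"
        f"Timestamp: {row['timestamp']}<br>"
        f"Speed: {row['speed_m_s']:.2f} m/s<br>"
        f"Distance: {row['distance_m']:.2f} m<br>"
        f"Cumulative Distance: {row['cumulative_distance_m']:.2f} m<br>"
        f"Cluster: {row['cluster']}"
    )

# Made-up row values for illustration only
sample = {
    'event*id': 1,
    'timestamp': '2024-01-01 10:00:00',
    'speed_m_s': 1.2345,
    'distance_m': 12.3,
    'cumulative_distance_m': 100.0,
    'cluster': 2,
}
popup = build_popup(sample)
print(popup)
```

Inside the loop, the same helper could be passed as `popup=build_popup(row)`, since a Pandas row supports the same `row['column']` lookups as the dict used here.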
The map is saved as an HTML file:
gps_map.save('optimized_gps_map.html')
print("Optimized map generated successfully!")
Summary of Key Steps
1. Data Cleaning: Removed rows with missing latitude or longitude.
2. Speed and Distance Calculation: Calculated speed and distance between consecutive points.
3. Clustering: Grouped data using KMeans.
4. Map Generation: Created an interactive map with clustered markers.
Open the generated `optimized_gps_map.html` to view the interactive map.