CanterburyCommuto

CanterburyCommuto computes commuting information for a pair of routes: the time and distance travelled before, during, and after the overlap between the two routes, if an overlap exists.

It relies on the Google Maps API.

How to use it

Install the package

To use CanterburyCommuto, you need to clone the repository first. You can do this by running the following command in your terminal:

git clone https://github.com/PeirongShi/CanterburyCommuto.git

Then install the requirements:

cd CanterburyCommuto
pip install -r requirements.txt

API and Mapping Setup

To configure the required APIs and routing tools for this project:

Refer to the Google API Setup Guide for instructions on obtaining and integrating a Google Maps API key.
Refer to the GraphHopper Setup Guide for instructions on running a local GraphHopper server for distance and routing calculations.

Launch the computation

You can generate a test dataset with the script

python CanterburyCommuto/canterburycommuto/Sample.py

Otherwise, create a CSV file with the following columns:

ID - Each observation's ID (optional).
OriginA_latitude – The latitude of the starting location of Route A.
OriginA_longitude – The longitude of the starting location of Route A.
DestinationA_latitude – The latitude of the ending location of Route A.
DestinationA_longitude – The longitude of the ending location of Route A.
OriginB_latitude – The latitude of the starting location of Route B.
OriginB_longitude – The longitude of the starting location of Route B.
DestinationB_latitude – The latitude of the ending location of Route B.
DestinationB_longitude – The longitude of the ending location of Route B.
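
As a minimal sketch of what such a file could look like, the snippet below writes one route pair using the column names listed above. The helper name `write_route_pairs` and the coordinate values are made up for illustration; they are not part of the package.

```python
import csv
import io

# Column names taken from the list above.
COLUMNS = [
    "ID",
    "OriginA_latitude", "OriginA_longitude",
    "DestinationA_latitude", "DestinationA_longitude",
    "OriginB_latitude", "OriginB_longitude",
    "DestinationB_latitude", "DestinationB_longitude",
]


def write_route_pairs(rows, handle):
    """Write route pairs (dicts keyed by COLUMNS) to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# One illustrative route pair; the coordinates are arbitrary.
sample_rows = [
    {
        "ID": "R1",
        "OriginA_latitude": 5.3600, "OriginA_longitude": -4.0083,
        "DestinationA_latitude": 5.3450, "DestinationA_longitude": -4.0240,
        "OriginB_latitude": 5.3580, "OriginB_longitude": -4.0100,
        "DestinationB_latitude": 5.3470, "DestinationB_longitude": -4.0220,
    }
]

buffer = io.StringIO()
write_route_pairs(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a handle opened with `open("origin_destination_coordinates.csv", "w", newline="")`.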

Next, import the main function.

from canterburycommuto.CanterburyCommuto import Overlap_Function

Before running the main function to retrieve commuting data, it's recommended to first run the estimation command. This provides an estimate of the number of Google API requests and the potential cost, assuming the free tier is exceeded. This helps users make informed decisions, as extensive API use can become costly depending on route complexity and Google's pricing.

python -m canterburycommuto estimate \
    --csv_file origin_destination_coordinates.csv \
    --input_dir "C:\Users\HUAWEI\CanterburyCommuto\CanterburyCommuto" \
    --approximation "exact" \
    --commuting_info "no" \
    --home_a_lat "home_A" \
    --home_a_lon "home_A_lon" \
    --work_a_lat "work_A" \
    --work_a_lon "work_A_lon" \
    --home_b_lat "home_B" \
    --home_b_lon "home_B_lon" \
    --work_b_lat "work_B" \
    --work_b_lon "work_B_lon" \
    --id_column "ID" \
    --skip_invalid True

Then run the main command, as in the example below. This example creates 150-meter buffers along the two routes and computes the buffers' intersection ratio for each route. The output is written to "buffer_percentage_output.csv". The --skip_invalid True option tells the program to skip over rows with missing or invalid data, allowing the analysis to continue uninterrupted. The --save_api_info True option saves the API responses to a file for future reference or debugging.

python -m canterburycommuto overlap \
    --csv_file origin_destination_coordinates.csv \
    --input_dir "C:\Users\HUAWEI\CanterburyCommuto\CanterburyCommuto" \
    --api_key "API_KEY" \
    --method "google" \
    --buffer 150 \
    --approximation "yes with buffer" \
    --home_a_lat "home_A" \
    --home_a_lon "home_A_lon" \
    --work_a_lat "work_A" \
    --work_a_lon "work_A_lon" \
    --home_b_lat "home_B" \
    --home_b_lon "home_B_lon" \
    --work_b_lat "work_B" \
    --work_b_lon "work_B_lon" \
    --id_column "ID" \
    --output_file "buffer_percentage_output.csv" \
    --skip_invalid True \
    --save_api_info True \
    --yes

You can run this package on as many route pairs as you wish, as long as they are stored in a CSV file structured like the output of Sample.py in the repository. The order of the columns in your CSV file need not match that of the Sample.py output, because you specify the column names for the origins and destinations of the route pairs when running CanterburyCommuto.

The parameter input_dir specifies the input directory where the source CSV file is located. The output data generated by the package will be saved in the same directory.

The parameter skip_invalid is a Boolean flag that controls error handling. If set to True, the script will skip over any invalid or malformed rows in the input file and continue processing the remaining data. If set to False, the script will terminate upon encountering the first error.
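
The two behaviours can be illustrated with a small sketch. This is not the package's implementation; the function names and the validity checks (non-empty, numeric, latitude within [-90, 90], longitude within [-180, 180]) are assumptions chosen for illustration.

```python
def parse_coordinate_row(row, columns):
    """Return the row's coordinates as floats, or raise ValueError.

    `columns` is assumed to alternate latitude, longitude column names.
    """
    values = []
    for name in columns:
        raw = row.get(name)
        if raw is None or str(raw).strip() == "":
            raise ValueError(f"missing value in column {name!r}")
        values.append(float(raw))  # raises ValueError on non-numeric text
    for lat, lon in zip(values[0::2], values[1::2]):
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinate out of range: ({lat}, {lon})")
    return values


def load_rows(rows, columns, skip_invalid=True):
    """Collect valid rows; skip or fail on invalid ones."""
    valid, skipped = [], []
    for index, row in enumerate(rows):
        try:
            valid.append(parse_coordinate_row(row, columns))
        except ValueError as error:
            if not skip_invalid:
                raise  # stop at the first invalid row
            skipped.append((index, str(error)))
    return valid, skipped


columns = ["home_A", "home_A_lon"]
rows = [
    {"home_A": "5.36", "home_A_lon": "-4.01"},
    {"home_A": "", "home_A_lon": "-4.02"},      # missing latitude
    {"home_A": "95.0", "home_A_lon": "-4.03"},  # latitude out of range
]
valid, skipped = load_rows(rows, columns, skip_invalid=True)
print(len(valid), "valid;", len(skipped), "skipped")
```

With `skip_invalid=False`, the same input would raise a `ValueError` at the second row instead of continuing.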

For simplified execution using a configuration file and additional usage details, please refer to the Example Jupyter Notebook, `example.ipynb`, located in the example folder. As an alternative to the Google Maps API, the free and locally hosted GraphHopper method may be used. Please refer to `example.ipynb` for detailed setup and usage instructions.

Results

The output is a CSV file that includes the GPS coordinates of the route pairs' origins and destinations and the values describing the overlaps of the route pairs. Maps are also produced to visualize the commuting paths on an OpenStreetMap background. Hovering over a marker shows its label: Origin A and Destination A are marked in red, and Origin B and Destination B in green. Each generated map file includes the ID of the corresponding observation in its filename. This ID is either taken from the user’s original dataset (if provided) or automatically generated by the package when no explicit ID is present.

Distances are measured in kilometers and times in minutes. From these values, users can compute, for instance, the percentage of each route that overlaps with the other. The possible output variables are:

ID - Each observation's ID (optional).
OriginA_latitude – The latitude of the starting location of Route A.
OriginA_longitude – The longitude of the starting location of Route A.
DestinationA_latitude – The latitude of the ending location of Route A.
DestinationA_longitude – The longitude of the ending location of Route A.
OriginB_latitude – The latitude of the starting location of Route B.
OriginB_longitude – The longitude of the starting location of Route B.
DestinationB_latitude – The latitude of the ending location of Route B.
DestinationB_longitude – The longitude of the ending location of Route B.
aDist: Total distance of route A.
aTime: Total time to traverse route A.
bDist: Total distance of route B.
bTime: Total time to traverse route B.
overlapDist: Distance of the overlapping segment between route A and route B.
overlapTime: Time to traverse the overlapping segment between route A and route B.
aBeforeDist: Distance covered on route A before the overlap begins.
aBeforeTime: Time spent on route A before the overlap begins.
bBeforeDist: Distance covered on route B before the overlap begins.
bBeforeTime: Time spent on route B before the overlap begins.
aAfterDist: Distance covered on route A after the overlap ends.
aAfterTime: Time spent on route A after the overlap ends.
bAfterDist: Distance covered on route B after the overlap ends.
bAfterTime: Time spent on route B after the overlap ends.
aIntersecRatio: The proportion of the buffer area of Route A that intersects with the buffer of Route B. It is calculated as: aIntersecRatio = Intersection Area / Area of A.
bIntersecRatio: The proportion of the buffer area of Route B that intersects with the buffer of Route A. It is calculated as: bIntersecRatio = Intersection Area / Area of B.
aoverlapDist: Distance of the overlapping segment on route A inside the buffer intersection with route B.
aoverlapTime: Time to traverse the overlapping segment on route A.
boverlapDist: Distance of the overlapping segment on route B inside the buffer intersection with route A.
boverlapTime: Time to traverse the overlapping segment on route B.
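
As an example of deriving overlap percentages from these variables, the sketch below divides the overlap distance by each route's total distance. The function name `overlap_percentages` and the sample values are illustrative assumptions; only the column names aDist, bDist, and overlapDist come from the list above.

```python
def overlap_percentages(row):
    """Return the share of each route's distance covered by the overlap.

    `row` is one output record, e.g. a dict produced by csv.DictReader.
    A route with zero total distance yields None instead of dividing by zero.
    """
    a_dist = float(row["aDist"])
    b_dist = float(row["bDist"])
    overlap = float(row["overlapDist"])
    return {
        "a_overlap_pct": 100 * overlap / a_dist if a_dist > 0 else None,
        "b_overlap_pct": 100 * overlap / b_dist if b_dist > 0 else None,
    }


# Made-up values: route A is 10 km, route B is 8 km, and they share 4 km.
example = {"aDist": "10.0", "bDist": "8.0", "overlapDist": "4.0"}
percentages = overlap_percentages(example)
print(percentages)
```

The same pattern applies to the time variables (aTime, bTime, overlapTime).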

Overlap Function Options

This table summarizes the available options for the package's main function, including whether commuting information before and after the overlap can be considered, how realistic the results are, and a brief description. "Commuting Information (Pre/Post Overlap) Available?" refers to whether the system can provide separate commuting data for the parts of the route before and after the overlapping segment of a shared commute.

Option Name	Commuting Information (Pre/Post Overlap) Available?	Closeness to Reality (0 = Not Close, 10 = Very Close)	Description
Common Node	Yes	6	This option finds the first and last common nodes along the two routes' polylines given by Google Maps. The overlapping information is obtained via these nodes.
Rectangle Approximation	Yes	5 to 7	As a modified variant of the Common Node method, this option draws rectangles along the route segments before and after the first and last common nodes of the two routes. It may extend the overlapping range of the route pair if the overlapping area ratio of these rectangles exceeds a threshold, which is 50% by default and can be adjusted by the user.
Buffer Area Ratio	No	8	This option creates buffers along the two routes and computes, for each route separately, the ratio of the buffers' intersection area to that route's buffer area. The buffer width is 100 meters (m) by default and can be adjusted by the user.
Buffer Route Node	Yes	6 to 8	This option considers the routes and buffers as lines and geometric shapes. It finds the closest nodes to the points of intersections among the buffer polygons and route lines. The overlapping information is determined based on these closest nodes.
Buffer Route Intersection	Yes	9	As an improved version of the Buffer Route Node method, this option directly records the GPS coordinates corresponding to the points of intersections among the buffer polygons and the route lines and then proceeds to compute the overlapping distance and time information based on these GPS coordinates.
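
To make the Common Node idea concrete, here is a minimal sketch that finds the first and last nodes shared by two polylines. It is an illustration under stated assumptions, not the package's code: the function name is hypothetical, nodes are (latitude, longitude) tuples compared for exact equality, and "first" and "last" are taken in the order of route A.

```python
def first_and_last_common_nodes(route_a, route_b):
    """Return (first, last) nodes of route_a that also appear in route_b.

    Returns None when the two polylines share no node.
    """
    nodes_b = set(route_b)
    common = [node for node in route_a if node in nodes_b]
    if not common:
        return None
    return common[0], common[-1]


# Two toy polylines that share the segment from (0.0, 1.0) to (0.0, 2.0).
route_a = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
route_b = [(1.0, 1.0), (0.0, 1.0), (0.0, 2.0), (1.0, 2.0)]
common_nodes = first_and_last_common_nodes(route_a, route_b)
print(common_nodes)
```

Exact equality only matches nodes that both polylines report identically, which is one reason the buffer-based options can capture overlaps that this approach misses.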

Additional Notes and Features

Interrupting the Script

You can stop the script during execution by pressing Ctrl + C in the terminal or command prompt.
If interrupted, the script will gracefully exit and save all completed results to the ResultsCommuto/ folder. This ensures partial progress is not lost.
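
The general pattern behind this kind of graceful exit can be sketched as follows. This is a simplified, single-threaded illustration and not the package's implementation; `process_all`, `process_row`, and `save_results` are hypothetical names.

```python
def process_all(rows, process_row, save_results):
    """Process rows in order; on Ctrl + C, keep what has been completed."""
    completed = []
    interrupted = False
    try:
        for row in rows:
            completed.append(process_row(row))
    except KeyboardInterrupt:
        interrupted = True  # stop processing, but do not lose finished rows
    finally:
        save_results(completed)  # runs whether or not we were interrupted
    return completed, interrupted


saved = []


def flaky(row):
    if row == 3:
        raise KeyboardInterrupt  # simulate the user pressing Ctrl + C
    return row * 2


completed, interrupted = process_all([1, 2, 3, 4], flaky, saved.extend)
print(completed, interrupted)
```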

Output Order and Row Identification

To improve performance, the package uses multithreading, which may result in a slight reordering of the output rows—particularly within processing blocks.
To assist with traceability, a row_id field is included in the output. This allows users to easily match results back to the original input or re-sort if needed.
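
If the original order matters, the output can be re-sorted on row_id after the run. The sketch below assumes that row_id holds integers and uses a made-up two-column output; only the row_id field name comes from the text above.

```python
import csv
import io


def sort_by_row_id(csv_text):
    """Parse CSV text and return its rows sorted by integer row_id."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: int(row["row_id"]))


# Made-up output whose rows arrived out of order.
shuffled_output = "row_id,overlapDist\n2,1.5\n0,0.0\n1,3.2\n"
sorted_rows = sort_by_row_id(shuffled_output)
print([row["row_id"] for row in sorted_rows])
```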

Output Folder Structure

All results, including CSV files, maps, and logs, are now saved in a dedicated folder named ResultsCommuto/, located in the same directory as the input file.
This helps keep input and output files organized and clearly separated.

Acknowledgment

The Python package CanterburyCommuto was developed under the guidance of Professor Florian Grosset-Touba and software engineer Émilien Schultz, with additional support from AI tools such as ChatGPT and GitHub Copilot.

If you have any questions, please open an issue.
