Project Report Edit

project report

Uploaded by

Niruthiya Narashiman

Driving Insights: A Comprehensive Analysis of Taxi Operations

Team Members:
Manoj Velu
Nivethitha Avarampalayam Manoharan
Varshini Vaisnavi Srinivasan
1. Introduction

1.1 Objectives:
• Understand Demand Patterns and Identify Peak Hours.
• Analyze Cumulative Revenue Over Time.
• Explore RatecodeID and Average Payments.
• Understand Customer Payment Preferences.

1.2 Impact:
Meeting these objectives supports strategic decisions for taxi services: when and
where to deploy vehicles, how to structure rates, and how to handle customer
payments. The analysis is intended to guide operators toward greater efficiency
and a more customer-centric service.

2. Data Overview
Field Name               Description
VendorID                 A code indicating the TPEP provider that provided the record. 1 = Creative Mobile Technologies, LLC; 2 = VeriFone Inc.
tpep_pickup_datetime     The date and time when the meter was engaged.
tpep_dropoff_datetime    The date and time when the meter was disengaged.
Passenger_count          The number of passengers in the vehicle. This is a driver-entered value.
Trip_distance            The elapsed trip distance in miles reported by the taximeter.
PULocationID             TLC Taxi Zone in which the taximeter was engaged.
DOLocationID             TLC Taxi Zone in which the taximeter was disengaged.
RateCodeID               The final rate code in effect at the end of the trip. 1 = Standard rate; 2 = JFK; 3 = Newark; 4 = Nassau or Westchester; 5 = Negotiated fare; 6 = Group ride.
Store_and_fwd_flag       This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y = store and forward trip; N = not a store and forward trip.
Payment_type             A numeric code signifying how the passenger paid for the trip. 1 = Credit card; 2 = Cash; 3 = No charge; 4 = Dispute.
Fare_amount              The time-and-distance fare calculated by the meter.
Extra                    Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges.
MTA_tax                  $0.50 MTA tax that is automatically triggered based on the metered rate in use.
Improvement_surcharge    $0.30 improvement surcharge assessed trips at the flag drop. The improvement surcharge began being levied in 2015.
Tip_amount               Tip amount – This field is automatically populated for credit card tips. Cash tips are not included.
Tolls_amount             Total amount of all tolls paid in trip.
Total_amount             The total amount charged to passengers. Does not include cash tips.
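
The coded fields in the table can be decoded with simple lookup tables. The following is a minimal sketch that uses only the code values listed above; the function name `decode_trip` and the sample record are hypothetical and are not taken from the dataset.

```python
# Lookup tables built only from the code values listed in the data overview.
VENDORS = {1: "Creative Mobile Technologies, LLC", 2: "VeriFone Inc."}
RATE_CODES = {
    1: "Standard rate",
    2: "JFK",
    3: "Newark",
    4: "Nassau or Westchester",
    5: "Negotiated fare",
    6: "Group ride",
}
PAYMENT_TYPES = {1: "Credit card", 2: "Cash", 3: "No charge", 4: "Dispute"}


def decode_trip(record):
    """Return a copy of the record with human-readable labels added.

    Codes that are not listed in the table map to "Unknown code".
    """
    decoded = dict(record)
    decoded["vendor"] = VENDORS.get(record["VendorID"], "Unknown code")
    decoded["rate"] = RATE_CODES.get(record["RateCodeID"], "Unknown code")
    decoded["payment"] = PAYMENT_TYPES.get(record["Payment_type"], "Unknown code")
    return decoded


# Hypothetical sample record (illustration only).
sample = {"VendorID": 2, "RateCodeID": 2, "Payment_type": 1}
print(decode_trip(sample))
```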

3. GCP Setup:
Fig: Created the GCP account
Fig: Opened the BigQuery console
Fig: Created a project named ‘Analytics’
Fig: Created a dataset called ‘data’
Fig: Created an empty table called ‘table_dbms’
Fig: Created a table called ‘table_dbms_transformed’
4. ETL
Step 1: Authenticate User
- Logged in to GCP using the Google Cloud CLI to ensure proper access to
GCP resources.
Step 2: Download Uber Dataset
- Fetched the dataset from a GitHub repository, using the 'curl' command to
download and save it.

Fig: Authenticate User and Download Uber Dataset

Step 3: List Files in Google Drive

- Used the 'ls' command to check that the downloaded file is present in the
specified Google Drive location.
Step 4: Define Schema for BigQuery
- Defined the schema (structure) for the BigQuery table, specifying the type
of data each column contains.
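
As an illustration of Step 4, the sketch below builds a schema in the JSON layout that BigQuery accepts (a list of objects with `name`, `type`, and `mode`). The column names come from the Data Overview; the column types are assumptions, since the report does not list them, and `build_schema` is a hypothetical helper.

```python
import json

# Column names from the Data Overview. The types are ASSUMED for illustration;
# adjust them to match the actual table.
COLUMNS = [
    ("VendorID", "INTEGER"),
    ("tpep_pickup_datetime", "TIMESTAMP"),
    ("tpep_dropoff_datetime", "TIMESTAMP"),
    ("Passenger_count", "INTEGER"),
    ("Trip_distance", "FLOAT"),
    ("PULocationID", "INTEGER"),
    ("DOLocationID", "INTEGER"),
    ("RateCodeID", "INTEGER"),
    ("Store_and_fwd_flag", "STRING"),
    ("Payment_type", "INTEGER"),
    ("Fare_amount", "FLOAT"),
    ("Extra", "FLOAT"),
    ("MTA_tax", "FLOAT"),
    ("Improvement_surcharge", "FLOAT"),
    ("Tip_amount", "FLOAT"),
    ("Tolls_amount", "FLOAT"),
    ("Total_amount", "FLOAT"),
]


def build_schema(columns):
    """Return the schema as a list of dicts in BigQuery's JSON schema layout."""
    return [{"name": name, "type": col_type, "mode": "NULLABLE"}
            for name, col_type in columns]


schema = build_schema(COLUMNS)
# Serialized form, suitable for saving to a schema file.
schema_json = json.dumps(schema, indent=2)
print(schema_json[:120])
```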
Step 5: Load Data into BigQuery
- Loaded the dataset into BigQuery with schema auto-detection enabled and
the leading rows skipped.
Fig: List Files in Google Drive

Step 6: Load Data into BigQuery (No Auto-detection)

- Explored an alternative method in which the data is loaded with a
predefined schema instead of schema auto-detection.
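
The two load variants in Steps 5 and 6 can be sketched as `bq load` command lines. The snippet below only builds and prints the argument lists; it does not run them. The table name `data.table_dbms` follows the GCP setup above, while the file names `uber_data.csv` and `schema.json` and the helper `bq_load_args` are hypothetical.

```python
import shlex


def bq_load_args(table, source, schema_path=None, skip_rows=1):
    """Build the argument list for a CSV load with the bq command-line tool.

    Without schema_path the schema is auto-detected (Step 5); with it, the
    predefined schema file is passed as the last argument (Step 6).
    """
    args = ["bq", "load", "--source_format=CSV",
            f"--skip_leading_rows={skip_rows}"]
    if schema_path is None:
        args.append("--autodetect")
    args += [table, source]
    if schema_path is not None:
        args.append(schema_path)
    return args


# Hypothetical file names, for illustration only.
auto_cmd = bq_load_args("data.table_dbms", "uber_data.csv")
fixed_cmd = bq_load_args("data.table_dbms", "uber_data.csv", "schema.json")
print(shlex.join(auto_cmd))
print(shlex.join(fixed_cmd))
```

To execute a command rather than print it, the list could be passed to `subprocess.run(args, check=True)` in an environment where the Cloud SDK is installed and authenticated.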

Step 7: Query BigQuery Table

- Queried the BigQuery table to display the first 10 rows, giving a glimpse
of the data being processed.
Step 8: Authenticate User with Google Colab
- Installed 'google-colab' and authenticated the user.

Step 9: Set GCP Project ID

- Used the 'gcloud config set project' command to set the GCP project ID
for subsequent operations.
Step 10: Install Dependencies for Cloud SDK
- Installed the dependencies required by the Google Cloud SDK to prepare
the environment for GCP operations.
Step 11: Install Cloud SDK
- Updated and installed the Cloud SDK.
Step 12: Install BigQuery Library
- Installed the Google Cloud BigQuery library for Python to prepare the
environment for running BigQuery queries.

Step 13: Create BigQuery Client

- Created a BigQuery client object for interacting with BigQuery services;
the client is then ready to run queries.
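
Steps 7 and 13 can be sketched together: build the preview query, then run it through a BigQuery client. This is a minimal sketch, not the notebook code from the report. The helper names are hypothetical, `my-project-id` is a placeholder, and `run_preview` needs the `google-cloud-bigquery` package plus valid credentials, so it is defined here but not called.

```python
def preview_query(table_id, limit=10):
    """Return a query that selects the first `limit` rows of a table."""
    return f"SELECT * FROM `{table_id}` LIMIT {int(limit)}"


def run_preview(project_id, table_id, limit=10):
    """Run the preview query and return the rows as plain dicts.

    Requires google-cloud-bigquery and an authenticated environment.
    """
    from google.cloud import bigquery  # imported lazily; optional dependency

    client = bigquery.Client(project=project_id)
    rows = client.query(preview_query(table_id, limit)).result()
    return [dict(row.items()) for row in rows]


# Only the query text is built here; no request is sent.
print(preview_query("data.table_dbms"))
# Example call (placeholder project ID):
# run_preview("my-project-id", "data.table_dbms")
```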
5. Data Transformation
We transformed the extracted data using SQL and stored the result in a new
BigQuery table, table_dbms_transformed. The transformations prepare the data
for the analysis in Section 7.
Table_dbms_transformed schema

Table_dbms_transformed
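
The report does not show which transformations were applied. Purely as an illustration of the kind of derivation that would support the hourly and over-time analyses in Section 7, the sketch below derives a pickup date, a pickup hour, and a trip duration from the two timestamp fields. It uses Python rather than the SQL used in the project, and the derived column names and the sample row are assumptions.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def transform(row):
    """Return a copy of the row with derived date, hour and duration fields."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIMESTAMP_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIMESTAMP_FORMAT)
    out = dict(row)
    out["pickup_date"] = pickup.date().isoformat()
    out["pickup_hour"] = pickup.hour
    out["trip_minutes"] = round((dropoff - pickup).total_seconds() / 60, 1)
    return out


# Made-up sample row, not taken from the dataset.
raw = {
    "tpep_pickup_datetime": "2016-03-01 08:15:00",
    "tpep_dropoff_datetime": "2016-03-01 08:42:30",
}
transformed = transform(raw)
print(transformed)
```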

6. Data Validation

Null Values:
• Checked for null values in columns.
• No null values found in the dataset.
Data Types:
• Verified data types for columns.
• Ensured that data types align with expectations.
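
The two checks above can be sketched on in-memory rows as follows. The helper names and the expected types are assumptions, and the sample rows are made up and deliberately contain one null and one wrongly typed value to show what the checks report; the report itself found no nulls in the dataset.

```python
def null_counts(rows, columns):
    """Count missing (None) values per column."""
    return {col: sum(1 for row in rows if row.get(col) is None)
            for col in columns}


def type_mismatches(rows, expected):
    """Return (row_index, column) pairs whose non-null value has the wrong type."""
    problems = []
    for index, row in enumerate(rows):
        for col, col_type in expected.items():
            value = row.get(col)
            if value is not None and not isinstance(value, col_type):
                problems.append((index, col))
    return problems


# Assumed expected types for three of the columns.
EXPECTED = {"Passenger_count": int, "Trip_distance": float, "Total_amount": float}

# Made-up rows with deliberate problems in the second row.
rows = [
    {"Passenger_count": 1, "Trip_distance": 2.5, "Total_amount": 12.3},
    {"Passenger_count": 2, "Trip_distance": None, "Total_amount": "9.8"},
]
print(null_counts(rows, EXPECTED))
print(type_mismatches(rows, EXPECTED))
```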
7. Data Analysis and Insights
7.1 Demand Pattern and Peak Hours:

Query Output:
Looker Studio Visualization
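
The query for this section is not reproduced in the text. As a minimal sketch of the underlying idea, trips can be counted per pickup hour and the busiest hours ranked. The timestamps below are made up for illustration and are not results from the dataset; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def trips_per_hour(pickups):
    """Count trips by the hour of day (0-23) of the pickup timestamp."""
    return Counter(datetime.strptime(ts, TIMESTAMP_FORMAT).hour for ts in pickups)


def peak_hours(pickups, top=3):
    """Return the `top` busiest hours as (hour, trip_count) pairs."""
    return trips_per_hour(pickups).most_common(top)


# Made-up pickup times (illustration only).
pickups = [
    "2016-03-01 08:05:00",
    "2016-03-01 08:40:00",
    "2016-03-01 18:10:00",
    "2016-03-01 18:20:00",
    "2016-03-01 18:55:00",
    "2016-03-01 23:30:00",
]
print(peak_hours(pickups, top=2))
```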

7.2 Cumulative Revenue Over Time:

Query:
Query Output:

Looker Studio Visualization:
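
Cumulative revenue is a running total of daily revenue in date order. The sketch below shows that calculation on made-up (date, total amount) pairs; the values are not results from the dataset and `cumulative_revenue` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import accumulate


def cumulative_revenue(trips):
    """Return (date, running_total) pairs in date order.

    `trips` is an iterable of (pickup_date, total_amount) pairs.
    """
    daily = defaultdict(float)
    for date, amount in trips:
        daily[date] += amount
    dates = sorted(daily)  # ISO dates sort chronologically as strings
    running = accumulate(daily[date] for date in dates)
    return [(date, round(total, 2)) for date, total in zip(dates, running)]


# Made-up trips (illustration only), deliberately out of order.
trips = [
    ("2016-03-02", 20.0),
    ("2016-03-01", 10.5),
    ("2016-03-01", 4.5),
    ("2016-03-03", 5.0),
]
print(cumulative_revenue(trips))
```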

7.3 Rate code Statistics:

Output of Query
Rate Codes:

Looker Studio Visualization
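
Rate code statistics amount to grouping trips by rate code and averaging the payment in each group. A minimal sketch, assuming the average is taken over the total amount; the sample pairs are made up and are not results from the dataset.

```python
from collections import defaultdict


def rate_code_stats(trips):
    """Return the trip count and average total amount per rate code.

    `trips` is an iterable of (rate_code_id, total_amount) pairs.
    """
    groups = defaultdict(lambda: [0, 0.0])  # code -> [trip count, amount sum]
    for code, amount in trips:
        groups[code][0] += 1
        groups[code][1] += amount
    return {
        code: {"trips": count, "avg_total": round(total / count, 2)}
        for code, (count, total) in sorted(groups.items())
    }


# Made-up (rate code, total amount) pairs (illustration only).
trips = [(1, 10.0), (1, 14.0), (2, 52.0), (2, 60.0), (5, 30.0)]
print(rate_code_stats(trips))
```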

7.4 Payment Preference:

Output of the Query:

Looker Studio Visualization
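
Payment preference can be expressed as the share of trips per payment type. The sketch below uses the payment codes from the Data Overview and a made-up list of codes; the percentages are for illustration only and `payment_shares` is a hypothetical helper.

```python
from collections import Counter

# Payment codes as listed in the Data Overview.
PAYMENT_TYPES = {1: "Credit card", 2: "Cash", 3: "No charge", 4: "Dispute"}


def payment_shares(codes):
    """Return the percentage of trips per payment type, most common first."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {
        PAYMENT_TYPES.get(code, "Unknown code"): round(100 * count / total, 1)
        for code, count in counts.most_common()
    }


# Made-up payment codes (illustration only).
codes = [1, 1, 1, 2, 2, 1, 3, 1]
print(payment_shares(codes))
```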

8. Recommendations
• Fleet Optimization for Peak Demand, based on the visualizations
• Capitalizing on Business Hour Opportunities, i.e., peak hours
• Strategic Resource Allocation based on the insights
• Targeted Marketing for High-Average Rate Codes to drive profit margins upward
• Optimization of Low-Performing Rate Codes to gain more users
• Enhancing Digital Payment Infrastructure to ensure a smooth experience
• Encouraging Cash and Card Flexibility
• Reviewing No Charge Instances
• Effective Dispute Resolution for improved trust

9. Conclusion
• We extracted the dataset and used GCP and its tools to run queries that
give a broader picture of the dataset.
• The analysis provides key insights that can help taxi services meet
operational demand, generate revenue, and improve customer satisfaction.
• Acting on these insights can help operators stay competitive in the
transportation industry by exceeding customer expectations.
10. References
• https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
• https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
• https://github.com/darshilparmar/uber-etl-pipeline-data-engineering-project/blob/main/README.md
