Fundamentals of Programming
FINAL PROJECT
Efforts by:
Mohit Thareja
Paul Asare Anim
Komal Kaur
Mahir
Dhruv Arora
• Dataset Selection: Exploring the dynamics of Online
Shopping
Dataset Overview
• Data Cleaning and Pre-processing
Steps taken to clean the dataset, with before-and-after snapshots.
• Variables and their significance
Explains how the variables in our dataset relate to the
business question
• Factors influencing Sales
• Customer Retention and Revenue Growth
• Continued Strategies
• Reflections on Technical Learning
• Q/A
Exploring the Dynamics
of Online Shopping:
Dataset Selection
Dataset Overview
Our chosen dataset delves into the world of online shopping, focusing on consumer
behaviour and company strategies. It provides a comprehensive view of the factors
influencing the booming e-commerce industry.
Company Strategies:
On the flip side, the dataset sheds light on the strategic decisions made by companies to
harness the potential of e-commerce. We can explore how businesses adapt, leverage
technology, and innovate to thrive in the digital marketplace.
Rich in Variables:
The dataset is rich in variables that capture various dimensions, including convenience factors,
product choices, pricing dynamics, and the impact of online reviews. This wealth of
information allows for in-depth analysis and meaningful insights.
Relevance to Today's Trends:
In a world where digital transformation is reshaping industries, the dataset aligns with current
trends and provides a valuable snapshot of the evolving dynamics between consumers and
companies in the online shopping realm.
Variables and their significance
CustomerID:
Unique identifier for each customer. Enables tracking individual customer behavior,
preferences, and transaction history.
Gender:
Categorical variable indicating the gender of the customer. Allows for gender-based analysis of
shopping habits and preferences.
Location:
Text variable representing the location or address of the customer. Enables geographical
analysis and tailoring of marketing strategies based on location.
Tenure_Months:
Numeric variable indicating the number of months a customer has been associated with the
platform. Helps identify long-term customers and assess customer loyalty over time.
Transaction_ID:
Unique identifier for each transaction. Essential for tracking individual purchases,
understanding transaction patterns, and calculating metrics like purchase frequency.
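As a rough illustration of how these identifier and tenure columns could feed the metrics mentioned above, here is a minimal pandas sketch. The rows are invented toy values, and the 24-month cutoff for a "long-term" customer is an assumption made for the example, not a threshold taken from the project.

```python
import pandas as pd

# Toy rows standing in for the real dataset (values invented for illustration).
df = pd.DataFrame({
    "CustomerID": [101, 101, 101, 102, 103, 103],
    "Transaction_ID": [1, 1, 2, 3, 4, 5],
    "Tenure_Months": [12, 12, 12, 3, 40, 40],
})

# Purchase frequency: number of distinct transactions per customer.
# nunique() avoids double counting a transaction that spans several rows.
purchase_frequency = df.groupby("CustomerID")["Transaction_ID"].nunique()

# Long-term customers; the 24-month cutoff is an assumed, arbitrary threshold.
tenure = df.groupby("CustomerID")["Tenure_Months"].first()
long_term = tenure[tenure >= 24].index.tolist()

print(purchase_frequency)
print(long_term)
```

Counting distinct Transaction_ID values rather than rows matters when one transaction contains several products, which is likely in a line-item dataset like this one.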
Variables and their significance
Transaction_Date:
Date variable indicating when the transaction occurred. Allows for temporal analysis of sales
trends, seasonality, and the impact of time on purchasing behavior.
Product_SKU:
Text variable representing the Stock Keeping Unit (SKU) for each product. Enables product-
specific analysis and tracking of sales for individual items.
Product_Description:
Text variable describing the product. Facilitates understanding the nature of products purchased
and their popularity.
Product_Category:
Categorical variable indicating the category to which the product belongs. Enables the analysis
of sales performance across different product types.
Quantity:
Numeric variable indicating the quantity of the product purchased in the transaction. Essential
for calculating metrics like average order value and understanding product demand.
Variables and their significance
Avg_Price:
Numeric variable representing the average price of the product. Influences metrics like average
order value and contributes to the analysis of pricing strategies.
Delivery_Charges:
Numeric variable indicating the charges associated with the delivery of the product. Essential for
understanding the cost structure and profitability of transactions.
Coupon_Status:
Categorical variable representing the status of the coupon associated with the transaction.
Influences customer behavior and allows for the analysis of promotional impact.
GST:
Numeric variable representing Goods and Services Tax associated with the transaction.
Contributes to the understanding of transaction costs and financial analysis.
Date:
Date variable (potentially redundant with Transaction_Date). Can be used for cross-verification
and as an additional temporal variable for analysis.
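The cross-verification suggested for the Date column could look something like the sketch below. The sample values and both date formats are assumptions for illustration; the real file may store dates differently.

```python
import pandas as pd

# Toy rows (invented); the third row disagrees on purpose.
df = pd.DataFrame({
    "Transaction_Date": ["2019-01-01", "2019-01-02", "2019-02-15"],
    "Date": ["1/1/2019", "1/2/2019", "2/16/2019"],
})

# Parse both columns; each format string is an assumption about the file.
txn_date = pd.to_datetime(df["Transaction_Date"], format="%Y-%m-%d")
date = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Rows where the two columns disagree need a closer look.
mismatches = df[txn_date != date]

# If no rows disagree, Date adds nothing beyond Transaction_Date.
is_redundant = mismatches.empty

print(mismatches)
print(is_redundant)
```

If the check finds no mismatches on the full dataset, one of the two columns can be dropped safely.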
Variables and their significance
Offline_Spend:
Numeric variable indicating the amount spent offline by the customer. Contributes to the
understanding of overall customer spending behavior across channels.
Online_Spend:
Numeric variable indicating the amount spent online by the customer. Crucial for understanding
the contribution of online channels to overall sales.
Month:
Categorical variable indicating the month of the transaction. Facilitates analysis of monthly sales
trends, seasonality, and marketing effectiveness.
Coupon_Code:
Text variable representing the code associated with a coupon, if applicable. Essential for tracking
the impact of specific promotions and campaigns.
Discount_pct:
Numeric variable indicating the percentage of discount applied to the transaction. Critical for
assessing the impact of discounts on customer behavior and overall sales.
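To show how Month and Coupon_Status might support the trend and promotion analysis described above, here is a small sketch on invented rows. The coupon labels ("Used", "Not Used", "Clicked") are assumed values, and line_value is a simplified stand-in (quantity times price), not the project's "sales amount" formula.

```python
import pandas as pd

# Toy rows (invented); the coupon labels are assumed, not taken from the dataset.
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "Coupon_Status": ["Used", "Not Used", "Used", "Clicked", "Used"],
    "Quantity": [2, 1, 3, 1, 2],
    "Avg_Price": [10.0, 20.0, 5.0, 50.0, 10.0],
})

# Simplified value per row, before discounts, tax, and delivery charges.
df["line_value"] = df["Quantity"] * df["Avg_Price"]

# Monthly totals; sort=False keeps months in their order of appearance.
monthly_value = df.groupby("Month", sort=False)["line_value"].sum()

# Average row value for each coupon status.
value_by_coupon = df.groupby("Coupon_Status")["line_value"].mean()

print(monthly_value)
print(value_by_coupon)
```

A difference in averages between coupon groups is only descriptive; it does not by itself show that coupons caused the change.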
Data Cleaning and Pre-processing
Data cleaning and pre-processing are essential steps in preparing datasets
for analysis or machine learning tasks. Python offers various libraries and
tools to streamline these processes.
Before cleaning snapshot
Steps we took to prepare our dataset
for this project
# First, we imported the necessary libraries, loaded our dataset, and explored its contents. Its
initial shape was (52955, 21).
# Next, we checked for missing values. Addressing them reduced the dataset from 52955 to 52524
rows, for a shape of (52524, 21).
# We then identified and removed an unnecessary column labeled "unnamed". This left a shape of
(52524, 20) and completed the data cleaning phase.
# To prepare the dataset for predictive modeling, we added a new column labeled "sales amount."
The formula used to calculate this variable can be found in the accompanying file.
# With these steps completed, the prepared dataset has a shape of (52524, 21) and is ready for the
project's analyses and predictions.
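The steps above can be sketched in pandas roughly as follows. This is a minimal illustration on a four-row toy frame, not the project's actual notebook. The use of dropna() is an assumption about how the missing values were handled, "Unnamed: 0" is the name pandas typically gives a stray index column, and the sales_amount formula is a placeholder because the real one lives in the accompanying file.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw file (values invented for illustration).
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "CustomerID": [101, 102, np.nan, 104],
    "Quantity": [2, 1, 3, 4],
    "Avg_Price": [10.0, 20.0, 5.0, 2.5],
})

# Step 1: inspect the shape and the missing values per column.
print(raw.shape)
print(raw.isna().sum())

# Step 2: drop rows with missing values (assumed approach).
# The row count falls; the column count stays the same.
cleaned = raw.dropna()

# Step 3: remove the leftover "unnamed" column.
unnamed_cols = [c for c in cleaned.columns if c.lower().startswith("unnamed")]
cleaned = cleaned.drop(columns=unnamed_cols)

# Step 4: add the derived column. Placeholder formula only;
# the project's real formula is in the accompanying file.
cleaned = cleaned.assign(sales_amount=cleaned["Quantity"] * cleaned["Avg_Price"])

print(cleaned.shape)
```

The toy frame follows the same pattern as the shapes reported above: rows drop after handling missing values, one column is removed, and one column is added back.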
AFTER CLEANING SNAPSHOT
FACTORS INFLUENCING SALES
Convenience and Accessibility:
- Online: Shop from anywhere.
- Offline: Immediate product availability.
24/7 Availability:
- Online: Round-the-clock access.
- Offline: Operating hours for convenience.
Variety and Price Comparisons:
- Online: Extensive product range and easy price comparisons.
- Offline: Exclusive in-store deals and hands-on experience.
Reviews and Ratings:
- Online: Reliability of positive reviews.
- Offline: Immediate feedback from knowledgeable staff.
Customer Retention & Revenue Growth
Data Collection and Analysis:
- Online: Targeted marketing with data analytics.
- Offline: Personalized in-store promotions.
Marketing and Branding:
- Online: Social media and content marketing.
- Offline: Consistent in-store branding.
Diversification of Sales Channels:
- Online: Reach a broader market.
- Offline: Explore partnerships and collaborations.
Customer Convenience:
- Online: Optimize website usability.
- Offline: Seamless in-store experience.
Continued Strategies
Adaptation to Digital Trends:
- Online: Stay updated on e-commerce trends.
- Offline: Integrate digital tools for in-store enhancement.
Scalability:
- Online: Regularly update and expand the online catalog.
- Offline: Optimize in-store operations for efficiency.
Cost Efficiency:
- Online: Competitive pricing and promotions.
- Offline: Operational efficiency for cost savings.
Customer Loyalty Programs:
- Online: Implement online loyalty programs.
- Offline: In-store loyalty programs for repeat customers.
By addressing these factors strategically, we can enhance both online and offline channels for
improved customer retention and increased revenue growth.
Reflections on Technical Learning
In conclusion, the project provided a holistic learning
experience, from understanding business questions to
applying technical skills in data manipulation and
analysis. The use of tools like Pandas and PowerPoint
enriched our technical toolkit, and the collaborative
nature of the project improved teamwork and
communication skills. The exploration of real-world
business questions added depth to the technical
learning, demonstrating the interconnectedness of
domain knowledge and data analysis.