https://amazon-sales-analysis-in-india.streamlit.app/
This project analyzes Amazon India's sales data to uncover insights about order patterns, customer behavior, product performance, and geographic trends. The analysis covers data from April to June 2022, containing 128,975 order records.
- Source: Amazon India Sales Report from Kaggle
- Time Period: April - June 2022
- Records: 128,975 orders
- Key Features: Order details, product information, shipping data, customer type, and financial metrics
What: Converted all column names to lowercase and replaced spaces/hyphens with underscores
Why: Ensures a consistent naming convention and easier programmatic access
df.columns = df.columns.str.strip()
df.columns = df.columns.str.lower().str.replace(' ', '_')
df.columns = df.columns.str.replace('-', '_')
What: Dropped 33 rows with completely missing address information
Why: These represented only 0.026% of the data and couldn't be imputed meaningfully
Result: Retained 99.97% of the original data
What: Dropped 'unnamed:_22' and 'fulfilled_by' columns Why: These columns contained no useful information for analysis
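The two drop steps can be sketched as follows. This is a minimal illustration on toy data, not the project's actual code: the address column names (`ship_city`, `ship_state`, `ship_postal_code`) and the `order_id` column are assumptions based on the normalized naming convention, and the sample values are invented.

```python
import pandas as pd

# Toy frame; column names other than 'unnamed:_22' and 'fulfilled_by' are assumed.
df = pd.DataFrame({
    "order_id": ["A1", "A2", "A3"],
    "ship_city": ["mumbai", None, "pune"],
    "ship_state": ["maharashtra", None, None],
    "ship_postal_code": [400001.0, None, 411001.0],
    "unnamed:_22": [None, None, None],
    "fulfilled_by": ["Easy Ship", None, None],
})

# Drop only rows where every address field is missing (how="all").
address_cols = ["ship_city", "ship_state", "ship_postal_code"]
df = df.dropna(subset=address_cols, how="all")

# Drop the columns that carry no useful information.
df = df.drop(columns=["unnamed:_22", "fulfilled_by"])
```

Using `how="all"` keeps rows with a partially filled address (such as `A3` above), which matches the description of dropping only rows whose address is completely missing.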
What:
- Converted 'date' to datetime format
- Changed 'ship_postal_code' to integer
- Renamed 'qty' to 'Quantity' for consistency
Why: Proper data types enable time-series analysis, and consistent naming improves code readability
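A minimal sketch of these conversions on toy data is shown below. The date format string (`%m-%d-%y`) is an assumption about how dates appear in the raw file and may need adjusting; the sample values are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["04-30-22", "05-01-22"],          # assumed raw format: MM-DD-YY
    "ship_postal_code": [400001.0, 560001.0],  # read as float because of earlier NaNs
    "qty": [1, 2],
})

# Parse dates so that .dt accessors and time-series grouping work.
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")

# Safe only after rows with missing postal codes have been dropped.
df["ship_postal_code"] = df["ship_postal_code"].astype(int)

# Rename as described in the step above.
df = df.rename(columns={"qty": "Quantity"})
```

The integer cast would raise an error if any postal code were still missing, which is one reason the rows with missing addresses are removed first.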
What: Cleaned and standardized state and city names
Why: Different spellings and variations of the same locations were creating artificial duplicates
Result:
- States reduced from 69 to 47 unique values
- Cities reduced from 8,955 to a more manageable number
Examples:
- 'delhi', 'new delhi', 'Delhi' → 'delhi'
- 'rajasthan', 'rajshthan', 'RJ' → 'rajasthan'
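One way to implement this kind of standardization is to normalize case and whitespace first, then map known variants to a canonical name. The sketch below uses only the variants listed in the examples above; the helper name `standardize` and the alias table are hypothetical, and the real project would need a much longer table.

```python
import pandas as pd

# Hypothetical alias table built from the examples above (keys are lowercase).
STATE_ALIASES = {
    "new delhi": "delhi",
    "rajshthan": "rajasthan",
    "rj": "rajasthan",
}

def standardize(names: pd.Series, aliases: dict) -> pd.Series:
    """Lowercase and trim names, then replace known variants."""
    cleaned = names.str.strip().str.lower()
    return cleaned.replace(aliases)

states = pd.Series(["Delhi", "new delhi", " delhi ", "RJ", "rajshthan", "rajasthan"])
standardized = standardize(states, STATE_ALIASES)
```

Comparing `states.nunique()` with `standardized.nunique()` before and after is a quick way to track how many artificial duplicates were removed.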
What: Filled missing currency values with 'INR'
Why: All transactions were in Indian Rupees; missing values were data entry errors
What:
- Replaced missing promotion IDs with 'No Promotion'
- Set courier status to 'Cancelled' when order status was 'Cancelled'
Why: Creates clear categories for analysis and ensures data consistency
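The categorical fills described above (currency, promotion IDs, courier status) might look like the sketch below. The column names `status`, `courier_status`, `promotion_ids`, and `currency` are assumptions based on the normalized naming convention, and `PROMO-1` is an invented placeholder value.

```python
import pandas as pd

df = pd.DataFrame({
    "status": ["Shipped", "Cancelled", "Shipped"],
    "courier_status": ["Shipped", None, "Unshipped"],
    "promotion_ids": ["PROMO-1", None, None],   # 'PROMO-1' is a made-up ID
    "currency": ["INR", None, "INR"],
})

# All transactions are in Indian Rupees, so a constant fill is safe here.
df["currency"] = df["currency"].fillna("INR")

# Give orders without a promotion an explicit category.
df["promotion_ids"] = df["promotion_ids"].fillna("No Promotion")

# Align courier status with order status for cancelled orders.
df.loc[df["status"] == "Cancelled", "courier_status"] = "Cancelled"
```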
What: Filled missing amount values using median unit price by product style
Why: Preserves pricing patterns while handling missing financial data
Method: Calculated median unit price per style, then multiplied by quantity
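The imputation method can be sketched on toy data as follows. The column names `style`, `Quantity`, and `amount` are assumed, and excluding zero-quantity rows from the median is a choice made for this sketch rather than something the description above specifies.

```python
import pandas as pd

df = pd.DataFrame({
    "style": ["A", "A", "A", "B", "B"],
    "Quantity": [1, 2, 3, 1, 2],
    "amount": [400.0, 900.0, None, 250.0, None],
})

# Unit prices from rows where the amount is known and the quantity is positive.
known = df[df["amount"].notna() & (df["Quantity"] > 0)]
unit_price = known["amount"] / known["Quantity"]

# Median unit price per style.
median_by_style = unit_price.groupby(known["style"]).median()

# Estimated amount = median unit price for the style x quantity ordered.
estimate = df["style"].map(median_by_style) * df["Quantity"]
df["amount"] = df["amount"].fillna(estimate)
```

In this toy frame, style A has unit prices 400 and 450, so its median is 425 and the missing three-item order is filled with 1275. A style with no known amounts at all would remain missing and need a fallback.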
- month: Numerical month (4-6)
- month_name: Full month name (April, May, June)
- day_of_week: Day name (Monday-Sunday)
- day_of_month: Day number (1-31)
- week_of_year: Week number
- has_promotion: Boolean indicating promotion usage
- price_tier: Categorized prices into Budget (<₹300), Mid-range (₹300-600), Premium (₹600-900), Luxury (>₹900)
- unit_price: Calculated as amount/quantity
- customer_type: Converted B2B boolean to 'B2B'/'B2C' text
- total_revenue: Calculated as amount × quantity
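Most of the derived features listed above map onto standard pandas operations. The sketch below is illustrative only: the input column names (`b2b`, `promotion_ids`, `Quantity`) are assumptions, the sample rows are invented, and the handling of values that fall exactly on a tier boundary is a guess, since the tier definitions above do not say which tier a price of exactly ₹300, ₹600, or ₹900 belongs to.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2022-04-30", "2022-05-02"]),
    "amount": [500.0, 950.0],
    "Quantity": [2, 1],
    "b2b": [False, True],
    "promotion_ids": ["No Promotion", "PROMO-1"],  # 'PROMO-1' is a made-up ID
})

# Date-derived features.
df["month"] = df["date"].dt.month
df["month_name"] = df["date"].dt.month_name()
df["day_of_week"] = df["date"].dt.day_name()
df["day_of_month"] = df["date"].dt.day
df["week_of_year"] = df["date"].dt.isocalendar().week

# Business features.
df["has_promotion"] = df["promotion_ids"] != "No Promotion"
df["unit_price"] = df["amount"] / df["Quantity"]
df["customer_type"] = np.where(df["b2b"], "B2B", "B2C")

# Price tiers; right=False makes each bin include its lower edge,
# so 300 falls in Mid-range and 900 in Luxury (an assumption).
df["price_tier"] = pd.cut(
    df["unit_price"],
    bins=[0, 300, 600, 900, float("inf")],
    labels=["Budget", "Mid-range", "Premium", "Luxury"],
    right=False,
)
```

Rows with a quantity of zero would produce an infinite or missing `unit_price`, so they may need to be filtered or handled before the tiers are assigned.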
- Set items are the best-selling category, with the highest quantity sold
- Kurta follows as the second most popular category
- Women's ethnic wear dominates sales
- Overall cancellation rate varies by category
- Higher-priced items tend to have lower cancellation rates
- Categories with size issues show higher cancellations
- Maharashtra generates the highest revenue
- Karnataka follows as the second-highest revenue state
- The top 5 states account for the majority of total revenue
- Orders with promotions have higher average order values
- Promotion usage varies by customer type
- B2B customers use promotions less frequently than B2C
- M (Medium) is the most popular size across categories
- L (Large) and XL follow in popularity
- Size preferences vary significantly by product category
- May shows peak revenue among the three months
- Weekend sales are generally lower than weekdays
- End-of-month periods show increased order volumes
- B2B customers have higher average order values (AOV)
- B2C customers represent the majority of orders
- B2B proportion varies significantly by state
- Kurtas have the highest average unit prices
- The budget tier represents the largest order volume
- Premium and luxury tiers have better completion rates
- Standard shipping is preferred over expedited
- Amazon fulfillment shows lower cancellation rates than merchant fulfillment
- Delivery success rate is higher for expedited shipping
- Mumbai leads in both order volume and revenue
- Bangalore shows the highest average order values
- Metro cities dominate the top 10 revenue generators
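Several of the findings above rest on simple grouped metrics. The sketch below shows how three of them (cancellation rate by category, revenue share by state, and average order value by customer type) could be computed. It runs on invented toy rows with assumed column names, so the numbers it produces say nothing about the real dataset.

```python
import pandas as pd

orders = pd.DataFrame({
    "category": ["Set", "Set", "kurta", "kurta", "Top"],
    "status": ["Shipped", "Cancelled", "Shipped", "Shipped", "Cancelled"],
    "ship_state": ["maharashtra", "karnataka", "maharashtra", "delhi", "delhi"],
    "customer_type": ["B2C", "B2C", "B2B", "B2C", "B2C"],
    "amount": [800.0, 600.0, 1200.0, 400.0, 500.0],
})

# Cancellation rate per category: share of orders with status 'Cancelled'.
cancel_rate = (orders["status"] == "Cancelled").groupby(orders["category"]).mean()

# Revenue per state (largest first) and each state's share of the total.
state_revenue = (
    orders.groupby("ship_state")["amount"].sum().sort_values(ascending=False)
)
state_share = state_revenue / state_revenue.sum()

# Average order value (AOV) per customer type.
aov = orders.groupby("customer_type")["amount"].mean()
```

This toy version counts cancelled orders toward revenue; whether to exclude them is an analysis decision that should be applied consistently across all metrics.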
- Framework: Streamlit for interactive web application
- Visualization: Plotly for dynamic charts
- State Management: Session state for global filters
- Modular Design: Separate pages for different analysis aspects
- Global Filters: State, Month, and Day filters applied across all pages
- Geographic Analysis: State and city performance metrics
- Time Analysis: Monthly, weekly, and daily patterns
- Product Analysis: Category, size, and price tier insights
- Customer Analysis: B2B vs B2C segmentation
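The global filters can be thought of as one shared set of selections that every page applies to the data before drawing its charts. The sketch below shows that filtering step in plain pandas so it can run without a Streamlit server. The function name `apply_global_filters`, the filter dictionary layout, and the column names are hypothetical; in the app, the selections would presumably be stored in Streamlit's session state rather than a local variable.

```python
import pandas as pd

def apply_global_filters(df: pd.DataFrame, filters: dict) -> pd.DataFrame:
    """Return the rows that match every non-empty selection in `filters`.

    `filters` maps a column name to a list of allowed values; an empty
    list means "no restriction" for that column.
    """
    mask = pd.Series(True, index=df.index)
    for column, allowed in filters.items():
        if allowed:
            mask &= df[column].isin(allowed)
    return df[mask]

# In a Streamlit app this dict could be kept in st.session_state so that
# every page reads the same selections; here it is a plain dict.
filters = {
    "ship_state": ["maharashtra"],
    "month_name": [],              # empty list: all months
    "day_of_week": ["Monday"],
}

orders = pd.DataFrame({
    "ship_state": ["maharashtra", "maharashtra", "karnataka"],
    "month_name": ["April", "May", "May"],
    "day_of_week": ["Monday", "Sunday", "Monday"],
    "amount": [500.0, 700.0, 900.0],
})

filtered = apply_global_filters(orders, filters)
```

Keeping the filtering logic in one function means each page only needs to call it once on the loaded data, which keeps the pages consistent with each other.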
- Inventory Management: Focus on medium and large sizes for ethnic wear
- Marketing Strategy: Target promotions during weekdays for better conversion
- Geographic Expansion: Apply the approach that succeeded in Maharashtra to other states
- Customer Retention: Address high-cancellation categories with better product descriptions
- Pricing Strategy: Maintain focus on budget and mid-range tiers while improving premium tier services
The analysis reveals strong geographic concentration, clear customer segmentation patterns, and seasonal trends. The insights provide actionable recommendations for inventory, marketing, and operational improvements.