This project analyzes product prices across three Spanish supermarkets (Eroski, Dia, and Alcampo) to identify which store offers the lowest prices.
Using web scraping and APIs, we gathered pricing data for various products across the three supermarkets. The Python code lets you search for any product and returns a dataframe with that product's prices at each store. We also analyzed a sample of 10 products to find which supermarket most often had the lowest price, and created a heatmap to visualize how prices vary across products and stores.
Data Extraction:
- Used web scraping and APIs to collect product data, including names, prices, and package sizes, from each supermarket’s website.
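
As a rough illustration of the extraction step, the sketch below turns a raw API-style response into flat records with a name, price, and package size. The payload shape, the field names, and the helpers `parse_price` and `extract_products` are all assumptions made for this example; the real supermarket endpoints and their responses are not described here and will differ.

```python
import json

# Hypothetical API response. The real endpoints and field names are not
# documented in this README, so this payload shape is an assumption.
SAMPLE_RESPONSE = """
{
  "products": [
    {"name": " Leche entera 1L ", "price": "0,95 €", "size": "1 L"},
    {"name": "Arroz redondo 1kg", "price": "1,20 €", "size": "1 kg"}
  ]
}
"""


def parse_price(raw: str) -> float:
    """Convert a Spanish-formatted price such as '0,95 €' to a float."""
    return float(raw.replace("€", "").strip().replace(",", "."))


def extract_products(payload: str, store: str) -> list[dict]:
    """Turn a raw JSON payload into flat records tagged with the store name."""
    data = json.loads(payload)
    return [
        {
            "store": store,
            "name": item["name"].strip(),
            "price": parse_price(item["price"]),
            "size": item["size"],
        }
        for item in data["products"]
    ]


records = extract_products(SAMPLE_RESPONSE, store="Eroski")
```

Tagging every record with its store at extraction time keeps the later concatenation and grouping steps simple.
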
Data Cleaning & Transformation:
- Utilized `replace`, `strip`, and `str.contains` to standardize product data and remove irrelevant items.
- Aggregated and sorted product data using `concat` and `sort_values` for each store to prepare for analysis.
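
The cleaning steps above can be sketched with pandas roughly as follows. The sample rows, the column names (`name`, `price`, `store`), and the `clean` helper are made up for illustration and are not taken from the project's code.

```python
import pandas as pd

# Made-up rows standing in for two scraped stores; column names are assumptions.
eroski = pd.DataFrame({
    "name": ["  Leche Entera 1L ", "Leche de avena 1L", "Chocolate con leche"],
    "price": ["0,95 €", "1,49 €", "1,10 €"],
})
dia = pd.DataFrame({
    "name": ["LECHE ENTERA 1L", "Batido de leche"],
    "price": ["0,89 €", "1,05 €"],
})


def clean(df: pd.DataFrame, store: str, keyword: str) -> pd.DataFrame:
    """Normalize names and prices, then keep only rows matching the keyword."""
    out = df.copy()
    # Trim whitespace and lowercase so names compare consistently across stores.
    out["name"] = out["name"].str.strip().str.lower()
    # Drop the currency symbol and switch the decimal comma to a dot.
    out["price"] = (
        out["price"]
        .str.replace("€", "", regex=False)
        .str.replace(",", ".", regex=False)
        .str.strip()
        .astype(float)
    )
    # Remove irrelevant items: keep only names containing the search keyword.
    mask = out["name"].str.contains(keyword, regex=False)
    return out[mask].assign(store=store)


# Combine the per-store results and sort from cheapest to most expensive.
combined = (
    pd.concat(
        [clean(eroski, "Eroski", "leche entera"), clean(dia, "Dia", "leche entera")],
        ignore_index=True,
    )
    .sort_values("price")
    .reset_index(drop=True)
)
```

Passing `regex=False` to `str.replace` and `str.contains` treats the arguments as literal text, which avoids surprises when a keyword contains regex metacharacters.
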
Analysis & Visualization:
- Calculated average prices per supermarket with `groupby` and `mean`.
- Used `value_counts` to identify the store with the lowest prices most frequently.
- Visualized the results through a bar chart and a price heatmap for a clear comparison across products and stores.
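
A minimal sketch of the analysis steps, assuming a long-format dataframe with `product`, `store`, and `price` columns. The numbers are illustrative only and are not results from this project; the `plot_results` helper is a hypothetical name.

```python
import pandas as pd

# Illustrative numbers only, not results from the project.
prices = pd.DataFrame({
    "product": ["milk"] * 3 + ["rice"] * 3 + ["eggs"] * 3,
    "store": ["Eroski", "Dia", "Alcampo"] * 3,
    "price": [0.95, 0.89, 0.92, 1.20, 1.25, 1.10, 2.10, 1.99, 2.05],
})

# Average price per supermarket.
avg_price = prices.groupby("store")["price"].mean()

# For each product, pick the row with the lowest price,
# then count how often each store is the cheapest.
cheapest_rows = prices.loc[prices.groupby("product")["price"].idxmin()]
wins = cheapest_rows["store"].value_counts()

# Product-by-store matrix, the shape a heatmap expects.
matrix = prices.pivot(index="product", columns="store", values="price")


def plot_results(wins: pd.Series, matrix: pd.DataFrame) -> None:
    """Draw a bar chart of 'cheapest' counts and a price heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(11, 4))
    wins.plot(kind="bar", ax=ax_bar, title="Times each store was cheapest")
    sns.heatmap(matrix, annot=True, fmt=".2f", cmap="YlOrRd", ax=ax_heat)
    ax_heat.set_title("Price by product and store")
    fig.tight_layout()
    plt.show()
```

Calling `plot_results(wins, matrix)` renders both charts. In this toy data the store that is cheapest most often is not the one with the lowest average price, which is one reason to look at both measures.
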
Clone the repository:

    git clone https://github.com/your_user/supermarket-price-comparison.git
    cd supermarket-price-comparison

Install requirements:

    pip install -r requirements.txt

Run the main analysis script to retrieve prices and generate visualizations. Adjust the search parameters to analyze the price distribution of any desired product across the supermarkets.
- Python for data analysis and processing.
- Pandas for data cleaning and transformation.
- Matplotlib & Seaborn for data visualization.
- Web Scraping & APIs for data extraction from the supermarkets.