Football Scouting Agency ™ offers data-driven solutions for football clubs to gain high-quality insights on top football talents worldwide. Leveraging APIs, web scraping, and advanced analytics, we help clubs discover players who deliver top performance at competitive costs.
- Project Overview
- Problem Statement
- Hypotheses
- Data Sources & Collection
- Data Cleaning & Wrangling
- Exploratory Data Analysis (EDA)
- Visualization
- Project Timeline
- Team & Contribution Guidelines
- License
Football Scouting Agency ™ provides football clubs with actionable insights by analyzing performance metrics and transfer market data. Our objective is to identify top-performing yet cost-efficient players from the world's elite leagues.
2024 Challenge:
Identify the top 5 most performance-efficient players in 2024 from the top 5 leagues, while ensuring they are also among the cheapest relative to their performance efficiency.
- Leagues & IDs:
  - Premier League: "GB1"
  - La Liga: "ES1"
  - Serie A: "IT1"
  - Ligue 1: "FR1"
  - Bundesliga: "L1"
Total players from these leagues: 2,805
- Cost Efficiency: None of the top 5 players will be cheaper than 50M EUR.
- Performance Efficiency: Performance efficiency is measured by goal contributions relative to minutes played (example metric: minutes played / (goals + assists), where a lower value is better).
- Player Movement: None of the top 5 players in 2022 will appear in the top 5 in 2024.
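As a minimal sketch of the example metric above, the function below computes minutes played per goal contribution. The function name and the sample figures are illustrative assumptions, not taken from the project code.

```python
from typing import Optional


def minutes_per_contribution(minutes: int, goals: int, assists: int) -> Optional[float]:
    """Minutes played per goal contribution (goals + assists); lower is better."""
    contributions = goals + assists
    if contributions == 0:
        # The ratio is undefined for a player with no goals or assists.
        return None
    return minutes / contributions


# Hypothetical figures, for illustration only.
print(minutes_per_contribution(1800, 15, 5))  # 90.0
```

Returning `None` instead of raising keeps players without contributions in the dataset, so they can be filtered out explicitly before ranking.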
Data is collected from multiple trusted sources:
- APIs: For example, https://transfermarkt6.p.rapidapi.com/players/profile to retrieve player information.
Note: Always save the fetched data locally (e.g., CSV files) to minimize API calls and manage rate limits.
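One way to follow this note is a small cache-first helper: read the CSV if it already exists, otherwise call the API once and write the result to disk. This is a sketch under assumptions: `load_or_fetch` and the `fetch` callable are hypothetical names, and all fetched rows are assumed to share the same keys.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]


def load_or_fetch(cache_path: Path, fetch: Callable[[], List[Row]]) -> List[Row]:
    """Return rows from the local CSV cache, calling fetch() only on a cache miss."""
    if cache_path.exists():
        # Cache hit: no API call is made.
        with cache_path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    rows = fetch()  # the only place an API call would happen
    if rows:
        with cache_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

In practice `fetch` would wrap the actual API request. Values read back from the CSV are strings, so numeric columns need converting during cleaning.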
Our data preprocessing involves:
- Handling null values and duplicates
- Dropping irrelevant columns
- Manipulating strings and formatting fields
- Creating new variables (e.g., goal contributions efficiency)
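The preprocessing steps listed above could look roughly like the following sketch. The column names (`player_id`, `name`, `minutes`, `goals`, `assists`) and the raw number format handled by `to_int` (for example `1.800'`) are assumptions made for illustration; the real dataset may differ.

```python
from typing import Any, Dict, Iterable, List

# Assumed column names; everything else is treated as irrelevant and dropped.
REQUIRED = ("player_id", "name", "minutes", "goals", "assists")


def to_int(value: Any) -> int:
    """Convert strings such as "1.800'" or "1,800" to 1800 (assumed raw format)."""
    text = str(value).strip().replace("'", "").replace(".", "").replace(",", "")
    return int(text)


def clean_players(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Handle nulls: skip rows missing any required value.
        if any(row.get(col) in (None, "") for col in REQUIRED):
            continue
        # Handle duplicates: keep the first row per player_id.
        player_id = str(row["player_id"]).strip()
        if player_id in seen_ids:
            continue
        seen_ids.add(player_id)

        minutes = to_int(row["minutes"])
        goals = to_int(row["goals"])
        assists = to_int(row["assists"])
        contributions = goals + assists
        cleaned.append({
            "player_id": player_id,
            "name": str(row["name"]).strip(),  # string formatting
            "minutes": minutes,
            "goals": goals,
            "assists": assists,
            # New variable: goal contribution efficiency (lower is better).
            "minutes_per_contribution": minutes / contributions if contributions else None,
        })
    return cleaned
```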
We employ EDA to:
- Validate hypotheses through univariate, bivariate, and multivariate analysis.
- Compare data across 2022 and 2024.
- Utilize descriptive statistics and visualization tools to extract actionable insights.
Key insights include:
- Goal Contributions per Million Euros: Efficiency comparison.
- Minutes per Goal: Performance metrics for top players.
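The two insights above reduce to simple ratios. The sketch below assumes market value is stored in euros as a plain number; the function names and the sample figures are hypothetical.

```python
from typing import Optional


def contributions_per_million(goals: int, assists: int, market_value_eur: float) -> float:
    """Goal contributions per million euros of market value; higher is better."""
    if market_value_eur <= 0:
        raise ValueError("market_value_eur must be positive")
    return (goals + assists) / (market_value_eur / 1_000_000)


def minutes_per_goal(minutes: int, goals: int) -> Optional[float]:
    """Minutes played per goal scored; lower is better. None if no goals."""
    return minutes / goals if goals else None


# Hypothetical figures, for illustration only.
print(contributions_per_million(15, 5, 50_000_000))  # 0.4
print(minutes_per_goal(1800, 15))                    # 120.0
```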
Visualization strategies are chosen to clearly communicate insights:
- Chart Types: Bar charts for categorical comparisons, line charts for trends.
- Design Principles: Minimal clutter, focused attention using bold text and contrasting colors.
- Interactive Elements: Dashboards built with the Python libraries Seaborn and Matplotlib.
- Day 1 (Thursday):
  - Brainstorm interesting topics and formulate hypotheses.
  - Set up GitHub repository and create a Kanban board.
- Day 2/3 (Saturday - Tuesday):
  - Data Collection: Fetch data via APIs and web scraping.
  - Data Wrangling: Clean, transform, and structure the data.
- Day 4 (Thursday):
  - Finalize data cleaning, perform EDA, and refine code.
  - Prepare initial visualizations.
- Day 5 (Saturday):
  - Final presentation and demo.
Checklist Highlights:
- Decide on columns to keep for 2022 and 2024.
- Calculate and add goal contribution efficiency columns.
- Rank top 5 players.
- Merge DataFrames (performance metrics and additional info).
- Build charts comparing key performance indicators across years.
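The merge, rank, and year-comparison items in this checklist can be sketched with plain dictionaries. All names here (`merge_on_id`, `top_n`, `repeat_players`, and the `player_id` and `minutes_per_contribution` keys) are assumptions for illustration, and ranking by minutes per contribution is one possible reading of "performance efficiency".

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def merge_on_id(performance: List[Row], info: List[Row]) -> List[Row]:
    """Inner join of performance rows with additional info on player_id."""
    info_by_id = {row["player_id"]: row for row in info}
    return [
        {**row, **info_by_id[row["player_id"]]}
        for row in performance
        if row["player_id"] in info_by_id
    ]


def top_n(rows: List[Row], n: int = 5) -> List[Row]:
    """Rank by minutes per contribution, ascending (lower is better)."""
    ranked = [r for r in rows if r.get("minutes_per_contribution") is not None]
    return sorted(ranked, key=lambda r: r["minutes_per_contribution"])[:n]


def repeat_players(top_2022: List[Row], top_2024: List[Row]) -> Set[str]:
    """Players who appear in both years' rankings."""
    return {r["player_id"] for r in top_2022} & {r["player_id"] for r in top_2024}
```

An empty set from `repeat_players` would be consistent with the Player Movement hypothesis. The merged rows can also be checked against the 50M EUR threshold from the Cost Efficiency hypothesis.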
Team Members:
Jorge info:
Sherif info:
Contribution Guidelines:
- Merge individual work into the shared document only after thorough testing.
https://rapidapi.com/ntd119/api/transfermarkt6
| Parameter | Type | Description |
|---|---|---|
| api_key | string | Required. Your API key |
To run this project, you will need to add the following environment variable to your .env file:
API_KEY
ex: x-rapidapi-key=**************************************************
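A minimal sketch of reading the key and building request headers, using only the standard library. It assumes the .env file contains a line of the form `API_KEY=<your key>`; the helper names are hypothetical, and the host value is taken from the API URL used in this project.

```python
import os
from pathlib import Path
from typing import Dict


def load_env(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no variable expansion)."""
    values: Dict[str, str] = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def rapidapi_headers(env: Dict[str, str]) -> Dict[str, str]:
    """Build RapidAPI request headers from API_KEY (file first, then process env)."""
    api_key = env.get("API_KEY") or os.environ.get("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set in .env or the environment")
    return {
        "x-rapidapi-key": api_key,
        "x-rapidapi-host": "transfermarkt6.p.rapidapi.com",
    }
```

Keeping the key in .env, and the .env file out of version control, avoids committing credentials to the repository.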