Job Skills & Demand Analysis from LinkedIn Job Listings

Overview

This project involves web scraping job listings from LinkedIn to analyze the demand for various data-related roles. The automated script collects job data for Data Analyst, Data Scientist, and Data Engineer roles, providing insights into job trends, required skills, and industry demands.

Data Collection

Platform: LinkedIn (March 2025)
Job Titles Scraped:
- Data Analyst
- Data Scientist
- Data Engineer
Total Listings Scraped: ~4,372 across Germany and Spain.
Saving Cookies: After logging in to LinkedIn with a username and password, the session cookies are saved to a pickle file so they can be reused.
Pagination Handling: The script navigates through 40 pages per job title.
Scrolling & Loading: The script scrolls each page and waits so that all job listings load before moving to the next page.
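The cookie, pagination, and scrolling steps above can be sketched roughly as follows. This is a hedged illustration, not the project's notebook code: the helper names, the cookie file name `linkedin_cookies.pkl`, the search URL format, and the assumption of 25 listings per results page (paginated with a `start` offset) are all hypothetical. The helpers only use the standard Selenium WebDriver methods `get_cookies()`, `add_cookie()`, and `execute_script()`, so any driver object can be passed in. On some LinkedIn layouts the listing pane scrolls inside its own container rather than the page body, in which case the scroll script would need to target that element instead.

```python
import pickle
import time
from urllib.parse import urlencode

# Hypothetical settings; the real notebook may use different values.
COOKIE_FILE = "linkedin_cookies.pkl"
SEARCH_URL = "https://www.linkedin.com/jobs/search/"  # assumed base URL
RESULTS_PER_PAGE = 25  # assumed page size behind the `start` offset
PAGES_PER_TITLE = 40   # from the Data Collection notes


def save_cookies(driver, path=COOKIE_FILE):
    """Pickle the session cookies after a successful username/password login."""
    with open(path, "wb") as fh:
        pickle.dump(driver.get_cookies(), fh)


def load_cookies(driver, path=COOKIE_FILE):
    """Restore pickled cookies and return how many were added.
    The driver must already be on a linkedin.com page, because the browser
    only accepts cookies for the domain currently loaded."""
    with open(path, "rb") as fh:
        cookies = pickle.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)


def build_search_urls(job_title, location, pages=PAGES_PER_TITLE):
    """Return one search-results URL per page (hypothetical URL scheme)."""
    return [
        SEARCH_URL + "?" + urlencode(
            {"keywords": job_title, "location": location,
             "start": page * RESULTS_PER_PAGE}
        )
        for page in range(pages)
    ]


def scroll_until_loaded(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom until the page height stops growing, i.e. no more
    lazily loaded listings appear. Returns the number of scrolls performed."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render new job cards
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height unchanged: assume everything has loaded
        last_height = new_height
    return rounds
```

In a real run, `save_cookies` would be called once after the manual login, and later sessions would open `https://www.linkedin.com`, call `load_cookies`, refresh, and then visit each URL from `build_search_urls`, calling `scroll_until_loaded` before parsing the listings.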

Challenges

Collecting dynamically loaded job data from the LinkedIn platform was difficult.
Extracting a specific set of job skills with regex added further complexity.

Technologies Used

Python (for automation & data processing)
SQL (for table relations)
Selenium (for web scraping)
ChromeDriver via webdriver_manager (to automate browser interactions)
uv (for managing the virtual environment and dependencies)
KPI metric design (to measure results)
Hypothesis testing (to verify results statistically)
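To illustrate the SQL and KPI entries above, here is a minimal sketch using Python's built-in `sqlite3`. The schema (a `jobs` table and a `job_skills` table related by job ID) and the KPI definition (percentage of a role's postings that mention a skill) are assumptions for illustration; the project's own SQL schema and KPIs may differ.

```python
import sqlite3

# Hypothetical schema: one row per posting, one row per (posting, skill) pair.
SCHEMA = """
CREATE TABLE jobs (
    job_id   TEXT PRIMARY KEY,
    role     TEXT NOT NULL,  -- standardized title, e.g. 'Data Analyst'
    location TEXT NOT NULL   -- 'Germany' or 'Spain'
);
CREATE TABLE job_skills (
    job_id TEXT NOT NULL REFERENCES jobs(job_id),
    skill  TEXT NOT NULL,
    PRIMARY KEY (job_id, skill)
);
"""

# Assumed KPI: share (%) of a role's postings that mention each skill.
SKILL_SHARE_SQL = """
SELECT j.role,
       s.skill,
       ROUND(100.0 * COUNT(*) /
             (SELECT COUNT(*) FROM jobs AS j2 WHERE j2.role = j.role), 1)
           AS pct_of_postings
FROM job_skills AS s
JOIN jobs AS j ON j.job_id = s.job_id
GROUP BY j.role, s.skill
ORDER BY j.role, pct_of_postings DESC, s.skill;
"""


def skill_share(jobs, skills):
    """jobs: list of (job_id, role, location); skills: list of (job_id, skill).
    Returns rows of (role, skill, pct_of_postings)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("PRAGMA foreign_keys = ON")  # enforce the jobs/job_skills relation
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO jobs VALUES (?, ?, ?)", jobs)
        con.executemany("INSERT INTO job_skills VALUES (?, ?)", skills)
        return con.execute(SKILL_SHARE_SQL).fetchall()
    finally:
        con.close()
```

Keeping skills in a separate table keyed by job ID (rather than a comma-separated column) is what makes per-role and per-location aggregations a simple join.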

Project Environment & Dependencies

The project environment is managed with uv. The required dependencies are declared in the pyproject.toml file and listed below.

Dependencies

```toml
# Excerpt from the [project] table in pyproject.toml
dependencies = [
    "numpy>=2.0.2",
    "pandas>=2.2.3",
    "matplotlib>=3.9.2",
    "seaborn>=0.13.2",
    "jupyter>=1.1.1",
    "ipykernel>=6.29.5",
    "selenium>=4.0.0",
    "webdriver_manager>=4.0.0",
    "langdetect>=1.0.9",
    "wordcloud>=1.9.4",
    "statsmodels>=0.14.4",
    "matplotlib-venn>=1.1.2"
]
```

Exploratory Data Analysis (EDA)

After collecting the job data, Exploratory Data Analysis (EDA) was performed to identify trends and insights. The following techniques were used:

Data Cleaning: Removed duplicates, handled missing values, and standardized job titles.
Skill Analysis: Extracted key skills mentioned in job descriptions using text analysis.
Regex-based Skill Extraction: Used regular expressions (regex) to search job descriptions for specific skill keywords and stored the extracted skills against each job ID.
Job Skill Set Demand: Visualized skill demand for the different data analytics roles across the two locations, Germany and Spain.
Skill Gap Analysis: Compared the in-demand skills for each job role with the skill sets covered in bootcamps.
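The cleaning, regex-based extraction, demand counting, and gap analysis steps above can be sketched with the standard library alone. This is a hedged illustration, not the notebook's code: the skill patterns, the title-matching rules, the record field names (`job_id`, `title`, `location`, `description`), and any bootcamp curriculum passed to the gap check are hypothetical.

```python
import re
from collections import Counter

# Hypothetical skill dictionary: display name -> case-insensitive regex.
# Word boundaries (\b) keep e.g. "spark" from matching inside "sparkling".
SKILL_PATTERNS = {
    "Python": r"\bpython\b",
    "SQL": r"\bsql\b",
    "Tableau": r"\btableau\b",
    "Power BI": r"\bpower\s*bi\b",
    "Spark": r"\b(?:apache\s+)?spark\b",
    "Machine Learning": r"\bmachine\s+learning\b",
}
SKILL_REGEXES = {name: re.compile(p, re.IGNORECASE) for name, p in SKILL_PATTERNS.items()}

# Hypothetical title rules, checked in order; the first match wins.
TITLE_RULES = [
    ("Data Engineer", re.compile(r"data\s+engineer", re.IGNORECASE)),
    ("Data Scientist", re.compile(r"data\s+scien", re.IGNORECASE)),
    ("Data Analyst", re.compile(r"data\s+analy", re.IGNORECASE)),
]


def standardize_title(raw_title):
    """Map a raw title such as 'Senior Data Scientist (m/w/d)' to one of the three roles."""
    for role, pattern in TITLE_RULES:
        if pattern.search(raw_title):
            return role
    return None  # title outside the three scraped roles


def extract_skills(description):
    """Return the set of skill names whose pattern occurs in the description."""
    return {name for name, rx in SKILL_REGEXES.items() if rx.search(description)}


def analyze(jobs):
    """Deduplicate by job_id, drop rows without a description or known role, and
    return (skills per job_id, Counter of (role, location, skill))."""
    skills_by_job, demand, seen = {}, Counter(), set()
    for job in jobs:
        if job["job_id"] in seen:
            continue  # duplicate listing: keep the first occurrence only
        seen.add(job["job_id"])
        role = standardize_title(job.get("title", ""))
        description = job.get("description") or ""
        if role is None or not description.strip():
            continue  # missing description or unrelated role
        skills = extract_skills(description)
        skills_by_job[job["job_id"]] = skills
        demand.update((role, job["location"], skill) for skill in skills)
    return skills_by_job, demand


def skill_gap(in_demand, curriculum):
    """Return (skills the market asks for that the curriculum misses, coverage ratio)."""
    in_demand, curriculum = set(in_demand), set(curriculum)
    missing = in_demand - curriculum
    coverage = len(in_demand & curriculum) / len(in_demand) if in_demand else 1.0
    return missing, coverage
```

For example, `skill_gap({"Python", "SQL", "Spark", "Tableau"}, {"Python", "SQL", "Excel"})` returns `({"Spark", "Tableau"}, 0.5)`: half of the in-demand skills are covered.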

Data Visualizations in Python

To better understand the data, multiple visualizations were created using Matplotlib, Seaborn, and additional libraries:

Word Clouds (wordcloud) to highlight the most frequently mentioned skills.
Statistical Analysis (statsmodels) to identify significant trends.
Venn Diagrams (matplotlib-venn) to compare skill overlaps between job roles.
Bar Charts & Heatmaps (matplotlib & seaborn) to visualize job distribution, required skills, and company trends.
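As one possible example of the statistical analysis and Venn comparisons above, the sketch below runs a two-sided two-proportion z-test (e.g. does the share of postings mentioning a skill differ between Germany and Spain?) and computes the seven regions a three-set Venn diagram displays. The README does not say which test was used, so the choice of test and the example counts are assumptions. The test is written with the standard library to make the arithmetic visible; statsmodels' `statsmodels.stats.proportion.proportions_ztest` computes the same pooled statistic.

```python
import math


def two_proportion_ztest(count_a, n_a, count_b, n_b):
    """Two-sided z-test for H0: p_a == p_b using the pooled proportion.
    Returns (z statistic, p-value)."""
    p_a, p_b = count_a / n_a, count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def venn3_regions(a, b, c):
    """Split three skill sets into the seven disjoint regions of a 3-way Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": a - b - c,
        "only_b": b - a - c,
        "only_c": c - a - b,
        "a_and_b": (a & b) - c,
        "a_and_c": (a & c) - b,
        "b_and_c": (b & c) - a,
        "all_three": a & b & c,
    }
```

With hypothetical counts of 60 of 100 postings in one country versus 40 of 100 in the other, `two_proportion_ztest(60, 100, 40, 100)` gives z ≈ 2.83 and p ≈ 0.0047, so the difference would be significant at the 5% level. For plotting, `matplotlib_venn.venn3([a, b, c], set_labels=(...))` accepts the three sets directly; the helper above is mainly useful for tabulating the overlaps.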

Tableau Dashboard

To make the insights more interactive, a Tableau dashboard was created. The dashboard includes:

Job Market Skill Set Analysis: Shows which skill sets are in demand in current LinkedIn job postings for three roles: Data Analyst, Data Engineer, and Data Scientist.
Skill Frequency Analysis: A breakdown of the most requested skills across roles.

Dashboard: https://public.tableau.com/app/profile/muskan.chawla2373/viz/JobListingAnalysis_17417071327610/Dashboard1?publish=yes
The dashboard is also available as a Tableau workbook in the GitHub repository.

Presentation Link

Presentation: https://docs.google.com/presentation/d/12E7MEW84s4_W8v-4T5Ldzq2DhbVKjqFdJgOeIMZZ-ak/edit#slide=id.g33f481e1cda_0_37

How to Run the Script

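Install dependencies (inside a virtual environment, e.g. one created with `uv venv`):
```
uv pip install -r pyproject.toml
```
Ensure you have Google Chrome installed.
Run the notebook, either by opening it in Jupyter or by executing it from the command line:
```
jupyter nbconvert --to notebook --execute GetLinkedInJobListings.ipynb
```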
The output will be stored in a CSV/JSON file for further analysis.
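A minimal sketch of that export step, using only the standard library. The function name, the default `data` output folder, the file stem `job_listings`, and the record fields are hypothetical; the notebook may write its output differently (for example with pandas `DataFrame.to_csv`).

```python
import csv
import json
from pathlib import Path


def export_listings(records, out_dir="data", stem="job_listings"):
    """Write scraped job records (a list of flat dicts) to CSV and JSON.
    Returns the two output paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path, json_path = out / f"{stem}.csv", out / f"{stem}.json"

    # Union of all keys in first-seen order, so records with missing fields still fit.
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # missing keys are written as empty strings

    with json_path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps German and Spanish characters readable
        json.dump(records, fh, ensure_ascii=False, indent=2)
    return csv_path, json_path
```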

Future Enhancements

Expand job search to other platforms like Indeed and Glassdoor.
Perform text analysis on job descriptions using a machine learning model to identify the most in-demand skills.
Enhance Tableau dashboard with more advanced filters and real-time updates.

Contact

For any queries or collaborations, feel free to reach out!

Repository Structure

Repository: Muskanc02/Job-SkillSet-Demand-Analysis
- ERD_diagrams
- SQLScripts
- Tableau WB
- anaconda_projects/db
- data
- notebooks
- README.md
- pyproject.toml