Group 4 Ins1053 Ins105301
INTERNATIONAL SCHOOL
⁎⁎⁎
REPORT
Introduction to Business Data Analytics
INTRODUCTION
In today’s competitive job market, understanding the relationship between job skills and
salaries is crucial for career success and financial stability. This knowledge guides
can tailor their education and training efforts to enhance employability and job satisfaction.
This understanding also empowers workers during salary negotiations, enabling them to
Comparing current salaries with industry standards ensures professionals are not underpaid,
contributing to overall job satisfaction and financial well-being. In fields like data science,
experiencing significant growth, roles such as data analysts, machine learning engineers, and
AI specialists are highly sought after due to the increasing reliance on data-driven decision-
making. Skills like programming (Python, R), statistical analysis, data visualization, and
Therefore, knowing which skills can enhance qualifications for high-paying roles in these
fields helps professionals prioritize their education and career paths for maximum earning
potential. Staying updated on technological advancements and market trends is essential for
securing high-paying roles and advancing careers effectively in today’s dynamic labor
market.
The primary objectives of this report are to thoroughly analyze and understand the
relationship between job skills and salaries in the data science industry. Firstly, it aims to
identify key job skills, certifications, and educational qualifications associated with higher
salaries. Secondly, the report seeks to analyze how factors like job title, education level,
experience, and technical skills impact salaries and benefits. It will develop a predictive
model using statistical and machine learning techniques to estimate expected salaries for
across sectors like finance and healthcare, and assessing the impact of gender and diversity
on salaries. The report will also evaluate the role of continuing education in career
advancement and examine how compensation affects job satisfaction and work-life balance
salary determinants in data science, offering insights for professionals to maximize their
o Salary Trends in Each Industry: How do salaries in the data science industry
vary across sectors such as finance, healthcare, technology, and retail? Are there
salaries in the data science industry? What specific types of education and
o Relationship Between Salary and Job Satisfaction: How do salary and benefits
correlate with job satisfaction and work-life balance among professionals in the
The dataset used for this analysis is sourced from Kaggle and contains information on job
postings and salaries in the data science industry, primarily focusing on the state of
California. The dataset comprises three main components: company salary information, job
qualifications, and employee benefits. The dataset consists of 1,342 observations and key
variables such as job title, salary (USD/ year), skill requirements, educational qualifications,
science/data)
While the initial salaries are reported in US Dollars (USD), they will be converted to
Vietnamese Dong (VND) based on the exchange rate at the time of analysis to increase
relevance for Vietnamese readers (1 USD = 25,450.00 VND as of June 15, 2024 11:00 am).
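The conversion described above can be sketched in a few lines. This is a minimal illustration that assumes the fixed rate quoted in the report; the function names and the example salary are hypothetical and are not taken from the dataset.

```python
# Exchange rate quoted in the report (1 USD = 25,450 VND, 15 June 2024).
USD_TO_VND = 25_450


def usd_to_vnd(amount_usd, rate=USD_TO_VND):
    """Convert a USD amount to VND at a fixed exchange rate."""
    return amount_usd * rate


def yearly_to_monthly(amount):
    """Spread a yearly amount evenly over 12 months."""
    return amount / 12


# Hypothetical example salary, not a row from the dataset.
example_salary_usd = 150_000
yearly_vnd = usd_to_vnd(example_salary_usd)
monthly_vnd = yearly_to_monthly(yearly_vnd)
print(f"{yearly_vnd:,.0f} VND per year, {monthly_vnd:,.0f} VND per month")
```

Because the rate is fixed at a single point in time, any figures derived this way should be read as approximate.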
Variables such as position, company, and ID are not used in the analysis as they are not
Company Salary Information: This dataset provides details on various data science
roles, including job titles, salary ranges, and job levels (e.g., junior, senior, staff). It
covers a wide range of positions such as Data Scientists, Data Analysts, Machine
Learning Engineers, and Data Science Managers, among others. The salary
information is a crucial aspect of this analysis, as it serves as the target variable for
Job Qualifications: The qualifications dataset contains a comprehensive list of skills,
learning expertise, and degrees (Bachelor's, Master's, or Doctoral). This dataset will
be utilized to identify the most valuable skills and qualifications that contribute to
Employee Benefits: The benefits dataset provides information on various perks and
benefits offered by companies, such as health insurance, paid time off, retirement
plans, stock options, and professional development opportunities. This data will be
This study provides an opportunity to evaluate the necessary job skills and competitive
advantages in the data science industry. It also explores the relationship between educational
qualifications, skills, salaries, job positions, and provided benefits. The findings from this
study will be highly beneficial for students in preparing for their future careers, helping them
gain a better understanding of the skills they need to develop, as well as the salaries and
benefits they can expect when joining the workforce in the data science field.
METHODOLOGY
Regression analysis is a statistical technique used to model the relationship between one or
more independent variables and a dependent variable. In this case, we aim to identify the
primary factors that influence salary levels in the data science industry. Multiple linear
regression can be employed to model the relationship between independent variables (such as
skills, educational qualifications, experience) and the dependent variable (salary). The results
squared, and residual analysis to determine the main factors contributing to higher salary
levels.
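As an illustration of multiple linear regression, the sketch below fits ordinary least squares by solving the normal equations in plain Python. The two predictors (years of experience and a Master's degree indicator) and the six data rows are hypothetical and were constructed so that the true coefficients are known; they are not the report's data or its fitted model.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(features, target):
    """Return [intercept, b1, b2, ...] minimising the squared error."""
    x = [[1.0] + list(row) for row in features]  # prepend intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (years of experience, has Master's degree 0/1).
features = [(1, 0), (2, 0), (3, 1), (5, 1), (8, 0), (10, 1)]
# Salary in thousand USD, generated as 90 + 8 * years + 15 * masters.
target = [98, 106, 129, 145, 154, 185]

coefficients = fit_ols(features, target)
print([round(c, 3) for c in coefficients])
```

Each coefficient is read as the expected change in salary for a one-unit change in that predictor while the other predictors are held fixed.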
Levels
Classification analysis is used to separate employees with high salaries from those with lower
salaries and analyze the factors influencing this difference. A predictive model can be built
using classification algorithms such as logistic regression. The model can then be used to
predict the salary level of new employees based on their characteristics, and the most
influential factors can be identified through feature importance analysis. To analyze each
influencing factor in detail, salary rates can be calculated for different subgroups of
employees. Data visualization techniques such as box plots, scatter plots, and heat maps can
also be utilized to explore the relationship between each factor and salary levels.
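The classification idea can be sketched with a small logistic regression trained by gradient descent. This is only an illustration under stated assumptions: the two binary features (senior level, doctoral degree), the labels and the learning settings are hypothetical, and coefficient size is used as a rough stand-in for feature importance.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.1, epochs=5000):
    """Fit weights [bias, w1, ...] by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, label in zip(features, labels):
            x = [1.0] + list(row)
            error = sigmoid(sum(w * v for w, v in zip(weights, x))) - label
            for j, v in enumerate(x):
                grad[j] += error * v
        weights = [w - lr * g / len(labels) for w, g in zip(weights, grad)]
    return weights


def predict_high_salary(weights, row):
    """Return True when the predicted probability of 'high salary' is >= 0.5."""
    x = [1.0] + list(row)
    return sigmoid(sum(w * v for w, v in zip(weights, x))) >= 0.5


# Hypothetical rows: (is_senior, has_doctorate); label 1 means high salary.
features = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic(features, labels)
print("bias, senior weight, doctorate weight:", [round(w, 2) for w in weights])
```

In this toy data the label depends only on seniority, so the fitted weight for seniority ends up much larger than the weight for the degree indicator.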
Additionally, descriptive statistical analyses, such as calculating mean, median, and standard
deviation, can help elucidate the distribution of employee salary and benefit variables.
Exploratory data analysis using histograms and box plots can also help identify patterns,
CHAPTER 1: DATA PREPARATION AND CLEANING
Employee:
Benefits:
Qualifications:
Figure 3: Employee Qualifications (Degree and Skills)
To preprocess the "Data science salaries" dataset, we performed several key steps.
Firstly, we checked for missing values in the dataset. We found that there were no
Secondly, we checked for duplicate values, and then dropped them from the dataset.
This ensured that the dataset only contained unique records, which made our analysis more
accurate.
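The preprocessing steps in this section (checking for missing values, dropping duplicates, and the one-hot encoding of categorical values) can be sketched as follows. The records and column names are hypothetical placeholders, and plain Python is used here rather than whichever library was used for the report.

```python
# Hypothetical records standing in for rows of the salary dataset.
records = [
    {"career": "Data Scientist", "level": "Senior", "salary": 150000},
    {"career": "Data Analyst", "level": "Junior", "salary": 95000},
    {"career": "Data Scientist", "level": "Senior", "salary": 150000},  # duplicate
    {"career": "Data Engineer", "level": "Staff", "salary": 140000},
]

# 1) Count missing values (None or empty string) per column.
missing = {
    col: sum(1 for r in records if r[col] in (None, "")) for col in records[0]
}

# 2) Drop exact duplicate rows, keeping the first occurrence.
seen = set()
unique_records = []
for r in records:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        unique_records.append(r)


# 3) One-hot encode a categorical column into 0/1 indicator columns.
def one_hot(rows, column):
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded


encoded_records = one_hot(one_hot(unique_records, "career"), "level")
print(missing)
print(encoded_records[0])
```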
since they did not provide any valuable information for our analytical or predictive models.
values using one-hot encoding. This was necessary because most machine learning
variables into numerical ones, we were able to use them in our analytical and predictive
models with greater accuracy. Our data are presented below.
Career | Count
Table 1. Career count and Percentage
Statistician 6 0.466%
Head of Data Science 6 0.466%
Observation: The bar chart clearly shows the disparity in the frequency distribution among
the occupations.
The most frequent value is "Data Scientist" with 660 observations, accounting for
Learning Scientist" with 136 observations (10.6%), and "Software Engineer" with 78
observations (6.06%).
Other values such as "Data Engineer" (72 observations, 5.59%), "Data Science
Manager" (63 observations, 4.9%), "Data Analyst" (57 observations with 4.43%),
"Applied Scientist" (36 observations with 2.8%), "Director of Data Science" (22
Level | Count
Staff 80 6.2%
Junior 73 5.7%
Principal 41 3.2%
Lead 34 2.6%
Distinguished 1 0.1%
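The percentage column can be reproduced from the counts. The sketch below uses the level counts quoted in this section (including the "Unknown" and "Sr." counts given in the observations); the variable names are illustrative only.

```python
# Level counts as quoted in the report's table and observations.
level_counts = {
    "Unknown": 573,
    "Senior": 485,
    "Staff": 80,
    "Junior": 73,
    "Principal": 41,
    "Lead": 34,
    "Distinguished": 1,
}

total = sum(level_counts.values())
# Share of each level as a percentage of all observations, to one decimal.
level_share = {lvl: round(100 * n / total, 1) for lvl, n in level_counts.items()}

for lvl, n in sorted(level_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lvl:<14}{n:>5}{level_share[lvl]:>7}%")
```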
Observation: The bar chart clearly shows the disparity in the frequency distribution among
The most frequent value is "Unknown" with 573 observations, accounting for
approximately 44.5% of the total observations. For the purposes of interpretation, we
assume that employees with an “Unknown” level are regular employees who rank above
the Junior and Staff levels but below the Senior level.
Next is "Sr." (Senior) with 485 observations (37.7%), "Staff" with 80 observations
Other values such as "Principal" (41 observations with 3.2%) and "Lead" (34
o Distinguished
o Principal
o Lead
o Senior
o Regular (Unknown)
o Staff
o Junior
Salary | Values
Statistic Value
Count 1209
Mean 150969.91
Std 33377.15
Min 0.0
25% 130000.0
75% 170000.0
Max 434000.0
Observations:
There are a total of 1209 observations for the "Salary" variable.
The mean salary is 150,969.91 USD, which is around 3.8 billion VND per year or
around 320 million VND per month. The average wage for VN Data Scientists is 27.5
the data, which is an estimated 850 million VND difference per year
The minimum salary value is 0, which likely reflects people working without pay in order
to gain experience. Very few observations fall into this group.
The salary at the 25th percentile (Q1) is 130000.00 or around 3.3 billion or 275
The median salary is 150,000.00 USD, or around 3.8 billion VND per year (320 million VND per month).
The salary at the 75th percentile (Q3) is 170000.00 or around 4.3 billion or 360
The maximum salary value is 434,000.00 USD, or around 11 billion VND per year (920 million VND
per month).
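Summary statistics of this kind can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical sample rather than the report's 1209 salary values, and it assumes linearly interpolated quartiles (the "inclusive" method).

```python
import statistics

# Hypothetical yearly salaries in USD (not the report's data).
salaries = [90000, 120000, 130000, 150000, 150000, 170000, 180000, 210000]

# Quartiles with linear interpolation between data points.
q1, median, q3 = statistics.quantiles(salaries, n=4, method="inclusive")

summary = {
    "count": len(salaries),
    "mean": statistics.mean(salaries),
    "std": statistics.stdev(salaries),  # sample standard deviation
    "min": min(salaries),
    "25%": q1,
    "50%": median,
    "75%": q3,
    "max": max(salaries),
}

for name, value in summary.items():
    print(f"{name:<6}{value:>12.2f}")
```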
Unique 13 7
Top Data Scientist Regular
Observations:
The "Career" variable has 13 unique values, with "Data Scientist" being the most
The "Levels" variable has 7 unique values, with "Regular" being the most frequent
(573 times).
In this section, we explore which features exhibit a positive correlation with each other.
This helps us determine if there is a relationship between two variables. The values in a
correlation matrix range from -1 to 1: values closer to 1 indicate a stronger positive
correlation between the respective variables, while values closer to -1 indicate a stronger
negative correlation.
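For reference, the Pearson correlation coefficient between two variables can be computed as in the sketch below. The indicator column and salaries are hypothetical; they only illustrate how a one-hot encoded level such as Senior can be correlated with salary.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 1 if the employee is Senior, else 0.
is_senior = [0, 0, 1, 1, 0, 1]
# Hypothetical salaries in thousand USD for the same six employees.
salary = [100, 110, 150, 160, 120, 170]

r = pearson(is_senior, salary)
print(f"correlation between Senior level and salary: {r:.2f}")
```

A value near 1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.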
Observations:
Job Level and Salary Correlation: The Senior level and Salary have a positive
correlation of 0.22. This indicates that senior-level employees tend to have higher
salaries. Similarly, principal job level also shows a positive correlation of 0.11 with
Career Path and Salary Correlation: Careers like Director of Data Science have the
highest positive correlation with salary (0.33), suggesting that individuals in this role
tend to earn higher salaries. Other significant correlations include Head of Data
Science (0.25) and Vice President of Data Science (0.24), indicating that these high-
Weak Correlations with Salary: Certain career paths have weak or negative
correlations with salary, such as Data Analyst (-0.19) and Statistician (-0.09). This
suggests that these roles might not be as lucrative compared to others within the
dataset. However, given that we start out at entry level jobs like this, it allows us to
Inter-career Path Correlations: There are strong positive correlations between
various managerial and senior roles. For example, Head of Data Science and Director
Career and Job Level Correlations: Applied Scientist has a low positive correlation
with salary (0.01) and a moderate correlation with Data Scientist (0.23). This suggests
From these observations, to predict salary, we should focus on variables with higher
correlations with salary, such as the following: Career paths like Director of Data Science,
Head of Data Science, and Vice President of Data Science, which show significant positive
correlations with salary. Moreover, Job levels such as Senior and Principal, which also
This histogram shows the distribution of salaries. Salaries are concentrated between
120,000 and 180,000 USD, where the majority of the people surveyed fall. The 160,000 to
180,000 range contains the largest number of people, followed by the 120,000 to 140,000
range and then the 140,000 to 160,000 range. The remaining ranges account for a much smaller
share. Very few people are in the lowest ranges (around 20,000 and below 60,000), and very
few earn above 420,000.
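The salary ranges discussed above correspond to histogram bins. A minimal sketch of the binning, assuming a bin width of 20,000 USD and using hypothetical salaries, is shown below.

```python
from collections import Counter

BIN_WIDTH = 20_000  # assumed bin width in USD


def bin_label(salary):
    """Return the (lower, upper) bounds of the bin that contains the salary."""
    low = (salary // BIN_WIDTH) * BIN_WIDTH
    return (low, low + BIN_WIDTH)


# Hypothetical salaries in USD (not the report's data).
salaries = [125000, 135000, 150000, 165000, 170000, 175000, 430000]

histogram = Counter(bin_label(s) for s in salaries)
for (low, high), count in sorted(histogram.items()):
    print(f"{low:>7,} to {high:>7,}: {'#' * count}")
```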
The chart shows the variables that are most highly correlated with salary. The most
influential variable is Ontology, with a correlation level above 0.3, and the least influential
variable is ARQL, with a correlation close to 0. The correlation level shows which factors
salary is most strongly associated with. It does not indicate a positive or negative effect, but it can indicate what
Figure 10. Pivot Table
The table above is a Pivot table providing information on average salary by career and levels.
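A pivot table of this kind groups the rows by career and level and averages the salary in each group. The sketch below shows the idea with hypothetical rows; the figures are not taken from the report's table.

```python
from collections import defaultdict

# Hypothetical (career, level, salary in USD) rows.
rows = [
    ("Data Scientist", "Senior", 160000),
    ("Data Scientist", "Senior", 170000),
    ("Data Scientist", "Junior", 120000),
    ("Data Analyst", "Junior", 90000),
    ("Data Analyst", "Junior", 100000),
]

# Collect the salaries that belong to each (career, level) pair.
groups = defaultdict(list)
for career, level, salary in rows:
    groups[(career, level)].append(salary)

# pivot[career][level] holds the average salary of that group.
pivot = defaultdict(dict)
for (career, level), values in groups.items():
    pivot[career][level] = sum(values) / len(values)

for career, by_level in pivot.items():
    print(career, by_level)
```

Cells with no matching rows are simply absent, which is why some careers in the table report values for only one or two levels.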
Applied Scientist: The highest salary is at the ‘Principal’ rank at 130,000 USD,
Data Analyst: Salaries at the ‘Junior’ and ‘Regular’ ranks are lower than at other
Data Architect: Only salary data is available for the ‘Distinguished’ and ‘Senior’
Data Engineer: ‘Senior’ salary is 139,310.34 USD and ‘Regular’ is 133,846.15 USD.
Data Science Manager: Data is available only for the ‘Regular’ rank, with a salary of
160,365.08 USD.
Data Scientist: Salary is evenly distributed across ranks, with 'Lead' the highest at
Director of Data Science: Only has data for the rank ‘Regular’ with a very high
Head of Data Science: Also only has data for ‘Regular’ with a salary of 175,000
USD.
Machine Learning Engineer: The highest salary is at the ‘Principal’ level with
200,000 USD and the lowest is at the ‘Regular’ level with 143,733.33 USD.
'Principal' the highest at 170,000 USD and 'Lead' the lowest at 120,000 USD.
Software Engineer: The highest salary for ‘Senior’ is 174,285.71 USD and ‘Junior’
is 151,379.31 USD.
Statistician: Only data is available for ‘Junior’ and ‘Senior’, with ‘Senior’ being
- Data Scientist: salaries mostly range from 150,000 to 170,000 USD (over 50% of respondents
reach this level). More broadly, over 90% of respondents earn between 100,000 and 190,000
USD. Few people earn under 90,000 USD or over 200,000 USD. Especially in the range of
100,000 to 190,000 USD, there are
- Data Analyst: salaries are less differentiated than for Data Scientists. The 125,000 to
150,000 USD range is the salary level most people achieve. Next are those at 100,000 to
150,000 and 150,000 to 175,000, and the least are those with salaries
- Machine Learning: the salary distribution is highly concentrated, with over 70 people at
160,000 USD and only a few people at each of the remaining salary levels. This is also the
average level in the industry, which shows that the majority of survey participants have a
salary equal to the average salary.
- Director of Data Science: salaries split clearly into two groups. The majority of
respondents earn below 250,000 USD, most commonly 200,000 USD. No one falls in the range
from 250,000 to 400,000 USD, while 3 people earn above 400,000 USD. The number of people
working in this role is quite small in the data, so
100,000 to 175,000 USD (accounting for over 90% of survey participants). The salary level
of 150,000 USD has the most people achieving it with more than 50 people reaching this
level.
- Data Engineer: this role has the most skewed salary distribution, with over 80% earning
100,000 to 150,000 USD and very few at the remaining levels.
- Software Engineer: salaries are concentrated in the range of 130,000 to 180,000 USD. The
majority of survey participants fall in this range, with a few in the remaining ranges.
Only a few people participating in the survey have a salary
- Data Science Manager: salaries are fairly evenly distributed between 120,000 and 180,000
USD. The salary levels of 120,000, 165,000 and 180,000 each have a fairly similar number of
survey participants. People with salaries of 140,000,
- Data Architect: the number of respondents in this role is relatively small, with two people
earning 200,000 to 210,000 USD and one person in each of the ranges 170,000 to 180,000,
180,000 to 190,000 and 190,000 to 200,000. This gives a relatively small sample compared to
other jobs in the data science field.
- Applied Scientist: the number of people in this role is relatively small compared to the
other fields of data science. Salaries average around 160,000 USD per annum, go as low as
110,000 USD, and peak around 180,000 USD per annum.
- Statistician: a total of three people took part in the survey, two of whom were paid
85,000 to 90,000 USD and only one around 105,000 USD. There is seemingly a gap between
salary ranges as they had
- Data science leadership positions tend to have quite good salaries, from around 120,000
USD at the lowest up to 220,000 USD at the peak. Because these roles represent and manage
their teams, a correspondingly higher salary is to be expected.
- Main Observation: We can see that the charts with high correlation are the charts where
the number of survey participants is very small and, in these charts, the statistical significance
percentiles, outliers
DS: the average is 150,000 for this job, with salaries concentrated in the range of
130,000 to 170,000 and a smaller share in the remaining ranges, which extend from
70,000 to 230,000. There are 2 outliers, at salary levels 0 and 340,000.
MLE: the average salary reaches 170,000, with values concentrated from 140,000 to
180,000 and extending from 100,000 to 210,000. There are 2 outliers, at 80,000 and
250,000.
MLS: the majority of salaries range from 140,000 to 170,000, with a concentration
profession is relatively small, so the number is divided and spread across many
SE: have an average salary of 150,000 and their salary figures range from 130,000 to
180,000, with the majority concentrated around 90,000. to 240,000 for those working
in data analysis, the salary is lower than average at 130,000 and ranges from about
100,000 to 140,000 with the focus of mainly working around 60,000 to 200,000 and
For the data science management industry, like the DSM, DDM and HDM, the
240,000. The salary for workers is from about 160,000 up to 250,000. There are only
two exceptions for people with a salary of 140,000, while a high salary range is may
Figure 11. Salary Distribution by Job Level
of 130,000 and are concentrated in the range of 120,000 to 140,000. Smaller numbers
range from 60,000 to 130,000 and 160,000 to 210,000. There is a small amount
distributed from 100,000 to 210,000 and a few small values scattered around 90,000,
Staff: The average is around 150,000. The majority is concentrated from 140,000 to
170,000, over 90% fall between 80,000 and 190,000, and there is an outlier at
250,000.
Junior: The average is around 150,000 and concentrated from 130,000 to 160,000,
the majority of people at this level are distributed from 90,000 to 190,000 and there
190,000, the majority of people at this level have salaries ranging from 100,000 to
220,000 and only a very small group is at the 80,000 and 220,000 marks.
Lead: The average is around 170,000 and there is a concentrated distribution from
150,000 to 180,000. Over 95% of respondents fall between 110,000 and 200,000, with only 1
outlier at 240,000.
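Outliers in box plots are commonly flagged with the 1.5 x IQR rule. The sketch below assumes that convention (the report does not state which rule its plots use) and applies it to a hypothetical sample that mimics the Data Scientist case, where 0 and 340,000 stand apart from the rest.

```python
import statistics


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical salaries in USD (not the report's data).
sample = [0, 130000, 140000, 150000, 150000, 160000, 170000, 340000]

print(iqr_outliers(sample))
```

Here the quartiles are 137,500 and 162,500, so the fences sit at 100,000 and 200,000 and only the two extreme values are flagged.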
roles within the Data Science and Information Technology fields. See Figure 12.
Observations:
reflecting the high demand for professionals who can apply scientific methods to
other roles in the data science domain. This difference is likely due to the focus of
Data Analysts on data processing and analysis rather than developing complex
models.
Data Architects have the second-highest average salary in the chart, at $192,000.00. This significant
Data Science Managers have an average salary of $160,365.08. This role requires
both management skills and extensive experience in data science, which justifies the
Data Scientists have a solid average salary of $147,318.79. This popular role
demands deep analytical skills and the ability to build models, making it one of the
The Director of Data Science position boasts an average salary of $231,909.09, the
highest in the chart. This role involves significant responsibility in strategic direction
Machine Learning Engineers have an average salary of $158,820.59, indicating the
Statisticians have the lowest average salary in the chart at $93,333.33. This may be
due to their focus on traditional statistical analysis rather than the more complex tasks
in data science.
Overall, the salary chart clearly indicates the value placed on roles within the Data Science
and IT fields, with leadership and highly specialized technical roles commanding the highest
salaries.
The provided bar chart below (Figure 13) illustrates the top 10 average salaries by career
within the Data Science and Information Technology fields, highlighting the highest and
Observations:
The Director of Data Science boasts the highest average salary at approximately
$231,909. This makes it an ideal role for individuals seeking leadership positions with
reflects the critical nature of the role in guiding and overseeing the data science
Following closely is the Data Architect, with an average salary of $192,000. This
role is essential for designing and maintaining data infrastructure, making it suitable
for those with strong skills in data management and architecture. The high salary
underscores the importance of robust data systems in supporting the analytical and
The Data Scientist has the lowest average salary among the top 10 careers, at
$147,319. While it is the lowest in this elite group, it remains a highly lucrative
position. Data Scientists play a crucial role in extracting insights and building
predictive models from data, highlighting the significant demand and value of their
expertise. It also often plays as the first stepping stone into the field as an Entry Level
The salary chart below provides information on the salary levels for different position levels
Observations:
The Principal level is the highest position for Applied Scientists with an average
The Distinguished level is the highest position for Data Architects with an average
The Principal level is the highest position for roles such as Data Scientist
and Applied Scientist ($130,000.00). This high salary reflects their extensive
The Lead level also commands a relatively high salary, e.g., Data Scientist
The Senior level typically has a lower salary than the Principal and Lead levels but
higher than the Staff and Junior levels (a Senior Data Scientist earns $165,748.32,
while a The Junior level is usually the lowest, with more modest salaries such as
$126,736.00 for a Junior Data Scientist and $154,000.00 for a Junior Machine
The Regular level has varying salaries depending on the position but is generally
The bar chart depicts the average salaries across 7 main position levels: Junior, Staff,
Regular, Senior, Lead, Principal, Distinguished, clearly showing the trend of salaries increasing with
Observations:
The Distinguished level has the highest average salary, around $190,000, befitting the
Next is the Principal level with an average salary of approximately $169,231,
considerably higher than the Lead ($154,309.09) and Senior ($154,093.79) levels.
The Regular and Junior levels have the lowest average salaries, around $62,727.
The salary gap between levels is substantial (going by the averages quoted above, the
Principal level is around $15,000 higher than the Senior level), reflecting the premium
placed on higher-level
Based on the chart, it is evident that the position level factor has a significant impact on
salaries in Data Science. Higher levels like Distinguished, Principal, and Lead command
considerably higher salaries compared to lower levels. This reflects the high regard for the
However, the salary gap between levels varies somewhat across job roles. Observing the
table, one can see that the difference in Principal level salaries between Data Analyst
($96,000.00) and Data Scientist ($179,760.00) is quite large. While the salary differential
between levels within the same field is considerable, the gap appears smaller for Data
Engineers due to a lack of data for multiple levels. Additionally, the Regular level typically
has lower salaries, but there are exceptions, such as the Regular Data Architect earning more
Figure 14. Top 10 Benefits
Observations:
Companies prioritize offering health insurance, dental insurance, social security, paid
time off (PTO), and vision insurance as essential benefits for several compelling
reasons.
Social security and disability insurance provide financial security, offering peace of
Paid time off and vision insurance further contribute to employee wellness,
Visa sponsorship opens doors for diverse talent, enriching teams with varied
Paid training and opportunities for advancement demonstrate a commitment to
company.
employee retention and loyalty. 401(k) matching means that the company matches the
employee's retirement savings contributions. This is similar to Vietnam’s Social Security
System; however, it has a higher rate of return than that of Vietnam as they essentially
Together, these benefits not only attract top talent but also nurture a motivated
Qualifications in the workplace are fundamental prerequisites that ensure individuals possess
the knowledge, skills, and competencies necessary to perform their roles effectively. These
qualifications serve several crucial purposes in fostering a productive and efficient work
environment.
Firstly, qualifications act as a benchmark for competency and proficiency. They provide
employers with a reliable means to assess an individual's suitability for a specific job role or
reasonably expect that employees have acquired the necessary theoretical knowledge and
fields such as healthcare, engineering, finance, and law, specific qualifications are often
standards and ethical practices. This not only safeguards the integrity of services provided but
also instills trust and confidence among clients, customers, and stakeholders.
development. They encourage individuals to acquire new knowledge, update their skills, and
stay abreast of industry advancements. This ongoing learning process is vital in a rapidly
evolving global economy where technological innovations and market dynamics continually
Furthermore, qualifications enhance career opportunities and mobility for individuals. They
making hiring decisions, recognizing the added value and expertise these individuals can
industry standards, promoting lifelong learning, and advancing career opportunities. They are
essential not only for individual career growth but also for organizational success,
We can separate qualifications into two categories: background of education and skills. We
first examine the data on the background of education, followed by the skills category. The
dataset gives the following overall breakdown of the background of education.
Background of Education Observations:
Doctor’s Degree - The largest group in the dataset, making up 42.6% of the sample. This
suggests that many people in this field have studied extensively in order to enter it.
See the Figure below for their Job Titles.
Doctor of Philosophy - Being the highest standard of education, they also have the
most respondents at 41% of the sample population. The remaining 1.6% did not
Figure 16. Doctorate Level of Education | Job Titles
Data Science Manager (DSM), Director of Data Science (DDS), Data Analyst
- Each at around 2%. People with Doctorate Degrees have been able to reach high
levels such as DSM and DDS after studying and working hard to get that far.
Master’s Degree - Overall, 20.4% of the samples in the dataset hold a Master’s Degree;
15.1% did not disclose their type of degree. See the Figure
Master’s Degree of Science - They compose 4.8% of the Master’s degree
holders.
Bachelor’s Degree: Overall, 14.7% of the samples in the dataset hold a Bachelor’s Degree;
10.4% did not disclose their type of degree. See the
holder
None Applicable or Not Disclosed - They did not disclose what they achieved in
As we can see, job titles differ with each level of educational background. As we aim to
finish this course with a Bachelor's Degree in Business Data Analytics, we have every
opportunity to rise to similar levels of Job Titles and their Salaries
Skills Observations:
These are the top skills in the data science field, as we need them to program and to
support our data analysis. Based on the dataset, every person knows at least one
programming language, which makes these types of skills essential in the field.
Machine learning covers many different areas that must be learned in order to use it
well. They were broken down into multiple different applications that assist
General Skills - Communication Skills, Analysis Skills, Research, Tableau,
These skills are transferable between jobs and job positions. Some, such as
communication skills, allow you to work as a leader, while others, such as research,
analytics and data visualization, allow you to perform well even in different
industries.
In conclusion, we must have skills in these three areas in order to succeed in our chosen fields.
Logistic Regression
In order to figure out the factors to the Attrition rate, we analyzed the Training Data by
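The linear regression equation is a statistical model used to explain the relationship between
a dependent variable and independent variables. The equation takes the form:
Y = β₀ + β₁X₁ + β₂X₂ + ... + βᵣXᵣ + ε
- Y is the value of the dependent variable, β₀ is the intercept and ε is the error term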
- X₁, X₂,.. are the values of the independent variables
- β₁, β₂,.., βᵣ are regression coefficients, which represent the degree of influence of
The linear regression equation for the effect on Salary provided by the model is
The regression model results show that Levels has a positive effect on salary, while Career
has a negative effect. Qualifications and Benefits have p-values within the accepted range
(< 0.05), but their level of influence is smaller. The standard deviation of the residuals
shows that the model may not fully explain the variation in Salary.
R-squared = 0.75 indicates that the model explains 75% of the variance in Salary.
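To make the R-squared figure concrete, the sketch below computes it from actual and predicted values as one minus the ratio of the residual sum of squares to the total sum of squares. The four data points are hypothetical and were chosen so that the result is 0.75; they are not the report's data.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that is explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical salaries and model predictions in thousand USD.
actual = [100, 120, 140, 160]
predicted = [115, 105, 145, 155]

r2 = r_squared(actual, predicted)
print(f"R-squared = {r2:.2f}")
```

In this example the residual sum of squares is 500 and the total sum of squares is 2000, so a quarter of the variance is left unexplained.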
Levels: Levels have the biggest influence on the salary of survey participants:
the higher the level, the higher the respondent's salary. The coefficient table shows
that the Levels, Qualifications and Benefits variables all have positive
coefficients, meaning that salary increases as these factors increase. The Levels
variable has the largest positive impact, with coefficient = 0.643. The Benefits
variable has coefficient = 0.308. So it can be understood that the higher the level of
work experience, the higher the salary, and the more benefits a company has, the
Career: The relationship between Career and Salary is very weak and negative. The
model reads this as salary tending to decrease when a person changes to another job.
Taken literally, this would mean that salary tends to decrease as a career progresses,
or conversely that salary may increase when a career does not develop. This may
reflect cases in which people whose careers are not developing still receive high
salaries because of many years of working experience. Next, there is the variable with
negative coefficient which is
career. The negative coefficient effect shows that job and degree/certificate quality
career and salary may indicate a negative relationship between these two variables.
and in the same direction, showing that as qualifications increase, income will also
increase and in the future the income of those who study to improve their
coefficient of 0.503 with salary, which shows a positive relationship between these
two variables. That means, as the level of education increases, the salary also tends to
increase, and vice versa. This reflects the view that higher levels of education are
often associated with greater productivity and job performance, leading to higher
wages.
Benefits: As benefits increase, salaries also tend to increase slightly.
However, this relationship is not strong enough to conclude that benefits are the main
On the other hand, another study found that only salary, not benefits, significantly
predicted job satisfaction among higher education employees. This shows that while
benefits can have an influence on salary, they are not always the most important
factor. The correlation coefficient of 0.308 between benefits and salary shows a
o Insight: The model predicts that higher levels of work experience correlate with
higher salaries.
2. Job Change and Salary:
of changing jobs. Negotiating a competitive salary based on current market rates and
demonstrating the value of skills and experience can mitigate potential salary
decreases.
o Insight: Higher levels of education are associated with greater productivity and lead to
higher wages.
levels as a factor in salary determination, aligning compensation with the value added
o Insight: Benefits tend to increase with wages, and higher wages lead to better benefits.
benefits that align with employees' needs and expectations. As wages increase, revisit
offering.
compensation strategy:
Design comprehensive compensation packages that reflect the interplay between
companies can foster a motivated and productive workforce while remaining competitive in
the market.
o Continuously seek opportunities to gain relevant work experience and skills that
are in demand.
o Research and benchmark your salary against industry standards to ensure you are
o Research and understand current market salary trends and ranges for your
o Highlight your skills, achievements, and unique value proposition during job
o Negotiate not just base salary but also consider other benefits and perks that can
offset a potential decrease in salary (e.g., signing bonuses, stock options, flexible
work arrangements).
o Evaluate the total compensation package, including benefits, to assess the overall
o Pursue higher education or certifications that are relevant to your field and career
goals.
o Seek out roles or projects that specifically value and require higher levels of
education.
o Negotiate salary adjustments or promotions that reflect the added value of your
educational achievements.
o Understand how benefits are structured and linked to your salary level within
o Negotiate for benefits that align with your personal and financial needs, such as
opportunities.
o Consider the long-term implications of benefits like retirement plans and stock
options, which can enhance your overall compensation beyond base salary.
By strategically approaching these recommendations, individuals can optimize their total
compensation package while navigating the dynamics of salary, benefits, experience, and
education as predicted by the model. It's crucial to tailor these strategies to individual career
goals, market conditions, and organizational contexts for the best outcomes.
CONCLUSION
The goal of this report is to analyze how factors ranging from qualifications to benefits
relate to employee salaries, clarifying the relationship between career paths and pay. Based on
the analysis of the "Data Science Job Postings and Salaries Dataset" regarding occupations, annual
salaries, average salaries, job levels, and required skills, we have performed data cleaning,
recommendations.
One of the most significant findings is the substantial impact of work experience on salary
levels. Our analysis revealed a positive correlation between work experience and employee
Additionally, the report shows that frequent job changes can lead to a decrease in salary. This
emphasizes the need for careful career decisions, considering the potential financial impacts
Furthermore, our analysis clarified the positive relationship between benefits and salaries,
indicating that higher salaries are often accompanied by better benefit packages. This
underscores the importance of considering both salary and benefits when evaluating job
Based on these significant findings, we also provide targeted strategies for individuals to
maximize their salaries and benefits, including skill development, effective negotiation
benefits.
salaries and benefits, assessing the necessary job skills in the data science field. The findings
from this research will be invaluable for students preparing for their future careers, helping
them better understand the skills they need to develop, and the salaries and benefits they can
expect when entering the job market in the data science industry.
APPENDIX
CODE 2: file:///C:/Users/Admin/Downloads/Untitled15-1%20(2).html
(PANDAS)
CONTRIBUTION
Student’s name Student ID Contribution Evaluate
Contribution
Dataset
-Methodology
-Chapter 1: Descriptive A
Variables, Descriptive
Variables
-Chapter 2: Relationship
level
-Conclusion
Code 3)
-Assign work
formatting
-Slides
Nguyễn Lê Kiều Trang 23070838 -Introduction: Background
of the Analysis
-Chapter 1: Data
Preprocessing A
-Chapter 2: Overview of
salary level
(Appendix: Code 1)
-Slides
Salaries, Relationship
Evaluation, Interpreting
Model Results
(Appendix: Code 2)
Nguyễn Ánh Nguyệt 23070990 -Chapter 2: Overview of
Salaries, Relationship B
Evaluation, Interpreting
Model Results
(Appendix: Code 2)
benefits
-Slides
-Paper formatting