Python Codes

Uploaded by sumedh.mba24070

IIM Kashipur

Master of Business Administration

Analytics in Business, Term III, Academic Year 2024-26

AB Assignment 4

Section A
Submitted to: Prof. Rajiv Kumar
Submitted on: 09th February 2025

(Group 6)
MBA24040 – Yash Kulkarni
MBA24012 – Ashutosh Singh
MBA24019 – Devansh Rawat
MBA24307 – Rigzin Angmo
MBA24070 – Sumedh Inamdar
MBA24005 – Akash Bhanja

Q1. Clean the data (you may find 999 and 998 values, which need to be taken care of; you can exclude those responses).

Code:

import pandas as pd
import matplotlib.pyplot as plt
file_path = "MBA Starting Salaries.xlsx"
xls = pd.ExcelFile(file_path)
df = pd.read_excel(xls, sheet_name="Salaries")
# Define columns to clean (assuming numerical values should not contain 999 or 998)
columns_to_clean = ["salary", "satis"] # You can add more columns if needed
# Remove rows where any of the selected columns contain 999 or 998
cleaned_df = df[~df[columns_to_clean].isin([999, 998]).any(axis=1)]
# Plotting the cleaned data
plt.figure(figsize=(8, 5))
plt.hist(cleaned_df['salary'], bins=20, color='orange', edgecolor='black')
plt.xlabel('Salary')
plt.ylabel('Frequency')
plt.title('Distribution of Cleaned Salaries')
plt.show()
Output:

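The filtering step above can be wrapped in a small helper that also reports how many responses were excluded, which is worth stating alongside any results. This is a minimal sketch, not the group's code: the function name `drop_sentinel_rows` and the toy values are hypothetical, and the 998/999 codes are the ones named in the question.

```python
import pandas as pd

# Codes the question says mark unusable responses
SENTINEL_CODES = (998, 999)


def drop_sentinel_rows(df: pd.DataFrame, columns: list[str],
                       codes=SENTINEL_CODES) -> tuple[pd.DataFrame, int]:
    """Drop rows where any of `columns` holds a sentinel code.

    Returns the cleaned frame and the number of rows removed.
    """
    mask = df[columns].isin(list(codes)).any(axis=1)
    return df.loc[~mask].copy(), int(mask.sum())


# Hypothetical toy data, not the assignment file
toy = pd.DataFrame({
    "salary": [85000, 999, 98000, 998, 92000],
    "satis": [6, 5, 998, 7, 4],
})
cleaned, n_dropped = drop_sentinel_rows(toy, ["salary", "satis"])
print(cleaned)
print(f"Rows dropped: {n_dropped}")
```

On the assignment data the call would be `cleaned_df, n_dropped = drop_sentinel_rows(df, ["salary", "satis"])`.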
Q2. Is there any variable (e.g., age, gender, language spoken, work experience) that affects how much a student can expect to make?
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load and clean data (drop rows coded 999/998 in salary or satis)
file_path = "MBA Starting Salaries.xlsx"
df = pd.read_excel(file_path, sheet_name="Salaries")
df = df[~df[["salary", "satis"]].isin([999, 998]).any(axis=1)]
# Correlation of every numeric variable with salary
# (numeric_only=True avoids an error on non-numeric columns in pandas >= 2.0)
salary_correlation = df.corr(numeric_only=True)["salary"].sort_values(ascending=False)
# Visualization: boxplot of salary by gender
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x="sex", y="salary")
plt.xlabel("Gender (1=Male, 2=Female)")
plt.ylabel("Salary")
plt.title("Salary Distribution by Gender")
plt.show()
# Display correlation results (a bare expression only displays in a notebook)
print(salary_correlation)

Output:

The correlation analysis shows that most academic and experience-related factors are not strongly associated with starting salaries.
• Salary shows no significant correlation with GMAT scores, GPA, or work experience, indicating that higher academic performance or prior experience does not necessarily lead to higher pay.
• A weak but significant positive correlation is found between satisfaction and salary,
suggesting that students who earn higher salaries may feel more satisfied with the MBA
program.
• Work experience has a weak positive correlation with Spring GPA but a weak negative
correlation with GMAT scores, possibly indicating that students with more experience
scored slightly lower on GMAT but performed slightly better academically during the
program.
• GPA (Spring & Fall) shows a strong positive correlation, meaning students who perform
well in one semester tend to maintain their performance.
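The gender boxplot in the Q2 code compares salaries only visually. One way to attach a significance level to that comparison is Welch's two-sample t-test, a technique not used in the original analysis. The sketch below is illustrative only: the function name `salary_gap_by_gender` and the toy data are hypothetical, and it assumes `sex` is coded 1 = male, 2 = female, as in the plot label.

```python
import numpy as np
import pandas as pd
from scipy import stats


def salary_gap_by_gender(df: pd.DataFrame) -> dict:
    """Welch two-sample t-test of salary, assuming sex is coded 1 = male, 2 = female."""
    male = df.loc[df["sex"] == 1, "salary"]
    female = df.loc[df["sex"] == 2, "salary"]
    # equal_var=False selects Welch's test, which does not assume equal group variances
    t_stat, p_value = stats.ttest_ind(male, female, equal_var=False)
    return {
        "mean_male": male.mean(),
        "mean_female": female.mean(),
        "t": float(t_stat),
        "p": float(p_value),
    }


# Hypothetical toy data for illustration only
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "sex": [1] * 40 + [2] * 40,
    "salary": np.concatenate([rng.normal(100_000, 15_000, 40),
                              rng.normal(100_000, 15_000, 40)]),
})
result = salary_gap_by_gender(toy)
print(result)
```

On the cleaned assignment data this would be called as `salary_gap_by_gender(df)`. Welch's version is used because the two groups need not have equal salary variances.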

Q3. Do any correlations exist between the variables (e.g., GMAT total score, salary, Spring grades, years of work experience, satisfaction with the program)?
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as st
# Load and clean data: drop missing values and any row containing a 999/998 code
df = pd.read_excel("MBA Starting Salaries.xlsx", sheet_name="Salaries").dropna()
df = df[~df.isin([999, 998]).any(axis=1)]
# Pearson correlation and p-value for each variable pair
corr_vars = [
    ('gmat_tot', 'satis'),
    ('salary', 'work_yrs'),
    ('s_avg', 'salary'),
    ('f_avg', 'salary'),
    ('gmat_tot', 'salary'),
    ('satis', 'salary'),
    ('work_yrs', 's_avg'),
    ('gmat_tot', 'work_yrs'),
    ('s_avg', 'f_avg'),
    ('satis', 'work_yrs')
]
for var1, var2 in corr_vars:
    corr, p_val = st.pearsonr(df[var1], df[var2])
    print(f"Correlation between {var1} and {var2}: r = {corr:.4f}, p-value = {p_val:.4f}")
# Correlation heatmap of the key variables
sns.heatmap(df[["gmat_tot", "salary", "s_avg", "work_yrs", "satis"]].corr(),
            annot=True, cmap="coolwarm", fmt=".2f")
plt.show()

Output:

• Satisfaction & Salary: A weak but significant positive correlation (r = 0.1564, p = 0.0298).
• Work Experience & Spring GPA: Weak positive correlation (r = 0.1591, p = 0.0270).
• GMAT & Work Experience: Weak negative correlation (r = -0.1737, p = 0.0157).
• No significant relationship between GMAT score and salary, or between GPA and salary.

Overall, the findings suggest that academic performance and work experience have minimal
impact on starting salaries.
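The results above pair weak correlations (|r| below 0.2) with p-values under 0.05. This is possible because the p-value of a Pearson correlation depends on the number of pairs n as well as on r, through the statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. A minimal sketch of that relationship; the helper name `pearson_p_value` and the sample sizes in the loop are illustrative assumptions, not values from the assignment data.

```python
import math

from scipy import stats


def pearson_p_value(r: float, n: int) -> float:
    """Two-sided p-value for a Pearson correlation r computed on n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) with n - 2 degrees of freedom,
    which is equivalent to the default test in scipy.stats.pearsonr.
    """
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)


# The same weak r gives a smaller p-value as the sample grows
for n in (30, 100, 200):
    print(n, round(pearson_p_value(0.16, n), 4))
```

With r held fixed, the printed p-value shrinks as n grows, which is why correlations this weak can be statistically significant while explaining only about 2 to 3 percent of the variance (r squared).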

Q4. Can starting salaries be predicted based on graduating data (i.e., GPA, gender, work experience)?
Code:
import pandas as pd
import statsmodels.api as sm
# Load and clean data
df = pd.read_excel("MBA Starting Salaries.xlsx", sheet_name="Salaries").dropna()
df = df[~df.isin([999, 998]).any(axis=1)]
# Build the regression model: salary ~ Spring GPA + gender + work experience
x = sm.add_constant(df[["s_avg", "sex", "work_yrs"]])
y = df["salary"]
model = sm.OLS(y, x).fit()
print(model.summary())

Output:

The regression model aimed to predict starting salary based on GPA (Spring), gender, and
work experience, but the results indicate poor predictive power.
• Adjusted R² = -0.0014 → After adjusting for the number of predictors, the model explains essentially none of the variation in salary and does no better than simply predicting the mean salary for every student.
• The p-values for GPA, gender, and work experience are all above 0.05, so none of these factors has a statistically significant effect on salary at the 5% level.
• Negative Adjusted R² suggests that including these variables adds no real explanatory
power and may even introduce unnecessary complexity.
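The bullets above hinge on how adjusted R² penalizes extra predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors excluding the intercept. It falls below zero whenever R² < k/(n - 1), which is the R² expected from k predictors that are pure noise. A minimal sketch; the helper `adjusted_r2` is hypothetical, and n = 100 in the loop is an illustrative value, not the assignment's sample size.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors (excluding the intercept) fit on n rows."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# With three predictors (s_avg, sex, work_yrs), a near-zero R^2 turns negative
for r2 in (0.0, 0.01, 0.02, 0.10):
    print(r2, round(adjusted_r2(r2, n=100, k=3), 4))
```

For the fitted statsmodels model in the Q4 code, `adjusted_r2(model.rsquared, int(model.nobs), int(model.df_model))` should reproduce `model.rsquared_adj`, since `df_model` counts the predictors without the constant.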
Conclusion
The regression analysis indicates that GPA, gender, and work experience do not significantly predict starting salaries. The negative adjusted R² suggests that salary variation is driven by factors outside this model, possibly industry trends, negotiation skills, job location, or personal networking.
This aligns with the correlation analysis, reinforcing that traditional academic and experience
metrics are poor indicators of salary outcomes.
