
Most Asked Spark Interview Questions
12 LPA - 20 LPA

Question 1: Student Grade Classification

Problem:

You have a DataFrame of students with the following columns: student_id, name, score, and subject.

Create a new column grade based on the score:

- 'A' if score >= 90
- 'B' if 80 <= score < 90
- 'C' if 70 <= score < 80
- 'D' if 60 <= score < 70
- 'F' if score < 60

Data Set

student_id  name    score  subject
1           Alice   92     Math
2           Bob     85     Math
3           Carol   77     Science
4           Dave    65     Science
5           Eve     50     Math
6           Frank   82     Science
Scala Spark

Spark SQL
PySpark
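A minimal PySpark sketch of one possible solution, using a chained when/otherwise expression; the DataFrame name students_df is illustrative:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

spark = SparkSession.builder.appName("student_grades").getOrCreate()

data = [(1, "Alice", 92, "Math"), (2, "Bob", 85, "Math"),
        (3, "Carol", 77, "Science"), (4, "Dave", 65, "Science"),
        (5, "Eve", 50, "Math"), (6, "Frank", 82, "Science")]
students_df = spark.createDataFrame(data, ["student_id", "name", "score", "subject"])

# Branches are evaluated top-down, so each only needs a lower bound
graded_df = students_df.withColumn(
    "grade",
    when(col("score") >= 90, "A")
    .when(col("score") >= 80, "B")
    .when(col("score") >= 70, "C")
    .when(col("score") >= 60, "D")
    .otherwise("F"))

graded_df.show()
# Expected grades: Alice=A, Bob=B, Carol=C, Dave=D, Eve=F, Frank=B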

Output -
Question 2:
You have a DataFrame employees with columns: employee_id, name, age, and salary.

Create a new column age_group based on age:

- 'Young' if age < 30
- 'Mid' if 30 <= age <= 50
- 'Senior' if age > 50

Create a new column salary_range based on salary:

- 'High' if salary > 100000
- 'Medium' if 50000 <= salary <= 100000
- 'Low' if salary < 50000

Filter employees whose name starts with 'J'.

Filter employees whose name ends with 'e'.

Data Set -
data = [
    (1, "John", 28, 60000),
    (2, "Jane", 32, 75000),
    (3, "Mike", 45, 120000),
    (4, "Alice", 55, 90000),
    (5, "Steve", 62, 110000),
    (6, "Claire", 40, 40000)
]
Scala Spark -

Spark SQL
PySpark -
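A minimal PySpark sketch covering all four tasks; Column.startswith and Column.endswith handle the name filters:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

spark = SparkSession.builder.appName("employee_groups").getOrCreate()

data = [(1, "John", 28, 60000), (2, "Jane", 32, 75000),
        (3, "Mike", 45, 120000), (4, "Alice", 55, 90000),
        (5, "Steve", 62, 110000), (6, "Claire", 40, 40000)]
employees = spark.createDataFrame(data, ["employee_id", "name", "age", "salary"])

# Derive both category columns in one pass
grouped = (employees
    .withColumn("age_group",
        when(col("age") < 30, "Young")
        .when(col("age") <= 50, "Mid")
        .otherwise("Senior"))
    .withColumn("salary_range",
        when(col("salary") > 100000, "High")
        .when(col("salary") >= 50000, "Medium")
        .otherwise("Low")))

grouped.filter(col("name").startswith("J")).show()  # John, Jane
grouped.filter(col("name").endswith("e")).show()    # Jane, Mike, Alice, Steve, Claire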

Output -
Question 3:
You have a DataFrame purchase_history with columns: purchase_id, customer_id,
purchase_amount, and purchase_date.

Create a new column purchase_category based on purchase_amount:

- 'Large' if purchase_amount > 2000
- 'Medium' if 1000 <= purchase_amount <= 2000
- 'Small' if purchase_amount < 1000

Filter purchases that occurred in January 2024.

Data Set -
[(1, 1, 2500, "2024-01-05"),
 (2, 2, 1500, "2024-01-15"),
 (3, 3, 500, "2024-02-20"),
 (4, 4, 2200, "2024-03-01"),
 (5, 5, 900, "2024-01-25"),
 (6, 6, 3000, "2024-03-12")]
Scala Spark

Spark SQL
PySpark
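A minimal PySpark sketch; because purchase_date is stored as an ISO-format string, the January 2024 filter can simply match on the "2024-01" prefix:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

spark = SparkSession.builder.appName("purchase_categories").getOrCreate()

data = [(1, 1, 2500, "2024-01-05"), (2, 2, 1500, "2024-01-15"),
        (3, 3, 500, "2024-02-20"), (4, 4, 2200, "2024-03-01"),
        (5, 5, 900, "2024-01-25"), (6, 6, 3000, "2024-03-12")]
purchase_history = spark.createDataFrame(
    data, ["purchase_id", "customer_id", "purchase_amount", "purchase_date"])

categorized = purchase_history.withColumn(
    "purchase_category",
    when(col("purchase_amount") > 2000, "Large")
    .when(col("purchase_amount") >= 1000, "Medium")
    .otherwise("Small"))

# Keep only purchases dated in January 2024
categorized.filter(col("purchase_date").startswith("2024-01")).show()
# Expected rows: purchase_ids 1 (Large), 2 (Medium), 5 (Small)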

Output -
Question 4:

Data set -

val employees = List(
  (1, "John", "2020-01-01", "active"),
  (2, "Jane", "2020-06-01", "inactive"),
  (3, "Mike", "2020-03-01", "active"),
  (4, "Alice", "2020-09-01", "inactive"),
  (5, "Steve", "2020-02-01", "active")
)
Scala Spark -

PySpark -
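The problem statement for this question is not included above, so only the setup is sketched. This PySpark snippet loads the dataset; the column names (employee_id, name, hire_date, status) are assumptions inferred from the values:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("q4_setup").getOrCreate()

employees = [(1, "John", "2020-01-01", "active"),
             (2, "Jane", "2020-06-01", "inactive"),
             (3, "Mike", "2020-03-01", "active"),
             (4, "Alice", "2020-09-01", "inactive"),
             (5, "Steve", "2020-02-01", "active")]

# Column names are assumed; the original question may use different ones
df = spark.createDataFrame(employees, ["employee_id", "name", "hire_date", "status"])
df.show()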
Output -
Question 5 -

Data Set -
data = [
    (1, "Order-001", "2022-01-01", 100.0),
    (2, "Order-002", "2022-06-01", 200.0),
    (3, "Order-003", "2022-03-01", 50.0),
    (4, "Order-004", "2022-09-01", 160.0),
    (5, "Order-005", "2022-02-01", 250.0)
]
Scala Spark -

PySpark -
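As with Question 4, the problem statement is not included above; this PySpark snippet only loads the dataset, with column names (order_id, order_name, order_date, amount) assumed from the values:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("q5_setup").getOrCreate()

data = [(1, "Order-001", "2022-01-01", 100.0),
        (2, "Order-002", "2022-06-01", 200.0),
        (3, "Order-003", "2022-03-01", 50.0),
        (4, "Order-004", "2022-09-01", 160.0),
        (5, "Order-005", "2022-02-01", 250.0)]

# Column names are assumed; the original question may use different ones
orders = spark.createDataFrame(data, ["order_id", "order_name", "order_date", "amount"])
orders.show()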

Output -
Created By:

Harshavardhana I

Data Engineer
