Spark 1

The document outlines a series of Spark interview questions focused on data manipulation using DataFrames in Scala and PySpark. It includes tasks such as classifying student grades, creating employee age and salary groups, categorizing purchase amounts, and filtering data based on specific criteria. Each question is accompanied by a dataset and expected outputs, aimed at assessing candidates' proficiency in Spark SQL and DataFrame operations.

Uploaded by vidhyasrividhya10


Most Asked Spark Interview Questions
12 LPA - 20 LPA

Question 1: Student Grade Classification

Problem:

You have a DataFrame of students with the following columns: student_id, name, score, and subject.

Create a new column grade based on the score:

o 'A' if score >= 90

o 'B' if 80 <= score < 90

o 'C' if 70 <= score < 80

o 'D' if 60 <= score < 70

o 'F' if score < 60

Data Set
student_id  name   score  subject
1           Alice  92     Math
2           Bob    85     Math
3           Carol  77     Science
4           Dave   65     Science
5           Eve    50     Math
6           Frank  82     Science
[Solutions (Scala Spark, Spark SQL, PySpark) and Output: images in the original]
Question 2:
You have a DataFrame employees with columns: employee_id, name, age, and salary.

Create a new column age_group based on age:

o 'Young' if age < 30

o 'Mid' if 30 <= age <= 50

o 'Senior' if age > 50

Create a new column salary_range based on salary:

o 'High' if salary > 100000

o 'Medium' if 50000 <= salary <= 100000

o 'Low' if salary < 50000

Filter employees whose name starts with 'J'.

Filter employees whose name ends with 'e'.

Data Set -
data = [
    (1, "John", 28, 60000),
    (2, "Jane", 32, 75000),
    (3, "Mike", 45, 120000),
    (4, "Alice", 55, 90000),
    (5, "Steve", 62, 110000),
    (6, "Claire", 40, 40000)
]
[Solutions (Scala Spark, Spark SQL, PySpark) and Output: images in the original]
Question 3:
You have a DataFrame purchase_history with columns: purchase_id, customer_id, purchase_amount, and purchase_date.

Create a new column purchase_category based on purchase_amount:

o 'Large' if purchase_amount > 2000

o 'Medium' if 1000 <= purchase_amount <= 2000

o 'Small' if purchase_amount < 1000

Filter purchases that occurred in January 2024.

Data Set -
[
    (1, 1, 2500, "2024-01-05"),
    (2, 2, 1500, "2024-01-15"),
    (3, 3, 500, "2024-02-20"),
    (4, 4, 2200, "2024-03-01"),
    (5, 5, 900, "2024-01-25"),
    (6, 6, 3000, "2024-03-12")
]
[Solutions (Scala Spark, Spark SQL, PySpark) and Output: images in the original]
Question 4:

Data Set -
val employees = List(
  (1, "John", "2020-01-01", "active"),
  (2, "Jane", "2020-06-01", "inactive"),
  (3, "Mike", "2020-03-01", "active"),
  (4, "Alice", "2020-09-01", "inactive"),
  (5, "Steve", "2020-02-01", "active")
)
[Solutions (Scala Spark, PySpark) and Output: images in the original]
Question 5:

Data Set -
data = [
    (1, "Order-001", "2022-01-01", 100.0),
    (2, "Order-002", "2022-06-01", 200.0),
    (3, "Order-003", "2022-03-01", 50.0),
    (4, "Order-004", "2022-09-01", 160.0),
    (5, "Order-005", "2022-02-01", 250.0)
]
[Solutions (Scala Spark, PySpark) and Output: images in the original]
Created by: Harshavardhana I, Data Engineer
