Linked Int Question Experience

The document outlines a series of tasks related to data manipulation using Spark DataFrames, including creating an employee database, filtering employees based on salary, calculating averages, and handling CSV and JSON files. It covers operations such as grouping, counting, sorting, and adding new columns, as well as reading data from external sources. Additionally, it includes tasks for analyzing cricket player statistics and city populations in India.

Uploaded by parth

1. Create a DataFrame for an Indian Employee Database

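from pyspark.sql import SparkSession

# Reuse the active session if one exists (as in a notebook), otherwise start one.
spark = SparkSession.builder.getOrCreate()

data = [
    (1, "Amit", "IT", 60000),
    (2, "Priya", "HR", 55000),
    (3, "Rahul", "Finance", 75000),
    (4, "Sneha", "IT", 80000),
    (5, "Karan", "HR", 65000),
]
columns = ["EmpID", "Name", "Department", "Salary"]
df = spark.createDataFrame(data, columns)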

Task: Display the schema and first 3 rows.

2. Filter Employees Earning More than 70,000


Task: Write a query to filter employees earning more than ₹70,000.

3. Calculate Average Salary per Department


Task: Use `groupBy` to get the average salary for each department.

4. Find Employees whose Name Starts with 'A'


Task: Filter employees whose names start with the letter 'A'.

5. Count the Number of Employees per Department


Task: Use `groupBy` and `count()` to find the number of employees in each
department.

6. Add a New Column for Tax Deduction (10% of Salary)


Task: Add a new column `Tax` equal to 10% of `Salary`.

7. Sort Employees by Salary in Descending Order


Task: Display employees sorted in descending order of salary.

8. Get the Second Highest Salary


Task: Find the second highest salary without using `LIMIT` and `OFFSET`.

9. Get Employees Who are in the HR or IT Department


Task: Filter records where the department is either "HR" or "IT".

10. Find the Total Salary Paid by the Company


Task: Calculate the sum of all salaries.

11. Read a CSV File of Cricket Players


Sample CSV (`players.csv`):

Player,Country,Runs,Wickets
Virat Kohli,India,12000,4
Rohit Sharma,India,11000,8
Jasprit Bumrah,India,1200,200
Steve Smith,Australia,9500,20

Task: Read this CSV file into a DataFrame and display its contents.

12. Find the Player with Maximum Runs


Task: Find the player who has scored the maximum runs.

13. Find the Average Runs Scored by Indian Players


Task: Filter players from "India" and calculate the average runs scored.

14. Get Players Who Have Taken More than 50 Wickets


Task: Filter players who have taken more than 50 wickets.

15. Read a JSON File Containing Indian Cities Population

Sample JSON (`cities.json`):

[
{"City": "Mumbai", "State": "Maharashtra", "Population": 20000000},
{"City": "Delhi", "State": "Delhi", "Population": 18000000},
{"City": "Bangalore", "State": "Karnataka", "Population": 12000000},
{"City": "Hyderabad", "State": "Telangana", "Population": 10000000}
]

Task: Read this JSON file into a DataFrame and display its contents.

16. Find Cities with a Population Greater than 15 Million


Task: Filter cities with a population greater than 15 million.

17. Calculate Total Population per State


Task: Group by `State` and sum the `Population`.

18. Find the State with the Highest Total Population


Task: Identify which state has the highest total population.

19. Convert a DataFrame to Pandas


Task: Convert the `df` DataFrame into a Pandas DataFrame.
