The document outlines a PySpark program to identify pairs of actors and directors who have collaborated at least three times. It includes the problem statement, input data structure, expected output, and the PySpark code to achieve the solution. The final output shows that actor 1 has worked with director 1 three times.
PYSPARK LEARNING HUB

WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR

Step - 1 : Problem Statement

Actors and Directors Who Cooperated At Least Three Times

Write a PySpark program for a report that provides the pairs
(actor_id, director_id) where the actor has cooperated with
the director at least 3 times.

Difficulty Level : EASY


DataFrame:
schema = StructType([
    StructField("ActorId", IntegerType(), True),
    StructField("DirectorId", IntegerType(), True),
    StructField("timestamp", IntegerType(), True)
])

data = [
(1, 1, 0),
(1, 1, 1),
(1, 1, 2),
(1, 2, 3),
(1, 2, 4),
(2, 1, 5),
(2, 1, 6)
]
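
Before writing any Spark code, the expected result can be cross-checked with a quick plain-Python sketch of the same grouping logic, using collections.Counter. This is only a sanity check on the sample data above, not part of the PySpark solution:

```python
from collections import Counter

data = [
    (1, 1, 0), (1, 1, 1), (1, 1, 2),
    (1, 2, 3), (1, 2, 4),
    (2, 1, 5), (2, 1, 6),
]

# Count how often each (actor, director) pair appears, ignoring the timestamp
pair_counts = Counter((actor, director) for actor, director, _ in data)

# Keep only pairs that cooperated at least 3 times
result = [pair for pair, n in pair_counts.items() if n >= 3]
print(result)  # [(1, 1)]
```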


Step - 2 : Identifying The Input Data And Expected Output
INPUT

+--------+-----------+---------+
|ACTOR_ID|DIRECTOR_ID|TIMESTAMP|
+--------+-----------+---------+
|       1|          1|        0|
|       1|          1|        1|
|       1|          1|        2|
|       1|          2|        3|
|       1|          2|        4|
|       2|          1|        5|
|       2|          1|        6|
+--------+-----------+---------+

OUTPUT

+--------+-----------+
|ACTOR_ID|DIRECTOR_ID|
+--------+-----------+
|       1|          1|
+--------+-----------+


Step - 3 : Writing The PySpark Code To Solve The Problem
# Creating the Spark session

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession. \
    builder. \
    config('spark.shuffle.useOldFetchProtocol', 'true'). \
    config('spark.ui.port', '0'). \
    config("spark.sql.warehouse.dir", "/user/itv008042/warehouse"). \
    enableHiveSupport(). \
    master('yarn'). \
    getOrCreate()
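
Note that the builder options above (the YARN master, the warehouse directory under /user/itv008042, the old shuffle fetch protocol) are specific to the cluster this was written on. To follow along locally, a minimal session along these lines should be enough (a sketch, assuming only a local PySpark installation; no Hive or YARN needed for this example):

```python
from pyspark.sql import SparkSession

# Minimal local session for trying the example on a laptop
spark = SparkSession.builder \
    .appName("actors-directors") \
    .master("local[*]") \
    .getOrCreate()
```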

schema = StructType([
    StructField("ActorId", IntegerType(), True),
    StructField("DirectorId", IntegerType(), True),
    StructField("timestamp", IntegerType(), True)
])

data = [
(1, 1, 0),
(1, 1, 1),
(1, 1, 2),
(1, 2, 3),
(1, 2, 4),
(2, 1, 5),
(2, 1, 6)
]


df = spark.createDataFrame(data, schema)
df.show()

# Count collaborations per (ActorId, DirectorId) pair
df_group = df.groupBy('ActorId', 'DirectorId').count()
df_group.show()

+-------+----------+-----+
|ActorId|DirectorId|count|
+-------+----------+-----+
| 1| 2| 2|
| 1| 1| 3|
| 2| 1| 2|
+-------+----------+-----+

# Keep only pairs with at least 3 collaborations
df_group.filter(df_group['count'] >= 3).show()

+-------+----------+-----+
|ActorId|DirectorId|count|
+-------+----------+-----+
| 1| 1| 3|
+-------+----------+-----+
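
The same result can also be expressed in SQL, e.g. by registering the DataFrame as a temporary view with df.createOrReplaceTempView('actor_director') and running the query through spark.sql. The sketch below demonstrates the query against an in-memory sqlite3 database as a lightweight stand-in engine; the table name and column layout are the illustrative ones from this example:

```python
import sqlite3

query = """
SELECT ActorId, DirectorId
FROM actor_director
GROUP BY ActorId, DirectorId
HAVING COUNT(*) >= 3
"""

# Verify the query logic against an in-memory SQLite database
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE actor_director (ActorId INT, DirectorId INT, ts INT)")
con.executemany(
    "INSERT INTO actor_director VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4), (2, 1, 5), (2, 1, 6)],
)
print(con.execute(query).fetchall())  # [(1, 1)]
```

The GROUP BY ... HAVING form is the standard SQL counterpart of groupBy().count().filter() and runs unchanged under Spark SQL.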

