4.1 The Spark UI - Databricks

SPARK DataBricks

Uploaded by Javier Melendrez

29/4/2021 4.1 The Spark UI - Databricks

4.1 The Spark UI

%run ../Includes/Classroom-Setup

Mounting course-specific datasets to /mnt/training...

Datasets are already mounted to /mnt/training from s3a://databricks-corp-training/common

res1: Boolean = false

res2: Boolean = false

DROP TABLE IF EXISTS People10M;

CREATE TABLE People10M
USING csv
OPTIONS (
path "/mnt/training/dataframes/people-10m.csv",
header "true");

DROP TABLE IF EXISTS ssaNames;

CREATE TABLE ssaNames USING parquet OPTIONS (
path "/mnt/training/ssn/names.parquet",
header "true"
);

Catalog Error

SELECT
firstName,
lastName,
birthDate
FROM
People10M
WHERE
year(birthDate) > 1990
AND gender = 'F'

  
firstName lastName birthDate
1 An Cowper 1992-02-08T05:00:00.000Z
2 Caroyln Cardon 1994-05-15T04:00:00.000Z
3 Yesenia Goldring 1997-07-09T04:00:00.000Z
4 Hedwig Pendleberry 1998-12-02T05:00:00.000Z
5 Kala Lyfe 1994-06-23T04:00:00.000Z
6 Gussie McKeeman 1991-11-15T05:00:00.000Z
7 Pansy Shrieves 1991-05-24T04:00:00.000Z
Showing the first 1000 rows.

Plan Optimization Example

-- Join each person to the SSA name records on first name.
CREATE OR REPLACE TEMPORARY VIEW joined AS
SELECT People10M.firstName,
to_date(birthDate) AS date
FROM People10M
JOIN ssaNames ON People10M.firstName = ssaNames.firstName;

-- Count joined rows per (firstName, date) for dates from 1980 onward.
CREATE OR REPLACE TEMPORARY VIEW filtered AS
SELECT firstName, count(firstName)
FROM joined
WHERE
date >= "1980-01-01"
GROUP BY
firstName, date;

SELECT * FROM filtered;

 
firstName count(firstName)
1 Ellan 49
2 Charline 117
3 Latisha 72
4 Tonita 73
5 Gwenn 76
6 Nidia 67
7 Torri 91
Showing the first 1000 rows.

CACHE TABLE filtered;

SELECT * FROM filtered;

 
firstName count(firstName)
1 Ellan 49
2 Charline 117
3 Latisha 72
4 Tonita 73
5 Gwenn 76
6 Nidia 67
7 Torri 91
Showing the first 1000 rows.

SELECT * FROM filtered WHERE firstName = "Latisha";

 
firstName count(firstName)
1 Latisha 72
2 Latisha 72
3 Latisha 72
4 Latisha 72
5 Latisha 72
6 Latisha 72
7 Latisha 72
Showing all 513 rows.

UNCACHE TABLE IF EXISTS filtered;

SELECT * FROM filtered WHERE firstName = "Latisha";

 
firstName count(firstName)
1 Latisha 72
2 Latisha 72
3 Latisha 72
4 Latisha 72
5 Latisha 72
6 Latisha 72
7 Latisha 72
Showing all 513 rows.
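
The repeated Latisha rows above are what one would expect from the filtered view: it groups by both firstName and date but selects only firstName and the count, so each distinct (firstName, date) pair yields its own row. The sketch below reproduces that effect with Python's built-in sqlite3 module and a handful of made-up rows. The table contents are assumptions for illustration, not the course data, and SQLite is only a stand-in for Spark SQL here.

```python
import sqlite3

# Hypothetical stand-in for the `joined` view; these rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined (firstName TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO joined VALUES (?, ?)",
    [
        ("Latisha", "1985-03-01"),
        ("Latisha", "1985-03-01"),
        ("Latisha", "1990-07-15"),
        ("Ellan", "1982-01-20"),
    ],
)

# Same shape as the `filtered` view: grouped by name AND date,
# so a name appears once per distinct date.
by_name_and_date = conn.execute(
    "SELECT firstName, count(firstName) FROM joined "
    "WHERE date >= '1980-01-01' "
    "GROUP BY firstName, date ORDER BY firstName, date"
).fetchall()

# Grouping by name alone collapses the result to one row per name.
by_name = conn.execute(
    "SELECT firstName, count(firstName) FROM joined "
    "WHERE date >= '1980-01-01' "
    "GROUP BY firstName ORDER BY firstName"
).fetchall()

print(by_name_and_date)
print(by_name)
```

If one row per name is what the query is meant to return, grouping by firstName alone is the likely change; whether the notebook intends that is not stated in the source.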

Set Partitions

DROP TABLE IF EXISTS bikeShare;

CREATE TABLE bikeShare
USING csv
OPTIONS (
path "/mnt/training/bikeSharing/data-001/hour.csv",
header "true")

SELECT
*
FROM
bikeShare
WHERE
hr = 10

    
instant dteday season yr mnth hr
1 11 2011-01-01 1 0 1 10
2 34 2011-01-02 1 0 1 10
3 56 2011-01-03 1 0 1 10
4 79 2011-01-04 1 0 1 10
5 102 2011-01-05 1 0 1 10
6 125 2011-01-06 1 0 1 10
7 148 2011-01-07 1 0 1 10
Showing all 727 rows.

DROP TABLE IF EXISTS bikeShare_partitioned;

CREATE TABLE bikeShare_partitioned
PARTITIONED BY (p_hr)
AS
SELECT
instant,
dteday,
season,
yr,
mnth,
hr as p_hr,
holiday,
weekday,
workingday,
weathersit,
temp
FROM
bikeShare

Query returned no results

SELECT * FROM bikeShare_partitioned WHERE p_hr = 10

    
instant dteday season yr mnth p_hr
1 11 2011-01-01 1 0 1 10
2 34 2011-01-02 1 0 1 10
3 56 2011-01-03 1 0 1 10
4 79 2011-01-04 1 0 1 10
5 102 2011-01-05 1 0 1 10
6 125 2011-01-06 1 0 1 10
7 148 2011-01-07 1 0 1 10
Showing all 727 rows.
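
Both queries return the same 727 rows; the difference partitioning is meant to make is how much data has to be read to find them. The following is a conceptual sketch in plain Python, not Spark's implementation: the rows are made up, and a dictionary of lists stands in for the per-value partition directories of bikeShare_partitioned.

```python
from collections import defaultdict

# Made-up rows: 48 hourly records over two days (not the course data).
rows = [{"instant": i, "hr": (i - 1) % 24} for i in range(1, 49)]

# Unpartitioned layout: every row is read to evaluate hr = 10.
full_scan_hits = [r for r in rows if r["hr"] == 10]
rows_read_full = len(rows)

# Partitioned layout: rows are stored in one bucket per hr value,
# so a filter on the partition column only touches one bucket.
partitions = defaultdict(list)
for r in rows:
    partitions[r["hr"]].append(r)

pruned_hits = partitions[10]
rows_read_pruned = len(pruned_hits)

print(rows_read_full, rows_read_pruned)
```

In the Spark UI, the corresponding thing to compare would be the amount of input read by the scan stage of each query.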

Beware of small files!

DROP TABLE IF EXISTS bikeShare_parquet;

-- Partitioning on instant creates one partition per distinct instant value.
CREATE TABLE bikeShare_parquet
PARTITIONED BY (p_instant)
AS
SELECT
instant AS p_instant,
dteday,
season,
yr,
mnth,
hr,
holiday,
weekday,
workingday,
weathersit,
temp
FROM
bikeShare
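
The warning in the heading comes down to the choice of partition column. In the output shown earlier, instant looks like a per-row identifier, so partitioning on it would give roughly one partition per row, and each partition with data typically ends up as its own directory holding at least one file. A rough sketch with made-up rows (the data and the partition_sizes helper are hypothetical) compares the two choices:

```python
from collections import Counter

# Made-up rows: 48 hourly records over two days (not the course data).
rows = [{"instant": i, "hr": (i - 1) % 24} for i in range(1, 49)]


def partition_sizes(rows, column):
    """Return {partition value: row count} for the given column."""
    return Counter(r[column] for r in rows)


by_hr = partition_sizes(rows, "hr")            # low-cardinality column
by_instant = partition_sizes(rows, "instant")  # unique per row

print(len(by_hr), max(by_hr.values()))
print(len(by_instant), max(by_instant.values()))
```

With these rows, partitioning by hr gives 24 partitions of 2 rows each, while partitioning by instant gives 48 partitions of 1 row each. On a real table the same pattern scales with the row count, which is where the many small files come from.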

%run ../Includes/Classroom-Cleanup

Citations
Bike Sharing Data

[1] Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble detectors
and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15,
Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.

@article{
year={2013},
issn={2192-6352},
journal={Progress in Artificial Intelligence},
doi={10.1007/s13748-013-0040-3},
title={Event labeling combining ensemble detectors and background knowledge},
url={http://dx.doi.org/10.1007/s13748-013-0040-3},
publisher={Springer Berlin Heidelberg},
keywords={Event labeling; Event detection; Ensemble learning; Background knowledge},
author={Fanaee-T, Hadi and Gama, Joao},
pages={1-15}
}

Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache
Software Foundation (http://www.apache.org/).

© 2020 Databricks, Inc. All rights reserved.
