1. Which one of the following statements about Azure Databricks is
incorrect?
A. It is an Apache Spark-based analytics platform
B. It helps to extract, transform and load data
C. Visualization of data is not possible with it
D. All of the above
Ans : C
Explanation: Azure Databricks also helps in visualization of data.
2. To which one of the following sources does Azure Databricks connect
for collecting streaming data?
A. Kafka
B. Azure Data Lake
C. CosmosDB
D. None of the above
Ans : A
Explanation: Azure Databricks can connect to sources such as Kafka and Event Hubs
to collect streaming data.
3. Which one of the following is a Databricks concept?
A. Workspace
B. Authentication and authorization
C. Data Management
D. All of the above
Ans : D
Explanation: Databricks concepts fall into five main categories: Workspace, Data
Management, Computational Management, Model Management, and Authentication and
Authorization.
4. Which of the following ensures data reliability even after
termination of a cluster in Azure Databricks?
A. Databricks Runtime
B. Databricks File System
C. Dashboards
D. Workspace
Ans : B
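Explanation: Databricks File System (DBFS) is a distributed file system available on
Databricks clusters. Data written to it is persisted to object storage, so it remains
available after the cluster is terminated.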
5. Choose the correct option with respect to ETL operations on data in
Azure Databricks.
A. For loading of data, data is moved from Databricks to the data warehouse
B. For loading of data, Blob storage is used
C. Blob storage serves as temporary storage
D. All of the above
Ans : D
Explanation: All of the above statements are true. Loading means moving the data into
SQL Data Warehouse.
6. Which one of the following is incorrect regarding the Workspace
concept of Azure Databricks?
A. It manages ETL operations of data
B. It can store notebooks, libraries and dashboards
C. It is the root folder of Azure Databricks
D. None of the above
Ans : A
Explanation: ETL (Extract, Transform and Load) operations come under computational
management, not the Workspace.
7. Which of the following Azure data sources can be connected to
Azure Databricks?
A. Azure Blob Storage
B. Azure SQL Data Warehouse
C. Azure CosmosDB
D. All of the above
Ans : D
Explanation: Azure Databricks can connect to many data sources, including all
three listed above.
8. Streaming data can be captured by:
A. Kafka
B. Event Hubs
C. Both A and B
D. None of the above
Ans : C
Explanation: Both Kafka and Event Hubs are capable of capturing streaming data.
9. Authentication and authorization in Databricks can be managed for:
A. User, Group, Access Control List
B. User, Group
C. Access Control List
D. Group, Access Control List
Ans : A
Explanation: In Azure Databricks, authentication and authorization can be managed
for users, groups and Access Control Lists.
10. Which one of the following is a set of components that run on
clusters of Azure Databricks?
A. Databricks File System
B. Databricks Runtime
C. CosmosDB
D. Azure Data Lake
Ans : B
Explanation: Databricks Runtime is the set of core components, built on Apache Spark,
that run on Azure Databricks clusters.
11. Given a DataFrame df, select the code that returns its
number of rows:
A. df.take('all')
B. df.collect()
C. df.show()
D. df.count() --> CORRECT
E. df.numRows()
12. Given a DataFrame df that includes a number of columns, among
which a column named quantity and a column named price, complete
the code below such that it will create a DataFrame including
all the original columns and a new column revenue defined as
quantity*price:
df._1_(_2_ , _3_)
A. withColumnRenamed, "revenue", expr("quantity*price")
B. withColumn, revenue, expr("quantity*price")
C. withColumn, "revenue", expr("quantity*price") --> CORRECT
D. withColumn, expr("quantity*price"), "revenue"
E. withColumnRenamed, "revenue", col("quantity")*col("price")
13. Given a DataFrame df that has some null values in the
column created_date, complete the code below such that it will
sort rows in ascending order based on the column created_date
with null values appearing last.
df._1_(_2_)
A. orderBy, asc_nulls_last("created_date")
B. sort, asc_nulls_last("created_date")
C. orderBy, col("created_date").asc_nulls_last() --> CORRECT
D. orderBy, col("created_date"), ascending=True
E. orderBy, col("created_date").asc()
14. Which one of the following commands does NOT trigger an
eager evaluation?
A. df.collect()
B. df.take()
C. df.show()
D. df.saveAsTable()
E. df.join() --> CORRECT
15. The code below should return a new DataFrame with 50 percent
of random records from DataFrame df without replacement. Choose
the response that correctly fills in the numbered blanks within
the code block to complete this task.
df._1_(_2_,_3_,_4_)
A. sample, False, 0.5, 5 --> CORRECT
B. random, False, 0.5, 5
C. sample, False, 5, 25
D. sample, False, 50, 5
E. sample, withoutReplacement, 0.5, 5