Full MCQ Questions With Answers Cleaned

The document contains multiple-choice questions (MCQs) related to Exploratory Data Analysis (EDA), Spark MLlib, and NoSQL databases, along with their correct answers. It covers topics such as data manipulation methods, machine learning algorithms, and database types. Each section provides specific questions aimed at assessing knowledge in data science and database management.

Uploaded by

omarmanasra07


Comprehensive MCQ Questions with Correct Answers Below (No 'Correct' Marks in Options)

EDA Chapter.pdf
1. What is the purpose of the `replace` method in the EDA script?
A. To visualize data
B. To drop columns
C. To convert categorical data into numerical
C. To convert categorical data into numerical

2. What does the `corr()` method compute?

A. The average of numerical columns
B. The minimum and maximum of each column
C. The pairwise correlation between numerical columns
C. The pairwise correlation between numerical columns

3. What does the `annot=True` argument in `sns.heatmap` do?

A. Hides the labels
B. Adds numerical values to each cell
C. Changes the color map
B. Adds numerical values to each cell

4. Which feature in the dataset is most likely the target variable?

A. `default`
B. `housing`
C. `deposit`
C. `deposit`

5. Why are categorical columns transformed to numerical?

A. ML algorithms require numerical input
B. It looks better in visualizations
C. To reduce file size
A. ML algorithms require numerical input

6. What is the value of `default` when a customer has no credit default?

A. 0
B. 1
C. 2
A. 0

7. What is the default correlation method used by pandas?

A. Pearson
B. Kendall
C. Spearman
A. Pearson
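
To make the Pearson answer concrete, here is a minimal pure-Python sketch of the coefficient that `corr()` reports by default for each pair of numerical columns. The function name `pearson` and the sample `ages` and `balances` values are illustrative assumptions, not taken from the EDA script.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample columns: each balance is exactly twice the age.
ages = [25, 32, 47, 51, 62]
balances = [50, 64, 94, 102, 124]
print(pearson(ages, balances))  # close to 1.0 for a perfect linear relationship
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship.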

8. What library is used to plot the heatmap?

A. pandas
B. seaborn
C. numpy
B. seaborn

9. What is the main purpose of the Boruta algorithm?

A. Feature selection
B. Data cleaning
C. Model evaluation
A. Feature selection

10. What does `n_jobs=4` mean in the RandomForestClassifier?

A. Use 4 CPU cores
B. Use 4 datasets
C. Build 4 decision trees
A. Use 4 CPU cores

11. In Boruta, what are "shadow features"?

A. Randomly shuffled copies of real features
B. Features with missing values
C. Features that are always zero
A. Randomly shuffled copies of real features

12. Which features does Boruta mark as "confirmed"?

A. Those with the lowest correlation
B. Those more important than shadow features
C. Those that have zero values
B. Those more important than shadow features
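
The shadow-feature idea from questions 11 and 12 can be sketched in a few lines. This is a simplified single round under stated assumptions: absolute correlation with the target stands in for the random-forest importance that Boruta normally uses, and the names `abs_corr`, `shadow_hits`, `signal`, and `noise` are hypothetical. The real algorithm repeats such rounds many times and applies a statistical test before confirming or rejecting a feature.

```python
import random

def abs_corr(xs, ys):
    """Stand-in importance score: absolute Pearson correlation with the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else abs(cov) / (vx * vy) ** 0.5

def shadow_hits(features, target, seed=0):
    """One Boruta-style round: which real features beat the best shadow?"""
    rng = random.Random(seed)
    shadow_scores = []
    for values in features.values():
        shadow = list(values)
        rng.shuffle(shadow)  # shuffling breaks any link to the target
        shadow_scores.append(abs_corr(shadow, target))
    threshold = max(shadow_scores)
    return {name for name, values in features.items()
            if abs_corr(values, target) > threshold}

rng = random.Random(42)
target = [rng.random() for _ in range(200)]
features = {
    "signal": [2 * t + 1 for t in target],        # fully determined by the target
    "noise": [rng.random() for _ in range(200)],  # unrelated to the target
}
hits = shadow_hits(features, target)
print(hits)
```

A feature that scores above the best shadow in a round earns a "hit"; features that collect enough hits across rounds are the ones Boruta marks as confirmed.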

13. What does `.drop(['deposit'], axis=1)` do?

A. Removes the 'deposit' column
B. Drops rows with missing values
C. Drops all numeric columns
A. Removes the 'deposit' column

14. What does `verbose=2` in BorutaPy indicate?
A. Detailed output
B. No output
C. Only final result
A. Detailed output

15. What kind of visualization is used to display feature rankings?

A. Line plot
B. Bar plot
C. Scatter plot
B. Bar plot

Chapter 8.pdf (Spark MLlib)

1. What does `SparkSession.builder` do?
A. Configures and creates a Spark session
B. Builds a dataset
C. Runs a machine learning model
A. Configures and creates a Spark session

2. Which column contains the target variable in the gold price dataset?
A. features
B. stock_index
C. gold_price
C. gold_price

3. What does `VectorAssembler` do?

A. Predicts prices
B. Combines columns into a single vector
C. Cleans missing data
B. Combines columns into a single vector

4. What is the metric used to evaluate the model?

A. Mean Absolute Error
B. Accuracy
C. Root Mean Squared Error
C. Root Mean Squared Error

5. What command splits the dataset?

A. df.split()
B. df.divide()
C. df.randomSplit()
C. df.randomSplit()
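
In PySpark the call looks like `df.randomSplit([0.8, 0.2], seed=42)`. The pure-Python sketch below illustrates the general idea of a weighted random split: each row is assigned independently, so the resulting sizes are approximate rather than exact. The function `random_split` is a hypothetical stand-in and is not Spark's implementation.

```python
import random

def random_split(rows, weights, seed=None):
    """Assign each row to one split using cumulative, normalized weights."""
    total = sum(weights)
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total  # weights are normalized, so [8, 2] works like [0.8, 0.2]
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point drift
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform draw in [0, 1)
        for i, bound in enumerate(bounds):
            if u < bound:
                splits[i].append(row)
                break
    return splits

rows = list(range(1000))
train, test = random_split(rows, [0.8, 0.2], seed=42)
print(len(train), len(test))  # roughly 800 and 200, not exactly
```

Fixing the seed makes the split reproducible between runs, which is useful when comparing models on the same test set.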

6. Which column is created by `VectorAssembler`?

A. features
B. vector
C. labels
A. features

7. How is the Spark session stopped?

A. spark.halt()
B. spark.exit()
C. spark.stop()
C. spark.stop()

8. What library provides linear regression in Spark?

A. pyspark.ml.regression
B. pandas
C. matplotlib
A. pyspark.ml.regression

9. What does `.show()` do?

A. Displays data
B. Plots a graph
C. Exports a CSV
A. Displays data

10. What does the `predictionCol` in RegressionEvaluator specify?

A. The column with predicted values
B. The actual prices
C. Features vector
A. The column with predicted values

11. Which command displays the full feature vector?

A. show()
B. show(truncate=False)
C. display()
B. show(truncate=False)

12. Which component manages task scheduling in Spark?

A. MLlib
B. Spark Streaming
C. Spark Core
C. Spark Core

13. What is the default split ratio for training/testing in the tutorial?
A. 80% training, 20% testing
B. 70% training, 30% testing
C. 50% training, 50% testing
A. 80% training, 20% testing

14. What does `df.select("features", "gold_price").show()` do?

A. Shows only gold_price
B. Displays features and gold_price
C. Shows only features
B. Displays features and gold_price

15. What is the input type for `createDataFrame`?

A. CSV file
B. List of tuples
C. JSON file
B. List of tuples

NoSQL_Databases_and_Data_Storage.pptx
1. Which of these is NOT a NoSQL database category?
A. Relational store
B. Document store
C. Column-family store
A. Relational store

2. Which of these stores data as JSON-like documents?

A. Column-family stores
B. Document stores
C. Key-value stores
B. Document stores

3. Which database type is optimized for relationship queries?

A. Key-value stores
B. Document stores
C. Column-family stores
D. Graph databases
D. Graph databases

4. Which database is an example of a key-value store?

A. MongoDB
B. Redis
C. Neo4j
B. Redis

5. What does 'schema flexibility' mean?

A. Different structures for different documents
B. Only one table schema
C. No relationships allowed
A. Different structures for different documents

6. What is a major advantage of NoSQL databases?

A. Strict schema design
B. High performance and scalability
C. Limited scalability
B. High performance and scalability

7. Which system uses SSTables for storage?

A. MongoDB
B. Cassandra
C. Redis
B. Cassandra

8. Which NoSQL database uses Cypher query language?

A. Cassandra
B. Redis
C. Couchbase
D. Neo4j
D. Neo4j

9. In which use case is a document store preferred?

A. Product catalog
B. Fraud detection
C. Timeline caching
A. Product catalog

10. What is a key feature of Redis?

A. In-memory data storage
B. Relational table joins
C. Graph algorithms
A. In-memory data storage

11. Which of the following databases offers multi-model support?

A. Cassandra
B. Neo4j
C. ArangoDB
C. ArangoDB

12. What is the primary data structure in column-family stores?

A. JSON documents
B. Column families
C. Key-value pairs
B. Column families

13. What is a major challenge with NoSQL databases?

A. High cost
B. No horizontal scaling
C. Consistency models can be complex
C. Consistency models can be complex

14. Which architecture component is used in MongoDB for query routing?

A. mongos
B. mongod
C. config servers
A. mongos

15. Which NoSQL database is designed for high write throughput?

A. Cassandra
B. Neo4j
C. MongoDB
A. Cassandra

16. Which of these is an example of a real-world use of graph databases?

A. Time-series sensor data
B. Social networks and relationships
C. Shopping carts
B. Social networks and relationships

17. What is a core advantage of key-value stores?

A. Flexible document querying
B. High relationship traversal performance
C. Extremely fast read/write performance
C. Extremely fast read/write performance

18. What format is often used by document stores to store data?

A. CSV
B. JSON or BSON
C. HTML
B. JSON or BSON

19. What does the "tunable consistency" in column-family stores refer to?
A. Schema migration
B. Configuring consistency vs availability
C. Ability to change database types
B. Configuring consistency vs availability
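
One common way to reason about tunable consistency is the quorum overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The sketch below encodes that arithmetic; the function names are hypothetical and the levels shown (QUORUM, ONE) follow Cassandra's naming.

```python
def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    for count in (write_replicas, read_replicas):
        if not 1 <= count <= replication_factor:
            raise ValueError("replica counts must be between 1 and the replication factor")
    return write_replicas + read_replicas > replication_factor

def quorum(replication_factor):
    """Majority of replicas."""
    return replication_factor // 2 + 1

rf = 3
print(read_sees_latest_write(rf, quorum(rf), quorum(rf)))  # QUORUM writes + QUORUM reads
print(read_sees_latest_write(rf, 1, 1))                    # ONE write + ONE read
```

Lower levels such as ONE trade that guarantee for lower latency and higher availability, which is the consistency versus availability choice the question refers to.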

20. Which NoSQL database architecture uses HMaster and RegionServers?

A. HBase
B. Cassandra
C. Redis
A. HBase

21. In Redis, what operation would you use to automatically remove data after a set time?
A. hset
B. setex
C. lpush
B. setex
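
Redis `SETEX key seconds value` stores a value together with a time-to-live. The toy class below imitates that behaviour in plain Python so the semantics are easy to see. It is an illustrative model, not the Redis client API: `ExpiringStore`, its injectable `clock`, and the `session:42` key are assumptions made for this sketch.

```python
import time

class ExpiringStore:
    """Toy in-memory key-value store with a per-key time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp)

    def setex(self, key, seconds, value):
        """Store value under key and expire it after `seconds`."""
        self._data[key] = (value, self._clock() + seconds)

    def get(self, key):
        """Return the value, or None if the key is missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazy eviction on read
            return None
        return value

# A fake clock makes the expiry visible without actually waiting.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.setex("session:42", 30, "alice")
print(store.get("session:42"))  # value is still present
now[0] = 31.0
print(store.get("session:42"))  # key has expired
```

By contrast, `hset` writes hash fields and `lpush` pushes onto a list; neither sets an expiry on its own.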

22. What is a feature of MongoDB's aggregation framework?

A. Pipeline-based data processing
B. Graph-based traversals
C. Vector transformation
A. Pipeline-based data processing

23. Which of these is NOT a feature of graph databases?

A. Path finding
B. Automatic caching of queries
C. Community detection
B. Automatic caching of queries

24. Which database is known for a memory-first architecture and built-in caching layer?
A. Neo4j
B. Redis
C. Couchbase
C. Couchbase

25. Which NoSQL database integrates well with AWS and offers global tables?
A. Neo4j
B. DynamoDB
C. Redis
B. DynamoDB
