Full MCQ Questions With Answers Cleaned

The document contains multiple-choice questions (MCQs) related to Exploratory Data Analysis (EDA), Spark MLlib, and NoSQL databases, along with their correct answers. It covers topics such as data manipulation methods, machine learning algorithms, and database types. Each section provides specific questions aimed at assessing knowledge in data science and database management.

Uploaded by

omarmanasra07

Comprehensive MCQ Questions with Correct Answers Below (No 'Correct' Marks in Options)

EDA Chapter.pdf
1. What is the purpose of the `replace` method in the EDA script?
A. To visualize data
B. To drop columns
C. To convert categorical data into numerical
C. To convert categorical data into numerical

2. What does the `corr()` method compute?


A. The average of numerical columns
B. The minimum and maximum of each column
C. The pairwise correlation between numerical columns
C. The pairwise correlation between numerical columns

3. What does the `annot=True` argument in `sns.heatmap` do?


A. Hides the labels
B. Adds numerical values to each cell
C. Changes the color map
B. Adds numerical values to each cell

4. Which feature in the dataset is most likely the target variable?


A. `default`
B. `housing`
C. `deposit`
C. `deposit`

5. Why are categorical columns transformed to numerical?


A. ML algorithms require numerical input
B. It looks better in visualizations
C. To reduce file size
A. ML algorithms require numerical input

6. What is the value of `default` when a customer has no credit default?


A. 0
B. 1
C. 2
A. 0
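
Questions 1, 5 and 6 all turn on encoding categories as numbers. The plain-Python sketch below illustrates the idea; the rows are made up, and the `no` -> 0, `yes` -> 1 mapping is an assumption chosen to agree with question 6. pandas' `replace` accepts a similar dictionary, but this sketch does not use pandas.

```python
# Hypothetical rows; the column names follow the quiz, the values are invented.
rows = [
    {"default": "no", "housing": "yes", "deposit": "yes"},
    {"default": "yes", "housing": "no", "deposit": "no"},
]

# Assumed mapping: "no" -> 0 and "yes" -> 1, consistent with question 6.
YES_NO = {"no": 0, "yes": 1}


def encode(row, mapping=YES_NO):
    """Return a copy of row with every mapped category replaced by its number."""
    return {col: mapping.get(val, val) for col, val in row.items()}


encoded = [encode(r) for r in rows]
print(encoded)
```

Values that are not in the mapping pass through unchanged, so numeric columns would be left as they are.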

7. What is the default correlation method used by pandas?


A. Pearson
B. Kendall
C. Spearman
A. Pearson
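
As a rough illustration of what a Pearson coefficient measures, here is a hand-written version in plain Python. The function name `pearson` is hypothetical, and this is a sketch of the formula rather than pandas' own implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

A perfectly increasing linear relationship gives a value of 1 and a perfectly decreasing one gives -1; a correlation matrix applies this to every pair of numerical columns.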

8. What library is used to plot the heatmap?


A. pandas
B. seaborn
C. numpy
B. seaborn

9. What is the main purpose of the Boruta algorithm?


A. Feature selection
B. Data cleaning
C. Model evaluation
A. Feature selection

10. What does `n_jobs=4` mean in the RandomForestClassifier?


A. Use 4 CPU cores
B. Use 4 datasets
C. Build 4 decision trees
A. Use 4 CPU cores

11. In Boruta, what are "shadow features"?


A. Randomly shuffled copies of real features
B. Features with missing values
C. Features that are always zero
A. Randomly shuffled copies of real features

12. Which features does Boruta mark as "confirmed"?


A. Those with the lowest correlation
B. Those more important than shadow features
C. Those that have zero values
B. Those more important than shadow features
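
The shadow-feature idea from questions 11 and 12 can be sketched in a few lines. This is a deliberately simplified, single-round illustration: it swaps in absolute Pearson correlation as the importance score, whereas Boruta itself relies on random-forest importances, repeats over many iterations and applies a statistical test. All names and data below are hypothetical.

```python
import math
import random


def abs_corr(xs, ys):
    """Stand-in importance score: absolute Pearson correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return abs(cov / math.sqrt(var_x * var_y))


def boruta_round(features, target, rng):
    """Keep the features that beat the best shuffled (shadow) copy."""
    shadow_scores = []
    for values in features.values():
        shadow = values[:]      # copy the real feature
        rng.shuffle(shadow)     # shuffling breaks its link to the target
        shadow_scores.append(abs_corr(shadow, target))
    threshold = max(shadow_scores)
    return [name for name, values in features.items()
            if abs_corr(values, target) > threshold]


rng = random.Random(0)
target = [rng.random() for _ in range(200)]
features = {
    "signal": [t + 0.05 * rng.random() for t in target],  # tracks the target
    "noise": [rng.random() for _ in target],              # unrelated
}
confirmed = boruta_round(features, target, rng)
print(confirmed)
```

Because the unrelated feature can occasionally beat the shadows by chance in a single round, the real algorithm accumulates results over many rounds before confirming or rejecting a feature.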

13. What does `.drop(['deposit'], axis=1)` do?


A. Removes the 'deposit' column
B. Drops rows with missing values
C. Drops all numeric columns
A. Removes the 'deposit' column

14. What does `verbose=2` in BorutaPy indicate?
A. Detailed output
B. No output
C. Only final result
A. Detailed output

15. What kind of visualization is used to display feature rankings?


A. Line plot
B. Bar plot
C. Scatter plot
B. Bar plot

Chapter 8.pdf (Spark MLlib)


1. What does `SparkSession.builder` do?
A. Configures and creates a Spark session
B. Builds a dataset
C. Runs a machine learning model
A. Configures and creates a Spark session

2. Which column contains the target variable in the gold price dataset?
A. features
B. stock_index
C. gold_price
C. gold_price

3. What does `VectorAssembler` do?


A. Predicts prices
B. Combines columns into a single vector
C. Cleans missing data
B. Combines columns into a single vector

4. What is the metric used to evaluate the model?


A. Mean Absolute Error
B. Accuracy
C. Root Mean Squared Error
C. Root Mean Squared Error
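
To make the difference between two of the listed metrics concrete, here is a plain-Python sketch of Root Mean Squared Error and Mean Absolute Error. The function names are hypothetical and the numbers are invented; Spark's `RegressionEvaluator` computes such metrics for you, so this only shows the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: square the errors, average, take the root."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: average of the absolute errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [0.0, 0.0]
predicted = [3.0, 4.0]
print(rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging.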

5. What command splits the dataset?


A. df.split()
B. df.divide()
C. df.randomSplit()
C. df.randomSplit()

6. Which column is created by `VectorAssembler`?


A. features
B. vector
C. labels
A. features

7. How is the Spark session stopped?


A. spark.halt()
B. spark.exit()
C. spark.stop()
C. spark.stop()

8. What library provides linear regression in Spark?


A. pyspark.ml.regression
B. pandas
C. matplotlib
A. pyspark.ml.regression

9. What does `.show()` do?


A. Displays data
B. Plots a graph
C. Exports a CSV
A. Displays data

10. What does the `predictionCol` in RegressionEvaluator specify?


A. The column with predicted values
B. The actual prices
C. Features vector
A. The column with predicted values

11. Which command displays the full feature vector?


A. show()
B. show(truncate=False)
C. display()
B. show(truncate=False)

12. Which component manages task scheduling in Spark?


A. MLlib
B. Spark Streaming
C. Spark Core
C. Spark Core

13. What is the default split ratio for training/testing in the tutorial?
A. 80% training, 20% testing
B. 70% training, 30% testing
C. 50% training, 50% testing
A. 80% training, 20% testing
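
The idea behind a weighted random split, as in questions 5 and 13, can be sketched in plain Python. The helper `random_split` is hypothetical and is not Spark's implementation; it only mimics the behaviour of assigning each row to a split at random according to the weights.

```python
import random


def random_split(rows, weights, seed=None):
    """Assign each row to one split at random, in proportion to weights."""
    rng = random.Random(seed)
    total = sum(weights)
    # Cumulative upper bounds in [0, 1], e.g. [0.8, 1.0] for weights [0.8, 0.2].
    bounds, running = [], 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()
        for i, bound in enumerate(bounds):
            if u < bound:
                splits[i].append(row)
                break
        else:
            splits[-1].append(row)  # guard against floating-point rounding
    return splits


train, test = random_split(list(range(1000)), [0.8, 0.2], seed=42)
print(len(train), len(test))
```

Because every row is assigned independently, the split sizes in this sketch are close to 80/20 but not exact; fixing the seed makes the split reproducible.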

14. What does `df.select("features", "gold_price").show()` do?


A. Shows only gold_price
B. Displays features and gold_price
C. Shows only features
B. Displays features and gold_price

15. What is the input type for `createDataFrame`?


A. CSV file
B. List of tuples
C. JSON file
B. List of tuples

NoSQL_Databases_and_Data_Storage.pptx
1. Which of these is NOT a NoSQL database category?
A. Relational store
B. Document store
C. Column-family store
A. Relational store

2. Which of these stores data as JSON-like documents?


A. Column-family stores
B. Document stores
C. Key-value stores
B. Document stores

3. Which database type is optimized for relationship queries?


A. Key-value stores
B. Document stores
C. Column-family stores
None of the listed options: relationship queries are the strength of graph databases such as Neo4j.

4. Which database is an example of a key-value store?


A. MongoDB
B. Redis
C. Neo4j
B. Redis

5. What does 'schema flexibility' mean?


A. Different structures for different documents
B. Only one table schema
C. No relationships allowed
A. Different structures for different documents
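
A small illustration of schema flexibility, using Python dictionaries as stand-ins for JSON-like documents. The product data is invented, and no database is involved; the point is only that two documents in the same collection need not share the same fields.

```python
import json

# Two hypothetical catalog documents with different structures.
products = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "T-shirt", "sizes": ["S", "M", "L"]},
]

# Each document serializes to JSON on its own terms.
for doc in products:
    print(json.dumps(doc))

# The sets of field names differ from one document to the next.
field_sets = [set(doc) for doc in products]
print(field_sets[0] ^ field_sets[1])
```

In a relational table both rows would need the same columns, typically with nulls for the fields that do not apply.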

6. What is a major advantage of NoSQL databases?


A. Strict schema design
B. High performance and scalability
C. Limited scalability
B. High performance and scalability

7. Which system uses SSTables for storage?


A. MongoDB
B. Cassandra
C. Redis
B. Cassandra

8. Which NoSQL database uses Cypher query language?


A. Cassandra
B. Redis
C. Couchbase
None of the listed options: Cypher is the query language of Neo4j.

9. In which use case is a document store preferred?


A. Product catalog
B. Fraud detection
C. Timeline caching
A. Product catalog

10. What is a key feature of Redis?


A. In-memory data storage
B. Relational table joins
C. Graph algorithms
A. In-memory data storage

11. Which of the following databases offers multi-model support?


A. Cassandra
B. Neo4j
C. ArangoDB
C. ArangoDB

12. What is the primary data structure in column-family stores?


A. JSON documents
B. Column families
C. Key-value pairs
B. Column families

13. What is a major challenge with NoSQL databases?


A. High cost
B. No horizontal scaling
C. Consistency models can be complex
C. Consistency models can be complex

14. Which architecture component is used in MongoDB for query routing?


A. mongos
B. mongod
C. config servers
A. mongos

15. Which NoSQL database is designed for high write throughput?


A. Cassandra
B. Neo4j
C. MongoDB
A. Cassandra

16. Which of these is an example of a real-world use of graph databases?


A. Time-series sensor data
B. Social networks and relationships
C. Shopping carts
B. Social networks and relationships
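
To show the kind of relationship query a graph database is built for, here is a breadth-first search over a tiny friendship graph in plain Python. The names and the `shortest_path` helper are hypothetical; a graph database would express this as a query rather than hand-written traversal code.

```python
from collections import deque

# Hypothetical social network: each person maps to a list of friends.
friends = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana"],
    "dee": ["ben", "eli"],
    "eli": ["dee"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(friends, "ana", "eli"))
```

Path finding like this is one of the graph features named in question 23.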

17. What is a core advantage of key-value stores?


A. Flexible document querying
B. High relationship traversal performance
C. Extremely fast read/write performance
C. Extremely fast read/write performance

18. What format is often used by document stores to store data?


A. CSV
B. JSON or BSON
C. HTML
B. JSON or BSON

19. What does the "tunable consistency" in column-family stores refer to?
A. Schema migration
B. Configuring consistency vs availability
C. Ability to change database types
B. Configuring consistency vs availability

20. Which NoSQL database architecture uses HMaster and RegionServers?


A. HBase
B. Cassandra
C. Redis
A. HBase

21. In Redis, what operation would you use to automatically remove data after a set time?
A. hset
B. setex
C. lpush
B. setex
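
The behaviour behind a set-with-expiry operation can be sketched with a toy in-memory store. The class `ExpiringStore` is hypothetical and is not a Redis client; it only mirrors the `SETEX key seconds value` argument order and expires keys lazily when they are read. The injectable clock is an assumption added to make the example testable.

```python
import time


class ExpiringStore:
    """Toy key-value store whose entries disappear after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp)

    def setex(self, key, seconds, value):
        """Store value under key and let it expire after the given seconds."""
        self._data[key] = (value, self._clock() + seconds)

    def get(self, key):
        """Return the value, or None if the key is missing or has expired."""
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value


# Usage with a fake clock so that time can be advanced by hand.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.setex("session", 10, "abc")
print(store.get("session"))
now[0] = 10.0
print(store.get("session"))
```

This pattern is why key-value stores are a common choice for sessions and caches: stale entries remove themselves without a cleanup job written by the application.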

22. What is a feature of MongoDB's aggregation framework?


A. Pipeline-based data processing
B. Graph-based traversals
C. Vector transformation
A. Pipeline-based data processing
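
Pipeline-based processing means that documents flow through a sequence of stages, each consuming the previous stage's output. The plain-Python sketch below imitates a filter stage followed by a grouping stage, loosely in the spirit of MongoDB's `$match` and `$group`; the helper names and the order data are hypothetical, and no MongoDB driver is used.

```python
from collections import defaultdict


def match(predicate):
    """Stage that keeps only the documents satisfying predicate."""
    def stage(docs):
        return [d for d in docs if predicate(d)]
    return stage


def group_sum(key, field):
    """Stage that groups documents by key and sums field within each group."""
    def stage(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[field]
        return [{"_id": k, "total": v} for k, v in totals.items()]
    return stage


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        docs = stage(docs)
    return docs


orders = [
    {"customer": "a", "status": "paid", "amount": 10},
    {"customer": "b", "status": "paid", "amount": 5},
    {"customer": "a", "status": "open", "amount": 99},
    {"customer": "a", "status": "paid", "amount": 7},
]
result = run_pipeline(orders, [
    match(lambda d: d["status"] == "paid"),
    group_sum("customer", "amount"),
])
print(result)
```

Placing the filter first means the grouping stage only sees the documents that matter, which is the usual reason for ordering pipeline stages this way.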

23. Which of these is NOT a feature of graph databases?


A. Path finding
B. Automatic caching of queries
C. Community detection
B. Automatic caching of queries

24. Which database is known for a memory-first architecture and built-in caching layer?
A. Neo4j
B. Redis
C. Couchbase
C. Couchbase

25. Which NoSQL database integrates well with AWS and offers global tables?
A. Neo4j
B. DynamoDB
C. Redis
B. DynamoDB
