Comprehensive MCQs with Correct Answers Below (No 'Correct' Marks in Options)
EDA Chapter.pdf
1. What is the purpose of the `replace` method in the EDA script?
A. To visualize data
B. To drop columns
C. To convert categorical data into numerical
C. To convert categorical data into numerical
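Here is a minimal sketch of the encoding that the `replace` step performs. This quiz does not show the script's exact mapping, so the sketch assumes that binary `yes`/`no` labels become `1`/`0`. That assumption matches `default` being 0 for customers with no credit default. The sketch uses plain Python dicts instead of a pandas DataFrame. On a DataFrame, the same idea is roughly `df.replace({"yes": 1, "no": 0})`.

```python
# Hypothetical rows standing in for the bank dataset (column names from the quiz,
# values invented for illustration).
rows = [
    {"default": "no", "housing": "yes", "deposit": "yes"},
    {"default": "yes", "housing": "no", "deposit": "no"},
]

# Assumed mapping: binary yes/no labels become 1/0.
MAPPING = {"yes": 1, "no": 0}

def encode(row, mapping=MAPPING):
    """Return a copy of row with every mapped label replaced by its number;
    values not in the mapping are left unchanged."""
    return {col: mapping.get(val, val) for col, val in row.items()}

encoded = [encode(r) for r in rows]
print(encoded)
```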
2. What does the `corr()` method compute?
A. The average of numerical columns
B. The minimum and maximum of each column
C. The pairwise correlation between numerical columns
C. The pairwise correlation between numerical columns
3. What does the `annot=True` argument in `sns.heatmap` do?
A. Hides the labels
B. Adds numerical values to each cell
C. Changes the color map
B. Adds numerical values to each cell
4. Which feature in the dataset is most likely the target variable?
A. `default`
B. `housing`
C. `deposit`
C. `deposit`
5. Why are categorical columns converted to numerical values?
A. ML algorithms require numerical input
B. It looks better in visualizations
C. To reduce file size
A. ML algorithms require numerical input
6. What is the value of `default` when a customer has no credit default?
A. 0
B. 1
C. 2
A. 0
7. What is the default correlation method used by pandas?
A. Pearson
B. Kendall
C. Spearman
A. Pearson
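By default, `DataFrame.corr()` computes pairwise Pearson coefficients (`method='pearson'`; `'kendall'` and `'spearman'` are the alternatives). `sns.heatmap(..., annot=True)` then prints each coefficient inside its cell. For two columns, the Pearson coefficient is their covariance divided by the product of their standard deviations. The plain-Python sketch below applies that formula to invented values:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of co-deviations; denominator: product of root sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

age = [25, 35, 45, 55]              # hypothetical values
balance = [1000, 1500, 2100, 2400]
print(round(pearson(age, balance), 3))
```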
8. What library is used to plot the heatmap?
A. pandas
B. seaborn
C. numpy
B. seaborn
9. What is the main purpose of the Boruta algorithm?
A. Feature selection
B. Data cleaning
C. Model evaluation
A. Feature selection
10. What does `n_jobs=4` mean in the RandomForestClassifier?
A. Use 4 CPU cores
B. Use 4 datasets
C. Build 4 decision trees
A. Use 4 CPU cores
11. In Boruta, what are "shadow features"?
A. Randomly shuffled copies of real features
B. Features with missing values
C. Features that are always zero
A. Randomly shuffled copies of real features
12. Which features does Boruta mark as "confirmed"?
A. Those with the lowest correlation
B. Those more important than shadow features
C. Those that have zero values
B. Those more important than shadow features
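Boruta confirms a feature when the feature's importance beats that of shuffled "shadow" copies of the real features. The sketch below shows one round of that comparison, and it is deliberately simplified. It swaps in absolute Pearson correlation with the target as a stand-in importance score, whereas BorutaPy uses importances from a tree ensemble such as `RandomForestClassifier`. It also skips the repeated iterations and statistical test that the real algorithm performs. The function names and data are hypothetical.

```python
import math
import random

def importance(x, y):
    """Stand-in importance score: absolute Pearson correlation with the target.
    (BorutaPy itself uses feature importances from a tree ensemble.)"""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:          # a constant column carries no information
        return 0.0
    return abs(cov / (sx * sy))

def one_boruta_round(features, target, seed=0):
    """Return the names of features whose importance beats every shadow feature."""
    rng = random.Random(seed)
    shadow_scores = []
    for values in features.values():
        shadow = values[:]          # copy the real column...
        rng.shuffle(shadow)         # ...and shuffle it to break any link to the target
        shadow_scores.append(importance(shadow, target))
    threshold = max(shadow_scores)  # best score achieved by pure noise
    return [name for name, values in features.items()
            if importance(values, target) > threshold]

# Hypothetical data: a 0/1 label, one column that tracks it, one that does not.
data_rng = random.Random(42)
target = [i % 2 for i in range(40)]
features = {
    "signal": [5 * t + (i % 3) for i, t in enumerate(target)],
    "noise": [data_rng.random() for _ in range(40)],
}
print(one_boruta_round(features, target))
```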
13. What does `.drop(['deposit'], axis=1)` do?
A. Removes the 'deposit' column
B. Drops rows with missing values
C. Drops all numeric columns
A. Removes the 'deposit' column
14. What does `verbose=2` in BorutaPy indicate?
A. Detailed output
B. No output
C. Only final result
A. Detailed output
15. What kind of visualization is used to display feature rankings?
A. Line plot
B. Bar plot
C. Scatter plot
B. Bar plot
Chapter 8.pdf (Spark MLlib)
1. What does `SparkSession.builder` do?
A. Configures and creates a Spark session
B. Builds a dataset
C. Runs a machine learning model
A. Configures and creates a Spark session
2. Which column contains the target variable in the gold price dataset?
A. features
B. stock_index
C. gold_price
C. gold_price
3. What does `VectorAssembler` do?
A. Predicts prices
B. Combines columns into a single vector
C. Cleans missing data
B. Combines columns into a single vector
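In PySpark, `VectorAssembler(inputCols=[...], outputCol="features")` adds one vector column that packs the listed numeric columns together. It does this because MLlib estimators such as `LinearRegression` read their inputs from a single features column. The plain-Python sketch below imitates that row by row. The input column `oil_price` and all values are hypothetical. Real Spark stores the result as an ML `Vector` rather than a Python list.

```python
def assemble(rows, input_cols, output_col="features"):
    """Mimic VectorAssembler: copy each row and add one vector of the input columns."""
    out = []
    for row in rows:
        new_row = dict(row)                      # keep the original columns intact
        new_row[output_col] = [float(row[c]) for c in input_cols]
        out.append(new_row)
    return out

# Invented rows; "oil_price" is a hypothetical extra input column.
rows = [
    {"stock_index": 3200.0, "oil_price": 71.5, "gold_price": 1810.0},
    {"stock_index": 3150.0, "oil_price": 69.0, "gold_price": 1835.0},
]
assembled = assemble(rows, ["stock_index", "oil_price"])
print(assembled[0]["features"])
```

The label column (`gold_price`) stays outside the vector, so the model can learn to predict it from the features.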
4. What is the metric used to evaluate the model?
A. Mean Absolute Error
B. Accuracy
C. Root Mean Squared Error
C. Root Mean Squared Error
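`RegressionEvaluator` (in `pyspark.ml.evaluation`) defaults to `metricName="rmse"`. It also accepts values such as `"mse"`, `"mae"` and `"r2"`. Accuracy is a classification metric and does not apply to a continuous target like a price. The sketch below computes RMSE and MAE by hand on invented prices to show how the two differ. RMSE squares residuals before averaging, so it weights large errors more heavily.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [1800.0, 1820.0, 1790.0]      # invented gold prices
predicted = [1810.0, 1815.0, 1770.0]
print(rmse(actual, predicted), mae(actual, predicted))
```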
5. What command splits the dataset?
A. df.split()
B. df.divide()
C. df.randomSplit()
C. df.randomSplit()
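`df.randomSplit([0.8, 0.2], seed=42)` returns a list of DataFrames, one per weight. If the weights do not sum to 1, they are normalised. Rows are assigned randomly rather than by exact count, so the split sizes only approximate the ratio. The sketch below reproduces that idea in plain Python on a list. It is a conceptual model, not Spark's partition-level implementation, and the 0.8/0.2 weights are only an example.

```python
import random

def random_split(rows, weights, seed=None):
    """Assign each row independently to one split, with probability proportional
    to its weight (weights are normalised first)."""
    total = sum(weights)
    bounds, acc = [], 0.0
    for w in weights:                 # cumulative boundaries, e.g. [0.8, 1.0]
        acc += w / total
        bounds.append(acc)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()
        for i, b in enumerate(bounds):
            if u < b:
                splits[i].append(row)
                break
        else:                         # guard against float rounding in the last bound
            splits[-1].append(row)
    return splits

train, test = random_split(list(range(1000)), [0.8, 0.2], seed=42)
# split sizes are approximately, not exactly, proportional to the weights
print(len(train), len(test))
```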
6. Which column is created by `VectorAssembler`?
A. features
B. vector
C. labels
A. features
7. How is the Spark session stopped?
A. spark.halt()
B. spark.exit()
C. spark.stop()
C. spark.stop()
8. What library provides linear regression in Spark?
A. pyspark.ml.regression
B. pandas
C. matplotlib
A. pyspark.ml.regression
9. What does `.show()` do?
A. Displays data
B. Plots a graph
C. Exports a CSV
A. Displays data
10. What does the `predictionCol` in RegressionEvaluator specify?
A. The column with predicted values
B. The actual prices
C. Features vector
A. The column with predicted values
11. Which command displays the full feature vector?
A. show()
B. show(truncate=False)
C. display()
B. show(truncate=False)
12. Which component manages task scheduling in Spark?
A. MLlib
B. Spark Streaming
C. Spark Core
C. Spark Core
13. What training/testing split ratio does the tutorial use?
A. 80% training, 20% testing
B. 70% training, 30% testing
C. 50% training, 50% testing
A. 80% training, 20% testing
14. What does `df.select("features", "gold_price").show()` do?
A. Shows only gold_price
B. Displays features and gold_price
C. Shows only features
B. Displays features and gold_price
15. What is the input type for `createDataFrame`?
A. CSV file
B. List of tuples
C. JSON file
B. List of tuples
NoSQL_Databases_and_Data_Storage.pptx
1. Which of these is NOT a NoSQL database category?
A. Relational store
B. Document store
C. Column-family store
A. Relational store
2. Which of these stores data as JSON-like documents?
A. Column-family stores
B. Document stores
C. Key-value stores
B. Document stores
3. Which database type is optimized for relationship queries?
A. Key-value stores
B. Document stores
C. Graph databases
C. Graph databases
4. Which database is an example of a key-value store?
A. MongoDB
B. Redis
C. Neo4j
B. Redis
5. What does 'schema flexibility' mean?
A. Different structures for different documents
B. Only one table schema
C. No relationships allowed
A. Different structures for different documents
6. What is a major advantage of NoSQL databases?
A. Strict schema design
B. High performance and scalability
C. Limited scalability
B. High performance and scalability
7. Which system uses SSTables for storage?
A. MongoDB
B. Cassandra
C. Redis
B. Cassandra
8. Which NoSQL database uses Cypher query language?
A. Cassandra
B. Redis
C. Neo4j
C. Neo4j
9. In which use case is a document store preferred?
A. Product catalog
B. Fraud detection
C. Timeline caching
A. Product catalog
10. What is a key feature of Redis?
A. In-memory data storage
B. Relational table joins
C. Graph algorithms
A. In-memory data storage
11. Which of the following databases offers multi-model support?
A. Cassandra
B. Neo4j
C. ArangoDB
C. ArangoDB
12. What is the primary data structure in column-family stores?
A. JSON documents
B. Column families
C. Key-value pairs
B. Column families
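In a Bigtable/HBase-style column-family store, each row key maps to one or more column families. Each family holds its own set of columns, and rows need not share the same columns. The plain-Python sketch below models that logical layout with nested dicts. The table, families and values are hypothetical, and real stores also keep timestamps or versions per cell.

```python
# Hypothetical table: row key -> column family -> column qualifier -> value.
# Rows may store different columns inside the same family (sparse storage).
users = {
    "user:1": {
        "profile": {"name": "Ann", "city": "Oslo"},
        "activity": {"last_login": "2024-01-05"},
    },
    "user:2": {
        "profile": {"name": "Bob"},   # no "city" column stored at all
    },
}

def get_cell(table, row_key, family, column, default=None):
    """Read one cell; a missing row, family, or column falls back to default."""
    return table.get(row_key, {}).get(family, {}).get(column, default)

print(get_cell(users, "user:1", "profile", "city"))   # Oslo
print(get_cell(users, "user:2", "profile", "city"))   # None
```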
13. What is a major challenge with NoSQL databases?
A. High cost
B. No horizontal scaling
C. Consistency models can be complex
C. Consistency models can be complex
14. Which architecture component is used in MongoDB for query routing?
A. mongos
B. mongod
C. config servers
A. mongos
15. Which NoSQL database is designed for high write throughput?
A. Cassandra
B. Neo4j
C. MongoDB
A. Cassandra
16. Which of these is an example of a real-world use of graph databases?
A. Time-series sensor data
B. Social networks and relationships
C. Shopping carts
B. Social networks and relationships
17. What is a core advantage of key-value stores?
A. Flexible document querying
B. High relationship traversal performance
C. Extremely fast read/write performance
C. Extremely fast read/write performance
18. What format is often used by document stores to store data?
A. CSV
B. JSON or BSON
C. HTML
B. JSON or BSON
19. What does the "tunable consistency" in column-family stores refer to?
A. Schema migration
B. Configuring consistency vs availability
C. Ability to change database types
B. Configuring consistency vs availability
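Dynamo-style column-family stores such as Cassandra tune consistency per request. Each request chooses how many of the N replicas must acknowledge a write (W) and how many must answer a read (R). If R + W > N, every read quorum overlaps the most recent write quorum. Smaller values give up that guarantee in exchange for lower latency and higher availability. The sketch below is a minimal version of this rule. The replica counts are illustrative, and real behaviour also depends on failures and conflict resolution.

```python
def quorum(n_replicas):
    """Majority of replicas, as used by a QUORUM consistency level: floor(N/2) + 1."""
    return n_replicas // 2 + 1

def reads_see_latest_write(n_replicas, read_acks, write_acks):
    """True when any read quorum must overlap any write quorum (R + W > N)."""
    return read_acks + write_acks > n_replicas

n = 3  # hypothetical replication factor
print(quorum(n))                                        # 2
print(reads_see_latest_write(n, quorum(n), quorum(n)))  # True  (QUORUM / QUORUM)
print(reads_see_latest_write(n, 1, 1))                  # False (ONE / ONE)
```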
20. Which NoSQL database architecture uses HMaster and RegionServers?
A. HBase
B. Cassandra
C. Redis
A. HBase
21. In Redis, what operation would you use to automatically remove data after a set time?
A. hset
B. setex
C. lpush
B. setex
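Redis's `SETEX key seconds value` stores a value together with an expiry time, after which the key disappears. By contrast, `HSET` writes hash fields and `LPUSH` prepends to a list, and neither sets an expiry. The sketch below mimics SETEX/GET behaviour with a plain Python dict. It uses an injectable clock so that expiry can be shown without sleeping. The `TTLStore` class is hypothetical and expires keys only lazily on read, whereas Redis also removes expired keys in the background.

```python
import time

class TTLStore:
    """Tiny in-memory key-value store with per-key expiry, loosely modelled on
    Redis SETEX/GET (expired keys are dropped lazily when read)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._data = {}          # key -> (value, expires_at)

    def setex(self, key, seconds, value):
        if seconds <= 0:
            raise ValueError("expire time must be positive")
        self._data[key] = (value, self._clock() + seconds)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:   # expired: behave as if the key never existed
            del self._data[key]
            return None
        return value

# Usage with a fake clock
now = [0.0]
store = TTLStore(clock=lambda: now[0])
store.setex("session:42", 30, "alice")
print(store.get("session:42"))   # alice
now[0] = 31.0
print(store.get("session:42"))   # None
```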
22. What is a feature of MongoDB's aggregation framework?
A. Pipeline-based data processing
B. Graph-based traversals
C. Vector transformation
A. Pipeline-based data processing
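MongoDB's aggregation framework passes documents through an ordered list of stages, and each stage consumes the previous stage's output. For example, the pipeline `[{"$match": {"status": "shipped"}}, {"$group": {"_id": "$category", "total": {"$sum": "$price"}}}, {"$sort": {"total": -1}}]` filters, then groups, then sorts. The plain-Python sketch below emulates those three stages on hypothetical order documents to show the data flow. It is not the MongoDB or PyMongo API.

```python
from collections import defaultdict

# Hypothetical order documents
orders = [
    {"category": "books", "price": 12.0, "status": "shipped"},
    {"category": "games", "price": 60.0, "status": "shipped"},
    {"category": "books", "price": 8.0, "status": "shipped"},
    {"category": "games", "price": 45.0, "status": "cancelled"},
]

def match(docs, **criteria):
    """Like $match with simple equality conditions."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

def group_sum(docs, key, field):
    """Like {$group: {_id: "$key", total: {$sum: "$field"}}}."""
    totals = defaultdict(float)
    for d in docs:
        totals[d[key]] += d[field]
    return [{"_id": k, "total": v} for k, v in totals.items()]

def sort_desc(docs, field):
    """Like {$sort: {field: -1}}."""
    return sorted(docs, key=lambda d: d[field], reverse=True)

# Each stage's output feeds the next, as in an aggregation pipeline.
result = sort_desc(group_sum(match(orders, status="shipped"), "category", "price"), "total")
print(result)  # [{'_id': 'games', 'total': 60.0}, {'_id': 'books', 'total': 20.0}]
```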
23. Which of these is NOT a feature of graph databases?
A. Path finding
B. Automatic caching of queries
C. Community detection
B. Automatic caching of queries
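Path finding is a core graph-database query. In Neo4j's Cypher, it is typically written with `shortestPath(...)`, for example `MATCH p = shortestPath((a:Person {name: 'Ann'})-[:KNOWS*]-(b:Person {name: 'Dan'})) RETURN p`. When relationships are unweighted, breadth-first search finds such a path. The plain-Python sketch below runs a breadth-first search on a hypothetical `KNOWS` graph:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search over an unweighted adjacency list; returns the node
    list of one shortest path, or None if goal is unreachable."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                if nxt == goal:
                    # walk parent links back to the start, then reverse
                    path = [goal]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None

# Hypothetical undirected KNOWS relationships (listed in both directions)
knows = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Dan"],
    "Cara": ["Ann"],
    "Dan": ["Bob"],
}
print(shortest_path(knows, "Ann", "Dan"))  # ['Ann', 'Bob', 'Dan']
```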
24. Which database is known for a memory-first architecture and built-in caching layer?
A. Neo4j
B. Redis
C. Couchbase
C. Couchbase
25. Which NoSQL database integrates well with AWS and offers global tables?
A. Neo4j
B. DynamoDB
C. Redis
B. DynamoDB