Neelam VidyaVihar, Sijoul, Mailam, Madhubani, Bihar – 847235
Website: http://www.sandipuniversity.edu.in Email: [email protected]
SCHOOL OF COMPUTER SCIENCE AND ENGINEERING
Course Name: Data Analytics Code: CS803T
Class: B.Tech Session: 2020-2024
Year: 2023-2024 Semester: VIII
Question Bank
MCQs
1. What is the primary goal of data analysis?
a) Making predictions
b) Summarizing and interpreting data
c) Designing databases
d) Data visualization
2. What is the difference between mean and median?
a) Mean is the middle value; median is the average
b) Mean is the average; median is the middle value
c) Both represent the average
d) Mean and median are used interchangeably
3. What does the term "outlier" mean in data analysis?
a) The most common value in a dataset
b) Unusual or extreme values in a dataset
c) The difference between mean and median
d) The last value in a sorted dataset
4. Which is the most significant language for Data Science?
a) R
b) Ruby
c) Java
d) None of these
5. Data whose size is measured in ____ bytes is called big data.
a) Meta
b) Giga
c) Peta
d) Tera
6. A graph that uses vertical bars to represent data is called a ____.
a) Bar graph
b) Line graph
c) Scatterplot
d) All of these
7. The mean of 15, 20 and 25 is
a) 15
b) 30
c) 20
d) 60
8. Pig is a
a) Programming language
b) Data Flow Language
c) Query language
d) Database
9. The results of a Hive query can be stored as
a) Local File
b) HDFS File
c) Both
d) None of these
10. Which kinds of keys (constraints) can Hive have?
a) Primary keys
b) Foreign keys
c) Unique keys
d) None of these
11. Which of the following is a data visualization method?
a) Line
b) Circle
c) Triangle
d) Pie chart and bar chart
12. A dashboard is a collection of ___.
a) Views
b) Hash function
c) Both A and B
d) None of these
13. Find the mean of 12, 20 and 28.
a) 30
b) 20
c) 60
d) 15
14. Big data analytics is the process of analyzing
a) Data
b) Information
c) Large and complex data set
d) None of these
15. By default, when a database is dropped in Hive,
a) The tables are also deleted
b) The directory is deleted if there are no tables
c) The HDFS blocks are formatted
d) None of these
16. What are the advantages of HDFS federation in Hadoop?
a) Isolation
b) Namespace scalability
c) Improves throughput
d) All of the above
17. Find the median of the data 2, 5, 3, 4 and 6.
a) 2
b) 3
c) 4
d) 6
18. What is the significance of the term "standard deviation"?
a) Measuring central tendency
b) Describing the spread or dispersion of data
c) Identifying the most frequent value
d) Representing the range of values
19. What does a histogram visualize?
a) Relationships between two variables
b) Distribution of a single variable
c) Hierarchical data structures
d) Time-series data
20. Which of the following analytical and statistical techniques do data scientists commonly use?
a) Classification
b) Regression
c) Clustering
d) All of the above
21. What is the purpose of a histogram in data visualization?
a) Displaying hierarchical relationships
b) Comparing multiple datasets
c) Showing the distribution of a single variable
d) Representing geographical data
22. How is the range of a dataset calculated?
a) The difference between the largest and smallest values
b) The sum of all values
c) The product of all values
d) The average of all values
23. What is the use of data cleaning?
a) To remove the noisy data
b) Transformations to correct the wrong data
c) Correct the inconsistencies in data
d) All of the above
24. Find the mean of the data 25, 30, 20 and 13.
a) 20
b) 25
c) 22
d) 30
25. A view in Hive can be seen by using
a) SHOW TABLES
b) SHOW VIEWS
c) DESCRIBE VIEWS
d) VIEW VIEWS
26. During the execution of a MapReduce job, on which node is sorting done?
a) Reducer node
b) Mapper node
c) Both
d) None of these
27. Where is data warehousing used?
a) Transaction system
b) Logical system
c) Decision support system
d) None of these
28. Find the median of the data 3, 4, 5, 3, 1, 2 and 7.
a) 2
b) 3
c) 4
d) 5
29. Expand DSP
a) Digital signal processing
b) Data science programming
c) Data science process
d) Data science plan
30. Which of the intricate techniques is not used for data visualization?
a) Bullet Graphs
b) Bubble Clouds
c) Fever Maps
d) Heat Maps
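The arithmetic MCQs above (7, 13, 17, 24 and 28) can be checked mechanically. The following is a minimal Python sketch, not part of the original question bank, that uses the standard library's statistics module. The names mcq_data, solve and results are illustrative assumptions.

```python
from statistics import mean, median

# Datasets copied from MCQs 7, 13, 17, 24 and 28 above.
# Each entry maps a question number to (measure asked for, data values).
mcq_data = {
    7: ("mean", [15, 20, 25]),
    13: ("mean", [12, 20, 28]),
    17: ("median", [2, 5, 3, 4, 6]),
    24: ("mean", [25, 30, 20, 13]),
    28: ("median", [3, 4, 5, 3, 1, 2, 7]),
}


def solve(kind, values):
    """Return the mean (sum / count) or the median (middle of the sorted values)."""
    return mean(values) if kind == "mean" else median(values)


# Compute every answer once so it can be printed or compared with the options.
results = {q: solve(kind, values) for q, (kind, values) in mcq_data.items()}

for q, value in results.items():
    print(f"Q{q}: {mcq_data[q][0]} = {value}")
```

Note that median sorts the values first, so the order in which a question lists the data does not matter.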
Short Q&A
1. Explain the 4 V’s of big data with examples.
2. Write down about popular data science toolkits.
3. Find the mean, median and mode of the given data: 4, 8, 12, 12 and 16.
4. Write down about Challenges of Big Data Visualization.
5. Write down about MapReduce.
6. Explain terminology related with data science.
7. Write about cloud computing.
8. Write down about probability and frequency with examples.
9. Explain the 3 V’s of big data with examples.
10. Find the mean, median and mode of the given data: 4, 8, 8, 12 and 18.
11. Explain about Measures Of Central Tendency.
12. Write down about HIVE.
13. Explain about benefits of cloud computing.
14. Write down about big data and eGovernance.
15. Explain recent trends in various data collection and analysis techniques.
16. Explain the data science process (DSP).
17. Write down about Normal distribution curve.
18. Find the mean, median and mode of the given data: 20, 8, 12, 20 and 10.
19. Write down about Benefits of Big data.
20. Explain about Hadoop.
21. Differentiate between data warehouse and data lake.
22. Briefly explain data repository.
23. Define cloud computing, data mining and machine learning.
24. Explain the types of data with examples.
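Short questions 3, 10 and 18 above all ask for the same three measures of central tendency. As a hedged illustration, not an official answer key, the sketch below computes them with Python's standard statistics module. The names datasets, central_tendency and summary are assumptions introduced only for this example.

```python
from statistics import mean, median, mode

# Data copied from short questions 3, 10 and 18 above.
datasets = {
    3: [4, 8, 12, 12, 16],
    10: [4, 8, 8, 12, 18],
    18: [20, 8, 12, 20, 10],
}


def central_tendency(values):
    """Return the mean, median and mode of a list of numbers."""
    return {
        "mean": mean(values),      # sum of the values divided by their count
        "median": median(values),  # middle value after sorting
        "mode": mode(values),      # most frequently occurring value
    }


summary = {q: central_tendency(values) for q, values in datasets.items()}

for q, stats in summary.items():
    print(f"Q{q}: {stats}")
```

Each of these datasets has exactly one most frequent value, so mode is unambiguous here. For data with several equally frequent values, statistics.multimode returns all of them.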
Long Q&A
1. Explain about Big data architecture with diagram.
2. Explain the data science process (DSP).
3. Explain about Visualization Methods.
4. Write down about problems of estimation (population or sample) and the normal distribution curve.
5. Write down about types of big data analytics with proper examples.
6. Explain about Hive Architecture with diagram.
7. Explain Types of Statistical Data with examples.
8. Write down about Different Types of Statistical Means.
9. Write down about components of Hadoop framework.
10. Write down about Various Big Data Visualization Tools.
11. Explain data visualization and its importance.
12. Write down about Pig, Hadoop and Hive.
13. Write down about Pig vs MapReduce.
14. Explain about Popular Data Science Toolkits.
Subject In-charge: Mr. Rajiv Ranjan Mishra
HOD: Mr. S. K. Singh