1. The size of big data
a) GB
b) KB
c) MB
d) Petabytes

2. These data come from many sources like
a) Facebook
b) Google
c) Amazon
d) All of these

3. Satellite gives
a) huge data
b) big data
c) Minimum data
d) Both a and b

4. Petabyte =
a) 10^15
b) 10^17
c) 10^18
d) 10^19

5. Exabyte =
a) 10^15
b) 10^17
c) 10^18
d) 10^19

6. Zettabyte =
a) 10^15
b) 10^17
c) 10^18
d) 10^21

7. Yottabyte =
a) 10^21
b) 10^22
c) 10^23
d) 10^24
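Questions 4 to 7 use the decimal (SI) byte prefixes, where each step up multiplies the size by 1000. The short Python sketch below tabulates them and converts between units. It assumes the decimal convention (the binary convention, e.g. 1 PiB = 2^50 bytes, gives different numbers), and the names `UNIT_EXPONENT`, `unit_in_bytes` and `convert` are illustrative, not from any library.

```python
# Decimal (SI) byte prefixes: each step up multiplies the size by 1000.
UNIT_EXPONENT = {
    "KB": 3,   # kilobyte
    "MB": 6,   # megabyte
    "GB": 9,   # gigabyte
    "TB": 12,  # terabyte
    "PB": 15,  # petabyte
    "EB": 18,  # exabyte
    "ZB": 21,  # zettabyte
    "YB": 24,  # yottabyte
}


def unit_in_bytes(unit: str) -> int:
    """Number of bytes in one decimal unit, e.g. 'PB' -> 10**15."""
    return 10 ** UNIT_EXPONENT[unit]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity from one decimal byte unit to another."""
    return value * unit_in_bytes(from_unit) / unit_in_bytes(to_unit)


if __name__ == "__main__":
    for unit in ("PB", "EB", "ZB", "YB"):
        print(f"1 {unit} = 10^{UNIT_EXPONENT[unit]} bytes")
    print(convert(4.75, "EB", "PB"), "PB")  # 4.75 EB expressed in petabytes
```

For example, `convert(4.75, "EB", "PB")` gives 4750 petabytes, which helps when comparing the quantities in questions 9 to 11.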

8. Padma =
a) 10^21
b) 10^22
c) 10^23
d) 10^32

9. 2.5 petabytes =
a) human brain capacity
b) animal's brain capacity
c) downloaded documents of 1 day
d) None of these

10. 4.75 exabytes =
a) genome sequences of 7 billion people (N)
b) genome sequences of 8 billion people (N)
c) genome sequences of 9 billion people (N)
d) genome sequences of 10 billion people (N)

11. 422 exabytes =
a) Total digital data created in 2008 (N)
b) Total digital data created in 2009 (N)
c) Total digital data created in 2010 (N)
d) Total digital data created in 2011 (N)

12. Facebook can generate approximately
a) 10 billion messages (N)
b) A billion messages (N)
c) 20 billion messages (N)
d) 30 billion messages (N)

13. Big Data can be
a) structured (N)
b) unstructured (N)
c) semi-structured (N)
d) All of these (N)

14. Structured data
a) Tabular form (N)
b) Non-tabular form (N)
c) variable form (N)
d) None of these (N)

15. Any data with unknown form
a) structured (N)
b) Unstructured (N)
c) semi-structured (N)
d) None of these (N)

16. Unstructured data is
a) heterogeneous data source (N)
b) non-heterogeneous data source (N)
c) fixed format (N)
d) None of these (N)

17. Semi-structured data is data represented in an
a) XML file (N)
b) HTML file (N)
c) Node JS file (N)
d) None of these (N)

18. Web server logs are an example of
a) Structured data (N)
b) unstructured data (N)
c) Quasi-structured Data (N)
d) None of these (N)

19. Veracity means
a) Filter (N)
b) Reliable (N)
c) manage data (N)
d) All of these (N)

20. The primary aspect of Big Data is to provide
a) demanding data rapidly (N)
b) demanding data slowly (N)
c) no demanding data (N)
d) None of these (N)

21. Big data velocity deals with
a) business processes (N)
b) sensors (N)
c) mobile devices (N)
d) All of these (N)

22. Big Data approaches are reducing the
a) costs of data management (N)
b) data quality (N)
c) standardization (N)
d) None of these (N)

23. Big data focuses on
a) Identify the data we have (N)
b) Identify the data we need (N)
c) what you want to achieve (N)
d) All of these (N)

24. Aspects of the appropriateness of big data
a) organizational fitness (N)
b) suitability of the business challenge (N)
c) big data’s contribution to the organization (N)
d) All of these (N)

25. Most big data applications achieve their performance through
a) computing resources (N)
b) collection of storage (N)
c) Both collection of storage and resources (N)
d) None of these (N)

26. A big data application is directly dependent on
a) Awareness of the architecture of the computing platform (N)
b) Hardware (N)
c) software (N)
d) All of these

27. Processing capability, often referred to as a
a) CPU (N)
b) processor (N)
c) node (N)
d) All of these (N)

28. Most single node machines have a limit to the amount of
a) memory (N)
b) XML file (N)
c) XML file and memory (N)
d) None of these (N)

29. Storage provides
a) persistence of data (N)
b) non-demanding data (N)
c) demanding data (N)
d) None of these (N)

30. Network provides
a) persistence of data (N)
b) pipes (N)
c) demanding data (N)
d) All of these (N)

31. A master job manager oversees
a) the pool of processing nodes (N)
b) assigns tasks (N)
c) monitors the activity (N)
d) All of these (N)

32. A storage manager oversees
a) the data storage pool (N)
b) distributes datasets (N)
c) Both the data storage pool and distributes datasets (N)
d) None of these (N)

33. Threads process data
a) local (N)
b) close (N)
c) minimize the costs of data access latency (N)
d) All of these (N)

34. Hadoop comes with a
a) distributed file system (N)
b) Centralized file system (N)
c) Both distributed and central (N)
d) None of these (N)

35. HDFS provides
a) difficult access (N)
b) easier access (N)
c) Both easier and difficult access (N)
d) None of these (N)

36. HDFS is highly
a) fault tolerant (N)
b) designed using low-cost hardware (N)
c) Both fault tolerant and designed using low-cost hardware (N)
d) None of these (N)

37. HDFS provides file
a) permissions (N)
b) authentication (N)
c) Both permissions and authentication (N)
d) None of these (N)

38. The default block size of HDFS is
a) 64 MB (Y)
b) 64 petabytes (N)
c) 64 TB (N)
d) 64 GB (N)
Note: 64 MB was the default in Hadoop 1.x; Hadoop 2.x and later default to 128 MB (dfs.blocksize).
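For question 38, it can help to see how the block size determines how a file is split. This minimal sketch assumes sizes in whole megabytes, the 64 MB block size used in the question (Hadoop 2.x and later default `dfs.blocksize` to 128 MB) and the usual default replication factor of 3 (`dfs.replication`). The helper names `blocks_needed` and `raw_storage_mb` are hypothetical.

```python
def blocks_needed(file_size_mb: int, block_size_mb: int = 64) -> int:
    """Number of HDFS blocks a file is split into (ceiling of size / block size).

    The last block may be only partly filled; HDFS does not pad it on disk.
    """
    if file_size_mb <= 0:
        return 0
    return -(-file_size_mb // block_size_mb)  # integer ceiling division


def raw_storage_mb(file_size_mb: int, replication: int = 3) -> int:
    """Disk space used across the cluster when every block is stored `replication` times."""
    return file_size_mb * replication


if __name__ == "__main__":
    print(blocks_needed(200))       # 64 MB blocks (Hadoop 1.x default)
    print(blocks_needed(200, 128))  # 128 MB blocks (Hadoop 2.x+ default)
    print(raw_storage_mb(200))      # with the default replication factor of 3
```

A 200 MB file therefore occupies 4 blocks of 64 MB (the last one partly filled) or 2 blocks of 128 MB, and about 600 MB of raw disk once each block is replicated three times.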

39. DataNodes perform
a) read-write operations (N)
b) read operations (N)
c) write operations (N)
d) None of these (N)

40. The NameNode performs
a) Manages the file system namespace (N)
b) Regulates client’s access to files (N)
c) renaming (N)
d) All of these (N)

41. The NameNode effectively coordinates
a) distributed data nodes (N)
b) data nodes (N)
c) central data nodes (N)
d) None of these (N)

42. Map functions is/are
a) computation (N)
b) analysis (N)
c) set pairs (N)
d) All of these (Y)

43. The tasks that can be executed in MapReduce are
a) parallel (N)
b) sequential (N)
c) logical (N)
d) None of these (N)

44. What are the components of big data?
a) MapReduce (N)
b) HDFS (N)
c) YARN (N)
d) All of these (N)

45. What are the 4 V's in big data?
a) volume (N)
b) velocity (N)
c) variety (N)
d) All of these (N)

46. The world's largest Hadoop cluster
a) Facebook (N)
b) Apple (N)
c) Datamatics (N)
d) None of these (N)

47. ______ have a structure but cannot be stored in a database
a) Structured (N)
b) Semi-structured (N)
c) Unstructured (N)
d) None of these (N)

48. ______ refers to the ability to turn your data into something useful for business
a) Velocity (N)
b) Variety (N)
c) Value (N)
d) Volume (N)

49. ______ is an open source framework for storing data and running applications on clusters of commodity hardware.
a) HDFS (N)
b) Hadoop (N)
c) MapReduce (N)
d) Cloud (N)

50. ______ is one of the factors considered before adopting Big Data technology
a) Validation (N)
b) Verification (N)
c) Data (N)
d) Design (N)

51. ______ takes the grouped key-value paired data as input and runs a Reducer function on each one of them
a) MAPPER (N)
b) REDUCER (N)
c) COMBINER (N)
d) PARTITIONER (N)

52. Which of the following are Decision Tree nodes?
a) Decision Nodes (N)
b) End Nodes (N)
c) Chance Nodes (N)
d) All of Above (N)

53. A ______ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
a) Decision tree (N)
b) Graphs (N)
c) Trees (N)
d) Neural Networks (N)

54. Decision Nodes are represented by
a) Disks (N)
b) Squares (N)
c) Circles (N)
d) Triangles (N)

55. Chance Nodes are represented by
a) Disks (N)
b) Squares (N)
c) Circles (N)
d) Triangles (N)

56. End Nodes are represented by
a) Disks (N)
b) Squares (N)
c) Circles (N)
d) Triangles (N)

57. Which of the following is/are the advantage(s) of Decision Trees?
a) Possible scenarios can be added (N)
b) Use a white box model, if a given result is provided by a model (N)
c) Worst, best and expected values can be determined for different scenarios (N)
d) All of Above (N)
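Questions 52 to 57 describe decision trees built from decision, chance and end nodes. The sketch below rolls a small tree back to its expected value and also reports the worst and best reachable payoffs, the "worst, best and expected values" advantage listed in question 57. The tuple encoding of nodes, the function names and the example probabilities and payoffs are assumptions made for illustration.

```python
# Node encoding used in this sketch (an illustrative choice):
#   ("end", payoff)                     end node (drawn as a triangle)
#   ("chance", [(probability, child)])  chance node (drawn as a circle)
#   ("decision", {option: child})       decision node (drawn as a square)


def expected_value(node) -> float:
    """Roll the tree back from the end nodes to get an expected value."""
    kind, body = node
    if kind == "end":
        return body
    if kind == "chance":
        # Probability-weighted average of the outcomes.
        return sum(p * expected_value(child) for p, child in body)
    if kind == "decision":
        # The decision maker picks the option with the highest expected value.
        return max(expected_value(child) for child in body.values())
    raise ValueError(f"unknown node kind: {kind}")


def payoff_range(node) -> tuple:
    """(worst, best) end-node payoff reachable from this node."""
    kind, body = node
    if kind == "end":
        return (body, body)
    children = [child for _, child in body] if kind == "chance" else list(body.values())
    ranges = [payoff_range(child) for child in children]
    return (min(lo for lo, _ in ranges), max(hi for _, hi in ranges))


def best_option(decision_node) -> str:
    """Name of the option with the highest expected value at a decision node."""
    _, options = decision_node
    return max(options, key=lambda name: expected_value(options[name]))


# Hypothetical example: launch a product (60% chance of success) or do nothing.
TREE = ("decision", {
    "launch": ("chance", [(0.6, ("end", 100)), (0.4, ("end", -40))]),
    "do not launch": ("end", 0),
})

if __name__ == "__main__":
    print(best_option(TREE), expected_value(TREE), payoff_range(TREE))
```

Here launching has an expected value of 0.6 × 100 + 0.4 × (-40) = 44, against 0 for not launching, while the payoff range (-40, 100) shows the worst and best cases.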

58. Classification is
a) data mining process (Y)
b) not data mining process (N)
c) clustering process (N)
d) None of these (N)

59. Which of the following statements about Naive Bayes is incorrect?
a) Attributes are equally important (N)
b) Attributes are statistically dependent on one another given the class value (N)
c) Attributes are statistically independent of one another given the class value (N)
d) Attributes can be nominal or numeric (N)

60. How can a Bayesian network be used to answer any query?
a) Full distribution (N)
b) Joint distribution (Y)
c) Partial distribution (N)
d) All of these (N)
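Question 60 refers to answering queries from the joint distribution. In a Bayesian network, each entry of the full joint distribution is the product of every node's probability given its parents, and any query can be answered by summing the matching entries (inference by enumeration, which is practical only for small networks). The three-node network, its probabilities and the function names below are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (Rain and Sprinkler have no parents).
P_RAIN = Fraction(1, 5)
P_SPRINKLER = Fraction(1, 10)
# Conditional probability table: P(WetGrass = True | Rain, Sprinkler)
P_WET = {
    (True, True): Fraction(1),
    (True, False): Fraction(4, 5),
    (False, True): Fraction(9, 10),
    (False, False): Fraction(0),
}
NAMES = ("rain", "sprinkler", "wet")


def bernoulli(p_true, value):
    """P(X = value) for a Boolean variable with P(X = True) = p_true."""
    return p_true if value else 1 - p_true


def joint(rain, sprinkler, wet):
    """One entry of the full joint distribution: product of P(node | parents)."""
    return (bernoulli(P_RAIN, rain)
            * bernoulli(P_SPRINKLER, sprinkler)
            * bernoulli(P_WET[(rain, sprinkler)], wet))


def query(target, evidence):
    """P(target = True | evidence), by summing matching joint entries (enumeration)."""
    numerator = denominator = Fraction(0)
    for values in product((True, False), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != val for name, val in evidence.items()):
            continue  # this assignment contradicts the evidence
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print(query("rain", {"wet": True}))  # 41/59
```

Exact fractions are used so the result can be checked by hand: the worlds with wet grass sum to 59/250, of which 41/250 also have rain.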

61. What is the consequence between a node and its predecessors while creating a Bayesian network?
a) Functionally dependent (N)
b) Dependent (N)
c) Conditionally independent (N)
d) Both Conditionally dependent & Dependent (N)

62. Node is
a) A component of a network (N)
b) In the context of KDD and data mining, this refers to random errors in a database table. (N)
c) One of the defining aspects of a data warehouse (N)
d) None of these (N)

63. The classification method in which the upper limit of one class interval is the same as the lower limit of the next class interval is called
a) exclusive method (Y)
b) inclusive method (N)
c) mid point method (N)
d) None of these (N)

64/63. Which of the following is true about Naive Bayes?
a) Assumes that all the features in a dataset are equally important (N)
b) Assumes that all the features in a dataset are independent (N)
c) both (N)
d) None of these (N)
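Questions 59 and 64 turn on the Naive Bayes assumption that attributes are conditionally independent given the class. The minimal categorical classifier below multiplies one factor per attribute, which is exactly where that assumption is used. Add-one (Laplace) smoothing, the function names and the toy weather data are choices made for this sketch, not part of the questions.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    # value_counts[cls][i][v]: times attribute i had value v among rows of class cls
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[cls][i][value] += 1
    # Number of distinct values per attribute, used for add-one smoothing.
    n_distinct = [len({row[i] for row in rows}) for i in range(len(rows[0]))]
    return class_counts, value_counts, n_distinct


def predict(model, row):
    """Return the class with the highest P(class) * prod_i P(x_i | class)."""
    class_counts, value_counts, n_distinct = model
    total = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for i, value in enumerate(row):
            # Naive assumption: attributes are conditionally independent given the
            # class, so each contributes its own factor. Add-one (Laplace)
            # smoothing keeps unseen values from zeroing out the product.
            score *= (value_counts[cls][i][value] + 1) / (n_cls + n_distinct[i])
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Hypothetical toy data: (outlook, temperature) -> play?
ROWS = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
LABELS = ["no", "no", "yes", "yes", "yes", "yes"]
MODEL = train(ROWS, LABELS)

if __name__ == "__main__":
    print(predict(MODEL, ("sunny", "hot")))
    print(predict(MODEL, ("rainy", "cool")))
```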

65. Categories of clustering methods
a) Partitioning methods (N)
b) Hierarchical methods (N)
c) Grid based methods (N)
d) All of these (N)

66. Requirements of cluster analysis
a) Scalability (N)
b) Ability to deal with noisy data (N)
c) Minimal requirements for domain knowledge to determine input parameters (N)
d) All of these (N)

67. Bayesian Classifiers are
a) statistical classifiers (N)
b) predict class (N)
c) both (N)
d) None of these (N)

68. Where can Bayes' rule be used?
a) Solving queries (N)
b) Increasing complexity (N)
c) Decreasing complexity (N)
d) Answering probabilistic query (N)

69. Which of the following is not an example of Social Media?
a) Twitter (N)
b) Google (N)
c) Insta (N)
d) YouTube (N)

70/68. By 2027, the volume of data produced digitally will reach
a) TB (N)
b) YB (N)
c) ZB (N)
d) EB (N)

71. Which of the following options is not an example of NoSQL?
a) Google (N)
b) Netflix (N)
c) Amazon (N)
d) CERN (N)

72. What are the different features of Big Data Analysis?
a) Open-Source (N)
b) Scalability (N)
c) Data Recovery (N)
d) All the above (N)

73. In the content-based approach, the problem is/are
a) Finding the appropriate features is hard (N)
b) Recommendations for new users (N)
c) Both Finding the appropriate features is hard and Recommendations for new users (N)
d) None of these (N)

74. Can a decision tree be used for clustering?
a) possible (N)
b) impossible (N)
c) impossible in some scenario (N)
d) None of these (N)

75. There are ______ major classifications of collaborative filtering mechanisms
a) 1 (N)
b) 2 (N)
c) 3 (N)
d) None of these (N)

76. ______ recommend items based on similarity measures between users and/or items
a) content based systems (N)
b) hybrid system (N)
c) collaborative filtering system (N)
d) none of these (N)
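Question 76 concerns recommending items from the similarity between users. The sketch below is a minimal user-based collaborative filter: cosine similarity over co-rated items, then a similarity-weighted average of other users' ratings. The ratings data and the names `cosine_similarity` and `predict_rating` are hypothetical; real systems often mean-centre ratings (Pearson correlation) before comparing users.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data.
RATINGS = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob": {"item1": 4, "item2": 2, "item3": 4, "item4": 5},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users, computed over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(user, item, ratings=RATINGS):
    """Similarity-weighted average of other users' ratings for `item` (None if unrated)."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    print(predict_rating("alice", "item4"))
```

Alice's tastes are closer to Bob's than to Carol's, so her predicted rating for item4 lands nearer Bob's 5 than Carol's 2 (about 3.8).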

77. Association rules are sometimes referred to as
a) market basket analysis (N)
b) itemset filtering (N)
c) frequent item set analysis (N)
d) none of these (N)
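Association rules (question 77, and questions 127 and 128 later on) are usually scored by support and confidence. This sketch computes both on a handful of hypothetical market baskets and finds frequent item pairs by brute-force counting, which is simpler than, and not the same as, the Apriori algorithm. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical market-basket transactions (one set of items per checkout).
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, baskets=BASKETS):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(itemset <= basket for basket in baskets), len(baskets))


def confidence(antecedent, consequent, baskets=BASKETS):
    """Confidence of the rule antecedent -> consequent: support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def frequent_pairs(min_support, baskets=BASKETS):
    """Brute-force count of every item pair; keep pairs meeting min_support."""
    counts = Counter(pair for basket in baskets
                     for pair in combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: Fraction(c, n) for pair, c in counts.items()
            if Fraction(c, n) >= min_support}


if __name__ == "__main__":
    print(support({"diapers", "beer"}))       # 3/5
    print(confidence({"diapers"}, {"beer"}))  # 3/4
    print(sorted(frequent_pairs(Fraction(3, 5))))
```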

78. ______ maps input key/value pairs to a set of intermediate key/value pairs
a) Mapper (N)
b) Reducer (N)
c) Both Mapper and Reducer (N)
d) None of the above (N)
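Questions 51, 78 and 112 describe the Map and Reduce tasks. The single-process Python sketch below imitates the word-count flow: the mapper emits intermediate (word, 1) pairs, a shuffle groups them by key, and the reducer sums each group. It only illustrates the data flow; it is not the Hadoop Java API, and in Hadoop the map tasks run in parallel on splits of the input. The function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map: emit an intermediate (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (in Hadoop, the framework does this step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce: combine all values that share one key into a single result."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data streams"]))
```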

79. The number of maps is usually driven by the total size of the ______.
a) task (N)
b) output (N)
c) input (N)
d) none (N)

80/78. NoSQL databases are used mainly for handling large volumes of ______ data.
a) structured (N)
b) unstructured (N)
c) semi-structured (N)
d) None of above (N)
(79 is missing)

81/80. Which of the following is not an example of a NoSQL database management system?
a) Cassandra (N)
b) Scylla (N)
c) Hadoop / HBase (N)
d) PostgreSQL (N)

82. Which of the following is a characteristic of a NoSQL database?
a) Uses JSON (N)
b) Needs a schema (N)
c) Requires JOINs (N)
d) Uses tables for storage (N)

83. NoSQL databases are most often referred to as
a) Network (N)
b) Distributed (N)
c) Relational (N)
d) Object-oriented (N)

85/83. Which of the following represents a column in NoSQL?
a) Field (N)
b) Database (N)
c) Collection (N)
d) Document (N)

86. The core principle of NoSQL is
a) High availability (N)
b) Low availability (N)
c) both High & Low availability (N)
d) None of above (N)

87. Which of the following is not a strong feature of NoSQL databases?
a) Scalability (N)
b) Relational data (N)
c) Faster data access than RDBMS (N)
d) Data easily held across multiple servers (N)

88. What are the types of NoSQL databases?
a) Document databases (N)
b) Key-value stores (N)
c) Graph & Column-oriented databases (N)
d) All of the above (N)

90/87. Which of the following are the simplest NoSQL databases?
a) Key-value (N)
b) Document (N)
c) Wide-column (N)
d) All of the above (N)

91. What is the aim of NoSQL?
a) NoSQL is not suitable for storing structured data. (N)
b) NoSQL databases allow storing non-structured data. (N)
c) NoSQL is a new data format to store large datasets. (N)
d) NoSQL provides an alternative to SQL databases to store textual data. (N)

92. Hadoop is open source.
a) ALWAYS True (N)
b) True only for Apache Hadoop (N)
c) True only for Apache and Cloudera Hadoop (N)
d) ALWAYS False (N)

93. The process of describing data that is huge and complex to store and process is known as
a) Analytics (N)
b) Data mining (N)
c) Big Data (N)
d) Data Warehouse (N)

94. Unstructured data consists of:
a) Text files, Audio files, Video files (N)
b) Only Text data (N)
c) Tagged Data (N)
d) None of the above (N)

95. Check below the best answer to "Which industries employ the use of so-called 'Big Data' in their day-to-day operations?"
a) Weather forecasting (N)
b) Marketing (N)
c) Healthcare (N)
d) All of the above (N)

96. Which one of the following is false about Hadoop?
a) It is a distributed framework (N)
b) The main algorithm used in it is MapReduce (N)
c) It runs with commodity hardware (N)
d) All are true (N)

97/91. Which node serves as the master, and there is only one NameNode per cluster?
a) Data Node (N)
b) NameNode (N)
c) Data block (N)
d) Replication (N)

98. Which file system is used by HBase?
a) Hive (N)
b) Impala (N)
c) Hadoop (N)
d) Scala (N)

99. A ______ serves as the master and there is only one NameNode per cluster.
a) Data Node (N)
b) NameNode (N)
c) Data block (N)
d) Replication (N)

100. NoSQL databases are used mainly for handling large volumes of ______ data.
a) unstructured (N)
b) structured (N)
c) semi-structured (N)
d) all of the mentioned (N)

101. HBase creates a new version of a record during
a) Creation of a record (N)
b) Modification of a record (N)
c) Deletion of a record (N)
d) All the above (N)

102. Real-time data stream is
a) sequence of data items that arrive in some order and may be seen only once. (N)
b) sequence of data items that arrive in some order and may be seen twice. (N)
c) sequence of data items that arrive in same order (N)
d) sequence of data items that arrive in different order (N)

103. Which of the following statements about standard Bloom filters is correct?
a) It is possible to delete an element from a Bloom filter. (N)
b) A Bloom filter always returns the correct result. (N)
c) It is possible to alter the hash functions of a full Bloom filter to create more space. (N)
d) A Bloom filter always returns TRUE when testing for a previously added element (N)

104/96. In Filtering Streams
a) Accept those tuples in the stream that meet a criterion (N)
b) Accept data in the stream that meet a criterion. (N)
c) Accept those class in the stream that meet a criterion (N)
d) Accept rows in the stream that meet a criterion. (N)

105. The purpose of the Bloom filter is to allow
a) through all stream elements whose keys are in Set (N)
b) through all stream elements whose keys are in class (N)
c) through all data elements whose keys are in Set (N)
d) through all tuple elements whose keys are in Set (N)
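Questions 103 to 105 concern filtering a stream with a Bloom filter. The sketch below is a standard Bloom filter: adding a key sets k bit positions, and a lookup checks that all of them are set, so a previously added key is always reported as present while an unseen key can occasionally be a false positive. There is no delete method, because clearing a bit could also remove other keys. The SHA-256-based hashing, the sizes, the helper `filter_stream` and the e-mail addresses are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: a fixed-size bit array plus k hash functions."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive k bit positions by salting SHA-256 with the hash index
        # (an illustrative choice; production filters use faster hashes).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False: the key was definitely never added (no false negatives).
        # True: the key was probably added (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(key))


def filter_stream(stream, allowed: BloomFilter):
    """Let through (key, payload) stream elements whose key is probably in the set."""
    for key, payload in stream:
        if allowed.might_contain(key):
            yield key, payload


# Hypothetical allowed set, e.g. e-mail addresses known not to send spam.
ALLOWED = BloomFilter(num_bits=1024, num_hashes=3)
for address in ("alice@example.com", "bob@example.com"):
    ALLOWED.add(address)

if __name__ == "__main__":
    stream = [("alice@example.com", "msg 1"), ("mallory@example.com", "msg 2")]
    print(list(filter_stream(stream, ALLOWED)))
```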

106. HDFS works in a ______ fashion.
a) worker-master fashion (N)
b) master-slave fashion (N)
c) master-worker fashion (N)
d) slave-master fashion (N)

107. Which one is not an application of data streams?
a) web traffic (N)
b) internet (N)
c) sensor data (N)
d) None of these (N)

108. Google wants to know which queries are more frequent today than yesterday
a) mining query stream (N)
b) mining login stream (N)
c) mining search stream (N)
d) mining click stream (N)

109. Yahoo wants to know which of its pages are getting an unusual number of hits in the past
a) Mining query stream (N)
b) Mining login stream (N)
c) Mining search stream (N)
d) Mining click stream (N)

110. Which of the following does not follow data stream concepts?
a) financial applications (N)
b) network monitoring (N)
c) fraud detection (Y)
d) web application (N)

111. A ______ store is a simple database that when
a) document (N)
b) key-value (N)
c) graph (N)
d) simple (N)

112. The MapReduce algorithm contains two important tasks, namely ______.
a) mapped, reduce (N)
b) mapping, Reduction (N)
c) Map, Reduction (N)
d) Map, Reduce (N)

113. In Filtering Streams
a) Accept those tuples in the stream that meet a criterion. (N)
b) Accept data in the stream that meet a criterion. (N)
c) Accept those class in the stream that meet a criterion (N)
d) Accept rows in the stream that meet a criterion. (N)

114. In streaming queries, alerting the user when a stock crosses over a price point is an example of
a) continuous queries (N)
b) one time queries (N)
c) sampling queries (N)
d) none of these (N)
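The stock alert in question 114 is a continuous (standing) query: it is evaluated against every arriving element, rather than run once over stored data as a one-time query would be. A minimal sketch with a Python generator, assuming ticks arrive as (symbol, price) pairs and are seen once in arrival order; the list of ticks stands in for an unbounded stream, and the name `price_alerts` is hypothetical.

```python
def price_alerts(ticks, symbol, threshold):
    """Continuous query: yield an alert whenever `symbol` crosses above `threshold`.

    `ticks` is an iterable of (symbol, price) pairs, consumed once in arrival order.
    """
    above = None  # position relative to the threshold; unknown before the first tick
    for sym, price in ticks:
        if sym != symbol:
            continue
        now_above = price > threshold
        if above is False and now_above:
            yield f"{symbol} crossed above {threshold} at {price}"
        above = now_above


if __name__ == "__main__":
    ticks = [("ACME", 98), ("XYZ", 10), ("ACME", 101),
             ("ACME", 102), ("ACME", 99), ("ACME", 105)]
    for alert in price_alerts(ticks, "ACME", 100):
        print(alert)
```

Only upward crossings raise an alert, so the tick at 102 (already above 100) is silent, and the alert fires again at 105 after the dip to 99.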

115. Which database is popular?
a) MongoDB (N)
b) Oracle (N)
c) MySQL (N)
d) None of the above (N)

116. NoSQL means
a) Not SQL (N)
b) No usage of SQL (N)
c) Not only SQL (N)
d) Not for SQL (N)

117. Which is not an example of NoSQL?
a) Google (N)
b) Netflix (N)
c) Amazon (N)
d) None of these (N)

118. The graph model of NoSQL is used in
a) Twitter (N)
b) Facebook (N)
c) Google (N)
d) WhatsApp (N)

119. MongoDB is
a) column based (N)
b) key value based (N)
c) document based (N)
d) graph based (N)

120. A Hive query can be stored in
a) Local file (N)
b) HDFS (N)
c) Both (N)
d) Cannot be stored (N)

121. HBase tables are
a) Made read only by setting the read only option (N)
b) Always writeable (N)
c) Always read only (N)
d) Are made read only using the query to the table (N)

122. What is true about Variety in big data?
a) high in size (N)
b) speed of data (N)
c) data from (N)
d) data in certain (N)

123. Which of the following is a wide-column store?
a) Cassandra (N)
b) Riak (N)
c) MongoDB (N)
d) Redis (N)

124. Hadoop was developed by
a) Larry Page (N)
b) Doug Cutting (N)
c) Mark (N)
d) Bill Gates (N)

125. Problems in recommendation systems
a) poor results (N)
b) poor data (N)
c) Lack of data (N)
d) All of these (N)

126. Types of recommender systems
a) content (N)
b) collaborative (N)
c) knowledge (N)
d) All of these (N)

127. Association Mining is
a) Finding frequent patterns (N)
b) associations (N)
c) correlations (N)
d) All of these (N)

128. Applications of Association rules
a) Basket data analysis (N)
b) cross-marketing (N)
c) clustering (N)
d) All of these (N)

129. A data stream is a sequence of digitally encoded
a) coherent signals (N)
b) packets of data (N)
c) data packets (N)
d) All of these (N)

130. Real Time Analytics Platform (RTAP)
a) analyzes data (N)
b) correlates data (N)
c) predicts outcomes (N)
d) All of these (N)

131. A data stream is potentially
a) unbounded in size (N)
b) generated continuously in real time (N)
c) the volume of the data is very large (N)
d) All of these (N)

132. Data Stream is
a) Large data volume (N)
b) likely structured (N)
c) arriving at a very high rate (N)
d) All of these (N)

133. Data streams in action
a) Security applications (N)
b) Telecom call records (N)
c) Financial applications (N)
d) All of these (N)

134. Benefits of a modern streaming architecture
a) Can eliminate the need for large data engineering projects (N)
b) Performance, high availability and fault tolerance built in (N)
c) Flexibility and support for multiple use cases (N)
d) All of these (N)

THE END OF MCQ
Note: words in this colour are not in Sir's PDF.
The answers in this Word file are not valid; check the answers against the PDF.