Which of the following data mining techniques is predictive?
- classification
It is a powerful tool that shows the network of data.
- Knime
It makes complex data more understandable and usable.
- data visualization
What is the process of deriving useful information from text?
- Text Analytics
It is used in an organization’s strategic and tactical business decision making.
- business intelligence
It is a method for discovering patterns in large data sets.
- Data Mining
It includes identifying groups of data records.
- cluster analysis
Which of the following is NOT a goal in data mining?
- collecting data
Which of the following is NOT a method used in data analysis?
- Statistics Analytics
Which of the following type of text is processed in text analytics?
- unstructured
It has the goal of discovering useful information to support decision making.
- data analysis
It extracts meaningful numerical indices from information and makes them available to statistical and machine learning algorithms.
- Text analytics
_____________ includes identifying groups of data records.
- Cluster analysis
The following are artifacts used in data analysis EXCEPT:
- ANOVA
___________ uses artifacts to present data visually.
- data visualization
It transforms data into actionable intelligence for business purposes.
- Business Intelligence
The following processes are used in data analysis EXCEPT:
- collecting
It is a free software programming language.
- R-programming
What programming language does Orange use?
- Python
The goal is to transform raw data into understandable business information.
- Data mining
A matrix that has the same number of rows and columns is called
- square
A bell-shaped curve that is symmetric about a vertical line.
- normal distribution
The product of a 2x5 matrix and a 5x3 matrix is a ______ matrix.
- 2x3
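A quick way to check the size rule is to multiply two small matrices. The sketch below is plain Python with made-up 2x5 and 5x3 values (hypothetical, chosen only for illustration); the inner dimensions (5) must match, and the outer ones give the 2x3 result.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p), both given as lists of rows."""
    m, n = len(a), len(a[0])
    if len(b) != n:
        raise ValueError("inner dimensions must match")
    p = len(b[0])
    # Entry (i, j) is the dot product of row i of a and column j of b.
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

a = [[1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10]]              # 2x5 (example values)
b = [[1, 0, 0] for _ in range(5)]   # 5x3 (example values)
c = matmul(a, b)
print(len(c), "x", len(c[0]))       # 2 x 3
```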
What is the value of the mean in a normal probability density function?
- 50
A special type of function where the domain is a set of consecutive integers.
- sequence
Another term for text analytics.
- text mining
The proportion of correctly classified positive events.
- sensitivity
A graph that is used to indicate frequency distribution.
- histogram
It is used to enable an entity to determine consequences by thinking rather than acting.
- Knowledge Representation
Null strings are indicated by
- λ
It offers a way to examine trends from collected data and derive insights from it.
- Business Intelligence
Refers to using tools of statistics to present data visually.
- data visualization
Earlier name for data science.
- datalogy
What type of text is processed in text analytics?
- unstructured
Which is NOT interaction data?
- data base
The proportion of correctly classified positive events.
- sensitivity
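Sensitivity and specificity can be computed directly from confusion-matrix counts. A minimal Python sketch; the counts `tp`, `fn`, `tn`, `fp` are hypothetical example values, not data from this reviewer.

```python
def sensitivity(tp, fn):
    # Proportion of actual positives that were classified as positive.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Proportion of actual negatives that were classified as negative.
    return tn / (tn + fp)

# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 40, 10, 45, 5
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```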
It is a collection of machine learning algorithms for data mining tasks.
- WEKA
Which of the following is NOT a module in RapidMiner?
- loop
Which of the following pertains to predictive data mining technique?
- Regression
_____________ is rated as the number one business analytics software.
- RapidMiner
Primarily used for data pre-processing.
- Knime
It is data mining software written in the Python programming language.
- Orange
Which of the following is NOT a data mining tool?
- Python
It is a module in RapidMiner used to design the workflow.
- studio
The following are data mining techniques EXCEPT:
- Collection
Which is primarily written in C and Fortran?
- R-programming
It sees a set of prototypes, in particular prototypical diseases, to be matched against the case at hand.
- INTERNIST
It is a numerical description of the outcome of a statistical experiment.
- random variable
The creation of data from varied sources and its quantification into information.
- datafication
If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, its domain is
- {3,5,6}
The sets A= { x/x is a distinct letter in the word "MATHEMATICS"} and B={x/x is a distinct letter in the
word "STATISTICS"} , the two sets are
- joint
If α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings?
- 18
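Treating a^6 as six copies of a and b^5 as five copies of b, the lengths can be checked in Python: |α| = 5, |β| = 6 + 5 + 2 = 13, so the concatenation has length 18. A small sketch:

```python
alpha = "babaa"
# a^6 means "a" repeated 6 times; b^5 means "b" repeated 5 times.
beta = "a" * 6 + "b" * 5 + "bb"

concat = alpha + beta
print(len(alpha), len(beta), len(concat))  # 5 13 18
```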
What does GLM mean?
- Generalized Linear Model
The process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information.
- data analysis
According to the empirical rule, about ______% of the data in a normal distribution lie within 1 standard deviation below and above the mean.
- 68
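The empirical-rule percentages are rounded values of the normal distribution's exact probabilities P(|Z| < k) = erf(k/√2). A short Python check using only the standard library:

```python
import math

def within_k_sd(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # P(|Z| < k) = erf(k / sqrt(2)) for a standard normal Z.
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The empirical rule rounds these to 68%, 95% and 99.7%.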
Another term for an empty set.
- null
Which is NOT a basic representation technology?
- graph
If A= { x/x is a distinct letter in the word "MATHEMATICS"} AND B={x/x is a distinct letter in the
word "STATISTICS"} then their intersection is
- {A,C,I,S,T}
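Python's built-in sets give the same intersection, and also show that B is actually a subset of A (so the sets are joint, not disjoint). A sketch:

```python
A = set("MATHEMATICS")   # distinct letters: M, A, T, H, E, I, C, S
B = set("STATISTICS")    # distinct letters: S, T, A, I, C

print(sorted(A & B))     # ['A', 'C', 'I', 'S', 'T']
print(A.isdisjoint(B))   # False -> the sets share elements (joint)
print(B <= A)            # True  -> B is a subset of A
```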
If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, its range is
- {3,5,6,10,12}
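The domain and range answers can be checked by collecting the first and second components of each pair. This sketch reads the last pair of R as (6, 12).

```python
# The relation from the question, as a set of ordered pairs.
R = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}

domain = {x for x, _ in R}   # first components
range_ = {y for _, y in R}   # second components ("range" would shadow the built-in)

print(sorted(domain))  # [3, 5, 6]
print(sorted(range_))  # [3, 5, 6, 10, 12]
```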
The proportion of correctly classified negative events is called ________________.
- specificity
The symbol used to indicate strings with no elements.
- λ
What programming language is used in RapidMiner?
- Java
_______________ is a data structure in which every component has a unique predecessor and successor.
- linear
What is an organized collection of information and a set of operations used to manage that information?
- ADT
Which of the following is the transpose of B?
- -8 7 1 0
What is the correct meaning of ADT?
- Abstract Data Type
Which of the following is TRUE?
- A + B = B + A
If A = {2,3} and B = {4,5}, which of the following is the Cartesian product of the two sets?
- {(3,4), (3,5), (2,4), (2,5)}
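The Cartesian product pairs every element of A with every element of B, giving |A| x |B| = 4 ordered pairs. A sketch with `itertools.product`:

```python
from itertools import product

A = {2, 3}
B = {4, 5}

# Every ordered pair (a, b) with a in A and b in B.
cartesian = set(product(A, B))
print(sorted(cartesian))  # [(2, 4), (2, 5), (3, 4), (3, 5)]
print(A & B)              # set() -> A and B share no elements
```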
Which of the matrices is singular?
- A
What is the size of the product of a 5x6 and a 6x8 matrix?
- 5x8
Matrix B is
- invertible
Addition and subtraction of two or more matrices are possible only if the matrices
- have the same size.
An array is a good example of _________data structure.
- static
It refers to a data structure that grows and shrinks at execution time.
- dynamic
ML means:
- Machine Learning
The intersection of the two sets A = {2,3} and B = {4,5} is a
- null set
What is a data structure that has a fixed size?
- static
The two sets A = {2,3} and B = {4,5} are said to be
- disjoint
What is the earlier name for data science?
- datalogy
3A + B =
- -14 -2 13 18
What is the focus of data science?
- manipulate data efficiently and effectively
Which is NOT a characteristic feature of data structure?
- It contains a fixed structure.
The method that does NOT require the assumption that the parameters are normally distributed.
- profile likelihood
He coined the term "data scientist"
- DJ Patil
LR means ________________________.
- Logistic Regression
What range of values lies between 3 SD below and 3 SD above the mean in a normal distribution if the mean is 10 and the standard deviation is 2?
- 4-16
The following are the 3V's of big data EXCEPT
- veracity
According to Hilary Mason which is NOT a skill that a good data scientist must cultivate.
- critical thinking
Data is NOT information unless we add_________.
- analytics
The expected value or mean of a random variable in discrete case.
- probability mass distribution
A graph used to indicate intervals in a frequency distribution is referred to as a ______________.
- histogram
It expands available data enormously.
- text mining
A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh between 0.31 lb and 0.91 lb in a shipment of 4500 tomatoes?
- 4275
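Since 0.31 and 0.91 are exactly 2 standard deviations below and above 0.61, the empirical rule gives 95% of 4500 = 4275. The sketch below also computes the exact normal-curve value for comparison (about 95.45%, so slightly more tomatoes); `normal_cdf` is a helper written here, not a library function.

```python
import math

def normal_cdf(x, mean, sd):
    # P(X <= x) for X ~ Normal(mean, sd), via the error function.
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

mean, sd, n = 0.61, 0.15, 4500
low, high = 0.31, 0.91           # mean - 2*sd and mean + 2*sd

# Empirical-rule answer: about 95% of the shipment lies within 2 SD.
print(round(0.95 * n))           # 4275

# Exact normal-curve value, for comparison.
p = normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)
print(round(p * n))              # 4295

# Same rule, one tail: above mean - 2 SD is about 97.5%, e.g. of 6000 tomatoes.
print(round(0.975 * 6000))       # 5850
```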
The quantification of data into information.
- datafication
The major outcome of correlation.
- prediction
Which belong to the GLM family?
- logistic and linear
Which is NOT a valid correlation coefficient?
- 1.2
KR means __________________________.
- Knowledge Representation
He is someone who asks interesting questions on formal and informal theory.
- data scientist
He pointed out that until 2003, all of mankind had generated just 5 exabytes of data.
- Eric Schmidt
The creation of data from varied sources and its quantification into information.
- Datafication
PAW means____________.
- Predictive Analytics World
Exabyte means ________bytes
- billion billion
It refers to well based theories and sound business judgement.
- Data Science
IoT means
- Internet of Things
He said that “In mathematics the art of proposing a question must be held of higher value than solving it.”
- Georg Cantor
These are the data skills that a good data scientist needs to cultivate EXCEPT
- speaking
The person who said that “The future is not Google-able.”
- William Gibson
How many bytes of data are generated every two days in today's world?
- 5 exabytes
“All models are wrong but some are useful.”
- George E. P. Box
The explosion of _______data is the main reason why every 2 days 5 exabytes of data are generated.
- interaction
A new phenomenon for the explosion of _________data
- interaction
The developer of FarmVille, a famous game on the internet.
- Zynga Incorporated
It shows a high correlation between the incidence of flu and searches about flu on google.
- Google Flu Trends
It expands available data enormously since there is so much more text being generated than numbers.
- Text mining
What is a great example of data product?
- Google Maps
A frequency distribution used to display large data sets.
- Grouped frequency distribution
What increases data volume?
- velocity
It is often used as a model of the number of arrivals at a facility in a given period of time.
- Poisson probability distribution
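The Poisson probability of exactly k arrivals in a period with mean rate λ is λ^k e^(-λ) / k!. A minimal sketch; the rate of 3 arrivals per hour is a hypothetical example.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam arrivals per period."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical rate: a facility averages 3 arrivals per hour.
lam = 3
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))
```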
It views the world in thinking of prototypical objects.
- frame
As of 2014, there were _______ million tweets a day.
- 500
The proportion of correctly classified negative events.
- specificity
The following are elements in an analytic plan EXCEPT
- graphs
It allows you to see which value of the explanatory variable corresponds to a given probability of success.
- probability analysis table
A positive z-score means that the score is
- Higher than the mean
If there are 101 scores the median is equal to the _____ranked score.
- 51st
The score easily affected by extreme values is the _________.
- mean
On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students
who took the exam. What is the percentile for Jef’s score?
- 48th
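Both the median position and Jef's percentile come from simple counting: the median of n ranked scores is at rank (n + 1)/2, and the percentile is the percentage of scores below the given one. A sketch (helper names are illustrative):

```python
def median_position(n):
    # For n ranked scores, the median sits at rank (n + 1) / 2;
    # for even n this falls halfway between two ranks.
    return (n + 1) / 2

def percentile_rank(num_below, total):
    # Percentage of scores that fall below the given score.
    return 100 * num_below / total

print(median_position(101))        # 51.0 -> the 51st score
print(median_position(103))        # 52.0 -> the 52nd score
print(percentile_rank(480, 1000))  # 48.0 -> 48th percentile
```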
If all scores in a distribution are distinct, then _____________.
- it has no mode
Which of the following statements is TRUE?
- Q2 = Median
The most frequent score.
- mode
If the standard deviation of a distribution is 3, the variance is
- 9
The distribution 2,4,4,4,5,5,6,8,9 is said to be
- unimodal (the only mode is 4)
The standard deviation for the data in 2,4,4,4,5,5,6,8,9
- 2.17 (sample standard deviation; 2.04 if computed as a population standard deviation)
Which is NOT a measure of variability?
- range
Which is not a measure of central tendency?
- standard deviation
A score of 3 in 2,4,4,4,5,5,6,8,9 is
- about 1.03 standard deviations below the mean (using the sample standard deviation)
A distribution with 4 modes is said to be a _________distribution.
- multimodal
What is the value of quartile 3 in 2,4,4,4,5,5,6,8,9?
- 6
In 2,4,4,4,5,5,6,8,9 the range is
- 7
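The answers for the data set 2,4,4,4,5,5,6,8,9 can be checked with Python's `statistics` module. The standard deviation depends on whether the sample (n - 1) or population (n) formula is used, and Q3 depends on the quartile convention: the "inclusive" method gives 6, while the module's default "exclusive" method gives 7.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 6, 8, 9]

mean = statistics.mean(data)        # 47/9, about 5.22
print(statistics.median(data))      # 5
print(statistics.multimode(data))   # [4] -> one mode, so unimodal
print(max(data) - min(data))        # 7 (range)

s = statistics.stdev(data)          # sample SD (divides by n - 1)
sigma = statistics.pstdev(data)     # population SD (divides by n)
print(round(s, 2), round(sigma, 2)) # 2.17 2.04

# z-score of a score of 3, using the sample SD
print(round((3 - mean) / s, 2))     # -1.03

# Quartiles under the "inclusive" convention
print(statistics.quantiles(data, n=4, method="inclusive"))  # [4.0, 5.0, 6.0]
```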
Which is NOT a measure of central tendency?
- quartile
The number that occurs most frequently is called________.
- Mode
Another term for variability.
- dispersion
The score NOT easily affected by extreme values.
- Median
It is software well suited to machine learning.
- Orange
The following are large inputs EXCEPT
- Big beta notation
It relates the length of an algorithm’s input to the number of steps it takes.
- time complexity
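Time complexity counts steps as a function of input length. As an illustration (linear search is an example chosen here, not one named in the source), the worst case, where the target is missing, takes one comparison per element:

```python
def linear_search(items, target):
    """Return (index or -1, number of comparisons made)."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == target:
            return i, comparisons
    return -1, comparisons

# Worst case: the target is absent, so every element is compared once.
for n in (10, 100, 1000):
    _, steps = linear_search(list(range(n)), -1)
    print(n, steps)   # steps == n, i.e. linear time O(n)
```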
Which of the following is a predictive data mining technique?
- regression
Algorithm analysis is an important part of a broader_____________.
- computational complexity theory
He coined the term “analysis of algorithms”.
- Donald Knuth
It is a process of finding the computational complexity of algorithms.
- analysis of algorithms
The following are software packages used in data mining EXCEPT
- SPSS
It relates the length of an algorithm’s input to the number of storage locations it uses.
- space complexity
It is used to discover patterns in large data sets
- Data mining
An example of an abstract computer.
- Turing machine
It is popular among financial data analysts.
- Knime
A special type of function where the domain is a set of consecutive integers.
- Sequence
It is used for prototyping in RapidMiner.
- studio
The function describing the performance of an algorithm is usually an upper bound determined from ______ inputs.
- worst case
Constant multiplicative factors that relate the running times of algorithms are called _______ constants.
- hidden
How many data mining techniques are there?
- 7
It is a theoretical classification that estimates and anticipates the increase in running time of an algorithm as its input size increases.
- run-time analysis
Which of the following is TRUE when a distribution is normal?
- Mean = median = mode
It partitions ranked data into four equal groups.
- quartile
If there are 103 scores the median is equal to the _____ranked score.
- 52nd
The creation of a data product contains 3 components EXCEPT
- time
A data set in which every score occurs the same number of times is said to have
- no mode
A survey of 100 consumers showed that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43?
- 95
It refers to the degree of relationship between two variables.
- Correlation
A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What percent of the tomatoes weigh less than 0.71 lb?
- about 75 (z = (0.71 - 0.61)/0.15 ≈ 0.67)
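The percentage comes from converting 0.71 lb to a z-score and reading the standard normal CDF, here computed with `math.erf` instead of a printed z-table. `normal_cdf` is a helper defined in the sketch:

```python
import math

def normal_cdf(x, mean, sd):
    # P(X <= x) for X ~ Normal(mean, sd), via the error function.
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

z = (0.71 - 0.61) / 0.15               # about 0.67
p = normal_cdf(0.71, 0.61, 0.15)
print(round(z, 2), round(100 * p))     # 0.67 75
```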
A perfect positive correlation coefficient is equal to
- 1
It lists the percentage of data in each class of a distribution.
- relative frequency distribution
What percent of data will lie within 2 standard deviations of the mean?
- 95
If the standard deviation of a distribution is 3.5, the variance is
- 12.25
The equation of the _______line predicts the value of Y given X.
- Regression
A bell-shaped distribution that is symmetric about a vertical line.
- Normal
Example of a data product.
- Google Maps
A survey of 100 consumers showed that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39?
- 84
Which is NOT a value of r ?
- -0.05 0.98
In the equation of the regression line represented by Y= 1.24 X + 6.9 if X=2 then Y =?
- 9.38
The value of X in the regression equation Y= 1.24 X + 6.9 if Y=13.1 is
- 5
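Both regression questions use the same line, Y = 1.24X + 6.9: plug in X to predict Y, or solve for X given Y. A small sketch:

```python
# Regression line from the questions: Y = 1.24 X + 6.9
SLOPE, INTERCEPT = 1.24, 6.9

def predict_y(x):
    return SLOPE * x + INTERCEPT

def solve_x(y):
    # Invert the line: X = (Y - intercept) / slope
    return (y - INTERCEPT) / SLOPE

print(round(predict_y(2), 2))    # 9.38
print(round(solve_x(13.1), 2))   # 5.0
```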
A bell-shaped distribution that is symmetric about a vertical line.
- Normal
What is the value of the mean if a score of 110 is 3 standard deviation above the mean?
- 95
The area of the standard normal curve to the right of z=0.82 is _______.
- 0.206
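The right-tail area can be computed from the complementary error function instead of a z-table. A one-line check with the standard library:

```python
import math

z = 0.82
# Right-tail area of the standard normal curve: P(Z > z) = erfc(z / sqrt(2)) / 2
right_tail = 0.5 * math.erfc(z / math.sqrt(2))
print(round(right_tail, 3))   # 0.206
```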
The method of correlation used for ranked score is ________.
- Spearman rho
What range of values lie between 3 standard deviations above and below the mean if the mean is 80
and the standard deviation is 3?
- 71-89
Data involving two variables.
- bivariate
The normal distribution with a mean of 0 and standard deviation of 1.
- Standard
Who said that "The future is not google-able " ?
- William Gibson
The difference between the highest and lowest value.
- range
A negative correlation exists when___________.
- y decreases as x increases
Which of the following is used as a method for Correlation?
- Pearson r
A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation?
- 10
The middle-most value in a ranked list of numbers.
- median
A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh more than 0.31 lb in a shipment of 6000 tomatoes?
- 5850 (0.31 lb is 2 standard deviations below the mean, so about 97.5% of 6000)