Class X
Unit 4: Data Science
Q1. What are the different applications of data science?
Ans: Applications
• Fraud and risk detection
• Internet search
• Website recommendation
• Medical diagnosis
• Digital marketing
Q2. Define system map.
Ans: A system map is a tool that helps us find the relationships between the different elements of the problem we have scoped.
Q3. Define data acquisition and list the sources of data.
Ans: The process of collecting data from different sources is called data acquisition or data collection.
Sources of data
• Offline
o Sensors
o Surveys
o Observations
o Interviews
• Online
o Internet
o Government publications and portals
Q4. Explain the different types of data in AI.
Ans: There are three types of data.
• Structured data
o Data has a specific pattern
o Can be stored in specific forms
o Easy to analyze
o Examples – CSV files, SQL database tables, Excel files
• Unstructured data
o Data does not have a predefined pattern
o Can be stored in any form
o Difficult to analyze
o Examples: Voice, Videos, Emails, Documents, etc.
• Semi-structured data
o Data does not follow a predefined format
o But it has some structural properties (such as tags or markers) that make it easier to analyze.
o Examples: MS Word, HTML, PDF, etc.
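To see the difference between structured and semi-structured data, here is a small sketch using Python's standard csv and json modules. The sample records are hypothetical, and JSON is used as the semi-structured example because it is a common tagged format; it is an assumption, not one of the examples listed above.

```python
import csv
import io
import json

# Structured data (hypothetical sample): every row has the same columns,
# just like a CSV file or a table in an SQL database.
csv_text = "name,age\nAsha,15\nRavi,16\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["age"])                 # prints 15 (csv reads every value as a string)

# Semi-structured data (hypothetical sample, JSON format): fields are labelled
# with keys, but different records may carry different fields.
json_text = '[{"name": "Asha", "hobbies": ["chess"]}, {"name": "Ravi", "city": "Pune"}]'
records = json.loads(json_text)
print([sorted(r) for r in records])   # [['hobbies', 'name'], ['city', 'name']]
```

The CSV rows all share the same keys, so they fit neatly into a table; the JSON records are still easy to read by key, but they do not all have the same shape.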
Q5. Explain the different packages available in Python.
Ans: Python packages
• NumPy
o Stands for Numerical Python
o A fundamental Python library for numerical computing
o Provides a multidimensional array object (ndarray).
▪ 1-D array: also known as a vector. It has a single dimension.
▪ 2-D array: also known as a matrix. It has two dimensions.
▪ Multidimensional array: has more than two dimensions. It is also known as a tensor.
• Pandas
o Pandas derives its name from panel data.
o Open-source library
o Used for data manipulation and analysis.
o Offers flexible data structures
▪ Series
▪ DataFrame
• Matplotlib
o Cross-platform
o Data visualization and graphical plotting library for Python and its numerical extension, NumPy.
o Matplotlib supports several types of plots: line, bar, pie, scatter and histogram.
• SciPy
o Stands for Scientific Python
o Open-source library
o Builds upon NumPy
o Used for solving mathematical, scientific, engineering and technical problems.
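A short sketch of NumPy, Pandas and SciPy in use follows. It assumes the three packages are installed; the student names, marks and the pair of equations are hypothetical examples. np.array builds arrays of any dimension (ndim reports how many), pd.Series and pd.DataFrame are the two Pandas structures listed above, and scipy.linalg.solve is one of SciPy's tools for mathematical problems, working directly on NumPy arrays.

```python
import numpy as np
import pandas as pd
from scipy import linalg

# NumPy: arrays of different dimensions (ndim gives the number of dimensions)
scalar = np.array(7)                                     # 0-D array (a single scalar)
vector = np.array([1, 2, 3])                             # 1-D array (vector)
matrix = np.array([[1, 2, 3], [4, 5, 6]])                # 2-D array (matrix)
tensor = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # 3-D array (tensor)
for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)

# Pandas: a Series is one labelled column; a DataFrame is a labelled table
marks = pd.Series([78, 92, 85], index=["Asha", "Ravi", "Meera"])
print(marks["Ravi"])                # 92

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],
    "maths": [78, 92, 85],
    "science": [88, 79, 91],
})
print(df["maths"].mean())           # 85.0
print(df[df["science"] > 80])       # only the rows where science marks exceed 80

# SciPy: solve the linear system 2x + y = 5, x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
x, y = linalg.solve(A, b)
print(x, y)                         # x = 1, y = 3 (up to floating-point rounding)
```

The SciPy call builds on NumPy: its inputs and its result are ordinary NumPy arrays.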
Q6. Define data visualization.
Ans: Data visualization is the creation of visual (often interactive) representations of data, such as graphs, charts and maps. It makes information that would be tedious to understand from tables of numbers easy to grasp by providing visual context.
Q7. Define the following:
Ans:
• Scatter plot
o Used to visually represent the relationship between two variables.
o Use the scatter() function of the pyplot module to draw a scatter plot.
• Bar chart
o One of the most common methods to visualize data
o One axis shows the categories of data and the other shows their values, drawn as rectangular bars.
o Use the bar() function (vertical bars) or the barh() function (horizontal bars) of the pyplot module to draw a bar chart.
• Pie chart
o Draws one slice for each value in the array and arranges the slices into a round pie chart.
o Each slice of the pie is called a wedge.
o Use the pie() function of the pyplot module to draw a pie chart.
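• Histogram
o An approximate representation of the distribution of continuous data: values are grouped into ranges (bins), and the height of each bar shows how many values fall in that range.
o Use the hist() function of the pyplot module to draw a histogram.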
• Box plot
o Also known as a box-and-whisker plot.
o Displays a summary of a set of data values, such as the median, the quartiles and the spread of the data.
o Use the boxplot() function of the pyplot module to draw a box plot.
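The pyplot functions listed above can be tried together in one figure. This sketch assumes Matplotlib is installed, and all the data values are hypothetical. plt.subplot(2, 3, i) places each chart in a 2-by-3 grid so the chart types can be compared side by side, and show() displays the figure on screen.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data
heights = [150, 155, 160, 165, 170, 175]   # cm
weights = [45, 50, 56, 60, 66, 70]         # kg
subjects = ["Maths", "Science", "English"]
marks = [85, 78, 92]
scores = [35, 42, 48, 51, 55, 58, 60, 62, 67, 71, 75, 88]

fig = plt.figure(figsize=(12, 7))

plt.subplot(2, 3, 1)
plt.scatter(heights, weights)      # relationship between two variables
plt.title("Scatter plot")

plt.subplot(2, 3, 2)
plt.bar(subjects, marks)           # vertical bars
plt.title("Bar chart (bar)")

plt.subplot(2, 3, 3)
plt.barh(subjects, marks)          # horizontal bars
plt.title("Bar chart (barh)")

plt.subplot(2, 3, 4)
plt.pie(marks, labels=subjects, autopct="%1.0f%%")  # one wedge per value
plt.title("Pie chart")

plt.subplot(2, 3, 5)
plt.hist(scores, bins=5)           # count of values falling in each of 5 bins
plt.title("Histogram")

plt.subplot(2, 3, 6)
plt.boxplot(scores)                # box for the quartiles, line for the median
plt.title("Box plot")

plt.tight_layout()
plt.show()                         # display the figure on screen
```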
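Q8. Explain the K-nearest neighbor (KNN) model.
Ans: KNN is a widely used supervised machine learning algorithm that can be used to solve both classification and regression problems. To make a prediction for a new data point, it finds the K labelled points closest to it and uses their labels: a majority vote for classification, or an average for regression.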
Features of KNN
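• Uses a labelled input dataset to predict the class of new data points.
• KNN is non-parametric: it makes no assumptions about the underlying distribution of the data.
• KNN is known as a lazy learning algorithm: it does not build a model during training, but stores the data and does its work only when a prediction is needed.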
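The prediction step described above can be sketched in plain Python. This is a minimal, unoptimized illustration that assumes Euclidean distance, a majority vote for classification and a simple mean for regression; the function names and toy data are hypothetical. There is no training step: the "model" is just the stored list of labelled points, which is what makes KNN a lazy learner.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points with the same number of features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train, query, k):
    """Return the k (features, target) pairs closest to the query point."""
    return sorted(train, key=lambda item: euclidean(item[0], query))[:k]

def knn_classify(train, query, k=3):
    """Classification: majority vote among the labels of the k nearest points."""
    labels = [label for _, label in k_nearest(train, query, k)]
    # Equal counts keep first-seen order, so a tie goes to the label of the nearer neighbor
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression: average of the target values of the k nearest points."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)

# "Training" a lazy learner is just storing the labelled data (hypothetical points)
points = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
          ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_classify(points, (2, 2)))      # A
print(knn_classify(points, (6, 5)))      # B

values = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
print(knn_regress(values, (2.5,), k=2))  # 25.0
```

Every prediction computes the distance to all stored points and sorts them, so the cost of a prediction grows with the size of the dataset.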
Q9. What are the advantages and disadvantages of KNN?
Ans: Advantages of KNN
• Easy to understand and implement
• Can be used for both classification and regression problems
• Has no separate training phase, so it is quick to set up
Disadvantages of KNN
• Prediction becomes slower as the dataset grows, because the distance to every stored data point must be calculated.
• Does not perform well on imbalanced data.
Q10. What are the applications of KNN?
Ans: Applications of KNN
• Agriculture – used for crop prediction
• Recommendation systems – such as Netflix, Amazon, YouTube, etc.
• Medical – helps in data mining to explore hidden patterns and relationships in medical data.
Q11. What is an alias?
Ans: An alias is a short alternative name given to a module, function, etc., for easy reference. For example, import numpy as np lets us refer to NumPy as np.
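The short names used in most data science code are aliases created with the as keyword. np, pd and plt are widely used conventions rather than requirements, and square_root is a hypothetical alias chosen only for this example.

```python
import numpy as np                    # np is an alias for the numpy module
import pandas as pd                   # pd is an alias for pandas
import matplotlib.pyplot as plt       # plt is an alias for matplotlib's pyplot module
from math import sqrt as square_root  # hypothetical alias for a single function

print(np.array([1, 2, 3]).sum())      # 6
print(square_root(16))                # 4.0
```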
Points to remember:
• CSV – Comma-Separated Values
• SQL – Structured Query Language
• Each value in an array is known as a scalar, or 0-D array.
• The most important object defined in NumPy is the N-dimensional array, called ndarray.
• The show() function is used to display the chart on screen.