DA ORAL QUESTION BANK
1. List the different data types in Python.
Text Type: str
Numeric Types: int , float , complex
Sequence Types: list , tuple , range
Mapping Type: dict
Set Types: set , frozenset
Boolean Type: bool
Binary Types: bytes , bytearray , memoryview
None Type: NoneType
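As a quick illustration, the sketch below applies the built-in type() function to one example value for several of the types listed above. The names samples and type_names are illustrative choices only.

```python
# One example value for several of the built-in types listed above.
samples = [
    "hello",            # str
    42,                 # int
    3.14,               # float
    2 + 3j,             # complex
    [1, 2, 3],          # list
    (1, 2, 3),          # tuple
    range(3),           # range
    {"a": 1},           # dict
    {1, 2, 3},          # set
    frozenset({1, 2}),  # frozenset
]

# type(value).__name__ gives the name of the type as a string.
type_names = [type(value).__name__ for value in samples]
print(type_names)
```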
2. Mention five benefits of using Python.
• Python comes with a large standard library that covers common Internet
protocols and formats such as email and HTML.
• Python does not require explicit memory management: the interpreter
allocates memory for new objects and frees it automatically.
• Code is easy to read because blocks are marked by indentation rather than
braces.
• It is easy for beginners to learn.
• Built-in data types and dynamic typing save the time and effort of
declaring variables.
3. What are the various steps in an analytics project?
• Problem definition
• Data exploration
• Data preparation
• Modelling
• Validation of data
• Implementation and tracking
4. What is data cleansing?
Data cleansing, also referred to as data cleaning, deals with identifying and
removing errors and inconsistencies from data in order to enhance the
quality of the data.
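A minimal sketch of the idea in plain Python is shown here. The records, the field names, and the accepted age range of 0 to 120 are all hypothetical; a real project would define its own rules.

```python
# Hypothetical raw records: inconsistent casing, stray spaces,
# a duplicate, a missing age, and an impossible age.
raw_records = [
    {"name": " Alice ", "age": "30"},
    {"name": "alice", "age": "30"},
    {"name": "Bob", "age": ""},
    {"name": "Carol", "age": "-5"},
    {"name": "Dave", "age": "41"},
]

def clean(records):
    """Return records with normalized names, valid ages, and no duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        name = record["name"].strip().title()   # fix spacing and casing
        age_text = record["age"].strip()
        if not age_text.lstrip("-").isdigit():  # missing or non-numeric age
            continue
        age = int(age_text)
        if not 0 <= age <= 120:                 # illegal value (assumed range)
            continue
        if (name, age) in seen:                 # duplicate entry
            continue
        seen.add((name, age))
        cleaned.append({"name": name, "age": age})
    return cleaned

cleaned_records = clean(raw_records)
print(cleaned_records)
```

Here rows are simply dropped when they fail a check; depending on the project, it may be better to correct or flag them instead.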
5. What are some Python libraries used in Data Analysis?
Ans. Some of the vital Python libraries used in Data Analysis include:
Bokeh
Matplotlib
NumPy
Pandas
scikit-learn
SciPy
Seaborn
TensorFlow
Keras
6. What is data visualization?
Ans. In simple terms, data visualization is the graphical representation of
information and data. By presenting data as diagrams and charts, it enables
users to view and analyze the data more effectively.
7. What are some of the most popular tools used in data analytics?
Ans. The most popular tools used in data analytics are:
Tableau
Google Fusion Tables (discontinued in 2019)
Google Search Operators
Konstanz Information Miner (KNIME)
RapidMiner
Solver
OpenRefine
NodeXL
Io
Pentaho
SQL Server Reporting Services (SSRS)
Microsoft data management stack
8. What is data wrangling?
Data wrangling, also called data cleaning, data remediation, or data
munging, refers to a variety of processes designed to transform raw data into
more readily used formats. The exact methods differ from project to project
depending on the data you are using and the goal you are trying to achieve.
Some examples of data wrangling include:
• Merging multiple data sources into a single dataset for analysis
• Identifying gaps in data (for example, empty cells in a spreadsheet) and either
filling or deleting them
• Deleting data that is either unnecessary or irrelevant to the project you are
working on
• Identifying extreme outliers in data and either explaining the discrepancies or
removing them so that analysis can take place
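The four examples above can be sketched with Pandas as follows. The two tables, their column names, and the outlier cutoff of ten times the median are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Two hypothetical data sources sharing a customer_id column.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "region": ["North", "South", None, "East"],
    "internal_note": ["a", "b", "c", "d"],   # not needed for the analysis
})
orders = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "amount": [120.0, 95.0, 110.0, 9000.0],  # 9000.0 is an extreme outlier
})

# 1. Merge multiple sources into a single dataset.
data = customers.merge(orders, on="customer_id")

# 2. Fill gaps in the data (here, a missing region).
data["region"] = data["region"].fillna("Unknown")

# 3. Drop columns that are irrelevant to the analysis.
data = data.drop(columns=["internal_note"])

# 4. Remove extreme outliers, using an assumed cutoff of 10x the median.
cutoff = 10 * data["amount"].median()
data = data[data["amount"] <= cutoff].reset_index(drop=True)

print(data)
```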
9. List some common problems faced by data analysts.
• Common misspellings
• Duplicate entries
• Missing values
• Illegal values
• Varying value representations
• Identifying overlapping data
10. What is linear regression?
Linear regression analysis is used to predict the value of a variable based on the
value of another variable. The variable you want to predict is called the
dependent variable. The variable you are using to predict the other variable's
value is called the independent variable.
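For a single independent variable, the fitted line can be computed with the ordinary least-squares formulas. The sketch below is one simple way to do this in plain Python; the function name fit_line and the hours/scores data are hypothetical.

```python
def fit_line(x_values, y_values):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (independent) vs. exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]   # exactly score = 2 * hours + 50

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept   # predict the score for 6 hours
print(slope, intercept, predicted)
```

Because the sample data lie exactly on a line, the fit recovers a slope of 2 and an intercept of 50; real data would leave some residual error.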
11. Types of charts and graphs to use for your data
1. Bar Graph
2. Column Chart
3. Line Graph
4. Dual Axis Chart
5. Area Chart
6. Stacked Bar Graph
7. Mekko Chart
8. Pie Chart
9. Scatter Plot Chart
10. Bubble Chart
11. Waterfall Chart
12. Funnel Chart
13. Bullet Chart
14. Heat Map
Column Chart
Use a column chart to show a comparison among different items, or to show a
comparison of items over time. You could use this format to see the revenue per
landing page or customers by close date.
Line Graph
A line graph reveals trends or progress over time and you can use it to show
many different categories of data. You should use it when you chart a
continuous data set.
Bar Graph
A bar graph should be used to avoid clutter when one data label is long or if you
have more than 10 items to compare.
Scatter Plot Chart
A scatter plot, or scattergram, shows the relationship between two
different variables or reveals distribution trends. Use this chart when there are
many different data points and you want to highlight similarities in the data set.
It is useful when looking for outliers or for understanding the distribution of
your data.
What is NumPy?
NumPy is a Python library used for working with arrays.
It also has functions for working in the domains of linear algebra, Fourier
transforms, and matrices.
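A short sketch of the three areas mentioned above (arrays, linear algebra, and Fourier transforms) is given here. The array values are arbitrary examples.

```python
import numpy as np

# Element-wise arithmetic on arrays, without explicit loops.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
total = a + b

# Linear algebra: solve the system M @ x = v.
M = np.array([[2.0, 0.0],
              [0.0, 4.0]])
v = np.array([2.0, 8.0])
x = np.linalg.solve(M, v)

# Fourier transform of a constant signal: all energy lands in the first bin.
spectrum = np.fft.fft(np.ones(4))

print(total, x, spectrum)
```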
What is Pandas?
Pandas is an open-source library made mainly for working with
relational or labeled data easily and intuitively. It provides various data
structures and operations for manipulating numerical data and time series. The
library is built on top of NumPy. Pandas is fast and offers high
performance and productivity for users.
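The sketch below shows a labeled table with a date index and one time-series operation. The column names and figures are made up for illustration.

```python
import pandas as pd

# A labeled table: each column has a name, each row an index label (a date).
sales = pd.DataFrame(
    {"units": [10, 12, 9, 15], "price": [2.5, 2.5, 3.0, 3.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Column-wise arithmetic aligns on labels automatically.
sales["revenue"] = sales["units"] * sales["price"]

# Time-series operation: resample daily rows into 2-day totals.
two_day_revenue = sales["revenue"].resample("2D").sum()
print(two_day_revenue)
```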
What is Matplotlib?
Matplotlib is a low-level graph plotting library in Python that serves as a
visualization utility. It is a comprehensive library for creating static,
animated, and interactive visualizations.
What is Seaborn?
Seaborn is a Python data visualization library based on Matplotlib. It provides a
high-level interface for drawing attractive and informative statistical graphics.
The distplot() function plots the univariate distribution of a variable, that is,
a histogram of the variable that can be overlaid with a density estimate. The
seaborn.distplot() function accepts the data variable as an argument and returns
the plot. Note that distplot() has been deprecated since seaborn 0.11; histplot()
and displot() are the recommended replacements.
What is exploratory data analysis?
Exploratory data analysis (EDA) is used by data scientists to analyze and
investigate data sets and summarize their main characteristics, often employing
data visualization methods. It helps determine how best to manipulate data
sources to get the answers you need, making it easier for data scientists to
discover patterns, spot anomalies, test a hypothesis, or check assumptions.
Common types of graphics used in EDA include:
• Scatter plot, which is used to plot data points on a horizontal and a
vertical axis to show how much one variable is affected by another.
• Multivariate chart, which is a graphical representation of the
relationships between factors and a response.
• Run chart, which is a line graph of data plotted over time.
• Bubble chart, which is a data visualization that displays multiple
circles (bubbles) in a two-dimensional plot.
• Heat map, which is a graphical representation of data where values
are depicted by color.
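Before drawing any charts, a first numerical pass of EDA might look like the sketch below, which summarizes a data set, checks a relationship, and spots an anomaly. The data are hypothetical, and the 1.5 * IQR rule is one common convention for flagging outliers, assumed here rather than taken from the text.

```python
import pandas as pd

# Hypothetical data set: advertising spend and resulting sales.
df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50],
    "sales":    [25, 45, 65, 85, 500],   # last value looks anomalous
})

# Summarize the main characteristics of each column.
summary = df.describe()

# Check how strongly the two variables move together.
correlation = df["ad_spend"].corr(df["sales"])

# Spot anomalies with the 1.5 * IQR rule (a common convention, assumed here).
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
anomalies = df[(df["sales"] < q1 - 1.5 * iqr) | (df["sales"] > q3 + 1.5 * iqr)]

print(summary, correlation, anomalies, sep="\n\n")
```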
In Matplotlib, an Axes object is the region of the figure that contains the data
space. A given Figure can contain many Axes, but a given Axes object can belong
to only one Figure. Each Axes contains two (or three in the case of 3D) Axis
objects.
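The Figure, Axes, and Axis relationship can be inspected directly, as in the sketch below. The Agg backend is selected only so the example runs without a display; the plotted values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# One Figure containing two Axes, side by side.
fig, (ax_left, ax_right) = plt.subplots(1, 2)

ax_left.plot([1, 2, 3], [2, 4, 6])       # line graph on the first Axes
ax_right.scatter([1, 2, 3], [3, 1, 2])   # scatter plot on the second Axes

# Each Axes belongs to exactly one Figure and holds an x and a y Axis.
print(len(fig.axes))                      # number of Axes in the Figure
print(ax_left.figure is fig)              # the Axes points back to its Figure
print(type(ax_left.xaxis).__name__, type(ax_left.yaxis).__name__)
```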
Database: A database is an organized collection of structured information, or
data, typically stored electronically in a computer system. A database is usually
controlled by a database management system (DBMS).