Session 1
• Defining data science and big data
• Recognizing the different types of data
• Gaining insight into the data science process
Data All Around
• Lots of data is being collected and warehoused:
– Web data, e-commerce
– Financial transactions, bank/credit transactions
– Online trading and purchasing
– Social networks
– Cloud
Data and Big Data
• “90% of the world’s data was generated in the last few years.”
• Due to the advent of new technologies, devices, and communication means such as social networking sites, the amount of data produced by mankind is growing rapidly every year.
• The amount of data produced from the beginning of time until 2003 was 5 billion gigabytes. Piled up in the form of disks, it would fill an entire football field.
• The same amount was created every two days in 2011, and every six minutes in 2016. This rate is still growing enormously.
What is Big Data
• Big Data is a collection of large datasets that cannot be processed using traditional computing techniques.
• It is not a single technique or tool; rather, it involves many areas of business and technology.
• Big Data is any data that is expensive to manage and hard to extract value from:
– Volume: the size of the data
– Velocity: the latency of data processing relative to the growing demand for interactivity
– Variety and complexity: the diversity of sources, formats, quality, and structures
Characteristics of Big Data: Volume
• 44x increase in data volume from 2009 to 2020
• From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
Characteristics of Big Data: Variety
• Various formats, types, and structures
• Text, numerical data, images, audio, video, sequences, time series, social media data, multidimensional arrays, etc.
• Static data vs. streaming data
• A single application can be generating/collecting many types of data
Characteristics of Big Data: Velocity
• Data is being generated fast and needs to be processed fast
• Online data analytics
• Late decisions mean missed opportunities
Examples:
• E-promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you
• Healthcare monitoring: sensors monitor your activities and body; any abnormal measurement requires an immediate reaction
Benefits of Big Data
• Using the information kept in social networks like Facebook, marketing agencies learn about the response to their campaigns, promotions, and other advertising media.
• Using information from social media, such as the preferences and product perceptions of their consumers, product companies and retail organizations plan their production.
• Using data on the previous medical history of patients, hospitals provide better and quicker service.
Big Data Technologies
• Operational Big Data
• Analytical Big Data
Operational Big Data
• These include systems like MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored.
• NoSQL Big Data systems are designed to take advantage of new cloud computing architectures that have emerged over the past decade, allowing massive computations to be run.
Analytical Big Data
• These include systems like Massively Parallel Processing (MPP) database systems and MapReduce that provide analytical capabilities for retrospective and complex analysis that may touch most or all of the data.
• MapReduce provides a new method of analyzing data that is complementary to the capabilities provided by SQL, and systems based on MapReduce can be scaled up from single servers to thousands of high- and low-end machines.
Who generates Big Data?
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)
• Mobile devices (tracking all objects all the time)
Challenges in Big Data
• The major challenges associated with big data are as follows:
– Capturing data
– Curation
– Storage
– Searching
– Sharing
– Transfer
– Analysis
– Presentation
What is Data Science ?
• An area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data.
• Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges in big data.
• Data science principles apply to all data, big and small.
What is Data Science ?
• Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education:
– Computer science: pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
– Mathematics: mathematical modeling
– Statistics: statistical and stochastic modeling
Data Science Disciplines
Real Life Examples
• Internet Search
• Digital Advertisements (targeted advertising and re-targeting)
• Recommender Systems
• Image Recognition
• Speech Recognition
• Gaming
• Price Comparison Websites
• Airline Route Planning
• Fraud and Risk Detection
• Delivery logistics
Facets of Data
• In data science and big data you’ll come across many different types of data, and each of them tends to require different tools and techniques. The main categories of data are:
– Structured
– Unstructured
– Natural language
– Machine-generated
– Graph-based
– Audio, video, and images
– Streaming
Structured Data
• Structured data is data that depends on a data model and resides in a fixed field within a record.
• As such, it’s often easy to store structured data in tables within databases or Excel files. SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases (see the sketch below).
• You may also come across structured data that might give you a hard time storing it in a traditional relational database.
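A minimal sketch of managing and querying structured data with SQL from Python, using the standard sqlite3 module (the table and columns are invented for illustration):

import sqlite3

# In-memory database holding one structured table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
con.executemany("INSERT INTO people VALUES (?, ?)", [("Asha", 31), ("Ravi", 28)])

# SQL query against a fixed schema: the fields are known in advance.
for row in con.execute("SELECT name FROM people WHERE age > 30"):
    print(row)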
Unstructured Data
• Unstructured data is data that isn’t easy to fit into a data model because the content is context-specific or varying. One example of unstructured data is your regular email.
• Although email contains structured elements such as the sender, title, and body text, it’s a challenge to find, for example, the number of people who have written an email complaint about a specific employee, because so many ways exist to refer to a person.
• The thousands of different languages and dialects out there further complicate this.
• A human-written email is also a perfect example of natural language data.
Machine Generated Data
• Machine-generated data is information that’s automatically created by a computer, process, application, or other machine without human intervention.
• Machine-generated data is becoming a major data resource and will continue to be one.
• IDC (International Data Corporation) estimated there would be 26 times more connected things than people in 2020. This network is commonly referred to as the Internet of Things.
Session 2
Data Science Process: Overview and Different Steps
Data Science Process
Objectives:
• Understanding the flow of a data science process
• Discussing the steps in a data science process
Data Science Process
The typical data science process consists of six steps.
1. Setting research goal
• An essential outcome is the research goal, which states the purpose of your assignment in a clear and focused manner.
• Understanding the business goals and context is critical for project success.
• In this phase, you also need to frame the business problem and formulate initial hypotheses (IH) to test.
Create project charter
• Clients like to know upfront what they’re paying for, so after you have a good understanding of the business problem, try to get a formal agreement on the deliverables.
• All this information is best collected in a project charter. For any significant project this is mandatory.
• A project charter requires teamwork, and your input covers at least the following:
– A clear research goal
– The project mission and context
– How you’re going to perform your analysis
– What resources you expect to use
– Proof that it’s an achievable project, or a proof of concept
– Deliverables and a measure of success
2. Retrieving data
Data Retrieval
• The next step in data science is to retrieve the required data. Sometimes you need to go into the field and design a data collection process yourself, but most of the time you won’t be involved in this step.
• Many companies will have already collected and stored the data for you, and what they don’t have can often be bought from third parties.
• Don’t be afraid to look outside your organization for data, because more and more organizations are making even high-quality data freely available.
Data Stored in the Company
• Your first act should be to assess the relevance and quality of the data that’s readily available within your company.
• Most companies have a program for maintaining key data, so much of the cleaning work may already be done.
• This data can be stored in official data repositories such as databases, data marts, data warehouses, and data lakes maintained by a team of IT professionals.
• The primary goal of a database is data storage, while a data warehouse is designed for reading and analyzing data.
Data Sources
3. Data Preparation
Data Preparation
• Data can have lots of inconsistencies, such as missing values, blank columns, and incorrect data formats, which need to be cleaned.
• Your task now is to sanitize the data and prepare it for use in the modeling and reporting phases.
• Doing so is tremendously important: your models will perform better and you’ll lose less time trying to fix strange output.
• It can’t be mentioned often enough: garbage in equals garbage out.
Data Cleansing
• Data cleansing is a subprocess of the data science process that focuses on removing errors in your data so your data becomes a true and consistent representation of the processes it originates from.
Overview of common errors
Example: Outliers
Data Entry Errors
• Data collection and data entry are error-prone processes.
• They often require human intervention, and because humans are only human, they make typos or lose their concentration for a second and introduce an error into the chain.
• Data collected by machines or computers isn’t free from errors either. Some errors arise from human sloppiness, whereas others are due to machine or hardware failure.
• Examples of errors originating from machines are transmission errors or bugs in the extract, transform, and load (ETL) phase.
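Entry typos like these can often be surfaced with a quick frequency count; a short pandas sketch (the values are invented for illustration):

import pandas as pd

# Hypothetical categorical column with entry typos.
s = pd.Series(["male", "female", "Male ", "femle", "female"])

# Frequency counts surface misspellings and stray whitespace.
print(s.value_counts())

# Typical fix: normalize case/whitespace, then map known typos.
print(s.str.strip().str.lower().replace({"femle": "female"}).value_counts())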
Impossible Values / Sanity Checks
• Sanity checks are another valuable type of data check.
• Here you check the value against physically or theoretically impossible values, such as people taller than 3 meters or someone with an age of 299 years.
• Sanity checks can be directly expressed with rules, as in the pandas sketch below:
check = 0 <= age <= 120
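A minimal sketch of applying such a rule to a whole column with pandas (the column name age and the example values are invented for illustration):

import pandas as pd

# Hypothetical example data; in practice this comes from your own source.
df = pd.DataFrame({"age": [25, 299, 47, -3, 61]})

# Sanity-check rule: flag ages outside the plausible range 0-120.
valid = df["age"].between(0, 120)
print(df[~valid])  # rows that violate the rule and need inspection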
Outliers
• An outlier is an observation that seems distant from other observations or, more specifically, an observation that follows a different logic or generative process than the other observations.
• The easiest way to find outliers is to use a plot or a table with the minimum and maximum values (see the sketch below).
• The normal distribution, or Gaussian distribution, is the most common distribution in the natural sciences.
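A minimal sketch of the min/max table approach with pandas, plus a common rule of thumb (the 1.5 x IQR fence, widely used but not from the slides); the data is invented for illustration:

import pandas as pd

# Hypothetical measurements; 540 follows a different generative process.
heights = pd.Series([172, 168, 181, 540, 175], name="height_cm")

# A min/max (and quartile) summary exposes the outlier at a glance.
print(heights.describe())

# Rule of thumb: flag points more than 1.5 IQR outside the quartiles.
q1, q3 = heights.quantile([0.25, 0.75])
iqr = q3 - q1
print(heights[(heights < q1 - 1.5 * iqr) | (heights > q3 + 1.5 * iqr)])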
Dealing with missing values
• Missing values aren’t necessarily wrong, but you still need to handle them separately; certain modeling techniques can’t handle missing values.
• They might be an indicator that something went wrong in your data collection or that an error happened in the ETL process.
Handling missing values
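A brief sketch of common handling options in pandas (column names and values are invented for illustration; which option is right depends on your data and model):

import numpy as np
import pandas as pd

# Hypothetical data with missing entries.
df = pd.DataFrame({"income": [52000, np.nan, 61000, 58000],
                   "city": ["Pune", "Delhi", None, "Mumbai"]})

print(df.isna().sum())  # first, inspect how much is missing

print(df.dropna())      # option 1: omit rows with missing values
print(df.fillna({"income": df["income"].mean(),  # option 2: impute a simple estimate
                 "city": "unknown"}))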
4. Exploratory Data Analysis
• During exploratory data analysis you take a deep dive into the data.
• Information becomes much easier to grasp when shown in a picture; therefore you mainly use graphical techniques to gain an understanding of your data and the interactions between variables.
• The goal isn’t to cleanse the data, but it’s common that you’ll still discover anomalies you missed before, forcing you to take a step back and fix them.
Histogram
• In a histogram, a variable is cut into discrete categories, and the number of occurrences in each category is summed up and shown in the graph (see the sketch below).
• The boxplot, on the other hand, doesn’t show how many observations are present but does offer an impression of the distribution within categories.
• It can show the maximum, minimum, median, and other characterizing measures at the same time.
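A minimal sketch of both plot types with matplotlib (the data is randomly generated purely for illustration):

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=170, scale=10, size=500)  # synthetic Gaussian sample

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=20)   # histogram: counts per discrete category (bin)
ax1.set_title("Histogram")
ax2.boxplot(data)         # boxplot: median, quartiles, and extreme values
ax2.set_title("Boxplot")
plt.show()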
5. Build the model
Building a model
• With clean data in place and a good understanding of the content, you’re ready to build models with the goal of making better predictions, classifying objects, or gaining an understanding of the system that you’re modeling.
• This phase is much more focused than the exploratory analysis step, because you know what you’re looking for and what you want the outcome to be.
Building a model
• Building a model is an iterative process.
• The way you build your model depends on whether you go with classic statistics or the somewhat more recent machine learning school, and on the type of technique you want to use.
• Either way, most models consist of the following main steps:
– Selection of a modeling technique and the variables to enter into the model
– Execution of the model
Build a model
• You’ll need to select the variables you want to include in your model and a modeling technique.
• Your findings from the exploratory analysis should already give a fair idea of which variables will help you construct a good model.
• Many modeling techniques are available, and choosing the right model for a problem requires judgment on your part.
• You’ll need to consider model performance and whether your project meets all the requirements to use your model, as well as other factors:
– Must the model be moved to a production environment and, if so, would it be easy to implement?
– How difficult is the maintenance of the model: how long will it remain relevant if left untouched?
– Does the model need to be easy to explain?
Model Execution
• Luckily, most programming languages, such as Python, already have libraries such as StatsModels or Scikit-learn. These packages implement several of the most popular techniques.
• Coding a model is a nontrivial task in most cases, so having these libraries available can speed up the process.
• As the code sketch below shows, it’s fairly easy to use linear regression with StatsModels or Scikit-learn.
• Doing this yourself would require much more effort, even for the simplest techniques.
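A minimal sketch of the kind of code the slide refers to, fitting a linear regression with both StatsModels and Scikit-learn (the synthetic data and coefficients are invented for illustration):

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 2.5 * x + 1.0 + rng.normal(scale=2.0, size=100)

# StatsModels: add an intercept column, then fit ordinary least squares.
results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.summary())  # coefficients, R-squared, p-values

# The same fit with Scikit-learn.
lr = LinearRegression().fit(x.reshape(-1, 1), y)
print(lr.intercept_, lr.coef_)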
Summary
• Setting the research goal: defining the what, the why, and the how of your project in a project charter.
• Retrieving data: finding and getting access to the data needed in your project. This data is either found within the company or retrieved from a third party.
• Data preparation: checking and remediating data errors, enriching the data with data from other data sources, and transforming it into a suitable format for your models.
• Data exploration: diving deeper into your data using descriptive statistics and visual techniques.
• Data modeling: using machine learning and statistical techniques to achieve your project goal.
• Presentation and automation: presenting your results to the stakeholders and industrializing your analysis process for repetitive reuse and integration with other tools.
Session III
Machine Learning: Definition and Relation with Data Science
ML
• Data science is the study of data cleansing, preparation, and analysis, while machine learning is a branch of AI and a subfield of data science.
• Data science is a field that studies approaches to finding insights from raw data.
• Machine learning is a technique used by data scientists to enable machines to learn automatically from past data.
Machine Learning
• Machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed.
• Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
• The primary aim is to allow computers to learn automatically, without human intervention or assistance, and adjust their actions accordingly.
Uses
• Predict the outcomes of elections
• Identify and filter spam messages from email
• Foresee criminal activity
• Automate traffic signals according to road conditions
• Produce financial estimates of storms and natural disasters
• Examine customer churn
• Create auto-piloting planes and self-driving cars
• Identify individuals with the capacity to donate
Types of Machine Learning
Supervised Machine Learning
Unsupervised Machine Learning
Data Science vs. Machine Learning
• Data Science deals with understanding and finding hidden patterns or useful insights in data, which helps in making smarter business decisions. Machine Learning is a subfield of data science that enables machines to learn from past data and experiences automatically.
• Data Science is used for discovering insights from data. Machine Learning is used for making predictions and classifying results for new data points.
• Data Science is a broad term that includes the various steps to create a model for a given problem and deploy it. Machine Learning is used in the data modeling step of the data science process.
• A data scientist needs skills with big data tools like Hadoop, Hive, and Pig, statistics, and programming in Python, R, or Scala. A Machine Learning engineer needs skills such as computer science fundamentals, programming in Python or R, and statistics and probability concepts.
• Data Science can work with raw, structured, and unstructured data. Machine Learning mostly requires structured data to work on.
• Data scientists spend much of their time handling and cleansing data. ML engineers spend much of their time managing the complexities that arise while implementing algorithms.
Applications of ML in Data Science
• Regression and classification are of primary importance to a data scientist. To achieve these goals, one of the main tools a data scientist uses is machine learning. The uses for regression and automatic classification are wide-ranging, such as the following:
– Finding oil fields or gold mines based on existing sites (classification and regression)
– Finding place names or persons in text (classification)
– Identifying people based on pictures or voice recordings (classification)
– Recognizing birds based on their whistle (classification)
– Identifying profitable customers (regression and classification)
– Proactively identifying car parts that are likely to fail (regression)
– Identifying tumors and diseases (classification)
– Predicting the amount of money a person will spend on product X (regression)
– Predicting your company’s yearly revenue (regression)
– Predicting which team will win the Champions League in soccer (classification)
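As a toy sketch of automatic classification with scikit-learn (the bundled iris dataset stands in for any of the classification tasks above):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the classification tasks listed above.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classifier on the training split and check held-out accuracy.
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))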
Presentation
• Sometimes people get so excited about your work that you’ll need to repeat it over and over again, because they value the predictions of your models or the insights that you produced. For this reason, you need to automate your models.
• This doesn’t always mean that you have to redo all of your analysis all the time.
• Sometimes it’s sufficient to implement only the model scoring; other times you might build an application that automatically updates reports, Excel spreadsheets, or PowerPoint presentations.
• The last stage of the data science process is where your soft skills will be most useful, and yes, they’re extremely important.