Fods Notes For Lecturing

Data science and big data are integral in various industries for improving customer experiences and optimizing processes. The document outlines different types of data, including structured, unstructured, and machine-generated data, as well as the data science process, which involves setting research goals, data retrieval, preparation, exploration, modeling, and presentation. Each step is crucial for effectively analyzing data and deriving insights.

Uploaded by

Divya

Foundations of Data Science

Data science and big data are used almost everywhere in both commercial and
noncommercial settings. Commercial companies in almost every industry use data science
and big data to gain insights into their customers, processes, staff, competition, and products.
Many companies use data science to offer customers a better user experience.
A good example of this is Google AdSense,
which collects data from internet users so relevant commercial messages can be matched to
the person browsing the internet.
A data scientist in a governmental organization gets to work on
diverse projects such as detecting fraud and other criminal activity or optimizing project
funding.

In data science and big data you’ll come across many different types of data, and
each of them tends to require different tools and techniques. The main categories of data
are these:
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming

Structured data
Structured data is data that depends on a data model and resides in a fixed field within a
record. It’s often easy to store structured data in tables within databases or Excel files.
SQL, or Structured Query Language, is the preferred way to manage and query data that
resides in databases.
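
As a rough illustration of why fixed fields make structured data easy to query, the sketch below loads a few records into an in-memory SQLite table and summarizes them with SQL. The table name, column names, and sample rows are invented for this example and are not part of the notes.

```python
import sqlite3

# In-memory database; the "customers" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (name, city, age) VALUES (?, ?, ?)",
    [("Asha", "Chennai", 34), ("Ravi", "Mumbai", 41), ("Meena", "Chennai", 29)],
)

# Every record has the same named fields, so the query can group and
# aggregate by field name without inspecting the content of each record.
rows = conn.execute(
    "SELECT city, COUNT(*), AVG(age) FROM customers GROUP BY city ORDER BY city"
).fetchall()
conn.close()

for city, customer_count, average_age in rows:
    print(city, customer_count, average_age)
```

The same question would be much harder to answer if the ages and cities were buried in free text, which is the situation the next category describes.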
Unstructured data
Unstructured data is data that isn’t easy to fit into a data model because the content is
context-specific or varying. One example of unstructured data is your regular email.
Although email contains structured elements such as the sender, title, and body text, it’s
a challenge to find the number of people who have written an email complaint about a
specific employee, for example, because so many ways exist to refer to a person. The
thousands of different languages and dialects out there further complicate this.
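
The email example can be made concrete with a small sketch. The message, the addresses, and the employee name "Rajesh Kumar" are all hypothetical. The structured parts (sender, subject) come out with a simple lookup, while finding a mention of the employee in the body already needs a hand-written pattern that still covers only a few of the possible ways to refer to a person.

```python
import re
from email import message_from_string

# Hypothetical raw email; headers are structured, the body is free text.
raw = """From: customer@example.com
To: support@example.com
Subject: Complaint

I was very unhappy with how Mr. R. Kumar handled my refund.
"""

msg = message_from_string(raw)
sender = msg["From"]          # structured field: direct lookup
subject = msg["Subject"]      # structured field: direct lookup
body = msg.get_payload()      # unstructured free text

# A naive exact-name search misses this complaint entirely.
naive_hit = "Rajesh Kumar" in body

# A hand-written pattern catches a few variants ("Rajesh Kumar", "R. Kumar",
# "Mr. R. Kumar") but not nicknames, misspellings, or other languages.
pattern = re.compile(r"\b(?:Mr\.?\s+)?(?:Rajesh|R\.)\s+Kumar\b", re.IGNORECASE)
pattern_hit = bool(pattern.search(body))

print(sender, subject, naive_hit, pattern_hit)
```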

Natural language
Natural language is a special type of unstructured data; it’s challenging to process because it
requires knowledge of specific data science techniques and linguistics. The natural language
processing community has had success in entity recognition, topic recognition,
summarization, text completion, and sentiment analysis, but models trained in one domain
don’t generalize well to other domains.

Machine-generated data
Machine-generated data is information that’s automatically created by a computer, process,
application, or other machine without human intervention.

Graph-based or network data

A graph is a mathematical structure to model pair-wise relationships between objects.
Graph structures use nodes, edges, and properties to represent and store graphical data.

Graph databases are used to store graph-based data and are queried with specialized query
languages such as SPARQL.
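
A minimal sketch of the node, edge, and property idea follows, using plain Python dictionaries rather than a graph database or SPARQL. The people, their properties, and the helper names `neighbours` and `friends_of_friends` are invented for illustration.

```python
# Nodes carry properties; each edge links two nodes and has its own properties.
nodes = {
    "alice": {"type": "person", "age": 30},
    "bob": {"type": "person", "age": 25},
    "carol": {"type": "person", "age": 35},
}
edges = [
    ("alice", "bob", {"relation": "friend"}),
    ("bob", "carol", {"relation": "friend"}),
]


def neighbours(node):
    """Return the nodes directly connected to `node` (edges treated as undirected)."""
    result = set()
    for source, target, _properties in edges:
        if source == node:
            result.add(target)
        elif target == node:
            result.add(source)
    return result


def friends_of_friends(node):
    """Return nodes two hops away that are not already direct neighbours."""
    direct = neighbours(node)
    second_hop = set()
    for friend in direct:
        second_hop |= neighbours(friend)
    return second_hop - direct - {node}


print(friends_of_friends("alice"))
```

Questions such as "who are the friends of my friends" follow relationships rather than fixed fields, which is the kind of query graph databases are built for.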
Audio, image, and video
Audio, image, and video are data types that pose specific challenges to a data scientist. Tasks
that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging
for computers.
Streaming data
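While streaming data can take almost any of the previous forms, it has an extra property:
the data flows into the system when an event happens instead of being loaded into a data
store in a batch. Examples are the “What’s trending” on Twitter, live sporting or music
events, and the stock market.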
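
The sketch below shows one way to think about streaming data in code: events are handled one at a time as they arrive, and the result is updated after each event instead of being computed over a complete stored batch. The hashtag stream is a hypothetical stand-in for a live feed.

```python
from collections import Counter


def event_stream():
    """Hypothetical event source that yields one hashtag at a time."""
    for tag in ["#cricket", "#music", "#cricket", "#stocks", "#cricket", "#music"]:
        yield tag


def trending(stream, top_n=1):
    """Yield the current top hashtags after every incoming event."""
    counts = Counter()
    for tag in stream:
        counts[tag] += 1                   # update state with the new event only
        yield counts.most_common(top_n)    # result is available immediately


snapshots = list(trending(event_stream()))
print(snapshots[-1])
```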
The data science process
The data science process typically consists of six steps.
Setting the research goal
Data science is mostly applied in the context of an organization. When the business asks you
to perform a data science project, you’ll first prepare a project charter. This charter contains
information such as what you’re going to research, how the company benefits from that,
what data and resources you need, a timetable, and deliverables.
Retrieving data
The second step is to collect data.
Data preparation
Data collection is an error-prone process; in this phase you enhance the quality of the data
and prepare it for use in subsequent steps.
This phase consists of three subphases: data cleansing removes false values from a data
source and inconsistencies across data sources, data integration enriches data sources by
combining information from multiple data sources, and data transformation ensures that
the data is in a suitable format for use in your models.
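
The three subphases can be sketched on a toy example. The two sources, their field names, and the rule that a sale amount must be positive are assumptions made up for this illustration.

```python
# Two hypothetical raw sources.
sales = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 2, "amount": "-5"},      # false value: negative sale amount
    {"customer_id": 3, "amount": "80"},
    {"customer_id": 1, "amount": "30"},
]
customers = {1: {"city": "chennai "}, 3: {"city": "Mumbai"}}

# 1. Data cleansing: drop false values and make spellings consistent.
cleaned_sales = [row for row in sales if float(row["amount"]) > 0]
cleaned_customers = {
    customer_id: {"city": record["city"].strip().title()}
    for customer_id, record in customers.items()
}

# 2. Data integration: enrich each sale with the customer's city.
integrated = [
    {**row, "city": cleaned_customers[row["customer_id"]]["city"]}
    for row in cleaned_sales
    if row["customer_id"] in cleaned_customers
]

# 3. Data transformation: convert text amounts to numbers for modeling.
prepared = [
    {"customer_id": row["customer_id"], "amount": float(row["amount"]), "city": row["city"]}
    for row in integrated
]

for row in prepared:
    print(row)
```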
Data exploration
Data exploration is concerned with building a deeper understanding of your data. You try
to understand how variables interact with each other, the distribution of the data, and
whether there are outliers. To achieve this you mainly use descriptive statistics, visual
techniques, and simple modeling.
This step is also known as Exploratory Data Analysis.
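
A small sketch of the descriptive-statistics side of exploration, using only the Python standard library (`statistics.quantiles` needs Python 3.8 or later). The sample values are invented, and the 1.5 × IQR rule used to flag outliers is one common convention, not something prescribed by these notes.

```python
import statistics

# Hypothetical measurements with one suspicious value.
values = [12, 15, 14, 13, 16, 15, 14, 48]

mean = statistics.mean(values)
median = statistics.median(values)

# Quartiles describe the spread of the distribution.
q1, _q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Flag points far outside the middle 50% of the data (1.5 * IQR convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(mean, median, q1, q3, outliers)
```

The gap between the mean and the median here is itself a hint that an extreme value is pulling the average upward.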

Data modeling or model building

Building a model is an iterative process that involves selecting the variables for the model,
executing the model, and model diagnostics.
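
As a hedged illustration of the select, execute, and diagnose loop, the sketch below fits a one-variable least-squares line for each candidate variable and compares them with R². The data, the variable names `ad_spend` and `store_id`, and the choice of simple linear regression are assumptions for this example only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Model diagnostic: share of the variance in y explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical candidate variables and target.
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_id": [3, 1, 4, 1, 5],
}
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

# Iterate: pick a variable, execute the model, check the diagnostic.
scores = {}
for name, xs in candidates.items():
    slope, intercept = fit_line(xs, sales)
    scores[name] = r_squared(xs, sales, slope, intercept)

best = max(scores, key=scores.get)
print(best, scores)
```

In practice each round of diagnostics feeds back into the choice of variables and model, which is why the step is described as iterative.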
Presentation and automation
Finally, you present the results to your business.
Overview of the data science process
1. The first step of this process is setting a research goal. The main purpose here is making
sure all the stakeholders understand the what, how, and why of the project.
2. The second phase is data retrieval. You want to have data available for analysis, so this
step includes finding suitable data and getting access to the data from the data owner. The
result is data in its raw form, which probably needs polishing and transformation before it
becomes usable.
3. Now that you have the raw data, it’s time to prepare it. This includes transforming the data
from a raw form into data that’s directly usable in your models. To achieve this, you’ll detect
and correct different kinds of errors in the data, combine data from different data sources,
and transform it.
4. The fourth step is data exploration. The goal of this step is to gain a deep understanding
of the data.
5. The fifth step is model building, often referred to as “data modeling”.
6. The last step of the data science model is presenting your results and automating the
analysis.
