Fods Notes For Lecturing

Data science and big data are integral in various industries for improving customer experiences and optimizing processes. The document outlines different types of data, including structured, unstructured, and machine-generated data, as well as the data science process, which involves setting research goals, data retrieval, preparation, exploration, modeling, and presentation. Each step is crucial for effectively analyzing data and deriving insights.

Uploaded by

Divya

Foundations of Data Science

Data science and big data are used almost everywhere in both commercial and
noncommercial settings. Commercial companies in almost every industry use data science
and big data to gain insights into their customers, processes, staff, competition, and products.
Many companies use data science to offer customers a better user experience.
A good example of this is Google AdSense,
which collects data from internet users so relevant commercial messages can be matched to
the person browsing the internet.
A data scientist in a governmental organization gets to work on
diverse projects such as detecting fraud and other criminal activity or optimizing project
funding.
In data science and big data you’ll come across many different types of data, and
each of them tends to require different tools and techniques. The main categories of data
are these:
- Structured
- Unstructured
- Natural language
- Machine-generated
- Graph-based
- Audio, video, and images
- Streaming

Structured data
Structured data is data that depends on a data model and resides in a fixed field within a
record. It’s often easy to store structured data in tables within databases or Excel files.
SQL, or Structured Query Language, is the preferred way to manage and query data that
resides in databases.
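
As a rough illustration of querying structured data with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers table, its columns, and the rows are hypothetical and were invented for this example only.

```python
import sqlite3

# In-memory database; the table, columns, and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (name, city, age) VALUES (?, ?, ?)",
    [("Asha", "Chennai", 34), ("Ravi", "Mumbai", 41), ("Meena", "Chennai", 29)],
)

# Every record has the same fixed fields, so a query can filter and sort by field.
rows = conn.execute(
    "SELECT name, age FROM customers WHERE city = ? ORDER BY age", ("Chennai",)
).fetchall()
print(rows)

conn.close()
```

The fixed fields (name, city, age) are what make the WHERE and ORDER BY clauses possible; the same question asked of free text would need far more work.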
Unstructured data
Unstructured data is data that isn’t easy to fit into a data model because the content is
context-specific or varying. One example of unstructured data is your regular email.
Although email contains structured elements such as the sender, title, and body text, it’s
a challenge to find the number of people who have written an email complaint about a
specific employee because so many ways exist to refer to a person, for example. The
thousands of different languages and dialects out there further complicate this.
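
The email problem described above can be sketched with Python's standard email module. The three messages, the addresses, and the employee name are hypothetical. The headers parse cleanly into fields, but a naive exact-name search over the body text finds only one of the two complaints.

```python
from email import message_from_string

# Hypothetical messages, invented for illustration.
raw_emails = [
    "From: a@example.com\nSubject: Complaint\n\nRobert Smith was rude to me on the phone.",
    "From: b@example.com\nSubject: Service issue\n\nI want to complain about Bob from support.",
    "From: c@example.com\nSubject: Thanks\n\nMr. Smith resolved my issue quickly.",
]
messages = [message_from_string(raw) for raw in raw_emails]

# Structured part: header fields can be read directly by name.
senders = [msg["From"] for msg in messages]

# Unstructured part: an exact-name search over the body text.
needle = "Robert Smith"
naive_hits = [msg["Subject"] for msg in messages if needle in msg.get_payload()]

print(senders)
print(naive_hits)  # the complaint that refers to "Bob" is not found
```

Handling nicknames, titles, and other languages is where natural language techniques come in; simple string matching is only a starting point.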

Natural language
Natural language is a special type of unstructured data; it’s challenging to process because it
requires knowledge of specific data science techniques and linguistics. The natural language
processing community has had success in entity recognition, topic recognition,
summarization, text completion, and sentiment analysis, but models trained in one domain
don’t generalize well to other domains.

Machine-generated data
Machine-generated data is information that’s automatically created by a computer, process,
application, or other machine without human intervention.
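
A web server access log is one common kind of machine-generated data (the notes do not name a specific example, so this choice is an assumption). The sketch below parses a single made-up log line into named fields with a regular expression; the line layout follows the widely used Common Log Format.

```python
import re

# Made-up log line in Common Log Format style, for illustration only.
log_line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# Named groups turn the raw line into fields a later analysis step can use.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

match = pattern.match(log_line)
record = match.groupdict()
record["status"] = int(record["status"])  # convert numeric fields from text
record["size"] = int(record["size"])
print(record)
```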

Graph-based or network data


A graph is a mathematical structure to model pair-wise relationships between objects.
Graph structures use nodes, edges, and properties to represent and store graphical data.

Graph databases are used to store graph-based data and are queried with specialized query
languages such as SPARQL.
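
To make nodes, edges, and properties concrete, here is a minimal sketch in plain Python. It is not a graph database and does not use SPARQL; the people, ages, and relationships are hypothetical. It builds an adjacency structure and answers a typical graph question: who is two steps away from a given person?

```python
from collections import defaultdict

# Nodes with properties (all names and values are invented for illustration).
nodes = {
    "alice": {"age": 30},
    "bob": {"age": 25},
    "carol": {"age": 35},
    "dave": {"age": 40},
}

# Edges are pairs of nodes, each with its own properties.
edges = [
    ("alice", "bob", {"relation": "friend"}),
    ("bob", "carol", {"relation": "friend"}),
    ("carol", "dave", {"relation": "colleague"}),
]

# Build an undirected adjacency structure from the edge list.
adjacency = defaultdict(set)
for source, target, _properties in edges:
    adjacency[source].add(target)
    adjacency[target].add(source)


def neighbors_of_neighbors(node):
    """Return nodes exactly two steps away, excluding direct neighbors and the node itself."""
    direct = adjacency[node]
    two_steps = set()
    for neighbor in direct:
        two_steps |= adjacency[neighbor]
    return two_steps - direct - {node}


print(neighbors_of_neighbors("alice"))
```

A dedicated graph database answers the same kind of question declaratively and at much larger scale; the dictionary-based version here only shows the underlying idea.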
Audio, image, and video
Audio, image, and video are data types that pose specific challenges to a data scientist. Tasks
that are trivial for humans, such as recognizing objects in pictures, turn out to be
challenging for computers.
Streaming data
While streaming data can take almost any of the previous forms, it has an extra property:
the data flows into the system as events happen instead of being loaded in a batch.
Examples are the “What’s trending” on Twitter, live sporting or music events, and the
stock market.
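
A "what's trending" style computation can be sketched as a sliding window over a stream of events. The function name trending, the window size, and the topic names below are all hypothetical; a real system would read from a live feed rather than a list.

```python
from collections import Counter, deque


def trending(events, window_size=5, top_n=1):
    """Yield the most common topics among the last `window_size` events, one snapshot per event."""
    window = deque()
    counts = Counter()
    for topic in events:
        window.append(topic)
        counts[topic] += 1
        if len(window) > window_size:
            # Forget the oldest event so the counts reflect only the recent window.
            oldest = window.popleft()
            counts[oldest] -= 1
            if counts[oldest] == 0:
                del counts[oldest]
        yield counts.most_common(top_n)


# Made-up stream of topic mentions, processed one event at a time.
stream = ["cricket", "music", "cricket", "stocks", "stocks", "stocks", "music"]
snapshots = list(trending(stream, window_size=4))
print(snapshots[-1])
```

The generator never needs the whole stream in memory, which is the point: each event updates the answer as it arrives.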
The data science process
The data science process typically consists of six steps.
Setting the research goal
Data science is mostly applied in the context of an organization. When the business asks you
to perform a data science project, you’ll first prepare a project charter. This charter contains
information such as what you’re going to research, how the company benefits from that,
what data and resources you need, a timetable, and deliverables.
Retrieving data
The second step is to collect data.
Data preparation
Data collection is an error-prone process; in this phase you enhance the quality of the data
and prepare it for use in subsequent steps.
This phase consists of three subphases: data cleansing removes false values from a data
source and inconsistencies across data sources, data integration enriches data sources by
combining information from multiple data sources, and data transformation ensures that the
data is in a suitable format for use in your models.
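
The three subphases can be sketched on two tiny, hypothetical data sources (a list of sales records and a customer lookup). The field names, the rule that a negative amount is a false value, and the city spellings are assumptions made for this example.

```python
# Two made-up sources: sales records and a customer lookup keyed by customer_id.
sales = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 2, "amount": "-5"},  # impossible value: a negative sale
    {"customer_id": 3, "amount": "80"},
]
customers = {
    1: {"city": "chennai "},
    2: {"city": "Mumbai"},
    3: {"city": "CHENNAI"},
}

# 1. Data cleansing: drop false values and fix inconsistent spellings.
clean_sales = [row for row in sales if float(row["amount"]) >= 0]
clean_customers = {
    cid: {"city": info["city"].strip().title()} for cid, info in customers.items()
}

# 2. Data integration: enrich each sale with information from the customer source.
integrated = [{**row, **clean_customers[row["customer_id"]]} for row in clean_sales]

# 3. Data transformation: put the data in a format a model can use (numeric amounts).
prepared = [
    {"customer_id": row["customer_id"], "city": row["city"], "amount": float(row["amount"])}
    for row in integrated
]
print(prepared)
```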
Data exploration
Data exploration is concerned with building a deeper understanding of your data. You try
to understand how variables interact with each other, the distribution of the data, and
whether there are outliers. To achieve this you mainly use descriptive statistics, visual
techniques, and simple modeling.
This step is also known as Exploratory Data Analysis.
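
A first pass at descriptive statistics and outlier detection might look like the sketch below, which uses Python's statistics module. The ages are made up, and the 1.5 × IQR rule is one common convention for flagging outliers, not something the notes prescribe.

```python
import statistics

# Made-up sample with one suspicious value at the end.
ages = [23, 25, 27, 29, 30, 31, 33, 35, 95]

mean = statistics.mean(ages)
median = statistics.median(ages)

# Quartiles describe the spread of the distribution.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Flag values far outside the middle 50% of the data (1.5 * IQR convention).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in ages if x < lower or x > upper]

print(f"mean={mean:.2f} median={median} quartiles=({q1}, {q2}, {q3})")
print("outliers:", outliers)
```

A mean well above the median is already a hint that a few large values are pulling the distribution; a histogram or box plot would show the same thing visually.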

Data modeling or model building


Building a model is an iterative process that involves selecting the variables for the model,
executing the model, and model diagnostics.
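
The select, execute, diagnose loop can be sketched with a one-variable least-squares line. Simple linear regression and the R² diagnostic are assumptions chosen for brevity; the notes do not prescribe a model type. The variable names and numbers are invented.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Diagnostic: share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data: two candidate predictors and one target.
data = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_age": [7, 3, 9, 1, 5],
    "sales": [12.1, 13.9, 16.2, 17.8, 20.0],
}

# Iterate: select a variable, execute the model, run the diagnostic.
results = {}
for variable in ("ad_spend", "store_age"):
    slope, intercept = fit_line(data[variable], data["sales"])
    results[variable] = r_squared(data[variable], data["sales"], slope, intercept)

best = max(results, key=results.get)
print(results, "best:", best)
```

In practice the loop continues: a poor diagnostic sends you back to variable selection or to a different model family.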
Presentation and automation
Finally, you present the results to your business.
Overview of the data science process
1. The first step of this process is setting a research goal. The main purpose here is making
sure all the stakeholders understand the what, how, and why of the project.
2. The second phase is data retrieval. You want to have data available for analysis, so this
step includes finding suitable data and getting access to the data from the data owner. The
result is data in its raw form, which probably needs polishing and transformation before it
becomes usable.
3. Now that you have the raw data, it’s time to prepare it. This includes transforming the data
from a raw form into data that’s directly usable in your models. To achieve this, you’ll detect
and correct different kinds of errors in the data, combine data from different data sources,
and transform it.
4. The fourth step is data exploration. The goal of this step is to gain a deep understanding of
the data.
5. The fifth step is model building, often referred to as “data modeling”.
6. The last step of the data science process is presenting your results and automating the
analysis.
