CHAPTER: DATA SCIENCE
Note 1: Depending on the type of data an AI system works with, AI is classified into three domains:
1) Data Science
2) Computer Vision
3) Natural Language Processing
WHAT IS DATA SCIENCE?
1) Data Science is a domain of AI
2) Data Science is the process of using the skills of programming, mathematics and statistics
together to find meaningful information from the given data.
3) It is a technology that analyses the given data to create impactful solutions
or to predict outcomes for a problem statement.
APPLICATIONS OF DATA SCIENCE
1) Internet Search
There are many search engines, such as Google, Yahoo, Bing and AOL, that make use of data
science algorithms to deliver the best results for our search query in a fraction of a second.
2) Digital Advertisement/ Targeted Advertisement
Almost all digital marketing, such as email marketing, pay-per-click advertising and banner
advertising, is based on data science algorithms. These digital advertisements are
targeted based on the user's past behaviour.
3) Website Recommendation
Many online companies, such as Amazon, Flipkart, eBay, Twitter, LinkedIn and Netflix,
help their users find relevant items from the huge number available with them,
based on the user's past behaviour, preferences, searches and interests. For example, if
we search for a Bluetooth speaker, we will probably get recommendations for a bunch of
speakers of popular brands available online to compare with. Data Science is the tool
that makes this possible.
4) Image Recognition
When we upload a picture, the automatic tag suggestion system used by applications like
Facebook suggests people to tag. This is made possible by Data Science.
5) Fraud and Risk Detection
The banking sector and finance companies use Data Science to analyse customer data, such as
customer profiles, past expenditures and other essential variables, to estimate the
probability of risk, default and failure, if any.
6) Gaming
Games from companies like EA Sports, Zynga and Sony are designed using machine learning
algorithms that let the game improve or upgrade itself based on the player's
previous moves and adjust itself to the next higher level accordingly.
NOTE: THERE ARE OTHER APPLICATIONS TOO; YOU CAN HAVE A LOOK AT THOSE AS WELL.
REVISITING THE AI PROJECT CYCLE
Scenario
Every restaurant prepares food in bulk because it expects a good crowd to come and enjoy the food.
However, if the expectation is not met, a good amount of food gets wasted, which eventually
becomes a loss for the restaurant, as it has to either dump the food or give it away free to hungry
people. If this daily loss is taken into account for a year, it becomes quite a big amount (financial loss).
Now our goal is to develop an AI model that can predict the quantity of food to be
prepared by the restaurant, to minimize the wastage of food.
Stage 1: Problem Scoping
This is the first stage of the AI Project Cycle. In this stage we closely examine the various factors that
cause the problem, in order to build an AI-enabled project.
Keeping the above scenario in mind, we can write the 4Ws canvas as:
WHO canvas: Who is facing the problem?
Who are the stakeholders?
    Restaurants offering buffets
    Restaurant owners
What do we know about them?
    Restaurants cook food in bulk for buffets every day to meet their customers' requirements.
    They estimate the number of customers that will come to the buffet in their restaurant.
WHAT canvas: What is the nature of the problem?
What is the problem?
    A large amount of food is wasted.
    Financial loss due to wastage of food
How do you know it is a problem?
    Market surveys show that restaurant owners are facing problems of food wastage.
WHERE canvas: Where does the problem arise?
What is the context/situation in which the stakeholders experience this problem?
    Restaurants that serve buffets
    The leftover unconsumed food at the end of the day
WHY canvas: Why do you think it is a problem worth solving?
What would be of key value to the stakeholders?
    A correct estimate of the food to be prepared every day will reduce food wastage.
How would it improve their situation?
    No food or less food is left unconsumed.
    Reduction in financial losses due to food wastage
After filling in the 4Ws canvas, the goal of the project would be: ‘To predict the quantity of food
dishes to be prepared for everyday consumption in restaurant buffets.’
Stage 2: Data Acquisition
Our next step is to acquire data for training and testing our AI model. For that we need to collect
data as per the data features of our project.
Looking at the problem statement, the data features to be considered for preparing
food for the next day's buffet are as follows:
Total number of customers
Dish Consumption
Price of food
Quantity of dish prepared
Unconsumed quantity of dish per day.
Quantity of dish for the next day.
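As a rough illustration, the data features listed above could be stored as one record per dish per day. This is only a sketch: the class name, the field names, the unit (kg) and the sample values are assumptions made for the example, not part of the notes. Dish consumption is derived here as prepared quantity minus unconsumed quantity, and the quantity for the next day is left out because it is the value the model is meant to predict.

```python
from dataclasses import dataclass


@dataclass
class DailyDishRecord:
    """One day's data for one dish (hypothetical field names)."""
    dish_name: str
    total_customers: int
    price: float                # price of the dish
    quantity_prepared: float    # assumed unit: kg
    quantity_unconsumed: float  # assumed unit: kg

    @property
    def quantity_consumed(self) -> float:
        # Dish consumption = quantity prepared minus quantity left over
        return self.quantity_prepared - self.quantity_unconsumed


# Made-up values, for illustration only
record = DailyDishRecord("Dal", total_customers=120, price=150.0,
                         quantity_prepared=20.0, quantity_unconsumed=4.0)
print(record.quantity_consumed)  # 16.0
```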
Stage 3: Data Exploration
After creating the system map, we get to know how the different factors depend on each
other. Hence we extract the meaningful data from the acquired data, and the following data needs to be
prepared for the model:
Name of the dish
Quantity of the dish prepared every day
Unconsumed quantity of dish per day
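A minimal sketch of this extraction step, assuming the acquired data is a list of dictionaries. The keys, dish names and numbers are all made up for illustration. It keeps only the three fields named above and then summarises the average leftover per dish, which is one simple way to explore the data.

```python
from collections import defaultdict

# Hypothetical raw records; keys and values are made up for illustration.
raw_records = [
    {"dish": "Dal", "customers": 120, "price": 150.0, "prepared": 20.0, "unconsumed": 4.0},
    {"dish": "Dal", "customers": 90, "price": 150.0, "prepared": 20.0, "unconsumed": 6.0},
    {"dish": "Rice", "customers": 120, "price": 100.0, "prepared": 30.0, "unconsumed": 3.0},
]

# Keep only: name of the dish, quantity prepared, unconsumed quantity.
model_rows = [(r["dish"], r["prepared"], r["unconsumed"]) for r in raw_records]

# Average unconsumed quantity per dish.
leftovers = defaultdict(list)
for dish, prepared, unconsumed in model_rows:
    leftovers[dish].append(unconsumed)
average_unconsumed = {dish: sum(v) / len(v) for dish, v in leftovers.items()}

print(average_unconsumed)  # {'Dal': 5.0, 'Rice': 3.0}
```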
Stage 4: Data Modelling
As the data to be collected is continuous data over a certain period, and there is also a
dependency between the different data features, we will use a REGRESSION model for our
project.
For example, if we have collected continuous data for 30 days, the model is trained on the first
20 days (training data) and then evaluated on the next 10 days (testing data).
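The 20/10 split described above can be sketched with a very simple regression. This example is an assumption-heavy illustration, not the model used in the notes: it uses a single made-up input (number of customers) to predict the quantity consumed, fits a straight line by ordinary least squares, and runs on synthetic data generated inside the code.

```python
# Synthetic 30-day history (made up): customers per day and quantity consumed in kg.
customers = [80 + (day * 7) % 40 for day in range(30)]
consumed = [0.15 * c + ((day % 3) - 1) * 0.5 for day, c in enumerate(customers)]

# First 20 days for training, last 10 days for testing.
train_x, test_x = customers[:20], customers[20:]
train_y, test_y = consumed[:20], consumed[20:]


def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Train on the first 20 days, then predict the last 10 days.
slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]

for x, actual, predicted in zip(test_x, test_y, predictions):
    print(f"customers={x}  actual={actual:.2f}  predicted={predicted:.2f}")
```

Splitting by time (earlier days for training, later days for testing) matches the example in the notes and avoids testing the model on days it has already seen.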
Stage 5: Data Evaluation
The next stage is to test whether the model is working properly or not. The following steps
are followed to test our model based on the above scenario.
Step 1: We feed the data to the trained model. In this example, the name of the dish and the quantity
produced are fed to the trained model.
Step 2: We also feed the quantity of unconsumed food of the same dish on previous days.
Step 3: The model then works upon the entries based on the training it got in the modelling stage.
Step 4: The model predicts the quantity of food to be prepared for the next day.
Step 5: The predicted quantity is now compared with the testing data. From the testing data, the
quantity of food to be produced for the next day should be the total quantity minus the unconsumed
quantity.
Step 6: The model is tested with different datasets at least 10 times during training.
Step 7: Now the predicted values and actual values are compared to check the efficiency of the
model.
Step 8: The model is said to be accurate if the predicted values and actual values are similar,
that is, if the difference between them is small. If not, then for better accuracy either the model
selection is changed or the model is trained on more data.
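The comparison in Steps 5, 7 and 8 can be sketched as follows. All numbers are made up for illustration, the predicted values stand in for the output of a trained model, and the 1 kg tolerance is an assumed threshold, since the notes do not specify how small the difference must be.

```python
# Hypothetical test data for one dish (quantities in kg, values made up).
prepared = [20.0, 20.0, 22.0, 18.0]
unconsumed = [4.0, 2.0, 5.0, 1.0]
predicted = [16.5, 17.0, 17.5, 16.0]  # stand-in for a trained model's output

# Step 5: expected quantity = total quantity minus unconsumed quantity.
actual = [p - u for p, u in zip(prepared, unconsumed)]

# Steps 7 and 8: compare predicted and actual values.
errors = [abs(p - a) for p, a in zip(predicted, actual)]
mean_absolute_error = sum(errors) / len(errors)

TOLERANCE_KG = 1.0  # assumed threshold; not given in the notes

print(actual)               # [16.0, 18.0, 17.0, 17.0]
print(mean_absolute_error)  # 0.75
if mean_absolute_error <= TOLERANCE_KG:
    print("accurate enough")
else:
    print("change the model or train on more data")
```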