CONTENT
. INTRODUCTION TO DATA SCIENCE
. AI PROJECT CYCLE FRAMEWORK AND DATA SCIENCE
. BASIC STATISTICS, DATA EXPLORATION AND VISUALISATION
. CLASSIFICATION MODEL
INTRODUCTION TO DATA SCIENCE
The term “data science” was coined in 2001, attempting to describe a
new field. Some argue that it's nothing more than the natural evolution of
statistics, and shouldn't be called a new field at all. But others argue that
it's more interdisciplinary. For example, in The Data Science Design
Manual (2017), Steven Skiena says the following.
I think of data science as lying at the intersection of computer
science, statistics, and substantive application domains. From
computer science comes machine learning and high-performance
computing technologies for dealing with scale. From statistics comes
a long tradition of exploratory data analysis, significance testing, and
visualization. From application domains in business and the sciences
comes challenges worthy of battle, and evaluation standards to
assess when they have been adequately conquered.
This echoes a famous 2013 blog post by Drew Conway, in
which he drew the following diagram to indicate the various fields that
come together to form what we call “data science.”
Regardless of whether data science is just a part of statistics, and
regardless of the domain to which we're applying data science, the goal
is the same: to turn data into actionable value. The professional
society INFORMS defines the related field of analytics as “the scientific
process of transforming data into insight for making better decisions.”
[Figure: Venn diagram of the fields forming data science; only the label “Substantive Expertise” is recoverable]
1.2. What do data scientists do?
Turning data into actionable value usually involves answering questions
using data. Here’s a typical workflow for how that plays out in practice.
1. Obtain data that you hope will help answer the question.
2. Explore the data to understand it.
3. Clean and prepare the data for analysis.
4. Perform analysis, model building, testing, etc.
(The analysis is the step most people think of as data science, but
it's just one step! Notice how much more there is that surrounds it.)
5. Draw conclusions from your work.
6. Report those conclusions to the relevant stakeholders.
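The six steps above can be sketched end to end on a toy dataset. This is only an illustration using the Python standard library; the inline CSV, the column names (`hours`, `score`), and the question being asked are all invented for the example and do not come from the course.

```python
import csv
import io
import statistics

# Step 1: obtain data (a hypothetical inline CSV standing in for a real file).
RAW = """hours,score
1,52
2,58
3,
4,71
5,75
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Step 2: explore. How many rows are there, and how many lack a score?
n_rows = len(rows)
n_missing = sum(1 for r in rows if r["score"] == "")

# Step 3: clean. Drop incomplete rows and convert text to numbers.
clean = [(float(r["hours"]), float(r["score"])) for r in rows if r["score"] != ""]

# Step 4: analyze. Fit a least-squares line score = intercept + slope * hours.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in clean) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Steps 5 and 6: draw a conclusion and report it in plain language.
report = (
    f"Used {len(clean)} of {n_rows} rows ({n_missing} missing). "
    f"Each extra hour is associated with about {slope:.1f} more points."
)
print(report)
```

Note that only a few lines belong to step 4; most of the sketch is obtaining, exploring, cleaning, and reporting, which mirrors the point made in the list.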
Our course focuses on all the steps except for the analysis. You've
learned some introductory statistical analysis in one of the course
prerequisites (GB213), and we will leverage that. (Later in our course we
will review simple linear regression and hypothesis testing.) If you have
taken other relevant courses in statistics, mathematical modeling,
econometrics, etc., and want to bring that knowledge in to use in this
course, great, but it’s not a requirement. Other advanced statistics and
modeling courses you take later will essentially plug into step 4 in this
data science workflow.
1.3. What's in our course?
Our course covers the following four foundational aspects of data
science.
• Mathematics: We will cover foundational mathematical concepts,
such as functions, relations, assumptions, conclusions, and
abstraction, so that we can use these concepts to define and
understand many aspects of data manipulation. We will also make
use of statistics from GB213 (and optionally other statistics courses
you may have taken) in course projects, and we will briefly review
that statistical material as well. We will also see small previews of
other mathematics and statistics courses and their connections to
data science, including graphs for social network analysis, matrices
for finding themes in relations, and supervised machine learning.
• Technology: We will extend your Python knowledge from the
CS230 prerequisite with more advanced table manipulation
functions, extended practice with data cleaning and manipulation
tasks, computational notebooks (such as Jupyter), and GitHub for
version control and project publishing.
• Visualization: We will learn new types of plots for a wide variety of
data types and what you intend to communicate about them. We
will also study the general principles that govern when and how to
use visualizations and will learn how to build and publish interactive
online visualizations (dashboards).
• Communication: We will study how to write comments in code,
documentation for code, motivations in computational notebooks,
interpretation of results in computational notebooks, and technical
reports about the results of analyses. We will prioritize clarity,
brevity, and knowing the target audience. Many of these same
principles will arise when creating presentations or videos as well.
Each of these modes of communication is required at some point in
our course.
1.4. Will this course make me a data scientist?
This course is an introduction to data science. Learning more math,
stats, and technology will make you more qualified than just this one
course can. (Bentley University has both a, if you’re curious which
courses are relevant.)
But there are two focuses of our course that will make a big difference:
AI PROJECT CYCLE FRAMEWORK AND DATA SCIENCE
In the rapidly evolving world of artificial intelligence (AI), project management can be as
complex as the technology itself. Many AI projects fail, not for lack of technical
prowess, but because of ineffective project management.
Implementing a well-defined AI project life cycle can significantly improve the success
rate of these endeavors, transforming raw data and innovative ideas into practical,
efficient solutions. As shown below, the 6 key phases of the AI life cycle are (1) Problem
Definition; (2) Data Acquisition and Preparation; (3) Model Development; (4) Model
Evaluation and Refinement; (5) Deployment; and (6) MLOps.
[Figure: the AI life cycle drawn as a loop of six phases: Problem Definition, Data Acquisition & Preparation, Model Development, Model Evaluation & Refinement, Deployment, ML Ops]
Understanding the AI Life Cycle
Conceptually, one can think of an AI project life cycle as the sequential progression of
tasks and decisions that drive the development and deployment of AI solutions.
Problem Definition
This is where the journey begins. It involves defining the problem to be solved or the
opportunity to be explored using AI. It’s a crucial stage that sets the direction for the
entire project. Having a clear, well-defined problem helps guide data collection, model
development, and ultimately, the successful implementation of the solution. This is
where the role of an AI product manager can be useful.
Data Acquisition and Preparation
After identifying the problem, the next step is to collect and prepare data. AI and
machine learning algorithms need data to learn, so this stage involves gathering
relevant data and preparing it for use. This preparation may involve cleaning the data,
dealing with missing values, or transforming the data into a format suitable for the
chosen AI models. While the least glamorous, this can be the most time-consuming
phase of the AI life cycle.
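Two of the preparation tasks mentioned above, dealing with missing values and transforming data into a model-friendly format, might look roughly like the sketch below. The `raw_ages` values and both helper functions are hypothetical, and median imputation and min-max scaling are just two common choices among many, not steps the text prescribes.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw_ages = [34, None, 29, 41, None, 38]

def impute_with_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

filled = impute_with_median(raw_ages)
scaled = min_max_scale(filled)
print(filled)
print([round(s, 2) for s in scaled])
```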
Model Development and Training
This phase involves developing the AI model that will solve the defined problem and
training it with the prepared data. This stage is iterative, often involving multiple rounds
of model development and refinement based on the model's performance during
training.
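To make the iterative nature of training concrete, here is a minimal sketch that fits a straight line by gradient descent and records the loss after every round. The data points, the learning rate, and the number of epochs are assumptions chosen for illustration; real projects would normally use a machine learning library rather than a hand-written loop.

```python
# Hypothetical prepared data: y is roughly 2 * x + 1.
data = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]

def mean_squared_error(w, b, points):
    """Average squared gap between predictions w * x + b and targets y."""
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)

def train(points, learning_rate=0.05, epochs=2000):
    """Fit w and b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    history = []
    n = len(points)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mean_squared_error(w, b, points))
    return w, b, history

w, b, history = train(data)
print(f"w={w:.2f}, b={b:.2f}, first loss={history[0]:.3f}, last loss={history[-1]:.3f}")
```

The `history` list is what a team would inspect between rounds to decide whether further refinement is paying off.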
Model Evaluation and Refinement
Once the model has been trained, it must be evaluated to see how well it performs. This
involves testing the model on unseen data and analyzing its predictions. If the model's
performance is not satisfactory, it is refined and tweaked. This could mean adjusting the
model's parameters, changing the model's architecture, or even returning to the data
acquisition phase to gather additional data.
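Testing on unseen data usually means holding part of the data back before training. The sketch below shows one way to do that with a deliberately simple threshold "model"; the data, the split fraction, and the helper names are all hypothetical, and accuracy is only one of several metrics a team might choose.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out the last fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train_rows):
    """Toy 'model': the midpoint between the two class means."""
    zeros = [x for x, label in train_rows if label == 0]
    ones = [x for x, label in train_rows if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(threshold, rows):
    """Share of rows whose predicted label (x > threshold) matches the true one."""
    correct = sum(1 for x, label in rows if int(x > threshold) == label)
    return correct / len(rows)

# Hypothetical labeled data: small values are class 0, large values class 1.
rows = [(float(x), 0) for x in range(0, 20)] + [(float(x), 1) for x in range(30, 50)]

train_rows, test_rows = train_test_split(rows)
threshold = fit_threshold(train_rows)
print(f"held-out accuracy: {accuracy(threshold, test_rows):.2f}")
```

Because the model never sees `test_rows` while fitting, the reported accuracy estimates how it would behave on new data rather than how well it memorized the training set.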
Deployment
Once the model is performing satisfactorily, it is deployed to a production environment
where it can start solving real-world problems. Deployment might involve integrating the
model with existing systems, creating an application or service that uses the model, or
using the model's insights in an offline context, such as a report to management.
Machine Learning Operations
Most of the time, after deployment, the model will need to be maintained and updated.
In this machine learning operations phase, the team monitors the model's performance
to ensure it is still working as expected, updates the model with new data, and refines the
model based on feedback from its users.
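Monitoring can be as simple as comparing recent accuracy against the level measured at deployment. The following is a minimal sketch of that idea; the `AccuracyMonitor` class, its window size, and its tolerance are hypothetical choices for illustration, and production MLOps tooling typically tracks many more signals.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window_size)  # 1 = correct, 0 = wrong

    def record(self, predicted, actual):
        self.outcomes.append(1 if predicted == actual else 0)

    def recent_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True once the window is full and accuracy fell below baseline - tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.recent_accuracy() < self.baseline - self.tolerance

monitor = AccuracyMonitor(baseline_accuracy=0.90, window_size=10)
# Hypothetical feedback stream: the model starts out right, then drifts.
for predicted, actual in [(1, 1)] * 10:
    monitor.record(predicted, actual)
print(monitor.needs_retraining())
for predicted, actual in [(1, 0)] * 4:
    monitor.record(predicted, actual)
print(monitor.needs_retraining())
```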
Furthermore, teams often need to go back to a previous phase (e.g., going from model
evaluation back to model development). This is to be expected and should be
considered a normal part of the AI life cycle (and not an issue with the AI development
team).
Importance of Each Stage in the AI Project Life Cycle
Each stage in the AI project life cycle serves a vital role. The problem definition phase
establishes the project's direction. The data acquisition and preparation phase creates
the foundation for the AI solution. The model development and training phase turns this
foundation into a functional tool. Then, the model evaluation and refinement phase
ensures that the tool/model meets the expected standards. Finally, deployment brings
the AI solution to its intended users, and maintenance keeps it running smoothly over
time.
In addition, AI projects often need to adapt to changes quickly, whether these are
changes in the project's requirements, unexpected issues with the data, or new
developments in AI technology. Building this adaptability into the project life cycle can
be difficult but is crucial for long-term project success. This is where the use of an agile
framework can help.
Benefits of Implementing a Robust AI Project Life Cycle
Employing a structured AI project life cycle has numerous benefits:
• Increased Success Rate: A robust project life cycle helps ensure that each
necessary step in the development of an AI solution is followed, increasing
the likelihood of project success.
• Risk Reduction: By flagging potential issues early in the process, a well-structured
project life cycle helps to mitigate risks. For example, during the problem
definition phase, if the problem isn't defined clearly, the project may lose
direction. Identifying this risk early on allows teams to refocus and avoid costly,
time-consuming revisions later on.
• Improved Efficiency and Productivity: A structured project life cycle streamlines
the workflow, ensuring that everyone on the team understands their roles and
responsibilities at each stage. This clarity can significantly improve efficiency and
productivity, reducing the time to deployment.
• Enhanced Quality of AI Solution: By enforcing thoroughness and rigor at each
stage, a well-defined project life cycle enhances the quality of the final AI
solution. Rigorous evaluation and refinement help ensure the AI solution performs as
expected, while regular maintenance and updates keep it running smoothly over
time.
• Enhanced Resource Allocation: AI projects require significant resources,
including time, human expertise, and computational power. Identifying and
balancing these resources across each phase of the life cycle can be challenging,
but being explicit about resource allocation across the life cycle can help the
team appropriately resource the project.
In short, a well-defined AI life cycle can help teams plan their AI projects more
effectively, maximizing their chances of success while minimizing potential hurdles.
An Example AI Project Life Cycle
Let’s explore a simple example of using the AI project life cycle in the development of
an AI-based recommendation system (specifically Amazon's system recommending what
to purchase):
1. Problem Definition: The primary problem is clearly defined: to improve the
accuracy of product recommendations and thereby enhance the shopping
experience for users while driving increased sales.
2. Data Acquisition and Preparation: Amazon collects vast amounts of user data,
including browsing history, purchase history, and ratings. These data points can
be identified as the critical information needed and then collected and prepared for the
model development phase.
3. Model Development and Training: Amazon uses many machine learning models,
such as collaborative filtering, to create their recommendation system. Their
models are trained with the prepared data to predict a customer's interests based
on similarities with other customers.
4. Model Evaluation and Refinement: The model is tested extensively, and its
predictions are compared to actual customer behavior to evaluate its accuracy.
Based on these tests, the model is continually refined and improved to increase
the precision of its recommendations.
5. Deployment: Once the recommendation model meets the performance
benchmark, it is deployed on Amazon's platform. The model now operates in
real time, suggesting products to users based on their browsing and purchasing
behavior.
6. Machine Learning Operations: Post-deployment, the model is continually
monitored and updated. As user behavior and preferences evolve over time, the
model is retrained and updated to ensure its recommendations remain relevant
and accurate.
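Step 3 above mentions collaborative filtering, which predicts a customer's interests from similarities with other customers. The sketch below shows a small user-based version of that idea. It is not Amazon's implementation, which the text does not describe; the users, items, ratings, and function names are all invented for illustration.

```python
import math

# Hypothetical ratings (user -> {item: rating}); not real Amazon data.
ratings = {
    "ana": {"book": 5, "lamp": 3, "mug": 4},
    "ben": {"book": 5, "lamp": 3, "pen": 5, "desk": 2},
    "cho": {"book": 1, "lamp": 5, "mug": 2, "pen": 1},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two users' ratings on the items both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict_ratings(user, ratings):
    """Predict ratings for unseen items as a similarity-weighted average."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    return {i: scores[i] / weights[i] for i in scores if weights[i] > 0}

def recommend(user, ratings, top_n=1):
    """Return the top_n unseen items with the highest predicted rating."""
    predicted = predict_ratings(user, ratings)
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

print(recommend("ana", ratings))
print({item: round(score, 2) for item, score in predict_ratings("ana", ratings).items()})
```

Here "ben" rates the shared items exactly as "ana" does, so his opinions carry more weight in her predictions than those of "cho", whose tastes differ.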
This is an example of the AI project life cycle in action, showcasing how each stage
plays an important role in delivering a successful AI solution. This AI life cycle works
equally well for building/refining generative AI models.
Iterating through the AI Life Cycle
It is important to note that the AI life cycle should be thought of as an iterative process
that incrementally delivers a better solution. In other words, each of the life cycle
phases is typically revisited many times throughout an AI project.
In the context of an AI project life cycle, an MVP (Minimum Viable Product) is a simplified
version of the AI solution that is developed as quickly as possible to validate the
underlying concept. It includes just enough features to be usable by early customers
who can provide feedback for future development. For example, in the model
development and training phase, rather than training the AI model on the entire data set,
an MVP might be trained on a subset of the data to speed up the development process.
This allows the team to quickly validate whether their approach is viable before
investing more resources.
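Training an MVP on a subset of the data can be as simple as drawing a reproducible random sample before training starts. In the sketch below, the `sample_subset` helper, the 10% fraction, and the integer stand-in for a dataset are assumptions made for illustration.

```python
import random

def sample_subset(rows, fraction=0.1, seed=42):
    """Return a reproducible random sample containing `fraction` of the rows."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)

# Hypothetical full dataset of 10,000 training examples (here just integers).
full_dataset = list(range(10_000))
mvp_dataset = sample_subset(full_dataset, fraction=0.1)
print(len(mvp_dataset))
```

Fixing the seed means the same subset is drawn on every run, so results from successive MVP iterations stay comparable.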
By leveraging an MVP and gathering user feedback, teams can identify any issues or
areas for improvement early in the development process, making it easier to make
changes and enhancements before the full solution is rolled out.
Key Takeaways
Using a well-defined AI project life cycle should not be optional; it should be an integral
part of successful AI development. Embracing a life cycle approach can significantly
improve the efficiency, productivity, and overall success of AI projects, making it an
essential consideration for any team venturing into the world of AI.