Unit IV Part1

The document outlines best practices for documentation and deployment in data science projects, emphasizing the use of the knitr R package for reproducible milestone documentation and the importance of effective comments and version control. It highlights the need for clear presentations to project sponsors, focusing on business needs rather than technical details, and suggests methods for deploying models, including HTTP services and exporting models for collaboration. Key takeaways include maintaining thorough documentation, utilizing version control, and ensuring models are accessible for testing and experimentation.

Uploaded by wipawil697


Documentation and Deployment

The buzz data is structured as shown below:

Using knitr to produce milestone documentation (knitr is an engine for dynamic report
generation with R)

For self/peer documentation, you want to concentrate on facts: what the stated goals were, where
the data came from, and what techniques were tried. You can assume that, as long as you use standard
terminology or references, the reader can figure out anything else they need to know. You
want to emphasize any surprises or exceptional issues, as they’re exactly what’s expensive to
relearn. You can’t expect to share this sort of documentation with clients.

The first sort of documentation we recommend is project milestone or checkpoint
documentation. At major steps of the project, you should take some time out to repeat your work
in a clean environment. Here we’ll use the knitr R package to document starting work with the buzz
data.

What is knitr?

knitr is an R package that allows the inclusion of R code and results inside documents. knitr’s
operation is similar in concept to Knuth’s literate programming and to the R Sweave package. In
practice you maintain a master file that contains both user readable documentation and chunks of
program source code. The document types supported by knitr include LaTeX, Markdown, and
HTML. LaTeX format is a good choice for detailed typeset technical documents. Markdown
format is a good choice for online documentation and wikis. Direct HTML format may be
appropriate for some web applications.

knitr process schematic

A simple knitr Markdown example

Markdown (http://daringfireball.net/projects/markdown/) is a simple web-ready format that’s
used in many wikis. The following listing shows a simple Markdown document with knitr
annotation blocks denoted with ```.
knitr LaTeX example

knitr turns the annotated LaTeX source into a plain .tex file, and we then run LaTeX to create the final add.pdf file:

Simple knitr LaTeX result

Purpose of knitr

The purpose of knitr is to produce reproducible work. When you distribute your work in knitr
format, anyone can download your work and, without great effort, rerun it to confirm they get
the same results you did. This is the ideal standard of scientific research, but it is rarely met, as
scientists often fail to share all of their code, data, and actual procedures. knitr
collects and automates all the steps, so it becomes obvious if something is missing or doesn’t work as claimed.

knitr chunk options: a sampling of useful option assignments is given in the table.

Using comments and version control for running documentation

Another essential record of your work is what we call running documentation. Running
documentation is more informal than milestone/checkpoint documentation and is most easily
maintained in the form of code comments and version control records.

R’s comment style is simple: everything following a # (that isn’t itself quoted) until the end of a
line is a comment and ignored by the R interpreter. The following listing is an example of a
well-commented block of R code.

Example code comment

# Return the pseudo logarithm of x, which is close to
# sign(x)*log10(abs(x)) for x such that abs(x) is large
# and doesn't "blow up" near zero. Useful
# for transforming wide-range variables that may be negative
# (like profit/loss).
# See: http://www.win-vector.com/blog

Good comments include what the function does, what types arguments are expected to be, limits
of domain, why you should care about the function, and where it’s from. Of critical importance
are any NB or TODO notes. It’s vastly more important to document any unexpected features or
limitations in your code than to try to explain the obvious. Because R variables don’t have types
(only the objects they refer to do), it’s especially important to document what types of arguments
a function expects.
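To see the behavior the comment promises, here is a minimal shell sketch of a signed pseudo-logarithm computed with awk. It assumes the common asinh-based form asinh(x/2)/ln(10); the comment itself does not fix a formula, so treat this as one possible implementation rather than the documented function. The helper name pseudo_log10 is hypothetical.

```shell
#!/bin/sh
# Signed pseudo-logarithm computed with awk (hypothetical helper).
# Assumed definition: asinh(x/2) / ln(10), with asinh(y) = ln(y + sqrt(y^2 + 1)).
pseudo_log10() {
  # Reads one number per line on stdin; prints the transform to 4 decimals.
  awk '{
    x = $1 + 0
    s = (x < 0) ? -1 : 1      # sign of x
    y = s * x / 2             # abs(x)/2, so asinh is evaluated on a non-negative value
    printf "%.4f\n", s * log(y + sqrt(y * y + 1)) / log(10)
  }'
}

# Large magnitudes behave like sign(x)*log10(abs(x)); zero maps to zero.
printf '%s\n' -1000 -1 0 1 1000 | pseudo_log10
```

For large abs(x), asinh(x/2) is close to ln(abs(x)), which is why the output tracks sign(x)*log10(abs(x)) away from zero while staying finite (and equal to 0) at zero.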
Using version control to record history

Version control can both maintain critical snapshots of your work in earlier states and produce
running documentation of what was done by whom and when in your project.

Version control saving the day

The basics of using Git as a version control system.

You need to be familiar with only a few commands:

git init .

git add -A .

git commit

git status

git log

git diff

git checkout
A possible project directory structure

Starting a Git project using the command line

When you’ve decided on your directory structure and want to start a version-controlled project,
do the following:

1. Start the project in a new directory. Place any work either in this directory or in subdirectories.

2. Move your interactive shell into this directory and type git init . at the prompt. It’s okay if you’ve already
started working and there are already files present.

3. Exclude any subdirectories you don’t want under source control with .gitignore control files.
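Steps 1 to 3 can be sketched as a short shell session. This is a minimal sketch under stated assumptions: the project lives in a throwaway temporary directory, the names buzz_project, code/explore.R, and data/raw/ are hypothetical, and the author identity is supplied through Git's standard GIT_AUTHOR_* and GIT_COMMITTER_* environment variables so the commit works on a fresh machine.

```shell
#!/bin/sh
# Steps 1-3: new project directory, git init, and a .gitignore file.
# All names below are hypothetical placeholders.
set -eu
# Identity for commits (standard Git environment variables).
export GIT_AUTHOR_NAME="Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="Analyst" GIT_COMMITTER_EMAIL="analyst@example.com"

# Step 1: start the project in a new directory (here under a temp dir).
proj="$(mktemp -d)/buzz_project"
mkdir -p "$proj/code" "$proj/data/raw"
cd "$proj"

# Work that already exists before Git is involved.
echo 'x <- 1/5' > code/explore.R
echo 'id,text' > data/raw/buzz_dump.csv     # raw extract we do not want tracked

# Step 2: put the directory under Git (existing files are fine).
git init -q .

# Step 3: exclude the raw-data subdirectory from source control.
echo 'data/raw/' > .gitignore

# First checkpoint, then list what Git is actually tracking.
git add -A .
git commit -q -m "Start buzz project"
git ls-files
```

Only .gitignore and code/explore.R end up tracked; the ignored raw-data directory never enters the history.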

Using add/commit pairs to checkpoint work

As often as practical, enter the following two commands into an interactive shell in your project
directory:

git add -A .
git commit

A good rule of thumb for Git: you should be as nervous about having uncommitted changes as you
should be about not having clicked Save. You don’t need to push/pull often, but you do need to make
local commits often (even if you later squash them with a Git technique called rebasing). Any time you
want to know about your work progress, type either git status to see if there are any edits you can put
through the add/commit cycle, or git log to see the history of your work.
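A hedged sketch of the add/commit cycle, assuming a throwaway repository: the hypothetical checkpoint helper commits only when git status --porcelain reports changes, so it is safe to run as often as you like, and git log shows the resulting history.

```shell
#!/bin/sh
# The add/commit checkpoint cycle in a throwaway repository.
set -eu
export GIT_AUTHOR_NAME="Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="Analyst" GIT_COMMITTER_EMAIL="analyst@example.com"

cd "$(mktemp -d)"
git init -q .

# Hypothetical helper: commit everything, but only if there are changes.
checkpoint() {
  if [ -n "$(git status --porcelain)" ]; then
    git add -A .
    git commit -q -m "$1"
  fi
}

echo 'x <- 1/5' > explore.R
checkpoint "First look at buzz data"

echo 'y <- x * 2' >> explore.R
git status --short            # " M explore.R": an edit waiting for add/commit
checkpoint "Derive y from x"

checkpoint "Nothing to save"  # no-op: the working tree is already clean
git log --oneline             # two commits, newest first
```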
Using Git through RStudio

The RStudio IDE supplies a graphical user interface to Git that you should try. The add/commit
cycle can be performed as follows in RStudio:

Start a new project. From the RStudio command menu, select Project > Create Project, and
choose New Project. Then select the name of the project, what directory to create the new project
directory in, leave the type as (Default), and make sure Create a Git Repository for this Project is
checked. When the new project pane looks something like the figure, click Create Project, and you
have a new project.

Do some work in your project. Create new files by selecting File > New > R Script. Type some
R code (like 1/5) into the editor pane and then click the Save icon to save the file. When saving
the file, be sure to choose your project directory or a subdirectory of your project. Commit your
changes to version control.

Using version control to explore your project

Git is ready to help you with any of the following tasks:

- Tracking your work over time
- Recovering a deleted file
- Comparing two past versions of a file
- Finding when you added a specific bit of text
- Recovering a whole file or a bit of text from the past (undo an edit)
- Sharing files with collaborators
- Publicly sharing your project (à la GitHub at https://github.com/, or Bitbucket at https://bitbucket.org)
- Maintaining different versions (branches) of your work

And that’s why you want to add and commit often.
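Three of these tasks map onto single commands. The sketch below assumes a throwaway repository with a hypothetical model.R file: git diff compares two past versions, git log -S (Git's "pickaxe" search) finds the commit that added a bit of text, and git checkout HEAD -- file restores a deleted file from the last commit.

```shell
#!/bin/sh
# Comparing versions, finding when text was added, and recovering a file.
set -eu
export GIT_AUTHOR_NAME="Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="Analyst" GIT_COMMITTER_EMAIL="analyst@example.com"

cd "$(mktemp -d)"
git init -q .

# Two commits that change a hypothetical model script.
echo 'model <- lm(y ~ x, data = d)' > model.R
git add -A . && git commit -q -m "Fit linear model"
echo 'summary(model)' >> model.R
git add -A . && git commit -q -m "Add model summary"

# Compare two past versions of the file: previous commit vs. latest.
git diff HEAD~1 HEAD -- model.R

# Find the commit where a specific bit of text was added.
git log -S 'summary(model)' --oneline

# Delete the file by mistake, then restore it from the last commit.
rm model.R
git checkout HEAD -- model.R
cat model.R
```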

Getting help on Git

For any Git command, you can type git help [command] to get usage information. For example,
to learn about git log, type git help log. The main ways to view the detailed history of your
project are command-line tools like git log --graph --name-status and GUI tools such as RStudio
and gitk. A Git commit represents the complete state of a directory tree at a given time. A Git branch
represents a sequence of commits and changes as you move through time. Commits are immutable;
branches record progress.
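To see the difference between a commit and a branch, the following sketch (throwaway repository, hypothetical file notes.txt) records the ID of a first commit, adds a second one, and then checks that the old commit still holds its original content while the branch name has moved forward.

```shell
#!/bin/sh
# Commits are immutable; branches move forward as you commit.
set -eu
export GIT_AUTHOR_NAME="Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="Analyst" GIT_COMMITTER_EMAIL="analyst@example.com"

cd "$(mktemp -d)"
git init -q .

echo 'first draft' > notes.txt
git add -A . && git commit -q -m "First"
first=$(git rev-parse HEAD)                  # ID of the first commit
branch=$(git rev-parse --abbrev-ref HEAD)    # current branch name

echo 'second draft' > notes.txt
git commit -q -am "Second"

git log --graph --name-status --oneline      # detailed history view
git show "$first:notes.txt"                  # old commit still has old content
echo "$branch now points to $(git rev-parse --short "$branch")"
```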

The usual shared workflow is like this:

Continuously: work, work, work.

Frequently: commit results to the local repository using a git add/git commit pair.

Every once in a while: pull a copy of the remote repository into our view with some variation of
git pull and then use git push to push work upstream.

The main rule of Git is this: don’t try anything clever (push/pull, and so on) unless you’re in a
“clean” state (everything committed, confirmed with git status).

The new Git commands you need to learn are these:

git push (usually used in the git push -u origin master variation)

git pull (usually used in the git fetch; git merge -m pull master origin/master or git pull --rebase origin master variations)
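The shared workflow can be rehearsed without a hosting service by letting a local bare repository play the role of origin. This is a sketch under assumptions: the two workers, their file names, and the shared.git directory are hypothetical, and Worker B uses the fetch/merge form of pull, written here as git merge --no-edit -m pull origin/master (a slight simplification of the variant listed above).

```shell
#!/bin/sh
# Shared workflow with a local bare repository standing in for the remote.
set -eu
export GIT_AUTHOR_NAME="Worker" GIT_AUTHOR_EMAIL="worker@example.com"
export GIT_COMMITTER_NAME="Worker" GIT_COMMITTER_EMAIL="worker@example.com"

top=$(mktemp -d)
cd "$top"

# Worker A starts the project on a branch named master.
git init -q workerA
git -C workerA symbolic-ref HEAD refs/heads/master
echo 'x <- 1' > workerA/analysis.R
git -C workerA add -A .
git -C workerA commit -q -m "A: start analysis"

# Publish a bare copy as the shared "origin"; Worker B clones it.
git clone -q --bare workerA shared.git
git -C workerA remote add origin "$top/shared.git"
git clone -q shared.git workerB

# Worker A commits and pushes first (from a clean state).
cd "$top/workerA"
echo 'y <- 2' >> analysis.R
git commit -q -am "A: add y"
git push -q -u origin master

# Worker B commits locally, then pulls with fetch + merge, then pushes.
cd "$top/workerB"
echo 'plot(cars)' > report.R
git add -A .
git commit -q -m "B: add report"
git fetch -q origin
git merge -q --no-edit -m pull origin/master
git push -q origin master
git log --graph --oneline
```

The merge variant leaves an explicit merge commit in Worker B's history, which is accurate but harder to read than a linear history.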

git pull: rebase versus merge

Merging is what’s really happening, but rebase is much simpler to read. The general rule is that
you should only rebase work you haven’t yet shared (in our example, Worker B should feel free
to rebase their edits to appear to be after Worker A’s edits, as Worker B hasn’t yet successfully
pushed their work anywhere). You should avoid rebasing records people have seen, as you’re
essentially hiding the edit steps they may be basing their work on.
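The rebase alternative can be sketched with the same kind of hypothetical setup (two workers, a local bare repository as origin). Worker B has not yet pushed, so B can safely run git pull --rebase origin master to replay the unshared commit on top of Worker A's work, which gives a linear history with no merge commit.

```shell
#!/bin/sh
# Worker B rebases unshared work onto Worker A's pushed commits.
set -eu
export GIT_AUTHOR_NAME="Worker" GIT_AUTHOR_EMAIL="worker@example.com"
export GIT_COMMITTER_NAME="Worker" GIT_COMMITTER_EMAIL="worker@example.com"

top=$(mktemp -d)
cd "$top"

# Worker A starts the project on master and publishes a bare "origin".
git init -q workerA
git -C workerA symbolic-ref HEAD refs/heads/master
echo 'x <- 1' > workerA/analysis.R
git -C workerA add -A .
git -C workerA commit -q -m "A: start analysis"
git clone -q --bare workerA shared.git
git -C workerA remote add origin "$top/shared.git"
git clone -q shared.git workerB

# Worker A pushes an edit first.
cd "$top/workerA"
echo 'y <- 2' >> analysis.R
git commit -q -am "A: add y"
git push -q -u origin master

# Worker B commits locally; nothing of B's is shared yet, so rebasing is safe.
cd "$top/workerB"
echo 'plot(cars)' > report.R
git add -A .
git commit -q -m "B: add report"
git status --short                      # empty output: clean state
git pull -q --rebase origin master      # replay B's commit after A's
git push -q origin master
git log --oneline                       # linear: no merge commit
```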
Deploying models

A successful data science project should include at least a demonstration deployment of any
techniques and models developed. Good documentation and presentation are vital, but at some
point people have to see things working and be able to try their own tests. We strongly encourage
partnering with a development group to produce the actual production-hardened version of your
model, but a good demonstration helps recruit these collaborators.
Deploying models as R HTTP services

One easy way to demonstrate an R model in operation is to expose it as an HTTP service.
The listing shows how to call the HTTP service.
Deploying models by export

It often makes sense to export a copy of the finished model from R, instead of attempting to
reproduce all of the details of model construction. When exporting a model, you’re depending on
development partners to handle the hard parts of hardening a model for production. Software
engineers tend to be good at project management and risk control, so export projects are also a
good opportunity to learn.

The structure of our random forest model is large but simple: a big collection of decision trees.
But the construction is time-consuming and technical. The idea is this: it can be easier to fax a
friend a solved Sudoku puzzle than to teach them your entire solution strategy.

Exporting the random forest model

A decision tree is a series of tests traditionally visualized as a diagram of decision nodes, as shown in the
top portion of the figure. The content of a decision tree is easy to store in a table where each table row
represents the facts about one decision node.
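As an illustration of the table idea, here is a minimal sketch that stores one small decision tree as comma-separated rows (node ID, split variable, threshold, left child, right child, leaf prediction) and scores an input by walking the rows from the root with awk. The tree, the variables num_posts and num_authors, the thresholds, and the score helper are all hypothetical; a real export from R would have many more nodes and columns.

```shell
#!/bin/sh
# One decision tree stored as a table (one row per node) and scored with awk.
# The tree, variable names, and thresholds are hypothetical.
set -eu

tree=$(mktemp)
trap 'rm -f "$tree"' EXIT
cat > "$tree" <<'EOF'
node,var,threshold,left,right,prediction
1,num_posts,100,2,3,NA
2,NA,NA,NA,NA,0
3,num_authors,20,4,5,NA
4,NA,NA,NA,NA,0
5,NA,NA,NA,NA,1
EOF

# score NUM_POSTS NUM_AUTHORS: walk from the root (node 1) to a leaf.
score() {
  awk -F, -v num_posts="$1" -v num_authors="$2" '
    NR == 1 { next }                              # skip the header row
    { var[$1] = $2; thr[$1] = $3; left[$1] = $4; right[$1] = $5; pred[$1] = $6 }
    END {
      val["num_posts"] = num_posts; val["num_authors"] = num_authors
      n = 1
      while (var[n] != "NA") {                    # internal node: apply its test
        if (!(var[n] in val)) { print "bad node or variable at " n > "/dev/stderr"; exit 1 }
        n = (val[var[n]] + 0 <= thr[n] + 0) ? left[n] : right[n]   # <= goes left
      }
      print pred[n]
    }' "$tree"
}

score 50 5      # few posts: left at the root, leaf 2
score 500 5     # many posts, few authors: leaf 4
score 500 50    # many posts, many authors: leaf 5
```

A random forest export would repeat this for each tree and combine the per-tree predictions (for example, by majority vote for classification).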
Key takeaways

Use knitr to produce significant reproducible milestone/checkpoint documentation.

Write effective comments.

Use version control to save your work history.

Use version control to collaborate with others.

Make your models available to your partners for experimentation and testing.

Producing effective presentations

The table summarizes the relevant entities in our scenario, including products that are sold by our
company and by competitors.
Presenting your results to the project sponsor

The project sponsor is the person who wants the data science result—generally for the business
need that it will fill. Though project sponsors may have technical or quantitative backgrounds
and may enjoy hearing about technical details and nuances, their primary interest is business-
oriented, so you should discuss your results in terms of the business problem, with a minimum of
technical detail.

We recommend a structure similar to the following:

1. Summarize the motivation behind the project, and its goals.

2. State the project’s results.

3. Back up the results with details, as needed.

4. Discuss recommendations, outstanding issues, and possible future work.

Some people also recommend an “Executive Summary” slide: a one-slide synopsis of steps 1 and 2.

We’ll concentrate on the content of the presentations, rather than the visual format of the slides.
In an actual presentation, you’d likely prefer more visuals and less text than the slides that we
provide here.

Summarizing the project’s goals

Let’s put together the goal slides for the WVCorp buzz model example. In our example, eRead is
WVCorp’s ebook reader, which led the market until our competitor released a new version of
their ebook reader, BookBits. The new version of BookBits has a shared-bookshelves feature
that eRead doesn’t provide—though many eRead users expressed the desire for such
functionality on the forums. Unfortunately, forum traffic is so high that product managers have a
hard time keeping up, and somehow missed detecting this expression of users’ needs. Hence,
WVCorp lost market share by not anticipating the demand for the shared-bookshelf feature.

Motivation for project
Stating the project goal

Stating the project’s results

The presentation briefly describes what you did, and what the results were, in the context of the
business need.

Filling in the details

Once your audience knows what you’ve done, why, and how well you’ve succeeded (from a
business point of view), you can fill in details to help them understand more. As before, try to
keep the discussion relatively nontechnical and grounded in the business process. Useful details
include a description of where the model fits in the business process or workflow, and some
examples of interesting findings.

The “How it Works” slide shows where the buzz model fits into a product manager’s workflow.

The bottom slide of the figure presents an interesting finding from the project.

Optional slide on the modeling method

Making recommendations and discussing future work

No project ever produces a perfect outcome, and you should be up-front (but optimistic) about the
limitations of your results. In the buzz model example, we end the presentation by listing some
improvements and follow-ups that we’d like to make.

Discussing future work

The project sponsor presentation focuses on the big picture and how your results help to better address a
business need.

Project sponsor presentation takeaways

Keep the following in mind for the project sponsor presentation:

Keep it short.
Keep it focused on the business issues, not the technical ones.
Your project sponsor might use your presentation to help sell the project or its results to the rest of the
organization. Keep that in mind when presenting background and motivation.
Introduce your results early in the presentation, rather than building up to them.
