Project 1

This document provides an introduction to analyzing real-world stock market data from the New York Stock Exchange. It outlines the goals of the project, which include calculating summary statistics, analyzing business metrics, and using models to forecast company growth. Students will use a cleaned dataset of financial data from 448 S&P 500 companies to complete the analysis and create a presentation and spreadsheet with their findings.

Kaggle's New York Stock Exchange S&P 500 dataset

Introduction
In this project we will analyze real-world data from the New York Stock Exchange. You will be
working with a subset of a large dataset provided by Kaggle that contains historical financial
data from S&P 500 companies. We have created this smaller subset of the data for you to use
in the project.
What do I need to install?
You may use any spreadsheet application you like. This includes Google Sheets, Microsoft
Excel, etc.

Why this Project?


This project will introduce you to the data analysis process that you will be using throughout
the rest of the Nanodegree program. In this project, you will go through the process of
calculating summary statistics, drawing an inference from the statistics, calculating business
metrics and using models to forecast future growth prospects for the companies. The goal is
for you to perform an analysis and also create visual tools to communicate the results in
informative ways.

We have provided a clean data set for this project, although in real-life scenarios data sets
often need to be cleaned and processed before analysis can proceed. This project allows
you to see what a clean data set should look like.

Background:
We used the Fundamentals.csv and Securities.csv files provided by Kaggle. The
Fundamentals file provides the fundamental financial data gathered from SEC 10-K annual
filings from 448 companies listed on the S&P 500 index. The Securities file provides the
industry or sector information the companies are categorized under on the S&P 500 index.

What skills will I use?


The main goal of this project is for you to demonstrate your ability to:

• Interpret the measures of central tendency and spread (mean, median, standard
deviation, range).
• Use a combination of Excel or Google Sheets functions (e.g., IF statements, INDEX
and MATCH, calculating descriptive statistics with the IF statement, drop downs, data
validation, VLOOKUP).
• Analyze and forecast financial business metrics using Excel or Google Sheets.
• Create visualizations of a business metric and use Excel or Google Sheets to create
a financial forecast model.
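
As a rough illustration of the measures named in the first bullet, the sketch below computes them with Python's standard statistics module. The revenue figures are made-up placeholders, not values from the project data, and the variable names are only for this example. In a spreadsheet, functions such as AVERAGE, MEDIAN and STDEV do the same job.

```python
import statistics

# Hypothetical total-revenue values (in USD); not taken from the project dataset.
revenues = [2_000_000, 4_000_000, 4_000_000, 5_000_000, 10_000_000]

mean_revenue = statistics.mean(revenues)       # arithmetic average
median_revenue = statistics.median(revenues)   # middle value of the sorted data
stdev_revenue = statistics.stdev(revenues)     # sample standard deviation
range_revenue = max(revenues) - min(revenues)  # distance between the extremes

print("mean:", mean_revenue)
print("median:", median_revenue)
print("standard deviation:", stdev_revenue)
print("range:", range_revenue)
```

Here the single large value pulls the mean above the median, which is the kind of observation the project asks you to interpret.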

Project Set Up
This project is made up of two parts. For each part, you will be using the same dataset, which
you can find in the Supporting Materials as Projectdata NYSE.csv at the bottom of this
page. If you are using Google Sheets, you can access the link to the data here:
1. The first part of the project is a set of quiz questions, which you will find in the
upcoming concepts. These questions are intended to help you get familiar with the dataset and
test that you have mastered the core concepts in the previous lessons. Correctly answering
each of the quiz questions will ensure you are on the right track before you dive into the
second part of the project. This part of the project will not be submitted for review.

2. The second part of your project is the portion you will turn in for review. You will need
to create a presentation and spreadsheet to be reviewed. The details of this
submission are provided on the last page of this lesson. Pay attention to the details of
the Rubric to ensure you have all deliverables. In order to have your presentation reviewed,
you will need to save your slides as a PDF. You can save your spreadsheet as a Microsoft
Excel workbook or Google spreadsheet.

Supporting Materials
• Projectdata NYSE

Cleaning Up The Data


You do not need to follow these steps to set up the dataset, but here are some
suggestions:

1. Change all the column names to have no spaces, but still be informative. This isn't
necessary, but just a recommendation. Depending on what you do with the data in the
future, spaces or special characters in the column names may cause problems. You
will see this in the upcoming content on SQL.
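
One possible way to apply this renaming outside a spreadsheet is sketched below in Python. The helper clean_column_name is a hypothetical name introduced for this example, and the header strings are assumptions based on the field list for the data file; the actual headers in the CSV may differ.

```python
import re


def clean_column_name(name: str) -> str:
    """Replace each run of spaces or special characters with one underscore."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Drop any underscore left at the start or end (e.g. from a trailing period).
    return cleaned.strip("_")


# Assumed header names, for illustration only.
original = [
    "Ticker symbol",
    "Period ending",
    "Sales, General and Admin.",
    "GICS Sub Industry",
]
renamed = [clean_column_name(column) for column in original]
print(renamed)
```

The resulting names contain only letters, digits and underscores, which is the form most query languages accept without quoting.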

The following information is included in the Project data NYSE file:

• Ticker symbol: Stock symbol
• Years: Number of years for which data is provided
• Period ending
• Total revenue
• Cost of goods sold
• Sales, General and Administrative expenses
• Research and Development expenses
• Other Operating expense items
• Global Industry Classification Standard (GICS) Sector: Industry sector the
company is categorized under (e.g., American Airlines with the ticker symbol AAL is
categorized under Industrials.)
• GICS Sub Industry: Sub-industry sector the company is categorized under (e.g.,
AAL is further categorized under the sub-category of Airlines industry.)
Here is the link to the Business Metrics lesson in case you would like to review the material
on the line items within the Income Statement.

CONCEPTS

1. Project Overview
2. Understanding the Data
3. Quiz 1: Exploring the Data
4. Quiz 2: Exploring the Data
5. Project Details
6. Finished Example Slide
7. Helpful Ideas
8. Project: Analyze NYSE Data

Project Submission
For the final project, you will conduct your own data analysis and create a presentation file
to share your findings. You will also create an Excel workbook that contains your
calculations of the summary statistics, dashboard and forecast scenarios.

Evaluation
Use the Project Rubric to review your project. If you are happy with your submission, then
you are ready to submit! If you see room for improvement in any category in which you do
not meet specifications, keep working!
Your project will be evaluated by a Udacity reviewer according to the same Project Rubric.
Your project must "meet specifications" in each category in order for your submission to
pass.

Submission

What to include in your submission:


1. A presentation file that should include a slide with:
• A statement of the question you posed
• Summary statistics and plots communicating your final results
Formatting of your submission:
• Feel free to use our template to develop the presentation.
• In order to submit your presentation for review, you will need to save your
slides as a PDF or PowerPoint PPT file. You can create a PDF file from within Google
Slides by selecting File > Download as > PDF Document.
2. Excel Workbook or Google Sheets with tabs for each of the following:
• Dataset
• Summary statistics
• Profit and Loss statement dashboard
• Forecast model for three scenarios
• Your workbook can include additional tabs you may need for your project
(e.g., pivot tables).
Formatting of your submission:
• The Forecast model should be set to the company of your choice. You
may choose to have the ticker symbol or name of the company in text at the top for the
forecast model.
• You will also need to save your spreadsheet in .xlsx format OR provide
a link to your Google Sheets. You should provide the link to the Google sheet in your
presentation slides since Google Sheets formulas do not download properly into Excel
and the reviewer will not be able to see all the formulas.
3. A list of websites, books, forums, blog posts, etc. that you referred to or used in
creating your submission (add N/A if you did not use any such resources).
4. Zip (compress) the folder and submit this zipped folder with both files in it.
5. Ready to submit your project? Please review and confirm the following items.
• I am confident all rubric items have been met and my project will pass as submitted.
• Project builds correctly without errors and runs.
• All required functionality exists and my project behaves as expected per the project's
specifications.

Once you have checked all these items, click on the "Submit Project" button and
follow the instructions to submit!
It can take us up to a week to grade the project, but in most cases, it is much faster. You will
get an email when your submission has been reviewed.

QUESTION 1 OF 4

Which company posted the maximum total revenue in any individual year?



WMT

AAPL

CVX

EXR

QUESTION 2 OF 4

Match the mean of Total Revenue across all the companies to each of the
calendar years for which we have data.

YEARS
2014
2012
2016
2015
2013

MEAN OF TOTAL REVENUE
$18,243,540,884.96
$20,304,113,413.30
$21,113,807,543.06
$20,371,836,512.88
$24,577,107,967.14

QUESTION 3 OF 4

Now calculate the descriptive statistics for the Total Revenue column for all
the companies across all years. Based on the descriptive statistics for Total
Revenue, which of the following conclusions are true?

The mean is the same as the median for Total Revenue.

The distribution for Total Revenue is positively or right-skewed.

Fifty percent of the Total Revenue amounts reported are higher than $9,968,000,000.

The mean is higher than the median for Total Revenue.

Fifty percent of the Total Revenue amounts reported are higher than $8,077,927,500.



QUESTION 4 OF 4

Now calculate the descriptive statistics for R&D expenses for the whole data
set. Based on the descriptive statistics for R&D expenses, which of the
following conclusions are true?

Fifty percent of reported R&D expenses equal $0.

The median is equal to the mode.


The graph representing R&D expenses is positively or right-skewed.

More than 75% of the reported R&D expenses are less than $95,750,000 .
As you can see from the scatter plot below, the distribution is not a normal distribution. The
majority of financial statements across all companies report $0 spent on R&D expenses.
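
To make the link between many zero values and a skewed distribution concrete, here is a small Python sketch. The R&D figures are invented placeholders chosen to mimic "mostly zero, a few very large" and are not taken from the project data.

```python
import statistics

# Hypothetical R&D expenses (USD): most entries are zero, a few are very large.
rd_expenses = [0, 0, 0, 0, 0, 0, 50_000_000, 200_000_000, 1_000_000_000]

mean_rd = statistics.mean(rd_expenses)
median_rd = statistics.median(rd_expenses)
mode_rd = statistics.mode(rd_expenses)

# A mean well above the median is a common sign of right (positive) skew.
is_right_skewed = mean_rd > median_rd

print("mean:", mean_rd)
print("median:", median_rd)
print("mode:", mode_rd)
print("right-skewed:", is_right_skewed)
```

With more than half of the values at zero, the median and mode both sit at zero while the few large values drag the mean far to the right.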


Quiz 2: Exploring the Data



QUESTION 1 OF 5

Which two columns should you use to create a column for gross profit?

Total revenue

Other operating items

Cost of goods sold



QUESTION 2 OF 5
Which expenses should be included in the Total Operating Expenses?

Sales, General and Admin.

Other operating items

Research & Development



QUESTION 3 OF 5

What is the Operating Income reported by EBAY in 2014?



$7,127,000,000

$3,459,000,000

$2,476,000,000

QUESTION 4 OF 5

Which of these metrics are commonly used to forecast financial models?



Revenue growth

Operating margin

Gross margin


Cost of Goods Sold

QUESTION 5 OF 5

Match the formulas for each of the following concepts.

CONCEPT
Forecast for Total Revenue
Gross Margin
Operating Margin
Forecast for Operating Income

FORMULAS
[Total Operating Expenses/Total Revenue]
[Operating Income/Total Revenue]
[Prior year's revenue x (1 + Revenue Growth)]
[Operating Margin * Total Revenue]
[1 - (Cost of Goods Sold/Total Revenue)]
[Cost of Revenue/Total Revenue]


Project Details

How Do You Complete this Project?


This project is connected with the Introduction to the Data part of the course, but depending
on your background knowledge, you may not need to take this module to complete this
project.

Introduction
For the final project, you will complete three tasks: 1) conduct your own data analysis and
create a presentation to share your findings, 2) develop a dashboard for a Profit and Loss
Statement, and 3) create a Financial Forecasting Model using three scenarios. You should
start by taking a look at your dataset and brainstorming which sub-category and company
you want to focus your data analysis on. The questions leading to this page should have
assisted in this process! Then you should use Excel or other spreadsheet software
to conduct your analysis on the sub-category and company you are most interested
in. This project is open-ended in that there is no one right answer.

Project Goals:
Here are the three tasks that you will complete in the final project.

Task 1:
a. Identify the question about the data that you will answer based on your data analysis, and
include this in your slide presentation.
• Your question should include at least one categorical variable (GICS Sector or
GICS Sub Industry) and one quantitative variable (one of the financial metrics) and
require the use of at least one of the summary statistics.
• A tab within the Excel spreadsheet that you submit should include the summary
statistics [measures of central tendency (e.g., mean, median) and measures of spread
(standard deviation and range)] you used to answer your question.
• Deliverable: Slide presentation, Spreadsheet with tab for Summary statistics
b. Your slide presentation should provide at least one visualization to help with your
answer.
• This visualization might be a bar chart, histogram, scatterplot, box-plot or other visual
that you learned to make. Include your insights from the measures of center and spread and
at least one numeric summary statistic in the description.
• Deliverable: Slide presentation (includes visualization)
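
A question that pairs a categorical variable with a quantitative one usually comes down to summary statistics per category. The Python sketch below shows one way to do that; the rows are invented placeholders rather than project data, and the structure is only a suggestion. In a spreadsheet, a pivot table or conditional functions would play the same role.

```python
import statistics
from collections import defaultdict

# Hypothetical rows of (GICS sector, total revenue); values are placeholders.
rows = [
    ("Industrials", 40_000_000),
    ("Industrials", 60_000_000),
    ("Energy", 10_000_000),
    ("Energy", 30_000_000),
    ("Energy", 50_000_000),
]

# Group the quantitative values by the categorical variable.
revenue_by_sector = defaultdict(list)
for sector, revenue in rows:
    revenue_by_sector[sector].append(revenue)

# Compute measures of center and spread for each group.
summary = {
    sector: {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
    }
    for sector, values in revenue_by_sector.items()
}

for sector, stats in summary.items():
    print(sector, stats)
```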
Task 2:
• Create a dashboard for a Profit and Loss Statement that calculates the Gross
Profit, Operating Profit or EBIT for a company selected from a drop-down list.
• Your drop-down list should pull historical fundamentals data to create the P&L
Statement.
• The P&L statement should include the Gross Profit, Operating Profit or EBIT
values for all the years there is historical data available for that company in the
dataset.
• Deliverable: Spreadsheet with tab for Dynamic P&L statement
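
The logic behind such a dashboard can be sketched in Python as a lookup by ticker followed by two subtractions. This is an illustration under stated assumptions, not the required spreadsheet implementation: the ticker "ABC", the figures, and the names IncomeRow, FUNDAMENTALS and profit_and_loss are all hypothetical, and the sketch assumes that all three expense columns count as operating expenses.

```python
from dataclasses import dataclass


@dataclass
class IncomeRow:
    """One year of income-statement line items for a company."""
    year: int
    total_revenue: float
    cost_of_goods_sold: float
    sga: float  # Sales, General and Administrative expenses
    research_and_development: float
    other_operating_items: float


# Hypothetical data keyed by ticker symbol; "ABC" is not a company in the dataset.
FUNDAMENTALS = {
    "ABC": [
        IncomeRow(2014, 1000.0, 600.0, 150.0, 50.0, 20.0),
        IncomeRow(2015, 1200.0, 700.0, 170.0, 60.0, 10.0),
    ],
}


def profit_and_loss(ticker: str) -> list[dict]:
    """Build a P&L summary for every year available for the chosen ticker."""
    statement = []
    for row in FUNDAMENTALS[ticker]:
        gross_profit = row.total_revenue - row.cost_of_goods_sold
        # Assumption: SG&A, R&D and other operating items are all operating expenses.
        operating_expenses = (
            row.sga + row.research_and_development + row.other_operating_items
        )
        statement.append({
            "year": row.year,
            "gross_profit": gross_profit,
            "operating_income": gross_profit - operating_expenses,
        })
    return statement


pnl = profit_and_loss("ABC")
for line in pnl:
    print(line)
```

Choosing a different key in FUNDAMENTALS corresponds to picking a different company from the drop-down list; the calculation itself does not change.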
Task 3:
• Create a financial model for a company of your choice (different from Task 2) that
forecasts the Gross Profit, Operating Profit or EBIT for two more years using three
scenarios (Strong case, Base case and Weak case).
• Your assumptions for revenue growth, gross margin and operating margin should
change for each scenario.
• The forecasting model should be dynamic for the selection of the case (Weak, Base,
Strong). However, the forecasting model can be static for the chosen company ticker
symbol.
• Deliverable: Spreadsheet with tab for Forecasting Model
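
A minimal Python sketch of a scenario-driven forecast follows. It assumes the usual relationships: forecast revenue equals the prior year's revenue times (1 + revenue growth), gross profit equals revenue times gross margin, and operating income equals revenue times operating margin. The growth and margin percentages, the starting revenue, and the names SCENARIOS and forecast are illustrative assumptions, not values from the project.

```python
# Hypothetical assumptions per scenario; choose your own values for the project.
SCENARIOS = {
    "Weak": {"revenue_growth": 0.02, "gross_margin": 0.35, "operating_margin": 0.10},
    "Base": {"revenue_growth": 0.05, "gross_margin": 0.40, "operating_margin": 0.15},
    "Strong": {"revenue_growth": 0.10, "gross_margin": 0.45, "operating_margin": 0.20},
}


def forecast(last_revenue: float, scenario: str, years: int = 2) -> list[dict]:
    """Project revenue, gross profit and operating income for the given scenario."""
    assumptions = SCENARIOS[scenario]
    revenue = last_revenue
    results = []
    for year_ahead in range(1, years + 1):
        # Each forecast year grows from the previous year's revenue.
        revenue = revenue * (1 + assumptions["revenue_growth"])
        results.append({
            "year_ahead": year_ahead,
            "revenue": revenue,
            "gross_profit": revenue * assumptions["gross_margin"],
            "operating_income": revenue * assumptions["operating_margin"],
        })
    return results


# Switching the scenario name is the equivalent of changing the case selector.
base_case = forecast(1000.0, "Base")
for line in base_case:
    print(line)
```

Keeping all assumptions in one table and selecting a row by name mirrors the requirement that the model be dynamic for the chosen case.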
Step One - Get Organized
When you complete your analysis and presentation, you'll want to submit your project. Get
organized before you begin. I recommend creating a single folder that
will eventually contain:
• The presentation with the visual and summary
• The original data set
• A copy of the spreadsheet workbook you will use to do the analysis for your report
that contains at least the following tabs:
1. Data file
2. Summary statistics
3. P&L Statement Dashboard
4. Forecast scenarios
Step Two - Analyze Your Data
Look through the Tasks described above and select the qualitative variable and quantitative
variable you want to focus your analysis on for the various tasks. Then use the .csv file to
conduct your data analysis.
Step Three - Create Your Presentation
Once you have finished analyzing the data, create a presentation that shares the visual and
summary paragraph. The summary paragraph should clearly communicate your findings
based on your analysis, and provide visual or numeric values associated with your
summary.
SUBMISSION TEMPLATE

The submission template is a Google Slides file. Make a copy of the submission template to
complete your project. We suggest you use the layout provided, though it is not a
requirement.
Step Four - Assemble your Worksheet
You will need to include the Excel file with the
summary statistics, dashboard and financial model scenarios.

Put your presentation and spreadsheet workbook you used to do the analysis in a
folder and zip it. Then submit the zipped folder for your project.
Step Five - Check the Rubric
Use the Project Rubric located here. If you see room for improvement, keep working to
improve your project.
Step Six - Assemble your folder ready for submission
If you are happy with your submission, then you're ready to submit your project. Put your
presentation and spreadsheet workbook in a folder and zip it. Then submit the zipped folder
for your project.

Finished Example Slide



The above slide and graphs were generated with the project data and are meant to be
examples. You can see how this example slide meets the rubric requirements.

1. Clear question in the title indicating what is being investigated


2. Descriptive title on each chart describing its contents
3. y-axis title
4. x-axis title
5. Detailed insight based on the descriptive statistics.
6. Summary statistics about the data
If you have more questions about what you need in your project, double check the rubric.

Helpful Ideas
This page reviews some ideas that, based on previous project submissions, are
commonly missed.
Plots
In the last Excel lesson you were introduced to some ways to visually display your data.
However, you should know that the plots you can make are tied to different data types. We
go over those once again here.

Plots You Can Use For Categorical Variables


If you have categorical data, here is a list of the possible univariate (one variable) plots you
can make:

1. Bar Chart
2. Pie Chart

Plots You Can Use For Quantitative Variables


If you have quantitative data, here is a list of the possible univariate plots you can make:

1. Histogram
2. Box Plot

Plots to Compare 2 Variables


If you are interested in comparing two quantitative variables, then the main way to perform
this comparison is with a scatterplot. However, if one of the variables is related to time, then
a line plot is frequently used.

Statistics
Quantitative Variables
When describing quantitative variables, it is common to use the statistics discussed earlier:

1. Measures of center - mean, median, mode


2. Measures of spread - standard deviation, range, IQR

Categorical Variables
However, when you are analyzing categorical variables, measures of center and spread do
not make sense.
To describe categorical variables, use percentages or counts, not
means, medians, modes, standard deviations, or ranges.
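
As a short illustration of describing a categorical variable with counts and percentages, here is a Python sketch using the standard collections.Counter class. The list of sectors is a made-up sample, not the project data.

```python
from collections import Counter

# Hypothetical sample of GICS sector labels, one per company.
sectors = ["Industrials", "Energy", "Industrials", "Health Care", "Industrials"]

# Count how many times each category appears.
counts = Counter(sectors)
total = sum(counts.values())

# Convert each count to a percentage of all observations.
percentages = {sector: 100 * count / total for sector, count in counts.items()}

for sector, count in counts.items():
    print(sector, count, percentages[sector])
```

These counts are also exactly what a bar chart or pie chart of a categorical variable displays.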

Important Last Thought


With this in mind, think of the variable type of the columns you are analyzing, and determine
which plots and statistics make sense for your analysis.