Attachment 3 Python For Data Analysis Lyst9850

The document provides an overview of Python libraries NumPy and Pandas for data analysis, covering installation, array creation, basic operations, and data structures. It explains key features such as NumPy's n-dimensional arrays, broadcasting, and Pandas' Series and DataFrames for handling and analyzing data. Additionally, it discusses methods for managing missing data, grouping, merging, and input/output operations with various file formats.

Uploaded by

kalpeshboratkar

SKILLATHON.

PYTHON FOR DATA ANALYSIS
© www.skillathon.co
Content

✔ NumPy
    ✔ Introduction
    ✔ Installation
    ✔ NumPy Arrays
    ✔ How to create ndarrays?
    ✔ random() methods
    ✔ Shape of arrays
    ✔ Reshaping arrays
    ✔ Operation on arrays
    ✔ Arithmetic
    ✔ Broadcasting

✔ Pandas
    ✔ Introduction
    ✔ Series
    ✔ DataFrames
    ✔ Missing Data
    ✔ Groupby
    ✔ Aggregate Functions
    ✔ Merging, joining and concatenating
    ✔ Operations
    ✔ Data Input and output

❑ NumPy stands for Numerical Python.

❑ Fundamental package for scientific computing in Python.

❑ Very fast, since it has bindings to C libraries.

❑ Part of the SciPy stack.

❑ Many other libraries rely on NumPy as one of their building blocks.

❑ Installing the Anaconda distribution is highly recommended, so that all underlying dependencies stay in sync.

❑ If you have Anaconda, install NumPy by opening the terminal or command prompt and typing:

conda install numpy

❑ If you don't have Anaconda, type:

pip install numpy

❑ A NumPy array is a fast, built-in n-dimensional array object whose elements all have the same type.

❑ Dimensions are called axes.

Note

✔ Indexing starts at 0.
✔ Unlike lists, arrays can be broadcast.

❑ To start using the NumPy package, we need to import it.

>>> import numpy as np ### importing numpy as np saves typing

❑ NumPy arrays can be created directly with the np.array() function.

>>> arr1 = np.array([1, 2, 3]) ### passing a simple list as the argument

>>> arr1
array([1, 2, 3]) ### returns a 1-D array
>>> arr2 = np.array([[1, 2, 3], [2, 3, 4]]) ### passing a nested list
>>> arr2
array([[1, 2, 3],
       [2, 3, 4]]) ### returns a 2-D array

❑ NumPy arrays can be generated quickly with the np.arange() function.

np.arange(start, stop, step)

❑ Example:
>>> a = np.arange(0, 5) ### generates the array [0, 1, 2, 3, 4]; stop is excluded
How to create numpy arrays (continued)

❑ To generate an array of zeros:

>>> np.zeros(shape)

❑ To generate an array of ones:

>>> np.ones(shape)

❑ To create an identity matrix of size n × n:

>>> np.eye(n)

❑ To create an array of evenly spaced points:

>>> np.linspace(start, stop, num)

linspace is similar to arange, but its third argument is the number of points rather than the step size, and the stop value is included by default.
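
❑ A minimal sketch of the array-creation helpers above. The shapes and ranges are arbitrary values chosen for illustration, not taken from the slides.

```python
import numpy as np

# Illustrative values only; the shapes and ranges are arbitrary choices.
zeros = np.zeros((2, 3))         # 2x3 array filled with 0.0
ones = np.ones(4)                # 1-D array of four 1.0 values
identity = np.eye(3)             # 3x3 identity matrix
stepped = np.arange(0, 10, 2)    # start, stop (excluded), step
spaced = np.linspace(0, 1, 5)    # start, stop (included), number of points

print(stepped)   # [0 2 4 6 8]
print(spaced)    # five points: 0, 0.25, 0.5, 0.75, 1
```

Note how arange is driven by a step size while linspace is driven by a point count.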

❑ NumPy provides functions that generate arrays with random elements.

np.random.rand(d0, d1, ...) : Returns an array of the given dimensions filled with random numbers from a uniform distribution over [0, 1). The dimensions are passed as separate arguments, e.g. np.random.rand(2, 3).

np.random.randn(d0, d1, ...) : Returns an array of the given dimensions filled with samples from the standard normal (Gaussian) distribution, centered at zero.

np.random.randint(low, high, size) : Returns an array of the given size filled with random integers from the given range.

Note:
✔ In the randint() function, the lower limit is inclusive and the upper limit is exclusive.
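
❑ A short sketch of the three random functions. The seed value and the sizes are assumptions made only so that the run is repeatable.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable; the value is arbitrary

uniform = np.random.rand(2, 3)          # 2x3 samples from [0, 1)
normal = np.random.randn(4)             # 4 samples from the standard normal
integers = np.random.randint(1, 7, 10)  # 10 integers from 1 to 6 (7 is excluded)

print(uniform.shape)                             # (2, 3)
print(integers.min() >= 1, integers.max() <= 6)  # True True
```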

❑ To get the shape of a NumPy array, the shape attribute is used.

>>> a = np.array([7, 2, 9, 10])

>>> a.shape
(4,)
>>> b = np.array([[2, 4, 6], [1, 3, 5]])
>>> b.shape
(2, 3)

Note:
✔ No parentheses are needed, since shape is an attribute, not a method.

❑ The shape of an array can be changed.

❑ Using NumPy's reshape() method, the dimensions of the given array can be changed, as long as the total number of elements stays the same.

❑ Example:
>>> a = np.random.rand(4, 4)
>>> a.reshape(2, 2, 4) ### 4 * 4 = 2 * 2 * 4 = 16 elements
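
❑ A small sketch of reshaping, including what happens when the element counts do not match. The array contents are an arbitrary choice for illustration.

```python
import numpy as np

a = np.arange(16)            # 16 elements: 0, 1, ..., 15
grid = a.reshape(4, 4)       # 4 * 4 = 16, so this works
cube = a.reshape(2, 2, 4)    # 2 * 2 * 4 = 16 as well
auto = a.reshape(-1, 8)      # -1 lets NumPy infer that dimension (here 2)

print(grid.shape, cube.shape, auto.shape)   # (4, 4) (2, 2, 4) (2, 8)

try:
    a.reshape(3, 5)          # 3 * 5 = 15, which is not 16
except ValueError as err:
    print("reshape failed:", err)
```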

❑ NumPy provides methods to perform basic operations on an array.

ndarray.max() : returns the largest element in the given array.

>>> a = np.array([2, 4, 12, 83, 1])
>>> a.max()
83
ndarray.min() : returns the smallest element in the given array.
>>> a.min()
1
ndarray.argmax() : returns the index of the largest element.
>>> a.argmax()
3
ndarray.argmin() : returns the index of the smallest element.
>>> a.argmin()
4
ndarray.sum() : returns the sum of the elements of the given array.
>>> a.sum()
102
Basic Operations : statistics

❑ We can calculate the mean, median, or standard deviation directly with NumPy.

>>> a = np.array([1, 2, 3, 3])
>>> a.mean() ### returns the mean of a
2.25
>>> np.median(a) ### returns the median; ndarray has no median() method
2.5
>>> a.std() ### population standard deviation, shown rounded
0.8291

❑ Many arithmetic operations can be done with NumPy arrays.

❑ With scalars:
>>> a = np.array([1, 2, 3])
>>> a + 1 ### adds 1 to each element of the array
array([2, 3, 4])
>>> a ** 2 ### squares every element of the array
array([1, 4, 9])
❑ With another array:
>>> b = np.ones(3) ### generates the array [1., 1., 1.]
>>> a + b
array([2., 3., 4.])
>>> a - b
array([0., 1., 2.])
>>> a * b ### element-wise multiplication, not matrix multiplication; use np.dot(a, b) for that
array([1., 2., 3.])

Note: These operations are much faster than the equivalent loops in pure Python.
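
❑ To make the difference between element-wise multiplication and the dot product concrete, here is a minimal sketch. The arrays are arbitrary illustrative values, and the @ operator is shown as an alternative that the slides do not mention.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise = a * b      # multiplies matching positions
dot = np.dot(a, b)       # 1*4 + 2*5 + 3*6

m = np.array([[1, 2], [3, 4]])
matrix_product = m @ m   # @ is the matrix-multiplication operator

print(elementwise)       # [ 4 10 18]
print(dot)               # 32
print(matrix_product)
# [[ 7 10]
#  [15 22]]
```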

❑ Comparisons can be done element by element between two arrays.

>>> a == b ### returns an array of Booleans
array([ True, False, False])
>>> a > b
array([False,  True,  True])
❑ Comparing two whole arrays:
>>> np.array_equal(a, b) ### returns a single Boolean value
False
❑ Logical operations:
>>> a = np.array([1, 0, 0, 1], dtype=bool)
>>> b = np.array([0, 1, 0, 1], dtype=bool)
>>> np.logical_or(a, b)
array([ True,  True, False,  True])
>>> np.logical_and(a, b)
array([False, False, False,  True])

❑ Broadcasting is useful when we want to do element-wise operations on NumPy arrays with different shapes.
❑ Operations on arrays of different sizes are possible if NumPy can stretch the arrays so that they all have the same shape; this conversion is called broadcasting.
❑ It does this without making needless copies of data, which usually leads to efficient algorithm implementations.

Note:
✔ If both arrays are two-dimensional, then along each axis their sizes must either be equal or one of them must be 1.
Broadcasting : example
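
❑ A minimal sketch of broadcasting. The arrays below are illustrative values, not the ones from the original slide, which is not available in this text.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])           # shape (3,), treated as (1, 3)
column = np.array([[100], [200]])      # shape (2, 1)

print(table + row)
# [[11 22 33]
#  [14 25 36]]
print(table + column)
# [[101 102 103]
#  [204 205 206]]

try:
    table + np.array([1, 2])           # shapes (2, 3) and (2,) do not line up
except ValueError as err:
    print("cannot broadcast:", err)
```

The row is repeated down the rows and the column is repeated across the columns; the last case fails because the trailing sizes 3 and 2 are neither equal nor 1.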

❑ Pandas is one of the richest libraries in Python.

❑ Can be used to analyze and visualize data.

❑ Pandas provides two high-performance data structures:

Series : 1-D labeled vector
DataFrame : 2-D spreadsheet-like structure

❑ These data structures are fast because they are built on top of NumPy.

❑ SQL-like functionality: GroupBy, joining/merging, etc.

❑ Missing data handling.

❑ A Series is a one-dimensional object similar to an array, a list, or a column in a table.

❑ An index label is assigned to each item in the Series.
❑ The index can be integer or string.
❑ By default, the items receive index labels from 0 to n - 1, where n is the number of items.
❑ Values can be heterogeneous.

❑ Dictionaries can be converted into Series.

❑ To grab any value from a Series, its index label is used.
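
❑ A minimal sketch of creating Series objects and looking values up by index label. The data and labels are made up for illustration.

```python
import pandas as pd

# Default integer index 0, 1, 2
numbers = pd.Series([10, 20, 30])

# Custom string index
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

# From a dictionary: keys become the index
prices = pd.Series({"apple": 3.5, "banana": 1.2})

print(numbers[0])         # 10
print(labeled["b"])       # 20
print(prices["apple"])    # 3.5
```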

❑ A DataFrame is a tabular data structure made up of rows and columns, like a spreadsheet, database table, or R's data.frame object.
❑ It can be thought of as a group of Series objects that share the same index.

❑ The most commonly used pandas object.

❑ To create a DataFrame, pd.DataFrame() is used.

❑ Like Series, DataFrame accepts many different kinds of input:
Dict of 1-D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame

Note:
✔ Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments.
✔ If axis labels are not passed, they are constructed from the input data using common-sense rules.
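
❑ Two of the input kinds listed above, sketched with made-up data: a dict of lists and a 2-D ndarray, each with optional labels.

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column labels
students = pd.DataFrame(
    {"name": ["Asha", "Ben", "Chen"], "score": [82, 91, 77]},
    index=["s1", "s2", "s3"],     # optional row labels
)

# From a 2-D ndarray with explicit row and column labels
grid = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["x", "y", "z"],
)

print(students.shape)        # (3, 2)
print(grid.loc["r2", "z"])   # 5
```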
DataFrames : Columns and rows

❑ To select a column in a DataFrame, we simply write:

dataframe_name['Column_name']
dataframe_name[['Column_name_1', 'Column_name_2']] ### to select multiple columns
❑ To create a new column:
dataframe_name['New_column_name'] = values ### one value per row
❑ We can also remove any column from the dataset:
dataframe_name.drop('Column_name', axis=1, inplace=True)
Note: axis=1 says that the label refers to a column (axis=0 refers to a row), and inplace=True removes the column from the DataFrame itself instead of returning a modified copy.
❑ To select rows in a DataFrame by label, we use the loc attribute:
dataframe_name.loc['row_name']
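
❑ The column and row operations above, sketched on a small made-up DataFrame. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ben"], "math": [82, 91], "art": [70, 65]},
    index=["s1", "s2"],
)

one_column = df["math"]                 # a Series
two_columns = df[["name", "art"]]       # a DataFrame
df["total"] = df["math"] + df["art"]    # new column from existing ones

without_art = df.drop("art", axis=1)    # returns a copy; df is unchanged
df.drop("art", axis=1, inplace=True)    # modifies df itself

row = df.loc["s2"]                      # select a row by its label
print(row["total"])                     # 156
```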

❑ There may be a lot of missing data in your datasets.

❑ Pandas provides some functions to deal with them.

df.dropna() : Returns the object with the labels on the given axis omitted where any or all of the data are missing.

df.fillna() : Fills NA/NaN values with the specified value or method.
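
❑ A minimal sketch of dropna() and fillna() on a made-up DataFrame. Filling with the column mean is one common choice shown for illustration, not a recommendation from the slides.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

rows_kept = df.dropna()                 # drop rows with any missing value
cols_kept = df.dropna(axis=1)           # drop columns with any missing value
zero_filled = df.fillna(0)              # replace every NaN with 0
mean_filled = df.fillna(df.mean())      # replace NaN with each column's mean

print(len(rows_kept))                   # 1
print(mean_filled.loc[1, "a"])          # 2.0
```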

❑ The groupby() method groups rows of the data together based on the values in one or more columns.

❑ After grouping, aggregate functions can be applied to the data for analysis.

❑ Many aggregate functions are available, such as:

sum()
std()
mean()
min()
max()
describe()

Note: describe() is a good one to call before the rest, as it already reports the count, mean, std (standard deviation), min, max, etc. of the numerical columns of the DataFrame.
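
❑ A short sketch of grouping followed by aggregation. The sales table is made-up data used only for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["north", "north", "south", "south"],
    "units": [10, 20, 5, 15],
})

by_store = sales.groupby("store")       # group rows that share a store value

totals = by_store["units"].sum()        # one value per store
means = by_store["units"].mean()
summary = by_store["units"].describe()  # count, mean, std, min, quartiles, max

print(totals["north"])   # 30
print(means["south"])    # 10.0
```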
Merging and Concatenation

❑ Concatenation glues DataFrames together along an axis; their dimensions should match along the other axis.
❑ Pandas provides the pd.concat() function for concatenation.
❑ The merge function combines DataFrames using logic similar to joining SQL tables.
[Figure: Concat and Merging diagrams]
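
❑ A minimal sketch contrasting concatenation with merging. The tables, the key column "id", and the inner join are assumptions chosen for illustration.

```python
import pandas as pd

q1 = pd.DataFrame({"id": [1, 2], "sales": [100, 150]})
q2 = pd.DataFrame({"id": [3, 4], "sales": [120, 90]})

# Concatenation: stack rows of frames that share the same columns
stacked = pd.concat([q1, q2], ignore_index=True)

# Merging: combine columns by matching values in a key column
names = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
merged = pd.merge(stacked, names, on="id", how="inner")

print(stacked.shape)   # (4, 2)
print(merged.shape)    # (3, 3)
```

The row with id 4 has no match in names, so the inner join leaves it out.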

❑ Using pandas, we can read and write files in various formats, such as:
.csv
.json
.xml
.html
And many more…
❑ Functions to read a file:
pd.read_csv('file_name')
pd.read_json('file_name')
pd.read_excel('file_name')
❑ Methods to write a DataFrame df to a file:
df.to_csv('file_name')
df.to_excel('file_name')
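
❑ A round-trip sketch: write a DataFrame to CSV and read it back. Note that to_csv is called on the DataFrame object. The file name "scores.csv" and the temporary folder are assumptions made so that the example cleans up after itself.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben"], "score": [82, 91]})

# Write to a temporary folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "scores.csv")
    df.to_csv(path, index=False)      # index=False skips the row labels
    loaded = pd.read_csv(path)

print(loaded.shape)   # (2, 2)
```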
