
Lect-07 and 08, Week-02

The document provides an overview of NumPy, a Python library for numerical computing, highlighting its advantages over traditional Python lists, such as speed and efficiency in handling arrays. It covers various aspects of NumPy, including array creation, indexing, slicing, data types, and basic operations. Additionally, it briefly introduces pandas, another Python library for data manipulation and analysis.

DATA SCIENCE

Week # 02
Day # 01
Lecture # 07 & 08
Class: Summer’25
Your Facilitator, Adil Khan
NUMPY
NUMPY INTRODUCTION

• NumPy is a Python library.
• It is used for working with arrays.
• It is short for "Numerical Python".
• It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
• It was created in 2005 by Travis Oliphant. It is an open-source project, and you can use it freely.
WHY USE NUMPY?

• In Python we have lists that serve the purpose of arrays, but they are slow to process.
• NumPy aims to provide an array object that is up to 50x faster than traditional Python lists for numerical work.
• The array object in NumPy is called ndarray; it provides many supporting functions that make working with ndarray very easy.
• Arrays are very frequently used in data science, where speed and resources are very important.
• Data science is a branch of computer science where we study how to store, use, and analyse data to derive information from it.
WHY IS NUMPY FASTER THAN LISTS?

• NumPy arrays are stored in one continuous block of memory, unlike lists (a list stores references to separate Python objects), so processes can access and manipulate them very efficiently.
• This behaviour is called locality of reference in computer science. It is the main reason why NumPy is faster than lists. NumPy is also optimized to work with the latest CPU architectures.
• NumPy is a Python library written partially in Python, but most of the parts that require fast computation are written in C or C++.
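A minimal sketch of the memory-layout point above, using standard ndarray attributes (`flags`, `itemsize`, `nbytes`, `strides`). The dtype is fixed to `int64` here, an assumption made only so the byte counts are predictable. The sketch shows the layout, not the speed-up, since timings depend on the machine.

```python
import numpy as np

# A 1-D array of 5 int64 values occupies one contiguous buffer.
arr = np.array([1, 2, 3, 4, 5], dtype=np.int64)

print(arr.flags['C_CONTIGUOUS'])  # True: elements sit side by side in memory
print(arr.itemsize)               # 8 bytes per element
print(arr.nbytes)                 # 40 bytes in total (5 * 8)
print(arr.strides)                # (8,): step 8 bytes to reach the next element

# The same operation written as a loop over a list and as one
# vectorised NumPy expression; both give the same values.
py_list = [1, 2, 3, 4, 5]
squared_list = [x * x for x in py_list]
squared_arr = arr * arr
print(squared_list)               # [1, 4, 9, 16, 25]
print(squared_arr.tolist())       # [1, 4, 9, 16, 25]
```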
STARTING WITH NUMPY

• If you have Python and PIP already installed on a system, installing NumPy is very easy:
C:\Users\Your Name>pip install numpy
• Python distributions such as Anaconda already have NumPy installed (Anaconda also ships the Spyder IDE).
• Once NumPy is installed, import it in your applications with the import keyword:
Example
import numpy

arr = numpy.array([1, 2, 3, 4, 5])

print(arr)
• NumPy is usually imported under the np alias.
• Alias: in Python, an alias is an alternate name for referring to the same thing.
• Create an alias with the as keyword while importing:
Example
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)
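A quick sketch showing that the alias is only a second name for the same module object, plus the installed version string (its value depends on your installation, so no output is shown for it).

```python
import numpy
import numpy as np

# Both names point to the same module object; np is only an alias.
print(np is numpy)      # True

# The installed NumPy version, useful to confirm the installation worked.
print(np.__version__)
```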
NUMPY CREATING ARRAYS
• We can create a NumPy ndarray object by using the array() function.
Example
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

print(type(arr))
• To create an ndarray, we can pass a list, tuple or any array-like object into the array() function, and it will be converted into an ndarray:
Example
import numpy as np

arr = np.array((1, 2, 3, 4, 5))

print(arr)
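A small sketch of "any array-like object": a list, a tuple and a `range` object all become the same kind of ndarray. Using `range` is our own choice of example, not one from the slides.

```python
import numpy as np

from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))
from_range = np.array(range(1, 4))  # a range object is array-like too

for a in (from_list, from_tuple, from_range):
    print(type(a).__name__, a)      # ndarray [1 2 3]
```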
2D & 3D ARRAYS
• An array that has 1-D arrays as its elements is called a 2-D array.
• These are often used to represent a matrix or a 2nd-order tensor.
• NumPy has a whole submodule dedicated to matrix operations, numpy.linalg. (numpy.mat, sometimes mentioned here, is only a function for creating matrix objects and was removed in NumPy 2.0.)
Example 2D arrays
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)
• An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
• These are often used to represent a 3rd-order tensor.
Example 3D arrays
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr)
• NumPy arrays provide the ndim attribute, which returns an integer that tells us how many dimensions the array has.
Example
import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
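To make the definitions above concrete, a sketch showing that each element of a 2-D array is a 1-D array and each element of a 3-D array is a 2-D array; `len()` counts the elements along the first dimension.

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

# Each element of a 2-D array is itself a 1-D array ...
print(c[0], c[0].ndim)      # [1 2 3] 1
# ... and each element of a 3-D array is a 2-D array (a matrix).
print(d[0].ndim)            # 2
# len() counts the elements along the first dimension.
print(len(c), len(d))       # 2 2
```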
NUMPY ARRAY INDEXING
• Array indexing is the same as accessing an array element.
• You can access an array element by referring to its index number.
• The indexes in NumPy arrays start with 0, meaning that the first element has index 0, the second has index 1, etc.
Example
import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[0])
Example
import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[1])
Example
import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[2] + arr[3])
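A minimal sketch of the valid index range: for an array of length n the indexes run from 0 to n - 1, and asking for an index past the end raises Python's standard `IndexError`.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Valid indexes run from 0 to len(arr) - 1.
last = arr[len(arr) - 1]
print(last)            # 4

# Index 4 is one past the end, so NumPy raises IndexError.
raised = False
try:
    arr[4]
except IndexError as err:
    raised = True
    print("IndexError:", err)
```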
NUMPY ARRAY INDEXING

• To access elements from 2-D arrays, we can use comma-separated integers representing the dimension and the index of the element.
• Think of 2-D arrays like a table with rows and columns, where the dimension represents the row and the index represents the column.
Example
import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr[0, 1])
Example
import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('5th element on 2nd row: ', arr[1, 4])
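The same comma-separated idea extends to 3-D arrays, with one integer per dimension. A sketch, using an array of our own choosing:

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# arr[i, j, k]: i picks the 2-D block, j the row inside it, k the column.
print(arr[0, 1, 2])   # 6: first block, second row, third element
print(arr[1, 0, 0])   # 7: second block, first row, first element
```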


NUMPY ARRAY SLICING
• Slicing in Python means taking elements from one given index to another given index.
• We pass a slice instead of an index, like this: [start:end].
• We can also define the step, like this: [start:end:step].
• If we don't pass start, it is considered 0.
• If we don't pass end, it is considered the length of the array in that dimension.
• If we don't pass step, it is considered 1.
• The element at the end index is not included in the result.
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[::2])
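A sketch collecting the slice results in one place, which makes the exclusive end index visible, followed by 2-D slicing with one slice (or index) per dimension. The 2-D array `grid` is our own example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# The end index is exclusive: arr[1:5] holds indexes 1, 2, 3 and 4.
print(arr[1:5])      # [2 3 4 5]
print(arr[4:])       # [5 6 7]
print(arr[:4])       # [1 2 3 4]
print(arr[1:5:2])    # [2 4]
print(arr[::2])      # [1 3 5 7]

# In 2-D, give one slice (or index) per dimension, separated by a comma.
grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(grid[1, 1:4])  # [7 8 9]: row 1, columns 1 to 3
print(grid[:, 2])    # [3 8]: column 2 from every row
```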
NUMPY DATA TYPES

• By default, Python has these data types:
• strings - used to represent text data; the text is written inside quote marks, e.g. "ABCD"
• integer - used to represent integer numbers, e.g. -1, -2, -3
• float - used to represent real numbers, e.g. 1.2, 42.42
• boolean - used to represent True or False
• complex - used to represent complex numbers, e.g. 1.0 + 2.0j, 1.5 + 2.5j
NUMPY DATA TYPES
• NumPy has some extra data types, and refers to data types with one character, like i for integers, u for unsigned integers, etc.
• Below is a list of the NumPy data types and the characters used to represent them.
• i - integer
• b - boolean
• u - unsigned integer
• f - float
• c - complex float
• m - timedelta
• M - datetime
• O - object
• S - string
• U - unicode string
• V - fixed chunk of memory for other type (void)
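A sketch that ties the character codes to real arrays: `dtype.kind` returns the one-character code from the list above. The exact integer size (for example `int64` versus `int32`) can differ by platform, so only the kind is printed.

```python
import numpy as np

samples = {
    "integers": np.array([1, 2, 3]),
    "floats": np.array([1.2, 42.42]),
    "booleans": np.array([True, False]),
    "complex": np.array([1.0 + 2.0j, 1.5 + 2.5j]),
    "unicode strings": np.array(["ABCD"]),
}

# dtype.kind is the one-character code for each array's data type.
for label, arr in samples.items():
    print(label, arr.dtype.kind)   # i, f, b, c, U in this order
```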
CREATING ARRAYS WITH A DEFINED DATA TYPE
• We use the array() function to create arrays; this function can take an optional argument, dtype, that allows us to define the expected data type of the array elements:
Example
import numpy as np

arr = np.array([1, 2, 3, 4], dtype='S')

print(arr)
print(arr.dtype)
• A size can be appended to the character; for example, 'i4' means a 4-byte (32-bit) integer:
Example
import numpy as np

arr = np.array([1, 2, 3, 4], dtype='i4')

print(arr)
print(arr.dtype)
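A sketch of what 'i4' means in bytes, and of converting an existing array with `astype()`, which returns a new array and leaves the original unchanged. The target type 'f8' (8-byte float) is our own choice.

```python
import numpy as np

arr = np.array([1, 2, 3, 4], dtype='i4')
print(arr.dtype, arr.itemsize)   # int32 4: 'i4' is a 4-byte integer

# astype() returns a new array converted to another data type.
as_float = arr.astype('f8')
print(as_float, as_float.dtype)  # [1. 2. 3. 4.] float64

# The original array keeps its data type.
print(arr.dtype)                 # int32
```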
NUMPY ARRAY COPY VS VIEW
• The main difference between a copy and a view of an array is that the copy is a new array, and the view is just a view of the original array.
• The copy owns the data, and any changes made to the copy will not affect the original array, and any changes made to the original array will not affect the copy.
• The view does not own the data, and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.
Example: Make a copy, change the original array, and display both arrays:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)
Example: Make a view, change the original array, and display both arrays:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)
Example: Make a view, change the view, and display both arrays:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
x[0] = 31
print(arr)
print(x)
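A sketch of how to check whether an array owns its data, using the `base` attribute and `np.shares_memory()`. It also shows that basic slicing (from the slicing slide) returns a view, so writing through a slice changes the original.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
y = arr.view()

# base is None for an array that owns its data, and refers to the
# original array for a view.
print(x.base)                     # None
print(y.base is arr)              # True

# np.shares_memory() answers the same question directly.
print(np.shares_memory(arr, x))   # False
print(np.shares_memory(arr, y))   # True

# A basic slice is also a view, so writing through it changes arr.
part = arr[1:3]
part[0] = 99
print(arr.tolist())               # [1, 99, 3, 4, 5]
print(x.tolist())                 # [1, 2, 3, 4, 5]: the copy is unaffected
```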
NUMPY ARRAY SHAPE
• The shape of an array is the number of elements in each dimension.
• NumPy arrays have an attribute called shape that returns a tuple with each index having the number of corresponding elements.
Example: Print the shape of a 2-D array:
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)
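A sketch of how the shape tuple relates to the other attributes: its length equals `ndim`, and the product of its entries equals `size`, the total number of elements.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

rows, cols = arr.shape              # (2, 4): 2 rows with 4 elements each
print(rows, cols)                   # 2 4

# The length of the shape tuple equals ndim, and the product of its
# entries equals the total number of elements (size).
print(len(arr.shape) == arr.ndim)   # True
print(rows * cols == arr.size)      # True
```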
NUMPY ARRAY ITERATING
• Iterating means going through elements one by one.
• As we deal with multi-dimensional arrays in NumPy, we can do this using Python's basic for loop.
• If we iterate on a 1-D array, it will go through each element one by one.
• If we iterate on a 2-D array, it will go through the rows; a nested loop reaches each scalar element.
Example: Iterate on the elements of the following 1-D array:
import numpy as np

arr = np.array([1, 2, 3])

for x in arr:
    print(x)
Example: Iterate on the elements of the following 2-D array:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    print(x)
Example: Iterate on each scalar element of the 2-D array:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    for y in x:
        print(y)
• Array joining and splitting is your homework!
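For arrays with more dimensions, one nested loop per dimension gets tedious. A sketch using `np.nditer`, a NumPy iterator that visits every scalar element; for this freshly created array it visits them in the same row-by-row order as the nested loops in the last example.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# np.nditer yields each scalar element without one loop per dimension.
scalars = [int(x) for x in np.nditer(arr)]
print(scalars)             # [1, 2, 3, 4, 5, 6]

# The same values as the nested for loops, row by row.
nested = [int(y) for x in arr for y in x]
print(scalars == nested)   # True
```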
NUMPY SEARCHING ARRAYS
• You can search an array for a certain value and return the indexes that get a match, using the where() function.
Example: Find the indexes where the value is 4:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
Example: Find the indexes where the values are even:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr % 2 == 0)
print(x)
• There is a function called searchsorted() which performs a binary search in the array and returns the index where the specified value would be inserted to maintain the sort order.
• The searchsorted() function is meant to be used on sorted arrays.
Example: Find the index where the value 7 should be inserted:
import numpy as np
arr = np.array([6, 7, 8, 9])
x = np.searchsorted(arr, 7)
print(x)
• To search for more than one value, use an array with the specified values.
Example: Find the indexes where the values 2, 4, and 6 should be inserted:
import numpy as np
arr = np.array([1, 3, 5, 7])
x = np.searchsorted(arr, [2, 4, 6])
print(x)
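Two details worth seeing in code, as a sketch: `np.where()` with a single condition returns a tuple holding one index array per dimension, and `searchsorted()` accepts a `side` parameter that decides where a value equal to an existing element goes.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# For a 1-D array the tuple has one entry; unpack it to get the indexes.
(idx,) = np.where(arr == 4)
print(idx)                                           # [3 5 6]

sorted_arr = np.array([6, 7, 8, 9])
# By default searchsorted() returns the leftmost insertion point ...
print(np.searchsorted(sorted_arr, 7))                # 1
# ... and side='right' returns the position after the existing 7.
print(np.searchsorted(sorted_arr, 7, side='right'))  # 2
```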
NUMPY SORTING ARRAYS
• Sorting means putting elements in an ordered sequence.
• An ordered sequence is any sequence that has an order corresponding to its elements, like numeric or alphabetical, ascending or descending.
• NumPy has a function called sort() that returns a sorted copy of a specified array (the ndarray object also has a sort() method that sorts in place).
Example: Sort the array:
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
• You can also sort arrays of strings, or any other data type:
Example: Sort the array alphabetically:
import numpy as np
arr = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr))
Example: Sort a boolean array:
import numpy as np
arr = np.array([True, False, True])
print(np.sort(arr))
• If you use sort() on a 2-D array, both inner arrays will be sorted (each row is sorted independently).
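A sketch of sorting a 2-D array, together with the difference between `np.sort()` (returns a sorted copy) and the `ndarray.sort()` method (sorts in place). The array values are our own example.

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

# np.sort() sorts along the last axis, so each row is sorted on its own.
print(np.sort(arr).tolist())   # [[2, 3, 4], [0, 1, 5]]

# np.sort() returned a copy; the original array is unchanged.
print(arr.tolist())            # [[3, 2, 4], [5, 0, 1]]

# The ndarray.sort() method sorts the array in place instead.
arr.sort()
print(arr.tolist())            # [[2, 3, 4], [0, 1, 5]]
```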

• Array filtering is your homework!


PANDAS
PANDAS INTRODUCTION

• pandas is a Python library used for working with data sets.
• It has functions for analysing, cleaning, exploring, and manipulating data.
• The name "pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.
• Pandas allows us to analyse big data and draw conclusions based on statistical theories.
• Pandas can clean messy data sets and make them readable and relevant.
STARTING WITH PANDAS

• If you have Python and PIP already installed on a system, installing pandas is very easy:
C:\Users\Your Name>pip install pandas
• Python distributions such as Anaconda already have pandas installed (Anaconda also ships the Spyder IDE).
• Once pandas is installed, import it in your applications with the import keyword.
• pandas is usually imported under the pd alias.
Example
import pandas as pd

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)
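A sketch for confirming the installation and inspecting the DataFrame just built: `pd.__version__` gives the installed release (its value varies, so no output is shown), `shape` gives (rows, columns), and the dictionary keys become the column labels.

```python
import pandas as pd

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)

print(pd.__version__)        # the installed pandas release
print(myvar.shape)           # (3, 2): 3 rows, 2 columns
print(list(myvar.columns))   # ['cars', 'passings']: the dictionary keys
```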
PANDAS SERIES
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
Example: Create a simple Pandas Series from a list:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
• If nothing else is specified, the values are labelled with their index number. The first value has index 0, the second value has index 1, etc.
• This label can be used to access a specified value.
Example: Return the first value of the Series:
print(myvar[0])
• With the index argument, you can name your own labels.
Example: Create your own labels:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
Example: When you have created labels, you can access an item by referring to the label. Return the value of "y":
print(myvar["y"])
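Once a Series has its own labels, a plain integer such as `myvar[0]` no longer means "label 0", and recent pandas releases warn when it is used as a position. A sketch of the two explicit accessors: `.loc` looks up by label and `.iloc` by position.

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

# .loc selects by label, .iloc by integer position.
print(myvar.loc["y"])     # 7
print(myvar.iloc[0])      # 1: the first value, whatever its label
print(list(myvar.index))  # ['x', 'y', 'z']
```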
PANDAS SERIES
• You can also use a key/value object, like a dictionary, when creating a Series.
• Note: The keys of the dictionary become the labels.
Example: Create a simple Pandas Series from a dictionary:
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)
• To select only some of the items in the dictionary, use the index argument and specify only the items you want to include in the Series.
Example: Create a Series using only data from "day1" and "day2":
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])

print(myvar)
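A sketch of what happens when the index argument names a label that is not in the dictionary: pandas fills it with NaN (missing value), and the values become floats so NaN can be stored. The label "day4" is hypothetical.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is not a key of the dictionary, so its value is missing (NaN).
myvar = pd.Series(calories, index=["day1", "day4"])
print(myvar)
print(myvar.isna().tolist())   # [False, True]
print(myvar.dtype)             # float64
```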
PANDAS SERIES
• Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
• A Series is like a column; a DataFrame is the whole table.
Example: Create a DataFrame from two Series (each list becomes one column):
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

myvar = pd.DataFrame(data)

print(myvar)
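A sketch of the "Series is a column" relationship in both directions: selecting one column by name gives a Series, and a dictionary of Series builds the same DataFrame back.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)

# Selecting one column by name gives back a Series ...
col = myvar["calories"]
print(type(col).__name__)     # Series
print(col.tolist())           # [420, 380, 390]

# ... and a dictionary of Series builds the DataFrame again.
rebuilt = pd.DataFrame({"calories": myvar["calories"], "duration": myvar["duration"]})
print(rebuilt.equals(myvar))  # True
```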
PANDAS DATAFRAMES
• A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or a table with rows and columns.
Example: Create a simple Pandas DataFrame:
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)
Result
   calories  duration
0       420        50
1       380        40
2       390        45
• As you can see from the result above, the DataFrame is like a table with rows and columns.
• Pandas uses the loc attribute to return one or more specified row(s).
Example: Return row 0:
#refer to the row index:
print(df.loc[0])
Result
calories    420
duration     50
Name: 0, dtype: int64
Example: Return rows 0 and 1:
#use a list of indexes:
print(df.loc[[0, 1]])
Result
   calories  duration
0       420        50
1       380        40
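The two results above differ in type, which matters when you keep working with them. A sketch: one label in `loc` returns a Series, a list of labels returns a DataFrame, and a column label after the comma picks a single value.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

# One label in loc returns that row as a Series ...
row0 = df.loc[0]
print(type(row0).__name__)     # Series
# ... while a list of labels returns a smaller DataFrame.
rows01 = df.loc[[0, 1]]
print(type(rows01).__name__)   # DataFrame
print(rows01.shape)            # (2, 2)

# loc also takes a column label after the comma: row 2, column "duration".
print(df.loc[2, "duration"])   # 45
```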
PANDAS DATAFRAMES
• With the index argument, you can name your own indexes.
Example: Add a list of names to give each row a name:
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)
Result
      calories  duration
day1       420        50
day2       380        40
day3       390        45
• Use the named index in the loc attribute to return the specified row(s).
Example: Return "day2":
#refer to the named index:
print(df.loc["day2"])
Result
calories    380
duration     40
Name: day2, dtype: int64
• If your data sets are stored in a file, Pandas can load them into a DataFrame.
Example: Load a comma-separated file (CSV file) into a DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')

print(df)
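Because data.csv itself is not reproduced here, the sketch below feeds `read_csv()` a small in-memory CSV through `io.StringIO` (read_csv accepts file-like objects as well as paths). The CSV text, including its `day` column, is hypothetical and reuses the calories/duration values from the examples. `index_col=0` turns the first column into the named row index, so `loc` works with labels as in the "day2" example.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """day,calories,duration
day1,420,50
day2,380,40
day3,390,45
"""

# index_col=0 uses the first column ("day") as the named row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df)
print(df.loc["day2"].tolist())   # [380, 40]
```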
PANDAS READ CSV
• A simple way to store big data sets is to use CSV (comma-separated values) files.
• CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.
• In our examples we will be using a CSV file called 'data.csv'.
• Download data.csv or open data.csv from the Data Science course on elearning.
Example: Load the CSV into a DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())
• Tip: use to_string() to print the entire DataFrame.
• If you have a large DataFrame with many rows, Pandas will only return the first 5 rows and the last 5 rows:
Example: Print the DataFrame without the to_string() method:
import pandas as pd

df = pd.read_csv('data.csv')

print(df)
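A sketch of why `to_string()` shows everything: it renders every row regardless of the display settings. The 100-row DataFrame is hypothetical, built with `range()` so it needs no file.

```python
import pandas as pd

# A hypothetical 100-row DataFrame; print(df) would shorten it.
df = pd.DataFrame({"value": range(100)})

text = df.to_string()
lines = text.splitlines()
print(len(lines))    # 101: one header line plus one line per row
print(lines[-1])     # the last row, index 99
```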
PANDAS READ CSV
• max_rows: the number of rows returned is defined in Pandas option settings.
• You can check your system's maximum rows with the pd.options.display.max_rows statement.
Example: Check the number of maximum returned rows:
import pandas as pd

print(pd.options.display.max_rows)
• In my system the number is 60, which means that if the DataFrame contains more than 60 rows, the print(df) statement will return only the headers and the first and last 5 rows.
• You can change the maximum rows number with the same statement.
Example: Increase the maximum number of rows to display the entire DataFrame:
import pandas as pd

pd.options.display.max_rows = 9999

df = pd.read_csv('data.csv')

print(df)
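Assigning to `pd.options.display.max_rows` changes the setting for the rest of the session. A sketch of two related pandas option functions: `pd.get_option()` reads the same setting, and `pd.option_context()` changes it only inside a `with` block and then restores the previous value. The 100-row DataFrame and the limit of 200 are our own choices.

```python
import pandas as pd

# get_option() reads the same setting as pd.options.display.max_rows.
print(pd.get_option("display.max_rows") == pd.options.display.max_rows)  # True

# A hypothetical 100-row DataFrame.
df = pd.DataFrame({"value": range(100)})

before = pd.options.display.max_rows
with pd.option_context("display.max_rows", 200):
    print(pd.options.display.max_rows)        # 200 inside the block
    full_text = repr(df)                      # all 100 rows are rendered here
print(pd.options.display.max_rows == before)  # True: the old value is back
```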
PANDAS - ANALYSING DATAFRAMES
• One of the most used methods for getting a quick overview of the DataFrame is the head() method.
• The head() method returns the headers and a specified number of rows, starting from the top. If the number of rows is not specified, it returns the top 5 rows.
Example: Get a quick overview by printing the first 10 rows of the DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(10))
Example: Print the first 5 rows of the DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')

print(df.head())
• There is also a tail() method for viewing the last rows of the DataFrame.
• The tail() method returns the headers and a specified number of rows, starting from the bottom.
Example: Print the last 5 rows of the DataFrame:
print(df.tail())
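A sketch of the head() and tail() defaults on a small hypothetical 12-row DataFrame standing in for data.csv: both default to 5 rows, and asking for more rows than exist simply returns the whole DataFrame.

```python
import pandas as pd

# A hypothetical 12-row DataFrame standing in for data.csv.
df = pd.DataFrame({"row": range(12)})

print(len(df.head()))                # 5: head() defaults to 5 rows
print(df.head(3)["row"].tolist())    # [0, 1, 2]
print(df.tail(2)["row"].tolist())    # [10, 11]
# Asking for more rows than exist returns the whole DataFrame.
print(len(df.head(50)))              # 12
```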
SUMMARY

• In this lecture we learned the following concepts:
• NumPy library
• Detailed introduction to the pandas library
• Pandas Series
• Pandas DataFrame
