Complete Notes of Python

The document provides an overview of Python programming, covering topics such as variables, data types, arithmetic and string operations, data structures (strings, tuples, lists, sets, and dictionaries), and control structures. It includes examples and explanations of key concepts, including type casting, string formatting, and conditional statements. Additionally, it discusses logical operators and provides sample programs to illustrate decision-making and control flow in Python.

Uploaded by

himanshii7802

Basics of Python

1. Introduction to Python

Python is a high-level, interpreted programming language that is easy to read and write. It is widely used for web development, data analysis, artificial
intelligence, and automation.

2. Variables in Python

• Variables are used to store data values.

• Python does not require explicit declaration of variables.

• Example:

x = 10

y = "Hello, Python!"

3. Finding Data Type of Variables

• You can find the data type using the type() function.

• Syntax: type(variable_name)

• Example:

x = 10

print(type(x)) # Output: <class 'int'>

4. Type Casting in Python

• Type casting is used to convert one data type to another.


• Functions used: int(), float(), str(), bool(), etc.

• Example:

x = 5.7

y = int(x) # Converts float to int

print(y) # Output: 5

5. Arithmetic Operations in Python

• Python supports common arithmetic operators like +, -, *, /, //, %, and **.

• Example:

a = 15

b = 4

print(a + b) # Addition

print(a - b) # Subtraction

print(a * b) # Multiplication

print(a / b) # Division

print(a // b) # Floor Division

print(a % b) # Modulus

print(a ** b) # Exponentiation

6. String Operations in Python

• Strings are sequences of characters enclosed within quotes (' ' or " ").

• Operations like concatenation, indexing, and slicing are commonly used.


7. String Concatenation

• You can join strings using the + operator.

• Example:

str1 = "Hello"

str2 = "World"

print(str1 + " " + str2) # Output: Hello World

8. Indexing in Strings

• String indexing is used to access specific characters using their positions.

• Index starts from 0.

• Example:

text = "Python"

print(text[0]) # Output: P

print(text[-1]) # Output: n (Negative Indexing)

Reverse Indexing in Python

• In Python, reverse indexing allows you to access characters from the end of a string using negative indices.

• The last character has an index of -1, the second last character has an index of -2, and so on.

Syntax:

string_name[-index]

• string_name → The name of the string.


• -index → The negative index representing the position from the end.

Example:

text = "Python"

print(text[-1]) # Output: n (Last character)

print(text[-2]) # Output: o (Second last character)

print(text[-6]) # Output: P (First character)

Using Reverse Indexing in Slicing:

You can also apply reverse indexing for slicing:

text = "Programming"

print(text[-7:-1]) # Output: "rammin" (from -7 to -1, excluding -1)

print(text[-9:]) # Output: "ogramming" (from -9 to end)

print(text[::-1]) # Output: "gnimmargorP" (Reverses the string)

• [::-1] → Reverses the entire string using a negative step.

9. Slicing in Strings

• Slicing extracts a portion of a string using start:end:step.

• Example:

text = "Programming"
print(text[0:6]) # Output: Progra

print(text[3:]) # Output: gramming

print(text[:5]) # Output: Progr

print(text[::2]) # Output: Pormig (Every 2nd character)

10. String Methods in Python

• Python provides built-in string methods to manipulate strings.

• Common methods:

o lower() - Converts to lowercase

o upper() - Converts to uppercase

o strip() - Removes leading and trailing whitespace

o replace() - Replaces a substring

o find() - Finds a substring

• Example:
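The methods listed above can be tried on a small sample string. This is a minimal sketch; the sample text is only an illustration.

```python
text = "  Hello, Python!  "  # sample string chosen for illustration

stripped = text.strip()                         # remove leading/trailing whitespace
lowered = stripped.lower()                      # all lowercase
uppered = stripped.upper()                      # all uppercase
replaced = stripped.replace("Python", "World")  # swap one substring for another
position = stripped.find("Python")              # index of first match, or -1 if absent

print(stripped)   # Hello, Python!
print(lowered)    # hello, python!
print(uppered)    # HELLO, PYTHON!
print(replaced)   # Hello, World!
print(position)   # 7
```

Strings are immutable, so each method returns a new string and leaves the original unchanged.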
String Formatting in Python

String formatting allows you to create well-structured and readable output by inserting variables into strings. Python provides multiple ways to format
strings:

1. Using format() Method

• The format() method lets you insert values into placeholders {} within a string.

• You can use positional or keyword arguments.

Syntax:

"string {}".format(value)
# Many other string operations are available in Python. You can explore them using:
help(str)
# For now, ignore the methods with a double underscore. Those are called magic methods, and we have seen enough magic for the day to be a Python Wizard!
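Returning to format(): the sketch below shows positional, indexed, and keyword placeholders, plus the f-string shorthand used later in these notes. The names and values are made up for illustration.

```python
name = "Asha"  # hypothetical sample values
age = 30

positional = "My name is {} and I am {} years old.".format(name, age)
indexed = "{1} is the age of {0}.".format(name, age)         # placeholders by position
keyword = "My name is {n} and I am {a} years old.".format(n=name, a=age)
price = "Total: {:.2f}".format(49.5)                          # two decimal places
fstring = f"My name is {name} and I am {age} years old."      # f-string equivalent

print(positional)  # My name is Asha and I am 30 years old.
print(indexed)     # 30 is the age of Asha.
print(price)       # Total: 49.50
```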

DATA STRUCTURES IN PYTHON


1. Strings
Definition: A string is a sequence of characters enclosed within single quotes ' ', double quotes " ", or triple quotes ''' '''.

Example:

name = "John Doe"

print(type(name)) # Output: <class 'str'>


Common String Functions:

• len() - Returns the length of the string.

• upper() - Converts the string to uppercase.

• lower() - Converts the string to lowercase.

• strip() - Removes leading and trailing whitespace.

• replace(old, new) - Replaces a substring with another.

• split() - Splits a string into a list based on a delimiter.

2. Tuples
Definition: A tuple is an immutable, ordered sequence of elements enclosed within parentheses ().

Example:

t = ("Apple", 10, 5.5)

print(type(t)) # Output: <class 'tuple'>

Common Operations with Tuples:

• Indexing: t[1] → Returns element at index 1.

• Slicing: t[0:2] → Returns a slice of the tuple.

• Concatenation: t1 + t2 → Combines two tuples.

• Length: len(t) → Returns number of elements.

Note: Tuples are immutable, meaning elements cannot be changed.

A tuple can be defined without using parentheses

Single-value tuple

This is not a tuple

Indexing in tuples

Slicing

Concatenating tuples

More functions: sum(), min(), max()

Immutability of tuples

Sorting a tuple

Nested tuples
dir() - to view the attributes or methods of an object

t = ()
print(dir(t))
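The tuple topics listed above can be sketched in one short script. The values are illustrative only; the original examples for these headings are not available.

```python
t1 = 1, 2, 3          # parentheses are optional
single = (5,)         # the trailing comma makes a one-element tuple
not_a_tuple = (5)     # without the comma this is just the int 5

first, last = t1[0], t1[-1]   # indexing
head = t1[0:2]                # slicing returns a new tuple
t2 = t1 + (4, 5)              # concatenation
total, smallest, largest = sum(t2), min(t2), max(t2)

try:
    t1[0] = 99                # tuples are immutable
except TypeError as err:
    error_message = str(err)

sorted_desc = tuple(sorted(t2, reverse=True))  # sorted() returns a list
nested = (1, (2, 3), ("a", "b"))
inner = nested[1][0]          # index into the inner tuple

print(type(single), type(not_a_tuple))
print(t2, total, smallest, largest)
print(error_message)
```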

3. Lists
Definition: A list is a mutable, ordered collection of elements enclosed within square brackets [].

Example:

L = ["Apple", 20, 3.14]

print(type(L)) # Output: <class 'list'>

Common List Methods:

• append(x) - Adds an element to the end of the list.

• extend(iterable) - Extends the list by appending elements from another iterable.

• pop(index) - Removes and returns an element at a specific index.

• remove(x) - Removes the first occurrence of an element.

• sort() - Sorts the list in ascending order.

• reverse() - Reverses the list in place.


Nested list:

Indexing:

Membership in lists:

List concatenation:

extend() function:

append()
del Command

pop()

remove()

Sorting Lists

Difference between sort and sorted
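A short sketch of the list operations named above, using made-up values. Note the difference at the end: sort() changes the list in place and returns None, while sorted() returns a new list.

```python
fruits = ["Apple", "Banana"]          # illustrative data
nested = [1, [2, 3], "x"]

inner = nested[1][0]                  # nested list indexing -> 2
has_apple = "Apple" in fruits         # membership test
combined = fruits + ["Cherry"]        # concatenation creates a new list

fruits.extend(["Mango", "Kiwi"])      # adds each element of the iterable
fruits.append("Grape")                # adds one element at the end
del fruits[0]                         # deletes by index ("Apple")
last = fruits.pop()                   # removes and returns the last item
fruits.remove("Mango")                # removes the first matching value

numbers = [3, 1, 2]
new_sorted = sorted(numbers)          # new list; numbers is unchanged
result = numbers.sort(reverse=True)   # sorts in place; returns None

print(fruits)      # ['Banana', 'Kiwi']
print(new_sorted)  # [1, 2, 3]
print(numbers)     # [3, 2, 1]
```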


Shallow Copying

A = ["Orange", "Strawberry", "Mango"]

B = A[:] # A[:] slices the whole list, so B is a new list (a shallow copy) with the same values
# B is a separate list object, not just another name (alias) for A
# An alias, created with B = A, assigns a second label to the same list in memory
A[0] = "Apple" # Changes A only; B[0] is still "Orange"

NOTE:

Now that we know how to store multiple items together let us build another app that could store userid of our customers, which were randomly generated.

We have the following requirements from the app -

1. It shouldn't store duplicate values in the data-set

2. Since the userid are randomly generated the order doesn't matter

3. We need mutability i.e. if we want to delete a particular value we should be able to do so

4. We might frequently want to check whether a user-id is part of the existing data set so it should be able to perform this operation fast

For this use case, we could maintain a list and remove duplicates ourselves, but that would be comparatively time-consuming. So let's look at another data type provided by Python, which is the best fit here.

4. Sets

Definition: A set is an unordered, mutable collection of unique elements enclosed within curly braces {}.

• Sets are a type of collection like lists and tuples, storing mixed data.

• Sets are enclosed within curly brackets and elements are written as comma-separated.

• Sets are unordered


• Sets do not allow duplicates

Example:

s = {1, 2, 3, 4, 5}

print(type(s)) # Output: <class 'set'>

Common Set Operations:

• add(x) - Adds an element to the set.

• remove(x) - Removes an element from the set.

• union(set) - Returns a new set containing all unique elements.

• intersection(set) - Returns common elements.

• difference(set) - Returns elements present only in the first set.

• symmetric_difference(set) - Returns elements that are in either of the sets, but not both.

add():
remove():

# Set Operations

A = {0, 2, 4, 6, 8}
B = {1, 2, 3, 4, 5}
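Using the sets A and B defined above, the operations can be sketched as follows. The user_ids set is a hypothetical stand-in for the customer app described earlier.

```python
A = {0, 2, 4, 6, 8}
B = {1, 2, 3, 4, 5}

union = A.union(B)                      # every element from both sets
common = A.intersection(B)              # elements in both
only_a = A.difference(B)                # elements in A but not in B
either = A.symmetric_difference(B)      # in exactly one of the two sets

user_ids = {101, 102}   # hypothetical user IDs
user_ids.add(103)       # add a new element
user_ids.add(101)       # adding a duplicate has no effect
user_ids.remove(102)    # raises KeyError if the element is missing
is_member = 103 in user_ids             # fast membership check

print(union)
print(common, only_a, either)
print(user_ids, is_member)
```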

5. Dictionaries
Definition: A dictionary is a mutable collection of key-value pairs enclosed within curly braces {}. Since Python 3.7, dictionaries preserve insertion order.

• A dictionary stores elements as key-value pairs.

• The key is like an index; it is always unique and must be immutable.

• The values are the objects that contain information.

• Values are accessed using their keys.

• Each key is followed by a value, separated by a colon.

• The values can be mutable or immutable, and may be duplicated.

• Each key and value pair is separated by a comma enclosed inside curly brackets.

Example:

d = {"Name": "John", "Age": 25, "City": "New York"}

print(type(d)) # Output: <class 'dict'>

Common Dictionary Methods:

• keys() - Returns a view of the dictionary's keys.

• values() - Returns a view of the dictionary's values.

• items() - Returns a view of the dictionary's key-value pairs.

• get(key) - Returns the value associated with the key.

• update(dict) - Updates the dictionary with another dictionary.

• pop(key) - Removes the specified key-value pair and returns its value.

• del d[key] - Deletes a key-value pair by key.

Creating a dictionary

Replacing the value for a key in a dictionary

Inserting a new key-value pair into a dictionary

Deleting a key-value pair

Sorting a dictionary

values() method

keys() method

update() method
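The dictionary operations listed above might look like this in practice. The data is invented for illustration.

```python
d = {"Name": "John", "Age": 25}       # creating a dictionary
d["Age"] = 26                         # replace the value for an existing key
d["City"] = "New York"                # insert a new key-value pair
removed = d.pop("Name")               # delete and return the value
del d["City"]                         # delete without returning

scores = {"b": 2, "a": 3, "c": 1}     # illustrative data
by_key = dict(sorted(scores.items()))                         # sorted by key
by_value = dict(sorted(scores.items(), key=lambda kv: kv[1])) # sorted by value

keys = list(scores.keys())
values = list(scores.values())
scores.update({"d": 4, "a": 0})       # adds "d" and overwrites "a"
missing = scores.get("z")             # None instead of a KeyError

print(d)          # {'Age': 26}
print(by_key)     # {'a': 3, 'b': 2, 'c': 1}
print(by_value)   # {'c': 1, 'b': 2, 'a': 3}
print(scores)     # {'b': 2, 'a': 0, 'c': 1, 'd': 4}
```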

Example Application: Customer Management

from random import randint

# Collect customer details

name = input("What's your name? ")

number = input("What's your number? ")

# Generate a random three-digit user ID (randint alone does not guarantee uniqueness)

user_id = randint(100, 999)

# Store details in a tuple for immutability

customer_data = (name, number, user_id)


print(f"Customer Details: {customer_data}")

Because customer_data is a tuple, the stored user_id cannot be modified after creation, which guards against accidental changes.

Control Structures in Python


Control structures in Python are blocks of code that determine the flow of execution based on conditions and logic. They control the order in which
statements are executed.

If statements

"if" statements are essential in Python; they help us build programs that make decisions based on a specified condition.

• If I am tired, I'll go to bed

• If I am hungry, I'll order food

Notice that all these sentences start with the word "if", and that is the first way we are going to control our applications.

Relational Operators
Relational operators are used to compare values and return a boolean value (True or False).

Operator    Description                 Example     Result

==          Equal to                    10 == 10    True
!=          Not equal to                10 != 5     True
>           Greater than                10 > 5      True
<           Less than                   10 < 5      False
>=          Greater than or equal to    10 >= 5     True
<=          Less than or equal to       10 <= 5     False

Step 2: Difference Between = and ==

• = (Assignment Operator)
Assigns a value to a variable.

x = 10 # Assigns 10 to x

• == (Equality Operator)
Compares two values and returns True or False.

print(10 == 10) # True

print(10 == 5) # False

Step 3: Decision-Making with if Statements

To write a conditional statement, use if followed by an expression.

Example 1: Basic if Statement

age = 20

if age >= 18:
    print("You are eligible to vote.")

If age is 18 or more, it prints "You are eligible to vote."


Step 4: Adding an else Statement

The else block runs when the if condition is False.

age = 16

if age >= 18:
    print("You are eligible to vote.")
else:
    print("You are not eligible to vote.")

If age is less than 18, it prints "You are not eligible to vote."

Step 5: Using elif for Multiple Conditions

The elif statement is used to check multiple conditions.

marks = 85

if marks >= 90:
    print("Grade: A")
elif marks >= 75:
    print("Grade: B")
else:
    print("Grade: C")

If marks is:
• 90 or more → "Grade: A"

• 75 or more → "Grade: B"

• Otherwise → "Grade: C"

Step 6: Using Relational Operators in if Conditions

Relational operators can be used in conditional statements.

temperature = 30

if temperature > 25:
    print("It's a hot day.")
else:
    print("The weather is cool.")

If temperature is more than 25, it prints "It's a hot day.", otherwise, "The weather is cool."

Decision-Making, and Loops in Python


Step 1: Logical Operators

Logical operators are used to combine multiple conditions in a single statement. They return a boolean value (True or False).

Operator    Description                                       Example             Result

and         Returns True if both conditions are True          10 > 5 and 8 > 3    True
or          Returns True if at least one condition is True    10 > 5 or 8 < 3     True
not         Reverses the result                               not(10 > 5)         False

Step 2: Program to Check Age for Party Entry Using Logical Operators

Problem:
Write a program to check if a visitor can attend an exclusive children's day party hosted by Mr. Obama. Entry is allowed only if the visitor is 60 or older, or 18 or younger.

Solution:

1. Take input from the user for their age.

2. Use a logical or operator to check if the age is greater than or equal to 60 or less than or equal to 18.

3. Print appropriate messages.

x = int(input("Enter your age: "))

if x <= 18 or x >= 60:
    print("Welcome to the Party!!")
else:
    print("Sorry!! You do not fit in the age criteria.")

Step 3: Program to Offer Discounts Based on Purchase Amount

Problem:
Write a program that offers different discounts based on the shopping bill amount.
Solution:

1. Take the shopping total as input.

2. Use conditional statements (if-elif-else) to apply discounts:

o Bill ≥ 500 → Flat ₹1000 discount voucher

o Bill ≥ 250 → Flat ₹500 discount voucher

o Bill ≥ 100 → Flat ₹100 discount voucher

3. Print the respective message.

shopping_total = 550

if shopping_total >= 500:
    print("You won a discount voucher of flat 1000 on the next purchase.")
elif shopping_total >= 250:
    print("You won a discount voucher of flat 500 on the next purchase.")
elif shopping_total >= 100:
    print("You won a discount voucher of flat 100 on the next purchase.")
else:
    print("OOPS!! No discount for you!")

Step 4: Nested if-else Example — Cricket World Cup Finals


Problem:
Write a program to check whether New Zealand made it to the Cricket World Cup finals in a given year of the 21st century.

Solution:

1. Use a dictionary to store the World Cup finalists for different years.

2. Take the year as input from the user.

3. Use nested if-else statements to check:

o If the year exists in the dictionary.

o If "New Zealand" was a finalist.

4. Print appropriate messages.

world_cups = {
    2019: ["England", "New Zealand"],
    2015: ["Australia", "New Zealand"],
    2011: ["India", "Sri Lanka"],
    2007: ["Australia", "Sri Lanka"],
    2003: ["Australia", "India"],
}

year = int(input("Enter a year to check if New Zealand made it to the finals: "))

if year in world_cups:
    if "New Zealand" in world_cups[year]:
        print("New Zealand made it to the finals.")
    else:
        print("New Zealand did not make it to the finals.")
else:
    print(f"World Cup wasn't played in {year}.")

Loops and Iterations


1. While Loop
• A while loop repeats a block of code as long as a specified condition is True.

• It is often used when the number of iterations is not known in advance.

While Loop Example — PIN Checker

Problem:
Create a simple PIN validation program using a while loop.

• If the PIN is correct, print "Pin validation successful."

• Allow a maximum of 3 attempts using a counter.

import sys

pin = input("Enter your four-digit PIN: ")

attempt_count = 1

while pin != '1234':
    if attempt_count >= 3:
        sys.exit("Too many invalid attempts.")
    pin = input("Invalid input, please try again: ")
    attempt_count += 1

print("Pin validation successful.")

2. For Loop

• A for loop is used to iterate over a sequence (list, tuple, dictionary, set, or string).

• It is commonly used when the number of iterations is known in advance.

For Loop Examples

1. Iterate Over a List of Integers

l = [1, 3, 4, 2, 5, 6]

for i in l:
    print(i)

2. Iterate Over a String

string = "New York"

for ch in string:
    print(ch)

3. Modify Print Using end

for ch in "New York":
    print(ch, end=":") # Prints N:e:w: :Y:o:r:k:

4. Iterate Over a Dictionary

students_data = {
    1: ['Sam', 24],
    2: ['Rob', 25],
    3: ['Jack', 26],
    4: ['Connor', 24],
    5: ['Trump', 27],
}

for key, val in students_data.items():
    print(key, val)

5. Iterate Over Dictionary Keys

for key in students_data.keys():
    print(key)

Using range() in For Loops


• Basic Range

for i in range(1, 101):
    print(i, end=" ")

• Step Count of 2

print(list(range(1, 100, 2))) # Prints odd numbers

• Reverse Sequence

print(list(range(100, 0, -1))) # Prints numbers from 100 to 1

Step 6: Prime Number Program Using Loops

Problem:
Write a program to print all prime numbers between 1 and 20.

Solution:

1. Use a for loop to check each number.

2. Use a nested loop to check divisibility.

3. Use a flag variable to track prime numbers.

4. Print the prime numbers.

for n in range(1, 21): # check every number from 1 to 20
    flag = True # assume n is prime until a divisor is found
    for i in range(2, n):
        if n % i == 0:
            flag = False
            break
    if flag and n > 1:
        print(n)

Output: 2, 3, 5, 7, 11, 13, 17, 19

COMPREHENSIONS:
Comprehensions in Python provide a concise and elegant way to create collections such as lists, sets, or dictionaries in a single line of code.
They are often more concise, and can be faster, than an equivalent loop that builds the collection step by step.

Types of Comprehensions

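Python supports four kinds of comprehension: list, set, and dictionary comprehensions, plus generator expressions. The first three are covered below: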

1. List Comprehension

• List comprehension creates a new list by applying an expression to each element of an iterable (like a list, string, or range).

• It is written inside square brackets.

SYNTAX: [expression for item in iterable if condition]
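A small sketch of the syntax above, with and without the optional condition, next to the equivalent loop. The numbers are arbitrary.

```python
squares = [n * n for n in range(1, 6)]                       # no condition
even_squares = [n * n for n in range(1, 11) if n % 2 == 0]   # with a condition

# The same result written as a traditional loop
loop_result = []
for n in range(1, 11):
    if n % 2 == 0:
        loop_result.append(n * n)

print(squares)       # [1, 4, 9, 16, 25]
print(even_squares)  # [4, 16, 36, 64, 100]
```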


2. Set Comprehension

• Set comprehension is similar to list comprehension but creates a set instead of a list.

• It is written using curly braces {}.

Write a program which takes a word as input from user and returns vowels from the word
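One possible solution to this exercise, using a set comprehension so each vowel appears only once. The word is hard-coded here as an assumption; in an interactive program it would come from input().

```python
word = "Comprehension"  # stands in for: word = input("Enter a word: ")

# Keep each character that is a vowel; the set removes duplicates
vowels = {ch for ch in word.lower() if ch in "aeiou"}

print(sorted(vowels))  # ['e', 'i', 'o']
```

Sets are unordered, so sorted() is used only to print the result in a predictable order.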
3. Dictionary Comprehension

• Dictionary comprehension is used to create dictionaries using key-value pairs.

• It is also written using curly braces {}
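A minimal sketch of dictionary comprehension, first building a dictionary from a range and then filtering an existing one. The student names and ages are sample values.

```python
squares = {n: n * n for n in range(1, 6)}   # key: number, value: its square

students = {"Sam": 24, "Rob": 25, "Jack": 26}   # sample data
over_24 = {name: age for name, age in students.items() if age > 24}

print(squares)   # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
print(over_24)   # {'Rob': 25, 'Jack': 26}
```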

Python Functions:
In Python, a function is a block of code that performs a specific task and can be reused multiple times. Functions help in making the code modular and
organized.
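As a minimal sketch, a function is defined with def and then called by name. The function name and its default value are illustrative choices.

```python
def area_of_rectangle(length, width=1):
    """Return the area of a rectangle; width defaults to 1."""
    return length * width

print(area_of_rectangle(3, 4))  # 12
print(area_of_rectangle(5))     # 5 (uses the default width)
```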
Lambda Function
A Lambda Function in Python is a small, anonymous function that can have any number of arguments but only one expression. It is often used for short and
simple operations.

lambda arguments : expression
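The syntax above might be used like this. The variable names are illustrative; lambdas are most useful when passed directly to another function, as in the sorted() call.

```python
square = lambda x: x * x          # one argument
add = lambda a, b: a + b          # two arguments

names = ["Rob", "sam", "Jack"]    # sample data
# Sort case-insensitively by passing a lambda as the key function
sorted_names = sorted(names, key=lambda s: s.lower())

print(square(4))      # 16
print(add(2, 3))      # 5
print(sorted_names)   # ['Jack', 'Rob', 'sam']
```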

1. map() Function
• map() applies a given function to each item of an iterable (e.g., list, tuple) and returns a new iterable (map object).
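A short sketch of map() with a lambda and with a built-in function. Because map() returns a lazy map object, list() is used to see the values.

```python
numbers = [1, 2, 3, 4]            # sample data
words = ["a", "bb", "ccc"]

doubled = list(map(lambda n: n * 2, numbers))  # apply a lambda to each item
lengths = list(map(len, words))                # any callable works, e.g. len

print(doubled)  # [2, 4, 6, 8]
print(lengths)  # [1, 2, 3]
```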
2. filter() Function
• filter() filters elements from an iterable using a function that returns either True or False. It only returns elements
where the function returns True.

Ex. Write a python program to count the students above age 18

students_data = {1: ['Sam', 15], 2: ['Rob', 18], 3: ['Kyle', 16], 4: ['Cornor', 19], 5: ['Trump', 20]}

count = len(list(filter(lambda x: x[1] > 18, students_data.values())))
print(count) # Output: 2

3. reduce() Function
Definition:
• reduce() is used to reduce an iterable to a single cumulative value by applying a specified function repeatedly.
• It is available in the functools module.
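A minimal sketch of reduce(), showing a sum, a product with an initial value, and a maximum. The list is arbitrary.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

total = reduce(lambda acc, n: acc + n, numbers)          # ((((1+2)+3)+4)+5)
product = reduce(lambda acc, n: acc * n, numbers, 1)     # 1 is the initial value
largest = reduce(lambda a, b: a if a > b else b, numbers)

print(total)    # 15
print(product)  # 120
print(largest)  # 5
```

For simple sums and maxima, the built-ins sum() and max() are usually clearer; reduce() is shown here to illustrate the idea.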

NUMPY

NumPy (Numerical Python) is a powerful library in Python used for numerical computations. It provides support for large, multi-dimensional arrays and
matrices, along with a collection of mathematical functions to operate on these data structures.

You can install NumPy using pip: pip install numpy

Create a NumPy Array


Multi-Dimensional Array

Mathematical Operations

Common NumPy Functions
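The four topics above can be sketched together. The values are chosen only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])            # create a 1D array from a list
matrix = np.array([[1, 2], [3, 4]])     # multi-dimensional (2D) array

doubled = arr * 2                       # element-wise multiplication
summed = arr + arr                      # element-wise addition

# A few common functions
total, average = arr.sum(), arr.mean()
largest, smallest = arr.max(), arr.min()

print(doubled)                 # [2 4 6 8]
print(matrix.shape)            # (2, 2)
print(total, average, largest, smallest)
```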

NumPy vs Lists
1. Performance:

o NumPy is faster due to optimized C-based implementation.

o Lists are slower for large datasets.

2. Memory Efficiency:
o NumPy uses less memory as it stores homogeneous data.

o Lists consume more memory because every element is a separate Python object, which is what lets them hold heterogeneous data.

3. Functionality:

o NumPy provides built-in mathematical functions for complex operations.

o Lists lack such functions and require manual implementation.

4. Data Type:

o NumPy uses a single data type for all elements in an array.

o Lists can store different data types in one collection.

5. Libraries Support:

o NumPy is widely used in scientific computing and data analysis.

o Lists are suitable for general-purpose programming.

1D Arrays in NumPy
A 1D Array (One-Dimensional Array) in NumPy is a simple linear array with only one row of elements, similar to a list in Python.

Creating a 1D Array

Accessing Elements

You can access elements using indexing (starts from 0):


Slicing

You can slice a 1D array using start:stop:step:

Operations on 1D Arrays

You can perform element-wise operations:

Common Functions for 1D Arrays
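A sketch of the 1D array topics above (creating, indexing, slicing, element-wise operations, and common functions), using sample values.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])   # creating a 1D array

first, last = arr[0], arr[-1]          # indexing starts from 0
middle = arr[1:4]                      # slicing: start:stop (stop excluded)
every_other = arr[::2]                 # slicing with a step

plus_five = arr + 5                    # element-wise operations
squared = arr ** 2

total, average = arr.sum(), arr.mean() # common functions

print(middle)        # [20 30 40]
print(every_other)   # [10 30 50]
print(squared)       # [ 100  400  900 1600 2500]
```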

Multidimensional Arrays in NumPy


A Multidimensional Array in NumPy is an array with more than one dimension (2D, 3D, or higher). It’s useful for representing data in a matrix or tensor
format.
Types of Multidimensional Arrays

1. 1D Array → Single row (vector) → [1, 2, 3]

2. 2D Array → Matrix (rows and columns) → [[1, 2], [3, 4]]

3. 3D Array → Collection of matrices (tensor) → [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

Creating Multidimensional Arrays

Accessing Elements

You can access elements using multiple indices:

Operations on Multidimensional Arrays

Shape and Size


• arr.shape → Returns the dimensions of the array

• arr.size → Total number of elements

• arr.ndim → Number of dimensions

Print the size of a single item of the array

data.itemsize
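The multidimensional topics above can be sketched as follows. The array named data matches the name used in the line above; its contents are made up.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])                   # 2D array
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])     # 3D array

element = data[1, 2]        # row 1, column 2
second_col = data[:, 1]     # every row, column 1
deep = cube[1, 0, 1]        # matrix 1, row 0, column 1

doubled = data * 2          # element-wise operation
col_sums = data.sum(axis=0) # sum down each column
row_sums = data.sum(axis=1) # sum across each row

print(data.shape, data.size, data.ndim)   # (2, 3) 6 2
print(data.itemsize)                      # bytes per element (platform dependent)
```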

Creating NumPy Arrays


In the previous segments, you learnt how to convert lists or tuples to arrays using np.array(). There are other ways in which you can create arrays. The
following ways are commonly used when you know the size of the array beforehand:

1. np.ones(): It is used to create an array of 1s.

2. np.zeros(): It is used to create an array of 0s.

3. np.random.randint(): It is used to create a random array of integers within a particular range.

4. np.random.random(): It is used to create an array of random numbers.

5. np.arange(): It is used to create an array with increments of fixed step size.

6. np.linspace(): It is used to create an array of a fixed number of evenly spaced values between two endpoints.


From 3 to 35 with a step of 2

np.arange(3,35,2)

array([ 3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33])

Array of random numbers

2D Array of random numbers

Array of length 20 between 1 and 10
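A sketch of the creation functions listed above. The shapes and ranges are illustrative, and the random values will differ on every run.

```python
import numpy as np

ones = np.ones((2, 3))                        # 2x3 array of 1s
zeros = np.zeros(4)                           # 1D array of four 0s
rand_ints = np.random.randint(1, 10, size=5)  # integers from 1 up to (not including) 10
rand_1d = np.random.random(4)                 # floats in [0, 1)
rand_2d = np.random.random((3, 4))            # 3x4 array of random floats
evenly = np.linspace(1, 10, 20)               # 20 values from 1 to 10, both included

print(ones)
print(rand_ints)
print(evenly)
```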


Exercises

Apart from the methods mentioned above, there are a few more NumPy functions that you can use to create special NumPy arrays:

• np.full(): Create a constant array of any number ‘n’

• np.tile(): Create a new array by repeating an existing array for a particular number of times

• np.eye(): Create an identity matrix of any dimension

• np.random.randint(): Create a random array of integers within a particular range
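A possible starting point for these exercises, with arbitrary shapes and values:

```python
import numpy as np

sevens = np.full((2, 3), 7)             # 2x3 array filled with 7
tiled = np.tile(np.array([1, 2]), 3)    # repeat [1, 2] three times
identity = np.eye(3)                    # 3x3 identity matrix

print(sevens)
print(tiled)      # [1 2 1 2 1 2]
print(identity)
```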

Operations on NumPy Arrays


The learning objectives of this section are:

• Manipulate arrays

• Reshape arrays

• Stack arrays

• Perform operations on arrays

• Perform basic mathematical operations

• Apply built-in functions

• Apply your own functions

• Apply basic linear algebra operations


1. Arithmetic Operations

Ex:1

array1 = np.array([10,20,30,40,50])
array2 = np.arange(5)

# Add array1 and array2.


array3 = array1 + array2

array([10, 21, 32, 43, 54])

Ex:

array = np.linspace(1, 10, 5)

array*2

Stacking Arrays in NumPy (np.hstack() & np.vstack())


Stacking allows combining multiple arrays along a specified axis.

I. Horizontal Stacking (np.hstack()):

o Stacks arrays side by side (column-wise).

o Condition: The number of rows must be the same.

o Syntax: np.hstack((array1, array2))


II. Vertical Stacking (np.vstack()):

• Stacks arrays on top of each other (row-wise).

• Condition: The number of columns must be the same.

• Syntax: np.vstack((array1, array2))

Reshape an array:

array1 = np.arange(12).reshape(3,4) #3x4


array2 = np.arange(20).reshape(5,4) #5x4
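The two reshaped arrays above have the same number of columns, so they can be stacked vertically. The sketch below also adds a hypothetical 3x2 array named extra to show horizontal stacking, which needs matching row counts.

```python
import numpy as np

array1 = np.arange(12).reshape(3, 4)   # 3x4
array2 = np.arange(20).reshape(5, 4)   # 5x4

stacked_rows = np.vstack((array1, array2))   # columns match (4), result is 8x4

extra = np.arange(6).reshape(3, 2)           # hypothetical 3x2 array
side_by_side = np.hstack((array1, extra))    # rows match (3), result is 3x6

print(stacked_rows.shape)   # (8, 4)
print(side_by_side.shape)   # (3, 6)
```

Trying np.hstack((array1, array2)) raises a ValueError, because the row counts (3 and 5) differ.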

NumPy Built-in Functions

2. Trigonometric Functions

3. Exponential and Logarithmic Functions

4. Aggregates

The np.add.reduce() function in NumPy is used to reduce an array by repeatedly applying the addition operation.

The np.add.accumulate() function in NumPy performs an accumulated addition across an array, returning the intermediate results of the addition.
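The built-in functions above can be sketched on small sample arrays:

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)            # trigonometric: approximately [0, 1, 0]

values = np.array([1, 2, 3, 4])
exps = np.exp(values)             # e raised to each element
logs = np.log(values)             # natural logarithm of each element

total = np.add.reduce(values)           # 1 + 2 + 3 + 4
running = np.add.accumulate(values)     # running totals

print(sines)
print(total)     # 10
print(running)   # [ 1  3  6 10]
```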

Apply Basic Linear Algebra Operations


NumPy provides the np.linalg package to apply common linear algebra operations, such as:
• np.linalg.inv: Inverse of a matrix
• np.linalg.det: Determinant of a matrix
• np.linalg.eig: Eigenvalues and eigenvectors of a matrix
Also, you can multiply matrices using np.dot(a, b).

# np.linalg documentation
help(np.linalg)
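A short sketch of these operations on a sample 2x2 matrix (the matrix values are arbitrary):

```python
import numpy as np

a = np.array([[4.0, 7.0], [2.0, 6.0]])   # sample invertible matrix

inverse = np.linalg.inv(a)               # inverse of a
determinant = np.linalg.det(a)           # 4*6 - 7*2 = 10
eigenvalues, eigenvectors = np.linalg.eig(a)

product = np.dot(a, inverse)             # should be close to the identity matrix

print(inverse)
print(round(determinant, 6))   # 10.0
print(eigenvalues)
print(np.round(product, 6))
```

Floating-point results are approximate, so values are rounded for display.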

Compare Computation Times in NumPy and Standard Python Lists


Now that we know how to use NumPy, let us look at some code that shows its key advantages: convenience and speed of computation.
In the data science landscape, you'll often work with extremely large datasets, so it is important to understand how much computation time (and memory) you can save by using NumPy instead of standard Python lists.
Let's compare the computation times of arrays and lists for a simple task: calculating the element-wise product of numbers.
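One way such a comparison might be written is sketched below. This is an assumed reconstruction, not the original code; the array size and variable names are choices made here, and the timings depend on your machine.

```python
import time
import numpy as np

size = 1_000_000   # assumed size: one million elements

# Element-wise product with plain Python lists
list1 = list(range(size))
list2 = list(range(size))
start = time.perf_counter()
list_product = [a * b for a, b in zip(list1, list2)]
list_time = time.perf_counter() - start

# The same computation with NumPy arrays (int64 avoids overflow)
array1 = np.arange(size, dtype=np.int64)
array2 = np.arange(size, dtype=np.int64)
start = time.perf_counter()
array_product = array1 * array2
numpy_time = time.perf_counter() - start

print(list_time)
print(numpy_time)
ratio = list_time / numpy_time if numpy_time > 0 else float("inf")
print("The ratio of time taken is", round(ratio))
```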

o/p:
0.1251368522644043
0.005321025848388672
The ratio of time taken is 23.0

In this case, numpy is an order of magnitude faster than lists. This is with arrays of size in millions, but you may work on much larger arrays of sizes in
order of billions. Then, the difference is even larger.
Some reasons for such difference in speed are:
• NumPy's core is written in C, so the element-wise loops run as compiled code behind the scenes
• NumPy arrays are more compact than lists, i.e. they take much less storage space than lists
Introduction to Pandas
Pandas is an open-source Python library specifically designed for data manipulation, analysis, and visualization. It is built on top of NumPy and provides
easy-to-use data structures and data analysis tools. The name "Pandas" comes from "Panel Data," a term used in econometrics for multi-dimensional
datasets.

Pandas is widely used in various fields like finance, economics, data science, and machine learning due to its efficiency in handling large datasets.

Why Use Pandas?

• Efficient handling of large datasets

• Easy data cleaning and preprocessing

• Flexible data manipulation with DataFrames and Series

• Support for reading and writing data from multiple formats (CSV, Excel, SQL, etc.)

• Integrated with libraries like Matplotlib and Seaborn for data visualization

• Powerful data aggregation and transformation using groupby and pivot operations

Creating DataFrames:

Importing CSV Data:

Reading and Summarising Data:

• Use functions like .head(), .info(), .describe() for a quick overview of the data.
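The steps above can be sketched without an external file by reading CSV text from memory. The data and column names are invented; with a real file you would pass its path to pd.read_csv().

```python
import io
import pandas as pd

# Creating a DataFrame from a dictionary (sample data)
df = pd.DataFrame({
    "country": ["India", "Japan", "Egypt"],
    "cars_per_cap": [18, 588, 45],
    "drive_right": [False, False, True],
})

# Importing CSV data that has no header row, then assigning headers
csv_text = "IN,India,18\nJP,Japan,588\n"   # stands in for a CSV file
cars_df = pd.read_csv(io.StringIO(csv_text), header=None)
cars_df.columns = ["code", "country", "cars_per_cap"]

print(df.head(2))      # first two rows
df.info()              # column types and non-null counts
print(df.describe())   # summary statistics for numeric columns
```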
Read file - skip header

Assign Headers

cars_df.columns = ['country code', 'region', 'country', 'cars_per_cap', 'drive_right']

Skip header and assign first column as index.

cars_df.index
# Read file and set 1st column as index
cars_df = pd.read_csv("cars.csv", header=None, index_col=0)

Rename the Index Name

Delete the index name

Set Hierarchical index:

Hierarchical indexing, also known as MultiIndexing, is a feature in Pandas that allows you to have multiple levels of index on a DataFrame. This is useful
when you are working with datasets that have multiple dimensions or categories, such as geographical, temporal, or product data.

It creates a multi-level index where rows are indexed by multiple keys instead of a single one, making it easier to organize and analyze data.
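A minimal sketch of a two-level index built with set_index(). The DataFrame and its column names are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe"],     # sample data
    "country": ["India", "Japan", "France"],
    "units": [120, 90, 75],
})

# Use two columns as a hierarchical (multi-level) index
indexed = sales.set_index(["region", "country"])

asia = indexed.loc["Asia"]                          # all rows in the outer level "Asia"
india_units = indexed.loc[("Asia", "India"), "units"]  # one row via a tuple of keys

print(indexed)
print(asia)
print(india_units)   # 120
```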
Sorting DataFrames:

• You can sort data using sort_values().

Labelling, Indexing, and Slicing:

• Access data using loc (label-based) or iloc (index-based).

Merging DataFrames Using Joins:

• Perform SQL-like joins using pd.merge().

Pivoting and Grouping:

• pivot_table() and groupby() are useful for aggregation and summarisation.


Write cars_df to cars_to_csv.csv

# Read file and set 1st two columns as index


sales = pd.read_excel('sales.xlsx', index_col = [0,1])

Display the first 3 and last 3 rows of the sales dataframe

Display the information about the data stored in data frame

Display the statistical information about the data in dataframe

Indexing and Slicing in Pandas


Indexing and slicing are essential operations when working with Pandas DataFrames and Series. They help you select specific rows, columns, or subsets of
data for analysis.
Selecting Columns in Pandas

You can select a single column or multiple columns using different methods:

1. Single Column Selection (Returns Series)

2. Multiple Column Selection (Returns DataFrame)

Selecting Specific Rows and Columns using loc and iloc


A. Using loc (Label-Based Selection)
• Select by row labels and column names
• Syntax: df.loc[row_label, column_label]

B. Using iloc (Index-Based Selection)


• Select by row and column index numbers
• Syntax: df.iloc[row_index, column_index]
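Column selection, loc, and iloc can be compared side by side on a small DataFrame. The data and labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["India", "Japan", "Egypt"],   # sample data
        "cars_per_cap": [18, 588, 45],
        "drive_right": [False, False, True],
    },
    index=["IN", "JP", "EG"],
)

one_col = df["country"]                      # single column -> Series
two_cols = df[["country", "cars_per_cap"]]   # list of columns -> DataFrame

by_label = df.loc["JP", "cars_per_cap"]      # row label, column name
label_slice = df.loc["IN":"JP", ["country"]] # label slices include the end label

by_position = df.iloc[1, 1]                  # row 1, column 1 (zero-based)
pos_slice = df.iloc[0:2, 0:2]                # position slices exclude the end

print(type(one_col).__name__, type(two_cols).__name__)  # Series DataFrame
print(by_label, by_position)                            # 588 588
```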
Pandas Series

A Pandas Series is a one-dimensional labeled array capable of holding any data type, similar to a column in a DataFrame.

Operations on DataFrames in Pandas


Pandas provides a powerful set of tools to perform operations on DataFrames, making it an essential library for data manipulation and analysis. Let’s explore
key operations like filtering, creating new columns, and handling time-series data.

Filtering Data

Filtering in Pandas is similar to NumPy. You can apply conditional statements to filter rows based on specific conditions.
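A sketch of conditional filtering on a hypothetical weather DataFrame (the column names and values are assumptions). Multiple conditions are combined with & or |, with each condition in parentheses.

```python
import pandas as pd

weather = pd.DataFrame({
    "Location": ["Sydney", "Perth", "Sydney", "Darwin"],  # sample data
    "Rainfall": [0.0, 12.4, 30.2, 5.0],
})

rainy = weather[weather["Rainfall"] > 10]   # rows where the condition is True

sydney_rainy = weather[
    (weather["Location"] == "Sydney") & (weather["Rainfall"] > 10)
]

print(rainy)
print(sydney_rainy)
```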

Creating New Columns

The filtering in the earlier examples did not give precise information about the days, because the date column simply holds full dates. The date column can be split into the year, month, and day of the month.

DatetimeIndex: Pandas provides the DatetimeIndex class, which can extract the day, month, and year from a date.

Adding new columns: To add a new column to the DataFrame, name the column and assign the expression that computes its values.
Handling Time-Series Data

Time-series data is data collected at regular time intervals. Pandas has built-in support to handle such data using DatetimeIndex.

Using Lambda Functions with apply()

You can apply custom logic using lambda functions with the .apply() method.
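The ideas in this part (converting strings to dates, extracting year, month, and day with DatetimeIndex, and applying a lambda) can be sketched on a tiny hypothetical dataset. The column names and the 30-degree threshold are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2008-12-01", "2009-01-15"],   # sample data
    "MaxTemp": [22.9, 35.2],
})

df["Date"] = pd.to_datetime(df["Date"])            # strings -> datetime values

# New columns extracted from the date
df["Year"] = pd.DatetimeIndex(df["Date"]).year
df["Month"] = pd.DatetimeIndex(df["Date"]).month
df["Day"] = pd.DatetimeIndex(df["Date"]).day

# Custom logic with a lambda: flag days hotter than 30 degrees
df["IsHot"] = df["MaxTemp"].apply(lambda t: t > 30)

print(df)
```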

Conclusion

• Filtering Data: Use conditional statements for data extraction.

• Creating Columns: Perform operations to generate new columns.

• Time-Series Data: Convert strings to datetime for efficient time-based analysis.

• Lambda Functions: Apply custom logic using .apply() for quick calculations.

Grouping and Aggregate Functions

• In this section, you will learn how to apply grouping and aggregate functions using Pandas.
sort_values():

data_bylocation.sort_values('Rainfall', ascending = False).head()
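A grouped DataFrame like data_bylocation might be produced as sketched below. The source data here is hypothetical; only the names data_bylocation and Rainfall come from the line above.

```python
import pandas as pd

data = pd.DataFrame({
    "Location": ["Sydney", "Perth", "Sydney", "Darwin", "Darwin"],  # sample data
    "Rainfall": [0.0, 12.0, 30.0, 5.0, 45.0],
})

# Group rows by location and take the mean rainfall of each group
data_bylocation = data.groupby("Location")[["Rainfall"]].mean()

# Sort the groups from wettest to driest
top = data_bylocation.sort_values("Rainfall", ascending=False).head()

print(top)
```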


Sometimes feeling cold is about more than low temperatures; a windy day can also make you feel cold. A factor called the chill factor can be used to quantify the cold based on the wind speed and the temperature. The formula for the chill factor is given by

$WCI = (10\sqrt{v} - v + 10.5)(33 - T_m)$

where v is the speed of the wind and T_m is the minimum temperature.

Add a column for WCI and find the month with the lowest WCI.

from math import sqrt

def wci(x):
    velocity = x['WindGustSpeed']
    minTemp = x['MinTemp']
    return (10 * sqrt(velocity) - velocity + 10.5) * (33 - minTemp)
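The function can then be applied to each row with apply(axis=1) and the result grouped by month. The sketch below repeats the function so it runs on its own; the DataFrame and its Month column are hypothetical.

```python
from math import sqrt
import pandas as pd

def wci(x):
    velocity = x['WindGustSpeed']
    minTemp = x['MinTemp']
    return (10 * sqrt(velocity) - velocity + 10.5) * (33 - minTemp)

df = pd.DataFrame({
    "Month": [1, 1, 7, 7],                       # sample data
    "WindGustSpeed": [16.0, 25.0, 36.0, 49.0],
    "MinTemp": [20.0, 18.0, 5.0, 3.0],
})

df["WCI"] = df.apply(wci, axis=1)                # axis=1 passes one row at a time

monthly = df.groupby("Month")["WCI"].mean()      # average WCI per month
lowest_month = monthly.idxmin()                  # month with the lowest average

print(df)
print(monthly)
print(lowest_month)   # 1
```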

Merging Dataframes

Joins are used to combine DataFrames. Unlike hstack and vstack, a join uses a key to combine two DataFrames.

Merging DataFrames in Pandas

Pandas provides powerful tools to merge DataFrames, similar to SQL joins. The primary function used for merging is pd.merge().

Here are the four main types of joins:

INNER JOIN

• Returns only the matching rows from both DataFrames.


LEFT JOIN

• Returns all rows from the left DataFrame and matching rows from the right DataFrame.

• Left rows with no match get NaN in the columns that come from the right DataFrame.

RIGHT JOIN

• Returns all rows from the right DataFrame and matching rows from the left DataFrame.

• Right rows with no match get NaN in the columns that come from the left DataFrame.

Example:

FULL OUTER JOIN

• Returns all rows from both DataFrames, with NaN where there is no match.

• Equivalent to SQL FULL OUTER JOIN.
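The four join types can be compared on two small DataFrames by changing the how argument of pd.merge(). The tables and the key column cust_id are invented for illustration.

```python
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
orders = pd.DataFrame({"cust_id": [2, 3, 4], "amount": [250, 100, 75]})

inner = pd.merge(customers, orders, on="cust_id", how="inner")  # ids 2, 3
left = pd.merge(customers, orders, on="cust_id", how="left")    # ids 1, 2, 3
right = pd.merge(customers, orders, on="cust_id", how="right")  # ids 2, 3, 4
outer = pd.merge(customers, orders, on="cust_id", how="outer")  # ids 1, 2, 3, 4

print(inner)
print(left)    # amount is NaN for cust_id 1
print(right)   # name is NaN for cust_id 4
print(outer)
```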


Pivot Tables in Pandas

A pivot table is a powerful tool used to summarize and analyze data by grouping and aggregating it, similar to Excel Pivot Tables. It provides an easy and
structured way to extract insights from large datasets.

Why Use Pivot Tables?

• Simplifies large datasets into a summarized form.

• Provides flexibility in arranging and viewing data.

• Allows custom aggregation using various functions like sum(), mean(), count(), etc.

• Acts as an alternative to groupby() in Pandas

Pivot Table
In Python, a pivot table is a powerful data manipulation tool typically used for summarizing, analyzing, and exploring data. It is commonly created using
the pivot_table() function in pandas.

What is a Pivot Table?

• A pivot table rearranges data to provide a summary using aggregation functions (e.g., sum, mean, count).

• It helps in analyzing large datasets by breaking them down into easily interpretable formats.

• Similar to Excel's Pivot Table but created using pandas in Python.

Create Pivot Table:

• Use pd.pivot_table() to create a pivot table.


• You can use multiple columns for index, columns, or values.
• You can apply multiple aggregation functions using aggfunc=['sum', 'mean'].
• Use .reset_index() to convert the pivot table back to a DataFrame.
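The points above can be sketched on a small hypothetical sales table:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],   # sample data
    "product": ["A", "B", "A", "B"],
    "units": [10, 20, 30, 40],
})

# Rows: region, columns: product, cells: total units
pivot = pd.pivot_table(sales, index="region", columns="product",
                       values="units", aggfunc="sum")

# Several aggregation functions at once
multi = pd.pivot_table(sales, index="region", values="units",
                       aggfunc=["sum", "mean"])

flat = pivot.reset_index()   # region becomes an ordinary column again

print(pivot)
print(multi)
print(flat)
```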
