Python for Data Science
1. Introduction to Python
Description: Python is a high-level, interpreted programming language widely used in data
science, machine learning, and AI because of its simplicity and rich ecosystem of libraries.
• Easy syntax → beginner-friendly.
• Open-source and free.
• Huge number of libraries → NumPy, Pandas, Matplotlib, etc.
• Used in Web Development, Automation, AI, ML, Data Science.
2. Python Basics
2.1 Printing Output
print("Hello, Data Science")
Functionality: Displays output to the screen.
2.2 Variables
name = "Aleezay" # String
age = 28 # Integer
gpa = 3.7 # Float
is_student = True # Boolean
Functionality: Used to store and reuse data.
• Naming rules: names may contain letters, digits, and underscores, cannot start with a digit, and are case-sensitive (age and Age are different variables).
2.3 Data Types
• Numeric → int, float
• String (str) → "Hello"
• Boolean (bool) → True or False
• Collections:
  ◦ List → ["apple", "banana"]
  ◦ Tuple → (3, 4)
  ◦ Dictionary → {"name": "Ali", "age": 22}
  ◦ Set → {1, 2, 3}
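To check which type a value has, the built-in type() function can be used. The sketch below runs it on one sample value per category from the list above; the sample values 42 and 3.14 are illustrative and not part of the original notes.

```python
# One sample value for each data type listed above (42 and 3.14 are illustrative).
samples = [
    42,
    3.14,
    "Hello",
    True,
    ["apple", "banana"],
    (3, 4),
    {"name": "Ali", "age": 22},
    {1, 2, 3},
]

# type(value).__name__ gives the type's short name as a string.
type_names = [type(value).__name__ for value in samples]

for value, type_name in zip(samples, type_names):
    print(f"{value!r} is of type {type_name}")
```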
2.4 Type Casting
x = int("5") # Convert string to integer
y = float(10) # Convert int to float
Functionality: Convert one data type into another.
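Casting can fail or lose information, so it helps to see the edge cases. The sketch below is a minimal illustration; the helper name to_int_or_none is hypothetical and introduced only for this example.

```python
def to_int_or_none(text):
    """Return int(text), or None if text is not a valid integer string."""
    try:
        return int(text)
    except ValueError:
        # int("five") raises ValueError because "five" is not a number.
        return None

truncated = int(3.9)          # float -> int drops the fractional part
label = str(28) + " years"    # int -> str allows string concatenation

print(truncated)
print(label)
print(to_int_or_none("5"))
print(to_int_or_none("five"))
```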
2.5 Operators
• Arithmetic: + - * / % ** //
• Comparison: == != > < >= <=
• Logical: and or not
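The less familiar arithmetic operators are easier to remember with concrete numbers. The following sketch uses the illustrative values 7 and 2, which are assumptions chosen for this example.

```python
a, b = 7, 2  # illustrative values

true_division = a / b     # 3.5 (always returns a float)
floor_division = a // b   # 3   (rounds down to a whole number)
remainder = a % b         # 1   (what is left after floor division)
power = a ** b            # 49  (7 squared)

# Comparison operators produce booleans; logical operators combine them.
both_true = a > 5 and b < 5
either_true = a < 5 or b < 5
negated = not a == b

print(true_division, floor_division, remainder, power)
print(both_true, either_true, negated)
```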
3. Control Flow
3.1 If-Else
age = 18
if age >= 18:
    print("Adult")
else:
    print("Minor")
Functionality: Decision making. Runs one block of code when the condition is true and another when it is false.
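When there are more than two outcomes, elif adds further conditions between if and else. The sketch below is one possible extension of the example above; the function name age_group and the thresholds 13 and 18 are assumptions made for illustration.

```python
def age_group(age):
    """Classify an age into a group (thresholds are illustrative)."""
    if age < 13:
        return "Child"
    elif age < 18:
        # Reached only when the first condition was false.
        return "Teen"
    else:
        return "Adult"

print(age_group(10))
print(age_group(15))
print(age_group(18))
```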
3.2 Loops
For Loop – repeats a known number of times
for i in range(5):
    print(i)
While Loop – repeats until the condition becomes false
i = 1
while i <= 5:
    print(i)
    i += 1
3.3 Break & Continue
for i in range(1, 6):
    if i == 3:
        continue # skips 3
    print(i)
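For comparison, break stops the loop completely instead of skipping a single iteration. A minimal sketch using the same range as above; the list name visited is introduced only for this example.

```python
visited = []
for i in range(1, 6):
    if i == 3:
        break  # exits the loop; 3, 4, and 5 are never processed
    visited.append(i)

print(visited)
```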
4. Functions
4.1 Defining Functions
def greet(name):
    return f"Hello, {name}!"
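A function does nothing until it is called. The sketch below calls a variant of greet and adds a default argument; the greeting parameter is an assumption added for illustration and is not part of the original definition.

```python
def greet(name, greeting="Hello"):
    """Build a greeting; `greeting` is an optional, illustrative parameter."""
    return f"{greeting}, {name}!"

message = greet("Aleezay")                  # uses the default greeting
custom = greet("Ali", greeting="Welcome")   # overrides the default

print(message)
print(custom)
```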
4.2 Lambda Functions
square = lambda x: x**2
print(square(5))
Functionality: Defines a short, anonymous function in a single expression.
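Lambdas are most often passed directly to another function rather than stored in a variable. One common case is the key argument of sorted(); the student tuples below are sample data assumed for this sketch.

```python
# (name, age) pairs; sample data for illustration only.
students = [("Ali", 22), ("Sara", 21), ("Aleezay", 28)]

# The lambda tells sorted() to order the tuples by their second element (age).
by_age = sorted(students, key=lambda student: student[1])

print(by_age)
```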
5. Modules & Packages
5.1 Built-in Module
import math
print(math.sqrt(16))
5.2 User-defined Module
• Create mymodule.py
def add(a, b):
    return a + b
• Use in another file
import mymodule
print(mymodule.add(5, 3))
6. Exception Handling
try:
    num = int(input("Enter a number: "))
    print(10 / num)
except ZeroDivisionError:
    print("Cannot divide by zero")
except ValueError:
    print("Invalid input")
finally:
    print("Program finished")
Functionality: Handles runtime errors safely.
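The same logic can be wrapped in a function so it can be tried without typing input each time. This is a sketch under that assumption; the function name safe_divide is hypothetical.

```python
def safe_divide(text):
    """Divide 10 by the integer in `text`, returning a message on failure."""
    try:
        num = int(text)      # may raise ValueError
        return 10 / num      # may raise ZeroDivisionError
    except ZeroDivisionError:
        return "Cannot divide by zero"
    except ValueError:
        return "Invalid input"

print(safe_divide("5"))
print(safe_divide("0"))
print(safe_divide("abc"))
```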
7. Data Structures
7.1 List
fruits = ["apple", "banana", "cherry"]
fruits.append("orange")
7.2 Tuple
point = (3, 4)
7.3 Dictionary
student = {"name": "Ali", "age": 22}
print(student["name"])
7.4 Set
nums = {1, 2, 3, 3}
print(nums) # {1, 2, 3}
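Each structure supports a few everyday operations beyond creation. The sketch below shows one or two per structure, reusing the values from above; the "city" key and the second set are assumptions added for illustration.

```python
# List: ordered and mutable; slicing returns a new list.
fruits = ["apple", "banana", "cherry"]
first_two = fruits[:2]

# Tuple: ordered and immutable; unpacking assigns each element to a name.
point = (3, 4)
x, y = point

# Dictionary: .get() returns a fallback instead of raising KeyError.
student = {"name": "Ali", "age": 22}
city = student.get("city", "unknown")  # "city" is not in the dictionary

# Set: unique elements; & gives the intersection of two sets.
common = {1, 2, 3} & {2, 3, 4}

print(first_two, x, y, city, common)
```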
8. File Handling
Reading a File
with open("data.txt", "r") as f:
    print(f.read())
Writing to a File
with open("data.txt", "w") as f:
    f.write("Hello Data Science")
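Note that mode "w" overwrites an existing file, while mode "a" appends to it. The sketch below writes, appends, and then reads line by line; it uses a temporary directory (an assumption for this example) so no real data.txt is touched.

```python
import os
import tempfile

# Work in a temporary directory so existing files are not overwritten.
path = os.path.join(tempfile.mkdtemp(), "data.txt")

with open(path, "w") as f:   # "w" creates the file or replaces its contents
    f.write("Hello Data Science\n")

with open(path, "a") as f:   # "a" adds to the end of the file
    f.write("Second line\n")

with open(path, "r") as f:   # iterating over a file yields one line at a time
    lines = [line.rstrip("\n") for line in f]

print(lines)
```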
9. Libraries for Data Science
9.1 NumPy (Numerical Python)
Functionality: Fast array operations, linear algebra.
import numpy as np
arr = np.array([1, 2, 3])
print(arr.mean())
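What makes NumPy arrays fast is that operations apply to every element at once, without an explicit loop. A minimal sketch continuing with the same array; the variable names are introduced for this example.

```python
import numpy as np

arr = np.array([1, 2, 3])

doubled = arr * 2               # element-wise multiplication
total = arr.sum()               # sum of all elements
dot = np.dot(arr, arr)          # 1*1 + 2*2 + 3*3
column = arr.reshape(3, 1)      # same data viewed as a 3x1 column

print(doubled, total, dot, column.shape)
```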
9.2 Pandas
Functionality: Data manipulation (tables, spreadsheets).
import pandas as pd
df = pd.DataFrame({"Name": ["Ali", "Sara"], "Age": [22, 21]})
print(df.head())
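Typical next steps with a DataFrame are selecting a column, filtering rows, and computing a summary. The sketch below reuses the same two-row table; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ali", "Sara"], "Age": [22, 21]})

mean_age = df["Age"].mean()                      # summary of one column
older = df[df["Age"] > 21]                       # keep rows matching a condition
youngest = df.loc[df["Age"].idxmin(), "Name"]    # name in the row with the lowest age

print(mean_age)
print(older)
print(youngest)
```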
9.3 Matplotlib
Functionality: Data visualization (charts/plots).
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [2, 4, 6])
plt.show()
9.4 Seaborn
Functionality: Advanced visualization (heatmaps, distributions).
import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot([1, 2, 2, 3, 3, 4, 4, 5])
plt.show()
9.5 Scikit-learn
Functionality: Machine learning library with tools for:
• Classification
• Regression
• Clustering
from sklearn.linear_model import LinearRegression
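Most scikit-learn models follow the same pattern: create the model, call fit() on training data, then call predict() on new data. The sketch below applies that pattern to LinearRegression; the tiny dataset (y = 2x) is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data following y = 2x; X must be 2-D (rows x features).
X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 6, 8])

model = LinearRegression()
model.fit(X, y)                               # learn slope and intercept

slope = model.coef_[0]
prediction = model.predict(np.array([[5]]))[0]  # predict y for x = 5

print(slope, prediction)
```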
9.6 TensorFlow & PyTorch
Functionality: Deep learning libraries for Neural Networks.
9.7 Other Useful Libraries
• Statsmodels → statistical tests
• NLTK / spaCy → natural language processing
• OpenCV → computer vision
10. Programming Practice
1. Write a function that returns whether a number is prime.
2. Read a CSV of student marks using Pandas and find the top scorer.
3. Create a bar chart of student marks.
4. Use NumPy to generate 100 random numbers and calculate their mean.
5. Train a simple LinearRegression model in scikit-learn.