Data Science:
Data science is a multidisciplinary field that uses scientific
methods, processes, algorithms, and systems to extract
knowledge and insights from data.
Examples of where Data Science is needed:
For route planning: To discover the best routes to ship
To foresee delays for flight/ship/train etc. (through predictive
analysis)
To create promotional offers
To find the best suited time to deliver goods
To forecast next year's revenue for a company
To analyze the health benefits of training
To predict who will win elections
What Data Science Entails:
Data Collection: Gathering raw data from various sources,
including databases, sensors, and user interactions.
Data Cleaning: Ensuring the data is accurate, complete, and
ready for analysis.
Data Analysis: Applying statistical and computational methods
to identify patterns, trends, and relationships in the data.
Data Visualization: Creating visual representations of the data,
like charts and graphs, to communicate findings effectively.
Data Interpretation: Drawing meaningful conclusions and
insights from the analyzed data.
Decision Making: Using these insights to inform strategies,
solve problems, or predict future outcomes.
Applications of Data Science:
Healthcare: Predicting diseases, patient monitoring
Finance: Fraud detection, risk assessment
Marketing: Customer segmentation, personalized
recommendations
Retail: Inventory management, demand forecasting
Social Media: Sentiment analysis, trend prediction
Why Python:
Python is the most popular language in data science, and for
good reason. Here's why Python is widely preferred in data
science:
1. Easy to Learn and Use
Python has simple, readable syntax that is close to plain English.
Even beginners can quickly start writing and understanding
code.
2. Rich Libraries for Data Science
Python has powerful libraries that make data science easier:
NumPy – for numerical operations
Pandas – for data manipulation and analysis
Matplotlib / Seaborn / Plotly – for data visualization
Scikit-learn – for machine learning
TensorFlow / PyTorch – for deep learning
3. Huge Community Support
Large global community for help and support.
Tons of tutorials, courses, and documentation available.
4. Integration with Other Tools
Works well with databases, web apps, APIs, and big data tools
(like Spark).
Integrates easily with Jupyter notebooks for interactive
analysis.
5. Ideal for Prototyping
You can build, test, and deploy a data model quickly.
Perfect for experimenting with ideas and models.
6. Industry Standard
Used by top companies like Google, Netflix, Facebook, and
Amazon.
Many data science jobs require Python as a core skill.
The fundamental Python libraries that every data scientist
should know and use:
1. NumPy (Numerical Python)
Core library for numerical computations.
Supports multi-dimensional arrays and matrix operations.
Functions: mean(), std(), dot(), etc.
Install: pip install numpy
Example:
import numpy as np
arr = np.array([1, 2, 3])
print(np.mean(arr))
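The one-dimensional example above only hints at the "multi-dimensional arrays and matrix operations" mentioned for NumPy. The following is a minimal sketch (the arrays are made-up sample data) showing a 2-D array, element-wise arithmetic, an aggregation along an axis, and matrix multiplication with np.dot():

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)               # (2, 3)

# Element-wise arithmetic is applied to every entry
print(a * 10)

# Aggregations can run over the whole array or along one axis
print(np.mean(a))            # 3.5
print(np.mean(a, axis=0))    # column means: [2.5 3.5 4.5]

# Matrix multiplication: (2x3) dot (3x2) gives a (2x2) result
b = np.array([[1, 0],
              [0, 1],
              [1, 1]])
product = np.dot(a, b)
print(product)               # [[ 4  5]
                             #  [10 11]]
```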
2. Pandas (Panel Data)
Powerful for data manipulation and analysis.
Uses DataFrames (like Excel tables).
Ideal for cleaning, filtering, merging, and grouping data.
Install: pip install pandas
Example:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())
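The example above needs a data.csv file to exist. To illustrate the cleaning, filtering, merging, and grouping tasks listed for Pandas without any file, here is a small sketch built on hypothetical in-memory tables (the column names, regions, and manager names are invented for the example):

```python
import pandas as pd

# Hypothetical sales records; None marks a missing value
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", None],
    "product": ["pen", "pen", "book", "book", "book", "pen"],
    "amount": [120, 80, 200, 150, None, 90],
})
# Hypothetical lookup table: one manager per region
managers = pd.DataFrame({
    "region": ["North", "South", "East"],
    "manager": ["Asha", "Ben", "Chen"],
})

# Cleaning: drop rows that contain any missing value
clean = sales.dropna()

# Filtering: keep only rows whose amount is above 100
large = clean[clean["amount"] > 100]

# Grouping: total amount per region
totals = clean.groupby("region")["amount"].sum()

# Merging: attach each region's manager (similar to a SQL left join)
merged = clean.merge(managers, on="region", how="left")

print(large)
print(totals)
print(merged)
```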
3. Matplotlib
Basic plotting library for creating charts and graphs.
Useful for line plots, histograms, bar charts, etc.
Install: pip install matplotlib
Example:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
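Histograms and bar charts are mentioned above but not shown. A minimal sketch with made-up data follows; it saves the figure to a file named charts.png (an arbitrary name chosen here) so it also runs where no display is available. In a notebook you can call plt.show() instead.

```python
import matplotlib.pyplot as plt

# Made-up sample data for illustration
categories = ["A", "B", "C", "D"]
sales = [23, 17, 35, 29]
exam_scores = [55, 62, 68, 70, 71, 75, 78, 82, 88, 95]

# One figure with two side-by-side panels
fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: one bar per category
bars = ax_bar.bar(categories, sales, color="steelblue")
ax_bar.set_title("Sales by Category")
ax_bar.set_ylabel("Units sold")

# Histogram: split the scores into 5 equal-width bins and count each bin
counts, bin_edges, _ = ax_hist.hist(exam_scores, bins=5, edgecolor="black")
ax_hist.set_title("Distribution of Exam Scores")
ax_hist.set_xlabel("Score")
ax_hist.set_ylabel("Frequency")

fig.tight_layout()
fig.savefig("charts.png")  # or plt.show() to display interactively
```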
4. Seaborn
Built on Matplotlib, with simpler syntax and more polished default styles.
Great for statistical plots like boxplots, heatmaps, and violin
plots.
Install: pip install seaborn
Example:
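import seaborn as sns
import matplotlib.pyplot as plt
sns.boxplot(data=df, x='category', y='value')  # assumes df is a DataFrame with 'category' and 'value' columns
plt.show()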
5. Scikit-learn
Main library for machine learning.
Includes algorithms for classification, regression, clustering, and
model evaluation.
Install: pip install scikit-learn
Example:
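from sklearn.linear_model import LinearRegression
X_train = [[1], [2], [3], [4]]  # toy feature matrix: one row per sample
y_train = [2, 4, 6, 8]          # toy target values (y = 2x)
model = LinearRegression()
model.fit(X_train, y_train)
print(model.predict([[5]]))     # predicts approximately 10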
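Classification and model evaluation are listed above but not demonstrated. The sketch below uses the Iris dataset that ships with scikit-learn, holds out part of the data for testing, and scores a logistic-regression classifier with accuracy. The model, split ratio, and random seed are illustrative choices, not the only reasonable ones.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Iris: 150 flowers, 4 numeric features, 3 species (bundled with scikit-learn)
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows so the model is evaluated on data it never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Classification model; max_iter is raised so the solver converges
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Model evaluation: compare predictions against the true test labels
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```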
6. TensorFlow / PyTorch
Used for deep learning and neural networks.
TensorFlow was developed by Google; PyTorch by Facebook (now Meta).
Install:
TensorFlow: pip install tensorflow
PyTorch: pip install torch
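After running the pip commands above, one way to confirm what is installed is to ask Python for each package's version. This is a sketch using the standard-library importlib.metadata module (Python 3.8+); the names are the pip distribution names from this section, and a missing package is reported rather than raising an error.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used with "pip install" in this section
packages = ["numpy", "pandas", "matplotlib", "seaborn",
            "scikit-learn", "tensorflow", "torch"]

installed = {}
for name in packages:
    try:
        installed[name] = version(name)   # installed version string
    except PackageNotFoundError:
        installed[name] = None            # not installed in this environment

for name, ver in installed.items():
    print(f"{name:<14} {ver or 'not installed'}")
```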
Integrated development environment (IDE):
An integrated development environment (IDE) is a software
application that helps programmers develop software code
efficiently.
It increases developer productivity by combining capabilities
such as software editing, building, testing, and packaging in an
easy-to-use application.
Just as writers use text editors and accountants use
spreadsheets, software developers use IDEs to make their job
easier.
IDEs provide a central interface for common developer
tools, making the software development process much more
efficient.
Developers can start programming new applications quickly
instead of manually integrating and configuring different
software.
They also don't have to learn every tool separately and can
instead focus on a single application.
The following are some reasons why developers use IDEs:
Code editing automation
Programming languages have rules for how statements must be
structured. Because an IDE knows these rules, it contains many
intelligent features for automatically writing or editing the
source code.
Syntax highlighting
An IDE can format the written text by automatically making
some words bold or italic, or by using different font colors.
These visual cues make the source code more readable and
give instant feedback about accidental syntax errors.
Intelligent code completion
Just as a search engine suggests search terms as you start
typing, an IDE can suggest ways to complete a code statement
as the developer begins typing.
Refactoring support
Code refactoring is the process of restructuring the source code
to make it more efficient and readable without changing its
core functionality. IDEs can auto-refactor to some extent,
allowing developers to improve their code quickly and easily.
Other team members understand readable code faster, which
supports collaboration within the team.
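As a small illustration (hypothetical functions, refactored by hand), here is the kind of change an IDE's rename and extract-function refactorings help automate. The "after" version uses clearer names and pulls the averaging logic into its own helper, while the output stays exactly the same.

```python
# Before: everything in one function, with terse names
def report(d):
    t = 0
    for x in d:
        t = t + x
    a = t / len(d)
    return "Average: " + str(round(a, 2))


# After: same behaviour, clearer names, averaging extracted into a helper
def average(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def report_average(values):
    return f"Average: {round(average(values), 2)}"


print(report([1, 2, 3, 4]))          # Average: 2.5
print(report_average([1, 2, 3, 4]))  # Average: 2.5
```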
Local build automation
IDEs increase programmer productivity by performing
repeatable development tasks that are typically part of every
code change. The following are some examples of regular
coding tasks that an IDE carries out.
Compilation
An IDE compiles, or converts, the source code (by running the
language's compiler) into a lower-level form, such as machine
code or bytecode, that the computer can execute. Some
programming languages use just-in-time (JIT) compilation, in
which the language runtime converts code into machine code
while the program is running.
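For Python specifically, "compiling" means translating source text into bytecode that the CPython interpreter then executes. A minimal sketch using the built-in compile() and exec() functions and the standard-library dis module makes that step visible; the exact bytecode listing varies between Python versions, so it is not reproduced here.

```python
import dis

source = "x = 2 + 3\nprint(x)"

# Step 1: compile the source text into a code object (CPython bytecode)
code_obj = compile(source, "<example>", "exec")

# Step 2: list the bytecode instructions the interpreter will run
dis.dis(code_obj)

# Step 3: execute the compiled code in a fresh namespace
namespace = {}
exec(code_obj, namespace)   # prints 5
```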
Testing
The IDE allows developers to automate unit tests locally before
the software is integrated with other developers' code and
more complex integration tests are run.
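A unit test checks one small piece of code in isolation. The sketch below uses Python's built-in unittest module on a hypothetical mean() function; an IDE's "run tests" command typically discovers and runs test classes like this for you, whereas here the tests are loaded and run explicitly.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_simple_average(self):
        self.assertEqual(mean([10, 20, 30]), 20)

    def test_single_value(self):
        self.assertEqual(mean([7]), 7)

    def test_empty_list_raises(self):
        with self.assertRaises(ValueError):
            mean([])


# Load and run the tests explicitly, then report the overall result
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("All tests passed:", result.wasSuccessful())
```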
Debugging
Debugging is the process of fixing any errors or bugs that
testing reveals. One of the biggest values of an IDE for
debugging purposes is that you can step through the code, line
by line, as it runs and inspect code behaviour. IDEs also
integrate several debugging tools that highlight bugs caused by
human error in real time, even as the developer is typing.
Features of a Good IDE:
Syntax highlighting and auto-completion
Error detection while typing
Version control integration
Code suggestions and refactoring tools
Multi-language support
Examples of Popular IDEs:
IDE | Primary Language(s) | Best For
PyCharm | Python | Data science, web apps (Django)
Visual Studio | C#, C++, Python, more | Desktop apps, enterprise development
Eclipse | Java, C++, Python | Enterprise software, Android apps
Visual Studio Code (VS Code) | Many (with extensions) | Lightweight, cross-platform coding
NetBeans | Java, PHP, C++ | Java development
Xcode | Swift, Objective-C | macOS and iOS app development
Advantages of IDE:
Boosts productivity
Reduces errors and bugs
Provides a clean, organized workflow
Enhances learning and exploration with built-in tools
IPython:
IPython (short for Interactive Python) is an enhanced interactive
shell that provides a rich toolkit for interactive computing in
Python. It’s widely used by data scientists, researchers, and
developers for its powerful features that go far beyond the
default Python shell.
IPython is:
An interactive command-line interface for Python.
A powerful tool for exploratory programming, data analysis,
and debugging.
The foundation of Jupyter Notebook's default Python kernel.
Features of IPython:
Offers a powerful interactive Python shell.
Acts as the main kernel for Jupyter Notebook and other front-end
tools of Project Jupyter.
Supports object introspection: the ability to inspect an object's
properties at runtime.
Syntax highlighting.
Stores the history of interactions.
Tab completion of keywords, variables and function names.
A magic command system for controlling the Python environment
and performing OS tasks.
Ability to be embedded in other Python programs.
Provides access to the Python debugger.
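IPython's x? builds on the runtime introspection that plain Python already supports. As a sketch of what that looks like without IPython, the built-ins type() and dir() and the standard-library inspect module can answer the same questions; scale() is a hypothetical function defined just for the demonstration.

```python
import inspect


def scale(values, factor=2):
    """Multiply every value by factor."""
    return [v * factor for v in values]


data = [1, 2, 3]

# What kind of object is this?
print(type(data))                    # <class 'list'>

# Which public attributes and methods does it have?
print([name for name in dir(data) if not name.startswith("_")])

# What arguments does a function take, and what does its docstring say?
print(inspect.signature(scale))      # (values, factor=2)
print(inspect.getdoc(scale))         # Multiply every value by factor.
```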
Installing the IPython Package on Windows using pip:
To install IPython with pip, open Command Prompt and run the
following command:
pip install ipython
When the installation completes, pip prints a confirmation
message beginning with "Successfully installed".
Verifying the IPython Package Installation on Windows:
To verify that IPython has been installed successfully, run the
following command in Command Prompt:
python -m pip show ipython
If the installation succeeded, this prints the package details,
such as its name, version, and install location.
Starting IPython from Command Prompt:
Type the following command and press Enter:
ipython
Instead of the regular >>> prompt, IPython uses two kinds of
numbered prompts:
In [1]: appears before each input expression.
Out[1]: appears before the output of that expression.
The numbers in the square brackets increase automatically with
each new input.
Example Commands:
In [1]: x = 42
In [2]: x?
# Shows info about the variable
In [3]: %timeit sum(range(1000))
# Measures how fast the code runs
In [4]: !dir
# Lists files in current directory (Windows)
How to Launch Jupyter Notebook (on Windows)
Jupyter Notebook is a powerful web-based tool for writing and running
Python code. Here's how to launch it step-by-step on Windows:
Step 1: Install Jupyter Notebook
Option A: Using pip (if you already have Python)
1. Open Command Prompt.
2. Run:
pip install notebook
Option B: Using Anaconda (Recommended for Data Science)
1. Download and install Anaconda:
🔗 https://www.anaconda.com/products/distribution
2. Anaconda includes Jupyter, Spyder, Python, and many scientific libraries.
Step 2: Launch Jupyter Notebook
If Installed with pip:
1. Open Command Prompt.
2. Type:
jupyter notebook
This will:
Start a local server.
Open your default browser with the Jupyter interface (URL like
http://localhost:8888/tree).
If Using Anaconda:
1. Open Anaconda Navigator (from Start menu).
2. Click "Launch" under Jupyter Notebook.
Step 3: Use Jupyter
Click New → Python 3 Notebook to open a coding workspace.
You can write and run code in separate cells.
Use Shift + Enter to run a cell.
To Stop Jupyter:
In the browser, click File → Close and Halt to stop a notebook.
In Command Prompt, press Ctrl + C twice to stop the Jupyter
server.
Sample Jupyter Notebook Code:
1. Basic Python Exercise
# Simple loop and condition
for i in range(1, 6):
    if i % 2 == 0:
        print(f"{i} is even")
    else:
        print(f"{i} is odd")
2. Using NumPy: Array and Statistics
import numpy as np
# Create a NumPy array
data = np.array([10, 20, 30, 40, 50])
print("Mean:", np.mean(data))
print("Standard Deviation:", np.std(data))
print("Max Value:", np.max(data))
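One detail worth knowing about np.std(): by default (ddof=0) it divides the squared deviations by N, giving the population standard deviation. When the data is a sample from a larger population, ddof=1 (dividing by N - 1) is the usual choice. A short sketch on the same array:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Default ddof=0: divide squared deviations by N (population std)
population_std = np.std(data)
# ddof=1: divide by N - 1 (sample std)
sample_std = np.std(data, ddof=1)

print("Population std:", population_std)   # sqrt(200), about 14.142
print("Sample std:", sample_std)           # sqrt(250), about 15.811
```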
3. Using Matplotlib: Plotting a Line Graph
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 15, 13, 18, 16]
# Plotting the data
plt.plot(x, y, marker='o', linestyle='-', color='blue')
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()
4. Using Pandas: Create and Display DataFrame
import pandas as pd
# Create a simple DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'Score': [85, 90, 78]
}
df = pd.DataFrame(data)
print(df)
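Once a DataFrame exists, a few everyday operations are selecting a column, adding a derived column, sorting, and summarising. The sketch below reuses the same three-row table; the pass mark of 80 is an arbitrary threshold chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'Score': [85, 90, 78]
})

# Select a single column (returns a Series)
print(df['Score'])

# Add a derived column: True when Score is 80 or above (example threshold)
df['Passed'] = df['Score'] >= 80

# Sort rows by Score, highest first
ranked = df.sort_values('Score', ascending=False)
print(ranked)

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
print(df[['Age', 'Score']].describe())
```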
Here are some of the most useful keyboard shortcuts in the IPython shell
(interactive command-line interface, not Jupyter Notebook):
Navigation and Editing
Shortcut | Description
Up Arrow / Down Arrow | Scroll through command history
Ctrl + A | Move cursor to the beginning of the line
Ctrl + E | Move cursor to the end of the line
Ctrl + K | Delete from cursor to end of line
Ctrl + U | Delete from cursor to beginning of line
Ctrl + W | Delete the word before the cursor
Ctrl + L | Clear the screen (like the clear command)
Ctrl + Left/Right Arrow | Move cursor word by word
Esc + B / Esc + F | Move backward/forward by one word
Auto-completion
Shortcut | Description
Tab | Auto-complete variable/function names
Shift + Tab | Show function signature/help (mainly in Jupyter; in the shell, use name? instead)
Ctrl + R | Reverse search through command history
Ctrl + P / Ctrl + N | Previous/next command (like Up/Down Arrow)
Execution and Multiline Input
Shortcut | Description
Enter | Execute command
Shift + Enter | Continue to the next line (for multi-line input)
Ctrl + C | Cancel current command / keyboard interrupt
Ctrl + D | Exit IPython shell (or exit() / quit())
Magic commands in IPython:
Magic commands in IPython are special commands prefixed with one or
two percent signs (% or %%) that provide a range of helpful functionalities
to interact with the IPython system. These commands are designed to
improve productivity and enhance interactivity in the IPython shell and
Jupyter Notebooks.
🔹 Types of Magic Commands
1. Line magics (%): Apply to a single line of code.
2. Cell magics (%%): Apply to the entire cell below the magic.
Commonly Used Magic Commands
Type | Command | Description
Line | %lsmagic | Lists all available magic commands
Line | %time | Times the execution of a single line of code
Cell | %%time | Times the execution of the whole cell
Line | %timeit | Runs a line multiple times and reports average execution time
Cell | %%timeit | Same as above, but for the entire cell
Line | %run script.py | Runs a Python script
Line | %load script.py | Loads a script into the current cell
Line | %matplotlib inline | Displays plots inline (for Jupyter)
Line | %pwd | Prints the current working directory
Line | %cd folder_path | Changes the working directory
Line | %who / %whos | Lists variables in memory
Line | %reset | Clears all variables from memory
Line | %debug | Enters the interactive debugger after an exception
Line | %hist | Displays command history
Line | %alias | Creates shortcuts for system commands
Line | %pip install package | Installs Python packages directly from IPython
Line | %load_ext | Loads an IPython extension
Examples
%matplotlib inline – Plot Graphs Within a Jupyter Notebook
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.title("Simple Plot")
plt.show()
%time sum(range(1000000)) # Times execution of one line
%%time
total = 0
for i in range(1_000_000):
    total += i
print(total)
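Magic commands only work inside IPython or Jupyter. In a plain Python script, the standard-library timeit module offers a rough counterpart to %timeit. This sketch fixes the loop count at 1000 and uses 7 runs (the run count %timeit uses by default), then reports mean ± standard deviation per loop; %timeit itself chooses the loop count automatically, so its numbers will not match exactly.

```python
import statistics
import timeit

loops = 1000
runs = 7

# Rough plain-Python counterpart of:  %timeit sum(range(1000))
timings = timeit.repeat("sum(range(1000))", repeat=runs, number=loops)

# Convert each run's total time into time per single loop
per_loop = [t / loops for t in timings]
mean_us = statistics.mean(per_loop) * 1e6
stdev_us = statistics.stdev(per_loop) * 1e6

print(f"{mean_us:.2f} µs ± {stdev_us:.2f} µs per loop "
      f"(mean ± std. dev. of {runs} runs, {loops} loops each)")
```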
%%writefile – Write Code to a File
%%writefile sample_script.py
def greet(name):
    return f"Hello, {name}!"
print(greet("Ram"))
%run – Run a Python File
%run sample_script.py
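Outside IPython, the same write-then-run workflow can be approximated with the standard library: pathlib to write the file (in place of %%writefile) and runpy.run_path() to execute it (in place of %run). This is a sketch, not an exact equivalent; it writes to a temporary directory so it does not leave files behind in the working directory.

```python
import runpy
import tempfile
from pathlib import Path

script_source = '''def greet(name):
    return f"Hello, {name}!"
print(greet("Ram"))
'''

# Rough equivalent of %%writefile: save the source text to a .py file
script_path = Path(tempfile.mkdtemp()) / "sample_script.py"
script_path.write_text(script_source)

# Rough equivalent of %run: execute the file as the main script
namespace = runpy.run_path(str(script_path), run_name="__main__")  # prints Hello, Ram!

# As with %run, the script's top-level names are available afterwards
print(namespace["greet"]("World"))   # Hello, World!
```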