
Datascience Notes Unit-1

Uploaded by

wowivi2661


Data Science:

• Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data.

Examples of where data science is needed:

• Route planning: to discover the best shipping routes
• To foresee delays for flights, ships, trains, etc. (through predictive analysis)
• To create promotional offers
• To find the best time to deliver goods
• To forecast a company's revenue for the next year
• To analyze the health benefits of training
• To predict who will win elections

What Data Science Entails:

• Data Collection: Gathering raw data from various sources, including databases, sensors, and user interactions.
• Data Cleaning: Ensuring the data is accurate, complete, and ready for analysis.
• Data Analysis: Applying statistical and computational methods to identify patterns, trends, and relationships in the data.
• Data Visualization: Creating visual representations of the data, like charts and graphs, to communicate findings effectively.
• Data Interpretation: Drawing meaningful conclusions and insights from the analyzed data.
• Decision Making: Using these insights to inform strategies, solve problems, or predict future outcomes.
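The collection, cleaning, analysis, and interpretation steps above can be sketched end to end with pandas. This is a minimal illustration only, assuming a small hypothetical table of delivery records (the column names `region` and `delay_hours` are made up); real projects would collect data from files, databases, or APIs.

```python
import pandas as pd

# 1. Data collection: a small in-memory table stands in for a real data source
raw = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", None],
    "delay_hours": [2.0, 5.0, None, 1.0, 3.0, 4.0],
})

# 2. Data cleaning: drop rows that have any missing value
clean = raw.dropna()

# 3. Data analysis: average delay per region
avg_delay = clean.groupby("region")["delay_hours"].mean()

# 4. Data interpretation: which region has the highest average delay?
worst_region = avg_delay.idxmax()
print(avg_delay)
print("Region with the longest average delay:", worst_region)
```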

Applications of Data Science:

• Healthcare: Predicting diseases, patient monitoring
• Finance: Fraud detection, risk assessment
• Marketing: Customer segmentation, personalized recommendations
• Retail: Inventory management, demand forecasting
• Social Media: Sentiment analysis, trend prediction

Why Python:
Python is one of the most widely used languages in data science, and for good reason. Here's why it is so widely preferred:

1. Easy to Learn and Use

• Python has simple, readable syntax that resembles plain English.
• Even beginners can quickly start writing and understanding code.

2. Rich Libraries for Data Science

Python has powerful libraries that make data science easier:
• NumPy – for numerical operations
• Pandas – for data manipulation and analysis
• Matplotlib / Seaborn / Plotly – for data visualization
• Scikit-learn – for machine learning
• TensorFlow / PyTorch – for deep learning

3. Huge Community Support

• Large global community for help and support.
• Plenty of tutorials, courses, and documentation are available.

4. Integration with Other Tools

• Works well with databases, web apps, APIs, and big data tools (like Spark).
• Integrates easily with Jupyter notebooks, which are used for interactive analysis.

5. Ideal for Prototyping

• You can build, test, and deploy a data model quickly.
• Well suited to experimenting with ideas and models.

6. Industry Standard
• Used by top companies like Google, Netflix, Facebook, and Amazon.
• Many data science jobs list Python as a core skill.

The fundamental Python libraries that every data scientist should know and use:
1. NumPy (Numerical Python)
• Core library for numerical computations.
• Supports multi-dimensional arrays and matrix operations.
• Functions: mean(), std(), dot(), etc.
Install: pip install numpy
Example:
import numpy as np
arr = np.array([1, 2, 3])
print(np.mean(arr))
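The bullets above also mention multi-dimensional arrays and `dot()`. A short sketch of both, using two small made-up matrices; `np.dot` on 2-D arrays performs ordinary matrix multiplication.

```python
import numpy as np

# A 2x3 matrix (2 rows, 3 columns) and a 3x2 matrix
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[1, 0],
              [0, 1],
              [1, 1]])

print(a.shape)       # (2, 3)
print(np.std(a[0]))  # standard deviation of the first row

# Matrix multiplication: (2x3) . (3x2) -> (2x2)
product = np.dot(a, b)
print(product)
```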

2. Pandas (Panel Data)

• Powerful for data manipulation and analysis.
• Uses DataFrames (tables similar to Excel spreadsheets).
• Ideal for cleaning, filtering, merging, and grouping data.
Install: pip install pandas
Example:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())
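The example above assumes a `data.csv` file already exists. Below is a self-contained sketch of filtering, merging, and grouping; the two tables (`sales` and `stores`) and their values are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    "store_id": [1, 2, 1, 3, 2],
    "amount": [100, 250, 80, 40, 300],
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3],
    "city": ["Pune", "Delhi", "Pune"],
})

# Filtering: keep only sales above 50
big_sales = sales[sales["amount"] > 50]

# Merging: attach each store's city (similar to a SQL join on store_id)
merged = big_sales.merge(stores, on="store_id", how="left")

# Grouping: total sales amount per city
totals = merged.groupby("city")["amount"].sum()
print(totals)
```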

3. Matplotlib
• Basic plotting library for creating charts and graphs.
• Useful for line plots, histograms, bar charts, etc.
Install: pip install matplotlib
Example:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
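The bullets also mention histograms and bar charts. A sketch that draws both side by side in one figure using `plt.subplots`; the data values are made up for illustration.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
counts = [5, 3, 7]
scores = [55, 60, 62, 70, 71, 75, 80, 85, 90, 95]

# One figure with two side-by-side axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(categories, counts, color="steelblue")  # bar chart: one bar per category
ax1.set_title("Bar chart")

ax2.hist(scores, bins=5, color="orange")        # histogram with 5 equal-width bins
ax2.set_title("Histogram")

plt.tight_layout()
plt.show()
```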

4. Seaborn
• Built on Matplotlib, but with simpler syntax and better visuals.
• Great for statistical plots like boxplots, heatmaps, and violin plots.
Install: pip install seaborn
Example:
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'category': ['A', 'A', 'B', 'B'], 'value': [3, 5, 2, 8]})
sns.boxplot(data=df, x='category', y='value')

5. Scikit-learn
• Main library for machine learning.
• Includes algorithms for classification, regression, clustering, and model evaluation.
Install: pip install scikit-learn
Example:
from sklearn.linear_model import LinearRegression
X_train = [[1], [2], [3], [4]]  # sample data: one feature per row
y_train = [2, 4, 6, 8]
model = LinearRegression()
model.fit(X_train, y_train)
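Scikit-learn also covers model evaluation. A minimal sketch, assuming synthetic noise-free data where y = 2x + 1, so the fitted line should recover those coefficients; `train_test_split` holds some rows back, and `score()` reports the R² value on those unseen rows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y = 2x + 1 (no noise), x from 0 to 19
X = np.arange(20).reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = 2 * X.ravel() + 1

# Hold back 25% of the rows to evaluate the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
print("R^2 on test data:", model.score(X_test, y_test))
print("Prediction for x=30:", model.predict([[30]])[0])
```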

6. TensorFlow / PyTorch
• Used for deep learning and neural networks.
• TensorFlow was developed by Google; PyTorch by Facebook (now Meta).
Install:
• TensorFlow: pip install tensorflow
• PyTorch: pip install torch

Integrated development environment (IDE):

• An integrated development environment (IDE) is a software application that helps programmers develop software code efficiently.
• It increases developer productivity by combining capabilities such as code editing, building, testing, and packaging in one easy-to-use application.
• Just as writers use text editors and accountants use spreadsheets, software developers use IDEs to make their job easier.
• An IDE provides a central interface for common developer tools, making the software development process much more efficient.
• Developers can start programming new applications quickly instead of manually integrating and configuring separate pieces of software.
• They also don't have to learn every tool separately and can instead focus on just one application.
• The following are some reasons why developers use IDEs:

Code editing automation

• Programming languages have rules for how statements must be structured. Because an IDE knows these rules, it contains many intelligent features for automatically writing or editing source code.

Syntax highlighting
• An IDE can format the written text by automatically making some words bold or italic, or by using different font colors. These visual cues make the source code more readable and give instant feedback about accidental syntax errors.

Intelligent code completion

• When you start typing in a search engine, suggested search terms appear. Similarly, an IDE can suggest ways to complete a code statement as soon as the developer begins typing.

Refactoring support
• Code refactoring is the process of restructuring source code to make it more efficient and readable without changing its core functionality. IDEs can auto-refactor to some extent, allowing developers to improve their code quickly and easily. Other team members understand readable code faster, which supports collaboration within the team.
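A small illustration of what refactoring means, using made-up functions: both versions compute the same result, but the second extracts a helper (`average`, a hypothetical name) and uses descriptive names. IDE refactoring tools automate steps like renaming variables and extracting functions.

```python
# Before refactoring: unclear names and repeated logic
def f(a, b):
    x = sum(a) / len(a)
    y = sum(b) / len(b)
    return x - y


# After refactoring: same behaviour, clearer structure
def average(values):
    """Return the arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def difference_of_means(first, second):
    """Return mean(first) - mean(second)."""
    return average(first) - average(second)


# Both versions give the same answer
print(f([1, 2, 3], [4, 5]))
print(difference_of_means([1, 2, 3], [4, 5]))
```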

Local build automation
• IDEs increase programmer productivity by performing repeatable development tasks that are typically part of every code change. The following are some examples of regular coding tasks that an IDE carries out.

Compilation
• An IDE can run the compiler, which converts source code into a lower-level form, such as machine code or bytecode, that the computer can execute. Some programming languages use just-in-time (JIT) compilation, in which the language runtime converts code into machine code while the program is running.

Testing
• The IDE allows developers to automate unit tests locally before the software is integrated with other developers' code and more complex integration tests are run.
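As a sketch of the kind of local unit test an IDE can run, here is a test written with Python's built-in `unittest` module; the `is_even` function is a hypothetical example. IDEs such as PyCharm and VS Code can discover and run `unittest` tests like these from their testing panels.

```python
import unittest


def is_even(n):
    """Return True if n is divisible by 2."""
    return n % 2 == 0


class TestIsEven(unittest.TestCase):
    def test_even_numbers(self):
        self.assertTrue(is_even(4))
        self.assertTrue(is_even(0))

    def test_odd_numbers(self):
        self.assertFalse(is_even(7))


if __name__ == "__main__":
    # exit=False keeps the interpreter running afterwards (handy in notebooks)
    unittest.main(argv=["ignored"], exit=False)
```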

Debugging
• Debugging is the process of fixing any errors or bugs that testing reveals. One of the biggest benefits of an IDE for debugging is that you can step through the code line by line as it runs and inspect its behaviour. IDEs also integrate several debugging tools that highlight bugs caused by human error in real time, even as the developer is typing.

Features of a Good IDE:

• Syntax highlighting and auto-completion
• Error detection while typing
• Version control integration
• Code suggestions and refactoring tools
• Multi-language support

Examples of Popular IDEs:

IDE | Primary Language(s) | Best For
PyCharm | Python | Data science, web apps (Django)
Visual Studio | C#, C++, Python, more | Desktop apps, enterprise development
Eclipse | Java, C++, Python | Enterprise software, Android apps
Visual Studio Code (VS Code) | Many (with extensions) | Lightweight, cross-platform coding
NetBeans | Java, PHP, C++ | Java development
Xcode | Swift, Objective-C | macOS and iOS app development

Advantages of an IDE:
• Boosts productivity
• Reduces errors and bugs
• Provides a clean, organized workflow
• Enhances learning and exploration with built-in tools

IPython:
IPython (short for Interactive Python) is an enhanced interactive
shell that provides a rich toolkit for interactive computing in
Python. It’s widely used by data scientists, researchers, and
developers for its powerful features that go far beyond the
default Python shell.
IPython is:
• An interactive command-line interface for Python.
• A powerful tool for exploratory programming, data analysis, and debugging.
• The default Python kernel behind the Jupyter Notebook interface.

Features of IPython:

• Offers a powerful interactive Python shell.
• Acts as the main kernel for Jupyter Notebook and other front-end tools of Project Jupyter.
• Supports object introspection, the ability to check the properties of an object at runtime.
• Syntax highlighting.
• Stores the history of interactions.
• Tab completion of keywords, variables, and function names.
• A magic command system for controlling the Python environment and performing OS tasks.
• Can be embedded in other Python programs.
• Provides access to the Python debugger.

Installing the IPython Package on Windows using pip:

To install IPython with pip, open the Command Prompt and run:

pip install ipython

pip prints a success message once the installation is complete.

Verifying the IPython Package Installation on Windows using pip:

To verify that the IPython package has been installed successfully, run the following command in the Command Prompt:

python -m pip show ipython

If the installation succeeded, this prints the package details, such as its name, version, and install location.

Starting IPython from Command Prompt

To start IPython, type ipython in the Command Prompt and press Enter.

Before going further, note that instead of the regular >>> prompt, IPython shows two kinds of prompts:

• In [1]: appears before each input expression.
• Out[1]: appears before the corresponding output.

The numbers in the square brackets increase automatically with each new input.

Example Commands:

In [1]: x = 42

In [2]: x?
# Shows info about the variable

In [3]: %timeit sum(range(1000))
# Measures how fast the code runs

In [4]: !dir
# Lists files in the current directory (Windows)
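The `x?` command above relies on introspection. Plain Python offers similar introspection through built-ins such as `type()`, `dir()`, and `hasattr()` and the standard `inspect` module; IPython's `?` presents this kind of information more conveniently. A short sketch, using a made-up function `greet`:

```python
import inspect


def greet(name, greeting="Hello"):
    """Return a greeting for the given name."""
    return f"{greeting}, {name}!"


x = 42
print(type(x))                   # <class 'int'>
print(hasattr(x, "bit_length"))  # True: ints have a bit_length() method
print(inspect.signature(greet))  # (name, greeting='Hello')
print(inspect.getdoc(greet))     # Return a greeting for the given name.

# Public attributes and methods of an int (the exact list varies by Python version)
print([m for m in dir(x) if not m.startswith("_")])
```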

How to Launch Jupyter Notebook (on Windows)

Jupyter Notebook is a powerful web-based tool for writing and running Python code. Here's how to launch it step by step on Windows:

Step 1: Install Jupyter Notebook

Option A: Using pip (if you already have Python)

1. Open Command Prompt.

2. Run:

pip install notebook

Option B: Using Anaconda (Recommended for Data Science)

1. Download and install Anaconda:
🔗 https://www.anaconda.com/products/distribution

Anaconda includes Jupyter, Spyder, Python, and many scientific libraries.

Step 2: Launch Jupyter Notebook

If Installed with pip:

1. Open Command Prompt.

2. Type:

jupyter notebook

This will:

• Start a local server.
• Open your default browser with the Jupyter interface (at a URL like http://localhost:8888/tree).

If Using Anaconda:

1. Open Anaconda Navigator (from Start menu).

2. Click "Launch" under Jupyter Notebook.

Step 3: Use Jupyter

• Click New → Python 3 Notebook to open a coding workspace.
• Write and run code in separate cells.
• Press Shift + Enter to run a cell.

To Stop Jupyter:

• In the browser, click File → Close and Halt to stop a notebook.
• In the Command Prompt, press Ctrl + C twice to stop the Jupyter server.

Sample Jupyter Notebook Code:

1. Basic Python Exercise

# Simple loop and condition
for i in range(1, 6):
    if i % 2 == 0:
        print(f"{i} is even")
    else:
        print(f"{i} is odd")

2. Using NumPy: Array and Statistics

import numpy as np

# Create a NumPy array
data = np.array([10, 20, 30, 40, 50])

print("Mean:", np.mean(data))
print("Standard Deviation:", np.std(data))
print("Max Value:", np.max(data))

3. Using Matplotlib: Plotting a Line Graph

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 15, 13, 18, 16]

# Plotting the data
plt.plot(x, y, marker='o', linestyle='-', color='blue')
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()

4. Using Pandas: Create and Display DataFrame

import pandas as pd

# Create a simple DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'Score': [85, 90, 78]
}
df = pd.DataFrame(data)
print(df)

Here are some of the most useful keyboard shortcuts in the IPython shell (the interactive command-line interface, not Jupyter Notebook):

Navigation and Editing

Shortcut | Description
Up Arrow / Down Arrow | Scroll through command history
Ctrl + A | Move cursor to the beginning of the line
Ctrl + E | Move cursor to the end of the line
Ctrl + K | Delete from cursor to end of line
Ctrl + U | Delete from cursor to beginning of line
Ctrl + W | Delete the word before the cursor
Ctrl + L | Clear the screen (like the clear command)
Ctrl + Left/Right Arrow | Move cursor word by word
Esc + B / Esc + F | Move backward/forward by one word

Auto-completion

Shortcut | Description
Tab | Auto-complete variable/function names
Shift + Tab | Show function signature/help (may not work in every environment)
Ctrl + R | Reverse search through command history
Ctrl + P / Ctrl + N | Previous/next command (like Up/Down Arrow)

Execution and Multiline Input

Shortcut | Description
Enter | Execute command
Shift + Enter | Continue to the next line (for multi-line input)
Ctrl + C | Cancel current command / keyboard interrupt
Ctrl + D | Exit IPython shell (or type exit() / quit())

Magic commands in IPython:

Magic commands in IPython are special commands prefixed with one or two percent signs (% or %%) that provide a range of helpful functions for interacting with the IPython system. These commands are designed to improve productivity and interactivity in the IPython shell and Jupyter Notebooks.

🔹 Types of Magic Commands

1. Line magics (%): Apply to a single line of code.

2. Cell magics (%%): Apply to the entire cell below the magic.

Commonly Used Magic Commands

Type | Command | Description
Line | %lsmagic | Lists all available magic commands
Line | %time | Times the execution of a single line of code
Cell | %%time | Times the execution of the whole cell
Line | %timeit | Runs a line multiple times and reports the average execution time
Cell | %%timeit | Same as above, but for the entire cell
Line | %run script.py | Runs a Python script
Line | %load script.py | Loads a script into the current cell
Line | %matplotlib inline | Displays plots inline (for Jupyter)
Line | %pwd | Prints the current working directory
Line | %cd folder_path | Changes the working directory
Line | %who / %whos | Lists variables in memory
Line | %reset | Clears all variables from memory
Line | %debug | Enters the interactive debugger after an exception
Line | %hist | Displays command history
Line | %alias | Creates shortcuts for system commands
Line | %pip install package | Installs Python packages directly from IPython
Line | %load_ext | Loads an IPython extension
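Outside IPython, the standard-library `timeit` and `time` modules give roughly what `%timeit` and `%time` do. A sketch; the repeat count of 1000 is an arbitrary choice here, whereas `%timeit` picks its own loop count automatically.

```python
import time
import timeit

# Roughly what %timeit does: run the statement many times and report the average
runs = 1000
total_seconds = timeit.timeit("sum(range(1000))", number=runs)
print(f"Average per run: {total_seconds / runs * 1e6:.2f} µs")

# Roughly what %time does: time a single execution
start = time.perf_counter()
result = sum(range(1_000_000))
elapsed = time.perf_counter() - start
print(f"Result {result} computed in {elapsed:.4f} s")
```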

Examples

• %matplotlib inline: for plotting graphs within a Jupyter notebook

%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.title("Simple Plot")
plt.show()

• %time: times the execution of one line

%time sum(range(1000000))

• %%time: times the execution of the whole cell (it must be the first line of the cell)

%%time
total = 0
for i in range(1_000_000):
    total += i
print(total)

• %%writefile: writes the cell's code to a file

%%writefile sample_script.py
def greet(name):
    return f"Hello, {name}!"

print(greet("Ram"))

• %run: runs a Python file

%run sample_script.py
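In a plain Python script, `%%writefile` and `%run` can be approximated with `pathlib` and the standard `runpy` module. A sketch that writes the same `sample_script.py` into a temporary directory (the temporary directory is an assumption added so the example leaves no files behind):

```python
import runpy
import tempfile
from pathlib import Path

script = '''def greet(name):
    return f"Hello, {name}!"

print(greet("Ram"))
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample_script.py"
    path.write_text(script)                # like %%writefile sample_script.py
    namespace = runpy.run_path(str(path))  # like %run sample_script.py

# run_path returns the script's global variables, so greet() is still usable
print(namespace["greet"]("Sita"))
```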
