Pre ML Practise

Uploaded by

bpkdeveloper45

NumPy is a powerful Python library used for numerical computing.

It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these
arrays efficiently. NumPy is widely used in scientific computing, machine learning, data analysis, and other fields where numerical operations
on large datasets are common.

STACKING:
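
The original illustration for this heading is not available, so the following is only a minimal sketch of what stacking NumPy arrays usually looks like. The arrays a and b are made-up examples; the functions used (vstack, hstack, column_stack, stack) are standard NumPy calls.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# vstack places the arrays on top of each other as rows -> shape (2, 3)
rows = np.vstack((a, b))

# hstack joins them end to end -> shape (6,)
flat = np.hstack((a, b))

# column_stack treats each 1-D array as a column -> shape (3, 2)
cols = np.column_stack((a, b))

# stack joins along a new axis; with axis=0 it matches vstack for 1-D inputs
stacked = np.stack((a, b), axis=0)

print(rows.shape, flat.shape, cols.shape, stacked.shape)
```

Which function to use depends on whether the inputs should become rows, columns, or one longer array.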

PANDAS:

Pandas is a popular open-source Python library used for data manipulation and analysis. It provides data structures like DataFrame and Series
that are designed to make working with structured data easy and intuitive.

Key features of pandas include:

1. DataFrame: A two-dimensional, labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL
table.
2. Series: A one-dimensional labeled array capable of holding data of any type.
3. Data manipulation: Pandas provides a wide range of functions for data cleaning, reshaping, merging, slicing, indexing, and more.
4. Data import/export: Supports reading and writing data in various formats like CSV, Excel, SQL databases, and more.
5. Missing data handling: Provides tools for dealing with missing data, such as filling in missing values or dropping rows/columns with
missing data.
6. Time series data: Includes functionalities for working with time series data, such as date range generation, shifting, and frequency
conversion.
7. Powerful indexing: Supports various methods of indexing and selecting data, including label-based indexing with loc, integer-based
indexing with iloc, and boolean indexing.
8. Groupby: Allows splitting data into groups based on some criteria and then applying functions to each group independently.
9. Plotting: Integration with Matplotlib for creating visualizations directly from pandas data structures.

Overall, pandas is widely used in data analysis, data cleaning, data preprocessing, and various other data-related tasks in Python.
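
As a rough illustration of several features listed above (DataFrame, Series, missing data handling, loc/iloc indexing, and groupby), here is a small sketch. The city and temperature values are invented for the example and do not come from any dataset mentioned in this document.

```python
import pandas as pd

# A DataFrame with one missing temperature (illustrative values only)
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "temp_c": [30.0, None, 28.0, 35.0],
})

# A single column of a DataFrame is a Series
temps = df["temp_c"]

# Missing data handling: fill the gap with the column mean (NaN is skipped)
filled = df.fillna({"temp_c": temps.mean()})

# Integer-based indexing with iloc, label/boolean indexing with loc
first_row = filled.iloc[0]
hot_cities = filled.loc[filled["temp_c"] > 30, "city"]

# Groupby: split by city, then average each group
mean_by_city = filled.groupby("city")["temp_c"].mean()

print(filled)
print(mean_by_city)
```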

PIP INSTALL PANDAS

1. Google Dataset Search
Type of data: Miscellaneous
Data compiled by: Google
Access: Free to search, but does include some fee-based search results
Sample dataset: Global price of coffee, 1990-present
2. Kaggle
Type of data: Miscellaneous
Data compiled by: Kaggle
Access: Free, but registration required
Sample dataset: Daily temperature of major cities

3. Data.Gov
Type of data: Government
Data compiled by: US Federal Government
Access: Free, no registration required
Sample dataset: Lobster Report for Transshipment and Sales

4. Datahub.io
Type of data: Mostly business and finance
Data compiled by: Datahub
Access: Mostly free, no registration required
Sample dataset: Average mass of glaciers since 1945

5. UCI Machine Learning Repository

Type of data: Machine learning
Data compiled by: University of California Irvine
Access: Free, no registration required
Sample dataset: Behavior of urban traffic in Sao Paulo, Brazil

6. Earth Data
Type of data: Earth science
Data compiled by: NASA
Access: Free, no registration required
Sample dataset: Environmental conditions during fall moose hunting season in Alaska, 2000-2016

7. CERN Open Data Portal

Type of data: Particle Physics
Data compiled by: CERN
Access: Free, no registration required
Sample dataset: Higgs candidate collision events from 2011 and 2012

8. Global Health Observatory Data Repository

Type of data: Health
Data compiled by: UN World Health Organization
Access: Free, no registration required
Sample dataset: Polio immunization coverage estimates by region

9. BFI film industry statistics
Type of data: Entertainment and film
Data compiled by: British Film Institute
Access: Free, no registration required
Sample dataset: Weekend box office figures from 2001-present

10. NYC Taxi Trip Data

Type of data: Transport
Data compiled by: New York City Taxi and Limousine Commission
Access: Free, no registration required
Sample dataset: Take your pick!

11. FBI Crime Data Explorer

Type of data: Crime and drugs
Data compiled by: Federal Bureau of Investigation
Access: Free, no registration required
Sample dataset: Homicide offense counts in Point Pleasant, 2008-2018

CREATING A DATASET AFTER DOWNLOADING
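
The original screenshots for this step are missing. One common way to turn a downloaded CSV file into a dataset is pandas.read_csv, sketched below. The file contents and column names are hypothetical; an in-memory string stands in for the downloaded file so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for a downloaded file. With a real download, pass its path
# instead, e.g. pd.read_csv("temperatures.csv") (hypothetical file name).
csv_text = """city,date,temp_c
Pune,2024-03-01,30.5
Delhi,2024-03-01,27.0
Pune,2024-03-02,31.0
"""

# parse_dates converts the "date" column from text to datetime values
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df.shape)   # number of rows and columns
print(df.dtypes)  # data type of each column
print(df.head())  # first few rows
```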

OPERATIONS ON DATA FRAME
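
The original examples under this heading are not available. The sketch below shows a few typical DataFrame operations (adding a column, filtering, sorting, merging, and dropping a column). All names and numbers are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kiran"],
    "dept": ["ML", "Web", "ML", "Web"],
    "salary": [50000, 42000, 61000, 45000],
})

# Add a derived column: salary in thousands
df["salary_k"] = df["salary"] / 1000

# Filter rows with a boolean condition
ml_team = df[df["dept"] == "ML"]

# Sort rows by a column, highest salary first
by_salary = df.sort_values("salary", ascending=False)

# Merge with a second table on the shared "dept" column
depts = pd.DataFrame({"dept": ["ML", "Web"], "floor": [3, 1]})
merged = df.merge(depts, on="dept", how="left")

# Drop a column that is no longer needed
trimmed = merged.drop(columns=["salary_k"])

print(trimmed)
```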

MATPLOTLIB:

Matplotlib is a popular Python library for creating static, animated, and interactive visualizations. It provides a wide variety of plots
and charts, including line plots, bar charts, histograms, scatter plots, and more.

Key features of Matplotlib include:

1. Wide range of plots: Matplotlib supports a wide variety of plots and charts, making it suitable for many different types of data visualization
tasks.
2. Customization: Matplotlib allows for extensive customization of plots, including colors, labels, fonts, line styles, and more.
3. Publication-quality output: Matplotlib is designed to produce high-quality plots suitable for publication and presentation.
4. Integration with Jupyter notebooks: Matplotlib integrates well with Jupyter notebooks, allowing for interactive plotting within the
notebook environment.
5. Backend support: Matplotlib supports multiple backends for rendering plots, including rendering to file formats like PNG, PDF, SVG, and
interactive backends for displaying plots in GUI applications.
6. Compatibility: Matplotlib is compatible with a wide range of Python versions and platforms, including Windows, macOS, and Linux.

Overall, Matplotlib is a powerful and flexible library for creating a wide variety of plots and visualizations in Python.
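
To make the customization and file-output features above concrete, here is a minimal sketch of a line plot. The data, labels, and output file name are arbitrary choices for the example; the non-interactive "Agg" backend is selected only so the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render to files only; no GUI window is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots(figsize=(5, 3))

# Customization: color, line style, marker, and legend label
ax.plot(x, y, color="tab:blue", linestyle="--", marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

# Save as PNG; changing the extension to .pdf or .svg selects that format
fig.savefig("line_plot.png", dpi=150)
```

In a Jupyter notebook the backend line can be dropped, and the figure appears below the cell.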

SEABORN:

Seaborn is a Python data visualization library built on top of Matplotlib. It provides a high-level interface for drawing attractive and informative
statistical graphics and integrates closely with pandas data structures, making it particularly useful for
working with data frames and arrays.

Key features of Seaborn include:

1. High-level interface: Seaborn provides a simple and intuitive interface for creating complex visualizations with just a few lines of code.
2. Attractive default styles: Seaborn comes with several built-in themes and color palettes that make it easy to create visually appealing plots.
3. Statistical plotting: Seaborn includes several functions for visualizing statistical relationships in data, such as scatter plots, box plots, violin
plots, and more.
4. Integration with pandas: Seaborn works seamlessly with pandas data frames, making it easy to plot data directly from a data frame.
5. Flexible customization: Seaborn allows for extensive customization of plots, including control over colors, styles, and other visual
properties.
6. Wide range of plots: Seaborn supports a wide range of plot types, including heatmaps, pair plots, joint plots, and more.
7. Works well with Jupyter notebooks: Seaborn integrates well with Jupyter notebooks, allowing for interactive plotting and data exploration.

Overall, Seaborn is a powerful and versatile library for creating informative and visually appealing plots in Python.
https://seaborn.pydata.org/
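
A short sketch of the pandas integration and built-in themes described above follows. The small DataFrame is invented for illustration, and it assumes a reasonably recent Seaborn release that provides set_theme.

```python
import matplotlib

matplotlib.use("Agg")  # render to files only; no GUI window is needed
import pandas as pd
import seaborn as sns

# Illustrative data only
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 70, 78, 85],
    "group": ["A", "A", "A", "B", "B", "B"],
})

# Apply one of the built-in themes
sns.set_theme(style="whitegrid")

# Plot straight from the DataFrame; column names become the axis labels,
# and "hue" colors the points by group
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
ax.set_title("Score vs. hours studied")

ax.figure.savefig("scatter.png")
```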

Scikit-learn is a popular open-source machine learning library for Python. It provides simple and efficient tools for data mining and data analysis,
built on NumPy, SciPy, and Matplotlib.

Key features of scikit-learn include:

1. Consistent interface: Scikit-learn provides a consistent API for various machine learning algorithms, making it easy to experiment with
different models.
2. Supervised and unsupervised learning algorithms: Scikit-learn includes a wide range of supervised and unsupervised learning algorithms,
including support for classification, regression, clustering, dimensionality reduction, and more.
3. Easy to use: Scikit-learn is designed to be easy to use, with a focus on simplicity and readability of code.
4. Integration with other Python libraries: Scikit-learn integrates well with other Python libraries, such as NumPy, pandas, and Matplotlib,
making it easy to use in conjunction with these libraries.
5. Model evaluation and validation: Scikit-learn provides tools for model evaluation and validation, including functions for cross-validation,
grid search, and performance metrics.
6. Community and support: Scikit-learn has a large and active community of users and developers, providing support and contributing to the
development of the library.

Overall, scikit-learn is a powerful and versatile library for machine learning in Python, suitable for both beginners and experts alike.
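
The consistent fit/score interface and the evaluation tools mentioned above can be sketched as follows. The choice of the bundled iris dataset, logistic regression, and the split settings are assumptions made for the example, not recommendations from the original text.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Iris ships with scikit-learn, so no download is needed
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every estimator follows the same pattern: construct, fit, then predict/score
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# 5-fold cross-validation on the full dataset
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(f"test accuracy: {test_accuracy:.2f}")
print(f"mean CV accuracy: {cv_scores.mean():.2f}")
```

Swapping in another classifier usually means changing only the line that constructs the model.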

Statistics play a crucial role in machine learning, as they provide the foundation for many machine learning algorithms and techniques. Here are
some key statistics concepts that are important for machine learning:

1. Descriptive statistics: Descriptive statistics are used to summarize and describe the main features of a dataset. This includes measures such
as mean, median, mode, variance, standard deviation, and percentiles.
2. Probability distributions: A probability distribution describes how likely each possible value of a random variable is. Common
distributions used in machine learning include the normal, binomial, and Poisson distributions.
3. Statistical inference: Statistical inference involves drawing conclusions about a population based on a sample of data. This includes
hypothesis testing and confidence intervals.
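4. Correlation and covariance: Covariance measures the extent to which two variables change together, while correlation is covariance
rescaled to the range -1 to 1, so it captures the strength and direction of a linear relationship independent of the variables' units.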
5. Regression analysis: Regression analysis is used to model the relationship between a dependent variable and one or more independent
variables. It is commonly used for prediction in machine learning.
6. Classification: Classification is a type of supervised learning where the goal is to predict the class label of new observations based on past
observations. Statistics provides the theoretical foundation for many classification algorithms, such as logistic regression and decision
trees.
7. Clustering: Clustering is an unsupervised learning technique where the goal is to group similar data points together. Statistics provides
distance metrics for measuring the similarity between data points, which clustering algorithms such as k-means rely on.
8. Dimensionality reduction: Dimensionality reduction techniques, such as principal component analysis (PCA) and t-distributed stochastic
neighbor embedding (t-SNE), are used to reduce the number of variables in a dataset while preserving important information. Statistics
provides the theoretical basis for these techniques.

Understanding these statistics concepts is essential for effectively applying machine learning algorithms and interpreting their results.
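
As a small worked example of the descriptive statistics, covariance, and correlation described above, the sketch below uses NumPy on made-up numbers. Note that x.var() and x.std() compute the population versions by default, while np.cov uses the sample version (dividing by n - 1).

```python
from statistics import mode

import numpy as np

# Illustrative data; y is an exact linear function of x
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
y = 2.0 * x + 1.0

mean = x.mean()                  # 5.0
median = np.median(x)            # 4.5
most_common = mode(x.tolist())   # 4.0
variance = x.var()               # population variance: 4.0
std = x.std()                    # population standard deviation: 2.0
p90 = np.percentile(x, 90)       # 90th percentile

# Sample covariance between x and y (off-diagonal entry of the matrix)
cov_xy = np.cov(x, y)[0, 1]

# Correlation is 1.0 because y increases linearly with x
corr_xy = np.corrcoef(x, y)[0, 1]

print(mean, median, most_common, variance, std, p90, cov_xy, corr_xy)
```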

Linear algebra is a fundamental mathematical tool in machine learning, as it provides the basis for many machine learning algorithms and
concepts. Here are some key linear algebra concepts that are important for machine learning:

1. Vectors and matrices: Vectors and matrices are used to represent data in machine learning. A vector is a one-dimensional array of
numbers, while a matrix is a two-dimensional array. Vectors and matrices are used to represent features, labels, and parameters in machine
learning models.
2. Matrix operations: Linear algebra provides several operations for manipulating matrices, such as addition, subtraction, multiplication, and
transposition. These operations are used in various machine learning algorithms for tasks like data transformation, model training, and
prediction.
3. Dot product and matrix multiplication: The dot product of two vectors and the matrix multiplication of two matrices are important
operations in linear algebra. They are used in machine learning for computing similarities between vectors, transforming data, and
updating model parameters during training.
4. Eigenvalues and eigenvectors: Eigenvalues and eigenvectors are important concepts in linear algebra that are used in machine learning for
dimensionality reduction and feature extraction (for example, PCA uses the eigenvectors of the data's covariance matrix).
5. Singular value decomposition (SVD): SVD is a matrix factorization technique that is used in machine learning for dimensionality reduction,
data compression, and noise reduction.
6. Norms: Norms are used to measure the magnitude of vectors and matrices. Different norms, such as the L1 norm, L2 norm, and Frobenius
norm, are used in machine learning for regularization, error calculation, and model evaluation.
7. Linear transformations: Linear transformations are used to transform data in machine learning. They are used in algorithms like principal
component analysis (PCA) and linear regression.
8. Vector spaces and subspaces: A vector space is a set of vectors closed under addition and scalar multiplication. Subspaces such as the
column space and null space of a matrix describe which outputs a linear model can and cannot produce.

Understanding these linear algebra concepts is essential for effectively implementing and understanding many machine learning algorithms.
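
Several of the operations listed above (dot product, matrix multiplication, transpose, norms, eigenvalues, and SVD) can be tried directly in NumPy. The vectors and matrix below are arbitrary small examples chosen so the results are easy to check by hand.

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 2.0])
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

dot_uv = u @ v                    # 3*1 + 4*2 = 11
Av = A @ v                        # matrix-vector product: [2, 6]
At = A.T                          # transpose

l1 = np.linalg.norm(u, 1)         # L1 norm: |3| + |4| = 7
l2 = np.linalg.norm(u)            # L2 norm: sqrt(9 + 16) = 5
fro = np.linalg.norm(A, "fro")    # Frobenius norm: sqrt(4 + 9)

# Eigenvalues of a diagonal matrix are its diagonal entries
eigvals, eigvecs = np.linalg.eig(A)

# SVD factors A into U, singular values S (largest first), and Vt
U, S, Vt = np.linalg.svd(A)
reconstructed = U @ np.diag(S) @ Vt

print(dot_uv, Av, l1, l2, fro, eigvals, S)
```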

Probability is a fundamental concept in machine learning, as it provides a framework for reasoning about uncertainty and making predictions
based on data. Here are some key probability concepts that are important for machine learning:

1. Probability distributions: A probability distribution describes how likely each possible value of a random variable is. Common
distributions used in machine learning include the normal, binomial, and Poisson distributions.
2. Conditional probability: Conditional probability measures the probability of an event occurring given that another event has already
occurred. It is used in machine learning for modeling dependencies between variables.
3. Bayes' theorem: Bayes' theorem is a fundamental theorem in probability theory that describes how to update the probability of a
hypothesis based on new evidence. It is used in machine learning for Bayesian inference and probabilistic modeling.
4. Expectation and variance: Expectation is a measure of the central tendency of a random variable, while variance is a measure of its spread.
These concepts are used in machine learning for model evaluation and optimization.
5. Joint, marginal, and conditional probability distributions: Joint probability distributions describe the probabilities of multiple events
occurring together, while marginal probability distributions describe the probabilities of individual events. Conditional probability
distributions describe the probabilities of events given certain conditions. These concepts are used in machine learning for modeling
complex dependencies between variables.
6. Maximum likelihood estimation (MLE): MLE is a method for estimating the parameters of a probability distribution based on observed
data. It is used in machine learning for fitting probabilistic models to data.
7. Naive Bayes classifier: The Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive)
independence assumptions between the features. It is commonly used in machine learning for text classification and spam filtering.
8. Probabilistic graphical models: Probabilistic graphical models are a framework for modeling complex probabilistic relationships between
variables. They are used in machine learning for representing and reasoning about uncertainty in data.

Understanding these probability concepts is essential for effectively applying probabilistic models and reasoning in machine learning.
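
Bayes' theorem and maximum likelihood estimation from the list above can be worked through with plain arithmetic. All probabilities and the coin-flip data below are hypothetical numbers chosen only to make the calculation easy to follow.

```python
# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam = 0.2              # prior: assumed share of emails that are spam
p_word_given_spam = 0.6   # assumed chance a spam email contains the word
p_word_given_ham = 0.05   # assumed chance a non-spam email contains it

# Marginal probability of seeing the word at all (law of total probability)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior: probability the email is spam once the word has been seen
p_spam_given_word = p_word_given_spam * p_spam / p_word

# MLE for a Bernoulli parameter: the estimate that maximizes the likelihood
# of the observed 0/1 outcomes is simply their average
flips = [1, 0, 1, 1, 0, 1, 1, 1]
p_hat = sum(flips) / len(flips)

print(f"P(word) = {p_word:.2f}")
print(f"P(spam | word) = {p_spam_given_word:.2f}")
print(f"MLE of p = {p_hat:.2f}")
```

With these numbers, seeing the word raises the probability of spam from the 0.2 prior to about 0.75.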

FINALLY, INSTALL ANACONDA, AND FROM IT INSTALL JUPYTER NOTEBOOK:

Now start working on your project

…
Filename: Document1
Directory:
Template: Normal.dotm
Title:
Subject:
Author: sandanakari sachin
Keywords:
Comments:
Creation Date: 3/13/2024 7:00:00 PM
Change Number: 1
Last Saved On:
Last Saved By:
Total Editing Time: 1,399 Minutes
Last Printed On: 3/14/2024 6:29:00 PM
As of Last Complete Printing
Number of Pages: 13
Number of Words: 2,550 (approx.)
Number of Characters: 14,538 (approx.)
