PANDAS
Python’s library for data analysis
Name derived from "panel data", an econometrics term for multidimensional, structured data sets
Data analysis- the process of evaluating large data sets using analytical and statistical tools
Main author of pandas- Wes McKinney
Why pandas
Can read and write many different data formats (CSV, Excel, JSON, SQL etc.) and handle many data types (integer, float, string etc.)
Can perform calculations in every way the data is organized, i.e. across rows and down columns
Can easily select subsets of data from bulky data sets and even combine multiple data sets together.
Allows you to apply operations to independent groups within data
Supports reshaping of data
Supports advanced time-series functionality
Supports visualization
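A few of the points above (selecting subsets, combining data sets, group-wise operations, reshaping) can be sketched as below. This is only a minimal illustration; the tables sales and managers and all their values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
sales = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
    "year": [2022, 2022, 2023, 2023],
    "units": [10, 20, 30, 40],
})
managers = pd.DataFrame({
    "city": ["Delhi", "Mumbai"],
    "manager": ["Asha", "Ravi"],
})

# Select a subset of rows with a boolean condition.
delhi = sales[sales["city"] == "Delhi"]

# Combine two data sets on a shared column.
combined = sales.merge(managers, on="city")

# Apply an operation to independent groups within the data.
units_per_city = sales.groupby("city")["units"].sum()

# Reshape: one row per city, one column per year.
wide = sales.pivot(index="city", columns="year", values="units")

print(delhi)
print(combined)
print(units_per_city)
print(wide)
```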
DATA STRUCTURE- a specific way of storing and organizing data in a computer to suit a specific purpose, so that it can be accessed easily.
SERIES- 1-dimensional data structure of pandas
It has 2 main components- a) an array of actual data, b) an associated array of indexes (data labels)
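The two components of a Series can be inspected directly. A small sketch, assuming made-up marks and student names as labels:

```python
import pandas as pd

# Hypothetical marks; the labels are invented for this example.
marks = pd.Series([78, 91, 65], index=["amit", "bina", "chetan"])

data_part = marks.values    # the array of actual data (a NumPy ndarray)
index_part = marks.index    # the associated array of index labels

print(data_part)
print(index_part)
print(marks["bina"])        # access by label
print(marks.iloc[1])        # access by position
```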
DATAFRAME- 2-dimensional structure of pandas
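A DataFrame can be pictured as rows and columns, each with its own labels. A minimal sketch, with invented student records and row labels r1, r2:

```python
import pandas as pd

# Hypothetical student records, invented for illustration.
students = pd.DataFrame(
    {"name": ["Amit", "Bina"], "marks": [78.5, 91.0], "passed": [True, True]},
    index=["r1", "r2"],
)

print(students.shape)                # (number of rows, number of columns)
print(students.dtypes)               # each column keeps its own data type
print(students.loc["r2", "marks"])   # look up by row label and column label
```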
SERIES V/S DATAFRAME
SNO.  SERIES            DATAFRAME
1     1D                2D
2     HOMOGENEOUS       HETEROGENEOUS
3     VALUE MUTABLE     VALUE MUTABLE
4     SIZE IMMUTABLE    SIZE MUTABLE
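The sketch below illustrates two rows of this comparison: values can be changed in place in both structures, and a DataFrame can grow by gaining a column of a different type. The variable names and values are made up for the example.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
s["b"] = 20                  # value mutable: an element is changed in place

df = pd.DataFrame({"x": [1, 2, 3]})
df.loc[0, "x"] = 100         # value mutable as well
df["y"] = ["p", "q", "r"]    # size mutable: a new column is added, and it
                             # holds a different type than column "x"

print(s)
print(df)
print(df.dtypes)
```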
SERIES V/S LISTS
SNO.  SERIES                                LISTS
1     1D                                    CAN BE 1D AND MULTI-D BOTH
2     CAN HAVE NUMERIC AND LABEL INDEXES    ONLY NUMERIC INDEXES
3     SUPPORTS EXPLICIT INDEXING            ONLY SUPPORTS IMPLICIT INDEXING
4     INDEXES CAN BE DUPLICATED             INDEXES CANNOT BE DUPLICATED
5     HOMOGENEOUS ELEMENTS                  HETEROGENEOUS ELEMENTS
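A short sketch of these differences, using made-up values. Note that when a Series is given mixed elements, it stores all of them under one common dtype (object) rather than keeping a separate type per element.

```python
import pandas as pd

values = [10, 20, 30]
as_series = pd.Series(values, index=["x", "y", "x"])  # duplicate label "x" is allowed

print(values[1])          # list: implicit positional index only
print(as_series["y"])     # Series: explicit label index
print(as_series["x"])     # a duplicated label returns every matching element

mixed_list = [1, "two", 3.0]           # a list keeps each element's own type
mixed_series = pd.Series(mixed_list)   # a Series stores them under one dtype
print(mixed_series.dtype)
```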
SERIES V/S DICTIONARY
SNO.  SERIES                                 DICTIONARY
1     1D                                     1D, AND MULTI-D WITH NESTED DICTIONARIES
2     VALUES STORED AGAINST INDEX LABELS     VALUES STORED AGAINST KEYS
3     INDEXES CAN BE NUMBERS AND LABELS      KEYS MUST BE OF IMMUTABLE TYPES
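The link between the two can be seen by building a Series from a dictionary: the keys become index labels and the values become the elements. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical figures, invented for this example.
population = {"delhi": 32, "mumbai": 21, "pune": 7}
s = pd.Series(population)    # keys become index labels, values become elements

print(s["mumbai"])           # label lookup, like a dictionary
print(s.iloc[0])             # positional lookup, which a dictionary does not offer
large = s[s > 10]            # one condition applied to all values at once
print(large)

back = s.to_dict()           # convert back to a plain dictionary
print(back)
```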
SERIES V/S NDARRAY
SNO.  SERIES                                    NDARRAY
1     HOMOGENEOUS                               HOMOGENEOUS
2     SUPPORTS EXPLICIT INDEXING                DOES NOT SUPPORT EXPLICIT INDEXING
3     INDEXES CAN BE NUMERIC OR STRING TYPE     INDEXES CAN ONLY BE NUMERIC TYPE
4     PERFORMS VECTORIZED OPERATIONS EVEN IF    PERFORMS VECTORIZED OPERATIONS ONLY IF
      SHAPES DIFFER, USING NaN FOR              THE SHAPES MATCH (OR CAN BE BROADCAST)
      NON-MATCHING LABELS
5     TAKES MORE MEMORY                         TAKES LESS MEMORY
DATAFRAME V/S NDARRAY
SNO.  DATAFRAME                             NDARRAY
1     2D                                    2D (CAN ALSO BE N-D IN GENERAL)
2     HETEROGENEOUS                         HOMOGENEOUS
3     ROWS AND COLUMNS CAN HAVE BOTH        INDEXED BY A TUPLE OF NON-NEGATIVE
      INDEXES AND LABELS                    INTEGERS, ONE FOR EACH AXIS
4     CONSUMES MORE MEMORY                  TAKES LESS MEMORY
5     EXPANDABLE                            NOT EXPANDABLE
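A small sketch of points 2, 3 and 5, assuming made-up values and the labels r1, r2, a, b chosen only for illustration:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])
print(arr[1, 0])              # indexed by a tuple of integer positions
print(arr.dtype)              # one data type for the whole array

df = pd.DataFrame(arr, index=["r1", "r2"], columns=["a", "b"])
print(df.loc["r2", "a"])      # access by row label and column label
print(df.iloc[1, 0])          # access by position is also available

df["c"] = ["u", "v"]          # expandable: a column of another type is added
print(df)
```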