- Python library that helps structure data in DataFrames and contains built-in data analysis functions.
- Conventionally imported as `import pandas as pd`.
- Pandas is an exploratory data analysis toolkit with a rich set of attributes, functions, and methods.
- Widely used for data cleaning, data exploration, data manipulation, and data analysis tasks.
- Toolkit for reading, writing, accessing, filtering, grouping, aggregating, merging, joining, combining, reshaping, cleaning, and selecting data, and for performing statistical computations.
- The name pandas is derived from *panel data*, an econometrics term for multidimensional structured data sets.
- Supports various data formats: csv, tsv, txt, xls, xlsx, json, etc.
- Performance optimization (e.g., changing data types or storage types).
- Integrates well with other important libraries like NumPy, Matplotlib, Seaborn, SciPy, etc.
- Time series support
- Handling missing values
- Grouped operations
- Categorical data support
- Merging and joining DataFrames
- Statistical functions
- Data visualization tools
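A minimal sketch touching several of the features listed above (reading data, missing values, categorical data, grouped operations, time series). The CSV contents and column names are made up for illustration; in practice `pd.read_csv` would usually receive a file path instead of an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical sample data; normally this would be a file on disk.
csv_text = """date,store,sales
2024-01-01,A,100
2024-01-01,B,
2024-01-02,A,150
2024-01-02,B,80
"""

# Read CSV data; parse_dates converts the "date" column to datetime64 values.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Categorical data support: store repeated labels compactly.
df["store"] = df["store"].astype("category")

# Handling missing values: the empty sales cell was read as NaN; replace it with 0.
df["sales"] = df["sales"].fillna(0)

# Grouped operations: total sales per store.
totals = df.groupby("store", observed=True)["sales"].sum()

# Time series support: daily totals via resampling on a DatetimeIndex.
daily = df.set_index("date")["sales"].resample("D").sum()

print(totals)
print(daily)
```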
| Data Structure | Description |
|---|---|
| `pandas.Series()` | One-dimensional labeled array that can hold any data type. |
| `pandas.DataFrame()` | Two-dimensional labeled, table-like data structure whose columns can hold different data types. |
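A short sketch contrasting the two structures: a Series is one-dimensional with a single dtype, while each DataFrame column can have its own dtype. The fruit data is invented for illustration.

```python
import pandas as pd

# A Series: one-dimensional, labeled, with a single dtype.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

# A DataFrame: two-dimensional; each column may have a different dtype.
fruit = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry"],
        "price": [1.5, 2.0, 3.25],
        "in_stock": [True, False, True],
    }
)

print(prices.ndim, prices.dtype)  # 1 float64
print(fruit.ndim)                 # 2
print(fruit.dtypes)               # one dtype per column
```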
| Attribute | Meaning |
|---|---|
| `df.index` | Row index labels of the DataFrame (axis=0; default: `RangeIndex`) |
| `df.columns` | Column index labels of the DataFrame (axis=1) |
| `df.size` | Number of elements in the DataFrame (rows × columns) |
| `df.shape` | Tuple of the number of rows and columns: `(nrows, ncols)` |
| `df.ndim` | Number of axes: always 2 for a DataFrame (1 for a Series) |
| `df.values` | Data as a NumPy array (`df.to_numpy()` is the recommended equivalent) |
| `df.axes` | List containing the row index and the column index |
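A quick check of these attributes on a small example DataFrame (the column names are arbitrary). Note in particular that `size` counts every cell, not the columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

print(df.index)       # RangeIndex(start=0, stop=3, step=1)
print(df.columns)     # column labels "a" and "b"
print(df.shape)       # (3, 2)
print(df.size)        # 6, i.e. 3 rows x 2 columns
print(df.ndim)        # 2
print(df.axes)        # list: [row index, column index]
print(df.to_numpy())  # 2D NumPy array; int and float columns are upcast to float64
```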
| Method | Use |
|---|---|
| `pd.read_csv()`, `pd.read_excel()`, `pd.read_json()` | Import data |
| `df.to_csv()`, `df.to_excel()`, `df.to_parquet()` | Export data |
| `df.head()`, `df.tail()`, `df.sample()`, `df.sort_values()` | Preview and sort data |
| `df.query()` | Filter rows with a boolean expression |
| `df.iat[]`, `df.at[]`, `df.iloc[]`, `df.loc[]` | Indexing and slicing (`iat`/`iloc` by position, `at`/`loc` by label) |
| `df.info()` | Metadata (index, column dtypes, non-null counts, memory usage) |
| `df.dropna()`, `df.fillna()`, `df.drop_duplicates()`, `df.rename()`, `df.set_index()` | Clean data |
| `df.apply()`, `df.map()`, `df.explode()` | Transform data (`df.map()` requires pandas 2.1+; older versions use `df.applymap()`) |
| `df.groupby()`, `df.groupby().agg()`, `df.groupby().aggregate()` | Group and aggregate data (`agg` is an alias of `aggregate`) |
| `df.join()`, `df.merge()`, `pd.concat()` | Combine data |
| `df.pivot_table()`, `df.stack()`, `df.unstack()` | Reshape data |
| `df.plot()` | Visualize data |
| `df.sum()`, `df.mean()`, `df.median()`, `df.max()`, `df.value_counts()`, `df.describe()` | Descriptive statistics |
| `pd.date_range()`, `pd.to_datetime()` | Time series analysis |
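A minimal end-to-end sketch using several methods from the table on a tiny, hypothetical orders dataset (all names and values are invented). It writes CSV to an in-memory buffer rather than a file so it can run anywhere.

```python
import io

import pandas as pd

# Hypothetical data used only for illustration.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer": ["ann", "bob", "ann", "cy"],
        "amount": [20.0, None, 35.0, 12.5],
        "date": ["2024-03-01", "2024-03-01", "2024-03-02", "2024-03-03"],
    }
)
customers = pd.DataFrame(
    {"customer": ["ann", "bob", "cy"], "city": ["Oslo", "Lima", "Oslo"]}
)

# Time series: convert date strings to datetime64 values.
orders["date"] = pd.to_datetime(orders["date"])
all_days = pd.date_range("2024-03-01", periods=3, freq="D")

# Clean: fill the missing amount with 0.
orders = orders.fillna({"amount": 0})

# Filter with query() and select a single value by label with loc[].
big = orders.query("amount > 15")
first_amount = orders.loc[0, "amount"]

# Combine: attach each customer's city; pd.concat stacks frames vertically.
merged = orders.merge(customers, on="customer", how="left")
stacked = pd.concat([orders, orders], ignore_index=True)

# Group and aggregate: total amount per city.
per_city = merged.groupby("city")["amount"].agg("sum")

# Reshape: one row per date, one column per city.
wide = merged.pivot_table(
    index="date", columns="city", values="amount", aggfunc="sum", fill_value=0
)

# Export to an in-memory CSV buffer instead of a file.
buf = io.StringIO()
orders.to_csv(buf, index=False)
```

`fill_value=0` in `pivot_table` keeps the reshaped table free of NaN for date/city combinations with no orders.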
- A Series holds homogeneous data: all values share a single data type (dtype).
- The labels along its data axis are called the index.

```python
import pandas as pd

# Create a Series:
s = pd.Series([1, 2, 3, 4])

# Access a DataFrame column as a Series:
df = pd.DataFrame({"SeriesName": [1, 2]})
df["SeriesName"]  # or df.SeriesName (only if the name is a valid identifier)
```

- A DataFrame arranges data in tabular form with rows and columns.
- A DataFrame is a collection of Series that share the same index.
- It is the Python equivalent of an Excel sheet or SQL table and is used to store and analyze data.
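```python
import pandas as pd

# Create an empty DataFrame:
empty = pd.DataFrame()

# Access several columns at once (the result is a DataFrame):
df = pd.DataFrame({"SeriesName1": [1], "SeriesName2": [2], "SeriesName3": [3]})
df[["SeriesName1", "SeriesName2", "SeriesName3"]]
```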
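A small sketch of how columns relate to the shared index: single-bracket selection returns a Series, double-bracket selection returns a DataFrame, and arithmetic between Series aligns on index labels rather than positions. The row labels and values here are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["r1", "r2", "r3"])

# One column with [] gives a Series that keeps the DataFrame's index.
col = df["x"]
print(type(col).__name__)  # Series

# A list of names gives a DataFrame, even for a single name.
sub = df[["x"]]
print(type(sub).__name__)  # DataFrame

# Every column Series carries the same index labels.
print(df["x"].index.equals(df["y"].index))  # True

# Series arithmetic aligns on labels; labels present on only one side give NaN.
other = pd.Series([100, 200], index=["r2", "r4"])
aligned = df["x"] + other
print(aligned)
```

Here only `r2` appears in both operands, so it is the only non-missing value in `aligned` (2 + 100 = 102 is *not* the result; `other["r2"]` is 100, giving 102 would require label `r2` mapping to 100, which it does, so `aligned["r2"]` is 102.0).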