Fedex

Introduction

FEDEx is a system that assists with Exploratory Data Analysis (EDA) sessions. You use the FEDEx API instead of pandas to run operations on your data in real time (Filter, Group By and Join are currently supported), and FEDEx generates natural-language explanations and visualizations of the query results. The explanations are coherent and customized to your query: they describe what is actually interesting in the query itself or in its result dataframe.

The PD-EXPLAIN Python library

FEDEx is now part of PD-EXPLAIN, an easy-to-use Python wrapper for Pandas. Install it with:

  pip install pd-explain

For more information, visit the PD-EXPLAIN GitHub page.

How it works

FEDEx is built from several parts. The high-level process is:

1. The user provides an input dataframe, a query (Filter/GroupBy/Join) and its parameters.
2. FEDEx executes the query.
3. FEDEx calculates an interestingness measure suited to the specific operation (for example, the Exceptionality measure for Filter and Join operations) for every column in the output dataframe (the query result).
4. FEDEx finds the most interesting columns and partitions each of them into sets of rows.
5. It then finds the set of rows that affects the interestingness measure result the most (from [2]).
6. Finally, FEDEx takes the top columns and sets of rows and generates meaningful explanations.
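
The first part of the process above (run the query, score each output column, rank the columns) can be sketched in plain pandas. This is only an illustration, not FEDEx's actual code: it assumes the Exceptionality measure can be approximated by the two-sample Kolmogorov-Smirnov statistic between a column's values before and after the query, and the helper names ks_statistic and rank_columns as well as the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def ks_statistic(before: pd.Series, after: pd.Series) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = np.sort(before.dropna().to_numpy(dtype=float))
    b = np.sort(after.dropna().to_numpy(dtype=float))
    if len(a) == 0 or len(b) == 0:
        return 0.0
    points = np.concatenate([a, b])
    # Fraction of values <= each point, for each sample
    cdf_a = np.searchsorted(a, points, side="right") / len(a)
    cdf_b = np.searchsorted(b, points, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def rank_columns(df_in: pd.DataFrame, df_out: pd.DataFrame) -> pd.Series:
    """Score every numeric output column, sorted from most to least interesting."""
    numeric = df_out.select_dtypes(include="number").columns
    scores = {col: ks_statistic(df_in[col], df_out[col]) for col in numeric}
    return pd.Series(scores).sort_values(ascending=False)


# Toy input with made-up values (not taken from any real dataset)
df = pd.DataFrame({
    "popularity": [10, 20, 30, 70, 80, 90],
    "loudness": [-20.0, -18.0, -16.0, -6.0, -5.0, -4.0],
    "tempo": [100.0, 120.0, 140.0, 100.0, 120.0, 140.0],
})

result = df[df["popularity"] > 65]  # execute the Filter query
scores = rank_columns(df, result)   # score and rank the output columns
print(scores)
```

In this toy data the filter shifts the distributions of popularity and loudness but leaves tempo unchanged, so tempo ranks last. Partitioning the top columns into sets of rows and generating the explanation text are not covered by this sketch.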

For the full details, you can either view the code or read our article, which will be referenced here soon.

Example

We used the Spotify dataset from Kaggle. The user's first operation was SELECT * FROM Spotify WHERE popularity > 65;

The raw output (Snip) -

The generated explanation -

The user's second operation was SELECT AVG(dancability), AVG(loudness) FROM (SELECT * FROM Spotify WHERE year >= 1990) GROUP BY year;

The raw output (Snip) -

The generated explanation -
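
For readers more used to pandas than SQL, the two queries in this example correspond roughly to the following plain pandas operations. This is a sketch that does not use the FEDEx API: the column names are copied from the queries as written, and the rows are made up for illustration rather than taken from the Kaggle dataset.

```python
import pandas as pd

# Made-up rows standing in for the Spotify dataset
spotify = pd.DataFrame({
    "year": [1985, 1990, 1990, 2000, 2000],
    "popularity": [40, 70, 50, 80, 66],
    "dancability": [0.4, 0.6, 0.8, 0.5, 0.7],
    "loudness": [-12.0, -8.0, -10.0, -6.0, -4.0],
})

# First operation: keep only rows with popularity above 65
popular = spotify[spotify["popularity"] > 65]

# Second operation: filter to year >= 1990, then average per year
by_year = (
    spotify[spotify["year"] >= 1990]
    .groupby("year")[["dancability", "loudness"]]
    .mean()
)

print(popular)
print(by_year)
```

FEDEx runs operations like these and additionally explains which columns of the result changed in an interesting way.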

Usage

Note: this project was tested on Python versions 3.6 to 3.8.

First, install the requirements: py -3 -m pip install -r requirements.txt

Second, install LaTeX on your system (the explanations inside the graphs require it). Everything still works without LaTeX, but the experience may be somewhat inferior.

For now, you can find usage examples in the Notebooks folder and in UserStudyInteractive.py. We are currently working on a better API that will let users work with pandas directly and generate explanations effortlessly, without an additional dedicated API. The Interactive notebooks give a sense of how it will work. Use the functions join, filter_ and group_by. To disable FEDEX-Sampling, set the SAMPLE global variable in UserStudyInteractive.py to 0.

Note that UserStudyInteractive.py was designed to be used inside a Jupyter notebook, so either run it in a Jupyter notebook or make several minor changes.
