DFLib

DFLib ("DataFrame Library") is a lightweight, pure Java implementation of the DataFrame, a common data structure. DataFrames exist in Python (pandas), R, Spark and other languages and frameworks. DFLib's DataFrame is designed specifically for Java and other JVM languages.

With the DataFrame API, you get essentially the same data manipulation capabilities you may know from SQL (joins, etc.), only applied in memory to dynamically defined "table" objects. While SQL is declarative, a DataFrame lets you apply step-by-step transformations that are somewhat easier to understand and much easier to compose.
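
To make the "step-by-step" idea concrete, here is a minimal sketch in plain JDK Java (16+, for records and `Stream.toList()`). It does **not** use DFLib's API. The `Customer`/`Order` tables, the sample data, and the helper names `join`, `filterAbove` and `sumByCity` are all hypothetical. The point is only the shape of the workflow: join, filter and aggregate run as separate, named steps whose intermediate results remain ordinary values you can inspect or reuse.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StepwiseTransforms {

    // Two hypothetical in-memory "tables"; each row is a record
    record Customer(int id, String city) {}
    record Order(int customerId, double amount) {}

    // Row type produced by the join step
    record CustomerOrder(String city, double amount) {}

    // Step 1: inner join orders to customers on customer id
    // (orders without a matching customer are dropped)
    static List<CustomerOrder> join(List<Customer> customers, List<Order> orders) {
        Map<Integer, Customer> byId = customers.stream()
                .collect(Collectors.toMap(Customer::id, Function.identity()));
        return orders.stream()
                .filter(o -> byId.containsKey(o.customerId()))
                .map(o -> new CustomerOrder(byId.get(o.customerId()).city(), o.amount()))
                .toList();
    }

    // Step 2: keep only rows whose amount is strictly above a threshold
    static List<CustomerOrder> filterAbove(List<CustomerOrder> rows, double min) {
        return rows.stream().filter(r -> r.amount() > min).toList();
    }

    // Step 3: group by city and sum amounts (TreeMap keeps keys sorted)
    static Map<String, Double> sumByCity(List<CustomerOrder> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                CustomerOrder::city,
                TreeMap::new,
                Collectors.summingDouble(CustomerOrder::amount)));
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
                new Customer(1, "Boston"),
                new Customer(2, "Denver"),
                new Customer(3, "Boston"));
        List<Order> orders = List.of(
                new Order(1, 20.0),
                new Order(2, 15.0),
                new Order(2, 5.0),
                new Order(3, 30.0),
                new Order(4, 99.0)); // no customer 4, so the join drops it

        // Each intermediate result can be printed or reused on its own
        List<CustomerOrder> joined = join(customers, orders);
        List<CustomerOrder> large = filterAbove(joined, 10.0);
        Map<String, Double> totals = sumByCity(large);

        System.out.println(totals); // {Boston=50.0, Denver=15.0}
    }
}
```

For comparison, a single SQL statement along the lines of `SELECT c.city, SUM(o.amount) FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.amount > 10 GROUP BY c.city` states the whole result at once, while the stepwise version lets you stop, look at `joined` or `large`, and insert another step anywhere in the chain.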

DataFrame is extremely versatile and can be used to model a variety of data tasks. ETL, log analysis, and spreadsheet processing are just a few examples. DFLib comes with connectors for many data formats (CSV, Excel, RDBMS, Avro, Parquet, JSON) and can be easily adapted to others (e.g. web-based ones like Google Sheets).

While DFLib can be used in any Java application, there is a special integration with Jupyter Notebook, a browser-based interactive environment for data exploration and analysis popular among data scientists and engineers. In fact, our community maintains a Java "kernel" for Jupyter, which is a sister project of DFLib.

DFLib provides integration with Apache ECharts to visualize DataFrame data. Charts are generated in the form of HTML/JavaScript code and work in Jupyter as well as in regular web applications. A few examples of charts made with DFLib:

(Example charts: a stock candlestick/bar chart, a scatter chart of player weights, and a heatmap of GitHub activity.)

Project Links

Presentation Videos