
Note: to open links in a new tab, use CTRL+click (Windows and Linux) or CMD+click (macOS).

What is AgETL?

AgETL (Agricultural Data Extract, Transform, and Load Framework) is a set of Python functions for processing data files from different agricultural and plant science experiments and aggregating them into a standard database table in a central repository, making the data available for a wide variety of analyses.

The functions are run from two Jupyter notebooks, each paired with its own configuration file:

Extraction and Transformation processes:

Runs the Extraction and Transformation processes and produces a CSV file in which the data from different source files are aggregated and standardized into a single format.

Notebook file: extract-transform.ipynb

Configuration file: config_extract-transform.yml
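
To illustrate what the Extraction and Transformation step does, here is a minimal, dependency-free sketch using Python's standard `csv` module. It is not AgETL's implementation (which is driven by config_extract-transform.yml and uses pandas); the column mapping, the standard column names, and the functions `extract_transform` and `write_standard_csv` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical mapping from source-specific column names to standard names.
# AgETL reads its real settings from config_extract-transform.yml.
COLUMN_MAP: Dict[str, str] = {
    "plot": "plot_id",
    "Plot_ID": "plot_id",
    "ht_cm": "plant_height_cm",
    "height": "plant_height_cm",
}
# Hypothetical standardized output columns.
STANDARD_COLUMNS: List[str] = ["plot_id", "plant_height_cm", "source_file"]


def extract_transform(sources: Dict[str, TextIO]) -> List[Dict[str, str]]:
    """Read several CSV sources and return rows with standardized columns."""
    rows: List[Dict[str, str]] = []
    for name, handle in sources.items():
        for record in csv.DictReader(handle):
            # Rename known columns; drop columns that have no mapping.
            row = {COLUMN_MAP[k]: v for k, v in record.items() if k in COLUMN_MAP}
            row["source_file"] = name  # keep provenance for each row
            rows.append(row)
    return rows


def write_standard_csv(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write the aggregated rows as a single CSV with a fixed header."""
    writer = csv.DictWriter(out, fieldnames=STANDARD_COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Two in-memory "files" that use different column names.
    sources = {
        "trial_a.csv": io.StringIO("plot,ht_cm\n101,85\n102,90\n"),
        "trial_b.csv": io.StringIO("Plot_ID,height\n201,78\n"),
    }
    out = io.StringIO()
    write_standard_csv(extract_transform(sources), out)
    print(out.getvalue())
```

Both sources end up under the same header, with a `source_file` column recording where each row came from.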

Load process:

Loads the data into a single table in a data warehouse.

Notebook file: load.ipynb

Configuration file: config_load.yml
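
A minimal sketch of the Load step, assuming a hypothetical table named `observations` with three columns. AgETL itself loads into PostgreSQL through psycopg2; this sketch swaps in Python's built-in `sqlite3` module only so it runs without a database server. The `load_rows` function is illustrative, not part of AgETL.

```python
import sqlite3
from typing import Dict, List

# Hypothetical target table. AgETL targets PostgreSQL via psycopg2;
# sqlite3 is used here only so the sketch runs without a server.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS observations (
    plot_id TEXT NOT NULL,
    plant_height_cm REAL,
    source_file TEXT
)
"""
INSERT_SQL = (
    "INSERT INTO observations (plot_id, plant_height_cm, source_file) "
    "VALUES (?, ?, ?)"
)


def load_rows(conn: sqlite3.Connection, rows: List[Dict[str, str]]) -> int:
    """Insert standardized rows into one table in a single transaction."""
    params = [
        (r["plot_id"], float(r["plant_height_cm"]), r["source_file"])
        for r in rows
    ]
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute(CREATE_SQL)
        conn.executemany(INSERT_SQL, params)
    return len(params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [
        {"plot_id": "101", "plant_height_cm": "85", "source_file": "trial_a.csv"},
        {"plot_id": "201", "plant_height_cm": "78", "source_file": "trial_b.csv"},
    ]
    print(load_rows(conn, rows), "rows loaded")
```

With psycopg2 the same pattern applies, but the placeholder style is `%s` rather than `?`.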

If you are working on plant phenotyping experiments, we encourage you to follow the MIAPPE standards (https://www.miappe.org/) for creating your database tables.

How to run AgETL?

Option 1
- Install JupyterLab or Jupyter Notebook, optionally inside an environment manager such as conda, mamba, or pipenv.
Option 2
- Use a JupyterHub environment.

Prerequisites

Option 1
- Using the requirements file
```
    pip install -r requirements.txt
```

Option 2
- Install the required libraries using the pip package installer for Python.
PyYAML
```
    pip install pyyaml
```
Pandas
```
    pip install pandas
```
psycopg2
```
    pip install psycopg2 
```
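
After installing, you can check from a notebook cell that the prerequisites are importable. This is a small sketch, not part of AgETL; the helper name `missing_packages` is hypothetical. Note that the pip package `pyyaml` is imported as `yaml`.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in `import` statements.
REQUIRED: Dict[str, str] = {
    "pyyaml": "yaml",
    "pandas": "pandas",
    "psycopg2": "psycopg2",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All AgETL prerequisites are installed.")
```

`find_spec` only locates the module without importing it, so the check is cheap and has no side effects.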

Clone or download AgETL from the GitHub repository

Clone option
1. Open a new Jupyter Notebook terminal (New > Terminal).
2. Clone the GitHub repository:
```
    git clone https://github.com/Purdue-LuisVargas/agETL.git
```
Download option
1. Download AgETL from the GitHub repository: https://github.com/Purdue-LuisVargas/agETL.
2. Unzip the folder, then copy the files (if running Jupyter locally) or upload them (if using a JupyterHub environment) into your Jupyter Notebook directory.

Which files should I run?

To run AgETL, open the files in Jupyter Notebook, first edit the configuration file (.yml), and then run the notebook (.ipynb). The process is divided into two tasks, as shown below:

Raw data files (input) --> Extraction and transformation --> standardized dataframe (output) --> Load

Extraction and Transformation: The first set of functions runs the Extract and Transform processes. It outputs a CSV file where the data from different source files have been aggregated and standardized into a single format.
You need the following files:
- extract-transform.ipynb
- config_extract-transform.yml

Loading: The second group of functions is used to load data into a single table in the database.

You need the following files:
- load.ipynb
- config_load.yml

To set up the database connection, update the following fields in the configuration file (config_load.yml), as in these examples:

Localhost database:

    DATABASE_CREDENTIALS:
        Host: localhost
        Dbname: wanglab
        user: postgres
        port: 5432
        password: **************WAdxm1

Cloud server database:

    DATABASE_CREDENTIALS:
        Host: containers-us-west-187.railway.app
        Dbname: railway
        user: postgres
        port: 7895
        password: **************WAdxm1
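
The README does not show how load.ipynb turns these credentials into a connection. The following is a minimal sketch, assuming the keys are lowercased and passed as keyword arguments to `psycopg2.connect()`; the helper names `connect_kwargs` and `open_connection` are hypothetical. Only `open_connection` needs PyYAML, psycopg2, and a running server, so those imports are deferred into it.

```python
from typing import Any, Dict

EXPECTED_KEYS = {"host", "dbname", "user", "port", "password"}


def connect_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Map the DATABASE_CREDENTIALS section to psycopg2.connect() keywords.

    Hypothetical helper: assumes keys only need lowercasing.
    """
    creds = config["DATABASE_CREDENTIALS"]
    # The example config mixes capitalized (Host, Dbname) and lowercase keys,
    # so normalize them to the lowercase names psycopg2 accepts.
    kwargs = {str(k).lower(): v for k, v in creds.items()}
    missing = EXPECTED_KEYS - set(kwargs)
    if missing:
        raise KeyError(f"missing credentials: {sorted(missing)}")
    return {k: kwargs[k] for k in EXPECTED_KEYS}


def open_connection(path: str = "config_load.yml"):
    """Read config_load.yml and open a PostgreSQL connection (needs a server)."""
    import psycopg2  # deferred so connect_kwargs() works without psycopg2
    import yaml

    with open(path, encoding="utf-8") as fh:
        config = yaml.safe_load(fh)
    return psycopg2.connect(**connect_kwargs(config))
```

Validating the keys before connecting turns a typo in the YAML file into a clear error message instead of a failed connection attempt.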

Cite as

Vargas-Rojas L, Ting T-C, Rainey KM, Reynolds M and Wang DR (2024) AgTC and AgETL: open-source tools to enhance data collection and management for plant science research. Front. Plant Sci. 15:1265073. doi: 10.3389/fpls.2024.1265073.

Contact

Diane Wang - [email protected]

Luis Vargas Rojas - [email protected]

Purdue University, Wang Lab dianewanglab.com

