Functions to process data files from different agricultural and plant science experiments and aggregate them into a standard database table in a central repository, making the data available for a variety of analyses.

DS4Ag/AgETL

Note: to open links in a new tab, use CTRL+click (Windows and Linux) or CMD+click (macOS).

What is AgETL?

The Agricultural Data Extract, Transform, and Load (AgETL) framework is a set of functions written in Python that let you process data files from different agricultural and plant science experiments and aggregate them into a standard database table in a central repository, making the data available for a variety of analyses.

The functions are split across two notebook files, each with its own configuration file.

  • Extraction and Transformation processes:

Runs the Extraction and Transformation processes and produces a CSV file in which the data from the different source files are aggregated and standardized into a single format.

Notebook file: extract-transform.ipynb

Configuration file: config_extract-transform.yml

  • Load processes:

Loads the data into a single table in a data warehouse.

Notebook file: load.ipynb

Configuration file: config_load.yml

If you are working on plant phenotyping experiments, we encourage you to follow the MIAPPE standards (https://www.miappe.org/) for creating your database tables.
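
As a rough illustration of what "aggregated and standardized into a single format" can mean, the sketch below merges two small in-memory CSV sources that name the same variables differently and writes them out as one CSV table. It uses only the Python standard library. The column names, the `COLUMN_MAP` mapping, and the `standardize`, `aggregate`, and `to_csv` helpers are hypothetical; they are not taken from AgETL's notebooks or configuration files.

```python
import csv
import io

# Hypothetical raw inputs: two experiments that name the same variables differently.
SOURCE_A = "Plot,Height_cm,Date\n101,85.2,2023-07-01\n102,90.1,2023-07-01\n"
SOURCE_B = "plot_id,plant_height,obs_date\n201,77.5,2023-07-02\n"

# Hypothetical mapping from each source's column names to the standard names.
COLUMN_MAP = {
    "Plot": "plot_id",
    "Height_cm": "plant_height_cm",
    "Date": "observation_date",
    "plot_id": "plot_id",
    "plant_height": "plant_height_cm",
    "obs_date": "observation_date",
}
STANDARD_COLUMNS = ["plot_id", "plant_height_cm", "observation_date", "source"]


def standardize(raw_csv, source_name):
    """Extract rows from one CSV text and rename its columns to the standard names."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        row = {COLUMN_MAP[name]: value for name, value in record.items()}
        row["source"] = source_name  # keep track of where each row came from
        rows.append(row)
    return rows


def aggregate(sources):
    """Combine the standardized rows of every source into one list."""
    combined = []
    for source_name, raw_csv in sources.items():
        combined.extend(standardize(raw_csv, source_name))
    return combined


def to_csv(rows):
    """Serialize the standardized rows as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=STANDARD_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = aggregate({"experiment_a": SOURCE_A, "experiment_b": SOURCE_B})
standard_csv = to_csv(rows)
print(standard_csv)
```

Keeping the column mapping in one place mirrors the idea of driving the transformation from a configuration file, so that a new data source only needs new mapping entries.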

How to run AgETL?

  • Option 1

    • Install either JupyterLab or Jupyter Notebook directly, or use an environment manager such as conda, mamba, or pipenv.
  • Option 2

Prerequisites

  • Option 1
    • Using Requirements File
   pip install -r requirements.txt

Clone or download AgETL from the GitHub repository

  • Clone option

    1. Open a new Jupyter Notebook Terminal

    New > Terminal

    2. Clone the GitHub repository
        git clone https://github.com/Purdue-LuisVargas/agETL.git
    
  • Download option

    1. Download AgETL from the GitHub repository: https://github.com/Purdue-LuisVargas/agETL.
    2. Unzip the entire folder, then copy the files (if running Jupyter locally) or upload them (if using a JupyterHub environment) into your Jupyter Notebook directory.

Which files should I run?

To run the functions in AgETL, open the files in Jupyter Notebook. First modify the configuration file (.yml), then run the Python functions in the notebook (.ipynb). The process is divided into two tasks, as indicated below:

Raw data files (input) --> Extraction and transformation --> standardized dataframe (output) --> Load

  • Extraction and Transformation: The first set of functions runs the Extract and Transform processes. It outputs a CSV file where the data from different source files have been aggregated and standardized into a single format.

      You need the following files:
    
          extract-transform.ipynb
    
          config_extract-transform.yml
    
  • Loading: The second group of functions is used to load data into a single table in the database.

      You need the following files:
    
          load.ipynb
    
          config_load.yml
    

    To connect to the database, update the following information in the configuration file (config_load.yml), as in these examples:

    • Localhost database:
        DATABASE_CREDENTIALS:
            Host: localhost
            Dbname: wanglab
            user: postgres
            port: 5432
            password: **************WAdxm1
    
    • Cloud server database:
        DATABASE_CREDENTIALS:
            Host: containers-us-west-187.railway.app
            Dbname: railway
            user: postgres
            port: 7895
            password: **************WAdxm1
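
To show how these fields fit together, the sketch below assembles them into a single connection URI. It assumes the database is PostgreSQL, which the `postgres` user and port 5432 suggest but the examples do not state, and it assumes the `DATABASE_CREDENTIALS` section has already been parsed into a Python dictionary. The `build_postgres_uri` helper and the placeholder password are hypothetical; how load.ipynb actually opens its connection is not shown here.

```python
from urllib.parse import quote

# Keys copied from the DATABASE_CREDENTIALS examples; the values are placeholders.
credentials = {
    "Host": "localhost",
    "Dbname": "wanglab",
    "user": "postgres",
    "port": 5432,
    "password": "example-p@ss/word",  # placeholder, not a real password
}

REQUIRED_KEYS = ("Host", "Dbname", "user", "port", "password")


def build_postgres_uri(creds):
    """Return a postgresql:// URI built from a DATABASE_CREDENTIALS mapping."""
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError("Missing credential keys: " + ", ".join(missing))
    # Percent-encode user and password so characters such as '@' or '/' are safe.
    user = quote(str(creds["user"]), safe="")
    password = quote(str(creds["password"]), safe="")
    host = creds["Host"]
    port = int(creds["port"])
    dbname = creds["Dbname"]
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"


uri = build_postgres_uri(credentials)
# Print only the part after the credentials so the password never reaches the output.
print(uri.split("@")[-1])
```

Checking for missing keys up front gives a clearer error than a failed connection attempt, and percent-encoding keeps passwords with special characters from breaking the URI.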
    

Cite as

Vargas-Rojas L, Ting T-C, Rainey KM, Reynolds M and Wang DR (2024) AgTC and AgETL: open-source tools to enhance data collection and management for plant science research. Front. Plant Sci. 15:1265073. doi: 10.3389/fpls.2024.1265073.

Contact

Diane Wang - [email protected]

Luis Vargas Rojas - [email protected]

Purdue University, Wang Lab dianewanglab.com
