deniskropp/t144

Linguistic Transformation Pipeline Demo

This project demonstrates a linguistic transformation pipeline in two forms: a runnable Python script and a conceptual sketch in a custom DSL (KickLang). Both are structured around Task-Agnostic Steps (TAS), a set of principles that provide a general framework for designing and understanding data processing workflows.

Project Structure

  • TAS.json: The source of truth defining the Task-Agnostic Steps (TAS). Each step is a JSON object with a name, description, and other metadata.
  • ANALYSIS.md: A detailed document mapping the concepts in TAS.json to the concrete implementation in linguistic_transform_demo.kl.
  • linguistic_transform_demo.kl: A demo pipeline written in KickLang, a custom Domain-Specific Language for data transformations. It showcases how to define and compose reusable processing units.
  • demo_pipeline.py: A Python implementation of the same linguistic pipeline. It serves as a more familiar reference for developers and demonstrates how TAS can be applied in a general-purpose language.
  • README.md: This file, an entry point for contributors and users.
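
As a rough illustration of how the step definitions could be read programmatically, the sketch below parses TAS-style JSON and lists each step's name and description. It assumes the top level is either an array of step objects or an object with a "steps" array; the actual layout of TAS.json may differ. The function name parse_tas_steps and the inline SAMPLE text are hypothetical and are not part of the repository.

```python
import json

# Hypothetical sample standing in for TAS.json; the real file may be shaped differently.
SAMPLE = """
[
  {"name": "Ingest Heterogeneous Sources",
   "description": "Load data from various sources."},
  {"name": "Declare External Dependencies",
   "description": "Import necessary libraries or modules."}
]
"""


def parse_tas_steps(text):
    """Return (name, description) pairs from TAS-style JSON text.

    Assumption: the top level is a list of step objects, or an object
    holding that list under a "steps" key.
    """
    data = json.loads(text)
    steps = data.get("steps", []) if isinstance(data, dict) else data
    return [(step["name"], step.get("description", "")) for step in steps]


for name, description in parse_tas_steps(SAMPLE):
    print(f"{name}: {description}")
```

To try this against the real file, pass the contents of TAS.json to parse_tas_steps instead of SAMPLE and adjust the key names if they do not match.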

Getting Started

Python Demo

The Python demo provides a straightforward implementation of the pipeline and requires no external dependencies.

  1. Run the script:

    python demo_pipeline.py
  2. Review the output: The script will generate two files:

    • py_normalized_output.json: Contains the result of the text normalization transform.
    • py_analysis_output.json: Contains the result of the full composite analysis pipeline.
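
One way to take a first look at the two output files is sketched below. Because this README does not describe their schema, the helper reports only the top-level shape of each file. The name describe_output is hypothetical.

```python
import json
from pathlib import Path

# File names as listed in this README.
OUTPUT_FILES = ("py_normalized_output.json", "py_analysis_output.json")


def describe_output(path):
    """Summarize the top-level shape of a JSON file without assuming its schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return f"{path}: object with keys {sorted(data)}"
    if isinstance(data, list):
        return f"{path}: array with {len(data)} item(s)"
    return f"{path}: {type(data).__name__} value"


for name in OUTPUT_FILES:
    if Path(name).exists():
        print(describe_output(name))
    else:
        print(f"{name}: not found (run demo_pipeline.py first)")
```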

KickLang Demo

The KickLang (.kl) file is a conceptual demonstration and is not executable on its own. It illustrates a declarative approach to building data pipelines. Read the ANALYSIS.md file for a detailed walkthrough of how its structure aligns with the TAS framework.

The TAS Framework

The Task-Agnostic Steps (TAS) are a set of seven high-level, language-independent concepts for building modular and scalable data workflows. They are defined in TAS.json and serve as the architectural foundation for this project.

  1. Ingest Heterogeneous Sources: Load data from various sources.
  2. Declare External Dependencies: Import necessary libraries or modules.
  3. Define Reusable Transformation Unit: Create self-contained processing blocks.
  4. Specify Input-Output Contract: Define the data schema for inputs and outputs.
  5. Compose Multiple Transformations: Chain together transformation units.
  6. Apply Transformation to Data Source: Execute a transformation on data.
  7. Persist Transformation Results: Save the output of a transformation.
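
The seven steps can be sketched in a few lines of standard-library Python. This is not the code in demo_pipeline.py. It is a minimal illustration in which every name (Transform, compose, normalize, tokenize, analyze, persist) is hypothetical, and a simple type check stands in for the input-output contract.

```python
import json  # Step 2: declare external dependencies (standard library only)
import re
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transform:
    """Steps 3 and 4: a reusable unit together with its input/output contract."""
    name: str
    input_type: type
    output_type: type
    fn: Callable

    def __call__(self, value):
        # Enforce the contract on the way in and on the way out.
        if not isinstance(value, self.input_type):
            raise TypeError(f"{self.name}: expected {self.input_type.__name__}")
        result = self.fn(value)
        if not isinstance(result, self.output_type):
            raise TypeError(f"{self.name}: produced {type(result).__name__}")
        return result


def compose(name, first, second):
    """Step 5: chain two transforms after checking that their contracts line up."""
    if first.output_type is not second.input_type:
        raise TypeError(f"cannot compose {first.name} -> {second.name}")
    return Transform(name, first.input_type, second.output_type,
                     lambda value: second(first(value)))


def persist(results, stream):
    """Step 7: write results as JSON to any writable text stream."""
    json.dump(results, stream)
    stream.write("\n")


normalize = Transform("normalize", str, str,
                      lambda s: re.sub(r"\s+", " ", s).strip().lower())
tokenize = Transform("tokenize", str, list,
                     lambda s: re.findall(r"[a-z0-9']+", s))
analyze = compose("analyze", normalize, tokenize)

# Step 1: ingest from two kinds of source, a literal and a JSON-encoded string.
sources = ["  Hello,   World! ", json.loads('"Second SOURCE"')]

# Step 6: apply the composed transformation to each source.
results = [analyze(text) for text in sources]

persist(results, sys.stdout)
```

Keeping the contract on the Transform object lets compose reject a mismatched chain before any data is processed, which is one possible reading of the "Specify Input-Output Contract" step.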

How to Contribute

This project is a conceptual demonstration, but contributions that expand its scope or improve its clarity are welcome.

Areas for Contribution

  • Expand the Python Demo: Add more complex NLP features (e.g., using NLTK or spaCy) to make the pipeline more realistic.
  • Improve Documentation: Enhance the explanations in ANALYSIS.md or this README.
  • Add New Demos: Implement the TAS framework in other languages (e.g., JavaScript, Go, Rust) to further demonstrate its universality.
  • Refine TAS Definitions: Propose improvements or clarifications to the step definitions in TAS.json.

Contribution Workflow

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and ensure they align with the project's goals.
  4. Submit a pull request with a clear description of your changes.
