@rsteele5
Collaborator

Goals:

  • Consolidate pipeline executor functionality (Extract, Classification, and [LLM] Indexing)
    • Reducing code duplication
    • Making pipeline related parsing and execution functions consistent (and modular)
  • Reorganize certain executors into functional packages
    • NOTE: this only affects PipelineExecutor and DocumentLLMPipelineExecutor.
    • PipelineExecutor has become a base class and is no longer intended for direct instantiation.
    • DocumentLLMPipelineExecutor is now LLMIndexerExecutor
    • MetaMergeExecutor has been derived from the merge_metadata function and is no longer in PipelineExecutor
  • Convert all lingering executor imports to absolute imports
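
To illustrate the base-class change described above, here is a minimal sketch of how a base executor can be made non-instantiable while concrete executors share its parsing logic. Only the class names PipelineExecutor and LLMIndexerExecutor come from this PR. The `run` and `execute_pipeline` methods, the `"features"` request key, and the return values are hypothetical and do not reflect the actual code in this branch.

```python
from abc import ABC, abstractmethod


class PipelineExecutor(ABC):
    """Illustrative base class holding logic shared by pipeline executors."""

    def run(self, request: dict) -> dict:
        # Shared request parsing lives in the base class (hypothetical key name).
        features = request.get("features", [])
        # Pipeline-specific work is delegated to the concrete subclass.
        return self.execute_pipeline(features)

    @abstractmethod
    def execute_pipeline(self, features: list) -> dict:
        """Hypothetical hook that each concrete executor must implement."""


class LLMIndexerExecutor(PipelineExecutor):
    """Illustrative concrete executor; the real one does far more."""

    def execute_pipeline(self, features: list) -> dict:
        return {"executor": "llm_indexer", "feature_count": len(features)}
```

With this shape, calling `PipelineExecutor()` directly raises `TypeError`, while `LLMIndexerExecutor().run(request)` works and reuses the shared parsing step.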

Additional Requested Feature:

  • Add a force OCR regeneration feature to the TextExtractionExecutor
    • This executor attempts to parse a "force" attribute from an "ocr" "extract" feature in the request's feature list. If true, any existing or cached OCR will be ignored.
    • Example: {"name":"ocr","type":"extract","force":true}
  • Check and update the documentation. See the guide and ask the team.
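
The "force" lookup described above could be sketched roughly as follows. This is an assumption-laden illustration, not the code in TextExtractionExecutor: the helper name `should_force_ocr` is hypothetical, and treating only a literal JSON `true` as forcing regeneration is a design choice made for this sketch.

```python
import json


def should_force_ocr(features):
    """Return True if any "ocr" "extract" feature sets "force" to true.

    Hypothetical helper: the feature keys follow the example in this PR,
    but the function name and the strict boolean check are assumptions.
    """
    return any(
        feature.get("name") == "ocr"
        and feature.get("type") == "extract"
        and feature.get("force") is True  # ignore truthy non-boolean values
        for feature in (features or [])
    )


# Usage with the example feature from this PR.
example = json.loads('{"name":"ocr","type":"extract","force":true}')
force_ocr = should_force_ocr([example])
```

When the result is true, the caller would skip any existing or cached OCR and regenerate it. A feature list with no "force" attribute leaves the current caching behavior in place.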

rsteele added 9 commits September 11, 2025 16:05
* Replace relative imports in executor __init__ files with absolute imports
* Move extract.util to executor package level
* Separate extract.util into asset_util.py and request_util.py
* moved store_metadata pipeline function to base_pipeline.py
* All pipelines generate their root_asset_dir from the MARIE_CACHE dir
* Removed classification processing from llm_pipeline.py
* Add OCR disable flag to classification pipeline (default True)
* generalized pipeline component enabled status
* Feature example: {"name":"ocr","type":"extract","force":true}
@rsteele5 rsteele5 force-pushed the refactor/pipeline-executors branch from 1ae0e05 to 1d9ef37 Compare September 11, 2025 21:18
@rsteele5 rsteele5 mentioned this pull request Sep 23, 2025
1 task