DocumentAgent (Phase 1) #438

@marklysze

Description

--FEEDBACK WELCOME--

DocumentAgent will be a document-based agent that can ingest documents and other sources of information and draw on that knowledge to carry out its given task.

Examples of use-cases:

  • Document classification
  • Document/Page summarisation
  • Question Answering
  • Identify missing information
  • Invoice handling

The objective for this Phase is to provide a quick-start agent that developers can incorporate easily.

DocumentAgent will include RAG capabilities and will therefore be built progressively. This Phase 1 implementation contains basic RAG capabilities, such as ingesting documents and embedding them into a vector database. Future implementations will include more advanced RAG capabilities and engines, as well as additional capabilities for document transformation.

Capabilities include:

  • Input: Read one or more TXT, CSV, PDF, HTML, Markdown, PPTX, or JSON sources
  • Extract and store data, including into an intermediate format (such as Docling's DoclingDocument format)
  • Developer-determined handling (put in prompt, use vector database, use third-party query engine)
  • Query data, including support for third-party querying
  • Support for Structured Outputs to control output format
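As a rough illustration of the input capability above, the sketch below guesses the kind of a source from its URL scheme or file extension. `SourceKind` and `classify_source` are hypothetical names invented for this example and are not part of the proposed API.

```python
from enum import Enum
from pathlib import PurePosixPath
from urllib.parse import urlparse


class SourceKind(Enum):
    """Hypothetical labels for the input types listed above."""
    TXT = "txt"
    CSV = "csv"
    PDF = "pdf"
    HTML = "html"
    MARKDOWN = "markdown"
    PPTX = "pptx"
    JSON = "json"
    URL = "url"


# Assumed mapping from file extension to source kind.
_EXTENSIONS = {
    ".txt": SourceKind.TXT,
    ".csv": SourceKind.CSV,
    ".pdf": SourceKind.PDF,
    ".html": SourceKind.HTML,
    ".htm": SourceKind.HTML,
    ".md": SourceKind.MARKDOWN,
    ".pptx": SourceKind.PPTX,
    ".json": SourceKind.JSON,
}


def classify_source(source: str) -> SourceKind:
    """Guess the kind of a source from its URL scheme or file extension."""
    if urlparse(source).scheme in ("http", "https"):
        return SourceKind.URL
    suffix = PurePosixPath(source).suffix.lower()
    try:
        return _EXTENSIONS[suffix]
    except KeyError:
        raise ValueError(f"unsupported source: {source!r}") from None
```

An agent could use a check like this to pick a loader per source, and to fail early on inputs it cannot read.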

Example code (not final API):

# Most basic
my_document_agent = DocumentAgent(
    name="docagent",
    llm_config=...,
    sources="my_file.txt")

# Multiple sources, supporting different types
my_document_agent = DocumentAgent(
    name="docagent",
    llm_config=...,
    sources=[my_file_name_with_path, "https://my.url.com"])

# Storage and Retrieval from a Vector database
my_document_agent = DocumentAgent(
    name="docagent",
    llm_config=...,
    sources=[my_file_name_with_path, my_file_name_with_path],
    handling_config=DocumentHandlingConfig(
        document_types=[DocType.Text, DocType.XLSX],
        storage=DocumentStore.Weaviate,
        settings={...}))

# 3rd-party query engine (or this could be an agent built on DocumentAgent, e.g. DocumentAgentAgentQL)
my_document_agent = DocumentAgent(
    name="docagent",
    llm_config=None,
    sources="https://my.url.com",
    handling_config=None,
    query_config=DocumentQueryConfig(
        document_types=[DocType.URL],
        provider=DocQueryProvider.AgentQL,
        settings={...}))
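To make the shape of these calls easier to discuss, here is one possible way the configuration objects could be declared. The class and member names are taken from the example code above, but the field types, the defaults, and the `normalise_sources` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Optional


class DocType(Enum):
    # Only the members that appear in the example code; a real enum would have more.
    Text = auto()
    XLSX = auto()
    URL = auto()


class DocumentStore(Enum):
    Weaviate = auto()


class DocQueryProvider(Enum):
    AgentQL = auto()


@dataclass
class DocumentHandlingConfig:
    document_types: list[DocType]
    storage: Optional[DocumentStore] = None  # assumed default: no external store
    settings: dict[str, Any] = field(default_factory=dict)


@dataclass
class DocumentQueryConfig:
    document_types: list[DocType]
    provider: Optional[DocQueryProvider] = None  # assumed default: built-in querying
    settings: dict[str, Any] = field(default_factory=dict)


def normalise_sources(sources):
    """Hypothetical helper: accept one source or several, always return a list."""
    if isinstance(sources, str):
        return [sources]
    return list(sources)


handling = DocumentHandlingConfig(
    document_types=[DocType.Text, DocType.XLSX],
    storage=DocumentStore.Weaviate,
)
```

Accepting either a single string or a list for `sources`, as the examples do, keeps the most basic call short while still allowing several inputs.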

Internal agent workflow:

  1. Load/convert the document using the handling configuration (which has defaults for ease of use)
  2. Use the query configuration to respond to queries (e.g. inject the full source into the system message, query the vector store and inject the results into the system message, or run an external provider)
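The two steps above can be sketched with a toy pipeline. This is not the planned implementation: all function names are hypothetical, and word-overlap ranking stands in for a real embedding model and vector database.

```python
import re


def load_source(text: str, chunk_size: int = 40) -> list[str]:
    """Step 1 stand-in: split already-extracted text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks: list[str], query: str, top_k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query (toy substitute for a vector store)."""
    query_tokens = _tokens(query)
    scored = [(len(query_tokens & _tokens(chunk)), chunk) for chunk in chunks]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop chunks that share no words with the query.
    return [chunk for score, chunk in ranked[:top_k] if score > 0]


def build_system_message(chunks: list[str], query: str, strategy: str = "full") -> str:
    """Step 2: choose which context to inject into the system message."""
    if strategy == "full":
        context = chunks
    elif strategy == "retrieve":
        context = retrieve(chunks, query)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return "Answer using only this context:\n" + "\n---\n".join(context)
```

The `strategy` argument mirrors the choice described in step 2: small sources can be injected whole, while larger ones need a retrieval step first.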

Notes:

  • The use of a common intermediate format may be important, such as using Docling for document parsing and its DoclingDocument format for local storage. This could provide a good basis for standardised tools for this agent.
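To show why a common intermediate format helps, the sketch below stores every ingested source in one small structure that round-trips through JSON. `IntermediateDocument` is a hypothetical stand-in invented for this example and does not reflect Docling's actual DoclingDocument schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IntermediateDocument:
    """Hypothetical common format; not Docling's schema."""
    source: str
    text: str
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise for local storage."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "IntermediateDocument":
        """Rebuild a document from its stored JSON form."""
        return cls(**json.loads(payload))
```

With every source normalised to one structure, downstream tools (chunking, embedding, summarisation) only need to understand a single shape regardless of the original file type.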

Deliverables:

  • DocumentAgent code
  • Documentation
  • Blog
  • Notebook
  • Video script

Status

Done