This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Idea: Production-Level Queue System #580

@dan-menlo

Description


Objective

  • Do we need a queue system that scales to thousands of requests?

Motivation

Nullpointer Errors?

  • Currently, inference requests are handled FIFO (first in, first out)
  • We are adopting the OpenAI API, which means that we will receive requests across Chat, Audio, Vision, etc.
  • Given that users are on laptops with limited RAM and VRAM, we will likely have to switch models between requests
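Since loading a different model on RAM/VRAM-constrained laptops is expensive, one way to soften the FIFO problem above is to drain all queued requests for the currently loaded model before swapping. The sketch below is purely illustrative (the class and method names are assumptions, not anything in the codebase):

```python
from collections import deque

class ModelAwareQueue:
    """Hypothetical scheduler: groups pending requests by target model so
    all queued work for the loaded model is served before swapping models.
    FIFO order is preserved within each model."""

    def __init__(self):
        self._by_model = {}          # model name -> deque of payloads
        self._model_order = deque()  # models in first-seen order
        self.loaded_model = None     # model currently resident in RAM/VRAM
        self.swaps = 0               # how many model switches occurred

    def submit(self, model, payload):
        if model not in self._by_model:
            self._by_model[model] = deque()
            self._model_order.append(model)
        self._by_model[model].append(payload)

    def next_request(self):
        """Prefer requests for the loaded model; otherwise swap to the
        earliest-seen model that still has pending work."""
        if self.loaded_model in self._by_model:
            q = self._by_model[self.loaded_model]
            payload = q.popleft()
            if not q:
                del self._by_model[self.loaded_model]
            return self.loaded_model, payload
        while self._model_order:
            model = self._model_order.popleft()
            if model in self._by_model:
                self.swaps += 1
                self.loaded_model = model
                return self.next_request()
        return None  # queue fully drained
```

With requests arriving interleaved as (llama, whisper, llama), strict FIFO would load a model three times; this scheduler serves both llama requests first and swaps only twice.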

Preparing for Cloud Native

  • Our long-term future is likely as an enterprise OpenAI alternative, which will be multi-user and require a queue system
  • Should we bake in this abstraction now, using a local file-based queue that can later be swapped out for a more sophisticated one?

Status: Icebox