A web-based platform for deploying and managing large language models on Kubernetes with support for multiple inference providers.
- Web UI: Modern interface for all deployment and management tasks
- Model Catalog: Browse curated models or search the entire HuggingFace Hub
- Smart Filtering: Automatically filters models by architecture compatibility
- GPU Capacity Warnings: Visual indicators showing if models fit your cluster's GPU memory
- Autoscaler Integration: Detects cluster autoscaling and provides capacity guidance
- One-Click Deploy: Configure and deploy models without writing YAML
- Live Dashboard: Monitor deployments with auto-refresh and status tracking
- Multi-Provider Support: Extensible architecture supporting multiple inference runtimes
- Multiple Engines: vLLM, SGLang, and TensorRT-LLM (via NVIDIA Dynamo)
- Installation Wizard: Install providers via Helm directly from the UI
- Dark Theme: Modern dark UI with provider-specific accents
| Provider | Status | Description |
|---|---|---|
| NVIDIA Dynamo | ✅ Available | GPU-accelerated inference with aggregated or disaggregated serving |
| KubeRay | ✅ Available | Ray-based distributed inference |
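If you are not sure which providers are already installed in your cluster, a quick check is to look for their CRDs from the command line. The grep pattern below is illustrative only; the exact CRD names depend on the provider versions you install:

```bash
# Look for provider CRDs (pattern is illustrative, not exhaustive)
kubectl get crds | grep -iE 'ray|dynamo'
```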
- Kubernetes cluster with `kubectl` configured
- `helm` CLI installed
- GPU nodes with NVIDIA drivers (for GPU-accelerated inference)
- HuggingFace account (for accessing gated models like Llama)
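A quick way to confirm these prerequisites from your workstation (standard kubectl/helm commands, nothing KubeFoundry-specific):

```bash
kubectl version --client   # kubectl installed
kubectl cluster-info       # cluster reachable with the current context
helm version               # helm CLI installed
kubectl get nodes -o wide  # confirm GPU nodes are present (labels/taints depend on your setup)
```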
Download the latest release for your platform and run:
```bash
./kubefoundry
```

Open the web UI at http://localhost:3001.
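As an illustration, downloading and running on Linux/amd64 might look like the following; the release asset name here is an assumption, so check the Releases page for the actual filename for your platform:

```bash
# Asset name is hypothetical; substitute the real one from the Releases page
curl -LO https://github.com/sozercan/kube-foundry/releases/latest/download/kubefoundry-linux-amd64
chmod +x kubefoundry-linux-amd64
./kubefoundry-linux-amd64
```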
Requires: `kubectl` configured with cluster access and the `helm` CLI installed.
```bash
kubectl apply -f https://raw.githubusercontent.com/sozercan/kube-foundry/main/deploy/kubernetes/kubefoundry.yaml

# Access via port-forward
kubectl port-forward -n kubefoundry-system svc/kubefoundry 3001:80
```

Open the web UI at http://localhost:3001.
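Before port-forwarding, you can confirm the workload is up (the namespace comes from the manifest above):

```bash
kubectl get pods -n kubefoundry-system
kubectl get svc -n kubefoundry-system
```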
See Kubernetes Deployment for configuration options.
Navigate to the Installation page and click Install next to your preferred provider. The UI will guide you through the Helm installation process with real-time status updates.
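Under the hood this is a Helm install. As a hedged point of reference, installing the KubeRay operator by hand uses the upstream chart like this (KubeFoundry may pick different release names, namespaces, or values):

```bash
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator \
  --namespace kuberay-system --create-namespace
```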
Go to Settings → HuggingFace and click "Sign in with Hugging Face" to connect your account via OAuth. Your token will be automatically distributed to all required namespaces.
Note: A HuggingFace token is required to access gated models like Llama.
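If you want to verify the token distribution, look for the secret KubeFoundry creates in each namespace. The exact secret name is not documented here, so the search below only narrows the list:

```bash
# Secret name varies; this just narrows the search across namespaces
kubectl get secrets -A | grep -i hf
```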
- Navigate to the Models page
- Browse the curated catalog or Search HuggingFace for any compatible model
- Review GPU memory estimates and fit indicators (✅ fits, ⚠️ tight, ❌ exceeds); a rough sizing sanity check follows this list
- Click Deploy on your chosen model
- Select Runtime: Choose between NVIDIA Dynamo and KubeRay based on which runtimes are installed
- Configure deployment options (engine, replicas, tensor parallelism, etc.)
- Click Create Deployment to launch
Note: Each deployment can use a different runtime. The deployment list shows which runtime each deployment is using.
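The fit indicators above come from an estimate of the model's GPU memory footprint. As a rough sanity check you can do yourself (not KubeFoundry's exact formula): weights need roughly parameter count × bytes per parameter, plus headroom for the KV cache and activations.

```bash
# Illustrative only: an 8B-parameter model served in FP16 (2 bytes per parameter)
python3 -c 'params = 8e9; print(f"~{params * 2 / 2**30:.0f} GiB for weights alone")'
# -> ~15 GiB before KV cache, so a 16 GiB GPU is "tight" and a 24 GiB GPU is comfortable
```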
Head to the Deployments page to:
- View real-time status of all deployments
- See pod readiness and health checks
- Access logs and deployment details
- Scale or delete deployments
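The same information is available from the command line if you prefer. The resource names and label selector below are placeholders, so adjust them to whatever your deployment actually creates:

```bash
# Inspect pods and stream logs for a deployment (label selector is hypothetical)
kubectl get pods -n <namespace> -l app=<deployment-name>
kubectl logs -n <namespace> deploy/<deployment-name> --tail=100 -f
```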
Once status shows Running, your model exposes an OpenAI-compatible API. Use kubectl port-forward to access it locally:
```bash
# Port-forward to the service (check the Deployments page for the exact service name)
kubectl port-forward svc/<deployment-name> 8000:8000 -n <namespace>

# List available models
curl http://localhost:8000/v1/models

# Test with a chat completion
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-name>", "messages": [{"role": "user", "content": "Hello!"}]}'
```

KubeFoundry supports any HuggingFace model with a compatible architecture. Browse the curated catalog for tested models, or search the HuggingFace Hub for thousands more.
When searching HuggingFace, models are filtered by architecture compatibility:
| Engine | Supported Architectures |
|---|---|
| vLLM | LlamaForCausalLM, MistralForCausalLM, Qwen2ForCausalLM, GPT2LMHeadModel, and 40+ more |
| SGLang | LlamaForCausalLM, MistralForCausalLM, Qwen2ForCausalLM, and 20+ more |
| TensorRT-LLM | LlamaForCausalLM, GPTForCausalLM, MistralForCausalLM, and 15+ more |
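To check which architecture a given model declares before deploying it, you can read the `architectures` field of its config.json from the Hub. The model ID below is just an example; gated models additionally need an Authorization header with your token:

```bash
curl -sL https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/resolve/main/config.json \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["architectures"])'
# -> ['Qwen2ForCausalLM']
```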
KubeFoundry supports optional authentication using your existing kubeconfig OIDC credentials.
To enable, start the server with:
```bash
AUTH_ENABLED=true ./kubefoundry
```

Then use the CLI to log in:
```bash
kubefoundry login                                 # Uses current kubeconfig context
kubefoundry login --server https://example.com    # Specify server URL
kubefoundry login --context my-cluster            # Use a specific context
```

The login command extracts your OIDC token and opens the browser automatically.
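If you are curious what the CLI extracts, you can inspect the kubeconfig yourself. The jsonpath below assumes the legacy `oidc` auth-provider layout; clusters that use exec-based credential plugins store tokens differently, so treat this as a sketch:

```bash
# Works only for kubeconfigs using the legacy oidc auth-provider (assumption)
kubectl config view --minify --raw \
  -o jsonpath='{.users[0].user.auth-provider.config.id-token}'
```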
- Architecture Overview
- API Reference
- Development Guide
- Azure Cluster Autoscaling Setup
- Kubernetes Deployment
We welcome contributions! Please see CONTRIBUTING.md for development setup and guidelines.