Skyulf: The Visual MLOps Builder
Skyulf is a self-hosted, privacy-first visual MLOps builder, designed to be the "glue" that holds your data science workflow together (with a project export option coming soon). Bring your data, clean it visually, engineer features on a node-based canvas, and train models, all in one place.
I named it Skyulf after two ideas. Sky is the open space above Earth, where the sun, moon, stars, and clouds live. Ulf means "wolf", a name with Nordic roots, and the wolf is also a strong symbol in Turkic tradition. Together they fit the project: independent and helpful to the community.
- Quick Start
- Using Skyulf as a Library
- Key Features
- Roadmap
- Version History
- Workflow Overview
- Development
- Contributing
- License
Prerequisites: Python 3.10+
Using pip:
```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r requirements-fastapi.txt
python run_skyulf.py
```

Using uv (Faster):

```
uv venv
.\.venv\Scripts\Activate.ps1
uv pip install -r requirements-fastapi.txt
python run_skyulf.py
```

The run_skyulf.py script will automatically start the FastAPI server.
Optional: Celery & Redis By default, Skyulf uses Celery and Redis for robust background task management. However, for simple local testing or environments where you cannot run Redis, you can disable this dependency.
Add this to your .env file:
```
USE_CELERY=false
```

When disabled, background tasks (training, ingestion) will run in background threads within the main application process instead of a separate worker.
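To make that behavior concrete, here is a generic sketch of this kind of dispatch, with a hypothetical submit_training_job helper; this is not Skyulf's actual implementation:

```python
import os
import threading

# Mirror of the USE_CELERY flag from .env (hypothetical reading of it).
USE_CELERY = os.getenv("USE_CELERY", "true").lower() == "true"


def submit_training_job(job_fn, *args):
    """Hypothetical dispatcher: Celery when enabled, an in-process thread otherwise."""
    if USE_CELERY:
        # With Celery, the job is handed to a separate worker process;
        # here job_fn would be a registered Celery task.
        job_fn.delay(*args)
    else:
        # Without Celery, the job runs in a daemon thread inside the API process,
        # so the request handler still returns immediately.
        threading.Thread(target=job_fn, args=args, daemon=True).start()
```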
To enable S3 integration for data and artifacts, add these to your .env:
```
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-bucket
# Optional: Upload local training artifacts to S3
UPLOAD_TO_S3_FOR_LOCAL_FILES=true
# Optional: Force local storage even for S3 data
SAVE_S3_ARTIFACTS_LOCALLY=false
```

Or run everything with Docker Compose:

```
docker compose up --pull=always --build
```

This will start the full stack:
- FastAPI Backend (Port 8000)
- Redis (Port 6379)
- Celery Worker (Background jobs)
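If you want to confirm the Redis container is reachable before submitting jobs, a quick check from Python looks like this (assuming the redis package is installed; it is only an illustration):

```python
import redis

# Connect to the Redis container started by docker compose (default port 6379).
client = redis.Redis(host="127.0.0.1", port=6379)
print(client.ping())  # True if the container is up and accepting connections
```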
Open:
- API health → http://127.0.0.1:8000/health
- Docs (dev mode) → http://127.0.0.1:8000/docs
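Once the server is running, you can also hit the health endpoint from Python as a quick smoke test (assuming the requests package is available; any HTTP client works):

```python
import requests

# Ping the health endpoint of a locally running Skyulf instance.
response = requests.get("http://127.0.0.1:8000/health", timeout=5)
print(response.status_code, response.text)
```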
The core machine learning logic of Skyulf (preprocessing, modeling, tuning) is available as a standalone library on PyPI. You can use it to build reproducible pipelines in your own scripts or notebooks, independent of the web platform.
```
pip install skyulf-core
# or
uv add skyulf-core
```

Skyulf isn't just a web application: you can use skyulf-core in your own scripts or Jupyter notebooks for powerful EDA and pipeline building. The EDA module is a great way to get started and is easy to use.
```python
import polars as pl

from skyulf.profiling.analyzer import EDAAnalyzer
from skyulf.profiling.visualizer import EDAVisualizer

# 1. Load Data
df = pl.read_csv("your_dataset.csv")

# 2. Run Analysis
analyzer = EDAAnalyzer(df)
profile = analyzer.analyze(
    target_col="target",
    task_type="Classification",  # Optional: Force "Classification" or "Regression"
    date_col="timestamp",        # Optional: Manually specify if auto-detection fails
    lat_col="latitude",          # Optional
    lon_col="longitude",         # Optional
)

# 3. Visualize Results (The Easy Way)
# This single class handles all the rich terminal output and matplotlib plots
viz = EDAVisualizer(profile, df)

# Print the dashboard
viz.summary()

# Show the plots
viz.plot()
```

For detailed examples including Time Series, Geospatial Analysis, and Causal Inference, see the EDA User Guide.
- Visual Feature Canvas: A node-based editor to clean, transform, and engineer features without writing spaghetti code (25+ built-in nodes).
- Automated EDA: Professional-grade Exploratory Data Analysis with interactive charts, causal discovery (DAGs), decision trees for rule extraction, segmentation, outlier detection, and statistical alerts.
- Drift Analysis: Built on the EDA engine to monitor data and model drift over time with statistical tests and visualizations.
- High-Performance Engine: Built on FastAPI and Polars for lightning-fast data processing and easy API extension.
- Async by Default: Heavy training jobs run in the background via Celery & Redis (or background threads), so your UI never freezes.
- Flexible Data: Ingest CSV, Excel, JSON, Parquet, or SQL. Start with SQLite (zero-config) and scale to PostgreSQL.
- S3 Integration: Full support for S3-compatible storage (AWS, MinIO) for data ingestion, artifact storage, and model registry.
- Model Training: Built-in support for Scikit-Learn models with hyperparameter search (Grid/Random/Halving) and optional Optuna integration (see the sketch after this list).
- Model Registry & Deployment: Version control your models, track metrics, and deploy them to a live inference API with a single click.
- Experiment Tracking: Compare multiple runs side-by-side with interactive charts, confusion matrices, and ROC curves.
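For context on the search styles named in the Model Training feature above, here is a plain scikit-learn sketch of a halving grid search; this shows the underlying scikit-learn API, not Skyulf's node configuration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
# Halving search is still behind an experimental flag in scikit-learn.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
}

# Successive halving evaluates many candidates on small budgets first,
# then promotes only the best performers to larger budgets.
search = HalvingGridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    factor=3,
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```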
We have a clear vision to turn Skyulf into a complete App Hub for AI.
- Phase 1: Polish & Stability (Done) - Architecture, type safety, and documentation.
- Phase 2: Deepening Data Science (Current Focus) - Advanced EDA, Ethics/Fairness checks, Synthetic Data, Public Data Hubs, more models, NLP, and more.
- Phase 3: The "App Hub" Vision - Plugin system, GenAI/LLM Builders, and Deployment.
- Phase 4: Expansion - Real-time collaboration, Edge/IoT export, and Audio support.
View the full ROADMAP.md for details.
We maintain a detailed changelog of all major updates, new features, and architectural changes.
View the full VERSION_UPDATE.md for the complete history.
The high-level flow from dataset to model training inside Skyulf:
Dataset source → train/val/test split → background model training (Celery)
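Outside the UI, the split step corresponds roughly to the following Polars sketch (the 80/10/10 ratios are only an example, and this is not Skyulf's internal code):

```python
import polars as pl

df = pl.read_csv("your_dataset.csv")

# Shuffle once, then slice into 80/10/10 train/validation/test splits.
df = df.sample(fraction=1.0, shuffle=True, seed=42)
n_train = int(df.height * 0.8)
n_val = int(df.height * 0.1)

train_df = df.slice(0, n_train)
val_df = df.slice(n_train, n_val)
test_df = df.slice(n_train + n_val)

print(train_df.height, val_df.height, test_df.height)
```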
- Configuration via config.py with sane defaults (SQLite, dev CORS)
- Lifespan hooks initialize the async DB engine automatically (see the sketch below)
- Tests under tests/ cover core feature engineering and training helpers
- docker-compose.yml to run API + Redis (+ Celery worker)
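The lifespan bullet above refers to FastAPI's standard lifespan pattern. A generic sketch of that pattern, with a placeholder SQLite URL rather than Skyulf's actual startup code, looks like this:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from sqlalchemy.ext.asyncio import create_async_engine


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create the async DB engine once at startup and keep it on the app state.
    app.state.db_engine = create_async_engine("sqlite+aiosqlite:///./example.db")
    yield
    # Dispose of the engine cleanly on shutdown.
    await app.state.db_engine.dispose()


app = FastAPI(lifespan=lifespan)
```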
We welcome contributions! See CONTRIBUTING.md for setup and workflow guidance, and read our Code of Conduct.
Skyulf uses a split licensing model to balance open standards with sustainable development:
- Backend & Core: Apache 2.0 (Permissive) - Ideal for integration and enterprise use.
- Frontend (Feature Canvas): GNU AGPLv3 (Copyleft) - Ensures UI improvements are shared back to the community.
Commercial Use:
No separate commercial license is required for internal use or building proprietary plugins on the backend.
However, if you are building a proprietary SaaS that modifies the frontend and cannot comply with AGPLv3, please see COMMERCIAL-LICENSE.md for partnership options.
If you'd like to contribute, sponsor, or request a commercial license, please star the repo, open a Discussion or issue, or see .github/FUNDING.yml for sponsorship options.
I'm building this because I love it, but I can't do it alone forever.
- Try it out: Clone the repo, run it, break it.
- Give Feedback: Tell me what sucks. Tell me what you love.
- Contribute: Even a typo fix in the README helps.
Let's build the simplest, most powerful MLOps hub together.
© 2025 Murat Unsal – Skyulf Project
SPDX-License-Identifier: Apache-2.0