A lightweight, YAML-driven ETL pipeline that transforms text data into vector embeddings —
with zero boilerplate, full flexibility, and seamless database integration.
YamlPipe lets you build end-to-end ETL pipelines for vector embedding workflows — all defined in a single YAML file.
It’s designed for AI developers, data engineers, and RAG (Retrieval-Augmented Generation) builders who want simplicity without losing flexibility.
With YamlPipe, you can:
- ✅ Load data from files, web, S3, or Postgres
- 🧠 Chunk text dynamically using multiple strategies
- ⚙️ Generate embeddings with OpenAI or Sentence Transformers
- 🧩 Store vectors in LanceDB or ChromaDB
- 💻 Run everything via CLI or Web UI (Streamlit)
- YAML-based Configuration – define your pipeline once, run it anywhere
- Pluggable Components – modular architecture for each stage
- Advanced Chunking – `recursive_character`, `markdown`, or `adaptive`
- Multiple Embedding Models – `sentence_transformer` and `openai`
- Vector Database Integration – `lancedb` or `chromadb`
- CLI & Streamlit UI – full control from both terminal and browser
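
The "pluggable components" idea in the list above can be pictured as a small registry that maps each stage's `type` string to a class. The sketch below is only an illustration of that pattern, not YamlPipe's actual code: `REGISTRY`, `register`, `build`, and `InMemorySink` are hypothetical names invented for this example.

```python
from typing import Any, Callable, Dict

# Hypothetical registry (not YamlPipe's real API): one lookup table per stage.
REGISTRY: Dict[str, Dict[str, Callable[..., Any]]] = {
    "source": {},
    "chunker": {},
    "embedder": {},
    "sink": {},
}


def register(stage: str, type_name: str):
    """Decorator that files a component class under its stage and `type` name."""
    def decorator(cls):
        REGISTRY[stage][type_name] = cls
        return cls
    return decorator


def build(stage: str, spec: Dict[str, Any]) -> Any:
    """Instantiate the component named by spec['type'], passing spec['config'] as kwargs."""
    try:
        factory = REGISTRY[stage][spec["type"]]
    except KeyError:
        raise ValueError(f"unknown {stage} type: {spec.get('type')!r}") from None
    return factory(**spec.get("config", {}))


@register("sink", "in_memory")
class InMemorySink:
    """Toy sink used only for this illustration; it keeps rows in a list."""

    def __init__(self, collection_name: str = "default"):
        self.collection_name = collection_name
        self.rows = []

    def write(self, text: str, vector: list) -> None:
        self.rows.append((text, vector))


# A stage spec shaped like one section of the YAML file (type + config).
sink = build("sink", {"type": "in_memory", "config": {"collection_name": "my_documents"}})
sink.write("hello", [0.1, 0.2])
print(sink.collection_name, len(sink.rows))  # my_documents 1
```

With a layout like this, adding a new sink or chunker would mean writing one class and registering it under a new `type` name, without touching the pipeline runner.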

Install and run:

    git clone https://github.com/dongwonmoon/Yaml-Pipe.git
    cd Yaml-Pipe
    pip install -r requirements.txt
    python main.py init
    python main.py run -c pipelines/pipeline.yaml

Example pipeline configuration:

    source:
      type: local_files
      config:
        path: ./data
        glob_pattern: "*.txt"
    chunker:
      type: adaptive
      config:
        chunk_size: 200
        chunk_overlap: 40
    embedder:
      type: sentence_transformer
      config:
        model_name: "jhgan/ko-sbert-nli"
    sink:
      type: chromadb
      config:
        path: "./chroma_data"
        collection_name: "my_documents"

Launch the Web UI:

    streamlit run app.py

Use the dashboard to visualize your pipelines, test search results, and monitor ingestion progress.
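
To give a feel for what `chunk_size: 200` and `chunk_overlap: 40` in the example configuration mean, here is a minimal sliding-window sketch. It assumes sizes are counted in characters (the configuration does not say whether they are characters or tokens), and it is a plain fixed-window splitter, not the `adaptive` strategy itself; `chunk_text` is a hypothetical helper written for this illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 40) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so each new window
    starts (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# 500 characters of sample text: the digits 0-9 repeated.
text = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(text, chunk_size=200, chunk_overlap=40)
print([len(c) for c in chunks])  # [200, 200, 180]
```

The overlap means the last 40 characters of one chunk reappear at the start of the next, which helps keep sentences that straddle a boundary retrievable from at least one chunk.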
- No more boilerplate ETL code — define everything in YAML
- Designed for RAG, embedding pipelines, and AI data workflows
- Fully open-source and easily extendable

Roadmap:
- Add Milvus / Pinecone sinks
- Support LangChain / LlamaIndex integrations
- Add benchmarking and pipeline visualization

Contributions are always welcome!
Fork the repo, create a feature branch, and submit a PR.
New ideas, documentation improvements, and bug reports are all appreciated.
If YamlPipe helps you, please consider giving it a star 🌟
Every star motivates continued development and new features!
MIT © dongwonmoon