DadAI is an open-source AI assistant built to support new fathers during pregnancy and early parenthood. It is fine-tuned on real parenting conversations from Reddit and augmented with curated parenting psychology via RAG.
Try the demo: huggingface.co/spaces/benlongi/DadAI (Note: the HF demo uses a standard Mistral model via API. The full fine-tuned Qwen 14B + RAG runs locally — clone this repo to try the real thing.)
Most parenting resources are either mother-centric or scattered across forums. As a first-time dad, I realized how hard it can be to find support that's both practical and emotionally relevant — so I built an AI that talks to you like a friend who's been through it all.
DadAI covers:
- Emotional support during pregnancy and early parenthood
- Sleep deprivation, relationship strain, identity loss
- Dad mental health, bonding struggles, work-life guilt
- Couple conflict after baby, breaking generational patterns
- Practical tips from real fathers who've been there
- How a Solo Dev Built an AI for Dads — RunPod Blog (May 2025)
- How I Fine-Tuned a Custom AI Model (DadAI) — LinkedIn (Apr 2025)
The original version was built as a hands-on learning exercise with ChatGPT:
- Mistral 7B Instruct v0.1 (GPTQ quantized)
- QLoRA + PEFT fine-tuning on RunPod (RTX 4090, ~$5 total)
- 298 Reddit posts from 4 subreddits
- No UI — CLI only
What went wrong: A thorough code audit (by Claude) uncovered 5 critical bugs:
- Tokenization bug — the model never trained on completions (labels were wrong)
- Prompt template mismatch — training used the `[INST]` format but inference used a different template
- No `mask_prompt` — the model trained on the prompts too, diluting learning
- Small, noisy dataset — only 298 pairs, ~30% bot contamination, no quality filtering
- Format incompatibility — GPTQ to GGUF to LocalAI deployment never worked
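The first three bugs share one root cause: the loss must be computed on completion tokens only. A minimal sketch of that invariant, using the conventional `-100` ignore index from HF-style training loops (illustrative, not the v1 codebase):

```python
# Sketch: mask prompt tokens so the loss only sees the completion.
# -100 is the conventional "ignore" index that cross-entropy losses
# skip in HF-style training loops.
def build_labels(prompt_ids, completion_ids, ignore_index=-100):
    input_ids = prompt_ids + completion_ids
    labels = [ignore_index] * len(prompt_ids) + completion_ids
    return input_ids, labels

# Prompt positions are ignored; completion positions keep their token ids.
input_ids, labels = build_labels([1, 2, 3], [4, 5])
```

If the labels are built any other way (v1 trained on everything), the model optimizes for reproducing the prompt instead of answering it.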
See the v0.1-original tag for the original codebase.
A complete rewrite over a weekend with Claude via Cursor, powered by Apple's MLX framework:
| | v1 (2025) | v2 (2026) |
|---|---|---|
| Base model | Mistral 7B v0.1 (GPTQ) | Mistral 7B Instruct v0.3 (MLX 4-bit) |
| Training | RunPod RTX 4090 ($5) | MacBook Pro M1 (free) |
| Framework | HuggingFace + PEFT + bitsandbytes | Apple MLX + mlx-lm |
| Dataset | 298 pairs (buggy pipeline, 30% bots) | 2,147 curated pairs (0% bots) |
| Data sources | 4 subreddits | 7 subreddits + 68 synthetic gap topics |
| Key training fix | None (trained on prompts) | mask_prompt: true (trains on completions only) |
| Deployment | LocalAI (never worked) | Gradio + HF Spaces |
| UI | None | Chat interface with streaming |
v2 taught DadAI how to talk like a supportive dad. v3 gives it what to know.
The insight: Fine-tuning and RAG are complementary:
- Fine-tuning = personality. The model studied real dad conversations and internalized empathy, warmth, and tone.
- RAG = knowledge. When a dad asks a question, the model searches a curated knowledge base of parenting psychology and weaves expert insights into its response.
They stack: the warm dad voice from fine-tuning meets grounded wisdom from books. No retraining needed.
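At inference time, "they stack" is just prompt assembly. A hypothetical sketch (function name and prompt wording are illustrative, not the repo's actual code):

```python
# Sketch: retrieved passages are spliced into the prompt; the
# fine-tuned model supplies the dad voice. In the real pipeline,
# `passages` would come from the ChromaDB query.
def build_prompt(question, passages, max_passages=2):
    context = "\n\n".join(p.strip() for p in passages[:max_passages])
    return (
        "You are DadAI, a supportive dad who's been there.\n\n"
        f"Background from the knowledge base:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The model's weights never change when the knowledge base grows; only the context changes.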
The version that actually delivers. Three major upgrades:
| | v3 | v4 (current) |
|---|---|---|
| Model | Mistral 7B (4-bit) | Qwen2.5-14B-Instruct (4-bit) |
| Training data | 2,147 pairs | 2,260 pairs (5% synthetic) |
| RAG knowledge | 1 book (295 passages) | 4 books (1,637 passages) |
| Retrieval | Top-2 vector search | Top-5 + cross-encoder reranker |
| Conversation | Stateless | 3-turn memory |
| Training time | ~80 min (M1) | ~2.5 hrs (M1) |
Why the upgrade matters: The 7B model could do empathy or knowledge synthesis — not both in one response. The 14B model weaves book-informed advice into a natural dad voice. The cross-encoder reranker ensures the right passages get retrieved, not just the closest-sounding ones.
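The two-stage idea in miniature. In this toy sketch, `vector_score` plays the bi-encoder and `rerank_score` the cross-encoder; both MiniLM models are replaced by plain word-overlap functions for illustration only:

```python
# Toy two-stage retrieval: a cheap score shortlists k1 candidates,
# then a more expensive query-passage score reorders the shortlist
# and keeps the top k2.
def two_stage_retrieve(query, passages, vector_score, rerank_score, k1=5, k2=2):
    shortlist = sorted(passages, key=lambda p: vector_score(query, p), reverse=True)[:k1]
    return sorted(shortlist, key=lambda p: rerank_score(query, p), reverse=True)[:k2]

# Word-overlap stand-in for both scorers (illustration only).
def overlap(q, p):
    return len(set(q.split()) & set(p.split()))
```

The payoff is that the reranker scores the query and passage *together*, so it can reject passages that merely sound similar in embedding space.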
- Model: Qwen2.5-14B-Instruct (4-bit MLX) — ~8.3 GB on disk
- Training: QLoRA fine-tuning via mlx-lm on Apple Silicon
- Data: 2,147 real Reddit Q&A pairs + 113 synthetic pairs for under-covered topics
- RAG: ChromaDB + sentence-transformers (`all-MiniLM-L6-v2`) for semantic retrieval
- Reranker: Cross-encoder (`ms-marco-MiniLM-L6-v2`) for two-stage retrieval
- UI: Gradio chat interface with streaming responses
- Local inference: Fused model (LoRA baked into base weights) for fast generation
- Online demo: HF Spaces via Inference API (standard model)
- Language: Python 3.11
dadAI/
├── app.py # Gradio chat UI (local, fused model + RAG + reranker)
├── hf-space/ # Hugging Face Spaces deployment
│ ├── app.py # HF demo (Inference API, standard model)
│ ├── requirements.txt
│ └── README.md
├── data/ # Datasets
│ ├── reddit_dataset.jsonl # Raw Reddit posts (~2,100)
│ ├── formatted_dataset.jsonl # ChatML prompt/completion pairs
│ ├── cleaned_dataset.jsonl # Filtered, deduplicated
│ ├── synthetic_gap_topics.jsonl # Synthetic pairs for gap topics
│ ├── synthetic_v31_pairs.jsonl # Additional v4 synthetic pairs (5% ratio)
│ ├── training_dataset.jsonl # Final merged dataset (2,260)
│ ├── mlx_training/ # Train/valid/test splits for mlx-lm
│ └── rag_db/ # ChromaDB vector database (gitignored)
├── scripts/ # Pipeline scripts
│ ├── collect_reddit_data.py # Reddit data collection (PRAW)
│ ├── format_reddit_data.py # Convert to chat format
│ ├── clean_dataset.py # Quality filtering & dedup
│ ├── check_dataset_format.py # Validation
│ ├── generate_synthetic_data.py # Synthetic data for gap topics
│ ├── generate_synthetic_v31.py # V4 synthetic pairs (5% ratio)
│ ├── prepare_training_data.py # mlx-lm format + token filtering + split
│ ├── chunk_book.py # Extract & chunk EPUBs for RAG
│ ├── build_rag_db.py # Build ChromaDB vector database
│ ├── compare_models.py # Side-by-side model comparison
│ ├── inference.py # Interactive CLI chat
│ ├── evaluate_model.py # A/B comparison: base vs fine-tuned
│ └── deploy_to_hf.py # One-command HF Spaces deployment
├── books/ # Source books for RAG (gitignored, copyrighted)
├── training_config.yaml # MLX LoRA training config (Mistral 7B)
├── training_config_qwen14b.yaml # MLX LoRA training config (Qwen 14B)
├── train.sh # One-command training script
├── Makefile # Pipeline commands
├── models/ # Downloaded/fused models (gitignored)
├── adapters/ # LoRA adapters (gitignored)
├── requirements.txt # Python dependencies
├── .env # Reddit API credentials (gitignored)
└── .venv/ # Python virtual environment (gitignored)
- macOS with Apple Silicon (M1/M2/M3/M4)
- Homebrew
- 16 GB RAM minimum
# Clone the repo
git clone https://github.com/brossign/dadAI.git
cd dadAI
# Install Python 3.11
brew install [email protected]
# Create and activate virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt

For Qwen 14B (recommended, v4):
python -c "
from huggingface_hub import snapshot_download
snapshot_download('mlx-community/Qwen2.5-14B-Instruct-4bit', local_dir='models/qwen2.5-14b-instruct-4bit')
"For Mistral 7B (lighter, v2):
python -c "
from huggingface_hub import snapshot_download
snapshot_download('mlx-community/Mistral-7B-Instruct-v0.3-4bit', local_dir='models/mistral-7b-instruct-v0.3-4bit')
"To add book knowledge, place EPUB files in books/ and run:
# Chunk each book into passages
python scripts/chunk_book.py --input books/your_book.epub --output data/rag_chunks_yourbook.jsonl
# Build/update the ChromaDB vector database
python scripts/build_rag_db.py

The app automatically detects the RAG database at startup and uses it if available. Without it, DadAI still works — it just won't have book knowledge.
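The chunking step can be pictured as overlapping word windows, so no sentence is stranded at a chunk boundary. A simplified stand-in for `scripts/chunk_book.py` (the real script parses EPUBs; the window sizes here are made up):

```python
# Split text into overlapping word windows. chunk_words and overlap
# are illustrative defaults, not the repo's actual settings.
def chunk_text(text, chunk_words=200, overlap=40):
    words = text.split()
    step = chunk_words - overlap
    return [
        " ".join(words[i:i + chunk_words])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```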
source .venv/bin/activate
# Prepare data
python scripts/prepare_training_data.py
# Train (~2.5 hours on M1 16GB)
mlx_lm.lora --config training_config_qwen14b.yaml
# Fuse adapter into base model
mlx_lm.fuse \
--model models/qwen2.5-14b-instruct-4bit \
--adapter-path adapters/dadai-qwen14b-lora \
--save-path models/dadai-qwen14b-fused

# Train (~80 min on M1 16GB)
mlx_lm.lora --config training_config.yaml
# Fuse adapter
mlx_lm.fuse \
--model models/mistral-7b-instruct-v0.3-4bit \
--adapter-path adapters/dadai-lora \
--save-path models/dadai-v2-fused

- Method: QLoRA (4-bit quantized base) + LoRA rank 16
- Key fix from v1: `mask_prompt: true` ensures the model only trains on completions
- Memory: Peak ~10-12 GB for 14B, ~7 GB for 7B
- Dataset: 2,260 examples (2,147 Reddit + 113 synthetic)
- Best checkpoint: Selected via A/B evaluation (iteration 400 for 7B, full run for 14B)
- NaN prevention: Sequences > 2,048 tokens pre-filtered to prevent gradient explosion in 4-bit QLoRA
- Config: See `training_config_qwen14b.yaml` for all hyperparameters
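The length pre-filter is simple but load-bearing: one over-long sequence can NaN an entire 4-bit QLoRA run. A sketch of the idea (the `count_tokens` callable stands in for the real tokenizer):

```python
# Drop examples whose tokenized length exceeds the context budget
# before they ever reach the trainer.
def filter_long_examples(examples, count_tokens, max_tokens=2048):
    return [ex for ex in examples if count_tokens(ex) <= max_tokens]
```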
source .venv/bin/activate
python app.py
# Open http://localhost:7860

Uses the fused model with streaming responses. RAG and the cross-encoder reranker load lazily on the first query to keep startup fast. Conversation history (up to 3 turns) is maintained automatically.
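The 3-turn memory amounts to trimming the message list before each generation. A hypothetical sketch in chat-message form (not the repo's exact code):

```python
# Keep any system message plus only the last `max_turns`
# user/assistant exchanges.
def trim_history(messages, max_turns=3):
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-2 * max_turns:]
```

Bounding the history keeps the prompt well inside the context window even in long chats.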
Visit huggingface.co/spaces/benlongi/DadAI
Uses Mistral 7B via HF Inference API with the DadAI system prompt. This is not the fine-tuned model — it's a standard model with DadAI's prompt engineering. For the real experience, run locally.
If you want to rebuild DadAI from zero:
- Set up environment — Follow the Installation steps above
- Collect Reddit data — Create a `.env` with Reddit API credentials (get them here), then `make collect`
- Process data — `make format && make clean && make check`
- Add synthetic data — `python scripts/generate_synthetic_data.py && python scripts/generate_synthetic_v31.py`
- Prepare for training — `python scripts/prepare_training_data.py`
- Download base model — See instructions above
- Train — `mlx_lm.lora --config training_config_qwen14b.yaml` (~2.5 hrs on M1)
- Fuse — `mlx_lm.fuse --model models/qwen2.5-14b-instruct-4bit --adapter-path adapters/dadai-qwen14b-lora --save-path models/dadai-qwen14b-fused`
- Add books for RAG — Place EPUBs in `books/`, chunk with `scripts/chunk_book.py`, index with `scripts/build_rag_db.py`
- Run — `python app.py`
Note: Books are not included in the repo (copyrighted). You'll need to source your own parenting/fatherhood books for RAG. DadAI works without them — you just won't get book-informed responses.
- Always check your training labels. v1's biggest bug: the tokenization was wrong, so the model never learned from completions. `mask_prompt` is essential.
- Prompt template consistency matters. Train and infer with the same format. Use `tokenizer.apply_chat_template()` everywhere.
- MLX makes local fine-tuning real. A MacBook M1 fine-tunes a 14B model in 2.5 hours. No cloud GPU needed.
- Clean data beats more data. 2,260 filtered pairs beat 298 noisy ones. Quality > quantity.
- Early stopping wins. Iteration 400 beat iteration 1000 for the 7B model. Test, don't assume.
- Fine-tuning gives personality. RAG gives knowledge. They're complementary. Fine-tune for how to respond, RAG for what to say.
- Two-stage retrieval matters. A cross-encoder reranker on top of vector search catches what embedding similarity misses.
- Test the bigger model before committing. We tried 24B, measured disk-swapping, pivoted to 14B. Data-driven decisions save time.
- Remove complexity before adding it. V1's LocalAI + Docker + GPTQ pipeline was replaced by a single Gradio file.
- Ship the honest version. Document limitations alongside wins. Every failure teaches something.
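The "one template everywhere" lesson, sketched by hand in the ChatML layout that Qwen models use. This is a stand-in to show the invariant; in the real pipeline `tokenizer.apply_chat_template()` should be the single source of truth:

```python
# One shared formatter for both the training-data builder and the
# inference loop, so prompts can never drift apart. Mirrors the
# ChatML layout; in practice use tokenizer.apply_chat_template().
def to_chatml(messages, add_generation_prompt=True):
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    return text
```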
Benoît Rossignol
MIT