Private, offline, multi‑RAGpack LLM RAG for macOS and iOS. Empower your own AGI — no cloud, no SaaS, just your device and your knowledge. 🚀
This release delivers the first fully unified and polished mobile + desktop experience.
- Full‑screen edge‑to‑edge layout (iPhone 17 Pro Max verified)
- New compact header with philosophical manuscript background (Husserl)
- Stable layout across light/dark mode
- Restored correct answer rendering (white‑on‑white bug fixed)
- RAG answers now load reliably on‑device using the unified pipeline
- History view correctly loads threads and supports per‑question detail
- Parity refinements with the new iOS interface
- Improved consistency in RAG retrieval behavior
- Unified RAGpack v2 pipeline
- Cleaner answer normalization
- Reduced UI blocking during inference
- Pre‑flight guards around model loading and tokenizer workflows
- Eliminated inconsistent safe‑area behavior across navigation wrappers
- Fixed residual navigation‑controller padding issues from older builds
During v0.3 development, real‑device testing on iPhone 17 Pro Max surfaced several bottlenecks. These findings now drive our v0.4 optimization cycle.
- Tokenizer execution performing work on the main thread
- Repeated loading of embeddings, tokenizer vocab, and metadata
- Non‑streaming generation resulting in synchronous UI stalls
- RAGpack v2 `.zip` extraction missing an effective caching layer
- Oversized default context window causing unnecessary compute
- Swift Concurrency task switching overhead during retrieval
These are addressed on the `feature/rag-perf-optimization-2025` branch.
- Move tokenizer and embedding lookup off the MainActor
- Preload embeddings asynchronously at app startup
- Implement llama.cpp streaming callbacks to eliminate blocking
- Introduce aggressive caching layers for embeddings, tokenizer vocab, and RAGpack metadata
- Dynamically scale context window based on query type
- Add precise instrumentation for each phase (tokenize / retrieve / generate)
- Maintain API compatibility across macOS & iOS targets
These changes aim to deliver a smoother, significantly faster private‑RAG experience.
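As a sketch of the first two items (off‑MainActor tokenization and asynchronous preloading), assuming hypothetical `Tokenizer` / `EmbeddingStore` types rather than the app's real classes:

```swift
import Foundation

// Illustrative stand-ins; the real tokenizer/embedding APIs differ.
struct Tokenizer {
    func encode(_ text: String) -> [Int32] { Array(text.utf8).map { Int32($0) } }
}
struct EmbeddingStore {
    static func load(from url: URL) throws -> EmbeddingStore { EmbeddingStore() }
}

// A dedicated actor keeps tokenization and embedding lookups off the MainActor,
// so the UI never blocks on retrieval-side work.
actor RetrievalEngine {
    private let tokenizer = Tokenizer()
    private var embeddings: EmbeddingStore?

    // Preload heavy assets once, e.g. from a background Task at app startup.
    func preload(embeddingsURL: URL) throws {
        if embeddings == nil {
            embeddings = try EmbeddingStore.load(from: embeddingsURL)
        }
    }

    func tokenize(_ query: String) -> [Int32] {
        tokenizer.encode(query)
    }
}

// From MainActor code:
// Task {
//     try await engine.preload(embeddingsURL: packURL)
//     let tokens = await engine.tokenize(userQuery)   // runs off the main thread
// }
```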
- Multi‑RAGpack search and synthesis
- Transversal retrieval across packs (e.g., Kant × Spinoza)
- Deep Search (query iteration + MMR re‑ranking) with cross‑pack support (see the MMR sketch after this list)
- Fast local inference via llama.cpp + GGUF models
- Private by design: fully offline; no analytics; minimal, local SystemLog (no PII)
- Feedback & learning: thumbs up/down feeds ParamBandit to auto‑tune retrieval (session‑scoped, offline)
- Modern UX
- Two‑pane macOS UI
- iOS (v0.3)
- Stable full‑screen layout; compact header with manuscript background
- Multiline input restored with proper dark/light mode rendering
- Clear Ask / History / Settings tab design
- QADetail overlays functioning with correct dismiss gestures
- Reliable answer rendering with proper color handling
- Clean answers, consistently: `<think>…</think>` is filtered on the fly; control tokens removed; stop tokens respected
- Thin, future‑proof core
- llama.cpp through prebuilt xcframeworks (macOS/iOS) with a thin Swift shim
- Runtime guard + system info log for quick diagnosis
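The Deep Search re‑ranking step mentioned above is based on Maximal Marginal Relevance (MMR). Below is a minimal, generic sketch of MMR over cosine similarity; it illustrates the technique only and is not the app's actual Deep Search implementation (function names are illustrative):

```swift
// Maximal Marginal Relevance: greedily pick chunks that are relevant to the
// query while penalizing redundancy with chunks already selected.
// lambda = 1.0 → pure relevance; lambda = 0.0 → pure diversity.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
    let na = a.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    let nb = b.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    return (na > 0 && nb > 0) ? dot / (na * nb) : 0
}

func mmrRerank(query: [Float], candidates: [[Float]], k: Int, lambda: Float) -> [Int] {
    var selected: [Int] = []
    var remaining = Set(candidates.indices)

    // MMR score: lambda * relevance − (1 − lambda) * max similarity to picks so far.
    func mmrScore(_ i: Int) -> Float {
        let relevance = cosine(query, candidates[i])
        let redundancy = selected.map { cosine(candidates[i], candidates[$0]) }.max() ?? 0
        return lambda * relevance - (1 - lambda) * redundancy
    }

    while selected.count < min(k, candidates.count),
          let best = remaining.max(by: { mmrScore($0) < mmrScore($1) }) {
        selected.append(best)
        remaining.remove(best)
    }
    return selected  // indices into `candidates`, best first
}
```

A cross‑pack Deep Search pass would run this re‑ranking over the merged candidate set retrieved from all selected packs.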
- 100% offline by default. No network calls for inference or retrieval.
- No analytics SDKs. No telemetry is sent.
- SystemLog is local‑only and minimal (device/OS, model name, params, pack hits, latency, failure reasons). You can opt‑in to share diagnostics.
- macOS 13+ (Apple Silicon recommended) or iOS 17+ (A15/Apple Silicon recommended)
- Prebuilt llama xcframeworks (included in this repo): `llama_macos.xcframework`, `llama_ios.xcframework`
- Models in GGUF format
- Default expected name: `Jan-v1-4B-Q4_K_M.gguf`
Note (iOS): By default we run CPU fallback for broad device compatibility; real devices are recommended over the simulator for performance.
- Open the project in Xcode.
- Select the `NoesisNoema` scheme and press Run.
- Import your RAGpack(s) and start asking questions.
- Select the `NoesisNoemaMobile` scheme.
- Run on a real device (recommended).
- Import RAGpack(s) from Files and Ask.
- History stays visible; QADetail appears as an overlay (swipe down or ✖︎ to close).
- Return adds a newline in the input; only the Ask button starts inference.
A tiny runner to verify local inference.
- Build the `LlamaBridgeTest` scheme and run with `-p "your prompt"`.
- Uses the same output cleaning to remove `<think>…</think>`.
A RAGpack is a `.zip` archive containing at least:
- `chunks.json` — ordered list of text chunks
- `embeddings.csv` — embedding vectors aligned by row
- `metadata.json` — optional bag of properties
Importer safeguards:
- Validates presence of `chunks.json` and `embeddings.csv` and enforces a 1:1 count
- De‑duplicates identical chunk+embedding pairs across packs
- Merges new, unique chunks into the in‑memory vector store
Tip: Generate RAGpacks with the companion pipeline: noesisnoema-pipeline
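A minimal sketch of the validation rules above, assuming the pack has already been unzipped to a directory and that `chunks.json` is a JSON array of strings; types and function names are illustrative, not the actual importer:

```swift
import Foundation

// Mirrors the importer safeguards: both files present, 1:1 chunk/embedding count.
struct RAGPackError: Error { let message: String }

func validateRAGPack(at dir: URL) throws -> (chunks: [String], embeddings: [[Float]]) {
    let chunksURL = dir.appendingPathComponent("chunks.json")
    let embeddingsURL = dir.appendingPathComponent("embeddings.csv")

    guard FileManager.default.fileExists(atPath: chunksURL.path),
          FileManager.default.fileExists(atPath: embeddingsURL.path) else {
        throw RAGPackError(message: "chunks.json and embeddings.csv are required")
    }

    let chunks = try JSONDecoder().decode([String].self, from: Data(contentsOf: chunksURL))

    // One CSV row per chunk; each row is a comma-separated vector.
    let rows = try String(contentsOf: embeddingsURL, encoding: .utf8)
        .split(whereSeparator: \.isNewline)
    let embeddings = rows.map { $0.split(separator: ",").compactMap { Float($0) } }

    guard chunks.count == embeddings.count else {
        throw RAGPackError(message: "chunk/embedding count mismatch: \(chunks.count) vs \(embeddings.count)")
    }
    return (chunks, embeddings)
}
```

De‑duplication and the merge into the in‑memory vector store would then operate on the returned pairs.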
- NoesisNoema links llama.cpp via prebuilt xcframeworks. You shouldn’t manually embed `llama.framework`; link the xcframework and let Xcode process it.
- Model lookup order (CLI/app): CWD → executable dir → app bundle → `Resources/Models/` → `NoesisNoema/Resources/Models/` → `~/Downloads/`
- Output pipeline:
- Jan/Qwen‑style prompt where applicable
- Streaming‑time `<think>` filtering and `<|im_end|>` early‑stop
- Final normalization to erase residual control tokens and self‑labels
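A minimal sketch of the final normalization idea, assuming a plain regex pass over the completed answer (the real pipeline also filters during streaming, and the exact token list may differ):

```swift
import Foundation

// Strip reasoning spans and residual control tokens from a finished answer.
// A streaming variant would track "inside <think>" state across tokens;
// this sketch shows only the final normalization pass.
func normalizeAnswer(_ raw: String) -> String {
    var text = raw
    // Remove <think>…</think> blocks (non-greedy, across newlines).
    text = text.replacingOccurrences(
        of: "<think>[\\s\\S]*?</think>",
        with: "",
        options: .regularExpression
    )
    // Remove common chat-template control tokens.
    for token in ["<|im_start|>", "<|im_end|>", "<|endoftext|>"] {
        text = text.replacingOccurrences(of: token, with: "")
    }
    return text.trimmingCharacters(in: .whitespacesAndNewlines)
}

// normalizeAnswer("<think>plan…</think>Kant distinguishes…<|im_end|>")
// → "Kant distinguishes…"
```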
- A17/M‑series: `n_threads = 6–8`, `n_gpu_layers = 999`
- A15–A16: `n_threads = 4–6`, `n_gpu_layers = 40–80`
- Generation length: `max_tokens` 128–256 (short answers), 400–600 (summaries)
- Temperature: 0.2–0.4, Top‑K: 40–80 for stability
These are sensible defaults; you can tune per device/pack.
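As an illustration, the table above could be captured in a small defaults helper; the tier detection and the exact values picked here are assumptions, not the app's actual configuration code:

```swift
// The tuning table above as a small helper. Device-tier detection is
// illustrative; the app may detect capability differently.
struct GenerationParams {
    var nThreads: Int32
    var nGpuLayers: Int32
    var maxTokens: Int32
    var temperature: Float = 0.3
    var topK: Int32 = 60
}

enum DeviceTier { case a17OrAppleSilicon, a15a16 }

func defaultParams(for tier: DeviceTier, longForm: Bool = false) -> GenerationParams {
    switch tier {
    case .a17OrAppleSilicon:
        return GenerationParams(nThreads: 6, nGpuLayers: 999, maxTokens: longForm ? 512 : 192)
    case .a15a16:
        return GenerationParams(nThreads: 4, nGpuLayers: 64, maxTokens: longForm ? 448 : 160)
    }
}
```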
- iOS v0.3
- Full‑screen, stable layout using updated HostingController stack
- Correct rendering across dark/light modes
- Persistent History tab with selectable past threads
- Clean answer display with scroll indicators
- Manuscript‑style header background with automatic scaling
- macOS
- Two‑pane layout with History and Detail; same output cleaning; quick import
- Vendor code (llama.cpp) is not modified. xcframeworks are prebuilt and checked in.
- Thin shim only: adapt the upstream C API in `LibLlama.swift` / `LlamaState.swift`. Other files must not call `llama_*` directly.
- Runtime check: verify `llama.framework` load + symbol presence on startup and log `llama_print_system_info()`.
- If upstream bumps break builds, fix the shim layer and add a unit test before merging.
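A sketch of that startup diagnostic, assuming the C symbol is visible through the shim's module (the wrapper name and import are illustrative); `llama_print_system_info()` itself is the upstream llama.cpp call:

```swift
import Foundation
// import llama   // C API exposed by the prebuilt xcframework (module name illustrative)

// Startup diagnostic: log the llama.cpp capability string (NEON/Metal/etc.) once.
// If the framework failed to load, the process would normally abort at dyld time,
// so reaching this call already implies the symbols are present.
func logLlamaSystemInfo() {
    let info = String(cString: llama_print_system_info())
    NSLog("llama system info: %@", info)
}
```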
- Accuracy: run same question ×3; verify gist stability at low temperature (0.2–0.4)
- Latency: measure p50/p90 for short/long prompts and multi‑pack queries; split warm vs warm+1
- Memory/Thermals: 10‑question loop; consider thread scaling when throttled
- Failure modes: empty/huge/broken packs; missing model path; user‑facing messages
- Output hygiene: ensure `<think>`/control tokens are absent; newlines preserved
- History durability: ~100 items; startup time and scroll smoothness
- Battery: 15‑minute session; confirm best params per device
- Privacy: verify network off; no analytics; README/UI clearly state offline
- `dyld: Library not loaded: @rpath/llama.framework`
  - Clean the build folder and DerivedData
  - Link the xcframework only (no manual embed)
  - Ensure Runpath Search Paths include `@executable_path`, `@loader_path`, `@rpath`
- Multiple commands produce `llama.framework`
  - Remove the manual “Embed Frameworks/Copy Files” phase for the framework; rely on the xcframework
- Model not found
  - Place the model in one of the searched locations or pass an absolute path (CLI)
- iOS keyboard won’t hide
  - Tap outside the input or scroll History to dismiss
- Output includes control tags or `<think>`
  - Ensure you’re on the latest build; the streaming filter + final normalizer should keep answers clean
- iOS Simulator is slower and may not reflect real thermals. Prefer running on device.
- Very large RAGpacks can increase memory usage. Prefer chunking and MMR re‑ranking.
- If you still see `<think>` in answers, capture logs and open an issue (model‑specific templates can slip through).
- Where is `scripts/build_xcframework.sh`?
  - Not included yet. Prebuilt `llama_*.xcframework` bundles are provided in this repo. If you need to rebuild, use the upstream llama.cpp build instructions and replace the frameworks under `Frameworks/`.
- iOS universal polishing (iPad layouts, sharing/export)
- Enhanced right pane: chunk/source/document previews
- Power/thermal controls (device‑aware throttling)
- Cloudless peer‑to‑peer sync
- Plugin/API extensibility
- CI for App targets
- RAGfish: Core RAGpack specification and toolkit 📚
- noesisnoema-pipeline: Generate your own RAGpacks from PDF/text 💡
We welcome designers, Swift/AI/UX developers, and documentation writers. Open an issue or PR, or join our discussions. See also RAGfish for the pack spec.
PR Checklist (policy):
- llama.cpp vendor frameworks unchanged
- Changes limited to `LibLlama.swift` / `LlamaState.swift` for core llama integration
- Smoke/Golden/RAG tests passed locally
This project is not just code — it’s our exploration of private AGI, blending philosophy and engineering. Each commit is a step toward tools that respect autonomy, curiosity, and the joy of building. Stay curious, and contribute if it resonates with you.
🌟
Your knowledge. Your device. Your rules.
- What: A lightweight bandit that dynamically selects retrieval parameters (`top_k`, `mmr_lambda`, `min_score`) per query cluster.
- Why: Quickly improves relevance from minimal feedback and makes the system feel like it is actively learning.
- Where: Immediately before the retrieval pipeline, upstream of the generator.
- How:
- Maintains Beta(α,β) distributions for each arm (parameter set) and selects using Thompson Sampling.
- Updates α/β based on feedback events (👍/👎) from the RewardBus.
- Example default arms: `k4/l0.7/s0.20`, `k5/l0.9/s0.10`, `k6/l0.7/s0.15`, `k8/l0.5/s0.15`.
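A minimal, self‑contained sketch of the Beta/Thompson Sampling mechanics described above; `BanditArm`, `sampleBeta`, and `record` are illustrative names, not the actual ParamBandit API:

```swift
import Foundation

// Thompson Sampling over Beta(alpha, beta) posteriors: draw one sample per arm
// and play the arm with the highest draw.
struct BanditArm {
    let name: String           // e.g. "k4/l0.7/s0.20"
    var alpha: Double = 1.0    // 1 + number of 👍
    var beta: Double = 1.0     // 1 + number of 👎
}

// For integer alpha/beta (our case: counts starting at 1), a Beta(a, b) draw is
// the a-th smallest of (a + b - 1) independent uniforms. Fine for a sketch; a
// production sampler would use a gamma-based method for large counts.
func sampleBeta(alpha: Int, beta: Int) -> Double {
    let n = alpha + beta - 1
    let uniforms = (0..<n).map { _ in Double.random(in: 0..<1) }.sorted()
    return uniforms[alpha - 1]
}

var arms = [BanditArm(name: "k4/l0.7/s0.20"), BanditArm(name: "k5/l0.9/s0.10"),
            BanditArm(name: "k6/l0.7/s0.15"), BanditArm(name: "k8/l0.5/s0.15")]

// Choose an arm: one posterior draw per arm, then argmax.
let draws = arms.map { sampleBeta(alpha: Int($0.alpha), beta: Int($0.beta)) }
let chosen = draws.indices.max(by: { draws[$0] < draws[$1] })!

// Feedback update: 👍 increments alpha, 👎 increments beta.
func record(feedbackUp: Bool, armIndex: Int) {
    if feedbackUp { arms[armIndex].alpha += 1 } else { arms[armIndex].beta += 1 }
}
```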
Usage example (integration concept)
- Call `ParamBandit` just before existing `LocalRetriever` usage points, and perform retrieval with the returned parameters.
- On the UI side, trigger `RewardBus.shared.publish(qaId:verdict:tags:)` upon user feedback (👍/👎).
Simplified flow:
- `let qa = UUID()`
- `let choice = ParamBandit.default.chooseParams(for: query, qaId: qa)`
- `let ps = choice.arm.params` — `topK`, `mmrLambda`, `minScore`
- `let chunks = LocalRetriever(store: .shared).retrieve(query: query, k: ps.topK, lambda: ps.mmrLambda)`
- Filter by `minScore` on similarity (see `BanditRetriever`)
- On user feedback, call `RewardBus.shared.publish(qaId: qa, verdict: .up/.down, tags: …)`
Tests and Definition of Done (DoD)
- Unit: Verify initial α=1, β=1, and that 👍 increments α and 👎 increments β (add to TestRunner, skip in CLI build).
- Integration: Confirm preference converges to the best arm with composite rewards (same as above).
- DoD: Add ParamBandit as an independent service, integrate with RewardBus, define default arms, and provide lightweight documentation (this section).
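As a sketch of the unit check described above, written against the illustrative `BanditArm` / `sampleBeta` names from the sampling sketch rather than the real ParamBandit/TestRunner wiring:

```swift
import XCTest

// Mirrors the DoD: arms start at Beta(1, 1); 👍 bumps alpha, 👎 bumps beta;
// Beta samples stay in the unit interval.
final class BanditArmTests: XCTestCase {
    func testInitialPriorAndFeedbackUpdates() {
        var arm = BanditArm(name: "k4/l0.7/s0.20")
        XCTAssertEqual(arm.alpha, 1.0)
        XCTAssertEqual(arm.beta, 1.0)

        arm.alpha += 1   // one 👍 event
        arm.beta += 2    // two 👎 events
        XCTAssertEqual(arm.alpha, 2.0)
        XCTAssertEqual(arm.beta, 3.0)
    }

    func testBetaSampleStaysInUnitInterval() {
        for _ in 0..<100 {
            let s = sampleBeta(alpha: 3, beta: 2)
            XCTAssertTrue((0.0...1.0).contains(s))
        }
    }
}
```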