Private, offline, multi‑RAGpack LLM RAG for macOS and iOS. Empower your own AGI — no cloud, no SaaS, just your device and your knowledge. 🚀
The on‑device experience leveled up across macOS and iOS:
- iOS Universal App (iPhone/iPad; shipped)
  - Fresh iOS screenshot available in `docs/assets/noesisnoema_ios.png` (see above)
  - Always‑visible History; QADetail overlays on top (swipe‑down or ✖︎ to close)
  - Multiline input with placeholder, larger tap targets, equal‑width action buttons
  - Global loading lock to prevent duplicate queries; answer only appended once
  - Keyboard UX: tap outside or scroll to dismiss
  - Startup splash overlay: temporary "Noesis Noema" title on launch
- Deep Search retrieval pipeline
  - Two‑stage retrieval: LocalRetriever + QueryIterator with MMR re‑ranking (see the MMR sketch after this feature list)
  - Works across multiple RAGpacks; better relevance and source diversity
  - Tuned defaults per device; fast even on iPhone
- Feedback loop (local‑only)
  - Thumbs up/down captured via RewardBus
  - ParamBandit tunes retrieval params per session (topK, mmrLambda, minScore)
  - 100% offline; no telemetry
- Output hygiene & stability
  - Streaming filter removes `<think>…</think>` and control tokens; stops at `<|im_end|>`
  - Final normalization unifies model differences for clean, copy‑ready answers
  - Runtime guard detects broken llama.framework loads; lightweight SystemLog
- RAGpack import is stricter and safer
  - Validates presence of `chunks.json` and `embeddings.csv` and enforces a count match
  - De‑duplicates identical chunks across multiple RAGpacks
- Experimental support for GPT‑OSS‑20B (`gpt-oss-20b-Q4_K_S`, released 2025‑08‑08). Place the model at `Resources/Models/gpt-oss-20b-Q4_K_S.gguf` and select it in the LLM picker (macOS/iOS).
- LLM Presets (Pocket, Balanced, Pro Max) with device‑tuned defaults (threads, GPU layers, batch size, context length, decoding params).
- Liquid Glass design across macOS and iOS with accessibility‑aware fallbacks (respects Reduce Transparency; solid fallback).
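For reference, here is a minimal sketch of the MMR re‑ranking idea behind Deep Search (a hypothetical helper, not the app's actual QueryIterator):

```swift
/// Maximal Marginal Relevance: pick k chunks that are relevant to the query
/// while penalizing redundancy among the already-selected chunks.
/// `queryScores[i]` = similarity of candidate i to the query;
/// `pairwise[i][j]` = similarity between candidates i and j.
func mmrSelect(queryScores: [Double], pairwise: [[Double]], k: Int, lambda: Double) -> [Int] {
    var selected: [Int] = []
    var remaining = Set(queryScores.indices)
    while selected.count < min(k, queryScores.count), !remaining.isEmpty {
        let best = remaining.max { a, b in
            mmrScore(a, queryScores, pairwise, selected, lambda) <
            mmrScore(b, queryScores, pairwise, selected, lambda)
        }!
        selected.append(best)
        remaining.remove(best)
    }
    return selected
}

private func mmrScore(_ i: Int, _ queryScores: [Double], _ pairwise: [[Double]],
                      _ selected: [Int], _ lambda: Double) -> Double {
    // Relevance minus the strongest overlap with anything already chosen.
    let redundancy = selected.map { pairwise[i][$0] }.max() ?? 0
    return lambda * queryScores[i] - (1 - lambda) * redundancy
}
```

A higher `lambda` favors raw relevance; a lower `lambda` favors source diversity across packs.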
macOS keeps its “workstation feel”; iOS now brings the same private RAG, in your pocket. 📱💻
- Multi‑RAGpack search and synthesis
  - Transversal retrieval across packs (e.g., Kant × Spinoza)
  - Deep Search (query iteration + MMR re‑ranking) with cross‑pack support
- Fast local inference via llama.cpp + GGUF models
- Private by design: fully offline; no analytics; minimal, local SystemLog (no PII)
- Feedback & learning: thumbs up/down feeds ParamBandit to auto‑tune retrieval (session‑scoped, offline)
- Modern UX
  - Two‑pane macOS UI
  - iOS: History always visible; QADetail overlays; copy‑able answers; smooth keyboard handling
  - Multiline question input; equal‑width Ask / Choose RAGpack buttons
- Clean answers, consistently
  - `<think>…</think>` is filtered on the fly; control tokens removed; stop tokens respected
- Thin, future‑proof core
  - llama.cpp through prebuilt xcframeworks (macOS/iOS) with a thin Swift shim
  - Runtime guard + system info log for quick diagnosis
- 100% offline by default. No network calls for inference or retrieval.
- No analytics SDKs. No telemetry is sent.
- SystemLog is local‑only and minimal (device/OS, model name, params, pack hits, latency, failure reasons). You can opt in to share diagnostics.
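As an illustration of how small that log can stay, one entry might look like this (field names here are assumptions, not the app's exact schema):

```swift
import Foundation

/// Illustrative shape of a local-only SystemLog record (assumed field names).
struct SystemLogEntry: Codable {
    let timestamp: Date
    let deviceModel: String        // e.g. "iPhone15,2" or "Mac14,10"
    let osVersion: String
    let llmName: String            // GGUF model file name
    let params: [String: String]   // decoding / retrieval parameters
    let packHits: [String: Int]    // RAGpack name -> retrieved chunk count
    let latencyMs: Int
    let failureReason: String?     // nil on success; never contains PII
}
```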
- macOS 13+ (Apple Silicon recommended) or iOS 17+ (A15/Apple Silicon recommended)
- Prebuilt llama xcframeworks (included in this repo): `llama_macos.xcframework`, `llama_ios.xcframework`
- Models in GGUF format
  - Default expected name: `Jan-v1-4B-Q4_K_M.gguf`
Note (iOS): By default we run a CPU fallback for broad device compatibility; real devices are recommended over the Simulator for performance.
- Open the project in Xcode.
- Select the `NoesisNoema` scheme (macOS app) and press Run.
- Import your RAGpack(s) and start asking questions.
- Select the `NoesisNoemaMobile` scheme (iOS app).
- Run on a real device (recommended).
- Import RAGpack(s) from Files and Ask.
- History stays visible; QADetail appears as an overlay (swipe down or ✖︎ to close).
- Return adds a newline in the input; only the Ask button starts inference.
A tiny runner to verify local inference.
- Build the `LlamaBridgeTest` scheme and run with `-p "your prompt"`.
- Uses the same output cleaning to remove `<think>…</think>`.
RAGpack is a .zip with at least:
- `chunks.json` — ordered list of text chunks
- `embeddings.csv` — embedding vectors aligned by row
- `metadata.json` — optional bag of properties
Importer safeguards:
- Validates presence of `chunks.json` and `embeddings.csv` and enforces a 1:1 count
- De‑duplicates identical chunk+embedding pairs across packs
- Merges new, unique chunks into the in‑memory vector store (a sketch of these checks follows this list)
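A sketch of those checks under assumed types (the real importer also parses the CSV and merges results into the vector store):

```swift
/// Hypothetical import checks: enforce a 1:1 chunk/embedding count and skip
/// chunk+embedding pairs already imported from earlier packs.
struct ImportedPack {
    let chunks: [String]        // parsed from chunks.json
    let embeddings: [[Float]]   // parsed from embeddings.csv
}

enum RAGpackError: Error {
    case countMismatch(chunks: Int, embeddings: Int)
}

func validate(_ pack: ImportedPack) throws {
    guard pack.chunks.count == pack.embeddings.count else {
        throw RAGpackError.countMismatch(chunks: pack.chunks.count,
                                         embeddings: pack.embeddings.count)
    }
}

/// Returns only pairs not already present in `seen`, updating `seen` in place.
func uniquePairs(in pack: ImportedPack, seen: inout Set<String>) -> [(String, [Float])] {
    var unique: [(String, [Float])] = []
    for (chunk, vector) in zip(pack.chunks, pack.embeddings) {
        let key = chunk + "|" + vector.map { String($0) }.joined(separator: ",")
        if seen.insert(key).inserted {
            unique.append((chunk, vector))   // new pair: merge into the vector store
        }
    }
    return unique
}
```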
Tip: Generate RAGpacks with the companion pipeline: noesisnoema-pipeline
- NoesisNoema links llama.cpp via prebuilt xcframeworks. You shouldn't manually embed `llama.framework`; link the xcframework and let Xcode process it.
- Model lookup order (CLI/app): CWD → executable dir → app bundle → `Resources/Models/` → `NoesisNoema/Resources/Models/` → `~/Downloads/`
- Output pipeline:
  - Jan/Qwen‑style prompt where applicable
  - Streaming‑time `<think>` filtering and `<|im_end|>` early‑stop (sketched after this list)
  - Final normalization to erase residual control tokens and self‑labels
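A rough sketch of the streaming‑time cleanup (illustrative only; it assumes tags are not split across streamed pieces, which a production filter must additionally handle by buffering):

```swift
import Foundation

/// Illustrative streaming filter: hides <think>…</think> content and signals
/// early stop when the <|im_end|> marker appears.
struct StreamFilter {
    private var insideThink = false

    /// Returns the visible text for this piece, or nil once generation should stop.
    mutating func accept(_ piece: String) -> String? {
        if piece.contains("<|im_end|>") { return nil }    // early stop
        var rest = Substring(piece)
        var visible = ""
        while !rest.isEmpty {
            if insideThink {
                guard let close = rest.range(of: "</think>") else { return visible }
                rest = rest[close.upperBound...]
                insideThink = false
            } else if let open = rest.range(of: "<think>") {
                visible += rest[..<open.lowerBound]
                rest = rest[open.upperBound...]
                insideThink = true
            } else {
                visible += rest
                rest = ""
            }
        }
        return visible
    }
}
```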
- A17/M‑series: `n_threads = 6–8`, `n_gpu_layers = 999`
- A15–A16: `n_threads = 4–6`, `n_gpu_layers = 40–80`
- Generation length: `max_tokens` 128–256 (short answers), 400–600 (summaries)
- Temperature: 0.2–0.4; Top‑K: 40–80 for stability
These are sensible defaults; you can tune per device/pack.
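For illustration, the Pocket / Balanced / Pro Max presets mentioned earlier could bundle these knobs; the concrete numbers below are simply picked from the ranges above, not the shipped values:

```swift
/// Illustrative device-tuned presets (values taken from the ranges above).
struct LLMPreset {
    var nThreads: Int
    var nGPULayers: Int
    var maxTokens: Int
    var temperature: Double
    var topK: Int
}

extension LLMPreset {
    /// A17 / M-series class devices.
    static let proMax   = LLMPreset(nThreads: 8, nGPULayers: 999, maxTokens: 512, temperature: 0.3, topK: 60)
    /// A15–A16 class devices.
    static let balanced = LLMPreset(nThreads: 6, nGPULayers: 80,  maxTokens: 256, temperature: 0.3, topK: 60)
    /// Conservative fallback for older or thermally constrained devices.
    static let pocket   = LLMPreset(nThreads: 4, nGPULayers: 40,  maxTokens: 128, temperature: 0.2, topK: 40)
}
```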
- iOS
  - Interface preview: see screenshot `noesisnoema_ios.png`
  - Multiline input with placeholder; Return adds a newline (does not send)
  - Only the Ask button can start inference (no accidental double sends)
  - During generation: global overlay lock; all inputs disabled (no duplicate queries)
  - Tap outside or scroll History to dismiss the keyboard
  - QADetail overlays History; close with swipe‑down or ✖︎; answers are text‑selectable and copyable
  - Scroll indicators visible in answers to clarify vertical scroll
- macOS
  - Two‑pane layout with History and Detail; same output cleaning; quick import
- Vendor code (llama.cpp) is not modified. xcframeworks are prebuilt and checked in.
- Thin shim only: adapt the upstream C API in `LibLlama.swift` / `LlamaState.swift`. Other files must not call `llama_*` directly.
- Runtime check: verify `llama.framework` load + symbol presence on startup and log `llama_print_system_info()` (see the sketch after this list).
- If upstream bumps break builds, fix the shim layer and add a unit test before merging.
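A rough sketch of such a startup guard using `dlsym` (the mechanism is an assumption; the shipped check lives in the shim layer):

```swift
import Darwin

/// Sketch of a startup guard: confirm a core llama.cpp symbol resolves and,
/// if so, log the output of llama_print_system_info().
func verifyLlamaFramework() -> Bool {
    guard let handle = dlopen(nil, RTLD_NOW),
          let sym = dlsym(handle, "llama_print_system_info") else {
        return false   // framework not loaded or symbol missing: fail fast with a clear message
    }
    typealias SystemInfoFn = @convention(c) () -> UnsafePointer<CChar>?
    let systemInfo = unsafeBitCast(sym, to: SystemInfoFn.self)
    if let cstr = systemInfo() {
        print("llama system info:", String(cString: cstr))   // would go to the local SystemLog
    }
    return true
}
```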
- Accuracy: run same question ×3; verify gist stability at low temperature (0.2–0.4)
- Latency: measure p50/p90 for short/long prompts and multi‑pack queries; split warm vs warm+1
- Memory/Thermals: 10‑question loop; consider thread scaling when throttled
- Failure modes: empty/huge/broken packs; missing model path; user‑facing messages
- Output hygiene: ensure `<think>`/control tokens are absent; newlines preserved
- History durability: ~100 items; startup time and scroll smoothness
- Battery: 15‑minute session; confirm best params per device
- Privacy: verify network off; no analytics; README/UI clearly state offline
- `dyld: Library not loaded: @rpath/llama.framework`
  - Clean the build folder and DerivedData
  - Link the xcframework only (no manual embed)
  - Ensure Runpath Search Paths include `@executable_path`, `@loader_path`, `@rpath`
- Multiple commands produce `llama.framework`
  - Remove the manual "Embed Frameworks/Copy Files" phase for the framework; rely on the xcframework
- Model not found
  - Place the model in one of the searched locations or pass an absolute path (CLI)
- iOS keyboard won't hide
  - Tap outside the input or scroll History to dismiss
- Output includes control tags or `<think>`
  - Ensure you're on the latest build; the streaming filter + final normalizer should keep answers clean
- iOS Simulator is slower and may not reflect real thermals. Prefer running on device.
- Very large RAGpacks can increase memory usage. Prefer chunking and MMR re‑ranking.
- If you still see `<think>` in answers, capture logs and open an issue (model‑specific templates can slip through).
- Where is `scripts/build_xcframework.sh`?
  - Not included yet. Prebuilt `llama_*.xcframework` are provided in this repo. If you need to rebuild, use the upstream llama.cpp build instructions and replace the frameworks under `Frameworks/`.
- iOS universal polishing (iPad layouts, sharing/export)
- Enhanced right pane: chunk/source/document previews
- Power/thermal controls (device‑aware throttling)
- Cloudless peer‑to‑peer sync
- Plugin/API extensibility
- CI for App targets
- RAGfish: Core RAGpack specification and toolkit 📚
- noesisnoema-pipeline: Generate your own RAGpacks from PDF/text 💡
We welcome Designers, Swift/AI/UX developers, and documentation writers. Open an issue or PR, or join our discussions. See also RAGfish for the pack spec.
PR Checklist (policy):
- llama.cpp vendor frameworks unchanged
- Changes limited to `LibLlama.swift` / `LlamaState.swift` for core llama integration
- Smoke/Golden/RAG tests passed locally
This project is not just code — it’s our exploration of private AGI, blending philosophy and engineering. Each commit is a step toward tools that respect autonomy, curiosity, and the joy of building. Stay curious, and contribute if it resonates with you.
🌟
Your knowledge. Your device. Your rules.
- What: A lightweight bandit that dynamically selects retrieval parameters (top_k, mmr_lambda, min_score) per query cluster.
- Why: Quickly improves relevance with minimal feedback and makes the system feel like it is learning.
- Where: Immediately before the retrieval pipeline, which in turn runs just before the generator.
- How:
  - Maintains a Beta(α, β) distribution for each arm (parameter set) and selects arms via Thompson Sampling (sketched below).
  - Updates α/β from feedback events (👍/👎) published on the RewardBus.
  - Example default arms: k4/l0.7/s0.20, k5/l0.9/s0.10, k6/l0.7/s0.15, k8/l0.5/s0.15.
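A self‑contained sketch of the Beta/Thompson‑Sampling mechanics (a generic illustration, not the ParamBandit source):

```swift
import Foundation

/// One arm = one retrieval parameter set with a Beta(α, β) belief over its reward.
struct BetaArm {
    let params: (topK: Int, mmrLambda: Double, minScore: Double)
    var alpha = 1.0   // prior successes + 1
    var beta  = 1.0   // prior failures + 1

    /// Draw from Beta(α, β) via two integer-shape Gamma draws (α, β stay integers here).
    func sample() -> Double {
        func gamma(_ shape: Double) -> Double {
            (0..<Int(shape)).reduce(0.0) { acc, _ in acc - log(Double.random(in: 1e-12..<1.0)) }
        }
        let x = gamma(alpha), y = gamma(beta)
        return x / (x + y)
    }

    mutating func update(reward up: Bool) {
        if up { alpha += 1 } else { beta += 1 }   // 👍 bumps α, 👎 bumps β
    }
}

struct MiniBandit {
    var arms: [BetaArm]
    /// Thompson Sampling: sample every arm once and play the best draw.
    func choose() -> Int {
        let draws = arms.map { $0.sample() }
        return draws.indices.max { draws[$0] < draws[$1] }!
    }
}
```

Arms that keep earning 👍 produce sharper, higher Beta draws, so the sampler converges toward them while still occasionally exploring the others.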
Usage example (integration concept)
- Call ParamBandit just before existing LocalRetriever usage points, and perform retrieval with the returned parameters.
- On the UI side, trigger RewardBus.shared.publish(qaId:verdict:tags:) upon user feedback (👍/👎).
Simplified flow:
- `let qa = UUID()`
- `let choice = ParamBandit.default.chooseParams(for: query, qaId: qa)`
- `let ps = choice.arm.params` // topK, mmrLambda, minScore
- `let chunks = LocalRetriever(store: .shared).retrieve(query: query, k: ps.topK, lambda: ps.mmrLambda)`
- Filter by `minScore` for similarity (see BanditRetriever)
- On user evaluation, call `RewardBus.shared.publish(qaId: qa, verdict: .up/.down, tags: …)`
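The same flow as a compilable sketch (type and method names are taken from this section; the exact signatures and the per-chunk `score` field are assumptions):

```swift
import Foundation

// Sketch of the integration flow described above.
func askWithBandit(query: String) {
    let qa = UUID()

    // 1. Let the bandit pick retrieval parameters for this query.
    let choice = ParamBandit.default.chooseParams(for: query, qaId: qa)
    let ps = choice.arm.params                       // topK, mmrLambda, minScore

    // 2. Retrieve with those parameters; drop low-similarity chunks (see BanditRetriever).
    let chunks = LocalRetriever(store: .shared)
        .retrieve(query: query, k: ps.topK, lambda: ps.mmrLambda)
        .filter { $0.score >= ps.minScore }          // assumes a per-chunk similarity score

    // 3. ...generate the answer from `chunks`...
    _ = chunks

    // 4. When the user rates the answer, close the feedback loop.
    RewardBus.shared.publish(qaId: qa, verdict: .up, tags: [])   // or .down; tags omitted here
}
```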
Tests and Definition of Done (DoD)
- Unit: Verify initial α=1, β=1, and that 👍 increments α and 👎 increments β (add to TestRunner, skip in CLI build).
- Integration: Confirm preference converges to the best arm with composite rewards (same as above).
- DoD: Add ParamBandit as an independent service, integrate with RewardBus, define default arms, and provide lightweight documentation (this section).
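A minimal unit‑test sketch of the α/β update rule, written against the illustrative `BetaArm` from the sketch above rather than the shipped `ParamBandit`:

```swift
import XCTest

final class BetaArmUpdateTests: XCTestCase {
    func testPriorAndFeedbackUpdates() {
        var arm = BetaArm(params: (topK: 4, mmrLambda: 0.7, minScore: 0.20))

        // Prior starts uniform: Beta(1, 1).
        XCTAssertEqual(arm.alpha, 1.0)
        XCTAssertEqual(arm.beta, 1.0)

        arm.update(reward: true)     // 👍 increments α
        XCTAssertEqual(arm.alpha, 2.0)
        XCTAssertEqual(arm.beta, 1.0)

        arm.update(reward: false)    // 👎 increments β
        XCTAssertEqual(arm.alpha, 2.0)
        XCTAssertEqual(arm.beta, 2.0)
    }
}
```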