- create venv: `python3 -m venv .venv`
- activate: `source .venv/bin/activate`
- install deps: `python3 -m pip install -r requirements.txt`
- run: `python3 __main__.py` or `./__main__.py`
- non-interactive usage: `./__main__.py --prompt "how to create a list"`
Python documentation helper for questions like "how do I split a string by spaces?"
- smolagents[^1] for agentic RAG plus light formatting of the answer
- Real Python docs (retrieved based on your `python3 --version`)
- Embed each `.txt` file in `docs/` with an embedding model and store the vectors (maybe add a cache); see the embedding sketch after this list
- Query the documentation through the agent -> find relevant docs -> repeat -> output a concise answer (agent sketch below)
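A minimal sketch of the embed-and-cache step, assuming `sentence-transformers` and NumPy; the model name, the flat `docs/` layout, and the cache path are placeholders, not settled choices:

```python
# Sketch: embed every .txt file under docs/ and cache the vectors on disk.
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

DOCS_DIR = Path("docs")
CACHE_FILE = Path(".embeddings.npz")

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

if CACHE_FILE.exists():
    cache = np.load(CACHE_FILE, allow_pickle=True)
    names, vectors = list(cache["names"]), cache["vectors"]
else:
    files = sorted(DOCS_DIR.glob("*.txt"))
    names = [f.name for f in files]
    vectors = model.encode(
        [f.read_text() for f in files], normalize_embeddings=True
    )
    np.savez(CACHE_FILE, names=np.array(names), vectors=vectors)
```

The agent loop could then look roughly like this with smolagents, reusing `model`, `names`, and `vectors` from the sketch above; the tool body and the Ollama model id are assumptions:

```python
# Sketch: expose retrieval as a smolagents tool and let the agent iterate.
from smolagents import CodeAgent, LiteLLMModel, tool

@tool
def retrieve_docs(query: str) -> str:
    """Return the documentation snippets most relevant to the query.

    Args:
        query: A natural-language question about Python.
    """
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q  # cosine similarity (vectors are normalized)
    top = scores.argsort()[-3:][::-1]
    return "\n\n".join((DOCS_DIR / names[i]).read_text() for i in top)

# Placeholder local model served by Ollama via LiteLLM.
llm = LiteLLMModel(model_id="ollama_chat/llama3.2", api_base="http://localhost:11434")
agent = CodeAgent(tools=[retrieve_docs], model=llm)
print(agent.run("how do I split a string by spaces?"))
```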
- ease of use: interactive chat mode with syntax highlighting, plus a non-interactive CLI that prints raw text
- speed: it has to be faster than googling and waiting for Gemini to generate an explanation
- quality: it has to carry the knowledge needed to assist with LeetCode/Advent of Code style programs; 100% coverage is not required, but built-in standard-library knowledge is
- local first / offline mode: fall back to Ollama (maybe focus solely on it if that does not sacrifice too much quality and speed); see the fallback sketch below
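One possible shape for the local-first fallback, probing Ollama's default local endpoint; purely a sketch, with placeholder model ids:

```python
# Sketch: prefer a local Ollama server; fall back to a remote model otherwise.
import urllib.request

def pick_model_id() -> str:
    try:
        # Ollama answers plain HTTP on its default port when running.
        urllib.request.urlopen("http://localhost:11434", timeout=1)
        return "ollama_chat/llama3.2"      # placeholder local model
    except OSError:
        return "gemini/gemini-1.5-flash"   # placeholder remote fallback
```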
- actually read references 3-6
- Compare the complexity of the task to the LIMIT[^2] dataset
- Explore how HyDE[^3] increases quality; see the HyDE sketch after this list
- Explore how instruction-trained retrievers[^4] increase quality
- Compare with multi-vector[^5][^6] and lexical search[^7] (BM25 sketch below)
- Formalize the task, find cherry-picked examples, and assemble a dataset
- Multiple languages: query in Russian with documents in English vs. query and documents both in English; assess Russian embedding models
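For the HyDE item: the idea is to embed a hypothetical answer instead of the raw query. A minimal sketch, where `generate()` stands in for whatever LLM backend ends up being used:

```python
# HyDE sketch: embed a hypothetical document rather than the raw query.
# generate() is a hypothetical helper, not a real API.
def hyde_vector(query: str):
    hypothetical = generate(f"Write a short, documentation-style answer to: {query}")
    return model.encode([hypothetical], normalize_embeddings=True)[0]
```

And the lexical baseline can be tiny, assuming the `rank_bm25` package and the `names`/`DOCS_DIR` from the embedding sketch:

```python
# BM25 sketch: lexical retrieval baseline over the same .txt corpus.
from rank_bm25 import BM25Okapi

corpus = [(DOCS_DIR / n).read_text() for n in names]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
scores = bm25.get_scores("how to split a string by spaces".lower().split())
best = corpus[scores.argmax()]  # top lexical hit
```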
@misc{weller2025theoreticallimitationsembeddingbasedretrieval,
title={On the Theoretical Limitations of Embedding-Based Retrieval},
author={Orion Weller and Michael Boratko and Iftekhar Naim and Jinhyuk Lee},
year={2025},
eprint={2508.21038},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2508.21038},
}

@misc{gao2022precisezeroshotdenseretrieval,
title={Precise Zero-Shot Dense Retrieval without Relevance Labels},
author={Luyu Gao and Xueguang Ma and Jimmy Lin and Jamie Callan},
year={2022},
eprint={2212.10496},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2212.10496},
}

@misc{weller2024promptrieverinstructiontrainedretrieversprompted,
title={Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models},
author={Orion Weller and Benjamin Van Durme and Dawn Lawrie and Ashwin Paranjape and Yuhao Zhang and Jack Hessel},
year={2024},
eprint={2409.11136},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2409.11136},
}

@article{robertson1995okapi,
title={{Okapi at TREC-3}},
author={Robertson, Stephen E and Walker, Stephen and Hancock-Beaulieu, Micheline M and Gatford, Mark},
journal={Proceedings of the Third Text REtrieval Conference (TREC-3)},
year={1995},
pages={109--122}
}

@misc{modernbert,
title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
year={2024},
eprint={2412.13663},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13663},
}
@misc{nussbaum2024nomic,
title={Nomic Embed: Training a Reproducible Long Context Text Embedder},
author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
year={2024},
eprint={2402.01613},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2402.01613},
}

@misc{smolagents,
title = {`smolagents`: a smol library to build great agentic systems.},
author = {Aymeric Roucher and Albert Villanova del Moral and Thomas Wolf and Leandro von Werra and Erik Kaunismäki},
howpublished = {\url{https://github.com/huggingface/smolagents}},
year = {2025}
}

Footnotes

[^1]: smolagents: a smol library to build great agentic systems.
[^2]: On the Theoretical Limitations of Embedding-Based Retrieval
[^3]: Precise Zero-Shot Dense Retrieval without Relevance Labels
[^4]: Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
[^5]: Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
[^6]: Nomic Embed: Training a Reproducible Long Context Text Embedder
[^7]: Okapi at TREC-3