Official UI | UI Code | Stable API | Blog
Keep searching, reading webpages, reasoning until an answer is found (or the token budget is exceeded). Useful for deeply investigating a query.
Important
Unlike OpenAI/Gemini/Perplexity's "Deep Research", we focus solely on finding the right answers via our iterative process. We don't optimize for long-form articles; that's a completely different problem. So if you need quick, concise answers from deep search, you're in the right place. If you're looking for AI-generated long reports like OpenAI/Gemini/Perplexity produce, this isn't for you.
```mermaid
---
config:
  theme: mc
  look: handDrawn
---
flowchart LR
    subgraph Loop["until budget exceeded"]
        direction LR
        Search["Search"]
        Read["Read"]
        Reason["Reason"]
    end
    Query(["Query"]) --> Loop
    Search --> Read
    Read --> Reason
    Reason --> Search
    Loop --> Answer(["Answer"])
```
Whether you like this implementation or not, I highly recommend reading the DeepSearch/DeepResearch implementation guide I wrote, which gives you a gentle intro to this topic.
We host an online deployment of this exact codebase, which allows you to do a vibe check, or use it as a daily productivity tool.
The official API is also available for you to use:
https://deepsearch.jina.ai/v1/chat/completions
Learn more about the API at https://jina.ai/deepsearch
```bash
git clone https://github.com/jina-ai/node-DeepResearch.git
cd node-DeepResearch
npm install
```

It is also available on npm but not recommended for now, as the code is still under active development.
We use Gemini (latest gemini-2.0-flash) / OpenAI / a local LLM for reasoning, and Jina Reader for searching and reading webpages. You can get a free API key with 1M tokens from jina.ai.
```bash
export GEMINI_API_KEY=... # for gemini
# export OPENAI_API_KEY=... # for openai
# export LLM_PROVIDER=openai # for openai
export JINA_API_KEY=jina_... # free jina api key, get from https://jina.ai/reader

npm run dev $QUERY
```

You can try it on our official site.
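If you want to poke at the Jina Reader endpoints the agent relies on for reading and searching, here is a quick sketch using plain `fetch` (Node 18+; `r.jina.ai` reads a page, `s.jina.ai` searches the web; the example URL and query are arbitrary):

```typescript
// Quick sanity check of the two Jina endpoints used for reading and searching.
// Requires Node 18+ (built-in fetch) and JINA_API_KEY in the environment.
const headers = { Authorization: `Bearer ${process.env.JINA_API_KEY}` };

// Read: r.jina.ai returns the target page as LLM-friendly markdown
const page = await fetch('https://r.jina.ai/https://jina.ai', { headers });
console.log(await page.text());

// Search: s.jina.ai returns top web results for the query
const search = await fetch('https://s.jina.ai/readerlm-v2%20context%20length', { headers });
console.log(await search.text());
```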
You can also use our official DeepSearch API:
https://deepsearch.jina.ai/v1/chat/completions
You can use it with any OpenAI-compatible client.
For authentication (the Bearer token / API key) and rate limits, see https://jina.ai/deepsearch.
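Since the API follows the OpenAI schema, any OpenAI SDK works once you override the base URL. A minimal sketch with the official `openai` npm package (the model name and endpoint come from above; everything else is standard SDK usage):

```typescript
import OpenAI from 'openai';

// Point the standard OpenAI client at the DeepSearch endpoint
const client = new OpenAI({
  baseURL: 'https://deepsearch.jina.ai/v1',
  apiKey: process.env.JINA_API_KEY, // your Jina API key acts as the Bearer token
});

const completion = await client.chat.completions.create({
  model: 'jina-deepsearch-v1',
  messages: [{ role: 'user', content: 'what is the latest blog post from jina ai?' }],
  stream: true, // DeepSearch can run for a while; streaming avoids client timeouts
});

for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```

Non-streaming calls work too, but DeepSearch requests can take minutes, so streaming is usually the safer default.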
If you are building a web/local/mobile client that uses the Jina DeepSearch API, here are some design guidelines:

- Our API is fully compatible with the OpenAI API schema, which should greatly simplify the integration process. The model name is `jina-deepsearch-v1`.
- Our DeepSearch API is a reasoning+search grounding LLM, so it's best for questions that require deep reasoning and search.
- Two special tokens are introduced: `<think>...</think>`. Please render them with care (see the sketch after this list).
- Citations are often provided, in GitHub-flavored markdown footnote format, e.g. `[^1]`, `[^2]`, ...
- Guide the user to get a Jina API key from https://jina.ai, with 1M free tokens for each new API key.
- There are rate limits, between 10 RPM and 30 RPM depending on the API key tier.
- Download the Jina AI logo here.
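For example, a client might fold the think section into a collapsible block while leaving the footnote citations intact. A minimal rendering sketch (the helper name and the collapsible HTML are illustrative, not part of the API):

```typescript
// Split a complete DeepSearch message into thinking and answer parts,
// rendering the former as a collapsible <details> block.
function renderMessage(content: string): string {
  const match = content.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) return content;
  const thinking = match[1].trim();
  const answer = content.slice(match.index! + match[0].length).trim();
  // Footnote citations like [^1] stay as-is: GitHub-flavored markdown
  // renderers resolve them against the footnote definitions in the answer.
  return `<details><summary>Thinking</summary>\n\n${thinking}\n\n</details>\n\n${answer}`;
}
```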
The demo below was recorded with gemini-1.5-flash; the latest gemini-2.0-flash leads to much better results!
Query: "what is the latest blog post's title from jina ai?"
3 steps; answer is correct!

Query: "what is the context length of readerlm-v2?"
2 steps; answer is correct!

Query: "list all employees from jina ai that u can find, as many as possible"
11 steps; partially correct! But I'm not in the list :(

Query: "who will be the biggest competitor of Jina AI"
42 steps; the future-prediction kind, so it's arguably correct! At the moment I'm not seeing Weaviate as a competitor, but I'm open to a future "I told you so" moment.

More examples:
```bash
# example: no tool calling
npm run dev "1+1="
npm run dev "what is the capital of France?"

# example: 2-step
npm run dev "what is the latest news from Jina AI?"

# example: 3-step
npm run dev "what is the twitter account of jina ai's founder"

# example: 13-step, ambiguous question (no def of "big")
npm run dev "who is bigger? cohere, jina ai, voyage?"

# example: open question, research-like, long chain of thoughts
npm run dev "who will be president of US in 2028?"
npm run dev "what should be jina ai strategy for 2025?"
```
Note that not every LLM works with our reasoning flow; we need ones that support structured output (sometimes called JSON Schema output or object output) well. Feel free to propose a PR to add more open-source LLMs to the working list.
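To make the requirement concrete: on every step the model must return JSON matching a given schema, e.g. an action object the loop can dispatch on. A minimal sketch using the OpenAI-style `response_format` parameter (the schema and model name are illustrative, not the repo's actual definitions):

```typescript
import OpenAI from 'openai';

// Ask the model to pick its next action as schema-constrained JSON.
// The schema below is illustrative; the repo defines its own action types.
const client = new OpenAI();

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Next action for: what is readerlm-v2?' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'next_action',
      schema: {
        type: 'object',
        properties: {
          action: { type: 'string', enum: ['search', 'visit', 'reflect', 'answer'] },
          searchQuery: { type: 'string' },
        },
        required: ['action'],
        additionalProperties: false,
      },
    },
  },
});

// A model that supports structured output well makes this parse reliable
const step = JSON.parse(response.choices[0].message.content ?? '{}');
console.log(step.action);
```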
If you use Ollama or LMStudio, you can redirect the reasoning request to your local LLM by setting the following environment variables:
```bash
export LLM_PROVIDER=openai # yes, that's right - for local llm we still use openai client
export OPENAI_BASE_URL=http://127.0.0.1:1234/v1 # your local llm endpoint
export OPENAI_API_KEY=whatever # random string would do, as we don't use it (unless your local LLM has authentication)
export DEFAULT_MODEL_NAME=qwen2.5-7b # your local llm model name
```

If you have a GUI client that supports the OpenAI API (e.g. CherryStudio, Chatbox), you can simply configure it to use this server.
Start the server:
```bash
# Without authentication
npm run serve

# With authentication (clients must provide this secret as Bearer token)
npm run serve --secret=your_secret_token
```

The server will start on http://localhost:3000 with the following endpoint: `POST /v1/chat/completions`.
The server uses Winston with Google Cloud Logging for comprehensive logging. When deployed to Google Cloud environments (App Engine, GKE, Compute Engine, etc.), logs will automatically be sent to Cloud Logging.
For local development, logs will be output to the console. To view logs in Google Cloud:
- Go to Cloud Logging Console
- Select your project
- Use the query builder to filter logs by severity, request URLs, or other metadata
Configure your Google Cloud project credentials by either:

- Setting up Application Default Credentials
- For local development, you can also specify the project ID and credentials:

```bash
# Optional: for local development with specific credentials
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
```
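For reference, the standard Winston + Cloud Logging wiring looks roughly like this (a sketch of the usual `@google-cloud/logging-winston` setup, not necessarily this repo's exact logger configuration):

```typescript
import winston from 'winston';
import { LoggingWinston } from '@google-cloud/logging-winston';

const logger = winston.createLogger({
  level: 'info',
  transports: [
    // Console transport for local development
    new winston.transports.Console(),
    // On GCP (or with ADC configured), also ship logs to Cloud Logging;
    // LoggingWinston picks up Application Default Credentials automatically
    ...(process.env.NODE_ENV === 'production' ? [new LoggingWinston()] : []),
  ],
});

logger.info('server started', { port: 3000 });
```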
```bash
# Without authentication
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

# With authentication (when server is started with --secret)
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_secret_token" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'
```

Response format:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "jina-deepsearch-v1",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "YOUR FINAL ANSWER"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

For streaming responses (`stream: true`), the server sends chunks in this format:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "jina-deepsearch-v1",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "..."
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}
```

Note: the think content in streaming responses is wrapped in XML tags:
```
<think>
[thinking steps...]
</think>
[final answer]
```
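When rendering the stream, a client needs a small stateful splitter that routes tokens to a "thinking" view until the closing tag arrives, while handling tags split across chunks. A minimal sketch (the class name and callbacks are illustrative):

```typescript
// Routes streamed tokens to a "thinking" callback or an "answer" callback,
// tracking whether the cursor is currently inside <think>...</think>.
// Holds back a short tail of the buffer in case a tag is split across chunks.
class ThinkSplitter {
  private buffer = '';
  private inThink = false;

  feed(chunk: string, onThink: (s: string) => void, onAnswer: (s: string) => void) {
    this.buffer += chunk;
    for (;;) {
      const tag = this.inThink ? '</think>' : '<think>';
      const idx = this.buffer.indexOf(tag);
      if (idx === -1) {
        // Flush everything except a possible partial tag at the end
        const safe = this.buffer.length - tag.length + 1;
        if (safe > 0) {
          (this.inThink ? onThink : onAnswer)(this.buffer.slice(0, safe));
          this.buffer = this.buffer.slice(safe);
        }
        return;
      }
      (this.inThink ? onThink : onAnswer)(this.buffer.slice(0, idx));
      this.buffer = this.buffer.slice(idx + tag.length);
      this.inThink = !this.inThink;
    }
  }

  // Call once the stream ends to flush any held-back text
  flush(onThink: (s: string) => void, onAnswer: (s: string) => void) {
    (this.inThink ? onThink : onAnswer)(this.buffer);
    this.buffer = '';
  }
}
```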
To build the Docker image for the application, run the following command:

```bash
docker build -t deepresearch:latest .
```

To run the Docker container, use the following command:

```bash
docker run -p 3000:3000 --env GEMINI_API_KEY=your_gemini_api_key --env JINA_API_KEY=your_jina_api_key deepresearch:latest
```

You can also use Docker Compose to manage multi-container applications. To start the application with Docker Compose, run:

```bash
docker-compose up
```

Not sure a flowchart helps, but here it is:
```mermaid
flowchart TD
    Start([Start]) --> Init[Initialize context & variables]
    Init --> CheckBudget{Token budget<br/>exceeded?}
    CheckBudget -->|No| GetQuestion[Get current question<br/>from gaps]
    CheckBudget -->|Yes| BeastMode[Enter Beast Mode]
    GetQuestion --> GenPrompt[Generate prompt]
    GenPrompt --> ModelGen[Generate response<br/>using Gemini]
    ModelGen --> ActionCheck{Check action<br/>type}
    ActionCheck -->|answer| AnswerCheck{Is original<br/>question?}
    AnswerCheck -->|Yes| EvalAnswer[Evaluate answer]
    EvalAnswer --> IsGoodAnswer{Is answer<br/>definitive?}
    IsGoodAnswer -->|Yes| HasRefs{Has<br/>references?}
    HasRefs -->|Yes| End([End])
    HasRefs -->|No| GetQuestion
    IsGoodAnswer -->|No| StoreBad[Store bad attempt<br/>Reset context]
    StoreBad --> GetQuestion
    AnswerCheck -->|No| StoreKnowledge[Store as intermediate<br/>knowledge]
    StoreKnowledge --> GetQuestion
    ActionCheck -->|reflect| ProcessQuestions[Process new<br/>sub-questions]
    ProcessQuestions --> DedupQuestions{New unique<br/>questions?}
    DedupQuestions -->|Yes| AddGaps[Add to gaps queue]
    DedupQuestions -->|No| DisableReflect[Disable reflect<br/>for next step]
    AddGaps --> GetQuestion
    DisableReflect --> GetQuestion
    ActionCheck -->|search| SearchQuery[Execute search]
    SearchQuery --> NewURLs{New URLs<br/>found?}
    NewURLs -->|Yes| StoreURLs[Store URLs for<br/>future visits]
    NewURLs -->|No| DisableSearch[Disable search<br/>for next step]
    StoreURLs --> GetQuestion
    DisableSearch --> GetQuestion
    ActionCheck -->|visit| VisitURLs[Visit URLs]
    VisitURLs --> NewContent{New content<br/>found?}
    NewContent -->|Yes| StoreContent[Store content as<br/>knowledge]
    NewContent -->|No| DisableVisit[Disable visit<br/>for next step]
    StoreContent --> GetQuestion
    DisableVisit --> GetQuestion
    BeastMode --> FinalAnswer[Generate final answer] --> End
```
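In code form, the flowchart boils down to a loop over a queue of open questions with a budget check and an action dispatch. A heavily simplified sketch (every helper and type below is an illustrative placeholder, not the repo's actual implementation):

```typescript
// Simplified shape of the loop in the flowchart above.
type Action =
  | { type: 'search'; query: string }
  | { type: 'visit'; urls: string[] }
  | { type: 'reflect'; subQuestions: string[] }
  | { type: 'answer'; text: string };

// Illustrative stubs; the real repo wires these to Gemini and Jina Reader.
declare function generateNextStep(q: string, knowledge: string[]): Promise<Action>;
declare function searchWeb(query: string): Promise<string>;
declare function readPages(urls: string[]): Promise<string[]>;
declare function isDefinitive(answer: string): Promise<boolean>;
declare function generateFinalAnswer(q: string, knowledge: string[]): Promise<string>;

async function deepSearch(question: string, maxSteps: number): Promise<string> {
  const gaps: string[] = [question]; // queue of open questions, original first
  const knowledge: string[] = [];    // intermediate findings

  for (let step = 0; step < maxSteps; step++) { // the real loop tracks a token budget
    const current = gaps.shift() ?? question;
    const action = await generateNextStep(current, knowledge);

    switch (action.type) {
      case 'search':
        knowledge.push(await searchWeb(action.query));
        break;
      case 'visit':
        knowledge.push(...(await readPages(action.urls)));
        break;
      case 'reflect':
        gaps.push(...action.subQuestions); // queue new unique sub-questions
        break;
      case 'answer':
        if (current === question && (await isDefinitive(action.text))) {
          return action.text; // definitive answer to the original question
        }
        knowledge.push(action.text); // keep as intermediate knowledge
        break;
    }
    if (action.type !== 'answer') gaps.push(current); // revisit unanswered questions
  }
  // "Beast Mode": budget exhausted, force a final answer from what we have
  return generateFinalAnswer(question, knowledge);
}
```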
If you're experiencing intermittent 500 Internal Server Errors when using Google's Gemini API for embeddings, this is a known issue affecting many users. Symptoms:
- Error Message: `"500 Internal error encountered"` or `"An internal error has occurred"`
- Frequency: Intermittent, especially during high-load periods
- Root Causes:
  - Regional API endpoint overload
  - Model capacity limitations (often shows as 503 but manifests as 500)
  - Large payload sizes triggering server errors
  - Problematic text characters causing tokenization issues
Here are the solutions implemented in this codebase:

- Reduced Batch Sizes (line 5 in `src/tools/embeddings.ts`):
  `const BATCH_SIZE = 50; // Reduced from 100 to avoid 500 errors`
- Enhanced Retry Logic with Exponential Backoff (see the sketch after this list)
  - Increased retries from 3 to 5 attempts
  - Longer delays for server errors (2-30 seconds vs 1-10 seconds)
  - Added jitter to prevent thundering-herd effects
- Text Preprocessing
  - Removes problematic Unicode characters that can cause tokenization errors
  - Normalizes encoding issues (smart quotes, em dashes, etc.)
  - Truncates extremely long texts (>30,000 characters)
- Circuit Breaker Pattern (see the sketch after this list)
  - After 3 consecutive failures, temporarily stops API calls for 1 minute
  - Automatically generates zero embeddings as placeholders
  - Prevents cascading failures and API rate limiting
- Alternative Model Fallback
  - Switches to `text-embedding-004` after 2 consecutive failures
  - Provides redundancy when the primary model is overloaded
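A condensed sketch of the retry-with-backoff and circuit-breaker ideas above (the parameter values mirror the list; names and structure are illustrative, not the exact code in `src/tools/embeddings.ts`):

```typescript
// Illustrative sketch of retry-with-jitter plus a circuit breaker.
const MAX_RETRIES = 5;
const FAILURE_THRESHOLD = 3; // failures before the breaker opens
const COOLDOWN_MS = 60_000;  // breaker stays open for 1 minute

let consecutiveFailures = 0;
let breakerOpenedAt = 0;

function backoffDelay(attempt: number, isServerError: boolean): number {
  // Server errors back off harder (2-30s) than other errors (1-10s)
  const base = isServerError ? 2000 : 1000;
  const cap = isServerError ? 30_000 : 10_000;
  const exp = Math.min(cap, base * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2); // jitter against thundering herds
}

async function embedWithRetry(texts: string[]): Promise<number[][]> {
  if (consecutiveFailures >= FAILURE_THRESHOLD && Date.now() - breakerOpenedAt < COOLDOWN_MS) {
    // Breaker open: return zero vectors as placeholders instead of calling the API
    return texts.map(() => new Array(768).fill(0)); // illustrative dimension
  }
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      const vectors = await callEmbeddingApi(texts);
      consecutiveFailures = 0; // success closes the breaker
      return vectors;
    } catch (err: any) {
      const isServerError = err?.status === 500 || err?.status === 503;
      if (++consecutiveFailures >= FAILURE_THRESHOLD) breakerOpenedAt = Date.now();
      await new Promise((r) => setTimeout(r, backoffDelay(attempt, isServerError)));
    }
  }
  return texts.map(() => new Array(768).fill(0)); // all retries failed
}

// Illustrative stub for the actual embedding call
declare function callEmbeddingApi(texts: string[]): Promise<number[][]>;
```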
If you continue experiencing issues, try these approaches:
- Regional Switching (based on Google Cloud Community feedback)
  - The error is often region-specific
  - Users report success switching from `northamerica-northeast1` to `us-central1`
  - Add this environment variable to try different regions: `export GOOGLE_AI_REGION=us-central1`
- Reduce Payload Size
  - Further reduce `BATCH_SIZE` in `src/tools/embeddings.ts` (try 25 or 10)
  - Implement text chunking for very long documents
- Implement Request Queuing (see the sketch after this list)
  - Add delays between batches to reduce API load
  - Use a queue system to serialize requests during peak hours
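A minimal serialized queue with an inter-batch delay might look like this (illustrative, not code from the repo):

```typescript
// Serializes async jobs and spaces them out with a fixed delay,
// so embedding batches never hit the API concurrently.
declare function embedBatch(texts: string[]): Promise<number[][]>; // illustrative stub

class RequestQueue {
  private chain: Promise<unknown> = Promise.resolve();
  constructor(private delayMs: number) {}

  enqueue<T>(job: () => Promise<T>): Promise<T> {
    const next = this.chain.then(async () => {
      const result = await job();
      await new Promise((r) => setTimeout(r, this.delayMs)); // pause between jobs
      return result;
    });
    this.chain = next.catch(() => undefined); // keep the chain alive on failures
    return next;
  }
}

// Usage: space embedding batches 500 ms apart
const queue = new RequestQueue(500);
const batches: string[][] = [['text one', 'text two'], ['text three']];
const results = await Promise.all(batches.map((b) => queue.enqueue(() => embedBatch(b))));
```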
The enhanced error logging now captures:
- HTTP status codes (500, 503)
- Error type classification (server error vs overload)
- Batch sizes that trigger errors
- Circuit breaker status
- Retry attempt details
Look for these log patterns:
```
ERROR: Error calling Google Embeddings API (attempt X/5)
DEBUG: Circuit breaker: X consecutive failures
DEBUG: Using alternative model due to failures: text-embedding-004
```
