This project gives readers of Daily Papers a better way to organize the papers they follow. It uses Neo4j and Google AI Studio's Gemini 2.0 Flash to build a knowledge graph of papers, their concepts, and the relationships between them, so you can discover connections and insights more easily. It's designed for readers who want a deeper understanding of the research landscape.
To get started, you'll need to do the following:
- Install Neo4j Desktop: Download and install Neo4j Desktop. This is necessary to create and manage your local graph database.
- Create a Neo4j Database:
  - Open Neo4j Desktop.
  - Create a new DBMS.
  - Inside the DBMS, create a new database (e.g., named `paper_assistant`).
  - Start the database.
- Set the `DATABASE` Environment Variable:
  - Create a `.env` file in the root of your project directory.
  - Add the `DATABASE` variable to the `.env` file. The value should be the name of your database.
- Get a Google AI Studio API Key:
  - Go to Google AI Studio and create an account (if you don't already have one).
  - Create a new API key.
  - Add the `GEMINI_API_KEY` variable to your `.env` file: `GEMINI_API_KEY=YOUR_GEMINI_API_KEY`
- Install Python Dependencies:
  - Run `pip install -r requirements.txt` to install the required Python packages, including docling, which is used for text processing and analysis.
- Note:
  - The PDFs of the processed papers are stored in the `papers/` directory, and the corresponding Markdown files are stored in the `markdown/` directory. A few of these have been added as a starting point.
Before you run anything, set `DATABASE` and `GEMINI_API_KEY` in the `.env` file.
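A quick way to confirm that both variables are present is a small standalone check like the one below. This is only a sketch: it uses the standard library to parse simple `KEY=VALUE` lines, and the helper names `parse_env` and `missing_keys` are hypothetical. The project's own scripts may load the `.env` file differently.

```python
from pathlib import Path

# The two variables this README asks you to set.
REQUIRED_KEYS = ("DATABASE", "GEMINI_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    if missing:
        print("Missing in .env: " + ", ".join(missing))
    else:
        print(".env looks complete")
```

Run it from the project root so that `.env` resolves to the file you created during setup.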
There are two user-facing scripts:
- `store.py`:
  - This script processes today's papers and stores them in the graph database.
  - To process the papers already provided in the `papers/` directory, run this script with `--existing`.
  - After the script finishes, the data should be available in the database.
- `retrieve.py`: This script lets you interact with the database and explore the relationships between papers, concepts, and clusters.
  - Example Queries:
    - "What's paper all about"
    - "List various approaches related to "
  - Exiting the Loop: Type `"exit"`/`"q"`/`"quit"` to end the loop.
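The interactive loop described for `retrieve.py` can be pictured roughly as follows. This is an illustrative sketch, not the script's actual code: `run_query_loop` and the `answer` callback are hypothetical names, and treating the exit words case-insensitively is an assumption.

```python
# Words that end the loop, as listed in this README.
EXIT_WORDS = {"exit", "q", "quit"}


def run_query_loop(read_line, answer, write=print):
    """Read questions until an exit word is typed; return how many were answered.

    read_line: callable taking a prompt and returning the user's text (e.g. input)
    answer:    hypothetical callable that turns a question into a response string
    write:     callable used to show each response
    """
    answered = 0
    while True:
        try:
            text = read_line("> ").strip()
        except EOFError:
            # End of input (e.g. Ctrl-D) also ends the loop.
            break
        if text.lower() in EXIT_WORDS:
            break
        if not text:
            # Ignore empty lines instead of sending them as queries.
            continue
        write(answer(text))
        answered += 1
    return answered
```

For interactive use you could call `run_query_loop(input, my_answer_fn)`, where `my_answer_fn` stands in for whatever the script does to answer a question from the graph.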
Here's a visualization of the knowledge graph generated from the existing papers in the papers/ directory:
The graph shows papers (orange), concepts (blue), and clusters (green). This visualization helps you understand how research areas connect.
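To explore the same structure programmatically, you could query the database with the official `neo4j` Python driver. The sketch below rests on several assumptions that this README does not confirm: the node labels `Paper` and `Concept`, the relationship type `MENTIONS`, the `title` and `name` properties, and the connection details you pass in are all hypothetical, so adjust them to match what `store.py` actually writes.

```python
# Hypothetical schema: (:Paper {title})-[:MENTIONS]->(:Concept {name}).
# Finds other papers that share concepts with the given paper.
RELATED_PAPERS_QUERY = """
MATCH (p:Paper {title: $title})-[:MENTIONS]->(c:Concept)<-[:MENTIONS]-(other:Paper)
WHERE other <> p
WITH other, collect(DISTINCT c.name) AS shared_concepts
RETURN other.title AS title, shared_concepts
ORDER BY size(shared_concepts) DESC
LIMIT $limit
"""


def related_papers(uri, user, password, database, title, limit=5):
    """Return (title, shared_concepts) pairs for papers related to `title`."""
    # Third-party dependency, imported here so the query text above can be
    # reused without the driver installed: pip install neo4j
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session(database=database) as session:
            result = session.run(RELATED_PAPERS_QUERY, title=title, limit=limit)
            return [(record["title"], record["shared_concepts"]) for record in result]
```

A call might look like `related_papers("bolt://localhost:7687", "neo4j", "<password>", "paper_assistant", "<paper title>")`, where the URI and user name are common local defaults rather than values taken from this project.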