Description

This project generates an index for a SANS book. It uses the OpenAI GPT-4 model to identify key terms and their definitions on each page of the book, then writes the index to a CSV file.

It consists of two main scripts:

indexer.py - This script goes through each page of a given PDF and uses GPT-4 to identify a key term and its definition on that page, creating an index file for that book.
aggregator.py - This script combines the index files of multiple books into a composite index file.
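
The per-page flow described for indexer.py can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the function names, the column names (term, definition, book, page), and the `fake_extract` stand-in for the GPT-4 call are all assumptions made for the example, and PDF reading is left out entirely.

```python
import csv
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type: takes the text of one page and returns
# (term, definition), or None when the page has nothing to index.
TermExtractor = Callable[[str], Optional[Tuple[str, str]]]

FIELDS = ["term", "definition", "book", "page"]  # assumed column names


def build_index(pages: Iterable[str], extract: TermExtractor, book: str) -> list:
    """Return one index row for every page that yields a key term."""
    rows = []
    for page_number, text in enumerate(pages, start=1):
        result = extract(text)
        if result is None:  # blank page or nothing worth indexing
            continue
        term, definition = result
        rows.append(
            {"term": term, "definition": definition, "book": book, "page": page_number}
        )
    return rows


def write_index(rows: list, handle) -> None:
    """Write the index rows as CSV to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def fake_extract(text: str) -> Optional[Tuple[str, str]]:
    """Stand-in for the model call: treats 'Term: definition' pages as indexable."""
    if ":" not in text:
        return None
    term, definition = text.split(":", 1)
    return term.strip(), definition.strip()
```

Passing the extractor in as a function keeps the page loop and the CSV writing testable without an API key; in a real run the stand-in would be replaced by whatever function wraps the model call.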

Setup Instructions

Clone the repository.
Install the necessary dependencies by running pip install -r requirements.txt in your terminal.
Create a .env file in the root directory of the project.

.env file

Rename example.env to .env. The file should contain the following:

OPENAI_API_KEY=<your-openai-api-key>
PDF_PASSWORD=<your-pdf-password>

Replace <your-openai-api-key> and <your-pdf-password> with your actual OpenAI API key and PDF password. Make sure there are no spaces, single quotes, or double quotes surrounding your key and password in the .env file.
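
To catch the formatting mistakes mentioned above before running the scripts, a small check along these lines may help. It is a sketch using only the standard library; `parse_env` is a hypothetical helper and is not part of the repository, whose scripts may load the file differently.

```python
QUOTES = ("'", '"')


def parse_env(text: str) -> dict:
    """Parse KEY=value lines, rejecting values padded with spaces or wrapped in quotes."""
    values = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        # Skip blank lines and comment lines.
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        # Split at the first "=" only, so values may themselves contain "=".
        key, sep, value = raw.partition("=")
        if not sep:
            raise ValueError(f"line {line_number}: expected KEY=value")
        if key != key.strip() or value != value.strip():
            raise ValueError(f"line {line_number}: remove spaces around the key or value")
        if value.startswith(QUOTES) or value.endswith(QUOTES):
            raise ValueError(f"line {line_number}: remove quotes around the value")
        values[key] = value
    return values
```

For example, `parse_env(open(".env").read())` would return a dictionary with the two settings, or raise a `ValueError` naming the offending line.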

OpenAI API Key

To obtain an OpenAI API Key, follow these steps:

Visit the OpenAI website.
Sign up for an account if you don't already have one.
Go to the API section in the account dashboard.
Generate a new API Key.

Remember to treat your API keys as sensitive data. Do not expose them publicly or share them with anyone you do not trust.

Adjustments

You will need to adjust the name of the book/course at a few points in the code; these are marked by comments.

If you are creating indexes for more or fewer books, adjust the range in the second for loop in aggregator.py to match the number of books you are indexing.

Running the Scripts

Run the indexer.py script for each book you want to index.
Once all books have been indexed, run the aggregator.py script to create a composite index of all the books.
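
The aggregation step, including the book-count range mentioned under Adjustments, might look something like the sketch below. The file naming pattern (`book1_index.csv`, ...), the column names, and both function names are assumptions made for illustration; the actual aggregator.py may be organized differently.

```python
import csv
import io


def index_filenames(num_books: int) -> list:
    """Build the per-book index file names; adjust num_books to match your course."""
    # range(1, num_books + 1) covers books 1 through num_books inclusive.
    return [f"book{n}_index.csv" for n in range(1, num_books + 1)]


def aggregate(index_texts: list) -> list:
    """Merge the CSV contents of several per-book indexes into one list of rows.

    Each element of index_texts is the full text of one index file, with a
    header row that includes a 'term' column. Rows are sorted by term,
    ignoring case.
    """
    rows = []
    for text in index_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return sorted(rows, key=lambda row: row["term"].lower())
```

Taking file contents rather than file paths keeps the merge logic easy to test; in practice each name from `index_filenames` would be opened and read before being passed to `aggregate`.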

Error Handling

If you receive an error stating that gpt-4 is not available, you will need to replace "gpt-4" in the openai.ChatCompletion.create() call with the identifier of the most recent model available in the OpenAI models list.

Visit the OpenAI Models page to see the currently available models.

Please note that a different model may produce different results.
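
One way to make the model choice explicit is to pick the first model from a preference list that your account can actually use. The helper below is a hypothetical sketch: `pick_model` is not part of the repository, "gpt-3.5-turbo" is only an example fallback, and the list of available model identifiers is assumed to have been copied from the OpenAI models list.

```python
def pick_model(preferred: list, available: list) -> str:
    """Return the first model in `preferred` that also appears in `available`."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    raise LookupError("none of the preferred models are available: " + ", ".join(preferred))


# Example: fall back from gpt-4 when the account does not have access to it.
model = pick_model(["gpt-4", "gpt-3.5-turbo"], ["gpt-3.5-turbo"])
```

The returned identifier would then replace "gpt-4" in the openai.ChatCompletion.create() call.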

Final Notes

The scripts provided assume a specific format and content for the PDF and might not work as expected for all SANS books or other types of documents. You may need to tweak the scripts based on the specifics of your PDFs.

Once the index is fully generated, you can open it in Excel and format it to your preference. An example of the final product is included in the repository as example.png.

Bulky642/SANS_Auto_Indexer
