Adding transaction classification notebooks #26
.gitignore
Outdated
@@ -127,3 +127,9 @@ dmypy.json

# Pyre type checker
.pyre/

# helpers
*helpers.py
Could you remove this?
This has been removed and the file checked out of Git
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"from helpers import OPENAI_API_KEY\n",
Remove this import and replace it with:
import os
openai.api_key = os.getenv("OPENAI_API_KEY")
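A slightly more defensive variant of the same idea, as a minimal sketch: `os.getenv` returns `None` when the variable is unset, so failing early gives a clearer error than a later authentication failure. The helper name `load_api_key` is hypothetical and not part of the notebook.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is unset.

    `load_api_key` is a hypothetical helper used for illustration only.
    """
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return key


# In the notebook (assumes `import openai` has already run):
# openai.api_key = load_api_key()
```

This keeps the key out of the repository, so no `helpers.py` entry is needed in `.gitignore`.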
"import json\n",
"\n",
"def check_finetune_classes(train_file,valid_file):\n",
"\n",
Maybe add a very short docstring explaining what this code does.
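For illustration, a short docstring could look like the sketch below. The function body is not visible in this hunk, so the body here is an assumption: it supposes the function compares the labels found in the `completion` field of two JSONL fine-tuning files. Only the name and parameters come from the diff.

```python
import json


def check_finetune_classes(train_file, valid_file):
    """Check that the training and validation files contain the same classes.

    Assumed behaviour (the real body is not shown in the diff): each file is
    JSONL, and each line has a "completion" field holding the class label.
    Returns True when both files cover the same set of labels.
    """

    def classes(path):
        # Collect the distinct labels, skipping blank lines.
        with open(path, "r") as f:
            return {json.loads(line)["completion"] for line in f if line.strip()}

    return classes(train_file) == classes(valid_file)
```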
"outputs": [],
"source": [
"zero_shot_prompt = '''You are a data expert working for the National Library of Scotland. \n",
" You are analysing all transactions over £25,000 in value and classifying them into one of five categories.\n",
You have a lot of leading whitespace on every line. Each line of the multiline string should start at the beginning of the line (not indented).
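To show why this matters, here is a small sketch: indentation inside a triple-quoted string becomes part of the prompt text. As an alternative to left-aligning every line by hand, the standard library's `inspect.cleandoc` strips the common indentation from the continuation lines. The two prompt lines are taken from the diff; using `cleandoc` is a suggestion, not what the notebook does.

```python
import inspect

# Indented continuation line, as in the notebook: the four leading spaces
# are sent to the model as part of the prompt.
indented_prompt = '''You are a data expert working for the National Library of Scotland.
    You are analysing all transactions over £25,000 in value and classifying them into one of five categories.'''

# cleandoc removes the uniform indentation from the second line onwards.
clean_prompt = inspect.cleandoc(indented_prompt)

for line in clean_prompt.splitlines():
    print(repr(line))
```

Writing the string flush left, as suggested above, gives the same result without the extra call.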
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import classification_report, accuracy_score\n",
"\n",
"fs_df = pd.read_csv(embedding_path)\n",
You can add the parameter index_col=0 to get rid of the "Unnamed: 0" column.
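A minimal sketch of the effect, using a made-up two-row frame in place of the real embeddings file (the column names and values are placeholders): when a DataFrame is saved with its default index, the index is written as an unnamed first column, and `index_col=0` tells `read_csv` to treat that column as the index again.

```python
import io

import pandas as pd

# Hypothetical stand-in for the embeddings CSV; to_csv() writes the index
# as an unnamed first column by default.
original = pd.DataFrame(
    {"Description": ["Books", "Cleaning"], "Amount": [30000, 26000]}
)
csv_text = original.to_csv()

# Without index_col, the saved index comes back as an "Unnamed: 0" column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column is used as the index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```

Another option is to write the file with `to_csv(path, index=False)` so the extra column never appears.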
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from helpers import OPENAI_API_KEY\n",
Could you replace this with the same as for the previous notebook?
Could you make those small changes first? Then it's good to go! Thanks.
Adding transaction classification notebooks
I've made two examples of transaction classification using GPT-3, one with multiclass classification and one using clustering on an unlabelled dataset.
I've also included the source dataset used in the multiclass classification notebook, plus a set of labelled examples I made based on it.
I added a .gitignore to the repo as well.