@@ -12,7 +12,7 @@
   "metadata": {},
   "source": [
    "# 2. Creating a synthetic Q&A dataset\n",
-   "We use [`davinci-instruct-beta-v2`](https://beta.openai.com/docs/engines/instruct-series-beta), a model specialized in following instructions, to create questions based on the given context. Then we also use [`davinci-instruct-beta-v2`](https://beta.openai.com/docs/engines/instruct-series-beta) to answer those questions, given the same context. \n",
+   "We use [`davinci-instruct-beta-v3`](https://beta.openai.com/docs/engines/instruct-series-beta), a model specialized in following instructions, to create questions based on the given context. Then we also use [`davinci-instruct-beta-v3`](https://beta.openai.com/docs/engines/instruct-series-beta) to answer those questions, given the same context. \n",
    "\n",
    "This is expensive, and will also take a long time, as we call the davinci engine for each section. You can simply download the final dataset instead.\n",
    "\n",
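The two-step recipe described in the cell above (generate questions from each context section, then answer them given the same context) can be sketched as a plain loop. This is our illustration, not the notebook's code: `synthesize_qa` and `ask_model` are hypothetical names, and `ask_model` stands in for the `openai.Completion.create` call so the flow can be exercised without hitting the API:

```python
def synthesize_qa(sections, ask_model):
    """Run the two-step synthetic Q&A pipeline over context sections.

    ask_model(prompt) -> completion string. Injected as a stand-in for the
    real API call, which makes the pipeline testable offline.
    """
    dataset = []
    for context in sections:
        # Step 1: ask for questions; the prompt seeds a numbered list with "1."
        questions = "1." + ask_model(
            f"Write questions based on the text below\n\nText: {context}"
            "\n\nQuestions:\n1."
        )
        # Step 2: ask for answers to those questions, given the same context
        answers = "1." + ask_model(
            f"Write questions based on the text below\n\nText: {context}"
            f"\n\nQuestions:\n{questions}\n\nAnswers:\n1."
        )
        dataset.append(
            {"context": context, "questions": questions, "answers": answers}
        )
    return dataset
```

Because the prompt ends with `1.`, the completion only contains the rest of the list, so the leading `1.` is stitched back on before storing.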
@@ -175,7 +175,7 @@
    "def get_questions(context):\n",
    "    try:\n",
    "        response = openai.Completion.create(\n",
-   "            engine=\"davinci-instruct-beta-v2\",\n",
+   "            engine=\"davinci-instruct-beta-v3\",\n",
    "            prompt=f\"Write questions based on the text below\\n\\nText: {context}\\n\\nQuestions:\\n1.\",\n",
    "            temperature=0,\n",
    "            max_tokens=257,\n",
@@ -255,7 +255,7 @@
    "def get_answers(row):\n",
    "    try:\n",
    "        response = openai.Completion.create(\n",
-   "            engine=\"davinci-instruct-beta-v2\",\n",
+   "            engine=\"davinci-instruct-beta-v3\",\n",
    "            prompt=f\"Write questions based on the text below\\n\\nText: {row.context}\\n\\nQuestions:\\n{row.questions}\\n\\nAnswers:\\n1.\",\n",
    "            temperature=0,\n",
    "            max_tokens=257,\n",
@@ -385,15 +385,15 @@
    }
   ],
   "source": [
-   "answer_question(olympics_search_fileid, \"davinci-instruct-beta-v2\", \n",
+   "answer_question(olympics_search_fileid, \"davinci-instruct-beta-v3\", \n",
    "                \"Where did women's 4 x 100 metres relay event take place during the 2020 Summer Olympics?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "After we fine-tune the model for Q&A we'll be able to use it instead of [`davinci-instruct-beta-v2`](https://beta.openai.com/docs/engines/instruct-series-beta), to obtain better answers when the question can't be answered based on the context. We see a downside of [`davinci-instruct-beta-v2`](https://beta.openai.com/docs/engines/instruct-series-beta), which always attempts to answer the question, regardless of the relevant context being present or not. (Note the second question is asking about a future event, set in 2024.)"
+   "After we fine-tune the model for Q&A we'll be able to use it instead of [`davinci-instruct-beta-v3`](https://beta.openai.com/docs/engines/instruct-series-beta), to obtain better answers when the question can't be answered based on the context. We see a downside of [`davinci-instruct-beta-v3`](https://beta.openai.com/docs/engines/instruct-series-beta), which always attempts to answer the question, regardless of the relevant context being present or not. (Note the second question is asking about a future event, set in 2024.)"
   ]
  },
  {
@@ -413,7 +413,7 @@
    }
   ],
   "source": [
-   "answer_question(olympics_search_fileid, \"davinci-instruct-beta-v2\", \n",
+   "answer_question(olympics_search_fileid, \"davinci-instruct-beta-v3\", \n",
    "                \"Where did women's 4 x 100 metres relay event take place during the 2048 Summer Olympics?\", max_len=1000)"
   ]
  },