Merged
162 changes: 48 additions & 114 deletions 10-Retriever/10-Kiwi-BM25-Retriever.ipynb
@@ -5,7 +5,6 @@
"metadata": {},
"source": [
"# Kiwi BM25 Retriever\n",
"\n",
"- Author: [JeongGi Park](https://github.com/jeongkpa)\n",
"- Design: []()\n",
"- Peer Review: \n",
@@ -14,27 +13,24 @@
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/01-Basic/07-LCEL-Interface.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/01-Basic/07-LCEL-Interface.ipynb)\n",
"\n",
"## Overview\n",
"This document explores the use of `kiwipiepy` for Korean morphological analysis and demonstrates its integration within the `LangChain` framework. \n",
"It highlights methods to tokenize text, compare retrieval models like `BM25` and `FAISS`, and analyze relationships between queries and documents using metrics such as cosine similarity. \n",
"Additionally, it emphasizes the role of these techniques in enhancing workflows like text analysis and information retrieval.\n",
"\n",
"Since this tutorial covers Korean morphological analysis, the output primarily contains Korean text, reflecting the language structure being analyzed\n",
"For international users, we provide English translations alongside Korean examples in this tutorial.\n",
"This tutorial explores the use of `kiwipiepy` for Korean morphological analysis and demonstrates its integration within the `LangChain` framework. \n",
"It highlights Korean text tokenization, and the comparison of different retrievers with various setups.\n",
"\n",
"Since this tutorial covers Korean morphological analysis, the output primarily contains Korean text, reflecting the language structure being analyzed.\n",
"For international users, we provide English translations alongside Korean examples.\n",
"\n",
"### Table of Contents\n",
"\n",
"- [Overview](#overview)\n",
"- [Environment Setup](#environment-setup)\n",
"- [Korean Word Retriever Tuning](#Korean-Word-Retriever-Tuning)\n",
"- [Testing with Various Sentences](#Testing-with-Various-Sentences)\n",
"- [Experiment: Compare Search Results Using Different Retrievers](#Experiment-Compare-Search-Results-Using-Different-Retrievers)\n",
"- [Korean Tokenization](#korean-tokenization)\n",
"- [Testing with Various Sentences](#testing-with-various-sentences)\n",
"- [Comparing Search Results Using Different Retrievers](#comparing-search-results-using-different-retrievers)\n",
"- [Conclusion](#conclusion)\n",
"\n",
"### References\n",
"- [kiwipiepy](https://github.com/bab2min/kiwipiepy)\n",
"- [fiass](https://python.langchain.com/docs/integrations/vectorstores/faiss/)\n",
"- [openai-embeddings](https://python.langchain.com/docs/integrations/text_embedding/openai/)\n",
"- [FAISS](https://python.langchain.com/docs/integrations/vectorstores/faiss/)\n",
"- [OpenAIEmbeddings](https://python.langchain.com/docs/integrations/text_embedding/openai/)\n",
"\n",
"---"
]
@@ -150,26 +146,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Why Korean Tokenization?\n",
"## Korean Tokenization\n",
"\n",
"- In Korean, words are morphologically rich. For instance, โ€œ์•ˆ๋…•ํ•˜์„ธ์š”โ€ is tokenized into:\n",
"Korean words are morphologically rich. A single word is often split into multiple morphemes (root, affix, suffix, etc.).\n",
"\n",
"For instance, โ€œ์•ˆ๋…•ํ•˜์„ธ์š”โ€ is tokenized into:\n",
" - Token(form='์•ˆ๋…•', tag='NNG')\n",
" - Token(form='ํ•˜', tag='XSA')\n",
" - Token(form='์„ธ์š”', tag='EF')\n",
"\n",
"- Compared to English tokenization at the word level (e.g., โ€œHelloโ€ remains one word), Korean often splits into multiple morphemes (์–ด๊ทผ, ์ ‘์‚ฌ, ์–ด๋ฏธ ๋“ฑ).\n",
"- Kiwi provides detailed POS tagging such as NNG(์ผ๋ฐ˜ ๋ช…์‚ฌ), XSA(ํ˜•์šฉ์‚ฌ ํŒŒ์ƒ ์ ‘์‚ฌ), EF(์ข…๊ฒฐ ์–ด๋ฏธ) to reflect these language-specific nuances.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Korean Word Retriever Tuning\n",
"\n",
"Install the Korean morphological analyzer library, `kiwipiepy`.\n",
"\n",
"[Project Link for kiwipiepy](https://github.com/bab2min/kiwipiepy)"
"We utilize `kiwipiepy`, which is a Python module for **Kiwi**, an open-source Korean morphological analyzer, to tokenize Korean text."
]
},
{
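The morpheme split described in the hunk above can be sketched without installing anything. This is only an illustration: `Token` here is a stand-in namedtuple carrying the two fields quoted in the notebook text (`form`, `tag`), not the object returned by `kiwi.tokenize`, and `forms` / `filter_by_tag` are hypothetical helper names.

```python
from collections import namedtuple

# Stand-in for a morphological analyzer's token; only the two fields
# quoted in the notebook text are modeled here (an assumption).
Token = namedtuple("Token", ["form", "tag"])

# The analysis of "안녕하세요" as listed in the notebook.
tokens = [
    Token(form="안녕", tag="NNG"),  # common noun
    Token(form="하", tag="XSA"),    # adjective-deriving suffix
    Token(form="세요", tag="EF"),   # sentence-final ending
]


def forms(tokens):
    """Return only the surface forms, as a retriever tokenizer would."""
    return [t.form for t in tokens]


def filter_by_tag(tokens, prefixes):
    """Keep tokens whose POS tag starts with one of the given prefixes."""
    return [t for t in tokens if t.tag.startswith(tuple(prefixes))]


print(forms(tokens))
print(forms(filter_by_tag(tokens, ["NN"])))
```

Filtering by tag is one possible design choice for retrieval (for example, indexing only nouns); the notebook itself keeps every morpheme form.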
@@ -187,7 +173,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Perform Tokenization"
"With this, we can easily perform tokenization."
]
},
{
@@ -216,7 +202,7 @@
],
"source": [
"kiwi.tokenize(\"์•ˆ๋…•ํ•˜์„ธ์š”? ํ˜•ํƒœ์†Œ ๋ถ„์„๊ธฐ ํ‚ค์œ„์ž…๋‹ˆ๋‹ค\")\n",
"# \"์•ˆ๋…•ํ•˜์„ธ์š”? ํ˜•ํƒœ์†Œ ๋ถ„์„๊ธฐ ํ‚ค์œ„์ž…๋‹ˆ๋‹ค.\" it means \"Hi, this is Kiwi, the morphological analyser.\""
"# Translation: Hi, this is Kiwi, a morphological analyzer."
]
},
{
@@ -225,10 +211,7 @@
"source": [
"## Testing with Various Sentences\n",
"\n",
"BM25 is a traditional ranking function based on term frequency and inverse document frequency. It works well when exact keyword matches are important.\n",
"\n",
"\n",
"FAISS uses vector embeddings to capture semantic similarity. By combining BM25 with FAISS in an ensemble, we can leverage the lexical match benefits from BM25 and the semantic understanding from FAISS."
"To test different retrieval methods, we define a list of documents composed of similar yet distinguishable contents."
]
},
{
@@ -247,47 +230,33 @@
"docs = [\n",
" Document(\n",
" page_content=\"๊ธˆ์œต๋ณดํ—˜์€ ์žฅ๊ธฐ์ ์ธ ์ž์‚ฐ ๊ด€๋ฆฌ์™€ ์œ„ํ—˜ ๋Œ€๋น„๋ฅผ ๋ชฉ์ ์œผ๋กœ ๊ณ ์•ˆ๋œ ๊ธˆ์œต ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค.\"\n",
" # Translation: Financial insurance is a financial product designed for long term asset management and risk coverage.\n",
" ),\n",
" Document(\n",
" page_content=\"๊ธˆ์œต์ €์ถ•๋ณดํ—˜์€ ๊ทœ์น™์ ์ธ ์ €์ถ•์„ ํ†ตํ•ด ๋ชฉ๋ˆ์„ ๋งˆ๋ จํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ƒ๋ช…๋ณดํ—˜ ๊ธฐ๋Šฅ๋„ ๊ฒธ๋น„ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.\"\n",
" # Translation: Financial savings insurance allows individuals to accumulate a lump sum through regular savings, and also offers life insurance benefits.\n",
" ),\n",
" Document(\n",
" page_content=\"์ €์ถ•๊ธˆ์œต๋ณดํ—˜์€ ์ €์ถ•๊ณผ ๊ธˆ์œต์„ ํ†ตํ•ด ๋ชฉ๋ˆ ๋งˆ๋ จ์— ๋„์›€์„ ์ฃผ๋Š” ๋ณดํ—˜์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ์‚ฌ๋ง ๋ณด์žฅ ๊ธฐ๋Šฅ๋„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.\"\n",
" # Translation: Savings financial insurance helps individuals gather a lump sum through savings and finance, and also provides death benefit coverage.\n",
" ),\n",
" Document(\n",
" page_content=\"๊ธˆ์œต์ €์ถ•์‚ฐ๋ฌผ๋ณดํ—˜์€ ์žฅ๊ธฐ์ ์ธ ์ €์ถ• ๋ชฉ์ ๊ณผ ๋”๋ถˆ์–ด, ์ถ•์‚ฐ๋ฌผ ์ œ๊ณต ๊ธฐ๋Šฅ์„ ๊ฐ–์ถ”๊ณ  ์žˆ๋Š” ํŠน๋ณ„ ๊ธˆ์œต ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค.\"\n",
" # Translation: Financial savings livestock insurance is a special financial product designed for long term savings, which also includes provisions for livestock products.\n",
" ),\n",
" Document(\n",
" page_content=\"๊ธˆ์œต๋‹จํญ๊ฒฉ๋ณดํ—˜์€ ์ €์ถ•์€ ์ปค๋…• ์œ„ํ—˜ ๋Œ€๋น„์— ์ดˆ์ ์„ ๋งž์ถ˜ ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค. ๋†’์€ ์œ„ํ—˜์„ ๊ฐ์ˆ˜ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ณ ๊ฐ์—๊ฒŒ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค.\"\n",
" # Translation: Financial 'carpet bombing' insurance focuses on risk coverage rather than savings. It is suitable for customers willing to take on high risk.\n",
" ),\n",
" Document(\n",
" page_content=\"๊ธˆ๋ณดํ—˜์€ ์ €์ถ•์„ฑ๊ณผ๋ฅผ ๊ทน๋Œ€ํ™”ํ•ฉ๋‹ˆ๋‹ค. ํŠนํžˆ ๋…ธํ›„ ๋Œ€๋น„ ์ €์ถ•์— ์œ ๋ฆฌํ•˜๊ฒŒ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.\"\n",
" # Translation: Gold insurance maximizes returns on savings. It is especially advantageous for retirement savings.\n",
" ),\n",
" Document(\n",
" page_content=\"๊ธˆ์œต๋ณด์”จ ํ—˜ํ•œ๋ง ์ข€ ํ•˜์ง€๋งˆ์‹œ๊ณ , ์ €์ถ•์ด๋‚˜ ์ข€ ํ•˜์‹œ๋˜๊ฐ€์š”. ๋ญ๊ฐ€ ๊ทธ๋ฆฌ ๊ธ‰ํ•˜์‹ ์ง€ ๋ชจ๋ฅด๊ฒ ๋„ค์š”.\"\n",
" # Translation: Hey, Mr. 'Financial Bo,' please refrain from harsh words and consider saving money. I'm not sure why you're in such a hurry.\n",
" ),\n",
"]\n",
"\n",
"# ๊ธˆ์œต๋ณดํ—˜์€ ์žฅ๊ธฐ์ ์ธ ์ž์‚ฐ ๊ด€๋ฆฌ์™€ ์œ„ํ—˜ ๋Œ€๋น„๋ฅผ ๋ชฉ์ ์œผ๋กœ ๊ณ ์•ˆ๋œ ๊ธˆ์œต ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค.\n",
"# Financial insurance is a financial product designed for long term asset management and risk coverage.\n",
"\n",
"# ๊ธˆ์œต์ €์ถ•๋ณดํ—˜์€ ๊ทœ์น™์ ์ธ ์ €์ถ•์„ ํ†ตํ•ด ๋ชฉ๋ˆ์„ ๋งˆ๋ จํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ƒ๋ช…๋ณดํ—˜ ๊ธฐ๋Šฅ๋„ ๊ฒธ๋น„ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.\n",
"# Financial savings insurance allows individuals to accumulate a lump sum through regular savings, and also offers life insurance benefits.\n",
"\n",
"# ์ €์ถ•๊ธˆ์œต๋ณดํ—˜์€ ์ €์ถ•๊ณผ ๊ธˆ์œต์„ ํ†ตํ•ด ๋ชฉ๋ˆ ๋งˆ๋ จ์— ๋„์›€์„ ์ฃผ๋Š” ๋ณดํ—˜์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ์‚ฌ๋ง ๋ณด์žฅ ๊ธฐ๋Šฅ๋„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.\n",
"# Savings financial insurance helps individuals gather a lump sum through savings and finance, and also provides death benefit coverage.\n",
"\n",
"# ๊ธˆ์œต์ €์ถ•์‚ฐ๋ฌผ๋ณดํ—˜์€ ์žฅ๊ธฐ์ ์ธ ์ €์ถ• ๋ชฉ์ ๊ณผ ๋”๋ถˆ์–ด, ์ถ•์‚ฐ๋ฌผ ์ œ๊ณต ๊ธฐ๋Šฅ์„ ๊ฐ–์ถ”๊ณ  ์žˆ๋Š” ํŠน๋ณ„ ๊ธˆ์œต ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค.\n",
"# Financial savings livestock insurance is a special financial product designed for long term savings, which also includes provisions for livestock products.\n",
"\n",
"# ๊ธˆ์œต๋‹จํญ๊ฒฉ๋ณดํ—˜์€ ์ €์ถ•์€ ์ปค๋…• ์œ„ํ—˜ ๋Œ€๋น„์— ์ดˆ์ ์„ ๋งž์ถ˜ ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค. ๋†’์€ ์œ„ํ—˜์„ ๊ฐ์ˆ˜ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ณ ๊ฐ์—๊ฒŒ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค.\n",
"# Financial 'carpet bombing' insurance focuses on risk coverage rather than savings. It is suitable for customers willing to take on high risk.\n",
"\n",
"# ๊ธˆ๋ณดํ—˜์€ ์ €์ถ•์„ฑ๊ณผ๋ฅผ ๊ทน๋Œ€ํ™”ํ•ฉ๋‹ˆ๋‹ค. ํŠนํžˆ ๋…ธํ›„ ๋Œ€๋น„ ์ €์ถ•์— ์œ ๋ฆฌํ•˜๊ฒŒ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.\n",
"# Gold insurance maximizes returns on savings. It is especially advantageous for retirement savings.\n",
"\n",
"# ๊ธˆ์œต๋ณด์”จ ํ—˜ํ•œ๋ง ์ข€ ํ•˜์ง€๋งˆ์‹œ๊ณ , ์ €์ถ•์ด๋‚˜ ์ข€ ํ•˜์‹œ๋˜๊ฐ€์š”. ๋ญ๊ฐ€ ๊ทธ๋ฆฌ ๊ธ‰ํ•˜์‹ ์ง€ ๋ชจ๋ฅด๊ฒ ๋„ค์š”.\n",
"# Hey, Mr. 'Financial Bo,' please refrain from harsh words and consider saving money. I'm not sure why you're in such a hurry.\n"
"]"
]
},
{
@@ -321,42 +290,42 @@
"metadata": {},
"outputs": [],
"source": [
"# Create a tokenization function\n",
"\n",
"# Define a tokenization function\n",
"def kiwi_tokenize(text):\n",
" return [token.form for token in kiwi.tokenize(text)]\n"
" return [token.form for token in kiwi.tokenize(text)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experiment: Compare Search Results Using Different Retrievers\n",
"## Comparing Search Results Using Different Retrievers\n",
"\n",
"\n",
"In this section, we compare how different retrieval methods rank documents when given the same query. We are using:\n",
"\n",
"* `BM25`: A traditional ranking function based on term frequency (TF) and inverse document frequency (IDF).\n",
"* `Kiwi BM25`: `BM25` with an added benefit of kiwipiepy tokenization, enabling more accurate splitting of Korean words into morphemes (especially important for Korean queries).\n",
"* `FAISS`: A vector-based retriever using embeddings (in this case, `OpenAIEmbeddings`). It captures semantic similarity, so itโ€™s less reliant on exact keyword matches and more on meaning.\n",
"* `Ensemble`: A combination of BM25 (or `Kiwi BM25`) and `FAISS`, weighted to leverage both the lexical matching strengths of `BM25` and the semantic understanding of FAISS.\n",
"- **BM25**: A traditional ranking function based on term frequency (TF) and inverse document frequency (IDF).\n",
"- **Kiwi BM25**: BM25 with an added benefit of kiwipiepy tokenization, enabling more accurate splitting of Korean words into morphemes (especially important for Korean queries).\n",
"- **FAISS**: A vector-based retriever using embeddings (in this case, `OpenAIEmbeddings`). It captures semantic similarity, so itโ€™s less reliant on exact keyword matches and more on meaning.\n",
"- **Ensemble**: A combination of BM25 (or Kiwi BM25) and FAISS, weighted to leverage both the lexical matching strengths of BM25 and the semantic understanding of FAISS.\n",
"\n",
"### Key points of Comparison\n",
"\n",
"**Exact Keyword Matching vs. Semantic Matching**\n",
"\n",
"* `BM25` (and `Kiwi BM25`) excel in finding documents that share exact terms or closely related morphological variants.\n",
"* `FAISS` retrieves documents that may not have exact lexical overlap but are semantically similar (e.g., synonyms or paraphrases).\n",
"- **BM25** (and **Kiwi BM25**) excel in finding documents that share exact terms or closely related morphological variants.\n",
"- **FAISS** retrieves documents that may not have exact lexical overlap but are semantically similar (e.g., synonyms or paraphrases).\n",
"\n",
"**Impact of Korean Morphological Analysis**\n",
"**Impact of Korean morphological analysis**\n",
"\n",
"* Korean often merges stems and endings into single words (โ€œ์•ˆ๋…•ํ•˜์„ธ์š”โ€ โ†’ โ€œ์•ˆ๋…• + ํ•˜ + ์„ธ์š”โ€). `Kiwi BM25` handles this by splitting the query and documents more precisely.\n",
"* This can yield more relevant results when dealing with conjugated verbs, particles, or compound nouns.\n",
"- Korean often merges stems and endings into single words (โ€œ์•ˆ๋…•ํ•˜์„ธ์š”โ€ โ†’ โ€œ์•ˆ๋…• + ํ•˜ + ์„ธ์š”โ€). **Kiwi BM25** handles this by splitting the query and documents more precisely.\n",
"- This can yield more relevant results when dealing with conjugated verbs, particles, or compound nouns.\n",
"\n",
"**Ensemble Approaches**\n",
"\n",
"* By combining lexical (`BM25`) and semantic (`FAISS`) retrievers, we can produce a more balanced set of results.\n",
"* The weighting (e.g., 70:30 or 30:70) can be tuned to emphasize one aspect over the other.\n",
"* Using MMR (Maximal Marginal Relevance) ensures diversity in the retrieved results, reducing redundancy."
"- By combining lexical (BM25) and semantic (FAISS) retrievers, we can produce a more balanced set of results.\n",
"- The weighting (e.g., 70:30 or 30:70) can be tuned to emphasize one aspect over the other.\n",
"- Using MMR (Maximal Marginal Relevance) ensures diversity in the retrieved results, reducing redundancy."
]
},
{
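To make the TF/IDF description of BM25 in the hunk above concrete, here is a minimal sketch of Okapi BM25 scoring with a pluggable tokenizer. It is not the implementation behind the notebook's retriever: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the English toy documents are assumptions chosen for illustration. The `tokenize` argument is where a morpheme-level function such as `kiwi_tokenize` would plug in for Korean text.

```python
import math
from collections import Counter


def bm25_scores(query, docs, tokenize=str.split, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "savings insurance builds a lump sum",
    "risk insurance covers high risk",
    "please save some money",
]
scores = bm25_scores("savings insurance", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

Because scoring depends entirely on which tokens match, the tokenizer decides what counts as "the same term". Whitespace splitting treats a Korean compound as one opaque token, whereas morpheme-level splitting lets shared stems match.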
@@ -424,7 +393,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Function to print search results from multiple retrievers\n",
"# Define a function to print search results from multiple retrievers\n",
"def print_search_results(retrievers, query):\n",
" \"\"\"\n",
" Prints the top search result from each retriever for a given query.\n",
@@ -444,41 +413,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Display Search Results\n",
"\n",
"๊ธˆ์œต๋ณดํ—˜์€ ์žฅ๊ธฐ์ ์ธ ์ž์‚ฐ ๊ด€๋ฆฌ์™€ ์œ„ํ—˜ ๋Œ€๋น„๋ฅผ ๋ชฉ์ ์œผ๋กœ ๊ณ ์•ˆ๋œ ๊ธˆ์œต ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค.\n",
"\n",
"-> Financial insurance is a financial product designed for long term asset management and risk coverage\n",
"\n",
"\n",
"๊ธˆ์œต์ €์ถ•๋ณดํ—˜์€ ๊ทœ์น™์ ์ธ ์ €์ถ•์„ ํ†ตํ•ด ๋ชฉ๋ˆ์„ ๋งˆ๋ จํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ƒ๋ช…๋ณดํ—˜ ๊ธฐ๋Šฅ๋„ ๊ฒธ๋น„ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.\n",
"\n",
"-> Financial savings insurance allows individuals to accumulate a lump sum through regular savings, and also offers life insurance benefits\n",
"\n",
"\n",
"์ €์ถ•๊ธˆ์œต๋ณดํ—˜์€ ์ €์ถ•๊ณผ ๊ธˆ์œต์„ ํ†ตํ•ด ๋ชฉ๋ˆ ๋งˆ๋ จ์— ๋„์›€์„ ์ฃผ๋Š” ๋ณดํ—˜์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ์‚ฌ๋ง ๋ณด์žฅ ๊ธฐ๋Šฅ๋„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.\n",
"\n",
"-> Savings financial insurance helps individuals gather a lump sum through savings and finance, and also provides death benefit coverage\n",
"\n",
"\n",
"๊ธˆ์œต์ €์ถ•์‚ฐ๋ฌผ๋ณดํ—˜์€ ์žฅ๊ธฐ์ ์ธ ์ €์ถ• ๋ชฉ์ ๊ณผ ๋”๋ถˆ์–ด, ์ถ•์‚ฐ๋ฌผ ์ œ๊ณต ๊ธฐ๋Šฅ์„ ๊ฐ–์ถ”๊ณ  ์žˆ๋Š” ํŠน๋ณ„ ๊ธˆ์œต ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค.\n",
"\n",
"-> Financial savings livestock insurance is a special financial product designed for long term savings, which also includes provisions for livestock products\n",
"\n",
"\n",
"๊ธˆ์œต๋‹จํญ๊ฒฉ๋ณดํ—˜์€ ์ €์ถ•์€ ์ปค๋…• ์œ„ํ—˜ ๋Œ€๋น„์— ์ดˆ์ ์„ ๋งž์ถ˜ ์ƒํ’ˆ์ž…๋‹ˆ๋‹ค. ๋†’์€ ์œ„ํ—˜์„ ๊ฐ์ˆ˜ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ณ ๊ฐ์—๊ฒŒ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค.\n",
"\n",
"-> Financial 'carpet bombing' insurance focuses on risk coverage rather than savings. It is suitable for customers willing to take on high risk\n",
"\n",
"\n",
"๊ธˆ๋ณดํ—˜์€ ์ €์ถ•์„ฑ๊ณผ๋ฅผ ๊ทน๋Œ€ํ™”ํ•ฉ๋‹ˆ๋‹ค. ํŠนํžˆ ๋…ธํ›„ ๋Œ€๋น„ ์ €์ถ•์— ์œ ๋ฆฌํ•˜๊ฒŒ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.\n",
"\n",
"-> Gold insurance maximizes returns on savings. It is especially advantageous for retirement savings\n",
"\n",
"\n",
"๊ธˆ์œต๋ณด์”จ ํ—˜ํ•œ๋ง ์ข€ ํ•˜์ง€๋งˆ์‹œ๊ณ , ์ €์ถ•์ด๋‚˜ ์ข€ ํ•˜์‹œ๋˜๊ฐ€์š”. ๋ญ๊ฐ€ ๊ทธ๋ฆฌ ๊ธ‰ํ•˜์‹ ์ง€ ๋ชจ๋ฅด๊ฒ ๋„ค์š”.\n",
"\n",
"-> Hey, Mr. 'Financial Bo,' please refrain from harsh words and consider saving money. I'm not sure why you're in such a hurry.\n"
"### Displaying Search Results\n",
"Let's display the search results for a variety of queries, and see how different retrievers perform."
]
},
{
@@ -637,13 +573,11 @@
"source": [
"## Conclusion\n",
"\n",
"By running the code and observing the top documents returned for each query, youโ€™ll see how each retriever type has its strengths:\n",
"\n",
"`BM25` / `Kiwi BM25`: Great for precise keyword matching, beneficial for Korean morphological nuances.\n",
"\n",
"`FAISS`: Finds semantically related documents even if the wording differs.\n",
"By running the code and observing the top documents returned for each query, you can see how each retriever type has its strengths:\n",
"\n",
"`Ensemble`: Balances both worlds, often achieving better overall coverage for a wide range of queries.\n"
"- `BM25` / `Kiwi BM25`: Great for precise keyword matching, beneficial for Korean morphological nuances.\n",
"- `FAISS`: Finds semantically related documents even if the wording differs.\n",
"- `Ensemble`: Balances both worlds, often achieving better overall coverage for a wide range of queries.\n"
]
}
],
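The weighted ensemble mentioned in the conclusion can be illustrated with weighted reciprocal rank fusion, one common way to merge ranked lists from a lexical and a semantic retriever. This diff does not show how the notebook's ensemble combines results, so treat the following as a sketch under assumptions: `weighted_rrf`, the constant `c=60`, and the document ids are all hypothetical.

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of document ids into one ranking.

    Each list contributes weight / (c + rank) for every id it contains,
    so ids ranked highly by heavily weighted lists rise to the top.
    """
    fused = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (c + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


# Hypothetical top-3 results from a lexical and a semantic retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
faiss_ranking = ["doc_c", "doc_a", "doc_d"]

lexical_heavy = weighted_rrf([bm25_ranking, faiss_ranking], weights=[0.7, 0.3])
semantic_heavy = weighted_rrf([bm25_ranking, faiss_ranking], weights=[0.3, 0.7])
print(lexical_heavy)
print(semantic_heavy)
```

Swapping the weights from 70:30 to 30:70 changes which retriever's first choice ends up on top, which is the tuning knob the tutorial refers to.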