From 9583e3a4b65f61cbc54a5019f9bcd65fd6b8f0ed Mon Sep 17 00:00:00 2001
From: greencode <greencode99@gmail.com>
Date: Mon, 6 Jan 2025 20:37:08 +0900
Subject: [PATCH] [N-2] 07-Text Splitter / 05-CodeSplitter
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[N-2] 07-Text Splitter / 05-CodeSplitter

This patch applies the fixes requested in ISSUE 07-05-CodeSplitter #36.
---
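
Reviewer note (below the `---`, so not part of the commit message): the
notebook text in this patch says that `RecursiveCharacterTextSplitter` uses a
pre-built separator list per language. The sketch below is a simplified,
pure-Python illustration of that idea, not the `langchain_text_splitters`
implementation. `split_recursive`, `split_code`, and `SAMPLE` are hypothetical
names, and `SAMPLE` is an assumed reconstruction of the notebook's Python
snippet. The separator list is copied from the `Language.PYTHON` output shown
in the diff.

```python
# Separators shown in the notebook output for Language.PYTHON.
PYTHON_SEPARATORS = ["\nclass ", "\ndef ", "\n\tdef ", "\n\n", "\n", " ", ""]

# Assumed sample text (hypothetical reconstruction of the notebook's snippet).
SAMPLE = '\ndef hello_world():\n    print("Hello, World!")\n\nhello_world()\n'


def split_recursive(text, separators, chunk_size):
    """Split on the first separator found, then merge or recurse per piece."""
    if len(text) <= chunk_size:
        return [text]

    for i, sep in enumerate(separators):
        if sep and sep in text:
            parts = text.split(sep)
            # Keep the separator at the start of each following piece.
            pieces = [parts[0]] + [sep + p for p in parts[1:]]
            rest = separators[i + 1:]
            break
    else:
        # No usable separator left: fall back to fixed-width cuts.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Too large even alone: flush, then retry with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_recursive(piece, rest, chunk_size))
        elif len(current) + len(piece) <= chunk_size:
            current += piece  # greedily merge small neighbours
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def split_code(text, separators, chunk_size):
    """Run the recursive split, then strip whitespace and drop empty chunks."""
    raw = split_recursive(text, separators, chunk_size)
    return [c.strip() for c in raw if c.strip()]


for chunk in split_code(SAMPLE, PYTHON_SEPARATORS, chunk_size=50):
    print(repr(chunk))
```

Under these assumptions the sketch yields the same two chunks as the Python
example output in the notebook. The real splitter also supports
`chunk_overlap` and other options, which this sketch leaves out.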
 07-TextSplitter/05-CodeSplitter.ipynb | 318 +++++++++++++-------------
 1 file changed, 158 insertions(+), 160 deletions(-)

diff --git a/07-TextSplitter/05-CodeSplitter.ipynb b/07-TextSplitter/05-CodeSplitter.ipynb
index 38ab8737c..ef3b461ca 100644
--- a/07-TextSplitter/05-CodeSplitter.ipynb
+++ b/07-TextSplitter/05-CodeSplitter.ipynb
@@ -6,19 +6,20 @@
    "source": [
     "# Split code with Langchain\n",
     "\n",
-    "- Author: [greencode](https://github.com/greencode-99)\n",
+    "- Author: [Jongcheol Kim](https://github.com/greencode-99)\n",
     "- Design: \n",
+    "- Peer Review:\n",
     "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
     "\n",
     "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/sub-graph.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239937-lesson-2-sub-graphs)\n",
     "\n",
     "## Overview\n",
     "\n",
-    "`RecursiveCharacterTextSplitter` includes pre-built lists of separators that are useful for splitting text in a specific programming language.\n",
+    "`RecursiveCharacterTextSplitter` includes pre-built separator lists optimized for splitting text in different programming languages.\n",
     "\n",
-    "You can split code written in various programming languages using `CodeTextSplitter`.\n",
+    "The `CodeTextSplitter` provides even more specialized functionality for splitting code.\n",
     "\n",
-    "To do this, import the `Language` enum and specify the corresponding programming language.\n",
+    "To use it, import the `Language` enum (enumeration) and specify the desired programming language.\n",
     "\n",
     "\n",
     "### Table of Contents\n",
@@ -27,8 +28,8 @@
     "- [Environment Setup](#environment-setup)\n",
     "- [Code Spliter Examples](#code-splitter-examples)\n",
     "   - [Python](#python)\n",
-    "   - [JS](#js)\n",
-    "   - [TS](#ts)\n",
+    "   - [JavaScript](#javascript)\n",
+    "   - [TypeScript](#typescript)\n",
     "   - [Markdown](#markdown)\n",
     "   - [LaTeX](#latex)\n",
     "   - [HTML](#html)\n",
@@ -52,13 +53,13 @@
     "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
     "\n",
     "**[Note]**\n",
-    "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
+    "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.\n",
     "- You can checkout the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -68,7 +69,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -86,7 +87,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [
     {
@@ -114,7 +115,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [
     {
@@ -123,7 +124,7 @@
        "True"
       ]
      },
-     "execution_count": 5,
+     "execution_count": 4,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -140,7 +141,7 @@
    "source": [
     "## Code Splitter Examples\n",
     "\n",
-    "Here is an example of splitting text using `RecursiveCharacterTextSplitter`.\n",
+    "Here is an example of splitting text using the `RecursiveCharacterTextSplitter`.\n",
     "\n",
     "- Import the `Language` and `RecursiveCharacterTextSplitter` classes from the `langchain_text_splitters` module.\n",
     "- `RecursiveCharacterTextSplitter` is a text splitter that recursively splits text at the character level."
@@ -148,7 +149,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 5,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -164,14 +165,14 @@
    "source": [
     "Supported languages are stored in the langchain_text_splitters.Language enum. \n",
     "\n",
-    "API Reference: [Language](https://python.langchain.com/docs/api_reference/text_splitters/Language) | [RecursiveCharacterTextSplitter](https://python.langchain.com/docs/api_reference/text_splitters/RecursiveCharacterTextSplitter)\n",
+    "API Reference: [Language](https://python.langchain.com/api_reference/text_splitters/base/langchain_text_splitters.base.Language.html#language) | [RecursiveCharacterTextSplitter](https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html#recursivecharactertextsplitter)\n",
     "\n",
-    "Below is the full list of supported languages."
+    "See below for the full list of supported languages."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [
     {
@@ -201,10 +202,11 @@
        " 'lua',\n",
        " 'perl',\n",
        " 'haskell',\n",
-       " 'elixir']"
+       " 'elixir',\n",
+       " 'powershell']"
       ]
      },
-     "execution_count": 10,
+     "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -218,14 +220,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can use the `get_separators_for_language` method of the `RecursiveCharacterTextSplitter` class to check the separators used for a specific language.\n",
+    "You can use the `get_separators_for_language` method of the `RecursiveCharacterTextSplitter` class to see the separators used for a given language.\n",
     "\n",
-    "- In the example, the `Language.PYTHON` enum value is passed as an argument to check the separators used for the Python language."
+    "- For example, passing `Language.PYTHON` retrieves the separators used for Python:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 7,
    "metadata": {},
    "outputs": [
     {
@@ -234,7 +236,7 @@
        "['\\nclass ', '\\ndef ', '\\n\\tdef ', '\\n\\n', '\\n', ' ', '']"
       ]
      },
-     "execution_count": 11,
+     "execution_count": 7,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -250,25 +252,25 @@
    "source": [
     "### Python\n",
     "\n",
-    "Use `RecursiveCharacterTextSplitter` to split Python code into document units.\n",
-    "- Specify `Language.PYTHON` as the `language` parameter to use the Python language.\n",
-    "- Set `chunk_size` to 50 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents."
+    "Here's how to split Python code into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
+    "- First, specify `Language.PYTHON` for the `language` parameter. It tells the splitter you're working with Python code.\n",
+    "- Then, set `chunk_size` to 50. This limits the size of each resulting chunk to a maximum of 50 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 8,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='def hello_world():\\n    print(\"Hello, World!\")'),\n",
-       " Document(page_content='hello_world()')]"
+       "[Document(metadata={}, page_content='def hello_world():\\n    print(\"Hello, World!\")'),\n",
+       " Document(metadata={}, page_content='hello_world()')]"
       ]
      },
-     "execution_count": 13,
+     "execution_count": 8,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -292,7 +294,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 9,
    "metadata": {},
    "outputs": [
     {
@@ -318,27 +320,27 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### JS\n",
+    "### JavaScript\n",
     "\n",
-    "Here is an example of using the JS text splitter.\n",
-    "- Specify `Language.JS` as the `language` parameter to use the JavaScript language.\n",
-    "- Set `chunk_size` to 60 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents.\n"
+    "Here's how to split JavaScript code into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
+    "- First, specify `Language.JS` for the `language` parameter. It tells the splitter you're working with JavaScript code.\n",
+    "- Then, set `chunk_size` to 60. This limits the size of each resulting chunk to a maximum of 60 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping.\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 10,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='function helloWorld() {\\n  console.log(\"Hello, World!\");\\n}'),\n",
-       " Document(page_content='helloWorld();')]"
+       "[Document(metadata={}, page_content='function helloWorld() {\\n  console.log(\"Hello, World!\");\\n}'),\n",
+       " Document(metadata={}, page_content='helloWorld();')]"
       ]
      },
-     "execution_count": 12,
+     "execution_count": 10,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -365,28 +367,28 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### TS  \n",
+    "### TypeScript\n",
     "\n",
-    "Here is an example of using the TS text splitter.\n",
-    "- Specify `Language.TS` as the `language` parameter to use the TypeScript language.\n",
-    "- Set `chunk_size` to 60 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents.\n"
+    "Here's how to split TypeScript code into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
+    "- First, specify `Language.TS` for the `language` parameter. It tells the splitter you're working with TypeScript code.\n",
+    "- Then, set `chunk_size` to 60. This limits the size of each resulting chunk to a maximum of 60 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping.\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 11,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='function helloWorld(): void {'),\n",
-       " Document(page_content='console.log(\"Hello, World!\");\\n}'),\n",
-       " Document(page_content='helloWorld();')]"
+       "[Document(metadata={}, page_content='function helloWorld(): void {'),\n",
+       " Document(metadata={}, page_content='console.log(\"Hello, World!\");\\n}'),\n",
+       " Document(metadata={}, page_content='helloWorld();')]"
       ]
      },
-     "execution_count": 15,
+     "execution_count": 11,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -415,31 +417,31 @@
    "source": [
     "### Markdown\n",
     "\n",
-    "Here is an example of using the Markdown text splitter.\n",
+    "Here's how to split Markdown text into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
     "\n",
-    "- Specify `Language.MARKDOWN` as the `language` parameter to use the Markdown language.\n",
-    "- Set `chunk_size` to 60 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents."
+    "- First, specify `Language.MARKDOWN` for the `language` parameter. It tells the splitter you're working with Markdown text.\n",
+    "- Then, set `chunk_size` to 60. This limits the size of each resulting chunk to a maximum of 60 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 12,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='# 🦜️🔗 LangChain'),\n",
-       " Document(page_content='⚡ Building applications with LLMs through composability ⚡'),\n",
-       " Document(page_content='## What is LangChain?'),\n",
-       " Document(page_content=\"# Hopefully this code block isn't split\"),\n",
-       " Document(page_content='LangChain is a framework for...'),\n",
-       " Document(page_content='As an open-source project in a rapidly developing field, we'),\n",
-       " Document(page_content='are extremely open to contributions.')]"
+       "[Document(metadata={}, page_content='# 🦜️🔗 LangChain'),\n",
+       " Document(metadata={}, page_content='⚡ Building applications with LLMs through composability ⚡'),\n",
+       " Document(metadata={}, page_content='## What is LangChain?'),\n",
+       " Document(metadata={}, page_content=\"# Hopefully this code block isn't split\"),\n",
+       " Document(metadata={}, page_content='LangChain is a framework for...'),\n",
+       " Document(metadata={}, page_content='As an open-source project in a rapidly developing field, we'),\n",
+       " Document(metadata={}, page_content='are extremely open to contributions.')]"
       ]
      },
-     "execution_count": 14,
+     "execution_count": 12,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -476,43 +478,43 @@
     "\n",
     "LaTeX is a markup language for document creation, widely used for representing mathematical symbols and formulas.\n",
     "\n",
-    "Here is an example of LaTeX text.\n",
-    "- Specify `Language.LATEX` as the `language` parameter to use the LaTeX language.\n",
-    "- Set `chunk_size` to 60 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents.\n"
+    "Here's how to split LaTeX text into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
+    "- First, specify `Language.LATEX` for the `language` parameter. It tells the splitter you're working with LaTeX text.\n",
+    "- Then, set `chunk_size` to 60. This limits the size of each resulting chunk to a maximum of 60 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 13,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='\\\\documentclass{article}\\n\\n\\x08egin{document}\\n\\n\\\\maketitle'),\n",
-       " Document(page_content='\\\\section{Introduction}\\nLarge language models (LLMs) are a'),\n",
-       " Document(page_content='type of machine learning model that can be trained on vast'),\n",
-       " Document(page_content='amounts of text data to generate human-like language. In'),\n",
-       " Document(page_content='recent years, LLMs have made significant advances in a'),\n",
-       " Document(page_content='variety of natural language processing tasks, including'),\n",
-       " Document(page_content='language translation, text generation, and sentiment'),\n",
-       " Document(page_content='analysis.'),\n",
-       " Document(page_content='\\\\subsection{History of LLMs}\\nThe earliest LLMs were'),\n",
-       " Document(page_content='developed in the 1980s and 1990s, but they were limited by'),\n",
-       " Document(page_content='the amount of data that could be processed and the'),\n",
-       " Document(page_content='computational power available at the time. In the past'),\n",
-       " Document(page_content='decade, however, advances in hardware and software have'),\n",
-       " Document(page_content='made it possible to train LLMs on massive datasets, leading'),\n",
-       " Document(page_content='to significant improvements in performance.'),\n",
-       " Document(page_content='\\\\subsection{Applications of LLMs}\\nLLMs have many'),\n",
-       " Document(page_content='applications in industry, including chatbots, content'),\n",
-       " Document(page_content='creation, and virtual assistants. They can also be used in'),\n",
-       " Document(page_content='academia for research in linguistics, psychology, and'),\n",
-       " Document(page_content='computational linguistics.\\n\\n\\\\end{document}')]"
+       "[Document(metadata={}, page_content='\\\\documentclass{article}\\n\\n\\x08egin{document}\\n\\n\\\\maketitle'),\n",
+       " Document(metadata={}, page_content='\\\\section{Introduction}\\nLarge language models (LLMs) are a'),\n",
+       " Document(metadata={}, page_content='type of machine learning model that can be trained on vast'),\n",
+       " Document(metadata={}, page_content='amounts of text data to generate human-like language. In'),\n",
+       " Document(metadata={}, page_content='recent years, LLMs have made significant advances in a'),\n",
+       " Document(metadata={}, page_content='variety of natural language processing tasks, including'),\n",
+       " Document(metadata={}, page_content='language translation, text generation, and sentiment'),\n",
+       " Document(metadata={}, page_content='analysis.'),\n",
+       " Document(metadata={}, page_content='\\\\subsection{History of LLMs}\\nThe earliest LLMs were'),\n",
+       " Document(metadata={}, page_content='developed in the 1980s and 1990s, but they were limited by'),\n",
+       " Document(metadata={}, page_content='the amount of data that could be processed and the'),\n",
+       " Document(metadata={}, page_content='computational power available at the time. In the past'),\n",
+       " Document(metadata={}, page_content='decade, however, advances in hardware and software have'),\n",
+       " Document(metadata={}, page_content='made it possible to train LLMs on massive datasets, leading'),\n",
+       " Document(metadata={}, page_content='to significant improvements in performance.'),\n",
+       " Document(metadata={}, page_content='\\\\subsection{Applications of LLMs}\\nLLMs have many'),\n",
+       " Document(metadata={}, page_content='applications in industry, including chatbots, content'),\n",
+       " Document(metadata={}, page_content='creation, and virtual assistants. They can also be used in'),\n",
+       " Document(metadata={}, page_content='academia for research in linguistics, psychology, and'),\n",
+       " Document(metadata={}, page_content='computational linguistics.\\n\\n\\\\end{document}')]"
       ]
      },
-     "execution_count": 16,
+     "execution_count": 13,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -553,36 +555,36 @@
    "source": [
     "### HTML\n",
     "\n",
-    "Here is an example of using the HTML text splitter.\n",
-    "- Specify `Language.HTML` as the `language` parameter to use the HTML language.\n",
-    "- Set `chunk_size` to 60 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents.\n"
+    "Here's how to split HTML text into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
+    "- First, specify `Language.HTML` for the `language` parameter. It tells the splitter you're working with HTML.\n",
+    "- Then, set `chunk_size` to 60. This limits the size of each resulting chunk to a maximum of 60 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping.\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 51,
+   "execution_count": 14,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='<!DOCTYPE html>\\n<html>'),\n",
-       " Document(page_content='<head>\\n        <title>Codestin Search App</title>'),\n",
-       " Document(page_content='<style>\\n            body {\\n                font-family: Aria'),\n",
-       " Document(page_content='l, sans-serif;\\n            }\\n            h1 {'),\n",
-       " Document(page_content='color: darkblue;\\n            }\\n        </style>\\n    </head'),\n",
-       " Document(page_content='>'),\n",
-       " Document(page_content='<body>'),\n",
-       " Document(page_content='<div>\\n            <h1>🦜️🔗 LangChain</h1>'),\n",
-       " Document(page_content='<p>⚡ Building applications with LLMs through composability ⚡'),\n",
-       " Document(page_content='</p>\\n        </div>'),\n",
-       " Document(page_content='<div>\\n            As an open-source project in a rapidly dev'),\n",
-       " Document(page_content='eloping field, we are extremely open to contributions.'),\n",
-       " Document(page_content='</div>\\n    </body>\\n</html>')]"
+       "[Document(metadata={}, page_content='<!DOCTYPE html>\\n<html>'),\n",
+       " Document(metadata={}, page_content='<head>\\n        <title>Codestin Search App</title>'),\n",
+       " Document(metadata={}, page_content='<style>\\n            body {\\n                font-family: Aria'),\n",
+       " Document(metadata={}, page_content='l, sans-serif;\\n            }\\n            h1 {'),\n",
+       " Document(metadata={}, page_content='color: darkblue;\\n            }\\n        </style>\\n    </head'),\n",
+       " Document(metadata={}, page_content='>'),\n",
+       " Document(metadata={}, page_content='<body>'),\n",
+       " Document(metadata={}, page_content='<div>\\n            <h1>🦜️🔗 LangChain</h1>'),\n",
+       " Document(metadata={}, page_content='<p>⚡ Building applications with LLMs through composability ⚡'),\n",
+       " Document(metadata={}, page_content='</p>\\n        </div>'),\n",
+       " Document(metadata={}, page_content='<div>\\n            As an open-source project in a rapidly dev'),\n",
+       " Document(metadata={}, page_content='eloping field, we are extremely open to contributions.'),\n",
+       " Document(metadata={}, page_content='</div>\\n    </body>\\n</html>')]"
       ]
      },
-     "execution_count": 51,
+     "execution_count": 14,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -628,31 +630,27 @@
    "source": [
     "### Solidity\n",
     "\n",
-    "Here is an example of using the Solidity text splitter:\n",
-    "\n",
-    "- The Solidity code is stored in the `SOL_CODE` variable as a string.\n",
-    "- The `RecursiveCharacterTextSplitter` is used to create `sol_splitter`, which splits the Solidity code into chunks.\n",
-    "  - The `language` parameter is set to `Language.SOL` to specify the Solidity language.\n",
-    "  - The `chunk_size` is set to 128 to specify the maximum size of each chunk.\n",
-    "  - The `chunk_overlap` is set to 0 to prevent overlap between chunks.\n",
-    "  \n",
-    "- The `sol_splitter.create_documents()` method is used to split `SOL_CODE` into chunks and store the split chunks in the `sol_docs` variable.\n",
-    "- The `sol_docs` are output to verify the split Solidity code chunks.\n"
+    "Here's how to split Solidity code (stored as a string in the `SOL_CODE` variable) into smaller chunks using a `RecursiveCharacterTextSplitter` instance called `sol_splitter`.\n",
+    "- First, specify `Language.SOL` for the `language` parameter. It tells the splitter you're working with Solidity code.\n",
+    "- Then, set `chunk_size` to 128. This limits the size of each resulting chunk to a maximum of 128 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping.\n",
+    "- The `sol_splitter.create_documents()` method splits the Solidity code (`SOL_CODE`) into chunks, which are stored in the `sol_docs` variable.\n",
+    "- Print or display the output (`sol_docs`) to verify the split.\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 52,
+   "execution_count": 15,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='pragma solidity ^0.8.20;'),\n",
-       " Document(page_content='contract HelloWorld {  \\n   function add(uint a, uint b) pure public returns(uint) {\\n       return a + b;\\n   }\\n}')]"
+       "[Document(metadata={}, page_content='pragma solidity ^0.8.20;'),\n",
+       " Document(metadata={}, page_content='contract HelloWorld {  \\n   function add(uint a, uint b) pure public returns(uint) {\\n       return a + b;\\n   }\\n}')]"
       ]
      },
-     "execution_count": 52,
+     "execution_count": 15,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -681,29 +679,29 @@
    "source": [
     "### C#\n",
     "\n",
-    "Here is an example of using the C# text splitter.\n",
-    "- Specify `Language.CSHARP` as the `language` parameter to use the C# language.\n",
-    "- Set `chunk_size` to 128 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents."
+    "Here's how to split C# code into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
+    "- First, specify `Language.CSHARP` for the `language` parameter. It tells the splitter you're working with C# code.\n",
+    "- Then, set `chunk_size` to 128. This limits the size of each resulting chunk to a maximum of 128 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 54,
+   "execution_count": 16,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='using System;'),\n",
-       " Document(page_content='class Program\\n{\\n    static void Main()\\n    {\\n        Console.WriteLine(\"Enter a number (1-5):\");'),\n",
-       " Document(page_content='int input = Convert.ToInt32(Console.ReadLine());\\n        for (int i = 1; i <= input; i++)\\n        {'),\n",
-       " Document(page_content='if (i % 2 == 0)\\n            {\\n                Console.WriteLine($\"{i} is even.\");\\n            }\\n            else'),\n",
-       " Document(page_content='{\\n                Console.WriteLine($\"{i} is odd.\");\\n            }\\n        }\\n        Console.WriteLine(\"Goodbye!\");'),\n",
-       " Document(page_content='}\\n}')]"
+       "[Document(metadata={}, page_content='using System;'),\n",
+       " Document(metadata={}, page_content='class Program\\n{\\n    static void Main()\\n    {\\n        Console.WriteLine(\"Enter a number (1-5):\");'),\n",
+       " Document(metadata={}, page_content='int input = Convert.ToInt32(Console.ReadLine());\\n        for (int i = 1; i <= input; i++)\\n        {'),\n",
+       " Document(metadata={}, page_content='if (i % 2 == 0)\\n            {\\n                Console.WriteLine($\"{i} is even.\");\\n            }\\n            else'),\n",
+       " Document(metadata={}, page_content='{\\n                Console.WriteLine($\"{i} is odd.\");\\n            }\\n        }\\n        Console.WriteLine(\"Goodbye!\");'),\n",
+       " Document(metadata={}, page_content='}\\n}')]"
       ]
      },
-     "execution_count": 54,
+     "execution_count": 16,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -747,30 +745,30 @@
    "source": [
     "### PHP\n",
     "\n",
-    "Here is an example of using the PHP text splitter.\n",
-    "- Specify `Language.PHP` as the `language` parameter to use the PHP language.\n",
-    "- Set `chunk_size` to 50 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents."
+    "Here's how to split PHP code into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
+    "- First, specify `Language.PHP` for the `language` parameter. It tells the splitter you're working with PHP code.\n",
+    "- Then, set `chunk_size` to 50. This limits the size of each resulting chunk to a maximum of 50 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 56,
+   "execution_count": 17,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='<?php\\nnamespace foo;'),\n",
-       " Document(page_content='class Hello {'),\n",
-       " Document(page_content='public function __construct() { }\\n}'),\n",
-       " Document(page_content='function hello() {\\n    echo \"Hello World!\";\\n}'),\n",
-       " Document(page_content='interface Human {\\n    public function breath();\\n}'),\n",
-       " Document(page_content='trait Foo { }\\nenum Color\\n{\\n    case Red;'),\n",
-       " Document(page_content='case Blue;\\n}')]"
+       "[Document(metadata={}, page_content='<?php\\nnamespace foo;'),\n",
+       " Document(metadata={}, page_content='class Hello {'),\n",
+       " Document(metadata={}, page_content='public function __construct() { }\\n}'),\n",
+       " Document(metadata={}, page_content='function hello() {\\n    echo \"Hello World!\";\\n}'),\n",
+       " Document(metadata={}, page_content='interface Human {\\n    public function breath();\\n}'),\n",
+       " Document(metadata={}, page_content='trait Foo { }\\nenum Color\\n{\\n    case Red;'),\n",
+       " Document(metadata={}, page_content='case Blue;\\n}')]"
       ]
      },
-     "execution_count": 56,
+     "execution_count": 17,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -808,28 +806,28 @@
    "source": [
     "### Kotlin\n",
     "\n",
-    "Here is an example of using the kotlin text splitter.\n",
-    "- Specify `Language.KOTLIN` as the `language` parameter to use the PowerShell language.\n",
-    "- Set `chunk_size` to 100 to limit the maximum size of each document.\n",
-    "- Set `chunk_overlap` to 0 to disallow overlap between documents."
+    "Here's how to split Kotlin code into smaller chunks using the `RecursiveCharacterTextSplitter`.\n",
+    "- First, specify `Language.KOTLIN` for the `language` parameter. It tells the splitter you're working with Kotlin code.\n",
+    "- Then, set `chunk_size` to 100. This limits the size of each resulting chunk to a maximum of 100 characters.\n",
+    "- Finally, set `chunk_overlap` to 0. It prevents any of the chunks from overlapping."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 65,
+   "execution_count": 18,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[Document(page_content='fun main() {\\n    val directoryPath = System.getProperty(\"user.dir\")'),\n",
-       " Document(page_content='val files = File(directoryPath).listFiles()?.filter { !it.isDirectory }?.sortedBy {'),\n",
-       " Document(page_content='it.lastModified() } ?: emptyArray()'),\n",
-       " Document(page_content='files.forEach { file ->'),\n",
-       " Document(page_content='println(\"Name: ${file.name} | Last Write Time: ${file.lastModified()}\")\\n    }\\n}')]"
+       "[Document(metadata={}, page_content='fun main() {\\n    val directoryPath = System.getProperty(\"user.dir\")'),\n",
+       " Document(metadata={}, page_content='val files = File(directoryPath).listFiles()?.filter { !it.isDirectory }?.sortedBy {'),\n",
+       " Document(metadata={}, page_content='it.lastModified() } ?: emptyArray()'),\n",
+       " Document(metadata={}, page_content='files.forEach { file ->'),\n",
+       " Document(metadata={}, page_content='println(\"Name: ${file.name} | Last Write Time: ${file.lastModified()}\")\\n    }\\n}')]"
       ]
      },
-     "execution_count": 65,
+     "execution_count": 18,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -871,7 +869,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.9"
+   "version": "3.11.11"
   }
  },
  "nbformat": 4,