LangChain-OpenTutorial · pupba · Dec 29, 2024 · Dec 28, 2024 · Dec 29, 2024 · Dec 29, 2024
diff --git a/08-EMBEDDING/05-OllamaEmbeddings.ipynb b/08-EMBEDDING/05-OllamaEmbeddings.ipynb
@@ -0,0 +1,309 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Ollama Embeddings With Langchain\n",
+    "\n",
+    "- Author: [Gwangwon Jung](https://github.com/pupba)\n",
+    "- Design: []()\n",
+    "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
+    "\n",
+    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/sub-graph.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239937-lesson-2-sub-graphs)\n",
+    "\n",
+    "## Overview\n",
+    "\n",
+    "This tutorial covers how to perform `Text Embedding` using `Ollama` and `Langchain`.\n",
+    "\n",
+    "`Ollama` is an open-source project that allows you to easily serve models locally.\n",
+    "\n",
+    "In this tutorial, we will create a simple example to measure the similarity between `Documents` and an input `Query` using `Ollama` and `Langchain`.\n",
+    "\n",
+    "![example](./assets/example-flow-ollama-embedding-cal-similarity.png)\n",
+    "\n",
+    "### Table of Contents\n",
+    "\n",
+    "- [Overview](#overview)\n",
+    "- [Environement Setup](#environment-setup)\n",
+    "- [Ollama Install and Model Serving](#ollama-install-and-model-serving)\n",
+    "- [Identify Supported Embedding Models and Serving Model](#identify-supported-embedding-models-and-serving-model)\n",
+    "- [Model Load and Embedding](#model-load-and-embedding)\n",
+    "- [The similarity calculation results](#the-similarity-calculation-results)\n",
+    "\n",
+    "### References\n",
+    "\n",
+    "- [Ollama](https://ollama.com/)\n",
+    "- [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity)\n",
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Environment Setup\n",
+    "\n",
+    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
+    "\n",
+    "**[Note]**\n",
+    "- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
+    "- You can checkout the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 66,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%capture --no-stderr\n",
+    "%pip install langchain-opentutorial"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 67,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install required packages\n",
+    "from langchain_opentutorial import package\n",
+    "\n",
+    "package.install(\n",
+    "    [\n",
+    "        \"langchain_community\",\n",
+    "        \"langchain-ollama\",\n",
+    "        \"scikit-learn\",\n",
+    "    ],\n",
+    "    verbose=False,\n",
+    "    upgrade=False,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Ollama Install and Model Serving\n",
+    "\n",
+    "Ollama is an open-source project that makes it easy to run large language models(LLM) in a local environment. This tool allows users to download and run various LLMs with simple commands, enabling developers to experiment with and use AI models directly on their computers. Ollama is a tool with a user-friendly interface and fast performance, making AI development and experimentation more accessible and efficient.\n",
+    "\n",
+    "- [Official Website/Installation](https://ollama.com/)\n",
+    "\n",
+    "### Ollama User Guide\n",
+    "**1. Installation Ollama** <br>\n",
+    "    ![guide1](./assets/guide1.png)\n",
+    "\n",
+    "**2. Verify Ollama Installation** <br>\n",
+    "    ![guide2](./assets/guide2.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Identify Supported Embedding Models and Serving Model\n",
+    "\n",
+    "👇 You can find the model in the hyperlink below.\n",
+    "\n",
+    "- [Ollama Models](https://ollama.com/search)\n",
+    "\n",
+    "### Ollama Model Pull Guide\n",
+    "\n",
+    "**1. Search Models** <br>\n",
+    "\n",
+    "![guide3](./assets/guide3.png)\n",
+    "\n",
+    "**2. Pull a Model** <br>\n",
+    "\n",
+    "![guide4](./assets/guide4.png)\n",
+    "\n",
+    "**3. Verify the Model** <br>\n",
+    "\n",
+    "![guide5](./assets/guide5.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Model Load and Embedding\n",
+    "\n",
+    "Now that you have downloaded the model, let's load the model you downloaded and proceed with the embedding.\n",
+    "\n",
+    "First, define `Query` and `Documents`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Query\n",
+    "q = \"Please tell me more about LangChain.\"\n",
+    "\n",
+    "# Documents for Text Embedding\n",
+    "docs = [\n",
+    "    \"Hi, nice to meet you.\",\n",
+    "    \"LangChain simplifies the process of building applications with large language models.\",\n",
+    "    \"The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.\",\n",
+    "    \"LangChain simplifies the process of building applications with large-scale language models.\",\n",
+    "    \"Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.\",\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, let's load the embedding model downloaded with `Ollama` using `Langchain`.\n",
+    "\n",
+    "\n",
+    "The `OllamaEmbeddings` class in `langchain_community/embeddings.py` will be removed in `langchain-community` version 1.0.0."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load Embedding Model : Legacy\n",
+    "from langchain_community.embeddings import OllamaEmbeddings\n",
+    "\n",
+    "# Serving and load the desired embedding model.\n",
+    "ollama_embeddings = OllamaEmbeddings(\n",
+    "    model=\"nomic-embed-text\",  # model=<model-name>\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "So, in this tutorial,we used the `OllamaEmbeddings` class from `langchain-ollama`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load Embedding Model : Latest\n",
+    "from langchain_ollama import OllamaEmbeddings\n",
+    "\n",
+    "# Serving and load the desired embedding model.\n",
+    "ollama_embeddings = OllamaEmbeddings(\n",
+    "    model=\"nomic-embed-text\",  # model=<model-name>\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's use the loaded model to embed the `Query` and `Documents`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 71,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Embedding Dimension Output: 768\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Embedding Query\n",
+    "embedded_query = ollama_embeddings.embed_query(q)\n",
+    "\n",
+    "# Embedding Documents\n",
+    "embedded_docs = ollama_embeddings.embed_documents(docs)\n",
+    "\n",
+    "print(f\"Embedding Dimension Output: {len(embedded_query)}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The similarity calculation results\n",
+    "\n",
+    "Let's use the vector values of the query and documents obtained earlier to calculate the similarity.\n",
+    "\n",
+    "In this tutorial, we will use [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to calculate the similarity between the `Query` and the `Documents`.\n",
+    "\n",
+    "Using the `Sklearn` library in Python, you can easily calculate **cosine similarity**."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 72,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Query] Tell me about LangChain.\n",
+      "====================================\n",
+      "[0] similarity: 0.775 | The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.\n",
+      "\n",
+      "[1] similarity: 0.748 | LangChain simplifies the process of building applications with large language models.\n",
+      "\n",
+      "[2] similarity: 0.745 | LangChain simplifies the process of building applications with large-scale language models.\n",
+      "\n",
+      "[3] similarity: 0.399 | Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.\n",
+      "\n",
+      "[4] similarity: 0.398 | Hi, nice to meet you.\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.metrics.pairwise import cosine_similarity\n",
+    "\n",
+    "# Calculate Cosine Similarity\n",
+    "similarity = cosine_similarity([embedded_query], embedded_docs)\n",
+    "\n",
+    "# Sorting by Similarity in descending order\n",
+    "sorted_idx = similarity.argsort()[0][::-1]\n",
+    "\n",
+    "# Output Result\n",
+    "print(\"[Query] Tell me about LangChain.\\n====================================\")\n",
+    "for i, idx in enumerate(sorted_idx):\n",
+    "    print(f\"[{i}] similarity: {similarity[0][idx]:.3f} | {docs[idx]}\")\n",
+    "    print()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "test",
+   "language": "python",
+   "name": "langchain"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/08-EMBEDDING/assets/example-flow-ollama-embedding-cal-similarity.png b/08-EMBEDDING/assets/example-flow-ollama-embedding-cal-similarity.png
diff --git a/08-EMBEDDING/assets/guide1.png b/08-EMBEDDING/assets/guide1.png
diff --git a/08-EMBEDDING/assets/guide2.png b/08-EMBEDDING/assets/guide2.png
diff --git a/08-EMBEDDING/assets/guide3.png b/08-EMBEDDING/assets/guide3.png
diff --git a/08-EMBEDDING/assets/guide4.png b/08-EMBEDDING/assets/guide4.png
diff --git a/08-EMBEDDING/assets/guide5.png b/08-EMBEDDING/assets/guide5.png