{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Combining Qdrant and LlamaIndex to keep Q&A systems up-to-date\n",
    "---\n",
    "#  Introduction\n",
    "\n",
    "Have you ever been frustrated with an answer engine that is stuck in the past? As our world rapidly evolves, the accuracy of information changes accordingly. Traditional models can become outdated, providing answers that were once accurate but are now obsolete. The cost of outdated knowledge can be high - misinforming users, impacting decision-making, and ultimately undermining trust in your system.\n",
    "\n",
    "Qdrant and LlamaIndex work together seamlessly, continually adapting your engine to the relentless pace of information change. By mastering these tools, you can transform your applications from static knowledge repositories into dynamic, adaptable knowledge machines. Whether you're a seasoned data scientist or an AI enthusiast, join us on this learning journey - the future of answer engines is here, and it's time to embrace it.\n",
    "\n",
    "## Learning Outcomes\n",
    "\n",
    "In this tutorial, you will learn the following:\n",
    "\n",
    "- 1️⃣ How to build a question-answering system using LlamaIndex and Qdrant.\n",
    "    - We will load a news dataset, store it with Qdrant client, and load the data into LlamaIndex.\n",
    "- 2️⃣ How to keep the QA engine updated and improve the ranking system.\n",
    "    - We will define two postprocessors: Recency and Cohere Rerank; and use these to create various query engines.\n",
    "- 3️⃣ How to use Node Sources in LlamaIndex to investigate questions and sources on which the answers are based.\n",
    "    - We will query these engines with various questions and compare their responses.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "Main Tools\n",
    "1. `llama_index`: A powerful tool for building large-scale information retrieval systems. [Learn More](https://gpt-index.readthedocs.io/en/latest/getting_started/starter_example.html) \n",
    "2. `qdrant_client`: A high-performance vector database designed for storing and searching large-scale high-dimensional vectors. In this tutorial, we use Qdrant as our vector storage system.\n",
    "3. `cohere`: A key reranking service to be used in postprocessing. It takes in a query and a list of texts and returns an ordered array with each text assigned a _new_ relevance score.\n",
    "4. `OpenAI`: Important for answer generation, as it takes the top few candidates to produce a final answer.\n",
    "5. `datasets`: Library necessary to import our dataset.\n",
    "6. `pandas`: Relevant library for data manipulation and analysis.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Packages\n",
    "\n",
    "Before you start, install the required packages with pip:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install llama-index cohere datasets pandas\n",
    "# !pip install -U qdrant-client"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional: install Rich to make error messages and stack traces easier to read.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install 'rich[jupyter]'\n",
    "%load_ext rich"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import your packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "import os\n",
    "import random\n",
    "from pathlib import Path\n",
    "from typing import Any\n",
    "\n",
    "import pandas as pd\n",
    "from datasets import load_dataset\n",
    "from IPython.display import Markdown, display_markdown\n",
    "from llama_index import (GPTVectorStoreIndex, ServiceContext,\n",
    "                         SimpleDirectoryReader)\n",
    "from llama_index.indices.postprocessor import FixedRecencyPostprocessor\n",
    "from llama_index.indices.postprocessor.cohere_rerank import CohereRerank\n",
    "from llama_index.vector_stores.qdrant import QdrantVectorStore\n",
    "from qdrant_client import QdrantClient\n",
    "\n",
    "Path.ls = lambda x: list(x.iterdir())\n",
    "random.seed(42)  # This is the answer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrieve API Keys:\n",
    "\n",
    "Before you start, you must retrieve two API keys for the following services:\n",
    "\n",
    "1. OpenAI key for LLM. [Link](https://platform.openai.com/account/api-keys) \n",
    "2. Cohere key for Rerank. [Link](https://dashboard.cohere.ai/api-keys) or additionally, read [Cohere Documentation](https://docs.cohere.com/reference/key). \n",
    "\n",
    "This tutorial by default uses the Qdrant Client, which doesn't require an API key. However, if you choose Qdrant Cloud instead, then you need a third key. You can get it [the Qdrant Cloud main control panel](https://cloud.qdrant.io/)   \n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL\n"
     ]
    }
   ],
   "source": [
    "def check_environment_keys():\n",
    "    \"\"\"\n",
    "    Utility Function that you have the NECESSARY Keys\n",
    "    \"\"\"\n",
    "    if os.environ.get(\"OPENAI_API_KEY\") is None:\n",
    "        raise ValueError(\n",
    "            \"OPENAI_API_KEY cannot be None. Set the key using os.environ['OPENAI_API_KEY']='sk-xxx'\"\n",
    "        )\n",
    "    if os.environ.get(\"COHERE_API_KEY\") is None:\n",
    "        raise ValueError(\n",
    "            \"COHERE_API_KEY cannot be None. Set the key using os.environ['COHERE_API_KEY']='xxx'\"\n",
    "        )\n",
    "    if os.environ.get(\"QDRANT_API_KEY\") is None:\n",
    "        print(\"[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL\")\n",
    "\n",
    "\n",
    "check_environment_keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Architecture\n",
    "\n",
    "Our answer engine consists of two main parts:\n",
    "\n",
    "1. Retrieval - Done with Qdrant\n",
    "2. Synthesis - Done with OpenAI API\n",
    "\n",
    "We will use LlamaIndex to make the Query Engine and Qdrant for our Vector Store. Later, we will add components to keep the engine updated and improve ranking after retrieval\n",
    "\n",
    "The arrow point represents the direction of data flow. The \"Query Engine\" box encapsulates the postprocessing step to indicate that it's a part of the query engine's function. This diagram is meant to provide a high-level understanding of the process and does not include all the details involved.\n",
    "\n",
    "![](images/SetupFocus.png)\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load Sample Dataset\n",
    "\n",
    "First we need to load our documents. In this example, we will use the [News Category Dataset v3](https://huggingface.co/datasets/heegyu/news-category-dataset). This dataset contains news articles with various fields like `headline`, `category`, `short_description`, `link`, `authors`, and date. Once we load the data, we will reformat it to suit our needs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Found cached dataset json (/Users/nirantk/.cache/huggingface/datasets/heegyu___json/heegyu--news-category-dataset-a0dcb53f17af71bf/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)\n"
     ]
    }
   ],
   "source": [
    "dataset = load_dataset(\"heegyu/news-category-dataset\", split=\"train\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>link</th>\n",
       "      <th>headline</th>\n",
       "      <th>category</th>\n",
       "      <th>short_description</th>\n",
       "      <th>authors</th>\n",
       "      <th>date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>https://www.huffpost.com/entry/covid-boosters-...</td>\n",
       "      <td>Over 4 Million Americans Roll Up Sleeves For O...</td>\n",
       "      <td>U.S. NEWS</td>\n",
       "      <td>Health experts said it is too early to predict...</td>\n",
       "      <td>Carla K. Johnson, AP</td>\n",
       "      <td>2022-09-23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>https://www.huffpost.com/entry/american-airlin...</td>\n",
       "      <td>American Airlines Flyer Charged, Banned For Li...</td>\n",
       "      <td>U.S. NEWS</td>\n",
       "      <td>He was subdued by passengers and crew when he ...</td>\n",
       "      <td>Mary Papenfuss</td>\n",
       "      <td>2022-09-23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>https://www.huffpost.com/entry/funniest-tweets...</td>\n",
       "      <td>23 Of The Funniest Tweets About Cats And Dogs ...</td>\n",
       "      <td>COMEDY</td>\n",
       "      <td>\"Until you have a dog you don't understand wha...</td>\n",
       "      <td>Elyse Wanshel</td>\n",
       "      <td>2022-09-23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>https://www.huffpost.com/entry/funniest-parent...</td>\n",
       "      <td>The Funniest Tweets From Parents This Week (Se...</td>\n",
       "      <td>PARENTING</td>\n",
       "      <td>\"Accidentally put grown-up toothpaste on my to...</td>\n",
       "      <td>Caroline Bologna</td>\n",
       "      <td>2022-09-23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>https://www.huffpost.com/entry/amy-cooper-lose...</td>\n",
       "      <td>Woman Who Called Cops On Black Bird-Watcher Lo...</td>\n",
       "      <td>U.S. NEWS</td>\n",
       "      <td>Amy Cooper accused investment firm Franklin Te...</td>\n",
       "      <td>Nina Golgowski</td>\n",
       "      <td>2022-09-22</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                link   \n",
       "0  https://www.huffpost.com/entry/covid-boosters-...  \\\n",
       "1  https://www.huffpost.com/entry/american-airlin...   \n",
       "2  https://www.huffpost.com/entry/funniest-tweets...   \n",
       "3  https://www.huffpost.com/entry/funniest-parent...   \n",
       "4  https://www.huffpost.com/entry/amy-cooper-lose...   \n",
       "\n",
       "                                            headline   category   \n",
       "0  Over 4 Million Americans Roll Up Sleeves For O...  U.S. NEWS  \\\n",
       "1  American Airlines Flyer Charged, Banned For Li...  U.S. NEWS   \n",
       "2  23 Of The Funniest Tweets About Cats And Dogs ...     COMEDY   \n",
       "3  The Funniest Tweets From Parents This Week (Se...  PARENTING   \n",
       "4  Woman Who Called Cops On Black Bird-Watcher Lo...  U.S. NEWS   \n",
       "\n",
       "                                   short_description               authors   \n",
       "0  Health experts said it is too early to predict...  Carla K. Johnson, AP  \\\n",
       "1  He was subdued by passengers and crew when he ...        Mary Papenfuss   \n",
       "2  \"Until you have a dog you don't understand wha...         Elyse Wanshel   \n",
       "3  \"Accidentally put grown-up toothpaste on my to...      Caroline Bologna   \n",
       "4  Amy Cooper accused investment firm Franklin Te...        Nina Golgowski   \n",
       "\n",
       "        date  \n",
       "0 2022-09-23  \n",
       "1 2022-09-23  \n",
       "2 2022-09-23  \n",
       "3 2022-09-23  \n",
       "4 2022-09-22  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def get_single_text(k):\n",
    "    return f\"Under the category:\\n{k['category']}:\\n{k['headline']}\\n{k['short_description']}\"\n",
    "\n",
    "\n",
    "df = pd.DataFrame(dataset)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assuming `df` is your original dataframe\n",
    "df[\"year\"] = df[\"date\"].dt.year\n",
    "\n",
    "category_columns_to_keep = [\"POLITICS\", \"THE WORLDPOST\", \"WORLD NEWS\", \"WORLDPOST\", \"U.S. NEWS\"]\n",
    "\n",
    "# Filter by category\n",
    "df_filtered = df[df[\"category\"].isin(category_columns_to_keep)]\n",
    "\n",
    "# Sample data for each year\n",
    "\n",
    "\n",
    "def sample_func(x):\n",
    "    return x.sample(min(len(x), 200), random_state=42)\n",
    "\n",
    "\n",
    "df_sampled = df_filtered.groupby(\"year\").apply(sample_func).reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "year\n",
       "2014    200\n",
       "2015    200\n",
       "2016    200\n",
       "2017    200\n",
       "2018    200\n",
       "2019    200\n",
       "2020    200\n",
       "2021    200\n",
       "2022    200\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_sampled[\"year\"].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "del df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = df_sampled"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       Under the category:\\nWORLDPOST:\\nAfghans Don't...\n",
       "1       Under the category:\\nPOLITICS:\\nACLU Seeks To ...\n",
       "2       Under the category:\\nPOLITICS:\\nWork and Worth...\n",
       "3       Under the category:\\nPOLITICS:\\nJody Hice, Ant...\n",
       "4       Under the category:\\nPOLITICS:\\nCapito Wins We...\n",
       "                              ...                        \n",
       "1795    Under the category:\\nPOLITICS:\\nA Hard-Right R...\n",
       "1796    Under the category:\\nPOLITICS:\\nHerschel Walke...\n",
       "1797    Under the category:\\nU.S. NEWS:\\nStocks Fall, ...\n",
       "1798    Under the category:\\nWORLD NEWS:\\nPeru Court O...\n",
       "1799    Under the category:\\nPOLITICS:\\nMichigan Secre...\n",
       "Name: text, Length: 1800, dtype: object"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"text\"] = df.apply(get_single_text, axis=1)\n",
    "df[\"text\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"Under the category:\\nWORLDPOST:\\nFreed Taliban Commander Tells Relative He'll Fight Americans Again\\n\""
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"text\"][9]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.drop(columns=[\"year\"], inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, write these documents to text files in a directory. Each document will be written to a text file named after its date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 45.6 ms, sys: 116 ms, total: 161 ms\n",
      "Wall time: 162 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "write_dir = Path(\"../data/sample\").resolve()\n",
    "if write_dir.exists():\n",
    "    [f.unlink() for f in write_dir.ls()]\n",
    "write_dir.mkdir(exist_ok=True, parents=True)\n",
    "for index, row in df.iterrows():\n",
    "    date = str(row[\"date\"]).replace(\"-\", \"_\")  # replace '-' in date with '_' to avoid issues with file names\n",
    "    file_path = write_dir / f\"date_{date}_row_{index}.txt\"\n",
    "    with file_path.open(\"w\") as f:\n",
    "        f.write(row[\"text\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# del dataset, df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Store Dataset with Qdrant Client\n",
    "We'll be using Qdrant as our vector storage system. Qdrant is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors.\n",
    "\n",
    "### Local Qdrant Server/Docker + Cloud Instructions\n",
    "- If you're running a local Qdrant instance with Docker, use `uri`:\n",
    "  - `uri=\"http://<host>:<port>\"`\n",
    "  \n",
    "Here I'll be using the cloud, so I am using the url set to my cloud instance\n",
    "\n",
    "- Set the API KEY for Qdrant Cloud:\n",
    "  - `api_key=\"<qdrant-api-key>\"`\n",
    "  - `url`\n",
    "\n",
    "### Memory\n",
    "\n",
    "- You can use `:memory:` mode for fast and lightweight experiments. It does not require Qdrant to be deployed anywhere."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "client = QdrantClient(\":memory:\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Data into LlamaIndex\n",
    "LlamaIndex has a simple way to load documents from a directory. We can define a function to get the metadata from a file name, and pass this function to the `SimpleDirectoryReader` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_file_metadata(file_name: str):\n",
    "    \"\"\"Get file metadata.\"\"\"\n",
    "    date_str = Path(file_name).stem.split(\"_\")[1:4]\n",
    "    return {\"date\": \"-\".join(date_str)}\n",
    "\n",
    "\n",
    "documents = SimpleDirectoryReader(input_files=write_dir.ls(), file_metadata=get_file_metadata).load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1800"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the date ranges in our dataset: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "dates, years = [], []\n",
    "\n",
    "for document in documents:\n",
    "    dt = datetime.datetime.fromisoformat(document.extra_info[\"date\"])\n",
    "    #     print(d)\n",
    "    try:\n",
    "        dates.append(dt)\n",
    "        years.append(dt.year)\n",
    "    except:\n",
    "        print(d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This `date` key is *necessary* for the Recency Postprocessor that we are going to use later.\n",
    "\n",
    "We have to parse these documents into nodes and create our QdrantVectorStore:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# define service context (wrapper container around current classes)\n",
    "service_context = ServiceContext.from_defaults(chunk_size_limit=512)\n",
    "vector_store = QdrantVectorStore(client=client, collection_name=\"NewsCategoryv3PoliticsSample\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will create our `GPTVectorStoreIndex` from the documents. This operation might take some time as it's creating the index from the documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "index = GPTVectorStoreIndex.from_documents(documents, vector_store=vector_store, service_context=service_context)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run a Test Query \n",
    "\n",
    "We have made an index. But as we saw in the diagram, we also need some added functionality to do 3 things:\n",
    "\n",
    "1. Retrieval\n",
    "    - Convert the text query into embedding\n",
    "    - Find the most similar documents \n",
    "2. Synthesis\n",
    "    - The LLM (here, OpenAI) texts the question, similar documents and a prompt to give you an answer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine(similarity_top_k=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "The US President is Joe Biden.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"Who is the US President?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "The current US President is Joe Biden.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"Who is the current US President?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "# Adding Postprocessors\n",
    "\n",
    "LlamaIndex excels at composing Retrieval and Ranking steps. \n",
    "\n",
    "The intention behind this is to improve answer quality. Let's see if we can use Postprocessors to improve answer quality by using two approaches: \n",
    "1. Selecting the most recent nodes (Recency).\n",
    "2. Reranking using a different model (Cohere Rerank).\n",
    "\n",
    "![](images/RankFocus.png)\n",
    "\n",
    "Here is what the diagram represents:\n",
    "1. The user issues a query to the query engine.\n",
    "2. The query engine, which has been configured with certain postprocessors, performs a search on the vector store based on the query.\n",
    "3. The query engine then postprocesses the results.\n",
    "4. The postprocessed results are then returned to the user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define a Recency Postprocessor\n",
    "\n",
    "LlamaIndex allows us to add postprocessors to our query engine. These postprocessors can modify the results of our queries after they are returned from the index. Here, we'll add a recency postprocessor to our query engine. This postprocessor will prioritize recent documents in the results.\n",
    "\n",
    "We'll define a single type of recency postprocessor: `FixedRecencyPostprocessor`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "recency_postprocessor = FixedRecencyPostprocessor(service_context=service_context, top_k=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Rerank with Cohere \n",
    "\n",
    "Cohere Rerank works on the top K results which the Retrieval step from Qdrant returns. While Qdrant works on your entire corpus (here thousands, but Qdrant is designed to work with millions) -- Cohere works with the result from Qdrant. This can improve the search results since it's working on smaller number of entries. \n",
    "\n",
    "![](images/RerankFocus.png)\n",
    "\n",
    "\n",
    "Rerank endpoint takes in a query and a list of texts and produces an ordered array with each text assigned a relevance score. We'll define a `CohereRerank` postprocessor and add it to our query engine. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining Query Engines\n",
    "We'll define four query engines for this tutorial: \n",
    "1. Just the Vector Store i.e. Qdrant here\n",
    "1. A recency query engine\n",
    "1. A reranking query engine\n",
    "1. And a combined query engine.\n",
    "\n",
    "The recency query engine uses the `FixedRecencyPostprocessor`, the reranking query engine uses the `CohereRerank` postprocessor, and the combined query engine uses both."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "top_k = 10  # set one, reuse from now on, ensures consistency"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "index_query_engine = index.as_query_engine(\n",
    "    similarity_top_k=top_k,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "recency_query_engine = index.as_query_engine(\n",
    "    similarity_top_k=top_k,\n",
    "    node_postprocessors=[recency_postprocessor],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "cohere_rerank = CohereRerank(api_key=os.environ[\"COHERE_API_KEY\"], top_n=top_k)\n",
    "reranking_query_engine = index.as_query_engine(\n",
    "    similarity_top_k=top_k,\n",
    "    node_postprocessors=[cohere_rerank],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=top_k,\n",
    "    node_postprocessors=[cohere_rerank, recency_postprocessor],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying the Engine\n",
    "Finally, we can query our engine. Let's ask it \"Who is the current US President?\" and see the results from each query engine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "The US President is Joe Biden.\n"
     ]
    }
   ],
   "source": [
    "# question = \"Who is the current US President?\"\n",
    "response = index_query_engine.query(\"Who is the US President?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `response` object has a few interesting attributes which help us quickly debug and understand what happened in each of our steps: \n",
    "1. What source nodes (similar to Document Chunks in Langchain) were used to answer the question\n",
    "2. What `extra_info` does the index have which we can use? This could also be sent as a payload to Qdrant to filter on (via epoch time) -- but Llama Index does not\n",
    "\n",
    "Let's unpack that a bit, and we'll use what we learn from `response` to improve our understanding of the query engines and post processors themselves. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `10` which is the top-k parameter we set. This confirms that we retrieved the 10 documents most similar to the question (or more correct: 10 nearest neighbours to the question) and a confidence score. \n",
    "\n",
    "Can we show this in a more human-readable way?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Source (Doc id: 24ec05e1-cb35-492e-8741-fdfe2c582e43): date: 2017-01-28 00:00:00\n",
      "\n",
      "Under the category:\n",
      "THE WORLDPOST:\n",
      "World Leaders React To The Reality ...\n",
      "\n",
      "> Source (Doc id: 098c2482-ce52-4e31-aa1c-825a385b56a1): date: 2015-01-18 00:00:00\n",
      "\n",
      "Under the category:\n",
      "POLITICS:\n",
      "The Issue That's Looming Over The Final ...\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(response.get_formatted_sources()[:318])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's check what is stored in the `extra_info` attribute. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'24ec05e1-cb35-492e-8741-fdfe2c582e43': {'date': '2017-01-28 00:00:00'},\n",
       " '098c2482-ce52-4e31-aa1c-825a385b56a1': {'date': '2015-01-18 00:00:00'},\n",
       " 'a3993bb5-64a4-46ce-aa15-0e0672f0994f': {'date': '2014-08-21 00:00:00'},\n",
       " 'e48f4521-1bf3-45a3-b00b-fd6a03855d6f': {'date': '2018-12-26 00:00:00'},\n",
       " '2a13360c-2c18-4917-aef8-1002931d6a3c': {'date': '2016-06-24 00:00:00'},\n",
       " '77bd45bf-5418-4eee-bc47-33d2942e2fb8': {'date': '2014-05-31 00:00:00'},\n",
       " '51ab3ea9-67af-48a0-864a-5fa1559b2a63': {'date': '2017-06-29 00:00:00'},\n",
       " '023a5a27-1f92-4028-aea6-38e681ff2032': {'date': '2014-12-03 00:00:00'},\n",
       " '360fac77-ff67-475e-96d8-1480f2447971': {'date': '2014-12-20 00:00:00'},\n",
       " '95f092f4-0bed-46de-bae0-4107b775d603': {'date': '2022-03-26 00:00:00'}}"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response.extra_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This has a `date` key-value as a string against the `doc id`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's setup some tools to have a question, answer and the responses from the index engine in the same object - this will come handy in a bit for explaining a wrong answer. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mprint(text: str):\n",
    "    display_markdown(Markdown(text))\n",
    "\n",
    "\n",
    "class QAInfo:\n",
    "    \"\"\"This class is used to store the question, correct answer and responses from different query engines.\"\"\"\n",
    "\n",
    "    def __init__(self, question: str, correct_answer: str, query_engines: dict[str, Any]):\n",
    "        self.question = question\n",
    "        self.query_engines = query_engines\n",
    "        self.correct_answer = correct_answer\n",
    "        self.responses = {}\n",
    "\n",
    "    def add_response(self, engine: str, response: str):\n",
    "        # This method is used to add the response of a query engine to the responses dictionary.\n",
    "        self.responses[engine] = response\n",
    "\n",
    "    def compare_responses(self):\n",
    "        \"\"\"This function takes in a QAInfo object and a dictionary of query engines, and runs the question through each query engine.\n",
    "        The responses from each engine are added to the QAInfo object.\"\"\"\n",
    "        mprint(f\"### Question: {self.question}\")\n",
    "\n",
    "        for engine_name, engine in query_engines.items():\n",
    "            response = engine.query(self.question)\n",
    "            self.add_response(engine_name, response)\n",
    "            mprint(f\"**{engine_name.title()}**: {response}\")\n",
    "\n",
    "        mprint(f\"Correct Answer is: {self.correct_answer}\")\n",
    "\n",
    "    def node_print(self, index, preview_count=5):\n",
    "        source_nodes = self.responses[index].source_nodes\n",
    "        for i in range(preview_count):\n",
    "            mprint(f\"- {source_nodes[i].node.text}\")\n",
    "\n",
    "\n",
    "query_engines = {\n",
    "    \"qdrant\": index_query_engine,\n",
    "    \"recency\": recency_query_engine,\n",
    "    \"reranking\": reranking_query_engine,\n",
    "    \"both\": query_engine,\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "### Question: Who is the US President?"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Qdrant**: \n",
       "The US President is Joe Biden."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Recency**: \n",
       "The US President is Joe Biden."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Reranking**: \n",
       "The US President is Barack Obama."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Both**: \n",
       "The US President is Joe Biden."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "Correct Answer is: Joe Biden"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "question = \"Who is the US President?\"\n",
    "correct_answer = \"Joe Biden\"  # This would normally be determined programmatically.\n",
    "president_qa_info = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)\n",
    "president_qa_info.compare_responses()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLD NEWS:\n",
       "Biden On Putin: 'For God's Sake, This Man Cannot Remain In Power'\n",
       "President Joe Biden visited Poland's capital on Saturday to speak with refugees who've been displaced amid Russia's attack on Ukraine."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "president_qa_info.node_print(index=\"recency\", preview_count=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "THE WORLDPOST:\n",
       "World Leaders React To The Reality Of A Trump Presidency\n",
       "Many of the presidential memorandums and executive decisions will fundamentally affect countries around the globe."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "president_qa_info.node_print(index=\"qdrant\", preview_count=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Impact of how a question is asked"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "### Question: Who is US President in 2022?"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Qdrant**: \n",
       "Joe Biden is the US President in 2022."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Recency**: \n",
       "The US President in 2022 is unknown at this time."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Reranking**: \n",
       "Joe Biden is the US President in 2022."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Both**: \n",
       "The US President in 2022 is unknown at this time."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "Correct Answer is: Joe Biden"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "question = \"Who is US President in 2022?\"\n",
    "correct_answer = \"Joe Biden\"  # This would normally be determined programmatically.\n",
    "current_president_qa_info = QAInfo(\n",
    "    question=question, correct_answer=correct_answer, query_engines=query_engines\n",
    ")\n",
    "current_president_qa_info.compare_responses()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Investigating for Ranking Challenges\n",
    "\n",
    "We pull the few top documents which from each query engine. To make them easy to read, we've a utility `node_print` here. \n",
    "\n",
    "\n",
    "💡 We notice that Qdrant (using embeddings) correctly pulls out a few mentions of \"2024\", \"Joe Biden\" and \"President Joe Biden\"\n",
    "\n",
    "💡 Cohere also re-orders the top 10 candidates to give the top 3 which mention \"President Joe Biden\". \n",
    "\n",
    "With Recency, we get an undetermined answer. This is because we're only using the one, most recent result. \n",
    "\n",
    "## 🎓 Try this now:\n",
    "\n",
    "> Change the `top_k` value passed to `llama_index` and see how that changes the answers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "Joe Biden Says He 'Can't Picture' U.S. Troops Being In Afghanistan In 2022\n",
       "The president doubled down on his promise to end America's longest-running war at a Thursday press conference, though he said a May 1 deadline seemed unlikely."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "How A Crowded GOP Field Could Bolster A Trump 2024 Campaign\n",
       "As Donald Trump considers another White House run, polls show he's the most popular figure in the Republican Party."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "Biden To Give First State Of The Union Address At Fraught Moment\n",
       "President Joe Biden aims to navigate the country out a pandemic, reboot his stalled domestic agenda and confront Russia’s aggression."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "current_president_qa_info.node_print(index=\"qdrant\", preview_count=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "GOP Senators Refuse To Rule Out Supporting Donald Trump Again — Even If He's Indicted\n",
       "With the ex-president reportedly under criminal investigation, many Senate Republicans are taking a wait-and-hope-it-doesn’t-happen stance."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "current_president_qa_info.node_print(index=\"recency\", preview_count=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "Biden To Give First State Of The Union Address At Fraught Moment\n",
       "President Joe Biden aims to navigate the country out a pandemic, reboot his stalled domestic agenda and confront Russia’s aggression."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLD NEWS:\n",
       "Biden On Putin: 'For God's Sake, This Man Cannot Remain In Power'\n",
       "President Joe Biden visited Poland's capital on Saturday to speak with refugees who've been displaced amid Russia's attack on Ukraine."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "Joe Biden Says He 'Can't Picture' U.S. Troops Being In Afghanistan In 2022\n",
       "The president doubled down on his promise to end America's longest-running war at a Thursday press conference, though he said a May 1 deadline seemed unlikely."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "current_president_qa_info.node_print(index=\"reranking\", preview_count=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Add a specific Year\n",
    "\n",
    "That looks interesting. Let's try this question after specifying the year: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "### Question: Who was the US President in 2010?"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Qdrant**: \n",
       "The US President in 2010 was Barack Obama."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Recency**: \n",
       "In 2010, the US President was Barack Obama."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Reranking**: \n",
       "The US President in 2010 was Barack Obama."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Both**: \n",
       "In 2010, the US President was Barack Obama."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "Correct Answer is: Barack Obama"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "question = \"Who was the US President in 2010?\"\n",
    "correct_answer = \"Barack Obama\"  # This would normally be determined programmatically.\n",
    "president_2010_qa_info = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)\n",
    "president_2010_qa_info.compare_responses()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try a different variant of this question, specify a year and see what happens? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "### Question: Who was the Finance Minister of India under Manmohan Singh Govt?"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Qdrant**: \n",
       "The Finance Minister of India under Manmohan Singh Govt was Palaniappan Chidambaram."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Recency**: \n",
       "The Finance Minister of India under Manmohan Singh Govt was Palaniappan Chidambaram."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Reranking**: \n",
       "The Finance Minister of India under Manmohan Singh Govt was Palaniappan Chidambaram."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Both**: \n",
       "The Finance Minister of India under Manmohan Singh Govt was Palaniappan Chidambaram."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "Correct Answer is: P. Chidambaram"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "question = \"Who was the Finance Minister of India under Manmohan Singh Govt?\"\n",
    "correct_answer = \"P. Chidambaram\"  # This would normally be determined programmatically.\n",
    "prime_minister_jan2014 = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)\n",
    "prime_minister_jan2014.compare_responses()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Observation\n",
    "\n",
    "In this question: All the engines give the correct answer! \n",
    "\n",
    "This is despite the fact that the Recency Postprocessor response does not even talk about the Indian Prime Minister! ❌\n",
    "\n",
    "Qdrant via OpenAI Embeddings and Cohere Rerank do not do that much better\n",
    "\n",
    "The correct answer comes from OpenAI LLM's knowledge of the world!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "Robbing Main Street to Prop Up Wall Street:  Why Jerry Brown's Rainy Day Fund Is a Bad Idea\n",
       "There is no need to sequester funds urgently needed by Main Street to pay for Wall Street's malfeasance. Californians can have their cake and eat it too - with a state-owned bank."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLDPOST:\n",
       "Cities Need To Get Smarter -- And India's On It\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "It Takes Just 4 Charts To Show A Big Part Of What's Wrong With Congress\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLD NEWS:\n",
       "Arundhati Roy's New Novel Lays India Bare, Unveiling Worlds Within Our Worlds\n",
       "Malavika Binny, Jawaharlal Nehru University Wearing two hats at once can be an uncomfortable fit, but it does not seem to"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "The World Bank Must Commit to Food Security\n",
       "Much will be said about bringing roads, electricity and infrastructure to underdeveloped regions. But how committed is the World Bank to the planet as a whole when it is doling out its loans?"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLDPOST:\n",
       "Former Prime Minister: Japan Should Shelve the Islands Dispute With China to Avoid A Spiral into Conflict\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "Senate Delays Vote On $1.1 Trillion Spending Bill\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLDPOST:\n",
       "Sweden Election Results Offer Uncertain Future For Austerity\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "THE WORLDPOST:\n",
       "Greece Demands IMF Explain 'Disaster' Remarks In Explosive Leak\n",
       "A letter from Greek prime minister Alexis Tsipras questions whether the country \"can trust\" the lender."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLDPOST:\n",
       "Comedians Send Powerful Message Against Sexual Harassment In India\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "prime_minister_jan2014.node_print(index=\"qdrant\", preview_count=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLD NEWS:\n",
       "Arundhati Roy's New Novel Lays India Bare, Unveiling Worlds Within Our Worlds\n",
       "Malavika Binny, Jawaharlal Nehru University Wearing two hats at once can be an uncomfortable fit, but it does not seem to"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "prime_minister_jan2014.node_print(index=\"recency\", preview_count=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLDPOST:\n",
       "Comedians Send Powerful Message Against Sexual Harassment In India\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLD NEWS:\n",
       "Arundhati Roy's New Novel Lays India Bare, Unveiling Worlds Within Our Worlds\n",
       "Malavika Binny, Jawaharlal Nehru University Wearing two hats at once can be an uncomfortable fit, but it does not seem to"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "It Takes Just 4 Charts To Show A Big Part Of What's Wrong With Congress\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLDPOST:\n",
       "Sweden Election Results Offer Uncertain Future For Austerity\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLDPOST:\n",
       "Cities Need To Get Smarter -- And India's On It\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "WORLDPOST:\n",
       "Former Prime Minister: Japan Should Shelve the Islands Dispute With China to Avoid A Spiral into Conflict\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "Senate Delays Vote On $1.1 Trillion Spending Bill\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "The World Bank Must Commit to Food Security\n",
       "Much will be said about bringing roads, electricity and infrastructure to underdeveloped regions. But how committed is the World Bank to the planet as a whole when it is doling out its loans?"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "POLITICS:\n",
       "Robbing Main Street to Prop Up Wall Street:  Why Jerry Brown's Rainy Day Fund Is a Bad Idea\n",
       "There is no need to sequester funds urgently needed by Main Street to pay for Wall Street's malfeasance. Californians can have their cake and eat it too - with a state-owned bank."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "- Under the category:\n",
       "THE WORLDPOST:\n",
       "Greece Demands IMF Explain 'Disaster' Remarks In Explosive Leak\n",
       "A letter from Greek prime minister Alexis Tsipras questions whether the country \"can trust\" the lender."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "prime_minister_jan2014.node_print(index=\"reranking\", preview_count=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Recap\n",
    "\n",
    "- 1️⃣ Crafting a Q&A bot with LlamaIndex and Qdrant\n",
    "    - We dumped a news dataset, kicked up a Qdrant client, and stuffed our data into a LlamaIndex\n",
    "- 2️⃣ Keeping our Q&A bot fresh and cranking up the ranking goodness\n",
    "    - We used a recency postprocessor and a Cohere reranking postprocessor, and put them to work building different query engines\n",
    "- 3️⃣ Using Node Sources in Llama Index to dig into the Q&A trails\n",
    "    - We threw a bunch of questions at these engines and saw how they stacked up!\n",
    "\n",
    "We figured out that recency postprocessing has its perks, but it can leave us hanging when we narrow down the info too much. Plugging in a reranking postprocessor like Cohere can help sort the responses better."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}