Tricked-dev · June 23, 2023 10:29
diff --git a/gist.txt b/gist.txt
 Combining Qdrant and LlamaIndex to keep Q&A systems up-to-date
 Introduction
 Have you ever been frustrated with an answer engine that is stuck in the past? As our world rapidly evolves, the accuracy of information changes accordingly. Traditional models can become outdated, providing answers that were once accurate but are now obsolete. The cost of outdated knowledge can be high - misinforming users, impacting decision-making, and ultimately undermining trust in your system.

 Qdrant and LlamaIndex work together seamlessly, continually adapting your engine to the relentless pace of information change. By mastering these tools, you can transform your applications from static knowledge repositories into dynamic, adaptable knowledge machines. Whether you're a seasoned data scientist or an AI enthusiast, join us on this learning journey - the future of answer engines is here, and it's time to embrace it.

 Learning Outcomes
 In this tutorial, you will learn the following:

 1️⃣ How to build a question-answering system using LlamaIndex and Qdrant.
 We will load a news dataset, store it with Qdrant client, and load the data into LlamaIndex.
 2️⃣ How to keep the QA engine updated and improve the ranking system.
 We will define two postprocessors: Recency and Cohere Rerank; and use these to create various query engines.
 3️⃣ How to use Node Sources in LlamaIndex to investigate questions and sources on which the answers are based.
 We will query these engines with various questions and compare their responses.
 Prerequisites
 Main Tools

 llama_index: A powerful tool for building large-scale information retrieval systems. Learn More
 qdrant_client: A high-performance vector database designed for storing and searching large-scale high-dimensional vectors. In this tutorial, we use Qdrant as our vector storage system.
 cohere: A key reranking service to be used in postprocessing. It takes in a query and a list of texts and returns an ordered array with each text assigned a new relevance score.
 OpenAI: Important for answer generation, as it takes the top few candidates to produce a final answer.
 datasets: Library necessary to import our dataset.
 pandas: Relevant library for data manipulation and analysis.
 Install Packages
 Before you start, install the required packages with pip:

 # !pip install llama-index cohere datasets pandas
 # !pip install -U qdrant-client
 Optional: install Rich to make error messages and stack traces easier to read.

 # !pip install 'rich[jupyter]'
 %load_ext rich
 Import your packages

 import datetime
 import os
 import random
 from pathlib import Path
 from typing import Any

 import pandas as pd
 from datasets import load_dataset
 from IPython.display import Markdown, display_markdown
 from llama_index import (GPTVectorStoreIndex, ServiceContext,
                         SimpleDirectoryReader)
 from llama_index.indices.postprocessor import FixedRecencyPostprocessor
 from llama_index.indices.postprocessor.cohere_rerank import CohereRerank
 from llama_index.vector_stores.qdrant import QdrantVectorStore
 from qdrant_client import QdrantClient

 Path.ls = lambda x: list(x.iterdir())
 random.seed(42)  # This is the answer
 Retrieve API Keys:
 Before you start, you must retrieve two API keys for the following services:

 OpenAI key for LLM. Link
 Cohere key for Rerank. Link or additionally, read Cohere Documentation.
 This tutorial by default uses the Qdrant Client, which doesn't require an API key. However, if you choose Qdrant Cloud instead, then you need a third key. You can get it the Qdrant Cloud main control panel

 def check_environment_keys():
    """
    Utility Function that you have the NECESSARY Keys
    """
    if os.environ.get("OPENAI_API_KEY") is None:
        raise ValueError(
            "OPENAI_API_KEY cannot be None. Set the key using os.environ['OPENAI_API_KEY']='sk-xxx'"
        )
    if os.environ.get("COHERE_API_KEY") is None:
        raise ValueError(
            "COHERE_API_KEY cannot be None. Set the key using os.environ['COHERE_API_KEY']='xxx'"
        )
    if os.environ.get("QDRANT_API_KEY") is None:
        print("[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL")


 check_environment_keys()
 [Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL
 Architecture
 Our answer engine consists of two main parts:

 Retrieval - Done with Qdrant
 Synthesis - Done with OpenAI API
 We will use LlamaIndex to make the Query Engine and Qdrant for our Vector Store. Later, we will add components to keep the engine updated and improve ranking after retrieval

 The arrow point represents the direction of data flow. The "Query Engine" box encapsulates the postprocessing step to indicate that it's a part of the query engine's function. This diagram is meant to provide a high-level understanding of the process and does not include all the details involved.



 Load Sample Dataset
 First we need to load our documents. In this example, we will use the News Category Dataset v3. This dataset contains news articles with various fields like headline, category, short_description, link, authors, and date. Once we load the data, we will reformat it to suit our needs.

 dataset = load_dataset("heegyu/news-category-dataset", split="train")
 Found cached dataset json (/Users/nirantk/.cache/huggingface/datasets/heegyu___json/heegyu--news-category-dataset-a0dcb53f17af71bf/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
 def get_single_text(k):
    return f"Under the category:\n{k['category']}:\n{k['headline']}\n{k['short_description']}"


 df = pd.DataFrame(dataset)
 df.head()
 link	headline	category	short_description	authors	date
 0	https://www.huffpost.com/entry/covid-boosters-...	Over 4 Million Americans Roll Up Sleeves For O...	U.S. NEWS	Health experts said it is too early to predict...	Carla K. Johnson, AP	2022-09-23
 1	https://www.huffpost.com/entry/american-airlin...	American Airlines Flyer Charged, Banned For Li...	U.S. NEWS	He was subdued by passengers and crew when he ...	Mary Papenfuss	2022-09-23
 2	https://www.huffpost.com/entry/funniest-tweets...	23 Of The Funniest Tweets About Cats And Dogs ...	COMEDY	"Until you have a dog you don't understand wha...	Elyse Wanshel	2022-09-23
 3	https://www.huffpost.com/entry/funniest-parent...	The Funniest Tweets From Parents This Week (Se...	PARENTING	"Accidentally put grown-up toothpaste on my to...	Caroline Bologna	2022-09-23
 4	https://www.huffpost.com/entry/amy-cooper-lose...	Woman Who Called Cops On Black Bird-Watcher Lo...	U.S. NEWS	Amy Cooper accused investment firm Franklin Te...	Nina Golgowski	2022-09-22
 # Assuming `df` is your original dataframe
 df["year"] = df["date"].dt.year

 category_columns_to_keep = ["POLITICS", "THE WORLDPOST", "WORLD NEWS", "WORLDPOST", "U.S. NEWS"]

 # Filter by category
 df_filtered = df[df["category"].isin(category_columns_to_keep)]

 # Sample data for each year


 def sample_func(x):
    return x.sample(min(len(x), 200), random_state=42)


 df_sampled = df_filtered.groupby("year").apply(sample_func).reset_index(drop=True)
 df_sampled["year"].value_counts()
 year
 2014    200
 2015    200
 2016    200
 2017    200
 2018    200
 2019    200
 2020    200
 2021    200
 2022    200
 Name: count, dtype: int64
 del df
 df = df_sampled
 df["text"] = df.apply(get_single_text, axis=1)
 df["text"]
 0       Under the category:\nWORLDPOST:\nAfghans Don't...
 1       Under the category:\nPOLITICS:\nACLU Seeks To ...
 2       Under the category:\nPOLITICS:\nWork and Worth...
 3       Under the category:\nPOLITICS:\nJody Hice, Ant...
 4       Under the category:\nPOLITICS:\nCapito Wins We...
                              ...                        
 1795    Under the category:\nPOLITICS:\nA Hard-Right R...
 1796    Under the category:\nPOLITICS:\nHerschel Walke...
 1797    Under the category:\nU.S. NEWS:\nStocks Fall, ...
 1798    Under the category:\nWORLD NEWS:\nPeru Court O...
 1799    Under the category:\nPOLITICS:\nMichigan Secre...
 Name: text, Length: 1800, dtype: object
 df["text"][9]
 "Under the category:\nWORLDPOST:\nFreed Taliban Commander Tells Relative He'll Fight Americans Again\n"
 df.drop(columns=["year"], inplace=True)
 Next, write these documents to text files in a directory. Each document will be written to a text file named after its date.

 %%time
 write_dir = Path("../data/sample").resolve()
 if write_dir.exists():
    [f.unlink() for f in write_dir.ls()]
 write_dir.mkdir(exist_ok=True, parents=True)
 for index, row in df.iterrows():
    date = str(row["date"]).replace("-", "_")  # replace '-' in date with '_' to avoid issues with file names
    file_path = write_dir / f"date_{date}_row_{index}.txt"
    with file_path.open("w") as f:
        f.write(row["text"])
 CPU times: user 45.6 ms, sys: 116 ms, total: 161 ms
 Wall time: 162 ms
 # del dataset, df
 Store Dataset with Qdrant Client
 We'll be using Qdrant as our vector storage system. Qdrant is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors.

 Local Qdrant Server/Docker + Cloud Instructions
 If you're running a local Qdrant instance with Docker, use uri:
 uri="http://<host>:<port>"
 Here I'll be using the cloud, so I am using the url set to my cloud instance

 Set the API KEY for Qdrant Cloud:
 api_key="<qdrant-api-key>"
 url
 Memory
 You can use :memory: mode for fast and lightweight experiments. It does not require Qdrant to be deployed anywhere.
 client = QdrantClient(":memory:")
 Load Data into LlamaIndex
 LlamaIndex has a simple way to load documents from a directory. We can define a function to get the metadata from a file name, and pass this function to the SimpleDirectoryReader class.

 def get_file_metadata(file_name: str):
    """Get file metadata."""
    date_str = Path(file_name).stem.split("_")[1:4]
    return {"date": "-".join(date_str)}


 documents = SimpleDirectoryReader(input_files=write_dir.ls(), file_metadata=get_file_metadata).load_data()
 len(documents)
 1800
 Let's look at the date ranges in our dataset:

 dates, years = [], []

 for document in documents:
    dt = datetime.datetime.fromisoformat(document.extra_info["date"])
    #     print(d)
    try:
        dates.append(dt)
        years.append(dt.year)
    except:
        print(d)
 This date key is necessary for the Recency Postprocessor that we are going to use later.

 We have to parse these documents into nodes and create our QdrantVectorStore:

 # define service context (wrapper container around current classes)
 service_context = ServiceContext.from_defaults(chunk_size_limit=512)
 vector_store = QdrantVectorStore(client=client, collection_name="NewsCategoryv3PoliticsSample")
 Next, we will create our GPTVectorStoreIndex from the documents. This operation might take some time as it's creating the index from the documents.

 %%time
 index = GPTVectorStoreIndex.from_documents(documents, vector_store=vector_store, service_context=service_context)
 Run a Test Query
 We have made an index. But as we saw in the diagram, we also need some added functionality to do 3 things:

 Retrieval
 Convert the text query into embedding
 Find the most similar documents
 Synthesis
 The LLM (here, OpenAI) texts the question, similar documents and a prompt to give you an answer
 query_engine = index.as_query_engine(similarity_top_k=10)
 response = query_engine.query("Who is the US President?")
 print(response)
 The US President is Joe Biden.
 response = query_engine.query("Who is the current US President?")
 print(response)
 The current US President is Joe Biden.
 Adding Postprocessors
 LlamaIndex excels at composing Retrieval and Ranking steps.

 The intention behind this is to improve answer quality. Let's see if we can use Postprocessors to improve answer quality by using two approaches:

 Selecting the most recent nodes (Recency).
 Reranking using a different model (Cohere Rerank).


 Here is what the diagram represents:

 The user issues a query to the query engine.
 The query engine, which has been configured with certain postprocessors, performs a search on the vector store based on the query.
 The query engine then postprocesses the results.
 The postprocessed results are then returned to the user
 Define a Recency Postprocessor
 LlamaIndex allows us to add postprocessors to our query engine. These postprocessors can modify the results of our queries after they are returned from the index. Here, we'll add a recency postprocessor to our query engine. This postprocessor will prioritize recent documents in the results.

 We'll define a single type of recency postprocessor: FixedRecencyPostprocessor.

 recency_postprocessor = FixedRecencyPostprocessor(service_context=service_context, top_k=1)
 Rerank with Cohere
 Cohere Rerank works on the top K results which the Retrieval step from Qdrant returns. While Qdrant works on your entire corpus (here thousands, but Qdrant is designed to work with millions) -- Cohere works with the result from Qdrant. This can improve the search results since it's working on smaller number of entries.



 Rerank endpoint takes in a query and a list of texts and produces an ordered array with each text assigned a relevance score. We'll define a CohereRerank postprocessor and add it to our query engine.

 Defining Query Engines
 We'll define four query engines for this tutorial:

 Just the Vector Store i.e. Qdrant here
 A recency query engine
 A reranking query engine
 And a combined query engine.
 The recency query engine uses the FixedRecencyPostprocessor, the reranking query engine uses the CohereRerank postprocessor, and the combined query engine uses both.

 top_k = 10  # set one, reuse from now on, ensures consistency
 index_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
 )
 recency_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[recency_postprocessor],
 )
 cohere_rerank = CohereRerank(api_key=os.environ["COHERE_API_KEY"], top_n=top_k)
 reranking_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank],
 )
 query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank, recency_postprocessor],
 )
 Querying the Engine
 Finally, we can query our engine. Let's ask it "Who is the current US President?" and see the results from each query engine.

 # question = "Who is the current US President?"
 response = index_query_engine.query("Who is the US President?")
 print(response)
 The US President is Joe Biden.
 The response object has a few interesting attributes which help us quickly debug and understand what happened in each of our steps:

 What source nodes (similar to Document Chunks in Langchain) were used to answer the question
 What extra_info does the index have which we can use? This could also be sent as a payload to Qdrant to filter on (via epoch time) -- but Llama Index does not
 Let's unpack that a bit, and we'll use what we learn from response to improve our understanding of the query engines and post processors themselves.

 Note that 10 which is the top-k parameter we set. This confirms that we retrieved the 10 documents most similar to the question (or more correct: 10 nearest neighbours to the question) and a confidence score.

 Can we show this in a more human-readable way?

 print(response.get_formatted_sources()[:318])
 > Source (Doc id: 24ec05e1-cb35-492e-8741-fdfe2c582e43): date: 2017-01-28 00:00:00

 Under the category:
 THE WORLDPOST:
 World Leaders React To The Reality ...

 > Source (Doc id: 098c2482-ce52-4e31-aa1c-825a385b56a1): date: 2015-01-18 00:00:00

 Under the category:
 POLITICS:
 The Issue That's Looming Over The Final ...


 Let's check what is stored in the extra_info attribute.

 response.extra_info
 {'24ec05e1-cb35-492e-8741-fdfe2c582e43': {'date': '2017-01-28 00:00:00'},
 '098c2482-ce52-4e31-aa1c-825a385b56a1': {'date': '2015-01-18 00:00:00'},
 'a3993bb5-64a4-46ce-aa15-0e0672f0994f': {'date': '2014-08-21 00:00:00'},
 'e48f4521-1bf3-45a3-b00b-fd6a03855d6f': {'date': '2018-12-26 00:00:00'},
 '2a13360c-2c18-4917-aef8-1002931d6a3c': {'date': '2016-06-24 00:00:00'},
 '77bd45bf-5418-4eee-bc47-33d2942e2fb8': {'date': '2014-05-31 00:00:00'},
 '51ab3ea9-67af-48a0-864a-5fa1559b2a63': {'date': '2017-06-29 00:00:00'},
 '023a5a27-1f92-4028-aea6-38e681ff2032': {'date': '2014-12-03 00:00:00'},
 '360fac77-ff67-475e-96d8-1480f2447971': {'date': '2014-12-20 00:00:00'},
 '95f092f4-0bed-46de-bae0-4107b775d603': {'date': '2022-03-26 00:00:00'}}
 This has a date key-value as a string against the doc id

 Let's setup some tools to have a question, answer and the responses from the index engine in the same object - this will come handy in a bit for explaining a wrong answer.

 def mprint(text: str):
    display_markdown(Markdown(text))


 class QAInfo:
    """This class is used to store the question, correct answer and responses from different query engines."""

    def __init__(self, question: str, correct_answer: str, query_engines: dict[str, Any]):
        self.question = question
        self.query_engines = query_engines
        self.correct_answer = correct_answer
        self.responses = {}

    def add_response(self, engine: str, response: str):
        # This method is used to add the response of a query engine to the responses dictionary.
        self.responses[engine] = response

    def compare_responses(self):
        """This function takes in a QAInfo object and a dictionary of query engines, and runs the question through each query engine.
        The responses from each engine are added to the QAInfo object."""
        mprint(f"### Question: {self.question}")

        for engine_name, engine in query_engines.items():
            response = engine.query(self.question)
            self.add_response(engine_name, response)
            mprint(f"**{engine_name.title()}**: {response}")

        mprint(f"Correct Answer is: {self.correct_answer}")

    def node_print(self, index, preview_count=5):
        source_nodes = self.responses[index].source_nodes
        for i in range(preview_count):
            mprint(f"- {source_nodes[i].node.text}")


 query_engines = {
    "qdrant": index_query_engine,
    "recency": recency_query_engine,
    "reranking": reranking_query_engine,
    "both": query_engine,
 }
 question = "Who is the US President?"
 correct_answer = "Joe Biden"  # This would normally be determined programmatically.
 president_qa_info = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
 president_qa_info.compare_responses()
 Question: Who is the US President?
 Qdrant: The US President is Joe Biden.

 Recency: The US President is Joe Biden.

 Reranking: The US President is Barack Obama.

 Both: The US President is Joe Biden.

 Correct Answer is: Joe Biden

 president_qa_info.node_print(index="recency", preview_count=1)
 Under the category:
 WORLD NEWS: Biden On Putin: 'For God's Sake, This Man Cannot Remain In Power' President Joe Biden visited Poland's capital on Saturday to speak with refugees who've been displaced amid Russia's attack on Ukraine.

 president_qa_info.node_print(index="qdrant", preview_count=1)
 Under the category:
 THE WORLDPOST: World Leaders React To The Reality Of A Trump Presidency Many of the presidential memorandums and executive decisions will fundamentally affect countries around the globe.

 Impact of how a question is asked
 question = "Who is US President in 2022?"
 correct_answer = "Joe Biden"  # This would normally be determined programmatically.
 current_president_qa_info = QAInfo(
    question=question, correct_answer=correct_answer, query_engines=query_engines
 )
 current_president_qa_info.compare_responses()
 Question: Who is US President in 2022?
 Qdrant: Joe Biden is the US President in 2022.

 Recency: The US President in 2022 is unknown at this time.

 Reranking: Joe Biden is the US President in 2022.

 Both: The US President in 2022 is unknown at this time.

 Correct Answer is: Joe Biden

 Investigating for Ranking Challenges
 We pull the few top documents which from each query engine. To make them easy to read, we've a utility node_print here.

 💡 We notice that Qdrant (using embeddings) correctly pulls out a few mentions of "2024", "Joe Biden" and "President Joe Biden"

 💡 Cohere also re-orders the top 10 candidates to give the top 3 which mention "President Joe Biden".

 With Recency, we get an undetermined answer. This is because we're only using the one, most recent result.

 🎓 Try this now:
 Change the top_k value passed to llama_index and see how that changes the answers

 current_president_qa_info.node_print(index="qdrant", preview_count=3)
 Under the category:
 POLITICS: Joe Biden Says He 'Can't Picture' U.S. Troops Being In Afghanistan In 2022 The president doubled down on his promise to end America's longest-running war at a Thursday press conference, though he said a May 1 deadline seemed unlikely.

 Under the category:
 POLITICS: How A Crowded GOP Field Could Bolster A Trump 2024 Campaign As Donald Trump considers another White House run, polls show he's the most popular figure in the Republican Party.

 Under the category:
 POLITICS: Biden To Give First State Of The Union Address At Fraught Moment President Joe Biden aims to navigate the country out a pandemic, reboot his stalled domestic agenda and confront Russia’s aggression.

 current_president_qa_info.node_print(index="recency", preview_count=1)
 Under the category:
 POLITICS: GOP Senators Refuse To Rule Out Supporting Donald Trump Again — Even If He's Indicted With the ex-president reportedly under criminal investigation, many Senate Republicans are taking a wait-and-hope-it-doesn’t-happen stance.

 current_president_qa_info.node_print(index="reranking", preview_count=3)
 Under the category:
 POLITICS: Biden To Give First State Of The Union Address At Fraught Moment President Joe Biden aims to navigate the country out a pandemic, reboot his stalled domestic agenda and confront Russia’s aggression.

 Under the category:
 WORLD NEWS: Biden On Putin: 'For God's Sake, This Man Cannot Remain In Power' President Joe Biden visited Poland's capital on Saturday to speak with refugees who've been displaced amid Russia's attack on Ukraine.

 Under the category:
 POLITICS: Joe Biden Says He 'Can't Picture' U.S. Troops Being In Afghanistan In 2022 The president doubled down on his promise to end America's longest-running war at a Thursday press conference, though he said a May 1 deadline seemed unlikely.

 Add a specific Year
 That looks interesting. Let's try this question after specifying the year:

 question = "Who was the US President in 2010?"
 correct_answer = "Barack Obama"  # This would normally be determined programmatically.
 president_2010_qa_info = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
 president_2010_qa_info.compare_responses()
 Question: Who was the US President in 2010?
 Qdrant: The US President in 2010 was Barack Obama.

 Recency: In 2010, the US President was Barack Obama.

 Reranking: The US President in 2010 was Barack Obama.

 Both: In 2010, the US President was Barack Obama.

 Correct Answer is: Barack Obama

 Let's try a different variant of this question, specify a year and see what happens?

 question = "Who was the Finance Minister of India under Manmohan Singh Govt?"
 correct_answer = "P. Chidambaram"  # This would normally be determined programmatically.
 prime_minister_jan2014 = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
 prime_minister_jan2014.compare_responses()
 Question: Who was the Finance Minister of India under Manmohan Singh Govt?
 Qdrant: The Finance Minister of India under Manmohan Singh Govt was Palaniappan Chidambaram.

 Recency: The Finance Minister of India under Manmohan Singh Govt was Palaniappan Chidambaram.

 Reranking: The Finance Minister of India under Manmohan Singh Govt was Palaniappan Chidambaram.

 Both: The Finance Minister of India under Manmohan Singh Govt was Palaniappan Chidambaram.

 Correct Answer is: P. Chidambaram

 Observation
 In this question: All the engines give the correct answer!

 This is despite the fact that the Recency Postprocessor response does not even talk about the Indian Prime Minister! ❌

 Qdrant via OpenAI Embeddings and Cohere Rerank do not do that much better

 The correct answer comes from OpenAI LLM's knowledge of the world!

 prime_minister_jan2014.node_print(index="qdrant", preview_count=10)
 Under the category:
 POLITICS: Robbing Main Street to Prop Up Wall Street: Why Jerry Brown's Rainy Day Fund Is a Bad Idea There is no need to sequester funds urgently needed by Main Street to pay for Wall Street's malfeasance. Californians can have their cake and eat it too - with a state-owned bank.

 Under the category:
 WORLDPOST: Cities Need To Get Smarter -- And India's On It

 Under the category:
 POLITICS: It Takes Just 4 Charts To Show A Big Part Of What's Wrong With Congress

 Under the category:
 WORLD NEWS: Arundhati Roy's New Novel Lays India Bare, Unveiling Worlds Within Our Worlds Malavika Binny, Jawaharlal Nehru University Wearing two hats at once can be an uncomfortable fit, but it does not seem to

 Under the category:
 POLITICS: The World Bank Must Commit to Food Security Much will be said about bringing roads, electricity and infrastructure to underdeveloped regions. But how committed is the World Bank to the planet as a whole when it is doling out its loans?

 Under the category:
 WORLDPOST: Former Prime Minister: Japan Should Shelve the Islands Dispute With China to Avoid A Spiral into Conflict

 Under the category:
 POLITICS: Senate Delays Vote On $1.1 Trillion Spending Bill

 Under the category:
 WORLDPOST: Sweden Election Results Offer Uncertain Future For Austerity

 Under the category:
 THE WORLDPOST: Greece Demands IMF Explain 'Disaster' Remarks In Explosive Leak A letter from Greek prime minister Alexis Tsipras questions whether the country "can trust" the lender.

 Under the category:
 WORLDPOST: Comedians Send Powerful Message Against Sexual Harassment In India

 prime_minister_jan2014.node_print(index="recency", preview_count=1)
 Under the category:
 WORLD NEWS: Arundhati Roy's New Novel Lays India Bare, Unveiling Worlds Within Our Worlds Malavika Binny, Jawaharlal Nehru University Wearing two hats at once can be an uncomfortable fit, but it does not seem to

 prime_minister_jan2014.node_print(index="reranking", preview_count=10)
 Under the category:
 WORLDPOST: Comedians Send Powerful Message Against Sexual Harassment In India

 Under the category:
 WORLD NEWS: Arundhati Roy's New Novel Lays India Bare, Unveiling Worlds Within Our Worlds Malavika Binny, Jawaharlal Nehru University Wearing two hats at once can be an uncomfortable fit, but it does not seem to

 Under the category:
 POLITICS: It Takes Just 4 Charts To Show A Big Part Of What's Wrong With Congress

 Under the category:
 WORLDPOST: Sweden Election Results Offer Uncertain Future For Austerity

 Under the category:
 WORLDPOST: Cities Need To Get Smarter -- And India's On It

 Under the category:
 WORLDPOST: Former Prime Minister: Japan Should Shelve the Islands Dispute With China to Avoid A Spiral into Conflict

 Under the category:
 POLITICS: Senate Delays Vote On $1.1 Trillion Spending Bill

 Under the category:
 POLITICS: The World Bank Must Commit to Food Security Much will be said about bringing roads, electricity and infrastructure to underdeveloped regions. But how committed is the World Bank to the planet as a whole when it is doling out its loans?

 Under the category:
 POLITICS: Robbing Main Street to Prop Up Wall Street: Why Jerry Brown's Rainy Day Fund Is a Bad Idea There is no need to sequester funds urgently needed by Main Street to pay for Wall Street's malfeasance. Californians can have their cake and eat it too - with a state-owned bank.

 Under the category:
 THE WORLDPOST: Greece Demands IMF Explain 'Disaster' Remarks In Explosive Leak A letter from Greek prime minister Alexis Tsipras questions whether the country "can trust" the lender.

 Recap
 1️⃣ Crafting a Q&A bot with LlamaIndex and Qdrant
 We dumped a news dataset, kicked up a Qdrant client, and stuffed our data into a LlamaIndex
 2️⃣ Keeping our Q&A bot fresh and cranking up the ranking goodness
 We used a recency postprocessor and a Cohere reranking postprocessor, and put them to work building different query engines
 3️⃣ Using Node Sources in Llama Index to dig into the Q&A trails
 We threw a bunch of questions at these engines and saw how they stacked up!
 We figured out that recency postprocessing has its perks, but it can leave us hanging when we narrow down the info too much. Plugging in a reranking postprocessor like Cohere can help sort the responses better.