Enterprise data in AI is all the rage right now! The ability to leverage your own data to train models and make predictions is a huge advantage, but there are still plenty of challenges in this space. One of the biggest is being able to ask questions about your own private data. This is where Retrieval Augmented Generation (RAG) and vector databases come in. RAG is a relatively new technique, developed by Meta, that lets you ground an LLM's responses in factual data: a vector database retrieves the relevant data, and that data is passed to the model as context so it can generate a grounded response. In this article, we will explore an implementation of RAG and vector databases that can interpret and answer questions about your own data.
What is RAG?
I think IBM's blog post says it best: "It’s the difference between an open-book and a closed-book exam. In a RAG system, you are asking the model to respond to a question by browsing through the content in a book, as opposed to trying to remember facts from memory."
- In one sentence, RAG is a method where you gather relevant sources of data and then use that data as context for an AI model to generate an accurate, citable response to the given question.
What is a vector database?
First, we need to understand what a vector is. A vector is a mathematical entity with both magnitude and direction, often used to represent physical quantities like velocity or acceleration, but can also represent a point in space. In the context of vector databases, vectors are used to represent data points in multi-dimensional space, enabling efficient similarity searches and other complex data operations.
In a vector database, text content is encoded into a vector representation using an embedding model, such as Google's textembedding-gecko, and stored as an array of floating point numbers. We can then compare these vectors to measure how similar two pieces of text are. For example, we can use a vector database to find the documents most similar to a given query.
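To make this concrete, here is a minimal Python sketch of embedding two pieces of text and comparing them, assuming the Vertex AI SDK is installed and authenticated; the project settings are placeholders, and the comparison uses the same L2 (Euclidean) distance we rely on later in the article.
import math
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="my-project", location="us-central1")  # hypothetical project settings
model = TextEmbeddingModel.from_pretrained("textembedding-gecko")

# Encode two pieces of text into vectors (arrays of floating point numbers).
texts = ["Paris is the capital of France.", "What is the capital of France?"]
vec_a, vec_b = (e.values for e in model.get_embeddings(texts))

# Compare them with the L2 (Euclidean) distance: smaller means more similar.
l2_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
print(f"L2 distance: {l2_distance:.4f}")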
Example Table Definition
Below is a basic table definition showing how one might store documents in a Postgres database with pgvector. It is intentionally minimal; in practice you would likely add more columns to store additional metadata about your documents.
CREATE TABLE documents (
    document_id INT PRIMARY KEY, -- The ID of the document
    collection_id INT, -- The collection the document belongs to
    document_part INT, -- The part of the document (if chunked)
    title TEXT, -- The title of the document
    source TEXT, -- The URL of the source
    body TEXT NOT NULL, -- The body of the document
    embedding VECTOR NOT NULL, -- The vector representation of the document (its dimension should match your embedding model)
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE collections (
    collection_id INT PRIMARY KEY, -- The ID of the collection
    parent_collection_id INT, -- The ID of the parent collection (if applicable)
    name TEXT NOT NULL, -- The name of the collection
    description TEXT NOT NULL, -- The description of the collection
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
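To show how rows might land in this schema, here is a hedged Python sketch that embeds one document chunk and inserts it, assuming psycopg2 and the Vertex AI SDK; the connection string, IDs, and project settings are hypothetical, and the embedding is passed as a pgvector-style string literal.
import psycopg2
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="my-project", location="us-central1")  # hypothetical project settings
model = TextEmbeddingModel.from_pretrained("textembedding-gecko")
conn = psycopg2.connect("dbname=rag_demo")  # hypothetical connection string

body = "Paris is the capital and most populous city of France."
embedding = model.get_embeddings([body])[0].values  # array of floats

with conn, conn.cursor() as cur:
    cur.execute(
        """
        INSERT INTO documents
            (document_id, collection_id, document_part, title, source, body, embedding)
        VALUES (%s, %s, %s, %s, %s, %s, %s::vector)
        """,
        (1, 2, 0, "Capitals",
         "https://en.wikipedia.org/wiki/List_of_national_capitals",
         body, str(embedding)),  # str([...]) matches pgvector's '[x, y, ...]' literal format
    )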
Side note: if you are looking to get started with vector databases and need lots of scalability, I highly recommend checking out AlloyDB. It is a fully managed Postgres-compatible database with pgvector and Vertex AI integration built in.
Why have collections?
- You may be wondering why we need collections at all, and why we can't just dump all of our documents into one table. In practice, once we have thousands or tens of thousands of documents, we will likely have content that is similar but not actually related. For example, suppose you store notes for each of your clients in the same database as internal documentation about communication procedures. If you ask a question about communication procedures, you will likely get results from both your client notes and your internal documentation, which is not ideal. By having collections, we can limit our search to only the relevant collections.
- Another reason is role limiting. If you have a large team, you may want to limit which collections each role has access to. For example, you may want your sales team to have access only to your client notes and not your internal documentation. This is also possible with collections, as sketched below.
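As a rough sketch of that second point, the backend could keep a mapping from roles to the collections they may search and apply it before any query runs; the mapping and role names below are purely hypothetical.
# Hypothetical role-to-collection mapping; in practice this would live in your auth layer.
ALLOWED_COLLECTIONS = {
    "sales": [1],          # client notes only
    "operations": [1, 2],  # client notes + internal documentation
}

def collections_for(role: str) -> list[int]:
    # Limit vector searches to the collections this role is allowed to see.
    return ALLOWED_COLLECTIONS.get(role, [])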
How do we use RAG and vector databases to answer questions?
Step 1: Ask a question to the AI (the prompt)
A user will ask a question to the AI via a chat-bot interface. We pass this question to the backend.
Example question:
What is the capital of France and what is its population?
Step 2: Determine the questions being asked and what sources are available
The backend looks at the prompt to determine what questions are being asked. We also pass in a list of the sources available in our vector database as context.
First, get the available sources from the vector database:
SELECT collection_id, name, description FROM collections;
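In application code, gathering and formatting these sources might look like the following sketch, assuming a psycopg2 connection; the formatting mirrors the Sources block used in the prompt below.
def load_sources(conn) -> str:
    # Run the query above and format each collection the way the matching prompt expects.
    with conn.cursor() as cur:
        cur.execute("SELECT collection_id, name, description FROM collections;")
        rows = cur.fetchall()
    return "\n".join(
        f"{i}. Title: {name}\n   - ID: {collection_id}\n   - Description: {description}"
        for i, (collection_id, name, description) in enumerate(rows, start=1)
    )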
Next, we will use a fast LLM (e.g. GPT-3.5 Turbo) to identify the questions being asked in the prompt and to match each question against the available sources. Below is an example prompt and response.
Prompt:
You are an expert at matching questions to data sources. Your task is to output the final JSON from task 3 below. Do not output any other text or formatting.
Task 1: Please identify all of the questions being asked in the provided prompt and re-write them into complete questions.
Task 2: Please search through the provided sources and match the best sources to answer each question.
Task 3: Please output a JSON array with the questions and their respective sources. Each object in the array should have two keys: "question" and "sources".
- Please follow this format for the output and make sure to return valid JSON:
\`\`\`json
[{"question": "<question>", "sources": [{"title": "<source_title>", "id": <source_id>}]}]
\`\`\`
Prompt:
"""
"What is the capital of France and its population?"
"""
**Sources:**
"""
1. Title: Countries
- ID: 1
- Description: A list of countries
2. Title: Capitals
- ID: 2
- Description: A list of capitals and their countries
3. Title: Population
- ID: 3
- Description: A list of countries and their populations
4. Title: GDP
- ID: 4
- Description: A list of countries and their GDPs
"""
Response:
[
{
"question": "What is the capital of France?",
"sources": [{"title": "Capitals", "id": 2}]
},
{
"question": "What is the population of France?",
"sources": [{"title": "Population", "id": 3}]
}
]
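Here is a hedged sketch of how the backend might run this matching step, assuming the OpenAI Python SDK; the model choice and prompt wiring are illustrative rather than prescriptive.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The full matching prompt shown above (tasks 1-3 plus the output format).
MATCHING_INSTRUCTIONS = "You are an expert at matching questions to data sources. ..."

def match_questions_to_sources(user_prompt: str, sources_text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": MATCHING_INSTRUCTIONS},
            {"role": "user", "content": f'Prompt:\n"""\n{user_prompt}\n"""\n**Sources:**\n"""\n{sources_text}\n"""'},
        ],
    )
    # The prompt demands raw JSON, so parse the reply into question/source pairs.
    return json.loads(response.choices[0].message.content)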
Step 3: Ask the questions to the vector database and return the results
We will now ask each question to our vector database (e.g. AlloyDB or Postgres with pgvector), filtering by source (i.e. collection_id), and return the results.
As an example, let's use the L2 (Euclidean) distance to search our vector database.
1. Encode the question into a vector using an embedding model (e.g. textembedding-gecko).
2. Search the vector database for the most similar vectors using the L2 distance, limited to the top 3 results. Note that we filter by source (i.e. collection_id) to ensure we only get results from the relevant sources.
SELECT
document_id,
collection_id,
document_part,
title,
source,
body,
created_at
FROM documents
WHERE collection_id IN (2, 3) -- Filter by collection
ORDER BY embedding <-> 'vector' -- Search by vector similarity; 'vector' stands in for the query embedding literal, and <-> is pgvector's L2 distance operator
LIMIT 3; -- Limit to the top 3 results
3. Return the results to be added as context to the LLM in order to answer our initial question.
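Putting these steps together in Python might look like the sketch below, again assuming psycopg2 and the Vertex AI SDK; the query embedding is passed in place of the 'vector' placeholder shown above, and the project settings are placeholders.
import psycopg2
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="my-project", location="us-central1")  # hypothetical project settings
model = TextEmbeddingModel.from_pretrained("textembedding-gecko")

def search_documents(conn, question: str, collection_ids: list[int], limit: int = 3):
    # 1. Encode the question into a vector.
    query_embedding = model.get_embeddings([question])[0].values

    # 2. Find the closest documents by L2 distance, restricted to the relevant collections.
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT document_id, collection_id, document_part, title, source, body, created_at
            FROM documents
            WHERE collection_id = ANY(%s)
            ORDER BY embedding <-> %s::vector
            LIMIT %s;
            """,
            (collection_ids, str(query_embedding), limit),
        )
        return cur.fetchall()

# e.g. search_documents(psycopg2.connect("dbname=rag_demo"), "What is the capital of France?", [2, 3])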
Step 4: Generate the answer
We will now use a slower, more accurate LLM (e.g. GPT-4 or Claude 2) to generate the answer to our question, using the results from the vector database as context. Below is an example prompt that cites the sources we have provided.
Prompt:
Please answer the given prompt using the provided sources. Please cite the source in your answer using the following format:
- Inline citation: [1]
Also, add a bullet point list of sources to the bottom of your answer if any sources were used. Title the section "**Sources**" and list each source in the following format:
- [1] [source_title](source_url)
Prompt:
"""
What is the capital of France and its population?
"""
Sources:
"""
1. Title: Capitals
- Description: A list of capitals and their countries
- URL: https://en.wikipedia.org/wiki/List_of_national_capitals
- Content: "..."
2. Title: Population
- Description: A list of countries and their populations
- URL: https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)
- Content: "..."
"""
Response:
The capital of France is Paris[1]. The population of France is 67,848,156[2].
**Sources**
- [1] [Capitals](https://en.wikipedia.org/wiki/List_of_national_capitals)
- [2] [Population](https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations))
As you can see, we have now generated a citable response to our question using RAG and vector databases. This approach drastically improves the accuracy of our responses, lowers the chance of hallucination, and allows us to cite our sources.
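Before wrapping up, here is a hedged sketch of this final step, assuming the OpenAI Python SDK; the way the retrieved rows are formatted into the Sources block is an assumption based on the columns returned in Step 3.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The citation instructions shown in the prompt above.
ANSWER_INSTRUCTIONS = (
    "Please answer the given prompt using the provided sources. "
    "Cite sources inline as [1] and list them under a **Sources** section."
)

def generate_answer(question: str, documents: list[tuple]) -> str:
    # Format each retrieved row (document_id, collection_id, document_part, title, source, body, created_at).
    sources_text = "\n".join(
        f"{i}. Title: {title}\n   - URL: {source}\n   - Content: \"{body}\""
        for i, (_, _, _, title, source, body, _) in enumerate(documents, start=1)
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": ANSWER_INSTRUCTIONS},
            {"role": "user", "content": f'Prompt:\n"""\n{question}\n"""\nSources:\n"""\n{sources_text}\n"""'},
        ],
    )
    return response.choices[0].message.content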
Wrapping it up
TLDR
The use of Retrieval Augmented Generation (RAG) and vector databases provides an effective way to utilize enterprise data in AI. This approach enhances the accuracy of AI responses by grounding them in factual data, enabling the system to answer questions about private data. Implementing this involves a four-step process—asking a question, determining the questions and available sources, querying the vector database, and generating a response. This methodology improves the reliability of AI responses and is increasingly valuable in the evolving AI landscape.
Conclusion
You now know the basics of how to use RAG and vector databases to accurately answer questions with LLM-based AI systems. I hope you found this article helpful, and I look forward to hearing what you build!