RAG: Retrieval-Augmented Generation
RAG is a framework for improving an LLM’s responses by giving it access to external data and documents - here we look at some useful resources
Retrieval-Augmented Generation
Retrieval-Augmented Generation is receiving a lot of coverage - my inbox and Medium feed seem to be overwhelmed with RAG articles and tools.
Some of it is good - and some not so good.
So, I put together a short list of resources that I have found useful and/or interesting, as opposed to confused and confusing like some that I’ve read.
But first, let’s unpack that rather pompous name. RAG basically means finding some relevant data (Retrieval) and sending it to the LLM to help it produce a better (Augmented) result (Generation).
The benefit is that it reduces some of the problems that are inherent to LLMs:
- they are out of date (the data that they were trained on could be a couple of years old)
- they don’t know about your data
- they hallucinate (sometimes they will give entirely plausible responses that are totally false)
By providing the LLM with data you can reduce these problems. For example, if you want the LLM to produce Streamlit code, you could give it data from the latest documentation set to enable it to use the newest features of the framework. Or, if you want it to do some analysis on your own data, then clearly giving it that data is essential. And, finally, by providing the relevant data to the LLM, you increase the chance of it providing a suitable response and so reduce the possibility of it just making things up.
How does it work?
A RAG application takes the user prompt but, before passing it on to the LLM, it searches a database in order to find relevant information that the LLM can work on. It then combines this with the original prompt and only then passes it on to the LLM for a response.
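To make that concrete, here is a minimal sketch of the retrieve-augment-generate loop in Python. The `search_documents` function is a hypothetical placeholder for the database search, and the OpenAI client and model name are just one possible choice of LLM - treat this as an illustration rather than a finished implementation.

```python
# A minimal sketch of the RAG flow described above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def search_documents(query: str, top_k: int = 3) -> list[str]:
    # Hypothetical placeholder - a real version would query a
    # vector database (see the next section)
    return ["<relevant passage 1>", "<relevant passage 2>"][:top_k]

def answer_with_rag(user_prompt: str) -> str:
    # 1. Retrieval: find data relevant to the user's prompt
    context = "\n\n".join(search_documents(user_prompt))

    # 2. Augmentation: combine the retrieved data with the prompt
    augmented_prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_prompt}"
    )

    # 3. Generation: only now does the prompt go to the LLM
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": augmented_prompt}],
    )
    return response.choices[0].message.content
```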
The database has to exist in the first place, of course, and there are tools that can help you create one from either your own data set or one that you find elsewhere. Typically, the data is broken up into usable chunks, encoded, and stored in a vector database.
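Here is a sketch of that indexing step, using the sentence-transformers library for the encoding. The file name and chunk sizes are placeholders, and a simple in-memory list stands in for a real vector database:

```python
# Split a document into overlapping chunks, embed each chunk,
# and keep the vectors for later searching.
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Break the text into fixed-size chunks with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # a common default

document = open("my_document.txt").read()  # placeholder file name
chunks = chunk_text(document)
embeddings = model.encode(chunks)  # one vector per chunk

# In a real application these would go into a vector database
# (Chroma, FAISS, Pinecone, ...); a list of (chunk, vector) pairs
# is enough to illustrate the idea.
index = list(zip(chunks, embeddings))
```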
RAG could work with a text database and use keyword searches to find relevant data. However, a vector approach allows a semantic search, retrieving data with similar meanings rather than simply the same words. For example, the sentence “There’s a man with a gun over there” would not be matched with “There is a person holding a firearm nearby” by a keyword search but would be seen as similar when searched semantically.
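You can see this with a few lines of Python: the two sentences share almost no keywords, yet their embeddings are close. Again this uses sentence-transformers, and the exact similarity score will depend on the model:

```python
# Keyword matching vs semantic similarity for the two example sentences
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

a = "There's a man with a gun over there"
b = "There is a person holding a firearm nearby"

# Keyword overlap: only trivial words are shared
shared = set(a.lower().split()) & set(b.lower().split())
print(shared)  # little more than {'a', 'there'}

# Semantic similarity: cosine similarity of the two embeddings
emb = model.encode([a, b])
print(util.cos_sim(emb[0], emb[1]))  # a high score, reflecting the shared meaning
```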
But that’s enough from me. I hope you will find the following resources useful.
To start with, we have a very accessible introduction to RAG from Marina Danilevsky, a Senior Research Scientist at IBM, where she frames her explanation with a question put to her by one of her children: which planet has the most moons?
What is retrieval-augmented generation?
The video is included in a useful blog post from IBM that provides a business-oriented explanation of the framework and the problems that it solves:
These resources give a good overview of what RAG is and how it can be used to solve business problems. Luis Lastras, director of language technologies at IBM Research, says of RAG: “It’s the difference between an open-book and a closed-book exam. In a RAG system, you are asking the model to respond to a question by browsing through the content in a book, as opposed to trying to remember facts from memory.”
While these resources give a good overall explanation of RAG as a concept, they do not attempt to explain the technical details of how RAG might be implemented.
You can find the original concept described in a 2020 paper by Meta (then Facebook) engineers, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, but since then a whole raft of tools and information has become available.
Add Your Own Data to an LLM Using Retrieval-Augmented Generation (RAG)
(Medium article - paywall)
This is a very interesting and comprehensive technical introduction to RAG. The author takes us through the basic concepts and implementation of RAG.
The article also includes the code for three implementations using the Microsoft Azure platform: the first is based on the OpenAI APIs, the second uses LangChain and the third is built with Semantic Kernel.
You can find the code for the first implementation in the article and the author has created a GitHub repository which contains the code for all three implementations (see here).
At 21 minutes, it’s a fairly long read but worth the effort.
LangChain: Chat with Your Data
The creator of LangChain, Harrison Chase, has teamed up with Andrew Ng of DeepLearning.AI to create a course that demonstrates how to implement RAG using LangChain.
This is a short video course (currently free) of about an hour that covers the loading and storing of documents in a vector store, retrieving the data and answering questions by connecting to an LLM, and, finally, creating a chatbot that you can have a conversation with.
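To give a flavour of the pattern the course teaches, here is a rough sketch using the classic LangChain API. Import paths and class names have moved around between LangChain versions, and the file name is a placeholder, so treat this as indicative rather than definitive:

```python
# Load documents, store them in a vector store, and answer
# questions by retrieving relevant chunks and passing them to an LLM.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Load and chunk the documents ("my_docs.txt" is a placeholder)
docs = TextLoader("my_docs.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks and store them in a vector store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Retrieve relevant chunks and generate an answer with the LLM
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
)
print(qa.run("What does the document say about deployment?"))
```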
This is only one of the collaborations between Chase and Ng.
RAG with LlamaIndex
There are those who think that LangChain is overly complicated, but there are alternatives such as LlamaIndex.
While this tutorial doesn’t mention RAG as such, it demonstrates the basic pattern for developing a RAG application.
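For reference, the canonical LlamaIndex starter looks something like this - a complete, if minimal, RAG pipeline. Import paths differ slightly between LlamaIndex versions (older releases use `llama_index`, newer ones `llama_index.core`), and the directory name is a placeholder:

```python
# Index a directory of documents and query them - a minimal RAG pipeline
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load every document in the ./data directory and index it
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query: relevant chunks are retrieved and sent to the LLM
query_engine = index.as_query_engine()
print(query_engine.query("Summarise the key points of these documents."))
```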
And now for something vaguely related…
Document Focused Generation
In a very crude way, my article below is related to RAG, but it focuses on a single document and a hand-crafted prompt. So, while it lacks the sophistication of RAG, it is fine for analysing CSV or JSON data and has the benefit of being extremely simple.