In my last post, I made a few predictions about what I thought the near future of AI might look like and wanted to explore one of them in more detail: Long Term Memory.
To set the stage, let's consider how people are currently interacting with AI models with a common user story:
- Marketer Mary is working on a new campaign for her company and is getting a bit stuck trying to come up with new ideas.
- Mary opens up her favorite chat app (ChatGPT, TeamAI, etc.) and asks for some ideas.
- The ideas were OK, but a bit generic and not tailored to her company, so Mary adds her company's name and some information about her company's audience to the conversation.
- The model then generates a new set of ideas tailored to her company and audience which Mary reviews and thinks are great.
- A week passes...
- Mary needs more ideas for some new campaigns and opens up the chat app again.
- Mary: "Hey, I need some more ideas for some new campaigns."
- The problem: The model doesn't have any context of the previous conversation, what it recommended, what Mary liked/disliked, or what it knows about her company. It has to start from scratch each time!
Problem: Generative AI models don't retain or ingest past conversations and user interactions, forcing the user to repeat information and be precise in their prompting every time.
This article will explore a possible solution to this problem of long term memory and how it will change the way we interact with AI models (Note: I think this will become even more important as new vision- and speech-based models are introduced where people can interact at the speed of thought).
Long Term Memory
How can we allow Generative AI models to personalize their interactions with a user from all of the user's past interactions?
Context & Action
First we need to understand how a human would do this. When you live or work with someone for a while, you start to pick up when ABC happens, then XYZ happens or is expected. For example, if your boss is asking for a report they likely have a few formats they expect: Font size 12, Arial, cover page, and exported as a PDF. Now every time they ask for a report, you instinctively know to format it that way without them having to tell you. I have attempted to boil this down into a couple of key components:
- Context: The situation when a specific interaction occurs.
- Action: The action taken based on that context.
From the example above, the context is "the boss is asking for a report" and the action is "format it in font size 12, Arial, with a cover page, and export it as a PDF".
How can we apply this to AI models?
To apply this to Generative AI models, we need a couple of things:
- Consistent Entity ID: A way to identify the user (or abstract to any entity).
- Chat History: All of the user's past interactions ideally stored in a structured format in a database.
The consistent entity ID is key here and will let us search through the chat history to find all of that user's (or entity's) past interactions.
Data structure
Let's assume we can extract, live at runtime, the context of any user's interaction (i.e. Mary asks for a report and we extract the context "Mary is asking for a report"). We then need a way, given an interaction context, to check if we have a similar context in our database. We need this to be fast and to capture similarity between contexts that are not exactly the same (i.e. "Mary is asking for a report" and "Mary is asking for a report with a cover page"). You may have a lightbulb going off right now and think "this is a job for a vector database!" and you would be correct. A vector database will allow us to efficiently search for similar contexts without needing a full text search - matching based on intent, not exact text.
TLDR schema for the vector database:
- `entity_id`: The unique identifier for the entity.
- `context_vector`: The vector representation of the context.
- `context`: The context of the interaction.
- `action`: The action taken based on the context.
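As a concrete sketch, this schema could be modeled as a small record type. This is Python for illustration only; in practice the table would live in an actual vector store (pgvector, Pinecone, etc.), and the field comments restate the schema above:

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    """One row in the vector database, mirroring the schema above."""

    entity_id: str               # unique identifier for the user/entity
    context_vector: list[float]  # embedding of the context text
    context: str                 # the context of the interaction
    action: str                  # the action taken based on that context
```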
Extracting Context/Action Pairs (historical)
Before we can start making recommendations, we first need to populate the vector database with historical context/action pairs. I would recommend doing this with background jobs rather than in real time so it doesn't slow down the user experience.
We will want to loop through all of the user's past chats and run an extraction prompt that pulls the context and action from each chat. Consider using something like OpenAI's structured output feature, or simply ask the model to output a JSON object. We then parse the JSON, generate an embedding for the context, and store the new record in the vector database.
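A minimal sketch of that extraction step might look like the following. The prompt wording is illustrative and the model call itself is omitted; the parsing side is what matters here, and you could swap in structured output from your provider instead of raw JSON:

```python
import json

# Illustrative prompt; tune the wording for your own extraction quality.
EXTRACTION_PROMPT = (
    "Read the chat transcript below and reply with only a JSON object with "
    'two keys: "context" (the situation the user was in) and "action" '
    "(what the assistant did that the user accepted).\n\n"
    "Transcript:\n{transcript}"
)


def parse_context_action(model_output: str) -> tuple[str, str]:
    """Parse the model's JSON reply into a (context, action) pair."""
    data = json.loads(model_output)
    return data["context"], data["action"]
```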
- Consider doing a check first to see if a very similar context already exists in the vector database. If it does, maybe run a prompt to see if we should add a new context/action pair or update the existing one.
Depending on the number of context/action pairs per user, you could consider running K-Means clustering on the contexts to group similar ones together. Later, you could match each incoming vector against just the cluster centroids and then search within only the matching cluster for more efficient lookups (or generate a master action plan for each cluster). I also think you could get some interesting visualizations out of the clusters if you label them to understand what sorts of contexts they typically contain.
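To make the clustering idea concrete, here is a minimal pure-Python Lloyd's k-means over context vectors. In practice you would almost certainly reach for scikit-learn's KMeans on the real embeddings; this toy version just shows the assign/recompute loop:

```python
import random


def kmeans(vectors: list[list[float]], k: int, iters: int = 20, seed: int = 0):
    """Minimal Lloyd's k-means: returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # pick k starting centroids
    clusters: list[list[list[float]]] = []
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for v in vectors:
            best = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])),
            )
            clusters[best].append(v)
        # Recompute each centroid as the mean of its cluster's members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters
```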
Extracting Context/Action Pairs (live)
When a user asks a question or gives a command, we want to run a pre-processing prompt that will extract the context of what they want. If Mary asks for a report, we want to extract the context of "Mary is asking for a report". We will then generate a vector for this context and search the vector database for similar contexts. If we find a high-similarity context, we can then inject that into the current conversation as a sort of "memory" of what the user has done in the past to help the model understand what the user wants.
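A sketch of that live lookup, using plain cosine similarity over an in-memory list as a stand-in for the vector-database query. The `records` shape, the 0.85 threshold, and the "Memory:" wording are all illustrative choices, not fixed parts of the approach:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_memory(query_vector, records, threshold=0.85):
    """Return the stored (context, action) most similar to the query, or None.

    `records` is a list of (context, action, vector) tuples; in production
    this would be a vector-database query rather than a linear scan.
    """
    best, best_score = None, threshold
    for context, action, vector in records:
        score = cosine_similarity(query_vector, vector)
        if score >= best_score:
            best, best_score = (context, action), score
    return best


def inject_memory(system_prompt: str, memory) -> str:
    """Append the recalled pair to the prompt as a lightweight 'memory'."""
    if memory is None:
        return system_prompt
    context, action = memory
    return (
        system_prompt
        + f"\nMemory: previously, when '{context}', the expected action was '{action}'."
    )
```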
Bringing it all together
The concepts introduced above could be boiled down to a few key steps:
- Extract the context of the user's interaction.
- Generate a vector for the context.
- Search the vector database for similar contexts.
- Inject the similar context into the current conversation as a sort of "memory" of what the user has done in the past to help the model understand what the user wants.
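The four steps above can be wired together in one function. Every callable here (`extract_context`, `embed`, `search`) is a hypothetical hook standing in for the pre-processing prompt, the embedding model, and the vector-database lookup described earlier, so this is a sketch of the flow rather than a specific library's API:

```python
def handle_message(user_message, entity_id, *, extract_context, embed, search, base_prompt):
    """Run the four memory steps before answering a message.

    extract_context(text)      -> context string          (pre-processing prompt)
    embed(text)                -> vector                  (embedding model)
    search(entity_id, vector)  -> (context, action)|None  (vector DB lookup)
    """
    context = extract_context(user_message)  # 1. extract the context
    vector = embed(context)                  # 2. generate a vector for it
    match = search(entity_id, vector)        # 3. search for similar contexts
    if match is None:
        return base_prompt                   # no relevant memory found
    past_context, past_action = match        # 4. inject the memory
    return (
        base_prompt
        + f"\nMemory: when '{past_context}', the expected action was '{past_action}'."
    )
```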
This process is fairly simple, but it should work for most use cases. To improve on it, you might add a graph database for the user with node/edge relationships to store relational data beyond just the context/action pairs (i.e. the user's preferences, habits, etc.).
As large corporations continue to integrate Generative AI into their products (think Microsoft's Copilot, Google's Gemini, etc.), consumers are going to expect a level of personalization that isn't currently there, requiring a lot more data to be stored and processed (+1 for the devs).
How do you think personalization and long term memory will evolve? What are some other use cases for this? Drop a comment below!