ChatGPT was released on Nov. 30, 2022 [Forbes] and introduced the public to the power of Generative AI. OpenAI's model at the time was GPT-3.5, which blew my mind - it could answer most questions and respond in a human-like way, something completely new to the world. In the two and a half years since, Generative AI has exploded in popularity and shifted the business landscape, with every company trying to figure out how to re-brand as an AI-first company (heck, even Samsung has an AI washing machine).
Is Generative AI the future? A bunch of fluff & buzz? Or is something new brewing?
Context is KING
What differentiates a Generative AI model from a human?
Human
A human approaches a problem with the context of their personal knowledge and experiences, their company's business objectives, and their industry knowledge. They can not only answer questions, but also know where to look for context, and can then take action based on that context in an accurate and reliable manner.
Generative AI Model
A Generative AI model has a broad understanding of the world, but doesn't have your company's business objectives, your industry knowledge, or your personal knowledge and experiences (hence why RAG is so popular). In addition, it can't reliably take action based on that context (fill out a form, email a client, stop a production line, etc.) since it doesn't have any real-world presence.
Context & Action
This boils down to two key factors that Generative AI models lack:
- Context: The knowledge and experiences inherent to the user (i.e. they know what they want to do, but may not know how to express it explicitly in a prompt).
- Action: The ability to reliably take action based on that context (i.e. based on the user's intent, the model can interact with the real world).
A lot of work is being done to address these two issues. Context is being tackled with RAG, knowledge graphs, vision models, and integrating web data, but these approaches still fall short of truly extracting the user's intent when they can't clearly express what they want. Action is being tackled with Agents (personally, this is a bit of a buzzword to make things sell - basically you have AI generate JSON which normal code then acts upon), but it still has a large margin of error and is limited to "simple" tasks like writing sales emails, answering support tickets, etc.
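To make the "AI generates JSON, normal code acts upon it" point concrete, here is a minimal sketch of that agent pattern. Everything here is illustrative: call_llm is a hypothetical stand-in for a real model call, and send_sales_email is a made-up action handler.

```python
import json

def call_llm(prompt: str) -> str:
    # Stand-in for whatever model/client is in use; hard-coded here so the sketch runs.
    return ('{"action": "send_sales_email", "args": {"recipient": "client@example.com", '
            '"subject": "Following up", "body": "Hi, just checking in."}}')

def send_sales_email(recipient: str, subject: str, body: str) -> None:
    # Ordinary, deterministic code that performs the real-world action.
    print(f"Sending '{subject}' to {recipient}: {body}")

ACTIONS = {"send_sales_email": send_sales_email}

def run_agent(user_request: str) -> None:
    raw = call_llm(f"Return ONLY JSON with keys 'action' and 'args' for: {user_request}")
    decision = json.loads(raw)                        # the model's "decision" is just JSON
    ACTIONS[decision["action"]](**decision["args"])   # normal code acts upon it

run_agent("Email the client to follow up on our call")
```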
Predictions
Long Term Memory
When you live with someone for a long time (e.g. a husband/wife, parent/child, etc.), you start to understand their personality, their habits, their preferences, etc. You can then start to predict their wants and needs even when they don't explicitly tell you, and that understanding is unique to the relationship.
People interact with AI models in a start/stop manner. They open up their favorite chat app (ChatGPT, TeamAI, etc.) and ask a question, maybe a few follow-ups, and then close the app. Next time they have a question, they open the app and start a new chat completely distinct from the last. The LLM has NO context of the user's previous interactions, who they are (aside from maybe a name or title), or their preferences, meaning the user needs to be very explicit each time about what they want.
I predict that each person will have their own AI context repository that continuously reviews their past interactions and extracts preferences, habits, and other context into a knowledge graph or database of some sort, so each new question they ask is filtered through the lens of their past interactions and preferences.
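Here is a rough sketch of what such a repository could look like. It assumes two hypothetical helpers (extract_preferences and answer_with_llm) standing in for real model calls, and uses a plain dict where a production system might use a knowledge graph or database.

```python
def extract_preferences(conversation: str) -> dict[str, str]:
    # In practice an LLM would mine durable facts and habits out of the transcript;
    # stubbed here so the sketch runs.
    return {"tone": "concise", "coffee": "dark roast, fine grind"}

def answer_with_llm(prompt: str) -> str:
    # Stand-in for the actual model call that answers the question.
    return f"(model answer grounded in: {prompt.splitlines()[0]} ...)"

class ContextRepository:
    def __init__(self) -> None:
        self.preferences: dict[str, str] = {}   # could equally be a knowledge graph or DB

    def ingest(self, conversation: str) -> None:
        # Continuously review past interactions and fold what was learned into the store.
        self.preferences.update(extract_preferences(conversation))

    def ask(self, question: str) -> str:
        # Every new question is filtered through the lens of past interactions.
        known = "\n".join(f"- {k}: {v}" for k, v in self.preferences.items())
        return answer_with_llm(f"Known about this user:\n{known}\n\nQuestion: {question}")

repo = ContextRepository()
repo.ingest("...yesterday's chat transcript...")
print(repo.ask("Order my usual coffee beans"))
```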
Real World Interaction
Chat has been the primary way people interact with AI models, but the popularity of speech is rising. I think we will see this split into two segments:
Chat = White Collar
Chat for the business world. People in a shared office setting won't want to be talking out loud to their Generative AI model, but will tend towards text-based interaction.
Speech = Blue Collar
Speech for the consumer world. Service agents, trade workers, and people who work with their hands will want to stay hands-free and chat at the speed of thought as they work vs. sitting down and typing. They will need models with vision to gain the context of their environment and situation (e.g. a mechanic diagnosing a car problem doesn't want to explain the type of car, where exactly the problem is, etc. - they are looking at the car and just want to ask the question, and the Generative AI model can use vision to gain all that extra context).
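A hands-free flow like that could look roughly like the sketch below. All three helpers (transcribe, capture_photo, ask_multimodal_model) are hypothetical stubs for a speech-to-text step, a camera capture, and a multimodal model call - not a specific vendor API.

```python
def transcribe(audio_path: str) -> str:
    # Stub for a speech-to-text step; the worker never has to type anything.
    return "Why is this engine leaking from right here?"

def capture_photo() -> bytes:
    # Stub for a phone or head-mounted camera capture of the work area.
    return b"<jpeg bytes>"

def ask_multimodal_model(question: str, image: bytes) -> str:
    # Stub for a vision-capable model; the image supplies the car, the part, the symptom.
    return "That looks like a valve cover gasket leak; here is how to confirm it..."

def hands_free_query(audio_path: str) -> str:
    question = transcribe(audio_path)
    photo = capture_photo()   # the image provides the context the worker never has to describe
    return ask_multimodal_model(question, photo)

print(hands_free_query("question.wav"))
```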
Robotics
On the vision side of things, Google is making me excited with the bounding-box extractions in their Gemini AI models. Now you can combine the logic of a Generative AI model with the dimensionality of a vision model to create a system that can understand the world and take action. For example, you ask a robot to make you a cup of coffee - the Generative AI model can take a picture of the room and locate the coffee maker. Now it can move towards it, take another picture to understand how it works (buttons, etc.) and locate the coffee grounds, etc., and do this in an iterative loop until the coffee is made. Each time, it can reason from the latest picture about what objects it has, what it currently needs, and where it needs to move to complete the next step.
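The coffee example boils down to a perceive-reason-act loop. The sketch below shows the shape of that loop; locate_objects() mimics a Gemini-style bounding-box extraction and plan_next_step() stands in for the generative model's reasoning, both stubbed so the sketch runs.

```python
def take_picture() -> bytes:
    return b"<camera frame>"                     # stub camera capture

def locate_objects(image: bytes, targets: list[str]) -> dict[str, tuple[int, int, int, int]]:
    # Pretend bounding boxes (x_min, y_min, x_max, y_max) from a vision model.
    return {"coffee maker": (120, 40, 260, 180), "coffee grounds": (300, 90, 360, 150)}

def plan_next_step(goal: str, boxes: dict[str, tuple[int, int, int, int]]) -> str:
    # In practice the generative model reasons over the boxes to pick the next action.
    return "pour water, add grounds, press brew" if boxes else "search another room"

def make_coffee(goal: str = "make a cup of coffee") -> None:
    # Iterative loop: perceive the scene, reason about it, act, repeat until done.
    for step in range(5):
        frame = take_picture()
        boxes = locate_objects(frame, ["coffee maker", "coffee grounds"])
        action = plan_next_step(goal, boxes)
        print(f"step {step}: {action}")
        if boxes:                                # everything needed is in view, so stop
            break

make_coffee()
```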
This sort of dynamic interaction, where you don't need to hard-code the logic flow of "how to make a coffee", is what will take these sorts of robots to the next level (think C-3PO from Star Wars). Combine this with Long Term Memory and the robot will remember which room the coffee maker is in, which coffee you prefer, your grind setting, etc.
Wrapping Up
Generative AI is still in its infancy and we are just scratching the surface of what is possible. I have been blessed to be able to work on the cutting edge of this for the last few years and I am excited to see where it goes with so much potential! How do you think Generative AI will evolve? What will be the next big thing? Drop a comment below!