May 07, 2024

What Is LLM’s Context Window, and How You Can Enhance Context-Awareness of Your AI-Powered Solution

- Hi! I watched Dune, finally.
- Great! And how did you find
it?

For us humans, grasping the context of a conversation is a breeze. In the dialogue mentioned above, a human interlocutor didn't need to elaborate on the meaning of "it" — it was crystal clear that the pronoun referred to the movie.

For large language models (LLMs), understanding context can be challenging. Not in such simple examples as with Dune, for sure. However, if we feed an LLM a large pile of information and ask a complex query, it can be misled and provide an off-the-mark answer.

What are the implications of incorrect answers for businesses relying on language models? The range of consequences is broad, from losing a particular customer to decreased revenue if the model errors are recurrent.

That’s why the knowledge of techniques that can enhance the context-awareness of the solutions based on language models is crucial if you want to build a successful AI-driven application.

We will cover the insights using some technical terms, such as “LLM’s context window.” We will also know what context length is in LLMs. However, AI-related terms will be as easy to understand as grasping the nuances of context is for a context-sensitive model, especially with the backup of skilled ML engineers.

What understanding context means for LLMs

There are several layers in an LLM’s “'head.” For our discussion, it's crucial to clarify the function of each layer, as each one contributes individually to the LLM's ability to understand context.

When we talk about context, we refer to the information a model has at its disposal when responding to a user’s request. Thus, we can say that “context” means “information” — all the data that a model considers when creating a response.

Let's explore the structure of an LLM’s “mind” and the corresponding context sources for each block of this structure.

To put it simply — indeed, very simply — we can distinguish between two layers of information: the basic (foundational) knowledge and the additional one, and both are closely related to what context length in LLMs is.

The foundational knowledge

This “archetypal” layer defines what an LLM is by nature. First and foremost, knowledge of language resides in this part of the LLM's “consciousness.” This layer enables the LLM to understand that “my laptop is dead” means that the laptop isn’t working rather than it’s literally dead.

The basic knowledge is derived from the extensive and diverse datasets on which the model has been trained, encompassing various topics, languages, and writing styles. As a result, the LLM has a broad understanding of general information, cultural nuances, and domain-specific data, which it uses to respond to queries.

However, this foundational knowledge is static; it reflects the data available up to the model's last training update. This knowledge cannot adapt to new information or changes in notions without further retraining.

The additional knowledge

We address this part of an LLM’s “consciousness” when we picture a specific situation, providing extended information for a more precise and relevant model’s answer. At this point, we’re interested in the understanding of what context length in LLMs is.

Two sources of information can be used to provide a model with extended knowledge:

  • Prompts. Prompts serve as a crucial context source for large language models, guiding their response generation by setting the interaction's tone, scope, and focus. When users provide prompts, they essentially frame the information space within which the LLM operates, directing a model on what information is relevant and expected in response. This interaction helps the model filter its vast knowledge base to produce responses that are both accurate and contextually appropriate for the specific request.
  • External sources of data. To connect an LLM with external sources, we need Retrieval Augmented Generation (RAG), which is the technique of data derivation from the external data source and delivery to the model's “brains.” You address this approach when you provide ChatGPT with a PDF needed for an AI chatbot's response. Another example is feeding your customer support knowledgebase to the AI assistant to teach it to consult users about your products or services. RAG is a technology for LLM’s “knowledge” extension, created to close the gap between LLM’s static basic “opinions” and up-to-date information.
LLM's knowledge layers

Dealing with context involves both layers: the basic and the extended knowledge. Moreover, both layers are closely related to the the term “an LLM’s context window.”

Interestingly, we, as users, can influence only the second layer by formulating our requests clearly and providing the model with relevant additional textual data.

The foundational knowledge of a large language model is pre-established and can only be modified through retraining, which is both a complex and costly process.

Why context-awareness is critical for LLMs and your AI-driven applications: three examples

Let's move away from general phrases and consider the issue of context awareness and an LLM’s context window using examples.

Example 1. Using a chatbot for customer support

Let’s imagine you’ve built a chatbot to assist your customers while purchasing your products. To understand deeply the significance of contextual awareness of an AI model, let’s compare the capabilities of the two chatbots of such a type: one is context-aware, and the second is not.

Summing up, context-aware chatbots simply feel smarter to interact with. They understand what you're actually asking, offer relevant solutions, and remember your preferences. This makes for a much smoother and more satisfying experience compared to chatbots that constantly need you to repeat yourself.

Example 2. Prompting

A long time ago, in a galaxy far, far away… there was an LLM that didn’t understand the context of the conversations. People were so disappointed with the LLM's output that they stopped using it, forcing it into obsolescence. Left to its own devices, the LLM began amusing itself by creating funny verses and limericks without rhythms.

Let’s take a look at the reason why people refused to use a context-unaware LLM, comparing it with a more intelligent one:

Prompting a context-aware LLM unlocks its full potential. You can provide specific instructions and details, knowing the LLM will understand your intent and respond accordingly, even if LLM’s context window is far from record indicators. This targeted control over information delivery leads to more productive interactions and significantly improves the overall user experience.

Example 3. Decision-making

In some cases, an LLM's context awareness is literally a matter of life and death. Consider a virtual healthcare assistant scenario where a patient reports fatigue and shortness of breath:

  • Without Context-Awareness: The LLM might prioritize respiratory issues based on the symptoms alone, recommending lung-related specialists or medications, potentially leading to unnecessary treatments.
  • With Context-Awareness: The LLM would ask additional questions to gather more context, such as if there are other symptoms or recent strenuous activities. Discovering no other symptoms but recent exercise might lead the LLM to attribute the symptoms to exertion, advising rest and monitoring instead of unnecessary medical interventions.

Let’s sum up. Context-aware LLMs deliver relevant responses, precisely tailor their output, and support informed decision-making. By incorporating context, they avoid generic answers, misunderstandings, and potentially harmful recommendations.

As we can see, for businesses seeking effective AI applications, ensuring LLM context awareness is essential.

How to Integrate Context-Awareness into Your AI-Powered Solution

Our primary goal is to discover approaches that can improve context awareness of an AI-driven solution. However, before we delve into this topic, let’s say a few words on why dealing with context is challenging for models.

Why grasping context is challenging for LLMs

We offer two explanations. You can cling to the short one and skip the detailed explanation to streamline your attention toward enhancing your AI-driven solution context awareness.

Alternatively, you can invest a few more minutes and discover why tools based on AI models need a wise approach to make the most of the AI and why it’s important to know what context length is in LLMs.

Brief explanation

Dealing with context is challenging for LLMs because LLMs aren’t humans.

Comprehensive explanation

Like any digital solution, LLMs have limitations dictated by their nature, structure, and current level of technology. Companies developing AI-powered technologies continuously make improvements and achieve breakthroughs. However, more powerful and/or sophisticated approaches require significantly more resources, particularly computational ones.

Thus, the current level of technology represents a balance between what is desirable and what is possible — a highly shiftable balance that changes daily without exaggeration.

Simply put, there are two reasons why LLMs cannot understand context perfectly by default.

The statistical nature of models' "mind."

LLMs such as GPT or Gemini do not literally "know" what a dog or a cat is. These models operate with tokens — small elements of speech, like words or their parts. Essentially, a language model calculates the probability that one particular word will appear next to another in the current conversation.

How does an LLM determine that the probabilities of specific tokens appearing together are high, while for others, it's low? Technically speaking, the model is trained to infer dependencies between tokens or patterns in the training data. Hence, the extent to which an LLM understands language heavily depends on the amount and quality of the data on which it is trained.

More data =》More patterns and dependencies =》More solid LLM’s “knowledge”

The vast amounts of training data enable LLMs to grasp not only basic concepts but also nuances. This capability allows you to ask GPT to write an article in a conversational or scientific tone or even to craft a joke on the topic, which will likely succeed.

However, the vast amount of training data can also introduce significant challenges. The primary ones are biases and hallucinations produced by LLMs. For instance, recall the incident when Samsung reported sensitive data leakage through ChatGPT or another case when GPT started producing nonsensical responses.

In essence, an LLM and the solutions built on these models are products of their training data. It is a restriction because it limits how well LLMs can handle new or unexpected information.

The limited context window.

The language model doesn’t have a memory in the sense that we use this word. The amount of information that a model can deal with at each moment is limited, and this amount is described by the term “an LLM’s context window.”

While processing a user's prompt, a model maintains a specific number of tokens within its scope of attention. Unlike the early days of LLMs, state-of-the-art models have much bigger context windows.

The advanced models named “transformers” possess an enhanced ability to work with context. They have an architecture that enables AI to focus on specific points in the text instead of scanning all tokens sequentially (such an approach is called the attention mechanism).

State-of-the-art models can boast huge context windows compared to, for example, GPT-3.5-turbo. The respectful figure for Antropics Claude 2.1 is 200,000, and up to 1,000,000 for Gemini 1,5 Pro!

LLMs’ context windows comparison

Figures are impressive, aren’t they? However, it's important to remember that LLMs cannot process all the information we might want to feed them at once, anyway. Here, the inherent characteristics of LLMs act as limitations, and this fact is worth keeping in mind when selecting a model for an AI-driven solution.

How you can impact the context-awareness of your AI-driven solution

Finally, we've reached the main point of our journey! Now, we need to find out how to improve our solution’s mastery of grasping context to provide precise and relevant responses.

1. Choose the best LLM and get the most of its capabilities

As we know, an LLM’s context window is a straightforward metric that assesses a model’s ability to process information. It might be tempting to assume that models with larger context windows are more effective at providing precise and nuanced responses, but this assumption is superficial.

Let’s look at the example of the highly anticipated and actively discussed new member of Google’s family of LLMs, Gemini Pro 1.5. It has shown impressive results in demonstration tests, one of which evaluated the model's ability to identify a specific detail in a long video.

The model succeeded; however, it took more than 50 seconds, according to the video; it's significantly longer than the time GPT usually takes to process a query.

What other metrics should be considered if a large context window doesn’t guarantee superior performance? It makes sense to pay attention to a combination of metrics and other sources of information, such as:

  • ratings and test results;
  • ML engineers’ experience.

For example, ML engineers might tell you that while GPT excels at functions activated by the user’s query, while Claud is distinguished by its ability to think logically.

Every model has its own “character,” and to maximize the LLM’s advantages, one should relate the anticipated tasks to the LLM’s strengths.

Moreover, you can amplify an LLM’s capabilities by fine-tuning it — tailoring the model to perform specific tasks, such as customer service, coding, or assisting with medical information.

However, it’s worth noting that fine-tuning is an expertise-consuming and costly process that cannot be performed on the go.

2. Consider the alternative to LLMs - a small-to-medium AI model

Mentioning small-to-medium models to answer the question “What is context length in LLMs?” isn’t contradictory since this type of model is closely related to LLMs, and small language models (SLMs) are a part of the AI solutions ecosystem.

Small models follow the same principles of machine learning (ML) as large ones; however, they are less cumbersome in terms of architecture and resource consumption. In addition, small models are more domain-focused and cheaper to implement and maintain. No wonder SLMs are considered “the next big thing in AI” in 2024.

Let’s examine the advantages of small-to-medium models regarding context awareness, using CoSupport AI’s solution as an example. Since 2020, the team has developed an AI-driven solution for customer support: a virtual assistant that provides suggestions for human agents based on conversations.

There are three reasons why this solution — CoSupport Agent — has higher context awareness compared to LLMs:

  • Training on approved quality data. This contrasts with LLMs, which are trained on widely accessible data, the quality of which you, as a customer, can’t control. We hope you remember that higher-quality data entails higher response relevance. While LLMs are often considered black boxes, meaning that even for their creators, the output is unpredictable at times, CoSupport AI team knows clearly what the data model’s “brains” contain. As a result, the output is more predictable.
  • Focus on a particular domain. In our case, the domain is customer service. This focus results in a lower risk of hallucination. When the model “knows” what it actually needs for responses, there is less room for creativity. And the lack of creativity is a definite strength as long as you expect predictability and high precision of responses. The more solid model’s “knowledge” provides more concise responses.
  • Native customer experience. The model, trained on data derived from a particular company's real conversations, feels more human. We can metaphorically compare such a model to a student from Cambridge or another university, each with its unique culture, values, and vocabulary. If your communication style with customers is so specific as that of Cambridge, your chatbot, based on the CoSupport AI’s custom model, will communicate in a similarly Cambridge-ish style!

Given these advantages, small-to-medium AI models like the CoSupport Agent represent a tailored, efficient alternative to LLMs, especially for specific applications where focus and precision are paramount.

As businesses continue to demand more specialized and accurate AI interactions within long context, small models could increasingly become the preferred choice, blending seamlessly with each enterprise's unique characteristics and needs.

3. Use reinforcement learning

With some reservations, you can think of an LLM as a child who needs guidance. The child is smart, very smart, but this kid’s knowledge is too versatile to fit all domains and situations. That’s why “parents,” or the LLM’s creators, should help a kid distinguish between solid knowledge and hallucinations and relevant and irrelevant answers.

ML engineers use reinforcement learning to give a model feedback if it succeeds while providing a particular response.

Hence, reinforcement learning ensures a richer context for more precise answers. When searching for a software provider for your AI-driven application, ensure that the provider’s solutions are based on reinforcement learning tools since it multiplies the quality of responses and allows a solution to deal with long context.

4. Provide an LLM with quality external information

To effectively enhance the context-awareness of an LLM, it is vital to equip it with well-structured, quality external information. A well-organized knowledge base is a good example of such a source of data (you’re more than welcome to learn how to prepare your knowledge base for AI-powered customer support automation).

By structuring data in a clear and accessible manner, you facilitate the LLM's ability to efficiently locate and utilize the information it needs to generate informed responses.

For example, when an LLM is provided with a customer service query, having access to a structured knowledge base dramatically improves its ability to deliver precise and personalized answers.

5. Use prompt engineering

Let’s consider another CoSupport AI solution, CoSupport Customer, to understand how you can focus your AI-powered solution's capabilities around specific topics. CoSupport Customer is an AI-driven assistant that achieves 100% response automation in customer support. It’s based on the LLM with plenty of information in its “brains”; hence, we might want to make it more focused on topics directly related to customer service.

This approach allows us to direct the LLM’s attention to the issues we’re interested in and prevents us from straying from the main idea, which is always to provide customers with relevant information about products and services.

Here is an example of how mastery in prompt engineering can guide the conversation toward meaningful outcomes:

Such skillful prompt engineering, though invisible on the screen since ML engineers implement it in a specialized work environment, turns our AI assistant into a practically oriented tool rather than just a field for experimentation and amusement.

As users, we can apply the same approach when working with ChatGPT, meaning that we can formulate our requests in a way that leads to the desired output. However, at the enterprise level of AI implementation, it’s more efficient to rely on the knowledge and tools of ML engineers to make models to handle long context.

Conclusion

If you pushed through this article to the conclusion, congratulations! Now, you have expert knowledge of three topics combined into a comprehensive one:

  • what are the sources of LLMs’ knowledge, how models process it, and what context length in LLMs is;
  • why it’s worth being aware that LLMs are not humans, and they need support to handle context effectively;
  • how ML engineers can enhance LLMs’ skills in grasping context to provide precise and individualized responses in AI-powered solutions like AI assistants and chatbots.

The most important question is what practical recommendations you can take away from this piece.

Here is how we see it:

  • Choose the model that suits your solution’s functionality best. The range of options is extremely wide, from well-known LLMs to small-to-medium and small models, with advantages and limitations inherent to each group. The higher metrics, such as LLM’s context window, don’t correspond directly to the more prominent capacities, vital for a particular application. Consult with your CTO for grounded decision-making.
  • Prepare as much context as you can on your part. Such sources as tickets and conversation content are critical — they are the primary data for model training in CoSupport AI. Other sources, such as a well-structured, comprehensive knowledge base or concise documentation, are also important for building the model’s understanding of a particular business.

If you don’t have professional machine learning knowledge, you need external expertise to weigh all the pros and cons of different approaches and technical options. CoSupport AI’s team will gladly share the knowledge on enhancing the language models' context awareness to provide accurate and individualized responses for your AI-driven applications.

Look for an AI solution

that provides quick, accurate, and contextually relevant responses?

Please read our Privacy & Cookies Policy