
Context engineering for teams

What is a context window? A plain-English guide

A context window is how much text an AI can hold in mind at once: your prompt plus its reply. Here is what it is, why it matters, and where it breaks.

April 20, 2026 by BaseThread

A context window is how much text an AI model can hold in mind at once for a single response. It covers everything you type, anything the tool feeds in alongside it, and the model's own reply, all of which has to fit inside one fixed budget. If that sounds like short-term memory, that is the right instinct. The context window is the model's working memory for the task in front of it.

This one idea explains a surprising amount: why a long chat starts forgetting your early messages, why pasting a giant document sometimes makes answers worse, and why "just give it more context" is not the simple fix it sounds like.

Definition

Context window

A context window is the maximum amount of text, measured in tokens, that an AI model can take in and reason over for a single response. It includes your prompt, any files or chat history the tool adds, and the model's reply. Content outside the window is invisible to the model unless it is fed back in.

What counts as "context"?

Everything the model sees for one response lives in the window, not just the words you typed. For a typical chat that is:

  • Your prompt. The question or instruction you just sent.
  • The conversation so far. Earlier messages the tool replays so the model has continuity.
  • Anything attached. A pasted document, a file, search results, or notes the tool pulls in behind the scenes.
  • The reply. The model's answer is generated into the same window, so it counts against the budget too.

Add all of that up and it has to fit. When it does not, something gets cut.
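As a rough illustration of that budget, the sketch below adds up the four parts and checks them against a window size. All figures are in tokens (the unit covered in the next section), and every number, along with the `ContextBudget` name itself, is a hypothetical chosen for the example rather than taken from any particular model or tool.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical accounting of one response's context, in tokens."""
    window_tokens: int    # total size of the window
    prompt: int           # the message you just sent
    history: int          # earlier messages the tool replays
    attachments: int      # pasted documents, files, search results
    reserved_reply: int   # room kept free for the model's answer

    def used(self) -> int:
        return self.prompt + self.history + self.attachments + self.reserved_reply

    def remaining(self) -> int:
        return self.window_tokens - self.used()

    def fits(self) -> bool:
        return self.remaining() >= 0


# Illustrative numbers only.
ok = ContextBudget(128_000, prompt=200, history=30_000,
                   attachments=90_000, reserved_reply=4_000)
over = ContextBudget(128_000, prompt=200, history=30_000,
                     attachments=100_000, reserved_reply=4_000)

print(ok.used(), ok.remaining(), ok.fits())        # 124200 3800 True
print(over.used(), over.remaining(), over.fits())  # 134200 -6200 False
```

In the second case the attachment alone pushes the total past the window, so something would have to be cut before the model could answer.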

What is a token?

Window sizes are measured in tokens, not words. A token is a chunk of text the model reads as one unit, usually part of a word. In English a token is very roughly three quarters of a word, so 1,000 tokens is around 750 words. Common words are often a single token, while rarer words and code split into several.

You do not need to count tokens by hand. The useful takeaway is the rough conversion: a 128,000-token window holds somewhere around 90,000 to 100,000 words of combined input and output. That is a lot, but a long document plus a long conversation plus a detailed answer can still add up faster than you expect.
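The conversion above is simple arithmetic, and a minimal sketch makes it concrete. It assumes the three-quarters-of-a-word rule of thumb from this section; real tokenizers differ by model and by text, so treat the results as estimates, not exact counts. The function names are made up for the example.

```python
# Rule of thumb from the text: one English token is roughly 3/4 of a word.
WORDS_PER_TOKEN = 0.75  # rough average; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Estimate a token count from a whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def words_that_fit(window_tokens: int) -> int:
    """Estimate how many English words fit in a window of the given size."""
    return round(window_tokens * WORDS_PER_TOKEN)


print(words_that_fit(1_000))             # 750
print(words_that_fit(128_000))           # 96000
print(estimate_tokens("one two three"))  # 4
```

The 96,000 figure for a 128,000-token window sits inside the 90,000 to 100,000 range quoted above.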

How big are context windows today?

They have grown fast. A few years ago a few thousand tokens was normal. Current models commonly run from about 128,000 tokens up to a million or more, and the upper end has kept rising.

~128K to 1M+ tokens in the context windows of current large language models, up from a few thousand a few years ago.

That is genuinely useful. A bigger window means you can hand the model a whole file or a long thread without it losing track of the earlier parts. But size is the easy part of the story, and it hides a trap we will get to in a moment.

What happens when you run out of room?

When a conversation grows past the window, the tool cannot show the model everything, so it makes a choice. Most chat tools quietly trim or summarize the oldest messages so the recent ones still fit. That is exactly why a long session starts to forget the constraint you set ten messages ago, or the name you gave it at the start. Some tools throw an error instead and ask you to start fresh.

Either way, the rule is the same: anything no longer inside the window is gone from the model's view. The model is not choosing to ignore it. It literally cannot see it. This is one of the core reasons AI agents forget across sessions: a new session starts with an empty window, and last week's context is nowhere in it.
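The trimming behavior described in this section can be sketched in a few lines. This is a simplified illustration, not how any specific chat tool works: it assumes messages are plain strings, estimates tokens with the rough three-quarters-of-a-word rule, and drops whole messages from the oldest end. Real tools may summarize instead of dropping, and they count tokens with the model's actual tokenizer. The function names and the tiny budget are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 0.75 English words per token."""
    return round(len(text.split()) / 0.75)


def trim_to_window(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent messages that fit; the oldest fall out first."""
    kept: list[str] = []
    used = 0
    # Walk backward from the newest message and stop once the budget is full.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


messages = [
    "My name is Sam and I prefer short answers",  # about 12 tokens
    "What is a token",                            # about 5 tokens
    "How big are context windows today",          # about 8 tokens
]

print(trim_to_window(messages, budget_tokens=15))
# ['What is a token', 'How big are context windows today']
```

With a budget of 15 tokens, the first message no longer fits, so the name and the stated preference are simply absent from what the model sees.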


The myth: bigger window, better answers

Here is where intuition leads people astray. If the window is the model's memory, surely a bigger window means smarter, more informed answers? Not quite.

A larger window is more capacity, not more relevance. If you fill it with the right, current information, great. If you fill it with a stale wiki dump, last quarter's plan, and three half-related documents, you have just given the model more noise to wade through. The relevant fact is still in there somewhere, but it is competing for attention with a lot that does not matter, and answers can get vaguer or plain wrong.

That decline has a name. It is called context rot, and it is the reason "paste in everything and let the AI sort it out" is a bad habit. The model does not reward volume. It rewards relevance.

The mental model

Treat the context window like a desk, not a filing cabinet. A bigger desk helps, but a desk piled with every document you own is harder to work on than a clean one holding exactly the papers for the task at hand.

What actually makes the window work well

If size is not the answer, what is? Putting the right things in front of the model at the right time. That is the whole discipline of getting good output, and it has two parts:

  • Relevance. Give the model the slice of information that fits the task, not your entire knowledge base.
  • Freshness. Stale context that contradicts the current state drags answers backward. Context that reflects where things actually stand keeps answers honest.

The smart approach is to load only what a task needs, when it needs it, rather than stuffing the window up front. That idea is just-in-time context for AI, and it is how good tools keep the window full of signal instead of clutter. For why simply buying a bigger window does not solve this at the team level, see bigger context windows won't fix team knowledge.
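One way to picture relevance and freshness working together is a small filter that runs before anything goes into the window. The sketch below is only an illustration under stated assumptions: relevance is scored by naive word overlap with the task, freshness is a simple age cutoff, and the `Note` type, `select_context` function, dates, and note texts are all invented for the example. Real tools would typically use proper retrieval instead of word matching.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Note:
    text: str
    updated: date


def relevance(note: Note, task: str) -> int:
    """Count task words that also appear in the note (a crude stand-in for retrieval)."""
    return len(set(task.lower().split()) & set(note.text.lower().split()))


def select_context(notes: list[Note], task: str, today: date,
                   max_age_days: int, max_notes: int) -> list[Note]:
    """Keep notes that are both fresh and relevant, best matches first."""
    fresh = [n for n in notes if (today - n.updated).days <= max_age_days]
    relevant = [n for n in fresh if relevance(n, task) > 0]
    relevant.sort(key=lambda n: (relevance(n, task), n.updated), reverse=True)
    return relevant[:max_notes]


notes = [
    Note("billing api now uses version two", date(2026, 4, 10)),  # fresh, relevant
    Note("billing api uses version one", date(2025, 11, 1)),      # relevant but stale
    Note("office lunch menu for friday", date(2026, 4, 15)),      # fresh but off-topic
]

picked = select_context(notes, "update the billing api client",
                        today=date(2026, 4, 20), max_age_days=90, max_notes=2)
print([n.text for n in picked])
# ['billing api now uses version two']
```

Only one of the three notes reaches the window: the stale one would have contradicted the current state, and the off-topic one would have been noise.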

Why this matters beyond a single chat

For one person in one chat, the fix is mostly discipline: keep prompts focused, start fresh when a thread gets bloated. For a team it is harder. Every tool starts each session with an empty window, and the information it needs, what the team decided, what shipped, what is in flight, lives scattered across people and tools. Filling the window with the right context, every session, for every tool, is not something any individual can do by hand.

That is the problem a shared context layer is built for: a curated source your tools read the relevant slice of at the start of a task, so the window starts full of signal instead of empty. See the read-everywhere flow on the how it works page.

TL;DR

A context window is how much text an AI can hold in mind for one response: your prompt, the history and files the tool adds, and the reply, all measured in tokens and capped at a fixed size. Run past it and the oldest content drops out of view, which is why long chats forget. Bigger windows add capacity, not quality, so the real skill is filling the window with relevant, current context rather than as much context as possible. For teams, that means a shared source every tool can read the right slice of, every session.


Frequently asked questions

What is a context window in simple terms?

It is the amount of text an AI model can read and keep in mind for a single response. Everything you type, plus any files or history the tool feeds in, plus the model's own reply, all has to fit inside that window. Think of it as the model's short-term working memory for one task. When the conversation gets longer than the window, the oldest parts fall out of view.

How big is a context window?

It depends on the model. Older models held a few thousand tokens, while current ones range from roughly 128,000 tokens to a million or more. A token is a chunk of text, very roughly three quarters of a word in English, so a 128,000-token window is around 90,000 to 100,000 words. Bigger windows let you fit more in, but they do not automatically make answers better.

What happens when you exceed the context window?

The tool has to drop something. Most chat tools quietly trim or summarize the oldest messages so the recent ones still fit, which is why a long conversation starts to forget what you said early on. Some tools error out instead. Either way, content that is no longer in the window is gone from the model's view unless it is fed back in.

Does a bigger context window mean better answers?

Not on its own. A larger window is more capacity, not more relevance. If you fill it with stale or off-topic material, the model has more noise to sort through and answers can actually get worse. Quality comes from putting the right, current information in the window, not the most.
