
Attention

The mechanism that lets an AI focus on the right parts of its input.

Explained simply.

When you read 'I walked to the bank', your brain instantly figures out which 'bank' is meant based on surrounding context. Attention is the math that lets a model do the same thing. For every word it processes, it computes how relevant every OTHER word in the input is to interpreting that word. This is how transformers handle long, complicated inputs without losing the thread.
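The core computation is small enough to sketch. This is a minimal, illustrative version of scaled dot-product attention for a single query, written in plain Python; real models run it over batches of learned vectors, not the toy numbers used here.

```python
import math

def softmax(xs):
    # Exponentiate and normalize so the weights sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Score each key against the query, convert scores to weights,
    and return the weighted blend of the value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Blend the value vectors according to the attention weights.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Toy vectors standing in for word embeddings (illustrative values only).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
result = attention([1.0, 0.0], keys, values)
print(result)  # the output leans toward the first value vector
```

The query 'matches' the first key more strongly, so the output is pulled toward the first value. That pull, repeated for every word at once, is all attention is.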

An example.

In 'The dog that chased the cat was brown', attention links 'brown' back to 'dog' (not 'cat'), because the model has learned from training data that 'dog' is the subject being described - 'that chased the cat' is just an intervening clause. Without attention, the model would struggle to know which noun to describe.

Why it matters.

Attention is what makes LLMs feel smart. It's also why they're slow and expensive - comparing every word to every other word means the cost grows quadratically with input length. Much of the performance research on LLMs is really about making attention cheaper.
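The quadratic cost is easy to see by counting the pairwise comparisons; this little sketch just counts the entries in the score matrix:

```python
def attention_score_count(n_tokens):
    # Every token attends to every token, so the score
    # matrix has n_tokens * n_tokens entries.
    return n_tokens * n_tokens

for n in (10, 100, 1000):
    print(n, attention_score_count(n))
# Doubling the input quadruples the work.
```

A 1,000-token input needs a million scores; a 100,000-token input needs ten billion. That scaling is why long-context models are expensive to run.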