
Transformer

The neural network design that made modern AI possible. What's inside every LLM.

Explained simply.

The transformer is a blueprint for a type of neural network, introduced by Google researchers in the 2017 paper 'Attention Is All You Need.' Its key insight was the attention mechanism - a way for the model to weigh which parts of the input matter most for each word it's generating. Before transformers, dominant models like recurrent networks read text word by word, losing track of long-range context. Transformers can see the whole input at once. Every major LLM - GPT, Claude, Gemini - is a transformer.

An example.

When the model reads 'The bank of the river was muddy', the transformer's attention mechanism lets 'bank' attend strongly to 'river'. This is how it knows you mean the river edge, not a financial institution. That same mechanism operating across thousands of words is what makes LLMs coherent.
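The 'bank' and 'river' example can be sketched in a few lines of code. This is a minimal, illustrative version of scaled dot-product attention (the formula from the original paper), using tiny hand-picked 2-dimensional embeddings - real models learn high-dimensional embeddings and use separate query, key, and value projections, which are omitted here for simplicity.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Softmax over each row, so every token's weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy embeddings, hand-picked so 'bank' and 'river' point the same way
tokens = ["the", "bank", "of", "the", "river"]
E = np.array([
    [0.1, 0.0],  # the
    [0.9, 0.8],  # bank
    [0.1, 0.1],  # of
    [0.1, 0.0],  # the
    [0.8, 0.9],  # river
])

out, weights = attention(E, E, E)
bank = tokens.index("bank")
river = tokens.index("river")
# 'bank' places more attention weight on 'river' than on 'the' or 'of'
print(dict(zip(tokens, weights[bank].round(3))))
```

Because the 'bank' and 'river' vectors align, their dot product is large, so after the softmax 'bank' attends most strongly to itself and to 'river' - the mechanism that disambiguates the word.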

Why it matters.

You don't need to understand transformer internals to build with AI, but knowing the architecture name helps you read the literature and understand why models have context limits, why attention is computationally expensive, and why model families that share the same blueprint can still behave differently.
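The cost point is easy to see with back-of-the-envelope arithmetic: in standard attention, every token computes a score against every other token, so the work per layer grows with the square of the input length. A rough sketch (counts only, ignoring the many optimizations real systems use):

```python
def attention_scores(n_tokens: int) -> int:
    """Query-key score computations per attention layer for n tokens."""
    return n_tokens * n_tokens

# Doubling the context length quadruples the score computations
for n in [1_000, 2_000, 4_000]:
    print(f"{n:>5} tokens -> {attention_scores(n):,} scores per layer")
```

This quadratic scaling is a large part of why context windows are limited and why longer inputs cost more to process.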