The neural network design that made modern AI possible. What's inside every LLM.
The transformer is a blueprint for a type of neural network, introduced by Google researchers in the 2017 paper "Attention Is All You Need." Its key insight is the attention mechanism: a way for the model to weigh which parts of the input matter most for each word it generates. Earlier architectures, such as recurrent networks, read text word by word and lost track of long-range context; a transformer attends to the whole input at once. Every major LLM - GPT, Claude, Gemini - is a transformer.
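The core of that attention mechanism can be sketched as scaled dot-product attention. This is a minimal NumPy sketch with toy dimensions, not a real implementation: production transformers add learned projection matrices, multiple heads, and masking. It also shows, in the shape of the score matrix, why attention is computationally expensive.

```python
import numpy as np

def attention(Q, K, V):
    # Scores: every token is compared with every token, giving a
    # (seq_len, seq_len) matrix - the reason attention cost grows
    # quadratically with input length.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each token becomes a weighted mix of all value vectors.
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d = 4, 8                  # 4 tokens, 8-dim vectors (toy sizes)
x = rng.standard_normal((seq_len, d))
out = attention(x, x, x)           # self-attention: Q, K, V from the same input
print(out.shape)                   # (4, 8): one context-mixed vector per token
```

The quadratic score matrix is also why models have context limits: doubling the input quadruples the attention work.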
You don't need to understand transformer internals to build with AI, but knowing the architecture helps you read the literature and understand why models have context limits, why attention gets expensive on long inputs, and how different model families diverge from the same base design.