AES Encryption Explained

From plaintext to ciphertext — every single step of AES broken down with real examples, actual hex values, and working code.

imtrace 27 May 2026

READ TIME ~25 MIN

You’ve seen “AES-256” on every VPN ad, every password manager, every messaging app. It’s the most widely used encryption algorithm on the planet. Banks, governments, military — they all trust it. But what actually happens inside AES when you encrypt something? How does it turn “Hello” into meaningless garbage — and back again?

This post tears it apart. Every step, from scratch, with a real worked example you can follow byte by byte.

Plaintext:  "Hello, AES World"   (exactly 16 bytes — one AES block)
Key:        "MySecretKey12345"   (16 bytes — AES-128)

By the end, you’ll watch these 16 readable characters transform into unrecognizable ciphertext — and understand exactly why no computer on Earth can reverse it without the key.

🔐 01. What Is AES?

AES stands for Advanced Encryption Standard. It’s a symmetric block cipher — “symmetric” meaning the same key encrypts and decrypts, and “block cipher” meaning it processes data in fixed-size chunks.

Here are the fundamentals:

Block size: always 128 bits (16 bytes). Every piece of data is split into 16-byte blocks before encryption.
Key sizes: 128, 192, or 256 bits. Bigger key = more rounds = harder to break.
Rounds: AES-128 runs 10 rounds. AES-192 runs 12. AES-256 runs 14. Each round scrambles the data further.
Designers: Joan Daemen and Vincent Rijmen created the algorithm (originally called “Rijndael”). NIST standardized it in 2001 after a five-year public competition.

AES achieves security through two properties:

Property	What it means	Which step does it
Confusion	The relationship between key and ciphertext is as complex as possible	SubBytes (S-Box substitution)
Diffusion	Changing one bit of plaintext changes ~50% of ciphertext bits	ShiftRows + MixColumns

Neither property alone is enough. Confusion without diffusion means each byte is encrypted independently, which is trivially breakable. Diffusion without confusion means the relationship is linear, so it can be solved with algebra. AES combines both, then repeats the round 10 times (for AES-128). That repetition is why no practical attack on the full cipher is known.

📦 02. The Block and State Matrix

AES works on exactly 16 bytes at a time. Those 16 bytes are arranged into a 4×4 grid called the State Matrix. This grid is the central data structure — every operation in AES reads from it and writes back to it.

The critical detail: bytes fill the grid column by column, not row by row. This is called column-major order.

Let’s convert our plaintext:

"Hello, AES World"

Character:  H    e    l    l    o    ,    (sp) A    E    S    (sp) W    o    r    l    d
Hex:        48   65   6c   6c   6f   2c   20   41   45   53   20   57   6f   72   6c   64
Byte index: 0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15

Now fill the 4×4 grid column by column:

State Matrix:
             Col 0    Col 1    Col 2    Col 3
Row 0    [   48       6f       45       6f   ]      H    o    E    o
Row 1    [   65       2c       53       72   ]      e    ,    S    r
Row 2    [   6c       20       20       6c   ]      l   (sp) (sp)  l
Row 3    [   6c       41       57       64   ]      l    A    W    d

Column 0 = bytes 0–3 (48 65 6c 6c). Column 1 = bytes 4–7 (6f 2c 20 41). And so on.
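The column-major fill can be sketched in a few lines of C. This is a minimal illustration, assuming the state is stored as a 4×4 array indexed state[row][col]; the name load_state is made up for this post and not taken from any library.

```c
#include <stdint.h>

/* Hypothetical helper: copy 16 input bytes into a 4x4 state,
   filling column by column. Input byte i lands in row (i % 4),
   column (i / 4). */
void load_state(const uint8_t in[16], uint8_t state[4][4]) {
    for (int i = 0; i < 16; i++) {
        state[i % 4][i / 4] = in[i];
    }
}
```

Because the row index is i % 4 and the column index is i / 4, input bytes 0 to 3 form column 0, byte 4 starts column 1, and so on.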

This state matrix is the battlefield. Every AES operation — SubBytes, ShiftRows, MixColumns, AddRoundKey — transforms this grid. After 10 rounds of transformation, the grid holds the ciphertext.

🔑 03. Key Schedule — Turning 1 Key Into 11

AES-128 has 10 rounds, plus an initial key addition. That’s 11 points where a key is needed — but the user only provides one 16-byte key. The Key Schedule algorithmically derives 10 more round keys from the original.

The original key as four columns

Key: "MySecretKey12345"
Hex: 4d 79 53 65 | 63 72 65 74 | 4b 65 79 31 | 32 33 34 35

W[0] = [4d, 79, 53, 65]
W[1] = [63, 72, 65, 74]
W[2] = [4b, 65, 79, 31]
W[3] = [32, 33, 34, 35]

These four columns are Round Key 0 — the original key itself.

Generating Round Key 1 (W[4] through W[7])

Every 4th column (W[4], W[8], W[12]…) goes through a special transformation. The other columns are simple XORs.

Step 1 — RotWord: Take W[3], rotate it one byte left.

W[3] = [32, 33, 34, 35]  →  [33, 34, 35, 32]

The first byte moves to the end. That’s it.

Step 2 — SubWord: Run each byte through the S-Box (the same substitution table used in encryption — we’ll cover it in the next section).

S-Box[0x33] = 0xc3
S-Box[0x34] = 0x18
S-Box[0x35] = 0x96
S-Box[0x32] = 0x23

Result: [c3, 18, 96, 23]

Step 3 — XOR with Rcon: XOR the first byte with a round constant. Round 1’s constant is 0x01. The other three bytes XOR with 0x00 (unchanged).

[c3, 18, 96, 23]  XOR  [01, 00, 00, 00]  =  [c2, 18, 96, 23]

The round constants are powers of 2 in GF(2⁸): 01, 02, 04, 08, 10, 20, 40, 80, 1b, 36. They prevent symmetry between rounds.
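You don't have to memorize that list. Here's a small sketch that regenerates it by repeated doubling in GF(2⁸); the function name make_rcon is an assumption for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: fill out[0..9] with the ten AES-128 round
   constants. Each constant is the previous one doubled in GF(2^8):
   shift left, then XOR 0x1b whenever the high bit was set. */
void make_rcon(uint8_t out[10]) {
    uint8_t value = 0x01;
    for (int i = 0; i < 10; i++) {
        out[i] = value;
        uint8_t overflow = value & 0x80;
        value = (uint8_t)(value << 1);
        if (overflow) {
            value ^= 0x1b;
        }
    }
}
```

The jump from 80 to 1b is the first place the reduction kicks in: doubling 0x80 overflows the byte, so 0x1b is XORed back in.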

Step 4 — XOR with W[0]:

[c2, 18, 96, 23]  XOR  W[0] = [4d, 79, 53, 65]
=  [8f, 61, c5, 46]

W[4] = [8f, 61, c5, 46]

Steps 5–7 — Simple XOR for W[5], W[6], W[7]:

W[5] = W[4] XOR W[1] = [8f,61,c5,46] XOR [63,72,65,74] = [ec, 13, a0, 32]
W[6] = W[5] XOR W[2] = [ec,13,a0,32] XOR [4b,65,79,31] = [a7, 76, d9, 03]
W[7] = W[6] XOR W[3] = [a7,76,d9,03] XOR [32,33,34,35] = [95, 45, ed, 36]

Round Key 1 = W[4] + W[5] + W[6] + W[7]:

8f  ec  a7  95
61  13  76  45
c5  a0  d9  ed
46  32  03  36

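This process repeats through W[43], producing all 10 additional round keys. The SubWord step makes the schedule non-linear, and Rcon makes every round different. Note that the schedule is invertible: anyone who recovers one complete AES-128 round key can run it backward to the original key, so the schedule adds structure rather than extra secrecy.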
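The whole schedule fits in a short C sketch. To stay self-contained it computes S-Box entries on the fly (slow, but fine for illustration) instead of using a lookup table; all function names here are assumptions for this post, not a real library's API.

```c
#include <stdint.h>

/* GF(2^8) multiply (double-and-add, reduction constant 0x1b). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) r ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* One S-Box entry: brute-force inverse in GF(2^8), then the affine map. */
static uint8_t sbox_entry(uint8_t x) {
    uint8_t inv = 0;                      /* 0 has no inverse; stays 0 */
    for (int c = 1; c < 256; c++) {
        if (gf_mul(x, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* Hypothetical helper: expand a 16-byte key into words w[0..43]. */
void expand_key_128(const uint8_t key[16], uint8_t w[44][4]) {
    uint8_t rcon = 0x01;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            w[i][j] = key[4 * i + j];
    for (int i = 4; i < 44; i++) {
        uint8_t t[4] = { w[i - 1][0], w[i - 1][1], w[i - 1][2], w[i - 1][3] };
        if (i % 4 == 0) {
            uint8_t first = t[0];
            t[0] = sbox_entry(t[1]);      /* RotWord + SubWord */
            t[1] = sbox_entry(t[2]);
            t[2] = sbox_entry(t[3]);
            t[3] = sbox_entry(first);
            t[0] ^= rcon;                 /* XOR with Rcon */
            rcon = gf_mul(rcon, 2);       /* next round constant */
        }
        for (int j = 0; j < 4; j++)
            w[i][j] = (uint8_t)(w[i - 4][j] ^ t[j]);
    }
}
```

Feeding it "MySecretKey12345" reproduces the values worked out by hand above: W[4] = [8f, 61, c5, 46] and W[7] = [95, 45, ed, 36].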

📦 04. SubBytes — The S-Box

SubBytes is the confusion layer. Every byte in the state matrix is independently replaced using a fixed 256-entry lookup table called the S-Box.

Take a byte, split it into two hex digits. The first digit is the row, the second is the column.

Byte = 0x53
Row  = 5, Column = 3
S-Box[5][3] = 0xed

So: 0x53 → 0xed

A few more examples from our actual state (after Round 0’s AddRoundKey):

0x05 → 0x6b    (row 0, col 5)
0x1c → 0x9c    (row 1, col c)
0x3f → 0x75    (row 3, col f)
0x09 → 0x01    (row 0, col 9)

And one from elsewhere in the table: 0x52 → 0x00 (row 5, col 2). Yes, a byte can become 0x00!

Why this specific table?

The S-Box isn’t random. Each entry is computed in two steps:

Multiplicative inverse in GF(2⁸): Find the value that, when multiplied by the input in the Galois Field, produces 1. Zero maps to zero (special case).
Affine transformation: Multiply the inverse by a fixed binary matrix and XOR with a constant (0x63).

This two-step construction guarantees:

No byte maps to itself (no fixed points)
No byte maps to its bitwise complement
Maximum non-linearity — the S-Box resists both linear and differential cryptanalysis

The S-Box is the reason AES is non-linear. Without it, the entire cipher would be a system of linear equations — solvable with basic algebra regardless of key length.
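To see that the table really does come from those two steps, here's a sketch that builds all 256 entries from scratch. It finds inverses by brute force, which is slow but easy to check, and it writes the affine step as XORs of bit rotations, which is equivalent to the matrix form. The names build_sbox, gf_mul and rotl8 are assumptions for this example.

```c
#include <stdint.h>

/* GF(2^8) multiply (double-and-add, reduction constant 0x1b). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) r ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return r;
}

/* Rotate an 8-bit value left by n bits (1 <= n <= 7). */
static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Hypothetical helper: fill table[0..255] with the AES S-Box. */
void build_sbox(uint8_t table[256]) {
    for (int x = 0; x < 256; x++) {
        /* Step 1: multiplicative inverse (0 maps to 0). */
        uint8_t inv = 0;
        for (int c = 1; c < 256; c++) {
            if (gf_mul((uint8_t)x, (uint8_t)c) == 1) {
                inv = (uint8_t)c;
                break;
            }
        }
        /* Step 2: affine transformation, then XOR with 0x63. */
        table[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                             rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}
```

After calling build_sbox, table[0x53] is 0xed and table[0x00] is 0x63, matching the lookups used in this post.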

🔄 05. ShiftRows and MixColumns

These two operations work together to achieve diffusion — spreading the influence of each input byte across the entire state.

ShiftRows

Each row of the state matrix is shifted (rotated) to the left by a different offset:

Row 0: no shift         [a  b  c  d]  →  [a  b  c  d]
Row 1: shift left by 1  [e  f  g  h]  →  [f  g  h  e]
Row 2: shift left by 2  [i  j  k  l]  →  [k  l  i  j]
Row 3: shift left by 3  [m  n  o  p]  →  [p  m  n  o]

Applied to our state after SubBytes:

Before ShiftRows:          After ShiftRows:
6b  fe  ab  4c             6b  fe  ab  4c    ← row 0 unchanged
9c  58  05  83      →      58  05  83  9c    ← row 1 shifted 1
75  6e  cb  6a             cb  6a  75  6e    ← row 2 shifted 2
01  96  33  d1             d1  01  96  33    ← row 3 shifted 3

Why this matters: Before ShiftRows, each column contains bytes from the same original column. After ShiftRows, each column contains bytes from four different original columns. This sets up MixColumns to create full diffusion.

Without ShiftRows, AES would effectively be four independent 4-byte ciphers running in parallel — far weaker than one 16-byte cipher.
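In code, ShiftRows is just a per-row rotation. A minimal sketch, assuming the state is stored as state[row][col]; shift_rows is an illustrative name.

```c
#include <stdint.h>

/* Hypothetical helper: rotate row r of the state left by r positions.
   Row 0 is left alone. */
void shift_rows(uint8_t state[4][4]) {
    for (int r = 1; r < 4; r++) {
        uint8_t tmp[4];
        for (int c = 0; c < 4; c++) {
            tmp[c] = state[r][(c + r) % 4];   /* new column c comes from c + r */
        }
        for (int c = 0; c < 4; c++) {
            state[r][c] = tmp[c];
        }
    }
}
```

The temporary row avoids overwriting a byte before it has been moved.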

MixColumns

Each column is treated as a polynomial over GF(2⁸) and multiplied by a fixed matrix:

[2  3  1  1]   [a0]   [b0]
[1  2  3  1] × [a1] = [b1]
[1  1  2  3]   [a2]   [b2]
[3  1  1  2]   [a3]   [b3]

All arithmetic is in GF(2⁸) — addition is XOR, multiplication follows special rules (next section).

For column 0 after ShiftRows [6b, 58, cb, d1]:

b0 = (2 × 6b) ⊕ (3 × 58) ⊕ (1 × cb) ⊕ (1 × d1)
   = d6 ⊕ e8 ⊕ cb ⊕ d1
   = 0x24

b1 = (1 × 6b) ⊕ (2 × 58) ⊕ (3 × cb) ⊕ (1 × d1)
   = 6b ⊕ b0 ⊕ 46 ⊕ d1
   = 0x4c

The matrix is MDS (Maximum Distance Separable): changing any t input bytes of a column (1 ≤ t ≤ 4) changes at least 5 - t output bytes. This is the mathematically optimal diffusion.

MixColumns is skipped in the final round (Round 10). This maintains a symmetry between encryption and decryption that simplifies implementation.
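Here's how one column goes through that matrix in C. This is a sketch rather than an optimized implementation; xtime is the conventional name for doubling in GF(2⁸), and mix_column is an illustrative name.

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8): shift left, reduce with 0x1b on overflow. */
static uint8_t xtime(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0x00));
}

/* Hypothetical helper: apply the MixColumns matrix to one column,
   in place. Multiplying by 3 is written as xtime(a) ^ a. */
void mix_column(uint8_t col[4]) {
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3));
    col[3] = (uint8_t)((xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3));
}
```

A widely used check vector: the column [db, 13, 53, 45] comes out as [8e, 4d, a1, bc].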

🧮 06. GF(2⁸) — The Math That Makes It Work

MixColumns requires multiplying bytes together. But normal multiplication doesn’t work — 200 × 200 = 40,000, which doesn’t fit in a byte. AES uses a special number system: Galois Field GF(2⁸), a finite field with exactly 256 elements (0x00 through 0xFF).

Addition: XOR

In GF(2⁸), addition is bitwise XOR. No carries, no overflow.

0x57 ⊕ 0x83:
  0101 0111
⊕ 1000 0011
= 1101 0100 = 0xd4

Subtraction is also XOR (every element is its own additive inverse).

Multiplication: Polynomial multiplication modulo an irreducible polynomial

Each byte represents a polynomial. The bits are coefficients:

0x57 = 01010111 → x⁶ + x⁴ + x² + x + 1
0x83 = 10000011 → x⁷ + x + 1

Multiply them like normal polynomials (using XOR for addition of coefficients), then take the remainder when divided by AES’s irreducible polynomial:

m(x) = x⁸ + x⁴ + x³ + x + 1  =  0x11b

“Irreducible” means it can’t be factored — like a prime number, but for polynomials. This specific polynomial was chosen by the AES designers.
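A direct, unoptimized transcription of that definition looks like this: first a carry-less multiply into a 16-bit product, then a long division by 0x11b that keeps only the remainder. The name gf_mul_longhand is an assumption for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: multiply two bytes as polynomials over GF(2),
   then reduce modulo m(x) = 0x11b. */
uint8_t gf_mul_longhand(uint8_t a, uint8_t b) {
    uint16_t product = 0;

    /* Carry-less multiply: XOR in a shifted copy of a for each set bit of b. */
    for (int i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            product ^= (uint16_t)((uint16_t)a << i);
        }
    }

    /* Reduce: clear bits 14 down to 8 by XORing in a shifted m(x). */
    for (int bit = 14; bit >= 8; bit--) {
        if (product & (1u << bit)) {
            product ^= (uint16_t)(0x11bu << (bit - 8));
        }
    }
    return (uint8_t)product;
}
```

For the two polynomials above, gf_mul_longhand(0x57, 0x83) returns 0xc1. The unreduced product is 0x2b79, and two reduction steps bring it back into a single byte.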

In code: the “double and add” method

#include <stdint.h>

/* Multiply a and b in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1. */
uint8_t gf_mult(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)              // if lowest bit of b is set
            result ^= a;        // add a to result (XOR = addition in GF)
        uint8_t overflow = a & 0x80;  // check if a's high bit is set
        a <<= 1;                // double a (multiply by x)
        if (overflow)
            a ^= 0x1b;          // reduce modulo the irreducible poly
        b >>= 1;                // move to next bit of b
    }
    return result;
}

The key insight: 0x1b is the lower 8 bits of 0x11b. When a overflows past 8 bits (the high bit was set before shifting), XORing with 0x1b performs the polynomial reduction — keeping the result within the 256-element field.

The only multiplications MixColumns needs

MixColumns only multiplies by 1, 2, and 3:

gf_mult(1, x) = x                          (identity)
gf_mult(2, x) = x << 1, XOR 0x1b if overflow   (called "xtime")
gf_mult(3, x) = gf_mult(2, x) XOR x       (double then add)

Example:

gf_mult(2, 0x6b):
  0x6b = 0110 1011
  Shift left: 1101 0110 = 0xd6
  High bit was 0 → no XOR needed
  Result: 0xd6

gf_mult(3, 0x6b):
  0xd6 XOR 0x6b = 0xbd

📏 07. Padding — When Data Isn’t 16 Bytes

AES processes exactly 16 bytes per block. Real data is rarely an exact multiple of 16. Padding fills the gap.

The standard is PKCS#7: count how many bytes are missing, then fill with that number.

Data: "Hello"  (5 bytes)
Missing: 16 - 5 = 11 bytes
Padded: "Hello" + 0x0b 0x0b 0x0b 0x0b 0x0b 0x0b 0x0b 0x0b 0x0b 0x0b 0x0b
                  (eleven bytes, each with value 11)

Data: "Hello World!!!!" (15 bytes)
Missing: 1 byte
Padded: "Hello World!!!!" + 0x01

Data: "Exactly16Bytes!!" (16 bytes)
Missing: 0... but you still add a full block of padding!
Padded: "Exactly16Bytes!!" + 0x10 × 16

That last case is critical. If data happens to be exactly 16 bytes, PKCS#7 still adds 16 bytes of padding. Why? Because during decryption, the algorithm reads the last byte to determine how much padding to remove. If there were no padding, the last byte of the real data would be misinterpreted as a padding indicator — corrupting the message.
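A padding routine following those rules is only a few lines. This sketch assumes the caller supplies an output buffer with room for up to 16 extra bytes; pkcs7_pad is an illustrative name, not a standard library function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy data into out and append PKCS#7 padding
   for a 16-byte block size. Returns the padded length.
   out must have room for len + 16 bytes in the worst case. */
size_t pkcs7_pad(const uint8_t *data, size_t len, uint8_t *out) {
    size_t pad = 16 - (len % 16);      /* always 1..16, never 0 */
    memcpy(out, data, len);
    memset(out + len, (int)pad, pad);  /* pad bytes, each with value pad */
    return len + pad;
}
```

Since len % 16 is 0 for block-aligned input, pad comes out as 16 in that case, which is exactly the extra full block described above.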

Validating padding on decryption

After decrypting, the receiver checks:

Last byte = 0x03? → Check that the last 3 bytes are all 0x03 → valid ✓
Last byte = 0x05? → Check last 5 bytes are all 0x05 → valid ✓
Last byte = 0x03 but second-to-last = 0x07? → invalid ✗ (inconsistent)
Last byte = 0x00? → invalid ✗ (zero padding doesn't exist in PKCS#7)
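The same checks, sketched in C. The function name and the -1 error convention are assumptions for this example. Note that the loop exits at the first bad byte, so this version is not constant-time; treat it as an illustration of the logic, not as production code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: validate PKCS#7 padding on a decrypted buffer.
   Returns the length of the real data, or -1 if the padding is malformed. */
long pkcs7_unpad_len(const uint8_t *buf, size_t len) {
    if (len == 0 || len % 16 != 0) {
        return -1;                     /* must be whole blocks */
    }
    uint8_t pad = buf[len - 1];
    if (pad == 0 || pad > 16) {
        return -1;                     /* 0x00 and values above 0x10 are invalid */
    }
    for (size_t i = 0; i < pad; i++) {
        if (buf[len - 1 - i] != pad) {
            return -1;                 /* inconsistent padding bytes */
        }
    }
    return (long)(len - pad);
}
```

A block that ends in three 0x03 bytes yields a data length of 13; changing the second-to-last byte to 0x07 makes the function return -1.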

When a system reveals whether this check passed (through an error message or a timing difference), it enables the padding oracle attack, one of the most devastating attacks against AES-CBC. But that’s a story for another post.

🔁 Putting It All Together — Full Encryption Walkthrough

Now let’s run the complete AES-128 encryption on our example.

Round 0: AddRoundKey (just XOR with the original key)

State:              Key:                Result:
48  6f  45  6f      4d  63  4b  32      05  0c  0e  5d
65  2c  53  72  ⊕   79  72  65  33  =   1c  5e  36  41
6c  20  20  6c      53  65  79  34      3f  45  59  58
6c  41  57  64      65  74  31  35      09  35  66  51
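AddRoundKey is the simplest step to code. Because XOR works byte by byte, this sketch treats the state and the round key as flat 16-byte arrays in the same (column-major) order; add_round_key is an illustrative name.

```c
#include <stdint.h>

/* Hypothetical helper: XOR a 16-byte round key into a 16-byte state.
   Both arrays are assumed to use the same byte order. */
void add_round_key(uint8_t state[16], const uint8_t round_key[16]) {
    for (int i = 0; i < 16; i++) {
        state[i] ^= round_key[i];
    }
}
```

Applying the same round key twice restores the original state, which is why decryption can reuse this step unchanged.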

Rounds 1–9: SubBytes → ShiftRows → MixColumns → AddRoundKey

Each round applies all four operations using that round’s key from the schedule. Let’s trace Round 1:

SubBytes — every byte through the S-Box:

05→6b  0c→fe  0e→ab  5d→4c
1c→9c  5e→58  36→05  41→83
3f→75  45→6e  59→cb  58→6a
09→01  35→96  66→33  51→d1

ShiftRows — rotate rows:

6b  fe  ab  4c  →  6b  fe  ab  4c   (row 0: no shift)
9c  58  05  83  →  58  05  83  9c   (row 1: shift 1)
75  6e  cb  6a  →  cb  6a  75  6e   (row 2: shift 2)
01  96  33  d1  →  d1  01  96  33   (row 3: shift 3)

MixColumns — matrix multiply each column in GF(2⁸).

AddRoundKey — XOR with Round Key 1.

Rounds 2 through 9 repeat identically with their respective round keys.

Round 10 (Final): SubBytes → ShiftRows → AddRoundKey

No MixColumns in the last round. This maintains a structural symmetry that makes the decryption algorithm mirror the encryption algorithm with inverse operations.
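Putting the pieces in one place, here's a compact sketch of single-block AES-128 encryption. It is meant for reading, not for production: it rebuilds the S-Box on every call, is not constant-time, and all names are assumptions for this post. The state is a flat array where s[4*c + r] is row r, column c.

```c
#include <stdint.h>
#include <string.h>

static uint8_t xtime(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0x00));
}

static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    for (; b; b >>= 1, a = xtime(a))
        if (b & 1) r ^= a;
    return r;
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

static void build_sbox(uint8_t sbox[256]) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;                               /* inverse of 0 is 0 */
        for (int c = 1; c < 256; c++)
            if (gf_mul((uint8_t)x, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* Hypothetical helper: encrypt one 16-byte block with a 16-byte key. */
void aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16],
                          uint8_t out[16]) {
    uint8_t sbox[256], rk[176], s[16], t[16];
    build_sbox(sbox);

    /* Key schedule: 44 four-byte words stored back to back. */
    memcpy(rk, key, 16);
    uint8_t rcon = 0x01;
    for (int i = 16; i < 176; i += 4) {
        uint8_t w[4];
        memcpy(w, rk + i - 4, 4);
        if (i % 16 == 0) {                             /* every 4th word */
            uint8_t first = w[0];
            w[0] = (uint8_t)(sbox[w[1]] ^ rcon);       /* RotWord+SubWord+Rcon */
            w[1] = sbox[w[2]];
            w[2] = sbox[w[3]];
            w[3] = sbox[first];
            rcon = xtime(rcon);
        }
        for (int j = 0; j < 4; j++)
            rk[i + j] = (uint8_t)(rk[i - 16 + j] ^ w[j]);
    }

    for (int i = 0; i < 16; i++)                       /* Round 0 */
        s[i] = (uint8_t)(in[i] ^ rk[i]);

    for (int round = 1; round <= 10; round++) {
        /* SubBytes and ShiftRows in one pass: row r rotates left by r. */
        for (int c = 0; c < 4; c++)
            for (int r = 0; r < 4; r++)
                t[4 * c + r] = sbox[s[4 * ((c + r) % 4) + r]];
        if (round < 10) {                              /* no MixColumns in round 10 */
            for (int c = 0; c < 4; c++) {
                uint8_t *a = t + 4 * c;
                uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
                a[0] = (uint8_t)(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
                a[1] = (uint8_t)(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
                a[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
                a[3] = (uint8_t)(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
            }
        }
        for (int i = 0; i < 16; i++)                   /* AddRoundKey */
            s[i] = (uint8_t)(t[i] ^ rk[16 * round + i]);
    }
    memcpy(out, s, 16);
}
```

A good way to check any implementation like this is against the published FIPS 197 example: key 00 01 02 … 0f with plaintext 00 11 22 … ff should produce 69 c4 e0 d8 6a 7b 04 30 d8 cd b7 80 70 b4 c5 5a.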

The result

After 10 rounds, the state matrix contains the ciphertext: 16 bytes with no detectable statistical relationship to the original plaintext for anyone who doesn’t hold the key.

Plaintext:   "Hello, AES World"  →  48 65 6c 6c 6f 2c 20 41 45 53 20 57 6f 72 6c 64
Ciphertext:  (after 10 rounds)   →  completely unrecognizable hex bytes

Change one bit of the plaintext (say, “Hello” to “Helln”, since o is 0x6f and n is 0x6e) and on average 64 of the 128 ciphertext bits flip. This is the avalanche effect, and it’s what makes AES secure.
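If you want to measure the avalanche effect yourself, all you need is a function that counts differing bits between two blocks. A minimal sketch; the name bit_difference is an assumption for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: count how many of the 128 bits differ
   between two 16-byte blocks. */
int bit_difference(const uint8_t a[16], const uint8_t b[16]) {
    int count = 0;
    for (int i = 0; i < 16; i++) {
        uint8_t diff = (uint8_t)(a[i] ^ b[i]);  /* set bits mark differences */
        while (diff) {
            count += diff & 1;
            diff >>= 1;
        }
    }
    return count;
}
```

Run it once on two plaintexts and once on their two ciphertexts. For example, 'o' (0x6f) and 'n' (0x6e) differ in exactly one bit, so two plaintexts that differ only in that character give a count of 1.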

🛡️ 08. Why No Computer Can Break AES

Let’s be precise about what “can’t break AES” means.

Brute force against AES-128

AES-128 has a 128-bit key. That’s 2¹²⁸ possible keys — approximately 3.4 × 10³⁸.

2¹²⁸ = 340,282,366,920,938,463,463,374,607,431,768,211,456

That's 340 undecillion keys.

Now consider an exascale supercomputer such as Frontier, performing roughly 10¹⁸ operations per second (1 exaflop). Assume each operation checks one key:

Time = 2¹²⁸ / 10¹⁸ seconds
     = 3.4 × 10³⁸ / 10¹⁸
     = 3.4 × 10²⁰ seconds
     = ~10.8 trillion years

The universe is 13.8 billion years old.
AES-128 brute force takes ~780× the age of the universe.

And that’s with a machine that doesn’t exist — real key-checking is far slower than one operation per key.
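The arithmetic is easy to reproduce. This sketch uses the same simplifying assumption as above (one key checked per operation) and a 365.25-day year; the function name is made up for this post.

```c
/* Hypothetical helper: years needed to try every key at a fixed rate.
   key_count and keys_per_second are passed as doubles because 2^128
   does not fit in any standard integer type. */
double brute_force_years(double key_count, double keys_per_second) {
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    return key_count / keys_per_second / seconds_per_year;
}
```

Calling brute_force_years(3.4e38, 1e18) gives roughly 1.08 × 10¹³, the ~10.8 trillion years quoted above. On average an attacker would hit the right key halfway through, which doesn't change the conclusion.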

No mathematical shortcut exists

Brute force isn’t the only way to attack a cipher. Mathematicians have tried for over 20 years to find algebraic, statistical, or structural weaknesses in AES. The best known attack against full AES-128 is a biclique attack that reduces the search space from 2¹²⁸ to 2¹²⁶·¹. That sounds impressive until you realize:

2¹²⁶·¹ is still approximately 9.1 × 10³⁷ operations.
It saves a factor of ~4 compared to brute force.
Still completely infeasible.

The design of AES — non-linear S-Box, optimal diffusion via MDS matrix, carefully chosen round constants — closes every known class of attack:

Attack Type	What it tries	Why it fails against AES
Linear Cryptanalysis	Find linear approximations between plaintext, ciphertext, and key bits	S-Box non-linearity makes correlations negligibly small after 4+ rounds
Differential Cryptanalysis	Track how input differences propagate through the cipher	MDS matrix in MixColumns ensures maximum diffusion; differences spread too fast
Algebraic Attacks	Express AES as a system of equations and solve	System has ~8,000 quadratic equations in ~1,600 binary variables; no known technique solves it faster than brute force
Related-Key Attacks	Exploit relationships between different keys	No related-key attack on full AES-128 is known; the theoretical ones against AES-192/256 need keys related in attacker-chosen ways, which properly generated random keys never are
Side-Channel Attacks	Measure timing, power consumption, EM radiation	Not a flaw in AES math — a flaw in implementation. Mitigated with constant-time code

AES has survived over two decades of analysis by the world’s best cryptographers. No practical attack has ever been found against AES used correctly.

🏔️ 09. AES-256 — Why Even Quantum Computers Can’t Touch It

AES-256 uses a 256-bit key: 2²⁵⁶ possible keys.

2²⁵⁶ ≈ 1.16 × 10⁷⁷

For reference, the estimated number of atoms in the observable universe
is approximately 10⁸⁰. The number of AES-256 keys is within about three orders of magnitude of that.

The quantum threat: Grover’s algorithm

Quantum computers can run Grover’s algorithm, which searches an unsorted database of N items in √N steps instead of N. Applied to AES-256:

Classical brute force: 2²⁵⁶ operations
Grover's algorithm:    √(2²⁵⁶) = 2¹²⁸ operations

Grover’s algorithm effectively halves the key length. AES-256 under quantum attack has the security of AES-128 under classical attack — which we already showed requires 780× the age of the universe on the fastest supercomputer.

AES-128 post-quantum security: 2⁶⁴   → potentially vulnerable in the far future
AES-256 post-quantum security: 2¹²⁸  → still completely infeasible

This is why AES-256 is generally considered quantum-resistant. NIST’s post-quantum standardization effort replaces public-key algorithms, not AES, and the NSA’s CNSA 2.0 suite specifies AES-256 for symmetric encryption. No replacement needed.

Could a bigger quantum computer help?

Grover’s is proven optimal — no quantum algorithm can search faster than √N. Building a bigger quantum computer doesn’t change the exponent. Even with a trillion-qubit machine running Grover’s at impossible speeds:

2¹²⁸ / (10¹⁸ quantum operations/sec)
= 3.4 × 10²⁰ seconds
= ~10.8 trillion years

Same wall. Different computer. Same answer: not happening.

What about Shor’s algorithm?

Shor’s algorithm devastates RSA and elliptic curve cryptography by efficiently factoring integers and computing discrete logarithms. But Shor’s doesn’t apply to AES. AES is a symmetric cipher — there’s no mathematical trapdoor like factoring to exploit. Shor’s algorithm is irrelevant here.

The bottom line

AES-128: Safe against every classical computer for the foreseeable future.
AES-256: Safe against every classical AND quantum computer for the foreseeable future.

The U.S. government approves AES-256 for protecting classified information up to the TOP SECRET level. When the NSA says it’s good enough for national secrets, it’s good enough.

The Complete Flow

"Hello, AES World"
        ↓
   State Matrix (4×4, column-major)
        ↓
   AddRoundKey ← Round 0 Key (original key)
        ↓
   ┌─────────────────────────────────────┐
   │  SubBytes    (confusion)            │
   │  ShiftRows   (begin diffusion)      │  × 9 rounds
   │  MixColumns  (complete diffusion)   │
   │  AddRoundKey (inject round key)     │
   └─────────────────────────────────────┘
        ↓
   SubBytes → ShiftRows → AddRoundKey     (Round 10, no MixColumns)
        ↓
   Ciphertext (16 bytes of apparent randomness)

AES isn’t magic. It’s the disciplined application of substitution, permutation, and modular arithmetic — repeated enough times that the result is indistinguishable from random noise. The math is public. The algorithm is public. The only secret is the key. And that’s exactly how good cryptography works.