The most complex model we actually understand

Published at : 23 Dec 2025

New AI Book! https://www.welchlabs.com/resources/ai-book-ezrzm Get a free ebook version today when you order a copy from our January 2026 print run! You’ll receive a discount code for 100% off the ebook in your purchase confirmation email.

ebook: https://www.welchlabs.com/resources/the-welch-labs-illustrated-guide-to-ai-digital-download

Patreon: https://www.patreon.com/welchlabs

Sections
0:00 - Intro
2:39 - Modular Addition
3:54 - The Model’s Perspective
6:52 - An Accidental Discovery at OpenAI
7:56 - It Groks!
8:49 - Some Clues
13:17 - New Welch Labs Book!
13:57 - Deeper into the model
15:13 - Linear Probes
16:59 - Clocks perform modular addition
19:17 - How do x and y interact exactly?
23:19 - It learns a trig identity?!
26:38 - Putting the pieces together & excluded loss
30:02 - Anthropic finds 6D manifolds
32:24 - Final thoughts
34:02 - Welch Labs update

Special thanks to Neel Nanda for discussing his work and Mech Interp with me, if you want to learn more about Mech Interp, check out Neel’s getting started post here: https://neelnanda.io/getting-started

Thanks to Emmanuel Ameisen and Wes Gurnee for discussing their work on Claude Haiku. Their paper is incredibly in depth and interesting: https://transformer-circuits.pub/2025/linebreaks/index.html

OpenAI team’s grokking paper: https://arxiv.org/pdf/2201.02177. I wasn’t able to reach to team for comment on the origin story, but it is told here: https://www.youtube.com/watch?v=gYGWFjMf9JA&t=1236s
Nanda et al. https://arxiv.org/pdf/2301.05217v1
More on Grokking: https://www.quantamagazine.org/how-do-machines-grok-data-20240412/
Code based on excellent these notebooks from Neel Nanda and collaborators:
https://colab.research.google.com/drive/1F6_1_cWXE5M7WocUcpQWp3v8z4b1jL20

https://colab.research.google.com/github/callummcdougall/ARENA_3.0/blob/main/chapter1_transformer_interp/exercises/part52_grokking_and_modular_arithmetic/1.5.2_Grokking_%26_Modular_Arithmetic_solutions.ipynb
Andrej Karapathy on “Summoning Ghosts”: https://karpathy.bearblog.dev/animals-vs-ghosts/

Code: https://github.com/stephencwelch/manim_videos/tree/master/_2025/grokking

Technical Notes
- It’s very natural for the attention layer to take the sum of it’s inputs (e.g. cos(kx)+cos(ky)), however we also find strong product terms. There’s a couple ways the network can compute products like cos(kx)cos(ky). One option is to approximate the product using ReLU activation functions (see Nanda’s notebooks for more). It’s also feasible for the attention block to do this, I found more evidence of this is my own exploration than of the product happening in the MLP’s ReLU units as suggested by Nanda et al.
- In the first 2D fourier decomposition, we’re leaving out one component, specifically a “negative frequency component” → 0.26 * np.cos(2*np.pi*((4*i)/113)) * np.cos(2*np.pi*((109*j)/113)). We left this out to avoid digging into a discussion of negative/aliased frequencies, and having this 4th component doesn’t add to our intuition about what the network is doing here.
- at 28:00 we’re not showing removing the 8pi/113 frequency from the model’s final output surface.

Patrons
Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti, Brian Henry, Tim Palade, Petar Vecutin, Nicolas baumann, Jason Singh, Robert Riley, vornska, Barry Silverman, Jake Ehrlich, Mitch Jacobs, Lauren Steely, Jeff Eastman, Rodolfo Ibarra, Clark Barrus, Rob Napier, Andrew White, Richard B Johnston, abhiteja mandava, Burt Humburg, Kevin Mitchell, Daniel Sanchez, Ferdie Wang, Tripp Hill, Richard Harbaugh Jr, Prasad Raje, Kalle Aaltonen, Midori Switch Hound, Zach Wilson, Chris Seltzer, Ven Popov, Hunter Nelson, Amit Bueno, Scott Olsen, Johan Rimez, Shehryar Saroya, Tyler Christensen, Beckett Madden-Woods, Darrell Thomas, Javier Soto, U007D, Caleb Begly, Rick Rubenstein, Brent Hunsaker, Dan Patterson, Tchsurvives, Alex Adai, Walter Reade, Zyansheep, Walter Reade, Duncan Stannett, Reginald Carey, Jean-Manuel Izaret, dh71633, Adrian Rodriguez, Dimitar Stojanovski, Michael Harder, Peter Maldonado, Emily Pesce, David Johnston, Insang Song, FaeTheWolf, Stephen Taylor, KittenKaboodle, EMatter, PATRICKMCCORMACK, John Beahan, Cameron, Cole Jones, Garrett Thornburg, Jeroen W, Rohit Sharma, GlennB, Emmanuel Cortes, Katie Quinn, Karina C, Cakra WW, Mike Ton, Eric Gometz, MacCallister Higgins, Niko Drossos, David Eraso, Tom Zehle, Steve, Brian Lineburg, rjbl, Michael Loh, Perry Vais, Bengal0, Farhad Manjoo, Sara Chipps, Ellis Driscoll, William Taysom, Will Harmon, CK, Abdullah, Peter Cho, Leo Nikora, Griffin Smith, Ash Katnoria, Alex, Markus Hays Nielsen, Catherine H., Vi, David Dobáš, Peter Wang, Sina Sohangir, Danny Thomas, Julian Francis, Hans Adler, Jiayu Peng

Created by: Stephen Welch, Sam Baskin, and Pranav Gundu