How do we tackle noisy recognition?
Something I've been thinking about a lot lately is how humans handle noisy recognition. Maybe you recognize the image above; if not, you...
Ethan Smith
6 days ago · 13 min read
107 views
0 comments

Boneless Attention and Low Rank Attention Layers
I’ve seen a lot of convoluted tutorials on attention, but nothing made it click for me more than understanding it as mixing a projected...
Ethan Smith
Mar 23 · 8 min read
414 views
0 comments


The Need for Relative Optimizers | Hypothesis on Muon
Presently, most optimizers used in deep learning do not explicitly accommodate their updates with respect to the expected range of...
Ethan Smith
Mar 18 · 11 min read
453 views
0 comments

Softmax Attention is a Fluke
Calibrated Attention Calibrated Attention NanoGPT Attention is the magic ingredient of modern neural networks. It is the core of what has...
Ethan Smith
Mar 13 · 10 min read
2,654 views
1 comment

How I like to think about diffusion
It's a bit hard to see in the diagram but in addition to being convolved with a gaussian, these points are also drifting towards zero....
Ethan Smith
Jan 26 · 2 min read
157 views
1 comment

Classifier free guidance and reinforcement learning
https://sweet-hall-e72.notion.site/Classifier-Free-Guidance-to-Approximate-RL-9f78c02801c6434da61f37c8d843c5bf
Ethan Smith
Jan 26 · 1 min read
70 views
0 comments

Why are Modern Neural Nets the way they are? And Hidden Hypernetworks.
https://sweet-hall-e72.notion.site/Why-are-Modern-Neural-Nets-the-way-they-are-And-Hidden-Hypernetworks-6c7195709e7b4abbada921875a951c54
Ethan Smith
Oct 6, 2024 · 1 min read
163 views
0 comments

Do Diffusion Transformers Deserve The Hype?
https://sweet-hall-e72.notion.site/Do-Diffusion-Transformers-Deserve-The-Hype-9b9ca7bead374b47aac96558714c203b
Ethan Smith
Jul 28, 2024 · 1 min read
238 views
0 comments

Automated LoRA Discovery and Teaching Neural Networks to make Neural Networks
https://sweet-hall-e72.notion.site/Automated-LoRA-Discovery-and-Teaching-Neural-Networks-to-make-Neural-Networks-22aa3b5ad66e4bc985ff2c93...
Ethan Smith
May 26, 2024 · 1 min read
263 views
0 comments

Diffusion and Autoregressive Models for Learning to Solve Mazes
https://sweet-hall-e72.notion.site/Diffusion-and-Autoregressive-Models-for-Learning-to-Solve-Mazes-c3bc4bcdfa304ecd9531ee5445a4da66
Ethan Smith
May 21, 2024 · 1 min read
361 views
0 comments

Traversing through CLIP Space, PCA and Latent Directions
https://sweet-hall-e72.notion.site/Traversing-through-CLIP-Space-PCA-and-Latent-Directions-b898932e13684d58957405b4a2747a79
Ethan Smith
May 6, 2024 · 1 min read
1,703 views
0 comments


Learning Space Filling Curves with Autoencoders
https://sweet-hall-e72.notion.site/Learning-Space-Filling-Curves-with-Autoencoders-e39e41ce75894c3a8fecfee0f3bbfb23?pvs=4
Ethan Smith
Apr 14, 2024 · 1 min read
99 views
0 comments

Mimicking Diffusion Models by Sequencing Frequency Coefficients
https://sweet-hall-e72.notion.site/Mimicking-Diffusion-Models-by-Sequencing-Frequency-Coefficients-8e5a60e876d640c390369627d55330b1
Ethan Smith
Mar 13, 2024 · 1 min read
1,105 views
0 comments


ContrastiveDPO for Diffusion, Generalizing DPO to multiple items
https://sweet-hall-e72.notion.site/ContrastiveDPO-for-Diffusion-Generalizing-DPO-to-multiple-items-PART1-226b3746aa4d4ff9995d1e26b38a9674
Ethan Smith
Mar 8, 2024 · 1 min read
168 views
0 comments


Dipole Attention: Opposites May Be Deep Connections
Image from: https://twitter.com/toshi2fly/status/911306344376012800 Post: https://sweet-hall-e72.notion.site/Dipole-Attention-Opposites-M...
Ethan Smith
Mar 5, 2024 · 1 min read
515 views
0 comments


Speeding up Diffusion: Reviewing DiffusionGANs, Consistency Models, and Flow Models
https://sweet-hall-e72.notion.site/Speeding-up-Diffusion-Reviewing-DiffusionGANs-Consistency-Models-and-Flow-Models-80b985120b8f472094cdc...
Ethan Smith
Mar 4, 2024 · 1 min read
91 views
0 comments

Exploring Cross Attention maps in Diffusion Models
https://sweet-hall-e72.notion.site/Exploring-Cross-Attention-maps-in-Diffusion-Models-98a85f552fbe4e62887b896b756177f4?pvs=4
Ethan Smith
Feb 25, 2024 · 1 min read
128 views
0 comments


CLIP vs T5 as a text encoder for diffusion models
https://sweet-hall-e72.notion.site/CLIP-vs-T5-as-a-text-encoder-for-diffusion-models-df76bf09cacb425797640da86131267f?pvs=4
Ethan Smith
Feb 25, 2024 · 1 min read
181 views
0 comments


Response to the broken VAE claim
https://sweet-hall-e72.notion.site/Followup-to-the-broken-VAE-claim-f8e154081cc74b49b81172d4b84af4aa?pvs=4
Ethan Smith
Feb 23, 2024 · 1 min read
120 views
0 comments