How I like to think about diffusion
- Ethan Smith
- Jan 26
Updated: Apr 9

It's a bit hard to see in the diagram, but in addition to being convolved with a Gaussian, these points are also drifting toward zero.
There are actually two perspectives here. One sets a point to
x_t = x_0 * alpha + noise * sigma
where alpha and sigma are both numbers between 0 and 1, typically linked so that alpha^2 + sigma^2 = 1 and the total variance stays fixed,
and the other uses
x_t = x_0 + noise * sigma
where sigma instead grows towards infinity at the end of the diffusion schedule.
In both cases we can achieve any desired signal-to-noise ratio, but one involves shrinking the image signal while the other keeps it constant and keeps raising the noise's variance until it overwhelms the signal entirely. I believe these are the Variance Preserving (VP) and Variance Exploding (VE) perspectives, respectively.
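To make the difference concrete, here's a minimal sketch (the function names and the unit-variance assumption on the data are mine, not from the post) of how both parameterizations can reach the same signal-to-noise ratio:

```python
import numpy as np

def noise_vp(x0, alpha):
    """Variance Preserving: shrink the signal; here alpha^2 + sigma^2 = 1."""
    sigma = np.sqrt(1 - alpha**2)
    return alpha * x0 + sigma * np.random.randn(*x0.shape)

def noise_ve(x0, sigma):
    """Variance Exploding: keep the signal, let the noise variance grow."""
    return x0 + sigma * np.random.randn(*x0.shape)

x0 = np.random.randn(4, 4)           # stand-in for unit-variance clean data
alpha = 0.1                          # late in the VP schedule: mostly noise
snr = alpha**2 / (1 - alpha**2)      # VP signal variance over noise variance

# VE hits the same SNR not by shrinking x0 but by exploding sigma:
sigma_ve = np.sqrt(1 / snr)          # since SNR_ve = 1 / sigma^2, ~10 here
xt_vp, xt_ve = noise_vp(x0, alpha), noise_ve(x0, sigma_ve)
```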
I put together this repo, Boneless Flow, to explore it some more. Instead of training a model with weights to obtain estimates of the flow towards the manifold of clean data, we can compute the ground-truth flow analytically if we have the whole dataset in memory.
A problem with this is that the ground-truth score actually won't allow for generating new samples: following it exactly just collapses back onto the training points rather than producing anything new.
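Here's a rough sketch of the idea (function and variable names are illustrative, not taken from the Boneless Flow repo): with the whole dataset in memory, the noised marginal is a mixture of Gaussians centered on the data points, so its score is available in closed form.

```python
import numpy as np

def ground_truth_score(y, data, sigma):
    """Score of p_sigma(y) = (1/N) * sum_i N(y; x_i, sigma^2 I)."""
    diffs = data - y                                # (N, D): x_i - y
    log_w = -np.sum(diffs**2, axis=1) / (2 * sigma**2)
    w = np.exp(log_w - log_w.max())                 # stabilized softmax weights
    w /= w.sum()
    # grad log p(y) = (E[x | y] - y) / sigma^2  (Tweedie's formula, below)
    return (w @ diffs) / sigma**2

data = np.random.randn(100, 2)                      # the whole "dataset"
y = 3.0 * np.random.randn(2)                        # start from pure noise
for sigma in np.linspace(3.0, 0.05, 50):            # anneal the noise level
    y = y + sigma**2 * ground_truth_score(y, data, sigma)
# y has now collapsed onto (or very near) a training point -- the
# memorization problem described above, not a genuinely new sample.
```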
Tweedie's formula is an equation that sits at the foundation of diffusion. It describes the same behavior as the visual above: moving towards regions of higher probability.
I liked this post on it, though I wanted to give some additional intuition.

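A standard statement of Tweedie's formula in these terms (my reconstruction, written in the post's plain notation; p is the density of the noisy Y):
Y = X + e, where e ~ N(0, sigma^2)
f_hat(Y) = E[X | Y] = Y + sigma^2 * grad_Y log p(Y)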
Here X is a clean data point, e is noise, and we get Y by adding that noise to X.
f (the denoiser) is a function that takes in Y, the noisy data point, and attempts to estimate X, what it was before being noised.
f_hat is a specific instantiation of f, namely the MMSE (minimum mean squared error) denoiser.
It says that we can denoise Y by starting from it and moving in a direction that increases probability, similar to our original diagram, scaled by sigma^2, the known noise variance.
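We can sanity-check this in one dimension, where everything is closed form (this toy setup is mine, not from the post):

```python
import numpy as np

# If X ~ N(mu, tau^2) and Y = X + e with e ~ N(0, sigma^2), then the noisy
# marginal is p(Y) = N(mu, tau^2 + sigma^2), so its score is known exactly.
mu, tau, sigma = 1.0, 2.0, 0.5

def tweedie_denoise(y):
    score = (mu - y) / (tau**2 + sigma**2)   # grad_y log p(y)
    return y + sigma**2 * score              # Tweedie: E[X | Y = y]

def posterior_mean(y):
    """The classical Bayes posterior mean for this conjugate Gaussian setup."""
    w = tau**2 / (tau**2 + sigma**2)
    return w * y + (1 - w) * mu

assert np.isclose(tweedie_denoise(3.0), posterior_mean(3.0))  # they agree
```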
We can see an example of this without needing diffusion at all: linear regression.

In a linear regression problem we have a set of points following some kind of trend, and we attempt to estimate this trend with a model: a straight line represented by Y_hat = mX + b.
Y_hat is the predicted value from our model.
Y is the actual data, which we can think of as clean points from our model but with added noise. This is often a reasonable assumption, since other unaccounted-for variables can be lumped together and represented as noise.
Therefore we can think of the Y's as
Y = Y_hat + e
or
Y = mX + b + e
This noise, assuming a Gaussian distribution, creates a fuzzy field/distribution around our model, which we can visualize in either of two ways: as a surface whose height represents probability density, or as shading whose opaqueness does.


Now, like before, we can ascend the probability curve, thus denoising the real data points with respect to our model, as in the sketch below.
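Here's a small sketch of that ascent (constants and names are illustrative, and sigma is assumed known). Because the density around the line is Gaussian, a full Tweedie step lands exactly back on the fitted line:

```python
import numpy as np

rng = np.random.default_rng(0)
m_true, b_true, sigma = 2.0, 1.0, 0.8

x = rng.uniform(0, 5, 200)
y = m_true * x + b_true + rng.normal(0, sigma, 200)   # Y = mX + b + e

m, b = np.polyfit(x, y, 1)                 # fit the line: our "model"

score = (m * x + b - y) / sigma**2         # points uphill toward the line
y_denoised = y + sigma**2 * score          # one full Tweedie step

assert np.allclose(y_denoised, m * x + b)  # exactly on the line
```

(Taking a smaller step, y + eta * score with eta < sigma^2, moves the points only partway toward the line, which is closer to how diffusion sampling proceeds step by step.)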

