It's a bit hard to see in the diagram, but in addition to being convolved with a Gaussian, these points are also drifting toward zero.
There are actually two perspectives here. There's setting a point to
xt = x0 * alpha + noise * sigma
where sigma and alpha are both numbers between 0 and 1
and then there's
xt = x0 + noise * sigma
but sigma goes towards infinity at the end of the diffusion schedule.
In both cases we can reach any desired signal-to-noise ratio, but the first does it by shrinking the image signal, while the second keeps the signal constant and keeps raising the noise's variance until it overwhelms the signal entirely. I believe these are the Variance Preserving and Variance Exploding perspectives, respectively.
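To make the equivalence concrete, here's a minimal NumPy sketch of the two noising rules. The function names and the values alpha = 0.6, sigma = 0.8 are just illustrative choices of mine, and I'm defining the signal-to-noise ratio as (signal scale)^2 / (noise scale)^2. Under that definition, picking the second rule's sigma as sigma / alpha gives the same ratio, and the two noisy points differ only by an overall rescaling by alpha.

```python
import numpy as np

def noise_scaled_signal(x0, noise, alpha, sigma):
    # First perspective: shrink the signal while adding noise.
    return x0 * alpha + noise * sigma

def noise_constant_signal(x0, noise, sigma):
    # Second perspective: leave the signal alone, only grow the noise.
    return x0 + noise * sigma

def snr_scaled_signal(alpha, sigma):
    # Assumed definition: squared signal scale over squared noise scale.
    return alpha**2 / sigma**2

def snr_constant_signal(sigma):
    # The signal scale is 1 here, so only the noise scale matters.
    return 1.0 / sigma**2

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))     # stand-in for clean data points
noise = rng.standard_normal((4, 2))  # the same noise draw for both rules

alpha, sigma = 0.6, 0.8              # illustrative values between 0 and 1
sigma_big = sigma / alpha            # noise level that matches the SNR

xt_a = noise_scaled_signal(x0, noise, alpha, sigma)
xt_b = noise_constant_signal(x0, noise, sigma_big)

print(snr_scaled_signal(alpha, sigma), snr_constant_signal(sigma_big))
print(np.allclose(xt_a, alpha * xt_b))
```

So a point from the first rule is just the matching point from the second rule multiplied by alpha, which is the drift toward zero mentioned above.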
I put together this repo, Boneless Flow, to explore this some more. Instead of training a model with weights to estimate the flow towards the manifold of clean data, we can compute the ground truth flow analytically if we have the whole dataset in memory.
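Here's a rough sketch of what computing it analytically can look like. This is not the code from the repo, just a minimal version under assumptions of mine: the constant-signal setup (xt = x0 + noise * sigma), a dataset held as an (N, D) array, and the hypothetical function names `ideal_denoiser` and `ideal_score`. With a finite dataset, the noisy distribution is a mixture of Gaussians centred on the data points, so the expected clean point given xt is a softmax-weighted average of the dataset, and the score follows from it.

```python
import numpy as np

def ideal_denoiser(xt, data, sigma):
    """Expected clean point given noisy xt, for a dataset held in memory.

    xt:   (B, D) noisy query points
    data: (N, D) the whole clean dataset
    """
    # Squared distance from every query point to every data point: (B, N).
    d2 = ((xt[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    # Log-weights of each Gaussian component, shifted for numerical stability.
    logits = -d2 / (2.0 * sigma**2)
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)
    # Weighted average of data points: (B, D).
    return w @ data

def ideal_score(xt, data, sigma):
    # Score of the noisy distribution: it points from xt toward the
    # denoised estimate, scaled by 1 / sigma^2.
    return (ideal_denoiser(xt, data, sigma) - xt) / sigma**2

data = np.array([[0.0, 0.0], [10.0, 0.0]])  # toy two-point dataset
xt = np.array([[0.1, 0.0]])

near = ideal_denoiser(xt, data, sigma=0.5)  # low noise
far = ideal_denoiser(xt, data, sigma=1e4)   # very high noise
print(near, far)
```

At low noise the weights collapse onto the nearest data point, and at very high noise the estimate is roughly the dataset mean.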
A problem with this is that the ground truth score actually won't allow for generating new samples.