Benchmark for SageAttention kernels using real attention shapes logged from ComfyUI models (image / video / audio)

May 2, 2026 · 10:34 AM UTC

What this is, and what it is not

This is not a benchmark of how fast a model generates an image or video. It uses no model weights and no inference pipeline. The benchmark runs on randomly generated tensors that reproduce the exact attention shapes (batch, heads, seq_len, head_dim, dtype) that real models use during sampling inside ComfyUI. More precisely, it measures only the attention operation itself, one step inside the denoising loop. Everything else (VAE, CLIP, scheduler, ComfyUI overhead) is
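To make the idea of timing only the attention call concrete, here is a minimal sketch in PyTorch. It is not the benchmark's actual code. `AttnShape` and `bench_attention` are hypothetical names, the shape in the example is a made-up placeholder and not a value logged from any model, and PyTorch's built-in `scaled_dot_product_attention` stands in for the kernel under test. A SageAttention kernel could presumably be passed as `attn_fn` instead, provided it accepts tensors laid out as (batch, heads, seq_len, head_dim).

```python
import time
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass(frozen=True)
class AttnShape:
    """One attention call: (batch, heads, seq_len, head_dim, dtype). Hypothetical helper."""
    batch: int
    heads: int
    seq_len: int
    head_dim: int
    dtype: torch.dtype


def bench_attention(shape, attn_fn=F.scaled_dot_product_attention,
                    warmup=3, iters=10, device=None):
    """Return the mean wall-clock seconds per attn_fn(q, k, v) call."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    on_gpu = torch.device(device).type == "cuda"

    dims = (shape.batch, shape.heads, shape.seq_len, shape.head_dim)
    # Random inputs only: no model weights, VAE, text encoder, or scheduler.
    q, k, v = (torch.randn(dims, dtype=shape.dtype, device=device)
               for _ in range(3))

    def sync():
        # CUDA kernels launch asynchronously, so wait before reading the clock.
        if on_gpu:
            torch.cuda.synchronize()

    with torch.no_grad():
        for _ in range(warmup):  # untimed warm-up calls
            attn_fn(q, k, v)
        sync()
        start = time.perf_counter()
        for _ in range(iters):
            attn_fn(q, k, v)
        sync()
        elapsed = time.perf_counter() - start

    return elapsed / iters


if __name__ == "__main__":
    # Placeholder shape for illustration only, not taken from a real model.
    use_gpu = torch.cuda.is_available()
    example = AttnShape(batch=1, heads=8, seq_len=1024, head_dim=64,
                        dtype=torch.float16 if use_gpu else torch.float32)
    seconds = bench_attention(example)
    print(f"{example}: {seconds * 1e3:.3f} ms per call")
```

Synchronizing before and after the timed loop matters on a GPU, because otherwise the timer would measure kernel launches and not kernel execution. For finer-grained GPU timing, CUDA events are a common alternative to `time.perf_counter`.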

Original article: StableDiffusion
