CUDA Glossary

CUDA Profiling

Intro

Naive matrix multiplication 2304 thread blocks on 68 SMs

v0 - 4 thread blocks on 4 SMs

v1 - 4 thread blocks on 4 SMs

v2 - 1 thread block on 1 SM

v3 - 1 thread block on 1 SM

Bible

Diffusion

Language Model