https://docs.google.com/presentation/d/15jNNCisvF09H6NB-Wc_JYFeZihCYxeZtvoHrZSGD_5E/edit?usp=sharing

https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

How to time CUDA GEMM accurately