CUDA Glossary

References


https://www.mishalaskin.com/posts/tensor_parallel

Megatron


Mixed-precision Training


https://www.youtube.com/watch?v=UvRl4ansfCg

https://arxiv.org/pdf/1710.03740.pdf

Example: Adam optimizer

ZeRO


https://www.youtube.com/watch?v=By_O0k102PY
