Efficient Transformer
https://www.tensorflow.org/text/tutorials/transformer#define_the_components
The global self-attention layer
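A minimal sketch of global self-attention in plain Python, to make concrete that every position attends to every other position. It assumes queries, keys, and values are all the raw token vectors; learned projections and multiple heads are left out, and the names `softmax`, `global_self_attention`, and `tokens` are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_self_attention(x):
    """Scaled dot-product self-attention with queries = keys = values = x.

    x is a list of token vectors (seq_len x d_model). Projections and
    multiple heads are omitted (a simplification for this sketch).
    Returns (outputs, weights).
    """
    d_model = len(x[0])
    scale = math.sqrt(d_model)
    outputs, weights = [], []
    for query in x:
        # Score this query against every position, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in x]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all value vectors.
        outputs.append(
            [sum(w * value[j] for w, value in zip(row, x)) for j in range(d_model)]
        )
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = global_self_attention(tokens)
```

Each row of `weights` sums to 1 and has no zero entries, so no position is hidden from any other.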
The causal self-attention layer
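A minimal sketch of causal self-attention in plain Python. It assumes the usual masking approach: scores for future positions are set to negative infinity before the softmax, so their weights become exactly zero. Projections and multiple heads are omitted, and the function and variable names are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax; -inf scores receive weight 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x):
    """Self-attention where position i may only attend to positions <= i.

    x is a list of token vectors (seq_len x d_model), used directly as
    queries, keys, and values (a simplification for this sketch).
    Returns (outputs, weights).
    """
    d_model = len(x[0])
    scale = math.sqrt(d_model)
    outputs, weights = [], []
    for i, query in enumerate(x):
        scores = []
        for j, key in enumerate(x):
            if j > i:
                # Future position: mask it out before the softmax.
                scores.append(float("-inf"))
            else:
                scores.append(sum(q * k for q, k in zip(query, key)) / scale)
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[j] for w, value in zip(row, x)) for j in range(d_model)]
        )
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(tokens)
```

The weight matrix is lower triangular: the first token can only see itself, so its output equals its own vector.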
The cross-attention layer
Each query sees the whole context.
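A minimal sketch of cross-attention in plain Python, illustrating that each query sees the whole context. It assumes the queries come from the decoder sequence and the keys and values are the encoder output vectors, with no learned projections and no mask. The names `cross_attention`, `decoder_tokens`, and `encoder_context` are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context):
    """Scaled dot-product attention from queries onto a separate context.

    queries: list of query vectors (target_len x d_k).
    context: list of vectors used as both keys and values (context_len x d_k).
    Returns (outputs, weights); weights has shape target_len x context_len.
    """
    d_k = len(context[0])
    scale = math.sqrt(d_k)
    outputs, weights = [], []
    for query in queries:
        # No mask: every query is scored against every context position.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in context]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[j] for w, value in zip(row, context)) for j in range(d_k)]
        )
    return outputs, weights

decoder_tokens = [[1.0, 0.0], [0.0, 1.0]]
encoder_context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
outputs, weights = cross_attention(decoder_tokens, encoder_context)
```

The output has one vector per query, while each weight row spans the full context length.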
Layer normalization
- Normalize the activations of each layer
- Subtract the mean and divide by the standard deviation, both computed across the features of each example
- Usually placed right before the activation function
- Why?
  - Mitigates internal covariate shift
  - Helps avoid the vanishing gradient problem
  - Acts as a form of regularization
  - Speeds up convergence
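The normalization step above can be sketched in plain Python as follows. This assumes the common formulation with a small `eps` for numerical stability and optional learned scale (`gamma`) and shift (`beta`) parameters; those names and the default `eps=1e-5` are assumptions, not taken from these notes.

```python
import math

def layer_norm(vector, gamma=None, beta=None, eps=1e-5):
    """Normalize one example across its features, then scale and shift.

    gamma and beta stand in for learned per-feature parameters; they
    default to 1 and 0, which leaves the normalized values unchanged.
    """
    n = len(vector)
    mean = sum(vector) / n
    variance = sum((v - mean) ** 2 for v in vector) / n
    inv_std = 1.0 / math.sqrt(variance + eps)
    normalized = [(v - mean) * inv_std for v in vector]
    if gamma is None:
        gamma = [1.0] * n
    if beta is None:
        beta = [0.0] * n
    return [g * z + b for g, z, b in zip(gamma, normalized, beta)]

# Each row is one example; statistics are computed per row, not per batch.
activations = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 50.0]]
normalized = [layer_norm(row) for row in activations]
```

After normalization each row has mean close to 0 and variance close to 1, regardless of its original scale.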
Residual connection
- Allow gradients to flow directly through the skip connection
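A small sketch of why the skip connection helps, under a deliberately simplified assumption: each sublayer is a scalar linear map f(x) = w * x. Without the skip, the gradient through a stack of such layers is w^depth; with the skip, each layer computes x + f(x), so the gradient is (1 + w)^depth. The helper names `residual_block` and `stack_gradient` are illustrative only.

```python
def residual_block(x, sublayer):
    """Return x + sublayer(x), element-wise, for a list of floats."""
    return [xi + si for xi, si in zip(x, sublayer(x))]

def stack_gradient(weight, depth, use_skip):
    """Gradient of the output w.r.t. the input for a stack of scalar
    linear layers f(x) = weight * x, with or without skip connections."""
    grad = 1.0
    for _ in range(depth):
        # d/dx (weight * x) = weight; d/dx (x + weight * x) = 1 + weight.
        local = weight + (1.0 if use_skip else 0.0)
        grad *= local
    return grad

# Forward pass: the input is carried through unchanged and the sublayer adds to it.
halve = lambda x: [0.5 * xi for xi in x]
block_output = residual_block([2.0, 4.0], halve)

# Backward pass: small weights vanish without the skip, but not with it.
plain_grad = stack_gradient(weight=0.1, depth=10, use_skip=False)
skip_grad = stack_gradient(weight=0.1, depth=10, use_skip=True)
```

In this toy setting the plain stack's gradient is about 1e-10, while the residual stack's gradient stays above 1 because of the identity term.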
[Animation: transformer_decoding_2.gif]
[Animation: transform20fps.gif]