Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
Under the hood, the model is composed of an encoder and a decoder.
The encoder processes each item in the input sequence, compiling the information it captures into a single vector (called the context). After processing the entire input sequence, the encoder sends the context over to the decoder, which begins producing the output sequence item by item.
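The flow above can be sketched in a few lines of plain Python. This is a toy, untrained recurrent encoder-decoder, not the real model: the hidden size, random weights, and scalar readout are all assumptions chosen just to show the shape of the computation (encoder folds the input into one context vector; decoder unrolls from that vector step by step).

```python
import math
import random

random.seed(0)

HIDDEN = 4  # size of the context vector (an assumption for this toy sketch)

def rand_matrix(rows, cols):
    # small random weights standing in for learned parameters
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

W_in = rand_matrix(HIDDEN, 1)       # input -> hidden weights
W_hh = rand_matrix(HIDDEN, HIDDEN)  # hidden -> hidden weights

def step(hidden, x):
    # one RNN step: new_hidden = tanh(W_in * x + W_hh @ hidden)
    return [math.tanh(W_in[i][0] * x +
                      sum(W_hh[i][j] * hidden[j] for j in range(HIDDEN)))
            for i in range(HIDDEN)]

def encode(sequence):
    # the encoder processes each input item in turn,
    # folding everything it has seen into one hidden vector
    hidden = [0.0] * HIDDEN
    for x in sequence:
        hidden = step(hidden, x)
    return hidden  # this final hidden state is the "context"

def decode(context, n_steps):
    # the decoder starts from the context and emits one item per step,
    # feeding each output back in as the next step's input
    hidden = context
    outputs = []
    for _ in range(n_steps):
        hidden = step(hidden, outputs[-1] if outputs else 0.0)
        outputs.append(sum(hidden))  # toy readout (untrained projection)
    return outputs

context = encode([1.0, 2.0, 3.0])
outputs = decode(context, n_steps=3)
print(len(context), len(outputs))
```

Note the bottleneck this sketch makes visible: everything the decoder knows about the input has to squeeze through the single fixed-size `context` vector, which is exactly the limitation that attention was introduced to relax.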