What is mask attention?

Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)

Under the hood, the model is composed of an encoder and a decoder.

The encoder processes each item in the input sequence and compiles the information it captures into a vector (called the context). After processing the entire input sequence, the encoder sends the context over to the decoder, which begins producing the output sequence item by item.
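
The encoder/decoder hand-off described above can be sketched in a few lines of plain Python. This is only an illustrative toy, not the model from the article: the weights are random rather than trained, the recurrent update is a simple tanh cell, and every name here (`encode`, `decode`, `rnn_step`, `VOCAB`, `HIDDEN`) is a hypothetical choice made for the example.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights (assumption: a real model would learn these)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(index, size):
    """Represent a token id as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def rnn_step(w_in, w_hid, x, h):
    """One recurrent update: h' = tanh(W_in x + W_hid h)."""
    a = matvec(w_in, x)
    b = matvec(w_hid, h)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


def encode(inputs, w_in, w_hid, hidden_size):
    """Process each input item in turn; the final hidden state is the context."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(w_in, w_hid, x, h)
    return h


def decode(context, w_in, w_hid, w_out, steps):
    """Start from the context and emit one output item per step."""
    vocab_size = len(w_out)
    h = context
    prev = [0.0] * vocab_size  # placeholder start-of-sequence input
    outputs = []
    for _ in range(steps):
        h = rnn_step(w_in, w_hid, prev, h)
        scores = matvec(w_out, h)
        token = max(range(vocab_size), key=scores.__getitem__)  # greedy pick
        outputs.append(token)
        prev = one_hot(token, vocab_size)  # feed the choice back in
    return outputs


rng = random.Random(0)
VOCAB, HIDDEN = 5, 4
enc_in, enc_hid = make_matrix(HIDDEN, VOCAB, rng), make_matrix(HIDDEN, HIDDEN, rng)
dec_in, dec_hid = make_matrix(HIDDEN, VOCAB, rng), make_matrix(HIDDEN, HIDDEN, rng)
dec_out = make_matrix(VOCAB, HIDDEN, rng)

source = [one_hot(i, VOCAB) for i in [1, 3, 2]]  # a three-item input sequence
context = encode(source, enc_in, enc_hid, HIDDEN)
output_tokens = decode(context, dec_in, dec_hid, dec_out, steps=3)
```

Note that the decoder sees the input only through `context`, a single fixed-size vector, no matter how long the source sequence is.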

