🕸️ Visualize Attentions in Translated Text (English to Chinese)

This app helps users understand how the attention layers in transformer models behave. It visualizes the cross-attention and self-attention weights of an encoder-decoder model, showing the alignment between the source and target tokens as well as within each sequence.

After translating your English input to Chinese, you can inspect the cross-attention and self-attention weights of the translation in the lower section of the page.

Input Text (English)

Translated Text (Chinese)

Examples

Check Cross Attentions

Cross-attention is a key component of transformers: it lets one sequence (the Chinese output) attend to information from another sequence (the English input). Hover your mouse over an output (Chinese) word/token to see which input (English) words/tokens it is attending to.
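As a rough illustration of what the cross-attention weights mean, here is a minimal sketch of scaled dot-product attention in which decoder-side queries score encoder-side keys. The 2-d vectors, the example sentence, and the function names are invented for this sketch; they are not taken from the app or from the real model, which uses many heads and layers.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_weights(queries, keys):
    """Return one row per target token and one column per source token."""
    scale = math.sqrt(len(keys[0]))  # scaled dot-product attention
    rows = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        rows.append(softmax(scores))
    return rows

# Toy data, made up for illustration only.
source_tokens = ["I", "love", "tea"]   # encoder side (English)
target_tokens = ["我", "爱", "茶"]       # decoder side (Chinese)
encoder_keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
decoder_queries = [[4.0, 0.0], [0.0, 4.0], [3.0, 3.0]]

weights = cross_attention_weights(decoder_queries, encoder_keys)
for token, row in zip(target_tokens, weights):
    strongest = source_tokens[row.index(max(row))]
    print(token, "->", strongest, [round(w, 2) for w in row])
```

Each row corresponds to one Chinese token and sums to 1, which is the kind of distribution the hover view displays for a single head.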

Check Self Attentions for Encoder

Hover your mouse over an input (English) word/token to see which input words/tokens it is self-attending to.

Check Self Attentions for Decoder

Hover your mouse over an output (Chinese) word/token to see which output words/tokens it is self-attending to. Notice that each decoder token attends only to itself and the tokens on its left: while generating a token, the decoder can look at the past but not at the future.
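The left-only pattern in the decoder comes from a causal mask. The sketch below is a simplified illustration of that idea, not the model's actual code: it applies a softmax to each row over the current and earlier positions only, which has the same effect as masking future positions with negative infinity. The score matrix and the function name are made up for the example.

```python
import math

def causal_self_attention_weights(scores):
    """Softmax each row over positions 0..i only, so row i ignores later tokens."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # the token itself and everything before it
        peak = max(visible)     # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future positions get weight 0.0, as a -inf mask would produce.
        future = [0.0] * (len(row) - i - 1)
        weights.append([e / total for e in exps] + future)
    return weights

# Hypothetical raw attention scores for a 3-token output sequence.
raw_scores = [
    [0.5, 2.0, 1.0],
    [1.0, 0.2, 3.0],
    [0.3, 0.3, 0.3],
]
causal_weights = causal_self_attention_weights(raw_scores)
for row in causal_weights:
    print([round(w, 2) for w in row])
```

The first token can only attend to itself, so its row puts all of its weight on position 0, and every entry above the diagonal is zero.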

Note: I'm using a transformer model with an encoder-decoder architecture (Helsinki-NLP/opus-mt-en-zh) so that cross-attention can be obtained from the decoder layers.
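For readers who want to pull these weights out themselves, the following is one possible sketch using the Hugging Face `transformers` library. It is not necessarily how this app does it. It assumes `torch`, `transformers`, and `sentencepiece` are installed, and that the installed `transformers` release accepts the `attn_implementation` argument. The function names, the example sentence, and the choice to average the heads of the last decoder layer are assumptions made for this illustration.

```python
MODEL_NAME = "Helsinki-NLP/opus-mt-en-zh"

def translate_with_attentions(text, model_name=MODEL_NAME):
    """Translate `text`; return source tokens, target tokens, and cross-attention rows."""
    # Imported lazily so the helper below works without these packages installed.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # "eager" attention is requested so that attention weights are returned
    # (assumes a transformers release that supports this argument).
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, attn_implementation="eager")
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        generated = model.generate(**inputs)
        # Second pass: feed the translation back as decoder input to read the attention maps.
        outputs = model(**inputs, decoder_input_ids=generated, output_attentions=True)

    source_tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    target_tokens = tokenizer.convert_ids_to_tokens(generated[0].tolist())
    # cross_attentions holds one tensor per decoder layer,
    # each shaped (batch, heads, target_len, source_len).
    last_layer = outputs.cross_attentions[-1][0]
    averaged = last_layer.mean(dim=0).tolist()  # average over heads (an illustrative choice)
    return source_tokens, target_tokens, averaged

def strongest_alignments(source_tokens, target_tokens, attention_rows):
    """For each target token, pick the source token with the largest attention weight."""
    pairs = []
    for target, row in zip(target_tokens, attention_rows):
        pairs.append((target, source_tokens[row.index(max(row))]))
    return pairs

# Example call (downloads the model on first use):
# src, tgt, attn = translate_with_attentions("I love tea.")
# print(strongest_alignments(src, tgt, attn))
```

The same forward pass also exposes `outputs.encoder_attentions` and `outputs.decoder_attentions`, which correspond to the two self-attention views above.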