What are the two sublayers of each encoder in a Transformer model?
Question
What are the two sublayers of each encoder in a Transformer model?
a. Embedding and classification
b. Self-attention and feedforward
c. Recurrent and feedforward
d. Convolution and pooling
Solution
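The two sublayers of each encoder in a Transformer model are self-attention and feedforward. Each encoder layer applies a multi-head self-attention sublayer followed by a position-wise feedforward network, and each sublayer is wrapped in a residual connection and layer normalization.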
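To make the two sublayers concrete, here is a minimal sketch of one encoder layer in plain Python. It is an illustration under simplifying assumptions, not a reference implementation: it uses a single attention head, omits biases and the learnable layer-norm parameters, and all names (`encoder_layer`, `params`, `w_q`, `w1`, and so on) are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(row, eps=1e-5):
    """Normalize one token vector to zero mean and (roughly) unit variance."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def add_and_norm(x, sublayer_out):
    """Residual connection followed by layer normalization, per token."""
    return [layer_norm([a + b for a, b in zip(xr, sr)])
            for xr, sr in zip(x, sublayer_out)]

def self_attention(x, w_q, w_k, w_v):
    """Sublayer 1: single-head scaled dot-product self-attention."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # seq_len x seq_len
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def feed_forward(x, w1, w2):
    """Sublayer 2: position-wise feedforward network with a ReLU in between."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def encoder_layer(x, params):
    """One encoder layer: self-attention, then feedforward, each with add-and-norm."""
    attn = self_attention(x, params["w_q"], params["w_k"], params["w_v"])
    x = add_and_norm(x, attn)
    x = add_and_norm(x, feed_forward(x, params["w1"], params["w2"]))
    return x

def random_matrix(rows, cols, rng):
    """Small random weights, used here only to exercise the layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_ff, seq_len = 4, 8, 3  # toy sizes chosen for illustration
params = {
    "w_q": random_matrix(d_model, d_model, rng),
    "w_k": random_matrix(d_model, d_model, rng),
    "w_v": random_matrix(d_model, d_model, rng),
    "w1": random_matrix(d_model, d_ff, rng),
    "w2": random_matrix(d_ff, d_model, rng),
}
x = random_matrix(seq_len, d_model, rng)  # stand-in for embedded input tokens
out = encoder_layer(x, params)
```

The output `out` has the same shape as the input (one `d_model`-sized vector per token), which is what allows encoder layers to be stacked.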
Similar Questions
What are the encoder and decoder components of a transformer model?
a. The encoder ingests an input sequence and produces a sequence of tokens. The decoder takes in the tokens from the encoder and produces an output sequence.
b. The encoder ingests an input sequence and produces a single hidden state. The decoder takes in the hidden state from the encoder and produces an output sequence.
c. The encoder ingests an input sequence and produces a sequence of hidden states. The decoder takes in the hidden states from the encoder and produces an output sequence.
d. The encoder ingests an input sequence and produces a sequence of images. The decoder takes in the images from the encoder and produces an output sequence.

What are the three different embeddings that are generated from an input sentence in a Transformer model?
a. Recurrent, feedforward, and attention embeddings
b. Embedding, classification, and next sentence embeddings
c. Token, segment, and position embeddings
d. Convolution, pooling, and recurrent embeddings

What is the main role of the decoder in a Transformer model?
a. To generate output tokens based on the final encoder representation.
b. To compute attention scores between input and output tokens.
c. Learning positional encodings.
d. To encode the input sequence.

In a Transformer decoder, what is the purpose of the masked self-attention layer?
a. Assign weights to relevant parts of the input sequence.
b. None of these
c. Generate a representation of the entire output sequence.
d. Allow the model to "attend" to previously generated tokens.

Which of the following is NOT a core component of the Transformer self-attention mechanism?
a. Convolutional Layer
b. Query Vector
c. Key Vector
d. Value Vector