Does the selfAttentionLayer also perform softmax and scaling?

Question

0 votes

In https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.selfattentionlayer.html, it states that:

A self-attention layer computes single-head or multihead self-attention of its input.

The layer:

1. Computes the queries, keys, and values from the input
2. Computes the scaled dot-product attention across heads using the queries, keys, and values
3. Merges the results from the heads
4. Performs a linear transformation on the merged result

I wonder whether the layer also applies the scaling (i.e., divides Q*K' by sqrt(dim)) and the softmax. My understanding is that both should happen within step 2.

Please clarify this for me and for other users.
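For reference, the operation being asked about is usually written as follows. This is the standard formula from the Transformer literature, not a quote from the MathWorks page, and it assumes that "dim" above means the per-head key dimension, written d_k here:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

The softmax is taken row-wise, so each query's weights over the keys sum to 1.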

Thanks.


Answer 1

Rohit on 20 Apr 2023

0 votes

I understand that you want to know whether selfAttentionLayer performs the softmax and scaling operations involved in computing the attention scores.

Yes. The layer performs both operations: it computes the scaled attention scores and then applies softmax, as the attention mechanism requires.
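As a rough illustration of what "scaling, then softmax" means for a single head, here is a minimal language-neutral sketch in plain Python. It is not the MathWorks implementation: the function names, the list-of-lists layout (one row per token), and the toy inputs are all assumptions made for the example, and it leaves out the learned projections, the multiple heads, and the final linear transformation.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V are lists of rows (hypothetical layout: one row per token).
    d_k = len(K[0])                    # key dimension used for the scaling
    scale = 1.0 / math.sqrt(d_k)
    outputs, all_weights = [], []
    for q in Q:
        # Scaled dot product of this query with every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        # Softmax turns the scaled scores into weights that sum to 1.
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of the value rows.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs, all_weights

# Toy inputs, chosen only for illustration.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to 1, and each query puts more weight on the key it matches, so the scaling and the softmax both happen inside the attention step itself.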


Chih on 20 Apr 2023

Thank you very much, Rohit.


Answer 2

xingxingcui on 11 Jan 2024

Edited: xingxingcui on 27 Apr 2024

0 votes

Hi, @Chih

Please check out the details of the code I wrote here link.

