Cosine Similarity using BERT
5 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Nicholas Ang
le 30 Juin 2021
Commenté : Nicholas Ang
le 30 Juin 2021
I am using BERT to calculate similarities in Question Answering. I have encoded my Question data using
data.Tokens = encode(mdl.Tokenizer,data.Questions) which returns me a cell array.
Next, I proceeded to encode new text to test the similiarity with the already encoded Questions in the database: testTokens = encode(mdl.Tokenizer,text)
However, I am imable to use the cosineSimilarity(data.Tokens,testTokens) and I receive an error that says:
Input must be a matrix, a tokenizedDocument array, a bagOfWords model, a bagOfNgrams model, a string array of words, or a cell array of character vectors.
Do I need padding here or reshape of my cell vectors?
0 commentaires
Réponse acceptée
Divyam Gupta
le 30 Juin 2021
Hi Nicholas, I notice that you're facing an issue while computing the cosine similarity using a text encoder. As per the documentation mentioned at https://www.mathworks.com/help/textanalytics/ref/cosinesimilarity.html#d123e8335 the cosineSimilarity function takes a matrix to compute the similarity between two documents.
Since the encoded vector sizes for each of the questions is different, constructing a matrix might be difficult. You can do a pairwise comparision between the data.Tokens and the testTokens to compute the similarities. This can be achieved by running a nested loop while simultaneously storing the similarity scores.
Hope this helps.
Plus de réponses (0)
Voir également
Catégories
En savoir plus sur Language Support dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!