Extract word matrix and context matrix from output of trainWordEmbedding / word2vec

Daniel Ringel on 13 Jul 2018
Answered: Jayanti on 14 Feb 2025
When I train a word embedding on a set of documents with trainWordEmbedding, I get an object, "emb", that I can pass to word2vec. word2vec then gives me the vector for each word, which I can process further.
However, I would also like to obtain the underlying word matrix and context matrix, as well as the value of the training loss. Does anyone know how I can access these?
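For readers unsure what these terms refer to: in the standard skip-gram model with negative sampling (the formulation of the original word2vec papers), training keeps two V-by-D matrices, a word (input) matrix W and a context (output) matrix C, and minimizes a negative-sampling loss. The following is a minimal plain-MATLAB sketch of one gradient step on a toy example, only to make those terms concrete. It is not the internal implementation of trainWordEmbedding, and the sizes, word indices, negative samples, and learning rate are all hypothetical.

```matlab
% Hypothetical toy sizes: V words in the vocabulary, D embedding dimensions.
V = 10; D = 4;
W = 0.1 * randn(V, D);   % word (input) matrix: row i embeds word i as a center word
C = 0.1 * randn(V, D);   % context (output) matrix: row j embeds word j as a context word

sig = @(x) 1 ./ (1 + exp(-x));   % logistic sigmoid

% One hypothetical training example: center word 1 observed with context
% word 2, plus k = 3 negative samples (words 3, 4, 5).
center = 1; ctx = 2; neg = [3 4 5];

% Negative-sampling loss for this single example
lossFcn = @(W, C) -log(sig(W(center,:) * C(ctx,:)')) ...
                  - sum(log(sig(-C(neg,:) * W(center,:)')));
loss1 = lossFcn(W, C);

% Gradients of the loss, all computed before any parameter is updated
w       = W(center,:);
gPos    = sig(w * C(ctx,:)') - 1;              % scalar
gNeg    = sig(C(neg,:) * w');                  % k-by-1
gradW   = gPos * C(ctx,:) + gNeg' * C(neg,:);  % 1-by-D
gradCtx = gPos * w;                            % 1-by-D
gradNeg = gNeg * w;                            % k-by-D

% One plain gradient-descent step on the rows involved
lr = 0.05;
W(center,:) = W(center,:) - lr * gradW;
C(ctx,:)    = C(ctx,:)    - lr * gradCtx;
C(neg,:)    = C(neg,:)    - lr * gradNeg;

loss2 = lossFcn(W, C);
fprintf('loss before: %.4f, after: %.4f\n', loss1, loss2);
```

A real trainer repeats such steps over every (center, context) pair in the corpus; the loss the question asks about is this quantity accumulated over those pairs.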
Christopher Creutzig on 26 Nov 2018
What exactly do you mean by “word matrix” and “context matrix”?
I guess the “context matrix” is what (some) other people call the cooccurrence matrix in the skip-gram model? We do not currently have a way to compute that.
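A windowed co-occurrence matrix can, however, be computed directly from tokenized text. The following is a minimal plain-MATLAB sketch, assuming each document is already available as a cell array of token character vectors; the corpus and the symmetric window of 2 are hypothetical choices, and the result is raw co-occurrence counts rather than anything produced internally by trainWordEmbedding.

```matlab
% Hypothetical tokenized corpus: each cell holds one document's tokens.
docs = { {'the','cat','sat','on','the','mat'}, {'the','dog','sat'} };
windowSize = 2;   % count neighbors up to 2 positions away on each side

allTokens = [docs{:}];          % concatenate all documents' tokens
vocab = unique(allTokens);      % sorted unique words
V = numel(vocab);
counts = zeros(V, V);           % counts(a, b): times word b appears near word a

for d = 1:numel(docs)
    [~, idx] = ismember(docs{d}, vocab);   % vocabulary index of each token
    n = numel(idx);
    for i = 1:n
        lo = max(1, i - windowSize);
        hi = min(n, i + windowSize);
        for j = [lo:i-1, i+1:hi]           % neighbors, excluding the token itself
            counts(idx(i), idx(j)) = counts(idx(i), idx(j)) + 1;
        end
    end
end
```

Because the window is symmetric, counts is a symmetric matrix; for this corpus the entry for "the" and "sat" is 3.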


Answers (1)

Jayanti on 14 Feb 2025
Hi Daniel,
By “word matrix” I assume you mean the unique words in the documents. When you train a word embedding model on a set of documents with “trainWordEmbedding”, it returns a wordEmbedding object, “emb”. This object has a property named “Vocabulary”, which contains the unique words in the embedding as a string vector. You can access these words as follows:
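% filename: text file with one document per line, words separated by whitespace
emb = trainWordEmbedding(filename);
words = emb.Vocabulary;   % string vector of the unique words in the embedding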
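Because word2vec accepts a vector of words, passing the whole vocabulary returns every word vector at once, one row per word, which is another possible reading of “word matrix” (it does not give a context matrix). A sketch, assuming Text Analytics Toolbox; the tiny in-memory corpus and the 'MinCount', 'Dimension', and 'Verbose' settings are illustrative choices so the example runs without a file, not requirements.

```matlab
% Tiny illustrative corpus (hypothetical); in practice use your own
% documents or a text file name.
str = [
    "the cat sat on the mat"
    "the dog sat on the rug"
    "a cat and a dog"];
documents = tokenizedDocument(str);

% MinCount 1 keeps every word of this tiny corpus in the vocabulary.
emb = trainWordEmbedding(documents, 'MinCount', 1, 'Dimension', 10, 'Verbose', 0);

words = emb.Vocabulary;     % string vector of unique words
M = word2vec(emb, words);   % numel(words)-by-emb.Dimension; row i is the vector of words(i)
```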
By “context matrix” I assume you mean the co-occurrence matrix. However, I could not find any documented way to access a co-occurrence matrix directly through “trainWordEmbedding” or “word2vec”.
Hope this will be helpful!

Category: Text Analytics Toolbox

Version: R2017b
