resume
Resume fitting LDA model
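Syntax
updatedMdl = resume(ldaMdl,bag)
updatedMdl = resume(ldaMdl,counts)
updatedMdl = resume(___,Name,Value)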
Description
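updatedMdl = resume(ldaMdl,bag) returns an updated LDA model by training for more iterations on the bag-of-words or bag-of-n-grams model bag. The input bag must be the same model used to fit ldaMdl.
updatedMdl = resume(ldaMdl,counts) returns an updated LDA model by training for more iterations on the documents represented by the matrix of word counts counts. The input counts must be the same matrix used to fit ldaMdl.
updatedMdl = resume(___,Name,Value) specifies additional options using one or more name-value arguments.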
Examples
To reproduce the results in this example, set rng to 'default'.
rng('default')
Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);
textData = split(str,newline);
documents = tokenizedDocument(textData);
Create a bag-of-words model using bagOfWords.
bag = bagOfWords(documents)
bag = 
  bagOfWords with properties:
        NumWords: 3092
          Counts: [154×3092 double]
      Vocabulary: ["fairest"    "creatures"    "desire"    "increase"    "thereby"    "beautys"    "rose"    "might"    "never"    "die"    "riper"    "time"    "decease"    "tender"    "heir"    "bear"    "memory"    "thou"    …    ] (1×3092 string)
    NumDocuments: 154
Fit an LDA model with four topics. The resume function does not support the default solver for fitlda. Set the LDA solver to be collapsed variational Bayes, zeroth order.
numTopics = 4;
mdl = fitlda(bag,numTopics,'Solver','cvb0')
=====================================================================================
| Iteration  |  Time per  |  Relative  |  Training  |     Topic     |     Topic     |
|            | iteration  | change in  | perplexity | concentration | concentration |
|            | (seconds)  |   log(L)   |            |               |  iterations   |
=====================================================================================
|          0 |       0.08 |            |  3.292e+03 |         1.000 |             0 |
|          1 |       0.02 | 1.4970e-01 |  1.147e+03 |         1.000 |             0 |
|          2 |       0.01 | 7.1229e-03 |  1.091e+03 |         1.000 |             0 |
|          3 |       0.01 | 8.1261e-03 |  1.031e+03 |         1.000 |             0 |
|          4 |       0.01 | 8.8626e-03 |  9.703e+02 |         1.000 |             0 |
|          5 |       0.02 | 8.5486e-03 |  9.154e+02 |         1.000 |             0 |
|          6 |       0.02 | 7.4632e-03 |  8.703e+02 |         1.000 |             0 |
|          7 |       0.01 | 6.0480e-03 |  8.356e+02 |         1.000 |             0 |
|          8 |       0.01 | 4.5955e-03 |  8.102e+02 |         1.000 |             0 |
|          9 |       0.01 | 3.4068e-03 |  7.920e+02 |         1.000 |             0 |
|         10 |       0.01 | 2.5353e-03 |  7.788e+02 |         1.000 |             0 |
|         11 |       0.05 | 1.9089e-03 |  7.690e+02 |         1.222 |            10 |
|         12 |       0.01 | 1.2486e-03 |  7.626e+02 |         1.176 |             7 |
|         13 |       0.01 | 1.1243e-03 |  7.570e+02 |         1.125 |             7 |
|         14 |       0.01 | 9.1253e-04 |  7.524e+02 |         1.079 |             7 |
|         15 |       0.01 | 7.5878e-04 |  7.486e+02 |         1.039 |             6 |
|         16 |       0.01 | 6.6181e-04 |  7.454e+02 |         1.004 |             6 |
|         17 |       0.02 | 6.0400e-04 |  7.424e+02 |         0.974 |             6 |
|         18 |       0.02 | 5.6244e-04 |  7.396e+02 |         0.948 |             6 |
|         19 |       0.02 | 5.0548e-04 |  7.372e+02 |         0.926 |             5 |
|         20 |       0.01 | 4.2796e-04 |  7.351e+02 |         0.905 |             5 |
=====================================================================================
| Iteration  |  Time per  |  Relative  |  Training  |     Topic     |     Topic     |
|            | iteration  | change in  | perplexity | concentration | concentration |
|            | (seconds)  |   log(L)   |            |               |  iterations   |
=====================================================================================
|         21 |       0.02 | 3.4941e-04 |  7.334e+02 |         0.887 |             5 |
|         22 |       0.01 | 2.9495e-04 |  7.320e+02 |         0.871 |             5 |
|         23 |       0.01 | 2.6300e-04 |  7.307e+02 |         0.857 |             5 |
|         24 |       0.01 | 2.5200e-04 |  7.295e+02 |         0.844 |             4 |
|         25 |       0.01 | 2.4150e-04 |  7.283e+02 |         0.833 |             4 |
|         26 |       0.01 | 2.0549e-04 |  7.273e+02 |         0.823 |             4 |
|         27 |       0.01 | 1.6441e-04 |  7.266e+02 |         0.813 |             4 |
|         28 |       0.02 | 1.3256e-04 |  7.259e+02 |         0.805 |             4 |
|         29 |       0.01 | 1.1094e-04 |  7.254e+02 |         0.798 |             4 |
|         30 |       0.01 | 9.2849e-05 |  7.249e+02 |         0.791 |             4 |
=====================================================================================
mdl = 
  ldaModel with properties:
                     NumTopics: 4
             WordConcentration: 1
            TopicConcentration: 0.7908
      CorpusTopicProbabilities: [0.2654 0.2531 0.2480 0.2336]
    DocumentTopicProbabilities: [154×4 double]
        TopicWordProbabilities: [3092×4 double]
                    Vocabulary: ["fairest"    "creatures"    "desire"    "increase"    "thereby"    "beautys"    "rose"    "might"    "never"    "die"    "riper"    "time"    "decease"    "tender"    "heir"    "bear"    "memory"    …    ] (1×3092 string)
                    TopicOrder: 'initial-fit-probability'
                       FitInfo: [1×1 struct]
View information about the fit.
mdl.FitInfo
ans = struct with fields:
          TerminationCode: 1
        TerminationStatus: "Relative tolerance on log-likelihood satisfied."
            NumIterations: 30
    NegativeLogLikelihood: 6.3042e+04
               Perplexity: 724.9445
                   Solver: "cvb0"
                  History: [1×1 struct]
Resume fitting the LDA model with a lower log-likelihood tolerance.
tolerance = 1e-5;
updatedMdl = resume(mdl,bag, ...
    'LogLikelihoodTolerance',tolerance)
=====================================================================================
| Iteration  |  Time per  |  Relative  |  Training  |     Topic     |     Topic     |
|            | iteration  | change in  | perplexity | concentration | concentration |
|            | (seconds)  |   log(L)   |            |               |  iterations   |
=====================================================================================
|         30 |       0.00 |            |  7.249e+02 |         0.791 |             0 |
|         31 |       0.02 | 8.0569e-05 |  7.246e+02 |         0.785 |             3 |
|         32 |       0.01 | 7.4692e-05 |  7.242e+02 |         0.779 |             3 |
|         33 |       0.02 | 6.9802e-05 |  7.239e+02 |         0.774 |             3 |
|         34 |       0.01 | 6.1154e-05 |  7.236e+02 |         0.770 |             3 |
|         35 |       0.02 | 5.3163e-05 |  7.233e+02 |         0.766 |             3 |
|         36 |       0.01 | 4.7807e-05 |  7.231e+02 |         0.762 |             3 |
|         37 |       0.00 | 4.1820e-05 |  7.229e+02 |         0.759 |             3 |
|         38 |       0.00 | 3.6237e-05 |  7.227e+02 |         0.756 |             3 |
|         39 |       0.01 | 3.1819e-05 |  7.226e+02 |         0.754 |             2 |
|         40 |       0.02 | 2.7772e-05 |  7.224e+02 |         0.751 |             2 |
|         41 |       0.01 | 2.5238e-05 |  7.223e+02 |         0.749 |             2 |
|         42 |       0.02 | 2.2052e-05 |  7.222e+02 |         0.747 |             2 |
|         43 |       0.01 | 1.8471e-05 |  7.221e+02 |         0.745 |             2 |
|         44 |       0.01 | 1.5638e-05 |  7.221e+02 |         0.744 |             2 |
|         45 |       0.01 | 1.3735e-05 |  7.220e+02 |         0.742 |             2 |
|         46 |       0.03 | 1.2298e-05 |  7.219e+02 |         0.741 |             2 |
|         47 |       0.01 | 1.0905e-05 |  7.219e+02 |         0.739 |             2 |
|         48 |       0.01 | 9.5581e-06 |  7.218e+02 |         0.738 |             2 |
=====================================================================================
updatedMdl = 
  ldaModel with properties:
                     NumTopics: 4
             WordConcentration: 1
            TopicConcentration: 0.7383
      CorpusTopicProbabilities: [0.2679 0.2517 0.2495 0.2309]
    DocumentTopicProbabilities: [154×4 double]
        TopicWordProbabilities: [3092×4 double]
                    Vocabulary: ["fairest"    "creatures"    "desire"    "increase"    "thereby"    "beautys"    "rose"    "might"    "never"    "die"    "riper"    "time"    "decease"    "tender"    "heir"    "bear"    "memory"    …    ] (1×3092 string)
                    TopicOrder: 'initial-fit-probability'
                       FitInfo: [1×1 struct]
View information about the fit.
updatedMdl.FitInfo
ans = struct with fields:
          TerminationCode: 1
        TerminationStatus: "Relative tolerance on log-likelihood satisfied."
            NumIterations: 48
    NegativeLogLikelihood: 6.3001e+04
               Perplexity: 721.8357
                   Solver: "cvb0"
                  History: [1×1 struct]
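To see what the extra iterations achieved, the two fits can be compared numerically. The sketch below copies the Perplexity and NumIterations values from the two FitInfo displays in this example as literals; the variable names are illustrative only.

```matlab
% Values copied from the FitInfo displays in this example.
perplexityBefore = 724.9445;   % mdl.FitInfo.Perplexity
perplexityAfter  = 721.8357;   % updatedMdl.FitInfo.Perplexity
iterationsBefore = 30;         % mdl.FitInfo.NumIterations
iterationsAfter  = 48;         % updatedMdl.FitInfo.NumIterations

% Iterations added by resume and the relative drop in training perplexity.
extraIterations = iterationsAfter - iterationsBefore;
relativeDrop = (perplexityBefore - perplexityAfter)/perplexityBefore;

fprintf('Extra iterations: %d\n',extraIterations)
fprintf('Relative perplexity reduction: %.4f%%\n',100*relativeDrop)
```

If mdl and updatedMdl are still in the workspace, the same comparison can read the fields from mdl.FitInfo and updatedMdl.FitInfo instead of using literals.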
Input Arguments
Input LDA model, specified as an ldaModel object. To resume fitting a model, you must fit ldaMdl with solver 'savb', 'avb', or 'cvb0'.
Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object. If bag is a bagOfNgrams object, then the function treats each n-gram as a single word.
Frequency counts of words, specified as a matrix of nonnegative integers. If you specify 'DocumentsIn' to be 'rows', then the value counts(i,j) corresponds to the number of times the jth word of the vocabulary appears in the ith document. Otherwise, the value counts(i,j) corresponds to the number of times the ith word of the vocabulary appears in the jth document.
Note
The arguments bag and counts must be the same ones used to fit ldaMdl.
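The following is a minimal sketch of the counts syntax, assuming Text Analytics Toolbox is installed. The word count matrix is synthetic (random integers) and stands in for real data. The same matrix is passed to both fitlda and resume, as the note above requires.

```matlab
% Hypothetical synthetic data: 40 documents over a 25-word vocabulary.
rng('default')
counts = randi([0 4],40,25);   % counts(i,j): occurrences of word j in document i

% Fit with a solver that resume supports ('cvb0'), suppressing output.
numTopics = 3;
mdl = fitlda(counts,numTopics,'Solver','cvb0','Verbose',0);

% Resume fitting on the same matrix with a tighter tolerance.
updatedMdl = resume(mdl,counts, ...
    'LogLikelihoodTolerance',1e-6, ...
    'Verbose',0);
```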
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'LogLikelihoodTolerance',0.001 specifies a log-likelihood tolerance of 0.001.
Solver Options
Orientation of documents in the word count matrix, specified as the comma-separated pair consisting of 'DocumentsIn' and one of the following:
- 'rows' – Input is a matrix of word counts with rows corresponding to documents.
- 'columns' – Input is a transposed matrix of word counts with columns corresponding to documents.
This option only applies if you specify the input documents as a matrix of word counts.
Note
If you orient your word count matrix so that documents correspond to columns and specify 'DocumentsIn','columns', then you might experience a significant reduction in optimization-execution time.
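The column orientation can be sketched as follows, assuming Text Analytics Toolbox is installed. The data is synthetic and the variable names are illustrative. The transposed matrix and the 'DocumentsIn','columns' option are used for both the initial fit and the resumed fit, so that resume receives the same input that was used to fit the model.

```matlab
% Hypothetical synthetic data with rows as documents, then transposed
% so that each column is a document.
rng('default')
countsByRow = randi([0 4],40,25);
countsByColumn = countsByRow';      % 25 words by 40 documents

% Fit and resume with the same orientation.
mdl = fitlda(countsByColumn,3, ...
    'Solver','cvb0', ...
    'DocumentsIn','columns', ...
    'Verbose',0);
updatedMdl = resume(mdl,countsByColumn, ...
    'DocumentsIn','columns', ...
    'LogLikelihoodTolerance',1e-6, ...
    'Verbose',0);
```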
Option for fitting topic concentration, specified as the comma-separated pair consisting of 'FitTopicConcentration' and either true or false.
The default value is the value used to fit ldaMdl.
Example: 'FitTopicConcentration',true
Data Types: logical
Option for fitting topic probabilities, specified as the comma-separated pair consisting of 'FitTopicProbabilities' and either true or false.
The default value is the value used to fit ldaMdl.
The function fits the Dirichlet prior $\alpha = \alpha_0 \left(p_1, p_2, \ldots, p_K\right)$ on the topic mixtures, where $\alpha_0$ is the topic concentration and $p_1, \ldots, p_K$ are the corpus topic probabilities, which sum to 1.
Example: 'FitTopicProbabilities',true
Data Types: logical
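As a small worked illustration, and assuming the prior is the topic concentration multiplied by the vector of corpus topic probabilities, the Dirichlet parameter vector can be computed from the two model properties. The numbers below are copied from the updatedMdl display in the example on this page; the variable names are illustrative.

```matlab
% Values copied from the updatedMdl display in the example.
alpha0 = 0.7383;                        % TopicConcentration
p = [0.2679 0.2517 0.2495 0.2309];      % CorpusTopicProbabilities

% Assumed form of the prior: concentration times corpus topic probabilities.
alpha = alpha0*p;

% The probabilities sum to 1, so the entries of alpha sum to alpha0.
fprintf('sum(p) = %.4f, sum(alpha) = %.4f\n',sum(p),sum(alpha))
```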
Relative tolerance on log-likelihood, specified as the comma-separated pair consisting of 'LogLikelihoodTolerance' and a positive scalar. The optimization terminates when this tolerance is reached.
Example: 'LogLikelihoodTolerance',0.001
Batch Solver Options
Maximum number of iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer.
This option supports models fitted with batch solvers only ('cgs', 'avb', and 'cvb0').
Example: 'IterationLimit',200
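One way this option might be used is to stop an initial fit early and then continue it. The sketch below assumes Text Analytics Toolbox is installed and uses synthetic counts. This page does not state whether the limit passed to resume counts iterations from the original fit or only the resumed ones, so the sketch only prints the NumIterations values and makes no claim about them.

```matlab
% Hypothetical synthetic data: 40 documents over a 25-word vocabulary.
rng('default')
counts = randi([0 4],40,25);

% Stop the initial fit after at most 5 iterations.
mdl = fitlda(counts,3,'Solver','cvb0','IterationLimit',5,'Verbose',0);

% Continue fitting with a larger iteration limit.
updatedMdl = resume(mdl,counts,'IterationLimit',50,'Verbose',0);

% Inspect how many iterations each model reports.
fprintf('Initial fit: %d iterations\n',mdl.FitInfo.NumIterations)
fprintf('After resume: %d iterations\n',updatedMdl.FitInfo.NumIterations)
```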
Stochastic Solver Options
Maximum number of passes through the data, specified as the comma-separated pair consisting of 'DataPassLimit' and a positive integer.
If you specify 'DataPassLimit' but not 'MiniBatchLimit', then the default value of 'MiniBatchLimit' is ignored. If you specify both 'DataPassLimit' and 'MiniBatchLimit', then resume uses the argument that results in processing the fewest observations.
This option supports models fitted with stochastic solvers only ('savb').
Example: 'DataPassLimit',2
Maximum number of mini-batch passes, specified as the comma-separated pair consisting of 'MiniBatchLimit' and a positive integer.
If you specify 'MiniBatchLimit' but not 'DataPassLimit', then resume ignores the default value of 'DataPassLimit'. If you specify both 'MiniBatchLimit' and 'DataPassLimit', then resume uses the argument that results in processing the fewest observations. The default value is ceil(numDocuments/MiniBatchSize), where numDocuments is the number of input documents.
This option supports models fitted with stochastic solvers only ('savb').
Example: 'MiniBatchLimit',200
Mini-batch size, specified as the comma-separated pair consisting of 'MiniBatchSize' and a positive integer. The function processes MiniBatchSize documents in each iteration.
This option supports models fitted with stochastic solvers only ('savb').
Example: 'MiniBatchSize',512
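The interaction between the stochastic limits can be illustrated with plain arithmetic. This sketch does not call resume. It assumes that each data pass processes numDocuments observations and each mini-batch pass processes MiniBatchSize observations, which is an interpretation of the descriptions above rather than documented behavior. All option values are hypothetical.

```matlab
numDocuments   = 154;   % size of the sonnets corpus from the example
miniBatchSize  = 50;    % hypothetical 'MiniBatchSize'
dataPassLimit  = 2;     % hypothetical 'DataPassLimit'
miniBatchLimit = 5;     % hypothetical 'MiniBatchLimit'

% Documented default for 'MiniBatchLimit'.
defaultMiniBatchLimit = ceil(numDocuments/miniBatchSize);

% Assumed accounting of observations processed under each limit.
obsFromDataPasses  = dataPassLimit*numDocuments;
obsFromMiniBatches = miniBatchLimit*miniBatchSize;

% The limit that processes the fewest observations is the one that applies.
if obsFromMiniBatches < obsFromDataPasses
    bindingLimit = "MiniBatchLimit";
else
    bindingLimit = "DataPassLimit";
end
fprintf('Default MiniBatchLimit: %d\n',defaultMiniBatchLimit)
fprintf('Binding limit: %s\n',bindingLimit)
```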
Display Options
Validation data to monitor optimization convergence, specified as the comma-separated pair consisting of 'ValidationData' and a bagOfWords object, a bagOfNgrams object, or a sparse matrix of word counts. If the validation data is a matrix, then the data must have the same orientation and the same number of words as the input documents.
Frequency of model validation in number of iterations, specified as the comma-separated pair consisting of 'ValidationFrequency' and a positive integer.
The default value depends on the solver used to fit the model. For the stochastic solver, the default value is 10. For the other solvers, the default value is 1.
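The validation options might be combined as in the following sketch, which assumes Text Analytics Toolbox is installed. The counts are synthetic, and the split into training and validation rows is arbitrary. Both matrices are sparse, keep documents in rows, and share the same 25 columns, so they match in orientation and number of words.

```matlab
% Hypothetical synthetic data: 60 documents over a 25-word vocabulary.
rng('default')
counts = sparse(randi([0 4],60,25));

% Hold out the last 15 documents for validation.
trainCounts = counts(1:45,:);
validationCounts = counts(46:end,:);

% Fit on the training documents only.
mdl = fitlda(trainCounts,3,'Solver','cvb0','Verbose',0);

% Resume on the same training matrix, validating every 2 iterations.
updatedMdl = resume(mdl,trainCounts, ...
    'LogLikelihoodTolerance',1e-6, ...
    'ValidationData',validationCounts, ...
    'ValidationFrequency',2, ...
    'Verbose',0);
```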
Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and one of the following:
- 0 – Do not display verbose output. 
- 1 – Display progress information. 
Example: 'Verbose',0
Output Arguments
Updated LDA model, returned as an ldaModel object.
Version History
Introduced in R2017b