Use Parallel Processing for Regression TreeBagger Workflow

This example shows you how to:

  • Use an ensemble of bagged regression trees to estimate feature importance.

  • Improve computation speed by using parallel computing.

The sample data is a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical. The symboling index takes integer values from -3 to 3.

Load the sample data and separate it into predictor and response arrays.

load imports-85;
Y = X(:,1);
X = X(:,2:end);

Set up the parallel environment to use two cores.

mypool = parpool(2)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 2).

mypool = 

 Pool with properties: 

            Connected: true
           NumWorkers: 2
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true
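Calling parpool while a pool is already open raises an error, so when rerunning the example it can help to reuse whatever pool exists. The following is a minimal sketch of that guard, assuming Parallel Computing Toolbox is installed; note that a reused pool may have a different number of workers than the two requested here.

```matlab
% Reuse the current pool if one is open; otherwise start one with two workers.
mypool = gcp('nocreate');      % returns an empty array when no pool is running
if isempty(mypool)
    mypool = parpool(2);
end
fprintf('Pool has %d workers.\n', mypool.NumWorkers);
```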

Set the options to use parallel processing.

paroptions = statset('UseParallel',true);
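Because each worker draws its own bootstrap samples, two parallel runs generally do not produce identical ensembles. If reproducible results matter, one possible approach is to pass a random stream that supports substreams through the same options structure. This is a sketch only: the variable names s and reproOptions are hypothetical and are not used elsewhere in this example.

```matlab
% Create a stream whose generator type supports substreams.
s = RandStream('mlfg6331_64');

% Ask the parallel computation to use substreams of that stream.
reproOptions = statset('UseParallel',true, ...
    'UseSubstreams',true,'Streams',s);
```

Passing reproOptions in place of paroptions as the 'Options' value should then give the same ensemble on repeated runs, provided the stream is reset to the same state before each run.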

Estimate feature importance using leaf size 1 and 5000 trees in parallel. Time the function for comparison purposes.

tic
b = TreeBagger(5000,X,Y,'Method','regression','OOBVarImp','on', ...
    'CategoricalPredictors',16:25,'MinLeafSize',1,'Options',paroptions);
toc
Elapsed time is 49.133258 seconds.

Perform the same computation in serial for timing comparison.

tic
b = TreeBagger(5000,X,Y,'Method','regression','OOBVarImp','on', ...
    'CategoricalPredictors',16:25,'MinLeafSize',1);
toc
Elapsed time is 125.779240 seconds.

The parallel run took about 49 seconds and the serial run about 126 seconds, so with two workers the computation finished in roughly 39% of the serial time. Elapsed times vary with your operating system and hardware.
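The example trains the ensemble with out-of-bag importance enabled but stops at the timing comparison. The following self-contained sketch shows one way to read and rank the importance estimates. It makes two assumptions: it uses a smaller ensemble of 200 trees (not the 5000 used above) so that it runs quickly, and it reads the OOBPermutedVarDeltaError property, which matches the 'OOBVarImp' argument name used here. Newer releases name these 'OOBPredictorImportance' and OOBPermutedPredictorDeltaError.

```matlab
% Load the data and split it into response and predictors, as above.
load imports-85
Y = X(:,1);
X = X(:,2:end);

% Train a small serial ensemble with out-of-bag importance turned on.
bSmall = TreeBagger(200,X,Y,'Method','regression','OOBVarImp','on', ...
    'CategoricalPredictors',16:25,'MinLeafSize',1);

% One importance value per predictor (a 1-by-25 row vector).
imp = bSmall.OOBPermutedVarDeltaError;

% Rank predictors from most to least important and show the top five.
[~,order] = sort(imp,'descend');
disp(order(1:5))

% Plot the estimates by predictor index.
bar(imp)
xlabel('Predictor index')
ylabel('Out-of-bag importance')

% Shut down the parallel pool, if one is still open.
delete(gcp('nocreate'))
```

Because bagging is randomized, the ranking can change slightly from run to run, especially with a small number of trees.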
