Version 1.5.2 (760 KB) by Stephen Meehan

An algorithm for manifold learning and dimension reduction.

Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration. See the comments at the top of the file run_umap.m for documentation and many examples of how to use this code.

The UMAP algorithm is the invention of Leland McInnes, John Healy, and James Melville. See their original paper for a long-form description (https://arxiv.org/pdf/1802.03426.pdf). Also see the documentation for the original Python implementation (https://umap-learn.readthedocs.io/en/latest/index.html).

This MATLAB implementation follows a very similar structure to the Python implementation, and many of the function descriptions are nearly identical.

Here are some major differences in this MATLAB implementation:

1) All nearest-neighbour searches are performed through the built-in MATLAB function knnsearch.m. The original Python implementation uses random projection trees and nearest-neighbour descent to approximate nearest neighbours of data points. The function knnsearch.m either uses an exhaustive approach or k-d trees, both of which are slow for high-dimensional data. As such, this implementation may slow down more in the case of large, high-dimensional data sets.

2) The MATLAB function eigs.m does not appear to be as fast as the function "eigsh" in the Python package Scipy. For large data sets, we initialize a low-dimensional transform by binning the data using an algorithm known as probability binning. If the user downloads and installs the function lobpcg.m, made available here (https://www.mathworks.com/matlabcentral/fileexchange/48-locally-optimal-block-preconditioned-conjugate-gradient) by Andrew Knyazev, this can be used to find exact eigenvectors for medium-sized data sets.

3) We have built in the optional ability to detect clusters in the low-dimensional output of UMAP. For users with MATLAB R2019a or later, DBSCAN can be used to produce cluster IDs as explained in the code examples. We have also added the ability to match new clusters to old supervisors using quadratic form matching (described at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5818510/), in the case that test data is transformed using a template created by supervised dimension reduction.
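The two search strategies mentioned in difference 1 can be compared directly on synthetic data. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed; the data and variable names are made up for illustration, and this is not how run_umap.m itself calls knnsearch.m.

```matlab
% Compare the k-d tree and exhaustive strategies of knnsearch on toy data.
rng(0);                      % fixed seed so the run is repeatable
X = rand(500, 5);            % 500 points in 5 dimensions (illustrative)
k = 15;                      % number of neighbours per point

tic;
[idxKd, distKd] = knnsearch(X, X, 'K', k, 'NSMethod', 'kdtree');
secondsKd = toc;

tic;
[idxEx, distEx] = knnsearch(X, X, 'K', k, 'NSMethod', 'exhaustive');
secondsEx = toc;

% Both strategies are exact, so the neighbour distances should agree.
maxDifference = max(abs(distKd(:) - distEx(:)));
fprintf('kdtree: %.3f s, exhaustive: %.3f s, max difference: %g\n', ...
    secondsKd, secondsEx, maxDifference);
```

Increasing the number of columns in X is a simple way to see how each strategy behaves as the dimension grows.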
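To give a feel for the clustering step in difference 3, here is a small sketch that calls the standalone dbscan function (Statistics and Machine Learning Toolbox, R2019a or later) on a made-up 2D embedding. The epsilon and minimum-points values are arbitrary assumptions, and the way run_umap.m invokes DBSCAN internally may differ.

```matlab
% Cluster a synthetic 2D "embedding" made of two well-separated blobs.
rng(1);
embedding = [0.1 * randn(100, 2); 0.1 * randn(100, 2) + 5];

epsilon = 0.5;   % neighbourhood radius (illustrative value)
minPts  = 5;     % minimum neighbours for a core point (illustrative value)
clusterIds = dbscan(embedding, epsilon, minPts);

% dbscan labels noise points as -1, so count only positive IDs.
numClusters = numel(unique(clusterIds(clusterIds > 0)));
fprintf('Found %d clusters\n', numClusters);
```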

Overall, this MATLAB UMAP implementation tends to be faster than the original Python implementation. This version (1.5) makes all UMAP reductions faster with a new C++ MEX implementation of stochastic gradient descent. Thus 'MEX' is the new default for the input argument 'method'. Due to File Exchange requirements, users must download the .MEX file separately (a link to the file on Google Drive is provided upon calling "run_umap"). As examples 13 to 15 show, you can now test the speed difference between the implementations for yourself on your computer by setting the 'python' argument to true.
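A rough way to run the speed comparison described above is sketched below. It uses only the 'method' and 'python' arguments named in this description; the synthetic data, the timing variables, and the assumption that a working Python UMAP installation is available are all illustrative. The guard lets the snippet run even when run_umap.m is not on the MATLAB path.

```matlab
% Time the default MEX method against the Python implementation.
data = rand(2000, 20);               % synthetic stand-in for real data

if exist('run_umap', 'file') == 2    % only run when the package is on the path
    tic;
    reductionMex = run_umap(data, 'method', 'MEX');
    secondsMex = toc;

    tic;
    % Assumption: this call needs a working Python UMAP installation.
    reductionPy = run_umap(data, 'python', true);
    secondsPy = toc;

    fprintf('MEX: %.1f s, Python: %.1f s\n', secondsMex, secondsPy);
else
    disp('run_umap.m not found on the MATLAB path; skipping the comparison.');
end
```

Examples 13 to 15 in the run_umap.m header remain the authoritative reference for this comparison.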

Additionally, version 1.5 enables users of supervised templates to request the post reduction services of supervisor matching, QF tree, and QF dissimilarity regardless of their input arguments for 'n_components' and 'verbose'. The function run_umap.m returns the results of these services via the new 4th output argument: extras. The properties of extras are documented in the file umap/UMAP_extra_results.m.

Our thanks to Dr. Julie Elie from UC Berkeley for these supervised template improvements.

Optional toolbox dependencies:

-The Bioinformatics Toolbox is required to change the 'qf_tree' argument.

-The Curve Fitting Toolbox is required to change the 'min_dist' argument.
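Before changing 'qf_tree' or 'min_dist', it may help to check which of these toolboxes are installed. A minimal sketch using the built-in ver function follows; the variable names are illustrative.

```matlab
% List installed products and look for the two optional toolboxes.
installed = ver;
toolboxNames = {installed.Name};

hasBioinformatics = any(strcmp(toolboxNames, 'Bioinformatics Toolbox'));
hasCurveFitting   = any(strcmp(toolboxNames, 'Curve Fitting Toolbox'));

fprintf('Bioinformatics Toolbox installed: %d\n', hasBioinformatics);
fprintf('Curve Fitting Toolbox installed:  %d\n', hasCurveFitting);
```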

This implementation is a work in progress. It has been looked over by Leland McInnes, who considers it "a fairly faithful direct translation of the original Python code (except for the nearest neighbour search)". We hope to continue improving it in the future.

Provided by the Herzenberg Lab at Stanford University.

We appreciate any and all help in finding bugs. Our priority has been determining the suitability of our concepts for research publications in flow cytometry for the use of UMAP supervised templates. Thus our testing of run_umap.m has been limited to combinations of correct parameters on suitable data similar to the 19 examples in the run_umap.m header comment. This means that the majority of possible combinations of correct input and output parameters have not been tested. Moreover, there has been little testing with incorrect parameters, incorrect data, or less suitable “edge” data. However, the Herzenberg Lab has begun and is continuing independent testing to improve this implementation.

Connor Meehan, Stephen Meehan, and Wayne Moore (2020). Uniform Manifold Approximation and Projection (UMAP) (https://www.mathworks.com/matlabcentral/fileexchange/71902), MATLAB Central File Exchange.

Created with R2020a

Compatible with R2017a to R2020a

**Inspired:** CytoMAP


Corrado Ameli: Very nice.

Be aware that in the documentation the "cluster_2D_method" argument should be called "cluster_method_2D".

Dan O'Shea: Very nice work! I had some issues getting this running on Linux that are fairly easily resolved:

In run_umap L815 or so, instead of using `if ismac`, just use `mexext` to get the MEX extension, since you need "mexa64" on Linux.

```
exe=fullfile(curPath, sprintf('mexStochasticGradientDescent.%s', mexext));
```

Similarly in InstallMexAndExe.m L10:

```
mexFileName= sprintf('mexStochasticGradientDescent.%s', mexext);
```

Everything worked from there on the examples!

Thanks,

Dan

Stephen Meehan: run_umap can now access all resources and examples on the Herzenberg lab servers.

Stephen Meehan: Correction to my comment below: run_umap can not access examples on the Herzenberg servers.

Hayley Song: Hi, thank you for sharing this library. It's very useful for my project. I noticed that the 'run_umap' function currently errors out with 'metric' set to 'spearman', because 'spearman' is missing as a key-value pair from `U.METRIC_DICT` in the 'UMAP' class definition. Adding the (key, value) pair ('spearman', 'spearman') in `UMAP.m` (lines 153-161) fixed the error.

It'd be great if you could incorporate this change in the next version. Thanks again for sharing the code!
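The error traces elsewhere in these comments show that METRIC_DICT is a containers.Map, so the failure and the fix can be illustrated in isolation. This is a toy sketch: the map below holds only two made-up entries and does not reproduce the real contents of UMAP.m.

```matlab
% Toy stand-in for a metric lookup table keyed by metric name.
metricDict = containers.Map({'euclidean', 'cosine'}, {'euclidean', 'cosine'});

requested = 'spearman';

% Indexing a containers.Map with a missing key raises
% "The specified key is not present in this container."
wasMissing = ~isKey(metricDict, requested);

% The fix described above amounts to adding the missing pair.
if wasMissing
    metricDict(requested) = 'spearman';
end

resolved = metricDict(requested);   % now succeeds
```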

Hayley Song: Following up on my question below:

It seems there was a minor mistake: 'spearman' is missing from the 'UMAP' class definition. I was able to make the 'spearman' metric work by adding the (key, value) pair ('spearman', 'spearman') in `UMAP.m` (lines 153-161).

It'd be great if you could incorporate this change in the next version. Thanks again for sharing the code!

Ziwei Liu: Hi, I just downloaded this implementation and tried to run the example file (run_umap), but I got the following error:

Error using websave (line 98)

Could not access server. http://cgworkspace.cytogenie.org/GetDown2/demo/samples.zip.

Error in run_umap/downloadCsv (line 902)

websave(zipFile, ...

Error in run_umap (line 352)

csv_file_or_data=downloadCsv;

Could anyone help out with this? Thank you.

Frants Jensen: Hi Stephen, I have the same issue as Cortexlab and would love to use a pre-computed (non-Euclidean) distance matrix. Do the changes he suggested (a 'precomputed' option in METRIC_DICT, and inserting dmat later instead of calculating it) make sense? Thanks! -Frants

Yuchun Ding: Hi, I'm trying to reduce from 100D to 2D. The dataset has around 100k points. Realistically, how long should it take? With the default settings, performance seems really slow.

Michal Kvasnicka: Update 1.5.0 with default MEX files works very well ... thanks for your effort!!!

Richard Gardner: Thanks for sharing this implementation – it's working great for me so far. The only problem I've experienced is in reproducing results with non-default parameter sets, even when I set the 'randomize' argument to false. I wonder if this issue might originate from the curve-fitting process in find_ab_params.m. The fit() function appears to use a random initial state (I see a warning about this every time I execute run_umap.m), and this occurs before UMAP.m sets the random seed. Could this be a bug, or am I getting something wrong?
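One way to make such a curve fit repeatable is to pass an explicit 'StartPoint' to fit(), so that no random starting values are drawn. The sketch below is an assumption-laden illustration, not the code in find_ab_params.m: the target curve is modelled on the Python reference implementation, the start point [1 1] is arbitrary, and the Curve Fitting Toolbox is required.

```matlab
% Fit 1/(1 + a*x^(2b)) to a smoothed step, with a fixed start point.
spread  = 1;      % illustrative value
minDist = 0.3;    % illustrative value

xv = linspace(0, 3 * spread, 300);
yv = ones(size(xv));                        % 1 for distances below minDist
far = xv >= minDist;
yv(far) = exp(-(xv(far) - minDist) / spread);   % exponential decay beyond it

% Coefficients come first and the independent variable last.
curve = fittype(@(a, b, x) 1 ./ (1 + a * x.^(2 * b)));

% With an explicit StartPoint, repeated fits start from the same values.
f1 = fit(xv', yv', curve, 'StartPoint', [1 1]);
f2 = fit(xv', yv', curve, 'StartPoint', [1 1]);

coeffs1 = coeffvalues(f1);
coeffs2 = coeffvalues(f2);
fprintf('a = %.4f, b = %.4f\n', coeffs1(1), coeffs1(2));
```

Calling rng with a fixed seed before the fit would be an alternative, but supplying the start point removes the randomness at its source.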

Cortexlab: Hi, thanks so much for porting UMAP from Python to MATLAB. I have a question about running UMAP with precomputed distance matrices (which are supported in the Python version). I believe these are almost supported by your code, but one needs to make two modifications in the file UMAP.m: (1) change METRIC_DICT so that it includes the option 'precomputed'; (2) prevent the code from calculating dmat in that case (dmat is what the user passes). I *think* that by making these two changes I got things to work, but would you give me your opinion on whether this makes sense? Many thanks

-Matteo
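For readers wondering what a precomputed matrix would have to supply, the sketch below extracts the k nearest neighbours from a full pairwise distance matrix using only base MATLAB. It is an illustration under stated assumptions, not the modification described above: the data and variable names are invented, the matrix is Euclidean only for convenience, and how UMAP.m would consume the result is not shown.

```matlab
% Derive k-nearest-neighbour indices and distances from a distance matrix.
rng(0);
X = rand(50, 3);                 % toy data used only to build a distance matrix
n = size(X, 1);
k = 5;

% Pairwise Euclidean distances (implicit expansion needs R2016b or later).
sq = sum(X.^2, 2);
D = sqrt(max(0, sq + sq' - 2 * (X * X')));
D(1:n+1:end) = 0;                % force exact zeros on the diagonal

% Sort each row; the first k columns are the nearest neighbours,
% with each point appearing as its own first neighbour.
[sortedD, order] = sort(D, 2);
knnIdx   = order(:, 1:k);
knnDists = sortedD(:, 1:k);
```

Any symmetric, non-negative matrix could take the place of D here, which is the appeal of a 'precomputed' option for non-Euclidean metrics.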

Stephen Meehan: Both exceptions that Bryan Bates found were reproduced and fixed this week, on Feb 21, in update 1.4.1.

Stephen Meehan: Thanks, Bryan Bates. I suspect run_umap does need more testing of combinations of input parameters. Please send details to me at swmeehan@stanford.edu. I need the input files plus an exact copy of the command you typed. I look forward to getting a fix to you quickly. Thanks again.

Bryan Bates: Hi there! So far this function is awesome and has helped my project loads! However, there are a few more bugs that keep appearing that I'm having some trouble squashing. When adding a 'label_column' input argument to the run_umap() function (and ensuring that the last column of my input data has the labels), I get the following error:

"

Matrix index is out of range for deletion.

Error in run_umap (line 611)

parameter_names(args.label_column)=[];

611 parameter_names(args.label_column)=[];

"

I thought that simply commenting this out was enough of a fix, but then after UMAP runs, right before the last plotting execution, I get the next error:

"

Dot indexing is not supported for variables of this type.

Error in run_umap/updatePlot (line 938)

umap.supervisors.prepareForTemplate;

Error in run_umap (line 822)

updatePlot(reduction, true)

938 umap.supervisors.prepareForTemplate;

"

Could you guys help out with this? Thanks!

Stephen Meehan: Hi Michal. Thanks for your comment. Our overview indicates that the run_umap.m file is the starting place for effectively using this package. If you type "doc run_umap" on the command line AFTER downloading, you see a similar amount of textual information to what you see when you type "doc tsne". Can you (or anyone) send us a "how to" link that documents comprehensively how to add additional tabs like "Examples" to File Exchange so users can see the comprehensive documentation BEFORE they download? And is there a similar link explaining how to enrich documentation in .m files to include pictures and web formatting? Sorry for not knowing this. Thanks again for your interest in improving our submission.

Michal Kvasnicka: I think it is really important to create comprehensive documentation and/or a tutorial with examples. The upgrade of UMAP from 1.3.4 to 1.4.0 significantly changed the whole UMAP concept (Python code). I am really not sure how to use this package effectively. I am just guessing ...

Stephen Meehan: Hi Mohammed,

We have updated the accepted metrics for UMAP in the latest update, 1.4.0. You can try running the new version and seeing if it fixes your problem.

If you are still receiving an error, would you mind sending us exactly what commands you are calling to receive this error so that we can try reproducing the error on our computers? You can e-mail it to us at swmeehan@stanford.edu or connor.gw.meehan@gmail.com.

Camden MacDowell: Really appreciate this contribution. Thank you. It is also easy to modify (logical flow). One occasional inconvenience is the restriction that the template file must be a saved .mat file with parameter names, etc. There is an easy workaround, though: I removed lines 405-422 (e.g. the two if/then checks for template_file parameters) and replaced them with

if ischar(template_file)
    [umap, ~, canLoad, reOrgData]=Template.Get(inData, parameter_names, ...
        template_file, 3);
else
    umap = template_file;
    canLoad = [];
    reOrgData = [];
end

Messy but a quick fix. Now template_file can just be the umap variable when calling run_umap.


Mohammed Mostafizur Rahman: Works great, but I ran into an issue. I was running the algorithm when it terminated midway. Now whenever I run it, I get this error:

"Error using containers.Map/subsref

The specified key is not present in this container.

Error in UMAP/fit (line 340)

U.metric = U.METRIC_DICT(U.metric);

Error in UMAP/fit_transform (line 496)

U = fit(U, X, y);

Error in run_umap (line 542)

reduction = umap.fit_transform(inData);"

How to fix this? Thanks!

Rasmus Bro: Thanks a lot. With the Curve Fitting Toolbox installed it works perfectly.

Adam Sciambi: This code is fantastic. Thanks for putting it together. I use it daily.

One error that I've encountered though is in function "smooth_knn_dist" around line 81, reproduced below.

rho = aug_dists(idx) + interpolation*(aug_dists(idx) - aug_dists(idx+height));

Sometimes "idx+height" is out of bounds of "aug_dists". Since "idx" itself is defined to go up to numel(aug_dists), this makes sense that it could go over when added to. I just put in a corrective factor shown below and it seems to work. At the edge case, it interpolates one column inward, rather than outward.

correction = zeros(size(idx));

correction(idx+height>numel(aug_dists)) = -height;

rho = aug_dists(idx) + interpolation*(aug_dists(idx+correction) - aug_dists(idx+height+correction));

jiaxin: Error using -

Matrix dimensions must agree.

Error in smooth_knn_dist (line 84)

d = distances(:,2:end) - rho;

Error in fuzzy_simplicial_set (line 108)

[sigmas, rhos] = smooth_knn_dist(knn_dists, n_neighbors, local_connectivity);

Error in UMAP/fit (line 420)

U.graph = fuzzy_simplicial_set(X, U.n_neighbors, randomState, U.metric,

'metric_kwds', U.metric_kwds,...

Error in UMAP/fit_transform (line 486)

U = fit(U, X, y);

Error in run_umap (line 495)

reduction = umap.fit_transform(inData);

Beatriz Moya: Thanks for the code, it's been very useful! However, I have tried to reduce the model to a 3-dimensional system, but I get this error:

Error using UMAP/validate_parameters (line 303)

The Java and C methods currently only support reducing to 2 dimensions

Error in UMAP/fit (line 358)

validate_parameters(U);

Error in UMAP/fit_transform (line 470)

U = fit(U, X, y);

Error in run_umap (line 441)

reduction = umap.fit_transform(inData);

When is this option going to be available?

Stephen Meehan: Hi ageorge and Rasmus,

We've looked into the error that you are both receiving. We realized that one of the MATLAB functions that we call, fit.m, actually requires the MATLAB Curve Fitting Toolbox (https://www.mathworks.com/products/curvefitting.html) and we mistakenly did not list this requirement on the download page. If you do not have the Curve Fitting Toolbox installed, this would explain the errors that you are receiving. We have now listed this requirement on the download page.

As a workaround for MATLAB users who do not have the Curve Fitting Toolbox, we have now hard-coded in values for the outputs of find_ab_params.m when the inputs have particular default inputs. In particular, all the examples in the documentation of run_umap.m should now run in the current version 1.2.1 without any problems for users without the Curve Fitting Toolbox.

Rasmus Bro: Hi there

I am really interested in trying this, but I am also running into problems. I tried your updated version here and the one on your homepage. I get the following error (on MATLAB R2019a):

[reduction,umap] = run_umap(rand(10,100));

ans =

20

ans =

20

java.awt.Point[x=793,y=53] java.awt.Dimension[width=1146,height=1006]

DUDE [UMAP for 10x100

n\_neighbors=\color{blue}30\color{black}, min\_dist=\color{blue}0.3\color{black}, metric=\color{blue}euclidean\color{black},randomize=\color{blue}0\color{black}, labels=\color{blue}0

Undefined function 'fit' for input arguments of type 'function_handle'.

Error in find_ab_params (line 43)

f = fit(xv', yv', curve);

Error in UMAP/fit (line 352)

[U.a, U.b] = find_ab_params(U.spread, U.min_dist);

Error in UMAP/fit_transform (line 470)

U = fit(U, X, y);

Error in run_umap (line 441)

reduction = umap.fit_transform(inData);

Seng Bum Yoo: Thank you so much for the code. I wonder whether your code supports embedding new data into an old embedding without modifying the old embedding. Is init_transform relevant to that purpose?

One question: unless you change the input parsing, changing 'n_epochs' seems quite inflexible (I changed it myself). It would be great to have it as a free parameter, like n_neighbors, for example.

ageorge: I'm getting the following error when I run it:

Undefined function 'fit' for input arguments of type 'function_handle'.

Error in find_ab_params (line 43)

f = fit(xv', yv', curve);

Error in UMAP/fit (line 352)

[U.a, U.b] = find_ab_params(U.spread, U.min_dist);

Error in UMAP/fit_transform (line 470)

U = fit(U, X, y);

Error in run_umap (line 441)

reduction = umap.fit_transform(inData);

Stephen Meehan: Hi Joanna,

Sorry for the delayed response to your issue. We have just uploaded a major update (version 1.2.0) that may resolve the issue, so try downloading the latest version and seeing if it is fixed! What was previously line 273 in version 1.1.0 should now be line 391 in 1.2.0.

If you are still receiving an error, would you mind sending us the full text of the exception so that we can investigate it further? We are having trouble reproducing the error. You can e-mail it to us at swmeehan@stanford.edu or connor.meehan@shaw.ca.

If you require a temporary workaround, we recommend downloading our UMAP distribution directly from our Web site at http://cgworkspace.cytogenie.org/GetDown2/demo/umapDistribution.zip. We are able to include some additional code in this distribution that does not meet File Exchange criteria. If an exception occurs with this version, it will switch to running the algorithm in C instead.

Joanna Polanska: I got the same error as Damon. Line 273: nTh=edu.stanford.facs.swing.Umap.EPOCH_REPORTS+3;

How to deal with it?

Thanks, Joanna

Henri Johansson: How much slower is it than the Python implementation?

Damon Clark: Thanks for putting this together! Line 273 in run_umap throws an error for me -- it looks like it may be calling some local variable. I believe I had everything in the path correctly.

nTh=edu.stanford.facs.swing.Umap.EPOCH_REPORTS+3;