mspalign

Align mass spectra from multiple peak lists from an LC/MS or GC/MS data set

Syntax

[CMZ, AlignedPeaks] = mspalign(Peaklist)
[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'Quantile', QuantileValue, ...)
[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'EstimationMethod', EstimationMethodValue, ...)
[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'CorrectionMethod', CorrectionMethodValue, ...)
[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'ShowEstimation', ShowEstimationValue, ...)

Input Arguments

Peaklist Cell array of peak lists from a liquid chromatography/mass spectrometry (LC/MS) or gas chromatography/mass spectrometry (GC/MS) data set. Each element in the cell array is a two-column matrix with m/z values in the first column and ion intensity values in the second column. Each element corresponds to a spectrum or retention time.

Note

You can use the mzxml2peaks function or the mspeaks function to create the Peaklist cell array.

QuantileValue Value that determines which peaks are selected by the estimation method to create CMZ, the vector of common m/z values. Choices are any value ≥ 0 and ≤ 1. Default is 0.95.
EstimationMethodValue Character vector or string specifying the method to estimate CMZ, the vector of common mass/charge (m/z) values. Choices are:
  • histogram — Default method. Peak locations are clustered using a kernel density estimation approach. The peak ion intensity is used as a weighting factor. The centers of all the clusters form the CMZ vector.

  • regression — Takes a sample of the distances between observed significant peaks and regresses the inter-peak distance to create the CMZ vector with similar inter-element distances.

CorrectionMethodValue Character vector or string specifying the method to align each peak list to the CMZ vector. Choices are:
  • nearest-neighbor — Default method. For each common peak in the CMZ vector, its counterpart in each peak list is the peak that is closest to the common peak's m/z value.

  • shortest-path — For each common peak in the CMZ vector, its counterpart in each peak list is selected using the shortest path algorithm.

ShowEstimationValue Controls the display of an assessment plot relative to the estimation method and the vector of common mass/charge (m/z) values. Choices are true or false. Default is either:
  • false — When return values are specified.

  • true — When return values are not specified.

Output Arguments

CMZ Vector of common mass/charge (m/z) values estimated by the mspalign function.
AlignedPeaks Cell array of peak lists, with the same form as Peaklist, but with corrected m/z values in the first column of each matrix.

Description

[CMZ, AlignedPeaks] = mspalign(Peaklist) aligns mass spectra from multiple peak lists (centroided data). It first estimates CMZ, a vector of common mass/charge (m/z) values, by considering the peaks in all spectra in Peaklist, a cell array of peak lists in which each element corresponds to a spectrum or retention time. It then aligns the peaks in each spectrum to the values in CMZ, creating AlignedPeaks, a cell array of aligned peak lists.
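To make the expected shape of Peaklist concrete, here is a minimal sketch that builds a small cell array by hand and checks its layout. The m/z and intensity values and the variable names (toyPeaklist, isValid, nPeaks) are made up for illustration; real peak lists would normally come from mzxml2peaks or mspeaks.

```matlab
% Hypothetical toy data: three spectra, each an N-by-2 matrix [m/z, intensity].
toyPeaklist = {
    [500.10 120; 501.12 80; 502.09 40]
    [500.12 110; 501.10 95]
    [500.09 130; 501.11 70; 502.11 35; 503.10 10]
};

% Check that every element is a numeric two-column matrix.
isValid = cellfun(@(p) isnumeric(p) && ismatrix(p) && size(p,2) == 2, toyPeaklist);

% Count the peaks in each spectrum; the counts may differ between spectra.
nPeaks = cellfun(@(p) size(p,1), toyPeaklist);
```

A check like this can be run before calling mspalign to catch elements that are not in the two-column form.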

[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'PropertyName', PropertyValue, ...) calls mspalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'Quantile', QuantileValue, ...) determines which peaks are selected by the estimation method to create CMZ, the vector of common m/z values. Choices are any scalar value ≥ 0 and ≤ 1. Default is 0.95.
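As a rough sketch of the name-value syntax, the following compares the default quantile with a lower one. The value 0.80 and the output variable names are arbitrary choices for illustration, and the code assumes the lcmsdata MAT-file used in the Examples section is on the path.

```matlab
load lcmsdata   % provides ms_peaks and ret_time, as in the Examples section

% Default call (Quantile is 0.95).
[CMZdefault, alignedDefault] = mspalign(ms_peaks);

% Same data with an illustrative, arbitrarily chosen quantile.
[CMZlower, alignedLower] = mspalign(ms_peaks, 'Quantile', 0.80);

% Compare how many common m/z values each setting produces.
fprintf('Quantile 0.95: %d common m/z values\n', numel(CMZdefault));
fprintf('Quantile 0.80: %d common m/z values\n', numel(CMZlower));
```

Comparing the lengths of the two CMZ vectors is one simple way to see how the setting influences the estimate on a given data set.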

[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'EstimationMethod', EstimationMethodValue, ...) specifies the method used to estimate CMZ, the vector of common mass/charge (m/z) values. Choices are:

  • histogram — Default method. Peak locations are clustered using a kernel density estimation approach. The peak ion intensity is used as a weighting factor. The centers of all the clusters form the CMZ vector.

  • regression — Takes a sample of the distances between observed significant peaks and regresses the inter-peak distance to create the CMZ vector with similar inter-element distances.
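The two estimation methods listed above can be compared side by side. This is only a sketch: it assumes the lcmsdata MAT-file from the Examples section is available, and the summary statistics (vector length and median spacing) are an illustrative choice rather than a recommended diagnostic.

```matlab
load lcmsdata   % provides ms_peaks and ret_time, as in the Examples section

% Estimate the common m/z vector with each method.
CMZhist = mspalign(ms_peaks, 'EstimationMethod', 'histogram');
CMZreg  = mspalign(ms_peaks, 'EstimationMethod', 'regression');

% Summarize each estimate by its length and the median gap between
% neighboring values (sorted first, in case the order is not guaranteed).
fprintf('histogram:  %d values, median spacing %.4f\n', ...
    numel(CMZhist), median(diff(sort(CMZhist(:)))));
fprintf('regression: %d values, median spacing %.4f\n', ...
    numel(CMZreg), median(diff(sort(CMZreg(:)))));
```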

[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'CorrectionMethod', CorrectionMethodValue, ...) specifies the method used to align each peak list to the CMZ vector. Choices are:

  • nearest-neighbor — Default method. For each common peak in the CMZ vector, its counterpart in each peak list is the peak that is closest to the common peak's m/z value.

  • shortest-path — For each common peak in the CMZ vector, its counterpart in each peak list is selected using the shortest path algorithm.
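The nearest-neighbor idea can be illustrated with a few lines of plain MATLAB. This is a toy sketch of the concept only, not the toolbox's implementation: the values in cmz and peaks are hypothetical, and it ignores cases such as two common values matching the same peak. It relies on implicit expansion, so it assumes R2016b or later.

```matlab
% Toy illustration only; not how mspalign is implemented internally.
cmz   = [500.1; 501.1; 502.1];                 % hypothetical common m/z values
peaks = [500.13 120; 501.08 80; 502.12 40];    % hypothetical [m/z, intensity] list

% Distance matrix: row i is a common value, column j is an observed peak.
% For each common m/z value, take the index of the closest observed peak.
[~, idx] = min(abs(peaks(:,1).' - cmz), [], 2);

% Replace the matched peaks' m/z values with the common values,
% leaving the intensities untouched.
aligned = peaks;
aligned(idx, 1) = cmz;
```

With the real function, the equivalent choice is made through the name-value pair, for example mspalign(Peaklist, 'CorrectionMethod', 'shortest-path').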

[CMZ, AlignedPeaks] = mspalign(Peaklist, ...'ShowEstimation', ShowEstimationValue, ...) controls the display of an assessment plot relative to the estimation method and the estimated vector of common mass/charge (m/z) values. Choices are true or false. Default is either:

  • false — When return values are specified.

  • true — When return values are not specified.
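For example, to see the assessment plot while still capturing the outputs, the default can be overridden explicitly. This sketch assumes the lcmsdata MAT-file from the Examples section is available.

```matlab
load lcmsdata   % provides ms_peaks and ret_time, as in the Examples section

% Outputs are requested, so the plot would be off by default;
% setting ShowEstimation to true turns it on.
[CMZ, aligned_peaks] = mspalign(ms_peaks, 'ShowEstimation', true);
```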

Examples

  1. Load a MAT-file, included with the Bioinformatics Toolbox™ software, which contains liquid chromatography/mass spectrometry (LC/MS) data variables, including ms_peaks and ret_time. ms_peaks is a cell array of peak lists, where each element is a two-column matrix of m/z values and ion intensity values, and each element corresponds to a spectrum or retention time. ret_time is a column vector of retention times associated with the LC/MS data set.

    load lcmsdata
  2. Resample the unaligned data, display it in a heat map, and then overlay a dot plot.

    [MZ,Y] = msppresample(ms_peaks,5000);
    msheatmap(MZ,ret_time,log(Y))

    msdotplot(ms_peaks,ret_time)
  3. Click the Zoom In button, and then click the dot plot two or three times to zoom in and see how the dots representing peaks overlay the heat map image.

  4. Align the peak lists from the mass spectra using the default estimation and correction methods.

    [CMZ, aligned_peaks] = mspalign(ms_peaks);
  5. Resample the aligned data, display it in a heat map, and then overlay a dot plot.

    [MZ2,Y2] = msppresample(aligned_peaks,5000);
    msheatmap(MZ2,ret_time,log(Y2))

    msdotplot(aligned_peaks,ret_time)
  6. Link the axes of the two heat maps and zoom in to compare the unaligned and aligned LC/MS data sets in detail.

    linkaxes(findobj(0,'Tag','MSHeatMap'))
    axis([480 532 375 485])
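Besides the visual comparison, the effect of the alignment can be checked numerically. The sketch below counts the distinct m/z values before and after alignment; if peaks are mapped onto the values in CMZ, one would expect the count to drop, but this is an illustrative check under that assumption and not a documented guarantee. The variable names mzBefore and mzAfter are hypothetical.

```matlab
load lcmsdata                                 % provides ms_peaks and ret_time
[CMZ, aligned_peaks] = mspalign(ms_peaks);    % default methods

% Stack all peak lists into one [m/z, intensity] matrix each.
mzBefore = vertcat(ms_peaks{:});
mzAfter  = vertcat(aligned_peaks{:});

% Report the number of distinct m/z values before and after alignment.
fprintf('Distinct m/z values before alignment: %d\n', numel(unique(mzBefore(:,1))));
fprintf('Distinct m/z values after alignment:  %d\n', numel(unique(mzAfter(:,1))));
fprintf('Length of CMZ:                        %d\n', numel(CMZ));
```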

References

[1] Jeffries, N. (2005) Algorithms for alignment of mass spectrometry proteomic data. Bioinformatics 21:14, 3066–3073.

[2] Purvine, S., Kolker, N., and Kolker, E. (2004) Spectral Quality Assessment for High-Throughput Tandem Mass Spectrometry Proteomics. OMICS: A Journal of Integrative Biology 8:3, 255–265.

Version History

Introduced in R2007a