affygcrma
Perform GC Robust Multi-array Average (GCRMA) procedure on Affymetrix microarray probe-level data
Syntax
Expression = affygcrma(CELFiles, CDFFile, SeqFile)
Expression = affygcrma(ProbeStructure, Seq)
Expression = affygcrma(CELFiles, CDFFile, SeqFile, ...'CELPath', CELPathValue, ...)
Expression = affygcrma(CELFiles, CDFFile, SeqFile, ...'CDFPath', CDFPathValue, ...)
Expression = affygcrma(CELFiles, CDFFile, SeqFile, ...'SeqPath', SeqPathValue, ...)
Expression = affygcrma(..., 'ChipIndex', ChipIndexValue, ...)
Expression = affygcrma(..., 'OpticalCorr', OpticalCorrValue, ...)
Expression = affygcrma(..., 'CorrConst', CorrConstValue, ...)
Expression = affygcrma(..., 'Method', MethodValue, ...)
Expression = affygcrma(..., 'TuningParam', TuningParamValue, ...)
Expression = affygcrma(..., 'GSBCorr', GSBCorrValue, ...)
Expression = affygcrma(..., 'Median', MedianValue, ...)
Expression = affygcrma(..., 'Output', OutputValue, ...)
Expression = affygcrma(..., 'Showplot', ShowplotValue, ...)
Expression = affygcrma(..., 'Verbose', VerboseValue, ...)
Input Arguments
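CELFiles | Any of the following: a character vector, string, string vector, or cell array of character vectors containing CEL file names; '*', to read all CEL files in the current folder; or ' ', to open the Select Files dialog box, from which you select the CEL files.
CDFFile | Either of the following: a character vector or string specifying a CDF file name; or ' ', to open the Select Files dialog box, from which you select the CDF file.
SeqFile | Either of the following: a file or a matrix containing sequence information for probes on a specific type of Affymetrix GeneChip array.
Seq | An N-by-25 matrix of sequence information, such as returned by affyprobeseqread.
ProbeStructure | MATLAB structure containing information from the CEL files, including probe intensities, probe indices, and probe set IDs, returned by the celintensityread function.
CELPathValue | Character vector or string specifying the path and folder where the files specified in CELFiles are stored.
CDFPathValue | Character vector or string specifying the path and folder where the file specified in CDFFile is stored.
SeqPathValue | Character vector or string specifying the path and folder where the file specified in SeqFile is stored.
ChipIndexValue | Positive integer specifying a chip. This chip's sequence information and mismatch probe intensity data are used to compute probe affinities. Default is 1.
OpticalCorrValue | Controls the use of optical background correction on the input probe intensity values. Choices are true (default) or false.
CorrConstValue | Value that specifies the correlation constant, rho, for log background intensity for each PM/MM probe pair. Choices are any value ≥ 0 and ≤ 1. Default is 0.7.
MethodValue | Character vector or string that specifies the method to estimate the signal. Choices are 'MLE', a faster, ad hoc Maximum Likelihood Estimate method, or 'EB', a slower, more formal, empirical Bayes method. Default is 'MLE'.
TuningParamValue | Value that specifies the tuning parameter used by the estimate method. This tuning parameter sets the lower bound of signal values with positive probability. Choices are a positive value. Default is 5 (MLE) or 0.5 (EB). Tip: For information on determining a setting for this parameter, see Wu et al., 2004.
GSBCorrValue | Specifies whether to perform gene-specific binding (GSB) correction using probe affinity data. Choices are true (default) or false. If there is no probe affinity information, this property is ignored.
MedianValue | Specifies the use of the median of the ranked values instead of the mean for normalization. Choices are true or false (default).
OutputValue | Specifies the scale of the returned gene expression values. Choices are 'log', 'log2' (default), 'log10', 'linear', or @functionname. In the last instance, the data is transformed as defined by the function functionname.
ShowplotValue | Controls the display of a plot showing the log2 of mismatch (MM) probe intensity values from a specified chip (CEL file) versus that chip's MM probe affinities. The plot also shows the LOWESS fit for computing NSB data of the specified chip. Choices are true, false, or I, an integer specifying a chip. If set to true, the first chip is plotted. Default is false when return values are specified and true when return values are not specified.
VerboseValue | Controls the display of the status of the reading of files and GCRMA processing. Choices are true (default) or false.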
Output Arguments
Expression | DataMatrix object containing the log2 gene expression values that have been background adjusted, normalized, and summarized using the GC Robust Multi-array Average (GCRMA) procedure. Each row in Expression corresponds to a gene (probe set), and each column corresponds to an Affymetrix CEL file.
Description
Expression = affygcrma(CELFiles, CDFFile, SeqFile) reads the specified Affymetrix CEL files, the associated CDF library file (created from Affymetrix GeneChip arrays for expression or genotyping assays), and the associated sequence file or matrix. It then processes the probe intensity values using GCRMA background adjustment, quantile normalization, and median-polish summarization procedures, and returns Expression, a DataMatrix object containing the log2-based gene expression values in a matrix, the probe set IDs as row names, and the CEL file names as column names. Each row in Expression corresponds to a gene (probe set), and each column corresponds to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.)
CELFiles is a character vector, string, string vector, or cell array of character vectors containing CEL file names. CDFFile is a character vector or string specifying a CDF file name. If you set CELFiles to '*', then affygcrma reads all CEL files in the current folder. If you set CELFiles or CDFFile to ' ', then affygcrma opens the Select Files dialog box, from which you select the CEL files or CDF file. From this dialog box, you can press and hold Ctrl or Shift while clicking to select multiple CEL files. SeqFile is a file or matrix containing sequence information for probes on a specific type of Affymetrix GeneChip array.
Note
For details on the reading of files and GCRMA processing, see celintensityread, affyprobeseqread, affyprobeaffinities, gcrma, gcrmabackadj, quantilenorm, and rmasummary.
Expression = affygcrma(ProbeStructure, Seq) uses GCRMA background adjustment, quantile normalization, and median-polish summarization procedures to process the probe intensity values in ProbeStructure. ProbeStructure is a MATLAB structure containing information from the CEL files, including probe intensities, probe indices, and probe set IDs, returned by the celintensityread function. Seq is a matrix containing sequence information for probes on a specific type of Affymetrix GeneChip array.
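The two-input syntax can be sketched as follows. This is a minimal illustration, not a definitive workflow: the folder names in celPath and libPath are hypothetical, the file names are borrowed from the Examples section, and the use of 'PMOnly', false (so that mismatch intensities are kept) and of the SequenceMatrix field returned by affyprobeseqread are assumptions to check against the documentation for those functions.

```matlab
% Hypothetical locations; replace with folders on your own system.
celPath = fullfile(pwd, 'CELData');    % assumed folder containing the CEL files
libPath = fullfile(pwd, 'LibFiles');   % assumed folder with the CDF and sequence files

% Read probe-level data from all CEL files in celPath.
% 'PMOnly', false is assumed here so that MM intensities are also returned.
probeStruct = celintensityread('*', 'HG_U95Av2.CDF', ...
    'CELPath', celPath, 'CDFPath', libPath, 'PMOnly', false);

% Read the probe sequence information for the same array type.
seqStruct = affyprobeseqread('HG-U95Av2_probe_tab', 'HG_U95Av2.CDF', ...
    'SeqPath', libPath, 'CDFPath', libPath);

% Run GCRMA on the structure and the N-by-25 sequence matrix.
Expression = affygcrma(probeStruct, seqStruct.SequenceMatrix);
```

Reading the files once and passing the results to affygcrma can be convenient when the same probe-level data is processed several times with different property settings.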
Expression = affygcrma(..., 'PropertyName', PropertyValue, ...) calls affygcrma with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
Expression = affygcrma(CELFiles, CDFFile, SeqFile, ...'CELPath', CELPathValue, ...) specifies a path and folder where the files specified by CELFiles are stored.
Expression = affygcrma(CELFiles, CDFFile, SeqFile, ...'CDFPath', CDFPathValue, ...) specifies a path and folder where the file specified by CDFFile is stored.
Expression = affygcrma(CELFiles, CDFFile, SeqFile, ...'SeqPath', SeqPathValue, ...) specifies a path and folder where the file specified by SeqFile is stored.
Expression = affygcrma(..., 'ChipIndex', ChipIndexValue, ...) computes probe affinities using sequence information and mismatch (MM) probe intensity values from the chip specified by ChipIndexValue. Default ChipIndexValue is 1.
Expression = affygcrma(..., 'OpticalCorr', OpticalCorrValue, ...) controls the use of optical background correction on the input probe intensity values. Choices are true (default) or false.
Expression = affygcrma(..., 'CorrConst', CorrConstValue, ...) specifies the correlation constant, rho, for background intensity for each PM/MM probe pair. Choices are any value ≥ 0 and ≤ 1. Default is 0.7.
Expression = affygcrma(..., 'Method', MethodValue, ...) specifies the method to estimate the signal. Choices are 'MLE', a faster, ad hoc Maximum Likelihood Estimate method, or 'EB', a slower, more formal, empirical Bayes method. Default is 'MLE'.
Expression = affygcrma(..., 'TuningParam', TuningParamValue, ...) specifies the tuning parameter used by the estimate method. This tuning parameter sets the lower bound of signal values with positive probability. Choices are a positive value. Default is 5 (MLE) or 0.5 (EB).
Tip
For information on determining a setting for this parameter, see Wu et al., 2004.
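As a hedged illustration of the 'Method' and 'TuningParam' properties, the call below selects the empirical Bayes estimate and states its tuning parameter explicitly. The file names and the CDF folder are taken from the Examples section and are assumed to exist on your system; 0.5 is simply the documented default for 'EB', written out for clarity rather than a recommended setting.

```matlab
% Assumed: the CEL files and sequence file are in the current folder,
% and the CDF library file is in the folder below.
cdfPath = 'D:\Affymetrix\LibFiles\HGGenome';

% Use the slower empirical Bayes ('EB') signal estimate instead of 'MLE'.
ExpressionEB = affygcrma('*', 'HG_U95Av2.CDF', 'HG-U95Av2_probe_tab', ...
    'CDFPath', cdfPath, ...
    'Method', 'EB', ...
    'TuningParam', 0.5);
```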
Expression = affygcrma(..., 'GSBCorr', GSBCorrValue, ...) specifies whether to perform gene-specific binding (GSB) correction using probe affinity data. Choices are true (default) or false. If there is no probe affinity information, this property is ignored.
Expression = affygcrma(..., 'Median', MedianValue, ...) specifies the use of the median of the ranked values instead of the mean for normalization. Choices are true or false (default).
Expression = affygcrma(..., 'Output', OutputValue, ...) specifies the scale of the returned gene expression values. OutputValue can be:
'log'
'log2'
'log10'
'linear'
@functionname
In the last instance, the data is transformed as defined by the function functionname. Default is 'log2'.
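A short sketch of the 'Output' property follows. The file names and CDF folder come from the Examples section and are assumed to be available. The second call passes a function handle; @sqrt is only an illustrative choice, and the text above says no more than that the data is transformed as defined by that function, so check the result on your own data before relying on it.

```matlab
% Assumed: the CEL files and sequence file are in the current folder.
cdfPath = 'D:\Affymetrix\LibFiles\HGGenome';

% Return expression values on the linear scale instead of log2.
ExprLinear = affygcrma('*', 'HG_U95Av2.CDF', 'HG-U95Av2_probe_tab', ...
    'CDFPath', cdfPath, 'Output', 'linear');

% Return values transformed by a user-chosen function (illustrative only).
ExprCustom = affygcrma('*', 'HG_U95Av2.CDF', 'HG-U95Av2_probe_tab', ...
    'CDFPath', cdfPath, 'Output', @sqrt);
```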
Expression = affygcrma(..., 'Showplot', ShowplotValue, ...) controls the display of a plot showing the log2 of mismatch (MM) probe intensity values from a specified chip (CEL file) versus that chip's MM probe affinities. The plot also shows the LOWESS fit for computing nonspecific binding (NSB) data of the specified chip. Choices are true, false, or I, an integer specifying a chip. If set to true, the first chip is plotted. Default is false when return values are specified and true when return values are not specified.
Expression = affygcrma(..., 'Verbose', VerboseValue, ...) controls the display of the status of the reading of files and GCRMA processing. Choices are true (default) or false.
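Several properties can be combined in one call, in any order. The sketch below is illustrative only: it assumes the file names and CDF folder from the Examples section and at least two CEL files in the current folder, so that chip 2 exists.

```matlab
% Assumed: at least two CEL files and the sequence file are in the current folder.
cdfPath = 'D:\Affymetrix\LibFiles\HGGenome';

Expression = affygcrma('*', 'HG_U95Av2.CDF', 'HG-U95Av2_probe_tab', ...
    'CDFPath', cdfPath, ...
    'ChipIndex', 2, ...      % compute probe affinities from the second chip
    'Showplot', 2, ...       % plot log2 MM intensities vs. affinities for chip 2
    'Median', true, ...      % normalize with the median of the ranked values
    'Verbose', false);       % suppress status messages
```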
Examples
The following example assumes that you have the HG_U95Av2.CDF library file stored at D:\Affymetrix\LibFiles\HGGenome, and that your current folder contains the CEL files and the sequence file associated with this CDF library file. In this example, the affygcrma function reads all the CEL files and the sequence file in the current folder and the CDF file in the specified folder. It then performs GCRMA background adjustment, quantile normalization, and summarization procedures on the PM probe intensity values, and returns a DataMatrix object containing the metadata and processed data.
Expression = affygcrma('*', 'HG_U95Av2.CDF', 'HG-U95Av2_probe_tab', ...
    'CDFPath', 'D:\Affymetrix\LibFiles\HGGenome');
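The returned DataMatrix object carries the probe set IDs as row names and the CEL file names as column names. The sketch below shows one way to inspect such an object. Because the real output depends on your CEL files, it builds a small stand-in DataMatrix with made-up values and names (all hypothetical); with real data, you would use the Expression variable returned by affygcrma instead.

```matlab
% Stand-in for the affygcrma output: 3 probe sets by 2 CEL files.
% The values and names below are invented for illustration only.
vals = [7.1 7.3; 9.8 9.5; 5.2 5.0];
Expression = bioma.data.DataMatrix(vals, ...
    {'probeset_A', 'probeset_B', 'probeset_C'}, ...
    {'chip1.CEL', 'chip2.CEL'});

probeSetIDs = Expression.RowNames;   % cell array of probe set IDs
celNames    = Expression.ColNames;   % cell array of CEL file names
exprMatrix  = double(Expression);    % plain numeric matrix of expression values

% Index like an ordinary matrix; the result is still a DataMatrix object.
firstTwo = Expression(1:2, :);
```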
References
[1] Naef, F., and Magnasco, M.O. (2003). Solving the Riddle of the Bright Mismatches: Labeling and Effective Binding in Oligonucleotide Arrays. Physical Review E 68, 011906.
[2] Wu, Z., Irizarry, R.A., Gentleman, R., Murillo, F.M., and Spencer, F. (2004). A Model Based Background Adjustment for Oligonucleotide Expression Arrays. Journal of the American Statistical Association 99(468), 909–917.
[3] Wu, Z., and Irizarry, R.A. (2005). Stochastic Models Inspired by Hybridization Theory for Short Oligonucleotide Arrays. Proceedings of RECOMB 2004. J Comput Biol. 12(6), 882–93.
[4] Wu, Z., and Irizarry, R.A. (2005). A Statistical Framework for the Analysis of Microarray Probe-Level Data. Johns Hopkins University, Biostatistics Working Papers 73.
[5] Wu, Z., and Irizarry, R.A. (2003). A Model Based Background Adjustment for Oligonucleotide Expression Arrays. RSS Workshop on Gene Expression, Wye, England, https://biosun01.biostat.jhsph.edu/%7Eririzarr/Talks/gctalk.pdf.
[6] Speed, T. (2006). Background models and GCRMA. Lecture 10, Statistics 246, University of California Berkeley.
[7] Abd Rabbo, N.A., and Barakat, H.M. (1979). Estimation Problems in Bivariate Lognormal Distribution. Indian J. Pure Appl. Math 10(7), 815–825.
[8] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823–6834.
[9] Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P. (2003). Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics. 4, 249–264.
[10] Mosteller, F., and Tukey, J. (1977). Data Analysis and Regression (Reading, Massachusetts: Addison-Wesley Publishing Company), pp. 165–202.
Version History
Introduced in R2008b
See Also
affyprobeaffinities | affyprobeseqread | affyrma | celintensityread | gcrma | gcrmabackadj | mafdr | mattest | quantilenorm | rmasummary