geneentropyfilter

Remove genes with low entropy expression values

Syntax

Mask = geneentropyfilter(Data)
[Mask, FData] = geneentropyfilter(Data)
[Mask, FData, FNames] = geneentropyfilter(Data, Names)
geneentropyfilter(..., 'Percentile', PercentileValue)

Arguments

Data

DataMatrix object or numeric matrix where each row corresponds to the experimental results for one gene. Each column is the results for all genes from one experiment.

Names

Cell array of character vectors or string vector where each element is the name or ID of the gene in the corresponding row of Data. Names has the same number of elements as Data has rows.

PercentileValue

Property to specify the percentile below which gene data is removed. Enter a value from 0 to 100. When this property is not specified, the 10th percentile is used.

Description

Mask = geneentropyfilter(Data) identifies gene expression profiles in Data with entropy values less than the 10th percentile.

Mask is a logical vector with one element for each row in Data. The elements of Mask corresponding to rows with an entropy greater than the threshold have a value of 1, and those with an entropy less than the threshold are 0.
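To make the thresholding idea concrete, the following is a minimal sketch that scores each row with a histogram-based Shannon entropy and builds a logical mask from a percentile cutoff. It is not the toolbox implementation: the data values, the bin count (nBins), the shared bin edges, the nearest-rank percentile definition, and the 50th-percentile cutoff are all assumptions chosen for illustration.

```matlab
% Illustrative sketch only: NOT the toolbox implementation of geneentropyfilter.
% Hypothetical data: 4 genes (rows) x 6 experiments (columns).
Data = [1 1 1 1 1 1;    % constant profile
        1 2 3 4 5 6;    % evenly spread profile
        1 1 1 6 6 6;    % two-level profile
        1 1 1 1 1 6];   % nearly constant profile

nBins = 3;                                                 % assumed bin count
edges = linspace(min(Data(:)), max(Data(:)), nBins + 1);   % assumed shared bin edges

H = zeros(size(Data, 1), 1);
for i = 1:size(Data, 1)
    counts = histcounts(Data(i, :), edges);   % occupancy of each bin for this row
    p = counts(counts > 0) / sum(counts);     % empirical probabilities, empty bins dropped
    H(i) = -sum(p .* log2(p));                % Shannon entropy in bits
end

pct = 50;                                     % illustrative percentile, not the default
sortedH = sort(H);
k = max(1, ceil(pct / 100 * numel(H)));       % nearest-rank percentile (assumed definition)
threshold = sortedH(k);

Mask = H >= threshold;                        % true = keep the row, false = remove it
```

In this sketch the constant profile has zero entropy and is the only row that falls below the cutoff, so it is the only row whose mask element is 0.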

[Mask, FData] = geneentropyfilter(Data) returns FData, a filtered data matrix. You can also create FData using FData = Data(Mask,:).

[Mask, FData, FNames] = geneentropyfilter(Data, Names) returns FNames, a filtered names array, where Names is a cell array of character vectors or string vector of the names of the genes corresponding to each row of Data. You can also create FNames using FNames = Names(Mask).

Note

If Data is a DataMatrix object with specified row names, you do not need to provide the second input Names to return the third output FNames.
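The two indexing shortcuts mentioned above, FData = Data(Mask,:) and FNames = Names(Mask), rely only on logical indexing. The sketch below shows them on a small made-up data set. The values, gene names, and mask are hypothetical and stand in for what geneentropyfilter would return.

```matlab
% Hypothetical 4-gene, 3-experiment data set (values are illustrative only).
Data  = [1 2 3; 5 5 5; 0 4 9; 2 2 3];
Names = {'geneA'; 'geneB'; 'geneC'; 'geneD'};

% Hypothetical mask, standing in for the first output of geneentropyfilter:
% 1 = keep the row, 0 = remove it.
Mask = logical([1; 0; 1; 1]);

FData  = Data(Mask, :);   % keep only the rows flagged true
FNames = Names(Mask);     % keep the matching gene names
```

Because the same mask indexes both variables, the rows of FData and the elements of FNames stay aligned.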

geneentropyfilter(..., 'Percentile', PercentileValue) removes from the experimental data in Data the gene expression profiles with entropy values less than the percentile specified by PercentileValue.
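As a usage sketch for the 'Percentile' property, the following call filters the yeast data set that ships with the toolbox. The cutoff of 30 is an arbitrary value chosen for illustration, and the output variable names are assumptions.

```matlab
load yeastdata   % provides yeastvalues, genes, and times

% Remove profiles whose entropy falls below the 30th percentile
% (30 is an arbitrary illustrative value).
[mask, fyeastvalues, fgenes] = geneentropyfilter(yeastvalues, genes, ...
    'Percentile', 30);

% Report how many genes passed the filter.
fprintf('Kept %d of %d genes\n', nnz(mask), numel(mask));
```

A higher percentile removes more genes, so it can help to check the retained count before continuing with downstream analysis.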

Examples

  1. Load the MAT-file, provided with the Bioinformatics Toolbox™ software, that contains yeast data. This MAT-file includes three variables: yeastvalues, a matrix of gene expression data; genes, a cell array of GenBank® accession numbers for labeling the rows in yeastvalues; and times, a vector of time values for labeling the columns in yeastvalues.

    load yeastdata
  2. Remove genes with low entropy expression values.

    [mask, fyeastvalues, fgenes] = geneentropyfilter(yeastvalues,genes);

References

[1] Kohane, I.S., Kho, A.T., and Butte, A.J. (2003). Microarrays for an Integrative Genomics. Cambridge, MA: MIT Press.

Version History

Introduced before R2006a