setSubsequence

Update partial sequences

Description

newObject = setSubsequence(object,subsequences,subset,positions) returns newObject, a copy of object in which the sequence positions given by positions are overwritten with subsequences for the elements selected by subset. subsequences and subset must correspond one to one: they must contain the same number of elements, in the same order.

Examples

Store read data from a FASTQ-formatted file in a BioRead object. Set 'InMemory' to true to load the object into memory so that you can modify its properties.

br = BioRead('SRR005164_1_50.fastq','InMemory',true)
br = 
  BioRead with properties:

     Quality: {50x1 cell}
    Sequence: {50x1 cell}
      Header: {50x1 cell}
       NSeqs: 50
        Name: ''

Suppose that you want to overwrite part of the first two reads, for example, their first five positions. First, check the existing sequences.

br.Sequence(1:2)
ans = 2x1 cell array
    {'TGGCTTTAAAGCAGAACTTGTGAAAGAAGGAAAGCATTATGATTATCTGGCTAAGCTTAGCATTGTTTAGAA'                                                     }
    {'TTACACTATCCTCTGATTACCAAAGACGTTTCTCGGTCATACAGACAGTCCTTGAGCAAGGGAAGAATTTATTTGCAGGCAAAAAAGTGTCCAACCGTATCGTGAGTATCGACCGGCATTACCTT'}

Define the subsequences. Each subsequence must have the same length.

subSequences = {'ATTCG','TACTA'}
subSequences = 1x2 cell array
    {'ATTCG'}    {'TACTA'}

Update the first five positions of the first two reads. The number of positions must equal the length of each subsequence. In this example, there are five positions and each subsequence is five characters long. br2 is a copy of br with updated read sequences; br itself is unchanged. To update br itself, assign the output of the function back to br.

positions   = [1:5];
subset      = [1 2];
br2         = setSubsequence(br,subSequences,subset,positions);
br2.Sequence(1:2)
ans = 2x1 cell array
    {'ATTCGTTAAAGCAGAACTTGTGAAAGAAGGAAAGCATTATGATTATCTGGCTAAGCTTAGCATTGTTTAGAA'                                                     }
    {'TACTACTATCCTCTGATTACCAAAGACGTTTCTCGGTCATACAGACAGTCCTTGAGCAAGGGAAGAATTTATTTGCAGGCAAAAAAGTGTCCAACCGTATCGTGAGTATCGACCGGCATTACCTT'}
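
To make the effect of the call concrete, the same update can be emulated on a plain cell array. This is only an illustrative sketch in base MATLAB, not how setSubsequence is implemented; the names seqs and newSeqs and the shortened reads are hypothetical.

```matlab
% Plain-MATLAB sketch; does not call setSubsequence or need the toolbox.
seqs = {'TGGCTTTAAAGC'; 'TTACACTATCCT'; 'GGGGGGGGGGGG'};  % hypothetical reads
subSequences = {'ATTCG','TACTA'};
subset       = [1 2];
positions    = 1:5;

newSeqs = seqs;                       % work on a copy, as the function does
for k = 1:numel(subset)
    % The k-th subsequence overwrites the k-th selected read at the given positions.
    newSeqs{subset(k)}(positions) = subSequences{k};
end

disp(newSeqs{1})   % ATTCGTTAAAGC
disp(newSeqs{2})   % TACTACTATCCT
disp(newSeqs{3})   % GGGGGGGGGGGG (not in subset, so unchanged)
```

The loop pairs subset(k) with subSequences{k}, which is why the two inputs must have the same number of elements in the same order.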

Input Arguments

object - Object containing the read data, specified as a BioRead or BioMap object. If the object is not stored in memory, you cannot modify any of its properties except Name.

Example: readData

subsequences - Partial read sequences, specified as a cell array of character vectors or a string vector. All sequences must have the same length.

Example: {'TGGCTTC','AAAGCAG'}

subset - Subset of elements in the object, specified as a vector of positive integers, a logical vector, or a string vector or cell array of character vectors containing valid sequence headers.

Example: [1 3]

Tip

When you use a sequence header (or a cell array of headers) for subset, a repeated header specifies all elements with that header.
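
As a rough illustration of the tip, the following base-MATLAB sketch shows how one header can match several elements. It does not call setSubsequence, and the header values are hypothetical.

```matlab
% Plain-MATLAB sketch of header-based selection; header names are made up.
headers = {'readA'; 'readB'; 'readA'; 'readC'};  % 'readA' appears twice
subset  = {'readA'};

% Every element whose header matches is selected, not just the first one.
idx = find(ismember(headers, subset));
disp(idx')   % 1 3
```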

positions - Sequence positions to update, specified as a vector of positive integers or a logical vector. The number of positions must equal the length of every character vector or string in subsequences.

Example: [1:5]
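
The length requirement can be checked with ordinary MATLAB indexing. The sketch below is plain MATLAB and does not call setSubsequence; it assumes that a logical positions vector behaves like a standard logical mask over the sequence, which the text above does not spell out.

```matlab
% Plain-MATLAB sketch; the read and variable names are hypothetical.
seq = 'TGGCTTTAAAGC';          % hypothetical read
sub = 'ATTCG';                 % five characters, so five positions are needed

posIdx = [1 2 3 4 5];          % positions as positive integers
posMask = false(1, numel(seq));
posMask(posIdx) = true;        % the same positions as a logical mask (assumed form)

% Both forms select exactly as many positions as the subsequence has characters.
countsMatch = (numel(posIdx) == numel(sub)) && (nnz(posMask) == numel(sub));

byIndex = seq;  byIndex(posIdx)  = sub;   % overwrite using integer positions
byMask  = seq;  byMask(posMask)  = sub;   % overwrite using the logical mask
sameResult = isequal(byIndex, byMask);
```

With these inputs, both countsMatch and sameResult are true, and byIndex is 'ATTCGTTAAAGC'.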

Output Arguments

newObject - New object with updated sequences, returned as a BioRead or BioMap object.

Introduced in R2010a