Problem 939. DNA Pattern Match: Performance Metric - Speed
The Challenge is to Rapidly find matches of DNA sequences, Length=6, in a 1,800,000 long DNA file.
At IMACST the paper An Intelligent and Efficient Matching Algorithm to Finding a DNA Pattern claimed an astounding time improvement from 9.94 seconds to 7.84 seconds, 21% time reduction, to match six segments of length 6 in a 1.8M long DNA file. Basic probability asserts 1.8M/4^6 * 6 = 2637 matches. The paper's test case produced 2346 matches. The method employed used text processing in C++. The paper's L=25 and L=50 cases will be later challenges.
Matlab can achieve matching a six pattern set of L=6 in <15 msec (i5/16GB). This is merely a 99.8% time reduction.
Challenge Description: DNA is made of letters ACGT, wiki DNA, which for the purposes of this Matlab Cody Challenge are given values 0 thru 3. (ACGT= 0123)
Input: [DNA, DNA_ID, Patterns]
- DNA is a 1.8M long uint8 row vector of values 0 thru 3.
- DNA_ID identifies the DNA segment being processed. Multiple calls using the same DNA_ID will be performed. The first call of a DNA_ID is not timed.
- Patterns is an Nx6 uint8 array where each row corresponds to a search pattern.
Output: Locations
Locations of all start indices that match any of the patterns
Scoring: Average Time (msec) for a block of L=6 patterns
Example:
- DNA= [0 1 2 3 3 2 1 0 1 2 3 3]
- Pattern = [2 3 3 2 1 0]
- Locations = [3]
Hints:
- Vectorization (Base 4 to Base 10 of 6 character words)
- Bitshift/Reshape to create all words
- Logical Indexing
Coming soon: Genome DNA sequencing of PhagePhix174 and Haempphilus Influenza
Solution Stats
Problem Comments
Solution Comments
Show commentsProblem Recent Solvers8
Suggested Problems
-
Which values occur exactly three times?
5089 Solvers
-
Replace NaNs with the number that appears to its left in the row.
2976 Solvers
-
433 Solvers
-
Are all the three given point in the same line?
415 Solvers
-
Create an n-by-n null matrix and fill with ones certain positions
602 Solvers
More from this Author308
Problem Tags
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!