Problem 797. Genome Sequence 003R: Sequence DNA from randomly positioned and flipped segments
This is Challenge 003R in the series on Genome DNA Sequencing. Challenge 003R adds flipped (reversed) segments. DNA sequencing is naively simplified here into Cody Challenges. The wiki page on genome sizes is another interesting read.
DNA is represented by the symbols ACGT, which for MATLAB are encoded as 0123. The basic goal is to reconstruct the original string of ACGT from multiple short segments.
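The encoding above can be sketched in a few lines. This is a minimal Python illustration rather than MATLAB, and the helper names `encode` and `decode` are hypothetical; the only thing taken from the problem text is the mapping A=0, C=1, G=2, T=3.

```python
# Symbol order defines the integer code: A=0, C=1, G=2, T=3.
SYMBOLS = "ACGT"


def encode(dna):
    """Convert a DNA string such as 'ACGT' into a list of codes [0, 1, 2, 3]."""
    return [SYMBOLS.index(ch) for ch in dna]


def decode(codes):
    """Convert a list of codes back into a DNA string."""
    return "".join(SYMBOLS[c] for c in codes)


if __name__ == "__main__":
    print(encode("ACGT"))
    print(decode([0, 1, 2, 3, 1, 2]))
```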
Example: Genome = ACGTCGGCCATGGACATTACG
Given three overlapping pieces, ACGTCGGCCA, GCCATGGACATC, and GACATTACG, these can readily be seen to overlap and recreate the original, provided the middle piece is recognized as being reversed and as having asymmetric overlaps with its adjacent segments.
ACGTCGGCCATGGACATTACG
ACGTCGGCCAsssssssssss
ssssGCCATGGACATCsssss Middle Rev
ssssCTACAGGTACCGsssss Middle flipped
ssssssssssssGACATTACG
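Checking whether two pieces join, in either orientation, might look like the Python sketch below. The function names `overlap_len` and `best_join` are hypothetical, and the pieces in the usage lines are illustrative substrings cut directly from the example genome, with the middle one supplied flipped.

```python
def overlap_len(left, right, min_overlap):
    """Largest k >= min_overlap with left[-k:] == right[:k], or 0 if none."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def best_join(left, right, min_overlap):
    """Try `right` as given and reversed; return (overlap, was_flipped)."""
    as_given = overlap_len(left, right, min_overlap)
    flipped = overlap_len(left, right[::-1], min_overlap)
    if flipped > as_given:
        return flipped, True
    return as_given, False


if __name__ == "__main__":
    # Illustrative pieces: genome[0:10], genome[6:18] reversed, genome[12:21].
    first, middle_flipped, last = "ACGTCGGCCA", "TTACAGGTACCG", "GACATTACG"
    print(best_join(first, middle_flipped, 3))
    print(best_join(middle_flipped[::-1], last, 3))
```

With these pieces the two joins use different overlap lengths, which is the asymmetry the example refers to.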
Genome_003R Challenge is to reconstruct a genome under near ideal error free segment creation conditions. Segments may be reversed. The segments start at random locations.
- Segments may be reversed (Change from 003)
- Segments start at random positions
- Genome length is unconstrained
- Length of each segment: 48
- Adjacent segments overlap by 16 to 47 characters
- No errors in the segments
- Genome is random (no two segments share the same first 16 or last 16 symbols)
- Segment order is scrambled
Input: segs, an array of M rows, each a 48-value segment. Values are [0, 1, 2, 3].
Output: Gout, Genome vector of values [0,1,2,3]
Example: [0 1 2 3 2; 1 2 2 3 2; 2 2 1 1 2] creates [0 1 2 3 2 2 1 1 2], with W=5, Overlap=varies, and the middle row reversed (reverses may occur).
Future: Flipped Segments (002), Random Position of Segment Start Locations (003), Random Segment Positions with Flips (003R), Parrot Sequence with Gen3 Long Segments, Extra Segments, Phage Phi X174, Parallel Processing Simulation (Shot Gun Approach), Haemophilus Influenza, Sequence with Segment Errors, and Chromosome 20 with its 59M length using 100K 4K-segments
Problem Comments
Are Mjaavatten
on 19 Jul 2021
The previous three sequencing problems are all medium to hard. Once you have solved those, this one is easy.