DNA N-Gram Distribution - MATLAB Cody - MATLAB Central

Problem 79. DNA N-Gram Distribution

Given a string s and a positive integer n, find the most frequently occurring n-gram (substring of length n) in s. An n-gram may begin at any position in the string, so consecutive n-grams overlap. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for s = 'AACTGAACG' and n = 3, we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice, the answer, hifreq, is 'AAC'. There will always be exactly one highest-frequency n-gram.
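The sliding-window idea above can be sketched in a few lines. This is a Python illustration of the logic only, not a MATLAB Cody submission. The function name hifreq (borrowed from the output name in the problem) and the input checks are assumptions for illustration. A string of length L yields L - n + 1 overlapping n-grams, one per start position.

```python
from collections import Counter


def hifreq(s: str, n: int) -> str:
    """Return the most frequent overlapping n-gram of s.

    Hypothetical helper for illustration. Assumes 1 <= n <= len(s) and,
    as the problem guarantees, a single highest-frequency n-gram.
    """
    if n < 1 or n > len(s):
        raise ValueError("n must satisfy 1 <= n <= len(s)")
    # Slide a window of width n one character at a time:
    # start positions 0 .. len(s) - n give len(s) - n + 1 n-grams.
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    # Count occurrences of each distinct n-gram.
    counts = Counter(grams)
    # most_common(1) returns a one-element list [(gram, count)].
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(hifreq("AACTGAACG", 3))  # AAC
```

Counting with a hash map keeps the work linear in the number of windows (times n for slicing). A MATLAB solution would typically follow the same pattern: build the windows, find the unique n-grams, and count them.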
This problem was originally inspired by a MATLAB Newsgroup discussion.

Solution Stats

63.87% Correct | 36.13% Incorrect
Last Solution submitted on Jan 07, 2025

Recent Solvers: 1355
