File Exchange

image thumbnail

Longest Common Subsequence

version 1.4.0.0 (1.68 KB) by David Cumin
Gives the longest common substring between two stings.

14 Downloads

Updated 27 Jul 2011

View License

%%%INPUT
%%%X, Y - both are strings e.g. 'test' or 'stingtocompare'
%%%OUTPUT
%%%D is the substring over the length of the shortest string
%%%dist is the length of the substring
%%%aLongestString is a sting of length dist (only one of potentially many)

Comments and Ratings (12)

Common substring and common subsequence are different things. For example wiki:

"It (LCS) differs from problems of finding common substrings: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences."

I need matlab code for Shortest Common Supersequence (SCS). Can anyone help please?

Mansoor Ahmed

Really helpful code. Thanks for sharing. Stay blessed!

David Cumin

Mohammed,
Eamon Keogh has an elegant algorithm for converting time series into strings for evaluating and manipulating using these sorts of techniques, called sax. Have a look at http://www.cs.ucr.edu/~eamonn/SAX.htm.
Regards,
David

Hi. I am wondering how this method can be used for continues time series (i.e. not string series)? Thanks

Thank you, very useful submission. I tweaked this slightly to work on cell arrays of objects, using a user-provided "equals" function handle, in hopes of implementing a tree diff utility using this algorithm, which requires the use of LCS: http://ilpubs.stanford.edu:8090/115/1/1995-46.pdf

Bruno Luong

Bruno Luong

Thank you, I see now. I suggest to update the help and describe more clearly what function does. It is also good if you could add few words about memory requirement/complexity and algorithm.

David Cumin

Bruno,
Thanks for your comments. The LCS('fbce','abcde'); gives the right answer. The code is designed to find the longest common substring of two given inputs. In this example, both 'fbce' and 'abcde' contain 'bce':
fbce -> '-bce'
'abcde' -> '-bc-e'
Hope that makes sense.

The technique is common to pattern matching techniques. I'm not sure of the limit to the function. I guess it depends on memory.

Cheers,
David

Bruno Luong

I still do not function understand what the function does:

X = [ 8 8 9 3 7 1 2 4 5 1 ]
Y = [ 7 6 6 10 2 7 9 4 1 1]

[D, dist, aLongestString] = LCS(X,Y)
D = 0.6000
dist = 4
aLongestString = [ 7 2 4 1]

% Furthermore what is the limit of the function?

>> X=ceil(10*rand(1,1e4));
>> Y=ceil(10*rand(1,1e4));
>> [D, dist, aLongestString] = LCS(X,Y)
??? Out of memory. Type HELP MEMORY for your options.

Error in ==> LCS at 20
b = zeros(n+1,m+1);

Loginatorist

Something is still wrong.

[D,G,S] = LCS('fbce','abcde');S
S =
bce

Bruno Luong

Code seems Buggy, even for the example provided

>> X='test', Y='stingtocompare'

X =

test

Y =

stingtocompare

>> [D, dist, aLongestString] = LCS(X,Y)
??? Attempted to access b(2,0); index must be a positive integer or logical.

Error in ==> LCS at 55
if(b(i,j) == 3)

>>

Updates

1.4.0.0

Changed the title to better represent the function. http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

1.3.0.0

Included support for 0 similarity

1.2.0.0

Now will also work for integer inputs (not only strings).

1.1.0.0

Sorry - a simple operator change fixed that. Should work now. The answer to your 'test' 'stingtocompare' is [0.5 2 'st'].
Thanks for pointing out the error!

MATLAB Release Compatibility
Created with R2006b
Compatible with any release
Platform Compatibility
Windows macOS Linux

Discover Live Editor

Create scripts with code, output, and formatted text in a single executable document.


Learn About Live Editor