Combing Algorithm: Finding Unique Equaltiy between Vectors of Different Sizes Using Ranges

I need to make an algorithm in MATLAB that can compare two vectors of different sizes, but of the same type of data. The algorithm goes as follows:
  1. There are 3 times as many variables in vector B as there are in vector A.
  2. We are trying to figure out what offset of vector A makes all of the elements in vector A equal to the same proportion of elements in vector B. In other words, A(1)=B(1), A(2)=B(4), and so on for each element in A; basically A needs to have an equal in B.
  3. Elements in vector A can be within certain ranges of vector B, where those ranges are defined by another vector we'll call vector C. To understand this visually, think of a single index of B as a static block that doesn't move, and an index of A as a bristle that must fit in the range of any given index B where each respective value in the offset vector A fit into specific indexes of vector B.
  4. To find this fit, these instances where A falls between [B,B+C] are then saved as a conditional (0s&1s) vector called "instances" which is then compared to a static instance vector called "siginstances". If instances == siginstances, then the offset of A has been found, else continue to offset A until a given number or until a match has been found.
To better understand the approach of this algorithm, you may want to consider that the vectors mentioned are being used to compare times caught by two different computers recording the same information. The relative algorithm goes as follows:
%Postive Incrimentation%
%test the offset of sigA between 0 and 90 seconds after sigA
IncrementA = sigA;
for a = 0.0:.000001:90
IncrementA = sigA+a; %offset vector A
instance = (IncrementA>=sigB) & (IncrementA<=sigB+sigC);
%see if the values of vector A are within [sigB,sigB+sigC]
%Think of sigB as a block and sigA as bristles. The proper bristles of sigA need to fit into
%the range of block sigB, where each of these blocks and bristles are a single index of that vector.
%The range in which an index of sigA mus fit into sigB is defined by sigC, whose size is the same
%as sigB, but contains different ranges for each instance of B.
if (instance == siginstances)
%compare the resulting conditional vector to the static vector; if true print out offset,
%else continue algorithm till an offset is found, if any.
disp(a);
break;
end %end if statement
end %end for statement
%use the same approach for decrementation (aka subtracting a from sigA)
I hope that this approach and request is clear to you so that you can help me create the proper algorithm that can properly process the above mathematical nightmare in MATLAB. Let me know if you need further clarification. Thank you.

5 commentaires

I think it would be really helpful if you provided an example which explains how the algorithm should work, and what the desired output would be.
Do you want unique variables?? use unique(A(1));
Please explain mathematically.
I'm being pretty clear about what I want. Its an algorithm that finds a match between two timestamps that don't necessarily line up 100%. One set of time acts like blocks while the other acts like a "comb", or prongs. Certain prongs need to fit in to certain boxes, (but not all prongs need to fit in boxes since some are noise) in order to create a "test" truth vector for the smaller array. When true, the resultant "test" truth vector (1s & 0s) should match that of a static truth vector that is already known from eliminating noise.
Theres really no other way to explain it.
No I don't want "unique" variables. The timestamps are already sorted numerically by ascending time in seconds.milliseconds and have few, if any repetitions. I am simply looking to match two separate timestamps is all to find their correlation.
Midimistro - is this homework? While you have explained the problem that you need to solve, you haven't indicated where in the algorithm you are getting stuck. Please clarify which part is causing you a problem and please show what you have tried so far.
I wouldn't say I am necessarily stuck, but rather unsure if I am taking the right approach in the code above. I want an efficient algorithm that can analyze several data points quickly and find an alignment point were all the points align according to the process described in my first comment.
And this more for personal confidence in my programming and analysis skills more than anything. Not Homework.

Connectez-vous pour commenter.

Réponses (1)

Despite your claim, it's not entirely clear what you want. Particularly I've not understood at all what your truth vectors correspond to. A simple example would really help.
If I understood correctly you have as input something like:
A = [1 4 7 10];
B = 0:12;
C = [0.5 1 2 0.5 0 1 2 1 1 0.5 1 0 1];
Creating a truth table that tells you which element of A is between which elements of [B, B+C] is easy:
inrange = bsxfun(@ge, A', B) & bsxfun(@le, A', B+C)
The rows of inrange corresponds to elements of A, the columns correspond to elements of B. A 1 indicates that the corresponding element of A is within the range of B. If you have several 1 in a row, that means that element of A can match several elements of B.
You can create all combinations of potential match with ndgrid:
Bmatches = cellfun(@find, num2cell(inrange, 2), 'UniformOutput', false);
[Bmatches{:}] = ndgrid(Bmatches{:});
Bcombs = reshape(cat(numel(A)+1, Bmatches{:}), [], numel(A))
Each row of BCombs is a combination of B indices that match A
Note that ndgrid will run out of memory for A with a moderate number of elements and you may have to use a loop instead to generate the combinations.

8 commentaires

I do have a lot of data I want to process, about 66000 data points at minimum to be precise. This would work if I could figure out the memory issue with bsxfun. And could you also explain the significance of why we use A' instead of A?
There shouldn't be any memory issue with bsxfun. It generates a logical array of size numel(A) x numel(B), so assuming a B of 66000 and A of 10000 elements, it's using about 660 MB of memory
It's the second part of my answer that may run quickly out of memory if there are two many possible combinations that match (irrespective of the size of A and B). I've still no idea what output you want, so I'm not going to come up with something different which may not be what you want.
A has to be transposed, so one input of bsxfun is a row vector, and the other a column vector. You could transpose B instead, in which case the rows of the truth table correspond to B values, and columns to A values.
Say I have an time array "A" that contains both noise and non-noise, in which each instance acts as a prong. I use the ge function to filter out noise, which generates a truth vector (1s & 0s) of the same size called "staticTruth". I have another time array called "B", which contains up to 3 times as many non-noise data points as "A", but contains only non-noise in the form of blocks (hence the need for evaluation of less than and greater than). Say "C" is the same size as B and contains the time added to B, thus creating the block. What the algorithm is supposed to do is see if A is between B & C to generate a "changingTruth" vector (of 1s & 0s) that at some point should match up with "staticTruth" only when A & B are aligned. Does that make it more clear? I can't get much simpler than that.
Note that the size of A (aka, the prongs; with noise) is ~66000 and the size of B & C (aka, the blocks; no noise) are ~63000. Not all prongs should fit in the blocks, hence the use of the two "Truth" vectors.
I hope that clears up confusion. The output I am looking for from the comparison is a truth vector the same size as "A", so that it can be compared to the "staticTruth" vector.
I'm afraid I still don't understand. An example with numbers would go a long way in making it clearer.
  • I don't know what you mean by an instance of A and why there would be multiple instances
  • I don't understand how a scalar number can be qualified as noise or non-noise
  • I don't understand the relationship between time and data points. Is B an array of time or data points? If you add time to B, what is noise / non-noise in relation to time?
  • There are many more things I don't understand fully. Words are not always the best way to convey algorithms.
Here Is an example of the vectors:
  • ATime: contains the time for a signal reading (in this case offset by 1 second, which is unknown to me, the analyzer)
  • Example: ATime = {0.012, 0.023, 0.032, 0.042}
  • ASig: Same size as ATime containing signal values. Includes noise (aka -100).
  • Example: ASig = {-14, -100, -14.2, -14.1}
  • ATruth: A truth vector same size as ASig and ATime that is used to filter out the noise, where noise is 0.
  • Example: ATruth = {1, 0, 1, 1}
  • BTime: A vector that contains only time of the same reading, but contains no noise. This vector also contains up to 3x as many no-noise points.
  • Example: BTime = {1.012, 1.013, 1.014, 1.022, 1.024, 1.025, 1.032, 1.033, 1.040, 1.041, 1.042}
  • BLife: A vector that contains the life of Btime and is same size as BTime:
  • Example: BLife ={.001, .001, .003, .001, .002, .003, .001, .004, .001, .001, .005}
  • CTime: BTime+BLife (no need for values here since its simple math.) To save memory, do this calculation outside the loop.
The purpose of the algorithm is to generate a truth vector when the prongs, or ATime, fit in the so-called blocks (space between BTime and CTime). This truth vector needs to be the same size as vector ATime.
For example when the times are not aligned, we get the following vector, which is not equal to ATruth:
  • Example: DynamicTruth = {0, 1, 0, 0}
Basically, we keep adding .001 to a temporary vector called Atemp, which contains the original time of ATime, plus however many .001s that have been added by the loop as a result of a failed match between DynamicTruth and ATruth. When the times finally align, DynamicTruth should equal ATruth, thus displaying the output of how much time was added to ATime, even if its up to 100 seconds.
In other words, the algorithm should find that the ATime = {0.012, 0.023, 0.032, 0.042} should become ATime = {1.012, 1.023, 1.032, 1.042} because of the 1 second offset. Does that make sense? I seriously can't get much clearer than that.
Even though it is helpful and provides an output, I am unsure if bsxfun will work in my case (based on its documentation) since all my vectors are simple arrays (aka one_column X Many_Rows). The issue with my original code in the question above is that sigB is not the same size as IncrementA, thus the le and ge operations couldn't be done. Is there any other function other than bsxfun that can get around this, while still providing the proper array format?
I've gotten the bsx function to work (after a few minor recalculations elsewhere), however I am not sure its doing what I want it to do. Please verify that what you provided me will work in tandem with the code I provided in the question (replacing the lessthan/greaterthan protion, of course).
Specifically, the dynamicTruth (the truth array generated by the evaluation) needs to also be in "one_column X Many_Rows" format so that it can be compared to the staticTruth array. This doesn't seem to be the case with the bsx function.
Your latest explanation does make things a lot clearer, thanks. But, unless I've misunderstood something, it seems to me that you're over complicating things a bit with your noise/non-noise and truth vectors.
You could just remove the noise values from ATime, and then try to match the resulting ATime with BTime. No need for DynamicTruth and you don't care about the signal values anymore.
My bsxfun method would work in combination with your loop approach:
ATime = [0.012, 0.023, 0.032, 0.042];
ASig = [-14, -100, -14.2, -14.1];
ATimeValid = ATime(ASig > -100);
BTime = [1.012, 1.013, 1.014, 1.022, 1.024, 1.025, 1.032, 1.033, 1.040, 1.041, 1.042];
BLife = [.001, .001, .003, .001, .002, .003, .001, .004, .001, .001, .005];
BUpper = BTime + BLife;
for offset = 0 : .001 : 90
AOffset = ATimeValid + offset;
if all(any(bsxfun(@ge, AOffset, BTime') & bsxfun(@le, AOffset, BUpper')))
fprintf('found a valid offset: %g\n', offset);
end
end
I'm not convinced that a trial and error method is the most efficient way of going about it. You're basically trying to match two signals, one being sampled at a lower and random frequency. I'm sure that there are some more efficient methods of doing that.
The approach above would work fine if neither set of data was prone to error in its data points. However, because of the nature of my data, it seems that even with the offset, certain points will never align properly (in other-words, some prongs (even w/o noise), may very well be an erroneous/inaccurate (late) data point.
This means that bsxfun will never produce a matrix of all 1s. For this reason, I still need to take the same approach, but instead of finding an exact alignment, I need to produce a vector of percentages of when a non-noise prong of A falls between the block of BTime/BUpper to a certain degree (doesn't need to be 100%). We can then use the highest percentage to find the best fit offset.
Is an approach you and I took above similar to what should be done, or is there an extremely simple approach for getting this percentage that is completely flying over my head?
Note that getting the percentage of points that are aligned, so that we can use those percentages to be compared to the offset, (were the highest percentage is the best offset)is the main goal of the algorithm now.
Relative Code:
Increment = a; %clear/define Increment before
percent = double.empty; %define an empty vector to to append each resulting percentage value
offset = double.empty; %define an empty vector to to append each offset value
for a = 0:.000001:90
Increment = A+a;
%find the percentage of points in Increment that meet this condition:
(Increment >= B) & (Increment <= C);
%this "finding" function is the part that I am stumbling on. Then continue below:
offset = [offset a]; %adds an additional element of the offset "a" to the growing vector of "offset" to be used for later comparison
percent = [percent a]; %does same thing as previous line.
end
Hope this slightly new approach is clear to you.

Connectez-vous pour commenter.

Catégories

Produits

Question posée :

le 22 Déc 2015

Modifié(e) :

le 22 Jan 2016

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by