Identification of a curve

alm56 on 28 Jan 2022
Commented: Star Strider on 3 Feb 2022
Hi
I'm working on a system that identifies a given curve from a source by comparing its data against a database of hundreds of signals. Each curve is stored as 8 bytes, each byte in its own column.
The first image, named SOURCE, is the curve I'm looking for in the database, and the one named DATA TO FILTER is stored in a column of data with a timestamp. The two are very similar and differ only in scale.
Is there any way to extract some kind of ID from the source curve, and then compare it against and find it in the data to filter? As I mentioned, this is a repetitive process over more than 900 samples. Any help would be greatly appreciated.
Mathieu NOE on 31 Jan 2022
Hello,
Why not use cross-correlation?
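Mathieu's cross-correlation suggestion can be sketched in base MATLAB as follows. This is a minimal illustration, not code from the thread: sourceCurve and targetCurve are hypothetical equal-length vectors, where targetCurve is a delayed, scaled, and offset copy of sourceCurve, and the search range of 5 samples is an arbitrary assumption. At each lag it computes the Pearson correlation of the overlapping samples (a normalized cross-correlation), so the score does not depend on the signals' scale or offset.

```matlab
% Hypothetical signals: targetCurve is sourceCurve delayed by 3 samples,
% scaled by 2 and offset by +5.
sourceCurve = [0 1 3 7 4 2 1 0 0 0];
targetCurve = [0 0 0 2*sourceCurve(1:end-3)] + 5;

% Pearson correlation of two equal-length vectors: unaffected by scale and offset.
pearson = @(a, b) sum((a - mean(a)) .* (b - mean(b))) / ...
                  sqrt(sum((a - mean(a)).^2) * sum((b - mean(b)).^2));

n = numel(sourceCurve);
maxLag = 5;                               % assumption: search shifts of up to 5 samples
lags = -maxLag:maxLag;
score = zeros(size(lags));
for k = 1:numel(lags)
    L = lags(k);
    if L >= 0                             % compare sourceCurve(i) with targetCurve(i+L)
        a = sourceCurve(1:n-L);
        b = targetCurve(1+L:n);
    else
        a = sourceCurve(1-L:n);
        b = targetCurve(1:n+L);
    end
    score(k) = pearson(a, b);             % normalized cross-correlation at lag L
end
[bestScore, kBest] = max(score);
bestLag = lags(kBest);                    % expected: 3, with bestScore close to 1
```

A score close to 1 at some lag suggests the same shape at that shift, regardless of amplitude.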
alm56 on 3 Feb 2022
I'll give it a shot and let you know how it goes.


Answers (3)

Star Strider on 31 Jan 2022
If they are the same length, I would use either pdist2 or knnsearch to identify them. Since they have different amplitudes, it would likely be best to subtract the mean (or median) from each of them first, so that the relative amplitudes are not a factor. Using rescale or normalize on all of them might also improve the probability of finding a matching signal.
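As a rough illustration of this answer without toolbox functions, the sketch below z-scores each curve (subtracts its mean and divides by its standard deviation, which removes both offset and scale) and then finds the nearest stored curve by Euclidean distance. The library and query values are hypothetical. With the Statistics and Machine Learning Toolbox, normalize(library, 2) should give the same row-wise z-scores, and knnsearch(libZ, queryZ) or pdist2(libZ, queryZ) should pick out the same row.

```matlab
% Hypothetical library: each row is one stored curve of 8 byte values (P1..P8).
library = [ 10  40  90 160  90  40  10   0;
           200 150 100  50  20  10   5   0;
             0  30  60  90 120 150 180 210];
% Hypothetical query: same shape as row 1, scaled by 1.5 and shifted up by 20.
query = 1.5 * library(1,:) + 20;

% z-score each row so that amplitude and offset no longer affect the comparison.
zrows  = @(X) (X - mean(X, 2)) ./ std(X, 0, 2);
libZ   = zrows(library);
queryZ = zrows(query);

% Euclidean distance from the query to every library row (implicit expansion).
d = sqrt(sum((libZ - queryZ).^2, 2));
[dMin, bestRow] = min(d);                 % expected: bestRow = 1, dMin close to 0
```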
alm56 on 3 Feb 2022
Right after q = LD.q, you need to select an ID, for instance:
my_ID = 304;
pedix0 = find(q{:,'ID'} == my_ID);   % row indices of my_ID
t0 = q{pedix0,'Timestamp'};           % time variable for my_ID
myP0 = q.P1(pedix0);                  % variable to work with (P1 values for my_ID)
Then apply the process you just did to the variable myP0.
Star Strider on 3 Feb 2022
I’m confused.
The IDs refer to rows; however, the code with the ‘P’ references implies columns.
Matching the columns against each other produces a huge matrix. Selecting one ID at a time and comparing it with the rest is straightforward with pdist2, although it will likely take a while.
See if this produces the desired result:
Uz = unzip('https://www.mathworks.com/matlabcentral/answers/uploaded_files/883280/Sample.zip')
Uz = 1×2 cell array
{'Sample/Sample1.m'} {'Sample/Workspace.mat'}
% mcode = fileread(Uz{1})
LD = load(Uz{2})
LD = struct with fields:
    SampleS1: [262×3 table]
       pedix: [62602×1 double]
           q: [318986×13 table]
q = LD.q
q = 318986×13 table
    Timestamp      ID     P1     P2     P3     P4     P5     P6     P7     P8    PayloadLength    Type    Name
    _________    ____    ___    ___    ___    ___    ___    ___    ___    ___    _____________    ____    ____

            0    1170     64    254     15      0      0      0      0    136          8           "S"     ""
     0.001112     304    255    255    255    255     17      0      9    221          8           "S"     ""
     0.001357     320    255    255      1    255     40      0      9    243          8           "S"     ""
     0.003115     688    202    255      2      7    154      0      0      0          5           "S"     ""
     0.003355     897    128      0     64      0      0    213    149      0          8           "S"     ""
     0.003601     902      0    192      0    192    255    127    255    127          8           "S"     ""
     0.003837     903     13    230    255    255      0    251     10      0          8           "S"     ""
     0.004083     906     27      0      0      0      0      0      0      0          8           "S"     ""
     0.005167     899      0     32    198    125    126      0      0    229          8           "S"     ""
     0.006933     790     49     29    232     10     29     19      0    127          8           "S"     ""
     0.007149     512      9      0      0      0      0      0      0      0          6           "S"     ""
     0.007389     399      0    144     29      0      0     70      0      0          8           "S"     ""
     0.007621     608      1     29     57     48     29    126    133     93          8           "S"     ""
     0.007851     809     15    185     94    140     17     55     14     12          8           "S"     ""
     0.011123     304    255    255    255    255     17      0     10    250          8           "S"     ""
     0.011367     320    255    255      1    255     40      0     10    212          8           "S"     ""
qm = q{:,3:10} % Matrix Of 'P' Values
qm = 318986×8
     64    254     15      0      0      0      0    136
    255    255    255    255     17      0      9    221
    255    255      1    255     40      0      9    243
    202    255      2      7    154      0      0      0
    128      0     64      0      0    213    149      0
      0    192      0    192    255    127    255    127
     13    230    255    255      0    251     10      0
     27      0      0      0      0      0      0      0
      0     32    198    125    126      0      0    229
     49     29    232     10     29     19      0    127
format shortG
% return
qmn = normalize(qm,2) % Normalize each row to z-scores (dim 2: across P1..P8)
qmn = 318986×8
   0.058135     2.1131   -0.47184   -0.63408   -0.63408   -0.63408   -0.63408    0.83688
    0.77558    0.77558    0.77558    0.77558    -1.1348    -1.2712     -1.199    0.50267
    0.95402    0.95402    -1.0201    0.95402   -0.71697    -1.0279    -0.9579    0.86075
     1.1535     1.6445    -0.6995   -0.65318    0.70877   -0.71803   -0.71803   -0.71803
    0.69734   -0.82197  -0.062315   -0.82197   -0.82197     1.7062     0.9466   -0.82197
    -1.4219    0.48056    -1.4219    0.48056     1.1048   -0.16349     1.1048   -0.16349
   -0.87724    0.79627    0.98907    0.98907    -0.9775    0.95822   -0.90038    -0.9775
     2.4749   -0.35355   -0.35355   -0.35355   -0.35355   -0.35355   -0.35355   -0.35355
   -0.94949   -0.60714     1.1688    0.38782    0.39852   -0.94949   -0.94949     1.5005
   -0.16264   -0.41528      2.149   -0.65528   -0.41528    -0.5416    -0.7816    0.82266
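tic
C = cell(1, 1000);                      % preallocate one result per compared row
for k = 1:1000 %size(q,1)
    qmnID = q.ID(k);                    % ID of the row being compared
    % 1-by-N distances from row k to every row of qmn
    dist = pdist2(qmn(k,:),qmn,'euclidean','Smallest',1);
    idx = find(dist == min(dist));      % exact matches (distance 0, including row k itself)
    if numel(idx) == 1                  % only the self-match: use the nearest nonzero distance
        distnz = min(dist(dist~=0));
        idx = find(dist == distnz);
    end
    C{k} = [qmnID, dist(idx); qmnID idx];   % row 1: ID and distances; row 2: ID and row indices
end
toc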
Elapsed time is 7.277300 seconds.
C{1:50:end}
ans = 2×1304
    1170 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1170 1 244 493 738 981 1226 1470 1715 1962 2206 2451 2694 2944 3189 3430 3676 3924 4167 4413 4656 4901 5148 5393 5637 5880 6126 6368 6618 6863
ans = 2×15689
    512 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    512 8 32 51 59 64 67 76 86 89 97 107 110 119 130 133 142 152 155 178 183 206 229 251 272 303 308 331 353 376
ans = 2×1628
    304 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    304 101 300 494 694 883 1083 1281 1471 1670 1861 2061 2256 2452 2650 2841 3041 3239 3431 3632 3821 4021 4220 4416 4614 4806 5006 5203 5398 5593
ans = 2×5
    903 0 0 0 0
    903 151 521 888 1254
ans = 2×1628
    320 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    320 3 201 393 594 792 983 1183 1372 1572 1770 1964 2162 2354 2553 2751 2947 3145 3334 3535 3732 3926 4122 4312 4514 4710 4906 5104 5295 5496
ans = 2×15689
    906 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    906 8 32 51 59 64 67 76 86 89 97 107 110 119 130 133 142 152 155 178 183 206 229 251 272 303 308 331 353 376
ans = 2×1628
    320 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    320 102 301 495 695 884 1084 1282 1472 1671 1862 2062 2257 2453 2651 2842 3042 3240 3432 3633 3822 4022 4221 4417 4615 4807 5007 5204 5399 5594
ans = 2×4
    902 0 0 0
    902 351 743 1133
ans = 2×82
    512 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    512 11 22 35 171 186 196 211 218 232 240 254 264 275 298 311 321 334 343 356 367 379 388 401 541 556 566 581 589 603
ans = 2×6
    903 0 0 0 0 0
    903 85 451 820 1187 13677
ans = 2×81
    899 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    899 108 501 890 1290 1677 2068 2459 2848 3248 3639 4028 4421 4811 5208 5596 5985 6376 6768 7164 7555 7947 8337 8728 9124 9516 9906 10298 10686 11082
ans = 2×5
    902 0 0 0 0
    902 150 551 943 1331
ans = 2×81
    899 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    899 207 601 989 1378 1778 2168 2559 2953 3341 3740 4127 4518 4909 5298 5697 6089 6480 6871 7260 7654 8048 8438 8829 9220 9613 10004 10395 10786 11176
ans = 2×498
    1349 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1349 41 160 651 1385 1875 2242 2367 2735 2981 3100 3348 3467 3835 3960 5060 5305 5794 6775 8737 10084 10204 12041 13512 14980 15962 17309 18531 19022 20002
ans = 2×81
    899 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    899 309 701 1090 1478 1868 2265 2657 3048 3438 3828 4227 4619 5010 5401 5787 6185 6577 6968 7359 7747 8145 8538 8927 9318 9707 10104 10500 10884 11274
ans = 2×4
    809 0 0 0
    809 737 751 757
ans = 2×2
    1.0e+00 *
    790  0.20979
    790     1314
ans = 2×19
    688 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    688 851 1050 1242 1440 1638 1833 2034 2222 2424 2620 2812 3013 3205 3402 3599 3791 6729 124096
ans = 2×39
    1486 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1486 901 7396 20617 28212 35317 47073 60174 60790 61524 66179 66541 71934 74018 86259 109649 122624 133038 137448 137568 137693 138057 144304 149937 163411 178596 195247 214714 220346 225615
ans = 2×2
    1.0e+00 *
    608  0.13124
    608    10068
The first value in each row is the ID that the rest of the matrix is compared to; the rest of the first row holds the distances. The second row holds the ID followed by the indices (row numbers in q) of the matching distances. It appears to work for all of them. In the event that there is only one 0 distance (the comparison row matching only itself), it finds the next smallest (nonzero) distance and returns that value and the matching indices.
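The matching rule described above (exact matches at distance 0, otherwise the nearest nonzero distance) can be reproduced on a small scale in base MATLAB. This is a sketch, not Star Strider's code: the four rows are taken from the q excerpt (IDs 906, 1170, 512 and 903), the self-match is excluded by setting its distance to Inf, and a tolerance tol (an assumption) stands in for the exact == comparison. Rows 1 and 3 ([27 0 0 0 0 0 0 0] and [9 0 0 0 0 0 0 0]) differ only by a factor of 3, so their z-scores coincide, which is consistent with the ID 512 and ID 906 results above listing the same indices.

```matlab
% Four rows of P1..P8 bytes from the q excerpt (IDs 906, 1170, 512 and 903).
P = [ 27    0    0    0    0    0    0    0;
      64  254   15    0    0    0    0  136;
       9    0    0    0    0    0    0    0;
      13  230  255  255    0  251   10    0];
Z = (P - mean(P, 2)) ./ std(P, 0, 2);       % row-wise z-scores, like normalize(P, 2)
tol = 1e-12;                                % assumption: tolerance instead of exact == 0

nRows = size(Z, 1);
matches = cell(nRows, 1);
for k = 1:nRows
    d = sqrt(sum((Z - Z(k,:)).^2, 2));      % Euclidean distance from row k to every row
    d(k) = Inf;                             % ignore the trivial self-match
    idx = find(d < tol);                    % rows with the same normalized shape
    if isempty(idx)                         % no duplicate: fall back to the nearest row(s)
        idx = find(d - min(d) < tol);       % keep near-ties, as the posted loop keeps exact ties
    end
    matches{k} = idx.';                     % row vector of matching row numbers
end
```

Here matches{1} should be 3 and matches{3} should be 1, while rows 2 and 4, which have no duplicate, fall back to each other as nearest neighbours.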



Image Analyst on 3 Feb 2022
Let me expand on @Mathieu NOE's comment above. Cross-correlation is the sum of the products of the two signals as one signal is shifted past the other. Since your signals have a similar shape but very different scales (up to a factor of 2 or more, according to your example), simply using xcorr() and picking the signal with the highest value will just tell you which other signal has the larger scale (bigger magnitude). So before doing that, you really need normalized cross-correlation, which normxcorr2() in the Image Processing Toolbox does. Alternatively, you can prepare the signals manually with rescale, as Star Strider suggests in his answer:
signal1 = rescale(signal1, 0, 1);       % map signal1 onto [0, 1]
signal2 = rescale(signal2, 0, 1);       % map signal2 onto [0, 1]
corrSignal = xcorr(signal1, signal2);   % cross-correlation over all lags
Even this may not be good enough if a signal has outliers, like the very first point in your "source" plot shown above. In that case you'd need more sophisticated scaling that aligns the means while ignoring points far from the mean.
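One way to make the scaling less sensitive to a stray point like that is to flag outliers with a robust z-score (distance from the median in units of the median absolute deviation, MAD) and then normalize only the remaining samples. The sketch assumes hypothetical curves and a cutoff of 3.5 scaled MADs; base MATLAB's isoutlier, whose default flags points more than three scaled MADs from the median, should do a similar job. Applying the source's outlier mask to the candidate as well is a design choice, so that both curves are compared on the same samples.

```matlab
% Hypothetical curves: candidate has the same shape as sourceCurve at twice
% the scale, but sourceCurve has a spurious first sample (an outlier).
shape       = [1 2 4 8 6 4 3 2 1 1];
candidate   = 2 * shape;
sourceCurve = shape;
sourceCurve(1) = 40;

% Robust z-score: distance from the median in units of the scaled MAD.
% 1.4826 makes the MAD comparable to the standard deviation for normal data.
med       = median(sourceCurve);
madScaled = 1.4826 * median(abs(sourceCurve - med));
robustZ   = (sourceCurve - med) / madScaled;
isOutlier = abs(robustZ) > 3.5;           % assumption: 3.5 is the cutoff

% z-score only the samples that are not outliers, in both curves.
keep  = ~isOutlier;
zSrc  = (sourceCurve(keep) - mean(sourceCurve(keep))) / std(sourceCurve(keep));
zCand = (candidate(keep) - mean(candidate(keep))) / std(candidate(keep));
mismatch = max(abs(zSrc - zCand));        % close to 0: same shape once the outlier is ignored
```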
Spectroscopists have several ways to deal with these kinds of signals, and I could ask my spectroscopist if needed. For example, you could take the ratio of the signals. If it's just simple scaling, the ratio at every element would be the same (like 2), so you could look at the standard deviation or the interquartile range of the ratios to find which signal has the lowest StdDev or IQR when compared with your main signal. For instance, one signal might have a mean ratio of 2 +/- 0.2% while another has a mean ratio of 1.5 with a spread of 0.1%, so the second signal would be the better match.
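The ratio idea can be sketched like this: divide each candidate element-wise by the main signal, then score each candidate by the relative spread of its ratios (standard deviation divided by mean, i.e. the coefficient of variation), smallest being best. The signals are hypothetical, and skipping elements where the main signal is zero is an assumption to avoid division by zero, since byte data like yours can contain zeros.

```matlab
mainSignal = [10 20 40 80 60 40 30 20];               % hypothetical reference curve
library = [2.0 * mainSignal;                          % pure scaling: constant ratio
           1.5 * mainSignal + [0 3 -2 5 -4 1 0 2];    % scaled plus some distortion
           80 60 40 30 20 10 20 40];                  % different shape

valid     = mainSignal ~= 0;                          % assumption: skip zero elements
ratios    = library(:, valid) ./ mainSignal(valid);   % one row of ratios per candidate
relSpread = std(ratios, 0, 2) ./ mean(ratios, 2);     % coefficient of variation per row
[~, best] = min(relSpread);                           % expected: best = 1
```

Using the relative spread rather than the raw standard deviation is what lets a 1.5x candidate be compared fairly with a 2x one, as in the percentage example above.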

Image Analyst on 3 Feb 2022
I talked with my spectroscopist, who said there are many ways to determine which curve in a family of curves best matches a test curve. One metric you may want to search for is the "Hit Quality Index" (HQI). Here is how Thermo Scientific describes it:
Hit Quality Index
Traditional methods for reference-library searching are typically based on the assessment of similarity metrics calculated via peak table comparisons, or more commonly, from those generated by full spectrum comparisons. Full spectrum approaches typically generate a “hit quality index” (HQI) between the unknown spectrum and each library spectrum. The HQI can be calculated based on Euclidean distance, median absolute deviation, or perhaps most frequently, the correlation coefficient between the test spectrum and each library spectrum. The correlation coefficient is equivalent to measuring the cosine of the angle between two spectra. The resulting correlation coefficient, R, is 1 when the two spectra are in perfect correspondence and 0 when they are orthogonal.
While a correlation coefficient threshold of 0.95 is frequently used to determine whether two spectra are a match, the correlation is merely an angle and not a probability. Thus, the traditional threshold of 0.95 in no way means 95% likelihood, 95% confidence, or 95% agreement. Furthermore, a correlation coefficient other than 0 or 1 has no direct interpretation in the context of spectral identity because a transparent interpretation as a test statistic only holds when dealing with random normal variates, clearly not the case for Infrared or Raman spectra. While the correlation coefficient has been a popular choice for pure material assessment, it is not particularly sensitive to discrepancies between spectra of interest.
Probabilistic Evaluation
As technical advances brought laboratory-quality instruments to the field, a new testing approach was needed to address the challenge of unknown chemical identification. In the process of identifying substances within a vast unknown library, handheld instruments put the power of spectroscopy into the hands of a new user – field technicians without extensive spectroscopy and chemical training. While HQI met the initial need for laboratory use, a new approach was required for these less experienced users who operate in challenging environments and sampling conditions.
======================================================================
So while one of the HQI metrics may work for you, be aware that instrument vendors like this one are already using even more sophisticated metrics.
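A minimal sketch of the correlation-coefficient HQI described in the quote, assuming equal-length curves and hypothetical unknownCurve and library data: mean-centre every curve, take the cosine of the angle between the centred unknown curve and each centred library curve (which equals Pearson's R), and rank the library by that value. corrcoef(unknownCurve, library(k,:)) should return the same R off its diagonal. As the quote stresses, an HQI of 0.95 is an angle, not a 95% probability.

```matlab
unknownCurve = [0 1 3 7 4 2 1 0];           % hypothetical unknown curve
library = [0 2 6 14 8 4 2 0;                % same shape, twice the scale
           1 1 2 3 5 8 13 21;               % different shape
           7 4 2 1 0 0 1 3];                % different shape

% Mean-centre each curve; the cosine between centred curves is Pearson's R.
centre = @(X) X - mean(X, 2);
t   = centre(unknownCurve);
lib = centre(library);
hqi = (lib * t.') ./ (sqrt(sum(lib.^2, 2)) * norm(t));   % one HQI per library row

% Rank library curves from best (HQI closest to 1) to worst.
[hqiSorted, order] = sort(hqi, 'descend');  % expected: order(1) = 1, hqiSorted(1) close to 1
```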

Categories

Find more on Statistics and Machine Learning Toolbox in Help Center and File Exchange

Version

R2021a
