Identification of a curve
Hi
I'm working on a system that identifies a given curve from a source by comparing it to a database of hundreds of signals. Each curve is stored as 8 bytes, each byte in its own column.

The first image named SOURCE is the one I'm looking for in the database, and the one named DATA TO FILTER is the one stored in a column of data and timestamp. Both are very similar and only vary in scale.
Is there any way to derive some type of ID from the source curve, compare it, and find it in the data to filter? As I mentioned, this is a repetitive process over more than 900 samples. Any help would be greatly appreciated.
Answers (3)
Star Strider
on 31 Jan 2022
If they are the same length, I would use either pdist2 or knnsearch to identify them. Since they have different amplitudes, it would likely be best to subtract the mean (or median) from them first, so that the relative amplitudes are not a factor. Using rescale or normalize on all of them might also be options that could improve the probability of finding a matching signal.
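A minimal sketch of that idea, assuming one curve per row and 8 samples per curve. The data here is synthetic and the variable names (library, source) are made up for illustration; knnsearch needs the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic library: 5 curves of 8 samples each, one curve per row (assumed layout)
rng(0);
library = rand(5, 8);

% Hypothetical source curve: library row 3 scaled by 2 and offset by 10
source = 2 * library(3, :) + 10;

% z-score each row (dimension 2) so that offset and scale no longer matter
libraryN = normalize(library, 2);
sourceN  = normalize(source, 2);

% Nearest library row to the source in Euclidean distance
[matchIdx, matchDist] = knnsearch(libraryN, sourceN);
```

With real data the match distance will not be exactly zero, so it may be worth asking knnsearch for several neighbours ('K', 3 for example) and inspecting the distances before accepting a match.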
9 comments
Star Strider
on 3 Feb 2022
I’m confused.
The IDs refer to the rows, however the code with the ‘P’ references implies columns.
Matching the columns against each other produces a huge matrix. Selecting one ID at a time and comparing it with the rest is straightforward with pdist2 although it will likely take a while.
See if this produces the desired result —
Uz = unzip('https://www.mathworks.com/matlabcentral/answers/uploaded_files/883280/Sample.zip')
% mcode = fileread(Uz{1})
LD = load(Uz{2})
q = LD.q
qm = q{:,3:10}                                  % Matrix Of 'P' Values (Table Columns 3 To 10)
format shortG
% return
qmn = normalize(qm,2)                           % Normalize Each Row To 'z-scores' (Along Dimension 2)
C = cell(1000,1);                               % Preallocate The Results
tic
for k = 1:1000 %size(q,1)
    qmnID = q.ID(k);
    dist = pdist2(qmn(k,:),qmn,'euclidean','Smallest',1);   % Distance From Row 'k' To Every Row
    idx = find(dist == min(dist));
    if numel(idx) == 1                          % Only The Self-Match Is At The Minimum
        distnz = min(dist(dist~=0));            % Smallest Nonzero Distance
        idx = find(dist == distnz);
    end
    C{k} = [qmnID, dist(idx); qmnID idx];
end
toc
C{1:50:end}
The first value in each row is the ID that the rest of the matrix is compared to. The rest of the first row holds the matching distances, and the rest of the second row holds the row indices of those matches. It appears to work for all of them. In the event that there is only one 0 distance (the comparison row matching only itself), the code finds the smallest nonzero distance and returns that value and the matching indices.
Image Analyst
on 3 Feb 2022
Let me expand on @Mathieu NOE's comment above. Cross correlation is the sum of the product of the two signals as one signal is shifted past the other. Now since your signals have similar shapes but vastly different scales (like up to a factor of 2 or more according to your example), just using xcorr() to find out which signal has the highest value will simply tell you which other signal has the higher scale (bigger magnitude). So before doing that you really need to do normalized cross correlation, which is done by normxcorr2() in the Image Processing Toolbox. Alternatively you can manually prepare the signals by using rescale, like Star says in his answer:
signal1 = rescale(signal1, 0, 1);
signal2 = rescale(signal2, 0, 1);
corrSignal = xcorr(signal1, signal2);
Now even this may not be good if a signal has outliers, like the very first point in your "source" plot shown above. In that case you'd have to do some more sophisticated scaling to align the means while ignoring stuff way far from the means.
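One possible way to do that kind of outlier-tolerant scaling is the 'robust' z-score option of normalize, which centers on the median and scales by the median absolute deviation instead of the mean and standard deviation. This is only a sketch with made-up numbers, not necessarily the scaling your data needs:

```matlab
% Hypothetical 8-point curve whose first sample is an outlier
x = [40 5 9 14 12 8 4 2];

% Ordinary z-score: the outlier drags the mean and inflates the standard deviation
xStd = normalize(x);

% Robust z-score: median and median absolute deviation are far less
% sensitive to a single bad point
xRob = normalize(x, 'zscore', 'robust');
```

After robust scaling the bulk of the curve keeps a comparable spread from one signal to the next, while the outlier simply shows up as a large value that can be clipped or ignored.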
Spectroscopists have several ways to deal with these kinds of signals. I could ask my spectroscopist if we need to. For example you could ratio the signals. If it's just a simple scaling, all the ratios at every element would be the same (like 2), so you could look at the standard deviation or the interquartile range of the ratios to find out which signal has the lowest StdDev or IQR when compared to your main signal. Like maybe one signal has a mean ratio of 2 +/- 0.2% while another signal has a mean ratio of 1.5 with a spread of 0.1%, so the second signal would be a better match.
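Here is a rough sketch of the ratio idea with invented numbers. It assumes the source has no zero samples (otherwise the ratio blows up) and uses the standard deviation of the ratios divided by their mean as the spread measure:

```matlab
rng(1);
source = [3 5 9 14 12 8 4 2];               % hypothetical 8-point source curve

% Hypothetical candidates, one per row
candidates = [1.5 * source;                 % exact scaled copy
              2.0 * source + randn(1, 8);   % scaled copy plus noise
              fliplr(source)];              % different shape

% Element-wise ratio of every candidate to the source
% (implicit expansion, R2016b or later)
ratios = candidates ./ source;

% Relative spread of the ratios along each row; a pure scaling gives 0
spread = std(ratios, 0, 2) ./ abs(mean(ratios, 2));

[~, bestIdx] = min(spread);                 % candidate with the most constant ratio
```

The candidate with the smallest spread is the one that looks most like a scaled copy of the source, regardless of what the scale factor actually is.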
Image Analyst
on 3 Feb 2022
I talked with my spectroscopist, who said that there are many ways to determine which curve in a family of curves best matches a test curve. One metric you may want to search on is "Hit Quality Index" or HQI. Here is how ThermoScientific describes it:
Hit Quality Index
Traditional methods for reference-library searching are typically based on the assessment of
similarity metrics calculated via peak table comparisons, or more commonly, from those
generated by full spectrum comparisons. Full spectrum approaches typically generate a “hit
quality index” (HQI) between the unknown spectrum and each library spectrum. The HQI can
be calculated based on Euclidean distance, median absolute deviation, or perhaps most
frequently, the correlation coefficient between the test spectrum and each library spectrum.
The correlation coefficient is equivalent to measuring the cosine of the angle between two
spectra. The resulting correlation coefficient, R, is 1 when the two spectra are in perfect
correspondence and 0 when they are orthogonal.
While a correlation coefficient threshold of 0.95 is frequently used to determine whether two
spectra are a match, the correlation is merely an angle and not a probability. Thus, the
traditional threshold of 0.95 in no way means 95% likelihood, 95% confidence, or 95%
agreement. Furthermore, a correlation coefficient other than 0 or 1 has no direct interpretation
in the context of spectral identity because a transparent interpretation as a test statistic only
holds when dealing with random normal variates, clearly not the case for Infrared or Raman
spectra. While the correlation coefficient has been a popular choice for pure material
assessment, it is not particularly sensitive to discrepancies between spectra of interest.
Probabilistic Evaluation
As technical advances brought laboratory-quality instruments to the field, a new testing
approach was needed to address the challenge of unknown chemical identification. In the
process of identifying substances within a vast unknown library, handheld instruments put
the power of spectroscopy into the hands of a new user – field technicians without extensive
spectroscopy and chemical training. While HQI met the initial need for laboratory use, a new
approach was required for these less experienced users who operate in challenging
environments and sampling conditions.
======================================================================
So while one of the HQI metrics may work for you, they're already using even more sophisticated metrics.
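If you want to try the simplest of those, a correlation-coefficient score can be sketched in base MATLAB with corrcoef. The curves below are invented for illustration, and this is just one of the HQI variants mentioned in the quote, not ThermoScientific's implementation:

```matlab
source = [3 5 9 14 12 8 4 2];       % hypothetical test curve

% Hypothetical library, one curve per row
library = [1 2 3 4 5 6 7 8;
           2 * source + 1;          % scaled and offset copy of the source
           14 12 9 8 5 4 3 2];

% Correlation coefficient between the source and every library curve
hqi = zeros(size(library, 1), 1);
for k = 1:size(library, 1)
    R = corrcoef(source, library(k, :));
    hqi(k) = R(1, 2);
end

[bestR, bestIdx] = max(hqi);        % highest correlation is the best match
```

Because the correlation coefficient removes the mean and divides by the standard deviation of each curve, a scaled and offset copy scores essentially 1 without any separate normalization step.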

