How do I compare two data sets of unequal length?

I have two sets of data taken on different days from the same sensor. The temperature was swept from 26 °C down to -30 °C, up to 80 °C, and back to 26 °C, and the sensor was read periodically during the sweep. Each data set consists of a temperature column and a column of sensor readings.

I would like to take the difference between the two sets of sensor readings, producing a new data set with a column of temperatures and a column of differences between the two original sets of readings. If both data sets had exactly the same temperature vector, I could simply subtract one vector of sensor readings from the other. However, the temperature vectors do not contain exactly the same temperatures, and they don't even have the same number of elements. I would like to interpolate one data set to the temperatures of the other, so that I have two data sets of the same size at the same temperatures.

One complicating factor is that, because of sensor hysteresis with respect to temperature, the readings on the downward temperature ramps differ from those on the upward ramp. Therefore I can't sort the data on temperature, because that would mix the upward and downward ramps. If I could sort on temperature, I could use timeseries objects with temperature in place of time, but that won't work in this case.
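Because the sweep reverses direction, one way to avoid sorting is to label each reading with the ramp it belongs to, starting a new ramp wherever the sign of the temperature step flips. This is a minimal sketch on made-up temperatures, not data from this thread; the variable names are illustrative only, and a flat step (two equal consecutive readings) is assumed to continue the current ramp.

```matlab
% Made-up sweep following the profile 26 -> -30 -> 80 -> 26
T = [26 10 -10 -30 0 40 80 50 26]';

d = sign(diff(T));            % +1 = rising step, -1 = falling step, 0 = flat
for k = 2:numel(d)            % treat a flat step as continuing the current ramp
    if d(k) == 0
        d(k) = d(k-1);
    end
end
flips = [false; d(2:end) ~= d(1:end-1)];  % true where the direction reverses
ramp = [1; 1 + cumsum(flips)];            % ramp number for every sample

% Each turning point (-30, 80) stays with the ramp that ends there,
% so ramp is [1 1 1 1 2 2 2 3 3]' for this sweep.
```

With the ramp labels in hand, each ramp is monotonic on its own and can be handled separately without ever mixing up and down ramps.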

8 comments

Adam Danz on 8 Aug 2018
A sample of your data would be helpful to visualize the problem.
DH on 8 Aug 2018
OK, here are two files. They are greatly reduced from my actual data sets, but they give you the idea.
Adam Danz on 8 Aug 2018
What is 'Sensory Data'?
Also, how are you going to pair the two temperature vectors? Are you pairing them by time-of-day? If so, where's the time data?
DH on 8 Aug 2018
The column header should be "Sensor Data" in both data sets. That is the column of data that I wish to compare at each temperature point. It doesn't really matter what it represents - it could be voltage, current, length, etc.
Consider the first few data points:

Data set 1:
Temperature, Sensor Data
26, 0.80
24, 0.82
22, 0.84

Data set 2:
Temperature, Sensor Data
25, 0.78
23, 0.80

The output data set would interpolate data set 1 to the temperatures of data set 2:

Output data set:
Temperature, Sensor Data
25, 0.81
23, 0.83
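The small example above can be reproduced with interp1 using linear interpolation. This sketch uses only the six values quoted in the comment; within this short stretch the temperatures of data set 1 all fall in one direction, so the sample points are distinct and ordered.

```matlab
% Values from the small example above
T1 = [26 24 22]';  S1 = [0.80 0.82 0.84]';   % data set 1
T2 = [25 23]';     S2 = [0.78 0.80]';        % data set 2

% Linearly interpolate data set 1 onto the temperatures of data set 2
S1at2 = interp1(T1, S1, T2);                 % approximately [0.81; 0.83]

% Output data set: temperature and difference between the two runs
result = [T2, S1at2 - S2];                   % columns: temperature, difference
```

The second column of result is then the difference the question asks for, evaluated at the temperatures of data set 2.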
Adam Danz on 8 Aug 2018
Edited: Adam Danz on 8 Aug 2018
I think I understand your problem now (sensorY was a typo). I'll think about it. In the meantime, here is a plot of @DH's data in case anyone else is thinking about this. The red and blue curves are the two data sets, and you can see the lag in temperature and in the sensor readings between them.
Adam Danz on 8 Aug 2018
...this is a tough one. You can't use interp1() because its first input must be monotonic without duplicates, which your data isn't. Even if you sort by temperature and store the sorted index values, you still have duplicates. What is the final goal here? I know you want to measure the difference between the sensor readings at the same temperature, but you have duplicate measurements at (nearly) the same temperature. For example, your temperature data passes through 0 twice. Can you use the average of those two sensor measurements for the temp = 0 data point?
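The duplicate temperatures mentioned here can be listed directly. A minimal sketch on a made-up sweep (not the posted files) that falls from 26 to -30 and rises again, crossing 0 twice: unique returns each distinct temperature once, and counting how many samples map to each one shows which temperatures interp1 would see more than once.

```matlab
% Made-up sweep that passes through 0 twice (once falling, once rising)
T = [26 10 0 -30 -10 0 40 80 26]';

% u holds each distinct temperature once; ic maps every sample to its entry in u
[u, ~, ic] = unique(T);
counts = accumarray(ic(:), 1);   % number of samples at each distinct temperature
dupTemps = u(counts > 1);        % temperatures that occur more than once
```

Here dupTemps contains both 0 and 26, since the sweep also starts and ends at the same temperature.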
DH on 9 Aug 2018
Averaging the sensor data at two identical temperatures would not work, because the two points may not lie on the same ramp: one may be recorded as the temperature is going up and the other as it is going down. Due to hysteresis, the sensor data on the upward ramp may differ from that on the downward ramp, and I need to preserve the hysteresis. However, it would be acceptable to drop one of each pair of exact duplicates, something like this:
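[~, it] = unique(data.Temperature, 'stable');  % index of the first occurrence of each temperature, original order kept
datareduced = data(it,:);                      % rows with duplicate temperatures removed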
This would give me a dataset with no duplicate temperatures, and the order would be preserved. However, the temperatures are still not monotonic.
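To see which readings the 'stable' option keeps, here is a minimal sketch on the same kind of made-up sweep (not the posted data). unique keeps the first occurrence of each temperature in acquisition order, so whenever a later ramp revisits an exact temperature, it is the later reading that is dropped, and the surviving temperatures are still not monotonic.

```matlab
% Made-up sweep: falling ramp, then rising ramp, then back to 26
T = [26 10 0 -30 -10 0 40 80 26]';

% Keep the first occurrence of each temperature, in acquisition order
[Tu, it] = unique(T, 'stable');

% Samples that were dropped: the second 0 (rising ramp) and the final 26
dropped = setdiff((1:numel(T))', it);
```

For this sweep Tu is [26 10 0 -30 -10 40 80]', which still changes direction at -30.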
I think I see how to do it. Your mention of interp1 clued me in.
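% tstfldr is the folder that holds the two CSV files
data1 = readtable(fullfile(tstfldr, 'dataset1.csv'));
data2 = readtable(fullfile(tstfldr, 'dataset2.csv'));
% Drop repeated temperatures (first occurrence kept, original order preserved)
[~, it] = unique(data1.Temperature, 'stable');
data1reduced = data1(it,:);
% Interpolate data set 1's readings onto data set 2's temperatures
% (NaN wherever a temperature lies outside the range of data set 1)
interpSnsr1 = interp1(data1reduced.Temperature, ...
    data1reduced.SensorData, data2.Temperature);
% Difference at each temperature of data set 2
df = interpSnsr1 - data2.SensorData;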
Thank you for your help.
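If the ramps must be kept strictly apart, one alternative (not the approach used above) is to split both runs into ramps and interpolate ramp by ramp. interp1 only sees pairs of temperature and reading, so once both ramps are pooled it cannot tell which ramp a point came from. This is a minimal sketch with made-up values, assuming both runs follow the same sweep profile (so ramp k of one run corresponds to ramp k of the other) and that no two consecutive readings have exactly the same temperature; rampOf is a hypothetical helper, not a MATLAB function.

```matlab
% Made-up runs following the same profile 26 -> -30 -> 80 -> 26
T1 = [26 0 -30 0 40 80 50 26]';   S1 = [0.80 0.90 1.00 0.95 0.85 0.70 0.76 0.82]';
T2 = [25 -20 -10 30 70 60 30]';   S2 = [0.78 0.95 0.93 0.86 0.72 0.74 0.80]';

% Hypothetical helper: ramp number per sample, new ramp at each direction change
% (assumes no flat steps, i.e. no two consecutive equal temperatures)
rampOf = @(T) [1; 1 + cumsum([false; diff(sign(diff(T))) ~= 0])];

r1 = rampOf(T1);
r2 = rampOf(T2);
S1at2 = nan(size(T2));                 % NaN where no bracketing data exist
for k = 1:max(r2)
    idx1 = find(r1 == k);
    if isempty(idx1), continue; end
    if k > 1
        % Share the turning point with the previous ramp so its span is covered
        idx1 = [idx1(1) - 1; idx1];
    end
    idx2 = (r2 == k);
    % Within one ramp T1 is strictly monotonic, so up and down ramps never mix
    S1at2(idx2) = interp1(T1(idx1), S1(idx1), T2(idx2));
end

result = [T2, S1at2 - S2];             % columns: temperature, difference
```

In this example 30 degrees appears on both the rising and the final falling ramp of data set 2, and the two points get different interpolated readings (0.875 and 0.81), which is the hysteresis the question wants to preserve.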


Accepted Answer

DH on 14 Aug 2018

0 votes

Problem solved: see the comment from Adam Danz on 8 Aug 2018 at 20:10 and my reply (DH, 9 Aug 2018 at 12:15).

More Answers (2)

Yuvaraj Venkataswamy on 8 Aug 2018

0 votes

% Assumes dataset1 and dataset2 are 2-by-N arrays (one sample per column),
% so the transposes are N-by-2 and 'rows' compares (temperature, reading) pairs
id = ismember(dataset1', dataset2', 'rows');  % true where a sample of dataset1 appears exactly in dataset2
Y = find(id)';                                 % column indices of those samples in dataset1

2 comments

DH on 8 Aug 2018
Edited: DH on 8 Aug 2018
Those methods find members of one array that are equal to members of another array, right? My two temperature arrays may not have any members that are exactly equal. I want to interpolate one to the other.
Adam Danz on 8 Aug 2018
Yeah, this method won't work.

Asked by DH on 8 Aug 2018

Commented on 14 Aug 2018
