Reduce rows of data based on increment size

I have a set of data with the depth, z (m), from 0 up to the maximum length. At the moment there are 951 rows in total, due to the small increment size of z. For analysis purposes, I only require an increment size of 0.25 m.
So basically what I want to do is the following (pseudocode):
  • Import the Excel file into MATLAB.
  • inc_size = 0.25
  • Iterating through the rows, deltaZ_total = deltaZ(i)
  • While deltaZ_total < inc_size:
  • deltaZ_total = deltaZ_total + deltaZ(i+1) % while the total of the increments is less than inc_size, keep adding them together
  • Create a new row containing deltaZ_total and delete all the previous rows
  • Skip a row % skip a row so the new increments are not added to the previous ones
  • Repeat the process until the end of the column
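A minimal sketch of the loop above (the file name `data.xlsx` and variable names are illustrative, not from the original post; assumes `deltaZ` is read in as an Nx1 column vector):

```matlab
% Sketch: keep a row only once the accumulated increment since the
% last kept row reaches inc_size; intermediate rows are dropped.
inc_size = 0.25;
deltaZ = readmatrix('data.xlsx');   % hypothetical file name
keep = [];                          % indices of rows to keep
deltaZ_total = 0;
for i = 1:numel(deltaZ)
    deltaZ_total = deltaZ_total + deltaZ(i);
    if deltaZ_total >= inc_size
        keep(end+1) = i;            %#ok<AGROW> this row survives
        deltaZ_total = 0;           % start accumulating a new increment
    end
end
deltaZ_new = deltaZ(keep);
```

The same `keep` index vector can then be used to select the matching rows from the other columns, so they stay aligned with the reduced z values.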
Could I please have some help translating this idea into MATLAB code?
Thanks very much,
Brian

4 Comments

Image Analyst on 19 Sep 2020
  1. What is the delta z in meters between two rows?
  2. If you chose a larger increment size than whatever it currently is, would the number of rows be more or less than 951?
  3. How does the number of rows depend on the increment size? Is there some relation, correlation, dependency between them?
  4. How many rows would you expect for an increment size of 0.25?
  1. Delta z is the incremental change of z, e.g. z in the second row minus z in the first row.
  2. Choosing a larger increment size (and deleting the rows as described above) would result in far fewer rows.
  3. Yes, there is a relation: for example, if we have the numbers 1, 2, 3, ..., 10 and we double the increment size to 2, we would have half the number of rows.
  4. It's hard to say, as the data is not linear, so the increment size varies. To answer that question I would have to go through all the data and count it manually.
Hope this helps,
Brian
Not really, or very little. Are we talking about a 3-D dataset, like a CT or MRI volumetric image? Or just simply a 1-D situation where we have some number of elements in Z and the value of each element of Z is the depth into or above some material?
Please attach your data.
So, does Z go from, say, 20 to 10000 in 951 elements, but without a constant delta between each pair of elements in Z? Like it might be 0.1 between one pair of elements but 1.4 between a different pair? And you want to resample that range 20-10000 with uniform spacing of 0.25, so the number of elements would be (max(z) - min(z)) / 0.25? Then you can just use linspace():
minValue = min(Z(:));
maxValue = max(Z(:));
numElements = round((maxValue - minValue) / 0.25) + 1; % integer count; +1 so the spacing is exactly 0.25
zUniform = linspace(minValue, maxValue, numElements);
> Not really, or very little. Are we talking about a 3-D dataset, like a CT or MRI volumetric image? Or just simply a 1-D situation where we have some number of elements in Z and the value of each element of Z is the depth into or above some material?
It is simply a 1-D dataset where z is the depth below ground surface.
> Please attach your data.
Please see attached.
> So, does Z go from, say, 20 to 10000 in 951 elements, but without a constant delta between each pair of elements in Z? And you want to resample that range with uniform spacing of 0.25? Then you can just use linspace().
As you will see from the data, z does not come in constant increments but varies widely. In some cases delta_Z is > 0.25, in which case the row should be left as is. The linspace command can create a uniformly spaced vector, but that won't help here, because the function will not know which rows of the table to consolidate.
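A rough sketch of the consolidation described above, operating on the whole table so that every other column travels with its row (assuming the data is in a table `T` with a `delta_Z` column; the names are illustrative):

```matlab
% Sketch: keep a row only once the accumulated delta_Z since the last
% kept row reaches 0.25 m. Rows with delta_Z > 0.25 are kept as is,
% and all other columns of a kept row are preserved untouched.
inc_size = 0.25;
acc = 0;
keep = false(height(T), 1);
for i = 1:height(T)
    acc = acc + T.delta_Z(i);
    if acc >= inc_size
        keep(i) = true;   % this row survives with all its columns intact
        acc = 0;          % restart accumulation for the next increment
    end
end
T_clean = T(keep, :);
```

Unlike interpolation, this keeps the original z values and row contents; it only decides which rows survive.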


 Accepted Answer

Ameer Hamza on 19 Sep 2020
Instead of deleting the rows, I suggest using interp1() to get the required output:
z = ...;  % 951x1 vector of depths
x = ...;  % 951x1 vector of data points
z_new = 0:0.25:max(z);        % new z vector with increments of 0.25
x_new = interp1(z, x, z_new); % x_new are the data points corresponding to z_new

6 Comments

This sounds like a good idea.
Could you explain a bit further what x (the vector of data points) is?
Also, the ultimate goal is to reduce the number of rows of the table; will I still be able to achieve this? How would you suggest I use the x_new vector to reduce the number of rows in my table?
Ameer Hamza on 19 Sep 2020
Here x is a column from the table. You can apply the same concept to each column of the table. If there is any issue using this method for your table, please share a sample of your table as a .mat file so that we can see how the data is stored.
See attached.
So in this case, I believe we should rearrange the data so that delta_z is in the right-hand column. Then x will be a matrix containing all the data to the left of delta_z, and x_new should be our desired table with the reduced rows.
Importantly, the data in the columns to the left of delta_z should not be added together, because this data is not cumulative. For example, it wouldn't make sense to add the depths of different rows together.
Try the following code. Note that I have deleted the first row of the table because it contains only NaN.
load pile_data.mat
purdueUS52MODIFIEDS2(1,:) = []; % first row is NaN, removing it
z = purdueUS52MODIFIEDS2.z;
z_new = 0:0.25:max(z);
data = table2array(purdueUS52MODIFIEDS2);
data_new = interp1(z, data, z_new);
data_new = array2table(data_new, 'VariableNames', purdueUS52MODIFIEDS2.Properties.VariableNames);
The first few rows of data_new are NaN because you want to start from z = 0, but in the real dataset the first z value occurs at 1.2192. To avoid NaNs, start the new z from 1.25:
z_new = 1.25:0.25:max(z);
I tried this code, but I don't think it's suitable, because the data in the z column is changed, and the data in the other columns of the table is also changed during the interp1 operation. I need the original z values to be preserved (other than deleting entire rows of the table when the increment size is too small), and all the data from the corresponding columns to be preserved.
I have made that delta_Z column for ease of cleaning the table. So, for example, the 'cleaned' table would look the same for the first 9 rows, until we get to a delta_Z of 0.0488. The entire rows where delta_Z = 0.0488, 0.0518, 0.0518, 0.0488 can be deleted, as these increments total less than 0.25. So the 'cleaned' table would go from delta_Z = 0.2560 to delta_Z = 0.0518 in row 10, and the other columns, such as stroke for row 10 (= 1.5506), are preserved.
I hope this clarifies.
I think you want something like this:
load pile_data.mat
purdueUS52MODIFIEDS2(1,:) = []; % first row is NaN, removing it
data = table2array(purdueUS52MODIFIEDS2);
count = 1;
while size(data, 1) ~= count
    if data(count+1, 20) < data(count, 20) + 0.25 % column 20 holds z
        data(count+1, :) = []; % increment too small: drop this row
    else
        count = count + 1;     % gap >= 0.25 m: keep the row, move on
    end
end
data(2:end, 21) = diff(data(:, 20)); % recompute delta_Z (column 21) for the kept rows
table_new = array2table(data, 'VariableNames', purdueUS52MODIFIEDS2.Properties.VariableNames);


More Answers (0)
