How to filter out useless data

Hi everyone, I need to clean a big dataset (more than 1.5 million observations) so as to exclude all the meaningless/useless observations. Each observation comes with several variables (price, delta, implied volatility, etc.), and I need to get rid of any observation whose implied volatility is more than 100%. Moreover, for many observations the implied volatility is simply missing (I have a blank cell). So, for any value in the "implied volatility" column that is missing or >1, I want MATLAB to remove the corresponding observation, that is, the entire row. How could I do that in a smart and quick way? (I am a beginner in MATLAB.) Thanks

4 comments

Adam on 23 Oct 2015
Is your data in a cell array? If so, does it need to be in a cell array rather than a regular numeric array?
Angelo Catania on 23 Oct 2015
I think it is in a cell array. As I explained above, I'm just a beginner and I don't really know what the difference is.
dpb on 23 Oct 2015
Start with the "Getting Started" section of the MATLAB documentation and spend a few minutes getting familiar with the basic concepts of array and cell notation, etc. It'll be time well spent: it'll be much quicker than waiting on answers here, particularly when you don't yet have the vocabulary to describe the problem accurately.
On that last point, what does
whos yourvariablename
return? That'll tell us how the data is actually stored.
yourvariablename is, of course, whatever name you are using for the data (data, x, whatever), not a literal string.
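A minimal sketch of that check, using a small hypothetical variable named data in place of the real dataset. whos, istable, iscell and isnumeric are standard MATLAB functions; the messages printed are only illustrative hints about what to do next for each storage type.

```matlab
% Hypothetical variable standing in for the real dataset (a cell array here)
data = {101.5, 0.45, 0.22; 99.0, 0.51, []};

whos data   % prints Name, Size, Bytes and Class (here the class is cell)

% The same information, tested programmatically
if istable(data)
    disp('data is a table: refer to columns by name, e.g. data.ImpliedVol');
elseif iscell(data)
    disp('data is a cell array: convert the column to numbers before comparing');
elseif isnumeric(data)
    disp('data is a numeric matrix: compare a column directly, e.g. data(:,3) > 1');
end
```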
Nick Hobbs on 27 Oct 2015
Edited: Nick Hobbs on 27 Oct 2015
I understand you want to remove rows from your cell array based on information in your data. The following documentation link may help you with your goal.
The following link provides an example of how to remove a row from a cell array.
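Since the asker believes the data is a cell array with blank cells, here is a sketch of removing rows from a cell array under these assumptions: the cell array raw and the column index ivCol are hypothetical, implied volatility is assumed to sit in column 3 as a fraction (1 = 100%), and blank cells are assumed to show up as either [] or NaN depending on how the data was read. The idea is to turn the volatility column into a plain numeric vector first, so that isnan and > 1 can be applied to it.

```matlab
% Hypothetical raw cell array (price, delta, implied volatility);
% blank cells may appear as [] or NaN depending on how the data was imported
raw = {10.5, 0.20, 0.25;
       11.2, 0.30, [];
        9.8, 0.50, 1.35;
       12.0, 0.40, NaN;
       10.1, 0.35, 0.80};
ivCol = 3;   % assumed column holding implied volatility

% Convert the column to a numeric vector; anything that is not a single
% number (empty cell, text) is treated as missing, i.e. NaN
ivCells = raw(:, ivCol);
iv = nan(size(ivCells));
isNum = cellfun(@(c) isnumeric(c) && isscalar(c), ivCells);
iv(isNum) = [ivCells{isNum}];

% Remove rows whose implied volatility is missing or above 100% (> 1)
badRows = isnan(iv) | iv > 1;
raw(badRows, :) = [];
```

Only the rows with volatilities 0.25 and 0.80 survive. If the whole cell array turns out to hold only numbers, converting it once to a numeric matrix makes every later step simpler.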


Answers (2)

Image Analyst on 27 Oct 2015

1 vote

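Check out the ismissing() function.
And to remove rows from your table with volatility more than 100, I think you can do this (untested):
badRows = mytable.volatility > 100;   % assumes volatility is in percent; use > 1 if it is stored as a fraction
mytable(badRows,:) = [];              % delete every flagged row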
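Combining both suggestions for a table might look like the sketch below. It assumes the data is a MATLAB table with a numeric variable holding implied volatility as a fraction (1 = 100%); the table T and the variable names Price and ImpliedVol are hypothetical. When a spreadsheet is imported with readtable, blank numeric cells typically come in as NaN, which is what ismissing flags for a double variable.

```matlab
% Hypothetical sample table; variable names Price and ImpliedVol are assumptions
T = table([10.5; 11.2; 9.8; 12.0], [0.25; NaN; 1.35; 0.80], ...
    'VariableNames', {'Price', 'ImpliedVol'});

% ismissing on a one-variable sub-table returns an n-by-1 logical
isBlank  = ismissing(T(:, 'ImpliedVol'));
isTooBig = T.ImpliedVol > 1;          % 1 = 100% when stored as a fraction

T(isBlank | isTooBig, :) = [];        % delete every flagged row in one step
disp(T)
```

Because NaN > 1 evaluates to false, the missing-value test is needed in addition to the threshold test; neither one alone removes all the unwanted rows.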
Thorsten on 27 Oct 2015
Edited: Thorsten on 28 Oct 2015

0 votes

iv = data(:,3);             % implied volatility, assumed to be column 3 of a numeric matrix (blanks stored as NaN)
idx = isnan(iv) | iv > 1;   % logical array: true where the value is missing or above 100%
data(idx,:) = [];           % remove all rows where idx is true
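Before running this on the full 1.5 million rows, it can help to try it on a tiny matrix. The sketch below uses a hypothetical numeric matrix whose columns are assumed to be price, delta and implied volatility (as a fraction), with NaN marking a blank cell.

```matlab
% Hypothetical numeric matrix: price, delta, implied volatility
data = [10.5 0.20 0.25;
        11.2 0.30  NaN;
         9.8 0.50 1.35;
        12.0 0.40 0.80];

iv  = data(:, 3);
idx = isnan(iv) | iv > 1;   % rows 2 (missing) and 3 (135%) are flagged
data(idx, :) = [];          % equivalently: data = data(~idx, :);
```

Logical indexing works on the whole column at once, so no loop is needed, which should keep this fast even on a large matrix.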
