Read specific rows from a large .csv
Hi,
I'm trying to find a fast solution for handling a big .csv file (35 MB). The good part is that I only need a certain part of the file: basically, I would like to read only the rows that start with a certain name.
Unfortunately, the file is laid out like this:
Varname_1 timestring(t=0) valueX valueY
Varname_2 timestring(t=0) valueX valueY
...
Varname_n timestring(t=0) valueX valueY
Varname_1 timestring(t=1) valueX valueY
Varname_2 timestring(t=1) valueX valueY
...
Varname_n timestring(t=1) valueX valueY
...
... and so on
My idea would be to read the .csv file line by line, check whether Varname equals, e.g., Varname_1, and write the matching rows to a cell array (or four vectors) like this:
Varname_1 timestring(t=0) valueX valueY
Varname_1 timestring(t=1) valueX valueY
Varname_1 timestring(t=2) valueX valueY
...
Any ideas for smart code? Thank you! (Additional notes: varname is a string, time is a string, and each value is a number with a comma as the decimal separator.)
------------------------------------ EDIT: example data
The output would be, e.g.:
var2 10:10:10 16,1010138923
var2 10:10:20 89,1560542863
var2 10:10:30 69,557621819
var2 10:10:40 9,9246195517
3 comments
dpb
on 6 Jul 2016
"value = number with , separated decimal)"
What does the above mean, precisely? Examples are always welcome. :)
Lorenzo
on 6 Jul 2016
dpb
on 6 Jul 2016
That, I think, you'll have to fix up outside MATLAB; I don't think it knows how to handle it. If the file is also comma-separated, that's certainly a problem.
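If the fields are separated by whitespace, as in the example output above, a comma can only be a decimal separator, so one possible in-MATLAB workaround is to read the values as text and convert them afterwards. A minimal sketch under that assumption; the two rows come from the example output, and `txt` is a hypothetical stand-in for the file contents:

```matlab
% Hypothetical stand-in for the file contents: whitespace-separated fields,
% comma as the decimal separator (rows taken from the example output)
txt = sprintf('var2 10:10:10 16,1010138923\nvar2 10:10:20 89,1560542863');

% Read every field as text so the decimal commas are not misparsed
c = textscan(txt, '%s %s %s');
names = c{1};
times = c{2};

% Replace the decimal comma with a period, then convert to double
values = str2double(strrep(c{3}, ',', '.'));
```

This only works because no comma acts as a field delimiter; in a truly comma-delimited file the ambiguity remains.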
Accepted Answer
More Answers (2)
Untested, but check whether the pattern-matching format string solves the problem directly...
vName='Varname_1'; % the variable name you're looking for
fmt=[vName '%s %f %f']; % match vName, string, two numerics
fid=fopen('yourbigfile.csv','r');
data=textscan(fid,fmt,'delimiter',','); % textscan stops reading at the first text that doesn't match fmt
fid=fclose(fid);
As I said, I'm not positive, but I think there's at least a reasonable chance the pattern matching will do what you're looking for. Worth a shot, methinks...
Well, doggone it, the magic didn't happen and joy didn't ensue... :(
But the original idea isn't difficult...
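fid=fopen('yourbigfile.csv','r');
data={}; k=0;                      % k counts matching lines (i is the imaginary unit)
while ~feof(fid)
  l=fgetl(fid);                    % read one line as text
  if strncmp(l,vName,numel(vName)) % keep only lines that start with vName
    k=k+1;
    data{k}=textscan(l,fmt);       % parse the matching line
  end
end
fid=fclose(fid);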
This worked for a sample file, albeit one that was space-delimited and used '.' as the decimal indicator; I think the comma decimal will still be a problem.
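A fuller sketch of the same line-by-line idea that also handles the comma decimals, again assuming whitespace-separated fields. It writes a small hypothetical sample file to a temporary location (the names `var1`/`var2` and all values are made up), keeps only the lines that start with the requested name, and collects the four columns into two cell arrays and two numeric vectors:

```matlab
% Write a small hypothetical sample file (whitespace-separated, comma decimals)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1 10:10:10 1,5 2,5\n');
fprintf(fid, 'var2 10:10:10 16,1010138923 3,25\n');
fprintf(fid, 'var1 10:10:20 4,5 5,5\n');
fprintf(fid, 'var2 10:10:20 89,1560542863 6,75\n');
fclose(fid);

vName = 'var2';                 % name to keep
prefix = [vName ' '];           % trailing space so 'var2' does not also match 'var20'
names = {}; times = {}; x = []; y = [];

fid = fopen(fname, 'r');
l = fgetl(fid);
while ischar(l)                 % fgetl returns -1 (not a char) at end of file
  if strncmp(l, prefix, numel(prefix))
    f = strsplit(strtrim(l));   % split the line on whitespace
    names{end+1, 1} = f{1};
    times{end+1, 1} = f{2};
    x(end+1, 1) = str2double(strrep(f{3}, ',', '.'));  % decimal comma -> period
    y(end+1, 1) = str2double(strrep(f{4}, ',', '.'));
  end
  l = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Growing the arrays with `end+1` keeps the sketch short but can get slow with very many matches; preallocating is an option if the count is known in advance.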
I thought
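fid=fopen('yourbigfile.csv','r');
data={}; k=0;
while ~feof(fid)
  l=fgetl(fid);                % read the next line
  try
    c=textscan(l,fmt);         % attempt to parse every line
    k=k+1; data{k}=c;          % store only if textscan did not error
  catch
    % skip lines on which textscan throws an error
  end
end
fid=fclose(fid);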
would work around the issue, but it didn't; textscan simply gave up and quit reading anything once it failed. It doesn't throw an error; it just throws up its hands silently. :(
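One way to sidestep textscan stopping at the first mismatch is to read every column as text (which `%s` always matches) in a single call and filter afterwards with logical indexing. This is a sketch, not tested on the original 35 MB file; it assumes four whitespace-separated fields per line and uses a hypothetical sample file:

```matlab
% Hypothetical sample file: whitespace-separated, comma decimals
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1 10:10:10 1,5 2,5\nvar2 10:10:10 16,1010138923 3,25\n');
fprintf(fid, 'var1 10:10:20 4,5 5,5\nvar2 10:10:20 89,1560542863 6,75\n');
fclose(fid);

% Read all four columns as text in one pass
fid = fopen(fname, 'r');
c = textscan(fid, '%s %s %s %s');
fclose(fid);
delete(fname);

% Keep only rows whose first field is exactly the requested name
keep = strcmp(c{1}, 'var2');
times = c{2}(keep);
x = str2double(strrep(c{3}(keep), ',', '.'));  % decimal comma -> period
y = str2double(strrep(c{4}(keep), ',', '.'));
```

This holds the whole file in memory as text, a trade-off against the line-by-line loop; a single textscan call is often faster than looping over fgetl in MATLAB code, though that has not been measured here.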
3 comments
Image Analyst
on 6 Jul 2016
The problem is that his columns contain strings and numbers, while csvread() only takes numbers: "The file must contain only numeric values." That's why I recommended readtable(), which can handle that kind of mixed data.
dpb
on 6 Jul 2016
I used textscan, not csvread, IA???
He's also got a comma as the decimal indicator and says he's got a .csv file, in which case it's indeterminable: which comma is a delimiter and which is a decimal point?
Image Analyst
on 6 Jul 2016
Oh, sorry - I didn't notice.
Lorenzo
on 6 Jul 2016
1 comment
Steven Hunsinger
on 14 Sep 2022
Not so lightning fast if your company network is involved: a 67.5 MB file, with a breakpoint after readtable, took 10 minutes. That might be OK if I need all that data loaded into RAM, but it seems excessive for reading just the first line or so. Is there a better way?
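When only the first line (or first few lines) is needed, the low-level functions already used in this thread can stop early instead of loading the whole file: fgetl reads a single line, and textscan accepts a repeat count N that applies the format only N times. A sketch with a hypothetical sample file; whether this helps noticeably on a slow network share is untested:

```matlab
% Hypothetical sample file
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1 10:10:10 1,5 2,5\nvar2 10:10:10 16,1010138923 3,25\nvar1 10:10:20 4,5 5,5\n');
fclose(fid);

fid = fopen(fname, 'r');
firstLine = fgetl(fid);               % reads only line 1
c = textscan(fid, '%s %s %s %s', 1);  % applies the format once: line 2 only
fclose(fid);
delete(fname);
```

Because fgetl has already consumed line 1, the textscan call starts at line 2; the rest of the file is never parsed.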