Read CSV with row in quotes (string and numbers)
Afficher commentaires plus anciens
I have no idea to read the .CSV file correctly.
Here are the titles incl. the first three rows as example (in the File, there are over 22'000):
id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
"7129300520,""20141013T000000"",221900,3,1,1180,5650,""1"",0,0,3,7,1180,0,1955,0,""98178"",47.5112,-122.257,1340,5650"
"6414100192,""20141209T000000"",538000,3,2.25,2570,7242,""2"",0,0,3,7,2170,400,1951,1991,""98125"",47.721,-122.319,1690,7639"
"5631500400,""20150225T000000"",180000,2,1,770,10000,""1"",0,0,3,6,770,0,1933,0,""98028"",47.7379,-122.233,2720,8062"
Every row is in quotes, and the data are strings (double quotes) and numbers. The demiliter is the comma ",".
I tried this:
opts = detectImportOptions('house_data.csv')
T = readtable('house_data.csv', opts);
But what I get is a nx1-table but it should be nx21-table (21 is the amount of the features [id, date, price, bedroms, ...])
3 commentaires
Josh Zagorski
le 10 Mar 2020
Modifié(e) : Josh Zagorski
le 10 Mar 2020
I think your "quotes" are screwing up the data, I put your data into a csv with all the "quotes" removed and I was able to read in your data with importdata(*.csv,',') or if you just delete the first and last "quotes" on every line, it works as well
First way:
ans =
4×21 cell array
Columns 1 through 2
{'id' } {'date' }
{'7129300520'} {'20141013T000000'}
{'6414100192'} {'20141209T000000'}
{'5631500400'} {'20150225T000000'}
Columns 3 through 5
{'price' } {'bedrooms'} {'bathrooms'}
{0×0 char} {0×0 char } {0×0 char }
{0×0 char} {0×0 char } {0×0 char }
{0×0 char} {0×0 char } {0×0 char }
Second way:
{'id' } {'date' }
{'7129300520'} {'20141013T000000""'}
{'6414100192'} {'20141209T000000""'}
{'5631500400'} {'20150225T000000""'}
Columns 3 through 5
{'price' } {'bedrooms'} {'bathrooms'}
{'221900'} {'3' } {'1' }
{'538000'} {'3' } {'2.25' }
{'180000'} {'2' } {'1' }
Benjamin Brodwolf
le 10 Mar 2020
Modifié(e) : Benjamin Brodwolf
le 10 Mar 2020
Josh Zagorski
le 10 Mar 2020
I don't think so, I can't read it as a .csv into excel either because of the quotes. At least not easier than removing the quotes at the beginning and end of each line...
Réponse acceptée
Plus de réponses (1)
Cris LaPierre
le 11 Mar 2020
Modifié(e) : Cris LaPierre
le 11 Mar 2020
I'd take advantage of some of the built in capabilities of readtable. By setting the appropriate options, MATLAB can handle the mix of quotes and commas for you automatically, bringing the data into a table with the desired data type already set.
% detect and set import options
opts = delimitedTextImportOptions("Delimiter",{'"',','},...
"ConsecutiveDelimitersRule","join",...
"LeadingDelimitersRule","ignore",...
"VariableNamesLine",1,...
"NumVariables",21,...
"ExtraColumnsRule","ignore",...
"DataLines",2);
% Set data types
opts = setvartype(opts,[1,3:8,13:21],'double');
opts = setvartype(opts,2,"datetime");
opts = setvartype(opts,[9,10],"logical");
opts = setvartype(opts,[11,12,15,16,17],"categorical");
% Define datetime format for input and display
opts = setvaropts(opts,2,"InputFormat",'yyyyMMdd''T''HHmmss',"DatetimeFormat",'d MMM yyyy');
% Import data
data = readtable("house_data.csv",opts)

To explain, I set quotes and commas as the delimiter. If there are more than one of these together, they are treated as a single delimeter. I had to tell it what line contained the variable names, as well as how many there were. I also had to tell it what row the data started on.
Hopefully the code is fairly straightforward, but let me know if you have any questions.
1 commentaire
Benjamin Brodwolf
le 11 Mar 2020
Catégories
En savoir plus sur Tables dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!