Strange behavior from readtable

9 vues (au cours des 30 derniers jours)
Frank
Frank le 27 Juin 2023
Modifié(e) : Stephen23 le 28 Juin 2023
I have some text files that I'm reading in with readtable. For most of the files, everything seems to work fine (example1.txt). Strings come through as strings, and numbers come through as numbers. But for some reason, I'm getting strange results when trying to read in example2.txt. The last 2 columns....which are all numbers....are coming through as strings. I'm thinking there has to be some kind of strange format in the example2.txt file, but I've scanned both files, and don't see anything that would be causing this. My call is simply data = readtable(filename). Obviously I'll try reading in everything as strings to try to fix this, but was just curious as to what could be causing this.
  1 commentaire
Frank
Frank le 27 Juin 2023
Oops....my call is data = readtable(filename,'NumHeaderLines',2,'TreatAsMissing','-999')

Connectez-vous pour commenter.

Réponse acceptée

Stephen23
Stephen23 le 28 Juin 2023
Modifié(e) : Stephen23 le 28 Juin 2023
While READTABLE's automagic file format detection is great, the more a file deviates (missing data, lines filled with asterisks) the more help you will have to give it. For this use DETECTIMPORTOPTIONS, SETVARTYPE, etc.
Once it is set up correctly you do not need to repeat this for each file: you can reuse the options object for all files.
F1 = 'example1.txt';
F2 = 'example2.txt';
Ob = detectImportOptions(F1, 'FileType','fixedwidth', 'Range',1, 'TreatAsMissing','-999');
Ob = setvartype(Ob, 'double');
Ob = setvartype(Ob, {'Latitude___','Longitude___'},'char');
Ob = setvartype(Ob, {'FlightTimeSinceLaunch_mins_secs_','UTCTime_hrs_mins_secs_'},'duration');
Ob = setvaropts(Ob, 'FlightTimeSinceLaunch_mins_secs_', 'InputFormat','mm:ss');
T1 = readtable(F1,Ob)
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T1 = 7063×10 table
FlightTimeSinceLaunch_mins_secs_ UTCTime_hrs_mins_secs_ Altitud___ Latitude___ Longitude___ Pressure_hPa_ Temperature__C_ Humidity__RH_ WindSpeed_m_s_ WindDirection___ ________________________________ ______________________ __________ ___________ ____________ _____________ _______________ _____________ ______________ ________________ -08:31 NaN NaN {0×0 char} {0×0 char} 1023.6 NaN NaN NaN NaN -08:30 NaN NaN {0×0 char} {0×0 char} 1023.6 28.7 NaN NaN NaN -08:29 NaN NaN {0×0 char} {0×0 char} 1023.6 28.7 NaN NaN NaN -08:28 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 74.9 NaN NaN -08:27 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75 NaN NaN -08:26 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75.1 NaN NaN -08:25 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75.1 NaN NaN -08:24 NaN NaN {0×0 char} {0×0 char} 1023.6 NaN NaN NaN NaN -08:23 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75.2 NaN NaN -08:22 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75.2 NaN NaN -08:21 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75.1 NaN NaN -08:20 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75 NaN NaN -08:19 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75 NaN NaN -08:18 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75 NaN NaN -08:17 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75 NaN NaN -08:16 NaN NaN {0×0 char} {0×0 char} 1023.6 28.6 75 NaN NaN
T1{270:279,2}
ans = 10×1 duration array
NaN NaN NaN NaN 14:41:01 14:41:02 14:41:03 14:41:04 14:41:05 14:41:06
T2 = readtable(F2,Ob)
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T2 = 7136×10 table
FlightTimeSinceLaunch_mins_secs_ UTCTime_hrs_mins_secs_ Altitud___ Latitude___ Longitude___ Pressure_hPa_ Temperature__C_ Humidity__RH_ WindSpeed_m_s_ WindDirection___ ________________________________ ______________________ __________ _______________ ________________ _____________ _______________ _____________ ______________ ________________ -12:57 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 NaN NaN NaN -12:56 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 NaN NaN NaN -12:55 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 NaN NaN NaN -12:54 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 NaN NaN NaN -12:53 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 74.1 NaN NaN -12:52 NaN NaN {0×0 char } {0×0 char } 1021.8 NaN NaN NaN NaN -12:51 NaN NaN {0×0 char } {0×0 char } 1021.8 NaN NaN NaN NaN -12:50 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 74.2 NaN NaN -12:49 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 74.2 NaN NaN -12:48 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 74.2 NaN NaN -12:47 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 74.2 NaN NaN -12:46 NaN NaN {0×0 char } {0×0 char } 1021.8 26.7 74.2 NaN NaN -12:45 10:02:23 -18.3 {'28°28'32.9"'} {'-80°33'29.7"'} 1021.8 26.7 74.2 NaN NaN -12:44 10:02:24 -17.7 {'28°28'32.9"'} {'-80°33'29.7"'} 1021.8 26.7 74.2 NaN NaN -12:43 10:02:25 -18.4 {'28°28'32.9"'} {'-80°33'29.7"'} 1021.8 26.7 74.2 NaN NaN -12:42 10:02:26 -18.9 {'28°28'32.9"'} {'-80°33'29.7"'} 1021.8 26.7 74.2 NaN NaN
Note that you can use CONVERTVARS to efficiently convert the degree+minute+second text into numeric vectors (or scalars), e.g.:
T2 = convertvars(T2,@iscell,@myfun)
T2 = 7136×10 table
FlightTimeSinceLaunch_mins_secs_ UTCTime_hrs_mins_secs_ Altitud___ Latitude___ Longitude___ Pressure_hPa_ Temperature__C_ Humidity__RH_ WindSpeed_m_s_ WindDirection___ ________________________________ ______________________ __________ ___________________ ___________________ _____________ _______________ _____________ ______________ ________________ -12:57 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 NaN NaN NaN -12:56 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 NaN NaN NaN -12:55 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 NaN NaN NaN -12:54 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 NaN NaN NaN -12:53 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 74.1 NaN NaN -12:52 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 NaN NaN NaN NaN -12:51 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 NaN NaN NaN NaN -12:50 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 74.2 NaN NaN -12:49 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 74.2 NaN NaN -12:48 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 74.2 NaN NaN -12:47 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 74.2 NaN NaN -12:46 NaN NaN NaN NaN NaN NaN NaN NaN 1021.8 26.7 74.2 NaN NaN -12:45 10:02:23 -18.3 28 28 32.9 -80 33 29.7 1021.8 26.7 74.2 NaN NaN -12:44 10:02:24 -17.7 28 28 32.9 -80 33 29.7 1021.8 26.7 74.2 NaN NaN -12:43 10:02:25 -18.4 28 28 32.9 -80 33 29.7 1021.8 26.7 74.2 NaN NaN -12:42 10:02:26 -18.9 28 28 32.9 -80 33 29.7 1021.8 26.7 74.2 NaN NaN
Basic matrix multiplication could also be very useful to combine the degrees+minutes+seconds into degrees:
T2.Longitude___ * [1;1/60;1/3600]
ans = 7136×1
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
function V = myfun(C);
V = nan(numel(C),3);
V(~cellfun(@isempty,C),:) = sscanf([C{:}],'%f°%f''%f"',[3,Inf]).';
end
Thinks out loud: it would be interesting if DURATION accepted a wider range of formats, then degree+minutes+seconds could be natively imported as a duration type...
  3 commentaires
Stephen23
Stephen23 le 28 Juin 2023
Modifié(e) : Stephen23 le 28 Juin 2023
" It's interesting when I do 0b.VariableTypes on example1, the result is..."
That should not be the result after the STEVARTYPE calls. Please check after the variable types and options have been set. That is rather the whole point of many answer, I would be very surprised if it did not work.
As I wrote in my answer, you can reuse the same options object for all files. It should not change.
Frank
Frank le 28 Juin 2023
Yes.....I'm dumb.....sorry about that. I was so fixated on the fact that example1.txt was different than the others, that I didn't realize that I can specify what those data types should be. Thanks Stephen, your suggestion did the job. I actually made it simple by doing...
Ob = detectImportOptions(F2, 'FileType','fixedwidth', 'Range',1, 'TreatAsMissing','-999');
and then...
data = readtable('example1.txt',Ob);
Thanks all for the help!!

Connectez-vous pour commenter.

Plus de réponses (0)

Tags

Produits


Version

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by