Read specific column from .txt file with unknown format

Hello all, I am a beginner in MATLAB. I'm trying to import specific columns from a .txt file whose format is unknown. The existing solutions I have found only let me get rows, not columns. For example, from the attached file I want to read column n=2, which contains (V1 D3 6.52 4.91 3.00 2.05 0.69 NAN NAN NAN), and store the result in an array.

1 Comment

There must be a few things that you know about the format. Do you need the first rows or just the numeric data? Does the first row always contain those V + column index entries?


Accepted Answer

Cedric
Cedric on 12 Oct 2017
Edited: Cedric on 12 Oct 2017
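% Read the whole file in a single access.
content = fileread( 'Info.txt' ) ;
% Count the columns by splitting the first non-empty line on spaces.
nCols = numel( strsplit( regexp( content, '[^\r\n]+', 'match', 'once' ), ' ')) ;
% Split on whitespace, then reshape into an nRows-by-nCols cell array.
data = reshape( regexp(content, '\s+', 'split'), nCols, [] ).' ;
% The first two rows are header text; the rest is converted to numeric.
header = data(1:2,:) ;
data = str2double( data(3:end,:) ) ;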
You can then pick any column in both header and data. Alternatively, you can use the simpler:
data = importdata( 'Info.txt' ) ;
and see which field provides you with what you need.
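As a rough illustration of inspecting what IMPORTDATA returns, here is a minimal sketch. The sample file contents are made up (they stand in for Info.txt, which is not shown in this thread), and the exact output type depends on what IMPORTDATA detects in the file, so the code checks the type rather than assuming it.

```matlab
% Hypothetical sample file standing in for Info.txt (contents are made up).
fname = [tempname, '.txt'] ;
fid = fopen( fname, 'w' ) ;
fprintf( fid, 'V1 V2 V3\nD1 D3 D1\n1.5 6.52 2.0\n2.5 4.91 NaN\n' ) ;
fclose( fid ) ;

raw = importdata( fname ) ;
delete( fname ) ;

% importdata may return a struct, a numeric array, or a cell array depending
% on what it detects, so check the type before indexing into the result.
if isstruct( raw )
    disp( fieldnames( raw ) ) ;   % list the available fields
else
    disp( class( raw ) ) ;
end
```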

5 Comments

Thank you for your answer. However, this approach extracts all columns, whereas I want to select only specific ones. To answer your questions: the first row always uses the V+ format, and I am trying to extract whole columns, including the headers. For example, I want to extract all columns where the second row is equal to D1. I have managed to determine the indices of these columns (C=1,3,4,8), but I am stuck on extracting all the data from them. I hope this makes my problem clearer.
I am also confused about why, when the format of the file is known, I can successfully extract columns with the 'textscan' function, whereas when the format is unknown this function gives me rows instead of columns.
Cedric
Cedric on 13 Oct 2017
Edited: Cedric on 13 Oct 2017
You cannot actually extract specific columns without reading the whole file and parsing it to some extent. But you can tell some parsing functions to discard certain columns, so that they do not appear in the output.
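As one possible illustration of discarding columns at parse time, here is a minimal sketch using TEXTSCAN, where a `%*s` field is read and thrown away. The sample file contents, the value of nCols (assumed already known here), and the chosen colIds are all made up for the example.

```matlab
% Hypothetical sample file standing in for Info.txt (contents are made up).
fname = [tempname, '.txt'] ;
fid = fopen( fname, 'w' ) ;
fprintf( fid, 'V1 V2 V3 V4\nD1 D3 D1 D2\n1.5 6.52 2.0 7\n2.5 4.91 NaN 8\n' ) ;
fclose( fid ) ;

nCols  = 4 ;        % assumed known for this sketch
colIds = [2, 4] ;   % columns to keep

% One '%s' per kept column, '%*s' (read and discard) for all the others.
fmt = repmat( {'%*s'}, 1, nCols ) ;
fmt(colIds) = {'%s'} ;
fmt = strjoin( fmt, ' ' ) ;

fid = fopen( fname, 'r' ) ;
cols = textscan( fid, fmt ) ;   % one cell per kept column
fclose( fid ) ;
delete( fname ) ;

% Concatenate the kept columns, then split header text from numeric data.
kept   = [cols{:}] ;
header = kept(1:2,:) ;
data   = str2double( kept(3:end,:) ) ;
```

Everything is read as text first because the two header rows are not numeric; the conversion to double is applied only to the kept columns.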
The format of your file content is actually mostly known; the only thing that you don't know is the number of columns. In that case, the simplest way to determine the number of columns is to parse the first row and count its elements. This can be done using FGETL on an open file, or using pattern matching as in my solution (which works on the whole file content, read in a single access to the file).
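The FGETL variant could look like the minimal sketch below. The sample file is made up for the example, and the code assumes that columns are separated by whitespace.

```matlab
% Hypothetical sample file standing in for Info.txt (contents are made up).
fname = [tempname, '.txt'] ;
fid = fopen( fname, 'w' ) ;
fprintf( fid, 'V1 V2 V3 V4\nD1 D3 D1 D2\n1.5 6.52 2.0 7\n' ) ;
fclose( fid ) ;

% Read only the first line (returned without its newline character).
fid = fopen( fname, 'r' ) ;
firstLine = fgetl( fid ) ;
fclose( fid ) ;
delete( fname ) ;

% With no delimiter argument, strsplit splits on whitespace and collapses
% consecutive delimiters; strtrim avoids empty leading/trailing elements.
nCols = numel( strsplit( strtrim( firstLine ) ) ) ;
```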
If you only need specific columns, I'd advise simply indexing them, by replacing the last two lines of my original code with:
colIds = [2,7] ;
header = data(1:2,colIds) ;
data = str2double( data(3:end,colIds) ) ;
You can even wrap this in a small function:
function [data, header] = fileReadColumn( locator, colIds )
    % Return the selected columns of a space-delimited text file.
    % header: 2-by-n cell array of CHAR; data: numeric array.
    if nargin < 2
        colIds = ':' ;   % no selection given: return all columns
    end
    content = fileread( locator ) ;
    nCols = numel( strsplit( regexp( content, '[^\r\n]+', 'match', 'once' ), ' ')) ;
    data = reshape( regexp(content, '\s+', 'split'), nCols, [] ).' ;
    header = data(1:2,colIds) ;
    data = str2double( data(3:end,colIds) ) ;
end
Note that you cannot combine data and header items in a numeric array, as header items are CHAR. This is why I split the content into a cell array of header items and a numeric array of data. If you read everything using TEXTSCAN in one shot, I suspect that you were getting a cell array with everything stored as CHAR. The data part would have had to be converted to numeric afterwards; otherwise you could not use it (except maybe for directly re-exporting it to a text file). Also, to keep header and data together, you would have had to keep the numeric data in a cell array, which is not good (it doesn't allow vector computation).
The approach that I propose is pretty efficient, especially if you limit the conversion to double to a few selected columns using the colIds array in the code above.
If you use the function above, it can be called without specifying any columns (without the 2nd input argument), in which case it returns all columns.
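To tie this back to selecting columns by the label in the second header row, here is a minimal sketch of how the header cell array and the numeric data array can be used together. The header and data values are made up, shaped like the outputs described above, and the 'D1' label is taken from the question.

```matlab
% Hypothetical header (cell array of CHAR) and data (numeric array).
header = { 'V1', 'V2', 'V3', 'V4' ; ...
           'D1', 'D3', 'D1', 'D2' } ;
data   = [ 1.5, 6.52, 2.0, 7 ; ...
           2.5, 4.91, NaN, 8 ] ;

% Indices of the columns whose second header row is 'D1'.
colIds = find( strcmp( header(2,:), 'D1' ) ) ;

% The same indices select matching columns in both arrays.
selHeader = header(:,colIds) ;
selData   = data(:,colIds) ;

% Numeric storage allows vector computation, e.g. column means ignoring NaN.
colMeans = mean( selData, 1, 'omitnan' ) ;
```

Keeping the two arrays separate means the selection is a plain indexing operation, and the numeric part stays usable for computation.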
That works well. Thank you!
My pleasure!
