How would I create a matrix from the following strings?

I am trying to write code with minimal preprocessing. I have many entries in a text file like this:
NODE{1 0 11 0 1.000000e+00 1.000000e-02 -1.000000e-02 1.500000e-03}
There are many rows of these in an Excel file. I want to read columns 1 and 6 to 8 of the values inside the brackets. What would be the best way to do this? I have tried fileread and textscan, but I have not got anywhere because of the text NODE in front of the brackets.

3 comments

Please attach your xlsx-file here.
Please learn to use comments instead of answers if you are just responding to a follow-up question or wish to make a comment.
I've moved the file, attaching it to this comment instead.
JLV on 26 Dec 2019
Apologies


Accepted Answer

Stephen23 on 26 Dec 2019
Edited: Stephen23 on 26 Dec 2019
Note that specifying a suitable format string is much more efficient than importing as character/string and then converting afterwards (like the other answers):
opt = {'HeaderLines',2,'CollectOutput',true};
fmt = ['NODE{',repmat('%f',1,8),'}'];
[fid,msg] = fopen('ThinPlateNodes.txt','rt');
assert(fid>=3,msg)
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{1}
Giving:
M =
1.00000 0.00000 11.00000 0.00000 1.00000 0.01000 -0.01000 0.00150
2.00000 0.00000 11.00000 0.00000 1.00000 0.01000 0.01000 0.00150
3.00000 0.00000 11.00000 0.00000 1.00000 0.01000 -0.01000 -0.00150
4.00000 0.00000 11.00000 0.00000 1.00000 0.01000 0.01000 -0.00150
5.00000 0.00000 11.00000 0.00000 1.00000 -0.01000 0.01000 0.00150
6.00000 0.00000 11.00000 0.00000 1.00000 -0.01000 0.01000 -0.00150
... lots of lines here
179495.00000 0.00000 15.00000 0.00000 1.00000 0.00952 -0.00960 -0.00824
179496.00000 0.00000 15.00000 0.00000 1.00000 0.00964 -0.00978 -0.00902
179497.00000 0.00000 15.00000 0.00000 1.00000 -0.00985 0.00144 -0.00838
179498.00000 0.00000 15.00000 0.00000 1.00000 0.00912 -0.00254 -0.00858
179499.00000 0.00000 15.00000 0.00000 1.00000 0.00979 0.00995 -0.00745
179500.00000 0.00000 15.00000 0.00000 1.00000 -0.00981 0.00984 -0.00805
And checking the size:
>> size(M)
ans =
78410 8

6 comments

JLV on 26 Dec 2019
This did not work for me, and I do not understand the use of the first line.
Stephen23 on 26 Dec 2019
Edited: Stephen23 on 26 Dec 2019
"This did not work for me..."
I tested that exact code on MATLAB R2009a, R2012b and R2015b, and also on Octave 4.4.0, using your provided text file (also attached to my answer) and got exactly the same output every time. It is quite universal code that should work with every MATLAB version since at least R2006a, if not earlier. It is also much more efficient than either of the other two answers.
Most likely you have not used the code I gave you, but as you did not show your attempt to use it, we cannot diagnose what you did wrong.
"...and I do not understand the use of the first line."
The first line defines a cell array of key-value pairs which are supplied to textscan via a comma-separated list: opt{:} expands to 'HeaderLines',2,'CollectOutput',true.
If you decided to remove the first line then of course the code will not work.
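To illustrate the comma-separated list, here is a minimal sketch. It assumes a small throwaway file written to a temporary location; the file contents and the names fname, A and B are made up for illustration and are not part of the original data. It reads the same file once with opt{:} and once with the options written out, then compares the results.

```matlab
% Write a hypothetical two-row sample file with two header lines.
fname = [tempname,'.txt'];
fid = fopen(fname,'wt');
fprintf(fid,'header line 1\nheader line 2\n');
fprintf(fid,'NODE{1 0 11 0 1.0e+00 1.0e-02 -1.0e-02 1.5e-03}\n');
fprintf(fid,'NODE{2 0 11 0 1.0e+00 1.0e-02 1.0e-02 1.5e-03}\n');
fclose(fid);

opt = {'HeaderLines',2,'CollectOutput',true};
fmt = ['NODE{',repmat('%f',1,8),'}'];

% Call 1: options supplied via the comma-separated list opt{:}.
fid = fopen(fname,'rt');
A = textscan(fid,fmt,opt{:});
fclose(fid);

% Call 2: the same options written out explicitly.
fid = fopen(fname,'rt');
B = textscan(fid,fmt,'HeaderLines',2,'CollectOutput',true);
fclose(fid);

delete(fname);          % remove the temporary file
same = isequal(A,B);    % the two calls are the same call, so this should be true
```

Keeping the options in one cell array makes them easy to reuse or change in a single place.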
+1 :)
JLV on 27 Dec 2019
I will try and diagnose the error when I am back in the office
The method above worked fine this time!
I assume you put the safeguard in to warn me if the file can't be opened.
Stephen23 on 1 Jan 2020
Edited: Stephen23 on 1 Jan 2020
"The method above worked fine this time!"
I'm glad. It will be more efficient than the other methods shown on this thread.
"I assume you put the safeguard in to warn me if the file can't be opened."
Yes. I recommend putting that assert statement (or something equivalent) after every fopen: it prints much more useful information than you would get otherwise when a file cannot be opened.
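As a small sketch of what that safeguard catches, assuming a file name that does not exist (the name below is hypothetical): fopen returns -1 on failure together with a message, and valid identifiers for files you open yourself start at 3, because 0, 1 and 2 are reserved for standard input, output and error.

```matlab
% Hypothetical file name that is assumed not to exist.
[fid,msg] = fopen('no_such_file_hypothetical_12345.txt','rt');

if fid < 3
    % fopen failed: msg holds the system-dependent error text that
    % assert(fid>=3,msg) would report when it throws an error.
    fprintf('fopen failed: %s\n',msg);
else
    fclose(fid);
end
```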


More Answers (2)

Bhaskar R on 26 Dec 2019
Edited: Bhaskar R on 26 Dec 2019
data = fileread('ThinPlateNodes.txt'); % read file(it is in text)
ext_data = regexp(data, '[^{\]]+(?=})', 'match'); % get data between {}
ext_data(1) = []; % first cell is not required so removed
num_data = zeros(length(ext_data), 8); % your complete data
for ii = 1:length(ext_data)
num_data(ii,:) = cellfun(@str2num, strsplit(cell2mat(ext_data(ii))));
end
% you can get any column of data from "num_data"
col_1 = num_data(:,1);
col_6_to_8 = num_data(:, 6:8);

4 comments

JLV on 26 Dec 2019
Thank you, now it is time to understand the code. I have never used some of those functions before!
I was trying to repeat the steps above for the following file, after understanding what each step does.
See my edited code below:
data = fileread('NodeNosatElementsUnedited.txt'); % Node Locations in space. Read file(it is in text).
ext_data = regexp(data, '[^{\]]+(?=})', 'match'); % Get data between {}
ext_data(1) = []; % First cell is not required so remove
num_data = zeros(length(ext_data), 12); % Preallocating memory
for ii = 1:length(ext_data)
Split(ii,:) = strsplit(cell2mat(ext_data(ii)));
NodesatElements(ii,:) = cellfun(@str2num, Split(ii,:),'UniformOutput',false);
end
NodesatElements = NodesatElements(:, [1 9 10 11 12]);
This seems to be computationally inefficient for a large number of rows, primarily because of the text in the array inside the brackets, and because I am unable to preallocate for some reason.
What would be the best way to speed up the code?
Stephen Cobeldick provided a sophisticated answer!
I am just adapting his answer to your context:
opt = {'HeaderLines',4,'CollectOutput',true};
fmt = '"TET4{%f %f %f %f %f %s %f %f %f %f %f %f}"';
[fid,msg] = fopen('NodeNosatElementsUnedited.txt','rt');
assert(fid>=3,msg)
C = textscan(fid,fmt,opt{:}); % open C in variable editor so that you can know extracted data C
fclose(fid);
NodesatElements = [C{1}(:,1),C{3}(:, 3:6)]; % this is final data
Stephen23 on 27 Dec 2019
Edited: Stephen23 on 27 Dec 2019
"What would be the best way to speed up the code."
  • By not importing numeric data as character/strings, and then awkwardly converting it to numeric afterwards.
  • By not using str2num (which hides slow eval inside).
  • By not expanding the output arrays nearly half-a-million times inside a loop.
  • By not using a cell array to store one numeric scalar per cell.
  • By not importing any data that you do not need.
For example, much like the efficient code I showed you earlier:
fmt = '"TET4{%f%*f%*f%*f%*f%*s%*f%*f%f%f%f%f}"'; % note the ignored fields!
opt = {'HeaderLines',4,'CollectOutput',true};
[fid,msg] = fopen('NodeNosatElementsUnedited.txt','rt');
assert(fid>=3,msg)
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{1};
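The ignored fields can be tried out on a small scale. This is a minimal sketch using a made-up character vector rather than the real file: the asterisk in %*f tells textscan to parse a field but not return it, so only the first and last numbers are kept.

```matlab
% Hypothetical input: four numbers, of which only the first and last are wanted.
str = '1 2 3 4';
C = textscan(str,'%f%*f%*f%f','CollectOutput',true);
M = C{1};   % 1-by-2 numeric row containing 1 and 4
```

Skipping unwanted fields at import time avoids allocating memory for data that would be discarded anyway.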


T = readtable('Path\your\txt\file\ThinPlateNodes.txt');
T.Varend = str2double(regexp(T{:,end},'(\-)?\d+(\.\d+e\-\d+)?(?=\}$)','match','once'));
T.Var0 = str2double(regexp(T{:,1},'\d+(?=$)','match','once'));
T = T(:,[end,6:7,end-1]);
T.Properties.VariableNames = {'LABEL','x','y','z'};

Release

R2019b
