How to convert text line into numbers

Wisam on 21 Sep 2014
Commented: Wisam on 22 Sep 2014
I am trying to read this text and put it in a vector. Some of the elements must be repeated according to the number before the * symbol; for example, the first five elements should have a value of 10, and so on:
5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57
perm_i=[];
fid=fopen(file_name_out);
textscan(fid, '%s', 1, 'delimiter', '\n', 'headerlines', row_permi_start-1);
for j=1:row_permi_end-row_permi_start
    c=textscan(fid, '%s', 1, 'delimiter', '\n');
    astring=cell2mat(c{1});
    ind1=find(astring=='*');
    ind_temp=[];
    if ~isempty(ind1)
        for k=1:length(ind1)
            indspace=find(astring==' ');
            indspace1=indspace(indspace<ind1(k));
            display (indspace);
            if isempty(indspace1)
                indspace1=0;
            else
                indspace1=indspace1(end);
            end
            display (indspace1);
            num_loc(k)=length(indspace1)+1;
            indspace1=indspace1(end);
            display (indspace1);
            num_1(k)=str2double(astring(indspace1+1:ind1(k)-1))-1;
            ind_temp=[ind_temp,indspace1+1:ind1(k)];
            display (num_loc);
        end
        astring(ind_temp)=[];
    end
    acell=textscan(astring,'%f');
    var_temp=acell{1,1};
    if ~isempty(ind1)
        var_temp_1=var_temp;
        for k=1:length(ind1)
            var_temp(num_loc(k)+num_1(k) :end+num_1(k))=var_temp(num_loc(k):end);
            var_temp(num_loc(k)+1:num_loc(k)+num_1(k))=var_temp(num_loc(k));
            display (var_temp);
            num_loc=num_loc+num_1(k);
        end
  2 comments
John on 21 Sep 2014
I have not tried the above solutions/suggestions, but this is a natural job for regular expressions. MATLAB, the most versatile numerical computing package, provides extensive regular expression (regex) functionality. It does not have the utility of Perl, but there are enough regex varieties in MATLAB to collapse those loops into a few lines of regex code.
To get started with regex in MATLAB, the functions you will most likely need to craft a concise solution are regexp and regexprep.
You will have to do a bit of reading and practising to get the hang of it. To give you an idea of how regex can help in parsing and manipulating the string, consider these few lines of code, which give you the starting indices of the tokens you would probably want to manipulate, whether or not they have a multiplier prepended:
myString = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57'
regexQuery = '((\d+)\*)?\d+(\.\d+)?'
indices = regexp(myString, regexQuery)
% indices = 1 6 14 19 27 35 41 49 57 63 71
The elements of indices point to the starting indices of the tokens you would be interested in. To achieve the effect of repeating numbers prepended with multipliers, you would have to look into the more advanced features of 'regexprep'.
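As a small sketch of the same idea, regexp can also return the matched pieces themselves alongside their starting indices by requesting the 'start' and 'match' outputs. The shortened input string and the variable names startIdx and pieces are only for illustration:

```matlab
myString = '5*10 6*10.65 4.82 5*10.65';
regexQuery = '((\d+)\*)?\d+(\.\d+)?';
% Outputs come back in the order the options are listed.
[startIdx, pieces] = regexp(myString, regexQuery, 'start', 'match');
% startIdx is [1 6 14 19]
% pieces is {'5*10', '6*10.65', '4.82', '5*10.65'}
```

Each element of pieces still carries its optional multiplier, so it has to be expanded in a later step.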
Regular expressions, rather than code that parses string tokens by hand, are more likely to give you graceful solutions that are maintainable and readable.
You may find MATLAB's string functions useful as well.
Wisam on 22 Sep 2014
I appreciate your support, thanks

Accepted Answer

Guillaume on 21 Sep 2014
Edited: Guillaume on 22 Sep 2014
I've not looked at your code (which is badly formatted), but to convert your example into a vector of numbers I would do:
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
v = [];
for group = strsplit(str) % split string at spaces into groups
    groupparts = strsplit(group{1}, '*'); % split group at * (if no *, no split)
    if numel(groupparts) == 1
        v = [v str2num(groupparts{1})];
    else
        v = [v repmat(str2num(groupparts{2}), 1, str2num(groupparts{1}))];
    end
end
Or as I said in my comment to John's answer, if you want to use a regexprep one liner:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
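To see what the one-liner does, it may help to split it into two steps on a shorter input. This is only a sketch: the shortened string and the variable name expanded are for illustration, and sscanf is swapped in for str2num in the parsing step as one possible alternative.

```matlab
str = '5*10 4.82 2*10.65';
% Step 1: each 'n*value' group is replaced by 'value ' repeated n times.
% Inside ${...}, $1 and $2 are the captured multiplier and value (as text).
expanded = regexprep(str, '([^ ]+)\*([^ ]+)', ...
    '${repmat([$2 '' ''], 1, str2double($1))}');
% Step 2: expanded now holds only plain numbers separated by spaces,
% so it can be parsed; sscanf returns a column, hence the transpose.
v = sscanf(expanded, '%f').';
```

Groups without a * are not touched by the pattern, so they pass through to the parsing step unchanged.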
  1 comment
Wisam on 22 Sep 2014
great, it works thanks

More Answers (1)

John on 21 Sep 2014
Edited: John on 21 Sep 2014
As mentioned before, regular expressions provide more intuitive solutions (once you get the hang of the basics). This short snippet below, which returns the answer as a numeric vector, seems to work:
inputStr = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
% 'pre' captures the multiplier only when it is followed by '*';
% 'post' captures the value itself
regexQuery = '(?:(?<pre>\d+)\*)?(?<post>\d+(\.\d+)?)'
matches = regexp(inputStr, regexQuery, 'names')
res = ''
for i = 1:size(matches, 2)
    if (isempty(matches(i).pre))
        matches(i).pre = '1'; % no multiplier means the value appears once
    end
    res = [res repmat([' ' matches(i).post ' '], [1 str2num(matches(i).pre)])];
end
res = str2num(res)
It uses regexp once and feeds the results into a simple loop that builds up the output string. I would consider this a crude solution (if it actually works :-) ) with a lot of superfluous code. My guess is that exploiting named captures and the command substitution functionality in regexprep could collapse all that into 2 or 3 commands.
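One possible way to shorten this, sketched under the assumption that repelem is available (it was introduced in R2015a, so it is newer than this thread), is to convert the named captures to numbers in one go. The shortened input and the variable names m, counts and values are only for illustration:

```matlab
inputStr = '5*10 6*10.65 4.82 12.62';
% 'pre' is the optional multiplier before '*', 'post' is the value.
m = regexp(inputStr, '(?:(?<pre>\d+)\*)?(?<post>\d+(?:\.\d+)?)', 'names');
counts = str2double({m.pre});     % NaN where no multiplier was present
counts(isnan(counts)) = 1;        % a bare value appears once
values = str2double({m.post});
res = repelem(values, counts);    % assumes R2015a or later
```

This avoids building an intermediate string, at the cost of depending on a newer function.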
  1 comment
Guillaume on 22 Sep 2014
Edited: Guillaume on 22 Sep 2014
I would argue that regular expressions are overkill in this case, considering you only need two strsplit calls: one to break the string at every space and one to break each resulting piece at the '*'.
You could indeed do it with a single-line regexprep, but this involves a dynamic expression in the replacement string, which is not particularly cheap in terms of computation time (and not particularly easy to comprehend). For the record, the one-liner is:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
edit: On the other hand the regexprep is much faster than my strsplit solution.
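A rough way to check this kind of timing claim on your own machine is timeit (available since R2013b). The sketch below is only an illustration: the function name compareExpanders and its two helper functions are hypothetical wrappers around the two approaches from this thread, and the numbers will vary with MATLAB version and input size.

```matlab
function [tRegexprep, tStrsplit] = compareExpanders()
% compareExpanders  Hypothetical benchmark; save as compareExpanders.m.
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';

% Both approaches should produce the same vector before timing them.
assert(isequal(expandWithRegexprep(str), expandWithStrsplit(str)));

% timeit calls each handle several times and returns a typical run time in seconds.
tRegexprep = timeit(@() expandWithRegexprep(str));
tStrsplit  = timeit(@() expandWithStrsplit(str));
fprintf('regexprep: %.3g s, strsplit: %.3g s\n', tRegexprep, tStrsplit);
end

function v = expandWithRegexprep(str)
% The regexprep one-liner from this thread.
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', ...
    '${repmat([$2 '' ''], 1, str2double($1))}'));
end

function v = expandWithStrsplit(str)
% The strsplit loop from this thread.
v = [];
for group = strsplit(str)
    parts = strsplit(group{1}, '*');
    if numel(parts) == 1
        v = [v str2num(parts{1})];
    else
        v = [v repmat(str2num(parts{2}), 1, str2num(parts{1}))];
    end
end
end
```

Calling compareExpanders with no arguments prints both timings; no particular result is assumed here.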
