Effacer les filtres
Effacer les filtres

Removing specific characters from string in nested cells

9 vues (au cours des 30 derniers jours)
Bob Thompson
Bob Thompson le 13 Juin 2018
Commenté : Stephen23 le 30 Déc 2022
I have a series of strings which are contained within a nested cell array (because regexp loves to nest cells), and I would like to remove any non numeric or white space characters from them so that I can convert them to doubles, namely astrick.
I'm looking for the least painful way of removing any of these special characters from all strings. I do not have a sample file to attach, sorry, but I have dictated the shape of a sample array below.
X == 1x1 cell
X{1} == 1x1 cell (because regexp can't help itself apparently)
X{1}{1} = {'1234., ';'12.,* ';'1234., ','123.,* ',' 321.,* '};
  12 commentaires
Bob Thompson
Bob Thompson le 15 Juin 2018
Ah, I see. It doesn't appear in regexp.m comments, which is where I was looking.
Stephen23
Stephen23 le 15 Juin 2018
@Bob Nbob: you are right, it does not appear in the Mfile help. I notice that many other useful regular expression features also do not appear in the Mfile help: notably missing are dynamic expressions, lookaround operators, and named capture.
Both the inbuilt help and the page I linked to give a very useful introduction, and explain all features of regular expressions in MATLAB:
doc regexp
doc('Regular Expressions')

Connectez-vous pour commenter.

Réponse acceptée

Paolo
Paolo le 15 Juin 2018
Modifié(e) : Paolo le 15 Juin 2018
Perhaps this can easily be achieved in two steps. For your input:
1 ****TABLE1****
COLUMN1= 1.12, 2.23, 3.34, 4.45, 5.56, 6.67,
COLUMN2= 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
COLUMN3= 1.23, 0.34, 3.45, 5.78*, 6.54*, 8.23,
1 ****TABLE2****
Step 1. Find and replace all punctuation characters (let's say ",", "." and "*"). Live regex here .
data = fileread('CORR.txt');
expression_sub = '(?<=\d\.\d*\*?)([\*\.,])';
data = regexprep(data,expression_sub,'');
Data will now not contain those characters. Data is now:
' 1 ****TABLE1****
COLUMN1= 1.12 2.23 3.34 4.45 5.56 6.67
COLUMN2= 0.00 0.00 0.00 0.00 0.00 0.00
COLUMN3= 1.23 0.34 3.45 5.78 6.54 8.23
1 ****TABLE2****
'
Step 2. Match your data. Live regex here. The expression is greedy and will try to match as many digit, full stop, digits combinations as it can. Therefore you don't need to repmat your expression like you showed.
expression_match = '(?<=COLUMN[1,3]=\s)(\d.?\d*\s)*';
[tokens,match] = regexp(data_sub,expression_match,'tokens','match');
Matlab manipulation.
column1 = str2double(strsplit(cell2mat(tokens{1}),' '));
column3 = str2double(strsplit(cell2mat(tokens{2}),' '));
column1 =
1.1200 2.2300 3.3400 4.4500 5.5600 6.6700
column3 =
1.2300 0.3400 3.4500 5.7800 6.5400 8.2300
  2 commentaires
Bob Thompson
Bob Thompson le 18 Juin 2018
Ha, using (\d.?\d*\s)* is pretty slick. I'm a little sad I didn't think of that.
Stephen23
Stephen23 le 30 Déc 2022
@Bob Thompson: the dot needs to be escaped as well (otherwise it matches all characters), e.g.:
(\d+\.?\d*\s)*

Connectez-vous pour commenter.

Plus de réponses (1)

George Abrahams
George Abrahams le 30 Déc 2022
The others are right to fix the root problem causing the tricky nested cell array. Having said that, for future reference, my deepreplace function on File Exchange / GitHub would have done exactly what you requested.
x = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};
% Remove any character except for digits (0-9) and period (.)
match = regexpPattern('[^\d.]');
x = deepreplace(x,match,'');
% x = 1×1 cell array
% {1×1 cell}
% x{1} = 1×1 cell array
% {5×1 cell}
% x{1}{1} = 5×1 cell array
% {'1234.'}
% {'12.' }
% {'1234.'}
% {'12310'}
% {'321.' }

Produits

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by