Removing commas between columns in text data

Question

0 votes

I have a txt file which is the output of a lemmatizer, in the form

Sometimes, ,, I, use, commas, .
I, like, writing, ,, I, like, reading

How can I read it into a tokenizedDocument while deleting the unnecessary commas between tokens? A simple approach would be

test=readlines('/path/to/file.txt')
test=strrep(test,',','')
test=tokenizedDocument(test)

but it would also remove the commas that are part of the original text, whereas I'd like to preserve punctuation.

Answer 1

Walter Roberson on 16 Oct 2021

2 votes

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, {'(?<=[^,]),\s', '\s*,,', '\s+\.'}, {' ', ',', '.'})
test = 2×1 cell array
    {'Sometimes, I use commas.'      }
    {'I like writing, I like reading'}

Notice that periods need a rule of their own. 'use, commas' should almost certainly become 'use commas', so a comma followed by a space is replaced with a single space. Applied to 'commas, .', that same rule yields 'commas .', which leaves an unwanted space before the period; the third pattern removes it. The second pattern collapses the ' ,,' left behind by a lemmatized comma into a single ','.

To put it another way, simply deleting every comma-space pair is not an option: that would give the right result between 'commas' and the period, but it would merge 'use, commas' into 'usecommas'.
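To connect this back to the original question, the same three rules can be applied to the lines read from the file before building the documents. The following is only a minimal sketch: it assumes R2020b or later (for readlines) and Text Analytics Toolbox (for tokenizedDocument), and the temporary file and the variable names txt and cleaned are placeholders for illustration rather than part of the answer above.

```matlab
% Write the two sample lines to a temporary file
% (a stand-in for the real lemmatizer output; the path is hypothetical).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'Sometimes, ,, I, use, commas, .', ...
                     'I, like, writing, ,, I, like, reading');
fclose(fid);

% Read one string per line and drop empty lines,
% such as the one produced by the trailing newline.
txt = readlines(fname);
txt = txt(strlength(txt) > 0);

% Apply the three rules in order:
%   1. comma + whitespace after a non-comma  -> single space
%   2. optional whitespace + double comma    -> single comma
%   3. whitespace before a period            -> period only
patterns     = {'(?<=[^,]),\s', '\s*,,', '\s+\.'};
replacements = {' ', ',', '.'};
cleaned = regexprep(txt, patterns, replacements);

% Let tokenizedDocument tokenize the restored text.
docs = tokenizedDocument(cleaned);

delete(fname);
```

Here cleaned is a string array rather than a cell array, because readlines returns strings; regexprep accepts either.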

1 comment

Kim Maria Damiani on 16 Oct 2021

Thank you!

Answer 2

Chunru on 16 Oct 2021

0 votes

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, ',\s', ' ')
test = 2×1 cell array
    {'Sometimes , I use commas .'     }
    {'I like writing , I like reading'}
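With this single rule every token, including punctuation, ends up separated by exactly one space. One possible follow-up, sketched here as an assumption rather than as part of the answer, is to split each line on whitespace and hand the tokens to tokenizedDocument as already-tokenized text, so that the lemmatizer's own token boundaries are kept. The variable names spaced and tokens are illustrative, and the 'TokenizeMethod','none' option requires Text Analytics Toolbox.

```matlab
test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};

% Same rule as above: every comma followed by whitespace becomes a space.
spaced = regexprep(test, ',\s', ' ');

% Split each line on whitespace; each cell then holds a 1-by-N string
% array of tokens (split returns a column, hence the transpose).
tokens = cellfun(@(s) split(string(s)).', spaced, 'UniformOutput', false);

% 'TokenizeMethod','none' tells tokenizedDocument that the input is
% already split into words, one cell per document.
docs = tokenizedDocument(tokens, 'TokenizeMethod', 'none');
```

This relies on no token containing whitespace itself; if the lemmatizer can emit such tokens, the regexprep approach in Answer 1 is the safer choice.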
