Efficient way to standardize large amounts of text
André Kucharzewski
on 19 Oct 2021
Commented: André Kucharzewski
on 24 Oct 2021
Hello,
I have a table with around 1 million rows. One column contains different types of strings that mix letters and numbers, for example:
abc_123
cdf_123
123_cdf
123 (abc)
There are around 120 different text formats, which repeat. Most of them can be converted to a standard format like aa_11. Any value that does not fit should get a standard "undef" value.
Any suggestions on how I can handle such a large dataset without a for loop that checks each of the 1 million cells?
Thanks in advance :)
Accepted Answer
Duncan Po
on 19 Oct 2021
You may be able to use patterns. For example, if the standard format is letters followed by an underscore followed by digits, you can detect that pattern for the whole array at once, with no loop:
>> x = ["abc_123", "cdf_123", "123_cdf", "123 (abc)"]; % create an example string array
>> matches(x, lettersPattern + "_" + digitsPattern) % check if the strings match the standard pattern
ans =
1×4 logical array
1 1 0 0
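Detection is only half of what was asked; the non-standard strings still need rewriting. The following is a minimal sketch, assuming patterns are available (R2020b or later, as in the answer above) and that each known format can be described by a regular expression. The two `regexprep` rules and the example values are hypothetical stand-ins for the ~120 real formats. Because those formats repeat, it works on `unique` values only and then expands the result back to every row, so the expensive step runs on a handful of strings rather than on 1 million.

```matlab
% Hypothetical example data standing in for the 1-million-row column.
% For a table, something like x = string(T.Code) would be used (T and Code are hypothetical names).
x = ["abc_123"; "cdf_123"; "123_cdf"; "123 (abc)"; "???"];

% Process only the distinct values; idx maps each row back to its distinct value
[u, ~, idx] = unique(x);

% Assumed rewrite rules (hypothetical; add one rule per known format):
%   "123_cdf"   -> "cdf_123"   digits_letters    -> letters_digits
%   "123 (abc)" -> "abc_123"   digits (letters)  -> letters_digits
u = regexprep(u, "^(\d+)_([A-Za-z]+)$", "$2_$1");
u = regexprep(u, "^(\d+) \(([A-Za-z]+)\)$", "$2_$1");

% Anything that still does not match the standard format becomes "undef"
isStd = matches(u, lettersPattern + "_" + digitsPattern);
u(~isStd) = "undef";

% Expand back to one value per original row
y = u(idx);
disp(y)
```

With this example data, `y` contains "abc_123", "cdf_123", "cdf_123", "abc_123" and "undef". Each `regexprep` call is vectorized over the string array, so adding more rules does not reintroduce a per-row loop.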