How to remove punctuation from an Arabic text file

Fateme Jalali on 31 Jul 2016
Edited: Thorsten on 1 Aug 2016
Hello, I have an Arabic string and want to discard all punctuation, keeping only the text and the white space between words. For example, this is my string: str='سلام. دوست خوب من!'. Can I change the code below to do it?
str= fileread('D:/docc111.txt');
str1 = regexprep(str,'\s+',' ');%replace enter with white space
%or str1 = regexprep(str,'[\n\r]+',' ')
%str1 = 'Hello, I need 1 MATLAB code to discard all punctuation, and signs from 9 text files.'
Lstr1=length(str1);
str_space='\s'; %String of characters
str_caps='[A-Z]';
str_ch='[a-z]';
str_nums='[0-9]';
ind_space=regexp(str1,str_space);%Match regular expression
ind_caps=regexp(str1,str_caps);
ind_chrs=regexp(str1,str_ch);
ind_nums=regexp(str1,str_nums);
mask=[ind_space ind_caps ind_chrs ind_nums];
num_str2=1:1:Lstr1;
num_str2(mask)=[];
str3=str1;
str3(num_str2)=[];
chars = [str3];
%insert space after first index and after last index in chars
charsWithWhitespace = [' ', chars(1:end), ' '];
newTest = sprintf(strrep(charsWithWhitespace, '\n', ' '));
fid = fopen('myySE1.txt','w');
fprintf(fid, '%s',charsWithWhitespace);
fclose(fid);
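
One possible direction, offered as a minimal sketch rather than a definitive fix: the code above keeps only characters matching [A-Z], [a-z], [0-9] and white space, so Arabic letters are discarded along with the punctuation. Instead of listing what to keep, the sketch below lists what to drop. The punctuation lists (asciiPunct, arabicPunct) and the variable names are assumptions introduced for illustration and would need extending for other symbols; it also assumes the text is already in a char vector with its Arabic characters intact.

```matlab
% Sketch only: the punctuation lists below are assumptions, not from the original post.
str = 'سلام. دوست خوب من!';

% ASCII punctuation to drop ('' is an escaped single quote inside a char literal).
asciiPunct = '.,;:!?"''()[]{}<>-_/\';

% Arabic-script punctuation by code point:
% U+060C comma, U+061B semicolon, U+061F question mark, U+06D4 full stop.
arabicPunct = char([1548 1563 1567 1748]);

% Keep every character that is not in either list.
isPunct = ismember(str, [asciiPunct arabicPunct]);
cleaned = str(~isPunct);

% Collapse any run of white space to a single space and trim the ends.
cleaned = strtrim(regexprep(cleaned, '\s+', ' '));
```

Because letters are never tested against a Latin-only pattern, the same lines should work for mixed Arabic and English text. When the input comes from a file, the file has to be read with the encoding it was saved in; otherwise the Arabic characters are already damaged before any filtering happens.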

Answers (1)

Walter Roberson on 31 Jul 2016
