Search and Replace Text
Processing text data often involves finding and replacing substrings. There are several functions that find text and return different information: some functions confirm that the text exists, while others count occurrences, find starting indices, or extract substrings. These functions work on character vectors and string scalars, such as "yes", as well as character and string arrays, such as ["yes","no";"abc","xyz"]. In addition, you can use patterns to define rules for searching, such as one or more letter or digit characters.
Search for Text
To determine if text is present, use a function that returns logical values, like contains, startsWith, or endsWith. A logical value of 1 corresponds to true, and 0 corresponds to false.
txt = "she sells seashells by the seashore";
TF = contains(txt,"sea")
TF = logical
1
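The startsWith and endsWith functions return logical values in the same way, but test only the beginning or end of the text. The following is a minimal sketch that reuses the same sentence; the output variable names are illustrative.

```matlab
txt = "she sells seashells by the seashore";

% Test whether the text begins or ends with a given substring
startsWithShe = startsWith(txt,"she")    % logical 1
endsWithShore = endsWith(txt,"shore")    % logical 1

% "sea" occurs in the text, but not at the start
startsWithSea = startsWith(txt,"sea")    % logical 0
```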
Calculate how many times the text occurs using the count function.
n = count(txt,"sea")
n = 2
To locate where the text occurs, use the strfind function, which returns starting indices.
idx = strfind(txt,"sea")
idx = 1×2
11 28
Find and extract text using extraction functions, such as extract, extractBetween, extractBefore, or extractAfter.
mid = extractBetween(txt,"sea","shore")
mid = "shells by the sea"
Optionally, include the boundary text.
mid = extractBetween(txt,"sea","shore","Boundaries","inclusive")
mid = "seashells by the seashore"
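The extractBefore and extractAfter functions take a single boundary and return the text on one side of its first occurrence. As a short sketch using the same sentence (the variable names are illustrative):

```matlab
txt = "she sells seashells by the seashore";

% Text that precedes the first occurrence of " by"
before = extractBefore(txt," by")   % "she sells seashells"

% Text that follows the first occurrence of "by "
after = extractAfter(txt,"by ")     % "the seashore"
```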
Find Text in Arrays
The search and replacement functions can also find text in multi-element arrays. For example, look for color names in several song titles.
songs = ["Yellow Submarine"; "Penny Lane"; "Blackbird"];
colors = ["Red","Yellow","Blue","Black","White"];
TF = contains(songs,colors)
TF = 3x1 logical array
1
0
1
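Other search functions accept arrays in the same way. The sketch below, with illustrative variable names, counts color names per title and shows the IgnoreCase option of contains, which is useful because these searches are case sensitive by default.

```matlab
songs = ["Yellow Submarine"; "Penny Lane"; "Blackbird"];
colors = ["Red","Yellow","Blue","Black","White"];

% Total number of color-name occurrences in each title
nColors = count(songs,colors)                            % [1; 0; 1]

% A lowercase search term matches only when case is ignored
exactCase = contains(songs,"yellow")                     % [0; 0; 0]
anyCase = contains(songs,"yellow","IgnoreCase",true)     % [1; 0; 0]
```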
To list the songs that contain color names, use the logical TF array as indices into the original songs array. This technique is called logical indexing.
colorful = songs(TF)
colorful = 2x1 string
"Yellow Submarine"
"Blackbird"
Use the replace function to replace text in songs that matches elements of colors with the string "Orange".
replace(songs,colors,"Orange")
ans = 3x1 string
"Orange Submarine"
"Penny Lane"
"Orangebird"
Match Patterns
Since R2020b
In addition to searching for literal text, like “sea” or “yellow”, you can search for text that matches a pattern. There are many predefined patterns, such as digitsPattern to find numeric digits.
address = "123a Sesame Street, New York, NY 10128";
nums = extract(address,digitsPattern)
nums = 2x1 string
"123"
"10128"
For additional precision in searches, you can combine patterns. For example, locate words that start with the character “S”. Use a string to specify the “S” character, and lettersPattern to find additional letters after that character.
pat = "S" + lettersPattern;
StartWithS = extract(address,pat)
StartWithS = 2x1 string
"Sesame"
"Street"
For more information, see Build Pattern Expressions.
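As a further sketch of combining patterns, the example below reuses the same address. It assumes R2020b or later, and the variable names are illustrative. It joins two predefined patterns, restricts digitsPattern to a fixed number of digits, and passes a pattern to replace.

```matlab
address = "123a Sesame Street, New York, NY 10128";

% Digits immediately followed by letters
houseNumber = extract(address,digitsPattern + lettersPattern)   % "123a"

% Exactly five consecutive digits
zip = extract(address,digitsPattern(5))                         % "10128"

% A pattern can also be the search argument of replace
masked = replace(address,digitsPattern,"#")   % "#a Sesame Street, New York, NY #"
```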
See Also
contains | extract | count | pattern | replace | strfind