pattern

Patterns to search and match text

Description

A pattern defines rules for matching text with text-searching functions like contains, matches, and extract. You can build a pattern expression using pattern functions, operators, and literal text. For example, MATLAB® release names start with "R", followed by the four-digit year, and then either "a" or "b". Define a pattern to match the format of the release names:

pat = "R" + digitsPattern(4) + ("a"|"b");

Match that pattern in a string:

str = ["String was introduced in R2016b."
       "Pattern was added in R2020b."];
extract(str,pat)
ans = 2x1 string array
    "R2016b"
    "R2020b"
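
For readers who know regular expressions, the same release-name format can also be written with regexpPattern, which is listed under Object Functions. The following is only a sketch: the regular expression and the variable names rePat and sameResult are assumptions made for illustration, not part of the original example.

```matlab
% Pattern built from pattern functions and operators, as in the description.
pat = "R" + digitsPattern(4) + ("a"|"b");

% Assumed-equivalent regular expression, wrapped as a pattern object.
rePat = regexpPattern("R\d{4}[ab]");

str = ["String was introduced in R2016b."
       "Pattern was added in R2020b."];

% Compare what the two patterns extract from the same text.
sameResult = isequal(extract(str,pat), extract(str,rePat));
```

If the two forms are equivalent for this input, sameResult is true. The pattern-function form is usually easier to read, while regexpPattern can help when you already have a working regular expression.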

Creation

Patterns are composed of literal text and other patterns using the +, |, and ~ operators. You also can create common patterns using Object Functions, which use rules often associated with regular expressions:

  • Character-Matching Patterns – Ranges of letters or digits, wildcards, or whitespaces, such as lettersPattern.

  • Search Rules – How many times the pattern must occur, case sensitivity, optional patterns, and named expressions, such as asManyOfPattern and optionalPattern.

  • Boundaries – Boundaries at the start or end of a run of specific characters, such as alphanumericBoundary. Boundary patterns can be negated using the ~ operator so that matching the boundary prevents matching of their pattern expression.

  • Pattern Organization – Define pattern structure and specify how pattern expressions are displayed, such as maskedPattern and namedPattern.
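
As an illustration of boundary patterns and their negation, the sketch below counts the text "cat" in three ways. The sample text and the variable names are assumptions chosen for this example.

```matlab
txt = "A cat, a catalog, and concat.";

% "cat" as a whole word: a letter boundary is required on both sides.
wholeWord = letterBoundary + "cat" + letterBoundary;

% "cat" followed by another letter: the negated boundary prevents a match
% wherever a letter boundary sits immediately after "cat".
prefixOnly = "cat" + ~letterBoundary;

nAll    = count(txt,"cat");       % every occurrence of the literal text
nWhole  = count(txt,wholeWord);   % standalone word only
nPrefix = count(txt,prefixOnly);  % "cat" at the start of a longer run of letters
```

With this input, nAll should be 3 (cat, catalog, concat), while nWhole and nPrefix should each be 1: only the standalone word has boundaries on both sides, and only "catalog" has a letter directly after "cat".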

The function pattern also creates pattern functions with the syntax, pat = pattern(txt), where txt is literal text that pat matches. Pattern functions are useful for specifying pattern type for function argument validation. However, the pattern function is rarely needed for other cases because MATLAB text-matching functions accept text inputs.
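
A minimal sketch of the argument-validation use case follows. The function name hasMatch is hypothetical, and the sketch assumes the function is saved in its own file, hasMatch.m.

```matlab
function tf = hasMatch(txt,pat)
% hasMatch  Hypothetical helper (not a MATLAB function) that reports
% whether each element of txt contains a match for pat.
    arguments
        txt string
        pat pattern   % literal text passed here is converted to a pattern
    end
    tf = contains(txt,pat);
end
```

Declaring pat as a pattern lets callers pass either a pattern expression, such as hasMatch("Released in R2020b","R" + digitsPattern(4)), or plain text, such as hasMatch("Released in R2020b","R2020b").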

Object Functions

contains Determine if pattern is in strings
matches Determine if pattern matches strings
count Count occurrences of pattern in strings
endsWith Determine if strings end with pattern
startsWith Determine if strings start with pattern
extract Extract substrings from strings
replace Find and replace one or more substrings
replaceBetween Replace substrings between start and end points
split Split strings at delimiters
erase Delete substrings within strings
eraseBetween Delete substrings between start and end points
extractAfter Extract substrings after specified positions
extractBefore Extract substrings before specified positions
extractBetween Extract substrings between start and end points
insertAfter Insert strings after specified substrings
insertBefore Insert strings before specified substrings
digitsPattern Match digit characters
lettersPattern Match letter characters
alphanumericsPattern Match letter and digit characters
characterListPattern Match characters from list
whitespacePattern Match whitespace characters
wildcardPattern Matches as few characters of any type
optionalPattern Make pattern optional to match
possessivePattern Match pattern without backtracking
caseSensitivePattern Match pattern with case sensitivity
caseInsensitivePattern Match pattern regardless of case
asFewOfPattern Match pattern as few times as possible
asManyOfPattern Match pattern as many times as possible
alphanumericBoundary Match boundary between alphanumeric and non-alphanumeric characters
digitBoundary Match boundary between digit characters and nondigit characters
letterBoundary Match boundary between letter characters and nonletter characters
whitespaceBoundary Match boundary between whitespace characters and non-whitespace characters
lineBoundary Match start or end of line
textBoundary Match start or end of text
lookAheadBoundary Match boundary before specified pattern
lookBehindBoundary Match boundary following specified pattern
regexpPattern Pattern that matches specified regular expression
maskedPattern Pattern with specified display name
namedPattern Designate named pattern

Examples

lettersPattern is a typical character-matching pattern that matches letter characters. Create a pattern that matches one or more letter characters.

txt = ["This" "is a" "1x6" "string" "array" "."];
pat = lettersPattern;

Use contains to determine if characters matched by pat are present in each string. The output logical array shows that the first five of the strings in txt contain letters, but the sixth string does not.

contains(txt,pat)
ans = 1x6 logical array
   1   1   1   1   1   0

Determine if text starts with the specified pattern. The output logical array shows that four of the strings in txt start with letters, but two strings do not.

startsWith(txt,pat)
ans = 1x6 logical array
   1   1   0   1   1   0

Determine if the string fully matches the specified pattern. The output logical array shows which of the strings in txt contain nothing but letters.

matches(txt,pat)
ans = 1x6 logical array
   1   0   0   1   1   0

Count the number of times a pattern matched. The output numeric array shows how many times lettersPattern matched in each element of txt. Note that lettersPattern matches one or more letters, so a group of consecutive letters is a single match.

count(txt,pat)
ans = 1×6
     1     2     1     1     1     0

digitsPattern is a typical character-matching pattern that matches digit characters. Create a pattern that matches digit characters.

txt = ["1 fish" "2 fish" "[1,0,0] fish" "[0,0,1] fish"];
pat = digitsPattern;

Use replace to edit pieces of text that match the pattern.

replace(txt,pat,"#")
ans = 1x4 string
    "# fish"    "# fish"    "[#,#,#] fish"    "[#,#,#] fish"

Create a new piece of text by inserting an "!" character after matched digits.

insertAfter(txt,pat,"!")
ans = 1x4 string
    "1! fish"    "2! fish"    "[1!,0!,0!] fish"    "[0!,0!,1!] fish"

Patterns can be created using the OR operator, |, with text. Erase text matched by the specified pattern.

txt = erase(txt,"," | "]" | "[")
txt = 1x4 string
    "1 fish"    "2 fish"    "100 fish"    "001 fish"

Extract pat from the new text.

extract(txt,pat)
ans = 1x4 string
    "1"    "2"    "100"    "001"

Use patterns to count the occurrences of individual characters in a piece of text.

txt = "She sells sea shells by the sea shore.";

Create pat as a pattern object that matches individual letters using alphanumericsPattern. Extract the pattern.

pat = alphanumericsPattern(1);
letters = extract(txt,pat);

Display a histogram of the number of occurrences of each letter.

letters = lower(letters);
letters = categorical(letters);
histogram(letters)

Figure: categorical histogram of the number of occurrences of each letter.

Use maskedPattern to display a variable in place of a complicated pattern expression.

Build a pattern that matches simple arithmetic expressions composed of numbers and arithmetic operators.

mathSymbols = asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)
mathSymbols = pattern
  Matching:
    asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)

Build a pattern that matches arithmetic expressions with whitespaces between characters using mathSymbols.

longExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
longExpressionPat = pattern
  Matching:
    asManyOfPattern(asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1) + whitespacePattern) + asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)

The displayed pattern expression is long and difficult to read. Use maskedPattern to display the variable name, mathSymbols, in place of the pattern expression.

mathSymbols = maskedPattern(mathSymbols);
shortExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
shortExpressionPat = pattern
  Matching:
    asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
  Show all details

Create a string containing some arithmetic expressions, and then extract the pattern from the text.

txt = "What is the answer to 1 + 1? Oh, I know! 1 + 1 = 2!";
arithmetic = extract(txt,shortExpressionPat)
arithmetic = 2x1 string
    "1 + 1"
    "1 + 1 = 2"

Create a pattern from two named patterns. Naming patterns adds context to the display of the pattern.

Build two patterns: one that matches words that begin and end with the letter D, and one that matches words that begin and end with the letter R.

dWordsPat = letterBoundary + caseInsensitivePattern("d" + lettersPattern + "d") + letterBoundary;
rWordsPat = letterBoundary + caseInsensitivePattern("r" + lettersPattern + "r") + letterBoundary;

Build a pattern from these two patterns that finds a word that starts and ends with D followed by a word that starts and ends with R.

dAndRWordsPat = dWordsPat + whitespacePattern + rWordsPat
dAndRWordsPat = pattern
  Matching:
    letterBoundary + caseInsensitivePattern("d" + lettersPattern + "d") + letterBoundary + whitespacePattern + letterBoundary + caseInsensitivePattern("r" + lettersPattern + "r") + letterBoundary

This pattern is hard to read and does not convey much information about its purpose. Use namedPattern to designate the patterns as named patterns that display specified names and descriptions in place of the pattern expressions.

dWordsPat = namedPattern(dWordsPat,"dWords","Words that start and end with D");
rWordsPat = namedPattern(rWordsPat,"rWords","Words that start and end with R");
dAndRWordsPat = dWordsPat + whitespacePattern + rWordsPat
dAndRWordsPat = pattern
  Matching:
    dWords + whitespacePattern + rWords
  Using named patterns:
    dWords: Words that start and end with D
    rWords: Words that start and end with R
  Show more details

Create a string and extract the text that matches the pattern.

txt = "Dad, look at the divided river!";
words = extract(txt,dAndRWordsPat)
words = "divided river"

Build an easy-to-read pattern to match email addresses.

Email addresses follow the structure username@domain.TLD, where username and domain are made up of identifiers separated by periods. Build a pattern that matches identifiers composed of any combination of alphanumeric characters and "_" characters. Use maskedPattern to name this pattern identifier.

identifier = asManyOfPattern(alphanumericsPattern(1) | "_", 1);
identifier = maskedPattern(identifier);

Build patterns to match domains and subdomains composed of identifiers. Create a pattern that matches TLDs from a specified list.

subdomain = asManyOfPattern(identifier + ".") + identifier;
domainName = namedPattern(identifier,"domainName");
tld = "com" | "org" | "gov" | "net" | "edu";

Build a pattern for matching the local part of an email, which matches one or more identifiers separated by periods. Build a pattern for matching the domain, TLD, and any potential subdomains by combining the previously defined patterns. Use namedPattern to assign each of these patterns to a named pattern.

username = asManyOfPattern(identifier + ".") + identifier;
domain = optionalPattern(namedPattern(subdomain) + ".") + ...
    domainName + "." + ...
    namedPattern(tld);

Combine all of the patterns into a single pattern expression. Use namedPattern to assign username, domain, and emailPattern to named patterns.

emailAddress = namedPattern(username) + "@" + namedPattern(domain);
emailPattern = namedPattern(emailAddress)
emailPattern = pattern
  Matching emailAddress:
    username + "@" + domain
  Using named patterns:
    emailAddress  : username + "@" + domain
      username    : asManyOfPattern(identifier + ".") + identifier
      domain      : optionalPattern(subdomain + ".") + domainName + "." + tld
        subdomain : asManyOfPattern(identifier + ".") + identifier
        domainName: identifier
        tld       : "com" | "org" | "gov" | "net" | "edu"
  Show all details

Create a string that contains an email address, and then extract the pattern from the text.

txt = "You can reach me by email at John.Smith@department.organization.org";
extract(txt,emailPattern)
ans = "John.Smith@department.organization.org"

Named patterns allow dot-indexing in order to access named subpatterns. Use dot-indexing to assign a specific value to the named pattern domain.

emailPattern.emailAddress.domain = "mathworks.com"
emailPattern = pattern
  Matching emailAddress:
    username + "@" + domain
  Using named patterns:
    emailAddress: username + "@" + domain
      username  : asManyOfPattern(identifier + ".") + identifier
      domain    : "mathworks.com"
  Show all details

Version History

Introduced in R2020b