pattern
Patterns to search and match text
Description
A pattern defines rules for matching text with text-searching functions like contains, matches, and extract. You can build a pattern expression using pattern functions, operators, and literal text. For example, MATLAB® release names start with "R", followed by the four-digit year, and then either "a" or "b". Define a pattern to match the format of the release names:
pat = "R" + digitsPattern(4) + ("a"|"b");
Match that pattern in a string:
str = ["String was introduced in R2016b."
       "Pattern was added in R2020b."];
extract(str,pat)
ans = 2x1 string array
    "R2016b"
    "R2020b"
Creation
Patterns are composed of literal text and other patterns using the +, |, and ~ operators. You can also create common patterns using Object Functions, which use rules often associated with regular expressions:
- Character-Matching Patterns: Ranges of letters or digits, wildcards, or whitespaces, such as lettersPattern.
- Search Rules: How many times the pattern must occur, case sensitivity, optional patterns, and named expressions, such as asManyOfPattern and optionalPattern.
- Boundaries: Boundaries at the start or end of a run of specific characters, such as alphanumericBoundary. Boundary patterns can be negated using the ~ operator so that a match to the boundary prevents matching of their pattern expression.
- Pattern Organization: Define pattern structure and specify how pattern expressions are displayed, such as maskedPattern and namedPattern.
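As a rough illustration of how these operators compose, the following sketch combines literal text with pattern functions. The sample text and the variable names (animals, innerDigit, tf) are made up for this illustration and do not come from the original page.

```matlab
% Hypothetical sample text, chosen only for illustration.
txt = "Two cats and one dog";

% "|" matches either alternative; "+" appends another pattern.
% optionalPattern makes the trailing "s" optional.
pat = ("cat" | "dog") + optionalPattern("s");

% extract returns each matched substring in a string array.
animals = extract(txt, pat)

% "~" negates a boundary pattern: this matches a single digit
% that is not at the start of a run of digits.
innerDigit = ~digitBoundary + digitsPattern(1);
tf = contains("a123b", innerDigit)
```

Under these assumptions, animals should hold "cats" and "dog", because the optional "s" is consumed when it is present.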
The function pattern also creates pattern functions with the syntax pat = pattern(txt), where txt is literal text that pat matches. Pattern functions are useful for specifying pattern type for function argument validation. However, the pattern function is rarely needed for other cases because MATLAB text-matching functions accept text inputs.
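One way the argument-validation use case might look is sketched below. The function name hasMatch is hypothetical, and the sketch assumes that an arguments block converts text inputs to the pattern class in the same way pattern(txt) does.

```matlab
% hasMatch.m (hypothetical helper, not part of MATLAB)
function tf = hasMatch(txt, pat)
    arguments
        txt string    % text to search
        pat pattern   % assumed: text inputs are converted to a pattern
    end
    % contains accepts a pattern object as its second input.
    tf = contains(txt, pat);
end
```

With this helper, a call such as hasMatch("R2020b", digitsPattern(4)) passes a pattern object directly, while hasMatch("R2020b", "R20") relies on the assumed conversion from text.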
Object Functions
Search Text
contains | Determine if pattern is in strings
matches | Determine if pattern matches strings
count | Count occurrences of pattern in strings
endsWith | Determine if strings end with pattern
startsWith | Determine if strings start with pattern
Edit Text
extract | Extract substrings from strings
replace | Find and replace one or more substrings
replaceBetween | Replace substrings between start and end points
split | Split strings at delimiters
erase | Delete substrings within strings
eraseBetween | Delete substrings between start and end points
extractAfter | Extract substrings after specified positions
extractBefore | Extract substrings before specified positions
extractBetween | Extract substrings between start and end points
insertAfter | Insert strings after specified substrings
insertBefore | Insert strings before specified substrings
Character-Matching Patterns
digitsPattern | Match digit characters
lettersPattern | Match letter characters
alphanumericsPattern | Match letter and digit characters
characterListPattern | Match characters from list
whitespacePattern | Match whitespace characters
wildcardPattern | Match as few characters of any type as possible
Search Rule Patterns
optionalPattern | Make pattern optional to match
possessivePattern | Match pattern without backtracking
caseSensitivePattern | Match pattern with case sensitivity
caseInsensitivePattern | Match pattern regardless of case
asFewOfPattern | Match pattern as few times as possible
asManyOfPattern | Match pattern as many times as possible
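A minimal sketch of two of the search rules listed above, using made-up sample text. The variable names tf, colorPat, and n are illustrative only.

```matlab
% caseInsensitivePattern ignores letter case when matching.
txt = ["MATLAB" "Matlab" "matlab" "Simulink"];
pat = caseInsensitivePattern("matlab");
tf = matches(txt, pat)

% optionalPattern lets the "u" be present or absent,
% so both spellings are counted.
colorPat = "colo" + optionalPattern("u") + "r";
n = count("color and colour", colorPat)
```

Here tf should be true for the first three elements only, and n should be 2.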
Boundary Patterns
alphanumericBoundary | Match boundary between alphanumeric and non-alphanumeric characters
digitBoundary | Match boundary between digit characters and nondigit characters
letterBoundary | Match boundary between letter characters and nonletter characters
whitespaceBoundary | Match boundary between whitespace characters and non-whitespace characters
lineBoundary | Match start or end of line
textBoundary | Match start or end of text
lookAheadBoundary | Match boundary before specified pattern
lookBehindBoundary | Match boundary following specified pattern
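To show what a boundary pattern adds, this sketch compares counting a literal substring with counting it only as a standalone word. The sample text and the variable names nAny and nWord are assumptions made for illustration.

```matlab
% Hypothetical sample text containing "cat" inside longer words.
txt = "The cat concatenated the catalog";

% Literal text matches anywhere, including inside other words.
nAny = count(txt, "cat")

% letterBoundary on both sides restricts matches to places where
% "cat" is not adjacent to another letter.
wordPat = letterBoundary + "cat" + letterBoundary;
nWord = count(txt, wordPat)
```

With this text, nAny should be 3 and nWord should be 1, since only the standalone word has a letter boundary on both sides.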
Regular Expression Patterns
regexpPattern | Pattern that matches specified regular expression
Pattern Organization
maskedPattern | Pattern with specified display name
namedPattern | Designate named pattern
Examples
Search Text Using Patterns
lettersPattern is a typical character-matching pattern that matches letter characters. Create a pattern that matches one or more letter characters.
txt = ["This" "is a" "1x6" "string" "array" "."];
pat = lettersPattern;
Use contains to determine if characters matched by pat are present in each string. The output logical array shows that the first five of the strings in txt contain letters, but the sixth string does not.
contains(txt,pat)
ans = 1x6 logical array
   1   1   1   1   1   0
Determine if text starts with the specified pattern. The output logical array shows that four of the strings in txt start with letters, but two strings do not.
startsWith(txt,pat)
ans = 1x6 logical array
   1   1   0   1   1   0
Determine if the string fully matches the specified pattern. The output logical array shows which of the strings in txt contain nothing but letters.
matches(txt,pat)
ans = 1x6 logical array
   1   0   0   1   1   0
Count the number of times a pattern matched. The output numerical array shows how many times lettersPattern matched in each element of txt. Note that lettersPattern matches one or more letters, so a group of consecutive letters is a single match.
count(txt,pat)
ans = 1×6
     1     2     1     1     1     0
Edit Text Using Patterns
digitsPattern is a typical character-matching pattern that matches digit characters. Create a pattern that matches digit characters.
txt = ["1 fish" "2 fish" "[1,0,0] fish" "[0,0,1] fish"];
pat = digitsPattern;
Use replace to edit pieces of text that match the pattern.
replace(txt,pat,"#")
ans = 1x4 string
    "# fish"    "# fish"    "[#,#,#] fish"    "[#,#,#] fish"
Create a new piece of text by inserting an "!" character after matched digits.
insertAfter(txt,pat,"!")
ans = 1x4 string
    "1! fish"    "2! fish"    "[1!,0!,0!] fish"    "[0!,0!,1!] fish"
Patterns can be created using the OR operator, |, with text. Erase text matched by the specified pattern.
txt = erase(txt,","|"]"|"[")
txt = 1x4 string
    "1 fish"    "2 fish"    "100 fish"    "001 fish"
Extract pat from the new text.
extract(txt,pat)
ans = 1x4 string
    "1"    "2"    "100"    "001"
Count Characters in Text
Use patterns to count the occurrences of individual characters in a piece of text.
txt = "She sells sea shells by the sea shore.";
Create pat as a pattern object that matches individual letters using alphanumericsPattern. Extract the pattern.
pat = alphanumericsPattern(1);
letters = extract(txt,pat);
Display a histogram of the number of occurrences of each letter.
letters = lower(letters);
letters = categorical(letters);
histogram(letters)
Hide Details when Displaying Complicated Patterns
Use maskedPattern to display a variable in place of a complicated pattern expression.
Build a pattern that matches simple arithmetic expressions composed of numbers and arithmetic operators.
mathSymbols = asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)
mathSymbols = pattern
  Matching:
    asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)
Build a pattern that matches arithmetic expressions with whitespaces between characters using mathSymbols.
longExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
longExpressionPat = pattern
  Matching:
    asManyOfPattern(asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1) + whitespacePattern) + asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)
The displayed pattern expression is long and difficult to read. Use maskedPattern to display the variable name, mathSymbols, in place of the pattern expression.
mathSymbols = maskedPattern(mathSymbols);
shortExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
shortExpressionPat = pattern
  Matching:
    asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
  Show all details
Create a string containing some arithmetic expressions, and then extract the pattern from the text.
txt = "What is the answer to 1 + 1? Oh, I know! 1 + 1 = 2!";
arithmetic = extract(txt,shortExpressionPat)
arithmetic = 2x1 string
    "1 + 1"
    "1 + 1 = 2"
Specify Names and Descriptions for Complicated Patterns
Create a pattern from two named patterns. Naming patterns adds context to the display of the pattern.
Build two patterns: one that matches words that begin and end with the letter D, and one that matches words that begin and end with the letter R.
dWordsPat = letterBoundary + caseInsensitivePattern("d" + lettersPattern + "d") + letterBoundary;
rWordsPat = letterBoundary + caseInsensitivePattern("r" + lettersPattern + "r") + letterBoundary;
Build a pattern using the named patterns that finds a word that starts and ends with D followed by a word that starts and ends with R.
dAndRWordsPat = dWordsPat + whitespacePattern + rWordsPat
dAndRWordsPat = pattern
  Matching:
    letterBoundary + caseInsensitivePattern("d" + lettersPattern + "d") + letterBoundary + whitespacePattern + letterBoundary + caseInsensitivePattern("r" + lettersPattern + "r") + letterBoundary
This pattern is hard to read and does not convey much information about its purpose. Use namedPattern to designate the patterns as named patterns that display specified names and descriptions in place of the pattern expressions.
dWordsPat = namedPattern(dWordsPat,"dWords","Words that start and end with D");
rWordsPat = namedPattern(rWordsPat,"rWords","Words that start and end with R");
dAndRWordsPat = dWordsPat + whitespacePattern + rWordsPat
dAndRWordsPat = pattern
  Matching:
    dWords + whitespacePattern + rWords
  Using named patterns:
    dWords: Words that start and end with D
    rWords: Words that start and end with R
  Show more details
Create a string and extract the text that matches the pattern.
txt = "Dad, look at the divided river!";
words = extract(txt,dAndRWordsPat)
words = "divided river"
Match Email Addresses
Build an easy-to-read pattern to match email addresses.
Email addresses follow the structure username@domain.TLD, where username and domain are made up of identifiers separated by periods. Build a pattern that matches identifiers composed of any combination of alphanumeric characters and "_" characters. Use maskedPattern to name this pattern identifier.
identifier = asManyOfPattern(alphanumericsPattern(1) | "_", 1);
identifier = maskedPattern(identifier);
Build patterns to match domains and subdomains composed of identifiers. Create a pattern that matches TLDs from a specified list.
subdomain = asManyOfPattern(identifier + ".") + identifier;
domainName = namedPattern(identifier,"domainName");
tld = "com" | "org" | "gov" | "net" | "edu";
Build a pattern for matching the local part of an email, which matches one or more identifiers separated by periods. Build a pattern for matching the domain, TLD, and any potential subdomains by combining the previously defined patterns. Use namedPattern to assign each of these patterns to a named pattern.
username = asManyOfPattern(identifier + ".") + identifier;
domain = optionalPattern(namedPattern(subdomain) + ".") + ...
    domainName + "." + ...
    namedPattern(tld);
Combine all of the patterns into a single pattern expression. Use namedPattern to assign username, domain, and emailPattern to named patterns.
emailAddress = namedPattern(username) + "@" + namedPattern(domain);
emailPattern = namedPattern(emailAddress)
emailPattern = pattern
  Matching:
    emailAddress: username + "@" + domain
  Using named patterns:
    emailAddress : username + "@" + domain
    username     : asManyOfPattern(identifier + ".") + identifier
    domain       : optionalPattern(subdomain + ".") + domainName + "." + tld
    subdomain    : asManyOfPattern(identifier + ".") + identifier
    domainName   : identifier
    tld          : "com" | "org" | "gov" | "net" | "edu"
  Show all details
Create a string that contains an email address, and then extract the pattern from the text.
txt = "You can reach me by email at John.Smith@department.organization.org";
extract(txt,emailPattern)
ans = "John.Smith@department.organization.org"
Named patterns allow dot-indexing in order to access named subpatterns. Use dot-indexing to assign a specific value to the named pattern domain.
emailPattern.emailAddress.domain = "mathworks.com"
emailPattern = pattern
  Matching:
    emailAddress: username + "@" + domain
  Using named patterns:
    emailAddress: username + "@" + domain
    username    : asManyOfPattern(identifier + ".") + identifier
    domain      : "mathworks.com"
  Show all details