OPS102 - Regular Expressions

2,217 bytes added, 11:40, 5 December 2023
=== Characters ===
In a regular expression (regexp), any character that doesn't otherwise have a special meaning matches that character. So the digit <code><nowiki>"5"</nowiki></code>, for example, matches the digit <code><nowiki>"5"</nowiki></code>; similarly <code><nowiki>"cat"</nowiki></code> matches the letters <code><nowiki>"c"</nowiki></code>, <code><nowiki>"a"</nowiki></code>, and <code><nowiki>"t"</nowiki></code> in sequence.
A backslash can be used to remove any special meaning which a character has. The period character <code><nowiki>"."</nowiki></code> is a type of wildcard (see below), so to search for a literal period, we place a backslash in front of it: <code><nowiki>"\."</nowiki></code>
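As a rough illustration of literal matching and backslash escaping, here is a minimal sketch using Python's <code>re</code> module. The choice of Python and the sample strings are assumptions made for this example; this page does not prescribe a particular tool.

```python
import re

# Ordinary characters match themselves, in sequence.
found_cat = re.search(r"cat", "concatenate") is not None   # letters are adjacent
found_split = re.search(r"cat", "c-a-t") is not None       # letters are separated

# An unescaped period is a wildcard; "\." matches only a literal period.
wildcard_hit = re.search(r"3.14", "3x14") is not None
escaped_miss = re.search(r"3\.14", "3x14") is not None
escaped_hit = re.search(r"3\.14", "pi is 3.14") is not None

print(found_cat, found_split)                    # True False
print(wildcard_hit, escaped_miss, escaped_hit)   # True False True
```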
=== Wildcards ===
A period <code><nowiki>"."</nowiki></code> will match '''any''' single character. Similarly, three periods <code><nowiki>"..."</nowiki></code> will match any three characters.
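The wildcard behaviour can be checked with a short sketch in Python's <code>re</code> module (an assumed tool for this example; the word lists are made up). <code>re.fullmatch</code> is used so that the whole string has to fit the pattern.

```python
import re

# "." matches any single character, so "c.t" accepts any middle character.
words = ["cat", "cot", "c t", "ct", "cart"]
one_wild = [w for w in words if re.fullmatch(r"c.t", w)]
print(one_wild)       # ['cat', 'cot', 'c t']

# "..." matches exactly three characters of any kind.
codes = ["abc", "a1!", "ab", "abcd"]
three_wild = [c for c in codes if re.fullmatch(r"...", c)]
print(three_wild)     # ['abc', 'a1!']
```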
=== Bracket Expressions / Character Classes ===
Bracket Expressions or Character Classes are contained in square brackets <code><nowiki>"[ ]"</nowiki></code>:
* A list of characters in square brackets will match any ''one'' character from the list of characters: <code><nowiki>"[abc]"</nowiki></code> will match <code><nowiki>"a"</nowiki></code>, <code><nowiki>"b"</nowiki></code>, or <code><nowiki>"c"</nowiki></code>
* A range of characters in square brackets, written as a starting character, a dash, and an ending character, will match any character in that range: <code><nowiki>"[0-9]"</nowiki></code> will match any one digit.
* There are some pre-defined named character classes. These are selected by specifying the name of the character class surrounded by colons and square brackets, placed within outer square brackets, like <code><nowiki>"[[:digit:]]"</nowiki></code>. The available names include:
** alnum - alphanumeric
** alpha - alphabetic characters
** lower - lowercase letters
** xdigit - hexadecimal digits (digits plus a-f and A-F)
* Ranges, lists, and named character classes may be combined - e.g., <code><nowiki>"[[:digit:]+-.,]"</nowiki></code>, <code><nowiki>"[[:digit:][:punct:]]"</nowiki></code>, and <code><nowiki>"[0-9_*]"</nowiki></code>
* To invert a character class, add a caret ^ character as the first character after the opening square bracket: <code><nowiki>"[^[:digit:]]"</nowiki></code> matches any non-digit character, and <code><nowiki>"[^:]"</nowiki></code> matches any character that is not a colon.
* To include a literal caret, place it anywhere except the first position in the character class (for example, at the end). To include a literal closing square bracket, place it at the start of the character class; a literal dash may be placed at the start or at the end.
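The list, range, combination, and inversion rules can be tried out with a small sketch. It assumes Python's <code>re</code> module, which does not support named classes such as <code><nowiki>[[:digit:]]</nowiki></code>, so explicit ranges stand in for them here. The sample characters and the <code>matching</code> helper are made up for this example.

```python
import re

# Python's re module has no POSIX named classes, so [0-9] stands in
# for [[:digit:]] throughout this sketch.
samples = ["a", "b", "d", "7", ":", "-", "^"]

def matching(pattern):
    """Return the samples that are exactly one character accepted by pattern."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(matching(r"[abc]"))     # list:      ['a', 'b']
print(matching(r"[0-9]"))     # range:     ['7']
print(matching(r"[0-9a-c]"))  # combined:  ['a', 'b', '7']
print(matching(r"[^0-9]"))    # inverted:  ['a', 'b', 'd', ':', '-', '^']
print(matching(r"[^:]"))      # not colon: ['a', 'b', 'd', '7', '-', '^']
print(matching(r"[-0-9^]"))   # dash first, caret last: ['7', '-', '^']
```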
=== Repetition ===
* A repeat count can be placed in curly brackets. It applies to the previous element: <code><nowiki>"x{3}"</nowiki></code> matches <code><nowiki>"xxx"</nowiki></code>
* A repeat can be a range, written as min,max in curly brackets: <code><nowiki>"x{2,5}"</nowiki></code> will match <code><nowiki>"xx"</nowiki></code>, <code><nowiki>"xxx"</nowiki></code>, <code><nowiki>"xxxx"</nowiki></code>, or <code><nowiki>"xxxxx"</nowiki></code>
* The maximum value in a range can be omitted: <code><nowiki>"x{2,}"</nowiki></code> will match two or more <code><nowiki>"x"</nowiki></code> characters in a row
* There are short forms for some commonly-used ranges:
** <code><nowiki>"*"</nowiki></code> is the same as <code><nowiki>"{0,}"</nowiki></code> (zero or more)
** <code><nowiki>"+"</nowiki></code> is the same as <code><nowiki>"{1,}"</nowiki></code> (one or more)
** <code><nowiki>"?"</nowiki></code> is the same as <code><nowiki>"{0,1}"</nowiki></code> (zero or one)
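The repeat counts and their short forms can be compared side by side. The following is a minimal sketch that assumes Python's <code>re</code> module; the <code>whole</code> helper is a name invented for this example.

```python
import re

def whole(pattern, text):
    """True when the entire text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Exact count and min,max range.
print(whole(r"x{3}", "xxx"), whole(r"x{3}", "xx"))          # True False
print([n for n in range(8) if whole(r"x{2,5}", "x" * n)])   # [2, 3, 4, 5]
print([n for n in range(8) if whole(r"x{2,}", "x" * n)])    # [2, 3, 4, 5, 6, 7]

# Each short form is checked against its curly-bracket equivalent
# on runs of zero to four "x" characters.
pairs = [(r"x*", r"x{0,}"), (r"x+", r"x{1,}"), (r"x?", r"x{0,1}")]
agreement = [
    all(whole(short, "x" * n) == whole(long_form, "x" * n) for n in range(5))
    for short, long_form in pairs
]
print(agreement)                                            # [True, True, True]
```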
=== Alternation ===
* The vertical bar indicates alternation - either the expression on the left or the right can be matched: <code><nowiki>"hot|cold" </nowiki></code> will match <code><nowiki>"hot" </nowiki></code> or <code><nowiki>"cold"</nowiki></code>
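A quick way to see alternation at work, sketched with Python's <code>re</code> module (an assumed tool; the sample phrases are made up):

```python
import re

drinks = ["hot tea", "cold brew", "warm milk"]

# "hot|cold" succeeds when either alternative appears anywhere in the text.
matched = [d for d in drinks if re.search(r"hot|cold", d)]
print(matched)   # ['hot tea', 'cold brew']
```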
=== Grouping ===
* Elements placed in parentheses are treated as a group, and can be repeated: <code><nowiki>"(na)* batman"</nowiki></code> will match <code><nowiki>"nananana batman"</nowiki></code> and <code><nowiki>"nananananananana batman"</nowiki></code>
* Grouping may also be used to limit alternation: <code><nowiki>"(fire|green)house"</nowiki></code> will match <code><nowiki>"firehouse"</nowiki></code> and <code><nowiki>"greenhouse"</nowiki></code>
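The effect of grouping on repetition and on alternation can be sketched as follows, assuming Python's <code>re</code> module. The word list is made up, and <code>re.fullmatch</code> is used so the whole string has to fit the pattern.

```python
import re

# A group can be repeated as a unit: "(na)*" is zero or more copies of "na".
repeat_ok = re.fullmatch(r"(na)* batman", "nananana batman") is not None
repeat_bad = re.fullmatch(r"(na)* batman", "nanan batman") is not None
print(repeat_ok, repeat_bad)   # True False ("nanan" is not whole copies of "na")

# Grouping limits alternation to the part inside the parentheses.
words = ["firehouse", "greenhouse", "fire", "house"]
grouped = [w for w in words if re.fullmatch(r"(fire|green)house", w)]
ungrouped = [w for w in words if re.fullmatch(r"fire|greenhouse", w)]
print(grouped)     # ['firehouse', 'greenhouse']
print(ungrouped)   # ['greenhouse', 'fire']
```

Without the parentheses, the vertical bar splits the entire expression, so the alternatives become <code>fire</code> and <code>greenhouse</code>.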
=== Anchors ===
* Anchors match '''locations''', not characters.
* A caret symbol will match the start of a line: <code><nowiki>"^[[:upper:]]"</nowiki></code> will match lines that start with an uppercase letter.
* A dollar sign will match the end of a line: <code><nowiki>"[[:punct:]]$"</nowiki></code> will match lines that end with a punctuation mark.
* The two characters may be used together: <code><nowiki>"cat"</nowiki></code> will match the word <code><nowiki>"cat"</nowiki></code> anywhere on a line, but <code><nowiki>"^cat$"</nowiki></code> will match only lines that contain nothing but the word <code><nowiki>"cat"</nowiki></code>. Likewise, <code><nowiki>"^[0-9.]+$"</nowiki></code> will match lines that are made up of only digits and dot characters.

== Examples ==
{|cellspacing="0" width="100%" cellpadding="5" border="1"
|-
!Description!!Regexp!!Matches!!Does not match!!Comments
|-
|Word||Hello||Hello there!<br>Hello, World!<br>He said, "Hello James", in a very threatening tone||Hi there<br>Hell Of a Day<br>h el lo||
|-
|IP Address (IPv4 dotted quad)||<code><nowiki>((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])</nowiki></code>|| || ||
|-
|Private IP Address||<code><nowiki>(10\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])|192\.168|172\.(1[6-9]|2[0-9]|3[0-1]))\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])</nowiki></code>|| || ||Valid IPv4 address with a first octet of "10." or first two octets of "192.168." or first octet of "172." followed by a second octet in the range 16-31.
|}
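
The anchor rules and the IPv4 row in the table above can be tried out with a short sketch. It assumes Python's <code>re</code> module, so <code><nowiki>[A-Z]</nowiki></code> stands in for <code><nowiki>[[:upper:]]</nowiki></code>. The sample lines, the <code>select</code> helper, and the <code>OCTET</code> and <code>DOTTED_QUAD</code> names are all assumptions made for this example.

```python
import re

lines = ["cat", "concatenate", "Cat nap.", "3.14", "10", "v1.0"]

def select(pattern):
    """Return the sample lines in which pattern is found."""
    return [s for s in lines if re.search(pattern, s)]

# "^" and "$" tie the pattern to the start and end of the line.
print(select(r"cat"))          # ['cat', 'concatenate']
print(select(r"^cat$"))        # ['cat']
print(select(r"^[A-Z]"))       # ['Cat nap.']
# "+" lets the class repeat, so whole lines of digits and dots match.
print(select(r"^[0-9.]+$"))    # ['3.14', '10']

# Octet pattern for 0-255, an assumption of this sketch.
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
# Anchoring both ends rejects strings with extra characters around the address.
DOTTED_QUAD = re.compile(r"^(" + OCTET + r"\.){3}" + OCTET + r"$")

for candidate in ["192.168.0.1", "10.255.255.255", "256.1.1.1", "1.2.3"]:
    print(candidate, bool(DOTTED_QUAD.search(candidate)))
```

The first two candidates are accepted and the last two are rejected: <code>256</code> is out of range for an octet, and <code>1.2.3</code> has only three parts.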
