Difference between revisions of "Tutorial 9 - Regular Expressions"

From CDOT Wiki
 
Content under development
 
  
 +
=USING REGULAR EXPRESSIONS=
 +
<br>
 +
===Main Objectives of this Practice Tutorial===
 +
 +
:* Define the term '''Regular Expressions'''
 +
 +
:* Explain the difference between '''Regular Expressions''' and '''Filename Expansion'''
 +
 +
:* Explain the purpose of '''Literal (Simple)''' Regular Expressions
 +
 +
:* Understand and use common symbols for '''Complex''' Regular Expressions and their purpose
 +
 +
:* Understand and use common symbols for '''Extended''' Regular Expressions and their purpose
 +
 +
:* List several Linux commands that can use regular expressions
 +
<br>
 +
 +
===Tutorial Reference Material===
 +
 +
{|width="100%" cellspacing="0" cellpadding="10"
 +
 +
|- valign="top"
 +
 +
|colspan="2" style="font-size:16px;font-weight:bold;border-bottom: thin solid black;border-spacing:0px;"|Course Notes<br>
 +
 +
|colspan="2" style="font-size:16px;font-weight:bold;border-bottom: thin solid black;border-spacing:0px;padding-left:15px;"|Linux Command/Shortcut Reference<br>
 +
 +
|colspan="1" style="font-size:16px;font-weight:bold;border-bottom: thin solid black;border-spacing:0px;padding-left:15px;"|YouTube Videos<br>
 +
 +
|- valign="top" style="padding-left:15px;"
 +
 +
|colspan="2" |Course Notes:<ul><li>[https://ict.senecacollege.ca/~murray.saul/uli101/ULI101-Week9.pdf PDF] | [https://ict.senecacollege.ca/~murray.saul/uli101/ULI101-Week9.pptx PPTX]</li></ul>
 +
 +
 +
|  style="padding-left:15px;" |Regular Expressions
 +
* [https://techterms.com/definition/regular_expression#:~:text=A%20regular%20expression%20(or%20%22regex,wildcards%2C%20and%20ranges%20of%20characters.&text=A%20regular%20expression%20can%20be,%2C%20such%20as%20%22app%22. Definition]
 +
* [https://en.wikipedia.org/wiki/Regular_expression#:~:text=Regular%20expressions%20are%20used%20in,built%2Din%20or%20via%20libraries. Purpose (WIKI)]<br><br>
 +
 +
 +
|  style="padding-left:15px;"|Linux Commands
 +
* [https://ss64.com/bash/egrep.html egrep]
 +
* [https://www.man7.org/linux/man-pages/man1/man.1.html man]
 +
* [https://man7.org/linux/man-pages/man1/more.1.html more] / [https://www.man7.org/linux/man-pages/man1/less.1.html less]
 +
* [https://man7.org/linux/man-pages/man1/vi.1p.html vi] / [http://linuxcommand.org/lc3_man_pages/vim1.html vim]
 +
* [https://man7.org/linux/man-pages/man1/sed.1p.html sed]
 +
* [https://man7.org/linux/man-pages/man1/awk.1p.html awk]
 +
* [https://linux.die.net/man/1/wget wget]
 +
 +
|colspan="1" style="padding-left:15px;" width="30%"|Brauer Instructional Videos:<ul><li>[https://www.youtube.com/watch?v=-2pwLHcvCsU&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=12 Using grep Command with Regular Expressions]</li></ul>
 +
|}
 +
 +
= KEY CONCEPTS =
 +
 +
===Regular Expressions===
 +
 +
<i>A '''regular expression''' is a combination of two types of characters: '''literals''' and '''special characters'''.<br>Strings of text can be compared to this pattern to see if there is a match.</i>
 +
 +
These strings are usually text <u>contained</u> inside a '''file''', or text produced<br>by issuing Linux commands in a '''Linux pipeline command'''.
 +
<br><br>
 +
 +
===Literal (Simple) Regular Expressions===
 +
 +
[[Image:re-3.png|thumb|right|200px|A '''simple''' ('''literal''') regular expression is a series of letters and numbers (tabs or spaces).]]
 +
The simplest regular expression is a series of letters, numbers, tabs or spaces.<br>A '''simple''' ('''literal''') regular expression consists of normal characters, which are used to match patterns.<br><br>
 +
Although many Linux commands use regular expressions, the '''grep''' command is a good one to learn first: it displays the lines in text files that match a pattern.<br><br>
 +
For example:
 +
<span style="color:blue;font-weight:bold;font-family:courier;">grep Linux document.txt</span><br><br>
 +
 +
=== Complex / Extended Regular Expressions ===
 +
 +
'''Complex Regular Expressions'''
 +
<br><br>
 +
The problem with just using '''simple''' ('''literal''') regular expressions is that only <u>simple</u> or <u>general</u> patterns are matched.
 +
 +
''Complex Regular Expressions'' use symbols to help match text for more <u>precise</u> (complex) patterns.<br>The most common complex regular expression symbols are displayed below:
 +
<br><br>
 +
:'''Anchors: ''' <span style="color:blue;font-family:courier;font-weight:bold;">^</span> , <span style="color:blue;font-family:courier;font-weight:bold;">$</span><br>Match lines that begin (^) or end ($) with a pattern.<br>
 +
:'''Single Character:''' &nbsp; <span style="color:blue;font-family:courier;font-weight:bold;">.</span><br>Represents a single character that can be any type of character.<br>
 +
:'''Character Class:'''  <span style="color:blue;font-family:courier;font-weight:bold;">[ ]</span> , <span style="color:blue;font-family:courier;font-weight:bold;">[^ ]</span><br>Represents a single character but with restrictions.<br>
 +
:'''Zero or More Occurrences:'''  <span style="color:blue;font-family:courier;font-weight:bold;">*</span><br>Zero or more occurrences of the preceding character.<br><br>
 +
 +
:Examples of '''complex regular expressions''' are displayed below:
 +
 +
<table align="left"><tr valign="top"><td>[[Image:re-4.png|thumb|right|200px|Example of using '''anchors'''.]]</td><td>[[Image:re-5.png|thumb|right|175px|Example of matching by '''character(s)'''.]]</td><td>[[Image:re-6.png|thumb|right|220px|Example of using '''character class'''.]]</td><td>[[Image:re-7.png|thumb|right|200px|Example of matching '''zero or more occurrences of the preceding character'''.]]</td></tr></table>
 +
<br><br><br><br><br><br><br><br><br><br>
 +
 +
 +
'''Extended Regular Expressions'''
 +
 +
''Extended Regular Expressions'' consist of additional special characters to “extend”<br>the capability of regular expressions. You must use the '''egrep''' or '''grep -E''' commands<br>in order to properly use extended regular expressions.
 +
 +
 +
:'''Repetition:''' <span style="color:blue;font-family:courier;font-weight:bold;">{min,max}</span><br>Allows for more precise repetitions. Using braces, you can specify<br>the '''minimum''' and/or '''maximum''' number of repetitions.
 +
 +
:'''Groups:''' <span style="color:blue;font-family:courier;font-weight:bold;">( )</span><br>Allows you to search for repetition of a '''group of characters''', a '''word''', or a '''phrase'''.<br>You enclose them within brackets <span style="font-family:courier;font-weight:bold;">( )</span> to specify a '''group'''.
 +
 +
:'''or Condition:'''  <span style="color:blue;font-family:courier;font-weight:bold;">|</span><br>Can be used with '''groups''' to match a variety of character(s), words or phrases.<br>The | symbol is used to separate the alternatives within a ''group''.<br><br>
 +
 +
:Examples of how to use '''extended regular expressions''' with the '''egrep''' command are displayed below:<br><br>
 +
 +
<table align="left"><tr valign="top"><td>[[Image:re-8.png|thumb|right|280px|Example of using '''repetition'''.]]</td><td>[[Image:re-9.png|thumb|right|250px|Example of using '''groups'''.]]</td><td>[[Image:re-10.png|thumb|right|250px|Example of using '''or''' condition with '''groups'''.]]</td></tr></table>
 +
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
 
= INVESTIGATION 1: SIMPLE & COMPLEX REGULAR EXPRESSIONS =
 
  

Revision as of 21:25, 25 October 2021

Content under development

USING REGULAR EXPRESSIONS


Main Objectives of this Practice Tutorial

  • Define the term Regular Expressions
  • Explain the difference between Regular Expressions and Filename Expansion
  • Explain the purpose of Literal (Simple) Regular Expressions
  • Understand and use common symbols for Complex Regular Expressions and their purpose
  • Understand and use common symbols for Extended Regular Expressions and their purpose
  • List several Linux commands that can use regular expressions


Tutorial Reference Material

Course Notes
Linux Command/Shortcut Reference
YouTube Videos
Course Notes:


Regular Expressions


Linux Commands
Brauer Instructional Videos:

KEY CONCEPTS

Regular Expressions

A regular expression is a combination of two types of characters: literals and special characters.
Strings of text can be compared to this pattern to see if there is a match.

These strings are usually text contained inside a file, or text produced
by issuing Linux commands in a Linux pipeline command.
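
The pipeline case can be sketched as follows, using the grep command that this tutorial introduces in the next section. The filenames fed into the pipeline are made up for illustration; they are not part of the course material.

```shell
# grep reads standard input when no file is named, so it can filter
# the output of an earlier command in a pipeline.
# The three filenames below are hypothetical sample data.
pipeline_matches=$(printf '%s\n' 'report.txt' 'notes.md' 'summary.txt' | grep 'txt')

# Show only the lines that contained the pattern "txt".
printf '%s\n' "$pipeline_matches"
```

Here printf stands in for any command that produces lines of text; grep only sees the text, not where it came from.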

Literal (Simple) Regular Expressions

A simple (literal) regular expression is a series of letters and numbers (tabs or spaces).

The simplest regular expression is a series of letters, numbers, tabs or spaces.
A simple (literal) regular expression consists of normal characters, which are used to match patterns.

Although many Linux commands use regular expressions, the grep command is a good one to learn first: it displays the lines in text files that match a pattern.

For example: grep Linux document.txt
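
A minimal sketch of a literal match is shown below. It assumes a small throwaway file created with mktemp instead of the tutorial's document.txt, and the file contents are invented for illustration. The -i and -c options are standard grep options that the text above does not cover.

```shell
# Build a small hypothetical sample file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' 'Linux is a kernel' 'I use linux daily' \
              'Unix predates Linux' 'macOS is different' > "$sample"

# A literal regular expression matches those exact characters anywhere in a line.
# Matching is case-sensitive, so the lowercase "linux" line is not selected.
literal_matches=$(grep 'Linux' "$sample")
printf '%s\n' "$literal_matches"

# -i ignores case; -c prints a count of matching lines instead of the lines.
insensitive_count=$(grep -ic 'linux' "$sample")
printf '%s\n' "$insensitive_count"

rm -f "$sample"
```

Quoting the pattern is not required for a plain word like Linux, but it becomes important once the pattern contains characters the shell would otherwise interpret.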

Complex / Extended Regular Expressions

Complex Regular Expressions

The problem with just using simple (literal) regular expressions is that only simple or general patterns are matched.

Complex Regular Expressions use symbols to help match text for more precise (complex) patterns.
The most common complex regular expression symbols are displayed below:

Anchors: ^ , $
Match lines that begin (^) or end ($) with a pattern.
Single Character:   .
Represents a single character that can be any type of character.
Character Class: [ ] , [^ ]
Represents a single character but with restrictions.
Zero or More Occurrences: *
Zero or more occurrences of the preceding character.

Examples of complex regular expressions are displayed below:
Example of using anchors.
Example of matching by character(s).
Example of using character class.
Example of matching zero or more occurrences of the preceding character.
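
As a text-only illustration of the four symbols, the sketch below runs grep against a hypothetical word list. The words and variable names are assumptions chosen for this example and do not come from the tutorial's own screenshots.

```shell
# Hypothetical sample data; the words are illustrative only.
sample=$(mktemp)
printf '%s\n' 'cat' 'cot' 'cut' 'coat' 'scatter' 'ct' 'c9t' > "$sample"

# Anchors: ^ ties the pattern to the start of the line, $ to the end.
# Without them, 'cat' would also match inside "scatter".
anchored=$(grep '^cat$' "$sample")

# Single character: . stands for exactly one character of any kind.
any_char=$(grep '^c.t$' "$sample")

# Character class: [aou] allows one character from the set;
# [^0-9] allows one character that is NOT a digit.
in_class=$(grep '^c[aou]t$' "$sample")
not_digit=$(grep '^c[^0-9]t$' "$sample")

# Zero or more: o* matches no "o", one "o", two, and so on,
# so both "ct" and "cot" are selected.
zero_or_more=$(grep '^co*t$' "$sample")

printf 'anchors:\n%s\n' "$anchored"
printf 'any single character:\n%s\n' "$any_char"
printf 'character class:\n%s\n' "$in_class"
printf 'negated class:\n%s\n' "$not_digit"
printf 'zero or more:\n%s\n' "$zero_or_more"

rm -f "$sample"
```

The patterns are wrapped in single quotes so that the shell passes characters such as *, [ and $ to grep unchanged instead of treating them as filename expansion or variable syntax.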

Extended Regular Expressions

Extended Regular Expressions consist of additional special characters to “extend”
the capability of regular expressions. You must use the egrep or grep -E commands
in order to properly use extended regular expressions.


Repetition: {min,max}
Allows for more precise repetitions. Using braces, you can specify
the minimum and/or maximum number of repetitions.
Groups: ( )
Allows you to search for repetition of a group of characters, a word, or a phrase.
You enclose them within brackets ( ) to specify a group.
or Condition: |
Can be used with groups to match a variety of character(s), words or phrases.
The | symbol is used to separate the alternatives within a group.

Examples of how to use extended regular expressions with the egrep command are displayed below:

Example of using repetition.
Example of using groups.
Example of using or condition with groups.
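
The three extended symbols can be sketched in text form as below. The sample lines and variable names are assumptions made for this example. It uses grep -E; egrep accepts the same patterns, although recent GNU grep releases print a warning that egrep is obsolescent.

```shell
# Hypothetical sample data; the values are illustrative only.
sample=$(mktemp)
printf '%s\n' '1' '12' '123' '1234' 'ab' 'abab' 'cat' 'dog' 'hotdog' > "$sample"

# Repetition: {2,3} requires the preceding item two or three times,
# so only lines made of exactly 2 or 3 digits are selected.
two_or_three_digits=$(grep -E '^[0-9]{2,3}$' "$sample")

# Groups: ( ) turns "ab" into one unit, so {2} repeats the whole group.
repeated_group=$(grep -E '^(ab){2}$' "$sample")

# or condition: | separates the alternatives inside the group.
either_word=$(grep -E '^(cat|dog)$' "$sample")

printf 'repetition:\n%s\n' "$two_or_three_digits"
printf 'group:\n%s\n' "$repeated_group"
printf 'or condition:\n%s\n' "$either_word"

rm -f "$sample"
```

Because the last pattern is anchored with ^ and $, the line "hotdog" is not selected; dropping the anchors would select it as well, since "dog" appears inside it.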

INVESTIGATION 1: SIMPLE & COMPLEX REGULAR EXPRESSIONS

INVESTIGATION 2: EXTENDED REGULAR EXPRESSIONS

INVESTIGATION 3: OTHER COMMANDS THAT USE REGULAR EXPRESSIONS

LINUX PRACTICE QUESTIONS