Strings are among the most commonly used data types in Python. They hold text such as the names of people and cities. When using strings, you will want to perform a number of operations on them.
Examples of such operations include matching and comparing strings. Some of these operations can be done using built-in operators. However, in some cases these operators are not enough and you need more advanced pattern-matching capabilities. That's when you'll need to use *regular expressions*. In this tutorial, we will discuss Python regular expressions in detail.
What are Regular Expressions?
In Python, a regular expression (regex) is a sequence of characters that defines a search pattern, which can be used to check whether a string contains that pattern.
Regular expressions help us to extract information from text, strings, logs or files.
Regular expressions also help programmers to find, replace, or delete characters from strings.
When working with regular expressions, the pattern to be searched for is itself just a sequence of characters.
Most characters in a pattern match themselves literally, while a few (such as `.`, `^`, `$`, and `*`) have a special meaning.
The `re` Module
To work with regular expressions in Python, we use the `re` module. This module comes with functions and methods for working with regular expressions.
The module is shipped with Python, so you simply have to import it using Python's `import` statement as shown below:
import re
The following example demonstrates how you can search for a pattern within a string:
import re
text = "it rained today."
# ^ anchors the start, .* matches any characters, \. is a literal period, $ anchors the end
a = re.search(r"^it.*today\.$", text)
if a:
    print("Yes, there was a match")
else:
    print("No match")
We've checked whether the string `text` starts with `it` and ends with `today.`.
Since this is true, the first `print` statement is executed, producing the output shown below:
Yes, there was a match
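To see what the anchors in that pattern contribute, here is a minimal sketch. The helper name `starts_and_ends` and the extra sample sentences are made up for illustration; they are not part of the `re` module.

```python
import re

# Hypothetical helper for illustration: returns True only if `text`
# starts with "it" and ends with "today." (the last dot is escaped so
# that it matches a literal period).
def starts_and_ends(text):
    return re.search(r"^it.*today\.$", text) is not None

print(starts_and_ends("it rained today."))        # True
print(starts_and_ends("Yesterday it rained."))    # False: does not start with "it"
print(starts_and_ends("it rained today, again"))  # False: does not end with "today."
```

Note that matching is case-sensitive by default, so a string starting with `It` would not match this pattern.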
Next, we'll discuss the various functions provided by Python's `re` module.
The `findall()` function
This function returns a list of all the non-overlapping occurrences of a pattern in a string. For example:
import re
text = "The committee sat down and commissioned the process."
a = re.findall("mm", text)
print(a)
In the above example, we are using the `findall()` function to find all the occurrences of the pattern `mm` in the string `text`.
It returns the following output:
['mm', 'mm']
The output shows that there are two occurrences of the pattern in the string.
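`findall()` is not limited to literal text. As a small sketch (the sample sentence is invented for illustration), the pattern `\d+` collects every run of digits in a string:

```python
import re

text = "Order 41 shipped 3 boxes on day 12."

# \d+ matches one or more consecutive digits
numbers = re.findall(r"\d+", text)
print(numbers)  # ['41', '3', '12']

# When nothing matches, findall() returns an empty list rather than None
nothing = re.findall(r"\d+", "no digits here")
print(nothing)  # []
```

The matches come back as strings, so they would need to be converted with `int()` before doing arithmetic on them.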
The `match()` function
The `match()` function checks for a match only at the beginning of a string. If the pattern matches there, it returns a match object.
On the other hand, if no match is found, it returns `None`.
For example:
import re
words = ["cow can", "cat cop", "cate gate"]
for x in words:
    # Two words that each start with "c", separated by one non-word character
    a = re.match(r"(c\w+)\W(c\w+)", x)
    if a:
        print(a.groups())
In the above example, we have defined a list with three strings.
We have then used the `match()` function to match two words that each start with the letter `c`, separated by a single non-word character such as a space.
The `groups()` method returns a tuple containing all the captured subgroups of a match.
The code returns the following output:
('cow', 'can')
('cat', 'cop')
Only the first two strings produced output. The string `cate gate` does not match because its second word does not start with `c`.
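Because `match()` only looks at the beginning of a string, it can return `None` even when the pattern occurs later in the text. The following sketch contrasts it with `search()`, reusing one of the strings from the example above:

```python
import re

text = "cow can"

# "can" is in the string, but not at the start, so match() finds nothing
at_start = re.match(r"can", text)
print(at_start is None)   # True

# search() scans the whole string, so it finds "can"
anywhere = re.search(r"can", text)
print(anywhere.group())   # can

# match() succeeds when the pattern is at the very beginning
first_word = re.match(r"cow", text)
print(first_word.group()) # cow
```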
The `search()` function
The `search()` function scans through a string and returns a match object for the first location where the pattern matches, or `None` if there is no match.
The function expects you to pass the *pattern* to search for and the *text* in which to search for it.
For example:
import re
text = "I am learning regular expressions"
patterns = ['I am', 'regular expressions']
for x in patterns:
    # end=' ' keeps the result on the same line as the message
    print('Searching for "%s" in "%s" ->' % (x, text), end=' ')
    if re.search(x, text):
        print('Match found')
    else:
        print('No Match!')
In the above example, we've declared a string named `text`.
We've also declared a list named `patterns` with two items.
The `search()` function has then been used to search for each of the two patterns in the string.
The code returns the following output:
Searching for "I am" in "I am learning regular expressions" -> Match found
Searching for "regular expressions" in "I am learning regular expressions" -> Match found
The output shows that a match was found in both cases.
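The match object returned by `search()` carries more than a yes/no answer. As a brief sketch, its `group()`, `start()`, `end()`, and `span()` methods report what was matched and where:

```python
import re

text = "I am learning regular expressions"
m = re.search(r"regular", text)

if m:
    print(m.group())           # regular  (the matched text)
    print(m.start(), m.end())  # 14 21    (start index, end index)
    print(m.span())            # (14, 21) (both as a tuple)
```

The end index is exclusive, so `text[m.start():m.end()]` gives back the matched text.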
The `sub()` function
This function is used when there is a need to search and replace.
All the matches are replaced with the text that you specify.
For example:
import re
str1 = "+254 986 234"
str2 = "My name is John"
# \s matches any whitespace character
x = re.sub(r"\s", "-", str1)
x2 = re.sub("John", "Mercy", str2)
print(x)
print(x2)
In the above example, we have declared two strings, namely `str1` and `str2`.
We have then invoked the `sub()` function on `str1` to replace every whitespace character with a hyphen (-).
The function has also been invoked on the string `str2` to replace all the occurrences of `John` with `Mercy`.
The code returns the following output:
+254-986-234
My name is Mercy
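Two variations on `sub()` are worth a quick sketch: its optional `count` argument limits how many matches are replaced, and passing an empty replacement string deletes the matches. The variable names below are chosen for illustration only.

```python
import re

phone = "+254 986 234"

# count=1 replaces only the first whitespace character
first_only = re.sub(r"\s", "-", phone, count=1)
print(first_only)   # +254-986 234

# An empty replacement string removes every match
no_spaces = re.sub(r"\s", "", phone)
print(no_spaces)    # +254986234
```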
End Notes
In Python, a regular expression is simply a sequence of characters that defines a pattern to look for in a string.
Python programmers use regular expressions to extract information from text, files, logs, and strings.
Python comes with a module named `re` that has functions for working with regular expressions.
To use this module, add it to your Python script using the `import` statement.