Regular Expressions in Python


A while back I designed a tool in Python that used sar and Solaris Explorer data for capacity analysis. One of the issues I faced was needing to find the data between two regular expressions. Fortunately, Python has a powerful regular expression module called re. Working with regexes can be daunting if you haven’t worked with them before. If you’re unfamiliar with regular expression pattern matching, please read this: RegEx Primer

Using the Regular Expressions (RE) Module

The three main functions of the re module are compile(), match() and search(). compile() creates a regex object, which saves work when the same pattern is reused many times. match() returns a match object only if the beginning of the string matches the pattern. search() finds the first occurrence of the pattern anywhere in the string. The example further down is fairly simple in that it only matches a literal string. Typically, the pattern will contain special characters rather than plain text. As an example, something like ^fd.ss$ is more common in pattern matching. This pattern says:

  1. ^fd – find “fd” at the beginning of the line. ^ means to match at the beginning of the line.
  2. .ss – after finding “fd”, match any one character followed by “ss”. The . matches any single character except a newline.
  3. ss$ – “ss” must be the last two characters of the line. $ matches at the end of the line, just before a trailing newline if there is one.
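
To make the three parts of that pattern concrete, here is a minimal sketch that runs ^fd.ss$ against a handful of strings. The sample strings are made up for illustration and are not from the original tool.

```python
import re

# The pattern from the walkthrough above, compiled once.
pattern = re.compile(r'^fd.ss$')

# Hypothetical sample strings, chosen to exercise each part of the pattern.
samples = ['fdxss', 'fd ss', 'fdss', 'xfdxss', 'fdxss\n']

# Map each sample to True/False depending on whether the pattern matches.
results = {text: bool(pattern.search(text)) for text in samples}

for text, matched in results.items():
    print(repr(text), matched)
```

Only 'fdss' and 'xfdxss' fail: the first has no character for the . to consume, and the second does not start with “fd”. The string with the trailing newline still matches because $ is allowed to match just before a final newline.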

The simple literal-string example looks like this:

import re
data_str = 'this is my search string'
srch_recomp = re.compile('string')
# match() won't find anything since 'string' is not at the beginning of data_str
regex_found = re.match(srch_recomp, data_str)
type(regex_found)
<class 'NoneType'>

regex_found = re.search(srch_recomp, data_str) # search() will find the pattern in data_str
regex_found
<_sre.SRE_Match object; span=(18, 24), match='string'>

In this example, we change the variable srch_recomp so re.match() will find the pattern.

data_str = 'this is my search string'
srch_recomp = re.compile('this')
regex_found = re.match(srch_recomp, data_str)
regex_found
<_sre.SRE_Match object; span=(0, 4), match='this'>
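
The examples above pass the compiled object back into re.match() and re.search(). A compiled pattern also has its own match() and search() methods, which is the more common way to use it. A small sketch, reusing the same sample string (the variable names at_start and anywhere are just for illustration):

```python
import re

data_str = 'this is my search string'
pattern = re.compile('string')

# match() is anchored at the start of the string, so this returns None.
at_start = pattern.match(data_str)

# search() scans the whole string and finds 'string' at the end.
anywhere = pattern.search(data_str)

print(at_start)
print(anywhere.span())
print(anywhere.group())
```

span() gives the start and end offsets of the match, and group() returns the matched text itself.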

Python Forward Search

The algorithm for finding data between two patterns is fairly simple. Using the regular expressions module, re, search for the begin pattern, then append each line to a list until the end pattern is found. This example class reads from a file, but the file object can easily be replaced with another iterable of lines. Comments in the code note what to change if you don’t need the begin_re and end_re lines in the final output.
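
As a rough sketch of that idea without any file handling, the function below works on any iterable of lines. The name lines_between and the sample data are hypothetical, used only to illustrate the approach; this is not the class from the original tool.

```python
import re

def lines_between(lines, begin_re, end_re):
    """Collect lines from each begin_re match through the following end_re match."""
    begin_pattern = re.compile(begin_re)
    end_pattern = re.compile(end_re)
    collected = []
    # A single iterator lets the inner loop pick up right after the begin line.
    line_iter = iter(lines)
    for line in line_iter:
        if begin_pattern.search(line):
            collected.append(line.strip())
            for next_line in line_iter:
                collected.append(next_line.strip())
                if end_pattern.search(next_line):
                    # End pattern found: go back to looking for begin_re.
                    break
    return collected

# Hypothetical sample data standing in for the contents of a file.
sample = [
    "header\n",
    "this starts the block\n",
    "middle line\n",
    "that ends the block\n",
    "footer\n",
]
result = lines_between(sample, "this", "that")
print(result)
```

Sharing one iterator between the two loops is the key design choice here: the inner loop consumes lines from where the outer loop stopped, so nothing before the begin match is collected.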

Regular expressions are a complex subject at first, mostly because the pattern matching syntax is so different from ordinary code. Start by reading and trying simple expressions. For the most part, re follows standard matching syntax, so knowledge of grep on Linux/UNIX transfers to Python easily. Refer to the re documentation here: Python RE module, or check Stack Overflow for examples.

import re

class LookForward():
    """
        begin_re: beginning search pattern
        end_re: end search pattern
        file_name: name of the file to search for begin_re and end_re.
        Return: a flat list of the matched lines, with newlines stripped
    """
    def __init__(self, begin_re, end_re, file_name):
        self.begin_re = begin_re
        self.end_re = end_re
        self.file_name = file_name

    def look_forward(self):
        """
           Method that returns a list containing lines between
           begin and end regular expressions.
        """
        return_val = []
        try:
            with open(self.file_name) as file_ctx:
                f_data = file_ctx.readlines()
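        except OSError as err:
            # PermissionError is a subclass of OSError, so one clause covers both.
            print(f"Encountered an error while opening {self.file_name}:"
                  f" {err}")
            # Re-raise the original exception so its details are preserved.
            raise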
        # Compile both patterns once, outside the loop.
        begin_pattern = re.compile(self.begin_re)
        final_pattern = re.compile(self.end_re)
        # Use a single iterator so the inner loop continues from the line
        # after the begin_re match instead of restarting at the top.
        lines = iter(f_data)
        for line in lines:
            # if there is a match for the beginning search pattern, then
            # start parsing until end_re is found.
            if begin_pattern.search(line):
                # Comment out the line below if the begin_re line should
                # not be included in the results:
                return_val.append(line.strip())
                for next_line in lines:
                    # append each following line to the list.
                    # strip() removes the new line char.
                    return_val.append(next_line.strip())
                    # check if next_line is a match for end_re
                    if final_pattern.search(next_line):
                        # Uncomment the line below if the end_re line should
                        # not be included in the results:
                        # return_val = return_val[:-1]
                        # break the inner loop since end_re was found
                        break
        return return_val

To use this class, create a LookForward instance by passing the begin and end regular expressions and a filename to search. In the example below, “this” is the begin search string, “that” is the end search string and “test.txt” is the file that is searched for these strings.

lf_data = LookForward("this", "that", "test.txt")
output = lf_data.look_forward()
for line in output:
    print(line)
