I had a bug that was difficult to trace down. I had a double list that I removed some of the elements using the remove()
method, however, not all of the elements were removed. In fact, the code was removing only every other element. The bug turned out to be the call to remove()
would shorten the list by one, and thus cause a skipping effect. For example:
some_lst = [['a','b'],['c','d'],['e','f']] for i in some_lst: if re.match(regexToMatch, i): some_lst.remove(i)
I needed to rewind the list if remove()
was called. At first, I thought this would be perfect for recursion. As it turned out, with the amount of data I needed the algorithm to search through, recursion was not a the best solution.
A recursive algorithm is one that calls itself. These functions will additionally have a decrementing counter or meet some condition to exit the recursive call. Having an exit condition is required since the function calls itself, otherwise, the function will loop until the system runs out of memory. This is a fast algorithm, however, it is not suited for all conditions. For example, small datasets recursion is best with regards to performance, but as the data grows, performance decreases quickly. This is because memory is consumed and the process incurs a context switch for every function call. The code below worked great on a small subset of data, but did not scale to what was needed for the project.
import re def remove_processes(the_list: list): for (hostname, process) in the_list: for line in exclude_processes: try: if re.search(line.strip(), process): the_list.remove([hostname,process]) remove_processes(theList) except ValueError: continue return the_list
Eventually, I replaced the for loop with while()
and a counter. When a match is found, count
is subtracted by 1 so every element in theList is evaluated. This also removes any duplicates.
def remove_processes(the_list): count = 0 while count <= len(the_list) - 1: hostname = the_list[count][0] process = the_list[count][1] count += 1 try: if re.search(exclude_process, process): the_list.remove([hostname, process]) # reduce the count by 1 which resolves skipping # every other element count -= 1 except ValueError: continue return the_list
Using Recursion with JSON
Here is another example of recursion. recurs()
searches for k in a JSON document. If key
isn’t found, it continues to the next element until all elements are searched. Once key
is found, the function updates the value to “changed”. I used this function to remove passwords in a JSON document so it could be sent externally.
def recurs(k: str, json_doc: dict): """ Takes a string, k and searches through json_doc :param k: search string :param json_doc: JSON / dictionary to search through for k """ for i in json_doc.keys(): if isinstance(json_doc[i], list): for p in range(len(json_doc[i])): json_doc[i][p][k] = 'changed' elif isinstance(json_doc[i], str): if k in json_doc: json_doc[k] = 'changed' elif isinstance(json_doc[i], dict): json_doc[i][k] = 'changed' try: for v in json_doc[i].values(): if isinstance(v, dict): return recurs(k, v) except AttributeError: pass return json_doc