Python Filter List of Strings Using a Wildcard : Chris

Python Filter List of Strings Using a Wildcard
by: Chris
Post content below is copied from Be on the Right Side of Change.

💡 Problem Formulation: In Python, a common task is to filter a list of strings against a pattern that contains wildcards. The most familiar wildcard is the asterisk (*), which stands for any run of characters, including none; others, such as ?, stand for exactly one character. This makes wildcards a compact tool for string pattern matching.

Let’s look at several methods to do so in Python.

Method 1: Using fnmatch.filter()

The fnmatch module, which stands for “filename match”, offers the fnmatch.filter() function that can filter a list of strings using Unix shell-style wildcards (*, ?, [seq], [!seq]).

The fnmatch module is part of Python’s standard library, so there is nothing to install. Just import it:

import fnmatch

strings = ['data1.txt', 'config.ini', 'data23.csv', 'image.png']
pattern = 'data*.txt'

filtered = fnmatch.filter(strings, pattern)
print(filtered)  
# Outputs: ['data1.txt']

In this example, fnmatch.filter() returns a list of strings matching the data*.txt pattern from the provided strings list. The wildcard * matches any sequence of characters (including no character).
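
The other wildcards listed above behave in the same way. Here is a minimal sketch of ?, [seq] and [!seq]; the file names are made up purely for illustration:

```python
import fnmatch

# Hypothetical file names, chosen only to illustrate each wildcard.
names = ['log1.txt', 'log2.txt', 'logA.txt', 'log10.txt', 'notes.txt']

# ? matches exactly one character.
one_char = fnmatch.filter(names, 'log?.txt')

# [seq] matches one character from the set (here, a digit range).
digits = fnmatch.filter(names, 'log[0-9].txt')

# [!seq] matches one character that is NOT in the set.
non_digits = fnmatch.filter(names, 'log[!0-9].txt')

print(one_char)    # ['log1.txt', 'log2.txt', 'logA.txt']
print(digits)      # ['log1.txt', 'log2.txt']
print(non_digits)  # ['logA.txt']
```

Note that 'log10.txt' is rejected by all three patterns, because each of these wildcards consumes exactly one character.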

Method 2: Using fnmatch.fnmatch() within List Comprehension

Instead of fnmatch.filter(), you can call the fnmatch.fnmatch() function inside a list comprehension to achieve the same result.

Here’s an example:

import fnmatch

strings = ['report1.doc', 'slide.ppt', 'report12.pdf', 'summary.doc']
pattern = 'report?.doc'

filtered = [s for s in strings if fnmatch.fnmatch(s, pattern)]
print(filtered)  # Outputs: ['report1.doc']

Here, the pattern 'report?.doc' uses ? to match exactly one character, so 'report12.pdf' and 'summary.doc' are rejected. The list comprehension builds a new list that includes only the items for which fnmatch.fnmatch() returns True.
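
One detail worth knowing: fnmatch.fnmatch() normalizes case using the operating system’s conventions, so the same pattern can behave differently on Windows and on Linux. If you need predictable behavior, fnmatch.fnmatchcase() always compares case-sensitively. The sketch below uses made-up file names and shows one possible way to get an explicitly case-insensitive match:

```python
import fnmatch

# Hypothetical file names that differ only in letter case.
strings = ['Report1.doc', 'report2.doc', 'REPORT3.doc']
pattern = 'report?.doc'

# fnmatchcase() never normalizes case, so the result is the same on every OS.
case_sensitive = [s for s in strings if fnmatch.fnmatchcase(s, pattern)]

# For an explicitly case-insensitive match, lowercase both sides first.
case_insensitive = [
    s for s in strings if fnmatch.fnmatchcase(s.lower(), pattern.lower())
]

print(case_sensitive)    # ['report2.doc']
print(case_insensitive)  # ['Report1.doc', 'report2.doc', 'REPORT3.doc']
```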

Method 3: Using Regular Expressions with re.match()

Regular expressions provide a more powerful way to filter strings. The re.match() function can be used for pattern matching with regular expressions instead of simple wildcards.

Here’s an example:

import re

strings = ['hello_world.py', 'helloWorld.py', 'hi_world.py']
pattern = re.compile(r'hello.*\.py$')

filtered = [s for s in strings if pattern.match(s)]
print(filtered)  # Outputs: ['hello_world.py', 'helloWorld.py']

The compiled pattern r'hello.*\.py$' matches strings that begin with 'hello', followed by any characters (.*), and end with the '.py' extension. Both 'hello_world.py' and 'helloWorld.py' qualify, because .* also matches 'World'. The escaped dot (\.) matches a literal '.' in the filename rather than acting as a wildcard, and the trailing $ is needed because re.match() anchors the pattern only at the start of the string.
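
If the pattern starts life as a shell-style wildcard, fnmatch.translate() can convert it into an equivalent regular expression, which saves writing the escapes by hand. A minimal sketch, with 'hello.pyc' added as a hypothetical extra entry to show that the translated regex matches the whole string:

```python
import fnmatch
import re

strings = ['hello_world.py', 'helloWorld.py', 'hi_world.py', 'hello.pyc']

# translate() turns the wildcard into a regex string that is anchored
# at the end of the input; match() anchors it at the start.
regex = re.compile(fnmatch.translate('hello*.py'))

filtered = [s for s in strings if regex.match(s)]
print(filtered)  # ['hello_world.py', 'helloWorld.py']
```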

Method 4: Using Regular Expressions with re.findall()

re.findall() can also be used to filter a list based on a pattern. It returns a list of all non-overlapping matches of the pattern in a string; an empty list is falsy, so the result can serve directly as a filter condition.

Here’s an example:

import re

strings = ['event-123-log.txt', 'error-404.txt', 'event-456-log.txt']
# Regex counterpart of the wildcard 'event-*-log.txt', restricted to digits
regex = re.compile(r'event-\d+-log\.txt')
filtered = [s for s in strings if re.findall(regex, s)]

print(filtered)  # Outputs: ['event-123-log.txt', 'event-456-log.txt']

The regex pattern r'event-\d+-log\.txt' matches strings containing 'event-', followed by one or more digits (\d+), followed by '-log.txt'. re.findall() returns a non-empty list whenever the string contains the pattern, so those strings pass the filter.
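
For a plain yes/no filter, re.search() is a lighter alternative, since it stops at the first match instead of collecting all of them. The sketch below contrasts the two; the capture group around \d+ is an addition for illustration and is not part of the example above:

```python
import re

strings = ['event-123-log.txt', 'error-404.txt', 'event-456-log.txt']
regex = re.compile(r'event-(\d+)-log\.txt')

# With one capture group, findall() returns the captured text;
# an empty list means the pattern did not occur.
print(regex.findall('event-123-log.txt'))  # ['123']
print(regex.findall('error-404.txt'))      # []

# search() returns a match object or None, which is enough for filtering.
filtered = [s for s in strings if regex.search(s)]
print(filtered)  # ['event-123-log.txt', 'event-456-log.txt']

# The capture group also lets us extract the numeric ID of each match.
ids = [int(m.group(1)) for m in map(regex.search, strings) if m]
print(ids)  # [123, 456]
```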

Method 5: Using filter() with a Custom Function

A custom function can be created to encapsulate the pattern matching logic. This function can then be used with the built-in filter() function.

Here’s an example:

def matches_pattern(s):
    return s.startswith('report') and s.endswith('.txt')

strings = ['report9.txt', 'report10.txt', 'summary.txt']
filtered = list(filter(matches_pattern, strings))

print(filtered)  # Outputs: ['report9.txt', 'report10.txt']

The custom function matches_pattern checks for strings that start with 'report' and end with '.txt'. The filter() function applies it to each element of the list and keeps those for which it returns True.
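
To avoid hard-coding the prefix and suffix, the predicate can be built by a small factory function. This is only a sketch: make_matcher is a hypothetical helper name, and the extra list entry 'report.csv' is made up for illustration.

```python
def make_matcher(prefix, suffix):
    """Build a predicate equivalent to the wildcard '<prefix>*<suffix>'."""
    def matches(s):
        # The length check stops prefix and suffix from overlapping,
        # e.g. 'abc' must not match 'ab*bc'.
        return (len(s) >= len(prefix) + len(suffix)
                and s.startswith(prefix)
                and s.endswith(suffix))
    return matches

strings = ['report9.txt', 'report10.txt', 'summary.txt', 'report.csv']

reports = list(filter(make_matcher('report', '.txt'), strings))
summaries = list(filter(make_matcher('summary', '.txt'), strings))

print(reports)    # ['report9.txt', 'report10.txt']
print(summaries)  # ['summary.txt']
```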

Bonus One-Liner Method 6: Comprehensions with in Operator

Sometimes you might want to check if a substring is present in each string. You can use a concise one-liner using list comprehensions with the in operator.

Here’s an example:

strings = ["apple.txt", "banana.csv", "apple_data.csv"]
filtered = [s for s in strings if "apple" in s]

print(filtered)  
# Outputs: ['apple.txt', 'apple_data.csv']

In this code, we simply check whether the substring 'apple' occurs anywhere in each string and build a filtered list from those that contain it. This is the equivalent of the wildcard pattern '*apple*'.
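
The same idea extends to several substrings at once by combining in with any(). A minimal sketch, assuming a case-insensitive match is wanted; the keywords list is a hypothetical set of search terms:

```python
strings = ["Apple.txt", "banana.csv", "apple_data.csv", "cherry.txt"]
keywords = ["apple", "cherry"]  # hypothetical search terms, all lowercase

# Keep a string if any keyword occurs in its lowercased form.
filtered = [s for s in strings if any(k in s.lower() for k in keywords)]

print(filtered)  # ['Apple.txt', 'apple_data.csv', 'cherry.txt']
```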

Summary/Discussion

Filtering a list of strings using wildcard patterns in Python can be accomplished with several different approaches, each with its advantages.

The fnmatch module’s functions are straightforward for simple cases, especially when dealing with file patterns.

Regular expressions offer powerful capabilities for more sophisticated patterns that require fine-grained matching criteria.

The built-in filter() function, combined with custom logic, gives you more control over the matching process.

When performance matters, compile a regular expression once and reuse it, rather than re-evaluating the same pattern for every string.
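
One way to apply this is sketched below: several wildcard patterns are translated and combined into a single regular expression that is compiled once and then reused across batches. compile_wildcards is a hypothetical helper name, the batches are made-up data, and no timing claims are implied.

```python
import fnmatch
import re

def compile_wildcards(patterns):
    """Combine several wildcard patterns into one compiled regex."""
    # translate() anchors each alternative at the end of the string;
    # match() anchors the whole expression at the start.
    return re.compile('|'.join(fnmatch.translate(p) for p in patterns))

matcher = compile_wildcards(['data*.txt', '*.ini'])  # compiled once

batch_1 = ['data1.txt', 'config.ini', 'image.png']
batch_2 = ['data23.csv', 'data_final.txt', 'setup.ini.bak']

result_1 = [s for s in batch_1 if matcher.match(s)]
result_2 = [s for s in batch_2 if matcher.match(s)]

print(result_1)  # ['data1.txt', 'config.ini']
print(result_2)  # ['data_final.txt']
```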

February 05, 2024 at 02:31AM