Python Filter List of Strings Using a Wildcard : Chris
by: Chris
The post content below is copied from Be on the Right Side of Change.
Problem Formulation: In Python, a common task is to filter a list of strings based on a pattern that may include wildcards, such as the asterisk (*), also called the star operator. A wildcard stands in for other characters: * matches any sequence of characters (including an empty one), while ? matches exactly one character. This makes wildcards a compact tool for string pattern matching.
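To make the difference between the two most common wildcards concrete, here is a small sketch using fnmatch.fnmatchcase() from the standard library. It is chosen here only because it compares case-sensitively on every platform; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

names = ["data1.txt", "data23.txt", "data.txt"]

# '*' matches any run of characters, including an empty one
star = [n for n in names if fnmatchcase(n, "data*.txt")]

# '?' matches exactly one character
question = [n for n in names if fnmatchcase(n, "data?.txt")]

print(star)      # ['data1.txt', 'data23.txt', 'data.txt']
print(question)  # ['data1.txt']
```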
Let’s look at several methods to do so in Python.
Method 1: Using fnmatch.filter()
The fnmatch module, which stands for “filename match”, offers the fnmatch.filter() function that can filter a list of strings using Unix shell-style wildcards (*, ?, [seq], [!seq]).
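The character-class wildcards in that list are not exercised by the main example, so here is a minimal sketch of how they behave with fnmatch.filter(). The list of log file names is hypothetical.

```python
import fnmatch

files = ["log1.txt", "log2.txt", "log9.txt", "logA.txt"]

# [seq] matches one character from the set (here the range 1-3)
in_range = fnmatch.filter(files, "log[1-3].txt")

# [!seq] matches one character that is NOT in the set
not_digit = fnmatch.filter(files, "log[!0-9].txt")

print(in_range)   # ['log1.txt', 'log2.txt']
print(not_digit)  # ['logA.txt']
```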
The fnmatch module is part of Python’s standard library, so there is nothing to install; just import it like so:
```python
import fnmatch

strings = ['data1.txt', 'config.ini', 'data23.csv', 'image.png']
pattern = 'data*.txt'

filtered = fnmatch.filter(strings, pattern)
print(filtered)  # Outputs: ['data1.txt']
```

In this example, fnmatch.filter() returns the strings from the strings list that match the data*.txt pattern. The wildcard * matches any sequence of characters, including an empty one.
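Two consequences of that rule are easy to miss: * can match nothing at all, and fnmatch does not treat the path separator / specially, so * can span directory names. A short sketch with made-up paths:

```python
import fnmatch

paths = ["data1.txt", "data.txt", "data/archive/old.txt", "notes/data2.txt"]

# '*' may match zero characters, so 'data.txt' matches 'data*.txt'.
# fnmatch does not treat '/' specially, so '*' also spans directory separators.
matched = fnmatch.filter(paths, "data*.txt")
print(matched)  # ['data1.txt', 'data.txt', 'data/archive/old.txt']
```

If you need shell-like behavior where * stops at /, you have to handle path components yourself (for example with pathlib), since fnmatch matches the string as a whole.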
Method 2: Using fnmatch.fnmatch() within List Comprehension
Instead of fnmatch.filter(), you can call fnmatch.fnmatch() inside a list comprehension to achieve the same result.
Here’s an example:
```python
import fnmatch

strings = ['report1.doc', 'slide.ppt', 'report12.pdf', 'summary.doc']
pattern = 'report?.doc'

filtered = [s for s in strings if fnmatch.fnmatch(s, pattern)]
print(filtered)  # Outputs: ['report1.doc']
```
Here, the pattern 'report?.doc' uses ? to match exactly one character. The list comprehension builds a new list containing only the items for which fnmatch.fnmatch() returns True.
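One detail worth knowing: fnmatch.fnmatch() normalizes case with os.path.normcase(), so it is case-insensitive on Windows but case-sensitive on POSIX systems. The sketch below uses fnmatch.fnmatchcase(), which is always case-sensitive, and shows one portable way to get a case-insensitive match by lower-casing both sides. The sample strings are illustrative.

```python
import fnmatch

strings = ["report1.doc", "REPORT2.DOC", "summary.doc"]
pattern = "report?.doc"

# fnmatchcase() always compares case-sensitively, on every platform
exact = [s for s in strings if fnmatch.fnmatchcase(s, pattern)]

# Lower-casing both sides gives a case-insensitive match on every platform
any_case = [s for s in strings if fnmatch.fnmatchcase(s.lower(), pattern.lower())]

print(exact)     # ['report1.doc']
print(any_case)  # ['report1.doc', 'REPORT2.DOC']
```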
Method 3: Using Regular Expressions with re.match()
Regular expressions offer a more powerful way to filter strings. The re.match() function checks whether a regular expression matches at the beginning of a string, which lets you express patterns that simple wildcards cannot.
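If you start from a shell-style wildcard but want to work with the re module, fnmatch.translate() converts the wildcard into an equivalent regular expression. A minimal sketch, reusing the data*.txt pattern from Method 1 with a slightly different sample list:

```python
import fnmatch
import re

# Turn a shell-style wildcard into an equivalent regular expression
wildcard = "data*.txt"
regex = re.compile(fnmatch.translate(wildcard))

strings = ["data1.txt", "config.ini", "data23.csv", "mydata1.txt"]

# The translated pattern is anchored at the end of the string,
# and re.match() anchors it at the start, so the whole name must match
filtered = [s for s in strings if regex.match(s)]
print(filtered)  # ['data1.txt']
```

Note that translate() does not apply os.path.normcase(), so the resulting regex is case-sensitive.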
Here’s an example:
```python
import re

strings = ['hello_world.py', 'helloWorld.py', 'hi_world.py']
pattern = re.compile(r'hello.*\.py')

# re.match() only checks for a match at the start of each string
filtered = [s for s in strings if pattern.match(s)]
print(filtered)  # Outputs: ['hello_world.py', 'helloWorld.py']
```
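The compiled pattern r'hello.*\.py' matches strings that begin with 'hello', followed by any characters (.*), followed by '.py'. The escaped dot (\.) matches a literal '.' character; an unescaped . would match any character. Because re.match() anchors only at the start of the string, the pattern also accepts names like 'hello.py.bak'. Append $ to the pattern or use re.fullmatch() if the string must end with '.py'.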
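The anchoring difference is easiest to see side by side. This sketch adds a hypothetical 'hello.py.bak' entry to the list and compares re.match() with re.fullmatch() on the same compiled pattern:

```python
import re

strings = ["hello_world.py", "helloWorld.py", "hi_world.py", "hello.py.bak"]
pattern = re.compile(r"hello.*\.py")

# match() anchors only at the start, so 'hello.py.bak' slips through
prefix_only = [s for s in strings if pattern.match(s)]

# fullmatch() requires the entire string to match the pattern
whole_name = [s for s in strings if pattern.fullmatch(s)]

print(prefix_only)  # ['hello_world.py', 'helloWorld.py', 'hello.py.bak']
print(whole_name)   # ['hello_world.py', 'helloWorld.py']
```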
Method 4: Using Regular Expressions with re.findall()
re.findall() can also be used to filter a list with a pattern. It returns a list of all non-overlapping matches in a string, and because an empty list is falsy, any string with at least one match passes the filter.
Here’s an example:
```python
import re

strings = ['event-123-log.txt', 'error-404.txt', 'event-456-log.txt']

# Regex counterpart of the wildcard 'event-*-log.txt', restricted to digits
regex = re.compile(r'event-\d+-log\.txt')

# A non-empty findall() result means the pattern occurs in the string
filtered = [s for s in strings if regex.findall(s)]
print(filtered)  # Outputs: ['event-123-log.txt', 'event-456-log.txt']
```
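The regex pattern r'event-\d+-log\.txt' matches 'event-', followed by one or more digits (\d+), followed by '-log.txt'. Since re.findall() scans the whole string, it checks whether the string contains the pattern anywhere, not whether the string starts or ends with it. re.search() answers the same yes/no question without collecting every match.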
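Here is a sketch of the re.search() alternative, plus re.fullmatch() for when the whole name must have the expected shape. The extra 'old-event-7-log.txt.gz' entry is made up to show the difference:

```python
import re

strings = ["event-123-log.txt", "error-404.txt", "event-456-log.txt",
           "old-event-7-log.txt.gz"]
regex = re.compile(r"event-\d+-log\.txt")

# search() stops at the first match and returns a Match object or None,
# which is all a yes/no filter needs
found = [s for s in strings if regex.search(s)]

# fullmatch() keeps only names that are exactly event-<digits>-log.txt
exact = [s for s in strings if regex.fullmatch(s)]

print(found)  # ['event-123-log.txt', 'event-456-log.txt', 'old-event-7-log.txt.gz']
print(exact)  # ['event-123-log.txt', 'event-456-log.txt']
```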
Method 5: Using filter() with a Custom Function
A custom function can be created to encapsulate the pattern-matching logic. This function can then be passed to the built-in filter() function.
Here’s an example:
```python
def matches_pattern(s):
    # Keep strings that start with 'report' and end with '.txt'
    return s.startswith('report') and s.endswith('.txt')

strings = ['report9.txt', 'report10.txt', 'summary.txt']
filtered = list(filter(matches_pattern, strings))
print(filtered)  # Outputs: ['report9.txt', 'report10.txt']
```
The custom function matches_pattern checks for strings that start with 'report' and end with '.txt'. The filter() function calls it on each element and keeps the elements for which it returns True; list() turns the resulting iterator into a list.
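The hard-coded prefix and suffix can be made configurable by having a function return the predicate. In the sketch below, make_matcher is a hypothetical helper name, not part of any library; it builds a closure that filter() then calls on each element:

```python
def make_matcher(prefix, suffix):
    """Hypothetical helper: build a predicate for a prefix/suffix pattern."""
    def matches(s):
        # Same test as matches_pattern, with the parts supplied by the caller
        return s.startswith(prefix) and s.endswith(suffix)
    return matches

strings = ["report9.txt", "report10.txt", "summary.txt", "report.csv"]
filtered = list(filter(make_matcher("report", ".txt"), strings))
print(filtered)  # ['report9.txt', 'report10.txt']
```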
Bonus One-Liner Method 6: Comprehensions with the in Operator
Sometimes you only need to check whether a substring is present in each string. A list comprehension with the in operator does this in one line.
Here’s an example:
```python
strings = ["apple.txt", "banana.csv", "apple_data.csv"]

filtered = [s for s in strings if "apple" in s]
print(filtered)  # Outputs: ['apple.txt', 'apple_data.csv']
```
In this code, we simply check if the substring ‘apple’ is contained within each string and construct a filtered list with those that contain it.
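The same idea extends to several substrings with any(), and to case-insensitive checks by lower-casing each string first. A small sketch with illustrative data:

```python
strings = ["apple.txt", "banana.csv", "apple_data.csv", "cherry.md"]
keywords = ["apple", "cherry"]

# Keep strings that contain at least one of the keywords
any_kw = [s for s in strings if any(k in s for k in keywords)]

# Case-insensitive variant: lower-case the string before testing
mixed = ["Apple.txt", "banana.csv"]
ci = [s for s in mixed if "apple" in s.lower()]

print(any_kw)  # ['apple.txt', 'apple_data.csv', 'cherry.md']
print(ci)      # ['Apple.txt']
```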
Summary/Discussion
Filtering a list of strings using wildcard patterns in Python can be accomplished with several different approaches, each with its advantages.
The fnmatch module’s functions are straightforward for simple cases, especially when dealing with file patterns.
Regular expressions offer powerful capabilities for more sophisticated patterns that require fine-grained matching criteria.
The classic filter() function, combined with a custom function, gives you more control over the matching process.
When performance matters, compile a regular expression once with re.compile() and reuse it instead of passing the pattern string on every call. fnmatch.filter() similarly translates its pattern once per call rather than once per string.
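A minimal sketch of the compile-once approach: the pattern is compiled a single time, and the compiled object's fullmatch method is passed directly to filter() as the predicate. Keep in mind that the re module also caches recently used patterns internally, so for a handful of patterns the explicit re.compile() mainly saves a cache lookup and keeps the code tidy.

```python
import re

strings = ["event-123-log.txt", "error-404.txt", "event-456-log.txt"]

# Compile once, outside any loop, and reuse the compiled object
regex = re.compile(r"event-\d+-log\.txt")

# The bound method returns a Match (truthy) or None, so filter() can use it directly
filtered = list(filter(regex.fullmatch, strings))
print(filtered)  # ['event-123-log.txt', 'event-456-log.txt']
```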
February 05, 2024 at 02:31AM
The original post is available in Be on the Right Side of Change by Chris