How to Count a Specific Word in a Text File in Python?
by: Chris
The post content below is copied from Be on the Right Side of Change.



Problem Formulation

💡 Problem Formulation: The goal is to determine how many times a word appears throughout the text.

Given:

  • A text file (example.txt) containing a body of text.
  • A specific word to search for within this text (e.g., "Python").

Goal:

  • Write a Python program that reads the content of example.txt.
  • Counts and returns the number of times the specified word ("Python") appears in the text.
  • The word comparison should be case-insensitive, meaning "Python", "python", and "PYTHON" would all be counted as occurrences of the same word.
  • Words should be considered as sequences of characters separated by whitespace or punctuation marks. For instance, "Python," (with a comma) and "Python" (without a comma) should be treated as the same word.
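The two matching rules above can be tried on a short string before any file is involved. This is only a minimal sketch: the helper name tokenize and the sample sentence are made up for illustration, and treating every run of letters, digits, or underscores as a word is one possible reading of "separated by whitespace or punctuation".

```python
import re

def tokenize(text):
    # Lowercase first so the comparison is case-insensitive, then keep
    # only runs of word characters (letters, digits, underscore).
    return re.findall(r"\w+", text.lower())

sample = "Python, python and PYTHON! Is Python's syntax pythonic?"
tokens = tokenize(sample)
print(tokens)
print(tokens.count("python"))
```

Under this reading, "Python,", "PYTHON!" and "Python's" each contribute one "python" token, while "pythonic" remains a different word, so the last line prints 4.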

Example: Consider the text file example.txt with the following content:

💾 example.txt

Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.

If the word to search for is "Python", the program should output a count of 4: "Python" appears four times in the text, including once in the possessive "Python's", where the apostrophe separates "Python" from the trailing "s".
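To try the methods that follow, the sample text can be written to disk first. The sketch below is an assumption about how one might set that up; the helper name write_example_file and the constant SAMPLE_LINES are hypothetical and not part of the original post.

```python
from pathlib import Path

# The two paragraphs of the sample text, one per line of the file.
SAMPLE_LINES = [
    "Python is an interpreted, high-level and general-purpose programming "
    "language. Python's design philosophy emphasizes code readability with "
    "its notable use of significant whitespace. Its language constructs and "
    "object-oriented approach aim to help programmers write clear, logical "
    "code for small and large-scale projects.",
    'Python is dynamically typed and garbage-collected. It supports multiple '
    'programming paradigms, including structured (particularly, procedural), '
    'object-oriented, and functional programming. Python is often described '
    'as a "batteries included" language due to its comprehensive standard '
    'library.',
]

def write_example_file(file_path="example.txt"):
    # An explicit encoding makes the file read back the same on every platform.
    Path(file_path).write_text("\n".join(SAMPLE_LINES) + "\n", encoding="utf-8")
    return file_path

if __name__ == "__main__":
    write_example_file()
```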

Method 1: Using the split() Function

The simplest way to count a specific word in a text file is by reading the file’s content into a string, converting it to lowercase (to make the search case-insensitive), and then using the split() function to break the string into words. After that, you can use the count() method to find the occurrences of the specified word.

def count_word_in_file(file_path, word):
    with open(file_path, 'r') as file:
        text = file.read().lower()
    words = text.split()
    return words.count(word.lower())

print(count_word_in_file('example.txt', 'Python'))

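This code opens the file example.txt in read mode, reads its content, and converts it to lowercase. Then, it splits the content into a list of words and counts how many times the specified word appears in the list. Note that split() separates only on whitespace, so punctuation stays attached to the neighboring word: tokens such as "python," or "python's" are not equal to "python" and are not counted. For the sample text this method returns 3, because the occurrence inside "Python's" is skipped.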
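If you want to stay with split() but still ignore punctuation next to a word, one option is to strip punctuation from both ends of each token. The following is a sketch under that assumption; the function name count_word_stripped is hypothetical.

```python
import string

def count_word_stripped(file_path, word):
    with open(file_path, 'r', encoding='utf-8') as file:
        tokens = file.read().lower().split()
    # strip() removes punctuation only at the two ends of a token, so
    # "python," becomes "python" but "python's" stays "python's".
    cleaned = [token.strip(string.punctuation) for token in tokens]
    return cleaned.count(word.lower())
```

Calling count_word_stripped('example.txt', 'Python') would count "Python," and "(Python)" but still skip "Python's", since the apostrophe sits inside the token rather than at its edge.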

Method 2: Using Regular Expressions

For more control over what constitutes a word (e.g., ignoring punctuation), you can use the re module. This approach allows you to define a word more accurately by using regular expressions.

import re

def count_word_in_file_regex(file_path, word):
    with open(file_path, 'r') as file:
        text = file.read().lower()
    word_pattern = fr'\b{re.escape(word.lower())}\b'
    return len(re.findall(word_pattern, text))

print(count_word_in_file_regex('example.txt', 'Python'))

Here, re.findall() returns all non-overlapping matches of the pattern. The word boundaries (\b) ensure that the word is not matched inside a longer word such as "pythonic", while punctuation next to the word, as in "Python," or "Python's", does not prevent a match. re.escape() escapes any regex metacharacters in the word, making sure it’s treated as a literal string in the regular expression.
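One caveat: \b only matches between a word character and a non-word character, so it does not behave as expected when the search term itself starts or ends with punctuation (for example "c++"). A possible workaround, sketched below, is to replace \b with lookarounds that require no word character directly before or after the match. The function name count_word_in_text is hypothetical, and it takes a string instead of a file path to keep the example short; re.IGNORECASE is used in place of lowercasing the text.

```python
import re

def count_word_in_text(text, word):
    # (?<!\w) and (?!\w) assert that no word character touches the match,
    # which also works when 'word' begins or ends with punctuation.
    pattern = rf"(?<!\w){re.escape(word)}(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

print(count_word_in_text("C++ and c++, but not abc++ or c++x", "c++"))
```

This prints 2: the first two occurrences are counted, while "abc++" and "c++x" are rejected because a word character touches the match.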

Method 3: Using the collections.Counter Class

The collections module provides a Counter class that can be extremely useful for counting word frequencies in a text. This method involves reading the text, splitting it into words, and then passing the list of words to Counter to get a dictionary-like object where words are keys and their counts are values.

from collections import Counter
import re

def count_word_in_file_counter(file_path, word):
    with open(file_path, 'r') as file:
        text = file.read().lower()
    words = re.findall(r'\b\w+\b', text)
    word_counts = Counter(words)
    return word_counts[word.lower()]

print(count_word_in_file_counter('example.txt', 'Python'))

This method uses regular expressions to split the text into words in a way that excludes punctuation. Then, it uses Counter to count occurrences of each word. Finally, it returns the count of the specified word.
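Because Counter tallies every word in a single pass, the same object can answer queries for several words at once. The sketch below assumes a hypothetical helper count_words_in_text that works on a string and returns a plain dictionary.

```python
from collections import Counter
import re

def count_words_in_text(text, words):
    counts = Counter(re.findall(r"\w+", text.lower()))
    # A Counter returns 0 for missing keys instead of raising KeyError.
    return {word: counts[word.lower()] for word in words}

text = "Python is fun. Python's fans say python is readable."
print(count_words_in_text(text, ["Python", "is", "Java"]))
```

For this sample sentence the result is {'Python': 3, 'is': 2, 'Java': 0}.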

Method 4: Using a Loop and Dictionary

If you want to avoid importing any additional modules, you can manually count occurrences of each word using a loop and a dictionary. This method provides a good understanding of how word counting works under the hood.

def count_word_in_file_dict(file_path, word):
    word_counts = {}
    with open(file_path, 'r') as file:
        for line in file:
            # Use a separate loop variable so the 'word' parameter is not overwritten
            for token in line.lower().split():
                word_counts[token] = word_counts.get(token, 0) + 1
    return word_counts.get(word.lower(), 0)

print(count_word_in_file_dict('example.txt', 'Python'))

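This code reads the file line by line, splits each line on whitespace, and uses a dictionary to keep track of word counts. The get() method is used to update counts, providing a default of 0 if the word isn’t already in the dictionary. As in Method 1, split() leaves punctuation attached to tokens, so "python," and "python's" are stored under their own keys.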
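When only one word matters, the dictionary can be replaced by a single running total, which keeps memory use constant even for large files. The following import-free sketch assumes a hypothetical function name, count_word_streaming, and a hand-picked set of punctuation characters to strip from token edges.

```python
def count_word_streaming(file_path, word):
    target = word.lower()
    # Hand-picked punctuation to strip from token edges (an assumption;
    # extend it to suit the text being processed).
    punctuation = ".,;:!?\"'()[]{}"
    total = 0
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            for token in line.lower().split():
                if token.strip(punctuation) == target:
                    total += 1
    return total
```

Like any split()-based approach, this still treats "python's" as a different token, because the apostrophe is inside the token and not at its edge.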

Method 5: Using the pandas Library

For those who are working with data analysis, the pandas library can be a powerful tool for text processing. This method involves reading the entire file into a pandas DataFrame and then using pandas methods to count the word occurrences.

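import csv

import pandas as pd

def count_word_in_file_pandas(file_path, word):
    # QUOTE_NONE keeps quotation marks in the text from being parsed as CSV quoting
    df = pd.read_csv(file_path, sep='\t', header=None, dtype=str, quoting=csv.QUOTE_NONE)
    all_words = pd.Series(df[0].str.cat(sep=' ').lower().split())
    # Summing the boolean mask counts the matching tokens
    return int((all_words == word.lower()).sum())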

print(count_word_in_file_pandas('example.txt', 'Python'))

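This code reads the text file as tab-separated data, so each line ends up as one row in a single column (assuming the text contains no tab characters). It then concatenates all rows into a single string, splits this string on whitespace, and counts the occurrences of the specified word using pandas Series methods. Because the words come from split(), punctuation stays attached, just as in Method 1.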

Bonus One-Liner Method 6: Using Path and Method Chaining

For a succinct approach, you can read the file with the Path object from the pathlib module and chain string methods on the result. No explicit loop or list comprehension is needed.

from pathlib import Path

def count_word_in_file_oneliner(file_path, word):
    return Path(file_path).read_text().lower().split().count(word.lower())

print(count_word_in_file_oneliner('example.txt', 'Python'))

This method reads the file content as a string, lowers its case, splits it into words, and counts the occurrences of the specified word, all in one line.



February 03, 2024 at 05:06PM