How I Built a Readability and Grammar Checker App Using Streamlit : Jonathan Okah

Blog post content copied from Finxter


I will show you the steps I took to create a readability and grammar checker app using Streamlit. You can use it to improve your programming skills and add to your portfolio.

💡 Info: Streamlit is a popular open-source app framework among data scientists as it’s used for developing and deploying Machine Learning and Data Science web apps in minutes.

As we will see, Streamlit goes beyond turning data scripts into shareable web apps. Programmers use it to create anything within its capabilities. A quiz app, an anagram app, and a currency converter app are some of them.

Project Overview

A readability checker tool provides a quick way to assess the readability of a text and how easily readers can understand your work. This is especially helpful if you are writing a book or a blog and want to know where to improve readability for various audiences.

The Python ecosystem includes third-party libraries and frameworks for almost every kind of application.

There’s no need to reinvent the wheel, as the heavy lifting is already done for us. With a few libraries and a bit of finishing touch from us, we will get our readability and grammar checker app up and running in no time.

Prerequisites

This tutorial assumes nothing more than a basic knowledge of Python programming, including functions, if-else statements, and for loops.

👉 Recommended: Python Crash Course on the Finxter Blog

Although I try my best to explain the procedures, I encourage you to wrap your head around the basics because I won’t explain every step. I expect you to have background knowledge already.

Importing Libraries

Before we get started, let’s import the libraries we will be using in this project.

import streamlit as st
import textstat as ts
from pdfminer.high_level import extract_text
from pdfminer.layout import LTTextContainer
from io import StringIO
import docx2txt
import requests
from bs4 import BeautifulSoup as bs
import language_tool_python

Everything above is self-explanatory. We will use textstat to check the readability of a text. We will also use io to extract text from a TXT document. The language_tool_python library will help us check spelling and grammar. I will explain the other libraries as we proceed.
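Before running the app, it may help to confirm that each third-party module can be imported. The sketch below is not part of the original project: the helper name missing_modules is hypothetical, and it relies only on importlib from the standard library.

```python
from importlib.util import find_spec

# Top-level import names used by the app, as they appear in the import lines
REQUIRED = ['streamlit', 'textstat', 'pdfminer', 'docx2txt',
            'requests', 'bs4', 'language_tool_python']


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


print('Missing modules:', missing_modules(REQUIRED) or 'none')
```

Keep in mind that some packages are installed under a different name from the one they are imported as. For example, pdfminer is installed with pip install pdfminer.six, and bs4 is typically provided by the beautifulsoup4 package.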

Our project is a combination of several functions and callback functions we define, which are all linked together to get the job done. So, without further ado, let’s get started.

The Main Function

Our project starts with what we call the main() function, which contains several options. Selecting an option triggers the execution of another function.

def main():
    mode = st.sidebar.selectbox('Select your option', ['Text', '.pdf', '.txt', '.docx', 'Online'])
    # a function is called depending on the mode selected
    if mode == 'Text':
        text_result()
    elif mode == '.pdf':
        upload_pdf()
    elif mode == '.txt':
        upload_txt()
    elif mode == '.docx':
        upload_docx()
    else:
        get_url()
…

if __name__ == '__main__':
    main()

We want to give our app users the option to choose the form of their document: they can copy and paste text into the textbox, upload an e-book, or point to a webpage. We call Streamlit to display these options in a sidebar.
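As the number of modes grows, the if/elif chain in main() can become long. One alternative, offered here as an assumption rather than what the project does, is to map each option to its handler in a dictionary. The stand-in handlers below only return strings so the idea can be tried without Streamlit; run_mode is a hypothetical name.

```python
def run_mode(mode, handlers, default):
    """Call the handler registered for `mode`, or `default` if none matches."""
    handler = handlers.get(mode, default)
    return handler()


# Stand-in handlers; in the app these would be text_result, upload_pdf, and so on
handlers = {
    'Text': lambda: 'text_result called',
    '.pdf': lambda: 'upload_pdf called',
    '.txt': lambda: 'upload_txt called',
    '.docx': lambda: 'upload_docx called',
}

# Any mode without an entry falls through to the default, like the final else
print(run_mode('.pdf', handlers, lambda: 'get_url called'))
print(run_mode('Online', handlers, lambda: 'get_url called'))
```

Adding a new mode then means adding one dictionary entry instead of another elif branch.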

At the very end of our script, we check whether the __name__ variable equals '__main__' and, if so, call the main() function. This ensures main() runs when the script is executed directly, as it is when we launch it with Streamlit, and does not run when the script is imported into another program.

👉 Recommended: Python __name__ == '__main__' Explained

The Textbox

If our user selects ‘Text’, the text_result() function will execute. The function calls on Streamlit to display a textbox using st.text_area labeled ‘Text Field’, and the default text stored in the text variable will appear in the textbox.

def text_result():
    text = 'Your text goes here...'

    #displaying the textbox where texts will be written
    box = st.text_area('Text Field', text, height=200)
    scan = st.button('Scan File') 

    # if button is pressed
    if scan:
        # display statistical results
        st.write('Text Statistics')
        st.write(readability_checker(box))

The function also adds a button that, when pressed, displays the readability results using st.write().

The text_result() function sends the text held in the box variable to a helper function, readability_checker(), and st.write() displays the result.

def readability_checker(w):
    stats = dict(
            flesch_reading_ease=ts.flesch_reading_ease(w),
            flesch_kincaid_grade=ts.flesch_kincaid_grade(w),
            automated_readability_index=ts.automated_readability_index(w),
            smog_index=ts.smog_index(w),
            coleman_liau_index=ts.coleman_liau_index(w),
            dale_chall_readability_score=ts.dale_chall_readability_score(w),
            linsear_write_formula=ts.linsear_write_formula(w),
            gunning_fog=ts.gunning_fog(w),
            word_count=ts.lexicon_count(w),
            difficult_words=ts.difficult_words(w),
            text_standard=ts.text_standard(w),
            sentence_count=ts.sentence_count(w),
            syllable_count=ts.syllable_count(w),
            reading_time=ts.reading_time(w)
    )
    return stats

In short, text_result() accepts input and, when the button is pressed, sends it to readability_checker(), which scans the text and returns the results as a dictionary.
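To make the numbers in that dictionary less opaque, here is a rough sketch of the standard Flesch Reading Ease formula together with the commonly quoted score bands. textstat computes its own word, sentence, and syllable counts, so its output may differ from a hand calculation. The function names and the example counts are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula computed from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def describe_score(score):
    """Map a Flesch Reading Ease score to a commonly used difficulty label."""
    bands = [
        (90, 'very easy'),
        (80, 'easy'),
        (70, 'fairly easy'),
        (60, 'standard'),
        (50, 'fairly difficult'),
        (30, 'difficult'),
    ]
    # Bands are ordered from highest to lowest, so the first match wins
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return 'very confusing'


# Made-up counts: 100 words, 5 sentences, 150 syllables
score = flesch_reading_ease(100, 5, 150)
print(round(score, 3), describe_score(score))
```

Higher scores mean easier text. Longer sentences and longer words both pull the score down.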

👉 Recommended Tutorial: Python Dictionary – Ultimate Guide

That’s all it takes to set up our readability checker app.

If this were the only option in our main() function, we could call it a day. But we want to give our users more choices, and the more features we add, the more Python code we need to write to support them.

PDF Mode

Back to our main() function. If our users select the .pdf option, the upload_pdf() function will execute.

def upload_pdf():
    file = st.sidebar.file_uploader('Choose a file', type='pdf')
    if file is not None:
        pdf = extract_text(file)
        #sending the text to textbox
        document_result(pdf)

This function calls Streamlit to produce a file uploader so that we can upload a PDF file. When we upload the file, the extract_text() function from pdfminer does the heavy lifting for us. By default, Streamlit accepts all file extensions; specifying the type restricts uploads to that extension.

The Setback

I wanted to make this process as seamless as possible.

What I wanted to do was call on the pdfminer library to extract the text and send it straight to readability_checker(), which would scan it and produce the result for st.write() to display, without the content of the file ever appearing on screen.

I wasn’t able to do so. Hence, I would appreciate hearing from anyone (1) with a solution to this problem.

A Workaround

I wasn’t deterred, though.

Since there are so many ways to kill a rat, I found a workaround with a little help from Streamlit. I took advantage of Streamlit’s ability to display default text in a textbox, as seen in our text_result() function.

So, I created a function like text_result() but with a parameter that will collect the very text extracted from the PDF file and have it displayed in the textbox.

Give me a round of applause. That’s my feat of engineering! Alright, let’s implement it.

def document_result(file):

    #displaying the textbox where texts will be written
    box = st.text_area('Text Field', file, height=200)
    scan = st.button('Scan Text')

    # if button is pressed
    if scan:
        # display statistical results
        st.write('Text Statistics')
        st.write(readability_checker(box))

Make sure you are using the latest version of pdfminer, installed with pip as ‘pip install pdfminer.six’.

Alright, we have got past that setback. Our PDF text is displayed inside the textbox, which is not bad after all.

The only downside comes from the pdfminer library. It takes time to process bulky files. You may want to try other libraries in your project.
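Because Streamlit reruns the script on each interaction, a slow extraction step can end up running more than once for the same file. One possible mitigation, which is an assumption and not part of the original app, is to cache the extracted text under a hash of the file's bytes. In this sketch, extract stands for any function that turns bytes into text, cached_extract is a hypothetical helper, and the cache is a plain dictionary.

```python
import hashlib


def cached_extract(data, extract, cache):
    """Return extract(data), reusing an earlier result for identical bytes."""
    key = hashlib.sha256(data).hexdigest()
    if key not in cache:
        cache[key] = extract(data)
    return cache[key]


# Demonstration with a stand-in extractor that records how often it runs
calls = []


def fake_extract(data):
    calls.append(1)
    return data.decode('utf-8').upper()


cache = {}
first = cached_extract(b'hello pdf', fake_extract, cache)
second = cached_extract(b'hello pdf', fake_extract, cache)
print(first, second, len(calls))
```

In a real Streamlit app, the cache would need to live somewhere that survives reruns, since an ordinary dictionary created in the script is rebuilt each time. That wiring is not shown here.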

When users choose other options in our main() function, the respective functions are executed in the same way: each uses the imported libraries to extract the text and sends it to the document_result() function, which, in turn, passes it to readability_checker() to scan. Finally, it displays the result.

You may want to check the documentation to know more about the imported libraries that help to extract the files.
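The article does not show the .txt handler, so the following is only a guess at how StringIO from the imports might be used. It assumes the uploaded file behaves like io.BytesIO (exposing getvalue()) and that the text is UTF-8 encoded. The name read_txt is hypothetical.

```python
from io import BytesIO, StringIO


def read_txt(uploaded_file, encoding='utf-8'):
    """Decode an uploaded bytes buffer and return its text content."""
    # getvalue() returns the raw bytes; StringIO wraps the decoded text
    text_stream = StringIO(uploaded_file.getvalue().decode(encoding))
    return text_stream.read()


# BytesIO stands in for the object returned by the file uploader
sample = BytesIO('First line.\nSecond line.'.encode('utf-8'))
print(read_txt(sample))
```

The returned string could then be passed to document_result() in the same way as the text extracted from a PDF.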

The ‘Online’ Option

This option allows our users to check the readability of content found on web pages.

def get_url():
    url = st.sidebar.text_input("Paste your url")
    if url:
        get_data(url)

As usual, when we select the option, it triggers the execution of the get_url() function.

The get_url() function uses st.sidebar.text_input to provide a small-size box where you can paste your URL. Once you hit the Enter key, it sends the URL to the get_data() function.
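Since anything can be typed into that box, it may be worth checking that the input looks like a web address before requesting it. This is a minimal sketch using urllib.parse from the standard library. The helper name looks_like_web_url is hypothetical, and the check is deliberately loose: it does not verify that the page exists.

```python
from urllib.parse import urlparse


def looks_like_web_url(url):
    """Return True when the string has an http(s) scheme and a host part."""
    parts = urlparse(url.strip())
    return parts.scheme in ('http', 'https') and bool(parts.netloc)


for candidate in ['https://example.com/post', 'example.com', 'ftp://example.com']:
    print(candidate, looks_like_web_url(candidate))
```

A function such as get_url() could call a check like this and show a message instead of forwarding a malformed URL.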

def get_data(url):
    page = requests.get(url)
    if page.status_code != 200:
        # report the failure in the app and stop processing this URL
        st.error('Error fetching page')
        return
    # parse the HTML and send the visible text to the textbox
    soup = bs(page.content, 'html.parser')
    document_result(soup.get_text())

What the get_data() function is doing is web scraping.

It requests to get the content of the URL.

👉 Recommended Tutorial: How to Get the URL Content in Python

If the request is successful, it returns the content of the web page. The function then calls the BeautifulSoup library to parse the HTML content.

Using the get_text() method from BeautifulSoup, the get_data() extracts the content without any HTML tags and sends it to the document_result() function which I have explained before.

The downside of this option is that it scrapes everything it sees on the webpage, including the navigation bar, header, footer, and comments, which may not be relevant for readability checking.
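One way to reduce that noise is to ignore text inside tags that usually hold page furniture. The sketch below uses the standard library's html.parser rather than BeautifulSoup, purely so it runs without extra installs. The tag list and the names MainTextParser and visible_text are assumptions, and real pages may need a different list.

```python
from html.parser import HTMLParser

# Tags whose contents are usually not part of the main article text
SKIPPED_TAGS = {'nav', 'header', 'footer', 'aside', 'script', 'style'}


class MainTextParser(HTMLParser):
    """Collect text that is not nested inside any of the skipped tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the kept text with runs of whitespace collapsed."""
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    return ' '.join(''.join(parser.chunks).split())


page = '<nav>Home</nav><p>Hello <b>world</b>.</p><footer>Copyright</footer>'
print(visible_text(page))
```

This is a heuristic: pages that put their article text inside one of the skipped tags, or their menus inside plain div elements, will not be handled well.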

Grammar Checker

If you have been following along, you will notice, from the above image, another button besides the readability checker button.

That is our grammar checker button. Alright, let me show you how I did it.

I left it out of the Python scripts above so we could focus on one thing at a time. The script below is our updated text_result() function.

def text_result():
    text = 'Your text goes here...'
    box = st.text_area('Text Field', text, height=200)
    left, right = st.columns([5, 1])
    scan = left.button('Check Readability')
    grammar = right.button('Check Grammar')

    # if button is pressed
    if scan:
        # display statistical results
        st.write('Text Statistics')
        st.write(readability_checker(box))
    elif grammar:
        st.write(grammar_checker(box))

Streamlit’s columns() method enables us to display our buttons side by side.

By passing it the list [5, 1], we specify the relative widths of the two columns, which determines where the buttons appear. Also, notice how we used left.button() instead of st.button(). This is because we want to place each button in a column created by st.columns().

The if/elif statement keeps the app flexible and neat. If we press the grammar checker button, the readability result disappears if it is already there, and the grammar result is displayed instead.

Let us also update the document_result() function.

def document_result(file):
    box = st.text_area('Text Field', file, height=200)
    left, right = st.columns([3, .75])
    with left:
        scan = st.button('Check Readability')
    with right:
        grammar = st.button('Check Grammar')
    # if button is pressed
    if scan:
        # display statistical results
        st.write('Text Statistics')
        st.write(readability_checker(box))
    elif grammar:
        st.write(grammar_checker(box))

Again, notice another way we use the st.columns to achieve the same result. The ‘with’ notation inserts any element in a specified position. Then comes the grammar_checker() function.

def grammar_checker(text):
    tool = language_tool_python.LanguageTool('en-US', config={'maxSpellingSuggestions': 1})
    check = tool.check(text)
    result = []
    for i in check:
        result.append(i)
        result.append(f'Error in text => {text[i.offset : i.offset + i.errorLength]}')
        result.append(f'Can be replaced with =>  {i.replacements}')
        result.append('--------------------------------------')
    return result

LanguageTool is an open-source grammar and spell checker. The language_tool_python module is a Python wrapper around it, and LanguageTool itself can also be used from other programming languages.

To use it, make sure you have Java installed on your system. Once we create it and save it in the tool variable, it downloads everything necessary to check your text, in this case for American English only. The download is about 225MB, excluding Java.

This enables you to use it offline. To use it online, please check the documentation. We set maxSpellingSuggestions to speed up the checking process, especially when dealing with millions of characters.

We appended to the ‘result’ variable to display it when called by the st.write() function. To know more about how to use the language_tool_python module, please consult the documentation.
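The offset and length used in grammar_checker() can also drive an automatic fix. The following sketch applies the first suggestion for each flagged span. It uses stand-in objects that carry the same attribute names as in the function above (offset, errorLength, replacements) instead of real LanguageTool matches, and apply_first_suggestions is a hypothetical helper.

```python
from types import SimpleNamespace


def apply_first_suggestions(text, matches):
    """Replace each flagged span with its first suggestion, if it has one."""
    # Work from the end of the text so earlier offsets remain valid
    for match in sorted(matches, key=lambda m: m.offset, reverse=True):
        if match.replacements:
            end = match.offset + match.errorLength
            text = text[:match.offset] + match.replacements[0] + text[end:]
    return text


# Stand-in matches for the sentence below: 'has' at offset 2, 'a' at offset 6
matches = [
    SimpleNamespace(offset=2, errorLength=3, replacements=['have']),
    SimpleNamespace(offset=6, errorLength=1, replacements=['an']),
]
print(apply_first_suggestions('I has a apple', matches))
```

Taking the first suggestion blindly can introduce new mistakes, so showing the suggestions to the user, as the app does, is the safer default.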

Deployment

It would be nice to have our new app visible for others with little or no programming knowledge to see and use. Deploying the app makes that possible.

If you want to deploy on Streamlit Cloud, it’s very easy. Set up a GitHub account if you have not already done so. Create and upload files to your GitHub repository.

Then, you set up a Streamlit Cloud account. Create a New App and link your GitHub account. Streamlit will do the rest.

Any changes made will reflect in the app. To avoid encountering errors while deploying your app, go to my GitHub page and observe other files I included to enable easy deployment on Streamlit Cloud.

Conclusion

This brings us to the end of this tutorial on how I built a readability and grammar checker app using Streamlit.

I explained it in a way you can understand. You can visit my GitHub (2) page to view the full project. Also, click this link (3) to view my app live on Streamlit Cloud. Alright, that’s it. Go on, give it a try and create awesome apps.

References


January 17, 2023 at 12:31AM