Develop Your Own Spelling Check Toolkit with Python | by Bharath K | Jan, 2023

Photo by Olesia 🇺🇦 Buyar on Unsplash

Whenever I start writing articles or any other work-related items, my primary focus is to put out my ideas and compile them into a document or piece of paper. During this process, I often find myself running into spelling errors or grammatical mistakes.

Building your own spell-check software is therefore a worthwhile project, especially if you run into similar problems and want to keep your attention on generating and developing ideas rather than on catching typos.

While several tools already serve this purpose, the advantage of building your own software is that you can customize the project for additional improvements. Several additions, such as an interactive environment (built with Tkinter or other similar libraries), natural language processing techniques (like autocorrect), and many other extra functionalities, can be considered to enhance the project further.

It is also important to note that, whatever enhancements you make to the project, it is difficult for software of this kind to understand the true semantic meaning behind sentences. Hence, statements of humor, sarcasm, or generic phrases might be misunderstood. We will look at tackling these challenges in future articles.

A project that I would encourage readers to check out before proceeding with this article is a guide on how to build your own language filter with Python. We can combine the work in this article with that one to upgrade the project further, so that certain inappropriate slang words are censored while spelling errors are pointed out.

Photo by Dariusz Sankowski on Unsplash

In this section of the article, we will construct an application that highlights each word in a color that signals whether it is spelled correctly (green for a correct word, red for a possible error). Each sub-section covers one of the major components of the project.

Our primary objective is to develop spell-check software for the user. Hence, we will not replace misspelled words with the closest suggested word, as an autocorrect project would. We will look at such a task in a future article! For now, let us get started with the spelling check application with Python.

Importing the essential libraries:

The essential library for this task is the Natural Language Toolkit (NLTK), together with its word corpus, which contains a list of common English words. Developers who wish to can instead create their own dictionary containing all the words they want in their vocabulary. This process may be quite tedious, but it can be worth the effort for a specialized task.
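As a rough illustration of the custom dictionary idea, the sketch below merges a base word list with a few user-supplied words. The function name `build_vocabulary` and the sample words are hypothetical and not part of the original project; in the actual application the base list would be the NLTK word corpus.

```python
def build_vocabulary(base_words, custom_words=()):
    """Return a lowercase set combining a base word list with custom additions."""
    vocabulary = {word.lower() for word in base_words}
    vocabulary.update(word.lower() for word in custom_words)
    return vocabulary


# A tiny stand-in for the NLTK corpus so the example runs without any downloads
base = ["Python", "spelling", "check"]
vocabulary = build_vocabulary(base, custom_words=["NLTK", "termcolor"])
print(sorted(vocabulary))
```

Storing everything in lowercase matches the lowercased input used later, and a set keeps the membership checks fast.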

The other libraries that we will use for this project are the termcolor library and the regular expressions module (re). NLTK and termcolor can be installed with a straightforward pip install command, while re ships with the Python standard library and needs no installation. The regular expressions module helps us strip unnecessary characters from the sentence so that we focus only on the entered words. The termcolor library, on the other hand, helps to distinguish the right and wrong words by assigning each the appropriate color.

Below is the list of all the required libraries that must be imported to get started with the project.

# Importing the essential libraries for the required natural language processing task
import nltk
from nltk.corpus import words
from termcolor import colored
import re

# Downloading the word corpus (only needed the first time the program runs)
nltk.download("words")

Pre-processing the input sentence:

In this section of the project, we accept an input sentence from the user so that we can spell check it. Using the regular expressions substitute function (re.sub), we replace punctuation and other special characters with a blank space. We do this so that these characters are not attached to the words. We also convert the characters to lower case to prepare the sentence for evaluation. The code block for these actions is provided below.

# Accepting the input sentence by the user
sentence = input("Type in your sentence: \n")

# Modifying the sentence for further processing
new_sentence = re.sub('[^A-Za-z0-9 ]+', ' ', sentence)
final_sentence = new_sentence.lower()
# print(final_sentence)

word_list = []
print("\n")
print("Evaluated Sentence: ")

Creating the sequential pattern for spell checking the data:

In this section, we will split the sentence and verify the spelling of each word individually. We can use the split method to separate the words on spaces. Note that since we have already pre-processed the sentence in the previous step, all punctuation and special characters have been removed.

The next step is to check whether each word in the sentence exists in the NLTK word corpus. A word that does not appear in the corpus will be printed in red using the termcolor functionality that we previously imported. All the correct words will be printed in green, and the full sentence will be shown to the user. Below is the code block for this process.

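# Building the vocabulary once, in lowercase, as a set for fast membership checks
english_words = {w.lower() for w in words.words()}

# Creating the loop for checking the spelling
for word in final_sentence.split():
    if word not in english_words:
        # Words missing from the corpus are flagged in red
        print(colored(word, "red"), end=" ")
    else:
        word_list.append(word)
        print(colored(word, "green"), end=" ")

print("\n")
print("Words in red may be typed incorrectly. Please check the spelling!")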

Once we finish coding the program, we can test the output from the command prompt or from a terminal in the integrated development environment (IDE).
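Before testing by hand, it can help to separate the checking logic from the colored printing so that the logic can be verified with plain assertions. The sketch below is one possible arrangement, not the article's original code: `check_spelling` is a hypothetical helper, and the small vocabulary set stands in for the NLTK corpus.

```python
def check_spelling(sentence_words, vocabulary):
    """Pair each word with True if it appears in the vocabulary, otherwise False."""
    return [(word, word in vocabulary) for word in sentence_words]


# Stand-in vocabulary; in the real program this would be built from the corpus
vocabulary = {"this", "is", "a", "sentence"}
results = check_spelling("this is a sentense".split(), vocabulary)

for word, is_known in results:
    label = "ok" if is_known else "check spelling"
    print(f"{word}: {label}")
```

Each `(word, is_known)` pair can then be passed to `colored` with "green" or "red", exactly as in the loop above.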

Testing the output:

Screenshot by Author

Once we have finished coding our project, we can test it by entering a sentence and observing how the program behaves. In about 30 lines of code, we are able to detect words that may be spelled incorrectly and mark them in red, while all the correctly spelled words are marked in green.

The program helps to spot spelling errors in a sentence or paragraph, but there are several improvements that can make the project much better. We will cover a few that curious developers can explore in the next section.

Additional improvements:

In this section, we will look at a few of the many improvements that could take this project further. A few noteworthy additions for developers to work on next are as follows:

  1. Adding a language filter, as discussed earlier in this article, to censor foul language or other inappropriate slang words, which brings the project closer to something that can be deployed.
  2. Using deep learning and natural language processing to add autocorrect techniques and next-word predictions.
  3. Developing a user interface for the project instead of working in a command terminal or the console of the IDE. I have provided a list of the seven best UI graphics tools available in Python, with some starter code, in a separate article.
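As a first step toward the autocorrect idea in item 2, a far simpler stand-in than deep learning is the standard library's `difflib.get_close_matches`, which ranks candidate words by string similarity. The sketch below is an assumption about how suggestions could be added, not the approach planned for the future article; the `suggest` helper, the cutoff of 0.8, and the sample vocabulary are all hypothetical.

```python
from difflib import get_close_matches


def suggest(word, vocabulary, max_suggestions=3):
    """Return up to max_suggestions vocabulary words that closely resemble word."""
    return get_close_matches(word, vocabulary, n=max_suggestions, cutoff=0.8)


# Stand-in vocabulary; the full NLTK corpus would work but is slower to scan
vocabulary = ["spelling", "sentence", "python", "check"]
print(suggest("sentense", vocabulary))
```

The program could print these suggestions next to each red word instead of replacing the word automatically, which keeps the final decision with the user.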
Photo by Aaron Burden on Unsplash

“It is a damn poor mind that can think of only one way to spell a word.”
Andrew Jackson

Typing and writing are essential parts of most people’s daily lives. While typing numerous words, sentences, and paragraphs, it is not uncommon to make spelling mistakes of many kinds, from errors in long, difficult words to slips in the simplest ones. While several tools pinpoint these errors, it is extremely satisfying to build your own custom spell-check application that can be upgraded further to suit your needs.

In this article, we learned how to build a simple spell-checking application with Python in about 30 lines of code. We used the NLTK library to obtain a list of common English words of the kind found in a typical dictionary. We used regular expressions to clean the input and the termcolor library to highlight the right and wrong words.

If you want to be notified about my articles as soon as they go up, you can subscribe for email recommendations. If you wish to support other authors and me, consider subscribing as well.

If you have any queries related to the various points stated in this article, then feel free to let me know in the comments below. I will try to get back to you with a response as soon as possible.

Check out some of my other articles in relation to the topic covered in this piece that you might also enjoy reading!

Thank you all for sticking on till the end. I hope all of you enjoyed reading the article. Wish you all a wonderful day!


