Tag Archives: spell checker

C++ || How To Create A Simple Predictive Spell Checker Using C++

The following is a module with functions which demonstrates how to create a simple predictive spell checker using C++.

The module demonstrated on this page is an implementation based on the Norvig Spelling Corrector.


1. Overview

In its simplest form, this process works by first telling the spelling corrector the valid words in our language. This is called “training” the corrector. Once the corrector is trained to know the valid words in our language, it tries to choose the word that is the most likely spelling correction.

This process is done by choosing the candidate with the highest probability to show up in our language. For example, should “lates” be corrected to “late” or “latest” or “lattes” or …? This choice is determined by the word which has the highest probability to appear in our language, according to the training text we provide.

Traditionally, spell checkers look for four possible errors: a wrong letter (“wird”), also knows as alteration. An inserted letter (“woprd”), a deleted letter (“wrd”), or a pair of adjacent transposed letters (“wrod”). This process is used when generating probable spelling correction candidates.

The training text used in the example on this page can be downloaded here. This file consists of 80891 distinct words, which is used to train the spelling corrector.


2. Spell Check – Basic Usage

The example below demonstrates the use of the ‘SpellChecker‘ class to load the training file and correct the spelling of a word.


3. Spell Check – Unit Tests

The example below demonstrates the use of the ‘SpellChecker‘ class to load the training file and correct the spelling of a word.

In this example, words are added to a set to determine if the most likely spelling correction is returned by the spelling corrector. The first word in the set is the word to test against (some are spelled correctly, some incorrectly). The second word in the set is the expected result that should be returned by the spelling corrector.

If no spelling suggestion is found, an empty string is returned by the spelling corrector. This is true for one of the words in the test cases below. Even though the word is spelled correctly, it is not defined in our training text, and no correction can be found.


4. SpellChecker Class

The following is the SpellChecker Class. Include this in your project to start using!

The following is the header file ‘SpellChecker.h‘.

The following is the implementation file ‘SpellChecker.cpp‘.


5. More Examples

Below are more examples demonstrating the use of the ‘SpellChecker‘ class. Don’t forget to include the module when running the examples!

QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

Remember to include the training file.

C++ || Simple Spell Checker Using A Hash Table


 

Click Here For Updated Version Of Program


The following is another programming assignment which was presented in a C++ Data Structures course. This assignment was used to gain more experience using hash tables.

REQUIRED KNOWLEDGE FOR THIS PROGRAM

Hash Table - What Is It?
How To Create A Spell Checker
How To Read Data From A File
Strtok - Split Strings Into Tokens
#include 'HashTable.h'
The Dictionary File - Download Here

== OVERVIEW ==

This program first reads words from a dictionary file, and inserts them into a hash table.

The dictionary file consists of a list of 62,454 correctly spelled lowercase words, separated by whitespace. The words are inserted into the hash table, with each bucket growing dynamically as necessary to hold all of the incoming data.

After the reading of the dictionary file is complete, the program prompts the user for input. After input is obtained, each word that the user enteres into the program is looked up within the hash table to see if it exists. If the user entered word exists within the hash table, then that word is spelled correctly. If not, a list of possible suggested spelling corrections is displayed to the screen.

== HASH TABLE STATISTICS ==

To better understand how hash tables work, this program reports the following statistics to the screen:

• The total size of the hash table.
• The size of the largest hash table bucket.
• The size of the smallest hash table bucket.
• The total number of buckets used.
• The average hash table bucket size.

A timer is used in this program to time (in seconds) how long it takes to read in the dictionary file. The program also saves each hash table bucket into a separate output .txt file. This is used to further visualize how the hash table data is internally being stored within memory.

== SPELL CHECKING ==

The easiest way to generate corrections for a spell checker is via a trial and error method. If we assume that the misspelled word contains only a single error, we can try all possible corrections and look each up in the dictionary.

Example:


wird: bird gird ward word wild wind wire wiry

Traditionally, spell checkers look for four possible errors: a wrong letter (“wird”), also knows as alteration. An inserted letter (“woprd”), a deleted letter (“wrd”), or a pair of adjacent transposed letters (“wrod”).

The easiest of which is checking for a wrong letter. For example, if a word isnt found in the dictionary, all variants of that word can be looked up by changing one letter. Given the user input “wird,” a one letter variant can be “aird”, “bird”, “cird”, etc. through “zird.” Then “ward”, “wbrd”, “wcrd” through “wzrd”, can be checked, and so forth. Whenever a match is found within the dictionary, the spelling correction should be displayed to the screen.

For a detailed analysis how the other methods can be constructed, click here.

===== SIMPLE SPELL CHECKER =====

This program uses a custom template.h class. To obtain the code for that class, click here.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

Remember to include the data file.

Once compiled, you should get this as your output

Loading dictionary....Complete!

--------------------------------------------------
Total dictionary words = 61286
Hash table size = 19000
Largest bucket size = 13 items at index #1551
Smallest bucket size = 1 items at index #11
Total buckets used = 18217
Total percent of hash table used = 95.8789%
Average bucket size = 3.36422 items

Dictionary loaded in 1.861 secs!
--------------------------------------------------

>> Please enter a sentence: wird

** wird: bird, gird, ward, weird, word, wild, wind, wire, wired, wiry,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: woprd

** woprd: word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: wrd

** wrd: ard, ord, ward, wed, word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: wrod

** wrod: brod, trod, wood, rod, word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: New! Safe and efective

** efective: defective, effective, elective,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: This is a sentance with no corections gygyuigigigiug

** sentance: sentence,

** corections: corrections,

** gygyuigigigiug: No spelling suggestion found...

Number of words spelled incorrectly: 3

Do you want to enter another sentence? (y/n): n

BYE!!