Monthly Archives: January 2013

C++ || Simple Spell Checker Using A Hash Table


 

Click Here For Updated Version Of Program


The following is another programming assignment which was presented in a C++ Data Structures course. This assignment was used to gain more experience using hash tables.

REQUIRED KNOWLEDGE FOR THIS PROGRAM

Hash Table - What Is It?
How To Create A Spell Checker
How To Read Data From A File
Strtok - Split Strings Into Tokens
#include "HashTable.h"
The Dictionary File - Download Here

== OVERVIEW ==

This program first reads words from a dictionary file, and inserts them into a hash table.

The dictionary file consists of a list of 62,454 correctly spelled lowercase words, separated by whitespace. The words are inserted into the hash table, with each bucket growing dynamically as necessary to hold all of the incoming data.
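As a rough illustration of the loading step, here is a minimal sketch that reads whitespace-separated words from an input stream. It assumes std::unordered_set as a stand-in for the custom HashTable.h class (the standard container is itself a hash table whose buckets grow as needed), and the name loadDictionary is made up for this example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>        // std::istringstream, handy for trying this without a file
#include <string>
#include <unordered_set>

// Hypothetical helper: reads every whitespace-separated word from the
// stream and stores it in a hash-based set (each distinct word is kept once).
std::unordered_set<std::string> loadDictionary(std::istream& in) {
    std::unordered_set<std::string> words;
    std::string word;
    while (in >> word) {   // operator>> skips spaces, tabs and newlines
        words.insert(word);
    }
    return words;
}
```

The same function accepts a std::ifstream opened on the dictionary file, since std::ifstream is also a std::istream.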

After the dictionary file has been read, the program prompts the user for input. Each word that the user enters is then looked up within the hash table to see if it exists. If the user-entered word exists within the hash table, then that word is spelled correctly. If not, a list of suggested spelling corrections is displayed to the screen.
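Before the lookup, the sentence has to be split into individual words. The sketch below is one possible way to do that, assuming that punctuation is dropped and letters are lowercased so that input such as "New!" can be matched against a lowercase dictionary. It uses std::istringstream rather than strtok, and the function name tokenize is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a sentence on whitespace, keeps only the
// alphabetic characters of each piece, and converts them to lowercase.
std::vector<std::string> tokenize(const std::string& sentence) {
    std::vector<std::string> words;
    std::istringstream in(sentence);
    std::string raw;
    while (in >> raw) {
        std::string word;
        for (unsigned char ch : raw) {
            if (std::isalpha(ch)) {
                word += static_cast<char>(std::tolower(ch));
            }
        }
        if (!word.empty()) {   // skip pieces that were pure punctuation
            words.push_back(word);
        }
    }
    return words;
}
```

Each returned word can then be looked up in the dictionary, and only the words that are not found need spelling suggestions.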

== HASH TABLE STATISTICS ==

To better understand how hash tables work, this program reports the following statistics to the screen:

• The total size of the hash table.
• The size of the largest hash table bucket.
• The size of the smallest hash table bucket.
• The total number of buckets used.
• The average hash table bucket size.
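The statistics listed above could be gathered with a single pass over the buckets. The following is only a sketch under stated assumptions: the table is modelled as a vector of linked lists, the "smallest" bucket ignores empty buckets, and the average is taken over the buckets in use (which agrees with the sample output on this page, where 61286 words spread over 18217 used buckets gives roughly 3.364 items). The names TableStats and computeStats are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical summary of a separate-chaining table.
struct TableStats {
    std::size_t tableSize = 0;    // total number of buckets
    std::size_t largest = 0;      // size of the fullest bucket
    std::size_t smallest = 0;     // size of the smallest non-empty bucket
    std::size_t bucketsUsed = 0;  // buckets holding at least one item
    double averageSize = 0.0;     // items per used bucket
};

TableStats computeStats(const std::vector<std::list<std::string>>& buckets) {
    TableStats s;
    s.tableSize = buckets.size();
    std::size_t totalItems = 0;
    for (const auto& bucket : buckets) {
        if (bucket.empty()) continue;  // unused buckets do not count
        const std::size_t n = bucket.size();
        s.largest = std::max(s.largest, n);
        s.smallest = (s.bucketsUsed == 0) ? n : std::min(s.smallest, n);
        ++s.bucketsUsed;
        totalItems += n;
    }
    if (s.bucketsUsed > 0) {
        s.averageSize = static_cast<double>(totalItems) / s.bucketsUsed;
    }
    return s;
}
```

The percentage of the table in use follows directly as 100.0 * bucketsUsed / tableSize.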

A timer is used in this program to time (in seconds) how long it takes to read in the dictionary file. The program also saves each hash table bucket into a separate output .txt file. This is used to further visualize how the hash table data is internally being stored within memory.
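One way to time the loading step is with the standard std::chrono clocks. This is a sketch, not necessarily the timer used in the original program; secondsTaken is a hypothetical helper that runs any callable and returns the elapsed time in seconds.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` once and returns how long it took,
// in seconds, measured with a monotonic clock.
template <typename Work>
double secondsTaken(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The dictionary-loading code would be wrapped in a lambda and passed in as `work`.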

== SPELL CHECKING ==

The easiest way to generate corrections for a spell checker is via a trial and error method. If we assume that the misspelled word contains only a single error, we can try all possible corrections and look each up in the dictionary.

Example:


wird: bird gird ward word wild wind wire wiry

Traditionally, spell checkers look for four possible errors: a wrong letter (“wird”), also known as alteration; an inserted letter (“woprd”); a deleted letter (“wrd”); or a pair of adjacent transposed letters (“wrod”).

The easiest of these to check for is a wrong letter. For example, if a word isn't found in the dictionary, all variants of that word can be looked up by changing one letter. Given the user input “wird,” a one letter variant can be “aird”, “bird”, “cird”, etc. through “zird.” Then “ward”, “wbrd”, “wcrd” through “wzrd” can be checked, and so forth. Whenever a match is found within the dictionary, the spelling correction should be displayed to the screen.
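The wrong-letter check described above can be sketched as follows. The dictionary is assumed to be a std::unordered_set of lowercase words (a stand-in for the custom hash table), and the function name alterations is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: tries every letter 'a'..'z' at every position of
// `word` and collects the variants that exist in the dictionary.
std::vector<std::string> alterations(
        const std::string& word,
        const std::unordered_set<std::string>& dictionary) {
    std::vector<std::string> found;
    for (std::size_t i = 0; i < word.size(); ++i) {
        std::string candidate = word;
        for (char c = 'a'; c <= 'z'; ++c) {
            if (c == word[i]) continue;  // that would be the original word
            candidate[i] = c;
            if (dictionary.count(candidate) > 0) {
                found.push_back(candidate);
            }
        }
    }
    return found;
}
```

For a word of length n this performs 25 * n lookups, each of which is a constant-time operation on average in a hash table.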

For a detailed analysis of how the other methods can be constructed, click here.
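The remaining three error types can be handled in the same trial-and-error style. The sketch below is one possible approach, not the original program's code: an inserted letter is undone by removing one letter, a deleted letter is restored by inserting one, and a transposition is undone by swapping adjacent letters. The dictionary is assumed to be a std::unordered_set, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

using Dictionary = std::unordered_set<std::string>;

// Records the candidate if it is a dictionary word not already listed.
static void keepIfWord(const std::string& candidate, const Dictionary& dictionary,
                       std::vector<std::string>& found) {
    if (dictionary.count(candidate) > 0 &&
        std::find(found.begin(), found.end(), candidate) == found.end()) {
        found.push_back(candidate);
    }
}

// Fixes an inserted letter ("woprd"): remove each letter in turn.
std::vector<std::string> removeOneLetter(const std::string& word, const Dictionary& dictionary) {
    std::vector<std::string> found;
    for (std::size_t i = 0; i < word.size(); ++i) {
        std::string candidate = word;
        candidate.erase(i, 1);
        keepIfWord(candidate, dictionary, found);
    }
    return found;
}

// Fixes a deleted letter ("wrd"): insert 'a'..'z' at every position.
std::vector<std::string> insertOneLetter(const std::string& word, const Dictionary& dictionary) {
    std::vector<std::string> found;
    for (std::size_t i = 0; i <= word.size(); ++i) {
        for (char c = 'a'; c <= 'z'; ++c) {
            std::string candidate = word;
            candidate.insert(i, 1, c);
            keepIfWord(candidate, dictionary, found);
        }
    }
    return found;
}

// Fixes transposed letters ("wrod"): swap each adjacent pair.
std::vector<std::string> swapAdjacentLetters(const std::string& word, const Dictionary& dictionary) {
    std::vector<std::string> found;
    for (std::size_t i = 0; i + 1 < word.size(); ++i) {
        std::string candidate = word;
        std::swap(candidate[i], candidate[i + 1]);
        keepIfWord(candidate, dictionary, found);
    }
    return found;
}
```

Merging the results of all four checks gives the full suggestion list for a misspelled word.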

===== SIMPLE SPELL CHECKER =====

This program uses a custom template.h class. To obtain the code for that class, click here.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

Remember to include the data file.

Once compiled, you should get this as your output

Loading dictionary....Complete!

--------------------------------------------------
Total dictionary words = 61286
Hash table size = 19000
Largest bucket size = 13 items at index #1551
Smallest bucket size = 1 items at index #11
Total buckets used = 18217
Total percent of hash table used = 95.8789%
Average bucket size = 3.36422 items

Dictionary loaded in 1.861 secs!
--------------------------------------------------

>> Please enter a sentence: wird

** wird: bird, gird, ward, weird, word, wild, wind, wire, wired, wiry,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: woprd

** woprd: word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: wrd

** wrd: ard, ord, ward, wed, word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: wrod

** wrod: brod, trod, wood, rod, word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: New! Safe and efective

** efective: defective, effective, elective,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: This is a sentance with no corections gygyuigigigiug

** sentance: sentence,

** corections: corrections,

** gygyuigigigiug: No spelling suggestion found...

Number of words spelled incorrectly: 3

Do you want to enter another sentence? (y/n): n

BYE!!

C++ || Custom Template Hash Table With Iterator Using Separate Chaining

Looking for sample code for a Hash Map? Click here!

Before we get into the code, what is a Hash Table? Simply put, a Hash Table is a data structure used to implement an associative array; one that can map unique “keys” to specific values. While a standard array requires that index subscripts be integers, a hash table can use a floating point value, a string, another array, or even a structure as the index. That index is called the “key,” and the contents stored at that specific index location are called the value. A hash table uses a hash function to convert the key into an index into the table, identifying the bucket or slot in which the correct value can be found.

To illustrate, consider a standard array full of data (100 elements). If the position of the specific item that we wanted to access within the array was known, we could quickly access it. For example, if we wanted to access the data located at index #5 in the array, we could access it by doing:


array[5]; // do something with the data

Here, we don't have to search through each element in the array to find what we need; we just access it at index #5. The question is, how do we know that index #5 stores the data that we are looking for? If we have a large set of data, doing a sequential search over each item within the array can be very inefficient. That is where hashing comes in handy. Given a “key,” we can apply a hash function to it to obtain the index, or bucket, where the data that we wish to access is stored.
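To make the idea of turning a key into an index concrete, here is a minimal sketch of a string hash function. The multiply-by-31 scheme is a common textbook choice and is only an assumption here, not necessarily what the class on this page uses; hashKey is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical hash function: folds the characters of the key into one
// number, then reduces it to a valid bucket index in [0, tableSize).
std::size_t hashKey(const std::string& key, std::size_t tableSize) {
    assert(tableSize > 0);
    std::size_t h = 0;
    for (unsigned char ch : key) {
        h = h * 31 + ch;  // unsigned arithmetic simply wraps on overflow
    }
    return h % tableSize;
}
```

The same key always produces the same index, so a lookup only has to examine one bucket instead of scanning the whole array.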

So in essence, a hash table is a data structure that stores key/value pairs, and it is typically used because it is ideal for quickly searching for items.

Though hashing is ideal, it isn't perfect. It is possible for multiple items to be hashed into the same location. Hash “collisions” are practically unavoidable when hashing large data sets. The code demonstrated on this page handles collisions via separate chaining, utilizing an array of linked list head nodes to store multiple values within one bucket should any collisions occur.
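Separate chaining can be sketched in a few lines. The example below is a simplified illustration and not the class from this page: it assumes std::forward_list for the chains and std::hash for the hash function, and the class name ChainedTable is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical separate-chaining table: one singly linked list per bucket.
template <typename T>
class ChainedTable {
public:
    explicit ChainedTable(std::size_t bucketCount) : heads_(bucketCount) {
        assert(bucketCount > 0);
    }

    // Colliding items are simply added to the front of the same chain.
    void insert(const T& item) {
        heads_[indexOf(item)].push_front(item);
    }

    // Only the chain of the item's own bucket is searched.
    bool contains(const T& item) const {
        const auto& chain = heads_[indexOf(item)];
        return std::find(chain.begin(), chain.end(), item) != chain.end();
    }

    // Number of items stored in the given bucket.
    std::size_t bucketSize(std::size_t index) const {
        const auto& chain = heads_.at(index);
        return static_cast<std::size_t>(std::distance(chain.begin(), chain.end()));
    }

private:
    std::size_t indexOf(const T& item) const {
        return std::hash<T>{}(item) % heads_.size();
    }

    std::vector<std::forward_list<T>> heads_;
};
```

Creating the table with a single bucket forces every insert to collide, which makes it easy to see that all items still remain retrievable.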

An iterator was also implemented, making data access within the hash table class that much simpler. Click here for an overview demonstrating how custom iterators can be built.
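For readers who have not written one before, a custom iterator over a bucket's linked list might look roughly like the sketch below. It is a bare-bones forward iterator under assumed names (Node, ChainIterator) and supports fewer operators than the iterator described on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical singly linked list node, as used for one bucket's chain.
template <typename T>
struct Node {
    T value;
    Node* next;
};

// Minimal forward iterator that walks a chain of nodes.
template <typename T>
class ChainIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = T*;
    using reference = T&;

    // A default-constructed iterator (null node) marks the end of a chain.
    explicit ChainIterator(Node<T>* node = nullptr) : node_(node) {}

    reference operator*() const { return node_->value; }
    pointer operator->() const { return &node_->value; }

    // Pre-increment: step to the next node.
    ChainIterator& operator++() {
        node_ = node_->next;
        return *this;
    }

    // Post-increment: step forward, but return the old position.
    ChainIterator operator++(int) {
        ChainIterator old = *this;
        ++(*this);
        return old;
    }

    bool operator==(const ChainIterator& other) const { return node_ == other.node_; }
    bool operator!=(const ChainIterator& other) const { return node_ != other.node_; }

private:
    Node<T>* node_;
};
```

A loop then runs from an iterator on the bucket's head node until it compares equal to a default-constructed end iterator.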


=== CUSTOM TEMPLATE HASH TABLE WITH ITERATOR ===

QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The iterator class starts on line #368, and is built to support most of the standard relational operators, as well as arithmetic operators such as ‘+’, ‘+=’ and ‘++’ (pre/post increment). The * (star), bracket [] and -> arrow operators are also supported.

The rest of the code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.


===== DEMONSTRATION HOW TO USE =====

Use of the above template class is the same as many of its STL template class counterparts. Here are sample programs demonstrating its use.

SAMPLE OUTPUT:

Bucket #0 has 10 items
The first element in bucket #0 is Homer

Now bucket #0 has 9 items
The first element in bucket #0 is Tamra

The unsorted items in strHash bucket #0:
it[] = Tamra
it[] = Lyndon
it[] = Johanna
it[] = Perkins
it[] = Alva
it[] = Jordon
it[] = Neville
it[] = Lawrence
it[] = Jetta

The sorted items in strHash bucket #0:
it[] = Alva
it[] = Jetta
it[] = Johanna
it[] = Jordon
it[] = Lawrence
it[] = Lyndon
it[] = Neville
it[] = Perkins
it[] = Tamra

SAMPLE OUTPUT:


strHash Bucket #0:
*it = Cinderella
*it = Perkins
*it = Krystal
*it = Roger
*it = Roger

strHash Bucket #1:
*it = Lilliam
*it = Lilliam
*it = Theda

strHash Bucket #2:
*it = Arie

strHash Bucket #3:
*it = Magda

strHash Bucket #6:
*it = Edda
*it = Irvin
*it = Kati
*it = Lyndon

strHash Bucket #7:
*it = Deb
*it = Jaime

strHash Bucket #8:
*it = Neville
*it = Victoria

strHash Bucket #9:
*it = Chery
*it = Evelia

--------------------------------------------

intHash Bucket #0:
it[] = 2449
it[] = 6135

intHash Bucket #1:
it[] = 1120
it[] = 852

intHash Bucket #2:
it[] = 5727

intHash Bucket #3:
it[] = 1174

intHash Bucket #4:
it[] = 2775
it[] = 3525
it[] = 8375

intHash Bucket #5:
it[] = 4322
it[] = 8722
it[] = 5016

intHash Bucket #6:
it[] = 5053
it[] = 7231
it[] = 1571

intHash Bucket #7:
it[] = 1666
it[] = 4510
it[] = 1548
it[] = 3646

intHash Bucket #9:
it[] = 2756

--------------------------------------------

strctHash Bucket #0:
it-> = Cherilyn
it-> = Roger

strctHash Bucket #1:
it-> = Tamra
it-> = Alex
it-> = Theda

strctHash Bucket #2:
it-> = Nigel
it-> = Alva
it-> = Arie

strctHash Bucket #4:
it-> = Basil

strctHash Bucket #5:
it-> = Tod

strctHash Bucket #6:
it-> = Irvin
it-> = Lyndon

strctHash Bucket #7:
it-> = Amina
it-> = Hillary
it-> = Kenneth
it-> = Amina

strctHash Bucket #8:
it-> = Gene
it-> = Lemuel
it-> = Gene

strctHash Bucket #9:
it-> = Albertina

Java || Snippet – How To Do Simple Math Using Integer Arrays

This page will consist of simple programs which demonstrate the process of doing simple math with numbers that are stored in an integer array.

REQUIRED KNOWLEDGE FOR THIS SNIPPET

Integer Arrays
The "Random" Class
For Loops
Assignment Operators - Simple Math Operations
Custom Setw/Setfill In Java

Note: In all of the examples on this page, a random number generator was used to place numbers into the array. If you do not know how to obtain data from the user, or if you do not know how to insert data into an array, click here for a demonstration.

===== ADDITION =====

The first code snippet will demonstrate how to add numbers together which are stored in an integer array. This example uses the “+=” assignment operator.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

SAMPLE OUTPUT

Welcome to My Programming Notes' Java Program.

Original array values:
22 26 41 89 35 90 15 99 85 5 95 86
--------------------------------------------------
The sum of the items in the array is: 688

===== SUBTRACTION =====

The second code snippet will demonstrate how to subtract numbers which are stored in an integer array. This example uses the “-=” assignment operator.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

SAMPLE OUTPUT

Welcome to My Programming Notes' Java Program.

Original array values:
99 92 91 26 1 52 98 62 51 22 64 65
--------------------------------------------------
The difference of the items in the array is: -723

===== MULTIPLICATION =====

The third code snippet will demonstrate how to multiply numbers which are stored in an integer array. This example uses the “*=” assignment operator.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

SAMPLE OUTPUT

Welcome to My Programming Notes' Java Program.

Original array values:
95 63 32 19 93 83 71 35 32 37 66 95
--------------------------------------------------
The product of the items in the array is: 494770176

===== DIVISION =====

The fourth code snippet will demonstrate how to divide numbers which are stored in an integer array. This example uses the “/=” assignment operator.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

SAMPLE OUTPUT

Welcome to My Programming Notes' Java Program.

Original array values:
28 85 90 52 1 64 93 85 4 22 4 28
--------------------------------------------------
The quotient of the items in the array is: 1.8005063061510687E-17