Tag Archives: hash table

C++ || Multi Process Server & Client Hash Table Using Thread Pools, Message Queues, & Signal Handlers

The following is another homework assignment which was presented in an Operating Systems Concepts class. Driven by command-line arguments, this program implements a multithreaded hash table and uses message queues to pass text between a client program and a server program. It makes use of several interprocess communication function calls provided on Unix-based systems.

REQUIRED KNOWLEDGE FOR THIS PROGRAM

How To Override The Default Signal Handler (CTRL-C)
How To Create And Use Pthreads For Interprocess Communication
How To Use Message Queues For Interprocess Communication
Multi Process Synchronization Producer Consumer Problem Using Pthreads
Sample Input Server Records File - Download Here

==== 1. OVERVIEW ====

Hash tables are widely used for efficient storage and retrieval of data. When using hash tables in multithreaded applications, you must ensure that concurrent accesses to the hash table are free from race conditions, deadlocks, and other synchronization problems.

One way to avoid these problems is to prevent concurrent accesses to the hash table altogether: before accessing the hash table, a thread acquires a single lock, and it releases that lock once the access is complete. Such an approach, although simple, is inefficient, because only one thread can use the table at a time. The program demonstrated on this page implements an alternative that permits safe concurrent accesses. Each hash location within the hash table is protected by its own lock, so multiple threads can access the hash table concurrently as long as they are accessing different hash locations. For greater efficiency, this program also makes use of a thread pool.
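The per-location locking idea can be sketched roughly as follows. This is a minimal illustration using the C++ standard library rather than the Pthreads calls the assignment uses, and the names `StripedTable`, `put`, `get`, and `kBuckets` are hypothetical; it is not the class from the server program.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: one mutex per bucket instead of one for the table.
class StripedTable {
public:
    // Insert the name for an id, or overwrite it if the id already exists.
    void put(int id, const std::string& name) {
        Bucket& b = bucketFor(id);
        std::lock_guard<std::mutex> guard(b.lock); // locks only this bucket
        for (auto& entry : b.items) {
            if (entry.first == id) { entry.second = name; return; }
        }
        b.items.emplace_back(id, name);
    }

    // Copy the name stored for an id into 'out'; returns false if absent.
    bool get(int id, std::string& out) {
        Bucket& b = bucketFor(id);
        std::lock_guard<std::mutex> guard(b.lock);
        for (const auto& entry : b.items) {
            if (entry.first == id) { out = entry.second; return true; }
        }
        return false;
    }

private:
    struct Bucket {
        std::mutex lock;                               // guards 'items' only
        std::list<std::pair<int, std::string>> items;  // chain for collisions
    };

    static constexpr std::size_t kBuckets = 101; // arbitrary size for the sketch

    Bucket& bucketFor(int id) {
        return buckets_[std::hash<int>{}(id) % kBuckets];
    }

    std::array<Bucket, kBuckets> buckets_;
};
```

Two threads calling `put` or `get` block each other only when their ids hash to the same bucket; with a single table-wide lock they would always serialize.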

==== 2. TECHNICAL DETAILS ====

This program is implemented in two parts: a server program and a client program. The server maintains a hash table of records and a pool of threads. The client requests records from the server by sending record ids over a message queue. The server retrieves each request from the message queue and wakes up a thread in the thread pool. That thread uses the id sent by the client to look up the corresponding record in the hash table, and then sends the record it finds back to the client over the message queue.

The server also reads a file, named on the command line, which holds the initial user data to be inserted into the hash table. The text file has the following format:

a unique numerical id 1 (an integer)
the first name 1 (a string)
the last name 1 (a string)
.
.
.
a unique numerical id N (an integer)
the first name N (a string)
the last name N (a string)

These three fields make up one single record for one individual. More than one record may be present in the incoming text file.

==== 3. SERVER ====

The server has the following structure and function:

Multi Threaded Process Server Flow Control



The server program is invoked with the following command-line structure:

./Server [FILE NAME] [NUMBER OF THREADS] (e.g. ./Server database.dat 10)

The server is implemented below.

The client program has a much simpler flow of control. It is implemented below.

==== 4. CLIENT ====

The client has the following structure and function:

Multi Threaded Process Client Flow Control


The client program is designed to sleep for 1 second every time a new record is obtained from the server. This makes it easier for the user to see what is being displayed on the screen.

The client program is invoked with the following command-line structure:

./client

The client is implemented below.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

See sample files named server.cpp and client.cpp which illustrate interprocess communications using message queues. See file condvar.cpp which illustrates the use of condition variables. Finally, see file signal.cpp which illustrates the overriding of default signal handlers.
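For readers without those sample files, overriding the default CTRL-C behavior can be sketched as follows. This is not the contents of signal.cpp; the names `g_stopRequested`, `onInterrupt`, and `installInterruptHandler` are hypothetical.

```cpp
#include <csignal>

// Set by the handler and polled by the main loop. Hypothetical name.
volatile std::sig_atomic_t g_stopRequested = 0;

// A handler should do as little as possible, so this one only sets a flag.
void onInterrupt(int) {
    g_stopRequested = 1;
}

// Replaces the default SIGINT action, which would terminate the process.
// Returns false if the handler could not be installed.
bool installInterruptHandler() {
    return std::signal(SIGINT, onInterrupt) != SIG_ERR;
}
```

A server loop would call `installInterruptHandler()` once at startup, check `g_stopRequested` on each pass, and release its message queue before exiting once the flag is set.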

The following is sample output:
(Note: remember to include the initial records input file!)

SERVER OUTPUT:

./Server INPUT_Records_programmingnotes_freeweq_com.txt 26

** SERVER ID #1015808 SUCCESSFULLY ESTABLISHED

^C

Caught the CTRL-C
Shutting down the server connection..

CLIENT OUTPUT:

./Client

** CONNECTION TO SERVER ID #1015808 SUCCESS

ID = 243 Record = Graham Basil
ID = 7943 Record = Tobias Arie
ID = 3607 Record = Claire Amina
ID = 849 Record = Jetta Victoria
ID = 126 Record = Jeramy Tod
ID = 7483 Record = Vivan Krystal
ID = 8036 Record = Lilliam Harley
ID = 1901 Record = Kati Basil
ID = 3524 Record = Kenneth Perkins
ID = 5256 Record = Jodee Albertina
ID = 7065 Record = Marylou Donn
ID = 3951 Record = Ula Domitila
ID = 395 Record = Jaime Lilliam
ID = 9234 Record = Nigel Gene
ID = 4148 Record = Carmella Evelia
ID = 9340 Record = Sang Cherilyn
ID = 3834 Record = Jessica Freddy

** SERVER CONNECTION CLOSED..

C++ || Custom Template Hash Map With Iterator Using Separate Chaining

Before we get into the code, what is a Hash Map? Simply put, a Hash Map is an extension of a Hash Table, which is a data structure used to map unique “keys” to specific “values.” The Hash Map demonstrated on this page differs from the previous Hash Table implementation in that the key and the value do not need to be the same datatype; they can be completely different. So for example, if you wish to map a string “key” to an integer “value,” a Hash Map is ideal.

In its simplest form, a Hash Map can be thought of as an associative array, or a “dictionary.” Hash Maps are composed of a collection of key/value pairs, such that each key appears at least once in the collection. While a standard array requires that index subscripts be integers, a hash map can use a string, an integer, or even a floating point value as the index. That index is called the “key,” and the contents stored at that index location are called the “value.” A hash map uses a hash function to generate an index into the table, creating buckets or slots, from which the correct value can be found.

To illustrate, suppose that you’re working with some data that has values associated with strings — for instance, you might have student names and you wish to assign them grades. How would you store this data? Depending on your skill level, you might use multiple arrays during the implementation. For example, in terms of a one dimensional array, if we wanted to access the data for a student located at index #25, we could access it by doing:


studentNames[25]; // do something with the data
studentGrades[25];

Here, we don't have to search through each element in the array to find what we need; we just access it at index #25. The question is, how do we know that index #25 holds the data that we are looking for? If we have a large set of data, not only does keeping track of multiple arrays become tiresome, but doing a sequential search over each item within the separate arrays can become very inefficient. That is where hashing comes in handy. Using a Hash Map, we can use the student's name as the “key,” and the student's grade as the data “value.” Given this “key” (the student's name), we can apply a hash function to compute an index, or bucket, within the hash table, and find the data “value” (the student's grade) there.
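As a small illustration of that last step, a key can be turned into a bucket index like this. The sketch assumes `std::hash` as the hash function, which may differ from the one used in the class on this page, and `bucketIndex` is a hypothetical helper.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Maps a key to a bucket index in the range [0, bucketCount).
// bucketCount must be greater than zero.
std::size_t bucketIndex(const std::string& key, std::size_t bucketCount) {
    return std::hash<std::string>{}(key) % bucketCount;
}
```

The same key always produces the same index, so a lookup goes straight to one bucket instead of scanning every element.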

So in essence, a Hash Map is an extension of a hash table, which is a data structure that stores key/value pairs. Hash tables are typically used because they are ideal for doing a quick search of items.

Though hashing is ideal, it isn't perfect. It is possible for multiple “keys” to be hashed into the same location. Hash “collisions” are practically unavoidable when hashing large data sets. The code demonstrated on this page handles collisions via separate chaining, using an array of linked list head nodes so that one bucket can store multiple keys should any collisions occur.

A special feature of this hash map class is that it is implemented as a multimap, meaning that more than one “value” can be associated with a given “key.” For example, in a student enrollment system where students may be enrolled in multiple classes simultaneously, there might be an association for each enrollment where the “key” is the student ID, and the “value” is the course ID. In this example, if a given student is enrolled in three courses, there will be three associated “values” (course IDs) for one “key” (student ID) in the Hash Map.
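A stripped-down version of the multimap idea, combined with separate chaining, might look like the sketch below. It uses `std::list` chains and `std::hash` for brevity, and `ChainedMultiMap` and its member names are hypothetical; the class on this page has its own interface and an iterator, neither of which is reproduced here.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical multimap: every bucket is a chain of key/value pairs, and
// one key may own several pairs in its chain. bucketCount must be > 0.
template <typename K, typename V>
class ChainedMultiMap {
public:
    explicit ChainedMultiMap(std::size_t bucketCount = 10) : buckets_(bucketCount) {}

    // Always appends, so repeated keys are kept rather than overwritten.
    void insert(const K& key, const V& value) {
        chainFor(key).emplace_back(key, value);
    }

    // Number of values stored under the key.
    std::size_t count(const K& key) const {
        std::size_t n = 0;
        for (const auto& kv : chainFor(key))
            if (kv.first == key) ++n;
        return n;
    }

    // All values stored under the key, in insertion order.
    std::vector<V> values(const K& key) const {
        std::vector<V> out;
        for (const auto& kv : chainFor(key))
            if (kv.first == key) out.push_back(kv.second);
        return out;
    }

    // Removes the first pair matching both key and value, if there is one.
    bool remove(const K& key, const V& value) {
        Chain& chain = chainFor(key);
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key && it->second == value) {
                chain.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    using Chain = std::list<std::pair<K, V>>;

    Chain& chainFor(const K& key) {
        return buckets_[std::hash<K>{}(key) % buckets_.size()];
    }
    const Chain& chainFor(const K& key) const {
        return buckets_[std::hash<K>{}(key) % buckets_.size()];
    }

    std::vector<Chain> buckets_;
};
```

With `ChainedMultiMap<std::string, int>`, inserting several course numbers under one department key leaves `count` for that key equal to the number of insertions, which is the behavior the enrollment example describes.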

An iterator is also implemented, making data access within the hash map class much simpler. Click here for an overview demonstrating how custom iterators can be built.

=== CUSTOM TEMPLATE HASH MAP WITH ITERATOR ===

QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The iterator class starts on line #381, and is built to support most of the standard relational operators, as well as arithmetic operators such as ‘+,+=,++’ (pre/post increment). The * (star), bracket [] and -> arrow operators are also supported. Click here for an overview demonstrating how custom iterators can be built.

The rest of the code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

===== DEMONSTRATION HOW TO USE =====

Use of the above template class is the same as many of its STL template class counterparts. Here are sample programs demonstrating its use.

SAMPLE OUTPUT:

The key 'CPSC' appears in the hash map 6 time(s)

The first item with the key 'CPSC' is: 386

These are all the items in the hash map whose key is 'CPSC':
Key-> CPSC Value-> 386
Key-> CPSC Value-> 462
Key-> CPSC Value-> 301
Key-> CPSC Value-> 240
Key-> CPSC Value-> 131
Key-> CPSC Value-> 120

[REMOVE THE VALUE '386' FROM THE KEY 'CPSC']

Now the key 'CPSC' only appears in the hash map 5 time(s)

These are the sorted items in the hash map whose key is 'CPSC':
Key-> CPSC Value-> 120
Key-> CPSC Value-> 131
Key-> CPSC Value-> 240
Key-> CPSC Value-> 301
Key-> CPSC Value-> 462

These are all of the items in the entire hash map:
Key-> CIS Value-> 465

Key-> DANCE Value-> 134

Key-> PE Value-> 145
Key-> PE Value-> 125

Key-> MATH Value-> 270
Key-> MATH Value-> 150

Key-> GEOL Value-> 201
Key-> GEOL Value-> 101

Key-> CPSC Value-> 120
Key-> CPSC Value-> 131
Key-> CPSC Value-> 240
Key-> CPSC Value-> 301
Key-> CPSC Value-> 462

Key-> BIOL Value-> 585
Key-> BIOL Value-> 134

Key-> ART Value-> 101
Key-> ART Value-> 345

Key-> CHEM Value-> 185

Key-> HIST Value-> 251

The total number of items in the hash map is: 19

SAMPLE OUTPUT:

'Kenneth' owns 3 cars

These are all of the cars in the hash map:
Jessica's car(s)
Car: Nissan Altima
Year: 2011
MPG: 30.7

Kenneth's car(s)
Car: Ford Fusion
Year: 2006
MPG: 28.5

Car: BMW 535i
Year: 2014
MPG: 25.4

Car: Acura Integra
Year: 2001
MPG: 20.2
-----------------------------------------------------

The total number of cars in the hash map is: 4

Sorting the cars that 'Kenneth' owns by name..

Again, these are all of the cars in the hash map:
Jessica's car(s)
Car: Nissan Altima
Year: 2011
MPG: 30.7

Kenneth's car(s)
Car: Acura Integra
Year: 2001
MPG: 20.2

Car: BMW 535i
Year: 2014
MPG: 25.4

Car: Ford Fusion
Year: 2006
MPG: 28.5
-----------------------------------------------------

'Acura Integra' has been removed from 'Kenneth's' inventory..

'Kenneth' now owns only 2 cars

These are all of the cars in the hash map with the 'Acura Integra' removed:
Jessica's car(s)
Car: Nissan Altima
Year: 2011
MPG: 30.7

Kenneth's car(s)
Car: BMW 535i
Year: 2014
MPG: 25.4

Car: Ford Fusion
Year: 2006
MPG: 28.5
-----------------------------------------------------

The total number of cars in the hash map is: 3

C++ || Simple Spell Checker Using A Hash Table


 

Click Here For Updated Version Of Program


The following is another programming assignment which was presented in a C++ Data Structures course. This assignment was used to gain more experience using hash tables.

REQUIRED KNOWLEDGE FOR THIS PROGRAM

Hash Table - What Is It?
How To Create A Spell Checker
How To Read Data From A File
Strtok - Split Strings Into Tokens
#include 'HashTable.h'
The Dictionary File - Download Here

== OVERVIEW ==

This program first reads words from a dictionary file, and inserts them into a hash table.

The dictionary file consists of a list of 62,454 correctly spelled lowercase words, separated by whitespace. The words are inserted into the hash table, with each bucket growing dynamically as necessary to hold all of the incoming data.

After the dictionary file has been read, the program prompts the user for input. Each word that the user enters is then looked up in the hash table. If the word exists within the hash table, it is spelled correctly. If not, a list of suggested spelling corrections is displayed on the screen.

== HASH TABLE STATISTICS ==

To better understand how hash tables work, this program reports the following statistics to the screen:

• The total size of the hash table.
• The size of the largest hash table bucket.
• The size of the smallest hash table bucket.
• The total number of buckets used.
• The average hash table bucket size.

A timer is used in this program to time (in seconds) how long it takes to read in the dictionary file. The program also saves each hash table bucket into a separate output .txt file. This is used to further visualize how the hash table data is internally being stored within memory.
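The statistics listed above can be computed from the per-bucket item counts, as in this sketch. It assumes that "smallest" and "average" consider only non-empty buckets, which appears consistent with the sample output further down (61286 words over 18217 used buckets is about 3.364); `BucketStats` and `summarize` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct BucketStats {
    std::size_t tableSize = 0; // total number of buckets
    std::size_t used = 0;      // buckets holding at least one item
    std::size_t largest = 0;   // item count of the fullest bucket
    std::size_t smallest = 0;  // item count of the smallest non-empty bucket
    double average = 0.0;      // items per used bucket
};

// bucketSizes[i] is the number of items stored in bucket i.
BucketStats summarize(const std::vector<std::size_t>& bucketSizes) {
    BucketStats s;
    s.tableSize = bucketSizes.size();
    std::size_t items = 0;
    for (std::size_t n : bucketSizes) {
        if (n == 0) continue; // empty buckets do not count as used
        ++s.used;
        items += n;
        if (n > s.largest) s.largest = n;
        if (s.smallest == 0 || n < s.smallest) s.smallest = n;
    }
    if (s.used > 0) s.average = static_cast<double>(items) / s.used;
    return s;
}
```

The percentage of the table in use follows as `100.0 * used / tableSize`.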

== SPELL CHECKING ==

The easiest way to generate corrections for a spell checker is via a trial and error method. If we assume that the misspelled word contains only a single error, we can try all possible corrections and look each up in the dictionary.

Example:


wird: bird gird ward word wild wind wire wiry

Traditionally, spell checkers look for four possible errors: a wrong letter (“wird”), also known as alteration; an inserted letter (“woprd”); a deleted letter (“wrd”); or a pair of adjacent transposed letters (“wrod”).

The easiest of these to check is a wrong letter. If a word isn't found in the dictionary, every variant of that word that differs by one letter can be looked up. Given the user input “wird,” the one-letter variants are “aird”, “bird”, “cird”, etc. through “zird.” Then “ward”, “wbrd”, “wcrd” through “wzrd” can be checked, and so forth. Whenever a match is found within the dictionary, that spelling correction is displayed on the screen.
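The wrong-letter check described above can be sketched as follows. A `std::unordered_set` stands in for the custom hash table used by the program, the input is assumed to be lowercase, and `alterations` is a hypothetical function name.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns every dictionary word that differs from 'word' in exactly one
// letter position. Assumes lowercase input.
std::vector<std::string> alterations(const std::string& word,
                                     const std::unordered_set<std::string>& dictionary) {
    std::vector<std::string> found;
    for (std::size_t i = 0; i < word.size(); ++i) {
        std::string candidate = word;
        for (char c = 'a'; c <= 'z'; ++c) {
            if (c == word[i]) continue; // skip the unchanged word
            candidate[i] = c;           // "aird", "bird", ... then "ward", ...
            if (dictionary.count(candidate)) found.push_back(candidate);
        }
    }
    return found;
}
```

Each word of length n costs at most 25 × n lookups, which stays cheap because every lookup is a single hash table probe.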

For a detailed analysis of how the other methods can be constructed, click here.
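As a rough idea of what the other three checks could look like, here is one possible sketch. It is not the implementation from the linked analysis; a `std::unordered_set` stands in for the custom hash table, and `dropOneLetter`, `addOneLetter`, and `swapAdjacent` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

using Dictionary = std::unordered_set<std::string>;

// Undoes an inserted letter by deleting each letter in turn ("woprd" -> "word").
std::vector<std::string> dropOneLetter(const std::string& word, const Dictionary& dict) {
    std::vector<std::string> found;
    for (std::size_t i = 0; i < word.size(); ++i) {
        std::string candidate = word;
        candidate.erase(i, 1);
        if (dict.count(candidate)) found.push_back(candidate);
    }
    return found;
}

// Undoes a deleted letter by inserting a-z at every position ("wrd" -> "word").
std::vector<std::string> addOneLetter(const std::string& word, const Dictionary& dict) {
    std::vector<std::string> found;
    for (std::size_t i = 0; i <= word.size(); ++i) {
        for (char c = 'a'; c <= 'z'; ++c) {
            std::string candidate = word;
            candidate.insert(i, 1, c);
            if (dict.count(candidate)) found.push_back(candidate);
        }
    }
    return found;
}

// Undoes a transposition by swapping each adjacent pair ("wrod" -> "word").
std::vector<std::string> swapAdjacent(const std::string& word, const Dictionary& dict) {
    std::vector<std::string> found;
    for (std::size_t i = 0; i + 1 < word.size(); ++i) {
        std::string candidate = word;
        std::swap(candidate[i], candidate[i + 1]);
        if (dict.count(candidate)) found.push_back(candidate);
    }
    return found;
}
```

These helpers can return the same suggestion more than once (for example when a word has doubled letters), so a caller would likely want to remove duplicates before printing.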

===== SIMPLE SPELL CHECKER =====

This program uses a custom template.h class. To obtain the code for that class, click here.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

Remember to include the data file.

Once compiled, you should get this as your output:

Loading dictionary....Complete!

--------------------------------------------------
Total dictionary words = 61286
Hash table size = 19000
Largest bucket size = 13 items at index #1551
Smallest bucket size = 1 items at index #11
Total buckets used = 18217
Total percent of hash table used = 95.8789%
Average bucket size = 3.36422 items

Dictionary loaded in 1.861 secs!
--------------------------------------------------

>> Please enter a sentence: wird

** wird: bird, gird, ward, weird, word, wild, wind, wire, wired, wiry,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: woprd

** woprd: word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: wrd

** wrd: ard, ord, ward, wed, word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: wrod

** wrod: brod, trod, wood, rod, word,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: New! Safe and efective

** efective: defective, effective, elective,

Number of words spelled incorrectly: 1

Do you want to enter another sentence? (y/n): y

--------------------------------------------------

>> Please enter a sentence: This is a sentance with no corections gygyuigigigiug

** sentance: sentence,

** corections: corrections,

** gygyuigigigiug: No spelling suggestion found...

Number of words spelled incorrectly: 3

Do you want to enter another sentence? (y/n): n

BYE!!

C++ || Custom Template Hash Table With Iterator Using Separate Chaining

Looking for sample code for a Hash Map? Click here!

Before we get into the code, what is a Hash Table? Simply put, a Hash Table is a data structure used to implement an associative array, one that can map unique “keys” to specific values. While a standard array requires that index subscripts be integers, a hash table can use a floating point value, a string, another array, or even a structure as the index. That index is called the “key,” and the contents stored at that index location are called the value. A hash table uses a hash function to generate an index into the table, creating buckets or slots, from which the correct value can be found.

To illustrate, consider a standard array full of data (100 elements). If the position of the item that we wanted to access within the array was known, we could access it quickly. For example, if we wanted to access the data located at index #5 in the array, we could access it by doing:


array[5]; // do something with the data

Here, we don't have to search through each element in the array to find what we need; we just access it at index #5. The question is, how do we know that index #5 stores the data that we are looking for? If we have a large set of data, doing a sequential search over each item within the array can be very inefficient. That is where hashing comes in handy. Given a “key,” we can apply a hash function to compute the index, or bucket, that holds the data we wish to access.

So in essence, a hash table is a data structure that stores key/value pairs, and it is typically used because it is ideal for quickly searching for items.

Though hashing is ideal, it isn't perfect. It is possible for multiple items to be hashed into the same location. Hash “collisions” are practically unavoidable when hashing large data sets. The code demonstrated on this page handles collisions via separate chaining, using an array of linked list head nodes so that one bucket can store multiple values should any collisions occur.

An iterator is also implemented, making data access within the hash table class much simpler. Click here for an overview demonstrating how custom iterators can be built.

=== CUSTOM TEMPLATE HASH TABLE WITH ITERATOR ===

QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The iterator class starts on line #368, and is built to support most of the standard relational operators, as well as arithmetic operators such as ‘+,+=,++’ (pre/post increment). The * (star), bracket [] and -> arrow operators are also supported. Click here for an overview demonstrating how custom iterators can be built.

The rest of the code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

===== DEMONSTRATION HOW TO USE =====

Use of the above template class is the same as many of its STL template class counterparts. Here are sample programs demonstrating its use.

SAMPLE OUTPUT:

Bucket #0 has 10 items
The first element in bucket #0 is Homer

Now bucket #0 has 9 items
The first element in bucket #0 is Tamra

The unsorted items in strHash bucket #0:
it[] = Tamra
it[] = Lyndon
it[] = Johanna
it[] = Perkins
it[] = Alva
it[] = Jordon
it[] = Neville
it[] = Lawrence
it[] = Jetta

The sorted items in strHash bucket #0:
it[] = Alva
it[] = Jetta
it[] = Johanna
it[] = Jordon
it[] = Lawrence
it[] = Lyndon
it[] = Neville
it[] = Perkins
it[] = Tamra

SAMPLE OUTPUT:


strHash Bucket #0:
*it = Cinderella
*it = Perkins
*it = Krystal
*it = Roger
*it = Roger

strHash Bucket #1:
*it = Lilliam
*it = Lilliam
*it = Theda

strHash Bucket #2:
*it = Arie

strHash Bucket #3:
*it = Magda

strHash Bucket #6:
*it = Edda
*it = Irvin
*it = Kati
*it = Lyndon

strHash Bucket #7:
*it = Deb
*it = Jaime

strHash Bucket #8:
*it = Neville
*it = Victoria

strHash Bucket #9:
*it = Chery
*it = Evelia

--------------------------------------------

intHash Bucket #0:
it[] = 2449
it[] = 6135

intHash Bucket #1:
it[] = 1120
it[] = 852

intHash Bucket #2:
it[] = 5727

intHash Bucket #3:
it[] = 1174

intHash Bucket #4:
it[] = 2775
it[] = 3525
it[] = 8375

intHash Bucket #5:
it[] = 4322
it[] = 8722
it[] = 5016

intHash Bucket #6:
it[] = 5053
it[] = 7231
it[] = 1571

intHash Bucket #7:
it[] = 1666
it[] = 4510
it[] = 1548
it[] = 3646

intHash Bucket #9:
it[] = 2756

--------------------------------------------

strctHash Bucket #0:
it-> = Cherilyn
it-> = Roger

strctHash Bucket #1:
it-> = Tamra
it-> = Alex
it-> = Theda

strctHash Bucket #2:
it-> = Nigel
it-> = Alva
it-> = Arie

strctHash Bucket #4:
it-> = Basil

strctHash Bucket #5:
it-> = Tod

strctHash Bucket #6:
it-> = Irvin
it-> = Lyndon

strctHash Bucket #7:
it-> = Amina
it-> = Hillary
it-> = Kenneth
it-> = Amina

strctHash Bucket #8:
it-> = Gene
it-> = Lemuel
it-> = Gene

strctHash Bucket #9:
it-> = Albertina