C++ || Simple Tokenizer Lexer Using A Finite State Machine

Print Friendly

The following is sample code which demonstrates the implementation of a simple Lexer using a table driven Finite State Machine.

In its simplest form, a Finite State Machine is a procedure that can: (1) store the status of an event, (2) can operate on new (or existing) input to change the status of an event, and (3) can cause an action to take place (based on the input change) for the particular event being examined.

State machines usually adhere to the following basic principles:

••• Has an initial state or record of something stored someplace
• Has a set of possible input events
• Has a set of new states that may result from the input
• Has a set of possible actions or output events that result from a new state

Finite state machines usually have a limited or finite number of possible states. The Finite State Machine implemented on this page has 6 transition states. Those transition states has the ability to isolate tokens from a given input string, based on characteristics defined in the transition state table.

The machine was implemented with compiler construction in mind, resulting in the grammar for the 6 transition states to resemble that of many popular programming languages.

The transition table defined in this program can be further fine tuned by allowing the states to accept or reject different parameters, and/or by adding more transition states. Tuning the parameters in the state transition table allows you to change the behavior of the state machine without having the need to alter the Finite State Machine Algorithm (the “Lexer” function) much at all.

NOTE: The example on this page uses a sample input file. Click here to download the file.


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

NOTE: The example on this page uses a sample input file. Click here to download the file.

The following is sample output.

One Response to C++ || Simple Tokenizer Lexer Using A Finite State Machine

  1. Ihor says:

    Thanks for article !

Leave a Reply