CHARMING PYTHON #4
State Machines: Algorithms and programming approaches in Python

David Mertz, Ph.D.
President, Gnosis Software, Inc.
May 2000

    State machines, in a theoretical sense, underlay almost
    everything computer- and programming-related.  But a Python
    programmer does not necessarily need to consider highly
    theoretical matters in writing programs.  Nonetheless, there
    are a large class of ordinary programming problems where the
    best and most natural approach is to explicitly code a state
    machine as the solution.  This article discusses some
    practical cases of using state machines, how to recognize
    them, and how to code them in Python.


WHAT IS PYTHON?
------------------------------------------------------------------------

  Python is a freely available, very-high-level, interpreted
  language developed by Guido van Rossum.  It combines a clear
  syntax with powerful (but optional) object-oriented semantics.
  Python is available for almost every computer platform you
  might find yourself working on, and has strong portability
  between platforms.


WHAT IS A STATE MACHINE?
------------------------------------------------------------------------

  A much too accurate description of a state machine is that it
  is a directed graph, consisting of a set of nodes and a set of
  transition functions.  Such a machine "runs" by responding to a
  series of events, each event is in the domain of the transition
  function of the "current" node, where the range is a subset of
  the nodes.  The function return is a "next" (maybe
  self-identical) node. A subset of the nodes are end-states; if
  an end-state is reached, the machine stops.

  An abstract mathematical description--like the above--is of
  little use for most practical programming problems.  Equally
  picayune is the observation that every program in an imperative
  programming language is a state machine whose nodes are its
  source lines (but not really in a declarative--functional or
  constraint-based--language such as Haskell, Scheme, Prolog).
  Furthermore, every regular expression is logically equivalent
  to a state machine; and every parser implements an abstract
  state machine.  Most programmers write lots of state machines
  without really thinking about it, but that fact provides little
  guidance to specific programming techniques.

  An informal, heuristic definition is more useful than the
  abstract ones.  Often we encounter a program requirement that
  includes a handful of distinct ways of treating clusters of
  events.  Furthermore, it is sometimes the case that individual
  events need to be put in a context to determine which type of
  treatment is appropriate (as opposed to each event being
  "self-identifying").  The state machines discussed in this
  article are high-level machines that are intended to express
  clearly the programming requirements of a class of problems.
  If it makes sense to talk about your programming problem in
  terms of categories of behavior in response to events, it is
  likely to be a good idea to program the solution in terms of
  explicit state machines.


A TEXT PROCESSING STATE MACHINE
------------------------------------------------------------------------

  One of the programming problems most likely to call for an
  explicit state machine is processing text files.  Processing a
  text file very often consists of sequential reading of each
  chunk of a text file (typically either a character or a line),
  and doing something in response to each chunk read.  In some
  cases, this processing is "stateless"--that is, each chunk has
  enough information internally to determine exactly what to do
  in response to that chunk of text.  And in other cases, even
  though the text file is not 100% stateless, there is a very
  limited context to each chunk (for example, the line number
  might matter for the action taken, but not much else besides
  the line itself).  But in other common text processing
  problems, the text files we deal with are highly
  "stateful"--the meaning of a chunk all depends on what types of
  chunks preceeded it (and maybe on what chunks come next).
  Files like report files, mainframe datafeeds, human-readable
  texts, programming source files, and other sorts of text files
  are stateful.  A very simple example of a stateful chunk is a
  line that might occur in a Python source file:

      myObject = SomeClass(this, that, other)

  That line means something very different if it happens to be
  surrounded by these lines:

      """How to use SomeClass:
      myObject = SomeClass(this, that, other)
      """

  That is, we needed to know that we were in a "blockquote"
  *state* to determine that the line was a comment rather than an
  action.


WHEN NOT TO USE A STATE MACHINE
------------------------------------------------------------------------

  When we begin the task of writing a processor for any stateful
  text file, the first question we should ask ourselves is "what
  types of things do we expect to find in the file?" Each type of
  thing is a candidate for a state.  These types should be
  several in number, but if the number is huge or indefinite, a
  state machine is probably not the right approach--maybe some
  sort of database solution is appropriate (or maybe the problem
  has not been formulated right if there appear to by that many
  types of things)

  Moreover, we are not quite ready for a state machine yet; there
  may yet be a simpler approach. It might turn out that even
  though our text file is stateful there is an easy way to read
  in chunks where each chunk is a single type of thing.  A state
  machine is really only worth implementing if the transitions
  between types of text require some calculation based on the
  content within a single state-block.

  An example of a somewhat stateful text file that is nonetheless
  probably not best handled with a state machine is Windows-style
  '.ini' files.  Those files consist of some section headers,
  some comments, and a number of value assignments.  For example:

      #-------------- File: hypothetical.ini ------------------#
      ; set the colorscheme and userlevel
      [colorscheme]
      background=red
      foreground=blue
      title=green

      [userlevel]
      login=2
      title=1

  This example has no real-life meaning, but it was constructed
  to indicate some features of the '.ini' format.  (1) In one
  sense, the type of each line is determined by its first
  character (either semi-colon, left-brace, or alphabetic); (2)
  In another sense, the format is "stateful" insofar as the
  keyword "title" presumably means something independent when it
  occurs in each section.  One could program a text processor
  that had a COLORSCHEME state and a USERLEVEL state, and
  processed the values assignments of each state.  But that does
  not seem like the *right* way to handle this problem.

  On the one hand, we could simply create the natural chunks in
  this text file with some Python code like:

      #---- Chunking Python code to process .INI file --------#
      import string
      txt = open('hypothetical.ini').read()
      sects = string.split(txt, '[')
      for sect in sects:
        # do something with sect, like get its name
        # (the stuff up to ']') and read its assignments

  Or, if we wished, we could use a single 'current_section'
  variable to keep place:

      #---- Counting Python code to process .INI file --------#
      for line in open('hypothetical.ini').readlines():
          if line[0] == '[':
              current_section = line(1:-2)
          elif line[0] == ';':
              pass    # ignore comments
          else:
              apply_value(current_section, line)


WHEN TO USE A STATE MACHINE
------------------------------------------------------------------------

  Now that we have established not to use a state machine if the
  text file is "too simple" we should look at a case where a
  state machine is worthwhile.  _Charming Python #3_ discussed
  the utility 'Txt2Html' that converts "smart ASCII" files to
  HTML (including this article itself).  In very brief recap,
  "smart ASCII" format is a text format that uses a few spacing
  conventions to distinguish different types of text blocks, such
  as headers, regular text, quotations, code samples.  While it
  is easy for a human reader or writer to visually parse the
  transitions between these text block types, there is no simple
  way to chunk a whole text file into its text blocks.  Unlike in
  the '.ini' file example, text block types can occur in any
  pattern of alternation.  There is no single delimiter that
  seperates blocks in all cases (a blank line *usually* seperates
  blocks, but a blank line within a code sample does not end the
  code sample necessarily; and blocks need not be seperated by
  blank lines).  But we do need to perform somewhat different
  formatting behavior on each text block type for the correct
  final HTML output.  A state machine suggests itself as a
  natural solution here.

  The general behavior of the 'Txt2Html' reader is as follows:
  (1) Start in a particular state; (2) Read a line of the text
  file and go to current state context; (3) Decide if conditions
  have been met to leave the current state and enter another; (4)
  Failing (3), process the line in a manner appropriate for the
  current state.  This example is about the simplest case one
  would encounter, but it expresses the pattern described:

      #----- A simle state machine input loop in Python ------#
      global state, blocks, bl_num, newblock
      for line in fhin.readlines():
          if state == "HEADER":         # blank line means new block of ??
              if blankln.match(line):   newblock = 1
              elif textln.match(line):  startText(line)
              elif codeln.match(line):  startCode(line)
              else:
                  if newblock: startHead(line)
                  else: blocks[bl_num] = blocks[bl_num] + line
          elif state == "TEXT":         # blank line means new block of ??
              if blankln.match(line):   newblock = 1
              elif headln.match(line):  startHead(line)
              elif codeln.match(line):  startCode(line)
              else:
                  if newblock: startText(line)
                  else: blocks[bl_num] = blocks[bl_num] + line
          elif state == "CODE":         # blank line does not change state
              if blankln.match(line):   blocks[bl_num] = blocks[bl_num] + line
              elif headln.match(line):  startHead(line)
              elif textln.match(line):  startText(line)
              else: blocks[bl_num] = blocks[bl_num] + line
          else:
              raise ValueError, "unexpected input block state: "+state

  The full source file this code is taken from can be
  downloaded with 'Txt2Html' (see Resources).  The only real
  thing to notice is that the variable 'state' is declared
  'global', and its value is changed in functions like
  'startText()'.  The transition conditions--such as
  'textln.match()' are regular expression patterns, but they
  could just as well be custom functions.  The formatting itself
  is actually done later in the program, the state machine just
  parses the text file into labelled blocks in the 'blocks' list.


AN ABSTRACT STATE MACHINE CLASS
------------------------------------------------------------------------

  It is easy in Python to abstract the form of a state machine.
  Coding in this manner makes the state machine model of the
  program stand out more clearly than does the simple conditional
  block in the previous example (which doesn't right-away look
  all that much different from any other conditional).
  Futhermore, the class presented--and the associated
  handlers--do a very good job of isolating in-state behavior.
  This improves both encapsulation and readability in many cases.

      #-------------- File: statemachine.py ------------------#
      from string import upper
      class StateMachine:
          def __init__(self):
              self.handlers = {}
              self.startState = None
              self.endStates = []

          def add_state(self, name, handler, end_state=0):
              name = upper(name)
              self.handlers[name] = handler
              if end_state:
                  self.endStates.append(name)

          def set_start(self, name):
              self.startState = upper(name)

          def run(self, cargo):
              try:
                  handler = self.handlers[self.startState]
              except:
                  raise "InitializationError", "must call .set_start() before .run()"
              if not self.endStates:
                  raise "InitializationError", "at least one state must be an end_state"

              while 1:
                  (newState, cargo) = handler(cargo)
                  if upper(newState) in self.endStates:
                      break
                  else:
                      handler = self.handlers[upper(newState)]

  The 'StateMachine' class is really all you need for the form of
  a state machine.  It is a whole lot fewer lines than something
  similar would require in most languages--mostly because of the
  ease of passing function objects in Python.

  To actually *use* the 'StateMachine' class, you need to create
  some handlers for each state you want to use.  A handler must
  follow a particular pattern.  Generally, it should loop
  indefinitely; but in any case it must have some breakout
  condition(s).  Each pass through the state handler's loop
  should process another event of the state's type. But probably
  even before handling events, the handler should check for
  breakout conditions, and determine what state is appropriate to
  transition to.  At the end, a handler should pass back a tuple
  consisting of the target state's name, and any cargo the new
  state-handler will need.

  An encapsulation device is the use of 'cargo' as a variable in
  the 'StateMachine' class (not necessarily called 'cargo' by the
  handlers). This is used to pass around "whatever is needed" by
  one state-handler to take over where the last state-handler
  left off.  Most typically, 'cargo' will consist of a
  filehandle, which would allow the next handler to read some
  more data after the point where the last state-handler stopped.
  But a database connection might get passed, or a complex
  class instance, or a list with several things in it.  In the
  case of the test below, the cargo consists simply of a number
  that keeps getting fed back into an iterative function.  That
  is the next value of 'val' is always simply 'math_func(val)'.
  But depending on what the function does, the value may be in a
  range so as to either push it to a different handler, or reach
  an exit condition (which is really just a do-nothing end-state
  handler).  One thing the example illustrates is that an *event*
  is not necessarily an input event, it can sometimes be a
  computational event also (but atypically so). The
  state-handlers differ from one another only in using a
  different marker when outputing the events they handle; this is
  trivial, and does not require a state machine, but it
  illustrates the concept. The code is probably easier to
  understand than its explanation:

      #------------ File: statemachine_test.py ---------------#
      from statemachine import StateMachine
      def ones_counter(val):
          print "ONES State:    ",
          while 1:
              if val <= 0 or val >= 30:
                 newState = "Out_of_Range" ; break
              elif 20 <= val < 30:
                  newState = "TWENTIES";     break
              elif 10 <= val < 20:
                  newState = "TENS";         break
              else:
                  print "  @ %2.1f+" % val,
              val = math_func(val)
          print "  >>"
          return (newState, val)

      def tens_counter(val):
          print "TENS State:    ",
          while 1:
              if val <= 0 or val >= 30:
                 newState = "Out_of_Range";  break
              elif 1 <= val < 10:
                  newState = "ONES";         break
              elif 20 <= val < 30:
                  newState = "TWENTIES";     break
              else:
                  print "  #%2.1f+" % val,
              val = math_func(val)
          print "  >>"
          return (newState, val)

      def twenties_counter(val):
          print "TWENTIES State:",
          while 1:
              if val <= 0 or val >= 30:
                 newState = "Out_of_Range";  break
              elif 1 <= val < 10:
                  newState = "ONES";         break
              elif 10 <= val < 20:
                  newState = "TENS";         break
              else:
                  print "  *%2.1f+" % val,
              val = math_func(val)
          print "  >>"
          return (newState, val)

      def math_func(n):
          from math import sin
          return abs(sin(n))*31

      if __name__== "__main__":
          m = StateMachine()
          m.add_state("ONES", ones_counter)
          m.add_state("TENS", tens_counter)
          m.add_state("TWENTIES", twenties_counter)
          m.add_state("OUT_OF_RANGE", None, end_state=1)
          m.set_start("ONES")
          m.run(1)


RESOURCES
------------------------------------------------------------------------

  Charming Python #3 (a discussion of the 'Txt2Html' tool):

    http://gnosis.cx/cgi-bin/txt2html.cgi?source=../publish/programming/charming_python_3.txt

  This article as "smart ASCII" text:

    http://gnosis.cx/publish/programming/charming_python_4.txt

  To obtain or use Txt2Html, just point to:

    http://gnosis.cx/cgi-bin/txt2html.cgi

  Files used and mentioned in this article:

    http://gnosis.cx/download/charming_python_4.zip

  The concept of a state machine is, at a deeper level, closely
  related to the concepts of coroutines.  A reader who really
  wants to make her brain hurt can read about Christian Tismer's
  Stackless Python, which efficiently implements coroutines,
  generators, continuations, and micro-threads.  This is not for
  the faint of heart:

    http://www.stackless.com/


ABOUT THE AUTHOR
------------------------------------------------------------------------

  {Picture of Author: http://gnosis.cx/cgi-bin/img_dqm.cgi}
  In a ramiculated career, David Mertz has produced his share of
  synecdoches. Most of them have been in areas of academic
  "postmodern" philosophy, but this article also occupies several
  levels of descriptive "states." David may be reached at
  mertz@gnosis.cx; his life pored over at http://gnosis.cx/publish/.
  Suggestions and recommendations on this, past, or future, columns
  are welcomed.

