|  |  | @@ -0,0 +1,241 @@ | 
    
    |  |  | #! /usr/bin/env lua | 
    
    |  |  | 
 | 
    
    |  |  | -- | 
    
    |  |  | -- A simple Lua app to demonstrate the principles of lexical closures. | 
    
    |  |  | -- | 
    
    |  |  | 
 | 
    
    |  |  | -- | 
    
    |  |  | -- An iterator that sorts a table by its keys, in alphabetical order. | 
    
    |  |  | -- | 
    
    |  |  | function pairsByKeys ( alphaTable, sortFunction ) | 
    
    |  |  | local alphaArray = { } | 
    
    |  |  | 
 | 
    
    |  |  | --[[ | 
    
    |  |  | The provided table, 'alphaTable', is a Lua table, which means it's an | 
    
    |  |  | associative array.  AA's store their key-value pairs using hash codes | 
    
    |  |  | to improve look-up efficiency.  One trade-off in using this method is | 
    
    |  |  | that key-value retrieval (using the 'pairs' iterator) will appear in | 
    
    |  |  | more or less a "random" order, not alphanumeric order. | 
    
    |  |  |  | 
    
    |  |  | For this application, we want a loop iterator that *will* successively | 
    
    |  |  | return the key-value pairs in alphabetic order by key.  To do this, we | 
    
    |  |  | will recast 'alphaTable' to make a new Lua 'sequence' (i.e., an ordered | 
    
    |  |  | array), where the values of the array, 'alphaArray', are the keys of | 
    
    |  |  | the 'alphaTable' table we were given. | 
    
    |  |  |  | 
    
    |  |  | Why do this?  Because the lua function 'table.sort' sorts arrays, but | 
    
    |  |  | only by their values; it can't sort keys.  Ergo, we need an array whose | 
    
    |  |  | values are the (unsorted) keys of 'alphaTable': | 
    
    |  |  | ]]-- | 
    
    |  |  | for key, _ in pairs( alphaTable ) do | 
    
    |  |  | 
 | 
    
    |  |  | alphaArray[ #alphaArray+1 ] = key | 
    
    |  |  | end | 
    
    |  |  | 
 | 
    
    |  |  | --[[ | 
    
    |  |  | Now we can sort the table by its keys, using this auxiliary array. | 
    
    |  |  | lua's 'table.sort' function sorts arrays using a provided comparison | 
    
    |  |  | function.  But note that the default sorting function is "<", which | 
    
    |  |  | works for alphabetical sorting.  So if our interator was called with | 
    
    |  |  | no second argument, we'll get an alpha sort, as expected.  And if | 
    
    |  |  | it's called with a function as a second argument, the iterator will | 
    
    |  |  | return key-value pairs using that alternative sorting scheme instead. | 
    
    |  |  | ]]-- | 
    
    |  |  | table.sort( alphaArray, sortFunction ) | 
    
    |  |  | 
 | 
    
    |  |  | --[[ | 
    
    |  |  | Now that the keys are sorted, we can provide a function that will | 
    
    |  |  | iterate over the *sorted* order of the keys, returning them & their | 
    
    |  |  | associated values.  The iterator is a closure, a function that | 
    
    |  |  | "closes over" three variables that are in the lexical scope of the | 
    
    |  |  | returned function defined here (but are not actually defined in the | 
    
    |  |  | function itself): | 
    
    |  |  |  | 
    
    |  |  | 1) The table argument that this function was called with, | 
    
    |  |  | 2) The sorted array used to control the order of retrieval, | 
    
    |  |  | 3) An index to keep track of what to return on successive calls. | 
    
    |  |  |  | 
    
    |  |  | On each call of the iterator we will pre-increment the index to | 
    
    |  |  | make the end compare easy; therefore we must start it at zero. | 
    
    |  |  | (Recall that arrays in Lua are 1-based, not 0-based.) | 
    
    |  |  | ]]-- | 
    
    |  |  | local index = 0 | 
    
    |  |  | 
 | 
    
    |  |  | return function ( ) | 
    
    |  |  | -- When we've exhausted the table/array, a pre-incremented index | 
    
    |  |  | -- will cause the table/array access to return 'nil', which also | 
    
    |  |  | -- signals the client using this iterator that its loop is complete. | 
    
    |  |  | index = index + 1 | 
    
    |  |  | 
 | 
    
    |  |  | -- The keys are in alphabetic order in 'alphaArray', while the | 
    
    |  |  | -- values are in a hashed order in 'alphaTable'.  Note that Lua | 
    
    |  |  | -- is register-based, so the following construct is byte-compiled | 
    
    |  |  | -- into an efficient executable. | 
    
    |  |  | return alphaArray[index], alphaTable[ alphaArray[index] ] | 
    
    |  |  | end | 
    
    |  |  | end | 
    
    |  |  | 
 | 
    
    |  |  | 
 | 
    
    |  |  | -- | 
    
    |  |  | -- An iterator that traverses an input file, returning its words, one by one. | 
    
    |  |  | -- | 
    
    |  |  | function allWords ( filename ) | 
    
    |  |  | -- Open a file of the given name for reading, then count its words. | 
    
    |  |  | -- Note that 'io.input' will raise an error for us, should one occur. | 
    
    |  |  | io.input( filename ) | 
    
    |  |  | 
 | 
    
    |  |  | -- Prime the while loop by reading the first line from the file, and | 
    
    |  |  | -- start searching for words at position 1 (strings/arrays are 1-based). | 
    
    |  |  | -- Note that the default argument for 'io.read' is "*l", or "read line". | 
    
    |  |  | -- The default file read by 'io.read' is set with 'io.input'. | 
    
    |  |  | local line = io.read() | 
    
    |  |  | local pos  = 1 | 
    
    |  |  | 
 | 
    
    |  |  | --[[ | 
    
    |  |  | What Lua (and other languages, such as Java) call an "interator" are | 
    
    |  |  | really "generators".  That is, a "generator" function, such as this, | 
    
    |  |  | creates and returns an "interator" function.  Clients will call the | 
    
    |  |  | iterator function, typically in a for loop. | 
    
    |  |  |  | 
    
    |  |  | Iterators are typically closures; in this case, our iterator closes | 
    
    |  |  | over the variables 'line' & 'pos', defined above in *this* function | 
    
    |  |  | (which is called only once, to obtain the iterator); these variables | 
    
    |  |  | are not defined in the closure being returned, although they can be, | 
    
    |  |  | are are, used in the iterator function.  This is possible because | 
    
    |  |  | these variables exist in the closure's lexical scope at define-time | 
    
    |  |  | (even though they will have already gone out of scope at run-time). | 
    
    |  |  |  | 
    
    |  |  | These two variables are not global, but are not local to the closure; | 
    
    |  |  | they are "non-local variables".  They are captured by (and hidden | 
    
    |  |  | within) the closure returned.  (All functions in Lua are closures.) | 
    
    |  |  | ]]-- | 
    
    |  |  | return function ( ) | 
    
    |  |  | -- Keep looping as long as a line is read from the file.  Once | 
    
    |  |  | -- the end of file is reached, 'io.read' will return 'nil'. | 
    
    |  |  | while line do | 
    
    |  |  | -- Locals 'first' & 'last' delimit the position of each word | 
    
    |  |  | -- found in the current line of the file.  By 'word', we mean | 
    
    |  |  | -- a contiguous string of alphanumeric characters,  a pattern | 
    
    |  |  | -- matched by "%w+".  ("%W+" would match strings of non-word | 
    
    |  |  | -- characters.)  The closure variable 'pos' tells Lua where | 
    
    |  |  | -- in the string to start matching word characters. | 
    
    |  |  | -- Note that Lua supports multiple return values (very nice)! | 
    
    |  |  | -- Also note that we treat 'line' as an object, applying the | 
    
    |  |  | -- 'find' method to it by using the ':' delimiter. | 
    
    |  |  | local first, last = line:find( "%w+", pos ) | 
    
    |  |  | 
 | 
    
    |  |  | -- Was a word found in this 'line' at or after position 'pos'? | 
    
    |  |  | -- If so, 'first' and 'last' will be returned as integers. | 
    
    |  |  | -- If not, they will be 'nil' (which equates to boolean false). | 
    
    |  |  | if first then | 
    
    |  |  | -- Advance the position pointer to the next character after | 
    
    |  |  | -- the word, then extract the word string and return it. | 
    
    |  |  | pos = last + 1 | 
    
    |  |  | 
 | 
    
    |  |  | -- The string function 'sub' returns a sub-string. | 
    
    |  |  | return line:sub( first, last ) | 
    
    |  |  | else | 
    
    |  |  | -- A word was not found in (the remainder of) the line; so | 
    
    |  |  | -- read another line & reset the position to the beginning | 
    
    |  |  | -- of this new line.  Then loop to try again.  If there are | 
    
    |  |  | -- no more lines in the file, 'line' will be 'nil', and | 
    
    |  |  | -- this 'while' loop will terminate.  By using a 'while' | 
    
    |  |  | -- loop, we will skip over multiple blank lines correctly. | 
    
    |  |  | line = io.read() | 
    
    |  |  | pos  = 1 | 
    
    |  |  | end | 
    
    |  |  | end | 
    
    |  |  | 
 | 
    
    |  |  | -- On reaching this point, there is nothing left in the file; so | 
    
    |  |  | -- return 'nil' to signal that the loop has completed.  Note that | 
    
    |  |  | -- this line isn't strictly needed, as a 'return' in Lua without | 
    
    |  |  | -- an argument will return 'nil' anyway. | 
    
    |  |  | return nil | 
    
    |  |  | end | 
    
    |  |  | end | 
    
    |  |  | 
 | 
    
    |  |  | 
 | 
    
    |  |  | ------------------------------------------------------------------------------- | 
    
    |  |  | -- | 
    
    |  |  | -- Test the 'allWords' iterator (generator) with a simple routine to | 
    
    |  |  | -- count all occurrences of each word in a text file. | 
    
    |  |  | -- | 
    
    |  |  | 
 | 
    
    |  |  | --[[ | 
    
    |  |  | We need to start by getting a file to scan -- let's use ourselves!  To do | 
    
    |  |  | that, we need the Lua equivalent of '${0}' in bash or '__file__' in Python. | 
    
    |  |  | Lua doesn't have a global variable for this, however, so we need to call a | 
    
    |  |  | function, 'debug.getinfo' and extract a field that holds the filename. | 
    
    |  |  |  | 
    
    |  |  | Our call to 'debug.getinfo' below returns the name of the source file ('S') | 
    
    |  |  | of the function at the first level ('1') of the call stack.  Note that the | 
    
    |  |  | filename will begin with an '@' symbol, indicating it is a filename.  By | 
    
    |  |  | passing '1', we are asking for information about the current function.  If | 
    
    |  |  | we were to pass in '0', it would return "=[C]", since it would be returning | 
    
    |  |  | information about the 'getinfo' function itself (which is defined in an | 
    
    |  |  | external C library).  More information on this function can be found in the | 
    
    |  |  | Debug Library chapter in the "Programming in Lua" book... | 
    
    |  |  | ]]-- | 
    
    |  |  | local thisModule = debug.getinfo( 1, 'S' ) | 
    
    |  |  | 
 | 
    
    |  |  | -- Don't forget to snip off the '@' sigil from the obtained filename! | 
    
    |  |  | local targetFile = thisModule.source:sub( 2 ) | 
    
    |  |  | 
 | 
    
    |  |  | -- Collect all the words in the target file as a 'bag of words'. | 
    
    |  |  | -- Creating a 'bag' requires us to count the number of occurrences of each | 
    
    |  |  | -- key, where the keys are simply the words we find in the file. | 
    
    |  |  | WordBag = { } | 
    
    |  |  | 
 | 
    
    |  |  | -- Use the Lua iterator we defined above to retrieve the words from the | 
    
    |  |  | -- target file, one by one, dropping each into our bag of words as we go. | 
    
    |  |  | for word in allWords( targetFile ) do | 
    
    |  |  | -- We don't care about capitalization, so make the word all lowercase. | 
    
    |  |  | word = word:lower() | 
    
    |  |  | 
 | 
    
    |  |  | -- If the word hasn't been seen before, create a key with a value of 1; | 
    
    |  |  | -- otherwise, increment the value (count) for keys we've seen before: | 
    
    |  |  | if not WordBag[ word ] then | 
    
    |  |  | WordBag[ word ] = 1 | 
    
    |  |  | else | 
    
    |  |  | WordBag[ word ] = WordBag[ word ] + 1 | 
    
    |  |  | end | 
    
    |  |  | end | 
    
    |  |  | 
 | 
    
    |  |  | -- Now that we've read through the entire file, our bag of words is full. | 
    
    |  |  | -- Create a report, sent to 'stdout', formatted as "<word>  <count>".  The | 
    
    |  |  | -- report format is controlled by 'string.format', where the format codes | 
    
    |  |  | -- are exactly the same as the 'printf' format codes in the C language. | 
    
    |  |  | outputFormat = "%20s   %s" | 
    
    |  |  | 
 | 
    
    |  |  | -- And here is the report generation code: | 
    
    |  |  | print() | 
    
    |  |  | print( string.format( outputFormat, "Word", "Frequency" ) ) | 
    
    |  |  | 
 | 
    
    |  |  | print( string.format( outputFormat, | 
    
    |  |  | "    ----------------", "----------------" ) ) | 
    
    |  |  | 
 | 
    
    |  |  | --[[ | 
    
    |  |  | Now we want to extract each word (key) and its count (value) from our bag of | 
    
    |  |  | words.  However, the order of the words in the bag (i.e., the order we would | 
    
    |  |  | pull them back out in, if we use the 'pairs' iterator) is essentially random. | 
    
    |  |  | We need to sort the bag by its keys, in alphabetical order, and print the | 
    
    |  |  | results.  To do this, we use the custom iterator we defined at the top of this | 
    
    |  |  | module to return the bag contents in the desired order. | 
    
    |  |  | ]]-- | 
    
    |  |  | for word, count in pairsByKeys( WordBag ) do | 
    
    |  |  | 
 | 
    
    |  |  | print( string.format( outputFormat, word, count ) ) | 
    
    |  |  | end | 
    
    |  |  | 
 | 
    
    |  |  | print() | 
    
    |  |  | 
 | 
    
    |  |  | -- Note that this report can easily be modified to display the words in the | 
    
    |  |  | -- order of the frequency with which they appear in the target file. | 
    
    |  |  | -- (This is left as an exercise for the student...  ;^) | 
    
    |  |  | 
 | 
    
    |  |  | -- Finally, cLose the file.  (Not strictly needed, as the OS will do this.) | 
    
    |  |  | -- Note that 'io.input' when called with no arguments, will return the handle | 
    
    |  |  | -- of the current input file. | 
    
    |  |  | io.input():close() | 
    
    |  |  | 
 | 
    
    |  |  | ------------------------------------------------------------------------------- |