Monday, May 2, 2016

JavaScript programming with ES6 in Node.JS: Sets and Generators


I enjoy working on problems involving words. For example, I like the Jumble puzzle in the newspaper. To solve these, one can use a word list file such as the ones provided by Peter Norvig that go with his chapter Natural Language Corpus Data in the Book Beautiful Data that simply contain one word per line.

Sets of Strings

In ES6 JavaScript, I read this file with the following code:



The file is read in and split into an array of lines. The object returned from this function has a property with the value true for each word in the dictionary. For every other string, querying corresponding property returns undefined, which is treated just like false when used as the condition in of an if statement. This gives us a simple way to check for membership in a set of strings.

Iterators

Iterators over a set of values in JavaScript are objects with a method next(), that returns an object with two properties: value and done. As long as done is false, value has a value. When the iteration is complete, done becomes true and the object returned by next does not have a value property. Normally, iterators are used in the JavaScript for ( ... of ...) loop construct. So for example, for (x of ['a', 'b', 'c']) will create an iterator that iterates over the array setting x to next().value for successive values in the array.

To convert an iterable object into an iterator, one can use the function* and yeild* keywords as seen in the following Node.js interactive session (using node v5.4.1):



Generators

So, what kind of a strange function is that with an asterisk in the previous code snippet? That is what is called a "generator." Here is a slightly more enlightening example using a plain yeild rather than a *yield:



The mental model is that f() is invoked over and over. When a yield statement is encountered it returns a value and the function is suspended. The next time the function is invoked, it resumes where it had previously been suspended. This model is not what really happens under the hood. A more accurate model might be that the yielded values are placed in a concurrent queue where they wait until next(), dequeues them (or next() waits for them to be produced). Again, since JavaScript doesn't have concurrency, this model isn't quite right either. The exact mechanism used under the hood is unimportant to us. The difference between yield and yeild* is that yeild generates a single value, and yeild* generates a sequence. However, yeild* is only usable inside a function*.

Putting this all together to solve the Jumble is the following program (to be placed in a file along with the readLexicon() function:



The program should print out all permutations of "ourf" that are in the word list file. This should include the common word "four," but may also include the uncommon word "furo" depending on how complete the word list file is. The problem with large lexicons is that they tend to include more uncommon words along with uncommon variations on common words.

No comments:

Post a Comment