Archive

Archive for July, 2010

The Interpreter, Chapter 7

July 20, 2010 1 comment

The continuing story of one man’s quest to write an interpreter.

Jackson’s interpreter was coming along smoothly. But for it to have a complete foundation, he still needed to add one final feature: user-defined functions.

Jackson wanted to keep the language definition simple. This would avoid special syntax, making the interpreter easier to write, and also would make the language easier to understand and even more extensible. If statements and while loops were defined not with special syntax, but as functions. Similarly, Jackson wanted function definitions to themselves be functions.

So he set up a “def” function, which would be a function to define other functions. It would take three arguments: the function name, the variable list, and the function definition. So far, so good. He implemented this pretty quickly.

But there was a problem: it was impossible to create a recursive function. If the programmer created a function (f) and tried to call it within its own body, the expression would not compile correctly. For (def) to act like a function it had to work at run-time, but the compiler needed to know that a reference to (f) was a function.

After brooding over this problem for some time, Jackson came up with a solution. He created a new data type called a function shell, designed to hold just enough information for the compiler to know what to do. Whenever the compiler saw “def”, it would look for the function name. Then it would find all other references to the function name and convert them into function shells. The program would compile knowing that (f) was a function, and would actually define the function after the (def) function was called at runtime.

All of the foundational features had been implemented. But there was still work to do. Jackson had to implement a few smaller features, fix the myriads of bugs that had sprung up, and optimize the code.

Stay tuned for the exciting continuation of The Interpreter!

Categories: The Interpreter

Improved Keyboard Layout Program

I made some modifications to my keyboard layout program, and it now runs about twice as fast. On my laptop, it can score 12,000 layouts per second — this is six times as fast as Michael Capewell’s program.

You can download it here.

The Interpreter, Chapter 6

The continuing story of one man’s quest to write an interpreter.

Last time, our hero greatly improved the concept and implementation of the interpreter. This time, he took that even further. Soon, if statements and while loops were working. Previously, control structures had been managed by a single 500-line if statement. Not a pretty sight. The new and improved interpreter treated control structures as functions — even the statement separator (which in this case was a semicolon).

What about the interpreter’s standard output? The interpreter was printing a lot of stuff on the screen, so this was important. But the standard output was rather lacking in quality. If you tried to print out more than 1 MB of text, the program would crash.

Jackson spent over eight hours revising the output to be cleaner and catch more errors. The most useful development was a formatted output function, which caught virtually every error possible. All the other output functions could lean on this one core function. The interpreter expanded from having about three output functions to about ten — but each function was cleaner, simpler, and more versatile.

Almost everything from the old interpreter had been re-implemented, and was working better than ever. It was time to move forward. Jackson now had to implement a critical piece, something no language can do without: functions.

To be continued. . . .

Categories: The Interpreter

Human Data Structures

July 1, 2010 1 comment

In computer science, we frequently have to deal with how to represent our data. Different data structures provide different benefits. It is interesting to consider what data structures we humans are good at.

Probably the most common data structure is an array. It provides random access to any element, but inserting a new element is slow.

Humans are very bad at arrays. Unless the array is very short, we cannot achieve constant-time random access the way a computer can. To access the Nth element, if N is much greater than four or five, we have to simply count up to it.

Humans are much better at linked lists. Like a linked list, elements further down the list are harder to find. We also have a good intuitive understanding of linked lists. While an array is like a grocery list, a linked list is like a deck of cards.

What about associative arrays? The fastest computer implementation of an associative array is a hash table. But, because hash tables are usually implemented using arrays, humans are of course not very good at them. Also, being much slower at math than computers, we would have trouble computing hash functions.

If we want to use an associative array, the best bet for us humans is to implement it with a binary tree. This is how we use dictionaries. There is an alphabetized list of keys (words), each with a corresponding value (definition). We don’t immediately know the location of each word. Instead, we have to do something like a binary search. Let’s say you’re looking up the word “pear”. First you turn to about two-thirds of the way through the dictionary, where you think the P’s will be. Then you turn a few dozen pages forward or backward until you get to the P’s. Then look for ‘Pe’, then ‘Pea’, then ‘Pear’. This is something like a binary search algorithm.

The main advantage that computers have over humans is their ability to quickly access any point in memory. We as humans have to search through the whole list. Perhaps this is why it’s a good idea to use computers to hold our data.

Categories: Computer Science