Archive

Archive for the ‘Programming’ Category

Stanford Free Classes

January 10, 2012 Leave a comment

I recently read an article by a Stanford student, Ben Rudolph, in a Machine Learning class that uses a “flipped classroom” model: the students watch lectures at home and then go into class to talk to the professor about the homework. This is an interesting model that’s been gaining some attention lately, and I have some experience with it. I mostly agree with what the article says, and I have a few additional points.

Rudolph complains that the online questions are too easy. As reported in another article,

Mr. Rudolph took particular exception to the programming exercises, in which the computer automatically informed students whether or not they got 100 percent on the task. “It’s so black and white,” he tells Wired Campus. “They have to make it easy enough so everyone can get 100 percent, basically. In the past I’ve turned in programming assignments, and only the really smart kids got stellar scores, because they went above and beyond. This model kind of discourages that.”

Those are some poorly constructed programming assignments. A computer can still grade a difficult programming assignment, because a computer program can—by definition—run on a computer, and the computer can check if it gives the correct output. For example, TopCoder offers a series of programming challenges that could take anywhere from five minutes to a few hours, and all of them are graded by computer. The computer grades not only on accuracy, but also on speed, memory efficiency, and code concision.

The main problem with this is that if you’re stuck, there’s not much you can do by yourself to figure out the next step. I suggest that students work on sophisticated problems like those at TopCoder, and the students who are struggling can talk to a professor about how to get the program a little closer to where it needs to be.

Advertisements

The Interpreter, Part 1 Conclusion

August 21, 2010 4 comments

The final chapter in one man’s first journey to write an interpreter.

Jackson spent the next month or so adding smaller features and fixing bugs. At last, The Interpreter Version 1.0 was ready for release. He named his language Simfpl: Simple Interpreted Mathematically-Oriented Functional Programming Language. He put the source code on the internet for everyone to see.

To be able to use it, you first need to install GMP and MPFR.

EDIT 7/1/13: I’ve moved the code to GitHub.

Categories: The Interpreter

The Interpreter, Chapter 7

July 20, 2010 1 comment

The continuing story of one man’s quest to write an interpreter.

Jackson’s interpreter was coming along smoothly. But for it to have a complete foundation, he still needed to add one final feature: user-defined functions.

Jackson wanted to keep the language definition simple. This would avoid special syntax, making the interpreter easier to write, and also would make the language easier to understand and even more extensible. If statements and while loops were defined not with special syntax, but as functions. Similarly, Jackson wanted function definitions to themselves be functions.

So he set up a “def” function, which would be a function to define other functions. It would take three arguments: the function name, the variable list, and the function definition. So far, so good. He implemented this pretty quickly.

But there was a problem: it was impossible to create a recursive function. If the programmer created a function (f) and tried to call it within its own body, the expression would not compile correctly. For (def) to act like a function it had to work at run-time, but the compiler needed to know that a reference to (f) was a function.

After brooding over this problem for some time, Jackson came up with a solution. He created a new data type called a function shell, designed to hold just enough information for the compiler to know what to do. Whenever the compiler saw “def”, it would look for the function name. Then it would find all other references to the function name and convert them into function shells. The program would compile knowing that (f) was a function, and would actually define the function after the (def) function was called at runtime.

All of the foundational features had been implemented. But there was still work to do. Jackson had to implement a few smaller features, fix the myriads of bugs that had sprung up, and optimize the code.

Stay tuned for the exciting continuation of The Interpreter!

Categories: The Interpreter

The Interpreter, Chapter 6

The continuing story of one man’s quest to write an interpreter.

Last time, our hero greatly improved the concept and implementation of the interpreter. This time, he took that even further. Soon, if statements and while loops were working. Previously, control structures had been managed by a single 500-line if statement. Not a pretty sight. The new and improved interpreter treated control structures as functions — even the statement separator (which in this case was a semicolon).

What about the interpreter’s standard output? The interpreter was printing a lot of stuff on the screen, so this was important. But the standard output was rather lacking in quality. If you tried to print out more than 1 MB of text, the program would crash.

Jackson spent over eight hours revising the output to be cleaner and catch more errors. The most useful development was a formatted output function, which caught virtually every error possible. All the other output functions could lean on this one core function. The interpreter expanded from having about three output functions to about ten — but each function was cleaner, simpler, and more versatile.

Almost everything from the old interpreter had been re-implemented, and was working better than ever. It was time to move forward. Jackson now had to implement a critical piece, something no language can do without: functions.

To be continued. . . .

Categories: The Interpreter

The Interpreter, Chapter 5

June 17, 2010 2 comments

The continuing story of one man’s quest to write an interpreter.

Jackson’s interpreter was working great. Most of the features he wanted were working. There were still a few annoying bugs that needed to be worked out, but all in all, things were going very well.

The way that the interpreter was changing scope was painfully slow. Jackson did some research to try to find a solution, but decided that the only way to speed up the program would be to make some fundamental changes. One thing led to another, and before he knew it Jackson was up until midnight creating plans for a whole new evaluation system.

Up to this point, the interpreter’s interpretation system was pretty jumbled. Compiler and interpreter were hopelessly mixed, with no possibility of separation. Some parts were being evaluated before optimization, and some parts after. It just didn’t make much sense, and was hard to work with. To make any serious headway, things had to change.

The most important alteration would be to separate compile time and run time. The whole internal representation of functions would have to change as well. Blocks, instead of being unoptimized chunks of code, would become fully optimized expressions.

Of course, big changes also means big mistakes. Jackson spent hours tracking down all the new bugs that had been created.

Conceptually, the new interpreter was far superior to the old one. It was simpler, easier to understand, theoretically faster and just a whole lot neater overall. The real trick was making the code look like the concept.

Chapter 6

Categories: The Interpreter

The Interpreter, Chapter 4

June 3, 2010 1 comment

The continuing story of one man’s quest to write an interpreter.

Most of the basic data types — numbers, strings, arrays, lists, hashes — were supported. Next, the interpreter needed control structures. But before those could be added, they needed some sort of support. So Jackson created blocks.

Blocks began as chunks of code in between curly brackets. They were pretty simple. Getting them to work was a bit tricky, but nothing too difficult. Even before control structures were implemented, blocks still proved useful. They could be packaged up, and the same code could be called multiple times.

Jackson used this to implement if statements and while loops. To keep control structures as pure as possible, he treated if statements and while loops as functions. While loops became functions that took two blocks, one for the condition and one for the body. If statements became functions that took either two or three blocks, depending on whether there was an “else” block. After blocks were implemented, if statements and while loops were easy.

Now that blocks were possible, they could be used not just for control structures but also could be passed as arguments into functions. The first such function was the times() function, which took two arguments: a number and a block. It would execute the block a number of times equal to the given number, and it looked like 10 times { do stuff }.

For blocks to really achieve their full potential, they needed to be able to take variables. Implementing this required making the internal data structure a good deal more complicated. But when it was finished, the interpreter was all the more powerful.

At last, Jackson’s programming language was Turing complete. He successfully implemented a program to calculate every prime number from 1 to 100.

Chapter 5

Categories: The Interpreter

Hangman

Inspired by Peter Norvig’s essay about writing a spelling corrector, I was recently working on a puzzle and thought that it would make a good essay: I am attempting to write a program that can win Hangman.

The idea behind Hangman is simple: guess letters that are in the hidden word, and if you get all the letters then you win. Every time you get another letter wrong, draw another body part on the hanged man. Once he is fully drawn, you lose. (This man is represented in my program by a counter that goes up to some number. I call this the number of guesses you’re allowed to make, although strictly speaking it’s the number of incorrect guesses.)

Without trying to predict what word your opponent might choose, which would require some serious artificial intelligence programming, the optimal strategy is to select whichever letter is the most likely to be in a word like the one you’re looking at. The simplest way for a computer to do this is to have a list of letters by letter frequency, and then simply guess each letter in turn. The code looks like this:


frequency = "esiarntolcdugpmhbyfkvwxqjz" # This is letter frequency in dictionary words, and doesn't take into account that some words are more common than others.

while True:
letter = frequency[1:]
frequency = frequency[:1]
if hangman.guess(letter) == win:
break

The problem with this method, as you may have guessed, is that it is extremely unreliable. When you’re allowed six guesses, it only gets the word 11% of the time. When you’re allowed four guesses, that figure drops to 3%.

A better method is to consider that letter frequency is relative. Depending on what words you’ve eliminated already, the most probable letter is going to be different from just the next on the list of letter frequency. For example, if you have a word such as “___in_”, it is very probable that the last letter is a ‘g’ — even though the letter g is not very common in general. This simple change bumps up the 11% figure to a surprising 92%.

Next I tried something that is rather easy for humans. It is clear enough to us that all words have at least one vowel, so the program should recognize this too. I modified the program so that it would keep guessing vowels until it found at least one. Perhaps surprisingly, this only improved the program by 1% — barely a statistically significant margin, given how many trials I did. Perhaps the improvement would be more noticeable if the win ratio wasn’t already so high.

The ‘executioner’, as we may call the program that decides upon the word, was as yet not doing anything clever. It was time to change that. I wrote a simple method designed to evaluate how difficult a given word is: take the average of the rarity of each letter in the word. Then I had the executioner look through some number of words and pick whichever one is decided to be the hardest. When it looked through 50 words, the win ratio dropped to 72%; when it looked through 500 words, it went all the way down to 49%.

Given the current program, it seems like the optimal strategy for the executioner is simply to find the hardest-to-guess word and pick it every time. This might fool a computer, but it wouldn’t fool a person. The next step is to add some pattern recognition to the program.

This is the basic idea of how I was able to write a pretty powerful Hangman program for both sides. I was able to collect some data regarding the success of the program. It’s not a whole lot, but it gives you a rough idea of how successful the program is. Players were allowed six guesses: head, body, arm, arm, leg, leg.


human guesser: 2 wins against computer, 2 wins against human; 6 losses against computer, 10 losses against human
computer guesser: 7 wins against human; 2 losses against human

It is worth noting that humans seem to be much better at coming up with words than at guessing them, whereas the computer is about equally good either way when using the more sophisticated algorithms. But on both counts, the computer performed much better than the people did.

Unfortunately, one thing that humans are bad at and computers are not is pattern recognition. If many more trials had been done, the human players would have started guessing rarer letters. The computer assumes that rarer letters such as q and z are harder to guess, and so finds words that contain them (words like “buzz” and “minx”). The humans were already beginning to notice the abundance of rare letters.

Currently, the biggest flaw in the computer is that if you are the executioner and you give it the same word over and over, as long as the word is hard enough, the computer’s guesser will lose every single time. The necessary improvement there is clear enough, but actually implementing it could be tricky. I’ll leave that as an exercise to the reader . . . for now.