Stanford Free Classes

January 10, 2012

I recently read an article by Ben Rudolph, a Stanford student in a Machine Learning class that uses a “flipped classroom” model: the students watch lectures at home and then go into class to talk to the professor about the homework. This is an interesting model that’s been gaining some attention lately, and I have some experience with it. I mostly agree with what the article says, and I have a few additional points.

Rudolph complains that the online questions are too easy. As reported in another article,

Mr. Rudolph took particular exception to the programming exercises, in which the computer automatically informed students whether or not they got 100 percent on the task. “It’s so black and white,” he tells Wired Campus. “They have to make it easy enough so everyone can get 100 percent, basically. In the past I’ve turned in programming assignments, and only the really smart kids got stellar scores, because they went above and beyond. This model kind of discourages that.”

Those are some poorly constructed programming assignments. A computer can still grade a difficult programming assignment, because a computer program can—by definition—run on a computer, and the computer can check if it gives the correct output. For example, TopCoder offers a series of programming challenges that could take anywhere from five minutes to a few hours, and all of them are graded by computer. The computer grades not only on accuracy, but also on speed, memory efficiency, and code concision.
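To make the idea concrete, here is a minimal sketch of an automatic grader that checks both correctness and running time. It is not how TopCoder or any particular course grades submissions; the function name `grade`, the time limit, and the sample test cases are all assumptions made up for illustration.

```python
import time

def grade(solution, test_cases, time_limit_s=1.0):
    """Return the fraction of test cases answered correctly within the time limit."""
    passed = 0
    for args, expected in test_cases:
        start = time.perf_counter()
        try:
            result = solution(*args)
        except Exception:
            continue  # a crash counts as a failed case
        elapsed = time.perf_counter() - start
        if result == expected and elapsed <= time_limit_s:
            passed += 1
    return passed / len(test_cases)

# Hypothetical assignment: sort a list of integers.
def submitted_sort(xs):
    return sorted(xs)

def lazy_submission(xs):
    return xs  # returns the input unchanged, so it only passes trivial cases

cases = [(([3, 1, 2],), [1, 2, 3]), (([],), []), (([5, 5, 1],), [1, 5, 5])]
good_score = grade(submitted_sort, cases)
lazy_score = grade(lazy_submission, cases)
```

A harder assignment only needs harder test cases and a tighter time limit; the grading loop itself does not change.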

The main problem with this is that if you’re stuck, there’s not much you can do by yourself to figure out the next step. I suggest that students work on sophisticated problems like those at TopCoder, and the students who are struggling can talk to a professor about how to get the program a little closer to where it needs to be.

Kinesis Contoured Keyboard

December 25, 2011 24 comments

I finally purchased the Kinesis Contoured Keyboard, and I thought it prudent to write a brief review.

When I got this keyboard, I immediately noticed how little my fingers have to travel. The bowl shape makes it easy to reach keys that are relatively far away—so easy, in fact, that I kept accidentally hitting the number row instead of the top row. The fact that the rows are symmetric is a serious advantage, because it makes the B and Y keys (QWERTY positions) far easier to reach.

The thumb pads prove themselves to be very useful. I remapped the thumb pads to put space and enter on the right thumb, and shift and backspace on the left thumb. This greatly reduces the workload on my right pinky, because it doesn’t have to keep stretching to hit backspace. But there are two issues with the thumb pads. First, they are placed too high up, so my thumbs have to sit above the rest of my hand. Second, I have trouble reaching the smaller keys on the outer edges of the thumb pads—ctrl, alt, and command (on a Mac). I would prefer if the pads were shifted outward more so that the thumb could rest in the center of the thumb pad, instead of on the edge, and then reach all the keys more easily.

It has taken me some time to adjust to the keyboard. When I first got it, my speed was only at about two-thirds of my speed on a standard keyboard, and I am improving by about 5 wpm per day. I have kept practicing on a standard keyboard, and my speed there has not decreased at all.

Now that I have some experience with a Kinesis, I plan on modifying my keyboard layout optimizer for contoured keyboards.

Categories: Keyboards

Typing Data: Preliminary Analysis

November 27, 2011 8 comments

I have collected a large quantity of typing data using Amphetype, on both QWERTY and MTGAP 2.0 (the two layouts that I currently know). I do not have any conclusive results, but I have some interesting data that I thought worth sharing.

My most interesting discovery is that there is a statistically significant correlation between the frequency of a trigram and the average speed at which it is typed. On MTGAP 2.0 the correlation is 0.34 and on QWERTY it is 0.33. Squaring the correlation gives the share of variance explained, so a trigram’s frequency accounts for roughly 11% of the variation in typing speed: not a lot, but still enough to merit consideration.
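The step from a correlation to "variation accounted for" is just squaring the correlation coefficient. A quick check of the two figures quoted above (the variable names are only for illustration):

```python
# Correlation between trigram frequency and typing speed, as quoted above.
correlations = {"MTGAP 2.0": 0.34, "QWERTY": 0.33}

# The squared correlation is the fraction of variance explained.
r_squared = {layout: r ** 2 for layout, r in correlations.items()}

for layout, value in r_squared.items():
    print(f"{layout}: r^2 = {value:.3f}")
```

Both values come out between 0.10 and 0.12, so frequency explains a bit over a tenth of the variation on either layout.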

Then I analyzed the speeds of various key combinations. For example, on MTGAP 2.0, the average speed for a trigram containing an inward roll is 121 words per minute (wpm); for a trigram containing an outward roll, the average is 110 wpm, and for a trigram containing neither, it is 111 wpm.

When all three keys are typed with one hand, the average is 104 wpm; when two are typed with one hand and one with the other, the average is 118 wpm; and where the hand alternates, 107 wpm.
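For readers who want to reproduce this kind of breakdown, here is a rough sketch of how trigrams could be sorted into the three hand patterns. It assumes the standard QWERTY hand split and treats "alternation" as the hand changing on every keystroke; the actual analysis may have used different definitions. The speeds in `sample_speeds` are made up purely to exercise the code and are not measurements.

```python
from collections import defaultdict

# Assumed QWERTY hand assignment (letters only).
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm")

def hand_of(key):
    if key in LEFT_HAND:
        return "L"
    if key in RIGHT_HAND:
        return "R"
    raise ValueError(f"no hand assigned to {key!r}")

def hand_pattern(trigram):
    """Classify a three-letter sequence by which hands type it."""
    a, b, c = (hand_of(k) for k in trigram.lower())
    if a == b == c:
        return "same hand"
    if a != b and b != c:
        return "alternation"  # the hand switches on every keystroke
    return "two and one"

def average_by_pattern(speeds):
    """Average the wpm values of all trigrams that share a hand pattern."""
    groups = defaultdict(list)
    for trigram, wpm in speeds.items():
        groups[hand_pattern(trigram)].append(wpm)
    return {pattern: sum(v) / len(v) for pattern, v in groups.items()}

# Made-up numbers, for illustration only.
sample_speeds = {"the": 130.0, "was": 100.0, "ing": 120.0, "and": 110.0}
averages = average_by_pattern(sample_speeds)
```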

Where the total finger travel distance is short, the average is 120 wpm; for medium distance, 111; and for a long distance, 105.

It would be premature to draw conclusions from these data. For example, short finger travel distance may be faster simply because MTGAP 2.0 intentionally places common keys on the home row, and common keys tend to be typed faster. On QWERTY, the average speeds for short, medium, and long distance are 96, 102, and 104 wpm, respectively. In this case, the short-distance keys are the slowest.

I am currently looking for anyone who uses Amphetype or is willing to contribute some time to using it. I want to get as much typing data as possible, especially on a variety of keyboard layouts. Leave a comment if you are interested.

For those who are interested, here are all the data I have acquired.


MTGAP 2.0

Average WPM: 112

near distance average: 120
medium distance average: 111
far distance average: 105

inward close keys average: 120
outward close keys average: 107
not close keys average: 110

in roll average: 121
out roll average: 110
not roll average: 111

same hand average: 104
two and one average: 118
alternation average: 107

triple finger average: 73
same finger average: 91
different finger average: 115

twice jump average: 73
home jump average: 92
home jump index average: 113
not jump average: 112

twice to center average: 105
to center average: 116
not to center average: 112

QWERTY

Average WPM: 104

near distance average: 96
medium distance average: 102
far distance average: 104

inward close keys average: 106
outward close keys average: 113
not close keys average: 102

in roll average: 104
out roll average: 116
not roll average: 102

same hand average: 98
two and one average: 106
alternation average: 105

triple finger average: 72
same finger average: 87
different finger average: 107

twice jump average: 71
home jump average: 82
home jump index average: 116
not jump average: 104

twice to center average: 100
to center average: 108
not to center average: 102

RGB Cipher

Like my first haiku,
It came to me in a dream.
I saw its colors.

(Note: This will only make sense if you know a thing or two about ciphers. For the rest of you, I’m afraid this won’t be very interesting.)

Last night I had a dream that I was trying to break a cipher. But this was no ordinary cipher: instead of using numbers, it used colors. Ciphers were suddenly even more beautiful than they had been before.

When I woke up, I was dismayed to realize that it is mathematically impossible for an encryption algorithm to use colors. Nonetheless, I was infatuated with the idea.

What would it mean to have a colorful cipher? I realized that the rounds could be colored. There are three core rounds (red, green and blue) that each represent a different operation. For instance, the red round could be a data rotation, and the green round could be a substitution-box permutation.

The encryption would move through a six-round cycle as the algorithm flows through the color wheel: red, yellow, green, cyan, blue, magenta. Each secondary color represents the combination of two primary colors: yellow is a red round plus a green round, cyan is a green round plus a blue round, and similarly for magenta. Then there are the black and white rounds. The black round is no operation (maybe just adding the key to the text) and the white round is all operations. These eight colors could be arranged to paint a picture—the world’s first cryptographically-secure picture.

How exactly these colors would be arranged to create a secure algorithm, I do not know. All I know is that this is what I saw in my dream, and it was beautiful.

Categories: Cryptography, Math

Which Numbers are the Most Common?

February 4, 2011 4 comments

I’ve often wondered which numbers are used most frequently. The low numbers (0, 1, 2) are probably the most common and the single-digit numbers are all among the most common, but how common are larger numbers and which larger numbers are the most common? Today I decided to find out.

I analyzed my text corpus and came up with the following as the 100 most common numbers:


1 2 0 3 5 4 10 000 6 8 7 20 12 2005 15 11 9 30 100 2008 16 2007 14 50 18 25 13 17 19 1812 24 2006 23 2004 21 40 22 26 60 27 70 2001 2003 28 200 2002 29 31 80 500 2000 300 1805 150 35 90 1000 101 45 32 36 07 33 00 08 1809 99 75 1990 1984 34 1999 400 48 800 95 06 44 1807 47 1998 41 85 55 250 83 53 1813 43 02 52 37 39 600 1980 51 03 120 04 64

As you might expect, the single-digit numbers are all among the most common. 10, 000, 20, 12, 2005, 15 and 11 are all more common than the least common single-digit number. I’m not surprised by 10 or 20, but I don’t know how 000 got there. 2005 is also an interesting one which I’ll get to later.
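A count like this can be sketched in a few lines. The following is not the script used for the analysis above, just one plausible way to do it, assuming a "number" means any unbroken run of digits; the function name and the sample sentence are made up.

```python
import re
from collections import Counter

def most_common_numbers(text, n=100):
    """Return the n most frequent digit runs in the text, most common first."""
    tokens = re.findall(r"\d+", text)  # every maximal run of digits
    return [token for token, _ in Counter(tokens).most_common(n)]

sample = "In 2005 there were 10,000 people; 1 of 2 or 1 of 3."
ranking = most_common_numbers(sample)
```

Under this assumption, a figure written as 10,000 splits into 10 and 000, which is one way a token like 000 could end up in such a list. Whether that is what happened with this corpus is not known.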

The most common numbers from 10 to 19:

10 12 15 11 16 14 18 13 17 19

This isn’t too surprising either.

Here’s every two digit number:


10 20 12 15 11 30 16 14 50 18 25 13 17 19 24 23 21 40 22 26 60 27 70 28 29 31 80 35 90 45 32 36 07 33 00 08 99 75 34 48 95 06 44 47 41 85 55 83 53 43 02 52 37 39 51 03 04 64 38 42 46 65 49 01 09 54 66 58 84 89 67 05 59 98 72 56 73 77 62 78 68 76 57 92 63 61 81 82 69 97 88 86 96 71 79 94 93 74 91 87

Every number from 10 to 99 appears at least once, which I suppose is what you’d expect. Lower numbers tend to be more common, as do round numbers like 50, 25, and 40.

I also looked at the most common dates:


2005 2008 2007 1812 2006 2004 2001 2003 2002 2000 1805 1000 1809 1990 1984 1999 1807 1998 1813 1980

Probably this means that most of the text corpus is from the late 2000s. But besides recent dates, a few other dates stand out: 1812, 1805, 1809, 1984. (1000 is probably not used as a date most of the time.) 1812 is probably because of references to the War of 1812 and 1984 is because of references to the book. 1805 and 1809 both appear repeatedly in War and Peace, one of the books used in the text corpus.

For our last exercise, let’s find the first number that doesn’t appear. Every number from 0 to 567 appears at least once. The first number not to appear at all is 568.
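Finding the first missing number is a short search once the set of numbers that appear is known. A minimal sketch, assuming tokens with leading zeros such as 07 count as the number 7 (the original analysis may have treated them differently):

```python
import re
from itertools import count

def first_missing_number(text):
    """Return the smallest non-negative integer that never appears in the text."""
    seen = {int(token) for token in re.findall(r"\d+", text)}
    return next(n for n in count() if n not in seen)

example = first_missing_number("0 1 2 4 07")
```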

I’ve always wondered which numbers are the most common, and now we have an answer.

Categories: Language

Easy-to-Use Keyboard Optimization Program

January 22, 2011 22 comments

I’ve made some modifications to the keyboard optimization program. It is now much easier to use, especially for someone who doesn’t have much experience with computer programming. You can get it here. I added a makefile and a better readme, but more importantly, a command-line user interface. You can now customize the costs, use various features, and even change how the text corpus is weighted.

Fully Optimized Standard Keyboard

January 16, 2011 22 comments

I recently proposed a fully optimized layout built for the Kinesis physical keyboard. By popular request, I have now created a fully optimized layout for the standard keyboard.


= 1 2 3 4 5 6 7 8 9 0 q z
y p o u - k d l c w x / j
i n e a , m h t s r "
( ) ; . _ v f g b '

Fitness: 184428299
Distance: 364416
Inward rolls: 10.16%
Outward rolls: 2.36%
Same hand: 34.93%
Same finger: 1.60%
Row change: 13.17%
Home jump: 0.27%
To center: 2.38%
To outside: 0.52%

It looks strikingly similar to the latest version of the Kinesis layout:


1 2 3 4 5 6 7 8 9 0 q
y p o u - v d l c w x
i n e a , m h t s r "
( ) ; . _ k f g b '
/ =             z j

Fitness: 186751864
Distance: 959128
Inward rolls: 10.16%
Outward rolls: 2.36%
Same hand: 34.94%
Same finger: 1.60%
Row change: 13.40%
Home jump: 0.30%
To center: 2.38%
To outside: 0.39%

Most of what I have to say about fully optimized keyboard layouts has been said. I do find it interesting that the standard and Kinesis layouts look so similar; it looks like the rare keys around the edges have barely any effect at all.

Starting to Fully Optimize the Keyboard

January 4, 2011 19 comments

(Edit: I found a bug in the way rolls were being calculated. MTGAP 0.1 (shown below) is no longer the best layout.)

It’s been a while since I’ve done anything with the New Keyboard Layout Project, but I read a comment on one of my posts and I got to thinking about punctuation. Every keyboard I’ve designed has just been based on the main 30 keys and used .,’; as the four punctuation marks, because those are the ones that Dvorak used. But why use those four punctuation marks? Why not use a different set?

In fact, why not simply try to optimize the entire keyboard instead of just the main 30 keys?

Previously, the answer to that question was that it would be too slow. But now, thanks to a much-improved algorithm, I no longer have that excuse. That means I can evaluate the entire keyboard.

The Physical Keyboard

Changing the size of the keyboard requires rewriting large portions of the program. For this reason I didn’t want to rewrite it for a standard physical keyboard — why design such a highly optimized layout for a suboptimal physical keyboard? Instead, I rewrote the program to optimize on the Kinesis Advantage Pro keyboard. (You can see a good picture of the full keyboard here.) I ignored the thumb pads, tab, shift and caps lock for aesthetic reasons, and the arrow keys and function keys because it is nearly impossible to determine the frequency of those keys. This leaves four rows and 47 keys. The QWERTY keyboard looks like this:

1 2 3 4 5 6 7 8 9 0 -
Q W E R T Y U I O P \
A S D F G H J K L ; '
Z X C V B N M , . /
` =             [ ]

Shifted Keys

My program doesn’t deal with shifted keys, and modifying it to do so would be a much greater task than what I am currently doing. Rather than try to get the program to deal with shifted keys, I decided to simply choose the most common punctuation and put those on the unshifted slots.

There are 26 letters and 10 numbers. Out of 47 spots this leaves 11 spots for punctuation. The 11 most common punctuation marks are:

, . ) ( _ " ; - ' = /

So I pulled off the standard punctuation and stuck those on.

Results

After some (not insignificant) modifications, the program was able to optimize a full-sized keyboard. You can download my earliest functional version of the program here. It hasn’t been extensively tested and it’s messy, but it’s functional.

The first layout it came up with was this one:

Hands: 53% 46%
Fingers: 10% 10% 10% 21% 13% 14% 10% 8%


x 3 6 5 q / " 9 2 8 0
u l c o ; v m d p ) j
a r s e , f h t n i -
( ' w . = k y g b _
4 1             z 7

Fitness: 20014435648
Distance: 97925496
Inward rolls: 16.44%
Outward rolls: 5.40%
Same hand: 48.05%
Same finger: 2.02%
Row change: 21.05%
Home jump: 0.80%
To center: 3.03%
To outside: 0.40%

(I’ve added a new cost: “to outside.” It’s similar to “to center”: it penalizes a layout every time the user has to reach to the outside of the keyboard with the pinky before or after typing a letter on that same hand.)

Highly optimized and aesthetically horrible. The number keys, instead of being in a nice straight line, are all over the place. The parentheses aren’t even next to each other. This isn’t much of an issue when you’re dealing with the main 30 characters because there are no real aesthetics to speak of, but once you expand it becomes a serious problem.

The solution is to require that the program put certain keys in certain places: the number keys on the top row and the parentheses next to each other. There are two fundamental ways to do this: force it to, or give a penalty for not doing so. I found that the best way to keep the number keys in place was simply to tell the computer that it wasn’t allowed to move them. That doesn’t quite work with parentheses though, because they should still be able to move around; they just should stay next to each other. If one moves, the other moves. Forcing them to be next to each other but still be able to move around as a chunk would require adding an extra layer of complexity to the program. The simpler solution is to heavily penalize a keyboard layout every time the parentheses aren’t next to each other.
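The two approaches can be sketched roughly as follows. This is not the optimizer's actual code: the layout representation (a list of rows), the penalty weight, and the function names are all assumptions for illustration, and only the top four rows of each layout are included.

```python
import random

PINNED = set("1234567890")   # keys the optimizer may not move
PAREN_PENALTY = 1_000_000    # hypothetical weight, chosen only for illustration

def find(layout, key):
    """Return (row, column) of a key in a layout given as a list of rows."""
    for r, row in enumerate(layout):
        if key in row:
            return r, row.index(key)
    raise ValueError(f"{key!r} is not on the layout")

def paren_penalty(layout):
    """Cost added when ( and ) are not side by side in the same row."""
    (r1, c1), (r2, c2) = find(layout, "("), find(layout, ")")
    adjacent = r1 == r2 and abs(c1 - c2) == 1
    return 0 if adjacent else PAREN_PENALTY

def random_swap(layout, rng):
    """Swap two movable keys, leaving pinned keys where they are."""
    movable = [(r, c) for r, row in enumerate(layout)
               for c, key in enumerate(row) if key not in PINNED]
    (r1, c1), (r2, c2) = rng.sample(movable, 2)
    new = [list(row) for row in layout]
    new[r1][c1], new[r2][c2] = new[r2][c2], new[r1][c1]
    return new

# The first layout the program produced (parentheses far apart).
first_layout = [
    'x 3 6 5 q / " 9 2 8 0'.split(),
    "u l c o ; v m d p ) j".split(),
    "a r s e , f h t n i -".split(),
    "( ' w . = k y g b _".split(),
]
# A layout with the number row in order and the parentheses together.
constrained_layout = [
    "1 2 3 4 5 6 7 8 9 0 q".split(),
    "y c o u ( ) l d p w x".split(),
    "i s e a , m h t n r k".split(),
    '_ v " . ; \' f g b -'.split(),
]
```

Pinning is a hard constraint enforced when candidate swaps are chosen, while the parenthesis rule is a soft constraint that simply makes offending layouts score badly, so the pair stays free to move around the board.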

After adding these restrictions and tweaking the costs a bit, I came up with this layout:

MTGAP Full 0.1

Hands: 52% 47%
Fingers: 9% 10% 18% 13% 13% 14% 10% 9%


1 2 3 4 5 6 7 8 9 0 q
y c o u ( ) l d p w x
i s e a , m h t n r k
_ v " . ; ' f g b -
/ =             z j

Fitness:       193491944
Distance:      956628
Inward rolls:  8.42%
Outward rolls: 2.20%
Same hand: 36.00%
Same finger: 1.48%
Row change: 17.14%
Home jump: 0.26%
To center: 2.29%
To outside: 0.50%

Some of its numbers are quite impressive. For comparison, here’s Colemak (with punctuation modified a bit to fit on the keyboard):

Hands: 46% 53%
Fingers: 8% 8% 11% 18% 18% 15% 10% 9%


1 2 3 4 5 6 7 8 9 0 -
q w f p g j l u y ; =
a r s t d h n e i o '
z x c v b k m , . /
_ "             ( )

Fitness:       230028740
Distance:      1006256
Inward rolls:  4.53%
Outward rolls: 2.62%
Same hand: 42.86%
Same finger: 2.01%
Row change: 18.93%
Home jump: 0.74%
To center: 7.54%
To outside: 0.48%

My layout beats Colemak on every single metric except “to outside” (and possibly outward rolls, depending on whether you like those or not). Notice that, even though my layout puts ‘o’ (the fourth most common letter) off the home row, it still has lower travel distance than Colemak.

(In case you’re new here, I compare my layout to Colemak because it is my favorite keyboard layout that I didn’t design.)

Also, if you’re interested, here’s Dvorak:

Hands: 44% 55%
Fingers: 8% 8% 12% 14% 16% 13% 13% 11%


7 5 3 1 9 0 2 4 6 8 =
' , . p y f g c r l /
a o e u i d h t n s -
; q j k x b m w v z
_ "             ( )

Fitness:       247807385
Distance:      1020108
Inward rolls:  4.14%
Outward rolls: 1.25%
Same hand: 31.14%
Same finger: 3.16%
Row change: 14.36%
Home jump: 0.50%
To center: 7.39%
To outside: 0.39%

The Interpreter, Part 1 Conclusion

August 21, 2010 4 comments

The final chapter in one man’s first journey to write an interpreter.

Jackson spent the next month or so adding smaller features and fixing bugs. At last, The Interpreter Version 1.0 was ready for release. He named his language Simfpl: Simple Interpreted Mathematically-Oriented Functional Programming Language. He put the source code on the internet for everyone to see.

To be able to use it, you first need to install GMP and MPFR.

Categories: The Interpreter

The Interpreter, Chapter 7

The continuing story of one man’s quest to write an interpreter.

Jackson’s interpreter was coming along smoothly. But for it to have a complete foundation, he still needed to add one final feature: user-defined functions.

Jackson wanted to keep the language definition simple. This would avoid special syntax, making the interpreter easier to write, and also would make the language easier to understand and even more extensible. If statements and while loops were defined not with special syntax, but as functions. Similarly, Jackson wanted function definitions to themselves be functions.

So he set up a “def” function, which would be a function to define other functions. It would take three arguments: the function name, the variable list, and the function definition. So far, so good. He implemented this pretty quickly.

But there was a problem: it was impossible to create a recursive function. If the programmer created a function (f) and tried to call it within its own body, the expression would not compile correctly. For (def) to act like a function it had to work at run-time, but the compiler needed to know that a reference to (f) was a function.

After brooding over this problem for some time, Jackson came up with a solution. He created a new data type called a function shell, designed to hold just enough information for the compiler to know what to do. Whenever the compiler saw “def”, it would look for the function name. Then it would find all other references to the function name and convert them into function shells. The program would compile knowing that (f) was a function, and would actually define the function after the (def) function was called at runtime.
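The function-shell idea can be illustrated with a toy interpreter. This is a sketch in Python, not Simfpl's source: programs are nested lists, the names `FunctionShell`, `find_defs`, `link`, and `evaluate` are invented for the example, and "if" and "do" are handled directly by the evaluator as a simplification rather than being ordinary functions.

```python
class FunctionShell:
    """Placeholder the compiler can treat as a function before its body exists."""
    def __init__(self, name):
        self.name = name
        self.params = None  # filled in when (def) runs
        self.body = None

def find_defs(expr, shells):
    """Compile-time pass 1: create a shell for every name passed to def."""
    if isinstance(expr, list):
        if expr and expr[0] == "def":
            shells.setdefault(expr[1], FunctionShell(expr[1]))
        for sub in expr:
            find_defs(sub, shells)
    return shells

def link(expr, shells):
    """Compile-time pass 2: replace references to those names with the shells."""
    if isinstance(expr, list):
        if expr and expr[0] == "def":
            return ["def", shells[expr[1]], expr[2], link(expr[3], shells)]
        return [link(sub, shells) for sub in expr]
    if isinstance(expr, str) and expr in shells:
        return shells[expr]
    return expr

BUILTINS = {"-": lambda a, b: a - b, "*": lambda a, b: a * b,
            "<=": lambda a, b: a <= b}

def evaluate(expr, env):
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, FunctionShell):
        return expr
    head, *args = expr
    if head == "def":  # run time: the shell finally receives its definition
        shell, params, body = args
        shell.params, shell.body = params, body
        return shell
    if head == "if":
        cond, then, otherwise = args
        return evaluate(then if evaluate(cond, env) else otherwise, env)
    if head == "do":
        result = None
        for sub in args:
            result = evaluate(sub, env)
        return result
    values = [evaluate(a, env) for a in args]
    if isinstance(head, FunctionShell):
        if head.body is None:
            raise RuntimeError(f"{head.name} called before (def) ran")
        return evaluate(head.body, dict(zip(head.params, values)))
    return BUILTINS[head](*values)

program = ["do",
           ["def", "fact", ["n"],
            ["if", ["<=", "n", 1], 1, ["*", "n", ["fact", ["-", "n", 1]]]]],
           ["fact", 5]]
linked = link(program, find_defs(program, {}))
result = evaluate(linked, {})
```

The recursive call inside the body already points at the shell when the program is linked, even though the shell is empty until the def form is evaluated.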

All of the foundational features had been implemented. But there was still work to do. Jackson had to implement a few smaller features, fix the myriad bugs that had sprung up, and optimize the code.

Stay tuned for the exciting continuation of The Interpreter!

Categories: The Interpreter