Archive for the ‘Language’ Category

Which Numbers are the Most Common?

February 4, 2011 28 comments

I’ve often wondered which numbers are used most frequently. The lowest numbers (0, 1, 2) are probably the most common, with the rest of the single-digit numbers close behind, but how common are larger numbers, and which of them are the most common? Today I decided to find out.

I analyzed my text corpus and came up with the following as the 100 most common numbers:


1 2 0 3 5 4 10 000 6 8 7 20 12 2005 15 11 9 30 100 2008 16 2007 14 50 18 25 13 17 19 1812 24 2006 23 2004 21 40 22 26 60 27 70 2001 2003 28 200 2002 29 31 80 500 2000 300 1805 150 35 90 1000 101 45 32 36 07 33 00 08 1809 99 75 1990 1984 34 1999 400 48 800 95 06 44 1807 47 1998 41 85 55 250 83 53 1813 43 02 52 37 39 600 1980 51 03 120 04 64

As you might expect, the single-digit numbers are all among the most common. 10, 000, 20, 12, 2005, 15 and 11 are all more common than the least common single-digit number. I’m not surprised by 10 or 20, but I don’t know how 000 got there. 2005 is also an interesting one which I’ll get to later.
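
For anyone who wants to try this on their own corpus, here is a minimal sketch in Python of one way such a count could be done. It is not the script used for the results above; the function name `most_common_numbers` and the choice to treat every unbroken run of digits as one number are assumptions. Under that assumption, a figure written with a thousands separator, such as 10,000, would be counted as 10 and 000, which is one possible (unconfirmed) explanation for 000 ranking so high.

```python
import re
from collections import Counter


def most_common_numbers(text, k=100):
    """Return the k most frequent digit strings in text, most common first.

    Assumption: a "number" is any maximal run of digits, compared as a
    string, so "07" and "7" are counted separately and "10,000" is
    split into "10" and "000".
    """
    tokens = re.findall(r"\d+", text)
    return [token for token, _count in Counter(tokens).most_common(k)]


if __name__ == "__main__":
    sample = "In 2005 we had 1 cat, 2 dogs and 10,000 ants; by 2008 only 1 cat."
    print(most_common_numbers(sample, k=5))
```

Counting strings rather than integer values is what would let entries like 07 and 00 show up in the list as distinct numbers.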

The most common numbers from 10 to 19:

10 12 15 11 16 14 18 13 17 19

This isn’t too surprising either.

Here’s every two-digit number, from most to least common:


10 20 12 15 11 30 16 14 50 18 25 13 17 19 24 23 21 40 22 26 60 27 70 28 29 31 80 35 90 45 32 36 07 33 00 08 99 75 34 48 95 06 44 47 41 85 55 83 53 43 02 52 37 39 51 03 04 64 38 42 46 65 49 01 09 54 66 58 84 89 67 05 59 98 72 56 73 77 62 78 68 76 57 92 63 61 81 82 69 97 88 86 96 71 79 94 93 74 91 87

Every number from 10 to 99 appears at least once, which I suppose is what you’d expect. Lower numbers tend to be more common, as do round numbers like 50, 25, and 40.

I also looked at the most common dates:


2005 2008 2007 1812 2006 2004 2001 2003 2002 2000 1805 1000 1809 1990 1984 1999 1807 1998 1813 1980

This probably means that most of the text corpus is from the late 2000s. But besides recent dates, a few others stand out: 1812, 1805, 1809, and 1984. (1000 is probably not used as a date most of the time.) 1812 is probably there because of references to the War of 1812, and 1984 because of references to the book. 1805 and 1809 both appear repeatedly in War and Peace, one of the books used in the text corpus.

For our last exercise, let’s find the first number that doesn’t appear. Every number from 0 to 567 appears at least once. The first number not to appear at all is 568.
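
This last search can be sketched in a few lines of Python. Again, this is an illustration and not the code behind the result above: the function name `first_missing_number` is made up, and comparing numbers by integer value (so that 07 counts as an appearance of 7) is an assumption.

```python
import re


def first_missing_number(text):
    """Return the smallest non-negative integer that never appears in text.

    Assumption: digit runs are compared by integer value, so "07" counts
    as an occurrence of 7.
    """
    seen = {int(token) for token in re.findall(r"\d+", text)}
    n = 0
    while n in seen:
        n += 1
    return n


if __name__ == "__main__":
    print(first_missing_number("0 1 2 4 5"))
```

Storing the numbers in a set keeps each membership check cheap, so the loop only has to walk upward from 0 until it hits the first gap.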

I’ve always wondered which numbers are the most common, and now we have an answer.

Categories: Language

Our World of Lists

The world we live in is quite an interesting place. Its structure, and especially the way that we perceive its structure, is worth observing.

When we read or write, we do so linearly. We only read one word at a time, and when we’re done, we go on to the next word. Sentences are treated as lists, where only one element at a time can be read. Our inability to comprehend anything other than lists is remarkable.

You may object to this. Sure, we read linearly, but we can also read non-linear graphs, flow charts, and even just clusters of words. While it is true that we can read such structures, we always convert them into lists, taking in one word at a time. Because time is linear, we can only perceive one thing at once before moving on to the next one. Sometimes we are able to package multiple things together and perceive them as one thing, but we cannot truly perceive more than one item at once.

Because we perceive things as lists, it is difficult to imagine things being any other way. How else could we perceive things? Perhaps language would be more versatile if it were built on a data structure other than a list, such as a binary tree. Language, of course, is not the only instance of our list perception; it is simply a very common one. What would language be like if we perceived it as a binary tree?

Well, it would be a lot less linear, that’s for sure. The whole idea of language would be a lot different, and perhaps more expressive. It’s difficult to imagine, though, just because our minds are so fundamentally grounded in lists.

The world outside of lists is an interesting one to speculate about. Where might this speculation lead?

Categories: Language, Math

The PSAT: an Objective Assessment

October 24, 2009 1 comment

This is a biased student’s unbiased assessment of the fun and frenzy that the PSAT brings to the world.

The PSAT had the usual categories that you find on a standardized test: math, reading comprehension, critical reading. The math was pretty simple stuff: basic geometry, basic statistics (median, mode, etc.), basic math rules (absolute value, integers vs. rationals, etc.). It’s actually been a while since I did any of that stuff, but I managed to remember it all. After I finished, I tried to find a generalized form for approximating the nth root of a number.

The most fun part, though, was the part where we had to read a story or an essay and then answer questions about it. The questions themselves weren’t so interesting, but some of the little writings were actually fascinating. There was one section with two short essays about grammar sticklers, which I found pretty hilarious. And there was one where somebody was bad-mouthing Wikipedia. I wrote notes all over the test booklet, deconstructing the essay. The essay cited a study finding that Wikipedia has four errors for every three that Encyclopedia Britannica has. The essayist’s response was something along the lines of “no reference work is infallible.” While that is true, the essayist completely disregards the fact that this study demonstrates just how accurate Wikipedia really is. Wikipedia is moderated; 99% of websites are not. While there are many sources that are more reliable than Wikipedia, there are very few that achieve the same balance of reliability and accessibility. I could write about the benefits of Wikipedia for hours.

I certainly hope that the SAT is as amusing as the PSAT was for me.

Categories: Language, Math

The PSAT: an Objective Assessment (preview)

October 17, 2009 1 comment

This morning, I spent four hours taking the PSAT. Without a doubt, it is the best standardized exam that I have ever taken.

As much as I would love to, I am not allowed to talk about the PSAT for one week after taking it. So return in one week to hear my fascinating insights!

Update: It is now available!

Categories: Language, Math