
Which Numbers are the Most Common?

February 4, 2011

I’ve often wondered which numbers are used most frequently. The low numbers (0, 1, 2) are probably the most common and the single-digit numbers are all among the most common, but how common are larger numbers and which larger numbers are the most common? Today I decided to find out.

I analyzed my text corpus and came up with the following as the 100 most common numbers:


1 2 0 3 5 4 10 000 6 8 7 20 12 2005 15 11 9 30 100 2008 16 2007 14 50 18 25 13 17 19 1812 24 2006 23 2004 21 40 22 26 60 27 70 2001 2003 28 200 2002 29 31 80 500 2000 300 1805 150 35 90 1000 101 45 32 36 07 33 00 08 1809 99 75 1990 1984 34 1999 400 48 800 95 06 44 1807 47 1998 41 85 55 250 83 53 1813 43 02 52 37 39 600 1980 51 03 120 04 64

As you might expect, the single-digit numbers are all among the most common. Seven other numbers (10, 000, 20, 12, 2005, 15 and 11) are more common than 9, the least common single-digit number. I’m not surprised by 10 or 20, but I don’t know how 000 got there. 2005 is also an interesting one which I’ll get to later.
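For readers who want to try this on their own text, here is a minimal sketch of one way such a count could be produced. It is not necessarily how the list above was generated: it assumes a "number" is any unbroken run of digits, kept as a string so that 07 and 7 stay separate, and the names `count_numbers` and `sample` are made up for illustration. Under that assumption, a number written with a thousands separator, such as 10,000, splits into 10 and 000.

```python
import re
from collections import Counter

def count_numbers(text: str) -> Counter:
    """Count every unbroken run of ASCII digits, keeping leading zeros."""
    # Assumption: a "number" is a maximal digit run, so "10,000" yields
    # the two tokens "10" and "000".
    return Counter(re.findall(r"[0-9]+", text))

# Hypothetical sample text, not taken from the corpus in the post.
sample = "In 1812 there were 10,000 men; by 2005 only 10 of the 10 records survived."
counts = count_numbers(sample)
most_common_token, frequency = counts.most_common(1)[0]
print(most_common_token, frequency)
```

On a real corpus, `counts.most_common(100)` would give a top-100 list in the same spirit as the one above.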

The numbers from 10 to 19, from most to least common:

10 12 15 11 16 14 18 13 17 19

This isn’t too surprising either.

Here’s every two-digit number, from most to least common:


10 20 12 15 11 30 16 14 50 18 25 13 17 19 24 23 21 40 22 26 60 27 70 28 29 31 80 35 90 45 32 36 07 33 00 08 99 75 34 48 95 06 44 47 41 85 55 83 53 43 02 52 37 39 51 03 04 64 38 42 46 65 49 01 09 54 66 58 84 89 67 05 59 98 72 56 73 77 62 78 68 76 57 92 63 61 81 82 69 97 88 86 96 71 79 94 93 74 91 87

Every number from 10 to 99 appears at least once, which I suppose is what you’d expect. Lower numbers tend to be more common, as do round numbers like 50, 25, and 40.
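A ranking like this can be pulled out of an overall tally by filtering on the number of digits. The sketch below assumes the tally is a `Counter` keyed by digit strings; `rank_by_length` and the toy counts are hypothetical names and values, not the ones behind the lists in this post.

```python
from collections import Counter

def rank_by_length(counts: Counter, n_digits: int) -> list[str]:
    """Return the tokens with exactly n_digits characters, most common first."""
    matching = [(tok, c) for tok, c in counts.items() if len(tok) == n_digits]
    # Sort by descending frequency; ties keep their original order.
    matching.sort(key=lambda pair: -pair[1])
    return [tok for tok, _ in matching]

# Toy counts for illustration only.
counts = Counter({"10": 9, "07": 2, "50": 5, "7": 20, "2005": 4})
two_digit = rank_by_length(counts, 2)
print(two_digit)
```

Because the filter looks at string length, zero-padded tokens such as 07 are treated as two-digit numbers, which matches how they show up in the list above. The same function with `n_digits=4` would give a four-digit ranking.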

I also looked at the most common four-digit numbers, most of which are years:


2005 2008 2007 1812 2006 2004 2001 2003 2002 2000 1805 1000 1809 1990 1984 1999 1807 1998 1813 1980

This probably means that most of the text corpus is from the late 2000s. But besides recent years, a few others stand out: 1812, 1805, 1809, and 1984. (1000 is probably not used as a year most of the time.) 1812 is probably there because of references to the War of 1812, and 1984 because of references to George Orwell’s novel. 1805 and 1809 both appear repeatedly in War and Peace, one of the books used in the text corpus.

For our last exercise, let’s find the first number that doesn’t appear. Every number from 0 to 567 appears at least once. The first number not to appear at all is 568.
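Finding the first missing number is a simple upward scan over the tally. This is only a sketch under one assumption: a number counts as present only if it appears in its plain form, so 07 does not count as an appearance of 7. The function name `first_missing` and the toy counts are hypothetical.

```python
from collections import Counter
from itertools import count

def first_missing(counts: Counter) -> int:
    """Return the smallest non-negative integer whose plain form was never seen."""
    # Assumption: only the canonical string (no leading zeros) counts.
    # Looking up a missing key in a Counter returns 0 without inserting it.
    return next(n for n in count(0) if counts[str(n)] == 0)

# Toy counts for illustration only.
toy = Counter({"0": 1, "1": 3, "2": 1, "07": 4, "4": 2})
missing = first_missing(toy)
print(missing)
```

Treating zero-padded tokens as appearances too would only require normalizing each key with `int` before the scan.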

I’ve always wondered which numbers are the most common, and now we have an answer.

Categories: Language