The New Keyboard Layout Project (NKLP)
Right now I am expanding my corpus. If anyone else has big blocks of text, like a bunch of stuff they typed on their computer, send it to me at MTGandP@gmail.com. Tell me what’s in it (like, emails, computer programs, business letters) so I don’t have to read it. (For confidentiality reasons, and for my convenience.)
I’m trying to get my corpus up to 10,000 pages, because I think that’s enough to have a really good variety of text. Right now I have about 3000.
I have about 11,000 pages in my corpus. However, I only have about 1000 pages of casual text, and I’d like about 3000. I’d also like about 1000 pages of news, and I only have about 400. But I’m close to being done. (Collecting news is just so tedious, though.)
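To make the remaining work concrete, here is a rough sketch of the arithmetic using the page counts above. The category labels and the function name are just made up for illustration.

```python
# Page counts taken from the numbers above; the labels are only illustrative.
current = {"casual": 1000, "news": 400}
target = {"casual": 3000, "news": 1000}

def shortfall(current, target):
    """Return how many more pages each category needs to reach its target."""
    return {name: max(target[name] - current.get(name, 0), 0) for name in target}

remaining = shortfall(current, target)
for name, pages in remaining.items():
    print(f"{name}: {pages} more pages needed")
```

So that works out to about 2000 more pages of casual text and about 600 more pages of news.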
I just realized that Carpalx has a good corpus that’s free. It doesn’t have everything that I want, but it has a lot of books and programming code.
VvV from colemak.com wanted to make an evolutionary algorithm for non-Latin letters. I don’t have any data for any languages other than English. But if I did, which languages could be done? I’ll look at the world’s most popular languages (from geography.about.com).
1. Mandarin Chinese – 882 million
There are far too many characters in Chinese to make a keyboard layout.
2. Spanish – 325 million
3. English – 312-380 million
4. Arabic – 206-422 million
This could work. http://en.wikipedia.org/wiki/Arabic_alphabet
5. Hindi – 181 million
This alphabet is also probably small enough. http://www.omniglot.com/writing/hindi.htm
6. Portuguese – 178 million
7. Bengali – 173 million
It looks kind of big, but it should work. http://www.omniglot.com/writing/bengali.htm
8. Russian – 146 million
9. Japanese – 128 million
Same deal as Chinese.
10. German – 96 million
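If I did have text in one of these languages, a quick way to check whether its alphabet is "small enough" would be to count the distinct letters in a sample and compare that to the number of keys available. This is only a sketch: the key budget of 47 is my assumption, not a measured figure, and the function names are made up.

```python
def distinct_letters(text):
    """Return the set of distinct alphabetic characters in text, lowercased."""
    # str.isalpha() only accepts characters Unicode classifies as letters,
    # so combining marks (such as vowel signs in some scripts) are skipped.
    return {ch.lower() for ch in text if ch.isalpha()}

def fits_on_keyboard(text, key_budget=47):
    """Check whether the letters seen in text fit within key_budget keys."""
    # key_budget is an assumed number of character keys, not a figure from this post.
    return len(distinct_letters(text)) <= key_budget

sample = "The quick brown fox jumps over the lazy dog"
print(len(distinct_letters(sample)), fits_on_keyboard(sample))
```

A real check would need a much bigger sample than one sentence, and for Chinese or Japanese the count would blow past any reasonable key budget.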
Any comments you have relating to the NKLP should be posted here.