I stopped being a computer programmer in early 2002. I had been studying programming in college, but I was running out of money and a series of life events kept throwing up obstacles, so I ended up leaving school and working as a short-order cook in a restaurant (good times). When I went back to school, my interests had evolved, and I ended up graduating in Political Science.
By 2002, though, I had been programming for ten years. My mom and stepfather bought me a Commodore 64 one year, and even loading a game from one of its 5 1/4 inch floppies required typing in a basic BASIC program (LOAD “*”,8,1) and you could play around with nonsense like printing your name an infinite number of times using GOTO. At that time, there were magazines that had a “type-in game” every issue – literally a program, printed out in a magazine, that you would manually type into your computer and then play. Ah, nostalgia.
Fast forward to last year. I was visiting the TLG office one day and some of the staff were talking about a problem they were having: how to distribute their limited supply of books to volunteers, to fill in the gaps in what the volunteers already had, in a way that would create the highest number of full sets of books. They wanted volunteers to have full sets for ease of bookkeeping, but up until then books had been distributed ad hoc based on volunteer requests, and the system was leaving many volunteers without the materials they needed.
TLG had this huge spreadsheet that they had filled in based on email and phone interviews – the books every volunteer had – and an inventory of books TLG had. It seemed transparently like a task suited for a computer rather than a human, so I said as much, and I ended up offering to write a program to basically fill in slots to maximize the number of full sets. I went home, and managed to finish before I went to sleep. (I make no promises about how/whether TLG used the distribution list I gave them, so I claim no responsibility if you didn’t get your books last year…)
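For the curious, here is a rough illustration of how a slot-filling program like that could work. It is only a sketch, not the actual program: the function name, the data layout, and the sample volunteers and book titles are all made up for the example. It uses a simple greedy heuristic (complete the volunteers who are missing the fewest books first), which tends to produce many full sets but is not guaranteed to find the best possible distribution.

```python
def distribute(inventory, holdings, full_set):
    """Greedy heuristic (an assumption, not a proven optimum):
    complete the volunteers who are missing the fewest books first."""
    stock = dict(inventory)  # copy so the caller's inventory is untouched
    # For each volunteer, the titles they still need for a full set.
    needs = {v: sorted(set(full_set) - set(have)) for v, have in holdings.items()}
    plan = {}
    for volunteer in sorted(needs, key=lambda v: (len(needs[v]), v)):
        missing = needs[volunteer]
        if not missing:
            continue  # already has a full set
        # Only hand out books if doing so completes the volunteer's set.
        if all(stock.get(title, 0) > 0 for title in missing):
            for title in missing:
                stock[title] -= 1
            plan[volunteer] = missing
    return plan, stock


# Hypothetical sample data, standing in for the spreadsheet and inventory.
full_set = ["Book 1", "Book 2", "Book 3"]
inventory = {"Book 1": 1, "Book 2": 2, "Book 3": 1}
holdings = {"Ana": ["Book 1", "Book 2"], "Ben": ["Book 1"], "Cy": []}

plan, leftover = distribute(inventory, holdings, full_set)
print(plan)      # who receives which titles
print(leftover)  # stock remaining after the distribution
```

In this toy case only one copy of "Book 3" exists, so the heuristic gives it to the volunteer who needs nothing else, and the other two stay incomplete.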
A while later, Coursera debuted its Algorithms course in conjunction with Stanford University, and I signed up. The professor, Tim Roughgarden, is just incredible. His pace is fast but basic – concepts anyone can understand, presented quickly and wittily. Occasionally I pause the video lectures to reflect on an interesting idea he mentioned, or to try to anticipate the solution to a problem he’s presented, but overall the pacing is perfect – fast enough to keep me interested, but not too fast to follow. Most importantly, he never glosses over any important facts or hand-waves anything away. Algorithms 1 really renewed my interest in programming, and I’m now taking his Algorithms 2, which is also really great.
Sometime between those two classes, though, I had a bunch of downtime. I spent up to an hour a day studying Georgian – mostly memorizing vocabulary and irregular verb forms – and I noticed that the slowest part of study was looking up words to put into my flashcards. It seemed like the sort of task that would be well-suited for a computer, assuming a reasonable database of words and translations were available.
More importantly, I felt that flashcards had some limitations that could be addressed by a program with at least some ability to randomize questions – for instance, one day it might say “ვაშლი” and you have to type in “apple”; the next day it might say “ვჭამ _____ a. წიგნი b. ვაშლი c. სახლი d. ბანკი” and you have to select b. Writing a program that can manage that is an interesting and surprisingly complex endeavor – for example, in order to ask a question like the above, the word database would have to have some kind of list of nouns that are edible and nouns that are not edible, to draw random correct and incorrect answers from. And that’s just for the verb “I eat” – you’d also need to know nouns that you can read, write, watch, play, etc.
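To make the idea concrete, here is a minimal sketch of a fill-in-the-blank question generator. Everything in it is an assumption for illustration: the tiny word list, the category names, the `VERB_FRAMES` table, and the `make_question` function are hypothetical, and a real word database would need far more entries and categories.

```python
import random

# Hypothetical toy data; a real database would be much larger.
NOUNS = {"ვაშლი": "apple", "წიგნი": "book", "სახლი": "house", "ბანკი": "bank"}
CATEGORIES = {"edible": {"ვაშლი"}, "readable": {"წიგნი"}}
VERB_FRAMES = {"ვჭამ": "edible"}  # "I eat" expects an edible noun


def make_question(verb, rng, n_options=4):
    """Build one multiple-choice question with a single correct answer."""
    category = VERB_FRAMES[verb]
    correct_pool = sorted(CATEGORIES[category])
    wrong_pool = sorted(set(NOUNS) - CATEGORIES[category])
    answer = rng.choice(correct_pool)
    # Draw the incorrect options only from nouns outside the verb's category.
    options = rng.sample(wrong_pool, n_options - 1) + [answer]
    rng.shuffle(options)
    return {"prompt": f"{verb} _____", "options": options, "answer": answer}


question = make_question("ვჭამ", random.Random())
print(question["prompt"])
for letter, option in zip("abcd", question["options"]):
    print(f"{letter}. {option}")
```

The order of the options changes from run to run, which is the point: the same underlying fact gets asked in a different shape each time.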
So that’s basically a gross oversimplification of the problem – and I have several different sorts of approaches in mind in terms of creating educational software for Georgian – but I figured that before I solved that problem, I’d solve another – the lack of any reliable database of Georgian words and their English translations.
In the last few weeks I’ve been working on two problems: one, the problem of extracting Georgian/English word matches from the various available electronic sources – from .pdf dictionaries to the enggeo Android app to possibly querying or scraping the various online translation resources and dictionaries; and two, the problem of making that data useful for humans and machines who want to use it.
The first problem is relatively simple, but tedious – it requires that I manually write a different parser for every data set I get, that I look up website APIs and do a bunch of other very time-consuming processing and hoop-jumping – and so far, I’ve basically got one and a half sources in usable format.
The second problem is manifold – I want to be able to automatically merge two sources, so that any disagreements about meaning are reported for human checking, and any agreements about meaning are scored in some way to measure the reliability of each translation. I want to figure out how to add morphological information to the words I have – for nouns, I want all their case forms, for verbs, I want all their conjugations and arguments – and I want to see if I can do that by cross-referencing with Georgian texts (which would involve writing some kind of parser and collecting some kind of corpus) or if I have to gather a team of humans to input various morphological data. If it’s the latter, I want to write an interface to make it very easy for a human to see what information is missing, and to input that information.
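The merging step could be sketched roughly like this. It is an illustration under stated assumptions, not a finished design: each source is assumed to be a dictionary mapping a Georgian word to a set of English glosses, the score is simply the number of sources that agree on a gloss, and a word counts as a disagreement when both sources list it but share no gloss. The function names and the sample entries (including the deliberately conflicting one) are made up.

```python
def normalize(glosses):
    """Lowercase and trim glosses so trivial differences do not count."""
    return {g.strip().lower() for g in glosses}


def merge_two(source_a, source_b):
    """Merge two word lists; score agreements, collect disagreements."""
    merged, conflicts = {}, {}
    for word in sorted(set(source_a) | set(source_b)):
        a = normalize(source_a.get(word, ()))
        b = normalize(source_b.get(word, ()))
        # Score = how many of the two sources list this gloss (1 or 2).
        merged[word] = {g: (g in a) + (g in b) for g in sorted(a | b)}
        # Both sources know the word but share no gloss: flag for a human.
        if a and b and not (a & b):
            conflicts[word] = {"a": sorted(a), "b": sorted(b)}
    return merged, conflicts


# Hypothetical sample sources.
source_a = {"ვაშლი": {"apple"}, "სახლი": {"house", "home"}, "ბანკი": {"bank"}}
source_b = {"ვაშლი": {"Apple"}, "სახლი": {"house"}, "ბანკი": {"shore"}, "წიგნი": {"book"}}

merged, conflicts = merge_two(source_a, source_b)
print(merged)
print(conflicts)
```

A word found in only one source gets a score of 1 and is not flagged, since there is nothing to disagree with yet; whether that is the right policy is a design choice.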
So I’ve been spending a lot of time working on all of these issues. I’ve spent hours over the last two weeks learning to program a Windows UI so I can make the interface I mentioned above. As it turns out, Windows programming is nowhere near as scary as I thought, and I think in about two weeks I’ll roll out a simple English-Georgian/Georgian-English dictionary (for Windows… sorry Mac people) that can automatically decline any Georgian noun, just as a sort of demonstration/proof of concept so that I can convince people who might be interested in this project that I’m reasonably serious and reasonably competent about this whole matter.
I intend to spend most of my winter break (I have a month off!) working on these (and other) applications, and hopefully by the end of January I will have really substantial progress to report. Luckily, I really enjoy programming, and learning new things, and so this whole venture is really fun for me in addition to being productive in terms of my continuing struggle to master the Georgian language.