Tuesday, April 15, 2008

How to Start an Open-Source Project? And Why?

Why?

For the last 10 days or so I've been working on TinyTM (http://tinytm.sourceforge.net/). We've decided to launch this new open-source project to cover an important shortcoming of ]project-open[ in the translation sector. Most "competing" applications in this sector provide an integration with a "translation memory" -- the most important translation tool. We don't.

What?

Translation memories are a kind of "light" natural language processing tool. They record the sentences that a translator translates and recall them later when a similar sentence appears. That is not as hard as, say, real machine translation, but it's still tricky enough to take a long time to work out.
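
To make the "record and remember" idea concrete, here is a minimal sketch in Python. It is not TinyTM's code: the class name, the 0.7 threshold and the use of Python's difflib similarity ratio are all assumptions chosen for illustration, and a real TM would use a more careful matching scheme.

```python
from difflib import SequenceMatcher


class TinyMemory:
    """Toy translation memory (hypothetical, for illustration only)."""

    def __init__(self):
        # Each entry is a (source_sentence, translation) pair.
        self.segments = []

    def record(self, source, target):
        """Remember a sentence together with its translation."""
        self.segments.append((source, target))

    def lookup(self, sentence, threshold=0.7):
        """Return (score, source, target) for the most similar stored
        sentence, or None if nothing reaches the threshold."""
        best = None
        for source, target in self.segments:
            # ratio() is 1.0 for identical strings and falls towards 0.0
            # as the two strings share less text.
            score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, source, target)
        return best


memory = TinyMemory()
memory.record("The file could not be opened.",
              "Die Datei konnte nicht geöffnet werden.")
memory.record("Save your changes before closing.",
              "Speichern Sie Ihre Änderungen vor dem Schließen.")

exact = memory.lookup("The file could not be opened.")   # identical sentence
fuzzy = memory.lookup("The file could not be saved.")    # similar sentence
miss = memory.lookup("Completely unrelated text about the weather.")
```

The identical sentence comes back with a score of 1.0, the similar one comes back with a lower score plus the old translation as a starting point, and the unrelated sentence returns nothing.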

The "Competition"

There is already one OSS translation memory, but it's written in Java; this makes it relatively difficult for linguists, many of whom are not technologically proficient, to install. It also doesn't support Microsoft Word, which is clearly a vital tool for working linguists. In short, it doesn't meet its current users' needs.

The "Plan"

So the idea is to offload most of the work to the OSS community. Will that work? There seems to be a big "market" for it (if that's the appropriate word in the OSS world); many users have requested a better translation tool, so it's certainly possible.

Allies

I've been in contact with FOLT, an industry association trying to set up an open-source TM system themselves. I was really keen to join their team until I heard that they've been discussing a TM system for about the last three years. Also, they didn't seem to grasp the open-source "publish early and frequently" approach. So that one didn't work out ...

I've already heard about other companies asking Common Sense Advisory about a TM system. They apparently even asked CSA to take the leadership. But I wasn't able to find out who they were. Maybe it's just easier to publish some code in order to get their attention? Maybe that's the open-source answer to business development: Just publish your code.

Another group of potential allies are small TM vendors. These small companies currently have no chance against the market leader (SDL Trados), which has an 80-90% market share. But if TinyTM becomes successful, these small vendors might be able to "ride the open-source tiger." Since we OSS guys are usually not very good at creating user interfaces for non-techies, these guys might be able to sell their polished front ends as a kind of "enterprise version" of TinyTM if they can connect to the TinyTM back end. That's not 100% the open-source idea, but a nice commercial front end might convince the "early majority" to adopt TinyTM once it starts taking off. For this reason I've used the LGPL for the TinyTM protocol and the interface code.

The "Market"

So what are the "success criteria" for TinyTM to fly? There are some 900,000 people in the world spending their days translating texts, according to a post from Common Sense Advisory. But most of them are linguists, and linguists tend to be IT-averse, meaning they don't make very productive open-source community members.

Success Criteria

So we'd need to reach the 1%-2% of linguists who are "open" to open source and provide them with a fun environment where they can try out and extend the TM:
  • "Recruit" enough skilled developers to extend/finish TinyTM
    • Spread the word to reach as many of them as possible
    • Try to lower the barrier to involvement
    • Think of ways to make involvement a fun experience

  • Get some donations to push the project

  • Reach out to innovators and early adopters in translation companies who could start using TinyTM while it's still in its early phases.

Some Online Community Theory

Philip Greenspun, an impressive entrepreneur, once said: "Communities need killer content to attract users." So is a TM killer content? I believe that is actually true for a large number of people, because TMs still cost several hundred dollars per license.

Code!

So I've pulled together my basic knowledge of natural language processing and written a first version of the software, trying to simplify the architecture as much as possible while keeping everything modular. Web services sounded nice, but I actually decided to start with a plain old relational database and PL/SQL as the language to implement remote procedure calls.

This approach is very simple to start with, it doesn't require all the web-application XML stuff, and it's easy to wrap the PL/SQL calls in XML later. So: Quick wins without sacrificing anything about the future -- that seems to fit.
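
As a rough illustration of that layering, here is a minimal Python sketch with an in-memory SQLite database standing in for the real database server. The table layout, the function names and the XML element names are all hypothetical and not the TinyTM schema or protocol. The point is only that the lookup lives in a plain database call, and an XML wrapper can be added on top later without touching it.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_memory():
    """Create an in-memory database with a hypothetical segments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments ("
        " source_lang TEXT, target_lang TEXT,"
        " source_text TEXT, target_text TEXT)"
    )
    return conn


def add_segment(conn, source_lang, target_lang, source_text, target_text):
    """Store one translated sentence."""
    conn.execute(
        "INSERT INTO segments VALUES (?, ?, ?, ?)",
        (source_lang, target_lang, source_text, target_text),
    )
    conn.commit()


def get_exact_matches(conn, source_lang, target_lang, source_text):
    """Plain database call: stands in for a stored procedure on the server."""
    rows = conn.execute(
        "SELECT target_text FROM segments"
        " WHERE source_lang = ? AND target_lang = ? AND source_text = ?",
        (source_lang, target_lang, source_text),
    ).fetchall()
    return [row[0] for row in rows]


def matches_as_xml(translations):
    """Optional XML layer added later, without changing the lookup itself."""
    root = ET.Element("matches")
    for text in translations:
        ET.SubElement(root, "match").text = text
    return ET.tostring(root, encoding="unicode")


conn = open_memory()
add_segment(conn, "en", "de", "Hello world.", "Hallo Welt.")
found = get_exact_matches(conn, "en", "de", "Hello world.")
wrapped = matches_as_xml(found)
```

A client that talks to the database directly would only ever call something like get_exact_matches; the XML wrapper is a separate layer for clients that need it.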

On the client side I decided to go with something very unconventional for an open-source project: VBA. Yes, Visual Basic for Applications. The advantage: MS Word, the #1 translation environment (reach the 900k users!), contains an ODBC driver, and already comes with the "editor."

Spread the Word

So I set up http://tinytm.sourceforge.net/ with the following goals:
  • Attract the maximum number of developers
  • Attract some companies as sponsors
  • Convince the "suits" that the project is serious business
For Thursday (17th of April, 2008) I've prepared a press release and a list of magazines and other potential multipliers. I've sent the press release to a few contacts for feedback, and it seems to work, more or less.

So let's see if it works out.
Please let me know!

Frank
