How to translate Tkabber to a new language

Submitted by kostix on Sat, 01/26/2008 - 16:54

Creating a new translation is easy. Here's how to do it:

Getting the source

First, either check out the tip of the repository's trunk (much better) or get a "release candidate" version.

The former choice is better because Tkabber's textual resources change from time to time (usually new resources are added), so the best way to support a translation is to track the repository commits (via the "tkabber-dev" mailing list, for instance) and prepare patches with translation updates when string resources change.

What to translate

Tkabber has several parts to be translated: the "core" (distributed as tkabber-VER.tar.gz) and the standard external plugins (distributed as tkabber-plugins-VER.tar.gz).

Each "entity" that can be translated is identified by the presence of a subdirectory named "msgs" in its directory. The core has such a subdirectory, and so do most plugins.

This "msgs" directory contains a set of files with translation statements — one file per language: "de.msg" for German, "es.msg" for Spanish and so on. They are called message catalogs.

It should now be clear that each entity is translated separately from the others.

There are also several ".rc" files in the "msgs" directory of the core. Those are translated BWidget resources: generic strings used in the GUI components of the BWidget package, which Tkabber uses to implement many parts of its user interface (standard dialogs, for example).

How to translate message catalogs

To help with translation, Tkabber offers a small utility called "extract.tcl", located under the "contrib/extract_translations" folder in Tkabber's directory. This utility always writes the text it generates to stdout.

To illustrate its usage, here are the steps to translate Tkabber's core to Czech:

Open a shell and change the current directory to Tkabber's root folder (that's where "tkabber.tcl" and "msgs" are located).
Run "extract.tcl" to get translatable strings:
```
tclsh contrib/extract_translations/extract.tcl -lang cz . >msgs/cz.msg
```
Which means: «process all files matching "*.tcl", recursively, starting from the current directory (".") and redirect the output to the file "msgs/cz.msg"».

Note that this will overwrite the file "msgs/cz.msg" if it exists, so think twice before you do this: a mistakenly issued command like this can easily wipe out any changes you have made to that file.
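One way to guard against this kind of accident is to wrap the extraction in a small shell function that refuses to touch an existing catalog. The sketch below is only an illustration: the function name "new_catalog" is made up for this example and is not part of Tkabber, and it assumes you run it from Tkabber's root folder.

```shell
# new_catalog LANG
# Hypothetical helper (not part of Tkabber): creates msgs/LANG.msg with
# extract.tcl, but only if that file does not exist yet.
new_catalog() {
    target="msgs/$1.msg"
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    # Same command as shown above, with the language code parameterized.
    tclsh contrib/extract_translations/extract.tcl -lang "$1" . >"$target"
}
```

Calling "new_catalog cz" then behaves like the command above, except that it stops with an error when "msgs/cz.msg" is already there.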
Once the command finishes, you will have a brand new message catalog in place. Its contents look like this:
```
...
::msgcat::mcset cz "continue processing rules"
::msgcat::mcset cz "forward message to"
::msgcat::mcset cz "my status is"
::msgcat::mcset cz "reply with"
::msgcat::mcset cz "store this message offline"
...
```
These are actually just calls to a Tcl command ::msgcat::mcset that will be executed when this file is sourced by Tkabber. ::msgcat::mcset accepts three arguments:
1. The language code.
2. The source (reference) string.
3. The string containing a translation for its source string for the given language.
As you can see, the last argument to ::msgcat::mcset is absent in each call, so your task is to append translated strings to them.

Note: Actually, the last argument to ::msgcat::mcset (the translated string) is optional and without it this command will be effectively a no-op. This is particularly useful for incremental translation: you can translate strings randomly, one-by-one and evaluate the intermediate results by just running Tkabber with the incomplete message catalog.
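When translating incrementally it can help to see how much work is left. The following is a rough sketch, not a Tkabber tool: the function names are invented for this example, and the pattern assumes that each untranslated call sits on a single line and that the source string contains no escaped double quotes (such lines are simply not detected).

```shell
# list_untranslated FILE
# Hypothetical helper: prints the mcset lines that still have only two
# arguments (language code and source string), i.e. no translation yet.
list_untranslated() {
    grep -E '^::msgcat::mcset [^ ]+ "[^"]*"[[:space:]]*$' "$1"
}

# count_untranslated FILE
# Prints the number of such lines.
count_untranslated() {
    list_untranslated "$1" | wc -l | tr -d ' '
}
```

For example, "count_untranslated msgs/cz.msg" prints the number of entries that are still waiting for a translation.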
Before editing the message catalog, decide how you are going to edit it, keeping in mind that the file must be encoded in UTF-8 without a byte order mark.

Thus, to edit this file you have two options:
- Use an editor understanding UTF-8. Vim and Emacs will be OK. On Windows 2000 and above you can use Notepad, but see below for a description of its byte order mark issue.
- Make a private copy of the message catalog and edit it using some non-Unicode charset appropriate for your language, then overwrite the target message catalog with a re-encoded version using some charset/encoding translation tool like iconv or recode.
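The second option could look roughly like the sketch below, which uses iconv. The helper name "to_utf8" and the file name "cz.msg.latin2" mentioned afterwards are assumptions made for this example, as is the choice of ISO-8859-2 as the working charset; substitute whatever charset you actually edited the private copy in.

```shell
# to_utf8 CHARSET INFILE OUTFILE
# Hypothetical helper: re-encodes INFILE from CHARSET to UTF-8 and writes
# the result to OUTFILE (which is overwritten).
to_utf8() {
    iconv -f "$1" -t UTF-8 "$2" >"$3"
}
```

With this in place, "to_utf8 ISO-8859-2 cz.msg.latin2 msgs/cz.msg" would replace the target catalog with a UTF-8 version of the private copy.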
Adding translations is just a matter of turning the strings like
```
::msgcat::mcset cz "continue processing rules"
```
into something like
```
::msgcat::mcset cz "continue processing rules" "my translation in Czech"
```
You can break long lines using the standard Tcl rules for grouping characters into "Tcl words" (described in the Tcl(n) manual page). For instance, the translation string in
```
::msgcat::mcset cz "continue processing rules" "my translation\
    of a very long\
 string to Czech"
```
will end up being interpreted by Tcl as
```
my translation of a very long string to Czech
```
In other words, each occurrence of "backslash + newline + any number of whitespace characters, including none" is replaced by a single ASCII space character.

The backslash+newline can also be used between the arguments to ::msgcat::mcset like this:
```
::msgcat::mcset cz \
   "continue processing rules" \
   "my translation of a very long string to Czech"
```

How to translate BWidget resources

Note that BWidget already has several built-in translations of its resources: Danish (da), German (de), Spanish (es) and French (fr). There's also a default (reference) file for English (en).

If your target language isn't in the set listed above, and it's not among the ".rc" files in the core "msgs" directory, follow these steps:

Either take any of the already present ".rc" files, or download BWidget sources from the Tcllib project site and look into the "lang" folder — it contains several resource files including the "reference" file "en.rc".

Copy the chosen file to make your new resource file under the core "msgs" directory. Like with message catalogs, the file must be named after the target language code, say, "msgs/cz.rc" for Czech.
Edit the file, using the rules described below.

BWidget resource files map entries from the "Tk widget options database" to their respective values, for example, "en.rc" contains these lines among the others:

*okName:      &OK
*cancelName:  &Cancel
*yesName:     &Yes
*noName:      &No

Your task as a translator is to change the values (the text to the right of ":") to their translated equivalents, like this (for Russian):

*okName:      &OK
*cancelName:  От&мена
*yesName:     &Да
*noName:      &Нет

Resource files are read by the Tcl interpreter so the values may be enclosed in double quotes, if needed.

The file is in UTF-8, so everything said above about handling the encoding when translating message catalogs applies here too.

Each "&" character causes the character that follows it to be underlined when rendered on a widget.

Following the trunk updates

To make your translation really helpful, you should follow the main line of Tkabber development by tracking the changes made to the repository trunk and sending back patches with translation updates whenever string resources are changed, deleted or added. This way you will keep your translation up to date.

As was just mentioned, string resources may be deleted, added and changed. "Changed" can be considered as deleted + added, so in fact we deal just with deletions and additions. extract.tcl provides methods to track them.

To find out what was added and deleted, extract.tcl builds a list of the available string resources and then compares it with the string resources present in the specified message catalog. The results, as before, go to stdout.

To see what was added, run

tclsh contrib/extract_translations/extract.tcl . msgs/cz.msg >>msgs/cz.msg

This will append new untranslated string resources to the end of the message catalog.

Note the ">>" redirection of the standard output; never use ">" in this case, as it would overwrite the file. If you're not familiar with these concepts, it is safer to redirect the output to a temporary file using ">" and then use your text editor to add the contents of that file to your message catalog.

To see what was deleted, run

tclsh contrib/extract_translations/extract.tcl -unused . msgs/cz.msg

This will show you the strings that are present in the message catalog but are now absent from Tkabber. Find them in the message catalog and delete them.

Note that the "-lang" option isn't needed for these cases.

Fossil makes it easy to create patches from your changes; see "Preparing a patch out of your work: an easy way" below.

Tips and tricks

Testing translated resources

Tkabber selects the message catalog to load based on the current locale, so when you're preparing a message catalog for a language different from your locale, this little trick will help: make Tkabber see a modified "LC_MESSAGES" environment variable. For bash-like shells, it's just a matter of running Tkabber like this:

$ LC_MESSAGES=cz tkabber

(provided you have a shell script named "tkabber" somewhere in your path) or

$ LC_MESSAGES=cz wish /path/to/tkabber.tcl -name Tkabber

if you don't.

For more limited shells, separate steps may be needed to adjust the environment so that spawned processes see the change.

Another way to make Tkabber pick your message catalog is to put a call to ::msgcat::mclocale into Tkabber's config file, like this:

::msgcat::mclocale cz

Preparing a patch out of your work: an easy way

If you translate a copy of Tkabber checked out of its Fossil repository, you can make use of the fact that Fossil keeps "pristine" copies of each checked out file.

This means that when you're done with your work and want to send your changes to the maintainer, all you need is to invoke

cd /path/to/tkabber/root
fossil diff >/tmp/cz.msg.diff

or something like this to get the patch ready for submission.

You can take the same approach for other translatable entities.

Of course, this method only makes sense for updating a translation; if you have just created a new translation, it may be better to bundle the files you created and send them to the maintainer.

Note that you can even change a "live" working directory in which you keep the up-to-date checkout of Tkabber if you're using the bleeding edge version. There is no need to copy this directory to a temporary location just to make your translation updates and generate a patch. You can make your changes in place; just perform

cd /path/to/tkabber/root
fossil revert

after you have finished your changes and generated a patch from them. This will revert the working directory to the state it was in when you started editing.

Miscellaneous notes

Notepad and UTF-8 byte order marks

Be aware that on each save of a UTF-8 file, Windows Notepad prepends the so-called "UTF-8 byte order mark" (BOM) to the contents of the file. The BOM is exactly three bytes: 0xEF, 0xBB, 0xBF. Unfortunately, Tcl can't deal with BOMs, so you'll end up with an unloadable message catalog (or BWidget resource file).

To fix the issue, open the just saved file with some dumb text editor (the venerable built-in "edit" will be just OK), delete the first three bytes, save and exit.
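If a Unix-like shell is available, the same three bytes can also be removed without an editor. The following is a minimal sketch under that assumption; "strip_bom" is a name invented for this example, and it relies on the common "head -c" and "tail -c" utilities.

```shell
# strip_bom FILE
# Hypothetical helper: removes a leading UTF-8 byte order mark
# (bytes EF BB BF) from FILE, and leaves the file alone otherwise.
strip_bom() {
    # Read the first three bytes as lowercase hex, e.g. "efbbbf".
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$first" = "efbbbf" ]; then
        # tail -c +4 outputs everything starting at the fourth byte.
        tail -c +4 "$1" >"$1.nobom" && mv "$1.nobom" "$1"
    fi
}
```

Running "strip_bom msgs/cz.msg" after each Notepad save would then keep the catalog loadable; running it on a file that has no BOM changes nothing.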