mirror of
https://salsa.debian.org/freeipa-team/freeipa.git
synced 2025-01-03 20:56:58 -06:00
1b4eab0411
This patch reverts the use of pygettext for i18n string extraction. It was originally introduced because the help documentation for commands is in the class docstring and module docstring.

Docstrings are a Python construct whereby any string which immediately follows a class declaration, function/method declaration or appears first in a module is taken to be the documentation for that object. Python automatically assigns that string to the __doc__ variable associated with the object. Explicitly assigning to the __doc__ variable is equivalent and permitted.

We mark strings in the source for i18n translation by embedding them in _() or ngettext(). Specialized extraction tools (e.g. xgettext) scan the source code looking for strings with those markers and extract the strings for inclusion in a translation catalog.

It was mistakenly assumed one could not mark Python docstrings for translation. Since some docstrings are vital for our command help system, some method had to be devised to extract docstrings for the translation catalog. pygettext has the ability to locate and extract docstrings, and it was introduced to acquire the documentation for our commands located in module and class docstrings.

However, pygettext was too large a hammer for this task; it lacked any fine-grained ability to extract only the docstrings we were interested in. In practice it extracted EVERY docstring in each file it was presented with. This caused a large number of strings to be extracted for translation which had no reason to be translated; the string might have been internal code documentation never meant to be seen by users. Often the superfluous docstrings were long, complex and likely difficult to translate. This placed an unnecessary burden on our volunteer translators.

Instead, what is needed is some method to extract only those strings intended for translation. We already have such a mechanism and it is already widely used, namely wrapping strings intended for translation in calls to _() or ngettext(), i.e. marking a string for i18n translation. Thus the solution to the docstring translation problem is to mark the docstrings exactly as we have been doing; it only requires that instead of a bare Python docstring we assign the marked string to the __doc__ variable. Using the hypothetical class foo as an example:

    class foo(Command):
        '''
        The foo command takes out the garbage.
        '''

Would become:

    class foo(Command):
        __doc__ = _('The foo command takes out the garbage.')

But which docstrings need to be marked for translation? The makeapi tool knows how to iterate over every command in our public API. It was extended to validate every command's documentation and report if any documentation is missing or not marked for translation. That information was then used to identify each docstring in the code which needed to be transformed.

In summary what this patch does is:

* Remove the use of pygettext (modification to install/po/Makefile.in).

* Replace every docstring with an explicit assignment to __doc__ where the rhs of the assignment is an i18n marking function.

* Single line docstrings appearing in multi-line string literals (e.g. ''' or """) were replaced with single line string literals because the multi-line literals were introducing unnecessary whitespace and newlines in the string extracted for translation. For example:

      '''
      The foo command takes out the garbage.
      '''

  Would appear in the translation catalog as:

      "\n    The foo command takes out the garbage.\n    "

  The superfluous whitespace and newlines are confusing to translators and require us to strip leading and trailing whitespace from the translation at run time.

* Import statements were moved from below the docstring to above it. This was necessary because the i18n markers are imported functions and must be available before the doc is parsed. Technically only the import of the i18n markers had to appear before the doc, but stylistically it's better to keep all the imports together.

* It was observed during the docstring editing process that the command documentation was inconsistent with respect to the use of periods to terminate a sentence. Some doc had a trailing period, others didn't. Consistency was enforced by adding a period to the end of every docstring if one was missing.
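The difference between a bare multi-line docstring and an explicit __doc__ assignment can be checked with a small, self-contained sketch. This is an illustration only: `_` is bound to the standard library's `gettext.gettext` as a stand-in for the project's own marker, and `Command`, `foo_bare` and `foo_marked` are hypothetical names, not classes from the code base.

```python
import gettext

# Assumption: gettext.gettext stands in for the project's i18n marker.
# With no matching catalog it returns its argument unchanged.
_ = gettext.gettext


class Command:
    # Hypothetical stand-in for the real command base class.
    pass


class foo_bare(Command):
    '''
    The foo command takes out the garbage.
    '''


class foo_marked(Command):
    # Explicit assignment: the marked string becomes the class documentation.
    __doc__ = _('The foo command takes out the garbage.')


# The bare docstring keeps the newlines around the text (and, depending on
# the Python version, the indentation as well), so it differs from the
# single line literal until it is stripped.
bare_doc = foo_bare.__doc__
marked_doc = foo_marked.__doc__

print(repr(bare_doc))
print(repr(marked_doc))
```

Because the marked form is an ordinary _() call, an extractor that only looks for markers picks it up and ignores every unmarked docstring in the same file.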
Files in this directory:

as.po
bn_IN.po
contributing_translators.txt
de.po
el.po
es.po
fa.po
fr.po
gu.po
he.po
id.po
ipa.pot
it.po
ja_JP.po
ja.po
kn.po
ko.po
LINGUAS
Makefile.in
nl.po
pl.po
pt_BR.po
pt.po
pygettext.py
README
ru.po
sv.po
test_i18n.py
uk.po
zh_CN.po
zh_TW.po

Q: I've added a new source file, how do I make sure its strings get translated?
A: Edit Makefile.in and add the source file to the appropriate *_POTFILES list. Then run "make update-po".

   NOTE: This is now only necessary for Python files that lack the .py extension. All .py, .c and .h files are automatically sourced.

Q: How do I pick up new strings to translate from the source files after the source files have been modified?
A: make update-po

   This regenerates the pot template file by scanning all the source files. Then the new strings are merged into each .po file from the new pot file.

Q: How do I just regenerate the pot template file without regenerating all the .po files?
A: make update-pot

Q: How do I add a new language for translation?
A: Edit the LINGUAS file and add the new language. Then run "make create-po". This will generate a new .po file for each language which doesn't have one yet. Be sure to add the new .po file(s) to the source code repository. For certain languages, you may have to edit the Plurals line. See:

   http://www.gnu.org/software/hello/manual/gettext/Plural-forms.html

   However, if this line is wrong, it is often an indicator that the locale value is incorrect. For example, using 'jp' for Japanese instead of 'ja' will result in an invalid Plurals line.

Q: What files must be under source code control?
A: The files Makefile.in and LINGUAS control the build; they must be in the SCM. The *.pot and *.po files are used by translators; they must be in SCM so the translator can check out a .po file, add the translations, and then check the .po file back in.

   Be careful, .po files may be automatically updated when the source files change (or when the .pot changes; usually the .pot file changes only as a result of rescanning the source files). This means a .po file might be automatically updated while a translator has the file out for editing; all the caveats about SCM merging apply.

Q: Which files are automatically generated and thus do not need to be in SCM?
A: The *.mo files are automatically generated on demand from their corresponding .po file.

Q: What role does the .pot file play?
A: The .pot file is called a template file. It is generated by scanning all the source files (e.g. *.py *.c *.h) in the project using xgettext. xgettext locates every translatable string (e.g. strings marked with _()) and adds that string along with metadata about its location to the .pot file. Thus the .pot file is a collection of every translatable string in the project. If you edit a source file and add a translatable string you will have to regenerate the .pot file in order to pick up the new string.

Q: What is the relationship between a .po file and the .pot file?
A: A .po file contains the translations for a particular language. It derives from the .pot file. When the .pot file is updated with new strings to translate, each .po will merge the new strings in. The .po file is where translators work, providing translations for their language. Thus it's important that the .po not be recreated from scratch and that it is kept in SCM, otherwise the translators' work will be lost.

   Let's use an example for French; its .po file will be fr.po.

   1) Developer creates main.c with one translatable string _("Begin").

   2) Produce the .pot file by running xgettext on main.c.

   3) .pot file contains one msgid, "Begin".

   4) fr.po is created from the .pot file; it also contains one msgid, "Begin".

   5) Translator edits fr.po and provides the French translation of "Begin".

   6) Developer adds new translatable string _("End") to main.c.

   7) Generate a new .pot file by running xgettext on main.c.

   8) .pot file contains two msgids, "Begin" and "End".

   9) fr.po is missing the new msgid in the .pot file, so the .pot is merged into fr.po by running msgmerge. This copies into fr.po the new "End" msgid but preserves the existing translations in fr.po (e.g. "Begin"). The fr.po will now have 2 msgids, one which is translated already (e.g. "Begin") and one that is untranslated (e.g. "End").

   10) Sometime later the French translator comes back to see if he/she needs to add more translations to fr.po. They see there is a missing translation, they check fr.po out from SCM, add the missing translation, and then check fr.po back into SCM.

   This means at any given moment the set of .po files will have varying degrees of translation completeness. Because the .po files are merged when the source code files are updated, existing translations are not lost. It also means a .po file which was fully translated may need new translations after a .pot update. It is permissible to have incomplete translations in a message catalog: at run time, if a translation for a particular string is available in the message catalog, the user will be presented with the string in their language. However, if the string is not yet translated in the .po file then they just get the original string (typically in English).

Q: What are .mo files?
A: .mo files are the content of a .po file but in "machine" format for fast run time access (mo = Machine Object, po = Portable Object). .mo files are what gets installed along with the package. Think of a .po as a source file which is compiled into an object file for run time use.

Q: Why don't we use gettextize and autopoint?
A: Because the framework they produce is too limited. Specifically there is no way to pass the source language to xgettext when it scans a file. xgettext only knows how to automatically determine the language from the source file's extension. However we have many files without extensions, thus we have to group all Python (et al.) files together and run xgettext on every file *we* know to be Python (because xgettext can't figure this out itself if there is no file extension).

   There is another added benefit of avoiding gettextize and autopoint: simplicity. Managing translations is complex and hard enough as it is; gettextize and autopoint add another whole layer of complexity which just further obscures things.
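
The point about forcing the source language can be illustrated with a short sketch. It only builds an xgettext command line and does not run it. The helper name `xgettext_command` and the file name `tools/example-script` are hypothetical, and the real Makefile.in may pass different options; `--language`, `--from-code`, `--keyword` and `--output` are standard xgettext options.

```python
def xgettext_command(language, sources, pot_file):
    """Build an xgettext command line with the source language forced.

    Hypothetical helper for illustration; not taken from Makefile.in.
    """
    return [
        "xgettext",
        "--language=" + language,  # do not rely on the file extension
        "--from-code=UTF-8",       # assumption: sources are UTF-8 encoded
        "--keyword=_",             # treat _() as a translation marker
        "--output=" + pot_file,
        *sources,
    ]


# Hypothetical extensionless Python script that xgettext could not
# classify on its own.
cmd = xgettext_command("Python", ["tools/example-script"], "ipa.pot")
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run() on a machine where xgettext is installed. Grouping files by language and issuing one such command per group is the approach the answer above describes.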

Q: Who created the awful mess and who do I ask when things don't work as I expect or I have further questions?
A: John Dennis <jdennis@redhat.com>
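
As a closing illustration of the French "Begin"/"End" walkthrough in the .po/.pot answer above, here is a toy model of the merge and of the run time fallback. It is a deliberately simplified sketch: catalogs are plain dictionaries, `merge_template` and `translate` are hypothetical names, and real msgmerge does considerably more (fuzzy matching, obsolete entries, comments and headers).

```python
def merge_template(template_msgids, po_catalog):
    """Toy model of msgmerge.

    Existing translations are kept, new msgids are added with an empty
    translation, and msgids absent from the template are left out.
    """
    return {msgid: po_catalog.get(msgid, "") for msgid in template_msgids}


def translate(po_catalog, msgid):
    """Return the translation, or the original string if none exists yet."""
    return po_catalog.get(msgid) or msgid


fr = merge_template(["Begin"], {})         # step 4: fr.po created from the .pot
fr["Begin"] = "Commencer"                  # step 5: translator adds French text
fr = merge_template(["Begin", "End"], fr)  # steps 7-9: new .pot merged in

print(translate(fr, "Begin"))  # translated string
print(translate(fr, "End"))    # falls back to the original string
```

The second lookup shows why an incomplete catalog is permissible: an untranslated msgid simply comes back as the original string.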