[[meta title="DEP-4: Translation packages in Debian (TDebs)"]]

    Date: March 2008
    Drivers: Neil Williams,
     Joerg Jaspert,
     Thomas Viehmann,
     Mark Hymers,
     Frank Lichtenheld
    URL: http://people.debian.org/~codehelp/tdeb/
    Abstract:
     This document provides an overview of the TDeb format, TDeb design
     and usage. This specification should be considered a work in progress.
     Online version: http://people.debian.org/~codehelp/tdeb
     Source SGML: /~codehelp/tdeb/Draft.sgml

[[!toc levels=2]]

---

# TDeb Specification

---

This is where the Draft TDeb Specification, created at the ftp-master/i18n meeting in Extremadura, will be developed and improved.

---

## Motivation

---

1. Updates to translations should not require source NMUs.
2. Translation data should not be distributed in architecture-dependent packages.
3. Translators should have a common interface for getting updates into Debian (possibly with automated TDeb generation after i18n team review).

## Version 0.0.2

## Copyright © 2008

* Neil Williams codehelp@debian.org,
* Joerg Jaspert joerg@debian.org,
* Thomas Viehmann tv@beamnet.de,
* Mark Hymers mhy@debian.org,
* Frank Lichtenheld djpig@debian.org,
* partially based on dpkg man pages, © by the original authors.

This document is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

For more details, on Debian GNU/Linux systems, see the file /usr/share/common-licenses/GPL-2 for the full license.

---

# Format of binary translation packages (tdeb)

## Summary

The tdeb binary package format is a variation of the deb binary package format. It has the same structure as deb, but the (single) data member is replaced by one bzip2-compressed member for each LOCALE_ROOT supported.

## Format specification

The file is an ar archive with a magic number of !<arch>.
The first member is named debian-binary and contains a series of lines, separated by newlines. Currently only one line is present, the format version number, 2.0 at the time of writing. Programs which read new-format archives should be prepared for the minor number to be increased and new lines to be present, and should ignore these if this is the case.

If the major number has changed, an incompatible change has been made and the program should stop. If it has not, then the program should be able to safely continue, unless it encounters an unexpected member in the archive (except at the end), as described below.

The second required member is named control.tar.bz2. It is a tar archive compressed with bzip2 which contains the package control information, as a series of plain files, of which the file control is mandatory and contains the core control information. The control tarball may optionally contain an entry for '.', the current directory.

The members following the control.tar.bz2 are named t.${LOCALE_ROOT}.tar.bz2. Each contains the filesystem archive for the locale root, as a tar archive compressed with bzip2. LOCALE_ROOT must match the regular expression [a-z]{2,3}.

These members must occur in this exact order. Current implementations should ignore any additional members after the t.${LOCALE_ROOT}.tar.bz2 members. Further members may be defined in the future, and (if possible) will be placed after these. Any additional members that may need to be inserted before t.${LOCALE_ROOT}.tar.bz2, and which should be safely ignored by older programs, will have names starting with an underscore, '_'.

New members which cannot be safely ignored will be inserted before the t.${LOCALE_ROOT}.tar.bz2 members with names starting with something other than underscores, or will (more likely) cause the major version number to be increased.
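
The member rules above can be illustrated with a short Python sketch. It is not part of any TDeb tool: the function names (ar_member_names, check_format_version, check_tdeb_members) are hypothetical, the archive is read using the common ar header layout, and the handling of unknown members is one possible reading of the rules (underscore members are skipped, any other unknown member before the first locale member is an error, and everything after the locale members is ignored).

```python
import re

AR_MAGIC = b"!<arch>\n"
# Data members are named t.${LOCALE_ROOT}.tar.bz2, LOCALE_ROOT matching [a-z]{2,3}.
LOCALE_MEMBER = re.compile(r"t\.([a-z]{2,3})\.tar\.bz2")


def ar_member_names(data):
    """Return the member names of an ar archive held in memory.

    Assumes the common ar layout: 60-byte member headers with the name in
    bytes 0-15 and the decimal size in bytes 48-57.
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    names = []
    pos = len(AR_MAGIC)
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Names are space padded; some ar writers also append a slash.
        names.append(header[:16].decode("ascii").rstrip(" ").rstrip("/"))
        size = int(header[48:58].decode("ascii").strip())
        pos += 60 + size + (size % 2)  # member data is padded to an even length
    return names


def check_format_version(text):
    """Accept any 2.x version from debian-binary; stop on another major number."""
    lines = text.splitlines()
    if not lines or lines[0].split(".")[0] != "2":
        raise ValueError("unsupported format version")
    return lines[0]


def check_tdeb_members(names):
    """Validate the member order and return the locale roots found."""
    if names[:2] != ["debian-binary", "control.tar.bz2"]:
        raise ValueError("debian-binary and control.tar.bz2 must come first")
    roots = []
    for name in names[2:]:
        match = LOCALE_MEMBER.fullmatch(name)
        if match:
            roots.append(match.group(1))
        elif name.startswith("_"):
            continue  # reserved for members that older programs may ignore
        elif roots:
            break  # additional members after the locale members are ignored
        else:
            raise ValueError("unexpected member before locale data: " + name)
    return roots
```

Under these assumptions, a caller would read the file as bytes, pass the result of ar_member_names() to check_tdeb_members(), and treat any ValueError as a reason to stop processing the archive.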

---

# Source format

## +t1.diff.gz

TDebs will use a source format for translation updates that will not cause any changes in the package binaries. The foo_1.2.3-4+t1.diff.gz will be created for changes made by translators, and tools will need to apply the translation diff after applying the .diff.gz prepared (and signed) by the Debian maintainer.

The +t[0-9] update will need to be built from the source package but only details changes in the translated content. No changes will be allowed in the package binaries or untranslated content.

Translation updates are source-package based and are denoted by the +t[0-9] suffix, where 0 is assumed to be the original upload by the Debian maintainer.

For example, for a non-native package foo with source version 1.2.3-4, the first TDeb update would be foo_1.2.3-4+t1, and the changes from -4 to -4+t1 would be in foo_1.2.3-4+t1.diff.gz. BinNMU versions are not affected, as the scheme is source based.

The +t1.diff.gz needs dpkg support, which is being implemented.

New translations and translation fixes are currently tracked in the BTS. Tdeb uploads shall be able to close those bugs. Using a changelog might be the easiest way. During the transition, those bugs will remain. After the transition, those bugs will go away, so there should be no need for a closure method. We'll need to rely on i18n.debian.org for translation tracking after Squeeze.

---

# TDeb contents

## What goes into a TDeb?

(With the exception of debconf templates, untranslated content remains in the original package.)

* Translations from upstream - /usr/share/locale/*/LC_MESSAGES/*.mo
* Other localisation files from upstream - /usr/share/locale/*/LC_*/*

Translated content, including:

* Translated manpages
* Translated info documents (if supported by info)
* Translated documentation, with the proviso that packages with large amounts of translated documentation and debconf templates would create two tdebs: one minimal tdeb for debconf and one for the rest.
* Debconf templates file. Not the config or other related scripts. The regular deb will need to contain an untranslated copy of the templates file, too. See "TDebs and Debconf" below.

# TDeb resources

## Packages and patches

The main changes to support TDebs will be concentrated in the archive tools and central packaging tools (dpkg, apt, debhelper).

Test packages are available via Emdebian:

* http://www.emdebian.org/toolchains/search.php?package=emdebian-tdeb&arch=&distro=unstable
* http://packages.debian.org/emdebian-tdeb
* http://buildd.emdebian.org/svn/browser/current/host/trunk/emdebian-tools/trunk/tdeb
* (SVN is regularly updated)

Patches for current tools are handled in repositories for the relevant tools:

* http://git.debian.org/?p=users/codehelp/debhelper.git;a=summary
* http://git.debian.org/?p=users/codehelp/dpkg.git;a=summary

# TDeb Architectures

## TDebs are architecture-independent

TDebs must only be used for Architecture-independent data. There will be NO support for Architecture-dependent TDebs outside Emdebian.

Any translation system that does not use gettext can choose to use TDebs as long as the translation files are architecture-independent.

# TDebs and LINGUAS

## Avoiding changes to the source package

Many packages using autotools use the LINGUAS support of gettext, but this requires changes within the source of the package: sometimes po/LINGUAS but more commonly configure.ac|in. Changing configure.ac and regenerating the autotools build system completely undermines the objective of TDebs being usable independently of maintainer uploads and NMUs.

Existing TDeb support ignores the LINGUAS method, therefore:

If a $lang.po file exists in a recognisable po directory (${top_srcdir}/po/ or ${top_srcdir}/po-*/), TDeb handlers will process that .po file even if it is not listed in LINGUAS. If the PO file is valid, the generated .mo file will be included in the TDeb. Packages will no longer be able to have unactivated or unused translations.
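
The lookup rule above can be sketched in a few lines of Python. This is only an illustration, not how dh_gentdeb is implemented: find_po_files is a hypothetical helper, it looks only at directories named po or po-* directly under the top source directory, and it deliberately never reads LINGUAS. Checking that each PO file is valid is left out of the sketch.

```python
from pathlib import Path


def find_po_files(top_srcdir):
    """Return sorted (language, path) pairs for each $lang.po in po/ or po-*/.

    LINGUAS is never consulted: every .po file found is a candidate.
    """
    top = Path(top_srcdir)
    po_dirs = sorted(
        entry for entry in top.iterdir()
        if entry.is_dir() and (entry.name == "po" or entry.name.startswith("po-"))
    )
    found = []
    for po_dir in po_dirs:
        for po_file in sorted(po_dir.glob("*.po")):
            # The language code is the file name without the .po suffix.
            found.append((po_file.stem, po_file))
    return found
```

For a tree containing po/de.po and po-doc/fr.po, the helper would report both files even if po/LINGUAS lists neither language.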
(This is a debhelper / other packaging tool implementation problem, not a dpkg one.)

As a result of this requirement, the debhelper tdeb tool (dh_gentdeb) handles finding the translations, preparing the binary translation files and moving the translations to suitable directories within the package build.

## TDebs and binary packages

The filesystem contents of TDebs and their associated binary packages must be mutually exclusive, so that dpkg doesn't need any special replace handling. We will still need some Replaces for the transition, but that can be handled like any other Replaces.

## Migrating packages to TDeb support

Maintainers will need to make a variety of changes to support TDebs:

* Replaces: add the recommended $src-tdeb package name with Replaces: $binaries (<< $srcversion), where $srcversion is a fixed string for the version prior to TDebs, e.g. Replaces: apt (<< 0.7.19), apt-utils (<< 0.7.19)
* Remove translated content from all *.install files in debian/
* Remove any lines in debian/rules that handle translated content
* Ensure that dh_gentdeb is called in debian/rules (CDBS will be patched to implement this support automatically).

---

# Resolution of corner cases

## TDeb documentation duplication

Basing the TDeb on the source package means that the TDeb could include large amounts of translated documentation. This results in a corner case where a package with debconf templates and a large amount of translated documentation would result in the docs being installed merely to obtain the translated templates.

In order to resolve this, each source package may have one or more tdebs. If a source package has translations, it must have a tdeb named after the source package (suffixed with -tdeb) and all debconf templates must be placed in it. Such a package should place all architecture-independent documentation (even in the native language) into a tdeb.
If a package contains documentation which is not always required (for example API documentation or user documentation), the source package may provide additional ${source}-${foo}-tdeb_$version_all.tdeb files. If tdebs are revised by the translation teams, the suffix +t[0-9]+ must be used and all tdebs for the source package must be revised at the same time.

# TDebs and package managers

Package managers can find out whether a package has a base tdeb by examining the Packages file for Translation-Version: [0-9]+. In the case of Translation-Version: 0, the tdeb name and version are the same as the source file with -tdeb appended. In the case of Translation-Version: 1 or higher, the tdeb name is ${source}-tdeb_$version+t[0-9]+_all.tdeb.

Additional tdebs are referenced in the Packages file in the following way:

    Additional-Translations: ${source}-api-tdeb, ${source}-user-tdeb

In cases where a base tdeb is present, package managers *must* call dpkg with the tdeb and the deb in the same invocation in order to ensure that all debconf templates can be extracted before the config script is run. There is no need to unpack in order to obtain the debconf templates: the tdeb merely has to be locatable by debconf, which will call apt-extracttemplates and load the translated debconf strings into memory. See "TDebs and debconf" below.

# TDebs and debconf

apt-extracttemplates is used by debconf's dpkg-preconfigure to extract templates from the not-yet-extracted .debs right after download. This needs to take tdebs into account. Note that the templates are per-binary while tdebs are per-source. Also, the .deb should have non-translated templates.

# TDebs and multiple templates files

If a source package builds multiple binaries that use debconf, the debian/ directory will contain foo.templates and bar.templates. The TDeb will retain all templates files under the original names.
apt-extracttemplates and po-debconf will need to work together to ensure that all templates files are available to debconf so that debconf can selectively load only the templates files required.

# Tdebs and usr/share/doc

A tdeb needs usr/share/doc/copyright and changelog.Debian, and dpkg will create the necessary files, just as with a normal .deb.

# Lintian support

## PO translations

* No source changes - The Tdeb packages should not add messages not related to a message of the original source package. How to check this? If there is a POT file, then it is possible to do the comparison with the gettext msg* tools. The POT file will not be in the tdeb, only in the main source package. When a PO file is modified, lintian can get the POT file of the same directory from the source package.
* msgfmt warnings - Modification of upstream PO files should be avoided. A warning could be produced.
* File naming rules
* Location of PO files in the source package (+t1.diff.gz)
* Location of mo files in the binary packages (tdeb)
* Location of manpages in the binary packages (tdeb). (The current check can be reused.)
* Name of the manpages in the binary packages (tdeb). An English manpage shall remain, with the same name, in the original binary package.

# TDeb maintainers

Rather than allow repeat uploads of the same change in multiple languages, coordinate builds of tdebs to make a single upload with as many changes as possible at one time.

Translation-Maintainers: in debian/control and Localisation Assistants.

# TDeb implementation

## What needs to be done and when?

* Archive and tools support (Squeeze)
* Debconf translation will form the first TDebs (Squeeze + 1)
* Native packages with program translations next
* Non-native packages with Debian maintainers who are also the upstream

# Incorporation of the tdiff in the next source package

A process will be needed to help maintainers include the tdiff when they prepare a new source package (kind of NMU acknowledgement?)
Automated so that the +t1.diff.gz is automatically applied if it exists. A problem still exists with maintainers who don't check apt-get source first.

A possible method is to modify uscan and uupdate. When the maintainer prepares a new package, they apply the tdiff and "acknowledge the new translations". (This tdiff is quite likely not to apply if the upstream source has changed.)

The i18n infrastructure can check that this acknowledgement is really performed (e.g. merge the old translations into the new ones and check whether the translation statistics changed).

Automation in uscan should be possible.

This issue can be postponed until tdebs appear for non-native packages (squeeze+1).

# L10N Infrastructure

i18n.debian.net gathers the translation material from the packages. It needs to support tdebs too (tdiff).

i18n.debian.net can check that translation material from the tdiff was merged into new versions of the source package.

i18n.debian.net needs to help "Localisation Assistants" in gathering the new translations before the preparation of a new tdeb.

# Timeline

## What needs to be done still?

* tdeb binary file definition - (ratification and review)
* tdeb source file definition - (development and testing)
* dpkg class support - (make it easier to selectively install translations for specific locale roots).
* dpkg-source building support - (partially implemented in git)
* debhelper support for both tdebs explicitly, and also marking files into classes in general (partially implemented via dh_gentdeb in git)
* provide a patch to cdbs for running dh_gentdeb in the right place. (Done - only remains for the patch to be filed and applied, after Lenny).
* apt/aptitude support for pulling in and removing tdebs
* lintian support
* debdiff support
* devscripts support (debc)
* dak support (run away, run away) run faster
* support for packages using non-gettext translations. (Packages using non-gettext mechanisms include OOo, mozilla, Qt or Java properties, menus, desktop.)

We do need the toolchain changes in squeeze so we can enable use of it in squeeze+1.

# Changes

-----

2009-03-08 - [Neil Williams]

* Convert to DEP.

2009-03-19 - [Neil Williams]

* Add a table of contents via ikiwiki