Editors
- Asgeir Frimannsson
Copyright © 2004, 2005 XLIFF Tools Project contributors
Contributors
- Josep Condal
- Asgeir Frimannsson (Queensland University of Technology, Red Hat APAC)
- Tim Foster
- David Fraser
- Bruno Haible
- Rodolfo M. Raya (Heartsome Holdings Pte Ltd)
Revision History
- Draft 2 (In progress)
- Draft 1 31 Jan 2005 Asgeir Frimannsson
Initial Draft. Available at http://xliff-tools.freedesktop.org/snapshots/xliff-po-guide-en/
Abstract
This document defines a guide for mapping the GNU Gettext PO file format to XLIFF (XML Localisation Interchange File Format).
This document is a proposal only, and will be submitted to the XLIFF Technical Committee for review and possible inclusion as the upcoming XLIFF 1.1 PO Representation guide.
1. Introduction
This document aims to define a common mapping between the GNU Gettext PO and XLIFF file formats. This document is intended as a guide, and not a specification. By following the recommendations in this guide, localisation tool vendors can ensure that their tools work with this file format in a consistent and translator-friendly way across filter implementations.
This guide is not intended to provide filter-independent XLIFF representations of PO. It therefore does not define how to structure, for example, id attributes, nor does it define the structure of an optional XLIFF skeleton. The purpose of this guide is to ensure that common PO attributes are mapped consistently across filter implementations.
This guide was developed as part of the XLIFF Tools project on freedesktop.org. Please visit the project web site at http://xliff-tools.freedesktop.org/ for the latest version of this document.
2. Mapping Guide
2.1. General Structure
Example: General XLIFF Structure for PO files
<?xml version="1.0" ?>
<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
<file original="example.po" source-language="en-US" datatype="po">
<header>
...optional header information ...
</header>
<body>
... translation units ...
</body>
</file>
</xliff>
Each PO file maps to one XLIFF <file> element. XLIFF representations of PO files should have the datatype attribute set to "po", and the original attribute set to the name of the PO file. The XLIFF file may also contain some meta-data from the PO header in the <header> or in <trans-unit> elements.
The source-language attribute should by default be set to "en-US", as the GNU standards define American English as the standard language for GNU development.
The XLIFF <body> element contains translation units, grouped by PO domains using hierarchical <group> elements.
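The structure above could be generated along the following lines. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name build_xliff_skeleton is an assumption made for illustration and is not part of this guide.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"


def build_xliff_skeleton(po_filename, source_language="en-US"):
    """Return an <xliff> element holding one empty <file> for one PO file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Serialise the XLIFF namespace as the default namespace (no prefix).
    ET.register_namespace("", XLIFF_NS)
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.1"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": po_filename,             # name of the PO file
        "source-language": source_language,  # "en-US" by default
        "datatype": "po",
    })
    ET.SubElement(file_el, f"{{{XLIFF_NS}}}header")  # optional header information
    ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")    # translation units go here
    return xliff


root = build_xliff_skeleton("example.po")
print(ET.tostring(root, encoding="unicode"))
```

The remaining sketches in this guide omit the namespace for brevity and only build the fragment under discussion.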
2.2. PO header
Example: Elements of a PO header
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2004-11-11 04:29+0900\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n>1;\n"
"X-User-Defined-Var: value\n"
The PO header contains both technical and informative meta-data, and can in addition contain user-defined meta-data. XLIFF does not support most of the meta-data stored in PO headers, and filter implementations may choose to implement support for the PO header in various ways, depending on the work flow in use.
There are two recommended approaches for handling the PO Header:
2.2.1. Approach 1: Leave header out
All PO header information may be left out of the XLIFF file and stored in the skeleton. This prevents translators from modifying header information, and the PO header stays unchanged throughout the XLIFF-based localisation process.
In some circumstances PO is used only as a temporary file format in the localisation process. Messages are extracted to PO Template files, and these are used 'on the fly' to generate XLIFF files. In these circumstances there is no need to store PO header information, and the header can be ignored. When these XLIFF files are converted back to PO after translation, the filters should be able to generate the necessary PO header elements such as Plural-Forms and Content-Type for inclusion in the MO file.
2.2.2. Approach 2: Using a <trans-unit> element
This approach involves storing the whole PO header as an XLIFF <trans-unit> element, with the 'restype' attribute set to 'x-gettext-domain-header'. In PO the header is identified by an empty 'msgid', and the header is stored in the 'msgstr' field. When converting to XLIFF, we copy the value of 'msgstr' to both <source> and <target>, ensuring that translators can modify the header without losing track of the original content.
By treating the PO header as a translation unit, translators will be able to edit the PO header in the translation window of their editor of choice, eliminating the need for custom tools to modify the header.
Example: Treating PO header as normal translation unit.
<trans-unit id="message_header" restype="x-gettext-domain-header" approved="no">
<source xml:space="preserve">
Project-Id-Version: MyPackage 1.0
Report-Msgid-Bugs-To: foo@example.com
POT-Creation-Date: 2004-11-11 04:29+0900
PO-Revision-Date: 2005-02-01 12:00+0900
Last-Translator: Foo Bar <foo@example.com>
Language-Team: My Language <LL@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Plural-Forms: nplurals=2; plural=n>1;
</source>
<target xml:space="preserve">
Project-Id-Version: MyPackage 1.0
Report-Msgid-Bugs-To: foo@example.com
POT-Creation-Date: 2004-11-11 04:29+0900
PO-Revision-Date: 2005-02-01 12:00+0900
Last-Translator: Foo Bar <foo@example.com>
Language-Team: My Language <LL@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Plural-Forms: nplurals=2; plural=n>1;
</target>
<note from="po-file">
SOME DESCRIPTIVE TITLE.
Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
This file is distributed under the same license as the PACKAGE package.
FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
</note>
</trans-unit>
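As a rough illustration of Approach 2, the following sketch copies the header 'msgstr' into both <source> and <target>. The function name header_to_trans_unit and its parameters are assumptions made for this example; the id value "message_header" is taken from the example above.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:space attribute.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def header_to_trans_unit(header_msgstr, header_comments=""):
    """Build the header <trans-unit> (hypothetical helper, not a defined API)."""
    unit = ET.Element("trans-unit", {
        "id": "message_header",
        "restype": "x-gettext-domain-header",
        "approved": "no",
    })
    # The same header text goes into <source> and <target>, so that the
    # original stays visible while the translator edits the target.
    for tag in ("source", "target"):
        element = ET.SubElement(unit, tag, {XML_SPACE: "preserve"})
        element.text = header_msgstr
    # Comments preceding the header entry are kept as a note.
    if header_comments:
        note = ET.SubElement(unit, "note", {"from": "po-file"})
        note.text = header_comments
    return unit


unit = header_to_trans_unit(
    "Project-Id-Version: MyPackage 1.0\nMIME-Version: 1.0\n",
    "SOME DESCRIPTIVE TITLE.",
)
print(ET.tostring(unit, encoding="unicode"))
```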
2.2.3. Discussion
While this guide does not recommend one approach over the other, it is important to understand the limitations and implications of both approaches.
Treating the PO header as a translation unit is not entirely faithful to the XLIFF Specification, because translation units are meant for translatable data and not for storing changing meta-data such as the PO header. However, this can be seen as a lesser-of-evils approach: translators are then able to modify PO header information, which is necessary in Gettext-based localisation processes.
In the long term however, it is recommended to use localisation processes based around XLIFF, and not use PO as a persistent file format in the localisation process. This would eliminate the need for storing the PO header in the localisation process.
2.3. Domain
Each domain is enclosed in a group element, with the domain name as the resname attribute:
Example: Representing PO files with multiple domains
domain domain-name-A

white-space
# translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string

domain domain-name-B

white-space
# translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string
becomes
<group id="##1" restype="x-gettext-domain" resname="domain-name-A">
<trans-unit id="#1">
...
</trans-unit>
</group>
<group id="##2" restype="x-gettext-domain" resname="domain-name-B">
<trans-unit id="#n">
...
</trans-unit>
</group>
When the domain is not specified in a PO file, entries are not grouped into domains.
In some cases the first entries of a PO file do not belong to any named domain (they belong to the default domain), but a domain is specified later in the file. In these situations, the first translation units are not grouped into any domain, while the later entries, for which a domain is specified, are grouped:
Example: Representing PO files where first entries belong to default domain
white-space
# translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string

domain domain-name-A

white-space
# translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string
becomes
<trans-unit id="##1">
...
</trans-unit>
...
<group id="##1" restype="x-gettext-domain" resname="domain-name-A">
<trans-unit id="##2">
...
</trans-unit>
</group>
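The grouping rule can be sketched as follows. The input format (a list of (domain, msgid) pairs, with None for the default domain) and the id scheme are assumptions made for illustration, since this guide deliberately does not define id attributes.

```python
import xml.etree.ElementTree as ET


def group_by_domain(entries):
    """Build a <body> from (domain_or_None, msgid) pairs given in file order.

    Hypothetical helper: entries without a domain stay directly under
    <body>; every named domain gets exactly one <group>.
    """
    body = ET.Element("body")
    groups = {}
    for index, (domain, msgid) in enumerate(entries, start=1):
        if domain is None:
            parent = body  # default domain: not grouped
        else:
            if domain not in groups:
                groups[domain] = ET.SubElement(body, "group", {
                    "id": f"domain-{len(groups) + 1}",  # illustrative id scheme
                    "restype": "x-gettext-domain",
                    "resname": domain,
                })
            parent = groups[domain]
        unit = ET.SubElement(parent, "trans-unit", {"id": f"unit-{index}"})
        source = ET.SubElement(unit, "source")
        source.text = msgid
    return body


body = group_by_domain([
    (None, "first string"),
    ("domain-name-A", "second string"),
])
print(ET.tostring(body, encoding="unicode"))
```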
2.4. PO entry
2.4.1. Header
Example: Elements of a PO entry header
white-space
# translator-comments
#. automatic-comments
#: reference...
#, flag...
2.4.1.1. Translator Comments
Translator comments are represented in XLIFF as a note element with the from attribute set to "po-file". Multi-line PO comments are merged into one note element with added newline characters.
Example: Handling of translator comments
# translator-comment-text
msgid untranslated-string
msgstr ""
becomes
<trans-unit id="##1">
<source xml:space="preserve">untranslated-string</source>
<note from="po-file">translator-comment-text</note>
</trans-unit>
PO comments can also be stored as context information in XLIFF <context-group> elements, as in the following example.
Example: Handling of translator comments as context information
# translator-comment-text
msgid untranslated-string
msgstr ""
becomes
<trans-unit id="##1">
<source xml:space="preserve">untranslated-string</source>
<context-group name="x-po-entry-header">
<context context-type="x-po-transcomment">translator-comment-text</context>
</context-group>
</trans-unit>
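Merging multi-line translator comments into a single text value might look like the sketch below. It assumes the entry is available as a list of raw PO lines; the function name merge_translator_comments is hypothetical.

```python
def merge_translator_comments(po_lines):
    """Join the translator comments of one PO entry with newline characters.

    Hypothetical helper. Translator comments are lines consisting of "#"
    alone or "#" followed by white-space; "#.", "#:", "#," and "#~" lines
    are other kinds of comment and are skipped here.
    """
    comments = []
    for line in po_lines:
        if line == "#" or line.startswith(("# ", "#\t")):
            comments.append(line[1:].strip())
    return "\n".join(comments)


entry = [
    "# first comment line",
    "# second comment line",
    "#. automatic comment",
    'msgid "untranslated-string"',
    'msgstr ""',
]
print(merge_translator_comments(entry))
```

The returned string can be used as the content of either the note element or the context element shown above.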
2.4.1.2. Automatic Comments
Automatic comments are extracted from source files, and usually describe the context of the translation. In XLIFF, these comments are represented as a context element within a context-group. As with translator comments, multi-line comments are merged into one context element with added newline characters.
Example: Handling of automatic comments
#. auto-comment-text
msgid untranslated-string
msgstr ""

#. auto-comment-text-line1
#. auto-comment-text-line2
msgid untranslated-string
msgstr ""
becomes
<trans-unit id="##1">
<source xml:space="preserve">untranslated-string</source>
<context-group name="x-po-entry-header#1" purpose="information">
<context context-type="x-po-autocomment">auto-comment-text</context>
</context-group>
</trans-unit>
<trans-unit id="##2">
<source xml:space="preserve">untranslated-string</source>
<context-group name="x-po-entry-header#2" purpose="information">
<context context-type="x-po-autocomment">auto-comment-text-line1
auto-comment-text-line2</context>
</context-group>
</trans-unit>
Each context group must have a different name. The filter must ensure uniqueness.
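One simple way to keep context-group names unique is to draw a suffix from a counter shared across the whole file. The sketch below does this for automatic comments; the function name add_auto_comment_context and the use of itertools.count are assumptions for illustration, not requirements of this guide.

```python
import itertools
import xml.etree.ElementTree as ET


def add_auto_comment_context(unit, po_lines, counter):
    """Attach the automatic comments of one entry to its <trans-unit>.

    Hypothetical helper. `counter` is an iterator shared by all entries of
    a file, so that every context-group receives a different name.
    """
    text = "\n".join(
        line[2:].strip() for line in po_lines if line.startswith("#.")
    )
    if not text:
        return None  # the entry has no automatic comments
    group = ET.SubElement(unit, "context-group", {
        "name": f"x-po-entry-header#{next(counter)}",
        "purpose": "information",
    })
    context = ET.SubElement(group, "context", {"context-type": "x-po-autocomment"})
    context.text = text
    return group


counter = itertools.count(1)
unit = ET.Element("trans-unit", {"id": "unit-2"})
add_auto_comment_context(
    unit, ["#. auto-comment-text-line1", "#. auto-comment-text-line2"], counter
)
print(ET.tostring(unit, encoding="unicode"))
```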
2.4.1.3. Reference
References also describe the context of a translation unit. In XLIFF each reference is represented in a context-group element with the context child-elements holding the source file name and position within the file. When the translation unit occurs in multiple positions within a file, one context-group element is created for each position.
Example: Handling of references
#: example.cpp:134 example.cpp:343 example2.cpp:23
msgid untranslated-string
msgstr translated-string
becomes
<trans-unit id="##1">
<source xml:space="preserve">untranslated-string</source>
<target xml:space="preserve">translated-string</target>
<context-group name="x-po-reference#1" purpose="location">
<context context-type="sourcefile">example.cpp</context>
<context context-type="linenumber">134</context>
</context-group>
<context-group name="x-po-reference#2" purpose="location">
<context context-type="sourcefile">example.cpp</context>
<context context-type="linenumber">343</context>
</context-group>
<context-group name="x-po-reference#3" purpose="location">
<context context-type="sourcefile">example2.cpp</context>
<context context-type="linenumber">23</context>
</context-group>
</trans-unit>
As noted in the previous section, each context group must have a different name, and the conversion tool must ensure uniqueness.
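A reference line could be split into (file, line) pairs and turned into context groups roughly as follows. The sketch assumes that references are separated by white-space and that file names contain no spaces; both function names are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def parse_references(reference_line):
    """Split a '#:' line into (filename, line_number_or_None) pairs."""
    references = []
    for token in reference_line[2:].split():
        filename, separator, line_number = token.rpartition(":")
        if separator and line_number.isdigit():
            references.append((filename, int(line_number)))
        else:
            references.append((token, None))  # reference without a line number
    return references


def add_reference_contexts(unit, reference_line, counter):
    """Create one location context-group per reference (hypothetical helper).

    `counter` is shared across the file to keep group names unique.
    """
    for filename, line_number in parse_references(reference_line):
        group = ET.SubElement(unit, "context-group", {
            "name": f"x-po-reference#{next(counter)}",
            "purpose": "location",
        })
        ET.SubElement(group, "context", {"context-type": "sourcefile"}).text = filename
        if line_number is not None:
            ET.SubElement(group, "context", {"context-type": "linenumber"}).text = str(line_number)
    return unit


unit = add_reference_contexts(
    ET.Element("trans-unit", {"id": "unit-1"}),
    "#: example.cpp:134 example.cpp:343 example2.cpp:23",
    itertools.count(1),
)
print(ET.tostring(unit, encoding="unicode"))
```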
2.4.1.4. Fuzzy Flag
Non-empty entries marked as fuzzy are added as target elements with the state attribute set to needs-review-translation. If the fuzzy flag is set and the translation is empty, the flag is simply ignored.
Also note that the "approved" attribute in <trans-unit> is by default "no", and a translation is not approved until this attribute is set to "yes".
Example: Handling of fuzzy flag
#, fuzzy
msgid "Hello world"
msgstr "Hello Europe"

#, fuzzy
msgid "Hello America"
msgstr ""
becomes
<trans-unit id="##1">
<source xml:space="preserve">Hello world</source>
<target xml:space="preserve" state="needs-review-translation">Hello Europe</target>
</trans-unit>
<trans-unit id="##2">
<source xml:space="preserve">Hello America</source>
</trans-unit>
When converting back to PO, the "fuzzy" flag is set unless the "approved" attribute in <trans-unit> is set to "yes".
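The two directions of the fuzzy mapping can be summarised in a small sketch. The function names are hypothetical. Skipping the flag for an empty target on export is an assumption that mirrors the import rule; it is not stated in this guide.

```python
def target_state(msgstr, flags):
    """PO to XLIFF: return the <target> state attribute, or None.

    A fuzzy flag on an empty translation is simply ignored.
    """
    if "fuzzy" in flags and msgstr:
        return "needs-review-translation"
    return None


def export_flags(target_text, approved):
    """XLIFF to PO: return the PO flags for a translation unit.

    The entry is fuzzy unless approved="yes". Assumption: an empty target
    is never flagged, since there is nothing to review.
    """
    if target_text and approved != "yes":
        return ["fuzzy"]
    return []


print(target_state("Hello Europe", ["fuzzy"]))
print(export_flags("Hello Europe", "yes"))
```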
2.4.1.5. C-format Flag
Where entries are flagged with "c-format" (or, for example, "php-format" for PHP source files), format parameters are enclosed in <ph> elements. This makes it possible for tools to check that each parameter is present in the translation, and it can also improve Translation Memory matching (e.g. strings that differ only in "%d" versus "%s" give the same match). If the translation unit carries a "no-c-format" flag, parameters are not marked up.
Example: Handling of c-format flag
#, c-format
msgid "You have %d files"
msgstr ""

#, no-c-format
msgid "You have %d files"
msgstr ""
becomes
<trans-unit id="##1">
<source xml:space="preserve">You have <ph id="1" ctype="x-c-param">%d</ph> files</source>
</trans-unit>
<trans-unit id="##2">
<source xml:space="preserve">You have %d files</source>
</trans-unit>
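Marking up format parameters could be done with a regular expression, as in the sketch below. The pattern is a simplified approximation of printf-style conversions and the function name wrap_c_params is hypothetical. A production filter would need a more careful parser, since a sequence such as "% f" in ordinary prose would also be matched.

```python
import re
from xml.sax.saxutils import escape

# Either a literal "%%" or a printf-style conversion: optional position,
# flags, width, precision and length modifier, then a conversion character.
C_PARAM = re.compile(
    r"%%|%(?:\d+\$)?[-+ #0']*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|L|j|z|t)?[diouxXeEfFgGaAcspn]"
)


def wrap_c_params(msgid, flags):
    """Return the XML content of <source> with parameters wrapped in <ph>."""
    if "c-format" not in flags or "no-c-format" in flags:
        return escape(msgid)  # parameters stay plain text
    parts, last, next_id = [], 0, 1
    for match in C_PARAM.finditer(msgid):
        if match.group() == "%%":
            continue  # escaped percent sign, not a parameter
        parts.append(escape(msgid[last:match.start()]))
        parts.append(
            f'<ph id="{next_id}" ctype="x-c-param">{escape(match.group())}</ph>'
        )
        last = match.end()
        next_id += 1
    parts.append(escape(msgid[last:]))
    return "".join(parts)


print(wrap_c_params("You have %d files", ["c-format"]))
print(wrap_c_params("You have %d files", ["no-c-format"]))
```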
2.4.2. Non-plurals
Each non-plural translation unit is encapsulated in a single XLIFF trans-unit element. The "xml:space" attribute is set to "preserve" to protect formatting and white-space.
Example: Non-plural mapping of entries
msgid untranslated-string
msgstr translated-string
becomes
<trans-unit id="##1">
<source xml:space="preserve">untranslated-string</source>
<target xml:space="preserve">translated-string</target>
</trans-unit>
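A non-plural entry maps almost directly onto ElementTree calls. In this sketch the function name entry_to_trans_unit and the caller-supplied id are assumptions; omitting <target> for an empty msgstr follows the untranslated examples elsewhere in this guide.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:space attribute.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def entry_to_trans_unit(unit_id, msgid, msgstr=""):
    """Map one non-plural PO entry to a <trans-unit> (hypothetical helper)."""
    unit = ET.Element("trans-unit", {"id": unit_id})
    source = ET.SubElement(unit, "source", {XML_SPACE: "preserve"})
    source.text = msgid
    if msgstr:  # untranslated entries get no <target>
        target = ET.SubElement(unit, "target", {XML_SPACE: "preserve"})
        target.text = msgstr
    return unit


unit = entry_to_trans_unit("unit-1", "untranslated-string", "translated-string")
print(ET.tostring(unit, encoding="unicode"))
```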
2.4.3. Plurals
XLIFF does not support plural forms directly, but they can be represented by encapsulating each PO plural translation unit in an XLIFF group element.
Example: Basic plural mapping
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
becomes
<group restype="x-gettext-plurals">
<trans-unit id="##1a">
<source xml:space="preserve">untranslated-string-singular</source>
</trans-unit>
<trans-unit id="##1b">
<source xml:space="preserve">untranslated-string-plural</source>
</trans-unit>
</group>
The number of trans-unit elements inside a plural group differs depending on how many forms the target language has. XLIFF files are therefore post-processed for each language, and the correct number of trans-unit elements is added to the plural group. This means that the filters must know how many plural forms each target language has.
The following example shows a plural group for a language with three different forms (e.g. Russian, Slovak, Ukrainian):
Example: Plurals where target language has two or more plural forms
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
msgstr[1] translated-string-case-1
...
msgstr[n] translated-string-case-n
becomes
<group restype="x-gettext-plurals">
<trans-unit id="##1a">
<source xml:space="preserve">untranslated-string-singular</source>
<target xml:space="preserve">translated-string-case-0</target>
</trans-unit>
<trans-unit id="##1b">
<source xml:space="preserve">untranslated-string-plural</source>
<target xml:space="preserve">translated-string-case-1</target>
</trans-unit>
<trans-unit id="##1c">
<source xml:space="preserve">untranslated-string-plural</source>
<target xml:space="preserve">translated-string-case-n</target>
</trans-unit>
</group>
When the target language only has one form (e.g. Japanese, Korean), the plural source element is still included to support back-conversion, with the "translate" attribute set to "no":
Example: Plurals where target language has only one form
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
becomes
<group restype="x-gettext-plurals">
<trans-unit id="##1a">
<source xml:space="preserve">untranslated-string-singular</source>
<target xml:space="preserve">translated-string-case-0</target>
</trans-unit>
<trans-unit id="##1b" translate="no">
<source xml:space="preserve">untranslated-string-plural</source>
</trans-unit>
</group>
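The plural rules in this section can be sketched as follows. The number of forms is read from a Plural-Forms header value, which is only one possible source of that knowledge. The function names and the "-0", "-1" id suffixes are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:space attribute.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def nplurals_from_header(plural_forms):
    """Extract the number of forms from e.g. 'nplurals=2; plural=n>1;'."""
    match = re.search(r"nplurals\s*=\s*(\d+)", plural_forms)
    if match is None:
        raise ValueError("no nplurals value found")
    return int(match.group(1))


def plural_group(entry_id, singular, plural, nplurals, translations=()):
    """Build the plural <group> for one entry (hypothetical helper)."""
    group = ET.Element("group", {"restype": "x-gettext-plurals"})
    # At least two units, so the plural source survives back-conversion.
    for index in range(max(nplurals, 2)):
        attrs = {"id": f"{entry_id}-{index}"}  # illustrative id scheme
        if index >= nplurals:
            attrs["translate"] = "no"  # single-form target language
        unit = ET.SubElement(group, "trans-unit", attrs)
        source = ET.SubElement(unit, "source", {XML_SPACE: "preserve"})
        source.text = singular if index == 0 else plural
        if index < nplurals and index < len(translations) and translations[index]:
            target = ET.SubElement(unit, "target", {XML_SPACE: "preserve"})
            target.text = translations[index]
    return group


forms = nplurals_from_header("nplurals=1; plural=0;")
group = plural_group("unit-1", "untranslated-string-singular",
                     "untranslated-string-plural", forms,
                     ["translated-string-case-0"])
print(ET.tostring(group, encoding="unicode"))
```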
2.5. Obsolete Entries
Obsolete entries are translation units that are no longer present in the source-files, and are therefore commented out when a PO file is updated. These entries are used by Gettext only if the translation-unit re-appears in the project, and are also used for fuzzy matching by the 'msgmerge' tool.
Example: An obsolete entry
# translator-comments
#~ msgid untranslated-string
#~ msgstr translated-string
These entries have no place in an XLIFF file, and should be stored in the skeleton.
3. Other issues
3.1. Character Set
The PO character set is defined in the Content-Type PO header field. When converting between XLIFF and PO, filters can:
- Ignore character set handling and assume that the PO file character set is the same as the XLIFF character set (which defaults to UTF-8)
- Add character set logic to filters
It is up to the individual implementation to decide how character sets are handled.
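For filters that take the second option, the character set can be read from the header before decoding the PO file. The sketch below is one possible approach; the function name po_charset is hypothetical, and falling back to UTF-8 when the field is missing or still holds the template placeholder CHARSET is an assumption.

```python
import re


def po_charset(header_msgstr, default="UTF-8"):
    """Return the charset named in the Content-Type header field."""
    match = re.search(
        r"^Content-Type:.*?charset=([^\s;]+)",
        header_msgstr,
        re.MULTILINE | re.IGNORECASE,
    )
    if match is None or match.group(1) == "CHARSET":
        return default  # field missing or unfilled template value
    return match.group(1)


header = "MIME-Version: 1.0\nContent-Type: text/plain; charset=ISO-8859-1\n"
raw_msgstr = b"caf\xe9"  # bytes as stored in the PO file
print(raw_msgstr.decode(po_charset(header)))
```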
3.2. Language Codes
PO files represented in XLIFF use the same language code scheme as the XLIFF Specification.
4. References
[XLIFF]
XLIFF 1.1 Specifications. OASIS XLIFF Technical Committee, October 2003.
[PO]
Gettext PO File Format. Free Software Foundation, May 2002.
[RFC 3066]
RFC 3066 Tags for the Identification of Languages. IETF (Internet Engineering Task Force), Jan 2001.


