Supported data formats

Introduction

The Platform's formats are derived from popular formats used by application developers. Each description states which format it is based on (with a link to the formal specification) and lists the Platform's additional restrictions. The Platform parses your data files using regular expressions, so some simplification rules apply; without them we would be unable to read what you give us or, worse, we might read it incorrectly. Regular expressions may not be the perfect tool for parsing complex formats, but they are effective at capturing the structure of your input files. This lets us generate output files with the same layout, preserving any whitespace you use for readability.

Terms and concepts

  • 'Code-text pair' – a pair of strings: a unique code and its associated text, which is to be translated.
  • 'Ignored line' – a line that is not parsed for code-text pairs. It is reproduced in output files in its original form.
  • 'Reference entry' – a code-text pair that is ignored because its text has a special form that gives it a non-translatable meaning.

Common description

Currently we support three formats. They have some properties in common:

  • Data is textual (you can edit it in a Notepad-like editor).
  • Lines containing only whitespace are ignored.
  • All formats share the same basic parsing rules; only the RegExp's used to recognize character strings differ. The kinds of RegExp's (Java-style) are:
    • whitespace, always \s*
    • comment
    • code-text
    • other
    • reference

Parsing algorithm

In Java-style pseudo-code:

for each line in file:
    if line is 'whitespace':
        continue
    if line is 'comment':
        continue
    if line is 'code-text':
        extract code and text
        if text is 'reference':
            continue
        save code and text
        continue
    if line is 'other':
        continue
    report error
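
The pseudo-code above can be sketched in Java roughly as follows. This is a minimal illustration, not the Platform's actual implementation: the class, method, and parameter names are hypothetical, a null pattern stands for "none (never match)", and the sketch assumes that group 2 of a code-text RegExp captures the code and group 4 the text, as the RegExp's listed for each format suggest. It does not un-escape the text before testing for a reference.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; all names here are hypothetical.
class LineParserSketch {
    static final Pattern WHITESPACE = Pattern.compile("\\s*");

    // A null pattern stands for "none (never match)".
    static boolean is(Pattern kind, String s) {
        return kind != null && kind.matcher(s).matches();
    }

    // Returns the saved code-text pairs in file order.
    static Map<String, String> parse(List<String> lines, Pattern comment,
            Pattern codeText, Pattern other, Pattern reference) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            if (is(WHITESPACE, line) || is(comment, line)) continue;
            Matcher m = codeText.matcher(line);
            if (m.matches()) {
                // Assumption: group 2 holds the code and group 4 the text.
                String code = m.group(2);
                String text = m.group(4);
                // Reference entries are recognized but not saved.
                if (!is(reference, text)) pairs.put(code, text);
                continue;
            }
            if (is(other, line)) continue;
            throw new IllegalArgumentException(
                "line " + lineNo + " cannot be parsed: " + line);
        }
        return pairs;
    }
}
```

Each format would call parse with its own comment, code-text, other, and reference patterns; the whitespace pattern is the same for all of them.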

Guide to examples

Each format's description contains three types of examples:

  • Correct data – data which is parsed successfully
  • Correct but bad data – data which is parsed, but doesn't give the result you want
  • Incorrect data – data which cannot be parsed under the Platform's additional restrictions (but is correct according to the formal specification of the base format).

We do not present examples of data that is incorrect according to formal specifications, assuming you can figure out the errors by yourself.

Properties (simplified)

Name: properties

Formal description (XML is not supported)

The Platform's additional restrictions

  • Each code-text pair is contained within a single line
  • Trailing spaces in text are ignored
  • Code-text pairs whose text (after un-escaping) starts with the $ character are ignored

RegExp's

kind        Java-style RegExp
comment     \s*[#!].*
code-text   (\s*)([^=:\s\\]*(?:\\.[^=:\s\\]*)*)(\s*[=:]?\s*)(.*?)(\s*)
other       none (never match)
reference   \$.*

Examples

Correct data

# Sample translation data
  ! Leading spaces and both comment characters are supported

firstKey=Text
secondKey=Second text with    spaces
  spacesAroundKeyAreIgnored   =   Also in text
keySeparatedByColon : fromText
keySeparatedBySpace fromText
complex.and,stuffed*long(key   =  Text with : special = characters
some\:escaped\=key\ characters = Here begins text
ignoredDollarEntry=$textBeginsWithDollarSign
anotherIgnoredDollarEntry=    $rememberThatLeadingSpacesAreIgnored
unsafeKeyWithEmptyText
safeKeyWithEmptyText=

Beware of input entries that have empty text and no separator character, for example a line consisting only of key. The output entry will then be badly constructed: keyText instead of key=Text, because the Platform does not add a separator character that was absent from the input. The last two lines of the above example show the difference.
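
The following sketch shows why the separator goes missing. It applies the code-text RegExp from the table above and rebuilds the line with new text. The rebuilding step (concatenating the captured groups with group 4 replaced) is an assumption about how output is produced, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical.
class PropertiesSeparatorDemo {
    // The 'code-text' RegExp of the properties format, as a Java string literal.
    static final Pattern CODE_TEXT = Pattern.compile(
        "(\\s*)([^=:\\s\\\\]*(?:\\\\.[^=:\\s\\\\]*)*)(\\s*[=:]?\\s*)(.*?)(\\s*)");

    // Rebuilds a line with new text, keeping every other captured group as-is.
    // Assumption: output is produced by substituting group 4 only.
    static String withText(String line, String newText) {
        Matcher m = CODE_TEXT.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException(line);
        return m.group(1) + m.group(2) + m.group(3) + newText + m.group(5);
    }

    public static void main(String[] args) {
        // Group 3 (the separator) is empty here, so nothing separates key and text.
        System.out.println(withText("unsafeKeyWithEmptyText", "Text"));
        // Group 3 is "=" here, so the separator survives.
        System.out.println(withText("safeKeyWithEmptyText=", "Text"));
    }
}
```

Under these assumptions the first call yields unsafeKeyWithEmptyTextText and the second yields safeKeyWithEmptyText=Text.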

Correct but bad data

brokenEntry=Here we break \
  entry across \
    multiple lines

Note: the above example reports no error, because each of the three lines is parsed as a separate code-text pair:

  • 1st line: code is brokenEntry, text is Here we break \
  • 2nd line: code is entry, text is across \
  • 3rd line: code is multiple, text is lines
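
The breakdown above can be reproduced by matching each line against the code-text RegExp on its own. This is a small sketch with hypothetical names; it assumes group 2 is the code and group 4 is the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical.
class PropertiesContinuationDemo {
    // The 'code-text' RegExp of the properties format, as a Java string literal.
    static final Pattern CODE_TEXT = Pattern.compile(
        "(\\s*)([^=:\\s\\\\]*(?:\\\\.[^=:\\s\\\\]*)*)(\\s*[=:]?\\s*)(.*?)(\\s*)");

    // Returns {code, text} for a single line.
    static String[] codeAndText(String line) {
        Matcher m = CODE_TEXT.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException(line);
        return new String[] { m.group(2), m.group(4) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "brokenEntry=Here we break \\",
            "  entry across \\",
            "    multiple lines"
        };
        // Every line matches independently; the trailing backslash is plain text.
        for (String line : lines) {
            String[] pair = codeAndText(line);
            System.out.println("code=[" + pair[0] + "] text=[" + pair[1] + "]");
        }
    }
}
```

Because lines are matched one at a time, the trailing backslash is never treated as a continuation marker; it simply becomes the last character of the text.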

Incorrect data

None - this format always parses successfully.

Android (simplified)

Name: android

Formal description

The Platform's additional restrictions

  • Only single-line comments that occupy the whole line are allowed
  • Elements other than <string> are ignored
  • Each <string> element is contained within a single line
  • Code-text pairs whose text starts with the @ character are ignored

Warning

Currently, the Platform does not support files other than strings.xml. You should put all <string> elements inside strings.xml. Use one file for each language.

Warning

Currently, the Platform does not support HTML tags in Android's text. Even if you allow them, they will be escaped using entities. This effectively means that any implicit formatting will be lost and your users will see raw tags unless you explicitly convert text to HTML.

RegExp's

kind        Java-style RegExp
comment     \s*\<\!\-\-.*\-\-\>\s*
code-text   (\s*\<string\s+name\s*=\s*")([^"]*)("\s*\>\s*(?:"+|'+)?)(.*?)((?:"+|'+)?\s*\</string\s*\>\s*)
other       .*
reference   @.*

Examples

Correct data

<?xml version="1.0" encoding="utf-8"?>
<resources>
  <!-- comment -->
  <string name="firstKey">firstText</string>
  <string name  = "some  "  > whitespace   </string  >
  <string name="complex.and ,Space-Filled Key">Text</string>
  <string name="key">Text with weird ,^9$*].,/; characters</string>
</resources>

Correct but bad data

<?xml version="1.0" encoding="utf-8"?>
<resources> <string name="firstKey">firstText</string>

  <!-- comment --> <string name="secondKey">secondText</string>

  <string
    name=
      "multilineEntryKey">multiline
        entry
          text</string>

</resources>

Note: each non-blank line in the above example is captured by the 'other' pattern, so it causes no errors, but it is not parsed as a code-text pair either.

<?xml version="1.0" encoding="utf-8"?>
<resources>

<!-- multiline
  <string name="key">Beware of multiline comments with string elements</string>
comment -->

</resources>

Note: avoid multi-line comments, because their contents can be parsed as code-text pairs (as in the above example).
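
Both pitfalls in this section can be checked by asking which kind of RegExp matches a given line first. The sketch below uses the Android RegExp's from the table above in the order of the parsing algorithm; the class and method names are hypothetical, and the reference check is left out for brevity.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical.
class AndroidLineKindDemo {
    static final Pattern WHITESPACE = Pattern.compile("\\s*");
    static final Pattern COMMENT = Pattern.compile("\\s*\\<\\!\\-\\-.*\\-\\-\\>\\s*");
    static final Pattern CODE_TEXT = Pattern.compile(
        "(\\s*\\<string\\s+name\\s*=\\s*\")([^\"]*)(\"\\s*\\>\\s*(?:\"+|'+)?)(.*?)"
        + "((?:\"+|'+)?\\s*\\</string\\s*\\>\\s*)");

    // Names the first kind that matches, in the order used by the parsing algorithm.
    static String kindOf(String line) {
        if (WHITESPACE.matcher(line).matches()) return "whitespace";
        if (COMMENT.matcher(line).matches()) return "comment";
        if (CODE_TEXT.matcher(line).matches()) return "code-text";
        return "other"; // the 'other' pattern is .* and accepts any remaining line
    }

    public static void main(String[] args) {
        String[] lines = {
            "<resources> <string name=\"firstKey\">firstText</string>",
            "  <!-- comment --> <string name=\"secondKey\">secondText</string>",
            "<!-- multiline",
            "  <string name=\"key\">Beware of multiline comments with string elements</string>",
            "comment -->"
        };
        for (String line : lines) {
            System.out.println(kindOf(line) + " <- " + line);
        }
    }
}
```

In this sketch the first two lines and both comment delimiter lines come out as 'other', while the <string> line inside the multi-line comment comes out as 'code-text'. That is the behaviour the two notes above warn about.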

Incorrect data

None - this format always parses successfully.

Note: so that files containing elements other than comments and code-text pairs can still be parsed, the parser simply ignores everything that does not match the previous patterns. The <?xml ... ?> and <resources> lines are therefore accepted, but do not create any code-text pairs. This also allows you to include elements other than <string>, notably <string-array> or <plurals>. The Platform does not support them (they won't be recognized) and does not report errors for them (these elements are ignored).

iOS (simplified)

Name: ios

Formal description (property list format is not supported)

The Platform's additional restrictions

  • Only single-line comments that occupy the whole line are allowed.
  • Each code-text pair is contained within a single line.

RegExp's

kind        Java-style RegExp
comment     \s*/\*.*\*/\s*
code-text   (\s*")([^"\\]*(?:\\.[^"\\]*)*)("\s*=\s*")([^"\\]*(?:\\.[^"\\]*)*)("\s*;\s*)
other       none (never match)
reference   none (never match)

Examples

Correct data

/* comment */
"This is key" = "This is text";
  /*    lots of    whitespace   */
  "  Key with *)%^#& fancy characters"=    "@Text *can ^be &complex !too" ;

Correct but bad data

None - this format always parses successfully or fails verbosely.

Incorrect data

/* multiline
  comment */

"Multiline entry key"
  = "Multiline entry text"
    ;
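
To see why the data above is rejected, the sketch below tests each line against the iOS RegExp's and collects the lines that match none of them. The names are hypothetical, and collecting every failing line is an assumption made for illustration; the Platform may well stop at the first error instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical.
class IosUnparseableLinesDemo {
    static final Pattern WHITESPACE = Pattern.compile("\\s*");
    static final Pattern COMMENT = Pattern.compile("\\s*/\\*.*\\*/\\s*");
    static final Pattern CODE_TEXT = Pattern.compile(
        "(\\s*\")([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)(\"\\s*=\\s*\")"
        + "([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)(\"\\s*;\\s*)");

    // Returns the 1-based numbers of lines that match no pattern.
    // The 'other' kind never matches for this format, so such lines are errors.
    static List<Integer> unparseable(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            boolean ok = WHITESPACE.matcher(line).matches()
                || COMMENT.matcher(line).matches()
                || CODE_TEXT.matcher(line).matches();
            if (!ok) bad.add(i + 1);
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "/* multiline",
            "  comment */",
            "",
            "\"Multiline entry key\"",
            "  = \"Multiline entry text\"",
            "    ;");
        System.out.println("Unparseable lines: " + unparseable(lines));
    }
}
```

Only the blank line is accepted. Every other line is a fragment of a comment or entry that the single-line patterns cannot match, and since this format has no 'other' pattern to fall back on, each one is an error.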