Tibetan Layout Requirements

W3C Group Draft Note

This version:
https://www.w3.org/TR/2024/DNOTE-tlreq-20240730/
Latest published version:
https://www.w3.org/TR/tlreq/
Latest editor's draft:
https://w3c.github.io/tlreq/
History:
https://www.w3.org/standards/history/tlreq/
Commit history
Editor:
(W3C)
Feedback:
GitHub w3c/tlreq (pull requests, new issue, open issues)

Abstract

This document points to resources for the layout and presentation of text in languages that use the Tibetan script. The target audience includes developers of Web standards and technologies, such as HTML, CSS, Mobile Web, Digital Publications, and Unicode, as well as implementers of web browsers, ebook readers, and other applications that need to render Tibetan text.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document points to resources for Tibetan script layout and text support on the Web and in eBooks. These requirements provide information for Web technologies such as CSS, HTML and digital publications about how to support languages written using the Tibetan script. The information here is developed in conjunction with a document that summarises gaps where the Web fails to adequately support the Tibetan script.

🚩
This document is a stub awaiting future edits.
See Tibetan Script Resources instead.

The editor's draft of this document is being developed in the GitHub repository Tibetan (tlreq), with contributors from the W3C Internationalization Interest Group. It is published by the Internationalization Working Group. The end target for this document is a Working Group Note.

To make it easier to track comments, please raise separate issues or emails for each comment, and point to the section you are commenting on using a URL.

This document was published by the Internationalization Working Group as a Group Draft Note using the Note track.

Group Draft Notes are not endorsed by W3C nor its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The W3C Patent Policy does not carry any licensing requirements or commitments on this document.

This document is governed by the 03 November 2023 W3C Process Document.

1. Introduction

1.1 About this document

This document points to resources for Tibetan script layout and text support on the Web and in eBooks. These resources provide information for developers of Web technologies such as CSS, HTML and digital publications, and for application developers, about how to support languages written using the Tibetan script. They include requirements, tests, GitHub discussions, type samples, and more.

The document focuses on typographic layout issues. For a deeper understanding of the Tibetan script and how it works see Tibetan Orthography Notes, which includes topics such as: Phonology, Vowels, Consonants, Encoding choices, and Numbers.

2. Tibetan script overview

Tibetan can be written in two styles: དབུ་ཅན (dbu can, 'with a head', pronounced u.cen), the block style used in print; and དབུ་མེད (dbu med, 'headless', pronounced u.me), the cursive style used in shorthand and calligraphy. This document concentrates on the former. Pronunciations are based on the central (Lhasa) dialect.

Historically, Tibetan text was written on loose-leaf sheets called pechas (དཔེ་ཆ pé.t͡ɕʰá, 'book, scripture'). Some of the characters and formatting approaches differ between books and pechas.

Tibetan text runs left to right in horizontal lines.

Word boundaries are not indicated. However, Tibetan words are made up of one or more units called tsheg-bar, which are roughly equivalent to phonological syllables. The tsheg-bar units are separated using U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG.
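
As a minimal sketch of the segmentation described above, the following Python splits a string into tsheg-bar units at each U+0F0B. The function name `split_tsheg_bars` is hypothetical, and the sketch assumes that U+0F0B is the only delimiter present: it does not handle other Tibetan punctuation, and it does not attempt to find word boundaries, which the script does not mark. The sample is the two-syllable word used as the example in section 2.1, written with code point escapes.

```python
TSHEG = "\u0F0B"  # U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG


def split_tsheg_bars(text: str) -> list[str]:
    """Split text into tsheg-bar units at each U+0F0B, dropping empty pieces."""
    return [unit for unit in text.split(TSHEG) if unit]


# The two-syllable word 'grems-ston from section 2.1, as code point escapes.
word = "\u0F60\u0F42\u0FB2\u0F7A\u0F58\u0F66" + TSHEG + "\u0F66\u0F9F\u0F7C\u0F53"
units = split_tsheg_bars(word)
print(len(units), units)
```

Dropping empty pieces means a trailing tsheg, as in the sample shown for Figure 1, does not produce an empty final unit.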

These tsheg-bar units are composed of structural elements that include vowel signs and consonants used as prefixes, root characters, subscripts, superscripts, suffixes, and secondary suffixes. A common realisation includes a stack and additional consonants to either side of the root consonant. These may indicate syllable-final consonant sounds, but more often than not they qualify or modify the root value, and are not associated with their nominal sound value. The actual pronunciation of Tibetan is usually much simpler than a typical romanisation would suggest. For example, the word བཀོད (kǿː, 'to create') is transcribed as bkod.

རྒྱུད་
Figure 1 The single-syllable word cy᷈ː ('string'), with an initial stack of three consonants plus a vowel sign, followed by a suffix consonant (to the right).

To write the sounds of the standard Lhasa dialect, Tibetan uses 28 consonant letters (plus their subjoined forms). 6 more letters are used to write Sanskrit.

A distinguishing feature of Tibetan is the set of separate code points for subjoined consonants, used to create consonant stacks. Of the 77 combining characters in the Tibetan block, 48 represent subjoined consonant forms. Unlike many other Indic scripts, the modern Tibetan orthography doesn't use a virama to create stacks.
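
To make the encoding model concrete, the sketch below lists the code points of the syllable shown in Figure 1, assuming it is encoded as U+0F62 U+0F92 U+0FB1 U+0F74 U+0F51 (the trailing tsheg is omitted). It uses only Python's standard `unicodedata` module. The helper name `describe` is hypothetical, and the check for U+0F84 TIBETAN MARK HALANTA is included only to illustrate that the stack is built from dedicated subjoined code points rather than from a virama.

```python
import unicodedata

# The syllable from Figure 1 (rgyud), written with code point escapes.
# Assumption: this is the encoding of the sample shown in the figure.
SYLLABLE = "\u0F62\u0F92\u0FB1\u0F74\u0F51"
HALANTA = "\u0F84"  # U+0F84 TIBETAN MARK HALANTA


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, general category, character name) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


for code_point, category, name in describe(SYLLABLE):
    print(code_point, category, name)

# Subjoined consonants are identified here by their Unicode character names.
subjoined = [ch for ch in SYLLABLE if "SUBJOINED" in unicodedata.name(ch)]
print(len(subjoined), HALANTA in SYLLABLE)
```

The root consonant and the suffix are ordinary letters, while the two subjoined consonants and the vowel sign are nonspacing combining marks that follow the root in the encoded text.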

Tibetan is an abugida with one inherent vowel. When writing the Lhasa dialect, other post-consonant vowels are represented using 4 vowel signs, all combining marks.

There are no pre-base, circumgraph, or multipart vowels in the Tibetan used to write the Lhasa dialect (though there are when writing in Sanskrit).

Standalone vowels are written by adding vowel signs to either U+0F60 TIBETAN LETTER -A or U+0F68 TIBETAN LETTER A, depending on the tone.
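
A minimal sketch of this composition follows: a standalone vowel is the carrier letter followed by a combining vowel sign. The names `CARRIERS`, `VOWEL_SIGNS`, and `standalone_vowel` are hypothetical. The four vowel signs listed are an assumption (U+0F72, U+0F74, U+0F7A, and U+0F7C), since the text above gives only their number, and the sketch does not model which carrier corresponds to which tone.

```python
import unicodedata

# Carrier letters named in the text above.
CARRIERS = {
    "-a": "\u0F60",  # U+0F60 TIBETAN LETTER -A
    "a": "\u0F68",   # U+0F68 TIBETAN LETTER A
}

# Assumption: these are the four vowel signs used for the Lhasa dialect.
VOWEL_SIGNS = {
    "i": "\u0F72",  # U+0F72 TIBETAN VOWEL SIGN I
    "u": "\u0F74",  # U+0F74 TIBETAN VOWEL SIGN U
    "e": "\u0F7A",  # U+0F7A TIBETAN VOWEL SIGN E
    "o": "\u0F7C",  # U+0F7C TIBETAN VOWEL SIGN O
}


def standalone_vowel(carrier: str, vowel: str) -> str:
    """Return the carrier letter followed by the combining vowel sign."""
    return CARRIERS[carrier] + VOWEL_SIGNS[vowel]


for carrier in CARRIERS:
    for vowel in VOWEL_SIGNS:
        text = standalone_vowel(carrier, vowel)
        print(text, [unicodedata.name(ch) for ch in text])
```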

Sanskrit vowels written in Tibetan use additional vowel signs and combining marks, some of which represent diphthongs, and some of which form circumgraphs or multipart characters, depending on the encoding.

Tone is indicated by the choice of root character and/or its associated prefixes and superscripts.

Modern Tibetan writing uses few punctuation marks or symbols, but the Tibetan script block in Unicode contains many of these.

Tibetan has its own set of numbers.

2.1 Tibetan Syllables

The following diagram shows characters in all of the syllabic positions, and lists the characters that can appear in each of the non-root locations. The two-syllable word in the example is འགྲེམས་སྟོན 'grems-ston ɖɹemton exhibition.

Picture of syllable composition.

Figure 2 Syllable composition in Tibetan

See more information about how the various parts of the tsheg-bar work together.

3. Lists, counters, etc.

Tibetan numerals can be used for list counters. They are used in a simple decimal notation, i.e. in the same way as European numerals; they differ only in shape.

༡ འ་ཞ་མི་རིགས་ཀྱིས་བསྐྲུན་པའི་ཤིང་གི་ཟམ་པ།

༢ ལོ་ངོ་800ཡི་ལོ་རྒྱུས་ལྡན་པའི་དགོན་རྙིང་ཆོས་པོ་དགོ།

༣ ཆི་ཅ་ཞེས་པའི་ཁྱིམ་རྒྱུད་ཀྱི་བང་སོའི་ཚོགས།

Figure 3 Examples of Tibetan counters in a list.

European numerals can also be used for list counters. The European numeral is followed by a period.

1. འ་ཞ་མི་རིགས་ཀྱིས་བསྐྲུན་པའི་ཤིང་གི་ཟམ་པ།

2. ལོ་ངོ་800ཡི་ལོ་རྒྱུས་ལྡན་པའི་དགོན་རྙིང་ཆོས་པོ་དགོ།

3. ཆི་ཅ་ཞེས་པའི་ཁྱིམ་རྒྱུད་ཀྱི་བང་སོའི་ཚོགས།

Figure 4 Examples of European numeral counters in a list.
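
The two counter styles described in this section can be sketched as follows. Because Tibetan digits form a plain decimal system, a counter can be produced by mapping each ASCII digit to the corresponding Tibetan digit (U+0F20 to U+0F29). The names `to_tibetan_digits` and `list_marker` are hypothetical, and the marker shapes (no suffix after a Tibetan numeral, a period after a European numeral) are assumptions taken from Figures 3 and 4 rather than a statement of required behaviour.

```python
# Map ASCII digits 0-9 to TIBETAN DIGIT ZERO..NINE (U+0F20..U+0F29).
TIBETAN_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0F20 + d) for d in range(10))
)


def to_tibetan_digits(number: int) -> str:
    """Write a non-negative integer in decimal using Tibetan digits."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    return str(number).translate(TIBETAN_DIGITS)


def list_marker(index: int, style: str = "tibetan") -> str:
    """Return a list marker in one of the two styles shown in the figures."""
    if style == "tibetan":
        return to_tibetan_digits(index)  # Figure 3: numeral only
    if style == "european":
        return f"{index}."  # Figure 4: numeral followed by a period
    raise ValueError(f"unknown style: {style}")


for i in range(1, 4):
    print(list_marker(i, "tibetan"), list_marker(i, "european"))
```

A single translation table is enough here because the two numeral systems share the same positional notation and differ only in digit shapes.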

A. Change log