Unicode CLDR Project

CLDR Mission

To build and maintain the most trusted and comprehensive repository of locale data, reflecting common usage across the world, through active participation from organizations and community members.

What is CLDR?

CLDR (Common Locale Data Repository) supplies key information and structures critical for programs and operating systems around the world to ensure that they feel natural, no matter which language users speak or where they live.

For example, imagine looking at a list of files on your mobile phone. You’ll see dates (like the creation date), numbers, units (like the size of the file), and the files listed in alphabetical order. The format of each of these varies depending on your language, and all of them are usually supplied by CLDR.

Just as Unicode has standards for handling characters, writing systems, and their properties, CLDR is focused on languages and their regional variations (collectively referred to as locales). Over 100 languages are supported, with more added each release.

CLDR consists of three main components:

  1. A curated collection of structured data used by implementations
  2. A specification, UTS #35: Unicode Locale Data Markup Language (LDML), documenting the structure and usage of that data (via defined algorithms), including conformance requirements and guidelines
  3. Code used to collect that data from language specialists, guide those specialists in supplying the data, verify the validity and consistency, and process it into different formats for use by software developers

Formatting dates, numbers, currencies, and units of measurement is far more complicated across different languages and regions than most people recognize. Part of the goal of CLDR is to provide the foundation for APIs that handle that complexity without developers needing to know about 100+ languages. It is the source for enabling software that needs to support languages ranging from Arabic to Zulu.

CLDR continues to add features each year, such as support for more complex grammatical and cultural variations needed in many countries. Among many other things, it also describes how plurals work in various languages and how the alphabetical ordering of lists varies. CLDR data and standards are vetted by native speakers and linguistic experts, and validated by Unicode’s diverse membership.
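To make the point about plurals concrete, here is a minimal Python sketch of how plural categories differ between two languages. It is an illustration only: the function names and the message table are hypothetical, the rules are hand-written simplifications of the CLDR cardinal rules restricted to non-negative integers (fractions and other operands are ignored), and real software would normally get these rules from a CLDR-based library rather than hard-coding them.

```python
# Simplified cardinal plural rules for non-negative integers only.
# These functions are hypothetical helpers written for illustration;
# they are not part of CLDR or of any library.

def plural_category_en(n: int) -> str:
    """English: 'one' for exactly 1, otherwise 'other'."""
    return "one" if n == 1 else "other"


def plural_category_pl(n: int) -> str:
    """Polish: 'one', 'few', or 'many' for integer counts."""
    if n == 1:
        return "one"
    # 2-4, 22-24, 32-34, ... but not 12-14
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    # Every remaining integer falls into 'many'.
    return "many"


# Hypothetical message table keyed by plural category.
POLISH_FILES = {"one": "{n} plik", "few": "{n} pliki", "many": "{n} plików"}


def format_file_count_pl(n: int) -> str:
    """Pick the Polish message that matches the plural category of n."""
    return POLISH_FILES[plural_category_pl(n)].format(n=n)


if __name__ == "__main__":
    for count in (1, 2, 5, 12, 22):
        print(format_file_count_pl(count))
```

English needs only two forms here, while Polish needs three for whole numbers, which is why a message catalog keyed by plural category is more robust than an "add an s" rule.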

The standards, data, and algorithms that make up CLDR provide the basis for international language support and cultural adaptation of software on all manner of devices worldwide, with support for over 100 distinct languages.

Who uses CLDR?

CLDR is incorporated into all modern operating systems and browsers; into many programming languages such as Java, C#, .NET, Swift, JavaScript; and into most application programs. Often the usage is indirect: an application uses an operating system service (e.g., to format a date), which calls an ICU library (Unicode’s production code for C, C++, Java, and Rust), which then uses CLDR. There are other libraries for other programming languages, such as Babel (Python), TwitterCLDR (Ruby), and Unicode::CLDR (Perl). Some CLDR data is used more directly. For example, the emoji short names and search keywords often form the basis for character pickers in applications and virtual keyboards.

Some of the companies and organizations that use CLDR are:

Other projects consume cldr-json directly; see here for a list.

How to Use?

Most developers will use CLDR indirectly, via a set of software libraries, such as ICU, Closure, or TwitterCLDR. These libraries typically compile the CLDR data into a format that is compact and easy for the library to load and use.

For those interested in the source CLDR data, it is available for each release in the XML format specified by UTS #35: Unicode Locale Data Markup Language (LDML). There are also tools that will convert to JSON and POSIX format. For more information, see CLDR Releases/Downloads.
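As a rough illustration of working with the XML source data, the Python sketch below reads month names from a small LDML-style fragment using only the standard library. The fragment is a hand-written sample that follows the general element layout of LDML locale files; it is not copied from a CLDR release, and the function name is hypothetical. Real locale files are much larger and rely on inheritance between locales, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of an LDML locale file (assumption:
# trimmed to three months for brevity; not taken from a CLDR release).
LDML_SAMPLE = """\
<ldml>
  <identity>
    <language type="de"/>
  </identity>
  <dates>
    <calendars>
      <calendar type="gregorian">
        <months>
          <monthContext type="format">
            <monthWidth type="wide">
              <month type="1">Januar</month>
              <month type="2">Februar</month>
              <month type="3">März</month>
            </monthWidth>
          </monthContext>
        </months>
      </calendar>
    </calendars>
  </dates>
</ldml>
"""


def wide_month_names(ldml_text: str) -> dict:
    """Return {month number: wide month name} for the Gregorian calendar."""
    root = ET.fromstring(ldml_text)
    path = (
        "dates/calendars/calendar[@type='gregorian']/months/"
        "monthContext[@type='format']/monthWidth[@type='wide']/month"
    )
    return {int(m.get("type")): m.text for m in root.findall(path)}


if __name__ == "__main__":
    language = ET.fromstring(LDML_SAMPLE).find("identity/language").get("type")
    print(language, wide_month_names(LDML_SAMPLE))
```

Reading the XML directly like this is mainly useful for inspection or custom tooling; for formatting in an application, a library that already packages the data is usually the simpler route.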

How to Contribute?

CLDR is a collaborative project that benefits from people joining and contributing. There are multiple ways to contribute to CLDR.

Translations and other language data

CLDR has an online tool for gathering data, the Survey Tool. The Survey Tool is usually open once a year to gather data for new structure and to make corrections to previously released data.

Code and Structure

The CLDR tooling supports the interactive Survey Tool, plus all of the tooling necessary to test and process the release. Programmers interested in contributing to the tooling are welcome; they may also be interested in contributing to ICU, which uses CLDR data. For more information, see Development.

CLDR covers many different types of data, but not everything. For projects which may cover other types of data, see Other Projects.

Tickets

People may file tickets with bug fixes or feature requests. Once a ticket is approved, they can also create pull requests on GitHub.

Who has contributed?

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

What is the Schedule?

CLDR has a regular schedule, with two cycles per year. The release schedule is consistent from year to year so that implementations can plan ahead. The actual dates for each phase are adjusted somewhat for each release: in particular, the dates will usually fall on Wednesdays, and may change for holidays.

The two important periods for translators are:

The details for the current release are found in Current CLDR Cycle.