BAMAKO FORUM







A Multilingual Namespace for the Cyberspace

Un Nommage Multilingue pour le Cyberespace



Panel on Stakes related to the promotion of
Multilingualism in the Cyberspace

Panel sur les enjeux de la promotion du
Multilinguisme dans le Cyberespace



FORUM INTERNATIONAL DE BAMAKO SUR LE MULTILINGUISME

BAMAKO INTERNATIONAL FORUM ON MULTILINGUALISM

19-21 January 2009, Bamako, Mali


Francis MUGUET

Legal Notice: Observations and opinions expressed in this presentation, as usual in any scholarly presentation, do not represent the official view of any institution, coalition or entity.





WSIS - SMSI

WSIS recommendations :
53. (Tunis Agenda) We commit to working earnestly towards multilingualization of the Internet, as part of a multilateral, transparent and democratic process, involving governments and all stakeholders, in their respective roles.
In this context, we also support local content development, translation and adaptation, digital archives, and diverse forms of digital and traditional media, and recognize that these activities can also strengthen local and indigenous communities.
We would therefore underline the need to:
Advance the process for the introduction of multilingualism in a number of areas including domain names, e-mail addresses and keyword look-up.
Implement programmes that allow for the presence of multilingual domain names and content on the Internet and the use of various software models in order to fight against the linguistic digital divide and to ensure the participation of all in the emerging new society.
Strengthen cooperation between relevant bodies for the further development of technical standards and to foster their global deployment.







INTRODUCTION



Multilingualism is one of the key aspects of the new Information Society.

Multilingual tools are required to access multilingual content;
otherwise the approach is incoherent.
People have a right to express themselves in their mother tongue.

Linguistic diversity is the key to creativity, because one thinks in a language, and different languages lead to a richness of concepts.

Linguistic diversity is the key to peace, because it teaches understanding of other cultures.

So let's start by reviewing, with some humor, the current situation.

Let us not be afraid of technical aspects.

Power is in the Code...






PUNYshment for Domain Names



  • For Internationalized Domain Names (IDN), the solution proposed by ICANN is based on Punycode:
  • Punycode transforms a Unicode string (in general supplied as UTF-8) into an ASCII string in a unique and reversible way. ASCII characters stay unchanged, and non-ASCII characters are represented by ASCII characters. For example:
  • académie-française.org might give xn--acadmie-franaise-npb1a.org.
  • http://русский.idn.icann.org gives http://xn--h1acbxfam.idn.icann.org
  • This approach looks like a patch. However, patches conceived as quick and easy fixes to a specific problem (IDN for a web browser) often end up in overly complicated and intractable developments, unable to provide general solutions (mail, file transfer, etc.).
  • Unexpected problems such as the Funy Code are now appearing.
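The transformation described above can be tried with Python's standard library. This is only a minimal sketch: the built-in "idna" codec follows the older IDNA 2003 rules (an assumption about which IDNA version is wanted), and the helper names to_ascii and to_unicode are illustrative, not part of any standard.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible form, label by label."""
    # The "idna" codec applies Nameprep, then Punycode, and adds the "xn--" prefix
    # to every label that contained non-ASCII characters.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Reverse the transformation: ASCII-compatible form back to Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    for name in ("académie-française.org", "русский.idn.icann.org"):
        ace = to_ascii(name)
        # Print only the ASCII form and whether the round trip is lossless.
        print(ace, to_unicode(ace) == name)
```

Pure ASCII names pass through unchanged, which is why the scheme could be deployed without touching the existing DNS servers.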








and PUNYshed from Mail ?


  • In order to send a mail to secrétaire@académie-française.org, or to иван@русский.рф, the problem becomes more complicated, because secrétaire and иван are encoded in UTF-8, whereas académie-française and русский.рф are encoded in Punycode.

  • The protocol to solve this issue was finalized only recently, in September 2008 (RFC 5335, RFC 5336 and RFC 5337).
  • One may ask a simple question: would it not be possible to conceive a homogeneous, coherent, multilingual UTF-8 DNS system?
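The mixed encoding inside a single mail address can be made visible with a short sketch. The function name describe_address and the dictionary keys are hypothetical, and the "idna" codec (IDNA 2003) is assumed for the domain part; this does not implement the mail RFCs themselves.

```python
def describe_address(address: str) -> dict:
    """Show the two encodings that coexist in one internationalized mail address."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    return {
        # The mailbox name travels as raw UTF-8 bytes.
        "local_utf8": local.encode("utf-8"),
        # The domain travels as ASCII labels produced by Punycode.
        "domain_ace": domain.encode("idna").decode("ascii"),
    }


if __name__ == "__main__":
    parts = describe_address("иван@русский.рф")
    print(len(parts["local_utf8"]), "UTF-8 bytes for the local part")
    print(parts["domain_ace"])
```

The two halves of the address end up in different representations, which is the lack of homogeneity the question above points at.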






BIND : the Key Software

  • The spotlight has so far been only on the control of the DNS root databases.
  • Left in the dark: the software tools used to access the DNS databases.
  • The actual subnetwork of DNS servers is neither owned by nor under contract with ICANN; the DNS servers are voluntarily maintained by users (mostly ISPs, web hosting companies, some registrars, ...).
  • Almost all machines in this subnetwork run the free software (BSD-style ISC licence) rather aptly called BIND, which is maintained by the Internet Systems Consortium (ISC).
  • BIND 9 strives for strict compliance with IETF standards, i.e. with the Requests for Comments (RFCs) established by the Internet Engineering Task Force (IETF), but this is not yet fully achieved.
  • There are a few other DNS server implementations (see a comparison on Wikipedia), but most often they follow BIND's features.







BIND the Key Software (II)

The ISC T-shirt is rather amusing :







Implementing Net4D :
BIND, as a PUBLIC RESOURCE



Probably thanks to its academic authorship, BIND fortunately allows different resolving services, related to different classes of network, to be carried.

2.1.3 Resource Records : The data associated with domain names are contained in resource records, or RRs. Records are divided into classes, each of which pertains to a type of network or software. Currently, there are classes for internets (any TCP/IP-based internet), networks based on the Chaosnet protocols, and networks that use Hesiod software. (Chaosnet is an old network of largely historic significance.) The internet class is by far the most popular. (We're not really sure if anyone still uses the Chaosnet class, and use of the Hesiod class is mostly confined to MIT.)

This possibility has been mostly ignored, except for the proposal made by John C. Klensin for a new class that is not limited to ASCII from its initial definition. This would have allowed a cleaner Internationalized Domain Name system, instead of relying on the patch that Punycode constitutes. However, the seamless implementation of such a two-class system, where records of a new class are used as remedies to the shortcomings of the class "IN", would have created technical difficulties. These problems should not occur when starting with only one class, conceived from the outset for internationalization.







The RFC 2929

  • CLASS is a two octet unsigned integer containing one of the RR CLASS codes. See section 3.2.
  • DNS CLASSes have been little used but constitute another dimension of the DNS distributed database. In particular, there is no necessary relationship between the name space or root servers for one CLASS and those for another CLASS. The same name can have completely different meanings in different CLASSes although the label types are the same and the null label is usable only as root in every CLASS. However, as global networking and DNS have evolved, the IN, or Internet, CLASS has dominated DNS use.
  • There are two subcategories of DNS CLASSes: normal data containing classes and QCLASSes that are only meaningful in queries or updates.
  • The current CLASS assignments and considerations for future assignments are as follows:
  • Decimal Hexadecimal
    0 0x0000 - assignment requires an IETF Standards Action.
    1 0x0001 - Internet (IN).
    2 0x0002 - available for assignment by IETF Consensus as a data CLASS.
    3 0x0003 - Chaos (CH) [Moon 1981].
    4 0x0004 - Hesiod (HS) [Dyer 1987].
    5 - 127 0x0005 - 0x007F - available for assignment by IETF Consensus as data CLASSes only.
    128 - 253 0x0080 - 0x00FD - available for assignment by IETF Consensus as QCLASSes only.
    254 0x00FE - QCLASS None [RFC 2136].
    255 0x00FF - QCLASS Any [RFC 1035].
    256 - 32767 0x0100 - 0x7FFF - assigned by IETF Consensus.
    32768 - 65279 0x8000 - 0xFEFF - assigned based on Specification Required as defined in [RFC 2434].
    65280 - 65534 0xFF00 - 0xFFFE - Private Use.
    65535 0xFFFF - can only be assigned by an IETF Standards Action.
    This leaves the possibility of at least 32512 classes (the Specification Required range) and at most 2^16 - 5 = 65536 - 5 = 65531 classes (all values except the five taken by the IN, CH, HS, None and Any classes), among which 255 are for private use, that could be used to carry other DNS services, if the QCLASS / CLASS distinction is not retained.
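The counts quoted above can be checked with a few lines of arithmetic. This sketch takes the hexadecimal bounds of the table as authoritative; the variable and function names are illustrative only.

```python
# (first, last, description) for each CLASS range, using the hexadecimal bounds.
CLASS_RANGES = [
    (0x0000, 0x0000, "requires an IETF Standards Action"),
    (0x0001, 0x0001, "Internet (IN)"),
    (0x0002, 0x0002, "available as a data CLASS"),
    (0x0003, 0x0003, "Chaos (CH)"),
    (0x0004, 0x0004, "Hesiod (HS)"),
    (0x0005, 0x007F, "data CLASSes, IETF Consensus"),
    (0x0080, 0x00FD, "QCLASSes, IETF Consensus"),
    (0x00FE, 0x00FE, "QCLASS None"),
    (0x00FF, 0x00FF, "QCLASS Any"),
    (0x0100, 0x7FFF, "IETF Consensus"),
    (0x8000, 0xFEFF, "Specification Required"),
    (0xFF00, 0xFFFE, "Private Use"),
    (0xFFFF, 0xFFFF, "requires an IETF Standards Action"),
]


def size(description: str) -> int:
    """Number of CLASS values in every range carrying this description."""
    return sum(last - first + 1 for first, last, d in CLASS_RANGES if d == description)


total = sum(last - first + 1 for first, last, _ in CLASS_RANGES)  # whole 16-bit space
taken = 5                                                          # IN, CH, HS, None, Any
specification_required = size("Specification Required")
private_use = size("Private Use")
upper_bound = total - taken

if __name__ == "__main__":
    print(total, specification_required, private_use, upper_bound)
```

The ranges tile the 16-bit space exactly, so the lower figure is the size of the Specification Required range and the upper figure is everything not already taken.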







More about Classes
  • In fact, in May 2001, ICANN recommended the use of classes instead of the so-called "alternate root servers":
  • Moreover, it should be noted that the original design of the DNS provides a facility for future extensions that accommodates the possibility of safely deploying multiple roots on the public Internet for experimental and other purposes. As noted in RFC 1034, the DNS includes a "class" tag on each resource record, which allows resource records of different classes to be distinguished even though they are commingled on the public Internet. For resource records within the standard root-server system, this class tag is set to "IN"; other values have been standardized for particular uses, including 255 possible values designated for "private use" that are particularly suited to experimentation.
  • As described in a recent proposal within the IETF, this "class" facility allows an alternative DNS namespace to be operated from different root servers in a manner that does not interfere with the stable operation of the existing authoritative root-server system. Those that have deployed alternative roots have not used a different class designation, however, choosing instead to have their resource records masquerade as emanating from the standard root, and creating the potential for disruption of other's operations.
  • Another view is that the actual subnetwork of DNS servers (in fact a P2P network, before the term was coined) should be able to carry several DNS systems, in other words to "unbundle" the "lines" of this "common carrier" in order to introduce "competition".







TECHNICAL IMPLEMENTATION ASPECTS

  • At the present time, only the DNS look-up utilities associated with BIND (such as dig and host) can query a DNS server with the class field set. Current browsers do not allow that. With dig, the default query class (IN for Internet) is overridden by the -c option, whose argument is any valid class, such as HS for Hesiod records or CH for Chaosnet records.
  • From the point of view of users: what is of utmost importance, and what differentiates this proposal from the "alternate root servers" of ill repute, is that users do not have to specify a DNS server different from the one provided by their ISP. It is enough that:
    1. the user's browser can query the DNS server with the class field;
    2. the ISP's DNS server is updated to a future version of BIND that can carry different classes.
  • In order to simplify queries by browsers and other programs (mail, file transfer, etc.), the definition of a URL (URI) must be generalized and updated with a field provided for the class parameter.
  • For example : the domain wikipedia in the new gTLD .open with the class net4d could become http://4d%fr.wikipedia.open
  • Another example: the Cyrillic domain Москва in the Cyrillic gTLD ро, in the Cyrillic class Сеть (network in Russian), could be written as http://ст%Москва.ро .
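What a class-aware client would have to do can be sketched in two steps: split the proposed class%host form, then place the class code in the QCLASS field of an ordinary DNS question. Everything beyond the standard codes IN, CH and HS is an assumption here: the numeric codes for "4d" and "ст" are hypothetical private-use values, sending labels as raw UTF-8 is the hypothesis of the proposal and not current DNS practice, and the function names are illustrative.

```python
import struct

# IN, CH and HS are the standard codes; "4d" and "ст" are hypothetical
# private-use values chosen only for this illustration.
CLASS_CODES = {"in": 1, "ch": 3, "hs": 4, "4d": 0xFF00, "ст": 0xFF01}


def split_class(host: str, default: str = "in") -> tuple:
    """Split the proposed 'class%host' form into (class, host)."""
    if "%" in host:
        cls, _, name = host.partition("%")
        return cls.lower(), name
    return default, host


def build_query(name: str, qclass: int, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message whose question carries the given class."""
    # Header: id, flags (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")  # assumption: UTF-8 labels in the new class
        if not 0 < len(raw) <= 63:
            raise ValueError("each label must be 1 to 63 octets long")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"  # the null label marks the root
    # QTYPE (1 = A record) and QCLASS are two-octet unsigned integers.
    return header + qname + struct.pack("!HH", qtype, qclass)


if __name__ == "__main__":
    cls, host = split_class("4d%fr.wikipedia.open")
    packet = build_query(host, CLASS_CODES[cls])
    print(cls, host, len(packet), packet[-2:].hex())
```

Only the last two octets of the question differ from a classic IN query, which is why the change on the client side is small.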







TECHNICAL IMPLEMENTATION ASPECTS (II)

  • The thorny problem of homographs, and the ensuing phishing and litigation, is avoided. For example, it could be ensured that in the Cyrillic class Сеть, domain names are written only with Cyrillic characters. Mixing Latin and Cyrillic is simply not allowed when registering a domain name in the Cyrillic class Сеть.
  • Users may configure their browser to use by default a certain class, corresponding to a certain language, to avoid any confusion.
  • For a browser (such as Mozilla) to query DNS servers with the class field, the modification required is relatively minor, and less complicated than the implementation of Punycode, which is TLD-dependent (Mozilla IDN-enabled TLDs). The browser must simply include the class field in its query. After this modification, there is no need for a plug-in or a specific client. With old browsers, users may go through a portal specific to each class.
  • On the side of a web server such as Apache, the modifications needed to enable virtual hosting are also relatively minor.
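The single-script rule for a class could be enforced at registration time with a check along these lines. This is a sketch under stated assumptions: the script is detected from the Unicode character name, ASCII digits and the hyphen are allowed in every class, and the function name is_single_script is hypothetical.

```python
import unicodedata

ALWAYS_ALLOWED = set("0123456789-")  # assumption: digits and hyphen are script-neutral


def is_single_script(label: str, script: str = "CYRILLIC") -> bool:
    """Return True if every letter of the label belongs to the given script."""
    if not label:
        return False
    for ch in label:
        if ch in ALWAYS_ALLOWED:
            continue
        # Unicode names start with the script, e.g. "CYRILLIC CAPITAL LETTER EM".
        if not unicodedata.name(ch, "").startswith(script):
            return False
    return True


if __name__ == "__main__":
    print(is_single_script("Москва"))                      # all Cyrillic letters
    print(is_single_script("M" + "осква"))                 # Latin M mixed in
```

A label that mixes a Latin letter with visually identical Cyrillic ones is rejected, which is what removes the homograph risk inside the class.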






TECHNICAL IMPLEMENTATION ASPECTS (III)

  • Features available for the class IN should in principle be available for other classes, in agreement with the RFCs, but RFC compliance is not yet fully achieved in the current version 9 of BIND. For all practical purposes, the difficulty depends on how the BIND 9 code is written. While "parametrization" of existing subroutines should be preferred, a brute-force method is to replicate the code for a new class.
  • ISC is in the process of rewriting BIND from scratch for version 10, in a modular way, that would greatly simplify the task.
  • The modifications required in the free software Mozilla and Apache can be written quickly and, most importantly, included easily in the next official release. For proprietary software, this is going to depend on the good will of manufacturers; however, considering the domination of Apache among web servers and the importance of Mozilla among web browsers, one should expect that they cannot afford to lag behind.
  • Last but not least, each class may use its own distinct network of root servers.
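The idea of one root network per class amounts to a resolver keeping separate root hints for each class. A minimal sketch follows; the class name "4D" and its address are hypothetical (192.0.2.0/24 is reserved for documentation), while 198.41.0.4 is the address of a.root-servers.net in the IN class.

```python
# Root hints per class. Only the IN entry is real; the "4D" entry is a placeholder.
ROOT_HINTS = {
    "IN": ["198.41.0.4"],   # a.root-servers.net
    "4D": ["192.0.2.53"],   # hypothetical root server of a hypothetical class
}


def roots_for(dns_class: str) -> list:
    """Return the root servers a resolver should start from for this class."""
    try:
        return ROOT_HINTS[dns_class.upper()]
    except KeyError:
        raise LookupError(f"no root servers configured for class {dns_class!r}") from None


if __name__ == "__main__":
    print(roots_for("in"))
    print(roots_for("4d"))
```

Because the lookup is keyed on the class, queries in one class never reach the roots of another, unlike alternate roots that reuse the class IN.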




Net4D, networks to empower
the second generation of the Web: the Semantic Web

  • Net4D classes are further classes of network. As with Hesiod and Chaosnet, ICANN has no jurisdiction over these networks, only over the class "IN".
  • There could be other classes in competition with the ICANN IN class and the Net4D classes. Fair and ethical competition is welcome.
  • Net4D classes are not designed to provide the same minimal services as ICANN; they are intended to provide value-added services, with a view to allowing linguistic diversity and empowering the Semantic Web.
  • Net4D domain holders should abide by a specific ontology, as a contractual requirement, with the aim of:
  • Establishing pollution-free zones for metadata, and providing a pathway for the interoperability of metadata concerning specific activities, following the Semantic Web approach.
  • Providing an Open Digital Resource Identifier system, which is clearly needed for the future evolution of the Web and to authenticate metadata.
  • Providing Open Digital Resource Identifiers (ODRI) that are P2P-friendly, that is, that facilitate a maximal flow via P2P, therefore allowing sites with little bandwidth to exchange vast amounts of data.





Empowering the Semantic Web for Multilingualism


It is proposed to create Linguistic SWgTLDs, or LSWgTLDs. An extension shall be assigned to each language according to its three-letter ISO 639-2 code, so that sites, or versions of sites, written in specific languages can be easily found and identified. This would greatly facilitate the task of search engines and would foster linguistic diversity.
The main points of the breakthrough are the following :
  1. Facilitate exchange between sites in different languages that share the same structure of meta-data, the same meta-language.
  2. Help automatic translation.






A more efficient Automatic Translation


Automatic translation would be much improved if automatic tools could work with several human-certified translations of the same text.
For example, if the same document has been made available in English and in French by the authors on the same site, and translated by human users into Russian and Korean on other sites, it would be a tremendous advantage for automatic translation tools to have access to, and make use of, all existing versions of the same document in different languages.
For example, "Société Civile" would be translated into yet other languages, such as Spanish, as "Sociedad Civil" (meaning Civil Society, not Civil Company or Business!), with the help of the correct English version.
Of course, the translation tools must be able to retrieve and identify the various versions at different locations; hence the need for an identifier, as well as for standardized metadata. SWgTLDs could be the keys to practical, not just elite, multilingualism on the Web.
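How a translation tool might locate all versions of one document can be sketched with a small index keyed by a shared identifier. Everything here is hypothetical: the identifier format, the host names and the class VersionIndex are invented for illustration; only the ISO 639-2 codes themselves are standard.

```python
from collections import defaultdict

# ISO 639-2 three-letter codes for the languages mentioned in the example.
ISO_639_2 = {"English": "eng", "French": "fra", "Russian": "rus",
             "Korean": "kor", "Spanish": "spa"}


def linguistic_host(site: str, language: str) -> str:
    """Build a host name under the extension assigned to the language."""
    return f"{site}.{ISO_639_2[language]}"


class VersionIndex:
    """Group the known language versions of a document under one identifier."""

    def __init__(self):
        self._versions = defaultdict(dict)

    def register(self, doc_id: str, language: str, url: str) -> None:
        self._versions[doc_id][ISO_639_2[language]] = url

    def versions(self, doc_id: str) -> dict:
        """Return a copy of {language code: location} for the document."""
        return dict(self._versions.get(doc_id, {}))


if __name__ == "__main__":
    index = VersionIndex()
    doc = "odri:example-document-1"  # hypothetical identifier
    for lang in ("English", "French", "Russian", "Korean"):
        index.register(doc, lang, "http://" + linguistic_host("example", lang) + "/doc1")
    print(sorted(index.versions(doc)))
```

With such an index, a tool translating into Spanish could consult the English and French versions side by side before choosing a rendering.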







Saving Language Diversity through Automatic Translation





Language Diversity without interfacing/translating = Intellectual Isolation

Language Diversity with interfacing/translating = Intellectual Enrichment

People would be more eager to keep a language alive if they knew they were going to be able to communicate with other linguistic groups.





GOVERNANCE

  • The recommendations of the Dynamic Coalition for Linguistic Diversity ( www.linguis.tk ) at the Second IGF meeting 2007 ( 12 - 15 November 2007 Rio de Janeiro, Brazil )
    The coalition for linguistic diversity
    Draws attention to the need of assisting in an institutional, technical and financial way, structures that are working in the field of linguistic diversity
    Raises awareness of Internet technical and political governance bodies about the necessity of working in close cooperation with the structures promoting linguistic diversity.
    Emphasizes that this cooperation should be multi-stakeholder, open and transparent.








A NEW GOVERNANCE ?

  • If the current monopolistic DNS situation no longer prevails, and if fair competition is introduced across all classes, then it becomes possible to let ICANN evolve towards its own destiny. This would avoid international political tensions.
  • Concerning Net4D and other classes governance, it is suggested to consider a transparent, inclusive, multi-stakeholder partnership, including intergovernmental and governmental organizations, technical operators, businesses, academia, civil society, fully recognized within an international public law context, according to the UNMSP proposal.
  • The role of the W3C that researches and develops, for the public good, open (non-proprietary) standards, protocols and languages for the Semantic Web should be recognized, and a substantial part of financial revenues, originating from the sales of semantic domains, should be allocated to support W3C activities.
  • The role of MAAYA, whose goal is to promote multilingualism with a multi-disciplinary approach, should be recognized, and a substantial part of the financial revenues originating from the sales of linguistic domains should be allocated to support MAAYA activities.





CONCLUSIONS


  • DNS 1.0 --> Monopoly : ICANN, Web 1.0 HTML, US parentage, English only.

  • DNS 2.0 --> Open competition including, inter alia, Net4D, Semantic Web.

  • An open, coherent approach to linguistic diversity

  • What should be the recommendations of the Linguistic Diversity Coalition following the IGF Hyderabad meeting ?