Workshop on Natural Language Processing of Minority Languages with Few Computational Linguistic Resources

Workshop to be held at TALN 2003, Traitement Automatique des Langues Naturelles (Natural Language Processing)
VVF, Batz-sur-Mer (44), France, June 11-14, 2003

Organized by IRIN (Computer Science Institute of the University of Nantes) in collaboration with the laboratories ACIDORE and VALORIA of the University of Bretagne-Sud and IRISA, INRIA, Rennes.
http://www.sciences.univ-nantes.fr/irin/taln2003/

BACKGROUND

Over the last few years, minority and small languages have attracted considerable attention. Projects aimed at their revitalization, standardization and linguistic normalization have been launched to promote the use of these languages and contribute to their survival. Speakers of smaller languages have become aware that their languages belong to the world's cultural heritage, and they are increasingly inclined to use their native tongues on a broader scale. The rising number of web pages in lesser-used languages demonstrates this.

PROBLEM DESCRIPTION

This workshop will approach the problem of minority languages from a computational point of view. It will focus on minority languages with few computational linguistic resources, e.g. Occitan, Hakka, Corsican and Nahuatl, including specific minority languages such as sign languages. Minority languages with rich computational linguistic resources, such as Catalan, are not excluded from the workshop, as they may serve as examples of successful minority languages. Papers on majority languages are also welcome if the languages treated face problems similar to those of minority languages.

The goal of the workshop is to give an overview of activities, methodologies and achievements in the area of Natural Language Processing of Minority Languages, in order to promote research in this area and to enhance the prestige associated with it.

Automatic processing of minority languages has to overcome a number of difficulties arising from their special status.

* As these languages have few speakers, there are few native linguists and even fewer computational linguists. Rule-based approaches to tagging, parsing, etc. may thus be difficult to apply.
* The scarce financial support these languages receive also seems to virtually exclude rule-based approaches, given the amount of human labor such approaches generally require. This problem might be overcome if computational frameworks developed for other languages can be adopted.
* Corpus-based approaches are only applicable if adequate corpora are available. However, creating a corpus is time-consuming and costly and requires a linguistically sound design, especially if general-purpose corpora are to be created.
* Example-based approaches seem more promising in this light, since they require specific examples rather than general-purpose corpora. Compiling specific examples also seems easier than writing formal rules. However, little is known about the feasibility of this paradigm for minority languages.
* Shallow knowledge techniques that exploit a specific property of a language or a language family may be developed, and some are already in use. This, however, may hamper the transfer of the approach to other languages. Some techniques might work with analytic languages but not with agglutinative languages, etc. Different writing systems might also prevent a simple approach from being transferred to another language.

The workshop is expected to stimulate research in this area. We invite papers concerned with, but not restricted to, the following topics.

TOPICS OF INTEREST

* the relation between NLP and minority language support in general,
* development of specific NLP applications for minority languages, e.g. tagging, morphological analysis, parsing, information retrieval, machine translation,
* development of corpora and machine-readable dictionaries for minority languages,
* presentation of shallow knowledge NLP techniques which could be applied to minority languages,
* overview studies describing the state of the art of NLP for the minority languages of a country, a region or a language type,
* comparative analysis of different NLP approaches to different minority languages and language types,
* free resources for NLP, their application areas and limitations,
* the requirements for NLP applications for special minority language groups.

PROGRAM COMMITTEE

Shin-Hsi Chen, National Taiwan University, hh_chen@csie.ntu.edu.tw
Vitelio Herrera, Union Latine, Direction Terminologia et Industries de la Langue, Paris, v.herrera@unilat.org
Leonid Iomdin, Academia Nauk Moscow, Laboratory of Computer Linguistics, iomdin@iitp.ru
Harold Somers, Centre for Computational Linguistics, UMIST, Harold.Somers@umist.ac.uk
Oliver Streiter, EURAC, European Academy, Language & Law, ostreiter@eurac.edu
Mathias Stuflesser, SPELL, Servisc de Planificazion y Elaborazion dl Lingaz Ladin, spell-mathias@ladinia.net
Leonhard Voltmer, EURAC, European Academy, Minorities, lvoltmer@eurac.edu
Wolfgang Wölck, University at Buffalo, SUNY, wwolck@acsu.buffalo.edu

IMPORTANT DATES

19.3.2003 Submission deadline
31.3.2003 Notification of acceptance
28.4.2003 Camera-ready version

SUBMISSION FORMAT

Submissions should not exceed 10 pages in 12-point Times, everything included. For more detailed information in French, see:
http://www.sciences.univ-nantes.fr/irin/taln2003/page/taln_appel.html

Style files can be downloaded from the following addresses:
LaTeX French: http://www.sciences.univ-nantes.fr/irin/taln2003/doc/StyleLatexTaln03_FR.tgz
LaTeX English: http://www.sciences.univ-nantes.fr/irin/taln2003/doc/StyleLatexTaln03_EN.tgz
Word French: http://www.sciences.univ-nantes.fr/irin/taln2003/doc/ModeleTaln2003_FR.dot
Word English: http://www.sciences.univ-nantes.fr/irin/taln2003/doc/ModeleTaln2003_EN.dot

CONTACT ADDRESS

The contact address for submissions and further information about the workshop is:

Oliver Streiter
European Academy, Language and Law
e-mail: ostreiter@eurac.edu
tel: +39 0471 055 115
fax: +39 0471 055 199