
The Information System for Legal Terminology bistro

bistro is an online Information System for the research, analysis and translation of terms and texts in the Italian, German and Ladin legal and administrative language. bistro was developed by the scientific collaborators of the Institute for Specialised Communication and Multilingualism at the European Academy of Bolzano.

The name of the system recalls the atmosphere of small Parisian restaurants where food is served quickly. It is said that the bistros owe their name to Russian soldiers who reached Paris after Napoleon’s defeated army and used to shout Bystro! Bystro! (Quick! Quick!) to the French waiters in order to be served more speedily. Hence the name of the restaurants and of the bistro Information System that allows the user to perform searches and find complete and useful information very rapidly.

bistro consists of:

  • A data base containing about 50,000 terms of the Italian legal system (terms in the Italian original language and in their German and Ladin translations) as well as of the Austrian, German and Swiss federal systems. bistro also ensures access to the German-language terminology approved for South Tyrol by the Terminology Commission (TerKom) on the basis of the comparative and linguistic research work carried out by the collaborators of the Institute for Specialised Communication and Multilingualism.
  • A bilingual corpus (CATEx), which is a collection of Italian-language legal texts with the corresponding German translations. Among other texts, the CATEx contains the Italian civil code, the civil procedure code, the Italian insolvency law, the consolidated tax code and a collection of local legislation from the Autonomous Province of Bolzano/Bozen.
  • A trilingual corpus (CLE), which is a collection of administrative documents in the Italian, German and Ladin languages. The documents and laws were provided by the municipalities of the Ladin valleys Badia and Gardena, the Ladin Pedagogical Institute and the Office for Language Issues of the Autonomous Province of Bolzano/Bozen.
  • A term recognition tool, which automatically connects the contents of a document or webpage loaded onto the bistro platform by the user to the bistro data base and thus allows the user to access existing translations.
  • A term extraction tool, which makes it possible to automatically generate lists of term candidates from within webpages or documents indicated by the user. These glossaries can serve as a starting point for further comparative terminology research.

The bistro Information System is freely available online.


CATEx corpus

This corpus was built within the CATEx (Computer Assisted Terminology Extraction) project. It is a collection of Italian legal texts with their translations into German. The entire corpus is available online via the bistro Information System. The texts are part of the so-called Blaue Reihe, a series of translations of the main Italian legal codes into German. A collection of local bilingual laws is also part of the corpus.


Contents

  • Civil code (Codice civile - Italienisches Zivilgesetzbuch)
  • Civil procedure code (Codice di procedura civile - Italienische Zivilprozeßordnung)
  • Penal procedure code (Codice di procedura penale - Italienische Strafprozeßordnung)
  • Administrative procedures (Processo amministrativo - Der Italienische Verwaltungsprozess)
  • Insolvency code (Fallimento ed altre procedure concorsuali - Italienisches Konkursrecht und andere Insolvenzverfahren)
  • Italian notary code (Ordinamento del notariato italiano - Italienische Notariatsordnung)
  • Consolidated tax code (Testo Unico delle Imposte sui Redditi - Einheitstext der Steuern auf das Einkommen)
  • Subsidiary laws to the Civil code (Leggi complementari al codice civile - Nebengesetze zum Italienischen Zivilgesetzbuch)
  • Local legislation of the Autonomous Province of Bolzano/Bozen

The corpus contains approximately 5 million words in Italian and German.

The two language versions of each text - in the two official languages of the province, Italian and German - are aligned at paragraph level. This means that the user can access the equivalent text excerpt in the other language for every single paragraph within the corpus.
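
Paragraph-level alignment can be pictured as a list of records that each hold one Italian paragraph and its German counterpart. The following Python sketch is only an illustration under assumptions: the record layout, the `counterpart` helper and the sample sentences are invented here and do not describe how bistro stores or serves the corpus.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AlignedParagraph:
    text_id: str   # hypothetical identifier of a text in the corpus
    number: int    # position of the paragraph within that text
    italian: str
    german: str


def counterpart(corpus: List[AlignedParagraph], text_id: str,
                number: int, source_lang: str) -> Optional[str]:
    """Return the same paragraph in the other language, or None if absent."""
    if source_lang not in ("it", "de"):
        raise ValueError("source_lang must be 'it' or 'de'")
    for paragraph in corpus:
        if paragraph.text_id == text_id and paragraph.number == number:
            return paragraph.german if source_lang == "it" else paragraph.italian
    return None


# Invented sample data, not taken from the real corpus.
corpus = [
    AlignedParagraph("demo", 1,
                     "Il contratto è l'accordo di due o più parti.",
                     "Der Vertrag ist die Einigung von zwei oder mehr Parteien."),
    AlignedParagraph("demo", 2,
                     "Il debitore risponde per colpa grave.",
                     "Der Schuldner haftet für grobe Fahrlässigkeit."),
]

print(counterpart(corpus, "demo", 2, "it"))
```

Because every record carries both language versions, the lookup works in either direction by changing `source_lang`.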


Purpose

The CATEx is a tool for linguistic and terminological analysis in general. More specifically, it allows the user to compare the two languages rapidly and with great flexibility. A dedicated user interface makes it possible to carry out detailed and specific searches within the corpus. The user can look up single-word or multiword expressions as well as entire collocations, such as delitto, colpa grave or dolo o colpa grave. Combined searches are also possible. For example, this option can find all instances of colpa translated as Fahrlässigkeit rather than Schuld.
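
A combined search of this kind can be approximated as a filter over aligned paragraph pairs. The sketch below is a minimal illustration, assuming the pairs are available as (Italian, German) tuples; the function names and the sample sentences are invented and do not reflect the actual bistro interface. Whole-word matching is used so that Schuld does not match inside Schuldner; inflected forms are not handled.

```python
import re
from typing import List, Tuple


def contains_word(text: str, word: str) -> bool:
    """True if `word` occurs in `text` as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE) is not None


def combined_search(pairs: List[Tuple[str, str]], source_term: str,
                    wanted: str, unwanted: str) -> List[Tuple[str, str]]:
    """Keep pairs whose Italian side has `source_term` and whose German
    side has `wanted` but not `unwanted`."""
    return [
        (italian, german)
        for italian, german in pairs
        if contains_word(italian, source_term)
        and contains_word(german, wanted)
        and not contains_word(german, unwanted)
    ]


# Invented sample pairs, not taken from the real corpus.
pairs = [
    ("Il danno è dovuto a colpa grave.",
     "Der Schaden beruht auf grober Fahrlässigkeit."),
    ("La colpa del debitore è presunta.",
     "Die Schuld des Schuldners wird vermutet."),
    ("Il delitto è punito dalla legge.",
     "Das Verbrechen wird vom Gesetz bestraft."),
]

hits = combined_search(pairs, "colpa", "Fahrlässigkeit", "Schuld")
for italian, german in hits:
    print(italian, "|", german)
```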


Further readings

The CATEx project was coordinated by Johann Gamper. For further information read the following articles:


CLE corpus

The Ladin corpus of the EURAC (CLE) was built within the project on Ladin legal and administrative terminology TermLad II. It consists of a collection of administrative texts in electronic format, which were provided by the municipalities and public bodies of the Province of Bolzano/Bozen. Every single text (mostly with Italian as source language) was linked to the equivalent translations into Ladin and German, i.e. the texts were aligned at paragraph/sentence level. The corpus contains about 5000 documents (8.5 million words) in German, Italian and Ladin.


Purpose

The CLE corpus is mainly used for the elaboration of terminology entries. Terms, definitions and contexts contained in the corpus are extracted for this purpose. Different translations and their frequency can be analysed. The CLE corpus is freely accessible to the public via a dedicated user interface. The user can search the corpus for German, Italian and/or Ladin terms.


Text collection

All the texts contained in the CLE were provided by the Municipalities of the Ladin valleys Val Badia and Val Gardena, by the Office for Language Issues of the Autonomous Province of Bolzano/Bozen and by the Ladin Pedagogical Institute (IPL).


Further readings

Streiter, O./ Stuflesser, M./ Ties, I.: CLE, an Aligned, Tri-lingual Ladin-Italian-German Corpus. Corpus Design and Interface. LREC 2004, workshop on ‘First Steps for Language Documentation of Minority Languages: Computational Linguistic Tools for Morphology, Lexicon and Corpus Compilation’ Lisbon, 24th May 2004.


Language tools

The language tools available in bistro are the term extraction and term recognition tools. Both aim at speeding up and supporting terminology work.


Users can copy/paste a text of their choice (accepted formats are *.doc, *.rtf or *.txt) or an internet address (URL) into the interface of the term recognition tool. bistro will then process the text and highlight all terms that are already contained in the bistro terminology data base. For each highlighted term, bistro gives direct access to the full terminology entry.

This tool allows the user to

  • check the presence of a term within the bistro terminology data base;
  • access definitions, contexts or translations rapidly;
  • find the legal subfield which a term belongs to (administrative law, penal law, etc.).
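
The idea behind term recognition can be sketched as a lookup of known terms in running text. The Python example below is a simplified illustration, not the bistro implementation: the tiny term base, its subfield labels and the bracket notation for highlighting are all invented for this sketch. Longer terms are tried first so that a multiword term such as colpa grave is preferred over the shorter colpa.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini term base (term -> legal subfield), invented for illustration.
TERM_BASE: Dict[str, str] = {
    "colpa grave": "civil law",
    "colpa": "civil law",
    "delitto": "penal law",
}


def recognise_terms(text: str, term_base: Dict[str, str]) -> Tuple[str, List[str]]:
    """Mark known terms with square brackets and list them in order of appearance."""
    if not term_base:
        return text, []
    # Longest terms first, so multiword terms win over their parts.
    ordered = sorted(term_base, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    found: List[str] = []

    def mark(match: "re.Match[str]") -> str:
        found.append(match.group(1).lower())
        return f"[{match.group(1)}]"

    return pattern.sub(mark, text), found


marked, found = recognise_terms("Il delitto fu commesso con colpa grave.", TERM_BASE)
print(marked)
for term in found:
    print(term, "->", TERM_BASE[term])
```

In a real system each recognised term would link to its full terminology entry; here the dictionary value merely stands in for that entry.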

The term extraction tool can be used in exactly the same way. In this case bistro finds potential ‘term candidates’ in the texts indicated by the user and drafts a list (a glossary) that can be used for further terminology research.
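
The text does not say how bistro selects term candidates. As a rough illustration of what candidate extraction can look like, the following Python sketch counts single words and two-word sequences that neither begin nor end with a stopword and keeps those that occur at least twice. The stopword list, the thresholds and the sample text are assumptions made for this example only.

```python
import re
from collections import Counter
from typing import List, Tuple

# Small, assumed Italian stopword list; a real tool would use a fuller one.
STOPWORDS = {"il", "la", "di", "e", "o", "per", "con", "a", "in", "del", "della", "è"}


def extract_candidates(text: str, max_len: int = 2,
                       min_freq: int = 2) -> List[Tuple[str, int]]:
    """Return (candidate, frequency) pairs for word sequences up to `max_len`."""
    tokens = re.findall(r"\w+", text.lower())
    counts: Counter = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip sequences that start or end with a function word.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


# Invented sample text, not taken from the real corpus.
sample = ("La colpa grave esclude il risarcimento. "
          "La colpa grave del debitore è presunta.")
candidates = extract_candidates(sample)
for term, freq in candidates:
    print(freq, term)
```

The resulting list is only a starting point: a terminologist would still have to decide which candidates are genuine terms.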