TERMINAE User Manual - V12-1
Contents
1. [Figure 6.5: Add occurrence for a term. Dialog with the fields Term (CRS), Document ID (0), Sentence ID and Occurrence text, and the buttons OK and Cancel]
• Removing adjectives: suppresses the terms that are tagged as adjectives.
• Removing numbers: suppresses the terms that are numbers.
• Removing adverbs: suppresses the terms that are tagged as adverbs.
• Removing terms from its frequency: suppresses the terms whose frequency is less than a given number (for example 0).
6.4.4 Terminological form actions
This menu is used to define terminological forms, which are described in the next chapter.
• New terminological form: creates a terminological form for the selected term. Once the terminological form is created, it can be viewed in the Terminae Terminological level step 2 perspective, which is opened automatically, and the lexical unit for which the form has been created is displayed in blue in the Lexical units view of the Terminae Terminological level step 1 perspective.
• To terminological form: displays the terminological form of the selected terminological unit, if it has one. This action automatically switches from the Terminae Terminological level step 1 perspective to the Terminae Terminological level step 2 perspective.
Chapter 7 Terminae Terminological level step 2 perspective
This perspe
2. [Screenshot of the Terminae Terminological level step 2 perspective: the Terminological form list on the left; on the right, the Lexical information view (fields Term extractor Yatea, form, grammatical type, NE extractor Gate), the Variants view, the Relations view (Syntactical relations with Head and Modifier; Terminological relations) and the Term occurrences view, showing occurrence occ8343 (doc 0, sent 58), a sentence defining "Airbag assembly"]
3. [Figure 7.1: Terminae Terminological level step 2 perspective]
• The Lexical information view is a form in which you can freely create, modify or delete fields. By default, four lexical fields are defined: Term extractor Yatea, whose value is X if the terminological unit has been extracted by the YaTeA term extractor; form, which gives its canonical form; grammatical type, which gives its grammatical category; and NE extractor Gate, whose value is the entity type if the lexical unit is a recognised named entity. The first three fields are filled in automatically with information provided by YaTeA. The last one comes from ANNIE (Gate).
• The Variants view lists all the lexical forms that are associated as variants with the canonical form. They can be found in the corpus or added manually.
• The Relations view presents the relations that the terminological unit has. The Syntactical relations list shows the phrases to which it belongs, either as a head or as a modifier. The syntactical information is provided by the YaTeA analysis of the corpus. The Terminological relations list shows its terminological relationships. In the current version of the TERMINAE platform the terminological relations have to be filled manua
4. left hand tree view. No annotation sets nor annotations will be shown until annotations are selected in the annotation sets; the Default set is indicated only with an unlabelled right arrowhead, which must be selected in order to make the available annotations visible. Open the default annotation set and select some of the annotations to see what the ANNIE application has done. Having selected an annotation type in the annotation sets view, hovering over an annotation in the main resource viewer or right-clicking on it will bring up a popup box containing a list of the annotations associated with it, from which one can select an annotation to view in the annotation editor or, if there is only one, the annotation editor for that annotation.
Now, to save your corpus annotated with ANNIE, right-click on a document in the resources tree and choose Save as XML. In addition, all documents in a corpus can be saved as individual XML files into a directory by right-clicking on the corpus in the resources tree and choosing the option Save as XML.
For French corpora, you have to install TreeTagger and load the Tagger Framework plugin. In the resource directory you will find TreeTagger FR Tokenization gapp. Load this application in the Gate platform. Also load the Lang_French plugin and the french.gapp Gate application. The selected processing resources are shown in the figure.
11.8 Gate named entity type file
The DTD of the XML file which contains name
5. occurrence to be removed.
• Create a terminological form: creates a terminological form from a termino-concept. This functionality is useful when you want to add terminological information and occurrences to an existing thesaurus. You start from an existing termino-concept and create a terminological form using a defined corpus.
• Create all terminological forms: creates all the terminological forms from a pre-existing thesaurus. This functionality is useful when you want to add terminological information and occurrences to an existing thesaurus. You start from an existing thesaurus and create a terminological form for each termino-concept using a defined corpus.
8.3.3 Feature management submenu
This submenu offers various actions related to the detailed information provided for a given termino-concept and recorded in its termino-conceptual form.
• Add a synonym: adds a synonym to the selected termino-concept. A dialog window opens for entering the new synonym. If the corresponding terminological unit has been found by YaTeA or ANNIE, its occurrences are automatically clustered with those of the current termino-concept.
• Remove a synonym: removes a synonym. You have to confirm whether you also want to remove the related occurrences.
• Add a link: adds a type of link and its value.
• Remove a link: removes a type of link and its value.
8.3.4 Neon ontology submenu
Th
6. [Figure 6.3: Visualisation of terms and named entities. Screenshot of the Terminae Terminological level step 1 perspective, with the Lexical units view on the left and the Occurrences view on the right]
• Visualize all terms: redisplays the list of terminological units after a search sequence.
• Find a ter
7. 2 perspective
7.1 Perspective overview
7.2 Data: Terminological forms
7.3 Terminological actions menu
7.3.1 Termino-concept management submenu
7.3.2 Form management submenu
7.3.3 Feature management submenu
8 Terminae TerminoConceptual level perspective
8.1 Perspective overview
8.2 Data: Termino-conceptual forms
8.3 TerminoConceptual actions menu
8.3.1 File submenu
8.3.2 Termino-concept management submenu
8.3.3 Feature management submenu
8.3.4 Neon ontology submenu
9 Neon toolkit Conceptual level OWL perspective
9.1 Perspective overview
9.2 Terminae links menu
10 Annotator perspective
10.1 Input files
10.2 How to proceed
10.3 Some caveats
11.1 XML backup DTD for terms
11.2 XML backup DTD for ENs
11.3 EnsLexUnit DTD
11.4 Thesaurus DTD
11.5 TreeTagger English Tagset
11.6 TreeTagger French Tagset
11.7 Use ANNIE to extract named entities
11.8 Gate named entity type file
Chapter 1 Introduction
This document describes the functionalities of the TERMINAE platform, which is an Eclipse application. Chapter 2 gives a very short insight into the methodology. Ch
8. A)>
<!ELEMENT LIST_EN (NAMED_ENTITY)>
<!ATTLIST LIST_EN numeroDocument CDATA #REQUIRED>
<!ELEMENT LIST_OCCURRENCES (OCCURRENCE*)>
<!ELEMENT LIST_SENT (SENT*)>
<!ELEMENT LIST_TERM_CANDIDATES (TERM_CANDIDATE)>
<!ELEMENT List_Variants (Variant)>
<!ELEMENT MORPHOSYNTACTIC_FEATURES (SYNTACTIC_CATEGORY)>
<!ELEMENT NAMED_ENTITY (Ens_Variants | ID | LEMMA | LIST_OCCURRENCES | LIST_SENT | NUMBER_OCCURRENCES | Types)*>
<!ELEMENT NUMBER_OCCURRENCES (#PCDATA)>
<!ELEMENT OCCURRENCE (ID, DOC, SENTENCE, START_POSITION, END_POSITION, Texte)>
<!ELEMENT SENT EMPTY>
<!ATTLIST SENT ID CDATA #REQUIRED>
<!ELEMENT SENTENCE (#PCDATA)>
<!ELEMENT START_POSITION (#PCDATA)>
<!ELEMENT SYNTACTIC_CATEGORY (#PCDATA)>
<!ELEMENT TERM_CANDIDATE (ID, LEMMA, NUMBER_OCCURRENCES, LIST_OCCURRENCES, FORM, MORPHOSYNTACTIC_FEATURES, List_Variants, NAMED_ENTITY)>
<!ELEMENT TERM_EXTRACTION_RESULTS (LIST_TERM_CANDIDATES, LIST_EN)>
<!ELEMENT Texte (#PCDATA)>
<!ELEMENT Types (type+)>
<!ELEMENT Variant (#PCDATA)>
<!ELEMENT type (#PCDATA)>
11.4 Thesaurus DTD
The DTD of the XML file which contains a thesaurus, which is visualized in the Terminae TerminoConceptual level perspective. A thesaurus contains a collection of terminoconcepts. Each terminoconcept is described by an ID, a natural language definition, corpus oc
9. AE assumes that the acquisition corpus has been processed by TreeTagger. YaTeA takes as input:
• A tagged corpus (required)
• A list of terms extracted from it as input (required)
6.2 Data: Terminological files
When you open the Terminological level perspective, you have to specify the term extractor used (see figure). You have three choices:
• Term list (see 6.2.4)
• TermoStat (see 6.2.1)
• Yatea (see 6.2.2)
You may also want to work with named entities (see 6.2.3).
6.2.1 TermoStat Term files
First you have to specify the terminological data you want to start with (note that additional data can be loaded afterwards):
• Load a term list: load the TermoStat file, which is supposed to be located in the repExtractTerm subdirectory of your project.
• Select the tagged corpus from which the terms have been extracted (tt file, .ttr). It is supposed to be located in the corpora subdirectory of your project.
• Select the corpus file (.txt). It is supposed to be located in the corpora subdirectory of your project.
• Specify the corpus language: English (en) or French (fr).
When the terminological data is loaded, TERMINAE creates one additional file in the corpora directory:
• fTempCorpus2XML.xml, which is an XML version of the corpus
If you have several documents, each one must be processed by TreeTagger and the results must
10. 8.3.2 Termino-concept management submenu
• Create termino concept: creates a new termino-concept. You have to type in the name of the termino-concept if it is not created directly from a terminological unit.
• Remove termino concept: removes the selected termino-concept. You have to confirm the removal.
• Rename termino concept: changes the name of the selected termino-concept.
• Add kindOf link: gives a father to the selected termino-concept. A dialog window opens in which you have to give the name of the father termino-concept.
• Remove kindOf link: removes a father of the selected termino-concept.
• Add a RTC: adds a termino-concept relation for the selected termino-concept. A first dialog window opens in which you have to give the name of the relation. A second dialog window opens in which you have to click on OK if the selected termino-concept is the domain, and on Cancel if not. A third dialog window opens in which you have to give the name of the range or of the domain, depending on the previous answer. That termino-concept must already exist. A choice dialog window then opens in which you have to select the SKOS type of the relation.
• Remove a RTC: removes the selected termino-conceptual relation.
• Add occurrence: adds an occurrence to the selected termino-concept.
• Remove occurrence: removes an occurrence of the selected termino-concept. You have to select the identifier of the
11. • Remove a syntactical relation modifier: removes the selected relation.
• Add a terminological relation: adds a terminological relation where the selected term is term1 or term2.
• Remove a terminological relation: removes a terminological relation.
• Add an occurrence: adds an occurrence to the selected term. You have to specify the document identifier and type in the text of the occurrence.
• Remove an occurrence: removes an occurrence of the selected term. Select the occurrence that has to be removed.
Chapter 8 Terminae TerminoConceptual level perspective
This perspective must be opened from the Perspective submenu in the main menu by selecting Terminae TerminoConceptual level.
8.1 Perspective overview
The presentation of the Terminae TerminoConceptual level perspective is very similar to that of the Terminae Terminological level step 2 perspective. It is composed of two main parts, with a global view on the left and a set of more detailed and dependent views on the right (see Figure 8.1).
• The TerminoConcept tree view is by default presented on the left part of the perspective. It shows the hierarchy of all the termino-concepts that have been created.
• The other views form the termino-conceptual form of the termino-concept that has been selected in the TerminoConcept tree (see Section 8.2).
Note th
12. MENT List_Lemme EMPTY>
<!ELEMENT List_Variants EMPTY>
<!ELEMENT NAMED_ENTITY (ID, LEMMA, FORM, List_Variants, Types, NUMBER_OCCURRENCES, LIST_OCCURRENCES, LIST_SENT)>
<!ELEMENT NUMBER_OCCURRENCES (#PCDATA)>
<!ELEMENT OCCURRENCE (ID, DOC, SENTENCE, START_POSITION, END_POSITION, Texte)>
<!ELEMENT SENT (ID, offset, phrase, List_Lemme)>
<!ELEMENT SENTENCE (#PCDATA)>
<!ELEMENT START_POSITION (#PCDATA)>
<!ELEMENT Texte (#PCDATA)>
<!ELEMENT Types (type)>
<!ELEMENT offset (#PCDATA)>
<!ELEMENT phrase (#PCDATA)>
<!ELEMENT type (#PCDATA)>
11.3 EnsLexUnit DTD
The DTD of the XML file which contains terms, named entities and their occurrences, which is visualized in the Terminae Terminological level step 1 perspective.
<!ELEMENT DOC (#PCDATA)>
<!ELEMENT END_POSITION (#PCDATA)>
<!ELEMENT Ens_Variants EMPTY>
<!ELEMENT FORM (#PCDATA)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT LEMMA (#PCDAT
13. NCE)>
<!ELEMENT LIST_TERM_CANDIDATES (TERM_CANDIDATE)>
<!ELEMENT List_Variants (Variant)>
<!ELEMENT MORPHOSYNTACTIC_FEATURES (SYNTACTIC_CATEGORY)>
<!ELEMENT SENTENCE (#PCDATA)>
<!ELEMENT START_POSITION (#PCDATA)>
<!ELEMENT SYNTACTIC_CATEGORY (#PCDATA)>
<!ELEMENT TERM_CANDIDATE (ID, LEMMA, FORM, List_Variants, NUMBER_OCCURRENCES, LIST_OCCURRENCES, MORPHOSYNTACTIC_FEATURES)>
<!ELEMENT TERM_EXTRACTION_RESULTS (LIST_TERM_CANDIDATES)>
<!ELEMENT Texte (#PCDATA)>
<!ELEMENT Variant (#PCDATA)>
11.2 XML backup DTD for ENs
The DTD of the XML file which contains named entities and their occurrences, which is visualized in the Terminae Terminological level step 1 perspective.
<!ELEMENT DOC (#PCDATA)>
<!ELEMENT END_POSITION (#PCDATA)>
<!ELEMENT FORM EMPTY>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT LEMMA (#PCDATA)>
<!ELEMENT LIST_EN (NAMED_ENTITY)>
<!ELEMENT LIST_OCCURRENCES (OCCURRENCE*)>
<!ELEMENT LIST_SENT (SENT*)>
<!ELE
14. [Figure 10.1: The Annotator window. The left pane has Browse fields for the document text file, the POS (morpho-syntax) tt file, the thesaurus rdf file and the ontology owl files; the right pane is the Annotated text view]
10.3 Some caveats
The document must be in text format, so PDF and other rich formats have to be converted. The same encoding must be used in the three files where non-ASCII characters may appear (text, POS and SKOS). UTF-8 is proposed by default, but other encodings can work too. Because of the diversity of operating systems and source files, encoding may need some care.
When debugging anomalies: if the text and POS files are not consistent with each other, the result is scope errors and missed annotations; if the SKOS and POS files are not consistent, the result is missed annotations.
Sentence splitting and word splitting are provided by the POS tagger. Depending on it, sentence borders may be internally incorrect (e.g. because titles have no end point), but the output exactly preserves the appearance of the input (white space, line length, blank lines). Some typography may be ambiguous w.r.t. word splitting (e.g. the upper middle class), and we have had a version of a POS tagger which blows out y, with some poor effects on the annotation. The lexicalization of the ontolo
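The encoding caveat above can be checked before launching the Annotator. The following is a minimal sketch, not part of TERMINAE: the helper names `decodes_as` and `undecodable_files` are hypothetical, and the check only tests whether each file decodes cleanly with one chosen encoding (UTF-8 by default, as the manual proposes).

```python
from pathlib import Path


def decodes_as(path, encoding="utf-8"):
    """Return True if the whole file decodes cleanly with the given encoding."""
    try:
        Path(path).read_bytes().decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def undecodable_files(paths, encoding="utf-8"):
    """Return the paths (as strings) that do not decode with the encoding."""
    return [str(p) for p in paths if not decodes_as(p, encoding)]
```

Calling `undecodable_files` on the text, POS and SKOS files with the same encoding gives an empty list when all three can be read with it. A clean decode does not prove that the files were produced from the same source, only that they share a readable encoding.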
15. TERMINAE User Manual v12.1
Sylvie Szulman (Paris 13), with contributions from Adeline Nazarenko (Paris 13)
October 2012
Abstract
TERMINAE is a platform that assists users in designing termino-ontological resources from texts. It can be used by terminologists to build terminological forms, and by knowledge engineers to build either thesauri expressed in SKOS or ontologies that organise concepts and lexical units in a formal way and support inferences. The platform makes it possible to link textual elements to terminological and conceptual resources. The acquisition corpus may contain one or several documents. The supported languages are English and French.
Keyword list: Ontology acquisition terminology assisting tool
Executive Summary
This document is the user guide of TERMINAE. TERMINAE is a platform that assists users in the design of termino-ontological resources from texts. It is used to build from texts:
• thesauri expressed in SKOS, and
• ontologies that organise in a formal way the concepts associated with the terms and that support inferences.
The platform makes it possible to link textual elements to terminological and conceptual resources. The corpus may contain one or several documents. The supported languages are English and French.
TERMINAE is organised in three main levels: the first step of the terminological level makes it possible to build the set of terms of the corpus; its second step organises these according to lexical and syntactic re
16. age the Terminae application.
• The file Terminae contains the name of the current project. It is created in the directory where you launch the Terminae application. You normally do not need to modify it.
• The file nameOfProject.xcfg defines the configuration of each project (the set of files used by the project). Experienced users can easily understand its content and may occasionally need to change it in tricky cases, e.g. when renaming directories or files.
These files are text files or editable XML files.
Chapter 4 Main menu
Figure 4.1 presents the main menu of the TERMINAE platform, which is accessible from any perspective. It presents 4 items, which are associated with specific actions or submenus.
[Figure 4.1: Main menu, with the items Terminae project actions, Perspectives, Show View and help]
• The action submenu gives access to the functionalities specific to the Terminae level where you are currently working. The name of the action menu depends on the current perspective: Terminae project actions, Linguistics actions, Terminological actions, TerminoConceptual actions and Terminae links.
• The Perspectives item opens new perspectives: you simply click on the name of the perspective you want to open in the perspective list that appears. 6 perspectives are accessible: Annotator perspective (see Section 10); Terminae Project perspective, which is the default perspective, which is
17. apter 3 gives the technical characteristics and the installation instructions. Chapter 4 presents the main menu, and the following chapters (from chapter 5 onwards) introduce the 6 perspectives of the platform and the related functionalities.
Chapter 2 The Terminae method
TERMINAE is a tool that is supported by a method, and a few short words on the method can help in using the tool. The task is to build a domain termino-ontological resource (thesaurus or ontology). This is an expert task, since it requires deciding which concepts are really important for the domain and how they are related. Experience has shown that linguistic tools relying on domain-specific texts can help the expert. They do not do the work in his or her place, but they offer a good starting point to improve the coverage of the domain, and some of the ambiguities they raise reveal real and previously unseen ambiguities of the domain vocabulary.
The TERMINAE method starts from the linguistic results produced by a term extractor (Yatea, TermoStat). It then has three steps:
• At the linguistic level, the input is a list of term candidates, i.e. words or groups of words which, on a linguistic basis, could possibly figure in a terminology of the domain (a list of its main terms). The goal of this level is, in a first step (chapter 6), to build, clean and improve the list by removing parasitic or irrelevant proposals. A second step (chapter 7) involves grouping those which are morphologic v
18. ariants of the same term and collecting linguistic relations. This work relies on the list of occurrences of each term, which are gathered with linguistic information in terminological forms.
• The termino-conceptual level (chapter 8) is specific to TERMINAE. Whereas terms are at the vocabulary level, the goal is now to analyse the use of terms in the corpus at the semantic level. The work is to recognize the various senses of a term and distribute them into several termino-concepts, also distributing the occurrences of the term between senses. At the same time, the termino-concepts of the form can be tagged as having a synonym in another form or as being otherwise more loosely related.
• The ontological level (see chapter 9) now relies on termino-concepts and their relations to build the ontology. First, synonym termino-concepts should yield only one concept. All the related termino-concepts help in building the hierarchical relations and defining the roles, as can some other linguistic information gathered during the process.
Chapter 3 Technical Characteristics
• The current version of the TERMINAE platform is compiled using the Sun Java 1.6 virtual machine.
• It relies on UTF-8 text encoding.
• It can be used for English and French.
3.1 Installation
To install TERMINAE you need Java version 1.6. Download the version of the platform for your system from the http://lipn.univ-paris13.fr/terminae/in
19. at you can find a termino-concept simply by typing its first letter in the TerminoConcept tree view.
8.2 Data: Termino-conceptual forms
The termino-conceptual level is a bridge between the terminological level and the conceptual level (the ontology). It is made of a set of termino-concepts, which are themselves described by termino-conceptual forms gathering the relevant information that has been collected or defined for those termino-concepts.
[Screenshot of the Terminae TerminoConceptual level perspective for the project TestDemo: the TerminoConcept tree on the left; the TerminoConcept features, Occurrences and termino-concept relation views on the right]
20. ate the corpus to work on it, you can call for ANNIE. From the File menu, select Load ANNIE System. To run it in its default state, choose with Defaults. This will automatically load all the ANNIE resources and create a corpus pipeline called ANNIE, with the correct resources selected in the right order and the default input and output annotation sets.
Footnote: JAPE is a Java Annotation Patterns Engine. It provides finite state transduction over annotations based on regular expressions. JAPE allows you to recognise regular expressions in annotations on documents.
If without Defaults is selected, the same processing resources will be loaded, but a popup window will appear for each resource, which enables the user to specify a name, location and other parameters for the resource. This is exactly the same procedure as for loading a processing resource individually, the difference being that the system automatically selects those resources contained within ANNIE. When the resources have been loaded, a corpus pipeline called ANNIE will be created as before.
The next step is to add a corpus and select this corpus from the drop-down corpus menu in the Serial Application editor. Finally, click on Run from the Serial Application editor, or right-click on the application name in the resources pane and select Run. To view the results, double-click on one of the documents contained in the corpus processed in the
21. be concatenated in a single file, where the various initial documents are separated by a document tag as shown below:
Text n TAB Document TAB n
where TAB is the tabulation character and n varies between 0 and x-1, x being the total number of documents.
6.2.2 Yatea Term files
First you have to specify the terminological data you want to start with (note that additional data can be loaded afterwards):
• Load a term list: load the Yatea file, which is supposed to be located in the repExtractTerm subdirectory of your project.
• Indicate how many documents your corpus contains. Note that documents are numbered starting from 1 if there are several of them, but that a single document has number 0.
• Select the tagged corpus from which the terms have been extracted (tt file, .ttr). It is supposed to be located in the corpora subdirectory of your project.
• Specify the corpus language: English (en) or French (fr).
When the terminological data is loaded, TERMINAE creates two additional files in the corpora directory:
• fTempCorpus2XML.xml, which is an XML version of the corpus
• fTempTT2XML.xml, which is an XML version of the tagged corpus
Be aware that there is a bug in the Yatea result file if you use the treetagger french utf8 command: the result is encoded in UTF-8, but the encoding attribute of the first line has the value ISO-8859-1. You have to modify the encoding at
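The encoding mismatch described for the Yatea result file can be corrected by hand in a text editor, or with a small script. The sketch below is an illustration under assumptions, not a TERMINAE or YaTeA feature: the function name `fix_declared_encoding` is hypothetical, and it assumes the XML declaration sits on the first line of the file, as the manual states. It rewrites only the declared encoding and leaves the (already UTF-8) content untouched.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration.
_ENCODING_ATTR = re.compile(rb'''(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2''')


def fix_declared_encoding(data: bytes, encoding: str = "UTF-8") -> bytes:
    """Rewrite the encoding declared on the first line; keep the rest as is."""
    first, sep, rest = data.partition(b"\n")
    first = _ENCODING_ATTR.sub(
        # Keep the original prefix and quote style, swap only the value.
        lambda m: m.group(1) + m.group(2) + encoding.encode("ascii") + m.group(2),
        first,
        count=1,
    )
    return first + sep + rest
```

A typical use would read the result file in binary mode, pass the bytes through `fix_declared_encoding`, and write them back. Files without an XML declaration are returned unchanged.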
22. ctive can be opened either by creating a terminological form or from the main Perspective menu (Terminological level step 2).
7.1 Perspective overview
The Terminae Terminological level step 2 perspective is composed of two main parts, with a global view on the left and a set of more detailed and dependent views on the right (see Figure 7.1).
• The Terminological form list view is by default presented on the left part of the perspective. It lists all the canonical terminological units for which a terminological form has been created (the form can be In progress or Completed).
• The other views form the terminological form of the unit that has been selected in the Terminological form list (see Section 7.2).
Note that when the list of terminological forms is selected, you can find any terminological form by typing the first letter of its canonical terminological unit.
7.2 Data: Terminological forms
An example of a terminological form is displayed on the right part of Figure 7.1. A terminological form gathers all the lexical and terminological information that has been collected or manually added for a given term or named entity. It is usually composed of the following views:
23. cument itself in a single text (.txt) file
• The output of a morphological analyzer and POS tagger, in three tab-separated columns: word, POS, lemma
• A lexicalization file following the SKOS standard, such as the one provided by TERMINAE when it builds an ontology. This file can also be created or modified with a plain text editor. Its DTD is defined in the annex (see 11.6).
• One or several ontologies in OWL format
10.2 How to proceed
The ontologies and their lexicalization can generally be reused for several documents. The POS file is of course document-dependent and must be generated before annotating. When the Annotator perspective is opened, it assumes that the directory of your project is the defined workspace and that the files' encoding is UTF-8. A window then opens (see Figure 10.1) with four fields in the left pane (Used resources) and a blank right pane entitled Annotated text view. Browse in the four left-pane fields for the files which have been prepared. Then run the annotation by clicking on the button with a triangle at the bottom of this pane. The annotated text appears in the right pane. You can check it and, if satisfied, save it: two buttons at the top of the right pane allow you to save either a project which SemEx can use, or only two files describing the annotations in two different formats. Then, if you continue annotating more files for SemEx, you can store the new results in the same project or create a fresh one.
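To make the expected POS input concrete, here is a minimal sketch of a reader for the three tab-separated columns (word, POS, lemma). The names `Token` and `parse_pos_lines` are hypothetical and not part of TERMINAE. The strict three-column check is an assumption of this sketch: a tagger output that contains other kinds of lines would need a looser rule.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    lemma: str


def parse_pos_lines(lines: Iterable[str]) -> List[Token]:
    """Parse tagger output lines of the form word<TAB>POS<TAB>lemma."""
    tokens = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {number}: expected 3 tab-separated columns, got {len(fields)}"
            )
        tokens.append(Token(*fields))
    return tokens
```

For example, the line `airbag<TAB>NN<TAB>airbag` becomes `Token(word="airbag", pos="NN", lemma="airbag")`. Running such a check on the POS file before annotating can surface malformed lines early.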
24. [Figure 6.4: Select an occurrence identifier]
• Add occurrence for a term: enters a new occurrence for a term. You have to select a term and fill in the form (see Figure 6.5).
• Remove occurrence for a term: removes an occurrence of a term. You have to select the identifier of the occurrence you want to remove.
6.4.3 Cleaning submenu
This menu cleans up the list of terminological units by removing a certain category of terms or named entities. Various options are offered:
• Remove terms listed in a file: suppresses all the terminological units that are listed in a given file. You have to give the name of that file, in which the stop words are listed one per line.
• Remove terms involving given characters: cleans the list of terminological units on a character basis. You have to type in the list of forbidden characters.
• Remove single character terms: suppresses the single-character terms from the list of terminological units.
• Fill the form. The document number and text occurrence fields are required
25. currences a prefLabel a set of see also a set of synonyms altLabel a set of children and its father lt ELEMENT DOC PCDATA lt ELEMENT END_POSITION lt ELEMENT lt ELEMENT ID PCDATA lt ELEMENT NL_Definition PCI EnsTerminoConcepts gt gt DATA gt name PCDATA gt TerminoConcept gt CHAPTER 11 ANNEX 45 lt ELEMENT OCCURRENCE ID DOC SENTENCE START POSITION END POSITION Texte lt ELEMENT PrefLabel PCDATA gt lt ELEMENT RelationRTC name domain range Skos_type gt lt ELEMENT SENTENCE PCDATA gt lt ELEMENT START POSITION PCDATA gt lt ELEMENT See also PCDATA gt lt ELEMENT SetRIC RelationRTC gt lt ELEMENT Skos type PCDATA gt lt ELEMENT Synonym PCDATA gt lt ELEMENT TerminoConcept ID NL Definition OCCURRENCE PrefLabel See also SetRTC Synonym children fathers x gt lt ELEMENT Texte PCDATA gt lt ELEMENT child PCDATA gt lt ELEMENT children childx gt lt ELEMENT domain PCDATA gt lt ELEMENT father PCDATA gt lt ELEMENT fathers father gt lt ELEMENT name PCDATA gt lt ELEMENT range PCDATA gt 11 5 TreeTagger Eng
26. d entity type file, which is used when loading named entities (see 6.2.3):

<?xml version="1.0" encoding="UTF-8"?>
<ensTypeEn>
  <typeEn>Organization</typeEn>
  <typeEn>Date</typeEn>
  <typeEn>Person</typeEn>
  <typeEn>Percent</typeEn>
  <typeEn>Location</typeEn>
  <typeEn>Money</typeEn>
  <typeEn>Title</typeEn>
  <typeEn>Address</typeEn>
  <typeEn>Unknown</typeEn>
  <typeEn>Jobtitle</typeEn>
  <typeEn>FirstPerson</typeEn>
  <typeEn>Location</typeEn>
  <typeEn>UrlPre</typeEn>
</ensTypeEn>

[Figure 11.1: Selected processing resources]
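To check a named entity type file before loading it, one could read it with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and assumes the layout shown above (a root `ensTypeEn` element containing `typeEn` children). The function name `read_entity_types` is hypothetical, and dropping duplicate entries is a choice made here for convenience.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same layout as the named entity type file.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<ensTypeEn>
  <typeEn>Organization</typeEn>
  <typeEn>Person</typeEn>
  <typeEn>Location</typeEn>
  <typeEn>Location</typeEn>
</ensTypeEn>
"""


def read_entity_types(xml_text):
    """Return the named entity types listed in the file, without duplicates,
    in their order of first appearance."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    types = []
    for element in root.iter("typeEn"):
        name = (element.text or "").strip()
        if name and name not in types:
            types.append(name)
    return types


entity_types = read_entity_types(SAMPLE)
```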
27. dex php Download web page and unzip the downloaded file. The default language is English, but it can be changed. If you want to work with a French platform, edit the terminae.ini file and change the line -nl en to -nl fr_FR. This file is located in the Terminae directory on Linux and Windows systems, and in the Terminae.app/Contents/MacOS directory on MacOS systems.

3.2 How to start

To launch the TERMINAE platform, click on the Terminae application: Terminae on Linux, Terminae.exe on Windows, or Terminae.app on MacOS. Initially, the project management perspective (Terminae Project perspective) is open and you have to import or create a project.

3.2.1 Project location and structure

In any case, you have to define your project directory. On Linux and Windows systems, it is advisable to place it in the workspace directory created by the Eclipse application. A project has a fixed structure made up of the following 6 subdirectories:
• corpora: contains the corpus data (raw and tagged) and the results of named entity recognition tools. The current version of the platform is designed to work with TreeTagger and the ANNIE named entity recognition tool.
• terminoFormDir: contains the terminological forms that are created using TERMINAE and output by it.
• linguae: contains the search patterns that have been designed and their results; no pattern design tool is avai
28. e meaning of a termino-concept is not formally defined. It is mainly described by its related occurrences.

8.3 TerminoConceptual actions menu

The action menu associated with the Terminae TerminoConceptual level perspective is the TerminoConceptual action menu. It offers 4 submenus, which are presented in the following subsections:
• File submenu
• Termino-concept management submenu
• Feature management submenu
• Neon ontology submenu
The corresponding actions are also available from the context menu (right click).

8.3.1 File submenu

This menu is used to load and save termino-conceptual data. It offers the following actions:
• Load XML format: to load a thesaurus in XML format (see the DTD in Annex 11.4).
• Save XML format: to save a thesaurus in XML format.
• Import SKOS: to load an existing thesaurus in SKOS format.
• Export SKOS: to export a thesaurus in SKOS format. A dialog window opens in which you have to define a URI that is added to the names of the SKOS concepts to guarantee that they are uniquely identified, for instance http://www.lipn.univ-paris13.fr/terminae. Note that in the current version of the TERMINAE platform, the termino-conceptual relations are not described in the exported file.
• Export SKOS RDF/XML format: to export a thesaurus in RDF/XML format. A dialog window opens in which you have to define a URI, as for the SKOS format.
29. eptual level perspective, you need to create or import a Neon toolkit project, which is different from the Terminae project, and to create or import an ontology in this project. This can be done either from the Neon ontology submenu of the Terminae TerminoConceptual perspective (Create a Neon project and Create Neon Toolkit ontology items), or from the menu of the navigator view in the Neon toolkit conceptual level perspective (right click). In the Neon toolkit conceptual level perspective you can also import an existing project. In this case, you have to refresh the view to display the imported project and link it to the terminoConceptual perspective (see the following section). You can also import an ontology: use the import item from the menu of the navigator view of the Neon toolkit conceptual level perspective.

9.1 Perspective overview

The presentation of the Neon toolkit Conceptual level OWL perspective is very similar to that of the Terminae TerminoConceptual level perspective. It is composed of two main parts, with a global view on the left and a set of more detailed and dependent views on the right (see Figure 8.2). See the documentation: http://www.neon-toolkit.org/wiki/Documentation_and_Support

9.2 Terminae links menu

The Terminae links menu has been added to the Neon Toolkit perspective to link the conceptual and the term
30. gy described in the SKOS file associates several lexical forms with a single labeling entity. Each lexical form stores the lemmatized form of words (do not forget this if you create your own SKOS file). As this form is also computed by the morphosyntactic parser, lexicalizations are recognized independently of morphological variants. Note that the technique is somewhat over-productive because lemmas are ambiguous; we plan to improve it by using the POS category. On the other hand, before annotating according to the SKOS file, the labeling entity is checked against the ontology: if it is not present there, the annotation is skipped. Discrepancies between the SKOS and OWL files are logged in annotator.log in the result directory, and it is wise to check the content of this file.

Chapter 11 Annex

This annex lists the DTDs used by Terminae.

11.1 XML backup DTD for terms

The DTD of the XML file which contains the terms and their occurrences, as visualized in the Terminae Terminological level step 1 perspective:

<!ELEMENT NUMBER_OCCURRENCES (#PCDATA)>
<!ELEMENT OCCURRENCE (ID, DOC, SENTENCE, START_POSITION, END_POSITION, Texte)>
<!ELEMENT DOC (#PCDATA)>
<!ELEMENT END_POSITION (#PCDATA)>
<!ELEMENT FORM (#PCDATA)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT LEMMA (#PCDATA)>
<!ELEMENT LIST_OCCURRENCES (OCCURRE
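The lemma-based matching described in the Annotator chapter (lexical forms stored as lemmas, labeling entity checked against the ontology, annotation skipped when the entity is missing) can be sketched as follows. This is only an illustration: the function `annotate`, the dictionary layout and the longest-match strategy are assumptions of this sketch, not the Annotator's actual implementation.

```python
def annotate(tokens, lexicon, ontology):
    """Match lemma sequences against lexicalizations.

    tokens   -- list of (word, pos, lemma) tuples from the POS file
    lexicon  -- dict mapping a tuple of lemmas to a labeling entity
                (stands in for the SKOS lexicalization file)
    ontology -- set of entity names known to the OWL ontology

    Returns (annotations, skipped): annotations are (start, end, entity)
    token spans; skipped lists entities found in the lexicon but absent
    from the ontology (the kind of discrepancy the manual says is logged).
    """
    lemmas = [lemma.lower() for _, _, lemma in tokens]
    max_len = max((len(key) for key in lexicon), default=0)
    annotations, skipped = [], []
    i = 0
    while i < len(lemmas):
        # Try the longest candidate first (a choice made for this sketch).
        for size in range(min(max_len, len(lemmas) - i), 0, -1):
            entity = lexicon.get(tuple(lemmas[i:i + size]))
            if entity is None:
                continue
            if entity in ontology:
                annotations.append((i, i + size, entity))
            else:
                skipped.append(entity)
            i += size
            break
        else:
            i += 1  # no lexicalization starts at this token
    return annotations, skipped


tokens = [("Safety", "NN", "safety"), ("belts", "NNS", "belt"),
          ("and", "CC", "and"), ("airbags", "NNS", "airbag")]
lexicon = {("safety", "belt"): "SafetyBelt", ("belt",): "Belt",
           ("airbag",): "Airbag"}
ontology = {"SafetyBelt", "Belt"}
annotations, skipped = annotate(tokens, lexicon, ontology)
```

Because matching works on lemmas, the plural "belts" is recognized through its lemma "belt", which is the independence from morphological variants that the manual mentions.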
31. has already been defined: project, corpus, thesaurus and authors' names.
• The right view is a text editor in which the user may write comments. To save the comments, press Ctrl+S.

5.1 Terminae project actions menu

A project consists of all the data used or created by TERMINAE when building a specific termino-ontological resource from a given corpus (see Section for a description of the project structure). The corpus is in a txt file; UTF-8 encoding is recommended. See Section 6.2 for a description of the files used. You can either:
• Create a new project (Create Terminae project) if you are starting to build a specific termino-ontological resource from a given corpus. You have to specify:
  - The name of your project.
  - The name of the directory where you want to locate your project. A default directory is proposed, but click on the Cancel button and navigate through the file system if you want to choose another directory.
• Switch from one project to another (Load Terminae project). Note that only one project can be open at a time. You are first asked to navigate through the file system to select the directory containing the project directory. Be aware that if you change project while a perspective X other than the project perspective is open, you have to manually reload the data of perspective X.
• Export the current project E
32. hich named entity types you are interested in, by selecting a named entity type XML file that should be located in the corpora subdirectory of your project. A second file dialog window opens in which you have to select another xml file containing the list of named entities extracted by ANNIE. This file should also be located in the corpora subdirectory of your project. You have to indicate the number of the document (0 if there is only one document) on which you have used the ANNIE tool.
• Save named entities: to make an XML backup (see Annex for details on the file format).
• Load named entities: to load the named entities from an XML backup.
• Load all lexical units: to load the terms and named entities from a single XML backup.
• Save all lexical units: to make an XML backup of all entities, terms and named entities (see Annex for details on the file format).
If everything works properly, the window of Figure 6.3 appears when all types of terminological data are loaded.

6.4.2 Term Management submenu

This menu is used to manage terminological data, i.e. to visualise the list of terminological units and edit it by clustering, removing or adding some of them. The Term Management menu offers 9 different actions.
33. idual name and select the class to which it belongs, using dialog windows.
• Create an individual is used to create an individual. You have to enter the individual name and select the class to which it belongs, using dialog windows.

[Figure 8.2: Neon toolkit conceptual level OWL perspective]

Chapter 9 Neon toolkit Conceptual level OWL perspective

The conceptual perspective is a Neon toolkit plugin (version 2.4) to which a specific menu has been added for the TERMINAE platform to link the conceptual and termino-conceptual levels. When using Neon toolkit conc
34. ino-conceptual levels of Neon and TERMINAE projects and of the resulting termino-conceptual resources.
• To terminoConceptual level is used to switch from the Neon toolkit Conceptual level OWL perspective to the Terminae TerminoConceptual level perspective. Clicking on this action item re-opens the termino-conceptual perspective and selects the termino-concept associated with the class initially selected in the conceptual perspective.
• Create a termino concept is used to create a termino-concept and link it to the selected class. This functionality is useful when you want to add thesaurus information to an existing ontology: you start from an existing class and create a termino-concept in the thesaurus of the TERMINAE project.
• To link a class to a TC is used to link a class to an existing termino-concept in the thesaurus of the TERMINAE project.

Chapter 10 Annotator perspective

This chapter and the tool have been written by F. Lévy, A. Guissé and S. Szulman.

The LIPN Annotator marks the occurrences of given terms in a text with concepts and individuals of an ontology. It outputs a project which can be directly opened by SemEx, the LIPN semantic explorer, to explore the annotations, mark and transform rules, etc. Alternatively, the user can choose to produce plain result files and process them with her/his own programs. The output format is textual (html and txt) and self-explanatory. The output format is language independent a
35. is menu is used to link TERMINAE and Neon ToolKit. It supports the creation of the conceptual level and many actions to connect it to the termino-conceptual one.
• Create a Neon project is used to create a Neon toolkit project. If you want to work at the conceptual level, you have to create a Neon project and specify its name. It is recommended to use different names for the TERMINAE and Neon projects.
• Create Neon Toolkit ontology is used to create an ontology. This ontology is part of the newly created Neon project.
• Create a class is used to create a class in the previous ontology from the selected termino-concept. A dialog window opens in which you have to give a name to the class and select a parent class in the existing ontology. The class can be visualized in the Neon toolkit Conceptual level OWL perspective (see Figure 8.2). Note that the class is created with an annotation property in which the link to the source termino-concept and its identifier are saved. Once it has been linked to a class at the conceptual level, the termino-concept is displayed in blue in the TerminoConcept tree.
• To ontology level is used to switch from the termino-conceptual perspective to the OWL one. This action opens the OWL perspective and shows the class corresponding to the selected termino-concept.
• Link to Neon project is used when one wants to exploit an existing Neo
36. lable in the current version.
• thesauri: contains the termino-conceptual resources that are created using TERMINAE and output by it.
• system: contains some files automatically created by TERMINAE.
• repExtractTerm: contains the results of term extraction tools. The current version of the platform is designed to work with the YaTeA term extractor, with the TermoStat term extractor (which can be used through a web service), or with a sample file listing terms, one term per line.

TreeTagger: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
ANNIE: http://gate.ac.uk/ie/annie.html
YaTeA: http://search.cpan.org/~thhamon/Lingua-YaTeA-0.5
TermoStat: http://olst.ling.umontreal.ca/~drouinp/termostat_web

3.2.2 How to import a project

A project to be imported is represented as a zipped file containing the project directory with all the required subdirectories and files of a given project. You do not have to unzip the file.
• Go to the main menu.
• Click on Terminae project actions.
• Click on Import project.
• A first dialog window appears, in which you must indicate the zipped file to load.
• A second dialog window appears to propose the directory into which the project will be imported. If you do not accept it, you will be offered the chance to choose another one.
When the project is imported, its main characteristics are presented in the Terminae project information view (on the left by default) of the project perspec ti
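The fixed project structure (the six subdirectories corpora, terminoFormDir, linguae, thesauri, system and repExtractTerm named in section 3.2.1) can be checked with a short script before zipping or importing a project. The Python sketch below is only an illustration: the helper names are hypothetical, and an empty skeleton created this way is not guaranteed to be a loadable TERMINAE project, since the system directory normally holds files created by TERMINAE itself.

```python
import tempfile
from pathlib import Path

# The six subdirectories named in the manual.
SUBDIRS = ("corpora", "terminoFormDir", "linguae",
           "thesauri", "system", "repExtractTerm")


def create_project_skeleton(workspace, project_name):
    """Create the six subdirectories under workspace/project_name."""
    root = Path(workspace) / project_name
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def missing_subdirs(project_dir):
    """List the expected subdirectories that are absent from project_dir."""
    return [name for name in SUBDIRS
            if not (Path(project_dir) / name).is_dir()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project_skeleton(tmp, "TestDemo")
        print(missing_subdirs(project))
```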
37. lations; the termino-conceptual level organizes the terminology according to semantic relations; the third level, the ontological level, makes it possible to create a formal ontology out of the list of termino-concepts created at the second level. This document describes the functionalities of the TERMINAE platform. The first chapter describes the technical characteristics and the installation instructions. The following chapters present the main menus of the platform that are accessible from its main window.

Contents
1 Introduction
2 The Terminae method
3 Technical characteristics
  3.1 Installation
  3.2 How to start
    3.2.1 Project location and structure
    3.2.2 How to import a project
    3.2.3 How to create a project
  3.3 Hidden files
4 Main menu
5 Project management perspective
  5.1 Terminae project actions menu
  5.2 Help menu
  5.3 Show View menu
6 Terminae Terminological level step 1 perspective
  6.1 Term extractor uses
    6.1.1 TermoStat web service
    6.1.2 YaTeA tool
    6.2.1 TermoStat Term files
    6.2.2 Yatea Term files
    6.2.4 Term list files
    6.4.1 File submenu
    6.4.2 Term Management submenu
    6.4.3 Cleaning submenu
    6.4.4 Terminological form actions
7 Terminae Terminological level step
38. lish Tagset
CC    Coordinating conjunction
CD    Cardinal number
DT    Determiner
EX    Existential there
FW    Foreign word
IN    Preposition or subordinating conjunction
JJ    Adjective
JJR   Adjective, comparative
JJS   Adjective, superlative
LS    List item marker
MD    Modal
NN    Noun, singular or mass
NNS   Noun, plural
NP    Proper noun, singular
NPS   Proper noun, plural
PDT   Predeterminer
POS   Possessive ending
PP    Personal pronoun
PP$   Possessive pronoun
RB    Adverb
RBR   Adverb, comparative
RBS   Adverb, superlative
RP    Particle
SYM   Symbol
TO    to
UH    Interjection
VB    Verb, base form
VBD   Verb, past tense
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBP   Verb, non-3rd person singular present
VBZ   Verb, 3rd person singular present
WDT   Wh-determiner
WP    Wh-pronoun
WP$   Possessive wh-pronoun
WRB   Wh-adverb

11.6 TreeTagger French Tagset
ABR       abbreviation
ADJ       adjective
ADV       adverb
DET:ART   article
DET:POS   possessive pronoun (ma, ta, ...)
INT       interjection
KON       conjunction
NAM       proper name
NOM       noun
NUM       numeral
PRO       pronoun
PRO:DEM   demonstrative pronoun
PRO:IND   indefinite pronoun
PRO:PER   personal pronoun
PRO:POS   possessive pronoun (mien, tien, ...)
PRO:REL   relative pronoun
PRP       preposition
PRP:det   preposition plus article (au, du, aux, des)
PUN       punctuation
PUN:cit   punctuation citation
SENT      sentence tag
SYM sy
39. lly.
• The Occurrences view lists all the occurrences of the terminological unit that have been identified. They can be occurrences of the canonical form or of any of its variant forms.
• The Related termino-concepts view shows which termino-concepts the terminological unit is related to.
As indicated in the second column of the Terminological form list view, a terminological form can be In progress or Completed. Each terminological form is saved in an XML file in the terminoFormDir directory. The list of terminological forms is saved in the file tableTermeFiches.xml in the terminoFormDir directory.

7.3 Terminological actions menu

The action menu associated with the Terminae Terminological level step 2 perspective is the Terminological action menu. It offers 3 submenus, which are presented in the following subsections:
• Termino-concept management submenu
• Form management submenu
• Feature management submenu
The corresponding actions are also available from the context menu (right click).

7.3.1 Termino concept management submenu

This submenu offers three different actions:
• Create a termino concept: to create a termino-concept linked to the selected terminological unit. The termino-concept is added to the current thesaurus. If the terminological unit is a named entity, the type of the named entity may als
40. m to search for a specific unit on the basis of its first characters. Note that this functionality is also directly accessible, when the list of terms is selected, by typing the first letter of the term you are looking for.
• Cluster terms: to cluster several lexical units. You first have to select the units you want to cluster, then click on the Cluster terms action and choose the canonical form you want to keep. The alternative forms are removed from the term list and all their occurrences are attached to the canonical form, whose frequency count is updated.
• Add a term: to add a new term to the term list.
• Remove a term: to remove the selected term from the list.
• Undo remove: to undo the last remove action. This may also undo a cleaning action (see Section 6.4.3).
• View occurrence context: to visualise the sentences surrounding an occurrence. You have to select the occurrence identifier (see Figure 6.4) and set the size of the desired context, expressed as a number of sentences.
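The effect of the Cluster terms action (alternative forms removed, their occurrences attached to the canonical form, frequency updated) can be illustrated with a small Python sketch. The data layout (a dictionary from term to a list of occurrence identifiers) and the function names are hypothetical and only model the behaviour described in the manual.

```python
def cluster_terms(occurrences, selected, canonical):
    """Merge the selected terms into the canonical form.

    occurrences -- dict mapping each term to its list of occurrence ids
    selected    -- the terms chosen for clustering (must include canonical)
    canonical   -- the form to keep

    Returns a new dict; the input dict and its lists are left unchanged.
    """
    if canonical not in selected:
        raise ValueError("the canonical form must be one of the selected terms")
    merged = dict(occurrences)
    merged[canonical] = list(occurrences.get(canonical, []))
    for term in selected:
        if term == canonical:
            continue
        # The alternative form disappears; its occurrences move over.
        merged[canonical].extend(merged.pop(term, []))
    return merged


def frequency(occurrences, term):
    """Frequency of a term, taken here as its number of occurrences."""
    return len(occurrences[term])


terms = {"airbag": ["occ1", "occ2"], "air bag": ["occ3"], "belt": ["occ4"]}
clustered = cluster_terms(terms, ["airbag", "air bag"], "airbag")
```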
41. mbol
VER:cond  verb conditional
VER:futu  verb future
VER:impe  verb imperative
VER:impf  verb imperfect
VER:infi  verb infinitive
VER:pper  verb past participle
VER:ppre  verb present participle
VER:pres  verb present
VER:simp  verb simple past
VER:subi  verb subjunctive imperfect
VER:subp  verb subjunctive present

<!ELEMENT rdf:Description (skos:prefLabel, skos:altLabel*, rdf:type)>
<!ATTLIST rdf:Description rdf:about CDATA>
<!ELEMENT prefLabel (#PCDATA)>
<!ELEMENT altLabel (#PCDATA)>
<!ELEMENT rdf:type EMPTY>
<!ATTLIST rdf:type rdf:resource CDATA>

11.7 Use ANNIE to extract named entities

This annex describes the procedure to follow to use ANNIE to extract named entities from a given document (only one document can be processed at a time). Note that the following procedure is taken from the GATE documentation for processing English corpora: http://gate.ac.uk/sale/tao/splitch3.html

GATE enables you to extract named entities from plain texts and annotate your corpus with them. GATE is distributed with an IE system called ANNIE. ANNIE relies on finite-state algorithms and the JAPE language. Take one large pile of text (documents, emails, etc.). Call this your corpus. If you right-click on Language Resources in the resources pane and select New, then GATE Document, the window Parameters for the new GATE Document will appear. Once you indic
42. minological level is used to browse and modify the list of domain-specific lexical units that have been extracted from the source corpus using term extraction and named entity recognition tools such as YaTeA, the TermoStat web service, and ANNIE. You may also use a list of terms (see 6.2.4) if you have another term extractor.

6.1 Term extractor uses

TERMINAE assumes that the acquisition corpus has been processed beforehand by the term extractor, and possibly by ANNIE.

6.1.1 TermoStat web service

TermoStat Web can be used after logging in. The software is still free for research purposes; you only need to create an account. You have to upload a UTF-8 txt file. You download part of the results by clicking on a disk icon. The result is given in a txt file named termostat res DI. Put this file in the repExtractTerm directory of your project. The acquisition corpus also has to be processed by TreeTagger. Use the script for UTF-8. Put the TreeTagger file and the corpus file in the corpora directory of your project.

YaTeA: http://search.cpan.org/~thhamon/Lingua-YaTeA-0.5
TermoStat: http://olst.ling.umontreal.ca/~drouinp/termostat_web
ANNIE: http://gate.ac.uk/ie/annie.html

[Figure 6.1: Term extractor used]

6.1.2 YaTeA tool

TERMIN
43. n toolkit project.
• Link to Neon ontology is used when one wants to exploit an existing ontology in a specified project.
• Link to a class is used to link a termino-concept to an existing class.
• Create an ObjectProperty is used to create an objectProperty from a termino-conceptual relation. A dialog window opens and you have to enter the name of the property, the parent object property, its domain and its range. The objectProperty is created with an annotation property in which the name and type of the source termino-conceptual relation are saved.
• Link a RT and an ObjectProperty is used to link a termino-conceptual relation to an existing objectProperty.
• Link a RT and a class is used to link a termino-conceptual relation to an existing class.
• Create classes and TCs is used to derive a set of classes from a set of selected termino-concepts. If these termino-concepts have termino-conceptual relations, objectProperties are created and linked to these source relations.
• Create classes and TCs without dialog offers the same functionality as above, but without any dialog. The default values are systematically kept: the name of the class is the name of the termino-concept, and the name of the objectProperty is the name of the RTC. If termino-concepts are linked by an isKindOf link, the corresponding classes are in the same hierarchical order.
• Link to an individual is used to link a termino-concept to an individual. You have to enter the indiv
44. nch (fr).
When the terminological data is loaded, TERMINAE creates one additional file in the corpora directory:
• fTempCorpus2XML.xml, which is an XML version of the corpus.
If you have several documents, each one must be processed by TreeTagger and the results must be concatenated in a single file in which the initial documents are separated by a document tag, as shown below:
Text n TAB Document TAB n
where TAB is the tab character and n varies between 0 and x-1, x being the total number of documents.

6.3 Perspective overview

If everything works properly when loading the terminological data, the window of Figure 6.2 appears when the Terminae Terminological level step 1 perspective is first opened. The window is composed of two views: the Lexical units view on the left and the Occurrences view on the right. The terminological units, either terms or named entities, are listed in the left view. By clicking on the column headers, you can sort the list alphabetically (Term), by frequency (Frequency), or by type, i.e. terms vs. named entities and named entity type (Named entity). The last column of the Lexical units view is for comments: if you click on a comment cell, a text field appears and you can add a comment to the corresponding terminological unit. The comments are saved with the terminological results and can be reloaded upon request when the YaTeA results are loaded. The occurrences of the selected termi
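The concatenation of several TreeTagger outputs into one file can be scripted. The Python sketch below is only an illustration and rests on two loudly stated assumptions: that the document tag is the literal line "Text n", a tab, "Document", a tab, and n again, and that one such line precedes each document (numbered from 0 to x-1). Check both against a file that TERMINAE actually accepts before relying on it; the function name is hypothetical.

```python
def concatenate_tagged(documents):
    """Join the tagged lines of several documents into one text.

    documents -- list of documents, each a list of TreeTagger output lines
                 ('word<TAB>POS<TAB>lemma')

    A tag line is written before each document; its exact form is an
    assumption of this sketch (see the lead-in above).
    """
    out = []
    for n, lines in enumerate(documents):
        out.append(f"Text {n}\tDocument\t{n}")
        out.extend(line.rstrip("\n") for line in lines)
    return "\n".join(out) + "\n"


docs = [
    ["The\tDT\tthe"],
    ["Belts\tNNS\tbelt", "hold\tVBP\thold"],
]
combined = concatenate_tagged(docs)
```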
45. nological unit in the working corpus appear in the right view.

6.4 Linguistic actions menu

The action menu associated with the Terminae Terminological level step 1 perspective is the Linguistic action menu. It offers 3 submenus and 2 actions, which are also available from the context menu (right click).
46. o give birth to a termino-concept, and a kindOf link is created between the two termino-concepts.
• Remove a termino concept: to remove a termino-concept from the current thesaurus.
• To TerminoConceptual level: to switch from the Terminae Terminological level step 2 perspective to the Terminae TerminoConceptual level perspective.

7.3.2 Form management submenu

This submenu offers two actions related to terminological forms:
• Remove a terminological form.
• Validate a terminological form: this action is used to record that the work on this terminological form is completed. It acts as a note to the user.

7.3.3 Feature management submenu

This submenu offers various actions related to the detailed information provided for a given terminological unit and recorded in its terminological form:
• Add a variant: to add a lexical variant of the selected term.
• Remove a variant: to remove a lexical variant of the selected term.
• Add a lexical entry: to add a lexical entry for the selected term. You have to type in the entry name and its value, separated by a colon.
• Remove a lexical entry: to remove a lexical entry.
• Add a syntactical relation head: to add a phrase in which the selected term is the head.
• Add a syntactical relation modifier: to add a phrase in which the selected term is a modifier.
• Remove a syntactical relation head: to remove the selected relation.
47. opened when a project is loaded. It is presented in Section 5.
• Terminae Terminological level step 1 perspective (see Section 6).
• Terminae Terminological level step 2 perspective (see Section 7).
• Terminae TerminoConceptual level perspective (see Section 8).
• Neon toolkit Conceptual level OWL perspective (see Section 9).
This main menu differs slightly from one operating system to another.
Perspectives 2, 3, 4 and 5 make up Terminae. The OWL perspective belongs to Neon ToolKit 2.4. Please note that the last Eclipse perspective, Team Synchronizing, is used by Neon ToolKit. The annotator perspective marks the occurrences of given terms in a text with concepts and individuals of an ontology.
• The Search item is offered in the Neon toolkit Conceptual level OWL perspective; it is not described in this report.
• The Help item is offered in all Eclipse applications; it is not described in this report.
• An additional Terminae submenu is offered on MacOS systems. It gives access to the standard main application operations: About Terminae, Preferences, Hide Terminae, Quit Terminae.

Chapter 5 Project management perspective

TERMINAE starts with the project management perspective. This perspective has 2 views (Fig. 5.1).

[Figure 5.1: Project management perspective]

• The left view presents the project information if a project
48. [Figure 6.2: Visualisation of Yatea results]
• File submenu
• Term management submenu
• Cleaning submenu
• New terminological form action
• To terminological form action
These submenus and actions are presented in the following subsections.

6.4.1 File submenu

This menu is used to load and save terminological data. It offers the following actions:
• Load term extractor results: to load the terms initially extracted from your corpus by the term extractor or saved in an XML backup. The procedure is the same as that described in Section 6.2.
• Save term extractor results: to make an XML backup (see Annex for details on the file format).
• Load named entities from ANNIE results: to load the named entities identified by the ANNIE named entity recognition tool (see Section 6.2.3). A first file dialog window opens in which you have to indicate w
49. s are the algorithms, so the application can in principle be used for any language where its input makes sense, namely where lemmatizing and POS tagging are possible and not too ambiguous. The Annotator is included as a plugin in SemEx and in Terminae, and can be used from either. Only the installation differs. In Terminae, the Annotator is used through the Annotator perspective.

Linux specific: Eclipse's browser calls native browsing libraries to do its work. Under Linux you may have to install specific ones: the present version of the annotator relies on Eclipse 3.7, whose browser needs a proper installation of one of the following:
• Mozilla 1.4 GTK2 to 1.7.x GTK2
• XULRunner 1.8.x to 1.9.x and 3.6.x (but not 2.x)
• WebKitGTK+ 1.2.x and newer
If your installed browser is either too old or too recent, you can also install XULRunner (the standalone core of Mozilla Firefox and Thunderbird) to enable the Eclipse browser. In this case, you have to specify where XULRunner is: modify the annotator.ini file in the executable's directory to initialize org.eclipse.swt.browser.XULRunnerPath, e.g.
-Dorg.eclipse.swt.browser.XULRunnerPath=/home/szulman/outils/xulrunner-sdk/bin
Of course, you must replace /home/szulman/outils/xulrunner-sdk/bin with your own location.

The Annotator and SemEx can be found from or or from

10.1 Input files

To annotate a document, you need 4 inputs:
• The do
50. Figure 8.1: Terminae TerminoConceptual level perspective

A termino-conceptual form is usually composed of the following views:

• The TerminoConcept features view presents the properties of the selected termino-concept: its Synonyms, its Links that have been derived from the terminological levels. This mainly holds for termino-concepts related to named entities, for which type information can be collected. Typical links are brother/father links.
• The NL definition view allows you to enter a natural-language definition for the selected termino-concept.
• The Occurrences view presents the occurrences in the corpus of the lexical units to which the termino-concept is linked.
• The TC relations view presents the termino-conceptual relations in which the termino-concept is domain or range. Note that th
51. tribute of the first line. If you have several documents, each one must be processed by TreeTagger and the results must be concatenated in a single file where the various initial documents are separated by a document tag, as shown below:

Text n TAB Document TAB n

where TAB is the tabulation character and n varies between 0 and x - 1, x being the total number of documents.

6.2.3 Named entity files

You may also want to work with named entities. In that case, you need two files that are output by the ANNIE named entity recognition tool (see Annex 11.7 for details on the file format) and which are expected to be located in the corpora subdirectory of your project:

• The first XML file indicates which named entity types you are interested in.
• The second XML file contains the list of named entities extracted by ANNIE.

To create such files, follow the procedure described in Annex

6.2.4 Term list files

• Load a term list (Load term file), which is expected to be located in the repExtractTerm subdirectory of your project. The format is one term per line.
• Select the tagged corpus from which the terms have been extracted (tt file tt r). It is expected to be located in the corpora subdirectory of your project.
• Select the corpus file (.txt). It is expected to be located in the corpora subdirectory of your project.
• Specify the corpus language: English (en) or Fre
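The concatenation of several TreeTagger outputs into one file, with a document tag per document, can be sketched as follows. This is an illustration under stated assumptions, not the manual's own tooling: the exact spelling of the tag line is not recoverable from this excerpt, so guessed_tag is only a literal reading of "Text n TAB Document TAB n", and the sketch assumes the tag precedes each document. Both function names are hypothetical.

```python
from typing import Callable, Sequence


def guessed_tag(n: int) -> str:
    # ASSUMPTION: literal reading of "Text n TAB Document TAB n".
    # Check the manual's own example before relying on this spelling.
    return f"Text {n}\tDocument\t{n}"


def concatenate_tagged(docs: Sequence[str],
                       make_tag: Callable[[int], str] = guessed_tag) -> str:
    """Join TreeTagger outputs, numbering documents from 0 to x - 1.

    ASSUMPTION: each tag line comes before the document it identifies.
    """
    parts = []
    for n, tagged in enumerate(docs):
        parts.append(make_tag(n))
        # Strip trailing newlines so no blank line separates documents.
        parts.append(tagged.rstrip("\n"))
    return "\n".join(parts) + "\n"
```

Passing the tag builder as a parameter keeps the numbering logic separate from the tag spelling, so only make_tag needs to change once the exact format is confirmed.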
52. ve and you can start working on it.

3.2.3 How to create a project

To start working on a new project:

• Go to the main menu.
• Click on Terminae project actions.
• Click on Create Terminae project.
• A first dialog window appears, in which you must indicate the name of the project.
• A second dialog window appears, in which you must indicate the directory where you want to locate the project. A directory with the same name as the project is automatically created, with 6 subdirectories.

To start working on your project (to build a termino-ontological resource from a given corpus), you need to have at least the following files in your project directory (more details in 6.2):

• In the corpora subdirectory: a tagged version of the raw corpus (txt tt file), as output by TreeTagger.
• In the repExtractTerm subdirectory: the list of terms that have been extracted from the tagged version of the corpus by YaTeA (XML file), or the list of terms extracted by TermoStat (downloaded from the web service), named termostat_res.txt.

You must also give the name of the corpus, if you exploit one, and the name(s) of the author(s) of the future resource(s).

When the project is created, its main characteristics are presented in the Terminae project information view (on the left by default) of the project perspective, and you can start working on it.

3.3 Hidden files

The software creates 2 hidden files to man
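The minimum file requirements listed in Section 3.2.3 can be checked before opening a project. The sketch below is an assumption-laden illustration, not a Terminae feature: it assumes the TreeTagger output ends in .tt and the YaTeA output ends in .xml, and the function name missing_inputs is hypothetical. The directory names corpora and repExtractTerm and the file name termostat_res.txt come from the manual.

```python
from pathlib import Path
from typing import List


def missing_inputs(project_dir: str) -> List[str]:
    """Return a list of problems; an empty list means the minimum is met."""
    project = Path(project_dir)
    corpora = project / "corpora"
    terms = project / "repExtractTerm"
    problems = []

    # ASSUMPTION: the tagged corpus is recognisable by a .tt extension.
    if not (corpora.is_dir() and any(corpora.glob("*.tt"))):
        problems.append("corpora: no TreeTagger output (*.tt) found")

    # Either a YaTeA XML file or the TermoStat result file is enough.
    has_yatea = terms.is_dir() and any(terms.glob("*.xml"))
    has_termostat = (terms / "termostat_res.txt").is_file()
    if not (has_yatea or has_termostat):
        problems.append("repExtractTerm: no term list (*.xml or termostat_res.txt) found")

    return problems
```

Calling missing_inputs on a project directory returns human-readable messages for whatever is absent, which is easier to act on than a single True or False.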
53. xport project A zipped file is created in which all the required directories and files are included. If you have created a Neon project, its directory is also included in the zipped file.

• Import an existing project (Import project): the project to be imported is represented as a zipped file containing the project directory with all the required subdirectories and files. You do not have to unzip the file, but you have to specify:
  - the zipped file to load;
  - the name of the directory where you want the project to be imported.
• Modify author (Modify author): allows you to modify the project's author.

Help menu

The Help information is not available yet.

5.3 Show View menu

Each perspective has several views, including a main view on the left side of the perspective. Clicking an item in the main view changes the values shown in the other views. The user may close these views, or may want to see a view from another perspective that is not in the current one (only one perspective may be selected at a time). This menu is used to reopen a view that has previously been closed. Click on the single item Other to display the list of available views, then choose Other again to find the TERMINAE views. Select the view you want to reopen or see, and be aware that the view may depend on one perspective or the other.

Chapter 6 Terminae Terminological level step 1 perspective

The Terminae Ter