User Manual - Wilker Aziz
Contents
1. </task>
Listing 3.5: PET's output file

3.5 External Information

XML files are used to bring external information into PET. External information can be displayed using the bottom boxes shown in Figure 2.2 (referred to as passive info) or using the drop-down menus (referred to as active info). Either way the principle is fairly simple: a database of external information is a collection of key-value pairs.

For passive info, keys are n-grams in the active unit and values are HTML content offered as extra information in the bottom pane. At the beginning of the editing, PET lists in the bottom pane all the information matching n-grams of the active unit's text, following the specs in the configuration file. For active info, keys are n-grams selected by the user and queried via the drop-down menu; for now, PET only renders dictionaries in this way. PET does not handle fuzzy matches yet, so the keys are strings to be matched exactly; the matching of keys is, however, case insensitive.

Listing 3.6 shows an example typical of a monolingual dictionary. Values are actively queried by the users through the drop-down menu.

  <db alias="cambridge">
    <entry>
      <phrase>moving</phrase>
      <paraphrase score="0.36873065">changing place</paraphrase>
    </entry>
    <entry>
      <phrase>moving back</phrase>
      <paraphrase score="0.36873065">goin
2. phrase and a list of values, each one identified by the tag paraphrase. A score may also be given. For example, if multiple paraphrases are available, a score reflecting their length could be of use in experiments where the length of the translation is important. Frequency or confidence scores can also be useful.

Chapter 4 Examples

4.1 LREC EAMT demo version

If you download the LREC EAMT 2012 workspace you will find a few examples of pec and pej files. You may place the workspace lrec and the meta file pec.meta in the tool's root directory, or you can place the workspace lrec wherever you prefer and set the variable dir in your tool's pec.meta to the location you chose. If you use the provided pec.meta you will notice that it sets dir to lrec and default to en pt typical pec.

The directory lrec contains 5 examples of context files:
  1. en pt typical pec: comes with a typical setup in which future units are hidden
  2. en es wmt pec: mixes HT and PE tasks and is focused on gathering post-editing time; no explicit assessment is requested
  3. fapesp.pec: renders the bitext as HTML and presents the whole text in a single unit
  4. fapesp mono pec: renders only the target text as HTML, in a single unit
  5. subs.pec: sets a task with general and external info as well as length constraints

The directory lrec also contains 4 users' sub-directories, which contain examples of input files (pej):
  1. fapesp: check this one for an e
3. A job is made of units that can be either for human translation (HTs) or for post-editing (PEs); see Listing 3.2. A unit is identified by a unique number and it must be either a translation from scratch (ht) or a post-editing (pe). In PE units at least one translation (MT) is expected. Multiple sources (S), references (R) and machine translations (MT) may be given. They should be assigned producers to identify the origin of the text. They can also be assigned scores, for example confidence or quality estimation scores.

  [...]
  <task id="1" type="pe">
    <S producer="manual transcription">[...]</S>
    <S producer="speech to text">[...]</S>
    <R producer="gold">[...]</R>
    <MT producer="system1" score="1">[...]</MT>
    <MT producer="system2" score="5">[...]</MT>
  </task>
  [...]
  <task id="2" type="ht">
    <S producer="manual transcription">[...]</S>
  </task>
  [...]
Listing 3.2: PET's input file: units

Jobs, units and sentences can have additional attributes that may or may not be used, depending on advanced features of the tool (Listing 3.3).

  <?xml version="1.0" encoding="utf-8" standalone="yes"?>
  <job source="home waziz experiments opus baseline data greys 08x06 vi vi en br en"
       first="1"
       target="home waziz experiments op
4. Figure 1.5: Environment variables

  (d) Add a new variable.
  Figure 1.6: New user variable
  (e) Set the variable's name and value to PATH and <path to java>, respectively.
  Figure 1.7: Java's path
  Conclude.
  Figure 1.8

4. How do I find the path to Java binaries?
  (a) Open a terminal (see Figure 1a).
  (b) Type in: where java
5. I have installed the JVM, yet I don't get it working. What can I do?
  (a) Open your computer's properties.
  Figure 1.3: Properties
  (b) Open the Advanced Settings.
  Figure 1.4: Advanced settings
  (c) Go to Environment variables.
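As a quick diagnostic for the situation described above, the following Python sketch checks whether a java executable can be found through the PATH variable. It is not part of PET: the function name find_java is hypothetical, and the sketch relies only on the standard-library function shutil.which.

```python
import shutil


def find_java(path=None):
    """Return the full path of the java executable, or None if it is not found.

    `path` is an optional search-path string; when it is omitted, the PATH
    environment variable is used, which is also what a terminal consults.
    """
    return shutil.which("java", path=path)


if __name__ == "__main__":
    location = find_java()
    if location is None:
        print("java was not found on PATH; add Java's bin directory to PATH")
    else:
        print("java found at", location)
```

If the script reports that java was not found even though the JVM is installed, the PATH variable most likely does not include Java's bin directory yet.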
6. PEJ expects plain text files containing one segment per line. You may input several files for each type of segment (i.e. source, reference, MTs). Each segment must be assigned a producer; therefore the interface will ask either for (i) a file-level producer or (ii) a meta file containing attribute-value pairs (one list per line) from which the producer of each segment will be parsed.

Figure 3.2 exemplifies input files for PEJ. These settings will generate a pej file containing two units, each one having one source and two machine translations. The first column is an optional file that may be used to set attributes for each unit. If present, this file must specify the attribute type of every unit. The second column is an example of a source file; opus is the producer of the file (PEJ will query you for that information). The third column is an example of a machine-translated file; google would be the producer. The fourth column is another example of a machine-translated file, and the fifth column is an example of a meta file containing attributes for the segments in MT2.txt. If present, this file must set the producer of every segment. Figure 3.3 shows an example of the use of PEJ.

  Units attr txt                | S1.txt (opus)           | MT1.txt (google) | MT2.txt      | MT2 attr txt
  type pe, show bigbang, max 25 | Ela bateu na minha cara | She slapped me   | She beat me  | producer model1
  type pe, show dexter          | Eu nao tenho culpa      | I am not to bl   | Not my fault | producer model2
7. answers (categories) to be displayed, for instance: OOV terms|Word order|Semantic ambiguity

  2. In addition, every assessment question can have a box for comments.
       disableCommentOnAssessment: hides the comments box in the assessment page
  3. You can enter as many assessment options as you want; they will be displayed one per page unless you set how many assessments are displayed per page.
       assessmentsByPage number: specifies the maximum number of assessments per assessment page
  4. You can hide all assessment questions if the job does not require them.
       disableAssessment

3.2.6 Additional information

  1. You can show the annotator additional information about the active unit. Additional information is given via the unit's attributes (multiple attributes are allowed), which are given to the tool as part of the input file and will be shown as labels at the top of the annotation page.
       generalInfo attribute color: displays an attribute with a specific color (the color is optional), for instance duration blue and subtype green and lines
       generalInfoFont font size: sets the font and size of the general info
  2. You can choose to show external information that refers to n-grams in the active unit, using the two boxes at the bottom pane of the annotation page.
       externalSourceInfo file: displays external info for the source text
       externalTargetInfo file: displays external info for the target text
     You can specify as many files as necessary. You can i
8. shows the active unit

  (a) You can apply the rule selected by hideIfNotEditing to all the units, as opposed to applying it only to the active unit.
        applyHideIfNotEditingToAll
      Setting hideIfNotEditing undone together with applyHideIfNotEditingToAll hides all the units on the screen that have not yet been edited.

  You can set what is shown in the unit box when the unit is hidden.
    editableMessageUndone message: sets a message to be shown for an undone unit, for instance "Click here to start"
    editableMessageDone message: sets a message to be shown for a previously done unit, for instance "Click here to redo"

  You can hide panes.
    hideBottomPane: hides the pane containing external information
    hideTopPane: hides the pane containing sources and/or references and/or alternative translations
    blindPE: hides the source text (useful for monolingual post-editing)

  You can swap the source and target panes.
    displayTS: displays the target pane on the left-hand side and the source pane on the right-hand side

  For the active unit, you can show reference translations.
    showReference: displays reference translations at the top pane

  You can display the identifiers of the units on the left-hand side of the screen.
    showSentenceId: displays the id attribute of the units

  You can block the editing in read-only tasks, or in tasks aimed at gathering assessments only.
    blockEditing: makes the units non-editable

  You can display HTML f
9. to the values of those attributes and will change dynamically as the translators change the text (adding or removing characters). For example, translations beyond the maximum length will be shown in red. This behaviour is implemented via PET's API. We will soon provide more documentation for that. For now you can try it using the default implementation.

3.2.9 Output files

  1. PET can save the progress of a job automatically.
       autoSave: saves the job unit by unit and also generates some additional files that are used to restore data in case of unexpected failure
  2. You can stamp the output file with the time it was created.
       outputTimeStamp: timestamps the name of the output file
  3. There is a check box on PET's main page that allows the user to decide whether or not output files are timestamped. You may want to disable this option.
       hideOutputTimeStampCheckBox

3.3 PEJ

PET's main input file is an XML file. The XML schema is not yet available, but the examples provided with this distribution should make creating new files fairly intuitive. A file is called a job and it is identified by an id field (see Listing 3.1), which will also be the stem of its output file.

  <job id="identifier">
    [...]
  </job>
Listing 3.1: PET's input file: a job

Footnote: See our EAMT 2012 paper for more details: Cross-lingual Sentence Compression for Subtitles, hltshare.fbk.eu/EAMT2012/html/Papers/32.pdf
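To make the job structure more concrete, here is a minimal Python sketch that reads a job with the standard-library xml.etree.ElementTree module. The element and attribute names (job, task, S, MT, id, type, producer) follow the manual's listings; the sample content and the function names summarize_job and pe_units_without_mt are hypothetical and are not part of PET.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample job, modelled on the structure of Listings 3.1 and 3.2.
SAMPLE_JOB = """<job id="demo-job">
  <task id="1" type="pe">
    <S producer="opus">Ela bateu na minha cara</S>
    <MT producer="google">She slapped me</MT>
    <MT producer="model1">She beat me</MT>
  </task>
  <task id="2" type="ht">
    <S producer="opus">Eu nao tenho culpa</S>
  </task>
</job>"""


def summarize_job(xml_text):
    """Return the job id and a list with one summary dict per unit (task)."""
    root = ET.fromstring(xml_text)
    units = []
    for task in root.findall("task"):
        units.append({
            "id": task.get("id"),
            "type": task.get("type"),  # "pe" or "ht"
            "sources": len(task.findall("S")),
            "mts": len(task.findall("MT")),
        })
    return root.get("id"), units


def pe_units_without_mt(units):
    """Return the ids of PE units that lack the expected machine translation."""
    return [u["id"] for u in units if u["type"] == "pe" and u["mts"] == 0]


if __name__ == "__main__":
    job_id, units = summarize_job(SAMPLE_JOB)
    print("job id (stem of the output file):", job_id)
    print("PE units without MT:", pe_units_without_mt(units))
```

A check of this kind can be run on a hand-written job before loading it into the tool, since PE units are expected to carry at least one MT.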
10. 1 shows:

  Configuration: allows the selection of a configuration file (also referred to as context), where all the job and interface options are set.
  Workspace: displays the workspace defined by the context, that is, where the files with the jobs to be performed are stored and where the edited jobs will be saved.
  Jobs: list of available jobs, where a job can be a collection of segments to post-edit, revise or translate, or any combination of these.
  Results: list of partial results, that is, the output of jobs that have been started (completed or ongoing).
  Log: lists the progress of actions such as loading a new context, loading dictionaries, etc.
  Output timestamp: if set, adds timestamps to the results files; if not set, a single output file with the same name as the input file will be created (with a different extension).
  Default: re-loads the default context, if set in a meta file (see Section 3.1).
  User: switches to a different user.
  Start: starts a job from zero.
  Edit: continues a partial result.

Once the tool is loaded using run.bat, the main page is displayed and a default context is loaded, if one is specified in a meta file called pec.meta. Loading a context may take some time, and the loading progress can be followed in the Log.

A context (see Section 3.2) is a set of parameters that sets PET to a certain configuration. A valid context is a file ending in pec that necessarily defines a workspace and may define a default user. A workspace is a direc
11. USER MANUAL
PET
Wilker Aziz, University of Wolverhampton
Lucia Specia, University of Sheffield
July 27, 2012

Contents
  1 Running
    1.1 How to install
    [...]
    2.2.1 Toolbox
    2.2.2 Active unit
    [...]
    2.2.4 Drag and drop
    [...]
    3.2.1 Defining a workspace
    3.2.2 Controlling how units get displayed
    [...]
    3.2.6 Additional information
    3.2.7 Dictionaries
    3.2.8 Constraints
    [...]
  4 Examples
    4.1 LREC EAMT demo version
  5 API

I know the code is still a bit messy and poorly documented. I am seriously working on that: at least once a week I am giving support to this code. If you can think of a useful feature, please let me know. By
12. ame max 20
Figure 3.2: PEJ: example of input files

Figure 3.3: PEJ: settings (screenshot of the PEJ dialog, with fields for the job's attributes, the units' attributes file, the source (S), reference (R) and MT files with their producers, and the output pej file)

3.4 PER

PET's output file (per) is a pej file extended with more information (see the example in the listing that follows). In a per file the job is tagged with its status and progress; units are also annotated with their status and, if a unit has been completed, it will contain a collection of annotations.

  <job id="bigbang pe 2" progress="1 10" status="GOING_ON">
    <task id="1" status="FINISHED" type="pe">
      <S producer="en opus">My brother he's got a big crush on Bernadette</S>
      <R producer="pt opus">Meu irmao esta apaixonado por Bernadette</R>
      <MT producer="baseline">meu irmao tem uma quedinha por Bernadette</MT>
      <MT producer="google">Meu irmao ele tem uma grande paixao por Bernadette</MT>
      <MT producer="bing">Meu irmao ele tem uma grande paixao por Bernadette</MT>
      <annotations revisions="2">
        <annotation r="1">
          <PE producer="de
13. ameters:
    source acronym: sets an acronym for the source language
    target acronym: sets an acronym for the target language

  1. You can add one or more monolingual dictionaries (repeat the commands below for each new dictionary).
       s2s file: adds a source-source dictionary
       t2t file: adds a target-target dictionary
  2. You can add one or more bilingual dictionaries.
       s2t file: adds a source-target dictionary
       t2s file: adds a target-source dictionary

3.2.8 Constraints

The tool is meant to allow constraints that are specific to a given job to be defined. This still requires some work and documentation, but one constraint that is already implemented can be used, and it illustrates the type of constraint that could be added. It guides translators to restrict the length of the target unit according to a suggested threshold given in the input file. It was defined for experiments with post-editing and translating subtitles, which need to follow strict length conventions to fit screens and people's reading speed.

  1. You can activate length constraints.
       lengthConstraints ideal preferable max: enables the length constraint
         ideal: the unit's attribute that specifies an ideal length
         preferable: the unit's attribute that specifies a preferable length
         max: the unit's attribute that specifies a maximum allowed length
     The target text in the active unit will be rendered differently according
14. Copy the path from the address bar.
Figure 1.10: Java installation
15. ays a fixed text, normally the source text, while the active unit is being post-edited; the exact text displayed can be chosen by the user by right-clicking on the box. The top box can also be used to preview alternative source text, reference text or machine translations.

Figure 2.2: PET's annotation page

  Tool box (4, red): placed on the right-hand side, it displays action buttons (explained in Section 2.2.1).
  Id box (5, purple): placed on the left-hand side, it is an empty column that may display information such as an identifier or a sequential count for the units on screen.
  Bottom box (6, blue): displays two panes that render additional information relevant to the active unit; the left pane displays information about the source text and the right pane displays information about the target text (unless PET's standard orientation is changed via the context file).
  Units: the center of the page contains a grid of n lines (configured via the context file) and two columns, i.e. source and translation, unless it is a monolingual revision job. The line in the center of the screen, highlighted with dark borders, is the active unit (as explained in Section 2.2.2). The top half of the units area shows already edited units, while the bottom half shows units to come (see Figure 2.2).

As the user progresses, some units are displayed in green and some are displayed in red. Green stands f
16. Figure 1.11: Path to binaries

1.2 How to run

Simply execute the file run.bat. On Windows, double-clicking the file will be enough. On Linux or Mac, if the executable permission is set, double-clicking the file and then selecting the option to run it on a terminal will be enough. If it is not set, set it under the Permissions tab (right-click the file), or open a terminal and type sh run.bat.

Chapter 2 Interface

2.1 Main

The main page (see Figure 2.1) is the page that you see when PET is loaded. Observe that the title displays the default user of the tool (demo in this example), as set in the configuration file.

Figure 2.1: PET's main page

Figures 2
17. eu irmao tem uma crush por Bernadette</indicator>
        <!-- baseline is replaced by google -->
        <indicator id="target" t0="7s 374" type="sysselection">google</indicator>
        <indicator id="substitution" type="wrap">
          <action elapsed="0" id="deletion" length="40" offset="0" t0="7s 375" type="change">meu irmao tem uma crush por Bernadette</action>
          <action elapsed="0" id="insertion" length="53" offset="0" t0="7s 375" type="change">Meu irmao ele tem por Bernadette uma grande paixao</action>
        </indicator>
        <indicator elapsed="2s 260" id="deletion" length="6" offset="10" t0="15s 617" type="change">ele</indicator>
        <indicator id="shift" type="wrap">
          <action elapsed="0" id="insertion" length="15" offset="47" t0="29s 623" type="change">por Bernadette</action>
          <action elapsed="1s 183" id="deletion" length="15" offset="14" t0="34s 157" type="change">por Bernadette</action>
        </indicator>
        <indicator id="substitution" type="wrap">
          <action elapsed="0" id="deletion" length="14" offset="18" t0="44s 398" type="change">grande paixo</action>
          <action elapsed="1s 563" id="insertion" length="8" offset="18" t0="44s 399" type="change">quedinha</action>
        </indicator>
        <indicator elapsed="0" id="insertion" length="1" offset="41" t0="48s 848" type="change"></indicator>
      </annotation>
    </annotations>
18. focused, scrolling down the mouse wheel will move forward and scrolling up will move backward among the units. When the tool box is focused, some shortcuts are available. Placing the pointer over a button for a second will show a tool-tip text with information about shortcuts for that button. Shortcuts are not case sensitive; some of them are:

  I: enables the insert mode, that is, it changes the active unit to the editing state
  moves to next unit (if in scrolling state)
  moves to previous unit (if in scrolling state)
  <END>: moves forward until an undone unit is found
  <HOME>: moves backward until an undone unit is found
  B: binds source and target scrolling
  <F10>: saves
  <ALT> <F4>: closes the tool

2.2.2 Active unit

The active unit is the central line in the units area. It is highlighted by a dark border and is the only unit whose target side is editable. By default the active unit is in the state ready to edit; when its target box receives the focus, the unit goes to the state editing (the box turns yellow), which means that effort indicators start to be logged. The active unit in the editing state will generally be referred to as the editing box. The editing box offers some shortcuts; some of them are:

  <ALT>: finalizes the current unit and moves to next unit
  <ALT> N: finalizes the current editing and moves forward, starting the next unit
  <ALT> final
19. g back</paraphrase>
      <paraphrase score="0.36873065">returning</paraphrase>
    </entry>
    <entry>
      <phrase>looks like</phrase>
      <paraphrase score="0.5">appears</paraphrase>
      <paraphrase score="0.2">seems</paraphrase>
    </entry>
  </db>
Listing 3.6: PET's database file: active

The entries in Listing 3.7 are definitions and links to external resources (e.g. wikipedia) that are retrieved by PET when one starts the editing of a unit.

  <db alias="cambridge">
    <entry>
      <phrase>Cholera</phrase>
      <paraphrase>infection in the small intestine caused by the bacterium Vibrio cholerae</paraphrase>
      <paraphrase><a href="http://en.wikipedia.org/wiki/Cholera">wikipedia</a></paraphrase>
    </entry>
    <entry>
      <phrase>Leprosy</phrase>
      <paraphrase>Hansen's disease</paraphrase>
      <paraphrase><a href="http://en.wikipedia.org/wiki/Leprosy">wikipedia</a></paraphrase>
    </entry>
    <entry>
      <phrase>Hansen's disease</phrase>
      <paraphrase>leprosy</paraphrase>
      <paraphrase><a href="http://en.wikipedia.org/wiki/Leprosy">wikipedia</a></paraphrase>
    </entry>
  </db>
Listing 3.7: PET's database file: passive

In Listings 3.6 and 3.7, an entry has a key
20. ing the unit box. In the editing box, selecting an option from a dictionary will replace the selected text with the selected option.

Alternative text can be displayed for the active unit in the editing state if such alternative text is given in the pej file. Alternative text includes (i) multiple sources, (ii) multiple references and (iii) multiple translations. Past revisions of the unit (PEs) can also be shown and used if they are available for the job. The drop-down menu displays these alternative texts. Moving the mouse over the options will display a preview on the top box, while selecting one of the options will cause the alternative text to replace the content of the current box. One may replace a source by an alternative source, the top box by any of the available alternative texts, and the target unit by an available translation (MT or PE).
21. Figure 1.9: Finding Java

If this does not work you may try the following:
  (c) Open the Windows Explorer (e.g. double-click Computer).
  (d) Find the directory where you have installed Java (usually C:\Program Files\Java).
  (e) Go to jre6\bin.
22. izes the current unit and moves to previous unit
  <ALT> P: finalizes the current unit and moves backward, starting the previous unit
  <ALT> B: binds source and target scrolling
  <CTRL> Z: undo editing
  <CTRL> Y: redo editing
  <CTRL> C: copy selection
  <CTRL> X: cut selection
  <CTRL> V: paste
  <CTRL> R: replace/insert selection
  <CTRL> I: insert/replace selection
  <CTRL> D: delete selection
  <CTRL> S: shift selection

In case a unit is in the state editing, an attempt to close the job will result in a prompt to finalize or discard the modifications made to that active unit.

2.2.3 Drop-down menu

By right-clicking any of the text boxes one gets a drop-down menu with a few options. All the boxes except for the active unit are read-only, therefore they will simply display read-only options such as copy and dictionary lookup. The options available depend on what is given to the tool as input for the job (dictionaries, alternative translations, etc.).

Available dictionaries are shown using the languages of the current job, for instance source-source or target-target (monolingual dictionaries), and source-target or target-source (bilingual dictionaries). Several dictionaries can be loaded, and lookup operations are grouped by dictionary (see Figure 2.2.3). In order to use dictionaries it is necessary to right-click a portion of the text, as opposed to simply click
23. mo baseline">Meu irmao tem uma quedinha pela Bernadette</PE>
          [...]
        </annotation>
        <annotation r="2">
          <PE producer="demo baseline">Meu irmao tem uma quedinha pela Bernadete</PE>
          [...]
        </annotation>
      </annotations>
    </task>
    [...]
  </job>
Listing 3.4: PET's output file

Each revision for each unit contains (i) the human translation (HT) or post-edited text (PE), according to the type of task, with information on the user who performed the task (e.g. demo in Listing 3.4) and the translation chosen to be edited (e.g. baseline), and, depending on the options set in the context file, (ii) implicit effort indicators, (iii) explicit assessments and (iv) comments.

Listing 3.5 shows the output with explicit and implicit effort indicators. Note the timestamped primitive edits (i.e. insertion and deletion), sorted by time and wrapped into more meaningful operations (e.g. substitution and shift) when possible.

  <job id="bigbang pe 2" progress="1 10" status="GOING_ON">
    <task id="1" status="FINISHED" type="pe">
      <S producer="en opus">My brother he's got a big crush on Bernadette</S>
      <R producer="pt opus">Meu irmao esta apaixonado por Bernadette</R>
      <MT producer="baseline">meu irmao tem uma crush por Bernade
24. mplement readers and printers for these files (to be detailed in the API). PET comes with a default reader that reads the XML format (see Section 3.5) and a default printer that prints values associated with n-grams from the source (for externalSourceInfo) and from the target (for externalTargetInfo) of the active unit. Setting a customized reader and a customized printer as part of these parameters is reserved for future use.

  3. You can specify some parameters that control how external information is displayed.
       externalSourceInfoMinOrder number: sets the minimum order of the n-grams to be matched in the source side
       externalSourceInfoMaxOrder number: sets the maximum order of the n-grams to be matched in the source side
       externalSourceInfoMinLength number: sets the minimum length (in characters) of the source text to be matched
       externalSourceInfoMaxLength number: sets the maximum length (in characters) of the source text to be matched
       externalSourceInfoNoLonger: prevents displaying information that is longer than the matched source text
     The same options apply to the target language information:
       externalTargetInfoMinOrder number
       externalTargetInfoMaxOrder number
       externalTargetInfoMinLength number
       externalTargetInfoMaxLength number
       externalTargetInfoNoLonger

3.2.7 Dictionaries

PET can display options from one or more monolingual and bilingual dictionaries. In order to make the interface more intuitive you should set the language par
25. [Figure 2.4: PET's annotation page (dictionaries)]

2.2.4 Drag and drop

Text can be dragged from any text box (source, target, etc.) and dropped into the active unit's target box (if editing). Text can also be moved around within the editing box by using drag and drop.

2.3 Assessment Page

An optional assessment page can be displayed after every active unit is completed. The exact type of assessment can be configured via contex
26. ng an MT. Besides, the use of these options is logged, so that one can easily recognize the MTs that were accepted as they were presented. If the auto-accept button is hit after some edits were performed, those edits will be completely disregarded.

4. You can enable the user to discard a segment, that is, tag it as impossible to edit:
enableDiscard: enables a button in the interface, and whenever this button is used there will be an indicator impossible in your per file.
(a) You can skip the collection of explicit assessments on discard (collecting them is the default):
skipAssessmentOnDiscard

5. You may want to log all the changes performed by the annotator:
trackChanges: logs timestamped insertions and deletions; also tries to infer substitutions and shifts.

3.2.5 Gathering assessments

PET also allows explicit (more subjective) assessments to be collected after every unit is completed.

1. You can set assessment questions for both translation and post-editing jobs in a context file:
postEditingAssessment id message max list: sets a post-editing assessment question, or
translationAssessment id message max list: sets a translation assessment question.
• id is the assessment identifier, for instance Accuracy
• message is the message to the user, for instance Highlight the most glaring problems
• max is the maximum number of answers, or [...] for no constraint
• list is a vertical-bar-separated list of options
27. or done (that is, units that have been post-edited, translated or revised), while red stands for to be done. The unit in white is the active unit, always in the central area, and it turns yellow to indicate that the editing has started. Note that the label at the top will change from "ready to edit" to "editing". The labels at the top also display the time spent on the current revision (partial), the number of complete revisions (revisions), and the total time spent on revisions that are complete (total).

[Figure 2.3: PET's annotation page (progress)]

2.2.1 Tool box

The content of the tool box (right-hand side box in Figure 2.2), with the tool's action buttons, is also configurable via context file. It generally displays:
Copy: copies from the top box to the active unit (target side)
Revert: reverts the active unit to its last revision
Bind: binds source and target scrolling (useful for long segments)
Previous undone: searches backward for the first undone unit
Previous: goes back one unit
Next: advances one unit
Next undone: searches forward for the first undone unit
Save: saves the current progress to disk
Close: closes the job (asks for confirmation and offers a chance to save the progress)

By clicking anywhere on the tool box, it will be focused. Once
28. ormatted text.
renderHTML: interprets text as HTML.

3.2.3 Fitting small screens

1. You can change the font and size of text:
standardFont font size: sets the underlying font and size of the tool
editingFont font size: sets the font and size of the editing unit
editableFont font size: sets the font and size of any editable unit
idFont font size: sets the font and size of the units' identifiers

2. You can change how much context is displayed to the annotator:
sentencesByPage number: displays a number of units per page (the default is 11)

3.2.4 Collecting effort indicators

PET automatically collects some effort indicators, such as post-editing time, but some of them need to be set.

1. You can collect keystrokes:
keystrokes: enables 4 indicators: the number of keys pressed (keystrokes), how many of those were space-like keys (white keystyped), how many of those were non-white keys (nonwhite keystyped), and how many of those were control keys (iso keystyped).

2. You can flag PEs that match their respective MTs exactly, i.e. translations that did not require any editing:
enableUnchanged

3. You can enable the user to automatically accept MTs:
enableAutoAccept: enables a button in the interface, and whenever this button is used there will be an indicator autoaccept in your per file.
(a) You can skip the collection of explicit assessments on auto-accept:
skipAssessmentOnAutoAccept

These two options give the user a quicker way of accepti
29. t file. Figures 2.5 and 2.6 show examples; the latter displays a query that accepts multiple answers.

[Figure 2.5: PET's assessment page (single selection)]

[Figure 2.6: PET's assessment page (multiple selection)]

Chapter 3 Files

3.1 Meta

When started, PET will look for the pec.meta file in its main directory. If present, this meta-configuration file can optionally be used to set up to two global variables (others may be created in future):
dir path: sets the path to the directory from where the tool should be run. PET will run as if it were placed in that directory: it will look there for pec configuration files, and paths such as the workspace and external files will be relative to it.
default pec: sets a default context to be loaded when the tool starts.

If the file is not present, or it is present but all or some of the variables are not defined, PET will assume default values or behaviour:
dir: its default value is the tool's base directory
defaul
30. t: if not set, the tool will not load any context by default; it will expect the user to select a context to be loaded.

3.2 PEC

A context in PET is defined in a pec file. A pec file has a simple declarative syntax to specify different attributes. The main attributes are described in what follows.

3.2.1 Defining a workspace

The workspace is the directory where files are read from and stored, and therefore it has to be set. It is a path relative to the tool's folder, unless dir is set in pec.meta (see Section 3.1), in which case the workspace will be relative to that dir path.
workspace path: sets the workspace to path.

To create a user space, create a folder username in your workspace and place pej files under this folder. You may set a default user, that is, a user space to be loaded when the tool starts:
user username: sets the default user space to username.

3.2.2 Controlling how units get displayed

1. You can set how the active unit is rendered. In some experiments you may want to prevent annotators from inspecting the units before they start working on them, as this could affect the results, especially the time-based indicators. You can set when the active unit (before it starts being edited) should be hidden:
hideIfNotEditing, with one of the values always, undone or never, sets when active units are hidden.
• always: always hides the active unit
• undone: hides the active unit unless it has been edited before
• never: always
31. the way, please do report bugs to me. Wilker Aziz

Chapter 1 Running

1.1 How to install

No installation is required. Once you unzip the PET file, PET only requires a recent version of the Java Virtual Machine (1.6.0_26 or superior), which is likely to be already installed if you keep your machine up to date. On Microsoft Windows it is important to check whether the PATH to the Java binary files is properly set. Again, this may be already set. If you try to run the tool and it does not work, follow the guidelines in Section 1.1.1 to set the path.

1.1.1 Windows

1. Do I have the right JVM?
(a) Open a terminal
[Figure 1.1: Terminal]
(b) Type in java -version
[Figure 1.2: Java version]

2. What if my computer does not recognize the command java?
(a) Make sure you have downloaded and installed the JVM: http://java.com/en/download/index.jsp

3. I am sure
32. tory in the file system that contains all relevant input and output files. Each user will have a sub-directory in the workspace tree, so that input and output files are grouped by user. If a default user is defined, PET will load the jobs (pej files) and results (per files) available to that user. One can change the user to any of the available ones in the workspace. For now, PET does not display the known users, so it is up to users to know their own username and enter it correctly when queried. If a valid username is given, PET will update the list of jobs and results according to that user.

2.2 Annotation Page

The annotation page (see Figure 2.2) is the page that you see when a job is loaded. The exact appearance of the annotation page may change due to several aspects of the task set in the context file. Figure 2.2 illustrates some important features:

Labels (1, light blue): at the top of the window you will find some useful information, such as the status of the current unit, the number of revisions and the time spent on that unit. Additional information may be displayed depending on what is set in the context.

Progress (2, green): on the top right-hand side, the status of the current job is shown in units done out of the total units in the file; in red, the status of the current job is also shown in number of units saved to disk; besides, the timestamp of the last time the progress was saved to disk is shown.

Top box (3, orange): displ
33. tte</MT>
<MT producer="google">Meu irmão ele tem por Bernadette uma grande paixão</MT>
<MT producer="bing">Meu irmão ele tem uma grande paixão por Bernadette</MT>
<annotations revisions="1">
<annotation r="1">
<!-- output -->
<PE producer="demo baseline">Meu irmão tem uma quedinha por Bernadette</PE>
<!-- explicit indicators -->
<assessment id="adequacy"> <score>2 Preserves the core</score> </assessment>
<assessment id="fluency"> <score>2 Minor agreement problems</score> </assessment>
<assessment id="problems"> <score>Phrase salad</score> </assessment>
<!-- implicit indicators -->
<indicator id="editing" type="time">50s 2</indicator>
<indicator id="assessing" type="time">41s 2</indicator>
<indicator id="keystrokes" type="count">38</indicator>
<indicator id="white keystyped" type="count">0</indicator>
<indicator id="nonwhite keystyped" type="count">8</indicator>
<indicator id="iso keystyped" type="count">8</indicator>
<indicator id="unchanged" type="flag">false</indicator>
<!-- timestamped edits -->
[...]
<indicator id="target" t0="0 70" type="sysselection">baseline</indicator>
<indicator elapsed="0" id="assignment" length="40" offset="0" t0="71" type="change">m
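To show how the assessments and indicators of a listing like Listing 3.5 might be collected for analysis, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element and attribute names (task, annotation, assessment, score, indicator, id, r) follow the listing as far as it is legible in this copy; the sample document itself is made up for the example, and real per files may be structured differently, so treat this as a starting point rather than a reference parser.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modelled on Listing 3.5; not a real PET output file.
SAMPLE = """
<job id="example" status="GOING_ON">
  <task id="1" status="FINISHED" type="pe">
    <annotations revisions="1">
      <annotation r="1">
        <PE>Meu irmão tem uma quedinha por Bernadette</PE>
        <assessment id="fluency"><score>2</score></assessment>
        <indicator id="keystrokes" type="count">38</indicator>
        <indicator id="unchanged" type="flag">false</indicator>
      </annotation>
    </annotations>
  </task>
</job>
"""


def collect(xml_text):
    """Return one dict per revision with its indicators and assessment scores."""
    root = ET.fromstring(xml_text)
    rows = []
    for task in root.iter("task"):
        for annotation in task.iter("annotation"):
            indicators = {
                ind.get("id"): (ind.text or "").strip()
                for ind in annotation.iter("indicator")
            }
            scores = {
                item.get("id"): (item.findtext("score") or "").strip()
                for item in annotation.iter("assessment")
            }
            rows.append({
                "unit": task.get("id"),
                "revision": annotation.get("r"),
                "indicators": indicators,
                "scores": scores,
            })
    return rows


for row in collect(SAMPLE):
    print(row["unit"], row["revision"], row["indicators"], row["scores"])
```

Values are kept as strings on purpose: the listing mixes counts, flags and times, and converting them is best left to whoever knows the exact format of their own per files.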
34. us google mt greys 08x06 v1 v1 en br br
last="50"
id="greys 08x06 v1 v1 en br s2 1"
meta="home waziz experiments opus google mt greys 08x06 v1 v1 en br original meta">
<task ratio="17 3333" max="80" lines="1" duration="1 5100" subtype="1 1" sub="1" id="1" type="pe" ideal="26" preferable="26">
<S producer="opus">That's it Fran One more push</S>
<MT producer="s2" score="0 6">E isso ai Fran Mais um empurrao</MT>
</task>
<task [...] duration="42 100001" subtype="1 1" sub="2" [...] type="ht" ideal="30" preferable="24">
<S producer="opus">As babies we were easy</S>
</task>
[...]

Listing 3.3: PET's input file

Footnote 2: There is a mismatch between this document and the XML tag. The XML tag used to declare a unit is task, due to an earlier version of the software. This mismatch is being corrected, both in the code and in the input/output files. We use unit throughout this document, as it is clearer.

[Figure 3.1: PEJ]

Auxiliary tool for generating pej files: PEJ (Figure 3.1) can format plain text files into pej files for you. Run pej.bat and a simple interface will be loaded
35. xample with HTML
2. std
3. subs: check this one for examples with length constraints and additional attributes for units
4. wmt

Finally, there is a directory called xml that contains XML databases with external information.

Chapter 5 API

This chapter is yet to be written.