Home

Tutorial Introduction

image

Contents

1. TYPE rplace noun rsubstance noun rhuman noun rdisease noun rquality noun group noun countable noun COMMON NOUN TYPE2 29e etm X collective noun PERSONNAME TYPE f PLACE NAME TYPE determiner not required rplace name PLACE NAME TYPE2 X medication name given name rperson name surname title determiner required country name planet name city or state name day bos TIME name NAME TYPE month time rorganisation name religion name 45 e Verbs infinitive thirdpersonsingular N past verb participle gerund i AUX do aux aux verd VERB TYPE AUX VERB reduced aux ve aux TYPE nonreduced aux intransitive verb INFLECTING pe b rinflecting verb lt VERB TYPE hee DOING nne ek TYPE ditransitive verb link verb L LEX mental verb kx verb VERB TYPE ine verp PROJECTING ibak verb projecting verb Tea Typp VPA verb appearance verb relating verb INFLECTING VERB reg verb REG PATTERNS L S rinfinitive be r 1ps be regd verb VERB verb p F2ps be 3ps be BE AUX INFLECTION P5 Plur be r past sing be J r past plur be perau X r participle be gerund be BE pue AUX TYPE nonreduced be daLverb MODAL reduced modal modal ver VERB TYPE ponreduced modal Subclasses of mental verb si b COGNITIVE cognitive action verb cognition
2. mental projecting matches any verb which is classified as mental __ human noun matches any noun classified as a human noun Inflection matching at the end of a token indicates that all inflection forms of the token which should be a root form should be matched Thus break matches break broken broke breaking breaks red matches red reds noun redder adj reddest be matches be is are was were been being jS matches nothing only roots can be used To constrain the inflection matching to a limited set of inflections one can add noun verb adjective or pronoun after the E g red noun matches red reds red adjective matches red redder reddest 29 Note that wildcards cannot be used within 96 forms Nor can the string before the 96 be blank 4 Running a Query After entering your query you can hit the Show button If your cursor is in a text field Containing String you can hit the Return Key 5 Modifying a Query To change a feature selection just click on the feature to change it To delete any of your search extensions click on the keyword amp P containing in and click on remove 6 The Result Space The white space below the Query space displays the results Click on a result and the annotation file c
3. the of and a A more informative listing works out how important each word is for a particular corpus when compared with a more general corpus For instance the keywords from a corpus split over three fields are shown below The words are ordered in terms of their specialness for this corpus relative frequency in this corpus when compared to the relative frequency in the general corpus A value of 100 indicates the word appears 100 times more in this corpus than in other corpora NOTE for this to work one needs to select only a sub corpus If you select the whole corpus then nothing will happen Military Economics Crime troops 100 0 economy 121 38 crime 142 85 weapons 100 0 companies 116 52 detective 50 0 engine 100 0 Stock 100 0 police 49 16 mountains 100 0 tax 1000 disappearance 40 0 smoke 90 0 cuts 85 0 criminal 39 86 gulf 85 0 profits 80 0 court 34 88 enemy 85 0 investment 75 0 justice 304723 aircraft 80 0 billion 75 0 driver 30 23 force 70 0 returns 70 0 boy 29 06 civilians 70 0 sales 70 0 victims 18 6 civilian 70 0 earnings 65 0 family 17 56 guys 65 0 investors 65 0 child 13 95 military 62 47 jobs 65 0 car 12 81 squadron 60 0 package 65 0 lived 11 96 suicide 55 0 assets 65 0 officers 11 96 tanks 55 0 prices 60 0 legal 11 51 soldier 55 0 bill 60 0 children 10 57 jungle 554 0 corporate 60 0 kids 9 3 altitude 55 0 stocks 58
4. AutoGode Statistics Enter Search Query Below person Show Show as by file Save Help FinanceNews Age25 05 03 txt 7 hits Bush President George W Bush his opposition the rich Bush his FinanceNews BBC25 05 03 txt 18 hits the country s finance minister Hans Eichel he Mr Eichel ha Figure 5 1 The Corpus Search Window 2 Specifying Search Queries At the top of the window is a menu driven widget to define your search query For this tutorial we will use a small project called Finance which can be downloaded from the CorpusTool website on the Download page Simple Feature Search To search for all segments containing a given feature click on the widget at the top left in Figure 5 1 person and select a feature from one of your layers Then press Show to see all instances Press Save to save the search results to a file 1 More Complex Searching Click on the small next to the feature selector to extend your query and Allows you to add another feature and the search will return all segments containing both the nominated features or Allows you to add another feature and the search will return all segments containing either of the nominated features NOTE and and or cannot be mixed and not Allows you to add another feature which should be excluded and the search will return all segments containing the first feature but not the second
5. or the MacOSX equivalent and open the text with MS Word which may help you choose the best encoding Otherwise using Open with select a browser and look for the Encoding or Character Encoding menu item and see which encoding this program gave the text e Display Font Choose here the font family and size you want to use to display your text in the annotation windows Some fonts will better cope with non western writing systems e g some fonts are designed to display Chinese etc However many modern fonts should display any writing system oo File MetaData Metadata for file News Obama1a txt english utf 8 Arial 16 Obama is like Apple Google and Facebook a once hip brand tainted by Prism Among the guests at the fabled Bilderberg meeting held this weekend just outside London are the top bras s of Google Amazon and Microsoft How appropriate they should be there alongside luminaries of the US p olitical and military establishment For this was the week that seemed to confirm all the old bug eyed conspir acy theories about governments and corporations colluding to enslave the rest of us The Guardian revealed that the US National Security Agency has cracked open our online lives that it can rif le through your emails listen to your calls on Skype watching your ideas form as you type as a US intellig ence officer put it apparently in cahoots with the corporate titans of the web OK Cancel Figure 2 4 Fi
6. verl MENTAL VERB TYPE L cognitive state verb VERB TYPE reaction verb please type verb Subclasses of verbal verb mental verb ADDRESSEE telling verb addressee oriented verb ORIENTED VERB D sien VERBAL verbal verb VERB TYPE proposing verb iy ADDRESSEE not addressee oriented verb ORIENTED 2 E STATING SAYING stating verb VERB TYPE L saying verb SAYING bede VERB TYP E saying not required stating saying verb 46 e Adjectives 3 ADJECTIVE TYPE adjective ADJECTIVE SEMTYPE N e Adverbs ADV adverb r descriptive adverb F ER EST ADJ INFLECTION rer est adjective absolute adjective comparative adjective superlative adjective ER EST ADJECTIVE reg adj REG PATTERNS L regd adj N noninflecting adjective NONINFLECTING comparable adjective ADJECTIVE TYPE noncomparable adjective r nationality adjective other adjective rmanner adverb r temporal descriptive DESCRIPTIVE connective adverb ADVERB TYPE2 modal adverb rlocation adverb other descriptive adverb more less INTENSIFIER TYPE intensifier r Jussative TYPE ee degree intensifier r interrogative adverb ago adverb 47 e Pronouns r nominative pronoun r accusative pronoun PRONOUN CASE genitive pronoun Fgenitive2 pronoun reflexive pronoun r
7. a cd file Now make a new folder and place within it the scheme file and the codings file Open CorpusTool and create a new project Select Import Layer from the Project Menu You will be asked to specify the folder created in 3 above The cd3 file will be split into the raw text to be put into your Corpus folder and the analyses placed in the Analyses folder The next window asks in which subcorpus folder to place your text file The analysis scheme is imported as a new layer The next window asks for the name of the layer In Coder the only way not to code a bit of text was to ignore it In CorpusTool one selects only the bits of text one wants to code You may thus want the ignored segments in your Coder study to disappear The next window allows you to do this 10 Press Finalise and you have a new Layer added and your cd3 file is imported If you have a set of files all annotated with the same scheme 1 2 3 Place all the Coder files in a folder Make sure ALL the files are in cd3 format not cd2 Follow step 1 for a single file for at least ONE of the files e g make sure there is a scheme file in the folder Proceed from step 4 for the single file case above 43 If you have one or more files where the same text s have been coded with different networks in a sense you have done multi layered annotation using Coder 1 For each set of files annotated with the same
8. document open the HTML file in MS Word and from there cut pasted into your own document 41 1 Merge Projects Up until Version 2 8 4 this function did not work correctly Before Merging To ensure cleaner merger delete any layers in either project which have no annotation associated with them In all cases where the same files exist in both projects e g schemes corpus files and annotation files the file in the current project will be preserved Files will moved from the other project where they do not exist in the current project Select Merge projects from the Project menu You will be asked to select another project to merge with the current one Results will be saved in a new folder with the same name as the currrent project plus merged added 42 Importing Systemic Coder Studies 1 How to Import Coder Studies The analysis files in Systemic Coder can be imported into CorpusTool To do so follow the following instructions If you have a single file to import 1 NOD U A Ensure that the coding scheme is saved as an external file master scheme To do this open the file in Coder and select Scheme Storing from the Options menu Select Save to Master and specify a location to save the scheme Ensure the codings are saved as cd3 not cd2 if the file on disk has a cd2 extension you need to open the file and select Save Codings As from the File menu The program will offer to save it as
9. error 5 Presenting Results as a Network When performing a feature based study you can now view the results in a system network instead of in table format See Figure 7 3 After a study is presented in table form a new menu is presented labelled View As Select Network to switch to Network view This way of displaying statistics has been copied from a similar feature in SysFan thank Canzhong Wu the author of SysFan for allowing me to use this feature Available from http minerva ling mq edu au units tools index htm 36 CorpusTool 2 0 beta 5 Project Misc Help Project Search AutoGode TYPE Of Study Compare two datasets Aspect of Interest Feature Coding el Pas Dit participant i py vienas Network eae fpn editorial participant Options country set 20 00 set 2 14 29 PARTICIPANTS TY set 2 24 683 participant Figure 7 3 Network View of Statistics 6 Saving Statistics Each Statistics window offers a Save button which allows you to save the results to file in HTML format tabbed delimited or plain text Results saved in HTML can be opened in MS Word and then cut pasted into your publications Results saved tab delimited can be opened in MS Excel on Windows right click on the txt file and specify Open with Excel These files may also be useful for programs such as SPSS 37 1 Keywords The top words in any frequency list for English will be words such as
10. should look like the scheme in Figure 4 5 Scheme Entity xml entity 4 100 Options Cli human entity gee company TYPE organisation BASE MSATON government body media Figure 4 5 Scheme with subsystem renamed As a next step we want to code each entity not only by semantic type but also in terms of form We can do this by providing a sub network in parallel to the content network e Click on entity and select Add System this will add a system underneath the original one The curly bracket indicates that during coding you have to select from both the system above and the system below See Figure 4 6 eoo Scheme Entity xml entity 4 100 Options Cli human SEMANTIC Tem ORGANISATIO bens organisation ees ENTITY rfeature1 TYPE je entity Figure 4 6 Network with parallel system We now need to edit this new system e Click on ENTITY TYPE and select Rename System call the system FORM e Click on feature1 and rename it common e Click on feature2 and rename it proper e Click on FORM and choose Add Feature and call the feature pronoun eoo Scheme Entity xml entity 4 100 Options Cli human eel organisation Si SATION government body entity media proper FER coon pronoun Figure 4 7 The finished scheme The resulting network should look like Figure 4 7 Coding Schemes can get quite complex They can grow to contain hundr
11. the steps needed to create your project 1 Providing a name for a new project 2 Specify the folder where your new project s folder is to be stored For instance choose the Desktop folder on your machine When you click the Finalise button CorpusTool will create your project which is a folder containing all the details related to your project including the corpus and the annotation files It also contains an icon which can be used to launch your project directly the ct3 file Once you have finished with the Create Project Wizard the CorpusTool Main Window will open showing the File pane See Figure 1 2 This pane is where you add or remove files to your project or open a file for annotation eoo CorpusTool 3 0 My First Project ZM layers Search _AuoCode Statistics Explore Oplions Help Files in this project Extend Corpus Help zl Files in corpus but not incorporated in project Incorporporate All Figure 1 2 The File Management pane The buttons at the top of the pane allow you to switch between the different panes of CorpusTool Files Tutorial 2 Layers Tutorial 3 Search Tutorial 5 Autocode Tutorial 6 Statistics Tutorial 7 Explore Tutorial 8 Options and Help We will assume for now that the File pane is selected The name of your project is shown in the title bar of the Project window In the space below is a box showing all the files in the project initially empty and for each f
12. 1p pronoun r personal pronoun lt PRONOUN 2p pronoun PERSON wh personal pronoun 3P 3p pronoun PRONOUN TYPE hennie neuter pronoun PRONOUN singular pronoun NOUN NUMBER pronoun we plural pronoun spatial pronoun j i temporal pronoun LOCATION thin PRONOUN TYPE TS Pronoun one pronoun nonpersonal pronoun 4 that pronoun LOCATION r wh pronoun PRONOUN TYPE2 nonwh pronoun e Number cardinal NUM number ordinal TYPE percentage e Conjunction coordinating conjunction CONJ conjunction TYPE TYPE and conjunction ror conjunction but conjunction pre or infix conjunction NORMAL SUBORDINATING copo subordinating conjunction e Prepositions ragent preposition to preposition Dep e of preposition TYPE as preposition r by preposition other preposition CONJUNCTION TYPE if conjunction gerund conjoiner 48 e Determiners positive strict determiner f werden cr SEULS DETERMINER TYPE j negative strict determiner DET wh strict determiner TYPE2 M rm NONSTRICT quantifying determiner i nonstrict determiner 4 DETERMINER TYPE L demonstrative determiner QUANTIFYING r singular determiner DETERMINER NUMBER plural determiner e Punctuation period SENTENCE r sentence final punctuation FINAL TYPE exclamation mark question mark Bie TIVE semicolon tik PUNCTUATION conjunctive punctuation NS pu
13. 26 mercy 9 3 strikes 55 0 markets 5540 investigators 9 3 trees 55 4 0 budget 55 0 woman 9 01 lieutenant 55 0 finance 50 0 murder 8 52 withdrawal 55 0 volatility 50 0 boys 8 52 missile 5540 reforms 45 0 age 7 77 bomber 50 0 commercial 40 0 victim 6 64 invasion 50 0 temporary 40 0 street 6 27 combat 50 0 cent 37 87 body 6 22 rounds 50450 analysts 32 04 incident 5 98 missions 45 0 growth 32 04 2 Phrases Rather than looking at single words n gram analysis looks for sequences of words which are common in the corpus For instance a list of the frequent 3 38 grams sequence of 3 words that occur in a small corpus of introductions to academic papers are shown below in terms of 12 ad hoc networks 6 a set of 11 we believe that 6 in this paper 10 of this paper 6 the performance of 7 terms of a 5 of the two 7 in section 4 5 be able to 7 some of the 5 a number of 7 in order to 5 the design of 7 large number of 5 which can be 7 that can be 5 the problem of 6 ad hoc network 5 According to Biber e g Biber and Barbieri 2007 as the corpus grows to a reasonable size millions of words the kinds of phrases that raise to the top don t contain lexical content as such e g ad hoc networks Rather they are phrases which are used to frame such meanings We see here in terms of a set of etc While keywords tell us which words we should teach in a text n grams can tell us which phrasings are u
14. WIIG Was litally d viu ad tofly Like Ashburton waco oA been named in honor of ine report a six page cable from r p gt 0 05 with echo tracking 0 023 mm than with ning spline slightly CO deviation of nl air strikes blast palaces government buf dings and to supply lines behind a friend at Soldiers were also filmed in burnt out cars and injured people from Before ANDI IJ UI LUI T Associated Atlantic Auckland Aussie Australian Australian B mode Baghdad Baghdad Baghdad Baghdad Baghdad Danhkdanad UAM Corpus Tool Version 3 0 UlSUILL Press quoted Hadithah police Captain Ocean solo central business district last night also Rules football hero of the same name Embassy in Copenhagen titled Den Embassy notes that the Danish Govern 0 036 mm or M mode 0 074 mm me and M mod distension amplitudes bul OD nior British government sou Party headquarters across the river fror front ood st Wednesday by tv und s marked by a ci sayin gyiraais market AA raq had reported 78 ci moarlat innidant will ha imuactianntad Tutorial Introduction June 2013 Mick O Donnell michael odonnell uam es About this Document This document provides a tutorial introduction to UAM CorpusTool 3 0 henceforth UAMCT3 For more detailed information about the options in each screen and menu of UAMCTS please see the UAMCTS User Manual About UAM CorpusTool 3 0 UAM CorpusTool is a
15. an annotation window for this file at the specified layer Button Colours The buttons for each layer of a document are colour coded to indicate their degree of completeness e White totally coded e Light Blue Partially Coded e Dark Blue Coded to a high degree Note that these colours are indicative only 6 Quitting CorpusTool Note that all changes to a project are automatically saved If you quit the Project Management Window using the X in the top right corner you quit CorpusTool all changes saved 25 7 Continuing a Project Once your project is created the easiest way to open CorpusTool to work on your project is 1 Open your project folder on the desktop 2 double click on the cptr file which has a blue globe icon CorpusTool will open directly with your Project Window UNDO No undo is currently supported It will be supported in a later version Corpus Search 1 Introduction The Search Interface is opened by clicking on the Corpus Search button on the Project Window Figure 5 1 shows this window NOTE You can also open the Search Window from e a Scheme window Click on a feature and select Show Examples CorpusTool will open the Search window with all segments marked with that feature displayed e Descriptive or Comparative Feature Statistics Click on the count field of any set and the instances which make up the count will be displayed 26 CorpusT ool 2 0 beta 5 Project Misc Help
16. atures box If there are more choices in the coding scheme the next choice will then be displayed c Gloss Box If you introduced a gloss for a feature in the scheme see Section 3 3 above then if you single click on a feature in the Current Choice box the gloss will be displayed in this space This is useful when you have forgotten what exactly is the coding criteria for this feature 19 2 The Comment Frame In this box you can type comments about the current segment either to remind yourself of some problem or to communicate with other people working with the same project For instance one might write Is this a material or behavioural clause Check with IFG In summary to code a whole document 1 Select from the options shown in the Current Choice box until no options remain 2 If you make a mistake double click on features in the Selected Features box to undo the selection 3 Close the window and your codings will be saved 3 Annotating Code Segment files When annotating a document at a layer specified as Code Segments the process is slightly more complex Firstly for the sake of this tutorial let s add a new layer to our study 1 Bring the Project Window to the front Click on the Add Layer button on the right of the screen Call the layer Participant Select Annotate Segments Select Do not automatically segment Select Create New Scheme Press the Finalise button Note that this a
17. aving American families with more to spend more to save v id il E gt gt Gz eur Delete Other Action gaa Save Close Help Ll l1 i Figure 4 2 Code segments window This display differs from that for coding a whole document in that there are more buttons in the toolbar in the middle These buttons basically allow you to move through the segments 3 1 Making Moving and Selecting Segments Make segments by swiping text clicking down at one point in the text and dragging to the place you want to end the segment then releasing the mouse Select segment you can select a segment by clicking on the segment line which runs under each segment You can tell which segment the mouse is over as the line of the segment is highlighted Select next previous segment use the and buttons in the toolbar to move around between segments Select next previous incomplete segment use the and buttons in the toolbar to move to the next or previous segment which is not totally coded yet Resizing Segments Select the border of a segment by moving the cursor over the small border marker a vertical line until it goes red to indicate you are over it Then click down and drag it where you want to go Delete segments if you create a segment erroneously you can delete it by selecting the segment then clicking on the delete button in the toolbar Alternatively hit the Delete key 21 3 2 Ignoring Segments Clic
18. corpus of the project Paste from the Clipboard you will be given a space in which to copy paste text into This is a useful way to take texts from the internet into UAMCT In the first two cases the files you select will be copied from where they are into the Corpus folder of your project The originals are left untouched For this tutorial lets use the Paste From Clipboard option Copy the following paragraphs of text and follow the instructions below Obama is like Apple Google and Facebook a once hip brand tainted by Prism Among the guests at the fabled Bilderberg meeting held this weekend just outside London are the top brass of Google Amazon and Microsoft How appropriate they should be there alongside luminaries of the US political and military establishment For this was the week that seemed to confirm all the old bug eyed conspiracy theories about governments and corporations colluding to enslave the rest of us The Guardian revealed that the US National Security Agency has cracked open our online lives that it can rifle through your emails listen to your calls on Skype watching your ideas form as you type as a US intelligence officer put it apparently in cahoots with the corporate titans of the web 1 Select I want to paste from the clipboard Figure 2 1 then press Next oo Add Files Assistant Corpus Location What do you want to do Iwant to add a single text file C want to a
19. dd a folder of text files want to paste from the clipboard Figure 2 1 2 Paste the text into the space edit it here if you want eoo Add Files Assis Paste the text in the space below and edit it to suit Filename Subcorpus Add new subcorpus Figure 2 2 3 Type in a filename for the file e g Obama1 txt 4 Leave Subcorpus set to Add new subcorpus 5 Press Next You will be prompted for the name of the subcorpus to add the file to Type News and then press OK 6 Press Finalise The file you added should not be displayed in the Project window see Figure 2 3 eoo CorpusTool 3 0 My First Project Files Layers Search AutoCode Statistics Explore Files in this project Extend Corpus Help Options Help al Files in corpus but not incorporated in project Incorporporate All Action Y News Obamad1 txt Figure 2 3 Files window after adding a text file The newly added files are under the caption Files in corpus but not incorporated in project UAMCT makes a distinction between incorporated files which have buttons to annotate at all available levels and unincorporated files which are in the corpus but not yet opened for annotation This distinction is made to make it easy to keep track of those files which you have started editing distinct from those you may wish to add later If you have 100 files in the corpus but have only anno
20. dds a new Layer in the layer space and also adds a new button for each incorporated file Now let s define the scheme for this layer 1 Click on the Edit button in the space for the Participant Layer 2 When the scheme window opens change participant 1 to human and participant 2 to organisation 3 Click on PARTICIPANT TYPE and select the option add feature Type in other participant b M D a Mu Your network should look like that shown below human id PARTICIPANT isati participan TYPE organisation other participant Now close this window returning to the Project Window Click on the Participant button for one of your text files This will open an annotation window for the document at this layer See Figure 4 2 20 4 Participant analysis for FinanceNews Age25 05 03 txt PRSE Source Melbourne Age Breaking News Article Bush defends tax cuts Sunday 25 May 2003 8 05 AM President George W Bush on Saturday defended his US350 billion A4533 94 billion tax cut package against opposition accusations that it unfairly benefits the rich Bush who is launching into a re election campaign insisted in his weekly radio address that the package narrowly passed by Congress on Friday would boost the ailing economy and create badly needed jobs With the state of the economy now dominating voter opinion polls commentators are divided over the effect the tax package will have The president said By le
21. e 1 Opening the Scheme Editor Before annotating files for a given layer you need to define the annotation scheme for the layer The first step here is to open the scheme editor Change to the Layers pane and click on the Edit Scheme button for the layer This tutorial assumes we are working on the Entity layer defined in the previous tutorial eoo Scheme Entity xml entity 4 100 Options Close ENTITY entity 1 entity TYPE entity 2 Figure 4 1 The Entity Scheme before editing A window like Figure 3 2 will pop up It shows a small system network a hierarchy of features with entity as the most basic concept and a choice between entity 1 and entity 2 2 Editing the Scheme These features have been automatically generated and we will change them to more informative names e Click on entity 1 and a menu will appear with options as in Figure 4 2 Choose Graph from here Graph from parent Show examples Figure 4 2 Options for Features For more information on these options please see the manual For now we want to change entity 1 to something more plausible Let s assume that we want to code our entities as either human or organisation we will ignore NPs that refer to other sorts of entities e Select Rename Feature A window will appear allowing you to edit the feature name e Change the feature text to human in UAMCT all features are in lower case and space
22. e usage of features in the corpus at that layer counts mean and standard deviation 33 These studies can be done for a single dataset descriptive statistics two datasets comparative statistics or showing results for each document individually 1 2 Describe a dataset offers descriptions of your corpus or a specified subcorpus Compare two datasets provides a comparison of two subsets of your corpus e g english vs spanish When Feature is selected the two sets are contrasted in terms of the occurrence of presence of the features in the codings at the layer specified Levels of significance of the differences between the sets are displayed both in terms of Students T test and Chi Squared see below Compare Multiple Files provides details of each file in your corpus one column per file 2 AContrastive Feature Study Figure 7 2 shows a sample Comparative study done using the Finance project Note very little of the text has been annotated so the results are for small numbers only We would need to tag a thousand or more participants from a range of editorials and fpn front page news articles before we could start to trust the results This preliminary study shows two significant results more reference to people rather than organisations at a 9896 level of significance and significant differences in the types of organisations discussed but the numbers are too low to trust CorpusTool 2 0 beta 5 Pr
23. eds of choices with lots of parallel systems and sub classifications However the smaller the scheme the quicker the coding This is all we need for now For more details on editing coding schemes and including the schemes in your publications see the UAMCT3 Manual 17 Tutorial 5 Manual Annotation THE REST OF THIS DOCUMENT STILL NEEDS TO BE UPDATED FROM VERSION 2 8 1 Annotation Types CorpusTool currently supports two types of annotation 1 Code document the document as a whole is assigned features Useful for defining document language text type register etc Also can be used to code features of the writer e g language proficiency 2 Code segments the user defines segments in the document and assigns features to each segment For instance clauses NPs words speaker turns etc Below we will explain how to annotate in both manners 2 Annotating Code Document files Each text file incorporated into your project has a button for each layer of analysis If you click on a layer button where the layer was specified as Code document then a window like that in Figure 4 1 will appear Entity analysis for News Obamal txt Obama is like Apple Google and Facebook a once hip brand tainted by Prism Among the guests at the fabled Bilderberg meeting held this weekend just outside London are the top brass of Google Amazon and Microsoft How appropriate they should be there alongside luminaries of the US po
24. er s error annotation scheme 8 For this tutorial select Create new scheme Then click on the Finalise button and your new layer will be added to the Project Window Figure 2 3 shows the Project window with one layer added The Layer space provides some information about the layer it s name Register its type code document and the name of the scheme associated with the layer Register xml 24 There are two buttons on the Layer control panel e Delete this will delete the layer and all analyses of text files performed on this layer Press this only before you begin coding of the layer or if you really want to delete the layer e Edit this button will open a window to allow you to edit the coding scheme We will come back to this in the next section CorpusTool 2 6 7 ore Project Misc Help Search AutoGode Statistics Keywords Options Project MyFirstProject Layers in this project Name Register Delete Type code documen Segmenta none Scheme Register xml Edit Ki Files in this project Extend Corpus Figure 2 3 The Project Window with one Layer added 9 You can also select Import layer from the Project menu to add a layer using a Systemic Coder study cd3 files See Appendix for more details 5 1 Opening an Annotation Window The remaining buttons on each row each correspond to an annotation layer defined in your project Click on the button to open
25. f you specify immediately then if the contained segment or string falls within such an embedded unit it will not match the units in which the unit is embedded For instance with They left because she was tired a search for clause containing immediately was would only match the inner clause 3 Combining Complex Searches One can combine complex searches e g person containing immediately bush in finite clause in editorial amp english 3 Concordance Searching CorpusTool lets you search for lexical patterns English only currently for most features 3 1 Specifying the Search Query If you specify containing string see above you can specify a lexical pattern instead of a simple string For example to find passive clauses be participle will match all segments containing any form of be followed by a participle verb en verb Note that the corpus is NOT tagged in terms of part of speech POS Rather CorpusTool includes a large dictionary of English and looks up each word in the dictionary Because of this a word will match all POS classes to which it belongs For instance be will match all occurrences of being even in the context where the word is not a verb e g the being Matching occurs as follows Case Insensitive all searching is case insensitive Thus Birch will match Birch and birch and BIRCH The search string consists of a sequence of search tokens
26. feature containing segment this allows search across layers it returns all units tagged with the first feature which contain segments at another layer tagged with the second feature For instance one might search for finite clause containing person amp subject to find all finite clauses where the segment boundaries totally include a segment at the participant layer which is coded both person and subject 2f e containing string this will allow you to find all segments with the nominated feature which contain a given string Matching is not case sensitive NOTE this feature is also used for concordance searching searching based on lexical features wildcard matching etc See below for more details e in segment this allows search across layers specifying that segments should match only if they are contained within segments at the second specified layer For instance one might search for person in editorial to find segments tagged as person in editorials Immediate containment NOTE for search queries including containing segment containing string or in segment you can choose between immediate and anywhere The difference is as follows e anywhere if the containing segment contains the specified segment or string it will match e immediately Sometimes users allow units embedded within others at the same layer for instance clauses can be embedded within other clauses I
27. feature in the Set 1 space and another in the Set 2 space This should be a unit which CONTAINS the unit of interest In this case we specify units of the Register layer fon and editorial Since these features apply to whole texts they do contain the segments with feature participants 5 Press Show 4 Interpreting the Results Feature based Studies Only systems which are relevant are shown For instance if we had specified the unit of interest as person above then the study would involve only those segments with feature person For this reason the results for this system are not shown as person would score 10095 and the other features in that system 0 Counts and Percentages The results for each feature are shown with both raw counts how often that feature occurred in the dataset and also as a percent The percent shows the proportion of segments which have this feature Note that the percentages in a system a given set of choices always adds up to 100 so really what it is measuring the propensity to select this particular feature as opposed to the other features in the same system Statistical Significance when a comparative study is done it is possible to measure whether the differences between the two datasets is statistically significant does it represent a real difference or is it possibly due to randomness in the data CorpusTool uses two measures of statistical significance and presents them both in the re
28. g a new rule To add a new rule click the Add button in the list of buttons at the top of the Autocode window A window like the following will appear Autocode Rule Editor clauses clauses a VE Cancel Select a feature which you want to code automatically Then specify a search pattern to use see section on Corpus Search for how to specify a search query Then press Save to keep this rule in memory 3 Editing a rule Click on the Edit button to edit the currently displayed rule 31 Hints Deleting a rule Click on the Delete button to delete the currently displayed rule Coding with a Rule When you have a rule selected press the Show button to see all segments which match the search pattern component A new toolbar appears with three widgets Display All Agreements Conflicting Nonconflicting selecting from this list allows you to filter out some of the matches e All shows all of the matches e Agreements shows all segments already coded with the specified feature e Conflicting shows those segments which are already coded with a feature which conflicts with the feature you are autocoding For instance if autocoding as passive this would show all segments already coded as active e Nonconficting shows all segments which are neither agreements nor conflicting Select All None selecting one of these options will select deselect the check boxes next to each segment Code Selected Clicking
29. ile one button for each of the possible analyses of that file This ends the first tutorial The next tutorial will show how to add content to your project Tutorial 2 Adding text files to your project The next step is to add some files to the project 1 Save Documents as plain text UAMCT 3 deals only with plain text files If your files are in MS Word format or PDF you need to save them as plain text If you are on Windows and your texts are in languages with non western characters e g Cyrillic Chinese Korean etc then it is better to open your docx document with WordPad and use the Save as option there as it can save as a Unicode file 2 Click on Extend Corpus Click on the Extend Corpus button in UAMCT A window will appear to guide you through the process of adding files You are given a choice between e Add a single file You will be asked to select a file to add to the corpus Additionally you will be asked to specify a subcorpus for the text file Texts in UAMCT are stored within subcorpora folders within the Corpus folder For instance you might have one subcorpus for native texts and another for learner texts e Add a folder of files You will be asked to select a folder to add to the corpus This folder could be o A folder of plain text files the folder will be added as a subcorpus of the project o A folder of folders of plain text files each folder will be added as a sub
30. ill open a window to allow you to edit the coding scheme We will come back to this in the next tutorial e Delete Layer this will delete the layer and all analyses of text files performed on this layer Press this only before you begin coding of the layer or if you really want to delete the layer e Edit Details this button is currently disabled but will in the future allow you to change the characteristics of the layer e g manual automatic auto segment etc Currently you need to delete the layer and add it again to change the characteristics FU Layers Ss RUSO Sse lade pdr re Add Layer Type manual Subtype segment Edit Details Delete Name Entity Layer Layers in this project Autoseg None Special None Edit Scheme 12 Figure 3 2 The Layers Window with one Layer added 2 1 Return to the Files pane If you click on the Files button you will see that the display has changed slightly The entry for the Obamat1 txt file now has a button next to it Entity You can click on this button to edit this file at this layer The colour of the annotation buttons are colour coded to indicate their degree of completeness e Light Not yet coded e Medium Partially Coded e Dark Coded to a high degree Don t open the annotation window just yet though First we need to specify the coding scheme for the Entity layer The next tutorial will deal with this process Tutorial 4 Editing the Coding Schem
31. k the Ignore button when a segment is selected and this segment will not be used in statistical analyses Ignore segments are shown in grey in the text window The same button can be used to unignore a segment 4 The Other Actions Menu This menu displays some extra options depending on the kind of annotation whole document segments that you are annotating Edit Scheme Opens the scheme window for this annotation layer so that you can edit the scheme or add change the glosses associated with features Add New Feature Prompts you to type in the name of a new feature which is added to the currently displayed set of choices and assigned to this segment Copy Features Copy the features so far assigned to this segment into memory Paste Features Assigns the features previously copied to this segment Resegment Document Wipes all segmentation of this layer for this document Note this deletes all annotation of the document at this layer Show XML Displays how the currently open file is stored on disk in XML format Show Structure Switches to an alternative display of the segmentation interface which approximates more to the standard structural display of Functional Linguistics See figure 4 3 EunctionStruct analysis for FinanceNews Age25 05 03 txt RES Sunday 25 May 2003 8 05 AM President George W Bush on Saturday defended his US350 Actor Circumstan Process Goal Dej Numerat billion A533 94 bil
32. le Metadata Window After incorporating the file the Project Window appears as in Figure 2 5 ooo CorpusTool 3 0 My First Project Pes is project COM ee Action News Obamat txt F Files in corpus but not incorporated in project ders ordo Figure 2 5 The Files Window after incorporating files 10 Tutorial 3 Adding a manual annotation layer to your project The next thing we need to do is to specify what analyses you want in the project Let s start by adding just one layer A Layer is a type of analysis of the text files We can add layers for coding clauses for coding groups for the register of the whole text for appraisal analysis etc For this example we will assume we are adding a layer for analysing noun phrases NPs in terms of both their content what they express and their form proper common pronominal 1 Change to the Layers pane Click on the Layers button at the top of the window 2 Click on the Add Layer button When you click on Add Layer a window will pop up asking several questions Use the Next button to move between questions 1 2 3 Layer Name the name given to the layer Put Entity Automatic or Manual Annotation choose Manual Scheme choose Design Your own The other options allow you to use one of the schemes supplied with UAMCTS or to use a scheme from another project you have created Kind of Segment here you specify whether you want to a
33. lion tax cut package against opposition Goal Circumstance Numerator Classi Thing Classifier accusations that it unfairly benefits the rich Circumstance Thing j MEN ee A Circum Proces Rrieh who ie latinchina into a plor camnainn insisted in hie x mm E a gt Jano Delete Other Action Eav sClose Help qy nnn ee sfg participant imaterial participant actor Figure 4 3 Function Structure Display Mode 22 Show Text Stream brings up a new window which allows you to view how choices made change throughout this text Use the System to Graph menu to select a system to view Use the Smoothing menu to change the degree of smoothing With 0 smoothing each choice is shown in the sequence it occurs in Use higher levels of smoothing to better view how choices are distributed over phases of the text For instance in Figure 4 4 the text stream shows that passive clauses occur more strongly at the beginning of the text and to lesser degrees later in the text System to graph VOICE Smoothing 4 passive clause active clause Figure 4 4 Text Stream Window 23 Tutorial 4 Adding a document layer to your project The first thing to do in a new project is to specify what analyses you want in the project Let s start by adding just one layer For 5 Click on the Add Layer button A Layer is a type of analysis of the text files We can add layers for coding clauses for coding gr
34. litical and military establishment For this was the week that seemed to confirm all the old bug eyed conspiracy theories about govemments and corporations colluding to enslave the rest of us The Guardian revealed that the US National Security Agency has cracked open our online lives that it can rifle through your emails listen to your calls on Skype watching your ideas form as you type as a US intelligence officer put it apparently in cahoots with the corporate titans of the web lt lt lt gt gt gt Ignore Delete OtherAction Save Close Help EN EN Comment Figure 5 1 An Annotation window The code document window has 4 parts 1 The Text Frame shows the text file You can scroll to see the whole text 2 The ToolBar giving various actions such as Save Close and Help see below 1 The Coding Frame contains three boxes a Selected Features labelled Assigned the features already assigned to the text Initially this will contain one feature the leftmost root feature of the coding scheme for this layer As other features are assigned they will appear here You can delete features by double clicking on the features in the Selected Features box The root feature cannot be deleted since it applies by default to all documents b Current Choice the middle box is a choice which needs to be made for this document Double click on one of the options That choice will be moved to the Selected Fe
35. m than with the B mode 0 036 mm or M mode 0 074 mm me hing spline slightly redug tne standard deviation of the de and M m distension amplitudes but O d 7 nior British government sou S 1 00 air strikes blast palaces 10 byfdings and the Baath Party headquarters across the river fror to supply lines behind the Baghdad front afriend at the Baghdad ood 7 Wednesday by tv Soldiers were also filmed in the Baghdad s marked by a ci burnt out cars and injured people from the Baghdad bombi J Before the Baghdad market M Iraq had reported 78 ci the Ranhrar market incident will he invectinated Copyright Mick O Donnell 2007 Email michael odonnell uam es Web http www wagsoft com CorpusTool Start New Open Import Project Bansal Project Project from UAMCT 2 Figure 1 1 The Opening Window The Window offers several options e Start New Project create a new project from scratch e Open Project to continue with a project you have already started you will be prompted to select one e Import Project from UAMCT 2 If you have a project from UAMCT 2 you can use the Import Project from UAMCT 2 button to make a copy of your project in the UAMCT 3 format e Open SomeProjectName If you have opened a project previously on this machine there will also be a button to open the last project opened 2 Click on the Start New Project button After clicking this button a Create Project Wizard will appear which will lead you through
36. n your machine you can begin working with it The first thing to do is to create a new project Windows e When installing CorpusTool you had the option to place an icon on the desktop Click on this icon to launch CorpusTool e Alternatively there should be a UAM CorpusTool icon in the Programs menu in the Start menu on Windows Toolbar Select this to launch CorpusTool Macintosh e The installation of CorpusTool placed the application in your Applications folder Double click on the application to launch it e You might find it useful to place the application in the Dock for easy access If you have already created a project you can open it simply by double clicking the cp3 file in the Project folder This file has an icon as below MacOSX Windows e The Opening Window A window should appear as in Figure 1 1 This window provides amongst other information the version number you are using useful if you need to communicate bugs Open Project WINCH Was litally ad uiu ad uc ANDIIVUI LUI UlSUILE The Associated Press quoted Hadithah police Captain tofly the Atlantic Ocean solo Like Ashburton the Auckland central business district last night also vao kor ld been named in honor of the Aussie Rules football hero of the same name Te report a six page cable from the Australian Embassy in Copenhagen titled Den The Australian Embassy notes that the Danish Govern r p gt 0 05 with echo tracking 0 023 m
37. nctuation ypg hvphen colon open bracket BRACKETING close bracket TYEE single quote doubl DOUBLE open quote ouble quote QUOTE TYP El close quote bracketing punctuation genitive s 49
38. o a re election campaign insisted in his weekly radio address that the package narrowly passed by Congress on Friday would boost the ailing economy and create badly needed jobs With the state of the economy now dominating voter opinion polls commentators are divided over the effect the tax package will have The president said By leaving American families with more to spend more to save and more to invest these reforms will help boost the nation s economy and create jobs He added When people have extra take home pay there s greater demand for goods and services And emnlovers will need more workers to meet that demand z De person _ MB Add De organisation H TE Figure 8 1 The Text Styler Window 5 Opening the Text Styler From the Project window the main window click on the filename of one of your files Note this only works for files incorporated in your project Also your project needs to have at least one layer defined 6 Styling the Text You can assign colour and or font effects bold italic underline to all text tagged with a given feature or feature combination This allows the patterns of selection throughout the text to be visible E g use bold italic underline for appraisal categories and colour coding for clause type to see how appraisal is distributed in respect to clause types 40 7 Saving Styled Text You can save styled text to an HTML file To include styled text in an MS Word
39. oject Misc Help Project Search AutoCode Type of Study Compare two datasets Aspect of Interest Feature Coding Help Unit participant Shows View as Table seSave fpn gt editorial gt Seti Results Set2 Results Feature N Percent N Percent TStat Sign ChiSqu Sign PARTICIPANTS TYPE N 100 N 77 person 27 27 00 47 6104 4817 EE rn country 20 20 00 11 1429 0989 organisation 53 5300 19 24 68 3946 HE Mer ORGANISATION TYPE N 53 N 18 company 48 90 57 0 0009 0 000 50 323 government 2 377 X 4 2222 2503 5911 union 1 89 9 50 00 6257 25 704 1 other organisation 0 0 00 2 11 11 0 000 6 060 2 political party 3 77 3 16 67 1866 3412 E Weak Significance 90 Medium Significance 95 High Significance 98 Figure 7 2 A Contrastive Stats Study 34 3 Performing a Study To perform one of the studies outlined above 1 Choose one of the options from the Type of Study menu describe a dataset compare two datasets or compare multiple files 2 Choose from the Aspect of Interest menu choose either Feature Coding or General Text Statistics 3 Specify the unit that you are interested in see section 5 part 2 Specifying Search Queries This should be the unit which you wish to explore differences in It could be the root feature in a network as in the case in Figure 7 2 or a more delicate one 4 f you are selected Compare two datasets then enter a
40. on this button will automatically code all displayed segments which are selected e For some grammatical phenomena you can provide a pair of rules like e Select passive if contains be participle e Select active if clauses and not passive e Use the first rule to code passives and then use the second rule to put everything else as active e Provide one rule such as the passive rule above Code these Then edit the rule inserting a between the search terms e g e Select passive if contains be participle e This will find some instances where not or an adverb falls between the verbs 32 Corpus Statistics 1 Introduction The Corpus Statistics pane allows various statistics to be derived from your tagged corpus Press the Statistics tab on the main window s toolbar to see the Statistics pane as in Figure 7 1 lt CorpusTool 2 0 beta 5 Project Misc Help Search Auto ode Statistics Type of Study Describe a dataset Aspect of Interest General Text Statistics HEI Unit clauses Sow Figure 7 1 The Statistics Pane You can use this interface to perform two kinds of studies on your corpus 1 General Text Statistics offers general statistics of the corpus such as total number of segments number of words per segment lexical density in the corpus pronominal usage etc 2 Feature usage you specify a feature in a layer most typically the root feature of the layer and the program describes th
41. ontaining this segment will be opened at the right place The three columns at the left indicate the state of each coding e P Whether or not the segment is totally coded P partial e whether the segment has a comment associated Click on the segment to see the comment 30 Automating Coding 1 Introduction The Autocode window allows you to assign features to existing segments using search patterns For instance we can identify passive clauses in English using a pattern like clause containing be participle Using the Rule Editor we define a rule like select passive clause if clauses containing immediately be Gparticiple Note as with Search lexical based search patterns currently work for English We can then press the Show button and all instances matching the search query are shown with a check box next to each We can uncheck any item which is a false match not truly a passive Clicking on Code Selected will then assign the passive feature to each of the selected segments In this way we can quickly code many of the more common grammatical patterns To see a sample of such autocode rules add a new layer to your project and use the scheme included with the system clauses xml This includes rules for process type mental verbal etc voice active passive modality nonfinite clauses etc 1 Opening Autocoder Click on the Autocode button on the main window of CorpusTool 2 Addin
42. oups for the register of the whole text for appraisal analysis etc Lets start by adding a Layer for the Register features which belong to the document as a whole When you click on Add Layer a window will pop up asking several questions and use the Next button to move between questions e Layer Name the name given to the layer Put Register e Automatic or Manual Annotation choose Manual e Coding Object here you specify whether you want to assign features to a text as a whole e g its register or text type Annotate Document or whether you want to assign features to subsegments in the text e g clauses Let s assume that we are interested in the first and select on Annotate Document e Coding Scheme the coding scheme is a description of the features you want to annotate the text with You have two options here i Create New Scheme In most cases the user is interested in making their own coding scheme representing the features that they themselves are interested in organised in the way they feel they should be CorpusTool includes an easy to use interface for creating and modifying these schemes see section 3 ii Copy Existing Scheme In some cases you might reuse a coding scheme that you developed before or which was produced by someone else CorpusTool ships with a few schemes predefined which you could use One of these is Peter White s Appraisal network Another is based on Grang
43. s are not allowed so if you put capitals they will be changed to lower case and spaces will be substituted for e Repeat this process to change entity 2 to organisation Scheme Entity xm entity 4 100 Options ENTITY rhuman enti Y PE organisation Figure 4 3 Start of the Entity Scheme After this you should have the scheme as shown in Figure 4 3 Notice also that the choice between human and organisation has a name automatically provided as ENTITY TYPE If you want to s rename this choice you could click on ENTITY TYPE and select Rename System from the menu which appears Rename the system to SEMANTIC TYPE You can type in lowercase but UAMCT will always display system names in upper case Lets now add a more delicate distinction between types of organisations We want to sub classify organisations as either company government or media 15 e Click on organisation and select Add System a new system set of choices will appear under organisation as shown in Figure 4 4 Scheme Entity xml entity 4 100 Options ENTITY human entity ype _ ORGANISATION p feature organisation ee TYPE feature2 Figure 4 4 The Entity scheme with subsystem e Rename feature1 to company e Rename feature2 to governement body e Click on ORGANISATION TYPE and select Add Feature calling this feature media The resulting scheme
44. scheme create a folder and place the coder files and the scheme file for that analysis ensure the files are in cd3 format 2 Ensure all files which are analyses of the same file have the same file name e g if you have Text1 CLAUSE cd3 analysed for clauses and Text1 GROUP cd3 analysed for groups rename both files to Text1 cd3 CorpusTool can only tell two files are analyses of the same text by having the same filename 3 Open a new project and use the Import Layer option as described above for one of the folders 4 Repeat 3 for the other folders Some problems may arise e CorpusTool says it cannot read one or more of your cd3 files it may contain characters which are outside of ASCII text CorpusTool should handle this but currently cannot Send me your files and will import it for you e f you have any other problem importing cd3 files send them to me make a zip of the folder and will look at it this is good for me to see the kinds of problems people are having so can fix them 44 Lexical Features for Concordance Searching Nouns noun NOUN TYPE proper noun PROPER TYPE COMMON pur INFLECTION pjural noun rreg noun rregd noun r glreg noun COMMON NOUN inv noun REG PATTERNS fixedsingular r fixedplural apostr s r thing noun J rtemporal noun rcommon noun 4 revent noun rinstitution noun rreport noun COMMON
45. sefully taught For instance assuming we were teaching students how to write introduction sections to academic papers we collect a corpus of such texts and produce the key n grams for various lengths From such a corpus we can pick up frequent phrases such as this paper reports on or this paper article is organized as follows 3 KeyFeatures Key Features does what keywords does except that rather than looking at the words in the text it looks at the features assigned to segments The software thus shows which features are special to the focus corpus as compared with the reference corpus A key value of 2 0 indicates that the feature is twice as common on the focus corpus as in the reference corpus 39 4 Text Styling It is sometimes useful to view the coding of a text visually CorpusTool allows you to view one of the text files of your project specifying that particular segments on whichever layer should be showed in bold italic underline larger font or coloured See Figure 8 1 for the text style view of a file within the Finance project C Documents and Settings Administrator Desktop Finance1 KBX Source Melbourne Age Breaking News Article Bush defends tax cuts Sunday 25 May 2003 8 05 AM President George W Bush on Saturday defended his US350 billion A533 94 billion tax cut package against opposition accusations that it unfairly benefits the rich Bush who is launching int
46. separated by a space Each search token can be of the following format 28 1 2 3 Literal token a token not containing or 96 will match the token itself only Wildcard token if the query token includes an the will match any number of chars Thus ca matches cat carburettor etc ed matches weed lived etc bro en matches broken Brollerglen etc Match any a by itself matches any single token The above 3 cases should work for any language where words are divided by space characters or punctuation 4 Constraining by class a wildcard form can be followed with and then a lexical feature and the form will match only tokens which according to the system s lexicon can take that lexical class E g ca noun matches nouns starting with ca ing mental projecting matches mental projecting verbs ending with ing An asterisk cannot appear by itself it must have text either before or after it A full list of the lexical features that can be used are in Appendix ll and can be seen within the tool by selecting Show Wordclass Network from the Misc menu of CorpusTool General class matching If no token string is provided before the then the query form matches all tokens which could represent the specified class E g Qnoun matches any noun form verb matches any verb form adverb matches any adverb form
47. set of tools for the linguistic annotation of text Core concepts include e The user defines a project which is a set of files and a set of analyses which are applied to each of these files e All the files of a project are stored in a single folder the original texts the corpus the annotations on this text and the coding schemes the tags applied to the texts e Each analysis can be seen as a layer of annotation CorpusTool currently allows two types of annotation 1 Document Coding where the text as a whole is assigned features For instance these features could represent the register of the document field tenor mode or text type 2 Segment Coding The user can select segments within a file and assign features to each of these segments Segments are specified by dragging the mouse over a span of text and the user is then prompted to specify the features of this segment e Annotation can be manual the user swipes text and chooses categories for it or automatic the program does the annotation for you Sometimes annotation is mixed for instance you can have the program recognise clause or noun phase segments but it is up to the you to code them CorpusTool is available from http www wagsoft com CorpusTool See that site for instructions on how to install CorpusTool on your machine Tutorial 1 Starting a new project 1 Launch UAM CorpusTool Once UAM CorpusTool is installed o
48. ssign features to a text as a whole e g its register or text type Whole Document or whether you want to assign features to subsegments in the text e g clauses Let s assume that we are interested in the second so click on Segments within a Document Special Layer This window offers options for special kinds of annotation Error annotation layers provide a special slot on the coding interface for you to provide the correction of the error RST annotation provides a special interface for annotating the rhetorical structure of the text For now just select No Automatic Segmentation here you can specify whether the text should be segmented for you recognising paragraphs sentences or words For English texts automatic recognition of clauses and NPs is also possible For this tutorial select No After following these steps you will see a final window displaying your choices as in Figure 3 1 If any of the settings vary from yours use the Back button to go back and change it Then press Create Layer to return to the main window eoo Create Layer Assistant Final check Layer name UEM None Automatic Coding NO Scheme souroe Specia TE Create Layer Figure 3 1 Last pane of the Add Layer Assistant Figure 3 2 shows the Layers window with one layer added The Layer space provides some information about the layer There are two buttons on the Layer control panel e Edit Scheme this button w
49. sults e T Statistic T Stats are the numbers on which the level of significance of your result can be derived The bigger it is the higher the level of significance but this also depends on how much data you have In some more scientific papers you might be requested to provide T Stats but it is quite rare in linguistics e Chi Squared in recent years particularly in linguistics chi squared statistics are becoming the preferred means of testing significance CorpusTool provides the Chi Squared statistics for each comparison and the level of significance that corresponds to this At the end of each entry there will be between 0 and 3 signs These indicate how statistically significant is the difference of this features mean from that of the mean of the other set none Not significantly different Significant at the 90 level 10 chance of error 35 Significant at the 95 level 5 chance of error Significant at the 98 level 2 chance of error The level of significance is important to establish how repeatable your results are Results without significance may be accidents and if we repeat the study with other texts the result may be different If results are highly significant they are likely to be repeatable if we apply the analysis to a totally different set of texts To understand this a single means that of any 10 such results you can expect one to be a false result 90 significance or 10 chance of
50. tated five then you want the five with annotations to be clearly indicated This allows for a gradual expansion of your corpus over time but let s you get results at each point 3 Incorporating Files To incorporate a file into the project making it available for annotation you can either e Click on the Incorporate All button to incorporate all unincorporated files or e Click on the Action button next to a file and select Incorporate file from the menu This will incorporate just the single file If you do either of the above you will be presented with a window asking for some metadata regarding the file or files See Figure 2 4 This includes e Language which language the text written in This field is used to determine which language resources to use for the document These resources include lexicons for concordance searching calculation of 8 lexical density etc parsers for automatic segmentation and taggers Currently only English is really supported but soon lexical resources for other languages will be provided e Encoding text files are stored in a particular text encoding You can tell CorpusTool what encoding your file is in by selecting from this field The default option offered by UAMCT is a guess of what it should be but if the text does not display properly you may need to change it To find out what encoding the document is in try right clicking on the document and select Open with

Download Pdf Manuals

image

Related Search

Related Contents

Alpine INE-W957HD Owner's Manual  Betriebsanleitung als PDF herunterladen    

Copyright © All rights reserved.
Failed to retrieve file