UAM CorpusTool Version 2.8 User Manual, Mick O'Donnell
Contents
1. 3 Showing and Selecting Matches
Section 7
2 A Contrastive Feature Study
3 Performing a Study
4 Interpreting the Results: Feature-based Studies
5 Presenting Results as a Network
6 Saving Statistics
Section 8 The Explore Pane
Section 9 The Options Pane
Section 10 Text Styling
Section 10 The Menus
How to Import Coder Studies
Appendix II Annotating Rhetorical Structure
Appendix III Lexical Features for Concordance Searching

About the UAM CorpusTool
1 Introduction
UAM CorpusTool is a set of tools for the linguistic annotation of text. Core concepts include:
• The user defines a project, which is a set of files and a set of analyses which are applied to each of these files.
• Each analysis can be seen as a layer of annotation.
CorpusTool currently allows two types of annotation:
1. Document Coding, where the text as a whole is assigned features. For instance, these features could represent the register of the document (field, tenor, mode) or text type.
2. Segment Coding: The user can select segments within a file and assign features to each of these
2. [Figure 2.7: Info window for a file, listing lexemes per sentence, lexemes as % of text, and reference density (1p, 2p and 3p reference) as % of tokens]
For English texts, further information is available:
• Lexical density, in terms of the average number of open-class terms per sentence, or the % of open-class items in the whole text.
• Pronominal Reference Density, detailing the usage of 1st, 2nd and 3rd person pronouns as a percentage of the text as a whole.
Note: as lexicons for other languages are added, these statistics will be available for those languages.
4.3 Unincorporating a File from the Corpus
The Unincorp button removes the file from the study. WARNING: Any annotation done on that file will be deleted. The text file will be included in the unincorporated list, so you can add the file back in later, but totally unannotated.
4.4 Opening an Annotation Window
The remaining buttons on each row each correspond to an annotation layer defined in your project. Click on the button to open an annotation window for this file at the specified layer.
Button Colours: The buttons for each layer of a document are colour-coded to indicate their degree of completeness:
• White: totally uncoded
• Light Blue: partially coded
• Dark Blue: coded to a high degree
Note that these colours are indicative only.
5 Quitting CorpusTool
Note that all changes to a project are automatically saved. If you quit the P
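The density figures described above can be approximated outside the tool. What follows is only a rough sketch: the word lists are tiny and hypothetical, the function name text_stats is made up for this example, and CorpusTool's own lexicon and counting rules are not documented here, so expect its numbers to differ.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only;
# a real lexicon would be far larger.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
SECOND_PERSON = {"you", "your", "yours"}
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers", "it", "its",
                "they", "them", "their", "theirs"}
CLOSED_CLASS = FIRST_PERSON | SECOND_PERSON | THIRD_PERSON | {
    "the", "a", "an", "and", "or", "but", "of", "in", "on", "to",
    "is", "are", "was", "were", "that", "this",
}


def text_stats(text):
    """Return simple density figures for a plain-text string."""
    # Sentences are approximated by splitting on ., ! and ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens or not sentences:
        return {}
    # Treat every token outside the closed-class list as a lexeme.
    lexemes = [t for t in tokens if t not in CLOSED_CLASS]

    def percent(word_set):
        return 100.0 * sum(1 for t in tokens if t in word_set) / len(tokens)

    return {
        "lexemes_per_sentence": len(lexemes) / len(sentences),
        "lexemes_percent_of_text": 100.0 * len(lexemes) / len(tokens),
        "1p_reference": percent(FIRST_PERSON),
        "2p_reference": percent(SECOND_PERSON),
        "3p_reference": percent(THIRD_PERSON),
    }
```

Call text_stats on the contents of one of your corpus files to get a dictionary of the five figures.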
3. UAM CorpusTool Version 2.8 User Manual, May 2012
Mick O'Donnell (michael.odonnell@uam.es)
Contents
Section 1 About the UAM CorpusTool
1 Introduction
Section 2 Starting a Project
1 Starting a New Project
2 Adding a Layer
3 Adding Files to Analyse
4 Actions on Incorporated Files
5 Quitting CorpusTool
Section 3 Defining the Coding Scheme
1 Opening the Scheme Editor
3 Adding Glosses to Features
4 The Options Menu
5 Producing Images for inclusion in documents or the web
Section 4 Annotating
2 Annotating Code Document files
3 Annotating Code Segment files
4 The Other Actions Menu
Section 5 Corpus Search
2 Specifying Search Queries
3 Concordance
4 Running a Query
5 Modifying a Query
6 The Result Space
Section 6 Automating Coding
1 Introduction
2 Adding a new Autocode Rule or Editing an Existing one
4. [Network diagram of noun features (continued), partly illegible. Legible features: quality noun, group noun, countable noun, mass noun, collective noun. PERSON NAME TYPE: given name, surname, title. PLACE NAME TYPE: determiner required, determiner not required. PLACE NAME TYPE2: country name, planet name, city or state name. Other name features: medication name, religion name, organisation name.]
• Verbs
[Network diagram of verb features, partly illegible. Legible features: INFLECTION: infinitive, third person singular, past verb, participle, gerund. AUX VERB TYPE: do aux, have aux, be aux, modal verb. TYPE2: reduced aux, nonreduced aux. LEX VERB TYPE: intransitive verb, ditransitive verb, link verb, mental verb, projecting verb, verbal verb, appearance verb, relating verb. BE INFLECTION: 1ps be, 2ps be, 3ps be, pres plur be, past sing be, past plur be, participle be, gerund be; reduced be, nonreduced be. MODAL VERB TYPE: reduced modal, nonreduced modal.]
Subclasses of mental verb: mental verb MENTAL COGNITIVE cognition
5. From such a corpus we can pick up frequent phrases such as "this paper reports on" or "this paper/article is organized as follows".
9 Key Features
Key Features does what Keywords does, except that rather than looking at the words in the text, it looks at the features assigned to segments. The software thus shows which features are special to the focus corpus as compared with the reference corpus. A key value of 2.0 indicates that the feature is twice as common in the focus corpus as in the reference corpus.
Text Styling
It is sometimes useful to view the coding of a text visually. CorpusTool allows you to view one of the text files of your project, specifying that particular segments on whichever layer should be shown in bold, italic, underline, larger font or coloured. See Figure 8.1 for the text style view of a file within the Finance project.
[Figure 8.1: text style view of a file in the Finance1 project. Sample text: Source: Melbourne Age, Breaking News. Article: Bush defends tax cuts. Sunday 25 May 2003, 8:05 AM. President George W Bush on Saturday defended his US$350 billion ($A533.94 billion) tax cut package against opposition accusations that it unfairly benefits the rich. Bush, who is launching into a re-election campaign, insisted in his weekly radio address that the package, narrowly passed by Congress on Friday, would boost the ailing economy and create badly needed jobs. With t
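To make the key value described under Key Features concrete, here is a minimal sketch. It assumes the key value is simply the ratio of a feature's relative frequency in the focus corpus to its relative frequency in the reference corpus; the manual does not spell out the formula CorpusTool uses, and the function name key_value is invented for this example.

```python
def key_value(focus_count, focus_total, ref_count, ref_total):
    """Ratio of relative frequencies (an assumed definition of 'key value').

    focus_count / focus_total: how often the feature occurs among the
    segments of the focus corpus; ref_count / ref_total: the same for
    the reference corpus.
    """
    focus_rate = focus_count / focus_total
    ref_rate = ref_count / ref_total
    if ref_rate == 0:
        # Feature never occurs in the reference corpus: no finite ratio.
        return float("inf")
    return focus_rate / ref_rate
```

Under this assumption, a feature found on 40 of 100 focus segments and 20 of 100 reference segments gets a key value of 2.0, which is the "twice as common" case mentioned above.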
6. Register layer: fpn and editorial. Since these features apply to whole texts, they do contain the segments with feature participants.
5. Press Show.
4 Interpreting the Results: Feature-based Studies
Only systems which are relevant are shown. For instance, if we had specified the unit of interest as person above, then the study would involve only those segments with feature person. For this reason, the results for this system are not shown, as person would score 100% and the other features in that system 0%.
Counts and Percentages: The results for each feature are shown both as raw counts (how often that feature occurred in the dataset) and as a percent. The percent shows the proportion of segments which have this feature. Note that the percentages in a system (a given set of choices) always add up to 100%, so what is really being measured is the propensity to select this particular feature as opposed to the other features in the same system.
Statistical Significance: When a comparative study is done, it is possible to measure whether the difference between the two datasets is statistically significant: does it represent a real difference, or is it possibly due to randomness in the data? CorpusTool uses two measures of statistical significance and presents them both in the results:
• T-Statistic: T-Stats are the numbers from which the level of significance of your result can be derived. The bigger it is, the higher the lev
7. Select Import Layer from the Project Menu. You will be asked to specify the folder created in 3 above. The cd3 file will be split into the raw text, to be put into your Corpus folder, and the analyses, placed in the Analyses folder. The next window asks in which subcorpus folder to place your text file. The analysis scheme is imported as a new layer. The next window asks for the name of the layer. In Coder, the only way not to code a bit of text was to ignore it. In CorpusTool, one selects only the bits of text one wants to code. You may thus want the ignored segments in your Coder study to disappear. The next window allows you to do this.
10. Press Finalise, and you have a new Layer added and your cd3 file is imported.
If you have a set of files all annotated with the same scheme:
1. Place all the Coder files in a folder.
2. Make sure ALL the files are in cd3 format, not cd2.
3. Follow step 1 for a single file for at least ONE of the files (e.g., make sure there is a scheme file in the folder).
4. Proceed from step 4 for the single file case above.
If you have one or more files where the same text(s) have been coded with different networks (in a sense, you have done multi-layered annotation using Coder):
1. For each set of files annotated with the same scheme, create a folder and place the coder files and the scheme file for that analysis (ensure the files are in cd3 format).
2. Ensure all files whi
8. features finance, military, sport, etc. The set of systems that you define form a system network, with the features of one system forming the entry conditions for more specific systems. How you can create these networks is described below.
2.2 Creating & Modifying Systems
If you click on one of the features (lower case) or the systems (upper case) of the network, you will be presented with a popup menu of actions. These allow you to extend or modify the network.
2.2.1 Actions on Systems
• Add Feature: adds a new feature to the system.
• Rename System: allows you to change the name of the system.
• Delete System: deletes the system from the network. Note: the features which belong to the system, and any systems which depend upon them, will also be deleted. And there is no undo at present. If any codings have been assigned these features, the features will be deleted from the codings.
• Change Entry Condition: changes the entry condition of the system from one feature to another.
• Move Up: moves the system higher up in the graph, to reorganise the layout.
• Move Down: moves the system lower down in the graph, to reorganise the layout.
2.2.2 Actions on Features
• Add System: creates a dummy system under the feature.
• Rename Feature: changes the name of the feature.
• Delete Feature: deletes the feature from the system. Note: any systems which depend upon this feature will also be deleted. And there is no undo at present. If an
9. if you insert the following text as a gloss:
For more information, click <a href="http://www.wagsoft.com">here</a>
the gloss will display as "For more information, click here", and clicking on the "here" will open the appropriate webpage.
Note: Be very precise on the formation of the link. You must have all the following chars, except URL and TEXT, which you provide:
<a href="URL">TEXT</a>
Note: you can also link to html pages that are stored within your project folder.
<a href="fred.html">here</a>
would open a file called fred.html at the top level of your current project folder.
<a href="Manual/fred.html">here</a>
would open a file in the Manual subfolder of your project folder.
You can access files anywhere on your hard disk as follows:
<a href="file:///C:/Users/Fred/Manual/fred.html">here</a>
4 The Options menu
Every scheme window has an Options menu, which allows you to do the following:
• Save as: saves the scheme into a separate folder.
• Show/Hide Glosses: The glosses under features can be hidden or shown by selecting this option.
• Show/Hide System Names: The names of systems can be hidden or shown by selecting this option. With system names hidden, editing of the scheme is more difficult (you usually click on the system name to access functions such as "add feature").
• Save Diagram as PDF: saves the network as currently displayed to a PDF fil
10. matching: % at the end of a token indicates that all inflection forms of the token (which should be a root form) should be matched. Thus:
break% matches break, broken, broke, breaking, breaks
red% matches red, reds (noun), redder (adj), reddest
be% matches be, is, are, was, were, been, being
is% matches nothing (only roots can be used)
To constrain the inflection matching to a limited set of inflections, one can add noun, verb, adjective or pronoun after the %. E.g.:
red%noun matches red, reds
red%adjective matches red, redder, reddest
Note that wildcards cannot be used within % forms. Nor can the string before the % be blank.
4 Running a Query
After entering your query, you can hit the Show button. If your cursor is in a text field (e.g., Containing String), you can hit the Return key.
5 Modifying a Query
To change a feature selection, just click on the feature to change it. To delete any of your search extensions, click on the keyword ("containing", "in") and click on "remove".
6 The Result Space
The white space below the Query space displays the results. Click on a result, and the annotation file containing this segment will be opened at the right place. The three columns at the left indicate the state of each coding:
• Whether or not the segment is totally coded
• p
11. possible analyses of that file. Let us first add a layer to the project.
2 Adding a Layer
The first thing to do in a new project is to specify what analyses we want in the project. Let's start by adding just one layer.
1. Click on the Add Layer button. A Layer is a type of analysis of the text files. We can add layers for coding clauses, for coding groups, for the register of the whole text, for appraisal analysis, etc. Let's start by adding a Layer for the Register features, which belong to the document as a whole.
When you click on Add Layer, a window will pop up asking several questions. Use the Next button to move between questions.
• Layer Name: the name given to the layer. Put "Register".
• Coding Object: here you specify whether you want to assign features to a text as a whole, e.g., its register or text type (Annotate Document), or whether you want to assign features to subsegments in the text, e.g., clauses. Let's assume that we are interested in the first, and select Annotate Document.
• Coding Scheme: the coding scheme is a description of the features you want to annotate the text with. You have two options here:
i. Create New Scheme: In most cases, the user is interested in making their own coding scheme, representing the features that they themselves are interested in, organised in the way they feel they should be. CorpusTool includes an easy-to-use interface for creating and modifying
12. segments. Segments are specified by dragging the mouse over a span of text, and the user is then prompted to specify the features of this segment.
Other annotation types will be added in later versions, allowing annotation of rhetorical structure theory (RST), Generic Structure (GSP), participant chaining, sentence structuring (e.g., Subj, Pred, Mood Adjunct, etc.), annotation of spoken data, etc.
UAM CorpusTool replaces prior software of the author, Systemic Coder, which allowed coding of single documents at a single layer. CorpusTool is an attempt to overcome the various limitations that constrained users of Coder. I wish to thank the many users of Coder who forwarded their comments over the years, and to thank those sending me comments on this new tool. To import Systemic Coder studies into CorpusTool, see Appendix.
CorpusTool is available from http://www.wagsoft.com/CorpusTool/. See that site for instructions on how to install CorpusTool on your machine.
Section 2 Starting a Project
1 Starting a New Project
1.1 Open CorpusTool
Once UAM CorpusTool is installed on your machine, you can begin working with it. The first thing to do is to create a new project.
Windows:
• When installing CorpusTool, you had the option to place an icon on the desktop. Click on this icon to launch CorpusTool.
• Alternatively, there should be a UAM CorpusTool icon in the Programs menu, in the Start menu on the Windows Toolbar. Select this to launch CorpusToo
13. state of the economy now dominating voter opinion polls, commentators are divided over the effect the tax package will have. The president said: "By leaving American families with more to spend, more to save
[Figure 4.2: Code segments window, with a toolbar containing the buttons <<, <, >, >>, Ignore, Delete, Other Action, Save and Close, and a Comment field]
This display differs from that for coding a whole document in that there are more buttons in the toolbar in the middle. These buttons basically allow you to move through the segments.
3.1 Making, Moving and Selecting Segments
• Make segments by swiping text: click down at one point in the text and drag to the place you want to end the segment, then release the mouse.
• Select segment: you can select a segment by clicking on the segment line which runs under each segment. You can tell which segment the mouse is over, as the line of the segment is highlighted.
• Select next/previous segment: use the < and > buttons in the toolbar to move around between segments.
• Select next/previous incomplete segment: use the << and >> buttons in the toolbar to move to the next or previous segment which is not totally coded yet.
• Resizing Segments: Select the border of a segment by moving the cursor over the small border marker (a vertical line) until it goes red to indicate you are over it. Then click down and drag it where you want it to go.
• Delete segments: if
14. table format. See Figure 7.3. After a study is presented in table form, a new menu is presented, labelled View As. Select Network to switch to Network view. This way of displaying statistics has been copied from a similar feature in SysFan (I thank Canzhong Wu, the author of SysFan, for allowing me to use this feature).
[Figure 7.3: Network View of Statistics. A "Compare two datasets" study with aspect of interest Feature Coding; each feature in the participant network is annotated with its percentage in set 1 and set 2]
SysFan is available from http://minerva.ling.mq.edu.au/units/tools/index.htm
6 Saving Statistics
Each Statistics window offers a Save button, which allows you to save the results to file in HTML format, tab-delimited, or plain text. Results saved in HTML can be opened in MS Word and then cut/pasted into your publications. Results saved tab-delimited can be opened in MS Excel (on Windows, right-click on the txt file and specify Open with Excel). These files may also be useful for programs such as SPSS.
This section is wildly out of date and needs to be revised.
7 Keywords
The top words in any frequency list for English will be words such as
15. you create a segment erroneously, you can delete it by selecting the segment, then clicking on the Delete button in the toolbar. Alternatively, hit the Delete key.
3.2 Ignoring Segments
Click the Ignore button when a segment is selected, and this segment will not be used in statistical analyses. Ignored segments are shown in grey in the text window. The same button can be used to unignore a segment.
4 The Other Actions Menu
This menu displays some extra options, depending on the kind of annotation (whole document, segments) that you are annotating.
• Edit Scheme: Opens the scheme window for this annotation layer, so that you can edit the scheme or add/change the glosses associated with features.
• Add New Feature: Prompts you to type in the name of a new feature, which is added to the currently displayed set of choices and assigned to this segment.
• Copy Features: Copies the features so far assigned to this segment into memory.
• Paste Features: Assigns the features previously copied to this segment.
• Resegment Document: Wipes all segmentation of this layer for this document. Note: this deletes all annotation of the document at this layer.
• Show XML: Displays how the currently open file is stored on disk in XML format.
• Show Structure: Switches to an alternative display of the segmentation interface, which approximates more to the standard structural display of Functional Linguistics. See Figure 4.3. Note: to show the elements of a
16. [Figure 2.5: File Metadata Window, showing the language (english), encoding (1252) and display font (Arial 14) for a file from the FinanceNews subcorpus, with a preview of its text]
After incorporating two files, the Project Window appears as in Figure 2.6. Note that these two files appear at the top.
[Figure 2.6: The Project Window after incorporating files. Project: MyFirstProject. Layers in this project: one layer (Name: Register; Type: code document; Scheme: Register.xml). Files in this project: two FinanceNews files (Age and BBC, 25-05-03), each with an Action button and a Register button. Files in corpus but not incorporated in project: three further FinanceNews files, each with an Action button]
3.3 Other Options on Unincorporated Files
The other options available for unincorporated files are:
• Info: gives some statistics about the text file: number of words, sentences, average sentence length, etc. For English files, also a measure of lexical density and some counts regarding pronominal usage (see below).
• Delete: re
17. Layer added
You can also select Import Files from the Project menu to add a layer using files from a Systemic Coder study (cd3 files) or from RSTTool (rs3 files). See Appendix for more details.
3 Adding Files to Analyse
The next step is to add some files to the project. If, during creation of your project, you nominated text files to include in your project, they will be displayed in the File Pane of your Project Window. For now, we will assume you did not do this, so the File Pane will appear as in Figure 2.3: empty.
3.1 Extending the Corpus
To add files to your corpus:
1. Click on the Extend Corpus button: a wizard will appear to guide you through the process of adding files. You have a choice of adding a single file or adding a folder of text files.
2. If you select to add a single file, you can either add it to an existing subcorpus (a folder within your project's Corpus folder), or add it into a new subcorpus, in which case a new folder will be created with the name you supply. In either case, the file you select will be copied from where it is into the subcorpus folder. If you select to add a folder, you select a particular folder on your disks, and it is copied into your project's Corpus folder.
3. Once the file or folder is nominated, click on the Next button and then the Finalise button. The file or files will now be shown in the File Pane (see Figure 2.4). The newly added files are under the caption Files in c
18. STTool supported three types of elements as well as rhetorical relations:
o Multinuclear nodes: groupings of text segments in lists (joint, sequence, etc.); all elements have equal status.
o Schema: basically constituency structure; each element links to a parent node by a named relation (e.g., Orientation, Complication, Resolution).
o Span: just a line over a node and all of its satellites, used mainly for graphical niceness.
To insert any of these nodes over a text node, right-click on the blue dot of a node (its handle) and select one of the options from the presented list.
7 Unlinking a subtree or node: right-click on a node and select Unlink Node to unlink it from its nucleus, multinuclear complex or schema.
8 Assigning features to segments: Move between segments with the < and > buttons, and code as you do with non-RST interfaces (see Text Annotation in the menu on the left).
Lexical Features for Concordance Searching
1 Nouns
[Network diagram of noun features, partly illegible. Legible features: INFLECTION: singular noun, plural noun. common noun, proper noun. PROPER: person name, place name. REG PATTERNS: fixedsingular, fixedplural, apostr s. COMMON NOUN TYPE: thing noun, temporal noun, event noun, institution noun, report noun, substance noun, place noun, human noun, disease noun]
19.
                       Set1 Results     Set2 Results
Feature                N    Percent     N    Percent    T Stat   ChiSqu
PARTICIPANTS TYPE      N=100            N=77
  person               27   27.00%      47   61.04%     4.817    20.718
  country              20   20.00%      11   14.29%     0.989     0.983
  organisation         53   53.00%      19   24.68%     3.946    14.463
ORGANISATION TYPE      N=53             N=18
  company              48   90.57%       0    0.00%     0.000    50.323
  government            1    1.89%       9   50.00%     6.257    25.704
  union                 2    3.77%       4   22.22%     2.503     5.911
  other organisation    0    0.00%       2   11.11%     0.000     6.060
  political party       2    3.77%       3   16.67%     1.866     3.412
Significance markers: + Weak Significance (90%), ++ Medium Significance (95%), +++ High Significance (98%)
Figure 7.2: A Contrastive Stats Study
3 Performing a Study
To perform one of the studies outlined above:
1. Choose one of the options from the Type of Study menu: describe a dataset, compare two datasets, or compare multiple files.
2. From the Aspect of Interest menu, choose either Feature Coding or General Text Statistics.
3. Specify the unit that you are interested in (see section 5, part 2, Specifying Search Queries). This should be the unit in which you wish to explore differences. It could be the root feature in a network, as is the case in Figure 7.2, or a more delicate one.
4. If you selected Compare two datasets, then enter a feature in the Set 1 space and another in the Set 2 space. This should be a unit which CONTAINS the unit of interest. In this case, we specify units of the
20. ach file, you will be presented with a window asking for some metadata regarding the file. See Figure 2.5. This includes:
• Language: which language the text is written in. This field is used to determine which language resources to use for the document. These resources include lexicons (for concordance searching, calculation of lexical density, etc.), parsers (for automatic segmentation) and taggers. Currently, only English is really supported, but soon lexical resources for other languages will be provided.
• Encoding: text files are stored in a particular text encoding. You can tell CorpusTool what encoding your file is in by selecting from this field. The default option offered by CorpusTool is a guess of what it should be, but if the text does not display properly, you may need to change it. To find out what encoding the document is in, try right-clicking on the document, select Open with (or the MacOSX equivalent), and open the text with MS Word, which may help you choose the best encoding. Otherwise, using Open with, select Firefox and see which encoding it assigned, using the Character Encoding sub-menu under View.
• Display Font: Choose here the font family and size you want to use to display your text in the annotation windows. Some fonts will cope better with non-western writing systems, e.g., some fonts are designed to display Chinese, etc. However, many modern fonts should display any writing system.
File Meta
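If MS Word and Firefox are no help with the Encoding field described above, you can also test candidate encodings yourself. The sketch below is not how CorpusTool makes its guess (that is not documented here); it simply tries a short, assumed list of candidates in order and reports the first one that decodes the file without error. The function name and the candidate list are this example's own.

```python
def guess_encoding(raw_bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw_bytes cleanly.

    Order matters: latin-1 accepts every byte sequence, so it only
    makes sense as the last fallback.
    """
    for name in candidates:
        try:
            raw_bytes.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


def guess_file_encoding(path):
    """Read a file as bytes and apply guess_encoding to its contents."""
    with open(path, "rb") as handle:
        return guess_encoding(handle.read())
```

A clean decode does not prove the guess is right, so still check that accented characters look correct once the file is displayed.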
21. artial
• Whether the segment has a comment associated. Click on the segment to see the comment.
Section 6 Automating Coding
1 Introduction
The Autocode window allows you to either:
• Create segments in one layer corresponding to segments already coded in another layer. For instance, if we use the Automatic Grammar Analysis to parse all of our sentences, we can use this option to create a segment in a new layer for each modal clause in the grammar layer.
• Create segments in a layer based on string matches in the text. For instance, we can specify that a new segment should be created for any occurrence corresponding to the @noun.
• Change the features of an existing segment where it matches the search criteria. For instance, we might have a layer of segments where they all just have the feature clause. We could then use Autocode to code all clause segments containing "will" as modal clause.
2 Adding a new Autocode Rule or Editing an Existing one
Click on the Autocode button to change to the Autocode pane. Initially, there are no Autocode rules associated with your project, so click on the Add button to add a new one. A window like the following will appear:
[Autocode Rule Editor window, offering three options:
• Search for segments and assign those segments a particular feature
• Create new segments in one layer based on corresponding segments in another layer
• Create new segments in one layer based on string patterns in
22. ch are analyses of the same file have the same file name, e.g., if you have Text1-CLAUSE.cd3 analysed for clauses and Text1-GROUP.cd3 analysed for groups, rename both files to Text1.cd3. CorpusTool can only tell two files are analyses of the same text by their having the same filename.
3. Open a new project and use the Import Layer option, as described above, for one of the folders.
4. Repeat 3 for the other folders.
Some problems may arise:
• CorpusTool says it cannot read one or more of your cd3 files: the file may contain characters which are outside of ASCII text. CorpusTool should handle this, but currently cannot. Send me your files and I will import them for you.
• If you have any other problem importing cd3 files, send them to me (make a zip of the folder) and I will look at it. It is good for me to see the kinds of problems people are having, so I can fix them.
Annotating Rhetorical Structure
UAM CorpusTool supports the annotation of texts in terms of Rhetorical Structure Theory (RST; Mann & Thompson 1987). NOTE: This functionality is barely functional. It will improve in later releases.
1 Create an RST Layer
Firstly, add a new layer:
o Give it a name (e.g., RST).
o Specify segmentation as Rhetorical Structure Annotation.
o Select a mode of automatic segmentation, e.g., Paragraph should work for any language; Sentence will work for any language where a "." terminates a sentence. You may need to correct mis
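Before sending files off, you can check whether non-ASCII characters are the likely culprit in a cd3 file that will not import. The sketch below knows nothing about the cd3 format itself; it only reports bytes above 127, and its function names are invented for this example.

```python
from pathlib import Path


def non_ascii_positions(data):
    """Return (offset, byte value) pairs for every byte outside ASCII."""
    return [(offset, value) for offset, value in enumerate(data) if value > 127]


def scan_folder(folder):
    """Map each .cd3 file in folder to its list of non-ASCII bytes.

    Files that are pure ASCII are left out of the result.
    """
    report = {}
    for path in sorted(Path(folder).glob("*.cd3")):
        hits = non_ascii_positions(path.read_bytes())
        if hits:
            report[path.name] = hits
    return report
```

An empty result from scan_folder suggests the import problem lies elsewhere.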
23. e of your choice.
• Save Diagram as SVG: saves the presented network in SVG (Scalable Vector Graphics) format. See below for more on SVG.
• Copy to Clipboard (Windows only): copies the graph as displayed to the clipboard. You can then paste into MS Word or another program. Note: for reasons only Microsoft understands, MS Word needs to be open when you copy to clip, or else you cannot paste into MS Word.
5 Producing Images for inclusion in documents or the web
The easiest way to copy your scheme into a document is via the Copy to Clipboard option above, and paste into a Word document. However, this only works under Windows.
While the SVG format is not that widely supported, it is a great format for converting to other formats, since it stores the image geometrically rather than as a bitmap. To produce other formats from your SVG file, download and install InkScape. This software is free, and works on Windows, Macintosh and Linux. Download from http://www.inkscape.org.
Open InkScape and select Open from the File menu. Select your svg file. You can edit the file here if you wish. To save in another format, there are two options:
1. Select Save as to save as PDF, EPS, EMF or other vector-based file formats.
2. Select Export Bitmap to save as PNG format, which is a bitmap format which can be included in web pages or Word documents.
The diagram in Figure 3.6 is a PNG file produced via InkScape.
person countr
24. e search query are shown with a check box next to each. We can uncheck any item which is a false match (e.g., not truly a passive). Clicking on Code Selected will then either create segments corresponding to each selected match, or re-code them, if that is the option selected. In this way, we can quickly code many of the more common grammatical patterns.
You can also use Autocode to quickly assign features at document layer, for instance, recoding documents as spoken if they are in the SPOKEN subcorpus of the project.
1. Display: All / Agreements / Conflicting / Nonconflicting: selecting from this list allows you to filter out some of the matches.
• All: shows all of the matches.
• Agreements: shows all segments already coded with the specified feature.
• Conflicting: shows those segments which are already coded with a feature which conflicts with the feature you are autocoding. For instance, if autocoding as passive, this would show all segments already coded as active.
• Nonconflicting: shows all segments which are neither agreements nor conflicting.
2. Select: All / None: selecting one of these options will select/deselect the check boxes next to each segment.
3. Code Selected: Clicking on this button will automatically code all displayed segments which are selected.
Hints: For some grammatical phenomena, you can provide a pair of rules like:
• Select passive if contains be a
25. el of significance, but this also depends on how much data you have. In some more scientific papers you might be requested to provide T-Stats, but it is quite rare in linguistics.
Chi-Squared: in recent years, particularly in linguistics, chi-squared statistics are becoming the preferred means of testing significance. CorpusTool provides the Chi-Squared statistic for each comparison, and the level of significance that corresponds to it.
At the end of each entry, there will be between 0 and 3 signs. These indicate how statistically significant the difference is between this feature's mean and the mean of the other set:
• no sign: Not significantly different
• one sign: Significant at the 90% level (10% chance of error)
• two signs: Significant at the 95% level (5% chance of error)
• three signs: Significant at the 98% level (2% chance of error)
The level of significance is important to establish how repeatable your results are. Results without significance may be accidents, and if we repeat the study with other texts, the result may be different. If results are highly significant, they are likely to be repeatable if we apply the analysis to a totally different set of texts. To understand this: a single sign means that, of any 10 such results, you can expect one to be a false result (90% significance, or 10% chance of error).

5 Presenting Results as a Network
When performing a feature based study, you can now view the results in a system network instead of in
26. enant, withdrawal, missile, bomber, invasion, combat, rounds, missions, temporary, street, gay, cent, body, analysts, incident, growth

8 Phrases
Rather than looking at single words, n-gram analysis looks for sequences of words which are common in the corpus. For instance, a list of the frequent 3-grams (sequences of 3 words) that occur in a small corpus of introductions to academic papers is shown below:
in terms of
ad hoc networks
a set of
we believe that
in this paper
of this paper
the performance of
terms of a
of the two
in section 4
be able to
some of the
a number of
in order to
the design of
large number of
which can be
that can be
the problem of
ad hoc network
According to Biber (e.g., Biber and Barbieri 2007), as the corpus grows to a reasonable size (millions of words), the kinds of phrases that rise to the top don't contain lexical content as such (e.g., 'ad hoc networks'). Rather, they are phrases which are used to frame such meanings. We see here 'in terms of', 'a set of', etc.
While keywords tell us which words we should teach in a text, n-grams can tell us which phrasings are usefully taught. For instance, assuming we were teaching students how to write introduction sections to academic papers, we collect a corpus of such texts and produce the key n-grams for various lengths
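The n-gram counting described above can be illustrated outside the tool. The following is a minimal sketch, not CorpusTool's own implementation: the tokeniser (lowercased runs of letters and digits) and the function names `tokenize` and `count_ngrams` are assumptions made for illustration, and the two sample sentences are invented.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a word is a run of letters/digits, optionally followed by
    an apostrophe suffix (e.g. "don't"). CorpusTool may tokenise differently.
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def count_ngrams(texts, n):
    """Count every sequence of n consecutive tokens, one text at a time,
    so that n-grams never span two documents."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Invented mini-corpus of paper introductions
corpus = [
    "In this paper we describe the design of ad hoc networks.",
    "In this paper we evaluate the performance of a set of protocols.",
]

trigram_counts = count_ngrams(corpus, 3)
for ngram, freq in trigram_counts.most_common(3):
    print(" ".join(ngram), freq)
```

Counting per text rather than over one concatenated string is a design choice: it avoids spurious n-grams formed from the end of one document and the start of the next.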
27. es by default to all documents.
Current Choice: the middle box is a choice which needs to be made for this document. Double-click on one of the options. That choice will be moved to the Selected Features box. If there are more choices in the coding scheme, the next choice will then be displayed.
Gloss Box: if you introduced a gloss for a feature in the scheme (see Section 3.3 above), then if you single-click on a feature in the Current Choice box, the gloss will be displayed in this space. This is useful when you have forgotten exactly what the coding criteria for this feature are.
2 The Comment Frame: in this box you can type comments about the current segment, either to remind yourself of some problem or to communicate with other people working with the same project. For instance, one might write: 'Is this a material or behavioural clause? Check with IFG.'
In summary, to code a whole document:
1. Select from the options shown in the Current Choice box until no options remain.
2. If you make a mistake, double-click on features in the Selected Features box to undo the selection.
3. Close the window and your codings will be saved.

3 Annotating Code Segment files
When annotating a document at a layer specified as Code Segments, the process is slightly more complex. Firstly, for the sake of this tutorial, let's add a new layer to our study:
1. Bring the Project Window to the front.
2. Click on the Add Layer button o
28. first feature which contain segments at another layer tagged with the second feature. For instance, one might search for 'finite clause containing person & subject' to find all finite clauses where the segment boundaries totally include a segment at the participant layer which is coded both person and subject.
• containing string: this will allow you to find all segments with the nominated feature which contain a given string. Matching is not case sensitive. NOTE: this feature is also used for concordance searching (searching based on lexical features, wildcard matching, etc.). See below for more details.
• in segment: this allows search across layers, specifying that segments should match only if they are contained within segments at the second specified layer. For instance, one might search for 'person in editorial' to find segments tagged as person in editorials.
Immediate containment
NOTE: for search queries including 'containing segment', 'containing string' or 'in segment', you can choose between 'immediately' and 'anywhere'. The difference is as follows:
• anywhere: if the containing segment contains the specified segment or string, it will match.
• immediately: sometimes users allow units embedded within others at the same layer; for instance, clauses can be embedded within other clauses. If you specify 'immediately', then if the contained segment or string falls within such an embedded unit, it will not match the un
29. [Figure 8.1: The Text Styler Window]

2 Opening the Text Styler
From the Project window (the main window), click on the filename of one of your files. Note: this only works for files incorporated in your project. Also, your project needs to have at least one layer defined.

3 Styling the Text
You can assign colour and/or font effects (bold, italic, underline) to all text tagged with a given feature or feature combination. This allows the patterns of selection throughout the text to be visible. E.g., use bold, italic and underline for appraisal categories and colour coding for clause type, to see how appraisal is distributed in respect to clause types.

4 Saving Styled Text
You can save styled text to an HTML file. To include styled text in an MS Word document, open the HTML file in MS Word, and from there cut and paste into your own document.

1 Merge Projects
Up until Version 2.8.4, this function did
30. howing results for each document individually.
1. Describe a dataset: offers descriptions of your corpus or a specified subcorpus.
2. Compare two datasets: provides a comparison of two subsets of your corpus (e.g., english vs. spanish). When Feature is selected, the two sets are contrasted in terms of the occurrence of the features in the codings at the layer specified. Levels of significance of the differences between the sets are displayed, both in terms of Student's T-test and Chi-Squared (see below).
3. Compare Multiple Files: provides details of each file in your corpus, one column per file.

2 A Contrastive Feature Study
Figure 7.2 shows a sample comparative study done using the Finance project. Note that very little of the text has been annotated, so the results are for small numbers only. We would need to tag a thousand or more participants from a range of editorials and fpn (front page news) articles before we could start to trust the results. This preliminary study shows two significant results: more reference to people rather than organisations (at a 98% level of significance), and significant differences in the types of organisations discussed, but the numbers are too low to trust.
[Figure 7.2: Sample comparative study (Compare two datasets, Feature Coding, unit: participant) contrasting fpn and editorial]
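To get a feel for what a chi-squared significance level means when two datasets are compared, the calculation can be approximated by hand. The sketch below is an illustration only: it assumes a plain Pearson chi-squared test on a 2x2 table of counts (segments with and without a feature in each set) with no continuity correction, which may differ from what CorpusTool computes internally. The function names and the example counts are hypothetical; the thresholds are the standard chi-squared critical values for one degree of freedom at the 90%, 95% and 98% levels.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 contingency table.

    a, b: segments in set 1 with / without the feature
    c, d: segments in set 2 with / without the feature
    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: nothing to compare
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Critical values of chi-squared with 1 degree of freedom,
# paired with the number of significance signs they earn.
CRITICAL_VALUES = [
    (5.412, 3),  # 98% level (2% chance of error)
    (3.841, 2),  # 95% level (5% chance of error)
    (2.706, 1),  # 90% level (10% chance of error)
]


def significance_signs(chi2):
    """Map a chi-squared value to a count of significance signs (0 to 3)."""
    for critical, signs in CRITICAL_VALUES:
        if chi2 >= critical:
            return signs
    return 0


# Hypothetical counts: 30 of 100 participants are 'person' in one set,
# 15 of 100 in the other.
chi2 = chi_squared_2x2(30, 70, 15, 85)
print(round(chi2, 2), significance_signs(chi2))
```

With very small counts, as in the preliminary study above, a chi-squared approximation of this kind becomes unreliable, which is one more reason to treat such results with caution.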
31. ick on a layer button where the layer was specified as Code document, then a window like that in Figure 4.1 will appear.
[Figure 4.1: The Code document window, showing the Register analysis for FinanceNews Age25 05 03 txt]
The code document window has 4 parts:
1. The Text Frame: shows the text file. You can scroll to see the whole text.
2. The ToolBar: gives various actions such as Save, Close and Help (see below).
3. The Coding Frame: contains three boxes:
a) Selected Features (labelled Assigned): the features already assigned to the text. Initially, this will contain one feature, the leftmost (root) feature of the coding scheme for this layer. As other features are assigned, they will appear here. You can delete features by double-clicking on the features in the Selected Features box. The root feature cannot be deleted, since it appli
32. [Figure 5.1: The Corpus Search Window]

2 Specifying Search Queries
At the top of the window is a menu-driven widget to define your search query. For this tutorial, we will use a small project called Finance, which can be downloaded from the CorpusTool website on the Download page.
1 Simple Feature Search
To search for all segments containing a given feature, click on the widget at the top left (in Figure 5.1, 'person') and select a feature from one of your layers. Then press Show to see all instances. Press Save to save the search results to a file.
More Complex Searching
Click on the small symbol next to the feature selector to extend your query:
• and: allows you to add another feature, and the search will return all segments containing both the nominated features.
• or: allows you to add another feature, and the search will return all segments containing either of the nominated features. NOTE: 'and' and 'or' cannot be mixed.
• and not: allows you to add another feature which should be excluded, and the search will return all segments containing the first feature but not the second feature.
• containing segment: this allows search across layers; it returns all units tagged with the
33. its in which the unit is embedded. For instance, with 'They left because she was tired', a search for 'clause containing immediately was' would only match the inner clause.

3 Combining Complex Searches
One can combine complex searches, e.g., 'person containing immediately bush in finite clause in editorial & english'.

3 Concordance Searching
CorpusTool lets you search for lexical patterns (English only currently, for most features).
3.1 Specifying the Search Query
If you specify 'containing string' (see above), you can specify a lexical pattern instead of a simple string. For example, to find passive clauses, 'be% @participle' will match all segments containing any form of 'be' followed by a participle verb (-en verb).
Note that the corpus is NOT tagged in terms of part of speech (POS). Rather, CorpusTool includes a large dictionary of English and looks up each word in the dictionary. Because of this, a word will match all POS classes to which it belongs. For instance, 'be%' will match all occurrences of 'being', even in contexts where the word is not a verb (e.g., 'the being').
Matching occurs as follows:
Case Insensitive: all searching is case insensitive. Thus 'Birch' will match 'Birch', 'birch' and 'BIRCH'.
The search string consists of a sequence of search tokens separated by a space. Each search token can be of the following formats:
1. Literal token: a token not containing o
34. l Macintosh: The installation of CorpusTool placed the application in your Applications folder. Double-click on the application to launch it.
• You might find it useful to place the application in the Dock for easy access.
If you have already created a project, you can open it simply by double-clicking the cptr file in the Project folder. This file has its own icon on MacOSX and Windows.

The Opening Window
A window should appear as in Figure 2.1. This window provides, amongst other information, the version number you are using (useful if you need to communicate bugs). The window offers several options: Start New Project, or Open Project to continue with a project you have already started. If you have opened a project previously on this machine, there will also be a button to open the last project opened.
[Figure 2.1: The Opening Window, showing Version 2.0 beta 5, Copyright Mick O'Donnell 2007, Web: http://www.wagsoft.com/CorpusTool, with Start New and Open Project buttons]

1.2 Click on the Start New Project button
After clicking this button, a Create Project Wizard will appear, which will lead you through the steps needed to create your project:
1. Provide a name for the new project.
2. Specify the folder where your new project's folder is to be stored. For instance, choose the Desktop folder on your machine. When
35. moves the file from the corpus. Also deletes the file from the project's Corpus folder.
Filename: clicking on the filename will present the text in full.

4 Actions on Incorporated Files
Once a file is incorporated, it offers one button for each defined layer. In the sample project, we have so far defined only Register, so the incorporated files only have a button for this layer. As other layers are added, buttons for those layers will appear also.

4.1 Changing File Metadata (Text annotation only)
We saw above that when a file is incorporated, you are prompted to specify its language, encoding and display font. You can change these choices at any time by selecting Change File Metadata from the Action menu associated with each file.

4.2 Viewing General Statistics for a file (Text annotation only)
To see general statistics about each text file, select View Basic Text Stats from the Action menu on each row. This will provide some basic statistics about the text which do NOT depend on any annotations of the file (see Figure 2.7). This includes:
• number of words in text
• average word length
• number of sentences in text (should work for European languages)
• average sentence length in words (again, for European languages)

File Info: File: Crime Anova15 07 03 txt
Length: Words in text: 179; Sentences in text: 9
Text Complexity: Av. Word Length: 5.30; Av. Sentence Length: 19.8
Lexical Density
36. n the right of the screen.
3. Call the layer Participant.
4. Select Annotate Segments.
5. Select Do not automatically segment.
6. Select Create New Scheme.
7. Press the Finalise button.
Note that this adds a new Layer in the layer space, and also adds a new button for each incorporated file. Now let's define the scheme for this layer:
1. Click on the Edit button in the space for the Participant Layer.
2. When the scheme window opens, change participant 1 to human and participant 2 to organisation.
3. Click on PARTICIPANT TYPE and select the option add feature. Type in other participant.
Your network should then show participant with the features human, organisation and other participant.
Now close this window, returning to the Project Window. Click on the Participant button for one of your text files. This will open an annotation window for the document at this layer. See Figure 4.2.
[Figure 4.2: Participant analysis window for FinanceNews Age25 05 03 txt]
37. ndition
You can also introduce complex entry conditions into your network. A complex entry condition involves a conjunction (and) or disjunction (or) of features. Set the number of terms to 2 or higher. Then select the features you wish as the input.
Note that at present you cannot mix AND and OR in the same entry condition. You can simulate this by first making a gate: a system with only one feature. For instance, to construct the entry condition 'A and (B or C)', first make a system (with entry condition 'B or C') which has one feature; call the feature 'b or c'. Then use this feature and the A feature in an AND entry condition for your original system. The entry condition for this system will now be 'A and (B or C)'.

2.2.4 Moving a feature to another system
If you want to move a feature from one system to another, click on the system you wish to add the feature to, and select Add Feature. Then type in the name of the feature you wish to move. The feature will be moved to this system. All codings will be adjusted for the change.

3 Adding Glosses to features
You can add a description of each of the coding features. Click on the feature and select Add Gloss. This will open a window where you can type a description of the feature (the criteria under which it could be selected). This gloss will then be visible while annotating a document (see Coding section below).
Hyperlinks in Glosses: As of version 2.7.1, you can put hyperlinks in your glosses. For instance
38. not work correctly.
Before Merging: to ensure a cleaner merger, delete any layers in either project which have no annotation associated with them.
In all cases where the same files exist in both projects (e.g., schemes, corpus files and annotation files), the file in the current project will be preserved. Files will be moved from the other project where they do not exist in the current project.
• Select Merge projects from the Project menu. You will be asked to select another project to merge with the current one. Results will be saved in a new folder with the same name as the current project plus 'merged' added.

Appendix I: Importing Systemic Coder Studies
1 How to Import Coder Studies
The analysis files in Systemic Coder can be imported into CorpusTool. To do so, follow these instructions.
If you have a single file to import:
1. Ensure that the coding scheme is saved as an external file (master scheme). To do this, open the file in Coder and select Scheme Storing from the Options menu. Select Save to Master and specify a location to save the scheme.
2. Ensure the codings are saved as cd3, not cd2: if the file on disk has a cd2 extension, you need to open the file and select Save Codings As from the File menu. The program will offer to save it as a cd3 file.
3. Now make a new folder and place within it the scheme file and the codings file.
4. Open CorpusTool and create a new project
39. on from Systemic Functional Linguistics. The hierarchy is called a system network. It consists of a number of inter-dependent choice points, called systems. The system shown in Figure 3.2 is one such. Figure 3.4 presents 6 systems organised into a network.
A system consists of three parts:
• System name: the name of the choice. Typical names in grammar may be MOOD, POLARITY, FINITENESS, etc. System names should consist of a sequence of letters and perhaps numbers and hyphens, but no spaces. The system name needs to be unique: CorpusTool will not allow you to provide the same name for two systems. The program will automatically display systems in upper case.
• Features: the alternatives in the choice. In the above example, editorial and fpn are the features of the system. These are displayed in lower case, regardless of how you type them in. The same name cannot be used in two systems.
• Entry condition: each system has an entry condition, the feature (or complex of features) which forms the context in which the choice becomes relevant. In the above case, the entry condition for the system is register, which happens to be the root feature of the system network.
Several systems can have the same entry condition, in which case the systems are called simultaneous systems. They form a cross-classification of the entry condition. For instance, we might introduce another system with register as entry condition, which might have
40. orpus but not incorporated in project.
CorpusTool makes a distinction between incorporated files (which have buttons to annotate at all available levels) and unincorporated files (which are in the corpus but not yet opened for annotation). This distinction is made to make it easy to keep track of those files which you have started editing, distinct from those you may wish to add later. If you have 100 files in the corpus but have only annotated five, then you want the five with annotations to be clearly indicated. This allows for a gradual expansion of your corpus over time, but lets you get results at each point.
[Figure 2.4: The Project Window after extending the corpus]

3.2 Incorporating Files
To incorporate a file into the project (making it available for annotation), click on the Incorporate button next to that file.
Defining Language, Encoding and Display Font: as you incorporate e
41. participle
• Select active if: clause and not passive
Use the first rule to code passives, and then use the second rule to code everything else as active.
Provide one rule, such as the passive rule above. Code these. Then edit the rule, inserting a match-any token between the search terms, so that the passive rule looks for 'be', then any single token, then a participle. This will find some instances where 'not' or an adverb falls between the verbs.

Corpus Statistics
1 Introduction
The Corpus Statistics pane allows various statistics to be derived from your tagged corpus. Press the Statistics tab on the main window's toolbar to see the Statistics pane, as in Figure 7.1.
[Figure 7.1: The Statistics Pane]
You can use this interface to perform two kinds of studies on your corpus:
1. General Text Statistics: offers general statistics of the corpus, such as total number of segments, number of words per segment, lexical density in the corpus, pronominal usage, etc.
2. Feature usage: you specify a feature in a layer (most typically the root feature of the layer) and the program describes the usage of features in the corpus at that layer (counts, mean and standard deviation).
These studies can be done for a single dataset (descriptive statistics), two datasets (comparative statistics), or s
42. [Lexical feature networks; the legible feature names are listed by category]
• Pronoun: pronoun, reflexive pronoun, genitive pronoun, 1p pronoun, 2p pronoun, 3p pronoun, personal pronoun, wh personal pronoun, human pronoun, neuter pronoun, singular pronoun, plural pronoun, spatial pronoun, temporal pronoun, one pronoun, nonpersonal pronoun, that pronoun, wh pronoun, nonwh pronoun
• Number: number, cardinal, ordinal, percentage
• Conjunction: conjunction, and conjunction, or conjunction, but conjunction, coordinating conjunction, pre or infix conjunction, infix only conjunction, subordinating conjunction, gerund conjoiner
• Prepositions: preposition, agent preposition, to preposition, of preposition, as preposition, by preposition, other preposition
• Determiners: determiner, strict determiner, nonstrict determiner, positive strict determiner, negative strict determiner, wh strict determiner, quantifying determiner, demonstrative determiner, singular dete
43. r will match the token itself only.
2. Wildcard token: if the query token includes an asterisk (*), the * will match any number of characters. Thus 'Ca*' matches cat, carburettor, etc.; '*ed' matches weed, lived, etc.; 'bro*en' matches broken, Brollerglen, etc.
3. Match any: a match-any symbol by itself matches any single token.
The above 3 cases should work for any language where words are divided by space characters or punctuation.
4. Constraining by class: a wildcard form can be followed with @ and then a lexical feature, and the form will match only tokens which, according to the system's lexicon, can take that lexical class. E.g., 'ca*@noun' matches nouns starting with 'ca'; '*ing@mental-projecting' matches mental projecting verbs ending with 'ing'. An asterisk cannot appear by itself: it must have text either before or after it. A full list of the lexical features that can be used is given in Appendix III, and can be seen within the tool by selecting Show Wordclass Network from the Misc menu of CorpusTool.
5. General class matching: if no token string is provided before the @, then the query form matches all tokens which could represent the specified class. E.g.:
• '@noun' matches any noun form
• '@verb' matches any verb form
• '@adverb' matches any adverb form
• '@mental-projecting' matches any verb which is classified as mental
• '@human-noun' matches any noun classified as a human noun
6. Inflection
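The literal and wildcard token forms described above can be imitated with regular expressions. This is a minimal sketch under stated assumptions, not CorpusTool's matching code: the helper names `token_pattern` and `matches` are hypothetical, the asterisk is taken to match zero or more characters, and class constraints (the part after @) are not modelled because they depend on the tool's lexicon.

```python
import re


def token_pattern(query_token):
    """Translate one query token into a compiled regular expression.

    Each literal stretch is escaped, and each asterisk becomes '.*'
    (assumption: 'any number of characters' includes zero).
    Matching is case insensitive, as stated in the manual.
    """
    parts = [re.escape(part) for part in query_token.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)


def matches(query_token, word):
    """True if the whole word matches the query token."""
    return token_pattern(query_token).fullmatch(word) is not None


# The manual's own examples
for query, word in [("Ca*", "carburettor"), ("*ed", "lived"),
                    ("bro*en", "Brollerglen"), ("Birch", "BIRCH"),
                    ("Birch", "birches")]:
    print(query, word, matches(query, word))
```

Using `fullmatch` rather than `search` matters here: a literal token should match the token itself only, so 'Birch' must not match 'birches'.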
44. rminer, plural determiner
• Punctuation: period, exclamation mark, question mark, sentence final punctuation, comma, semicolon, hyphen, colon, conjunctive punctuation, open bracket, close bracket, bracketing punctuation, genitive s, single quote, open quote, close quote, double quote
45. roject Management Window (using the X in the top right corner), you quit CorpusTool (all changes saved).

6 Continuing a Project
Once your project is created, the easiest way to open CorpusTool to work on your project is:
1. Open your project folder on the desktop.
2. Double-click on the cptr file, which has a blue globe icon. CorpusTool will open directly with your Project Window.
UNDO: No undo is currently supported. It will be supported in a later version.

Defining the Coding Scheme
1 Opening the Scheme Editor
Before annotating files for a given layer, you need to define the annotation scheme for the layer. The first step here is to open the scheme editor. Click on the Edit button within the Layer Toolbar (see Figure 3.1).
[Figure 3.1: The Scheme Edit button]
[Figure 3.2: The Register Scheme before editing]
A window like Figure 3.2 will pop up. It shows a small system network: a hierarchy of features, with register as the most basic concept, and a choice between register 1 and register 2.

2 Editing the Scheme
These features have been automatically generated, and we will change them to more informative features. Click on register 1 and a menu will appear with options as in Figure 3.3. Choose Rename Feature Delete Feature dd Glo
46. segment: the element itself must be a segment (in this example, the clause as a whole was also a segment).
[Figure 4.3: Function Structure Display Mode]

Corpus Search
1 Introduction
The Search Interface is opened by clicking on the Corpus Search button on the Project Window. Figure 5.1 shows this window.
NOTE: You can also open the Search Window from:
• a Scheme window: click on a feature and select Show Examples. CorpusTool will open the Search window with all segments marked with that feature displayed.
• Descriptive or Comparative Feature Statistics: click on the count field of any set, and the instances which make up the count will be displayed.
47. ss dd Realisations Move Up Move Down Graph from here Graph from parent Show examples
Figure 3.3: Options for Features
These options will be explained more fully below. For now, we want to change register 1 to something more plausible. Let's assume that all of our texts are news articles, and that they are either front page news or editorials. We thus want to change register 1 to fpn and register 2 to editorial. The important option is Rename Feature. Click on this option. A window will appear asking you to provide the new name for this feature. Type fpn and then press Return. Repeat the same process with register 2, and rename it as editorial.
Notice also that the choice between fpn and editorial has a name automatically provided as REGISTER TYPE. Let's rename this as ARTICLE TYPE. Click on REGISTER TYPE and, when the menu comes up, select Rename System.
Coding schemes can get quite complex. The scheme in Figure 3.4 is marginally more complex, and these schemes can grow to contain hundreds of choices. However, for now, the smaller the scheme, the quicker the coding.
[Figure 3.4: A more complex scheme]

2.1 System networks
CorpusTool uses the hierarchy representati
48. takes it makes
o The RST coding scheme will be selected automatically for you.
2. Load in texts. You can either:
o Load in new texts: see section 2 on Extending the Corpus.
o Import texts already annotated with RSTTool: you can load in texts already annotated with RSTTool (they must be saved in rs3 format). Under the File menu, select Import Files, select RSTTool, and specify either Single File or Folder of Files. Then select the file or folder you want imported. On the next pane, select the layer you created in the last step. On the next pane, specify where you want your file(s) stored within your project folder (a subcorpus name; all texts are assigned to a sub-corpus).
3. Annotating a text: to open a text for annotation, click on the button next to a file with the name of your RST layer. A window should open showing your text.
4. Segmenting the text: click at the point in the text where you want to segment. If you get it wrong, currently the only way to correct segmentation is to click on the Actions menu in the Toolbar and select Show Text, which switches to the usual CorpusTool interface, where you can change segment boundaries. This will be fixed in later versions.
5. Structuring the text: each segment in the text tree has a blue dot. Drag the blue dot of one node to another blue dot, and you will be prompted for the relation type between them (dragging from the satellite to the nucleus).
6. Inserting other types of elements: R
49. the of and a
A more informative listing works out how important each word is for a particular corpus when compared with a more general corpus. For instance, the keywords from a corpus split over three fields are shown below. The words are ordered in terms of their specialness for this corpus (relative frequency in this corpus when compared to the relative frequency in the general corpus). A value of 100 indicates the word appears 100 times more in this corpus than in other corpora.
NOTE: for this to work, one needs to select only a sub-corpus. If you select the whole corpus, then nothing will happen.
[Table: keywords and their scores for each of the three fields. Legible keywords: troops, economy, crime, weapons, companies, detective, engine, stock, police, mountains, tax, disappearance, smoke, criminal, profits, court, enemy, investment, justice, aircraft, driver, force, returns, boy, sales, victims, civilian, earnings, family, investors, car, package, lived, assets, officers, prices, legal, children, corporate, kids, stocks, mercy, markets, investigators, budget, woman, finance, murder, volatility, boys, reforms, age, commercial, victim, guys, military, squadron, suicide, tanks, soldier, jungle, altitude, strikes, trees, lieut
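The keyword score described above (relative frequency in the selected sub-corpus compared with relative frequency in a reference corpus) can be sketched as follows. This is an illustration under assumptions, not the tool's own calculation: the function name `keyword_scores` is hypothetical, the reference corpus is taken to be whatever word counts you pass in, and a word absent from the reference is treated as if it had occurred half a time (`unseen_count`), simply to avoid dividing by zero.

```python
from collections import Counter


def keyword_scores(sub_counts, ref_counts, unseen_count=0.5):
    """Score each word by (relative frequency in sub-corpus) /
    (relative frequency in reference corpus).

    sub_counts, ref_counts: mappings from word to raw count.
    unseen_count: stand-in count for words missing from the reference
    (an assumption of this sketch, not documented tool behaviour).
    """
    sub_total = sum(sub_counts.values())
    ref_total = sum(ref_counts.values())
    scores = {}
    for word, count in sub_counts.items():
        sub_rel = count / sub_total
        ref_rel = (ref_counts.get(word, 0) or unseen_count) / ref_total
        scores[word] = sub_rel / ref_rel
    return scores


# Invented counts: 'tax' is 10 of 100 words in the sub-corpus,
# but only 1 of 1000 words in the reference corpus.
sub = Counter({"tax": 10, "the": 90})
ref = Counter({"tax": 1, "the": 999})

scores = keyword_scores(sub, ref)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 2))
```

In this toy example 'tax' scores about 100, matching the reading given above: the word is 100 times more frequent, relative to corpus size, in the sub-corpus than in the reference.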
50.
• Choose amongst the 3 options to either create new segments or recode existing ones.
• Then use the widgets at the bottom to specify your search criteria.
Some examples follow.
[Screenshots: three Autocode Rule Editor dialogs. Each offers the same three options: "Search for segments and assign those segments a particular feature", "Create new segments in one layer based on corresponding segments in another layer", and "Create new segments in one layer based on string patterns in the text". The rule fields read, in order: "clause containing immediately be passive passive clause", "clause containing immediately will modal clause", and "modal verb modal auxiliary".]
Note: as with Search, lexical-based search patterns currently work only for English.
3 Showing and Selecting Matches
Once you have added or edited an Autocode rule, press the Show button and all instances matching th
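The third kind of rule, creating new segments from string patterns in the text, can be approximated with a short sketch. It is not how CorpusTool evaluates rules: the manual's example relies on a lexical feature (modal verb), whereas this sketch swaps in a plain word list. The `Segment` class, the `MODALS` list, and the `autocode_by_pattern` function are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    """A coded stretch of text: character offsets plus one feature."""
    start: int
    end: int
    feature: str


# Assumption: a hand-written word list standing in for a lexicon lookup.
MODALS = ("will", "would", "can", "could", "may", "might", "must", "shall", "should")


def autocode_by_pattern(text: str, words: Sequence[str], feature: str) -> List[Segment]:
    """Create one segment per whole-word match and tag it with the feature."""
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [Segment(m.start(), m.end(), feature) for m in pattern.finditer(text)]


# Usage: tag every modal in a short text as "modal-auxiliary".
sample = "She will go. He can stay."
segments = autocode_by_pattern(sample, MODALS, "modal-auxiliary")
matched = [sample[s.start:s.end] for s in segments]
```

Storing offsets rather than the matched strings keeps each segment tied to its position in the text, so two occurrences of the same word remain distinct segments.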
51. these schemes (see section 3).
ii. Copy Existing Scheme: In some cases you might reuse a coding scheme that you developed before, or which was produced by someone else. CorpusTool ships with a few predefined schemes which you could use. One of these is Peter White's Appraisal network. Another is based on Granger's error annotation scheme.
For this tutorial, select Create new scheme. Then click on the Finalise button and your new layer will be added to the Project Window. Figure 2.3 shows the Project window with one layer added. The Layer space provides some information about the layer: its name (Register), its type (code document), and the name of the scheme associated with the layer (Register.xml).
There are two buttons on the Layer control panel:
• Delete: this will delete the layer and all analyses of text files performed on this layer. Press this only before you begin coding of the layer, or if you really want to delete the layer.
• Edit: this button will open a window to allow you to edit the coding scheme. We will come back to this in the next section.
Figure 2.3: The Project Window with one
52. [Figure: system networks of lexical features. The layout did not survive extraction; the legible feature names are listed below.]
Verbs: reaction verb, please type verb, cognitive action verb, cognitive state verb, verbal verb, addressee oriented verb, not addressee oriented verb, telling verb, asking verb, proposing verb, stating verb, saying verb, saying required, saying not required.
• Adjectives: absolute adjective, comparative adjective, superlative adjective, comparable adjective, er est adjective, noninflecting adjective, noncomparable adjective, nationality adjective, other adjective.
• Adverbs: manner adverb, connective adverb, descriptive adverb, more less adverb, intensifier, degree intensifier, interrogative adverb, ago adverb, modal adverb, location adverb, other descriptive adverb.
• Pronouns: nominative pronoun, accusative pronoun, genitive2
53. [Figure: fragment of a system network with the systems PARTICIPANTS and ORGANISATION TYPE and the features company, government, union, other organisation, political party.]
Figure 3.6: PNG file output
Annotating Files
1 Annotation Types
CorpusTool currently supports 4 types of annotation:
1. Code document: the document as a whole is assigned features. Useful for defining document language, text type, register, etc. It can also be used to code features of the writer, e.g. language proficiency.
2. Code segments: the user defines segments in the document and assigns features to each segment, for instance clauses, NPs, words, speaker turns, etc.
3. Automatic Grammar Analysis: the text is automatically analysed by a parser (the Stanford parser) and then incorporated as a layer of analysis, showing clauses and constituents (Subject, Object, etc.). This currently works only for English, and only on well-formed language; it does not work well on dialogue or Twitter text.
4. Rhetorical Structure Annotation: the user breaks the text down into segments and connects these together into a rhetorical structure tree, as used in Rhetorical Structure Theory (Mann and Thompson 1986).
Below we will explain how to annotate in the first two ways. Automatic Grammar Analysis and Rhetorical Structure Annotation will be explained in their own sections.
2 Annotating Code Document files
Each text file incorporated into your project has a button for each layer of analysis. If you cl
54. y text has been annotated with the deleted features, the features will be deleted from the codings.
• Add Gloss: here you can add a description of what this feature means, perhaps in terms of coding criteria (e.g. "select this feature if ..."). See below.
• Move Up: moves the feature higher up in the system. Note that the first feature in the system is the default in coding.
• Move Down: moves the feature lower down in the system.
• Edit Realisations: You can add realisations attached under features. The program doesn't do anything with them, but you can use this option to annotate features, e.g. with a gloss of the choice.
• Show Examples: Once you have annotated some texts, selecting this option will open a Corpus Search window showing all instances in the corpus tagged with this feature.
2.2.3 Changing Entry Conditions
To change the entry condition of a system, click on the system and select Change Entry Condition. You will be presented with a dialog box as in Figure 3.5.
Figure 3.5: The Change Entry Condition Dialog
Simple Entry Condition: If you want a simple entry condition (the system extends from a single feature), set Number of terms to 1, then choose the feature you want as the entry condition. Press OK and the graph will be redrawn as specified. CorpusTool will automatically update the codings which are affected by the change.
Complex Entry Co
55. you click the Finalise button, CorpusTool will create your project, which is a folder containing all the details related to your project, including the corpus and the annotation files. It also contains an icon which can be used to launch your project directly (the cptr file).
Once you have finished with the Create Project Wizard, the CorpusTool Main Window will open, showing the Project Management pane (see Figure 2.2). This pane is where you control details of your project, such as which files are included and what types of analyses are involved.
Figure 2.2: The Project Management pane
The buttons at the top of the pane allow you to switch between the different panes of CorpusTool: Project (this section), Search (section 5), Autocode (section 6), Statistics (section 7), Explore (section 8), Options, and Help. We will assume for now that the Project pane is selected.
The big letters at the top show the name of your project. Below this is a space showing which analyses (layers) are involved in the project. Initially this is empty. Below the Layers space is a box showing all the files in the project (initially empty), and for each file, one button for each of the