
Readiris User's Manual

Contents

1. s so svelte that you can tuck iti The page analysis will even detect zones where you get white text on a black background. Recognizing such inserts is no problem: while the preview displays the scanned document correctly on screen, Readiris inverts the image when the need arises to recognize such text blocks. (You can have your scanner generate fully inverted images to process pages with white text on a black background. See below.) ONE AND A HALF SORTING WINDOWS Readiris not only detects the various blocks but also sorts them: the zones are sorted top-down, left to right by default to cope with columnized documents. Evidently, you can modify the sort order. To do so, click the Sort button on the main toolbar. The mouse cursor becomes a pointing hand as soon as the sort mode is enabled. Click on the windows you want to include. Windows you do not click on are simply ignored (excluded from recognition). It's easy to see which windows are selected and which aren't: the selected windows have their full color, non-selected windows have a lighter color tone. [Screenshot: Readiris window showing the sample image english.jpg, page 1 of 1]
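The sorting behavior described above can be illustrated with a minimal Python sketch. It is not Readiris code: the `Zone` class and the functions `default_order` and `manual_order` are hypothetical names, and the default order is approximated by a plain sort on the top edge, then the left edge. The real page analysis handles columns in ways the manual does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular window on the page (hypothetical model)."""
    name: str
    left: int
    top: int


def default_order(zones):
    # Approximation of "top-down, left to right": sort by top edge, then left edge.
    return sorted(zones, key=lambda z: (z.top, z.left))


def manual_order(zones, clicked_names):
    # In sort mode, zones are read in the order they were clicked;
    # zones that were never clicked are excluded from recognition.
    by_name = {z.name: z for z in zones}
    return [by_name[name] for name in clicked_names if name in by_name]


zones = [Zone("right", 300, 10), Zone("left", 10, 10), Zone("bottom", 10, 200)]
print([z.name for z in default_order(zones)])
print([z.name for z in manual_order(zones, ["bottom", "left"])])
```

In this sketch the unclicked zone "right" simply drops out of the manual order, which mirrors the statement that windows you do not click on are ignored.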
2. [Screenshot: two Adobe Acrobat windows showing the sample document Autoformat.pdf] OR READING THEM Let's look the other way for a moment. As Readiris offers full support of the Adobe Acrobat PDF format, you won't just generate PDF files, you can also read them. Repurposing PDF documents may be a major application of Readiris. There are several reasons why this is the case. First of all, it's a way of converting images into text: open image-based PDF documents, execute the recognition and save the OCR result to a text document in any supported text format. Text files are editable, image files are not. Second case: you can convert image-based PDF files to text-based PDF documents. You then execute the recognition on image-only PDF files and save the OCR results as text-based PDF documents. Text-based PDF files are searchable and ed
3. 16 nights 9th 25th Oct 2000 The image smoothening can also be enabled when you load prescanned images into memory. [Screenshot: file-opening options Digital camera (Force as 300 dpi), Smoothen color images and Load PDF documents in color] The brightness now. By brightness we actually mean the black-and-white threshold. The setting Automatic determines the bilevel threshold automatically. Apply a different threshold when necessary by darkening or lightening the black-and-white image: when you darken the image, more pixels become black in the black-and-white version; when you lighten the image, fewer pixels become black in the black-and-white version. Note above all that no image adjustment is executed until you click the Apply button. By clicking OK you execute the adjustment and close the window. Here's an example where we lightened the black-and-white image dramatically, though admittedly not with OCR accuracy in mind. [Screenshot: image adjustment window with the option Smoothen color image, the Brightness setting and the Apply button. Its help text reads: With some scanner models, reduction of the sharpness is needed to recognize color and greyscale images. Smoothening allows the text to be separated from the colored background.]
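The link between "brightness" and the black-and-white threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Readiris algorithm: greyscale values are taken to run from 0 (black) to 255 (white), and `to_bilevel` and `count_black` are hypothetical helper names. Darkening corresponds to raising the threshold, lightening to lowering it.

```python
def to_bilevel(gray_rows, threshold=128):
    """Turn greyscale rows (0 = black, 255 = white) into 1 (black) / 0 (white)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


def count_black(bilevel_rows):
    """Count the black pixels in a bilevel image."""
    return sum(sum(row) for row in bilevel_rows)


page = [
    [10, 100, 200],
    [90, 150, 250],
]

normal = count_black(to_bilevel(page, threshold=128))
darker = count_black(to_bilevel(page, threshold=180))   # higher threshold: more black
lighter = count_black(to_bilevel(page, threshold=50))   # lower threshold: less black
print(normal, darker, lighter)
```

An automatic setting would pick the threshold from the image itself; how Readiris does that is not described in this excerpt, so the sketch uses fixed values.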
4. Readiris USER'S GUIDE. I.R.I.S. Document to Knowledge. Readiris Pro. 2002 I.R.I.S. All rights reserved. OCR technology by I.R.I.S. Connectionist, AutoFormat and Linguistic technology by I.R.I.S. SAVE TIME: NO MORE RETYPING Congratulations on acquiring Readiris. This software package will undoubtedly be of great help in recapturing your texts, tables and graphics. As efficient as computers are, you have to key in your information first. If you have ever retyped a 15-page report or a large table of figures, you know how tedious and time-consuming it can be. Use this state-of-the-art OCR package to automatically enter text in your applications and you'll acquire an unprecedented level of efficiency and comfort. Scan a printed or typed document, indicate the zones of interest (or have the system detect them for you) and execute the character recognition. Documents composed of many pages are processed from start to finish in a single effort. A few mouse clicks beat long hours of work as Readiris converts your paper documents into editable computer files: it's up to 40 times faster than manual retyping. The wizard guides you through the OCR process comfortably: answer a few simple questions and you'll obtain quick and easy results with Readiris. You can send the reading results directl
5. [Table of contents, continued] User Lexicons to Boost the Linguistics; Readiris Changes Languages As Needed; Reading Documents with Mixed Languages; Defining the Document Characteristics; Readiris Gets More Intelligent Each Time; [several illegible entries]; Finish; [illegible entry]; The Role of Font […]; Sending the Result Directly to Your Application; Saving the Results in a Text File; Creating Portable Documents; [two illegible entries]; Editing Multipage Documents; Starting a New Document; Organizing the […]; Setting Up Your Scanner; Bring Color to Your Text Scans
6. [Screenshot: drop-down list of output formats, including RTF variants, Text, Unicode, WordPerfect and WordStar, with the layout option Recreate source document] The option Open after Saving is largely similar to the send feature: you open the recognized document once it's saved. [Screenshot: Output settings with External file, Microsoft Word 97, 2000, 2002 (rtf) and the option Open after saving] However, the method used to address the target application is different. This time, the Windows file types determine which application will be started up. It's as if you double-clicked the output file in the Windows Explorer. With the option Send to, Readiris addresses specific target applications directly. [Screenshot: Windows Folder Options, File Types tab, showing that files with extension DOC are of type Microsoft Word Document and open with Microsoft Word] CREATING PORTABLE DOCUMENTS We'll go deeper into one format: Adobe Ac
7. Excel so we select Excel as target application under the Format button. [Screenshot: Output settings with Send to Microsoft Excel selected in the list of target applications] The spreadsheet is started up automatically and the result looks like this: the typical table structure with rows and columns is recreated and you are immediately ready to process the data. [Screenshot: the recognized table in Microsoft Excel] You may come across ungridded tables the page analysis does not detect as table zones because the columns are too widely spaced: Readiris tries to avoid confusion with columnized text blocks. To create a table window manually, click on the Table Window tool in the image toolbar and proceed as usual: the button's tooltip again indicates the number of table windows. GETTING ON-LINE HELP This concludes
8. The Readiris software is delivered exclusively on an autorunning CD-ROM. To install, simply insert the CD-ROM in your CD-ROM drive and wait for the installation program to start running. Follow the on-screen instructions. Should the installation not begin to run when the CD-ROM is inserted in your CD-ROM drive, run the setup program MENU.EXE to install the software. Users of Windows XP, Windows 2000 and Windows NT must ensure that they have the necessary access rights (contact the system administrator if necessary). Some installation options are offered. Be sure to install the linguistic databases of all languages you intend to read. (By default, all lexicons are installed.) You are recommended to install the sample images, which are used in the tutorials of this manual. [Screenshot: InstallShield Wizard, Select Components, with the components Languages, Sample Images, Electronic Manual and Adobe Acrobat Reader] Similarly, install the Acrobat Reader software, required to access the software documentation, should this be necessary. The electronic manual is by default copied to your hard disk
9. You can also leave it on the CD-ROM. The submenu I.R.I.S. Applications > Readiris under the Programs menu is created automatically by the installation program. [Screenshot: the I.R.I.S. Applications submenu with the entries Cardiris, IRISPen and Readiris] The same holds for a shortcut to Readiris on the Windows desktop. As a result, you are able to start Readiris directly from your desktop. UNINSTALLING THE READIRIS SOFTWARE There are only two correct ways of uninstalling Readiris: using the Readiris uninstall program and using the Windows (un)install wizard. You are strongly recommended not to uninstall Readiris or its software modules by manually erasing the program files. Readiris uninstall program: Select Uninstall Readiris under the submenu I.R.I.S. Applications > Readiris to start the Readiris uninstall program and follow the on-screen instructions. [Screenshot: the Readiris submenu with the entry Uninstall Readiris] Windows (un)install wizard: Execute the following steps to make use of the Windows (un)install wizard. Click Settings under the Start menu of Windows and go to the Control Panel. Click the icon Add/Remove Programs under the contro
10. and on the Readiris icon to recognize them. As soon as pages get processed, an additional toolbar, the page toolbar, is added on the right side. It represents the various pages of the document and gives access to the page commands using the right click (the Context menu). GETTING STARTED WITH A FIRST TUTORIAL The best way to become familiar with the operation of Readiris is undoubtedly by using it. A number of prescanned images are provided with the software; they allow you to get started even when there is no scanner connected to your computer. Let's turn to these now. The Source button on the main toolbar determines whether you are going to use a scanner or a prescanned image as image source. Color, greyscale and black-and-white images are supported on an equal basis. Readiris allows you to open Adobe Acrobat PDF documents, JPEG images, Paintbrush (PCX) images, DCX fax images (a multipage version of the Paintbrush format), PNG images, TIFF images (uncompressed, LZW, PackBits, Group 3 and Group 4 compressed), multipage TIFF images and Windows bitmaps (BMP). This capability is particularly useful to convert your faxes into editable text files. As you are going to open a prescanned image, you should select the disk and not the scanner as image source with the Source button. Next, click the Open button. (When you select the disk as image source, the Scan button is replaced by the Open button an
11. Know that the tooltip of the Learn button indicates at all times which font dictionary is currently active and in which mode that dictionary operates. [Screenshot: tooltip of the Learn button showing Interactive learning, the dictionary Readiris.dus in My Documents and the mode New Dictionary] When you enter the interactive learning, the dictionary and its operating mode are indicated in the window title. In case they are wrong, you should click the Abort button and start over. [Screenshot: learning window titled New Dictionary, C:\My Documents\Readiris.dus, with the buttons Don't Learn and Abort] SENDING THE RESULT DIRECTLY TO YOUR APPLICATION The interactive training concludes the character recognition. As Microsoft Word operates as output target by default, your wordprocessor is started up automatically at the end of the recognition (if necessary) and the recognized text is inserted. You may get a progress bar on screen as the recognized document gets formatted. Whether this progress bar appears on screen or not depends on the size of the document and the complexity of the formatting to be performed. The scanned image is displayed again with the zoning as created, to be available for further processing. It stays there until you scan another page. You have indeed converted a paper document into an editable computer file, be it 40 times faster than manual retyping! Go ahead and compare it with the image you have inside your Readiris window. Actually, Readiris offers three differen
12. Properties You can even drag several prescanned images from the Windows Explorer onto the Readiris window. The same argument holds: all images you drag onto the Readiris window are added to the current document until you click the command New Document. Readiris sorts the images automatically: image 001.tif precedes 002.tif, which precedes 003.tif, etc. The page toolbar on the right side (it is displayed as soon as pages get processed) represents the various pages of the document and gives access to the page commands using the right click. The current page is highlighted in the page toolbar and mentioned in the Readiris title bar. The page toolbar comes with a tooltip: hold your mouse pointer over a page thumbnail to learn which image was loaded into the memory. (If a multipage image was opened, there's obviously just one file for all the images. When you are scanning multipage documents, the tooltip simply mentions the scanner model.) [Screenshot: tooltip showing the scanner model HP ScanJet 5470C] Load the sample image MULTIPAG.TIF and start the recognition. The various pages are displayed one after the other; the Readiris title bar indicates the page number. [Screenshot: Readiris window showing the sample image multipag.tif, page 2 of 5]
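The automatic ordering of dropped images can be pictured with a small Python sketch. The manual only states that 001.tif precedes 002.tif, which precedes 003.tif; the sketch therefore assumes a case-insensitive sort on the file name, which gives that order for zero-padded names. `order_pages` is a hypothetical name, and the exact rule Readiris applies is not documented in this excerpt.

```python
def order_pages(file_names):
    """Order dropped image files by name, ignoring upper/lower case.

    Zero-padded names such as 001.tif, 002.tif, 003.tif sort as expected;
    names without padding (2.tif, 10.tif) would need a numeric-aware key.
    """
    return sorted(file_names, key=str.lower)


dropped = ["003.tif", "001.TIF", "002.tif"]
print(order_pages(dropped))
```

The pages then land in the current document in that order, whatever the order in which the files were selected in the Windows Explorer.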
13. Readiris is unable to recognize such as mathematical and scientific symbols and dingbats. Some examples: Readiris can be trained to recognize the π symbol as "pi" or a dingbat as "Tel". However, the list of recognized symbols cannot be extended with the symbols […] and […]. The recognized text is displayed progressively and the system stops on doubtful characters or, if you are dealing with touching characters (ligatures), on doubtful character strings. They are always presented in their context; the doubtful characters are highlighted. Unrecognized characters are represented by a tilde, the "~" symbol. [Screenshot: learning window titled New Dictionary, C:\My Documents\Readiris.dus] The first thing you should do is verify if you activated the correct font dictionary and dictionary mode: these are always indicated in the title of the learning window. If that is not the case, click the Abort button (the document image is redisplayed with the zoning as was created), enable the right font dictionary or dictionary mode and run the OCR again. (The operation of font dictionaries will be discussed shortly.) If necessary, enter a character or character string for the incorrect or unknown shape and click one of the following buttons. Learn: You agree with the proposed solution or correct it. The program saves this doubtful character in the font dictionary as sure, final. Future recognition will no longer require your
14. inside your Readiris window. But how do you save the text of additional pages? Or, in other words, how do you process documents consisting of multiple pages? It's actually very simple: go on recognizing pages but enable the option Append when you are saving to the same file. (If you append an existing file, be sure it isn't currently open, because that will prevent you from writing to it. Secondly, don't forget to put the font dictionary in the append mode so that you can continue the font training comfortably.) [Screenshot: Save dialog with the file name Readiris, the file type Microsoft Word 97, 2000, 2002 and the option Append] As soon as you scan pages or open image files inside a document, you have to decide whether you want to start a new document or complete the current document. [Screenshot: Readiris prompt "Are you ready to delete the current document?" with the buttons Yes, No and Cancel] Answer "no" to add pages to the current document, answer "yes" to create a new document. This answer has the same effect as the command New Document under the File menu. But there's a more efficient way of recognizing several pages than scanning and OCRing them one after the other: processing multipage documents directly. To scan a document composed of several pages in one operation, enable the document feeder of your scanner with the option ADF under the Scanner button. [Screenshot: Scanner options Landscape, ADF, Invert and Digital camera]
15. intervention: the shape is considered learnt once and for all. In the example above, the system stops on a soiled character and we click Learn to accept a shape which cannot be confused with other characters. Don't Learn: You agree with the proposed solution or correct it. The difference with the Learn button is that the learnt symbol gets the status "unsure" in the dictionary. For future recognition, the system will propose the learnt solution but still require a confirmation. This button is used for symbols which might be confused with others: a defaced "e" which might be mistaken for a "c", a damaged "t" which closely resembles an "r" etc. [Screenshot: learning window titled New Dictionary, C:\My Documents\Readiris.dus] The "e" above is seriously damaged, in fact it is close to the "c" symbol, and you should click Don't Learn so as not to confuse the two symbols. Delete: The displayed form is eliminated from the output. This button is used to ignore noise on the documents (spots, coffee stains etc. which might get recognized as points, commas and what have you) and to erase any other unwanted symbol. Undo: You go back to correct mistakes. You can undo the nine last decisions. Finish: The learning process is aborted but the OCR continues in automatic mode. All decisions by the system thereafter are accepted without user validation. Click this button when you see that the recogniti
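The difference between Learn, Don't Learn and Undo can be modeled with a short Python sketch. It is a toy model under stated assumptions, not the Readiris font dictionary: the class `FontDictionary`, its methods and the shape identifiers are hypothetical, and only the behavior named in the text is reproduced (status "sure" versus "unsure", and an undo history limited to the nine last decisions).

```python
from collections import deque


class FontDictionary:
    """Toy model of the learning decisions (hypothetical, for illustration)."""

    def __init__(self):
        self.shapes = {}                  # shape id -> (text, status)
        self._history = deque(maxlen=9)   # only the nine last decisions can be undone

    def _record(self, shape, text, status):
        # Remember the previous entry so the decision can be undone.
        self._history.append((shape, self.shapes.get(shape)))
        self.shapes[shape] = (text, status)

    def learn(self, shape, text):
        # "Learn": stored as sure, no confirmation needed in the future.
        self._record(shape, text, "sure")

    def dont_learn(self, shape, text):
        # "Don't Learn": stored as unsure, proposed later but still confirmed.
        self._record(shape, text, "unsure")

    def undo(self):
        # Revert the most recent decision; return False when nothing is left.
        if not self._history:
            return False
        shape, previous = self._history.pop()
        if previous is None:
            self.shapes.pop(shape, None)
        else:
            self.shapes[shape] = previous
        return True

    def needs_confirmation(self, shape):
        entry = self.shapes.get(shape)
        return entry is None or entry[1] == "unsure"


d = FontDictionary()
d.learn("clean-a", "a")
d.dont_learn("damaged-e", "e")
print(d.needs_confirmation("clean-a"), d.needs_confirmation("damaged-e"))
```

A bounded `deque` is one simple way to keep exactly the last nine decisions: older ones fall off the front and can no longer be undone.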
16. our overview of Readiris. Some last-minute information may not be included in this manual. We thus recommend that you consult the on-line help system for additional information on Readiris. Go to the Help menu to do so. The command Help Topics and its shortcut key F1 allow you to navigate through the many help topics. [Screenshot: Readiris help window with the topics Welcome to the Readiris help, Introducing OCR, Recognizing Documents, How to, Reference Information, Software Versions and Options, Product Registration and Product Support] The other commands of the Help menu tell you how to get product support, how to contact I.R.I.S., give direct access to the I.R.I.S. home page etc.
17. pages for you. Enable the option Detect Page Orientation under the Settings menu and Readiris will correct the page orientation where needed. You can make good use of the image DESKEW.JPG in the Readiris folder if you want to try it. Enable the options Page Deskewing and Detect Page Orientation before you open the image and let Readiris restore the Tower of Pisa the way we like it! [Screenshot: Readiris window showing the sample image deskew.jpg, page 1 of 1] ADJUSTING THE SCANNED IMAGES As was already indicated, powerful, intelligent routines automatically convert color and greyscale images into black-and-white. Should this still be necessary, the user can optimize the image further for th
18. struct them for you by recreating the tables cell by cell in your spreadsheet or by inserting a table object inside your wordprocessor files. Let's explore the different solutions, starting with the gridded or framed table: it has borders around the cells. [Screenshot: Readiris window showing the sample image tables.jpg, page 1 of 1, titled Reading Tables. The sample page reads:] Readiris recognizes tabular data and recreates them cell by cell in worksheets or as table objects inside wordprocessor files. To insert tables as table objects, you must retain the word and paragraph formatting or recreate the source document with the Format button on the main toolbar. The page analysis detects gridded and ungridded tables. Gridded or framed tables have borders around the cells, as does the example below. The borders of the table cells get recreated. [Sample table: performance test of optical media, largely illegible] Ungridded tables don't have any borders around the cells. When the columns of ungridded tables are too widely spaced, the page analysis may not detect a table window to avoid
19. top accuracy, including low-quality documents, faxes and dot-matrix printouts. It copes beautifully with badly scanned and copied documents containing too light or dark font shapes. Joined characters (ligatures) are resolved and fragmented forms such as dot-matrix symbols are recomposed. User verification in pop-up style not only flags doubtful characters but also increases the system's precision. All solutions confirmed by the user are memorized, increasing speed and confidence as you go along. Using Readiris means rendering it more intelligent each time. This powerful learning tool allows you to train Readiris on special characters such as mathematical symbols and dingbats, but also to handle distorted fonts as you will find in real documents. To increase your productivity further, Readiris not only recognizes your texts but can format them for you as well. Make use of autoformatting and Readiris recreates a facsimile copy of the scanned document: the word, paragraph and page formatting of the original document are retained. Similar typefaces are used; the point sizes and typestyles as used in the source document are maintained across the recognition. The placement of columns, text blocks and graphics follows your original documents. And as Readiris supports greyscale and color scanning effortlessly, you can recapture any graphics, be they lineart, black-and-white photos or color illustrations. When a document contains table
20. [Screenshot: Russian sample document in the Readiris window] The end result looks like this when opened with the wordprocessor. (You may have to select a Cyrillic font to display the Russian text correctly.) [Screenshot: the recognized Russian text in WordPad] To mix other languages, simply select the language with the most extended character set. If you have a document where the, say, French translation is placed alongside an English text, you have to select French as language to ensure that the accentuated characters get recognized correctly. DEFINING THE DOCUMENT CHARACTERISTICS Now that the language is set, we'll turn to the other document characteristics. You can fine-tune the recognition by specifying some document features: the font type and character pitch. (These commands do not apply to Asian documents.) Let's clarify what this means. Let's start with the command Font Type under the Settings menu. The font modes separate normal documents from dot-matrix printed documents. Draft or 9-pin dot-matrix symbols are made up of isolated, separate dots and highly specialized recognition rout
21. view and edit such documents. (Office XP and 2000 were specifically designed to cope with documents in many different languages.) Refer to the Readiris Read Me file for more information on this subject. Selecting the proper document language is imperative. Based on the selection of a language, the software knows which symbol set to recognize. Multilinguistic support ensures that exotic characters are recognized correctly. Secondly, the software extensively uses linguistic databases to validate its results. Suppose that you have to read the word "president" where an ink stain makes the "r" look like an "f". Looking things up in the English lexicon, Readiris will detect autonomously that the word "president" is being read and that it doesn't make any sense to recognize the symbol "f". (This self-learning technique is of course highly dependent on the linguistic context.) Linguistics offer useful help to solve ambiguous cases such as an "O" which might be mistaken for a "0". Another typical example is the letter "I" and number "1", which have an identical form in many fonts (think of texts produced on old typewriters). The linguistic context helps to determine whether you are dealing with "I" or "1". The illustration below shows various shapes of "1" and "I". The shapes on the first line are unambiguous; the shapes on the second line are ambiguous, but linguistics can solve them. W
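The "president" example can be turned into a small Python sketch of lexicon-based validation. This is a simplified illustration, not the I.R.I.S. linguistic technology: the confusion table `CONFUSABLE`, the functions `candidates` and `resolve` and the tiny lexicon are all assumptions made for the example.

```python
from itertools import product

# Hypothetical table of look-alike characters; the first option keeps the OCR reading.
CONFUSABLE = {
    "f": ("f", "r"),
    "0": ("0", "O"),
    "O": ("O", "0"),
    "1": ("1", "I", "l"),
    "I": ("I", "1", "l"),
    "l": ("l", "1", "I"),
}


def candidates(word):
    """Yield every spelling obtained by swapping look-alike characters."""
    options = [CONFUSABLE.get(ch, (ch,)) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def resolve(word, lexicon):
    """Return the first candidate found in the lexicon, else the word unchanged."""
    for candidate in candidates(word):
        if candidate.lower() in lexicon:
            return candidate
    return word


english = {"president", "in", "old"}
print(resolve("pfesident", english))
print(resolve("0ld", english))
```

The number of candidates grows quickly for long words with many ambiguous characters, so a real system would prune the search; the sketch keeps the brute-force form for readability.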
22. [Screenshot: German sample document in the Readiris window, with the options Apply to whole document and Start new column] TEXT FORMATTING, PART 2 The other layout options are Create Body Text and Retain Word and Paragraph Formatting. As the icon on the right side illustrates, creating body text means you create a non-formatted running text. The text will be captured but its formatting is entirely ignored. Use this option when you just need to recapture a text but not its layout. [Screenshot: Layout options Create body text, Retain word and paragraph formatting and Recreate source document] The option Retain Word and Paragraph Formatting represents the middle road: the word formatti
23. [Table of contents, continued] Different Devices, Different Resolution; [illegible entry]; Saving Specific Settings; Scanning Documents; Adjusting the Scanned Images; Letting the OCR Wizard Work for You; Readiris Recreates Your Document Layout; Columns Please, Not Frames; [illegible entry]; Saving Graphics Separately; Taking Graphics to the Hilt; Reading Faxes and Deferred Recognition; [illegible entry]; Recognizing Business Cards; Scanning Business Cards; It Takes a Business Card Reading Mode; Recognizing Business Cards; Getting On-line Help. CREDITS AND COPYRIGHTS The Readiris software is designed and developed by I.R.I.S. OCR Con
24. DOS, Text ASCII etc. do not support advanced formatting codes and therefore cannot offer autoformatting. The Adobe Acrobat PDF format, on the other hand, was designed to copy the look of your documents: PDF documents by nature imply autoformatting. When the recognized text is opened using a wordprocessor, the text looks like this, without any intervention by the user. [Screenshot: the recognized document Autoformat opened in Microsoft Word. Its text reads:] Autoformatting. The aim of autoformatting is to recreate a facsimile copy of the original document. The OCR process does more than just recognize your text: it can format it for you too. In a way, text recognition is becoming more and more page recognition or document recognition. Whether your OCR software reformats the recognized text or not is up to the user. You can perform OCR because you just need the text, in which case you will edit and format it yourself, and you can recreate the source document, including its formatting. The various levels of formatting are creating body text, retaining the word and paragraph formatting and creating a facsimile copy. Creating body text means no formatting is appl
25. [Text of the sample image shown in the Readiris window:] Formatting is applied: you get a continuous running text. All formatting, if any, is done afterwards by the user. If you retain the word and paragraph formatting, the font type, size and typestyle are maintained across the recognition. The justification of the paragraphs is also detected. However, no graphics are captured and the columns aren't recreated: the paragraphs just follow each other etc. Autoformatting recreates a facsimile copy of the original document: the text blocks, graphics and tables are recreated in the same place and the word and paragraph formatting are maintained across the recognition. As a result, you get a true copy of your source document, be it a compact and editable text file, no longer a scanned image of your document. [End of sample text] Click the Format button on the main toolbar and choose to send the OCR result to Microsoft Word or select the RTF (Rich Text Format) or Word DOC format. Secondly, select Recreate Source Document as layout option. (The option Merge Lines into Paragraphs is enabled by default to apply wordwrap within the paragraphs.) [Screenshot: Layout options Create body text, Retain word and paragraph formatting, Recreate source document and Use columns instead of frames] Whether layout reconstruction is available depends on the selected output mode. Some poor formats generating plain text, such as Text ANSI, MS
26. [Screenshot: the Context menu of an image file in the Windows Explorer.] That does not mean the OCR is promptly executed: to give the user full flexibility, Readiris is simply started up and the image is opened. The image toolbar on the right side of the Readiris application window contains all commands you need during the image preview: tools to indicate the zones of interest, to rotate the image, zoom in and out etc. ZOOMING IN ON IMAGES Readiris has several commands that allow you to zoom in on a scanned image, for instance to verify the scanning quality. The image toolbar contains buttons that allow you to zoom in at real size, to fit the image to the page width and to fit the entire image in the preview window. The View menu contains the same commands and adds two extra zoom levels: you can display the image at 50% and 200% of its actual size. At actual size, a screen pixel corresponds to an image pixel. Shortcuts are available for all zoom levels. [Menu: View, with the commands Fit to Window (Ctrl+F), Fit to Width, Actual Size (Ctrl+1) and 200% Actual Size (Ctrl+2).] Also notice that the zoom levels are available on the right click. Click with the right mouse button to invoke the Context menu and select the appropriate zoom level.
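The zoom levels described in this excerpt boil down to a scale factor between image pixels and screen pixels. The following is only a minimal sketch in Python: the mode names and the function are hypothetical labels for the View menu commands, not part of Readiris.

```python
def zoom_factor(mode, image_size, window_size):
    """Return the scale factor (screen pixels per image pixel) for a zoom mode.

    The mode names are hypothetical labels for the View menu commands.
    Sizes are (width, height) tuples in pixels.
    """
    image_width, image_height = image_size
    window_width, window_height = window_size
    # Fixed zoom levels: 50%, actual size (one screen pixel per image pixel), 200%.
    fixed_levels = {"half_size": 0.5, "actual_size": 1.0, "double_size": 2.0}
    if mode in fixed_levels:
        return fixed_levels[mode]
    if mode == "fit_to_width":
        return window_width / image_width
    if mode == "fit_to_window":
        # Take the tighter of the two constraints so the whole image stays visible.
        return min(window_width / image_width, window_height / image_height)
    raise ValueError(f"unknown zoom mode: {mode}")
```

For example, a 2000 x 3000 pixel scan in a 1000 x 600 pixel preview would be scaled by 0.5 to fit the width and by 0.2 to fit the window.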
27. R.I.S. under the Help menu of Readiris details in which ways you can get in touch with I.R.I.S. [Screenshot: the Readiris help, topic How to Get in Touch with I.R.I.S.] Head Office Belgium: Phone 32 10 45 13 64, Fax 32 10 45 34 43. I.R.I.S. on the Internet: I.R.I.S. home page http://www.irislink.com, Readiris web site http://www.readiris.com, On-line shop http://shop.irislink.com. E-mail info: info@irislink.com. E-mail sales: sales@irislink.com. E-mail support: support@irislink.com. USA Office East Coast: Phone 1 561 395 7831, 7 000 447 4744, Fax 1 561 347 6267. USA Office West Coast: Phone 1 480 854 3111, 800 7USAIRIS, Fax 1 480 854 2929. France Office: Phone 0810 00 19 27, Fax 0810 42 41 43. An application icon in the submenu I.R.I.S. Applications - Readiris under the Programs menu takes you directly to the I.R.I.S. home page. So does the Readiris startup screen and the command I.R.I.S. on the Internet under the Help menu of Readiris. [Screenshot: the submenu I.R.I.S. Applications - Readiris.]
28. You can also use the command Select Source under the File menu. [Dialog: Select Source, listing the installed scanner sources.] Once the scanner is selected, the same window may allow you to set the scanning resolution, the page format and orientation, brightness and contrast, and may allow you to indicate whether you are going to use the scanner's document feeder. With Twain compliant scanners, all scanning parameters are often set inside the Twain interface. Set the brightness and, if available, the contrast. By enabling the option Landscape, you indicate that the selected page orientation is wide (landscape) instead of tall (portrait). The page orientation actually applies to reduced page formats: on an A4 flatbed scanner you can scan, say, A5 pages (half that big) in portrait or landscape format, but you can obviously only scan the full A4 surface in one direction. The option Invert allows you to generate inverted images in the black and white scanning mode: you can activate this option to process full pages with white text on a black background. BRING COLOR TO YOUR TEXT SCANS Readiris supports black and white, greyscale and color images on an equal basis, so you are free to choose the color mode that best suits your needs. To include lineart graphics in the recognized documents, scan in black and white
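What the Invert option does to a black and white scan can be illustrated in a few lines. This is a hedged sketch, not Readiris code: it assumes a bitonal image stored as rows of 0 (white) and 1 (black), and the function name is made up for the example.

```python
def invert_bitonal(image):
    """Swap black and white in a bitonal image.

    The image is assumed to be a list of rows holding 0 (white) or 1 (black);
    this representation is an assumption made for the example.
    """
    return [[1 - pixel for pixel in row] for row in image]


# White text (0) on a black background (1) becomes black text on white.
white_on_black = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
black_on_white = invert_bitonal(white_on_black)
```

Inverting twice returns the original image, so the operation loses no information.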
29. [Dialog: Adjust Image.] The first two options concern color and greyscale images; the last one, Despeckle, exclusively concerns black and white images. Despeckling means that the parasite pixels (also called salt and pepper noise) will be removed from black and white images. [Sample image, before and after despeckling: If computers can't adapt easily, then maybe the people using them can.] Be sure that you don't erase spots that are too big, otherwise you might start erasing the dots on "i" etc., portions of dot matrix letters etc. [Dialog setting: Despeckle, remove 10 pixel dots, range 0 to 20.] Removing too large dots may erase useful information from the image. The best way of optimizing the images for the OCR process is this: place the adjustment window where it doesn't prevent you from judging the image adjustment you execute. Adapt the parameters, clicking Apply each time, until the image is crisp and clear. LETTING THE OCR WIZARD WORK FOR YOU Let's get started capturing documents now. Instead of going through all the parameters, we'll use the OCR wizard, a very comfortable way of recognizing pages. Click the OCR Wizard button on the main toolbar or select the command OCR Wizard under the Process menu.
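The trade-off in despeckling (small dots are noise, but large "dots" may be part of a letter) is easier to see in code. The sketch below is one common way to despeckle a bitonal image, by removing connected groups of black pixels up to a size limit. Whether Readiris works this way is not stated in the manual; the function name, the 0/1 image representation and the use of 8-connectivity are all assumptions.

```python
from collections import deque


def despeckle(image, max_dot_size):
    """Remove black blobs of at most max_dot_size pixels from a bitonal image.

    The image is a list of rows holding 0 (white) or 1 (black). Blobs are
    groups of black pixels connected horizontally, vertically or diagonally.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect the whole blob that contains this black pixel.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny in range(max(0, cy - 1), min(height, cy + 2)):
                    for nx in range(max(0, cx - 1), min(width, cx + 2)):
                        if image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small blobs are treated as noise and turned white.
            if len(blob) <= max_dot_size:
                for by, bx in blob:
                    result[by][bx] = 0
    return result


page = [
    [1, 0, 0, 0, 0],  # a single-pixel speck in the top left corner
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],  # a six-pixel blob, standing in for part of a letter
    [0, 0, 0, 1, 1],
]
careful = despeckle(page, 2)       # removes only the speck
too_greedy = despeckle(page, 10)   # also erases the six-pixel blob
```

The second call shows the risk the manual warns about: with a limit of 10 pixels, the blob that stood for useful information disappears as well.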
30. [Menu: Process, with the command OCR Wizard.] The wizard guides you through the OCR process comfortably: answer a few simple questions and you'll obtain quick and easy results with Readiris. [Dialog: OCR Wizard. "The OCR wizard leads you through the OCR process comfortably. Just answer these questions and you'll get quick results with Readiris. Click Next to begin." Option: Enable Wizard on Startup.] Actually, the OCR wizard starts running each time you start up Readiris; you can avoid this by disabling the option Enable Wizard on Startup in the first screen of the wizard and with the equivalent option under the Settings menu. READIRIS RECREATES YOUR DOCUMENT LAYOUT The OCR wizard renders the recognition process highly automatic, but automatic OCR should not be confused with autoformatting. Autoformatting means that Readiris recreates a facsimile copy of the scanned document: the word, paragraph and page formatting of your original document are applied. Similar typefaces (serif and sans serif, proportional and fixed, normal and condensed) are used as in the source document; the point sizes and typestyles (bold, italic and underlined) are maintained across the recognition. The tabs and the alignment (left, centered, right and justified) of each text block are recreated. The placement of columns, text blocks and graphics follows your original document. In other word
31. confusion with columnized text blocks. When your tables exclusively contain numeric characters, enable the numeric reading mode with the Language button on the main toolbar for increased accuracy. Run the recognition with the layout option Retain Word and Paragraph Formatting or Recreate Source Document enabled and the table gets recreated. Open your wordprocessor to have a look at the result. You could obviously have included the text paragraphs in the text file as well. [Screenshot: the recreated table in Microsoft Word. Table: Performance test optical media, with the columns CD-ROM / Digital Versatile Disk, Average access time (msec), CPU utilization, Video clip playbacks (frames dropped) and Sequential read 16 KB (Kbps), and rows for CD-ROM 4x speed, CD-ROM 24x speed and CD-ROM 32x speed.] Now the ungridded example: it has no borders around the cells. Note that the page analysis nevertheless detects the table. [Screenshot: the Readiris window with the image TABLE.JPG loaded.]
32. d above the text in a two-layered PDF file. Use the Search tool of Adobe Acrobat Reader and this becomes quickly obvious. [Screenshot: the Find dialog of Adobe Acrobat searching the autoformatted PDF document.] Click the Format button to discover an option that concerns the Acrobat PDF format: Create Bookmarks. [Dialog: Options, with Merge Lines into Paragraphs, Include Graphics and Create Bookmarks.] The option Create Bookmarks sees to it that a bookmark is created for each document element: the graphics as well as the text blocks and tables. For the text zones, Readiris applies an intelligent algorithm to come up with a title (a summary) per zone; the tables and graphics are simply numbered. Another navigational element of PDF documents, page thumbnails, can be created dynamically by your Adobe Acrobat Reader software.
33. d the corresponding Scan command under the Process menu is replaced by the Open command. You could also select the command Open from the File menu and open a prescanned image directly: this works even if your scanner operates as current image source. You are invited to select an image file. Select the file ENGLISH.JPG in the Readiris folder. As this sample file is a color image, it is not only read from disk: a binarized (black and white) version is created for the OCR process. [Progress messages: Loading, Reading english.jpg, Converting.] Finally, the image is displayed in the image zone. The page toolbar indicates that a single page is loaded into Readiris. [Screenshot: the Readiris window with ENGLISH.JPG loaded.] A third way of opening prescanned images is the use of drag and drop: drag images from the Windows Explorer onto the Readiris image zone or on the [Sample image text:] A word about OCR. The aim of OCR is to automatically enter printed text documents in a very effective and low cost way. Although the first research and development on Optical Character Recognition (OCR) began more than 30 years ago, this technology is still unknown by most of the people who could use it for their document entry applications. Now you can use th
34. de to change the window size. To move a window, simply select it and drag it to another location. To delete windows, select them and choose the Cut or Clear command from the Edit menu. The Cut command cuts the window(s) to an internal buffer; Clear erases the window(s) irretrievably. When you paste zones, they are inserted in their original position and you have to drag them to their new location. In fact, all familiar commands from the Edit menu apply to the windows: you can delete, cut, copy and paste them. The Undo command also applies: if you have unfortunately deleted, moved, resized etc. some windows, Undo will cancel the last operation. [Menu: Edit, with the commands Cut, Copy, Paste, Clear, Delete Small Windows and Select All.] Also note that shortcuts are available for all commands. Let's give an example: to erase all existing windows, you can choose the command Select All (or its shortcut Ctrl+A) and click the command Clear (or its shortcut Delete). You are now ready to recreate the necessary layout. To restore the previous layout, you can choose Undo (or the shortcut Ctrl+Z). THREE SAVING WINDOWING TEMPLATES The resulting windowing layouts can be saved as zoning templates for future use with the command Save Layout under the File menu and loaded into memory with the command Load Layout. If you have to recognize documents with a similar layo
35. [Screenshot: the Readiris window during the recognition of a multipage document.] If the interactive learning is enabled, you go through the recognition and learning phases page by page. The dictionary mode New is used for the first page and the mode Append for the successive pages. When you click the Finish button, all decisions by the system thereafter are accepted without user validation. In other words, the interactive learning is aborted for all pages: the OCR for this document continues in automatic mode. The recognition result of multipage documents is saved in a single output file. When the recognition result is sent to a target application, multiple pages get created inside a single document. EDITING MULTIPAGE DOCUMENTS The user can edit multipage documents, mainly to correct scanning errors: he can delete pages from the document and move pages to other locations in the document. The navigation first. To go to a page, click on its icon in the page toolbar or hold your cursor over its thumbnail, invoke the Context menu by right-clicking and use the command Select Page. To go to the previous page, you can use the shortcut PageUp; to go to the next page, press PageDn. Or use the corresponding commands under the View menu Prev
36. INSTALLED FILES The installation program has created a folder where the Readiris files are located. Never try to uninstall Readiris or some of its modules by manually erasing the program files: use the Readiris uninstall program or the Windows uninstall wizard instead. (See above.) Read Me files and documentation: README.DOC, Read Me file; MANUAL.PDF, User's manual in Adobe Acrobat format. Scanner drivers: Finally, you may find some scanner drivers on the Readiris CD-ROM under the folder Drivers. I.R.I.S. offers no guarantee that drivers are supplied for your scanner model or that the drivers supplied on the Readiris CD-ROM will work well with your scanner model. Don't hesitate to contact your scanner manufacturer or its representative should problems with scanner drivers continue. Most manufacturers allow you to download the latest versions of the scanner drivers from their web site. REGISTER TO VOTE Don't forget to register your Readiris licence. Doing so will allow us to keep you informed of future product developments and related I.R.I.S. products. The registration benefits, including free product support and special offers, are strictly limited to registered users. You can register in many ways: by sending in your registration card or faxing its electronic counterpart, by calling I.R.I.S. during working hours and by f
37. e by adding one room after the other. Creating polygonal table windows doesn't make any sense. [Screenshot: the Readiris window with a polygonal window drawn on the image ENGLISH.JPG.] Furthermore, manual windowing can be combined with window sorting: you can draw new windows even when the sort mode is enabled. You then use sorting to include a number of detected windows and manually create some other windows where the page analysis didn't yield the appropriate results. As soon as you start creating windows in the sort mode, all zones you didn't select are promptly erased. To modify, move and delete windows, you need to select them first. To do so, select the Window Selection (or arrow) tool in the image toolbar and click inside a window. Rectangular markers now appear at each corner and in the middle of the window sides. To unselect windows, click the mouse button elsewhere. To select additional windows, hold down the Shift key while clicking on these extra windows. To select a window and the included windows of another type, hold down the Ctrl key while clicking on the main window. So much for selecting windows. To modify a window, select it, put your mouse cursor over a marker and drag the si
38. e consecutive OCR process. Select the command Adjust Image under the Process menu to do so. [Menu command: Adjust Image (Ctrl+J).] When you access this command, the black and white version is displayed automatically. It's as if you disabled the option Display Document in Color. There are some complicated concepts here, and we need to discuss them in detail. [Dialog: Adjust Image, with the options Smoothen Color Image, Brightness (Automatic or Manual) and Despeckle.] The option Smoothen Color Image renders greyscale and color images more homogeneous by flattening (smoothing out) relative differences in intensity. As a result, a stronger contrast is created between the foreground (the text) and the background (a color artwork etc.). This preprocessing feature may seem highly technical and difficult to understand, but it certainly has its role to play: with some scanner models, this reduction of the sharpness is needed to recognize color and greyscale images. Smoothening is sometimes the only way to separate text from the colored background. Below is a sample image that is simply illegible without image smoothing. [Sample image.]
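To make "flattening relative differences in intensity" concrete, here is a small sketch of one textbook smoothing technique, a 3 x 3 mean filter on a greyscale image. The manual does not say which algorithm Readiris applies, so this stands in purely as an illustration; the function name and the list-of-rows image format are assumptions.

```python
def smooth(image):
    """Apply a 3 x 3 mean filter to a greyscale image.

    The image is a list of rows of integer intensities. Each output pixel is
    the integer average of the pixel and its neighbours; at the borders only
    the neighbours that exist are averaged.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbours) // len(neighbours))
        result.append(row)
    return result


# An isolated intensity spike is spread out over its surroundings.
spike = [
    [0, 0, 0],
    [0, 90, 0],
    [0, 0, 0],
]
flattened = smooth(spike)
```

A textured background is evened out the same way, which makes a later split into text and background by a brightness threshold less sensitive to local variations.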
39. [Screenshot: the Readiris window with the ungridded table image loaded. Sample image text:] Ungridded tables don't have any borders around the cells. When the columns of ungridded tables are too widely spaced, the page analysis may not detect a table window to avoid confusion with columnized text blocks. When your tables exclusively contain numeric characters, enable the numeric reading mode with the Language button on the main toolbar for increased accuracy. Finally, you can send your tables of figures directly to Microsoft Excel by selecting the spreadsheet as target application (refer to the Format button on the main toolbar). 2002 Copyright Image Recognition Integrated Systems. Web site: http://www.irislink.com. For optimal OCR accuracy, you should limit recognition to the numeric symbols with the Language button. The numeric mode is not strictly numeric: it includes the symbols 0 to 9, comma, dot and the symbol [Dialog: Language, with the option Numeric enabled.] As you can only do this when the table doesn't contain any alphabetic symbols (otherwise the text portions won't be recognized correctly), we can activate the numeric mode now but couldn't do it for the first table. This time we will send the OCR result directly to the spreadsheet Microsoft
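The rule for when the numeric mode is safe (no alphabetic symbols anywhere in the table) can be written as a simple check. This is a sketch under stated assumptions: the symbol set below holds only the digits, the comma and the dot named in this excerpt, although the manual mentions further symbols that are not legible here; the function and constant names are invented for the example.

```python
# Assumption: only the symbols legible in the manual excerpt are listed here.
NUMERIC_SYMBOLS = set("0123456789,.")


def can_use_numeric_mode(cells):
    """Return True when every table cell holds only numeric-mode symbols.

    Whitespace inside a cell is ignored. A single alphabetic symbol anywhere
    in the table rules the numeric mode out.
    """
    return all(
        character in NUMERIC_SYMBOLS
        for cell in cells
        for character in cell
        if not character.isspace()
    )


figures_only = ["1,650", "24.5", "3 200"]
mixed_table = ["CD-ROM 4x speed", "150", "12.3"]
```

Here `can_use_numeric_mode(figures_only)` is true, while the row label in `mixed_table` makes the check fail, which mirrors why the numeric mode could not be used for the first table.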
40. end of the recognition, the target application is started up and the recognized document is opened inside a new text file or worksheet. [Message: Please wait while loading Microsoft Word 97, 2000, 2002.] Don't forget that the option Send to also allows you to copy the recognized text to the Windows clipboard, so there is no strict need to export the result or save it to an external file. SAVING THE RESULTS IN A TEXT FILE You can indeed write the OCR result to an external file. Here again, Readiris supports a wide range of file formats incorporating all popular wordprocessors, spreadsheets, web applications etc.: Microsoft Word DOC, RTF (Rich Text Format), HTML etc. [Dialog: Output, with the choice between Send to and External File, the option Open after Saving, the layout options and a format list that includes Adobe Acrobat PDF, Lotus WordPro, Microsoft Excel, Microsoft Word, Microsoft Works, OpenOffice.org Writer and Rich Text Format.]
41. ever your scanning mode may be, use a scanning resolution of 300 dpi for normal applications. Use a higher resolution of 400 dpi for small print (below 10 point) and when the document is very degraded. Readiris reads point sizes of 6 to 72 point (0.08 to 1 inch, or 0.21 to 2.54 cm). Readiris also recognizes drop letters, large caps that cover several lines. These can of course be no bigger than 72 point. [Sample: Readiris reads drop letters, also called drop caps, that cover several lines and assigns them to their starting line.] As optimal OCR requires a resolution between 300 and 400 dpi, Readiris warns you when you're submitting images with a resolution lower than 200 dpi or higher than 800 dpi. However, Readiris can correct scans with too much detail for you. Enable the option Optimize Resolution for OCR in the scan settings to do so. Whenever the image resolution of your scans exceeds 600 dpi, the resolution is reduced for the OCR process. There are other ways of avoiding this warning: you may be reading faxes, which have a resolution of 100 or 200 dpi, creating images with a digital camera, where the resolution is unknown, or opening images where the file header contains an incorrect resolution. To process such images hassle-free, enable the option Force to 300 dpi. This setting applies to both direct scanning and t
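The resolution guidelines in this excerpt involve a little arithmetic and a few thresholds, which the following Python sketch spells out. The function names and return labels are hypothetical, and the order in which the checks are applied is an assumption; only the numbers (300, 400, 200, 600 and 800 dpi, 6 to 72 point) come from the manual.

```python
POINTS_PER_INCH = 72
CM_PER_INCH = 2.54


def point_size_to_cm(points):
    """Convert a type size in points to centimetres (72 points per inch)."""
    return points / POINTS_PER_INCH * CM_PER_INCH


def recommended_dpi(smallest_point_size, degraded=False):
    """Suggest a scanning resolution following the guideline in the manual."""
    if smallest_point_size < 10 or degraded:
        return 400
    return 300


def resolution_action(dpi, optimize_for_ocr=False):
    """Classify an image resolution as 'warn', 'reduce' or 'accept'.

    The labels are made up for this sketch; the order of the checks is an
    assumption about how the rules combine.
    """
    if dpi < 200:
        return "warn"
    if optimize_for_ocr and dpi > 600:
        return "reduce"
    if dpi > 800:
        return "warn"
    return "accept"
```

With these definitions, 6 point works out to about 0.21 cm and 72 point to 2.54 cm, which matches the range quoted above; a 100 dpi fax triggers the warning, and a 1200 dpi scan is reduced when the optimizing option is on.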
42. h a carriage return added at the end of each line. This option is not available when the PDF format is selected: Adobe Acrobat PDF files always store text line by line. The Format button contains some formatting options we haven't discussed yet; this will be done shortly. SETTING UP YOUR SCANNER Let's set our scanner up now. It is assumed that the scanner hardware and necessary drivers are installed correctly. If your Readiris software licence was bundled with a scanner or digital camera, this step probably is unnecessary as your hardware may already be set up under Readiris. Click the Scanner button on the main toolbar. Click the button Scanner Model to determine your scanner model. [Dialog: Scanner, showing the scanner type, the format, the resolution (300), the buttons Scanner Model and Config, the brightness setting and the options Black and White, Greyscale, Invert, Digital Camera, Force as 300 dpi, Optimize Resolution for OCR and Smoothen Color Images.] When you select the option <Image> as scanner, prescanned images function as image source at all times: you won't even have to select the disk as image source with the Source button on the main toolbar. The Config button is only available when your scanner allows it. It gives access to some advanced scanning parameters: with Twain scanners, clicking the Config button allows you to select the Twain source
43. h numerous advanced features. We will discuss all major features in this chapter and add many tips and hints concerning the use of Readiris. STARTING THE SOFTWARE UP Click on the Readiris application in the submenu I.R.I.S. Applications - Readiris or click on the shortcut to the Readiris application on your desktop. [Screenshot: the submenu I.R.I.S. Applications - Readiris.] The Readiris startup screen and application window are displayed. The startup screen displays the version and copyrights of the Readiris software. It also gives direct access to I.R.I.S.'s home page: simply click on the URL to visit the I.R.I.S. web site. Clicking the mouse anywhere else makes this screen disappear. [Startup screen: Readiris release 9.0. 2002 Image Recognition Integrated Systems SA. All rights reserved. For more info on new products and upgrades, visit our web site www.irislink.com.] The next window concerns the OCR wizard: click Cancel for the time being. THE FIRST TIME STARTUP Depending on the software bundle you acquired, the first startup may be special: you may be prompted to register your licence. If this is the case, the use of Readiris is limited to 30 days and by registering you receive a free softkey from I.R.I.S. to continue using the software after the first month. It take
44. he opening of prescanned images. [Dialog options: Invert, Digital Camera, Force as 300 dpi, Smoothen Color Images, Load PDF Documents in Color.] When your images are acquired by a digital camera instead of a scanner, it is mandatory that you enable a special option that also applies to scans and prescanned images. [Dialog options, with Digital Camera enabled.] By doing this, you enhance the image before it gets recognized. There are specific challenges to be met when it comes to digital cameras: they produce low resolution images, even when you hold the camera very close over your document, and the image resolution is in any case unknown. There are some finer points to be aware of when it comes to successfully recognizing images captured with a digital camera. First of all, select the highest possible image resolution. Create, for instance, 2,048 x 1,536 size images when 1,024 x 768 and 640 x 480 images are also supported. Secondly, enable the macro mode of your camera to take closeups, which is always the case when you photograph documents. This mode was designed to capture flowers, insects etc. Otherwise, the images are unsharp and illegible. Limit yourself to no or small compression: important compression reduces the sharpness of
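Why camera images count as low resolution becomes clear with a rough calculation. The sketch below estimates the dots per inch a photo delivers, under the assumptions (not taken from the manual) that an A4 page of 210 x 297 mm fills the frame exactly and that the long side of the photo runs along the long side of the page; the function name is hypothetical.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # assumed page size for this example


def effective_dpi(image_pixels, page_mm=A4_MM):
    """Estimate the resolution of a photographed page in dots per inch.

    Assumes the page fills the frame exactly, long side along long side.
    The lower of the two per-axis values is returned as the limiting figure.
    """
    long_pixels, short_pixels = max(image_pixels), min(image_pixels)
    long_mm, short_mm = max(page_mm), min(page_mm)
    return min(
        long_pixels / (long_mm / MM_PER_INCH),
        short_pixels / (short_mm / MM_PER_INCH),
    )
```

Under these assumptions a 2,048 x 1,536 photo of a full A4 page yields roughly 175 dpi, a 1,024 x 768 photo roughly 88 dpi and a 640 x 480 photo roughly 55 dpi, all well below the 300 dpi recommended for scanning. That is the reasoning behind choosing the highest image size the camera offers.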
45. hen the context does not suffice, the user inter [Sample image.] READIRIS CHANGES LANGUAGES AS NEEDED But the buck doesn't stop here: Readiris can switch languages in the middle of a sentence without any help from the user. When Western words pop up in Greek, Cyrillic or Asian documents (many untranscribable proper names, brand names etc. are written using the familiar Western symbols), Readiris can switch to the correct alphabet automatically. In other words, you can activate a mixed alphabet of Greek, Cyrillic or Asian and Western characters. Be sure to select Greek-English or the appropriate Cyrillic language setting, for instance Byelorussian-English. In other words, don't try to just select Greek or Byelorussian as document language and hope that the Western symbols will come out fine. Here's an example where a Russian text contains some English words; open the image file ALPHABET.TIF if you want to try it for yourself. [Screenshot: the Readiris window with ALPHABET.TIF loaded and the language set to Russian-English.]
46. [Screenshot: the Readiris window after page analysis, with the detected windows drawn on the image.] Page decomposition uses three window types: text, graphic and table windows. Readiris discriminates text blocks, tables and graphic zones (containing photos, illustrations etc.) on the page. Saving graphics and recognizing tables will be discussed at great length below. A color code indicates the window type: text zones have a yellow border, graphic windows have a blue border and tables a purple border. The number of windows is indicated at all times in the tooltips of the Text Window, Graphic Window and Table Window tools. Page analysis is fast, skew tolerant and highly accurate: it traces complex, irregular shapes
47. ied: you get a continuous running text. All formatting, if any, is done afterwards by the user. If you retain the word and paragraph formatting, the font type, size and typestyle are maintained across the recognition. The justification of the paragraphs is also detected. However, no graphics are captured and the columns aren't recreated: the paragraphs just follow each other etc. To see the effect correctly, you need to enable the WYSIWYG mode of your wordprocessor, mostly called page layout mode. However, if you send the recognized document directly to Microsoft Word, the page or print layout view is activated automatically. [Screenshot: the View menu of Microsoft Word with the commands Normal, Web Layout, Print Layout and Outline.] In short, Readiris not only recognizes your texts but can format them for you as well. OCR isn't just text recognition anymore; it is becoming more and more page or document recognition as well. COLUMNS PLEASE NOT FRAMES The formatting option Use Columns instead of Frames determines how the autoformatting gets done: the text blocks, tables and graphics can either be stored in frames or in editable columns. Frames are separate containers for text used to po
48. illing out a registration form on the I.R.I.S. web site. [Screenshot: the Readiris help, topic Register Your Readiris Licence. Why you should register: Registering allows us to keep you informed of future product developments and related I.R.I.S. products. Registering entitles you to free product support and special offers. Depending on the software bundle, the softkey you'll receive in return may be needed to continue using Readiris after one month. How to register: start the registration wizard, send in your registration card by mail, or access the Readiris registration form on the I.R.I.S. web site.] The Readiris registration wizard, as you'll find under the menu Register of the Readiris software, can guide you through the registration process comfortably. Readiris registration wizard: Welcome to the Readiris registration wizard. It allows you to register your Readiris software license. Registering allows us to keep you informed of future product developments and related I.R.I.S. products. Registering entitles you to free product supp
49. ines are used to recognize them. [Sample image.] Letter quality dot matrix printing (also called 25 pin or NLQ dot matrix) requires the normal setting, as do the printing qualities typeset, typewritten, laser printed and inkjet printed. The setting Automatic means that Readiris will detect the font mode automatically. Let Readiris auto-detect the font mode in all cases unless you are sure only dot matrix documents are being read. Obviously, Automatic is the default value. [Menu: Font Type, including the setting Dot Matrix.] The font type is indicated in the tooltip of the Recognize button: when no message is added to the tooltip, the auto-detection of the printing quality applies; when the message Dot Matrix shows up in the tooltip, the dot matrix reading mode is enabled. [Tooltip: Perform text recognition, Dot Matrix.] The character pitch can be set with the command Character Pitch under the Settings menu. [Menu: Character Pitch, including the settings Fixed and Proportional.] With fixed or monospaced fonts, all symbols of the font have the same width. An "i" takes up as much horizontal space on a line as a "w", as is the case in this sentence. Think of documents produced using a typewriter, where the carriage moves a fixed distance for each typed symbol. A proportional pitch means that the width of a character depends on its shape. Symbols like "m" and "w" are wider (take more h
50. ious Page (PageUp), Next Page (PageDown).

Let's edit the document now. To delete a page from the document, hold your cursor over its thumbnail, right-click it and use the command Delete Page. To move a page up in the document, use the command Move Page Up; to move a page down, use the command Move Page Down.

[Screenshot: context menu with the commands Delete Page 4, Move Page 4 Up and Move Page 4 Down.]

STARTING A NEW DOCUMENT

You can use the command New Document under the File menu to close the current document.

[Screenshot: the command New Document (Ctrl+M).]

This command cleans the slate. Any document loaded into memory, whether it contains a single page or multiple pages, is erased. You are now ready to create a new document.

But you can also create a new document from within the current document. As long as the OCR has not been executed, the system assumes that you want to add pages to the current document. You can, for instance, scan all the pages in the scanner's autofeeder, fill the feeder again and start over. All pages scanned will compose a single document. Or you could scan a number of pages and add some image files, say faxes. These pages again form a single document; all you have to do is change the image source in between with the Source button.

When the OCR has already been executed and you re-initiate the scanning or the loading of images, you are prompted to start a new document or complete the current document.

Screenshot text (Readiris dialog): Are you ready to dele
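The page handling described in this part of the guide (delete a page, move it up or down, keep adding pages until the OCR has run, then choose between a new document and completing the current one) can be pictured with a small Python sketch. It is only an illustration, not Readiris code: the class `Document`, its method names and the `start_new` flag that stands in for the prompt are hypothetical, and resetting `ocr_done` after pages are added is an assumption.

```python
class Document:
    """Hypothetical model of a multipage Readiris document (illustration only)."""

    def __init__(self):
        self.pages = []        # page images in reading order
        self.ocr_done = False  # becomes True once recognition has been executed

    def add_pages(self, new_pages, start_new=None):
        """Add scanned or loaded pages.

        Before OCR, pages are simply appended to the current document.
        After OCR, the caller must answer the prompt: start_new=True clears
        the document first, start_new=False completes the current one.
        """
        if self.ocr_done:
            if start_new is None:
                raise ValueError("choose: start a new document or complete the current one")
            if start_new:
                self.pages = []
            # Assumption: the document needs recognition again after new pages arrive.
            self.ocr_done = False
        self.pages.extend(new_pages)

    def delete_page(self, index):
        del self.pages[index]

    def move_page_up(self, index):
        # The first page cannot move further up.
        if index > 0:
            self.pages[index - 1], self.pages[index] = self.pages[index], self.pages[index - 1]

    def move_page_down(self, index):
        # The last page cannot move further down.
        if index < len(self.pages) - 1:
            self.pages[index + 1], self.pages[index] = self.pages[index], self.pages[index + 1]


doc = Document()
doc.add_pages(["scan1", "scan2"])   # first batch from the feeder
doc.add_pages(["fax1"])             # still the same document: OCR has not run yet
doc.move_page_up(2)                 # pages are now scan1, fax1, scan2
```

The sketch keeps the prompt outside the class on purpose: the decision between "new" and "complete" belongs to the user, so the model only refuses to continue until that choice is supplied.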
51. [Screenshot: sample page about OCR displayed in the Readiris image zone.]

Readiris icon and they are promptly opened.

You can even open images from within the Windows Explorer: right-click an image file and select the command Recognize from the context menu. This command only appears when the file's file type is supported.

[Screenshot: Windows Explorer context menu for an image file.]
52. itable. Image-only PDF files are not.

Finally, converting PDF files is a way of "unlocking" PDF content. You can recognize read-only PDF documents where the text is normally inaccessible. With unprotected PDF files, the content can be retrieved, copied and saved to an RTF file; with read-only files, the content cannot be extracted. These documents can only be viewed and printed.

An important nuance: Readiris does not open password-protected PDF documents, even if all other PDF security barriers are broken down by Readiris.

Proceed as usual: load PDF files into memory as you open prescanned images, faxes, snapshots made with your digital camera etc. Still, there's a specific option that concerns PDF files: you can open them as color and as black-and-white documents. This option is offered because rasterizing color documents is much slower.

[Screenshot: Open dialog with the options Digital camera, Force as 300 dpi, Smoothen color images and Load PDF documents in color.]

RECOGNIZING MULTIPLE PAGES

After the OCR, the scanned image is redisplayed with the zoning as created, to be available for further processing.

You can now open the recognized text with your wordprocessor or text editor, import it into your desktop publishing software or any other text-based application. Go ahead and compare it with the image you have
53. l panel.

[Screenshot: Windows "Add or Remove Programs" panel with Readiris selected and the Change/Remove button.]

Follow the on-screen instructions to remove the Readiris software.

INSTALLING SOFTWARE OPTIONS

There's a single software option available for the Readiris software: the Asian OCR add-on. It allows you to read Japanese, Traditional Chinese, Simplified Chinese and Korean. This software is again delivered on an autorunning CD-ROM.

Screenshot text (Readiris help, topic "Software Option: Asian OCR Add-on", section "Reading Asian documents"): The software option Asian OCR Add-On offers recognition of the Asian language
54. ll use. Select a format that's supported by your paint or photo retouching software. The JPEG, TIFF and Paintbrush (PCX) formats are supported. Enable the option Greyscale/Color to save the graphic as a color or greyscale graphic.

[Screenshot: Save Graphics dialog with the file types TIFF (*.tif), Paintbrush (*.pcx) and JPEG (*.jpg).]

READING FAXES AND DEFERRED RECOGNITION

Saving images as image files opens another possibility: you can save the full page and perform "deferred" OCR on it later on. (That's what we did with the prescanned images of our tutorials.)

Simply scan the document. Select the command Save Full Page as Image under the File menu to save a single page. You'll again be prompted to save the entire page as a TIFF or Paintbrush (PCX) file.

Select the command Save All Pages as Image to save a multipage document. A single file format is available here: multipage TIFF.

You can now select the disk as image source and open the image file with the Open button or with the corresponding command under the Process menu. (If you use the Open command under the File menu, you don't even have to update the image source.)

As color, greyscale and black-and-white images are supported on an equal basis, Readiris opens Adobe Acrobat PDF documents, JPEG images, Pain
55. n the dictionary is full: the results of the learning are no longer held in memory or written to a dictionary.

You can set the dictionary mode inside the command Font Dictionary or directly under the Learn menu. Three dictionary modes are available: new, append and read.

[Screenshot: Learn menu with the commands New Font Dictionary, Append Font Dictionary, Read Font Dictionary and Interactive Learning.]

By selecting New Font Dictionary, you indicate that the training results will be saved in a new dictionary. (If you select an existing dictionary, its contents will be erased.)

The append mode indicates that the training results will be saved in an existing dictionary: the recognition makes use of the extra intelligence already contained in the dictionary and you add new font shapes to it. In simple terms, this option allows you to build up a font dictionary in several steps. (When you enter a filename for a new dictionary and activate the append mode, an empty font dictionary is created and you complete it.)

With the last option, Read Font Dictionary, the dictionary functions in read-only mode: you make use of the dictionary without adding new font shapes to it.

Select the new mode when a single page is recognized. To recognize many pages of the same type (pages with the same fonts and printing quality), select the new mode for the first page, the append mode for a few pages more and the read mode for the rest of the document(s)
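The three dictionary modes and the 500-shape limit mentioned in this guide can be summarized in a short Python sketch. This is an illustration under stated assumptions, not the way Readiris stores its dictionaries: the names `Mode`, `FontDictionary`, `learn` and `lookup` are hypothetical, and a plain mapping from a shape identifier to a character stands in for the real training data.

```python
from enum import Enum

MAX_SHAPES = 500  # limit stated in the guide: font dictionaries hold up to 500 shapes


class Mode(Enum):
    NEW = "new"        # start from an empty dictionary; existing contents are erased
    APPEND = "append"  # keep existing shapes and add new ones
    READ = "read"      # use existing shapes, never add any


class FontDictionary:
    """Hypothetical stand-in for a Readiris font dictionary (illustration only)."""

    def __init__(self, existing_shapes, mode):
        self.mode = mode
        # NEW discards whatever the selected file contained.
        self.shapes = {} if mode is Mode.NEW else dict(existing_shapes)

    def lookup(self, shape_id):
        """Return the trained character for a shape, or None if unknown."""
        return self.shapes.get(shape_id)

    def learn(self, shape_id, character):
        """Store a training result; return False when it cannot be stored."""
        if self.mode is Mode.READ:
            return False  # read-only: use the dictionary without extending it
        if shape_id not in self.shapes and len(self.shapes) >= MAX_SHAPES:
            return False  # dictionary is full: training no longer has effect
        self.shapes[shape_id] = character
        return True


# Multi-page workflow from the guide: new for page 1, append next, read for the rest.
first = FontDictionary({}, Mode.NEW)
first.learn("shape-1", "a")
later = FontDictionary(first.shapes, Mode.APPEND)
later.learn("shape-2", "b")
rest = FontDictionary(later.shapes, Mode.READ)
```

Modeling the mode as a property of the opened dictionary, rather than of each training call, mirrors how the guide describes it: you pick the mode once and it governs the whole recognition run.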
56. nd the text may run from top to bottom, from right to left. And if you forgot to select the proper language, select it afterwards: Readiris re-executes the page analysis automatically.

Some documents have many stray dots on the page, may generate a black page border around the actual image etc. To erase all small windows (it's assumed they don't contain any text) and re-sort the remaining zones, you can click the command Delete Small Windows under the Edit menu.

[Screenshot: the command Delete Small Windows (Ctrl+M).]

TWO: WINDOWING A SCANNED IMAGE MANUALLY

Page analysis is the automatic way of windowing a scanned page. Alternatively, you can zone an image manually with the windowing tools of Readiris.

To draw a rectangle around a zone of interest, select the corresponding tool in the image toolbar, click the cursor in the upper left corner of the window, stretch the window by moving the mouse to the lower right corner and click again. (Sides smaller than 1 mm are not allowed; they wouldn't even contain a single character anyway.)

The windows are automatically sorted in the order of creation; arrows indicate the sort order.

You can also frame irregular text blocks by drawing polygonal windows around them. Non-rectangular windows are created by merging rectangular zones: as soon as two rectangles of the same type intersect, they become a single window automatically. In a way, you're building a hous
57. nectionist, AutoFormat and Linguistic technology by I.R.I.S. I.R.I.S. holds the copyrights to the Readiris software, the OCR technology, the linguistic technology, the on-line help system and this manual.

AutoFormat, Cardiris, Connectionist, I.R.I.S., Linguistic Technology, the I.R.I.S. logo and Readiris are trademarks of I.R.I.S. Acrobat Reader and the PDF format are registered trademarks of Adobe. AsianBridge is a trademark of TwinBridge. AsianSuite is a trademark of UnionWay. Excel, Windows and Word are registered trademarks of Microsoft. Intel is a registered trademark of Intel. WordPerfect is a registered trademark of Corel.

Chapter 1: INSTALLATION

This chapter discusses the system requirements and installation of the Readiris software.

SYSTEM REQUIREMENTS

This is the minimal system configuration required to use Readiris:

- a 486-based Intel PC or compatible. A Pentium-based PC is recommended.
- 32 MB RAM. 64 MB RAM is recommended to process greyscale and color images.
- 110 MB free disk space. 95 MB of disk space suffices when you leave the sample files on the CD-ROM.
- the Windows XP, Windows ME, Windows 2000, Windows 98, Windows NT 4.0 or Windows 95 operating system.

Note that some scanner drivers may not work under the latest Windows version(s). Refer to the documentation supplied with your scanner to see which platforms are supported.

INSTALLING THE READIRIS SOFTWARE
58. [Screenshot: dialog options Force as 300 dpi and Smoothen color images.]

Place the pages of your document in the automatic document feeder and start the scanning: all pages are scanned until the document feeder is empty.

You can also open multiple prescanned images. To load several images, select the first image and hold down the Ctrl key as you select additional images. To load a continuous range of images, select the first image and hold down the Shift key as you select the last image.

[Screenshot: Open dialog listing the sample images, with english.jpg, alphabet.tif and autoform.jpg selected.]

The same effect can be obtained comfortably from within the Windows Explorer: select several image files, right-click and select the command Recognize from the context menu. You can repeat this operation: all images you send to Readiris are appended to the current document until you click the command New Document.

[Screenshot: Windows Explorer context menu for the selected image files.]
59. ng (font type [serif, sans serif, proportional, fixed, normal, condensed], point size and typestyle [bold, italic and underlined]) is retained across the recognition, and so is the paragraph formatting (the tabs and the alignment: left, centered, right and justified).

Don't confuse this formatting option with full autoformatting: this option just puts one paragraph after the other; it does not recreate columns or copy the relative position of the various zones.

SAVING GRAPHICS SEPARATELY

In our example, the graphic was included in the recognized text; whether this is the case depends on the formatting option Include Graphics. (Whether it is possible to save graphics inside the text again depends on the output mode. "Poor" text formats such as Text ANSI etc. don't store graphics.)

[Screenshot: Options with the check boxes Merge lines into paragraphs and Include graphics.]

Still, with Readiris you can save graphics without performing text recognition. As Readiris generates black-and-white, greyscale and color images, you can capture lineart graphics and photos.

How? Draw a graphic zone around the illustrations, cartoons etc. you need. Creating graphic windows manually is done in the same way as drawing text and table windows; simply select the Graphic Window tool now.

Next, choose the command Save Graphics under the File menu.

You are prompted to specify a filename. Determine which graphic file format you wi
60. [Table of contents, continued. Entries that did not survive extraction are marked as illegible.]

Table of Contents
[illegible entry]

Chapter 1: Installation
System Requirements
Installing the Readiris Software
Uninstalling the Readiris Software
  Readiris uninstall program
  Windows un-install wizard
Installing Software Options
Installing Related Products
[illegible entry]
Read Me files and [illegible]
[illegible entry]
[illegible entry]
Getting Product Support
Getting in Touch with I.R.I.S.

Chapter 2: Guided Tour
[illegible entry]
[illegible entry]
Discovering the Readiris Interface
Getting Started with a First Tutorial
[illegible entry]
One: Decomposing a Scanned Image
One and a Half: Sorting Windows
Two: Windowing a Scanned Image Manually
Three: Saving Windowing Templates
Readiris Takes You around the World
61. of Readiris are stored in the settings files.

SAVING SPECIFIC SETTINGS

The default settings will obviously be used at each program startup, but you can save specific settings as well to avoid having to redefine the operational parameters. The commands Save Settings and Load Settings under the File menu take care of this.

Let's give an example: if you regularly have to OCR English documents with a specific layout, you are recommended to create a settings file for this type of document. You would then select English as the document language, load a specific zoning template (to avoid having to reapply the same windowing each time), disable learning but activate a font dictionary in the read mode (because the same typefaces are used systematically) etc.

If you are unsure what the current settings are, you don't have to plunge into every menu and command to discover what they are. You can use the command Info from the File menu to get an overview.

[Screenshot: "Information on settings" window listing the scanner settings (model, resolution, format, mode), the text format settings (Microsoft Word, layout "Recreate source document") and the document settings (font type Automatic, language English).]

SCANNING DOCUMENTS

Now that our scanner is set up, we want to get started scanning documents. There are some elements you
62. on English. (To go to another letter, say "T", press BackSpace before you enter the "T" character.)

Readiris is far from limited to English: up to 104 languages are supported. All American and European languages are supported, including the Central European languages, Greek, Turkish, the Cyrillic ("Russian") and the Baltic languages.

Optionally, you can read Asian documents: the extra module Asian OCR add-on offers recognition of Japanese, Simplified Chinese, Traditional Chinese and Korean. (Simplified Chinese is used on China's mainland and in Singapore, whereas Traditional Chinese is used by Hong Kong, Taiwan, Macau and the overseas Chinese communities.)

Also note that the British and American (or should we say "international"?) variants of the English language are distinguished.

It takes the appropriate Windows configuration to display Central European, Greek, Turkish, Cyrillic and Baltic characters. You may have to install the Windows multilanguage support before your Windows system is able to cope with these languages.

On a Windows XP, 2000 and Windows NT 4.0 operating system, select the icon Regional Settings and Languages under the Control Panel.

Screenshot text (Regional and Language Options dialog, Languages tab): Text services and input languages. To view or change the languages and methods you can use to enter text, click Details. Supplemental language sup
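The letter-key navigation in the Language dialog (pressing a letter jumps to the first language with that initial, and pressing it again steps through the other languages sharing that initial) can be pictured with a small Python sketch. It is only an illustration, not Readiris code: the function name `next_language` and the short language list are hypothetical, the behavior when the current language already starts with the pressed letter is an assumption, and the BackSpace step is not modeled.

```python
def next_language(languages, current, key):
    """Return the language selected after pressing a letter key.

    languages: names in the order the list shows them
    current:   the language selected before the key press
    key:       the letter that was pressed
    """
    initial = key.upper()
    matches = [name for name in languages if name.upper().startswith(initial)]
    if not matches:
        return current  # no language starts with this letter: selection unchanged
    if current in matches:
        # Assumption: a repeated press moves to the next match and wraps around.
        return matches[(matches.index(current) + 1) % len(matches)]
    return matches[0]   # first press of this letter: jump to the first match


# Hypothetical excerpt of the language list, in alphabetical order.
LANGUAGES = ["Dutch", "English", "Estonian", "Occitan"]

selected = "Dutch"
history = []
for _ in range(3):
    selected = next_language(LANGUAGES, selected, "E")
    history.append(selected)
# history now holds the three selections made by pressing "E" three times
```

With only two "E" languages in the list, the third press wraps back to English, which matches the English/Estonian example in the guide.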
63. on is highly accurate and does not require detailed proofreading.

Don't confuse Finish with the Abort button: with Abort, no output is generated and you start all over; with Finish, the text is created, it just isn't proofread in detail.

THE ROLE OF FONT DICTIONARIES

The results of each training session are temporarily held in the computer's memory but can, and should, be stored in files called "dictionaries" for future use.

These font dictionaries should be loaded into memory when you want to recognize similar documents, in order to make use of the extra intelligence they contain. In this way, Readiris takes into account the intelligence stored in these "font libraries". You could say that Readiris gets more intelligent each time you use it.

How does this work? The operation of font dictionaries is controlled by the Learn menu: you have to select a dictionary with the command Font Dictionary and determine its mode of operation.

[Screenshot: Dictionary dialog with the file type Dictionary and the options New Dictionary, Append Dictionary and Read Dictionary.]

Font dictionaries are limited to 500 shapes and you are recommended to create separate dictionaries for specific applications, for instance per type of document. (Dictionaries have the default extension DUS.)

Training no longer has effect whe
64. orizontal space on a line) than the thin characters "I" or "4". Virtually all books, magazines and newspapers are printed in proportional pitch.

The simplest solution is to leave this option at all times on the default value Automatic, which means that Readiris will detect the character pitch automatically.

READIRIS GETS MORE INTELLIGENT EACH TIME

When the document language is selected and the document characteristics are set, enable the interactive learning and click the Recognize button.

[Screenshot: the Recognize and Learn buttons on the main toolbar.]

The OCR progress is indicated on screen. You can click the Stop button to abort the text recognition.

At the end of the recognition, Readiris enters the interactive learning phase when the learning is enabled with the Learn button on the main toolbar.

(Interactive learning does not apply to Asian documents: learning does not make sense for these languages, which use thousands of different symbols, and you'd have to be able to enter the ideograms, not an easy task when using a Western keyboard.)

Font training can substantially enhance the accuracy of the recognition system. When the user tries to read distorted, defaced forms (as are found in real documents) or stylized font shapes which Readiris does not recognize optimally, training can overcome this temporary failure. User learning is also used to train the system on special symbols which
65. ort and special offers. Depending on the software version you acquired, you'll receive the softkey in return, as may be needed to continue using the Readiris software after one month.

GETTING PRODUCT SUPPORT

The command Product Support under the Help menu of Readiris details how you can get technical support. Please describe the phenomenon you experience clearly and include all relevant data concerning Readiris, your scanner and your computer system.

[Screenshot: Readiris help, topic "How to Get Product Support". Free technical support is offered to all registered customers. Registering also entitles you to special offers. Europe: hotline 32 10 45 13 64 (working hours, all major languages), fax 32 10 45 34 43. USA: hotline 1 561 395 7851 or 800 477 4744 (working hours), fax 1 502 507 3418. The topic also points to troubleshooting information on the I.R.I.S. web site and to a support e-mail address.]

Chapter 2: GUIDED TOUR

Readiris is a state-of-the-art OCR package equipped wit
66. [Screenshot: Readiris window with the zoom options Fit to Window, Fit to Width, 50% Actual Size, Actual Size and 200% Actual Size.]

Finally, you can double-click the right mouse button over a region of the scanned image to zoom in at real size immediately. Repeat the operation to zoom out again.

ONE: DECOMPOSING A SCANNED IMAGE

Now that the image is scanned, you have to indicate which parts you want to convert into editable text by drawing frames, so-called "windows", around the zones of interest.

Actually, Readiris will do this for you automatically when the option Page Analysis is enabled on the main toolbar.

Automatic page decomposition is particularly useful when columnized texts and documents with a complex page layout, possibly including graphics and tables, are recognized.

[Screenshot: Readiris window showing the sample image english.jpg (C:\Program Files\Readiris\english.jpg, page 1 of 1).]
67. port. Most languages are installed by default. To install additional languages, select the appropriate check box below: "Install files for complex script and right-to-left languages (including Thai)" or "Install files for East Asian languages".

On a Windows ME and 98 operating system, select the icon Add/Remove Programs under the Control Panel to find out if the module Multilanguage Support is installed on your PC.

[Screenshot: Add/Remove Programs Properties dialog, Windows Setup tab, with the component Multilanguage Support in the list of components.]

To view and edit Asian documents, you can install an Asian version of the Windows operating system or run specialized emulating software such as UnionWay AsianSuite or TwinBridge AsianBridge on a Western version of Windows to correctly represent the ideograms of these Asian languages. Finally, you can use Word 2002 or 2000 to
68. [Screenshot, continued: sample page about OCR in the Readiris window, with the main toolbar buttons Recognize, English, Page Analysis, Learn, Format and Scanner.]

Page analysis is enabled by default. To force Readiris to decompose the current page (because you disabled page analysis by accident, because you erased some windows erroneously and want to redo the page analysis etc.), you can simply click the button Analyze Page in the image toolbar.

Select the document language before executing the page analysis when you are dealing with Asian documents. Specific routines are used for these languages: the interline spacing of Asian documents is in most cases bigger than in Western documents, the text is made up of small icons ("ideograms") that could easily be seen as graphic zones in Western documents a
69. recognized text or not is up to the user. You can perform OCR because you just need the text, in which case you will edit and format it yourself, and you can recreate the source document, including its formatting.

The various levels of formatting are creating body text, retaining the word and paragraph formatting and creating a facsimile copy.

Creating body text means no formatting is applied: you get a continuous running text. All formatting, if any, is done afterwards by the user.

If you retain the word and paragraph formatting, the font type, size and typestyle are maintained across the recognition. The justification of the paragraphs is also detected. However, no graphics are captured and the columns aren't recreated: the paragraphs just follow each other etc.

Autoformatting recreates a facsimile copy of the original document: the text blocks, graphics and tables are recreated in the same place and the word and paragraph formatting are maintained across the recognition.

As a result, you get a true copy of your source document, be it a compact and editable text file, no longer a scanned image of your document.

[End of the sample document shown in the Adobe Acrobat screenshot.]

The format PDF Image-Text yields different results: Readiris creates a searchable PDF file that contains the recognized text and the page image. The page image is containe
70. robat PDF. Readiris allows you to create PDF documents of two types: PDF Text and PDF Image-Text.

[Screenshot: Output dialog (Send to, External file, Clipboard) listing the formats Adobe Acrobat PDF Image-Text and Adobe Acrobat PDF Text (*.pdf), with the option Open after saving.]

What's the difference between the two? When you select the format PDF Text, Readiris creates a PDF file that contains the text result. (Graphics may occur, but only when graphic zones occur on the page: photographs, artwork etc.) In other words, the page image is not contained in the single-layered PDF file.

Screenshot text (Adobe Acrobat showing Autoformat.pdf): Autoformatting. The aim of autoformatting is to recreate a facsimile copy of the original document. The OCR process does more than just recognize your text: it can format it for you too. In a way, text recognition is becoming more and more page recognition or document recognition. Whether your OCR software reformats the
71. s Japanese, Simplified Chinese, Traditional Chinese and Korean. Tip: a large number of Asian languages such as Malay, Tagalog etc. are supported by the standard Readiris software because they use the Latin alphabet. What it takes (the working environment): to view and edit Asian documents, you can use Word 2002 (Office XP) or Word 2000 (Office 2000) or install a localized Asian version of the Windows operating system. Alternatively, you can run specialized emulating or overlay software such as UnionWay AsianSuite or TwinBridge AsianBridge on a Western version of Windows to correctly represent the ideograms of these languages.

By installing this option, specific documentation becomes available that discusses how you can recognize Asian documents.

[Screenshot: Start menu program group with the entries I.R.I.S. on the Internet, Reading Asian documents, Readiris, Uninstall Readiris and User's Manual.]

INSTALLING RELATED PRODUCTS

Depending on the software bundle you acquired, Readiris may be supplied with an evaluation version of the related product Cardiris, a business card organizer.

If this free software package is included on your Readiris CD-ROM, it is also installed using the autorunning CD-ROM and following the on-screen instructions.

Contact I.R.I.S. to learn more about complementary software: the command Contact I
72. s, Readiris allows you to archive a true copy of your documents, be it an editable and compact text file instead of a scanned image.

All this implies that the sorting of windows only partially applies when autoformatting is used: you can include and exclude zones, but any re-ordering of zones is simply ignored.

Here's an example of how it works. To get acquainted with this feature, open the image AUTOFORM.JPG which is found in your Readiris folder.

[Screenshot: Readiris window showing the sample image autoform.jpg (C:\Program Files\Readiris\autoform.jpg, page 1 of 1), a page about autoformatting.]
73. s Readiris reorganizes them in real cells and recreates the cell borders of the original tables.

In other words, Readiris allows you to archive a true copy of your documents, be it editable and compact text files instead of scanned images. Various levels of formatting are available; the choice is up to the user.

You can even recognize business cards with Readiris: scan your business cards, recognize them and convert them into an address database. (Think of your last exhibition, when you came back with an entire stack of business cards and it took your secretary two days to encode them.)

The card's data is extracted automatically from the image and the recognized data is assigned to specific database fields. Readiris extensively uses a knowledge database, thus acquiring the necessary intelligence to discriminate the first and last name, a city and its state, a telephone and a fax number etc. The resulting data can be sent directly to your contact management software such as Microsoft Outlook Express or any vCard-compliant application.

Readiris supports a wide range of popular scanners: numerous flatbed scanners, sheetfed scanners, "all-in-one" devices or "MFPs" (multifunctional peripherals) and digital cameras can be used. Readiris also supports the Twain scanning standard and some scanning platforms.

TABLE OF CONTENTS

Save Time: No More Retyping
74. s your identification number to generate the softkey; be sure that this number is available or mentioned when you register your licence.

(Screenshot: the registration dialog, which shows the identification number of this machine, states that a key is needed to enable the software and that I.R.I.S. should be contacted to obtain it, and offers a field for entering the key number.)

DISCOVERING THE READIRIS INTERFACE

The Readiris application window not only contains command menus but also two button bars that give quick access to all frequent commands. Initially, some command menus are dimmed: they concern the preview. As long as no image is opened, they are unavailable.

(Screenshot: the Readiris menu bar with the File, Edit, Settings, View, Process, Learn, Register and Help menus, and the main toolbar.)

The same goes for the image toolbar on the right side of the application window: it contains all commands you need during the image preview. The main toolbar on the left gives quick access to all frequent general commands. To learn which command corresponds to a certain button, hold your mouse pointer over it for a while: a tooltip will tell you what the button does.

The window pane or image zone is where the scanned images are displayed. You can drop image files onto the image zone
75. (Screenshot: the Readiris main window with a Dutch sample document open.)

READIRIS TAKES YOU AROUND THE WORLD

Assuming that the windows are correctly defined, you are now almost ready to execute the character recognition. We say "almost" because we haven't verified the language and document settings yet. The language setting can be found on the main toolbar. Click the Language button to modify the document language.

(Screenshot: the Language dialog, listing among others Numeric, English, Chinese Simplified, Chinese Traditional, Corsican, Croatian and Haitian Creole.)

You can press a letter key to move to a language directly: if English is currently selected and you want to select Occitan, you can press the O key on your keyboard to go directly to the Occitan language. When several languages have the same initial, press the letter several times to go through the options. Let's give an example: Readiris reads English and Estonian. By pressing E once you select English, by pressing E a second time you select Estonian, and by pressing E a third time you're back
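The letter-key navigation described above can be sketched as a small selection function. This is only an illustration of the cycling behaviour, not Readiris code: the name `next_language`, the alphabetical ordering of the list and the wrap-around rule are assumptions.

```python
def next_language(languages, current, key):
    """Return the language selected after pressing a letter key.

    Hypothetical sketch: the list is assumed to be sorted alphabetically
    and repeated presses are assumed to wrap around.
    """
    matches = sorted(l for l in languages if l.lower().startswith(key.lower()))
    if not matches:
        return current                  # no language starts with this letter
    if current in matches:
        # repeated press: advance to the next match, wrapping around
        return matches[(matches.index(current) + 1) % len(matches)]
    return matches[0]                   # first press: jump to the first match


languages = ["Dutch", "English", "Estonian", "Occitan"]
selected = "Dutch"
for _ in range(3):
    selected = next_language(languages, selected, "E")
    print(selected)
```

Starting from a language with a different initial, three presses of E visit English, Estonian and English again. In this sketch, pressing E while English is already selected moves straight on to Estonian.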
76. should be aware of. First of all, pay some attention to lineskew. Although the page analysis and recognition are skew tolerant, it may become difficult to window and OCR a page correctly when the skew is too significant. Limited lineskew (less than 0.5°) can be ignored because the OCR accuracy does not suffer.

The option Page Deskewing under the Settings menu determines whether pages which were scanned at an angle will be deskewed (straightened) automatically; limited lineskew gets ignored. This option is disabled by default.

If you forgot to enable this option, use the Deskew Page button on the image toolbar or the command Deskew Page under the Process menu to straighten pages which were scanned at an angle.

The deskewing takes a few seconds: the image is analyzed to detect the skew angle (if any), the color or greyscale image and its black-and-white version are deskewed, and the page analysis gets re-executed.

You may also need to adjust the page orientation. Use the rotation tools on the image toolbar. Corresponding commands are found under the View menu. Three rotation directions are available: to the left, to the right and upside down. Rotation also takes a few seconds, as the image itself is updated, not just the display on screen.

However, Readiris can correct badly oriented
77. sition several blocks of text, graphics and tables on a page. With columns, the text flows naturally from one column to the next, and columnized texts are much easier to edit. We now assume that real columns do occur on the scanned document: when the system is unable to detect columns in the source document, this formatting mode uses frames anyway as a fallback position. You can make good use of the image COLUMNS.TIF in the Readiris folder if you want to try it.

(Screenshot: the recognized document "columns" open in Microsoft Word, with Word's Columns dialog showing the number of columns, their width and spacing, and a preview.)
78. t methods when it comes to saving the OCR result: sending the recognized document directly to a target application, saving the result in an external file and copying the result to the Windows clipboard. The output target is selected using the Format button on the main toolbar or the command Text Format under the Settings menu.

(Screenshot: the Text Format dialog. Output: Send to, External file with Open after saving, Clipboard. Layout: Create body text, Retain word and paragraph formatting, Recreate source document, Use columns instead of frames. Options: Merge lines into paragraphs, Include graphics.)

The Send to feature offers a direct OCR link between your scanner and your Windows applications: you send the scanned documents directly to your wordprocessor, spreadsheet or web browser, to Adobe Acrobat Reader, etc.

(Screenshot: the Send to list, with targets including AbiSource AbiWord, Adobe Acrobat Reader (Image-Text), Adobe Acrobat Reader (Text), Clipboard, Corel WordPerfect, Microsoft Excel, Microsoft Internet Explorer, Microsoft Word 97, Word 2000 and Word 2002, Netscape, OpenOffice.org Writer 1.0, Software602 PC Suite, Sun StarOffice 6.0, Web browser and WordPad.)

At the
79. tbrush PCX images, DCX fax images (a multipage version of the Paintbrush format), PNG images, TIFF images (uncompressed, LZW, PackBits, Group 3 and Group 4 compressed), multipage TIFF images and Windows bitmaps (BMP).

This capability is particularly useful to convert your faxes into editable text files. Readiris uses extra intelligence when it comes to reading faxes: the software detects the typical fax resolutions (100 x 200 dpi "normal" quality, 200 x 200 dpi "fine" quality and 200 x 400 dpi "superfine" quality) and preprocesses these images automatically to ensure optimal OCR results. Nevertheless, it's still a good idea to ask your correspondents to send faxes with the "fine" quality: those faxes will yield better OCR results.

Don't forget that you can right-click on images in the Windows Explorer and select the command Recognize from the Context menu to open images. Alternatively, you can use drag and drop: drop image files from the Windows Explorer onto the image zone or icon of Readiris and they are promptly opened.

RECOGNIZING TABLES

So far we've recognized texts and faxes, and we've saved graphics. Let's process a table now. Take a table of figures and scan it, or open the sample image TABLES.JPG in your Readiris folder.

Actually, the image TABLES.JPG contains two tables, and that's no coincidence. The page analysis zones them as table windows and Readiris will recon
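The fax resolution detection mentioned above amounts to a lookup on the horizontal and vertical dpi of the image. The sketch below uses the three resolutions exactly as quoted in the text. Everything else is an assumption made for illustration: the names `FAX_QUALITIES`, `fax_quality` and `square_pixel_size` are hypothetical, and resampling to equal resolution on both axes is only one plausible form of preprocessing, not a description of what Readiris actually does.

```python
# Fax resolutions as quoted in the text: (first dpi value, second dpi value) -> quality
FAX_QUALITIES = {
    (100, 200): "normal",
    (200, 200): "fine",
    (200, 400): "superfine",
}


def fax_quality(x_dpi, y_dpi):
    """Return the fax quality name for a resolution pair, or None if it is not a fax resolution."""
    return FAX_QUALITIES.get((x_dpi, y_dpi))


def square_pixel_size(width, height, x_dpi, y_dpi):
    """Pixel dimensions after resampling so both axes share the higher resolution.

    Assumed preprocessing step, shown only to illustrate why unequal
    resolutions need special handling before recognition.
    """
    target = max(x_dpi, y_dpi)
    return (round(width * target / x_dpi), round(height * target / y_dpi))


print(fax_quality(200, 200))
print(square_pixel_size(850, 2200, 100, 200))
```

An image whose two resolutions differ has non-square pixels, so characters appear stretched until the axes are brought to the same resolution.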
80. te the current document? (Dialog buttons: Yes, No, Cancel.)

ORGANIZING THE TEXT OUTPUT

Saving or exporting the text means more than selecting an output method or defining a filename for the output file. You also select a file format and determine the appearance of the recognized text. In short, you have to decide where you want to take the text before you launch the execution.

Some options of the Format button allow you to influence the look of the text output. The text flow of the output document is directly influenced by the option Merge Lines into Paragraphs.

Keep this option enabled to have Readiris detect the paragraphs. Readiris will then apply the normal wordwrap typical of wordprocessors; otherwise a carriage return is added after each line and hyphenated words remain so. Paragraph detection is enabled by default.

Let's give an example to clear things up. When the first three lines of a column are "The new presi-", "dent waved from the balcony." and "His wife had joined him.", the paragraph detection gives you the following result: "The new president waved from the balcony. His wife had joined him." The hyphenated parts of the word "president" were reglued and a space was added at the end of the first sentence, thus creating naturally flowing text. Had paragraph detection not been enabled, the original layout would have been retained wit
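The effect of Merge Lines into Paragraphs on the example above can be reproduced with a few lines of code. This is a minimal sketch of the general idea, not Readiris's paragraph detection: the function name `merge_lines` is hypothetical, and it assumes a word split across lines is marked by a trailing hyphen. It would also glue genuinely hyphenated compounds that happen to break at a line end, which a real implementation would have to check against a dictionary.

```python
def merge_lines(lines):
    """Join OCR text lines into one paragraph, regluing hyphenated words.

    Hypothetical sketch: every trailing hyphen is treated as a word break.
    """
    paragraph = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue                              # skip empty lines
        if paragraph.endswith("-"):
            paragraph = paragraph[:-1] + line     # reglue the split word
        elif paragraph:
            paragraph += " " + line               # normal line end becomes a space
        else:
            paragraph = line
    return paragraph


column = ["The new presi-", "dent waved from the balcony.", "His wife had joined him."]
print(merge_lines(column))
```

With the three sample lines, the output is the single flowing sentence pair given in the text.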
81. the captured text. Zoom manually to crop your document; some cameras are bundled with photo stitching software, but don't bother using it for document capture. Hold the camera directly above the document to avoid capturing the document at an angle. However, avoid shadows cast on the document by the camera or your hand. Produce stable images. Consider mounting your camera on a tripod when necessary. Disable the flash when you're filming glossy paper, otherwise the image may be too light. Generally speaking, adapt the brightness and contrast to the environment (day light, lamp light, neon light, etc.). Some cameras can be calibrated by filming a white document.

To give it a try, open the image DIGITAL.JPG in the Readiris folder and execute the recognition.

(Screenshot: the Readiris window with the sample image C:\Program Files\Readiris\digital.jpg open, page 1 of 1.)

SAVING DEFAULT SETTINGS

Set all scanning parameters correctly and click the command Save Default Settings under the File menu to save the current settings as default settings for future use.

Settings files contain more than the scanner settings: they also determine whether you are going to use interactive learning, which language the documents have, which output mode is used (for instance, send text to WordPad), etc. In short, all operational settings
82. to include black and white photos, scan in greyscales; to include color pictures, scan in color. But why would you reduce the bit depth of the images during the scan? It goes without saying that greyscale and color images are slower to acquire and require more RAM memory than bilevel images.

Scanning in greyscale and color isn't just useful to save the graphics with sufficient quality; in some instances it's also useful or necessary to obtain good OCR results. When text is printed on a color background, scanning in color may create the tone differences that are lacking in black and white images. When there is only limited contrast between the text and the background, the background can create noise that renders the recognition difficult or impossible. Think for instance of black text printed on a dark background: when scanning such a document in black and white, you may not be able to drop the background color without losing the text information as well, as much as you may try to adjust the scanner brightness.

Sample image text: MASAYOSHI SON, 42, president and CEO, is the master Net empire builder. His conglomerate holds stakes in 300 Internet companies in the U.S., Japan, Europe and other Asian countries. Today Softbank manages about 4 billion in venture capital funds for global investments. YASUMITSU SHIGETA, 35, has invested in more than 70 Web or mobile Net-based ventures in Japan and the U.S., including Tumble
83. ut for instance a 50 page report where the header and footer should be excluded for obvious reasons: a single template can be applied to zone all 50 pages.

When you load a template into memory, page analysis is disabled automatically. The zoning template remains active until you re-enable page analysis on the main toolbar.

Actually, there's a nice alternative for zoning templates: the preview tool Ignore Exterior Zone limits the page decomposition to the cropped portion of the image. Select this tool and frame the portion of the image you want to process. When you're dealing with a multipage document, you can exclude the same outer zone from page analysis on every page. Re-execute the page analysis to cancel the image cropping or change the zones manually.

(Screenshot: the Readiris main window with a Dutch sample document open and Dutch selected as the document language.)
84. weed Communications and Phone.com. Shigeta is also developing new businesses that take advantage of the growth of the Internet and mobile communications.

(Sample image: a second rendering of the same clipping.)

Readiris creates a black and white version for every greyscale and color image. Thanks to its intelligent routines, even tough cases get solved: here's how a difficult image gets binarized.

(Sample image: the binarized version of the same clipping.)

To view a scanned image in black and white, disable the option Display Document in Color under the View menu.

DIFFERENT DEVICES DIFFERENT RESOLUTION

What
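Binarization, the step that produces the black and white version mentioned above, can be illustrated with a simple global threshold. This sketch is not the "intelligent routines" the text refers to, whose details are not given. The function names `binarize` and `midpoint_threshold` are hypothetical, and greyscale values are assumed to run from 0 (black) to 255 (white).

```python
def binarize(pixels, threshold=128):
    """Map greyscale values (0 = black, 255 = white) to 0 (black) or 1 (white)."""
    return [[1 if value >= threshold else 0 for value in row] for row in pixels]


def midpoint_threshold(pixels):
    """Threshold halfway between the darkest and the lightest pixel of the image."""
    flat = [value for row in pixels for value in row]
    return (min(flat) + max(flat)) / 2


# Dark text (30) on a dark background (60): limited contrast
page = [
    [60, 60, 30, 60],
    [60, 30, 30, 60],
]

print(binarize(page))                            # fixed threshold: text and background merge
print(binarize(page, midpoint_threshold(page)))  # adapted threshold: text is separated
```

With the fixed threshold every pixel turns black and the text is lost, which is the situation the manual warns about for dark backgrounds. A threshold adapted to the image keeps the text apart from the background.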
85. y to your wordprocessor and spreadsheet. To recognize faxes and convert PDF documents, you can drag the image files from the Windows Explorer to the Readiris application window. Or right-click on an image to send it promptly to Readiris.

Readiris recognizes tabular data and recreates them as worksheets or as table objects inside your wordprocessor: your numeric data are immediately ready for further processing.

Based on the Connectionist technology from I.R.I.S., Readiris represents the best OCR has to offer. Font-independent feature extraction is complemented by self-learning techniques derived from a proprietary neural network. The system can learn new characters through context analysis; linguistic knowledge about syllables and words improves the OCR performance.

Readiris supports up to 104 languages: all American and European languages are supported, including the Central European languages, the Baltic languages, Greek and the Cyrillic ("Russian") languages. Optionally, you can read four Asian languages: Japanese, Simplified and Traditional Chinese, and Korean.

Readiris even copes with mixed alphabets: the software detects "Western" words that pop up in Greek, Cyrillic and Asian documents (many untranscribable proper names, brand names, etc. are written using the Western symbols).

Readiris uses linguistics during the recognition phase, not after it. As a direct result, Readiris recognizes documents of all kinds with
