Interactive document summarization
Contents
1. i.e., each weight in a vector is divided by the square root of the sum of all the squares of the unnormalized weights for the vector. Referring now to FIG. 5, the process of the present invention will now be described. When a document is to be summarized 501 with the present invention, it must first be determined 503 where the sentence breaks are in the document. (Note that the sentence break determination approach of the preferred embodiment of the present invention is shown, in the C programming language format, in Appendix A to the present specification.) The next step is to determine the sentence ranking within the document being summarized. This is accomplished by first 505 building an index, which is a database representing the contents of the sentences in the document in the form of statistics about the words in those sentences, a process which is well known in the art. Then 507 the entire original document is treated as a query to the corpus of individual sentences in the document, in accordance with the standard vector model approach. The result is a score indicating how well each sentence matches the query of the entire document, and hence the output of the queries is a rank-ordered list, by score, of all the sentences in the document 509. Then the desired number of sentences to include in the document summary display is determined 511, once a ranked list of each sentence in the original document is obtained, by examining either a pre
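The normalization described at the start of this excerpt, and the vector model comparison it supports, can be written compactly. The symbols below (w_i for an unnormalized term weight, \hat{w}_i for its normalized value, s and d for a sentence vector and the whole-document query vector, m for the number of terms) are notation assumed here for illustration and do not appear in the specification. The inner product of normalized vectors (cosine similarity) is shown as one common form of the vector model score, not necessarily the one used in the preferred embodiment.

```latex
\hat{w}_i = \frac{w_i}{\sqrt{\sum_{j=1}^{m} w_j^{2}}},
\qquad
\mathrm{score}(\mathbf{s}, \mathbf{d}) = \sum_{i=1}^{m} \hat{s}_i \, \hat{d}_i
```

For instance, unnormalized weights (3, 4) have a sum of squares of 25, so each weight is divided by 5, giving (0.6, 0.8).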
2. [Drawing sheets 7 of 8 and 8 of 8, U.S. Patent 5,867,164, Feb. 2, 1999; the rotated figure text is not recoverable from the scan.] INTERACTIVE DOCUMENT SUMMARIZATION. A portion of the disclosure of this patent document contains material which is subject to copyright protection. The c
3. [Drawing sheet 2 of 8, U.S. Patent 5,867,164, Feb. 2, 1999; the rotated figure text is not recoverable from the scan.]
4. and Images, 1994, pp. 141-148; Frakes and Baeza-Yates, Information Retrieval: Data Structures & Algorithms, 1992, pp. 363-392; Salvador and Zamora, Automatic Abstracting and Indexing II: Production of Indicative Abstracts by Application of Contextual Inference and Syntactic Coherence Criteria, Journal of American Society for Information Science, 1971, pp. 260-274; Edmundson, New Methods in Automatic Extracting, Journal of the Association for Computing Machinery, vol. 16, No. 2, 1969, pp. 264-285; Edmundson, Problems in Automatic Abstracting, Communications of the ACM, vol. 7, No. 4, 1964, pp. 259-263; Edmundson, Automatic Abstracting and Indexing: Survey and Recommendations, Communications of the ACM, vol. 4, No. 5, 1961, pp. 226-234; Rose et al., Content Awareness in a File System Interface: Implementing the Pile Metaphor for Organizing Information, ACM Press, SIGIR 93, pp. 260-269; Mander et al., A Pile Metaphor for Supporting Casual Organization of Information, CHI 92 Conference Proceedings, ACM Press, Human Factors in Computing Systems, 1992, pp. 627-634; Kupiec et al., A Trainable Document Summarizer, ACM Press, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1995, pp. 68-73; Inside Architext, The Herring Reporter, 1995, pp. 45-47; Gerald Salton, entitled The Smart Retrieval Sys
5. has little interest, as well as review of up to the entire document in the case of great user interest. Furthermore, such interactive control allows the user to expand and contract summarized documents at will, thus freeing the user to focus on the content of the summarized document rather than on trying to determine what amount or percentage is sufficient or how the underlying abstracting mechanism operates. BRIEF DESCRIPTION OF THE DRAWINGS. The present invention is illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which like references indicate similar elements, and in which: FIG. 1 is a diagram of a typical computer system as might be used with the present invention; FIG. 2 is a sample summary document window according to one implementation of the present invention wherein "All" of the original document to be summarized is displayable; FIG. 3 is a sample summary document window according to one implementation of the present invention wherein one-eighth of the original document to be summarized is displayable; FIG. 4 is a sample summary document window according to one implementation of the present invention wherein "One" most representative sentence of the original document to be summarized is displayable; FIG. 5 is a flowchart of the document summarization methodology according to one implementation of the present invention; FIG. 6 is a sample user interface display show
6. is changed 513, displaying the larger or smaller summary is a relatively simple matter of merely displaying more or fewer sentences, as dictated by the previously generated relevance-ranked list. In other words, by precomputing the relevance ranking, displaying more or less detail can be accomplished quickly, without an additional query having to be performed for each change in the slider position. Further, in the preferred embodiment of the present invention, displaying more or less detail is done using an offscreen bitmap, a technique well known in the computer art. Using an offscreen bitmap makes the display appear to have the sentences instantly inserted or deleted in place, rather than having the entire document summary appear to scroll from the top down whenever the user asks for more or less detail. Note that the present invention has numerous applications. One obvious application would be as part of a document browser or within a document retrieval context, thus allowing more rapid review of a corpus of documents. The present invention is equally useful within an electronic mail context, where the user can view a summary of the electronic mail received and can then determine whether more or less of the contents of the entire electronic mail message(s) is desired. Another useful application of the present invention is within the user interface of a modern computer sy
7. more sentences of the document based upon the relevance ranking of the one or more sentences of the document; repeatedly: specifying a subsequent number of the separate one or more sentences of the document by user control of a user control means; and displaying the summary including the subsequent number of the ranked one or more sentences of the document. 20. The method of claim 19 wherein the user control means is continuously variable. 21. The method of claim 20 wherein the user control of the continuously variable user control means is by moving a slider displayed on the electronic display. 22. The method of claim 20 wherein the user control of the continuously variable user control means is by rotating a knob displayed on the electronic display. 23. A method for a user to display a summary of a document on a display means, comprising the following steps: separating the document into its constituent parts; ranking the relevance of the separate constituent parts of the document to the document as a whole; displaying on the display means an initial number of the separate constituent parts of the document based upon the relevance ranking of the separate constituent parts of the document; repeatedly: specifying a subsequent number of the separate constituent parts of the document by user control of a continuously variable control means; and displaying on the display me
8. relevance to the document as a whole. 30. The computer system medium of claim 28 wherein the user control of the continuously variable user control means is by moving a slider displayed on the electronic display. 31. The computer system medium of claim 28 wherein the user control of the continuously variable user control means is by moving a mechanical slider. 32. The computer system medium of claim 28 wherein the user control of the continuously variable user control means is by rotating a knob. 33. A computer system medium containing a series of instructions configured to cause a computer system to perform steps of the following method: a) dividing a document into separate portions; b) relevance ranking the separate portions of the document to the document as a whole; c) receiving a specified number of the separate portions of the document specified by a continuously variable user interface element; and d) displaying the specified number of the separate portions of the document in an order determined by the relevance ranking. 34. The computer system medium of claim 33 wherein the user interface element includes a slider. 35. The computer system medium of claim 33 wherein the user interface element includes a knob. 36. The computer system medium of claim 33 further comprising instructions to cause the computer system to continuously repeat the steps of receiving a specified number of the separate portion
9. the slider, the window instantaneously updates to display a summary with more or less detail and in the same order as the original document. Thus, as the user moves the slider to ask for more detail, the summarized document appears to grow, with the ever-increasing number of sentences instantly appearing in their original order and paragraph structure, with the upper limit being the entire original document. And as the user moves the slider to ask for less detail, the summarized document appears to shrink, with the sentences instantly disappearing and the remaining sentences within each remaining paragraph collapsing to form new summary paragraphs, with the lower limit being the one sentence most characteristic of the entire document according to the summarization mechanism. And again, the interface mechanism of the preferred embodiment of the present invention operates as simply as having the user manipulate a cursor control device, such as a mouse, trackball or trackpad, to move a slider control on the computer display to indicate that more or less summary information is desired. Referring now to FIG. 2, a sample screen from the system before it has summarized the document can be seen. In the figure, a document summary window 201 can be seen wherein the slider 203 is set to "All", indicating that all of the sentences in the original document are to be shown. The scroll bar 205 on the right-hand side of the window, a standard feature of
10. would mere up and down buttons having discrete quantized levels. A slider control combined with immediate display feedback (immediately displaying more or fewer sentences in the document being summarized as the user moves the slider) means the user only has to be concerned about whether the amount of summarized information being displayed is of the desired quantity. And the present invention has clear advantages over requiring the user to specify actual summary values or percentages. Just as in the case of a light dimmer switch, where the user only knows that they want more or less light rather than, say, knowing that what they want is 15% more light or 22% less light, the slider control of the present invention avoids placing on the user the additional cognitive load of first estimating the new amount desired. In other words, after the user determined that more or less summary information was desired, if the interface mechanism required specifying a summary percentage or utilizing up and down buttons, then the user would have to be concerned with exactly how much more or less information is truly desired. It is less intuitive to require the user desiring more information to first determine that 49% isn't enough but that 58% is sufficient, or to try a series of static up and down clicks until the desired amount is obtained. The more intuitive interaction mechanism of th
11. [Drawing sheets 3 of 8 and 4 of 8, U.S. Patent 5,867,164, Feb. 2, 1999; the rotated figure text is not recoverable from the scan.]
12. [Drawing sheets 5 of 8 and 6 of 8, U.S. Patent 5,867,164, Feb. 2, 1999; the rotated figure text is not recoverable from the scan.]
13. United States Patent [19], Bornstein et al. US005867164A. [11] Patent Number: 5,867,164. [45] Date of Patent: Feb. 2, 1999. [54] INTERACTIVE DOCUMENT SUMMARIZATION. [75] Inventors: Jeremy J. Bornstein, Menlo Park; Douglass R. Cutting, Oakland; John D. Hatton, Mt. Hermon; Daniel E. Rose, Cupertino, all of Calif. [73] Assignee: Apple Computer, Inc., Cupertino, Calif. [*] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2). [21] Appl. No.: 536,020. [22] Filed: Sep. 29, 1995. [51] Int. Cl.: G06F 15/00. [52] U.S. Cl.: 345/357; 345/340. [58] Field of Search: 395/357, 340, 395/339, 341, 346, 336, 331, 329, 326, 348, 349, 350, 351, 352, 353, 354, 355, 356. [56] References Cited. U.S. PATENT DOCUMENTS: 5,168,533, 12/1992, Kato et al., 382/54; 5,278,980, 1/1994, Pedersen et al., 395/600; 5,477,451, 12/1995, Brown et al., 364/419; 5,483,468, 1/1996, Chen et al., 364/551.01. OTHER PUBLICATIONS: Salton & McGill, The Smart and Sire Experimental Retrieval Systems, 1983, pp. 120-123; Salton & Buckley, Term Weighting Approaches in Automatic Text Retrieval, Information Processing & Management, vol. 24, No. 5, pp. 513-523; Witten, Moffat & Bell, Managing Gigabytes: Compressing and Indexing Documents
14. ans the subsequent number of the ranked separate constituent parts of the document. 24. The method of claim 23 wherein the constituent parts of the document are sentences of the document. 25. The method of claim 24 wherein the user control of the continuously variable control means is by moving a slider displayed on the display means. 26. The method of claim 24 wherein the user control of the continuously variable control means is by rotating a knob displayed on the display means. 27. A computer system medium containing a series of instructions configured to cause a computer system to perform steps of the following method for displaying on an electronic display a summary of a document comprising one or more sentences: ranking the relevance of the one or more sentences of the document to the document as a whole; displaying on the electronic display an initial number of the one or more sentences of the document based upon the ranking of the one or more sentences of the document; specifying a subsequent number of the one or more sentences of the document by user control of a continuously variable user control means; displaying on the electronic display the subsequent number of the ranked one or more sentences of the document. 28. The computer system medium of claim 27 wherein the user control means is continuously variable. 29. The computer system medium of claim 27 wherein ranking the one or more sentences of the document is by
15. ard. 7. The computer system of claim 5 wherein the input device is a mouse. 8. The computer system of claim 1 wherein the separate portions are sentences of the document. 9. The computer system of claim 1 wherein ranking the separate portions of the document from highest to lowest is based upon the relevance of the separate portions to the entire document. 10. A computer system for displaying a summary of a document comprising: a document containing one or more separate sentences; a relevance ranking means for ranking the relevance of the one or more separate sentences to the document as a whole; a continuously variable graphical control means for specifying an amount of the document to be included in the summary; a display means for displaying the summary of the document based upon the specified document summary amount and the ranked relevance of the one or more sentences. 11. The computer system of claim 10 wherein the continuously variable control means is a slider means displayed on the display means. 12. The computer system of claim 10 wherein the continuously variable control means is a rotating knob displayed on the display means. 13. A method for displaying on an electronic display a summary of a document comprising one or more sentences, the method comprising the following steps: ranking the relevance of the one or more sentences of the document to the document as a whole; displaying on the electronic
16. contextual inference and/or syntactic coherence. However, again, regardless of the sophistication of the summarization mechanism (and note that the present invention is equally applicable to document summarization using any reasonable summarization mechanism now known or later developed), it is highly unlikely that any particular summarization mechanism will always generate the degree of detail desired by the user. As such, the present invention provides the user with a control mechanism to vary the degree of summary detail so as to suit the particular user's tastes and interests at that point in time and for that particular purpose. In the preferred embodiment of the present invention, a summarization engine (again, any reasonable summarization mechanism would work with the present invention) running on a personal computer is used to rank all of the sentences in a document from most to least representative. The user interacts with the system by adjusting a slider control displayed in a graphical user interface of the computer system. As the user moves the slider to a given position, the engine returns the top n sentences, where n is based on the slider's position. The sentences' original order and paragraph structure are maintained in the preferred embodiment of the present invention as a summary consisting of those n sentences is displayed in a window on the computer screen. The effect of the present invention is that as the user moves
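The "top n sentences in original order" behavior described in this excerpt can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code of the preferred embodiment: the function names `sentences_for_slider` and `select_summary`, the representation of the slider as a fraction between 0.0 and 1.0, and the linear mapping from slider position to n are all hypothetical. The specification only says that n is based on the slider's position, with one sentence as the lower limit and the entire document as the upper limit.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: ascending sentence index, i.e. document order. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical mapping: slider position 0.0 ("One") .. 1.0 ("All")
 * to a sentence count between 1 and total (assumed linear). */
size_t sentences_for_slider(double pos, size_t total)
{
    if (total == 0)
        return 0;
    if (pos < 0.0) pos = 0.0;
    if (pos > 1.0) pos = 1.0;
    return 1 + (size_t)(pos * (double)(total - 1) + 0.5);
}

/* ranked[] holds sentence indices, most representative first.
 * Copies the top n into out[] and restores original document order.
 * Returns the number of sentences written. */
size_t select_summary(const int *ranked, size_t total, size_t n, int *out)
{
    if (n > total)
        n = total;
    for (size_t i = 0; i < n; i++)
        out[i] = ranked[i];
    qsort(out, n, sizeof out[0], cmp_int);
    return n;
}
```

In this sketch only `select_summary` needs to run when the slider moves; the ranked list itself is computed once and reused for every slider position.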
17. d
    switch (*buf) {                          /* Consider the current character in the buffer */
    case '"':
        if (buf[-1] == '.')                  /* handle: Suzanne said "I love you." */
            /* If it's a quotation mark preceded by a period, we found a sentence break */
            conclusive_sentence = True;
        break;
    case '.':
        lookahead = buf + 1;                 /* If it's a period, consider next character */
        if (*lookahead == '.') {             /* handle ellipses */
            /* If part of an ellipsis, consider the character after the last period */
            while (*lookahead == '.' && lookahead < last_loc_of_buffer)
                lookahead++;
        }
        if (lookahead >= last_loc_of_buffer) {   /* no more characters */
            buf = lookahead;
            break;
        }
        /* rule out some abbreviations by checking for space followed by capital letter */
        bool was_space_after_period = False;
        while (isspace(*lookahead) &&        /* skip white space */
               lookahead < last_loc_of_buffer) {
            lookahead++;
            /* Was there a space after the period? If so, it might be a sentence break */
            was_space_after_period = True;
        }
        if (lookahead == last_loc_of_buffer) {
            buf = lookahead;
            break;
        }
        if (!was_space_after_period)
            break;
        /* things a sentence can start with here: if we have a quote, bullet or dash after the space, we'll treat this as a sentence break */
        if (*lookahead == '"' || *lookahead == '•' || *lookahead == '-') {
            conclusive_sentence = True;
            break;
        } else if (!isupper(*lookahead))
            break;                           /* If lowercase letter after period, it's not a sentence break */
        /* otherwise c
18. display an initial number of the one or more sentences of the document based upon the ranking of the relevance of the one or more sentences of the document to the document as a whole; specifying a subsequent number of the one or more sentences of the document by user control of a continuously variable graphical user control means; displaying on the electronic display the subsequent number of the ranked one or more sentences of the document. 14. The method of claim 13 wherein the user control means is continuously variable. 15. The method of claim 13 wherein ranking the one or more sentences of the document is by relevance to the document as a whole. 16. The method of claim 14 wherein the user control of the continuously variable user control means is by moving a slider displayed on the electronic display. 17. The method of claim 14 wherein the user control of the continuously variable user control means is by moving a mechanical slider. 18. The method of claim 14 wherein the user control of the continuously variable user control means is by rotating a knob. 19. A method for a user to display a summary of a document on an electronic display, the document comprising one or more sentences, the method comprising the following steps: separating the one or more sentences of the document; ranking the relevance of the one or more sentences of the document to the document as a whole; displaying an initial number of the separate one or
19. e here that the examples of FIGS. 2-4 are merely static points in time and that the user has the flexibility to continuously alter the slider position. In this way, the user might first see the summary window as it appears in FIG. 3, wherein one-eighth of the document is displayed. Then the user might continuously move the slider towards the "All" setting, thus requesting more and more of the document be displayed in the summary window, until he reaches the summary window as it appears in FIG. 2, wherein all of the original document is available for viewing. Then the user might decide that less of the document is desired to be viewed and thus move the slider back towards the "One" setting, such that the system is continuously showing less and less of the original document. Finally, the user might end up moving the slider all the way down to the "One" setting, wherein only the one most indicative sentence is displayed in the document summary window, as it appears in FIG. 4. As just explained, a significant advantage of the present invention lies in the use of the slider or knob user interface control. Just as in the case of a dimmer switch to control room lighting, which provides direct feedback by having the light get brighter or dimmer as the user moves the slider or knob control as well as having an essentially infinite number of settings, using a slider or knob control in the present invention has greater intuitiveness and utility than
20. e present invention allows the user to interactively operate a continuously variable control while providing immediate display feedback of the greater or lesser information until the user determines that the appropriate amount of information is displayed. Thus, another advantage of the present invention, as alluded to above, is that the user has the option of continuously changing the amount of summary information being displayed, which thus facilitates the user requesting more and more of the original document as the greater and greater summary amount further piques the user's interest. And then, after the user has read the desired amount of document summary, the user still has the option of decreasing the final amount of summary information. This has the added benefit of providing the reader with as much information as desired while still facilitating minimal document summaries which might then be used in other ways (e.g., see below regarding View by Sentence and comment window applications). A general overview of the summarization engine of the present invention will now be explained. Note first, however, that any of a large variety of well-known summarization techniques are equally applicable to the present invention. In many prior art document retrieval systems, a vector model approach has been taken, where each record or document is represented by a vector representative of the distribution of terms in the document. A part
21. f it. Whether one is reviewing a previously arranged set of documents (as in the case of reading an on-line newspaper or magazine), reviewing the results of an electronic search, or scanning documents stored on a large hard disk drive of a personal computer, it can still take considerable time to read more than a minimal amount. What is needed, therefore, is a facility which provides a summary or abstract of each document. Having a summary of each document allows the reader to determine whether that document is of interest and hence whether reading more of the document might be desirable. Conversely, reading the summary of a document could suffice to inform the reader about the document, or instead could indicate to the reader that the particular document is not of interest. No matter the result, a good document abstract mechanism could be quite valuable in the modern digital world. However, a good document abstract mechanism means more than merely providing an automatic summary of a document. Prior approaches to document summarization, or Automatic Sentence Extraction as discussed on pages 87-89 of the Introduction to Modern Information Retrieval by Salton and McGill, Copyright 1983 (incorporated herein by reference in its entirety), have yet to yield abstracts in a readable natural language context which obey normal stylistic constraints. Salton and McGill further state that "[r]eadable extracts are obtainable
22. ffer before determining conclusively whether it's a sent or not. ran_out_of_buffer gives that indicator */
    return buf - start_of_sent;
}
We claim: 1. A computer system with a direct manipulation interface comprising: continuously variable graphical user control means for setting a level indicator in the computer system; a separating means for dividing a document into separate portions; a ranking means for ranking the separate portions of the document from highest to lowest relevance according to the relevance of the separate portions of the document to the document as a whole; a summary producing means for extracting as many of the highest ranking separate portions of the document as dictated by the level indicator setting; a display means for displaying the extracted separate portions of the document on a display screen of the computer system. 2. The computer system of claim 1 wherein the user control means is continuously variable. 3. The computer system of claim 2 wherein the continuously variable user control means is a slider means displayed on the display screen of the computer system. 4. The computer system of claim 1 wherein the user control means is built into the display screen of the computer system. 5. The computer system of claim 1 wherein the user control means is an input device of the computer system. 6. The computer system of claim 5 wherein the input device is a keybo
23. head;
            break;
        }
        /* detect list items lacking sentence punctuation clues: if the newline is followed by another, or a tab, or 3 or more spaces, it's a sentence break */
        /* skip space that might be between two returns */
        if (*lookahead == '\n' || *lookahead == '\r'      /* two returns */
            || *lookahead == '\t'                          /* return followed by a tab => paragraph delimiter */
            || (*lookahead == ' ' && lookahead[1] == ' '
                && lookahead[2] == ' ')) {                 /* return followed by 3 or more spaces */
            conclusive_sentence = True;
            break;
        }
        while (isspace(*lookahead) &&                      /* skip white space */
               lookahead < last_loc_of_buffer)
            lookahead++;
        if (lookahead >= last_loc_of_buffer) {
            buf = lookahead;
            break;
        }
        /* Ditto if followed by a bullet or two hyphens */
        if (*lookahead == '•' || (*lookahead == '-' && lookahead[1] == '-')) {
            conclusive_sentence = True;
            break;
        }
        break;
    /* Back to our initial character. If a question mark or exclamation point, it's a break */
    case '?':
    case '!':
        conclusive_sentence = True;
        /* if a period, !, or ? is immediately followed by a double quote, count the quote as part of the sentence */
        if (buf[1] == '"')
            buf++;
        break;
    default:
        break;
    }
    buf++;
} while (!conclusive_sentence && buf < last_loc_of_buffer);
*ran_out_of_buffer = !conclusive_sentence;
/* return the length even if we ran out of bu
24. heck if it was just an abbreviation */
        /* now we check for Mr., Mrs., etc.; currently handles Dr., Mr., Mrs., Ms., i.e. */
        if (buf - start_of_sent > 2)
            switch (buf[-1]) {
            case 'r':
                if (buf[-2] == 'M' || buf[-2] == 'D')      /* Dr., Mr. */
                    abrev = True;
                break;
            case 's':
                if (buf[-2] == 'M')                        /* Ms. */
                    abrev = True;
                break;
            case 'e':
                if (buf[-2] == '.' && buf[-3] == 'i')      /* i.e. */
                    abrev = True;
                break;
            }
        if (buf - start_of_sent > 3 && buf[-1] == 's' && buf[-2] == 'r' && buf[-3] == 'M')
            abrev = True;
        /* special case: if a period is immediately followed by a double quote, count the quote as part of the sentence */
        if (!abrev && buf[1] == '"')
            buf++;
        conclusive_sentence = !abrev;
        /* if we get here, it's the simple case of end-of-sentence break, that is: hello there. Go away now. */
        break;
    /* catch & separate list items here (expensive) */
    /* back to our initial character. If it wasn't a quote or period, what was it? */
    case '\r':
        /* This section is trying to separate lists of items (e.g., bullets) that may not use punctuation to separate the items */
    case '\n':
        if (remove_returns)
            *buf = ' ';                      /* replace the return with a space */
APPENDIX A (continued)
        lookahead = buf + 1;
        while (*lookahead == ' ' && lookahead < last_loc_of_buffer)
            lookahead++;
        if (lookahead >= last_loc_of_buffer) {
            buf = looka
25. icular search query is then represented as a vector, such that the retrieval of a particular record or document then depends upon the magnitude of a similarity computation between the particular document's representative vector and the query's representative vector. Suffice it to say that the vector model of document comparison is well known in the art of computer search and retrieval mechanisms (see Salton and McGill, Introduction to Modern Information Retrieval, 1983, pages 120-123; Salton and Buckley, Term Weighting Approaches in Automatic Text Retrieval, Information Processing & Management, Vol. 24, No. 5, pp. 513-523; Witten, Moffat and Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, 1994, pp. 141-148; and Frakes and Baeza-Yates, Information Retrieval: Data Structures & Algorithms, 1992, pp. 363-392, all incorporated herein by reference in their entirety). Typical prior art search and retrieval mechanisms, however, attempt to find, out of a corpus comprised of multiple documents, one or more documents which are most similar to a single query (which may itself be a document). Instead, the preferred embodiment of the present invention treats each sentence in the document to be summarized as being equivalent to an entire document, and thus the set of all of the sentences of the document can be treated as the corpus of documents to be searched. Then the present invention treats the text of the origi
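The idea in this excerpt, treating every sentence as a document in the corpus and scoring each against one query vector, can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the summarization engine of the preferred embodiment: the `ranked_sentence` type, the function names, and the dense arrays of non-negative term weights over a shared vocabulary are all hypothetical. The sketch ranks by squared cosine similarity, which gives the same ordering as cosine similarity when all weights are non-negative and avoids a square root.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record pairing a sentence's position in the document
 * with its similarity score against the whole-document query. */
typedef struct {
    size_t index;
    double score;
} ranked_sentence;

/* Squared cosine similarity of two term-weight vectors of length n.
 * Returns 0 when either vector is all zeros. */
static double cosine_squared(const double *a, const double *b, size_t n)
{
    double dot = 0.0, aa = 0.0, bb = 0.0;
    for (size_t i = 0; i < n; i++) {
        dot += a[i] * b[i];
        aa += a[i] * a[i];
        bb += b[i] * b[i];
    }
    if (aa == 0.0 || bb == 0.0)
        return 0.0;
    return (dot * dot) / (aa * bb);
}

/* qsort comparator: higher score first; ties fall back to document order. */
static int by_score_desc(const void *x, const void *y)
{
    const ranked_sentence *p = x, *q = y;
    if (p->score != q->score)
        return (p->score < q->score) - (p->score > q->score);
    return (p->index > q->index) - (p->index < q->index);
}

/* sentences: count rows of vocab weights each, stored contiguously.
 * document: the query vector built from the entire document.
 * Fills out[0..count-1], sorted most representative first. */
void rank_sentences(const double *sentences, size_t count, size_t vocab,
                    const double *document, ranked_sentence *out)
{
    for (size_t i = 0; i < count; i++) {
        out[i].index = i;
        out[i].score = cosine_squared(sentences + i * vocab, document, vocab);
    }
    qsort(out, count, sizeof out[0], by_score_desc);
}
```

A caller would build one weight vector per sentence plus one for the whole document, call `rank_sentences` once, and keep the resulting order for later display.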
26. ing some or all of the top sentence of each document in a display line or listing of documents in a computer system user interface; FIG. 7 is a sample user interface display showing the top sentence of a document in a comments field of an informational window of the document in a computer system user interface; FIG. 8 is a sample user interface display showing the top sentence of a document in a pop-up area of a display line or listing of documents in a computer system user interface; and FIG. 9 is a sample user interface display showing the top sentence of a document in an open dialog box in a computer system user interface. SUMMARY AND OBJECTS OF THE INVENTION It is an object of the present invention to provide an interactive document summarization system. It is a further object of the present invention to provide an interactive document summarization system wherein the user of the system can control the amount of the document summary. It is a still further object of the present invention to provide a file listing containing document summary information. It is an even further object of the present invention to provide document summary information about a document in a variety of contexts. The foregoing and other advantages are provided by a method for a user to display a summary of a document on an electronic display, the document comprising one or more sentences, said method compri
27. n expanded display in a display line or listing of documents when the user positioned a pointer over the document name or icon, when in a particular expanded display mode, or when depressing a particular keyboard key and/or mouse button combination, as is shown in FIG. 8. Still further, such display could also take the form of an open dialog box where, instead of displaying a thumbnail (miniature image) of a graphic image document or merely the first sentence of a textual document, a summary comprised of a top sentence or sentences could be displayed, as is shown in FIG. 9. An additional feature of the user interface document summary mechanism is the option, as in the more general document summary invention described above, for the user to control whether more or less of the document summary is to be displayed. In other words, while the default setting of a graphical user interface which displayed the "show top sentence" option might typically be to show only the one top sentence, the user could have the option of displaying a greater number of representative sentences from the summarized document. Such additional sentences might simply wrap onto the next line of the display, or instead might only be displayed when the user positioned a pointer over the document name or icon, when in a particular mode (e.g., similarly to the standard Macintosh Finder Balloon Help feature), or when depressing a particular keyboard key and/or mo
28. nal document as the query to be applied to the corpus. In this way, a determination can be made as to how similar each sentence in the document is to the document as a whole. The result is a ranking or value score for each sentence in the document being summarized. Then, depending upon either a preset value n or the user-specified slider setting n, only those sentences above the ranking or value score of n get displayed in the document summary. Furthermore, the present invention, as is common in the art, uses term weighting to provide distinctions between the various terms (or, in the present invention, words) in a document. The present invention utilizes a well-known term weighting formula (see, e.g., page 518 of Salton and Buckley in the "Term Weighting Approaches in Automatic Text Retrieval" article referred to above and incorporated herein), wherein the term weighting components are as follows: tf = the number of times a term (word) occurs in a sentence or in a document as a whole; N = the number of sentences in the document; and n = the number of sentences in the document which contain a given term. The term weighting formula is applied to both document and query vector terms and is "tfc", where "t" is replaced by log(tf + 1) (to better normalize long documents and to keep things positive), "f" is replaced with log(N/n + 1) (to permit a search for a word that occurs in every sentence to in fact find every sentence), and "c" is unaltered
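The weighting just described can be written out as a worked formula. This is a sketch under assumptions, not a statement of the exact implementation: it assumes the "t" and "f" components read log(tf + 1) and log(N/n + 1), that "tfc" multiplies the two components and then applies the cosine normalization of the standard vector model (each weight divided by the square root of the sum of the squared unnormalized weights), and that a sentence is scored against the document by the inner product of the two normalized vectors. The symbols u, w, s, d and the indices i, j are introduced here for illustration only.

```latex
% u_i: unnormalized weight of term i; w_i: weight after the "c" normalization.
\[
  u_{i} = \log(\mathit{tf}_{i} + 1)\,\log\!\left(\frac{N}{n_{i}} + 1\right),
  \qquad
  w_{i} = \frac{u_{i}}{\sqrt{\sum_{j} u_{j}^{2}}}
\]
% Assumed similarity between sentence s and whole document d (inner product).
\[
  \operatorname{score}(s, d) = \sum_{i} w_{i}^{(s)}\, w_{i}^{(d)}
\]
```

As a check on the stated motivation: for a word that occurs once in every sentence, tf = 1 and n = N, so the unnormalized weight is log(2) times log(2), which is greater than zero. A plain log(N/n) factor would instead give zero and the word could never contribute to a match.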
29. opyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights. REFERENCE TO RELATED APPLICATIONS The present application is related to co-pending U.S. patent application having attorney docket number P1809, filed on the same day as the present application, assigned to the same assignee and having the same inventive entity. FIELD OF THE INVENTION The present invention relates to the field of document summarization, which is otherwise known as automatic abstracting, wherein an extract of a document (i.e., a selection of sentences from the document) can serve as an abstract. BACKGROUND OF THE INVENTION The advent of the personal computer and modern telecommunications has resulted in millions of computer users communicating with each other around the globe. One of the primary uses of such computers by such users is accessing the vast store of digital information which has been created over the last several decades. Further, additional digital information is created daily due to both the conversion of information previously unavailable digitally and the large amount of new information created by an ever-increasing computer user population. One concern with this vast, ever-increasing amount of digital information is the time it takes to read even a small portion o
30. ously variable user interface element; and (d) a displayer coupled to the ranker, the receiver and a display, for displaying on the display the specified number of the separate portions of the document in an order determined by the relevance ranking. 42. The computer system of claim 40 wherein the user interface element includes a slider on the display. 43. The computer system of claim 40 wherein the user interface element includes a knob on the display. 44. The computer system of claim 40 further comprising an iterator for continuously activating the receiver and the displayer for displaying the specified number of the separate portions of the document in an order determined by the relevance ranking.
31. out_of_buffer bool first_in_paragraph bool remove_returns first_in_paragraph False chew up leading whitespace char last_loc_of_buffer buf length 1 identify if this is the start of a paragraph bool last_was_return False while isspace buf && buf last_loc_of_buffer switch buf case Ar case m if last_was_return return followed by return first_in_paragraph True else last_was_return True break case t if last_was_return return followed by tab else last_was_return False something came after the preceding return other than a return or tab break case if last_was_return && isspace buf 1 return followed by more than one white space first_in_paragraph True else last_was_return False something came after the preceding return other than a return or tab break default break buf start_of_sent buf ran_out_of_buffer True if buf > last_loc_of_buffer start_of_sent 0 return 0 note that past this point we'll return sum length even if we hit end of the buffer before concluding a sent. Now we start looking for the end of the sentence start_of_sent buf bool conclusive_sentence False bool abrev False char lookahead do we're going to repeat a big loop until we find a sentence break or run out of characters in the buffer
32. s of the document and displaying the specified number of the separate portions of the document in an order determined by the relevance ranking. 37. A computer system method comprising the following steps: (a) dividing a document into separate portions; (b) relevance ranking the separate portions of the document to the document as a whole; (c) receiving a specified number of the separate portions of the document specified by a continuously variable user interface element; and (d) displaying the specified number of the separate portions of the document in an order determined by the relevance ranking. 38. The computer system method of claim 37 wherein the user interface element includes a slider. 39. The computer system method of claim 37 wherein the user interface element includes a knob. 40. The computer system method of claim 37 further comprising continuously repeating the steps of receiving a specified number of the separate portions of the document and displaying the specified number of the separate portions of the document in an order determined by the relevance ranking. 41. A computer system comprising: (a) a separator for dividing a document into separate portions; (b) a ranker coupled to said separator for relevance ranking the separate portions of the document to the document as a whole; (c) a receiver for receiving a specified number of the separate portions of the document specified by a continu
33. set value or the slider position value, which thus indicates how far down the ranked list to go. Again, the markers on the slider could be represented as a proportional amount of the entire document, as a numeric value of the number of sentences of the total document, or even as a non-linear value indicator of the total document. While this last form may not sound as intuitive as the former ones, it is important to note that studies have shown that most of the content of a document can be understood by only reading a relatively small amount of the entire document (e.g., 20-25%). Further, remember that the user interface of the present invention frees the user to focus on the displayed summary content rather than on some more obscure summary percentage or value. As such, a non-linear slider may provide even greater utility to the user of the present invention. Lastly, the slider position is monitored 513 so that if the user changes its position, thus indicating a desire for more or less information, the appropriate amount of summary information, based on the new slider position 511, can be displayed. It is important to note a performance advantage in the process just described. In the preferred embodiment of the present invention, because the query 507 asked for all of the sentences in the document before concerning itself with how many sentences will be displayed, every sentence in the document gets a ranking 509. Then, whenever the slider position
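The performance point above (rank every sentence once, then only re-slice the ranked list when the slider moves) can be sketched in C as follows. This is an illustrative sketch, not the patented implementation: the RankedSentence type, the function names, the linear mapping from slider position to sentence count, and the rule that at least one sentence is always shown are all assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical record: one entry per sentence of the document. */
typedef struct {
    int index;      /* position of the sentence in the original document */
    double score;   /* similarity of the sentence to the whole document */
} RankedSentence;

/* qsort comparator: higher score first; ties fall back to document order. */
static int by_score_desc(const void *a, const void *b)
{
    const RankedSentence *x = a;
    const RankedSentence *y = b;
    if (x->score > y->score) return -1;
    if (x->score < y->score) return 1;
    return x->index - y->index;
}

/* Done once per document (steps 507 and 509): rank all sentences. */
void rank_sentences(const double *scores, int count, RankedSentence *out)
{
    for (int i = 0; i < count; i++) {
        out[i].index = i;
        out[i].score = scores[i];
    }
    qsort(out, (size_t)count, sizeof out[0], by_score_desc);
}

/* Done on every slider change (steps 511 and 513): map a slider position
 * in [0.0, 1.0] to how far down the ranked list to go. Assumes a linear
 * slider; a non-linear slider would replace only this function. */
int sentences_to_show(double slider, int count)
{
    int k;
    if (count <= 0) return 0;
    if (slider < 0.0) slider = 0.0;
    if (slider > 1.0) slider = 1.0;
    k = (int)(slider * count + 0.5);   /* round to the nearest whole sentence */
    return k < 1 ? 1 : k;              /* assumed: always show the top sentence */
}
```

With this split, a slider change costs only a call to sentences_to_show followed by redisplaying the first k entries of the already ranked array; no re-query of the index is needed.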
34. sing the steps of: (i) separating the one or more sentences of the document; (ii) ranking the relevance of the separate one or more sentences of the document to the document as a whole; (iii) displaying an initial number of said separate one or more sentences of said document based upon the relevance ranking of said one or more sentences of said document; and (iv) repeatedly specifying a subsequent number of said separate one or more sentences of said document by user control of a user control means and displaying said subsequent number of said ranked separate one or more sentences of said document. The foregoing and other advantages are also provided by a computer system for displaying a summary of a document comprising: (i) a document containing one or more separate sentences; (ii) a relevance ranking means for ranking the relevance of the one or more separate sentences to the document as a whole; (iii) a continuously variable control means for specifying an amount of the document to be included in the summary; and (iv) a display means for displaying the summary of the document based upon the specified document summary amount and the ranked relevance of the one or more sentences. Other objects, features and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description which follows. DETAILED DESCRIPTION OF THE INVENTION The present invention can be implemented on all kinds of compu
35. stem such as the Apple Macintosh Finder, where stored documents, either locally stored (e.g., on a hard disk drive of the computer) or remotely stored (e.g., across a network or even across the internet), can be displayed by name, application type, date created, etc. When using such an interface, a user is oftentimes faced with a window displaying a long list of such stored documents without much hint as to what the documents actually contain. While documents or files are often given a particular name in order to provide a hint of their content or subject matter, the user is still often left wondering what a particular document or documents contain. As such, using the summarization engine of the present invention, the system could provide a "show top sentence" option. This option would display to the user the one sentence of a document which is most indicative of the contents of that document. Such display could take the form of a portion of the display line or listing of documents in a computer system user interface, as in a Finder folder window of the Macintosh computer system, as is shown in FIG. 6, wherein the amount of the top sentence displayed is limited by the amount of window display space allotted to this field. Such display could also take the form of being displayed in a comments field of an informational window about the document in a computer system user interface, as is shown in FIG. 7. Such display could also take the form of being a
36. tem: Experiments in Automatic Document Processing, Copyright 1971, Prentice-Hall, Inc., pp. 144-156. Trafton, User's Manual for Ed Search Desktop Edition, 1995, pp. 1-8, 10-11. Primary Examiner: Raymond J. Bayerl. Assistant Examiner: Steven P. Sax. Attorney, Agent, or Firm: Edward W. Scott, IV. [57] ABSTRACT A real-time interactive document summarization system which allows the user to continuously control the amount of detail to be included in a document summary. 44 Claims, 8 Drawing Sheets. [FIG. 1 (U.S. Patent, Feb. 2, 1999, Sheet 1 of 8): block diagram of the computer system.] [FIG. 5: flowchart with the steps Input Document to be Summarized (501); Find Sentence Breaks in Document (503); Build Index Database for Document (505); Query Index to Compute Similarity Scores between each Sentence and entire Document (507); Prepare Relevance Ranked List of all Sentences in the Document (509); Display Desired Number of Sentences based on Slider Position (511); Has Slider Position Changed? (513).]
37. ter systems. Regardless of the manner in which the present invention is implemented, the basic operation of a computer system embodying the present invention, including the software and electronics which allow it to be performed, can be described with reference to the block diagram of FIG. 1, wherein numeral 10 indicates a central processing unit (CPU) which controls the overall operations of the computer system, numeral 12 indicates a standard display device such as a CRT or LCD, numeral 14 indicates an input device which usually includes both a standard keyboard and a pointer-controlling device such as a mouse, and numeral 16 indicates a memory device which stores programs according to which the CPU 10 carries out various predefined tasks. The interactive document summarization program according to the present invention, for example, is generally also stored in this memory 16 to be referenced by the CPU 10. As stated above, the process of document summarization, or automatic abstracting, is well known in the art. A variety of different mechanisms, used singly and in combination, have been tried to automatically create document summaries or abstracts. Such mechanisms typically start with determining the significance of particular words and/or sentences, usually by focusing on position in the document, semantic relationships and term frequencies. Further criteria may include
38. the standard Macintosh Finder user interface environment, indicates that there is more of the document that exists than can fit within the window 201 displayed on the screen (in other words, while the "ALL" setting allows viewing of the entire document, not all of the document may be displayable at a given point in time due to display screen and/or window size constraints). In this example, the original document contains 32 sentences and, with this window size, would fill several screens of text. Referring now to FIG. 3, the user has moved the slider 203 (typically via a cursor control device such as a mouse, trackball or trackpad) to indicate that he only wants a summary one-eighth the size of the original document (note that predetermined summarization settings, wherein the system automatically generates a preset amount of summarization according to previously set system or user values, are equally supportable with the present invention) to be displayed within the document summary window 201. The summary now fits within the window 201, as indicated by the empty scroll bar 205 on the right-hand side of the summary window. Referring now to FIG. 4, the user has now moved the slider 203 to indicate that he only wants a summary which shows the one sentence deemed by the summarization engine to be most representative of the document's content to be displayed within the document summary window 201. It is important to not
39. use button combination. A large variety of display options is thus possible with the approach of the present invention, depending upon such factors as display size and resolution, user preferences, and system capabilities. In the foregoing specification, the present invention has been described with reference to a specific exemplary embodiment and alternative embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specifications and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. APPENDIX A find_next_sentence: On return, start_of_sent will be > buf if first chars encountered are whitespace. Normally returns length of sentence starting from returned value of start_of_sent. If it returns 0, then it ran out of buffer before finding a sentence. The caller will typically move the remaining text to the beginning of a buffer, fill up the buffer, and then call this again. The case where a complete sentence does not fit in the buffer should be checked by the caller. Can't handle "see J.P. there" or "call A. Morgan's". Handles Mr., Mrs., Ms., Dr., and i.e. int find_next_sentence char buf uint32 length char start_of_sent bool ran
40. without excessive difficulties, but perfection cannot be expected within the foreseeable future. One difficulty with prior document abstract mechanisms, even when overcoming many of the natural language barriers, is that the system or mechanism can never know for certain whether the user is receiving as much or as little of an abstract as they would like. In other words, no matter how well the mechanism can determine which portions of the document to include in the summary or abstract, the mechanism can never automatically include just the right amount of abstract to always please the user. This can be due to different users' interest levels, different users' reasons for reviewing the document, and even time- or situation-varying interests of the same user. As such, what is needed is not necessarily a better abstracting algorithm as much as a mechanism which allows the user to interactively specify whether the present abstract is sufficient or, instead, whether more or less of the original document should be included in the abstract or summary. The present invention utilizes an interactive control which allows the user to specify whether more or less of the original document should be included in the document summary. Allowing the user to interactively control how much of the original document gets included in the summary facilitates rapid review of documents in which the user