Indexed Search Reference
Contents
1. phash: The phash of the indexed document.
phash_t3: The phash of the parent TYPO3 page of the indexed document. If the document being indexed is a TYPO3 page, then phash and phash_t3 are the same. But if the document is an external file (PDF, Word etc.) found as a LINK on a TYPO3 page, then phash_t3 points to the phash of that TYPO3 page. Normally indexing goes like this: 1) the TYPO3 document is indexed (this has a phash value, of course), then 2) if any external files are found on the page, they are indexed as well AND their phash_t3 becomes the phash of the TYPO3 page they were on. The significance of this value is that indexed external files may have more than one record in index_section with the same phash: one record for each parent page where a link to the document was found. There are details about this in the section of this document that describes the complexities of indexing pages.
rl0: The id of the root page of the site.
rl1: The id of the level 1 page (if any) of the indexed page.
rl2: The id of the level 2 page (if any) of the indexed page.
page_id: The page id of the indexed page.
uniqid: This is just an auto-incremented unique primary key. Generally not used, I think.
index_fulltext: For free text searching, e.g. with a sentence, in all content: title, description, keywords, body.
phash: The phash of the indexed document.
fulltextdata: The total
2. type required. Same reasons: results are bound to a page in the page tree.
The process of indexing a directory of files is the same as for the external URL. For each directory, a) all files are indexed and b) all subdirectories are added to the crawler queue for later processing. This is shown in the crawler log:
[Screenshot: crawler log for the page "File archive", listing queue entries for files under fileadmin/templates/ with scheduled time, run time and status]
When processing is done, the result is shown in Web > Info > Indexed search.
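The directory handling described above (index the files, queue the subdirectories) can be sketched as follows. This is a minimal illustration, not the extension's code: the function name, the in-memory deque standing in for the crawler queue, and the exact meaning of the depth limit are all assumptions.

```python
import os
from collections import deque


def crawl_directory(root, max_depth=2, extensions=None):
    """Collect files to index, one directory per queue entry.

    For each directory taken from the queue, (a) its files are collected
    and (b) its subdirectories are queued for later processing.
    `max_depth` and `extensions` are hypothetical stand-ins for the
    "Depth" and "Limit to extensions" settings.
    """
    queue = deque([(root, 0)])  # stand-in for the crawler queue
    indexed = []
    while queue:
        directory, depth = queue.popleft()
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_file():
                    ext = os.path.splitext(entry.name)[1].lstrip(".").lower()
                    if extensions is None or ext in extensions:
                        indexed.append(entry.path)
                elif entry.is_dir() and depth < max_depth:
                    # Subdirectories are not processed now, only queued.
                    queue.append((entry.path, depth + 1))
    return indexed
```

Processing one directory per queue entry keeps each step small, which matches the way the crawler log shows one entry per processed item.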
3. Location
You must place the indexing configuration on the page where you want the search results to be displayed, typically on the page where a plugin exists that can process the parameters pointing to the record. In the case below, the Indexing Configuration is placed on the same page as the frontend plugin ("Morbi diam enim") that can display the search results.
[Screenshot: page content listing showing the plugin element and the Indexing Configuration record on the same page]
The configuration record looks like this:
[Screenshot: Indexing configuration record with the fields Alternative Source Page, Fields (first is title), GET parameter string (with UID substitution), Calculate cHash (force caching), How many records to index a minute (default is 100) and Index Records immediately when saved]
If the records you want to index are not located on the page where the indexing configuration and frontend plugin are, then you can point to the location. Notice how the field with GET parameters is used to define how the searc
4. [Screenshot: Indexed search listing of indexed typo3.org URLs with their pHash and cHash values]
Indexing directories of files ("Filepath on server")
You can also have directories of files on your server indexed periodically using the type "Filepath on server".
[Screenshot: Indexing configuration of type Filepath on server, with the settings Limit to extensions (commalist) and Depth]
Again, the options are either easy to understand or you can read more about them in the Context Sensitive Help.
Location
The Indexed Search configuration should be located on a "Not in menu" page, just like the External URL
5. [Screenshot: Indexed search listing of the many indexed instances of the page "References", one row per cached view, each with its pHash, cHash, rootline, size, grlist and cHashParams]
6. As you can see, most pages here are indexed only one time. However, a few are indexed twice. This can happen for several reasons, and here the reason is most likely a user login or something related. The most interesting occurrence is the page "References", which has more than 20 indexed instances available. The reason is that this page holds multiple cached views due to some parameters which are used by a plugin on that page. Each instance will be searchable as a unique search result. Now imagine that you want to clear out all those instances of the "References" page to let them be re-indexed when viewed again. Simply click the page "References" in the page tree to the left. Then you see this:
7. [Screenshot: Indexed search view (2 levels) of the page "Content elements" and its subpages, with the pHash and cHash columns; linked external files such as a PDF and a Word document are listed under the pages that link to them]
8. Pages with messageboards may have multiple indexed versions based on what is displayed on the page: the overview or a single messageboard item. This is determined by the cHashParams value. Pages with restricted access must also be observed. Because pages can contain different content depending on whether a user is logged in, and even on which groups he is a member of, a single page (identified by the combination of id, type, language and cHashParams) may even be available in more than one indexed version based on the user groups. But while the same page may have different content based on the user groups, and so must be indexed once for each, such pages may just as well present the SAME content regardless of usergroups. This is the trickiest part.
Understanding these complex scenarios
The best thing to do is to grab an example. Please refer to the picture below while reading the bullet list here:
1. The overview in general shows one line per phash row (a single row from the index_phash table). Such a row represents a single hit in a searching session. In other words, each line with grayish background in the overview may be a search hit. The columns of these rows are:
- Title: The search result title.
- icon: Click here to remove the indexed information for this entry (it will be re-indexed on the next hit).
- pHash: The id of the search row. The hash is calculated based on id, type, language, MP, cHashParams and gr_list of the page when in
9. content stripped for any HTML codes. Currently the MySQL FULLTEXT search is not used (something with MATCH ... AGAINST), but this will be added in the future.
index_grlist
This table will hold records related to a phash row. Records in this table confirm that certain gr_lists would actually share the same content as represented by the phash row, even though the phash row may be indexed under another login. The table is used during result display to positively confirm whether the current user may see the resume (which otherwise might contain secret info). Please see the discussion far above.
index_words, index_rel
Words table and word relation table. Almost self-explanatory. For the index_rel table some fields require explanation:
count: Number of occurrences on the page.
first: How close to the top (low number is better).
freq: Frequency (please see source for the calculations). This is converted from a floating point value to an integer.
flags: Bits which describe the weight of the words: 8th bit (128) = word found in title, 7th bit (64) = word found in keywords, 6th bit (32) = word found in description. The last 5 bits are not used yet, but if used they will enter the weight hierarchy. The result rows are ordered by this value if the Weight/Frequency sorting is selected. Thus results with a hit in the title, keywords or description are ranked higher in t
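To make the flags field concrete, here is a small sketch that decodes the three documented bits and orders rows by weight. The bit values come from the text above; the function names, the row layout and the use of freq as a tie-breaker are assumptions made for illustration only.

```python
FLAG_TITLE = 128        # 8th bit: word found in title
FLAG_KEYWORDS = 64      # 7th bit: word found in keywords
FLAG_DESCRIPTION = 32   # 6th bit: word found in description


def describe_flags(flags):
    """Return the places a word was found in, according to its flags."""
    found_in = []
    if flags & FLAG_TITLE:
        found_in.append("title")
    if flags & FLAG_KEYWORDS:
        found_in.append("keywords")
    if flags & FLAG_DESCRIPTION:
        found_in.append("description")
    return found_in


def sort_by_weight(rows):
    """Order rows by flags, highest first (freq as an assumed tie-breaker)."""
    return sorted(rows, key=lambda row: (row["flags"], row["freq"]), reverse=True)
```

Because the title bit is the highest one, any row with a title hit sorts before rows that only hit keywords or description, which is the hierarchy the text describes.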
10. in another language than the main language on the website (see second illustration hereafter).
2. If no phash rows are found for a page, this can mean three things: 1) Either the page is not cached. In this case both the tt_products and tt_news plugins apparently disable the caching of the page, thereby disabling any indexing of the pages. Searching in news and products must be done with a searching function looking up directly in the news and products tables. 2) In the case of other pages, the reason may be that the pages have never been visited and are therefore not indexed yet. Indexing of pages in TYPO3 happens during the rendering of the page; there is currently no crawler to assist this job. 3) Finally, the reason for a page not being indexed can be the combination of 1 and 2: the page has never been visited, and if it was visited, the cache would have been disabled.
3. These numbers just tell us that
- the page "Lists" was indexed once, by a user with membership of groups 1 and 2;
- the page "Addresses" was also indexed by a user with membership of groups 1 and 2, but has since been visited by a user without login. Both instances yielded a similar page, and it was therefore not indexed twice.
This raises the question about the page "Lists": is that access restricted for users without login, or has a user without login just never visited that page (since no "0,-1" grlist has been detected)? Both could be the answer. On
11. not be shown.
Access restricted pages
A TYPO3 page will be available in the search result only if there is access to the page. This is secured in the final result query. Whether extendToSubpages is taken into account depends on the join_pages flag (see above). But the page will only be listed if the user has access. However, a page may be indexed more than once if the content differs from usergroup to usergroup, or just without login. Still, the result display will show only one occurrence, because similar pages (determined based on phash_grouping) will be detected.
The tricky scenario: Say that a page has a content element with some secret information visible for only one usergroup. The page as a whole will be visible for all users. The page will be indexed twice, both without login and with login, because the page content differs. The problem is that if a search is conducted and matches one of the secret words in the access restricted section, then the page will be in the search result even if the user is not logged in. The best solution to this problem is to allow the result to be listed anyway, but then HIDE the resume if the index_grlist table cannot confirm positively that the combination of usergroups of the user has access to the result. So the result is there, but no resume is shown, since the resume might contain hidden text.
External media
Equally for external media: they are linked from a TYPO3 page. When an external media is selected
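The "list the hit, but hide the resume" rule can be sketched like this. It is a simplified model under stated assumptions: index_grlist is represented as a set of (phash, gr_list) pairs, and the function and field names are hypothetical rather than taken from the extension.

```python
def visible_result(row, user_gr_list, index_grlist):
    """Return the hit as it may be displayed to the current user.

    `index_grlist` is modelled as a set of (phash, gr_list) pairs.
    The resume is only included when such a pair positively confirms
    that this gr_list shares the indexed content.
    """
    confirmed = (row["phash"], user_gr_list) in index_grlist
    return {
        "title": row["title"],
        "resume": row["resume"] if confirmed else None,
    }
```

For example, a row indexed under gr_list "0,-2,1" stays listed for an anonymous user ("0,-1"), but its resume is withheld unless index_grlist also holds a pair for "0,-1".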
12. [Screenshot: crawler log for the page tree, listing queue entries with scheduled time, run time and status OK, including entries in the Storage Folder]
Here you can notice that the visited URLs have additional parameters added; those are combined based on the crawler extension's configuration in Page TSconfig. Also notice the special crawler log entries found in the Storage folder. These are the meta entries which call an indexed search hook, which in turn generates the URL entries and pushes them to the queue. On the far right in this view you can see that noted as well, including the set_id.
13. the Plugin type to be "Indexed search".
[Screenshot: new Pagecontent record with the list of plugin types, "Indexed search" selected]
Then select the root page of your website as the Starting point of the plugin content element.
[Screenshot: Pagecontent record of the plugin with the Starting point field set]
And that's it. Your frontend should now look like this:
[Screenshot: frontend search form with an Advanced search link and the Rules text: only words with 2 or more characters are accepted; max 200 chars total; space is used to split words; a whole string can be searched for (not an indexed search then); AND, OR and NOT are prefix words overruling the default operator, and operator symbols equal to AND, OR and NOT are accepted; all search words are converted to lowercase]
The styles are most likely different from this, but that is controlled by the developer having administration access to the system.
Administration
Monitoring indexed content
The Indexed Search extension adds two backend modules: one as a global, database-wide statistics module and a page spec
14. [Screenshot: search results listing the pages "Other languages", "Andere Sprachen" and "Andre sprog"]
Illustration 1: A search result showing how localized versions of a page are displayed.
Database Tables
index_phash
This table contains references to TYPO3 pages or external documents. The fields are like this:
phash: 7md5/int hash. It's an integer based on a 7-char md5 hash. This is a unique representation of the page indexed. For TYPO3 pages this is a serialization of id, type, gr_list (see later), MP and cHashParams (which enables subcaching with extra parameters). This concept is also used for TYPO3 caching, although the caching hash includes the "all" array and thus takes the template into account, which this hash does not. It's expected that template changes through cond
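The "7md5/int" idea can be illustrated with a short sketch: take the first 7 hex characters of an md5 digest and read them as an integer. The helper names are hypothetical, and the serialization below is only a stand-in for whatever PHP serialization the extension uses, so the resulting numbers will not match real phash values.

```python
import hashlib


def md5_int7(text):
    """Integer built from the first 7 hex chars of the md5 of `text`."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest[:7], 16)


def page_phash(page_id, type_num, sys_language, gr_list, mp="", chash_params=None):
    """Sketch of a phash for a TYPO3 page (serialization is an assumption)."""
    fields = {
        "id": page_id,
        "type": type_num,
        "sys_lang": sys_language,
        "gr_list": gr_list,
        "MP": mp,
        "cHashParams": sorted((chash_params or {}).items()),
    }
    serialized = repr(sorted(fields.items()))  # stand-in for PHP serialize()
    return md5_int7(serialized)
```

Seven hex characters carry 28 bits, so the value always fits in a signed 32-bit integer column without ever becoming negative.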
15. [Screenshot: Indexed search listing with the columns rl 012, pid t l, Size, grlist and cHashParams]
On the image below we are looking at another scenario. In this case the cHashParams is obviously used by the plugin tt_board. The plugin has been constructed so intelligently that it links to the messages in the message board without disabling the normal page cache, but rather sends the tt_board_uid parameter along with a so-called cHash. If this is combined correctly, the caching engine allows the page to be cached. Not only does this mean a quicker display of pages in the message board, it also means we can index the page.
[Screenshot: Indexed search view (3 levels) of the page "Board"]
16. Finally, in Web > Info > Indexed search you will see that these visited URLs were re-indexed:
[Screenshot: Indexed search listing with the columns Title, pHash, cHash, &id, &type, &L and &MP, showing Testsite, Products, Content Elements and their subpages, including localized versions]
Location
Indexing configurations for indexing of the page tree should be placed in a SysFolder, since their location in the page tree is not relevant to their function.
Periodic indexing of records ("Database Records")
You can also use the Indexing Configuration to index single records.
17. [Screenshot: frontend search results divided into categories, each with its own result list (for example "Files archive")]
18. Technical details
HTML content
HTML content is weighted by the indexing engine in this order: 1. <title> data, 2. <meta keywords>, 3. <meta description>, 4. <body>. In addition you can insert markers as HTML comments which define which part of the body text to include or exclude in the indexing. The markers are <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->.
Rules:
1. If there is no marker at all, everything is included.
2. If the first found marker is an end marker, the previous content until that point is included, and the following code until the next begin marker is excluded.
3. If the first found marker is a begin marker, the previous content until that point is excluded, and the following content until the next end marker is included.
Use of hashes
The hashes used are md5 hashes where the first 7 chars are converted into an integer, which is used as the hash in the database. This is done in order to save space in the database, thus using only 4 bytes and not a varchar of 32 bytes. It's estimated that a hash of 7 chars (32) is sufficient (originally 8, but at some point PHP changed the behavior of the hexdec function, so that where originally a 32-bit value was input and half the values would be negative, they were suddenly all positive. That would require a similar change of the fields in the database. To cut it sim
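The three marker rules can be expressed as a small state machine. The sketch below is an illustration of the rules as stated, not the extension's implementation; the function name is hypothetical, and tolerating whitespace inside the comment markers is an assumption.

```python
import re

# Matches <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->.
MARKER = re.compile(r"<!--\s*TYPO3SEARCH_(begin|end)\s*-->")


def indexable_body(html):
    """Return the parts of `html` that the marker rules include."""
    parts = MARKER.split(html)  # [text, marker, text, marker, text, ...]
    if len(parts) == 1:
        return html  # rule 1: no marker at all, everything is included
    # Rules 2 and 3: text before the first marker is included only
    # when that first marker is an end marker.
    including = parts[1] == "end"
    kept = []
    for position, part in enumerate(parts):
        if position % 2 == 0:      # even positions hold text
            if including:
                kept.append(part)
        else:                      # odd positions hold "begin" or "end"
            including = part == "begin"
    return "".join(kept)
```

With this reading, wrapping only the main content in a begin/end pair excludes navigation and footer text, while a template that starts with an end marker keeps everything before it.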
19. Indexed Search Reference
Extension Key: doc_indexed_search
Language: en
Keywords: indexed search reference forDevelopers forAdvanced
Copyright 2000-2008, TYPO3 Core Development Team, <info@typo3.org>
This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml
The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.org
Table of Contents: Introduction; User manual; Administration; Indexing configurations; Configuration; Technical details; Analysing the indexed data; Database Tables; Known problems
Introduction
What does it do?
The Indexed Search Engine provides two major elements to TYPO3: 1. Indexing: An indexing engine which indexes TYPO3 pages on the fly as they are rendered by TYPO3's frontend. Indexing a page means that all words from the page (or specifically defined areas on the page) are registered, counted, w
20. [Screenshot: sample search results from the "References" page, each with size, creation and modification dates and the path Cases & Reviews > References]
Features of the indexer
The indexing engine has several features:
- HTML data priority: 1. <title> data, 2. <meta keywords>, 3. <meta description>, 4. <body>
- Indexing external files: text formats like html and txt, and doc and pdf by external programs (catdoc, pdftotext)
- Word counting and frequency used to rate results
- Exact, partial or metaphone search
- Searching freely for sentences (non-indexed)
- NOT case-sensitive in any way, though
Features of the search frontend (the plugin)
The search interface has several options for advanced searching. Any of those can be disabled and/or preset with default values:
Searching whole word, part of word, sound
21. [Screenshot: frontend search results divided into categories, including "File archive" and "TYPO3.org"]
To obtain this categorization you must set TypoScript configuration in the Setup field like this:
plugin.tx_indexedsearch.search.defaultFreeIndexUidList = 0,6,7,8
plugin.tx_indexedsearch.blind.freeIndexUid = 0
The defaultFreeIndexUidList is the uid numbers of indexing configurations to show in the categorization. The ord
22. d if this option is set in your template (the value is the internal language key): config.language = dk
TypoScript
Still missing the major parts here. Just use the object browser for now, since that includes all options.
Property (data type): description. Default.
templateFile (resource): The template file, see examples in typo3/sysext/indexed_search/pi/.
show.forbiddenRecords (boolean): Explicitly display search hits although the visitor has no access to them. Notice: This behavior was different in TYPO3 < 4.0.
show.resultNumber (boolean): Display the numbers of search results. Notice: This behavior was different in TYPO3 < 4.0.
show.advancedSearchLink (boolean): Display the link to the advanced search page. Default: 1.
search.rootPidList (list of int): A list of integers which should be root pages to search from. Thus you can search multiple branches of the page tree by setting this property to a list of page id numbers. If this value is set to less than zero (e.g. -1), searching will happen in ALL of the page tree with no regard to branches at all. Notice that by "root page" we mean a website root defined by a TypoScript Template. If you just want to search in branches of your site, use the possibility of searching in levels. Default: the current root page id.
search.detect_sys_domain_records (boolean): If set, then the search results are linked to the proper domains where they are found.
search.detect_sys_domain_ (string): Targe
23. d is removed, its indexing entry will also be removed upon next indexing, simply because the set_id is used to finally clear out old entries after a re-index.
Indexing External websites ("External URL")
You can index external websites using Indexing Configurations. They can actually crawl an external URL. Configuration looks like this:
[Screenshot: Indexing configuration of type External URL, including the field "Enter sub-URLs in which not to descend"]
It pretty much explains itself how it works. The Context Sensitive Help will provide enough information to complete configuration.
Location
You should place the Indexing Configuration on a "Not in menu" page in the root of the site, for instance. The page must be searchable, since the external URL results are bound to a page in the page tree, namely the page where the configuration is found.
This is how the crawler log looks immediately after the crawling has begun:
[Screenshot: crawler log for the page "External Urls" with queued typo3.org URLs]
24. dexed. For external media this is based on filepath and page interval (for PDFs only).
- cHash: Calculated based on the actual content which was indexed.
- rl 012: The rootline ids for levels 0, 1 and 2. Used when searching in certain sections. For instance, a search operation may select all pages with rl1=123, which will result in a search within pages which exist ONLY in the branch of the website where the level 1 page has uid 123.
- pid t l: The page id, type number and sys_language uid.
- Size: How many bytes the indexed page consumed.
- grlist: The gr_list of the user which initiated the indexing operation.
- cHashParams: Additional parameters which identify the page in addition to the id and type number, which usually do that.
1. The page "Content elements" has one indexed version. The page id of the root page is 1, and the page on level 1 in the rootline has the uid 2. Notice how all subpages of "Content elements" have the exact same rl0 and rl1 values. The page "Content elements" itself does NOT have a value for rl2, whereas all the subpages do, because they ARE the level 2 pages themselves. Furthermore, the page has the page id 2, a type value of 0, and is indexed with the default language (0). The size was 10.6 KB, and the user who initiated the indexing operation was a member of the groups 0,-2,1, which is effectively fe_group 1 because 0 and -2 are pseudo-groups. On the page Special conte
25. dexing configurations Setting up the crawler extension: Before you can work with Indexing configurations you must make sure you have set up the crawler extension and have a cron job running that processes the crawler queue as it is filled. For this, please refer to the documentation of the crawler extension. Generally about indexing configurations: An indexing configuration sets up indexing jobs that are performed by a cron script independently of frontend requests. The crawler extension is used as a service to execute the queue entries that control the indexing. The indexing configuration contains two parts: 1) definition of execution time and periodicity, 2) definition of indexing type and settings. Below you see what all Indexing Configurations have in common: [Screenshot: the common fields of an indexing configuration, showing a frequency of "Every day (24 hours)" and the field "Session ID (if > zero, then indexing job is running)" with the value 132457570.] These settings are described in the context sensitive help, so please refer to that for more information. The Session ID requires a short introduction: When an indexing job is started it sets this value to a unique number, which is used as the ID for that process, and all indexed entries are tagged with it. When the processing of an indexing configuration is done, the value is reset to zero again. Periodic indexing of the website (Page tree): You can have the whole page tree indexed overnight using this indexing configuration
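The life cycle of the Session ID can be sketched as below. This is only an illustration under stated assumptions: the class IndexingConfiguration and the functions start_job, tag_entry and finish_job are hypothetical names, the unique number is taken from the clock, and the removal of entries that do not carry the current id follows the statement elsewhere in this reference that the set_id is used to clear out old entries after a re-index.

```python
import time

class IndexingConfiguration:
    """Hypothetical stand-in for an indexing configuration record."""
    def __init__(self, uid):
        self.uid = uid
        self.session_id = 0  # zero means that no indexing job is running

def start_job(config, now=None):
    # Assumption: a timestamp serves as the unique number for the job.
    config.session_id = int(now if now is not None else time.time())
    return config.session_id

def tag_entry(entries, phash, config):
    # Every entry indexed by the job is tagged with configuration uid and session id.
    entries[phash] = (config.uid, config.session_id)

def finish_job(entries, config):
    # Entries of this configuration that were not re-tagged are considered old.
    stale = [
        phash for phash, (uid, session_id) in entries.items()
        if uid == config.uid and session_id != config.session_id
    ]
    for phash in stale:
        del entries[phash]
    config.session_id = 0  # reset to zero when processing is done
    return stale
```

A job that re-indexes only some of the previously known entries would return the remaining ones from finish_job and leave the Session ID at zero.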
26. e every content element insertion and [Remainder of a screenshot of search results: "Newsfeed single view" (Size 10.2 K, created 28-02-06, modified 28-02-06 17:30) and "About" (67%, Size 8.4 K, created 28-02-06, modified 28-02-06 17:29); the paths are cut off.] However, you can configure the search results to be divided into categories that follow the indexing configurations. [Screenshot: Web > Info > Indexed search listing for the page "File archive": five "Base template header..." entries for fileadmin/templates/template_ce.html, template_page.html, template_page_left_col.html, template_page_print.html and template_page_xtra.html, each with page_id 26, phash_t3 26 and CfgUid 6 / 35399339.]
27. Showing the search results: By default the search results are shown with no distinction between those from local TYPO3 pages, indexed records, the file path and external URLs. The only division follows the page on which the result is found. [Screenshot: search results. Two hits titled "Base template header, menu, content and footer" (76%, Size 3.5 K, created 03-01-04, modified 03-01-04 21:23; and 68%, Size 9.5 K, created 03-01-04, modified 03-01-04 20:58), both with the path "File archive", followed by "Newsfeed single view" (67%) with an excerpt about the revised undo history; the excerpt is cut off.]
28. eighted and finally inserted into a database table of words. Then another table is filled with relation records between the word table and the page. This is the basic idea. 2. Searching: A plugin you can insert on your website, which allows website users to search for information on your website. When searching, the plugin first looks in the word table to see whether the word exists, and if it does, all pages that have a relation to that word are considered for the search result display. The search results are ordered based on factors such as where on the page the word was found or the frequency of the word on the page. This is an example of how the search interface on a website looks: [Screenshot: search form with an "Advanced search" link; result header "Displaying results 1 to 10 out of 10 in 4 sections" with the sections "search" (1 page), "Cases & Reviews" (4 pages), "Resources" (1 page) and "About" (4 pages); first hit "Search" (Size 7.4 K, created 04-10-02, modified 13-11-02 10:16), second hit "DBS corporate website" (100%); the excerpt text is garbled in extraction.]
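The basic idea of a word table plus relation records can be sketched in a few lines. This is a simplified illustration, not the extension's implementation: the names index_page and search are hypothetical, the tokenizer is a plain regular expression, and ranking uses word frequency only, whereas the text above also mentions the position of the word on the page.

```python
import re
from collections import Counter, defaultdict

def index_page(words, relations, page_id, text):
    """Add the words of one page to the word table and the relation table."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    for word, frequency in counts.items():
        words.add(word)                        # word table
        relations[word][page_id] = frequency   # relation record: word -> page

def search(words, relations, word):
    """Look the word up first, then return related pages, most frequent first."""
    word = word.lower()
    if word not in words:
        return []
    hits = relations[word]
    return sorted(hits, key=lambda page_id: (-hits[page_id], page_id))

words = set()
relations = defaultdict(dict)
index_page(words, relations, 1, "Search the search engine")
index_page(words, relations, 2, "A search page")
print(search(words, relations, "search"))
```

Page 1 is listed before page 2 because the word occurs twice there.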
29. [Screenshot: search results divided into categories, including a "TYPO3.org" category with the hits "Newsfeed single view" (81%) and "About" (80%); the remaining text of the screenshot is unreadable in extraction.] The categorization happens when the Category selector in the Advanced search form is set like this: [Screenshot: Category selector set to "All categorized".] Notice that you can preset this value from TypoScript as well. Searching a specific category from URL: If you want search forms on the site to look up directly in results belonging to one or more indexing configurations, you can use a set of GET variab
30. er determines which are shown at the top. Changing it could bring the results from TYPO3.org and TYPO3.com to the top. [Screenshot: categorized search results; under "TYPO3.com", results 1 to 3 out of 3: "Screenshots" (81%), "Feature list" (70%) and "Highlights" (69%); the excerpt text is unreadable in extraction.]
31. [Screenshot: Web > Info > Indexed search for the page "References" below "Cases & Reviews", with the columns Title, pHash, cHash, rl-012, pid.t.l, Size, grlist and cHashParams; it lists the page "References" and several single-reference instances such as "Inter Photo A/S", "Cryptonet" and "Malburgen District", each with a showUid parameter in cHashParams; two garbage-bin icons are marked 1 and 2.] You can either click the red garbage bin (1) to clear all listed instances, or pick out single instances by clicking the local garbage bin (2). Monitoring the global picture of indexed pages: Through the Tools > Indexing module you can get statistics about the indexing engine. Currently they are sparse and very roughly presented; this view needs more work to be friendly and really useful. General statistics: [Screenshot: record counts per table: index_phash 217, index_words 7119, index_rel 40609, index_grlist 252, index_section 217; the list is cut off here.]
32. h results are shown this must correspond with the parameters the plugin accepts. A fancy option is "Index Records immediately when saved", which indexes records as they are saved through TCEmain. In the crawler log you will see the entries for record indexing like this: [Screenshot: crawler log for the page "Non cached": entry 699, scheduled and run on 28-02-06 at 16:54:02, status OK, "Records (start)", Index Cfg UID 1 / 113881523; entry 700, run on 28-02-06 at 16:55:00, status OK, a "Records from UID ..." entry, Index Cfg UID 1 / 113881523.] After processing, the Web > Info > Indexed search view shows this: [Screenshot: indexed entries for page id 24, among them the records "Test header", "Another element", "asdf asdf as" and "Test overskrift" with CfgUid 1 / 113881523, RecUid 1, 2, 4 and 3, and a user_noncachedtest_pi1 showUid parameter in the GET parameters column.] Notice how the GET parameters are nicely added and how the CfgUid column contains the UID of the indexing configuration and the set_id of the processing. In fact, if a recor
33. he result list. Known problems: Currently the extension is under observation because instances of heavy server load and instability have been reported. It is not yet clear whether THIS extension has anything to do with them, so it is only under suspicion at this point, until further data has been collected. For now it is advised to be careful about applying the extension in mission-critical, high-load environments. It is still uncertain how it performs under heavy load and when MANY pages are indexed. Currently benchmarks have been done only up to 2000 indexed pages (approx. 400,000 relation records). It is probable that some parts will have to be optimized for such scenarios.
34. ific analysis module: In the Web > Info module you can see an overview of how many instances are indexed per TYPO3 page. Look at this image: [Screenshot: Web > Info > Indexed search with the columns Title, pHash, cHash, rl-012, pid.t.l, Size, grlist and cHashParams, listing the pages "www.typo3.com", "About", "What is a CMS?", "Highlights", "Feature list", "Screenshots", "Price & License", "People", "History", "Snowboard", "Cases & Reviews", "Case Studies" and "References"; the numeric columns are unreadable in extraction.]
35. itions would not seriously alter the page content. For external media this is a serialization of 1) the unique filename id and 2) any subpage indication (parallel to cHashParams). gr_list is NOT taken into consideration here. phash_grouping: 7md5/int hash. This is a non-unique hash, exactly like phash but WITHOUT the gr_list and, in addition, for external media without the subpage indication. Thus this field indicates a unique page or file, while this page may exist twice or more due to gr_list. Use this field to GROUP BY the search so you get only one hit per page when selecting with gr_list in mind. Currently a search result neither groups nor limits by this; rather, the result display may group the results into logical units. item_mtime: Modification time. For TYPO3 pages: the SYS_LASTCHANGED value. For external media: the filemtime() value. Depending on configuration, if the mtime has not changed compared to this value, the file or page is not indexed again. tstamp: Timestamp of the indexing operation. You can configure min/max ages, which are checked against this timestamp. A min-age defines how long an indexed page must have been indexed before it is reconsidered for indexing again. A max-age defines an absolute point at which re-indexing will occur, unless the content has not changed according to an md5 hash. cHashParams: The cHashParams. For TYPO3 pages: these are used to re-generate the actual URL of the TYPO3 page in questi
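How item_mtime, tstamp and the min/max ages might interact can be sketched as a small decision function. The function name should_reindex, its parameters and the order of the checks are assumptions made for illustration; the text above only states what each value is used for, and the final md5 comparison of the content is left out here.

```python
def should_reindex(now, tstamp, item_mtime, current_mtime, min_age, max_age):
    """Decide whether an already indexed page or file is a re-indexing candidate.

    All times are in seconds. max_age == 0 is treated as "no max-age configured"
    (an assumption of this sketch).
    """
    age = now - tstamp
    if age < min_age:
        return False  # indexed too recently to be reconsidered
    if max_age and age > max_age:
        return True   # past the absolute limit; content hash check would follow
    # Between the two limits the modification time decides.
    return current_mtime != item_mtime

# Indexed 10 seconds ago with a min-age of 60 seconds: not reconsidered yet.
print(should_reindex(now=1000, tstamp=990, item_mtime=5, current_mtime=6,
                     min_age=60, max_age=0))
```

Even when this function answers "yes", the indexer can still skip the work if the content hash turns out to be unchanged.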
36. k page access based on the id list. But then you lose that feature, of course; you can't have both. In any case, indexing pages and searching the indexed information are two different processes, and therefore you can easily use another frontend plugin for making searches in the same data, for whatever reason you might have for discarding the default search plugin. User manual. Adding the search plugin to a page: That is really easy: 1) Create a page called "Search" or something like that. This is where the search box will appear. 2) Then create a new content element on that page. From the Web > Page module you can do it like this: [Screenshot: Web > Page module for the page "Search" with the "Create page content" button.] 3) Then select some plugin type if you can. It doesn't matter whether it is a guestbook or a forum. If no plugins are available, just select a "Regular text element" as at the top of the page. [Screenshot: plugin list with "Message board", "Discussion forum", "Guestbook" and "Todo items".] 4) Then make sure "Insert plugin" is selected (if not, select it and save the element); then you will see the form below. Enter a title and select
37. les like these (here using UID values 7 and 8, since they look up in TYPO3.org and TYPO3.com results): index.php?id=78&tx_indexedsearch[sword]=level&tx_indexedsearch[freeIndexUid]=7,8 Grouping more indexing configurations in one search category: You might find that you want to group the results from multiple indexing configurations in the same category. For instance, I have an indexing configuration for both TYPO3.org and TYPO3.com, but I want all search results to appear under the category "External URLs". This can be done by creating a special type of indexing configuration which only points to other indexing configurations: [Screenshot: an indexing configuration on the page "External Urls" below "Testsite" that refers to other indexing configurations.] This indexing configuration is not used during indexing but during searching. So reconfiguring the TypoScript to use uid 9 instead of 7,8 yields this result: [Screenshot: search results grouped under the category "External URLs", beginning with "Newsfeed single view" (81%); the excerpt is cut off.]
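A search link of this kind can be built programmatically. The sketch below uses Python's standard library; the helper name search_url is hypothetical, and the bracketed parameter names are reconstructed from the flattened example above, so they should be checked against your installation.

```python
from urllib.parse import urlencode

def search_url(page_id, sword, index_uids):
    """Build a link to the search page restricted to some indexing configurations."""
    params = [
        ("id", page_id),
        ("tx_indexedsearch[sword]", sword),
        # Several configuration uids are passed as one comma-separated value.
        ("tx_indexedsearch[freeIndexUid]", ",".join(str(uid) for uid in index_uids)),
    ]
    # Keep brackets and commas readable instead of percent-encoding them.
    return "index.php?" + urlencode(params, safe="[],")

print(search_url(78, "level", [7, 8]))
```

Passing a single uid, such as that of a grouping configuration, works the same way with a one-element list.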
38. n Say you are doing a search in the section from "Content elements" and outwards in the page tree. The Word document is matched in the search, but it will appear only once in the search result. Now, if one of the two pages linking to the Word document were either hidden or access restricted, the Word document would still be matched, because one of the pages is accessible to the user. But if BOTH pages with the link to the Word document are inaccessible to the user doing the search, then the Word document will not be included in the search result. Here we can see that the pages "Special content", "Advanced" and "Menu/Sitemap" are indexed twice each. The reason is that those three pages had different content depending on whether or not a user was logged in. In the case of the page "Special content", the reason is that the page contained a content element which was visible to users who were members of group number 1. Therefore the page was different in the two cases. The page "Advanced" has a user login form, and that form looks different depending on whether a user is logged in or not. Finally, the page "Menu/Sitemap" apparently changed as well. The reason was that this page includes a sitemap, and that sitemap displayed some extra pages when logged-in users hit the page, so the content was not the same as without login. Another interesting thing is that two different users must have visited those pages. We can see that because the
39. n it's stored only once. If the page content differs depending on whether a user is logged in or not (it may even do so based on the fe_groups), then it is indexed as many times as the content differs. The phash is of course different, but the phash_grouping value is the same. The table index_grlist will always hold one record per phash row of item_type 0 (that is, TYPO3 pages). But it may also hold many more records. These point to the phash row in question in the case of other gr_list combinations which actually had the SAME content and thus refer to the same phash row. External media: External media (pdf, doc, html, txt) is tricky. External media is always detected as links to local files in the content of a TYPO3 page which is being indexed. But external media can be linked to from more than one page, so the index_section table may hold many entries for a single external phash record, one for each position where it is found. It is also important to notice that external media is only indexed or updated when a parent TYPO3 page is re-indexed; only then will the links to the external files be found. In a search operation external media is listed only once (grouping by phash), but if, say, two TYPO3 pages link to the document, only one of them is shown as the path where the link can be found. However, if both TYPO3 pages are not available, then the document will
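The way external media is listed once per phash, using any accessible parent page, can be sketched like this. The function name visible_documents and the data layout are hypothetical; the sketch assumes that a document with no accessible parent page is left out of the result and that the first accessible parent is the one shown as path.

```python
def visible_documents(section_rows, accessible_pages):
    """Map each external document (phash) to one accessible parent page.

    section_rows: iterable of (phash, page_id) pairs, one per parent page
    on which a link to the document was found.
    """
    result = {}
    for phash, page_id in section_rows:
        if page_id in accessible_pages and phash not in result:
            result[phash] = page_id  # this parent is shown as the path
    return result

# Invented sample: one document linked from pages 10 and 11, another from page 12.
section_rows = [(268192666, 10), (268192666, 11), (555, 12)]
print(visible_documents(section_rows, accessible_pages={11, 12}))
```

With page 10 hidden, the first document is still found through page 11; with no accessible parent at all, it does not appear.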
40. [Screenshot: Web > Info > Indexed search for a message board, showing the board page "Sourcream and Oni..." and rows for the single messages, among them "Fat percent..." and "This is gross", with a tt_board_uid value in the cHashParams column of each message row; the numeric columns are unreadable in extraction.] As you see, the main board page showing the list of messages/threads ("Sourcream and Oni...") is indexed without any value for the parameter tt_board_uid (the cHashParams field is blank). It has also been indexed once for each display of a message. In a search result any of these five rows may appear as an independent result row; after all, each is to be regarded as a single page with unique content despite sharing the same page id. Another interesting thing is that while the main page has inherited the page title for the search result ("Sourcream and ..."), each of the indexed pages with a me
41. nt there must have been a link to a local PDF file and a Word file, since those two are indexed in relation to this page. The PDF file is located at the path uploads/media/tsref_onepage.pdf relative to the website. Notice that the PDF file is actually indexed three times, one time per page. (This is of course configurable.) Each indexed section of the PDF file has the potential to show up as a search result row, because the phash is different for each indexed part. The whole point of this is that a large PDF file might contain so much information that it matches all too many search queries. Breaking a PDF file down into smaller parts makes it possible to indicate exactly WHERE in the PDF file the search word was found. Looking at the Word file (and the PDF file as well), we see that they are found on BOTH the page "Special content" and the page "ISEARCH example". But looking at the phash value for the Word file, 268192666, it is the SAME value in both cases. This means that the Word and PDF files are indexed only once, when they are first discovered. Later, when another page is indexed and a link to the same document appears, the document is not indexed as another document; rather, an entry in the index_section table is made, indicating that this result row is also available (linked to) from another page sectio
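Breaking a PDF file into separately indexed parts amounts to splitting its page range into intervals. The helper below is a hypothetical sketch; the name page_intervals and the parameter pages_per_section are assumptions, standing in for whatever setting makes the split configurable.

```python
def page_intervals(page_count, pages_per_section):
    """Split pages 1..page_count into consecutive (first, last) intervals."""
    return [
        (first, min(first + pages_per_section - 1, page_count))
        for first in range(1, page_count + 1, pages_per_section)
    ]

# A three-page PDF indexed one page at a time gives three parts,
# each of which would get its own phash row.
print(page_intervals(3, 1))
print(page_intervals(25, 10))
```

Each interval can then be reported in a search result as the place in the file where the word was found.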
42. of type Page tree: [Screenshot: indexing configuration of type "Page tree" with depth "3 levels" and root point "Testsite".] This defines that the page tree is to be crawled to a depth of 3 levels from the root point "Testsite". For each page a combination of parameters is calculated, based on the crawler configurations for the "Re-index" processing instruction (see the crawler extension for more information), and those URLs are committed to the crawler log, plus entries for all subpages of the processed page so that each of those pages is indexed as well. This is what the crawler log may look like after processing: [Screenshot: crawler log with the columns Page Title, qid, Scheduled, Run-time, Status and Url; entries between 635 and 682 for the pages "Testsite", "Products", "Content Elements", "Visions", "About Us", "Contact", "Indexed Search" and "News", scheduled on 28-02-06 between 16:24:06 and 16:25:10, all with status OK.]
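Crawling the page tree to a fixed depth is a breadth-first traversal with a depth limit. The sketch below assumes a plain dictionary that maps each page to its subpages; the function name crawl_page_tree and the sample tree are hypothetical, and the real crawler queues URLs for later processing rather than returning a list.

```python
from collections import deque

def crawl_page_tree(children, root, max_depth):
    """Return the pages reached from root, going at most max_depth levels down."""
    queue = deque([(root, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)  # this page would be committed to the crawler log
        if depth < max_depth:
            # Subpages are queued so that each of them is processed as well.
            for child in children.get(page, []):
                queue.append((child, depth + 1))
    return order

# Invented tree: page "C" lies four levels below the root and is not reached.
tree = {"Testsite": ["Products", "About Us"], "Products": ["A"], "A": ["B"], "B": ["C"]}
print(crawl_page_tree(tree, "Testsite", 3))
```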
43. on. For files this is an empty array (not used). item_type: An integer indicating the content type. 0 is TYPO3 pages; 1 and above are external files like pdf (2), doc (3), html (1), txt (4) and so on. See the class.indexer.php file. item_title: Title. For TYPO3 pages: the page title. For files: the basename of the file (no path). item_description: Short description of the item; top information on the page. Used in search results. data_page_id: For TYPO3 pages: the id. data_page_type: For TYPO3 pages: the type. data_filename: For external files: the filepath (relative) or URL (not used yet). contentHash: md5 hash of the content indexed. Before re-indexing this is compared with the content to be indexed, and if it matches there is obviously no need for re-indexing. crdate: The creation date of the INDEXING, not of the page/file (see item_crdate). parsetime: The parse time of the indexing operation. sys_language_uid: Contains the value of $GLOBALS['TSFE']->sys_language_uid, which tells us the language of the indexed page. item_crdate: The creation date. For files only the modification date can be read from the files, so here it will be the filemtime(). gr_list: Contains the gr_list of the user initiating the indexing of the document. index_section: Points out the section where an entry in index_phash belongs
44. other hand, extendToSubpages will NOT be taken into account. Default: false. specConfs.[pid]: "specConfs" is an array of objects with properties that can customize certain behaviours of the display of a result row depending on its position in the rootline. For instance, you can define that all results which link to pages in the branch from page id 123 should have another page icon displayed. Or you can add a suffix to the class names so you can style that section differently. Example: If a page "Contact" is found in a search for "address", and that "Contact" page is in the rootline "Frontpage [ID=23] > About us [ID=45] > Contact [ID=77]", then you should set the pid value to either 77 or 45. If 45, then all subpages, including the "About us" page, will have similar configuration. If the pid value is set to 0 (zero), it applies to all pages. Please see the options below. specConfs.[pid].pageIcon (-> IMAGE cObject): Alternative page icon. specConfs.[pid].CSSsuffix (string): A string that will be appended to the class names of all the class attributes used within the result row presentation. The suffix is added like this: if CSSsuffix is "doc", then e.g. the class name "tx-indexedsearch-title" becomes "tx-indexedsearch-title-doc". whatis_stdWrap (-> stdWrap): Parse input through the stdWrap function. [tsref: plugin.tx_indexedsearch]
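The rootline lookup behind specConfs and the effect of CSSsuffix can be sketched as follows. The function names resolve_spec_conf and css_class are hypothetical, and the rule that the configuration closest to the result page wins over one further up the rootline is an assumption of this sketch; the text above only states that pid 0 applies to all pages.

```python
def resolve_spec_conf(spec_confs, rootline_ids):
    """Pick the configuration for a result row.

    rootline_ids runs from the site root down to the page of the result,
    e.g. [23, 45, 77] for Frontpage > About us > Contact.
    """
    for page_id in reversed(rootline_ids):
        if page_id in spec_confs:
            return spec_confs[page_id]
    return spec_confs.get(0, {})  # pid 0 applies to all pages

def css_class(base, conf):
    """Append the CSSsuffix, if any, to a class name."""
    suffix = conf.get("CSSsuffix", "")
    return f"{base}-{suffix}" if suffix else base

spec_confs = {45: {"CSSsuffix": "doc"}}
conf = resolve_spec_conf(spec_confs, [23, 45, 77])
print(css_class("tx-indexedsearch-title", conf))
```

A result outside the "About us" branch finds no matching pid and keeps the plain class name.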
45. [Remainder of a screenshot of search results from external URLs: "Newsfeed single view" (Size 10.2 K, created 28-02-06, modified 28-02-06 17:30), "About" (80%, Size 8.4 K, created 28-02-06, modified 28-02-06 17:29, path http://typo3.org/community/about), "Newsfeed single view" (75%, Size 12.6 K, created 28-02-06, modified 28-02-06 17:30) and "Screenshots" (75%); the excerpt text is garbled in extraction.]
46. page "Special content" was apparently indexed with the usergroup combination 1,2. Later another user hit the page, but only as a member of group 1. However, the page content was the SAME. And because those two users saw the very same page, it was not indexed a third time; instead it was noted that a user with membership of only group 1 also saw this same page. That comparison was based on the cHash (contentHash), which is a hash value based on the actual content being indexed. So when the user with group 1 only came to the page, the indexer engine realized that the page, as it looked, had already been indexed, because another phash row with that content hash was already available. These pages do not contain any tricks, it appears. According to the grlists, both users with membership of groups 1,2 and of group 1 only, as well as surfers who did not log in at all (0,-1 is the pseudo-group for "no login"), have visited the page. And because only one indexed version exists, the page must have had the same content to present to all users regardless of their login status. The reason why the page "Your own scripts" does not contain a grlist value "0,-2,1,2" as the others do is simply that no user with that combination of usergroups has ever visited the page. .txt and .html documents can also be indexed as external media. In the case of HTML documents, the document's HTML title element is detected and used
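The bookkeeping described here, where a visit with a new gr_list is merely noted if the content is already indexed, can be sketched as follows. The function name record_visit and the two dictionaries are hypothetical simplifications of index_phash and index_grlist, restricted to the rows of a single page.

```python
def record_visit(phash_rows, grlist_rows, phash, content_hash, gr_list):
    """Index a page view, or only note the gr_list if the content is known.

    phash_rows:  {phash: content_hash} for the versions of one page
    grlist_rows: {phash: set of gr_list strings that saw this version}
    Returns the phash of the row that represents the content.
    """
    for existing_phash, existing_hash in phash_rows.items():
        if existing_hash == content_hash:
            # Same content already indexed: just remember who else saw it.
            grlist_rows.setdefault(existing_phash, set()).add(gr_list)
            return existing_phash
    phash_rows[phash] = content_hash  # new content: a new phash row
    grlist_rows.setdefault(phash, set()).add(gr_list)
    return phash

phash_rows, grlist_rows = {}, {}
record_visit(phash_rows, grlist_rows, 111, "aaa", "0,-2,1,2")
record_visit(phash_rows, grlist_rows, 222, "aaa", "0,-2,1")  # same content
print(len(phash_rows), sorted(grlist_rows[111]))
```

The second visit produces no new indexed version; it only adds a gr_list entry to the existing row.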
47. pages which have access restrictions or a whole section in an intranet, such pages would obviously not have been indexed by no-login users. However, in this case nothing indicates that the page should be hidden from non-login users, so we must conclude that the page has simply not yet been visited by a no-login user; otherwise it would look like the page "Addresses", which also has the 0,-1 list detected. The "Guestbook" page was indexed by a user without login only. [Screenshot: Web > Info > Indexed search for the page "Another site in the same database" (path Intro > Another site), 2 levels, listing among others the pages "Another site in t...", "Lists", "Addresses", "Guestbook", "Board", "Rating", "Calendar", "Cool example" and "Other languages"; the numeric columns are unreadable in extraction.]
48. ple the length was reduced to 7, all being positive then. How pages are indexed: First of all, a page must be cachable. For pages where the cache is disabled, no indexing will occur. The "phash" is a unique identification of a page with regard to the indexer. So an entry in the index_phash table equals one result row in the search results (called a phash row). A phash is a combination of the page id, type, sys_language id, gr_list, MP and the cHash parameters of the page (function setT3Hashes()). If the phash is made for EXTERNAL media (item_type > 0), then it is a combination of the absolute filename, hashed together with any "subpage" indication, for instance if a PDF document is split into subsections. So for external media there is one phash row for each file (except PDF files, where there may be more). But for TYPO3 pages there can be several phash rows matching one single page. Obviously the type parameter would normally always be only one, namely the type number of the content page. The cHash may be of importance for the result as well, with regard to plugins using it. For instance, a message board may make pages cachable by using the cHash params. If so, each cached page will also be indexed, thus producing many phash rows for a single page id. But the trickiest reason for having multiple phash rows for a single TYPO3 page id is if the gr_list is set. This works like this: If a page has exactly the same content both with and without logins, the
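The relation between phash and phash_grouping can be sketched like this. It is an illustration only: the function names md5_int_hash and page_hashes are hypothetical, Python's repr() stands in for whatever serialization the indexer really uses, and the integer hash is assumed to be the first 7 hex digits of an md5 digest, in line with the remark above that the length was reduced to 7 so that all values are positive.

```python
import hashlib

def md5_int_hash(text):
    # First 7 hex digits of the md5 digest, read as a non-negative integer.
    return int(hashlib.md5(text.encode("utf-8")).hexdigest()[:7], 16)

def page_hashes(page_id, page_type, sys_language, gr_list, mount_point, chash_params):
    """Return (phash, phash_grouping) for a TYPO3 page."""
    grouping_parts = {
        "id": page_id,
        "type": page_type,
        "sys_lang": sys_language,
        "MP": mount_point,
        "cHash": sorted(chash_params.items()),
    }
    # phash additionally depends on the gr_list; phash_grouping does not.
    phash_parts = dict(grouping_parts, gr_list=gr_list)
    phash = md5_int_hash(repr(sorted(phash_parts.items())))
    phash_grouping = md5_int_hash(repr(sorted(grouping_parts.items())))
    return phash, phash_grouping

no_login = page_hashes(2, 0, 0, "0,-1", "", {})
group_1 = page_hashes(2, 0, 0, "0,-2,1", "", {})
print(no_login[1] == group_1[1])
```

Two versions of the same page that differ only in gr_list share the phash_grouping value, which is what makes one hit per page possible when grouping.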
49. rawler extension to index ex[...] When external files are found on a page they are added to [...] cronscript running the crawler. This eliminates problems wi[...] configuration of the crawler extension [...] (Extension Manager option text, partly cut off). Configuration: General: The most basic requirement for the search engine to work is that pages are getting indexed. That will not happen by just installing the plugin; you have to set up in TypoScript that a certain page should be indexed. This is needed for several good reasons. First, not all sites in a TYPO3 database might need indexing, so indexing is controlled on a per-site basis. Second, a single site may have frames, and in that case only the page object which actually shows the page content needs to be indexed. Say you have a PAGE object called "page" (which is pretty typical); then you have to set this config option: page.config.index_enable = 1. When this option is set, you should begin to see your pages being indexed the next time they are shown. Remember that only cached pages are indexed. This is documented in TSref in the CONFIG section; look there for further options. For instance, indexing of external media can also be enabled there. Languages: The plugin supports all system languages in TYPO3. Translation is done using the typo3.org tools. If you want to use e.g. the Danish language, that will automatically be use
50. [Screenshot residue: a search result excerpt with Size 30 K, Created 28-02-06, Modified 28-02-06 17:32, followed by the TypoScript property plugin.tx_indexedsearch.search.defaultFreeIndexUidList.] Disable frontend-initiated indexing: If you choose to index your site using Indexing Configurations, you can disable indexing through user requests in the frontend. This is easily done via the configuration of the Indexed Search extension in the Extension Manager. [Screenshot: option "Disable Indexing in Frontend": "By default pages are indexed during viewing of pages in the [...] is only initiated through the backend page crawler." Default: 0.] Indexing files on pages separately: If enabled, links to local files found on pages will initiate indexing of those external files. However, this often has the unpleasant effect that too many files are indexed during the same page request. Using the crawler extension, you can configure the indexer to add a queue entry instead of indexing external files immediately. The indexing then happens outside the frontend user request, using the cron script. This behaviour is configured in the Extension Manager's configuration for Indexed Search. Use c
51. s like sentence. Logical AND and OR search, including syntactical recognition of AND, OR and NOT as logical keywords. Furthermore, sentences encapsulated in quotes will be recognized. Searching can be targeted at specific media, for instance searching only indexed PDF files, HTML files, Word files, TYPO3 pages, or everything. The engine is language sensitive, based on the multiple-language feature of TYPO3's CMS frontend. Searching can be performed in specific sections of the website. Results can be sorted descending or ascending and ordered by word frequency, weight, location relative to page top, page modification date, page title etc. The display of search results can be intelligently divided into sections based on the internal page hierarchy; thus results are primarily grouped by relation, then by hit relevance. This shows the full range of default options for advanced search: [Screenshot: advanced search form with the fields Search for, Match (All words, AND), Search in (All media, All languages), From section (Whole site), Order by (Weight/Frequency, Highest first), results at a time, Style (Section hierarchy) and Extended resume.] Warning: The search frontend plugin is optimized for features, not speed. In particular, it will be slow on a website with many pages in the page tree, because it traverses the whole tree each time to build a list of accessible pages. However, you can circumvent this by modifications to the search plugin so it does not chec
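The keyword and quoted-sentence recognition listed in excerpt 51 can be sketched as a small tokenizer. This is only an illustration: the function name, the default operator (AND) and the case-insensitive keyword matching are assumptions, not the plugin's actual parser.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}


def parse_query(query: str):
    """Split a search string into (operator, term) pairs.

    Quoted sentences are kept together as one term. A logical keyword
    applies to the term that follows it; terms without a preceding
    keyword default to AND (an assumption of this sketch).
    """
    pairs = []
    pending = "AND"
    # Either a quoted sentence (group 1) or a run of non-space characters (group 2)
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', query):
        if quoted:
            pairs.append((pending, quoted))
            pending = "AND"
        elif bare.upper() in OPERATORS:
            pending = bare.upper()
        else:
            pairs.append((pending, bare))
            pending = "AND"
    return pairs


parsed = parse_query('typo3 AND "indexed search" NOT pdf OR word')
```

Each resulting pair could then be translated into a lookup against the word index, with NOT terms excluding matching documents.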
52. ssage has got another title, namely the subject line of the message shown. Thus a search matching three of these five pages will not show three identical page titles, but a unique page title reflecting the actual content on each page. It is the tt_board plugin itself that sets the page title, by an API call. The only glitch here is that the tt_board plugin has falsely allowed the main page to be cached twice; see the first and last phash rows. The last row has the parameter &tt_board_uid sent, and the tt_board plugin should not have allowed that: looking at the content hash of the first and last rows, we see that it is the same hash (84186444) and therefore the same content. However, being two separate result rows, they will both be displayed in the search result as separate hits. The responsibility for this lies with the plugin. Such occurrences can be automatically filtered out during the search result display, but it is better to avoid them in the first place. The last example below has three main issues to discuss: 1. The page "Other languages" is apparently available in three languages. Which ones cannot be determined unless we know the values from the sys_language table. In this case the default language (zero, 0) is English, and the languages with id 1 and id 2 are the Danish and German versions of the page. When a search is conducted, each page may turn up as a result page, but with a little flag telling if the page was found
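Excerpt 52 notes that two phash rows with the same content hash can be filtered out when results are displayed. A minimal sketch of such a filter follows; the row layout and the field names "phash" and "contentHash" are hypothetical, and only the hash value 84186444 is taken from the excerpt.

```python
def drop_duplicate_content(rows):
    """Keep only the first result row for each content hash."""
    seen = set()
    unique = []
    for row in rows:
        if row["contentHash"] in seen:
            continue  # same content already listed as a hit
        seen.add(row["contentHash"])
        unique.append(row)
    return unique


# Hypothetical result rows: the first and last share one content hash
rows = [
    {"phash": 1, "contentHash": 84186444},
    {"phash": 2, "contentHash": 11111111},
    {"phash": 3, "contentHash": 84186444},
]
filtered = drop_duplicate_content(rows)
```

The filter only hides the symptom at display time; as the excerpt says, the plugin should avoid caching the same content under two parameter sets.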
53. t for external URLs records target.
Property / Data type / Description / Default:
search.mediaList (string): Restrict the file type list when searching for files.
search.defaultFreeIndexUidList (string): List of Indexing Configuration uids to show as categories in the search form. The order determines the order displayed in the search result.
search.exactCount (boolean): Force a permission check for every record while displaying search results. Otherwise records are only checked up to the current result page, which may cause the result counter not to print the exact number of search hits. With this setting enabled the loop is not stopped, which gives an exact result count at the cost of an obvious slowdown. See property show.forbiddenRecords for more information.
search.skipExtendToSubpagesChecking (boolean): If set to false (default), the complete page tree is traversed on each search to check which pages are accessible, so that extendToSubpages can be considered. This works with a limited number of page ids, which means most sites, but results in slow performance on huge page trees. If set to true, the final result rows are joined with the pages table to select pages that are currently accessible. This will speed up searching in very huge page trees, but on the
54. text 217 index_phash TYPES: TYPO3 page (0) 204 217 (statistics table residue). This shows that 217 pages are indexed, comprising 7000 words and using 40,000 records in the relation table to glue things together. List: TYPO3 Pages: This view shows a list of indexed pages with all the technical details. [Screenshot: "List: TYPO3 Pages" view with columns including id/type, Title, Words, mtime, Indexed, Updated and Parsetime; the listed titles include Case stories, Mitsubishi Danmark, Greensquare Antiques, FreakZone Internet Cafe and Kaspers minimalistic homepage.] Indexing configurations: In
55. [Crawler log residue listing typo3.org URLs, among them the teams, documentation and download pages.] The initial entry is http://typo3.org/, which is already processed. When this entry was executed, it also added entries for all subpages it found to the queue. When their execution time comes, the crawler will request those URLs as well, and if subpages are found on them, entries for those subpages are added, until the configured depth is reached. After a few minutes you see more entries processed, like this: [Screenshot: crawler log for "External Urls" with queue entries 701 to 712, scheduled from 17:11:01; the first entries show status OK.] In Web > Info > Indexed search, the indexed entries look like this: [Screenshot: "External urls" entries with titles including Login, Logout, About, Teams, Development and Document Library.]
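The queueing behaviour described in excerpt 55 (process an entry, add the subpages found on it, stop at the configured depth) can be sketched as a breadth-first traversal. The function name and the in-memory link map are assumptions for illustration; the crawler extension itself works on a database queue processed by a cron script.

```python
from collections import deque


def crawl(start_url, links, max_depth):
    """Return URLs in the order they would be processed.

    `links` maps a URL to the subpage URLs found on it (a stand-in for
    actually fetching and parsing the page).
    """
    queue = deque([(start_url, 0)])
    seen = {start_url}
    processed = []
    while queue:
        url, depth = queue.popleft()
        processed.append(url)
        if depth >= max_depth:
            continue  # configured depth reached: do not queue subpages
        for sub in links.get(url, []):
            if sub not in seen:
                seen.add(sub)
                queue.append((sub, depth + 1))
    return processed


# Hypothetical site structure
links = {
    "http://example.org/": ["http://example.org/a/", "http://example.org/b/"],
    "http://example.org/a/": ["http://example.org/a/x/"],
}
one_level = crawl("http://example.org/", links, 1)
two_levels = crawl("http://example.org/", links, 2)
```

The `seen` set keeps a URL from being queued twice when several pages link to it.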
56. we can be sure that the page linking to it can be selected. But we cannot be sure that the link was in a section accessible to the user. Similarly, we should make a lookup in the index_grlist table, selecting the phash/gr_list by the phash_t3 value of the section record for the search result. If this is not available, we should not display a link to the document and not show a resume, but rather link to the page, from which the user can see the real link to the document. Note: these tricky scenarios exist only if the content on a page differs based on login. They do not affect situations with access restriction to the page as a whole. A general lesson from this is to reduce the number of hidden content elements and use hidden pages instead, which is better and more reliable. Analysing the indexed data: The indexer is constructed to work with TYPO3's page structure. Unlike a crawler, which simply indexes all the pages it can find, the TYPO3 indexer must take the following into account: Only cached pages can be indexed. Pages with dynamic content, such as search pages, should supply their own search engine for lookup in specific tables. Another option is to selectively allow certain of those dynamic pages to be cached anyway (see the cHashParams concept used by some plugins). Pages in more than one language must be indexed separately, as different pages.