PDF Library SDK User Manual

Contents

1. The basic functionality of the PDF library is to read in data from PDF files, present them in structured objects, and create new PDF files to which such objects can be written. The PDF library models the contents of a PDF file by C++ classes. You may want to read Adobe's PDF specification to gain the necessary background. The PDF Library SDK supports PDF versions 1.1 (which relates to Adobe Acrobat 2.1) up to 1.6 (which comes with Adobe Acrobat 7.0).

2 Overview

The core classes of the PDF library comprise PDFile, which encapsulates a PDF file, and PDObj, which models an object in the PDF file. The content of PDF objects is reflected by a hierarchically composed value (PDValue). A value can be a dictionary (PDDictionary), an object reference, or another type like string or number. Dictionaries are collections of keys and associated values. Some objects have a data stream that belongs to them. This data is also attached to an object of class PDObj. The library contains auxiliary classes to implement input from PDF files (PDParse, PDScan); they should be of no interest to a user of the library. The basic functionality provided with PDFile and PDObj (file pdfile.h) is extended by derived classes. PDFInput is derived from PDFile, with enhancements for basically two issues: copying pages to a PDF output file and caching objects in memory. Reading and writing of PDF files from/to memory is also supported. PDFOutput is also derived from PDFile but d
2. PDF Tools AG, Premium PDF Technology
PDF Library SDK, Version 4.5, User Manual
Contact: pdfsupport@pdf-tools.com
Owner: PDF Tools AG, Kasernenstrasse 1, 8184 Bachenbülach, Switzerland, www.pdf-tools.com
Copyright 2001-2015
July 7, 2015

Table of Contents
1 Introduction
2 Overview
3 Core Classes
3.1 PDFile
    Reading from a PDF File
    Writing to a PDF File
    Memory-based Input/Output
    Standard Security Support
    Methods and Attributes
3.2 PDObj
3.3 PDValue
3.4 PDDictionary
3.5 PDFInput
3.6 PDFOutput
3.7 PDPage
3.8 PDFont
3.9 PDCopyObj
3.10 PDAnnotIterator
3.11 PDAction and Subclasses
3.12 PDAnnot, PDAnnotData and Subclasses
3.13 PDOutln
3.14 PD
3. The class PDAnnotIterator helps to retrieve annotations from pages in a convenient representation: a polymorphic object rather than a general PDValue tree. Currently, the recognition of Text and Link annotations of subtypes GoToR and Launch is supported. Each call to GetNextAnnotData retrieves an annotation and stores it in a dynamically created object according to the type of the annotation. Make sure to delete this object when it is no longer used.

3.11 PDAction and Subclasses

The PDF library supports a number of standard action classes such as GoToR (navigate to another page of a PDF file), Launch (activate another application program) and URI (web links for internet browser navigation). PDAction is an abstract base class, so you will never create objects of that class but rather deal with one of the subclasses PDLaunchAction, PDGoToRAction or PDURIAction. Objects of this type are found in conjunction with annotations or bookmarks (outlines). You can retrieve action information from a link annotation object or an outline object using the GetAction method of class PDFInput. Note that you are responsible for freeing PDAction objects created this way to avoid memory leaks.

3.12 PDAnnot, PDAnnotData and Subclasses

There are two major types of annotations in PDF: Text and Link. Link annotations consist of a variety of subtypes like GoToR, Launch or URI. The PDF library supports t
4. 6.8 pdw
6.9 pdwebl
6.10 pdsplit
7 Appendix
7.1 Things to observe
    Security
    Copying
    Memory Usage
    Multithreading
    Error Handling
    Compiling on MS Windows
    Using Different Compiler Settings
7.2 Troubleshooting
    Compilation with MSVC When Using MFC
    Text Operator Dependencies
8 Index
9 Licensing

1 Introduction

The PDF library originates from a development in early 1995. The library was designed to satisfy the requirements of the former Xerox DPP product, later called XDA (Xerox Document Assembly). Since then, more and more functionality has been added to the library. It constitutes the core of several of our own products and has been embedded into various third-party products.
5. encoding and decoding. You can construct a stream using the PutBytes method and write it to a PDF file using the Write or WriteStreamObj method. You may want to have a look at the txt2pdf sample program for this. Please note that GetLength returns the length of the uncompressed stream. The only way to get the length of the compressed stream is to write it to a file, because only then is the actual compression done. The method ReplaceFontName is useful to patch font references in a text stream.

3.16 PDPgStream

Class PDPgStream is an extension of PDStream with support for the construction of page contents streams. The declaration of this class is located in pdstream.h. When starting a new stream that should contain text, use the TextDefaults method to reset text-related characteristics like gray level, character spacing and word spacing. When mixing text and graphics, you need to switch modes in a PDF stream. For this purpose there are two methods: NeedTextMode and NeedDrawMode. The text-related methods automatically call NeedTextMode, while graphics-related methods call NeedDrawMode. For an in-depth description of the stream operators, refer to the Adobe PDF specification.

3.17 PDFontDict

This class makes font information accessible to text scanning in contents streams. The implementation knows about the following standard fonts: Helvetica, Helvetica-Bold, Times-Roman, Times-Italic, Times, ZapfDingbats, Symbol, Arial
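The mode switching described for PDPgStream can be pictured with a small stand-alone sketch. ContentSketch below is a hypothetical stand-in, not the SDK class: it only borrows the names NeedTextMode and NeedDrawMode from the manual and assumes that "text mode" corresponds to being inside a BT ... ET text object. Font selection and string escaping are deliberately left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in (not the SDK's PDPgStream): tracks whether the
// content stream is currently inside a text object (BT ... ET).
class ContentSketch {
public:
    // Enter text mode if necessary; BT is emitted only on a real switch.
    void NeedTextMode() {
        if (!inText_) { out_ += "BT\n"; inText_ = true; }
    }
    // Leave text mode if necessary; ET is emitted only on a real switch.
    void NeedDrawMode() {
        if (inText_) { out_ += "ET\n"; inText_ = false; }
    }
    // Text-related call: switches to text mode automatically.
    void ShowText(const std::string& text) {
        NeedTextMode();
        out_ += "(" + text + ") Tj\n";
    }
    // Graphics-related call: switches to draw mode automatically.
    void StrokeRect(int x, int y, int w, int h) {
        NeedDrawMode();
        out_ += std::to_string(x) + " " + std::to_string(y) + " " +
                std::to_string(w) + " " + std::to_string(h) + " re S\n";
    }
    // Close any open text object and return the accumulated operators.
    std::string Finish() {
        NeedDrawMode();
        assert(!inText_);
        return out_;
    }
private:
    std::string out_;
    bool inText_ = false;
};
```

Because every text or graphics call requests its own mode, the caller never has to emit BT or ET by hand; consecutive text calls share one text object.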
6. FALSE (0) if the index runs out of range.

3.5 PDFInput

The main purpose of the class PDFInput is to selectively copy pages from the input file to an output file. It allows the modification of the pages on the fly. This is supported by an object cache that is also incorporated into PDFInput. Objects can be acquired selectively for alteration before the standard copy routine handles the page. During the copy, the objects that are kept in the cache are used rather than the original ones that would be read into memory from the input file. The declarations for PDFInput are located in the header file pdpage.h. The CopyTo method works in conjunction with ReadPages, OnReadPage and OnReadPages. The latter methods contain the code that actually deals with copying. This means that you cannot use PDFInput as it is to simply traverse the pages tree of a file and NOT copy pages to another file. To do that, you can derive a class from PDFInput in which you override ReadPages, OnReadPage and OnReadPages. The sample program pdcat uses PDFInput to copy pages while doing some modifications to them.

How does PDFInput work?

PDFInput incorporates a cache of objects that have been read using its GetObj method. GetObj first looks at the cache (implemented by m_objOnHold); if the object is there, a pointer to it is returned. Otherwise the object is read from the file and
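The lookup order described for GetObj (cache first, file second) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: Obj and CachingInputSketch are hypothetical types, the "file read" is simulated, and the sketch assumes that a freshly read object is placed in the cache, which the excerpt above does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; none of these types are the SDK's own.
struct Obj {
    int id;
    std::string value;
};

class CachingInputSketch {
public:
    // Return the cached object if present; otherwise "read" it and keep
    // it on hold. The cache owns the object, so callers must not delete it.
    Obj* GetObj(int id) {
        assert(id > 0);
        auto it = onHold_.find(id);
        if (it != onHold_.end()) return it->second.get();
        auto obj = std::make_unique<Obj>(ReadFromFile(id));
        Obj* raw = obj.get();
        onHold_[id] = std::move(obj);
        return raw;
    }
    // Drop one object, or all objects, to give heap space back.
    void ReleaseObj(int id) { onHold_.erase(id); }
    void ReleaseAll() { onHold_.clear(); }

    int reads() const { return reads_; }
    std::size_t cached() const { return onHold_.size(); }

private:
    // Simulated file access; counts how often the "file" is touched.
    Obj ReadFromFile(int id) {
        ++reads_;
        return Obj{id, "object " + std::to_string(id)};
    }
    std::map<int, std::unique_ptr<Obj>> onHold_;
    int reads_ = 0;
};
```

A change made through the returned pointer is seen by every later GetObj call for the same id, which is the property that lets a page be altered before the copy routine handles it. After ReleaseObj, the next GetObj goes back to the file and the change is gone.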
7. a convenient way to remember objects you want to write to the PDF file for which you do not have everything ready. This is the case for link annotations to pages whose id is not known yet, if you want to use the id for the destination (which is more efficient and also safer than using the page number).

3.7 PDPage

The class PDPage is derived from PDObj and incorporates functionality related to Page or Pages objects. The following features are related to these objects:
- adding a content object to add text or graphics to a page
- removing an entry from the page's dictionary, e.g. to strip off the annotations
- adding an annotation to the page
- adding a font to the page's resources, which is required if that font is used in a content of the page
- adding an XObject to the page's resources
- finding the object in the pages tree that contains the MediaBox definition that applies to a page
- getting the rectangle of the media box that applies to a page
- setting the media box rectangle of the page (adding it if it is defined elsewhere, or changing it)
- remembering the parent object
- removing a page or sub-tree of pages from a Pages object

To obtain objects of class PDPage rather than PDObj, you must use the PDFInput or PDFOutput constructor, unless you do a CopyTo. The m_template member of PDFile cannot be set directly to a PDPage object
8. derive your own class to do this.

3.8 PDFont

To create a page content with text, you need to refer to a font declaration. The class PDFont, which is an extension of PDObj, provides this support for the built-in fonts like Helvetica, Times or Courier. A typical scenario for using PDFont is:

    PDFont font;
    font.Create("FX1", "Helvetica");
    font.Write(<output file>);

In this sample, the object id for the font object is created during the Write method. An alternative is to create an object id first and then pass it as the third parameter to Create. The SetEncoding methods permit you to set one of the standard built-in encodings, or to set a user-defined encoding by referring to another PDF object (one with /Type /Encoding and a /Differences entry; see the txt2pdf sample). The PDFont object can be deleted after Write. Reuse of the PDFont object to create and write several fonts is discouraged.

3.9 PDCopyObj

The class PDCopyObj is a helper class that extends the base class PDAttrScan to support the copying of an object tree from an input file to an output file. It is used, for example, in the context of the CopyTo method of PDFInput to copy everything belonging to a page. In the sample pdcat there is an example where PDAttrScan is derived not only to do the copy job but also to patch certain items on the fly.

3.10 PDAnnotIterator
9. see pdxt, but also add link annotations and bookmarks according to directives from a separate input file.

6.5 pdtoc

The pdtoc utility creates a PDF file that contains a page with a list of links to files specified on the command line. There are many options to control the behaviour, like bookmark copying, placing the creation date of the file onto the page, setting the page width, setting a title string on top of the page, and giving a document title to the new file. pdcat and pdtoc can be used to build a contents document for a whole hierarchy of documents.

6.6 pdxt

The pdxt program demonstrates how a background logo can be added to some pages of a PDF document. The logo is converted into an XObject, and a content that refers to the XObject is added on the desired pages. Its functionality is now also integrated in pdcat.

6.7 txt2pdf

The txt2pdf program demonstrates the creation of a PDF file based on ASCII text input. It uses PDPgStream to compose the contents stream.

6.8 pdw

This program demonstrates how text tokens can be retrieved from a contents stream along with some metrics information like position, size and orientation.

6.9 pdwebl

The pdwebl program demonstrates how textual content analysis of an existing PDF file can be used to add internet links at the location of selected text pieces. There are s
10. settings, resulting in access violations. This is probably due to different storage allocation of CString objects. Thus, make sure you are using the correct PDAFX(D) DLL.

7.2 Troubleshooting

Compilation with MSVC When Using MFC

Because of a strange feature (bug) of MSVC, you cannot use precompiled headers when including pdfile.h. The statement that causes trouble is:

    #ifdef _AFXDLL
    #include <afx.h>

You can edit pdfile.h and replace the whole #ifdef part by:

    #include <stdafx.h>

But be sure to use the AFX version of the library.

Text Operator Dependencies

Adobe introduced a new restriction on text operators with Version 3.01. In order to print correctly on PostScript printers, the Tc and Tw operators must not be issued before a font has been set using Tf. The sample txt2pdf has been updated accordingly.
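The ordering rule for text operators (Tf before Tc and Tw) is easy to respect if the text state is always emitted by one routine. The following is a minimal sketch, not code from txt2pdf: TextRun is a hypothetical helper, the font resource name is whatever the page's resources define, and escaping of special characters in the shown string is omitted.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the SDK): builds one text object and
// always selects the font with Tf before setting character spacing (Tc)
// and word spacing (Tw).
std::string TextRun(const std::string& fontResource, double fontSize,
                    double charSpacing, double wordSpacing,
                    const std::string& text) {
    assert(fontSize > 0);
    std::ostringstream out;
    out << "BT\n";
    out << "/" << fontResource << " " << fontSize << " Tf\n";  // font first
    out << charSpacing << " Tc\n";                               // then spacing
    out << wordSpacing << " Tw\n";
    out << "(" << text << ") Tj\n";
    out << "ET\n";
    return out.str();
}
```

For example, TextRun("FX1", 12, 0.5, 2, "Hello") yields a text object whose first operator after BT is Tf, so the spacing operators never precede the font selection.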
11. Memory-based Input/Output

The PDF Library SDK also supports reading or writing PDF files from/to a memory buffer. If you choose, for example, to store a PDF file as a blob in a database, you can retrieve it to a memory buffer and open it using PDFile::MemOpen. Another use case is when you prefer to work with memory-mapped files. A web server application may not want to create the PDF file in the file system, but rather pipe the PDF file back to the browser in response to a CGI or servlet request. In this case, the output can be generated into a memory buffer by using the PDFile::MemCreate function. Note that you must Close the file to complete the output buffer. After that, you can use MemBuffer and MemLength to refer to the output buffer. The space for the output buffer is managed by the PDFile object and will be freed in the destructor of the object.

Standard Security Support

Support for standard security, based on the encryption technique described in the Adobe PDF specifications, is optional. This means that the API calls are present, but only functional with the corresponding code module contained in the library. The functionality dealing with security is encapsulated in the classes PDFile and PDObj. The PDFile::SetUserPassword and PDFile::SetOwnerPassword methods are used to provide password information after opening or creating a PDF file. The security flags are accessed via PDFile::PermissionFlags. Since string an
12. Arial Bold, Courier. Other fonts contained in PDF files should contain a Widths attribute; PDFontDict will retrieve font metrics from there.

3.18 PDTextState

This class stores state information from text scanning, which is necessary to accurately compute the width of a text token.

3.19 PDTextToken

An object of the class PDTextToken contains the results from text scanning as performed by PDTextScanner (see below). It stores the text token string, its position in standard PDF coordinates, the font size (which corresponds to the height of the token on the page), the width of the text token, and its orientation. The orientation is relative to the coordinate system; if there is a Rotate entry in the Page dictionary, it differs from the visual orientation when the page is displayed. This can typically be the case when pages are printed in landscape format.

3.20 PDTextScanner

The class PDTextScanner permits you to find text tokens on a PDF page. The behaviour can be controlled to some extent via the method BreakOnBlank. The default behaviour is to provide tokens that consist of as many characters as can obviously be retrieved from the stream. Whenever there is a change in font, or a stream operator is found that sets the text pointer, the token ends. When BreakOnBlank is set, tokens will be broken down into p
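The effect of BreakOnBlank, splitting a token wherever the horizontal gap between two characters exceeds roughly the width of a space, can be illustrated without the SDK. The sketch below is an assumption-laden model, not the scanner's algorithm: Glyph and SplitOnGaps are hypothetical, glyph positions are taken as already known, and the threshold is a plain "greater than one space width" test.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical glyph record: character, left edge, and advance width,
// all in the same (arbitrary) unit.
struct Glyph {
    char ch;
    double x;
    double width;
};

// Split a run of glyphs into tokens wherever the gap between two
// neighbouring glyphs is larger than one space width.
std::vector<std::string> SplitOnGaps(const std::vector<Glyph>& run,
                                     double spaceWidth) {
    assert(spaceWidth > 0);
    std::vector<std::string> tokens;
    std::string current;
    for (std::size_t i = 0; i < run.size(); ++i) {
        if (i > 0) {
            // Gap = left edge of this glyph minus right edge of the previous one.
            double gap = run[i].x - (run[i - 1].x + run[i - 1].width);
            if (gap > spaceWidth && !current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        }
        current += run[i].ch;
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

With four glyphs of width 5 placed at x = 0, 5, 14 and 19 and a space width of 3, the only gap above the threshold lies between the second and third glyph, so the run splits into two tokens of two characters each.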
13. DF library without MFC and still have CString objects available (as on UNIX platforms), based on the CString subset implemented in the PDF Library.

Release Library   Debug Library   Using MFC   Type     Encryption support   Thread model
PDAFX             PDAFXD          Yes         DLL      No                   Multithreaded
PDLIB             -               No          Static   No                   Single
PDAFXE            PDAFXD          Yes         DLL      Yes                  Multithreaded
PDLIBE            -               No          Static   Yes                  Single

If you have a source code license and want to compile the library with MSVC, the macros _AFX and _AFXDLL will control whether CString comes from MFC or not. The compiler macro _WINDLL will control whether export directives are generated to make the API classes available to the linker.

Using Different Compiler Settings

You may encounter problems when using special compiler options to build an application using the PDF library in binary form. There are some precautions for this when using MS Visual C++ and packing options. However, there are cases where no simple solution exists. If the linker complains about missing functions that are inlines, the problem is probably that you are compiling with the debugging option enabled but linking to a PDF library archive that was compiled with debugging off. So make sure you use corresponding settings (check whether there is a debug version of the PDF library to link with in this case). A problem has been found when using PDAFX with MFC: CString objects may be passed between code with different DEBUG
14. XObj, PDXSource
3.15 PDStream
3.16 PDPgStream
3.17 PDFontDict
3.18 PDTextState
3.19 PDTextToken
3.20 PDTextScanner
4 Classes of PDPTDoc Module
4.1 PTInputDoc
4.2 PTPrintDoc
4.3 PTFontRsc
4.4 PTFontEntry
4.5 [illegible]
4.6 PTAnnotStore
4.7 [illegible]
4.8 PDEnhancedTextScanner
5 Linearization
6 Sample Applications
6.1 pdls
6.2 pdinfo
6.3 pdobj
6.4 pdcat
6.5 pdtoc
6.6 pdxt
6.7 txt2pdf
15. 7 Appendix

7.1 Things to observe

Security

PDF files can be encrypted to provide security features. The PDF Library SDK supports Standard PDF security as described in the Adobe PDF specifications.

Copying

PDFile and PDObj objects, and objects of derived classes, cannot be copied; the copy constructor is made private to prevent you from doing this. If you write functions that take PDFile parameters, pass these parameters by reference.

Memory Usage

Keeping many objects in memory requires heap space. Try to free objects that you do not need any more. If you have to process all pages of a file, use the recursive traversal of the ReadPages method. If you use PDFInput::GetObj, make sure to apply ReleaseObj or ReleaseAll if you are dealing with large files. When the files are always small, there is no problem. Try to avoid memory leaks. Whenever you use a method that returns a pointer, make sure you know whose responsibility it is to free the data again. PDFInput::GetObj keeps the data in a cache, and you may not free the data yourself. On the other hand, when extracting annotation data from a page using PDAnnotIterator, this data is not cached by the PDF library and it is your responsibility to free it.

Multithreading

The PDF library is thread-safe in the sense that multiple threads are allowed to concurrently access distinct objects/files. It is also possible for the application to synchronize access to PDF ob
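The copying rule above (no copies, pass by reference) can be shown with a stand-alone class. FileSketch is a hypothetical stand-in, not PDFile; the manual describes a private copy constructor, while this sketch uses the equivalent modern idiom of deleting the copy operations.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a file-like class that must not be copied.
class FileSketch {
public:
    explicit FileSketch(std::string name) : name_(std::move(name)) {
        assert(!name_.empty());
    }
    // Copying is disabled, in the spirit of a private copy constructor.
    FileSketch(const FileSketch&) = delete;
    FileSketch& operator=(const FileSketch&) = delete;

    const std::string& name() const { return name_; }
    void NoteWrite() { ++writes_; }
    int writes() const { return writes_; }

private:
    std::string name_;
    int writes_ = 0;
};

// Takes the file by reference, so no copy is ever attempted and the
// caller's object is the one that gets modified.
void WriteSomething(FileSketch& file) {
    file.NoteWrite();
}

// Compile-time confirmation that the class really cannot be copied.
static_assert(!std::is_copy_constructible<FileSketch>::value,
              "FileSketch must not be copyable");
```

A function declared as `void WriteSomething(FileSketch file)` would fail to compile at the call site, which is exactly the protection the private copy constructor is meant to give.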
16. are an aggregation of keys and associated values. Some common keys are predefined in the PDF library; in general, there is no limitation on keys, and the library handles this dynamically. To gain access to the value associated with, e.g., the Length key, you would use either

    PDDictionary* pDict;
    PDValue* pVal = pDict->GetAttrVal(PDDictionary::aLength);

or

    PDValue* pVal = pDict->GetAttrVal("Length");

To add another entry to an existing dictionary, you write the following code:

    pDict->SaveAttrVal("Author", pVal);

Keys are unique in a dictionary; if you apply SaveAttrVal to a dictionary with a key that already exists, the previous value is deleted and the new value is stored. Note that the value pointer that you pass is stored in the dictionary and that the dictionary object receives control over the value object. Before storing a value, you must allocate it using the new operator, and you may not delete it any more. You can delete the dictionary object, and this will automatically delete any values stored in it. The DeleteAttr method deletes an entry from a dictionary. ChangeName allows you to change a specific key in the dictionary; this is more efficient than deleting and adding it again. You will hardly need this feature; it is used in one special case in the PDF library. To traverse all keys and corresponding values in a dictionary, you use GetVal. The fpPos parameter works like an index; it starts at 0. GetVal returns
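The ownership rules for dictionary values (the dictionary takes control of a value allocated with new, and storing under an existing key deletes the previous value) can be modelled in a few lines. DictSketch and ValueSketch are hypothetical stand-ins; they reuse the method names from the manual, but their signatures and the nullptr result for a missing key are assumptions of this sketch, not documented SDK behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical value type that counts live instances, so that the
// deletion behaviour can be observed.
struct ValueSketch {
    explicit ValueSketch(std::string t) : text(std::move(t)) { ++alive; }
    ~ValueSketch() { --alive; }
    ValueSketch(const ValueSketch&) = delete;
    ValueSketch& operator=(const ValueSketch&) = delete;
    std::string text;
    static int alive;
};
int ValueSketch::alive = 0;

// Hypothetical dictionary stand-in (not the SDK's PDDictionary).
class DictSketch {
public:
    // Takes ownership of a value allocated with new. A value already
    // stored under the same key is deleted and replaced.
    void SaveAttrVal(const std::string& key, ValueSketch* value) {
        assert(value != nullptr);
        entries_[key].reset(value);
    }
    // Returns the stored value, or nullptr (this sketch's choice) if absent.
    ValueSketch* GetAttrVal(const std::string& key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : it->second.get();
    }
    // Removes an entry and deletes its value; true if the key existed.
    bool DeleteAttr(const std::string& key) { return entries_.erase(key) > 0; }
    std::size_t size() const { return entries_.size(); }

private:
    std::map<std::string, std::unique_ptr<ValueSketch>> entries_;
};
```

Saving twice under the same key leaves one entry and one live value; destroying the dictionary deletes whatever it still holds, so the caller never deletes a stored value itself.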
17. 9 Licensing

The PDF Library SDK is copyrighted. This user's manual is also copyright protected; it may be copied and given away provided that it remains unchanged, including the copyright notice.
18. ble because traversal starts at the root object and recursively goes down to the leaves of the tree. When a leaf or sub-tree that has to be omitted is found, all nodes up to the root are present on the stack and are linked via the m_parent member of PDPage. Please note that CopyTo requires objects to be of class PDPage or something derived from that. As an alternative to the CopyTo method, you can use CopyFew. This method does not traverse the whole pages tree but rather descends the tree directly to an arbitrary page (or several arbitrary pages) to copy it. CopyFew is therefore appropriate for extracting some pages from a large document. Please be aware of a conceptual problem when copying only a range of pages: it is possible that these pages contain link annotations which refer to pages that are not copied. It is up to the PostCopyPage method to remove such annotations. If the page contains form fields that should be copied, there is a possible problem of having more instances of that field on pages that are not copied. The AcroForm dictionary must therefore be reconstructed. This is not yet automatically supported by the PDF library.

3.6 PDFOutput

The class PDFOutput is a rather tiny extension of PDFile. It stores objects of class PDStoredObj until after all other objects have been written to the output file. By overriding the WriteContents method of PDFile, PDFOutput triggers the output of the stored objects at this moment. You would use stored objects as
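The recursive traversal with parent links that this excerpt describes can be sketched on a toy tree. NodeSketch, AddKid, Depth and CollectPages are hypothetical names; the sketch only assumes that inner nodes stand for Pages objects, leaves stand for Page objects, and each node knows its parent, as m_parent does in PDPage.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node of a pages tree: inner nodes hold kids, leaves are pages.
struct NodeSketch {
    int id = 0;
    bool isLeaf = false;
    NodeSketch* parent = nullptr;  // link back towards the root
    std::vector<std::unique_ptr<NodeSketch>> kids;
};

// Append a child and wire up its parent link.
NodeSketch* AddKid(NodeSketch& parent, int id, bool isLeaf) {
    assert(!parent.isLeaf);
    auto kid = std::make_unique<NodeSketch>();
    kid->id = id;
    kid->isLeaf = isLeaf;
    kid->parent = &parent;
    parent.kids.push_back(std::move(kid));
    return parent.kids.back().get();
}

// Number of ancestors, found by following parent links up to the root.
int Depth(const NodeSketch& node) {
    int depth = 0;
    for (const NodeSketch* p = node.parent; p != nullptr; p = p->parent) ++depth;
    return depth;
}

// Recursive descent from the given node down to the leaves. Any leaf or
// sub-tree whose id equals skipId is omitted from the result.
void CollectPages(const NodeSketch& node, int skipId, std::vector<int>& out) {
    if (node.id == skipId) return;
    if (node.isLeaf) {
        out.push_back(node.id);
        return;
    }
    for (const auto& kid : node.kids) CollectPages(*kid, skipId, out);
}
```

When the recursion reaches a node that has to be omitted, every ancestor of that node is still on the call stack and reachable through the parent links, which is what makes pruning at that point straightforward.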
19. ct unless you have obtained written permission from PDF Tools AG for this. All of the utilities print out a usage message when run with no arguments.

6.1 pdls

The pdls utility lists information about the pages tree of a PDF file. It can also print out the contents streams of the file.

6.2 pdinfo

The pdinfo program writes the entries of the info object and some important ids to standard output.

6.3 pdobj

The pdobj utility dumps the objects whose id is specified on the command line to standard output. To find out the id of a particular page, you would first use pdls. When you specify a file name only, pdobj will print the info and Catalog objects. When the s option is specified, pdobj will also print stream contents.

6.4 pdcat

The pdcat utility demonstrates how a number of files can be concatenated to a single PDF file. This program can also add bookmarks related to each of the input files; it can even copy existing bookmarks from the input files into the output file. The pdcat sample also demonstrates a simple manipulation of page contents. When the clip option is specified on the command line, the corresponding rectangle is clipped on each page (actually only on the first content of the page, but usually there is only one content). With release 1.4, pdcat now incorporates a lot more functionality. It can add a logo
20. d a variety of settings that affect its appearance.

5 Linearization

Linearization is implemented in basically two new classes: PDLInput and PDLOutput. The input class performs the analysis of an existing PDF file, while the output class handles the linearization-specific output. The linearization classes are extensions of the PDFile class. The use of the linearization classes is demonstrated in the pdlin command line application. Functional extensions are possible but should be implemented very carefully. You can override the PDLOutput::OnWriteObj method to add or suppress the standard optimization features. These are:
- removal of dictionary entries in Pages objects that have been copied to the Page leaves
- compression of uncompressed streams (based on the presence of a Filter entry in the dictionary)
- removal of references to objects not stored in the PDF file

6 Sample Applications

The sample applications are actually very useful utilities that demonstrate the power of the PDF Library SDK. Please note that these utilities are copyright protected. You can use them for your own purposes, and you can copy parts of the code to incorporate them into a product that you develop with the PDF Library SDK. However, your product must be significantly different from these utilities, and you may not incorporate the utilities into your produ
21. d stream output is encrypted in secured files, you have to use the specific methods designed for these data types. PDFile::WriteEncoded will encrypt the data and then encode it. If you have used previous versions of the PDF library, you will have to replace calls like PDFile::WriteString("some string data") by PDFile::WriteEncoded("some string data"). PDF data is usually read via a PDObj object. This class has methods to facilitate encryption for output and decryption for input, such as:
- DecodeString, EncodeString
- DecryptStream, EncryptStream
- DecryptValue, EncryptValue

The data of a PDObj can be either decrypted (plain text) or encrypted, and care should be taken not to confuse these states. The PDObj::Read method will read in the data from the file and leave it encrypted. All other methods providing PDObj or PDPage objects will automatically decrypt the data. The PDObj::Write method will automatically encrypt the data.

Methods and Attributes

The class definition of PDFile is located in the file pdfile.h. It contains comments for the methods and attributes that may be of interest to an application programmer. The destructor of PDFile takes care to free any dynamic memory associated with the PDFile object (m_template, m_idMap, m_index, m_parent, m_threadArr) and closes the file to free the file handle. The close met
22. esigned for enhancements that apply to output to a PDF file. Note that the PDF library does not permit input and output at the same time to the same file. There is no updating of existing files as the PDF standard would permit. A file that is written to is always created from scratch. PDPage is a class derived from PDObj that models more precisely the behaviour of page objects. It is related to PDFInput, since PDFInput requires objects to be of this class for the CopyTo functionality. PDPage offers several enhancements over PDObj, like adding contents, annotations, fonts or XObjects. Retrieval of page-related information items is also supported. Support for transforming a page from an input file into an XObject that can be used for output is included in pdxobj.h through the classes PDXObj and PDXSource. Outlines (i.e. bookmarks) can be constructed and added to an output file. This support is found in pdoutln.h. Streams are used to carry many different kinds of data, notably the contents of a page. If you need access to an encoded contents stream, or if you would like to place text on a page, you use the classes PDStream or PDPgStream (pdstream.h).

3 Core Classes

3.1 PDFile

The class PDFile models a PDF file that is either being read from or being written to. It is not possible to alter an existing PDF file on disk, nei
23. everal issues that make this interesting. Many applications that produce PDF create small fragments of text that must be reassembled. The reassembly is based on heuristics of geometric placement. Use of multi-column text can make the correct text assembly very difficult. pdwebl assembles the text of a line before matching is applied. If a pattern spans the end of a line, it will not be recognized. Often it is desired that links are visualized in some way. Acrobat can add a border to the box that represents the link. This box is not visible on a printout. It is also possible to change the content of the page to reflect the presence of a link, e.g. by changing the color of the text or by adding a line below the text. All this requires a programming effort and will affect the printout. By the way, pdwebl also shows how memory-based PDF files can be handled. Depending on the option settings, it reads from standard input into a memory buffer and passes this to the PDF library. Output can also be collected in a memory buffer and then written to, e.g., standard output.

6.10 pdsplit

The pdsplit program demonstrates how link annotations can be changed on the fly when splitting a PDF file into several output files. This program has been developed to prepare PDF files for a web server application which counts access to individual pages of the PDF files.
24. he recognition of these types and subtypes of annotations by parsing the PDF objects containing such annotations. There is also support for constructing annotations and placing them on pages while resolving forward references to pages that are not yet created. Class PDAnnotData is the base class of all annotation types. PDAnnot serves to intermediately store annotation data to be written to a PDF file once the references to linked pages can be resolved, which is when the output file is about to be closed. So you will obtain PDAnnotData from parsing an input file, e.g. by using PDAnnotIterator::GetNextAnnotData. Objects of class PDAnnot have to be created by you. You will typically attach these annotation objects to a particular page using PDPage::AddAnnotation. Do not call AddAnnotation more than once for a particular PDAnnot object.

3.13 PDOutln

There is support for outlines (bookmarks) through the classes PDOutln, PDOutlineTree and PDOutlineNode (header file pdoutln.h). You can construct the outline tree using the AppendKid method, which is overloaded to generate actions of one of the subtypes described above. The method AppendTree moves a whole outlines tree from an input file to the output file.

3.14 PDXObj, PDXSource

These two classes provide the functionality to, e.g., add a logo on pages of a PDF f
25. hod frees m_parent, m_idMap, file handle, m_index, m_threadArr.

3.2 PDObj

Everything contained in a PDF file, except header and trailer, is a hierarchy of objects. The origin of all objects is the root object. PDObj objects carry their object id in the m_id attribute. The information contained in the object is stored in the value part, a protected attribute that you access using AttrVal. Some objects have stream data; this data is attached to the value attribute (see PDValue below). The class PDObj encapsulates all kinds of these objects. It discerns two specific types of objects that make up the pages of the document; the other object types are handled generically. The type of an object is stored in the m_kind attribute. This attribute is actually determined from the value of the object according to the Type entry in the dictionary. Setting m_kind has no effect; it is just an indication for the efficient traversal of the pages tree.

3.3 PDValue

The PDValue class models all possible variants of simple or aggregated data that make up the information contained in an object, either at the root level or in an aggregate part of it. The basic data types are object references, names, numbers and strings. An object reference is something like 1 0 R; a name is e.g. /Page in a dictionary like << /Type /Page >>; a number is an integer as in << /Length 59 >>; and a string example is <
26. ieces whenever there is more space between two characters than about a space's width. You should preferably use the class constructor that accepts a PDPage parameter, because PDTextScanner can then find the font information required. We have found PDF files that contain streams that are broken down over several contents objects. Parsing requires that these streams are concatenated again. The sample program pdw demonstrates the use of these features.

4 Classes of PDPTDoc Module

The PDPTDoc module (file pdptdoc) contains the classes that make up the so-called Prep Tool Suite component (PT). The main features of this module are content analysis, content assembly and dealing with Acrobat form fields.

4.1 PTInputDoc

This class enhances the class PDFInput in several ways. It
• supports reference counting for COM support
• permits adding, modifying or deleting form fields
• gives access to various objects like fonts, page content, document and page attributes, etc.
PTInputDoc cooperates with the other classes of the module as described below.

4.2 PTPrintDoc

This class adds functionality to PDFOutput for
• page content construction in cooperation with PTPrintPage
• filling in form data
• copying pages from existing PDF files
• copying bookmarks from existing files
• adding bookmarks and links
• creating image
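The splitting rule mentioned at the start of this excerpt (a new piece begins whenever the gap between two characters exceeds roughly the width of a space) can be sketched as follows. This is not the PDTextScanner code: Glyph, splitByGaps and the exact comparison against spaceWidth are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical positioned character; NOT a type of the PDF library.
struct Glyph {
    char ch;       // the character
    double x;      // left edge
    double width;  // advance width in the same units as x
};

// Split a run of glyphs into pieces wherever the horizontal gap between
// two neighbours is larger than spaceWidth (an assumed threshold).
std::vector<std::string> splitByGaps(const std::vector<Glyph>& glyphs,
                                     double spaceWidth) {
    assert(spaceWidth >= 0.0);
    std::vector<std::string> pieces;
    std::string current;
    for (std::size_t i = 0; i < glyphs.size(); ++i) {
        if (i > 0) {
            const Glyph& prev = glyphs[i - 1];
            double gap = glyphs[i].x - (prev.x + prev.width);
            if (gap > spaceWidth) {
                pieces.push_back(current);  // close the piece before the gap
                current.clear();
            }
        }
        current += glyphs[i].ch;
    }
    if (!current.empty()) pieces.push_back(current);
    return pieces;
}
```

The space width depends on the font in use, which is why a scanner needs access to the font information of the page before it can apply such a rule.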
27. ile. The PDXObj encapsulates the XObject to be placed in the new PDF to be written, and PDXSource contains the functionality to extract the information for the XObject from the page of a PDF file. There are a number of issues in this context.

In PDF 1.1, XObjects were not allowed to refer to other XObjects. The method HasXObjects was useful to detect that problem. In PDF 1.2 this is no longer a restriction.

Until version 1.4 of the PDF library, the contents stream of the page from which the XObject is retrieved had to be uncompressed, because some modifications must be made to it. The method HasEncodedStreams was useful to detect that problem. With the current release of the PDF library this restriction no longer applies (actually only LZW and FlateDecode are supported, but we have never found any other compression types applied to contents streams).

XObjects must be given a name that is unique within the scope of the page resources. Potential conflicts may come either from XObjects contained in the logo file or from such objects already contained in the PDF file to be enhanced with the logo. It may not be easy to check all pages of that file first in order to determine a new unique name for the XObject.

To make an XObject visible (i.e. to add it to the page of an input file to produce an output file), you have to add suitable directives to the contents stream. The sample programs pdxt and pdcat demonstrate how to do that.

When placing a logo
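The requirement that an XObject name be unique within the page resources can be met with a simple search, sketched below. The function uniqueXObjectName and the "Logo" prefix are hypothetical; the sketch assumes the caller has already collected the names used by the logo file and by the target pages into one set, which is exactly the step the text warns may be laborious.

```cpp
#include <cassert>
#include <set>
#include <string>

// Return a name of the form <prefix><n> that does not occur in `used`.
// `used` is assumed to hold every XObject name visible in the page resources.
std::string uniqueXObjectName(const std::set<std::string>& used,
                              const std::string& prefix = "Logo") {
    assert(!prefix.empty());
    for (unsigned n = 0;; ++n) {
        std::string candidate = prefix + std::to_string(n);
        if (used.count(candidate) == 0) return candidate;  // first free name
    }
}
```

Choosing a prefix that is unlikely to appear in ordinary documents reduces the chance of a clash on pages that were not inspected, but it does not remove it.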
28. jects between several threads. Thread safety is not ensured for error output, however, which is disabled by default anyway.

Error Handling

When the PDF library encounters unexpected situations, it can print an error message to standard error or to a file (see the PD_ERROR macro definition in pdimpl.h). Error output is controlled via the pd_set_error_output function (see pdimpl.h). Error logging is not thread safe. When an unexpected situation is encountered within functions that return a pointer result, NULL (0) is returned. This is also the case when the result is an OBJID, because 0 (zero) is not a valid object identification. When a PAGENR is returned, a value less than zero means an error, because 0 is a valid page number (page numbering starts at zero). In the context of a PDFile object, the error code m_err is set.

Compiling on MS Windows

As of V2.0, MSVC 1.52 (WIN16) is no longer supported. The binary release for Windows systems is compiled with MSVC 6.0. There are several variants of how the library is built, depending on
• whether it is used with or without MFC
• whether it is to be linked statically or as DLL
• whether it is to be used with the multithreaded Win32 libraries or not
• debug setting
When the PDF Library SDK is used together with MFC, the MFC implementation of CString is used. It is possible to use the P
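The three return-value conventions from the Error Handling section (null pointer, object id zero, negative page number) can be captured in small helper predicates. This is only a sketch: the typedefs below are stand-ins with assumed underlying types, since the real OBJID and PAGENR definitions live in the library headers and are not shown in this excerpt, and the helper names are hypothetical.

```cpp
#include <cassert>

// Stand-in typedefs; the underlying types are assumptions for this sketch.
typedef unsigned long OBJID;
typedef long PAGENR;

// 0 is never a valid object identification.
inline bool isValidObjId(OBJID id) { return id != 0; }

// Page numbering starts at zero, so only negative values signal an error.
inline bool isValidPageNr(PAGENR nr) { return nr >= 0; }

// Functions returning a pointer signal failure with a null pointer.
template <typename T>
inline bool isValidPtr(const T* p) { return p != nullptr; }
```

The asymmetry is the point to remember: comparing a page number against zero in the way one checks an object id would wrongly reject the first page.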
29. < /Title (De bello gallico) /Author (Julius Caesar) >>. The numerical data is stored in the m_num attribute, but also as a string in m_string. Aggregate types are arrays and dictionaries. Arrays are implemented as linked lists of PDValue objects using the m_nextEl attribute. The m_num attribute of the array object contains the number of elements in the array. Note that array elements can be any basic data type or a dictionary. Starting with V1.4, array elements can also be arrays. In this case, make sure to use the access methods GetFirstEl and GetNextEl. The behaviour with respect to the member variable m_nextEl has been preserved for compatibility with earlier versions of the library. For a description of dictionaries, please refer to the next section.

Instances of the class PDValue can store a PDF stream, e.g. in the case of Contents objects. In this case they contain a dictionary, which itself contains a Length key and possibly Filter keys. To construct such a class instance you can use the method AssignStream. This method will automatically set the Length key in the dictionary (make sure m_dict has been initialised before). It does not set or remove any encoding entries in the dictionary. Make sure these entries are set to correspond to the contents of the stream that you assign.

3.4 PDDictionary

Dictionaries
30. new identification using the CreateObj method. It should not contain any object references inside. If it is related to other objects that come from the same input file, i.e. if it is referenced from such objects or itself refers to such objects, you want to use the id adoption mechanism supplied in the PDF library. You have to replace the object id and all references it contains using the Adopted method. The PDCopyObj class helps you to do this for a whole hierarchy of objects.

Id adoption is a feature that maps object ids from a particular id scope (that of a chosen input file) to the scope of the output file. Whenever you choose a new input scope, you do this by a call to the ReserveIds method of the output file. It is not possible to save a mapping and restore it again, for example to merge pages of two input files. However, you can insert objects (pages) programmatically by using the CreateObj method, which reserves new object ids.

Strings, numbers, PDValue and PDDictionary objects are written when you compose new objects, as in the sample code above. PDF string values deserve your special attention: they are enclosed in left and right parentheses. If the text contains special characters, among them parentheses, it has to be encoded appropriately. For this purpose the PDF library supplies the functions MakePDFString and DecodePDFString in pdfile.h.
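To show what "encoded appropriately" means for PDF literal strings, here is a minimal standalone sketch. It is not MakePDFString, whose signature and exact behaviour are not shown in this excerpt; escapePdfLiteral is a hypothetical helper that applies the escaping rules of the PDF specification for backslashes, parentheses and line breaks.

```cpp
#include <cassert>
#include <string>

// Wrap `text` in parentheses and escape the characters that would
// otherwise terminate or corrupt a PDF literal string.
std::string escapePdfLiteral(const std::string& text) {
    std::string out = "(";
    for (char c : text) {
        switch (c) {
            case '(':
            case ')':
            case '\\':
                out += '\\';  // prefix with a backslash
                out += c;
                break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            default:   out += c;
        }
    }
    out += ')';
    return out;
}
```

Escaping every parenthesis is stricter than required, because balanced parentheses may appear unescaped in a literal string, but it is always valid and avoids having to check for balance.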
31. objects for placement in the document. For a more detailed description of the functionality, refer to the Prep Tool Suite User's Manual.

4.3 PTFontRsc

The PTFontRsc class represents a collection of font definitions for the purpose of importing them from an existing PDF file and reusing them during content construction of an output PDF file.

4.4 PTFontEntry

Fonts that are used in content construction are stored in a PTFontEntry object, which itself is a member of the PTFontRsc collection.

4.5 PTPrintPage

A PTPrintPage object represents a layer of page content. Usually pages contain just one layer, but it may also be useful to add layers with content that is put on top of several pages (logo, header, footer, page numbers, etc.). The PTPrintPage class is derived from the core class PDPgStream. It adds functionality for font handling and some standard PDF stream object constructors.

4.6 PTAnnotStore

PTAnnotStore stores the annotations (links) that shall be added to PDF pages that are created. There is a separate store object for each output page.

4.7 PTPageDir

PTPageDir contains all the PTAnnotStore objects for each individual page.

4.8 PDEnhancedTextScanner

The class PDEnhancedTextScanner provides some additional features compared to PDTextScanner. Most importantly, it can determine the width of a piece of text depending on font an
32. stored in the cache and the pointer is returned. PeekObj can be used to check the cache for an object without reading it from the file. The cache can be flushed either by using the ReleaseAll method or by using the ReleaseObj method. ReleaseObj can either release only the object that is specified or also any other objects that are referenced from this object. The reference chain stops when a Page or Pages object would be reached (following link annotations and Parent links would result in unpredictable behaviour).

Copying works as follows: the method CopyTo initializes the state of the member variables of PDFInput such that the methods dealing with page traversal select the desired pages. The ReserveIds method of the output file is called to flush a potentially existing id mapping table and reserve space for the one to come. Since CopyTo can be called several times in sequence, the array indicating which objects have already been copied is cleared. If no object template has been stored, CopyTo installs a PDPage template. ReadPages, OnReadPages and OnReadPage are the methods that are called to traverse the pages tree of the input file. When only part of the pages are copied, the pages tree is modified to contain only the desired part of the pages. To this end, PDFInput requires PDPage objects to be read, because it makes use of the RemoveKid method. This method recursively modifies the Pages object on the way up to the pages root. This is possi
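The id mapping table that is flushed when a new input scope is chosen can be pictured with the following sketch. It does not reproduce the library's data structures: IdMap, ObjId, adopted and reset are hypothetical names, and the allocation strategy (hand out consecutive ids starting from a given value) is an assumption made for the example.

```cpp
#include <cassert>
#include <unordered_map>

typedef unsigned long ObjId;  // stand-in for the library's object id type

// Maps object ids of one input file to fresh ids in the output file.
class IdMap {
public:
    explicit IdMap(ObjId firstFreeId) : next_(firstFreeId) {}

    // Start a new input scope: the previous mapping is discarded.
    void reset() { map_.clear(); }

    // Return the output id for an input id, allocating one on first use.
    ObjId adopted(ObjId inputId) {
        assert(inputId != 0);  // 0 is not a valid object identification
        auto it = map_.find(inputId);
        if (it != map_.end()) return it->second;
        ObjId fresh = next_++;
        map_.emplace(inputId, fresh);
        return fresh;
    }

private:
    std::unordered_map<ObjId, ObjId> map_;
    ObjId next_;
};
```

Because the same input id always yields the same output id within one scope, references between copied objects stay consistent. After reset() the old correspondence is gone, which mirrors the statement that a mapping cannot be saved and restored.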
33. ther is it possible to make any changes to an object once it has been written out to a new PDF file. The class declaration is located in the header file pdfile.h.

Reading from a PDF File

Reading from a PDF file is performed with the following steps:

    PDFile theFile;
    PDObj theObject;
    theFile.Open("acrobat.pdf");
    theObject.Read(theFile, theFile.GetInfoId());

After declaring appropriate variables, you gain access to information in the PDF file by first opening the file and then reading from it using the Read method that belongs to the object (in this sample). The Read method fills in the data of theObject. An alternative way to read data from the file is the ReadObj method of PDFile:

    PDObj* pObj = theFile.ReadObj(1);

When you use ReadObj, a new object is dynamically created and returned to you with the data filled in. Note that this sample carries some dangers: we ask for the object with id 1, but this object may not exist (unless we have good reasons to believe it does). ReadObj would return a NULL pointer in this case. Please refer to the description of the PDObj class below for more information on gaining access to information within an object.

The ReadPages method can be used to traverse the pages tree of a PDF file. On traversal of the pages tree, OnReadPages is called; when a page is encountered, OnReadPage is called. The pdls sample shows how these methods can be overridden to add functionality. Generally, page n
34. umbering starts at zero. This applies, e.g., whenever a page is referred to by its number, as in link annotations. The member m_curPage counts page numbers before OnReadPage is called. Therefore m_curPage contains the number of pages encountered so far and starts at one rather than zero.

Writing to a PDF file

PDF files can be written to in a variety of ways. Be careful to obey the Adobe standards; it is easy to write messy files. The PDF Library SDK does not care much about the semantics of objects. The creation of a PDF file happens according to the following scheme:

    PDFile theFile;
    theFile.Create("newfile.pdf");
    theFile.Write("%comments are allowed");
    theFile.WriteLn();
    OBJID id = theFile.CreateObj();
    theFile.WriteObjHeader(id);
    theFile.Write(...);
    theFile.WriteEndObj();
    theFile.Close();

The Write method is overloaded to accept several parameter types: PDObj, CString, char, numbers, PDValue, PDDictionary, arrays of bytes. WriteRef writes an object reference. WritePageRef writes an object reference to a page.

A PDObj is usually written to a file after reading it from another file and possibly modifying it. In this case think about the id of this object: most of the time it will not be the id it carries in the input file. If it is not related to anything you have written or are going to write, you must give it a
35. you can run into the problem that it is not visible when placed in the background. The reason is that either the visible part of the logo lies outside the visible portion of the page, or the page content is not transparent. Page content coming from a scanner is never transparent and will hide the logo, but there are also authoring tools that invisibly place a white rectangle, which has the same effect. On the other hand, the logo may come from a source with a white, non-transparent background that will hide everything when the logo is put in the foreground of the page. So either set the bounding box for the logo in order to clip it to the part that actually shall cover the page, or make sure the logo is transparent.

3.15 PDStream

Objects of class PDStream store stream data. The declaration is located in the header file pdstream.h. In a PDF file, streams are used for different purposes, e.g. to store the text and graphic contents of pages, but also thumbnails or font data. The class PDStream has a close relation to the class PDStreamBuf: PDStreamBuf only takes care of buffering the data, while PDStream allows manipulation of the data. PDStream incorporates LZW decoding of compressed streams, but not LZW compression because of patent protection. With release 1.3, PDStream also supports flate (zlib)
