
LaTeX Source from Word Processors


Contents

1. In this paragraph there are single words and a three-word sequence that are emphasized by changing fonts. The default font is changed to italics or typewriter. Source exported as text with encoding will have formatting removed. A similar situation occurs when text is inserted into mathematics code.

The user can highlight a phrase or click within the single word. Then the user presses the appropriate function key for the formatting command to be inserted, with grouping of the appropriate text. If the user has clicked within a word, then the extent of the word is determined by whitespace delimiters. Clicking on whitespace is a special form of this: the commands are inserted and the cursor is placed on the right brace for user input. Instead of highlighting a region, the user can use the Emacs form of setting the mark and moving the cursor to the other end of the region.

I implemented these functions for bold, italic, sans serif, and typewriter fonts. I did not insert the italic correction, but easily could have, paying attention to the following character. I did not because in many cases it is just not needed and, besides, the user should have some responsibilities. The same functions are reused for simple grouping and the text commands, which were used mostly in mathematics modes.

5.2 Inline mathematics

Inline mathematics is common in the rotordynamics text. Most of the resulting mathematics is usually a fraction of a line in
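As a minimal sketch of what the font macros described above would leave in the source, the commands below wrap a word or phrase in a brace group. The sample words are illustrative and not taken from any of the test cases; the last line shows the explicit italic correction that the macros deliberately leave to the user.

```latex
\documentclass{article}
\begin{document}
% Each font change is a command whose argument is a brace group.
An \textit{italic} word, a \texttt{typewriter} word,
\textbf{a three word} sequence, and a \textsf{sans serif} word.

% Declaration form with an explicit italic correction (\/),
% useful when an upright character follows the slanted one.
{\itshape half\/} full
\end{document}
```

The argument forms such as \textit add the italic correction themselves in most cases, which is one reason leaving it out of the macros costs little.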
2. Inline mathematics: Some inline mathematics is converted to italics. That is troublesome to me because it should really remain as unconverted mathematics. Then too, that may be the fault of the author.

Export of structure: The structure of the chapter and lists is inconsistent to missing. This is likely the authors' fault, as the use of styles seems to be the cause.

7 Writer and Friends

In spite of my earlier remarks, I salute OOo. I believe that the Writer package and Writer2LaTeX application have made a great contribution to the goal of converting many documents into a form for better presentation and archival, namely TeX/LaTeX. That may not have been the intent. The intent may have been to enable a good Writer user to simply use LaTeX as an output device.

The LaTeX code output in version beta 1.2 is improved but not clean. The Writer2LaTeX User's Manual is 45 pages in length. The exported TeX source (with the clean option) averages about fourteen occurrences of \mdseries and twelve occurrences of \textstyleSourceText per page. Each paragraph is grouped with \mdseries as the start. The latter is effectively an alias for \texttt and is used in tables.

8 Conclusions

Reasonable document interchange and archival is now possible for a wide range of systems. I believe that TeX/LaTeX is the most reasonable basis for many archival systems. The advances by OOo and its Writer system are impressive and appr
3.
• Inconsistent use of functionality
• Wrong use of functionality
• Not using available functionality
• Oops: operator/operand placement
• Misunderstandings: mysticism about style files

This quote is in section 1.2 of the Writer2LaTeX User's Manual [3]:

  You can use LaTeX as a typesetting engine for your OOo documents: Writer2LaTeX can be configured to create a LaTeX document with as much formatting as possible preserved. Note that the resulting LaTeX source will be readable, but not very clean. You will find that Writer2LaTeX uses the principle "garbage in, garbage out".

Each of the above examples of "garbage in, garbage out" was present in at least two of the test cases cited.

"Garbage in, garbage out" may be a bit strong a description for these, but the message is clear. For example, in the Czech memoir it was certainly appropriate to attempt to show correct accents. Horák would be proud. It overwhelmed the author's limits of skills with the systems he was using.

Each of the authors has a doctorate and has taught at major universities. They are consistent users of computers, but obviously are not the most persistent readers of the formatter manuals. Maybe the manuals are poor, non-existent, or not convenient. Maybe the easy-to-use graphics interfaces overwhelmed the authors. Maybe these interfaces do not encourage users like these to seek the information they need. Maybe they just do not care.
4. computers seem to get so little from user's guides and manuals. Maybe the manuals are poor, non-existent, or not convenient. Maybe the easy-to-use graphics interfaces overwhelmed the authors. Maybe these interfaces do not encourage users like these to seek the information they need. Maybe they just do not care.

Was the intent in creating Writer2LaTeX to give the user LaTeX as an improved output device? I think that poses a bigger challenge. How do you teach a Writer user to write for LaTeX?

9 Questions

I did not intend this as a FAQ, but thought it might be a good way to end the paper.

LL LaTeX: Do any of the test cases use LaTeX beyond Leslie Lamport's book?
Answer: No for the memoir and the book on the three love triangles. Yes for the science and engineering texts. Packages used: float, lscape, makeidx, fancyvrb, graphicx, array, amsmath, amssymb, sidecap, wrapfig, and caption. These were probably not all necessary, but useful.

WORD test case: What do you want for a WORD test?
Answer: A one-pager like Norman Naugle's An Elementary Sum. Then many others would help. I hope it would also convert to Writer and back, too.

How long: How long did it take you to type Norman's note?
Answer: An hour or so. The answer to the next question ("Why didn't you just do it in WORD?") is: probably eight or seven hours and, fortunately, I do not have WORD in my house.

References

[1] Charles Ota Heller. PRAGUE: My Long Journ
5. length. The implementation is like the font changes in the previous subsection. A significant difference is that the export processes handling Word Perfect mathematics yield significant artifacts of excessive white space and formatting trash. This almost always includes many of the grave (`) characters; these must be an escape character for the internal form of Word Perfect mathematics.

I have not had a reasonable test case with WORD mathematics yet. There are small examples of mathematics in the programming text.

5.3 Display mathematics

The concepts in the previous subsection are applicable. However, there are several forms of display mathematics. These forms were used in the rotordynamics text:

1. \[ ... \] delimited, the standard for display equations without numbers;
2. \begin{equation*} ... \end{equation*} delimited, which is an alias for the former, or vice versa;
3. \begin{equation} ... \end{equation} delimited, which numbers the equations and should have an accompanying label;
4. \begin{equation} \begin{split} ... \end{split} \end{equation} delimited; the collection of equations is numbered and should have an accompanying label.

Chapter 8 of Frank Mittelbach et al.'s The LaTeX Companion is seventy pages of great details of Advanced Mathematics formatting.

I implemented these four display math choices using one function key and prompting the user for which of the above forms was desired. I developed similar choice macros for
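The four display forms listed above (unnumbered brackets, the starred equation environment, the numbered equation environment, and equation with split) might look as follows in a converted source. This is only a sketch: it assumes the amsmath package, and the equations and label names are illustrative, not taken from the rotordynamics text.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Form 1: unnumbered display
\[ a^2 + b^2 = c^2 \]

% Form 2: unnumbered display, environment spelling
\begin{equation*}
  a^2 + b^2 = c^2
\end{equation*}

% Form 3: numbered display with its label
\begin{equation}\label{eq:pythagoras}
  a^2 + b^2 = c^2
\end{equation}

% Form 4: several aligned lines sharing one number and one label
\begin{equation}\label{eq:square}
\begin{split}
  (a+b)^2 &= (a+b)(a+b) \\
          &= a^2 + 2ab + b^2
\end{split}
\end{equation}

Equations \eqref{eq:pythagoras} and \eqref{eq:square} are referenced by label.
\end{document}
```

Prompting for one of these four shapes, rather than typing the delimiters by hand, is what keeps the converted equations consistent from chapter to chapter.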
6. presenting fractions and matrices, which made conversions faster and, most importantly, more consistent. The most important facet of this conversion is that, with a little care, the totality of the mathematics was converted correctly and hours of detailed, laborious proofreading were avoided.

5.4 Programs, code fragments, verbatim text

Programs should be formatted by language-sensitive packages like listings. The package fancyvrb requires some study but gives great results. Both packages come with inline commands whose use is aided by adaptation of the above font changing and inline mathematics concepts.

TUGboat, Volume 0 (2060), No. 0, Proceedings of the 2060 Annual Meeting

5.5 Other macros, fix-up

There were several other macros that aided the conversion. I consider these to be fix-up in nature. These include:

• Captions in the rotordynamics text often contain inline mathematics. The use of the LaTeX delimiters \( \) is not allowed, and they must be converted to the TeX toggle $.
• Interactive aid to standardizing presentation of fixed point and floating point numbers.
• Locating likely multicharacter super/subscripts that were not exported correctly (needed grouping).
• Locating likely problems due to insertion of inadvertent whitespace.
• Locating unescaped TeX control characters.
• Macros to aid the insertion of labels and their references.

6 Current System

The current system has been improved grea
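A small sketch of three of the fix-ups from the list above: math delimiters in a caption, grouping of a multicharacter superscript, and escaping of control characters. The "exported" lines in the comments are hypothetical illustrations of what an export might contain, not actual output from any of the systems discussed.

```latex
\documentclass{article}
\begin{document}
\begin{figure}
  \centering
  \fbox{placeholder for the graphic}
  % exported: \caption{Amplitude versus \(\omega\)}
  \caption{Amplitude versus $\omega$}
\end{figure}

% exported: $x^10$, where only the 1 is raised; grouping is needed
$x^{10}$ and $a_{ij}$

% exported: 50% of A&M, where % starts a comment and & is an alignment tab
50\% of A\&M
\end{document}
```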
7. were delivered to me. The source was (1) edited to remove the graphics from the Word Perfect source, (2) exported in rtf form, and (3) the graphics elements were put in a zip file. The version of Word Perfect being used would create rtf files hundreds of times bigger than needed if the graphics were in the export to the rtf. Removing the graphics was no loss because it, like the mathematics, was not being exported.

I would take the rtf from Word Perfect, import it to OOo Writer, and save it. This apparently lost nothing but gave a smaller file, and therefore my system was faster in using it. I also noticed that Writer's export of text with encoding was different from the other systems I had used. Further, the export could be done in Unicode, which was compatible with Emacs.

Apparently there was significant appreciation of Unicode in the Word Perfect export process. The export of the mathematics from Word Perfect was not converted, but many symbols (Greek letters, etc.) were now viewable on the screen. Most TeX/LaTeX users should be able to glean the proper content from a printed pdf of the Word Perfect document.

Now the Emacs macros could do much more. At this time my benefactor had other obligations, and so I had time to work on the macros and test the system using the modified process. I continued to learn more elisp.

3 Keeping The Mind Busy

My benefactor's diversions lasted longer than planned. I read more about Uni
8. written in a reasonable dialect of WORD, Word Perfect, or Writer could be converted to TeX in an hour or two.

1 Genesis

My primary formatting system has been TeX-based for more than thirty years. Throughout this time I have had occasional need to import small parts of documents done in word processors into my TeX-based documents. I have accomplished that in a number of ways, from keyboarding small projects to somewhat automatic conversion, depending upon what was available. I used some of the earlier systems discussed by Hennings [2].

Several years ago two colleagues were writing a text on programming and became aware that they would have significant advantages if they could convert the half of the book that was completed to LaTeX and take some instruction on how to complete the rest in LaTeX. I sketched the process and created a small set of Emacs elisp macros to do that conversion. We agreed to the generalities, with the plans to make a formal agreement upon the return of the senior author from a summer-long trip. Much of the TeX work was to be done by the junior author, naturally. The health of the junior author suddenly deteriorated and my conversion project was cancelled.

I continued to be intrigued by the concept. I learned more elisp and added macros and a number of open packages that seemed to offer promise as a means of getting much of the conversion done in an automatic manner. I never felt that a mostly auto
9. 4.1 Inconsistent use of functionality

The author of the memoir that used many Czech words, phrases, and sentences is to be saluted for attempting to make that text look proper to a Czech reader. There are five special items in this sentence:

  On my next visit to Prague, he joined Vláďa and me, along with our wives, for lunch at a French restaurant in Obecní dům (Municipal House).

The nickname Vláďa has an accent over the letter a and an accent, often called a caron, modifying the letter d. The accented i in the first italicized word is a dotless i. Finally, the second italicized word has an accent that almost appears to be the degrees (as in temperature) symbol. Although it was not the author's intention, the distances these accents were raised or kerned differed in most cases. I do not claim my caron is perfect.

4.2 Misuse of functionality

In the rotordynamics book there were many instances of using different Greek characters as the same: the \phi and \varphi, as well as others. Since this document was constructed using papers written years ago, this is easily understood.

The author of the novel containing three love triangles suffered a similar problem. The author did not like the double prime for the opening and closing quotes. When he wrote the first part, he selected special graphics ch
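The accents described in Section 4.1 above can be keyed with the standard LaTeX accent commands, so that every occurrence is raised and kerned the same way. This is a sketch using the default font encoding; it is not necessarily how the author's macros emit these words.

```latex
\documentclass{article}
\begin{document}
% \' = acute, \v = caron, \i = dotless i, \r = ring
Vl\'a\v{d}a joined us for lunch in
\textit{Obecn\'{\i}} \textit{d\r{u}m} (Municipal House).
\end{document}
```

Because each accent is a command rather than a hand-positioned glyph, the placement is decided once by the font and not word by word by the writer.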
10. LaTeX Source from Word Processors

Bart Childs
Texas A&M University
College Station, TeXas 77843-3112, USA
bart at tamu.edu
http://faculty.cse.tamu.edu/bart

Abstract

Hennings' CTAN survey is a good starting point when considering projects implied by the title of this article. I found it a fair view of most related packages. He suggests having one of two goals: converting the document structure or converting the appearance. My goal is neither of these. I want to produce LaTeX source that is accurate in content, clean, and therefore maintainable. This is in keeping with Knuth's original goals in producing TeX: graphic excellence and a document convenient for archiving. Structure and appearance are important. I believe clean TeX is more likely to have this intrinsic result, not use of word processing systems.

My current conversion system is a hybrid based on the use of the Open Office Writer package, its Writer2LaTeX application, and macros for the Emacs editor written in elisp. The test cases for this system are books: (1) on rotordynamics, (2) a C programming text, (3) a memoir on a friend's life including significant text fragments in the Czech language, and (4) a novel that includes three love triangles.

Even the worst case, with significant mathematics formatting done in Word Perfect, is tractable. I did not say easy. The lack of intelligent use of word processors causes many of the problems. I estimate that a 300 page novel
11. aracters for the quotes. When he wrote the other two parts, the smart quotes were automatic for him. He did not recall why; it may have been a new revision of his formatter.

4.3 Not using available functionality

In two of the test cases the authors used itemized lists. The exported form yielded consecutive lists of one item. This did not bother the bulleted lists but would have been an error with enumerated and description lists.

In many cases the authors did not use styles, and so chapter and section beginnings show the formatting but no LaTeX commands. This is not a total loss because I convert the section numbers into labels that would aid if we were trying to resolve differences in my output with the older version.

4.4 Oops

These examples can be difficult. A glaring example is that in Word Perfect's mathematics, operators may follow the operand in some cases. In TeX the operator is always first. I did not find a general rule as to when to expect this. My Emacs macros for adjusting this are interactive to enable the user (me) to minimize such problems.

A really big Oops worth repeating is the lack of using styles, which caused inconsistencies. I had to handle some of these manually.

5 Typical Emacs macros

The first version of these macros was developed when I was using an export that was usually designated text with encoding. This export would discard all, or nearly all, formatting such as emphases. The improveme
12. code and realized how provincial some of us are here in English-only USA.

A college buddy of mine is a Czech immigrant and was corresponding with a publisher in the Czech Republic about his memoir. When he wrote to the publishers and sent it by email, the formatting was lost. I suggested learning a bit of TeX, converting it to pdf, and emailing that. He had sent me a draft of the book so I could create some examples. The published version [1] was done while I was creating this system.

Of course I was naive, and would still have been so had I read Horák's note. But while waiting, I thought I could polish my Emacs macros to handle his Czech problems. It was fairly easy, and with the improvements in the Writer export process it was really easy. I mention this project because it shows evidence of real problems with similar projects. That will be discussed later.

In the abstract I mentioned a novel about three love triangles. That project was trivial but also contains the same real problems with conversion of word processor sources.

4 Real Problems

There are several sources of problems that impeded progress in these projects. Some of these sources could be avoided by user learning, while others resulted from differences in the design and implementation of the systems they used. The authors had several kinds of problems that automatic conversion did not handle:
13. eciated. I hope that its open status and development will continue.

Note: I have addressed only a small part of a large project.

A point made in a number of venues is the problem of TeX systems not having a native graphical input process. LyX and OOo are touted as solutions, along with several others. The authors of the three test cases I have used show that the graphical interfaces are not a solution to the problems, in my humble opinion.

All the authors are highly educated and familiar with the problems of getting people to learn at the college level. Still, each has shown the results from casual learning about their systems. The effective use of styles, consistent use of symbols and special functions, document structure, etc., were lacking in each of their documents.

The first line of a LaTeX document requires statement of the class of the document. There is a finite number of them. It does not seem to enter the stream of consciousness for many that if they learned how to type "Mary had a little lamb" on a machine, there should be at least a small change in the start of a letter to a sweetheart, a grocery list, or any other class of documents.

In a moment of frustration I lamented: Users avoid using LaTeX because you have to learn how to do some things, while users of WORD believe if it takes any non-obvious effort to do something, it should not be done.

I raised these questions earlier about why educated users of
14. ey Home. Abbott Press, Dec. 2011.
[2] Wilfried Hennings. Converters from PC textprocessors to LaTeX: overview. June 2012. mailto: texconvfaq (at) gmx.de
[3] Karel Horák. Those obscure accents. TUGboat 29(1):42-44, 2007.
[4] Henrik Just. User's manual for Writer2LaTeX. March 2012. sourceforge.net. mailto: henrikjust

This is intended as a preprint copy. The bibliography will be expanded and other cleaning done.
15. matic conversion was realistic for projects involving significant mathematics content. I expected to pursue a "PhD with a screwdriver" approach. I was willing to do this based on working from the WORD rtf (Rich Text File) format, total extraction of text without formatting, and/or a mostly automatic conversion that needed tweaking (my pipe dream).

A few years after retirement, a friend and colleague in the college of engineering asked me for help in finding someone to keyboard a new text he was writing, based on a few dozen of his research papers and related studies. The topic of the text is rotordynamics, from small pumps and turbines to large ones, as in the main engine of the space shuttle. I resurrected my plan and we agreed on the plan of work.

The draft source of this rotordynamics text is being done in Word Perfect, the formatter the author has used for many years. Most of the text is being adapted from the author's contributions in the subject. The current version is approximately 400 pages in length, with another 25 to be added. The lists of contents, figures, and tables will likely occupy 18 pages. There are hundreds of equations, with one of them being a full page.

2 The Process Evolves

I started this conversion using the process I had prototyped for the programming text. The rotordynamics text was a quite different docu
16. ment because of the large fraction of displayed equations. The displayed equations and figures in the rotordynamics text require approximately the same fraction of space required by figures, programs, and code fragments in the programming text. Most, maybe all, of the code fragments, programs, and figures in the programming text were restricted from floating. There had to be some manual floats.

I did some small portions of the rotordynamics book as manual conversions for test cases. Some of the equations were manually entered because conversion of mathematics among word processing systems was generally accepted to be non-existent. I think that is improving. The manual process was based on (a) having a pdf of the document, (b) editing the rtf file, (c) editing a text file exported from a word processor with some encoding, and/or (d) a form of LaTeX exported from one of several systems.

I was delivering LaTeX source faster than I could have keyboarded it from good copy. Still, it was unsatisfactory because it was mostly a manual process.

The source documents were done in Word Perfect on a PC and I was doing LaTeX on a Mac. There are good TeX and Emacs systems for the Mac using Mac OS X. Some Emacs systems were not acceptable to me because my system uses function keys.

I continued to strive for big improvements because keyboarding mathematics would be slow. A significant improvement came by changing the format that sources
17. nts in Writer2LaTeX have led to a reduced need of this kind of detailed editing. Still, the concepts in the design of these macros are applicable in the current system of conversion as well as keyboarding original documents.

This list contains three cases where it is more efficient to use text with encoding exports than the converted exports, assuming the goal of clean TeX. These came from the rotordynamics text, the programming text, and the User's Manual. These are:

Tables: Tables are exported with all formatting on every cell. The usual LaTeX procedure is to give default formatting in a template and exceptional formatting when needed in a cell.
Mathematics: Text is often used for explanatory purposes in equations.
Programs and verbatim text also need special handling.

Portions of some documents are easier to convert by exporting as text with encoding and then inserting the formatting by editing. Two examples are mathematics that does not convert and formatted code fragments in a processor where font changes are done manually rather than using a package like listings.

These macros were implemented using the mouse (or similarly functioning device) to point or highlight in conjunction with function keys. In Emacs, one can also highlight a region of text by setting the mark and moving the point. The function keys can also be modified by use of shift, control, and alt.

5.1 Applying fonts to text
18. features that LaTeX does not support well. If the layout of your document depends on text flowing around pictures or linked text boxes, you will never get good results with Writer2LaTeX. According to TeX's author Donald E. Knuth, TeX is a typesetting system intended for the creation of beautiful books, and especially for books that contain a lot of mathematics (quoted from The TeXbook). Writer2LaTeX will aim to produce excellent result for this kind of documents (including of course shorter texts with a book-like layout).

This quotation is fair, but I think it makes my point: go ahead and inhale! Show the logos (TeX and LaTeX) correctly, use the correct dashes and spacing, use the proper quotes.

6.1 Examples of Other Problems

I present an annotated list of a few other problems I addressed in the macros. Based on two of the test cases, the rotordynamics text and the programming text, I think it is fair to classify most of these as not very clean LaTeX.

Export of spacing: The export of chapter 5 of the rotordynamics text has 47 occurrences of a space preceding a right brace. The majority of those are in constructs like \textit{word }, while most of the rest are weird constructs like \textit{ } and \textbf{\textit{ }}. The first may be sloppy keyboarding by the author. The second seems to be intentional spacing (why not?). The last is likely a hacked indentation kludge.
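To make the spacing problem above concrete, here is a minimal before-and-after sketch. The "exported" lines in the comments are hypothetical reconstructions of the pattern (a space just before a right brace), not lines copied from the rotordynamics export.

```latex
\documentclass{article}
\begin{document}
% exported: \textit{word }next
% cleaned: the space is moved outside the group
\textit{word} next

% exported: a\textit{ }b
% cleaned: a font change that wraps only a space is dropped
a b
\end{document}
```

The output is the same in both cases; the cleaned forms are simply easier to read and maintain, which is the stated goal of the conversion.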
19. tly with the release of OOo Writer2LaTeX version beta 1.2. I missed the notice of this release until after the abstract of this paper was submitted. It is a beta release, but I have not found any problems to date. I find these observations about this beta release interesting: (1) the user's guide is 10 shorter and (2) the output files are 3 5 shorter than with version 1.0. The LaTeX output is cleaner, as most of the reduction in the size is the elimination of needless formatting like (1) most paragraphs were inside grouping braces and a declaration that I used English and (2) \textquotedblright for a simple ''.

A cursory look at the user's guide indicates some removal of redundancy. There is a lack of the completeness that is characteristic of the documentation of releases from the TeX communities.

I plan to work with OOo and continue to make this product better. I believe it to be the best hope I know of, especially in the open domain. The following quote is from the sourceforge web site:

  You will never get a result that looks identical to the original; in fact that's the whole point: LaTeX is in general a superior typesetting engine compared to Writer. For example, LaTeX produces much better results for formulas, it has an excellent paragraph and page breaking mechanism, it uses ligatures, etc. On the other hand, Writer has a few
