As I’ve mentioned before, I’m a big fan of LaTeX in a field that is not so wild about its use. In the past few weeks, I’ve had multiple situations where a Word file was requested instead of a PDF, which have left me grumbling and manually hacking my .tex files into Word docs each and every time. So, this afternoon I decided to take a crack at solving the problem. After googling around the Intertubes, I decided that my most flexible option was likely to be the process of LaTeX -> HTML -> Word, and so I spent a couple of hours getting everything working on my system. Since I’m a blogger, naturally I also decided to share the results of my efforts with you. Thus, a miniature tutorial on getting up and running with latex2html on OS X.
Before you start:
Since you’re here reading about converting LaTeX, I’m going to assume that you already have a functioning LaTeX system; if not, you’ll want to use Fink or MacTeX (see also the TeXShop page on the subject). If you’re installing LaTeX for the first time, I do recommend MacTeX over Fink, which to the best of my knowledge still uses the no-longer-maintained teTeX as its base system. I will also be assuming that you’re comfortable with Google and the command line; if you’re not, leave a comment and I may expand upon things for you, but for now I’m going to play to the experienced crowd.
Getting and installing latex2html:
How you choose to do this will depend on what TeX distribution you have installed. I’ll deal with both Fink and MacTeX below, and comment upon the i-installer distribution as appropriate.
- Fink: If you’ve installed TeX via Fink, this should be fairly easy for you. Simply type the command sudo fink install latex2html into a Terminal window, and hit enter. Fink should do the rest for you; it will ask you about installing a few packages that latex2html depends on, and the default answers should be just fine.
  Note: if you installed via the i-installer distribution, Fink may ask about installing tetex-base. If this happens, look for the option that says “system-tetex”, which indicates a manual installation of teTeX. If this option is not available, you can choose to install teTeX, which may be more hassle-free for this particular case but which will eat up hard-drive space unnecessarily (and I can’t guarantee it won’t cause problems with your existing installation, either), or you can choose to cancel the installation and proceed as for MacTeX below.
- MacTeX: latex2html has a number of dependencies, which you will need to make sure are satisfied. Some will likely have been installed by MacTeX, such as GhostScript and dvips; the one you may not have is netpbm, which is required (if typing pnmtopng<enter> in the Terminal gives you an error saying that the command is not found, you probably don’t have netpbm installed). Installation can be done from source, downloaded here, or better yet, through Fink via sudo fink install netpbm (I tried the DarwinPorts installation, but that did not work with latex2html).
  Once you have the dependencies taken care of, you will have to install latex2html from source, unless you want to install teTeX as well (see above). To get it, download latex2html-2008.tar.gz from here, and unpack it to the directory of your choice. Switch to that directory in a Terminal window, and execute the *nix-standard sequence of:
./configure<enter>
make<enter>
make check<enter>
make test<enter> (important: you’ll want to make sure everything went all right above; if it has, the last line of output should contain a file:// URL that you can copy to your browser to see the HTML produced from the test .tex files).
sudo make install<enter>
And you’re done! If you’ve done the check and the tests and everything has come out okay, then you’re ready to use latex2html.
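If you’d like to double-check that the pieces really are on your PATH, here is a minimal sketch of a helper that does so. The function name check_cmds is my own invention for illustration, and the list of commands you pass it is up to you: gs (GhostScript), dvips and pnmtopng (netpbm) are the dependencies mentioned above, while latex and perl are my assumptions about what else latex2html needs at run time.

```shell
#!/bin/sh
# check_cmds (hypothetical helper): report which of the given commands
# can be found on the PATH. Returns 0 if all are present, 1 otherwise.
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

After pasting the function into your shell (or sourcing a file that contains it), something like check_cmds latex2html latex dvips gs pnmtopng perl<enter> should list each command as ok or missing.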
You can now convert your files from LaTeX to HTML, and then possibly to Word. To invoke latex2html, switch to the directory with your .tex file, and run latex2html filename<enter>. In its default state, latex2html will produce HTML that is broken up into multiple pages, usually one per section / subsection, much like the latex2html home page is. If you want to import your document into Word, you may wish to suppress this tendency. To do so, use the following command:
latex2html -split 0 -info 0 -no_navigation filename<enter>
-split 0 will make the entire LaTeX file into a single HTML page, while -info 0 will remove the information bar at the bottom of the page and -no_navigation will remove the navigational menus at the top and bottom. This should produce a vanilla HTML file that Microsoft Word can read fairly easily.
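If you find yourself doing this often, the command can be wrapped in a small shell function. This is only a sketch: tex2word_html is a hypothetical name of my own, not something that ships with latex2html, and the only flags it uses are the three described above.

```shell
#!/bin/sh
# tex2word_html (hypothetical helper): convert one .tex file into a single
# HTML page with no info section and no navigation menus.
tex2word_html() {
    texfile=${1:-}
    if [ -z "$texfile" ] || [ ! -f "$texfile" ]; then
        echo "usage: tex2word_html file.tex" >&2
        return 2
    fi
    # Run latex2html from the file's own directory, in a subshell so that
    # the caller's working directory is left alone.
    (
        cd "$(dirname "$texfile")" || exit 1
        latex2html -split 0 -info 0 -no_navigation "$(basename "$texfile")"
    )
}
```

With the function loaded, tex2word_html ~/papers/draft.tex<enter> (the path is just an example) runs the same conversion without having to switch directories first. If the file doesn’t exist, it prints a usage message and returns an error instead of calling latex2html.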
One thing to beware of at this point: as I noticed on this blog post, Word will link to image files instead of including them in the document, which means that things like your equations will drop out if you send someone the .doc file without sending the image files as well. To fix this, adopt the procedure mentioned by PD: go to Edit->Links, select all of the links in the dialog box, and click “Break Link”. Once that is done, save the file and the images will be embedded into the document itself, ready for sending off to someone else.
And that’s it! To get more information about latex2html command line options, use latex2html -h in your Terminal window, and if you have any other questions, I’d be happy to try and help if you leave a comment. Happy TeXing!