LaTeX to HTML and Word with latex2html – A mini-tutorial for OS X users.

September 29, 2008

As I’ve mentioned before, I’m a big fan of LaTeX in a field that is not so wild about its use.  In the past few weeks, I’ve had multiple situations where a Word file was requested instead of a PDF, which have left me grumbling and manually hacking my .tex files into Word docs each and every time.  So, this afternoon I decided to take a crack at solving the problem.  After googling around the Intertubes, I decided that my most flexible option was likely to be the process of LaTeX -> HTML -> Word, and so I spent a couple of hours getting everything working on my system.  Since I’m a blogger, naturally I also decided to share the results of my efforts with you.  Thus, a miniature tutorial on getting up and running with latex2html on OS X.

Before you start:

Since you’re here reading about converting LaTeX, I’m going to assume that you already have a functioning LaTeX system;  if not, you’ll want to use Fink or MacTeX (see also the TeXShop page on the subject).  If you’re installing LaTeX for the first time, I do recommend MacTeX over Fink, which to the best of my knowledge still uses the no-longer-maintained teTeX as its base system.  I will also be assuming that you’re comfortable with Google and the command line; if you’re not, leave a comment and I may expand upon things for you, but for now I’m going to play to the experienced crowd.

Getting and installing latex2html:

How you choose to do this will depend on what TeX distribution you have installed.  I’ll deal with both Fink and MacTeX below, and comment upon the i-installer distribution as appropriate.

  • Fink:  If you’ve installed TeX via Fink, this should be fairly easy for you.  Simply type the command sudo fink install latex2html into a Terminal window, and hit enter.  Fink should do the rest for you;  it will ask you about installing a few packages that latex2html depends on, and the default answers should be just fine.  Note:  if you installed via the i-installer distribution, Fink may ask about installing tetex-base – if this happens, look for the option that says “system-tetex”, which indicates a manual installation of teTeX.  If this option is not available, you can choose to install teTeX, which may be more hassle-free for this particular case but which will eat up hard-drive space unnecessarily (and I can’t guarantee it won’t cause problems with your existing installation, either), or you can choose to cancel the installation and proceed as for MacTeX below.
  • MacTeX:  latex2html has a number of dependencies, which you will need to make sure are satisfied.  Some will likely have been installed by MacTeX, such as GhostScript and dvips;  the one you may not have is netpbm, which is necessary (if typing pnmtopng<enter> in the Terminal gives you an error saying that the command is not found, you probably don’t have netpbm installed).  Installation can be done from source, downloaded here, or better yet, through Fink via sudo fink install netpbm (I tried the DarwinPorts installation, but that did not work with latex2html).  Once you have the dependencies taken care of, you will have to install latex2html from source, unless you want to install teTeX as well (see above).  To get it, download latex2html-2008.tar.gz from here, and unpack it to the directory of your choice.  Switch to that directory in a Terminal window, and execute the *nix-standard sequence of:

    ./configure<enter>
    make<enter>
    make check<enter>
    make test<enter>
    (important – you’ll want to make sure everything went all right above;  if it has, the last line of output should contain a file:// url that you can copy to your browser and see the HTML produced from the test .tex files).
    sudo make install<enter>

    And you’re done!  If you’ve done the check and the tests and everything has come out okay, then you’re ready to use latex2html.
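If you’d rather check the dependencies in one go before building, here is a minimal sketch of how that might look.  It only covers the three tools mentioned above (gs for GhostScript, dvips, and pnmtopng from netpbm);  the function name missing_tools is my own invention for illustration, and your system may need more than these three.

```shell
#!/bin/sh
# Print the name of each listed command that is not found on the PATH.
# "missing_tools" is a hypothetical helper name, not part of latex2html.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Dependencies named in this tutorial: gs (GhostScript), dvips,
# and pnmtopng (installed as part of netpbm).
missing=$(missing_tools gs dvips pnmtopng)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed dependencies were found."
fi
```

If pnmtopng shows up in the list, that is the cue to install netpbm before going any further.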

Using latex2html:

At this point, you’re ready to convert your files from LaTeX to HTML, and then possibly to Word. To invoke latex2html, switch to the directory with your .tex file, and type latex2html filename<enter>. In its default state, latex2html will produce HTML that is broken up into multiple pages, usually one per section / subsection, much like the latex2html home page itself.  If you want to import your document into Word, you may wish to suppress this tendency.  To do so, use the following command:

latex2html -split 0 -info 0 -no_navigation filename<enter>

-split 0 will make the entire LaTeX file into a single HTML page, while -info 0 will remove the information bar at the bottom of the page and -no_navigation will remove the navigational menus at the top and bottom.  This should produce a vanilla HTML file that Microsoft Word can read fairly easily.
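If you find yourself typing those three options a lot, you could wrap them in a small shell function.  The following is just a sketch:  the names single_page_cmd and tex2word_html and the example file paper.tex are made up for illustration, and only the three latex2html options come from the command above.

```shell
#!/bin/sh
# Print the single-page latex2html command line for a given .tex file,
# so it can be inspected before anything is actually run.
# (Hypothetical helper name; the three options are the ones used above.)
single_page_cmd() {
    printf 'latex2html -split 0 -info 0 -no_navigation %s\n' "$1"
}

# Run the conversion, refusing to start if the input file does not exist.
# (Hypothetical helper name.)
tex2word_html() {
    if [ ! -f "$1" ]; then
        echo "No such file: $1" >&2
        return 1
    fi
    latex2html -split 0 -info 0 -no_navigation "$1"
}

# Show the command that would be run for an example file.
single_page_cmd paper.tex
```

With the functions pasted into your shell startup file, tex2word_html paper.tex<enter> would then do the whole conversion in one step.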

One thing to beware of at this point:  as I noticed on this blog post, Word will link to image files instead of including them in the document, which will mean that things like your equations will drop out if you send someone the .doc file without sending the image files as well.  To fix this, follow the procedure mentioned by PD:  go to Edit->Links, select all of the links in the dialog box, and click “Break Link”.  Once that is done, save the file and the images will be embedded in the document itself, ready for sending off to someone else.

And that’s it!  To get more information about latex2html command line options, use latex2html -h in your Terminal window, and if you have any other questions, I’d be happy to try and help if you leave a comment.  Happy TeXing!


A LaTeX goodie (and one or two for biologists as well).

September 26, 2008

A nice link today:

  • Via the Blog on Latex Matters comes a tip about creating margin comments using the \marginpar{…} command, which is a built-in way to add margin notes to a document as an alternative to using comments in the code.  Very nice!

I was also poking around CTAN and came across a couple of biology-oriented packages:

  • BioTeX is a project to help expand the use of LaTeX in the biological sciences by bringing high quality packages to said sciences.  On their webpage you will find two packages which might be of use to the molecular biologists out there:  TeXshade, described as “a comprehensive program for displaying, shading and labeling of nucleotide and protein alignments”, and TeXtopo, “plotting of shaded membrane protein topology data”.  Since I’m working in behavioural ecology right now, I don’t expect to get much use from these myself, but hopefully someone else will find them useful!
  • For those of you with a regular need to typeset biological species names, the Biocon package, available through CTAN, might be of some assistance to you.  It allows you to define commands to refer to each species you reference, giving you the ability to flexibly automate the inclusion of those species names in your document.  For more information on the specifics of how to use this package, take a look at the manual.

Wait – Stephen Harper knows what an RSS feed is? *Really*?

September 24, 2008

I was heading over to Google to check something, and on the Google home page there was a link to see what Canadian politicians are reading with Google Reader.  Being a fan of the big G’s Reader myself, I wanted to see what this was all about.  As you can see yourself, the page purports to be a read-out of the reading lists of some of the more important people in Canadian politics, including the leaders of four Canadian political parties. Now, I’m a pretty credulous guy, but even I’m having trouble swallowing the idea that Stephen Harper or Stéphane Dion are regular users of Google Reader.  Come to that, I’m having a hell of a time convincing myself that Harper even knows what an RSS feed is.  But hey, he’s welcome to subscribe to my blog – I’d be happy to give him some tips on how to fund Canadian science appropriately…

On a side note, the link to Elizabeth May’s Facebook page is kind of entertaining to me;  I had to laugh when I heard on NPR’s Wait Wait … Don’t Tell Me! about her apologizing for never having smoked pot.

Christians + crime = The Great Northern Texas.

September 24, 2008

So, it seems that the University of Alberta Atheists and Agnostics group (were they there when I attended?  Damn it, I missed out!) had a banner defaced by a group of hateful Christians recently (h/t Pharyngula).  Good to see that my alma mater has such tolerant people.


Update:  for those of you coming in from “Paul Lesoway”’s note on this subject – yes, since you linked to me and clicked on the link, I can see you – I implore you to grow up and perhaps even try to adopt someone else’s viewpoint for a moment;  as one of the commenters on the original post that I linked to above said, what would be your feelings if someone defaced and vandalized a poster advocating a Christian group on campus?  One of the reasons that you go to university is to be exposed to people who don’t necessarily share your beliefs, so this is an opportunity to become a better person.  Don’t blow it.


September 20, 2008

… this is just hilarious.

Opposing views.

September 17, 2008

Via Panda’s Thumb, I came across a website that I didn’t know about, called Opposing Views, which is a regulated debate site where experts weigh in on each side of an issue and can object to each other’s arguments, while the public sits on the sidelines and comments.  I’m a little burnt-out on the whole “go on the internet and convince people” routine, because I’m frustrated at how useless most of these types of conversations are, but this site looks like a good alternative to the standard go-into-a-forum-and-scream-at-each-other thing.  Watching the (verified) experts lay out the case for either side is a good way to get a feeling for how strong the arguments really are;  for example, take a look at the beating that Steven Novella is giving Bill Reddy on acupuncture.  Here’s to the advancement of science!

Obama supporters – spam commenting?

September 16, 2008

Okay, this is a little strange.  I fired off a quick blog post earlier today about CNN and how it bolloxed up reporting its own poll results on a question about Obama and McCain, and I got this as a comment:

I agree ; we have to have Obama or at least the Democrats in power to bring the fisical HOUSE in order. Not only has the Bush/McCain Administration been running unabaited and above all laws but all their special interest are holding out their hands one more time before CHANGE arrives!

This comment, from one “Take It Back”, is a complete non sequitur.  If you read the original post *at all*, it’s clear that I’m not talking about either the Obama or McCain campaigns;  even a brief look at the title of the post would have told you that.  It’s really just a coincidence that Obama and McCain showed up in the post at all, since the poll could easily have been about cheese and the post would have been largely the same (if a little lacking in emotional punch).  For crying out loud, I’m not even American!  I’m Canadian!  We’ve got our own election to worry about up here, though I won’t be surprised if Americans don’t know that.

So, I can only assume that a).  the commenter was well-meaning but stupid, or b).  this is a spam comment, possibly automated, based on the keywords of “Obama” and “McCain” in my post.  Has anyone else seen this or anything like it?  A quick Google search came up with nothing, but I’m wondering if this is a widespread problem or just a fluke here.

(Oh, and just to go on the record, I actually do hope that Obama wins in the United States;  as our large and somewhat bristly neighbour to the south, the election down there does have an unfortunate ripple effect up here.)

image:  david