LaTeX to HTML and Word with latex2html – A mini-tutorial for OS X users.

September 29, 2008

As I’ve mentioned before, I’m a big fan of LaTeX in a field that is not so wild about its use.  In the past few weeks, I’ve had multiple situations where a Word file was requested instead of a PDF, which have left me grumbling and manually hacking my .tex files into Word docs each and every time.  So, this afternoon I decided to take a crack at solving the problem.  After googling around the Intertubes, I decided that my most flexible option was likely to be the process of LaTeX -> HTML -> Word, and so I spent a couple of hours getting everything working on my system.  Since I’m a blogger, naturally I also decided to share the results of my efforts with you.  Thus, a miniature tutorial on getting up and running with latex2html on OS X.

Before you start:

Since you’re here reading about converting LaTeX, I’m going to assume that you already have a functioning LaTeX system;  if not, you’ll want to use fink or MacTeX (see also the TeXShop page on the subject).  If you’re installing LaTeX for the first time, I do recommend MacTeX over fink, which to the best of my knowledge still uses the no-longer-maintained teTeX as its base system.  I will also be assuming that you’re comfortable with Google and the command line; if you’re not, leave a comment and I may expand upon things for you, but for now I’m going to play to the experienced crowd.

Getting and installing latex2html:

How you choose to do this will depend on what TeX distribution you have installed.  I’ll deal with both fink and MacTeX below, and comment upon the i-installer distribution as appropriate.

  • Fink:  If you’ve installed TeX via Fink, this should be fairly easy for you.  Simply type the command sudo fink install latex2html into a Terminal window, and hit enter.  Fink should do the rest for you;  it will ask you about installing a few packages that latex2html depends on, and the default answers should be just fine.  Note:  if you installed via the i-installer distribution, Fink may ask about installing tetex-base – if this happens, look for the option that says “system-tetex”, which indicates a manual installation of teTeX.  If this option is not available, you can choose to install teTeX, which may be more hassle-free for this particular case but which will eat up hard-drive space unnecessarily (and I can’t guarantee it won’t cause problems with your existing installation, either), or you can choose to cancel the installation and proceed as for MacTeX below.
  • MacTeX:  latex2html has a number of dependencies, which you will need to make sure are satisfied.  Some will likely have been installed by MacTeX, such as GhostScript and dvips;  the one you may not have is netpbm, which is necessary (if typing pnmtopng<enter> in the Terminal gives you an error saying that the command is not found, you probably don’t have netpbm installed).  Installation can be done from source, downloaded here, or better yet, through fink via sudo fink install netpbm (I tried the DarwinPorts installation, but that did not work with latex2html).  Once you have the dependencies taken care of, you will have to install latex2html from source, unless you want to install teTeX as well (see above).  To get it, download latex2html-2008.tar.gz from here, and unpack it to the directory of your choice.  Switch to that directory in a Terminal window, and execute the *nix-standard sequence of:

    ./configure<enter>
    make<enter>
    make check<enter>
    make test<enter>
    (important – you’ll want to make sure everything went all right above;  if it has, the last line of output should contain a file:// URL that you can copy to your browser to see the HTML produced from the test .tex files).
    sudo make install<enter>

    And you’re done!  If you’ve done the check and the tests and everything has come out okay, then you’re ready to use latex2html.
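The MacTeX route above can be wrapped up in a small shell sketch.  It is only an illustration:  the helper names missing_tools and build_latex2html are made up for this example, the command name gs for GhostScript and the directory name latex2html-2008 are assumptions, and nothing runs until you call the functions yourself.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on the PATH.
# (missing_tools is a hypothetical helper, not part of latex2html.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Run the build-and-test sequence inside the unpacked source directory,
# stopping at the first step that fails.
# (build_latex2html is also a hypothetical helper.)
build_latex2html() {
  (
    cd "$1" &&
    ./configure &&
    make &&
    make check &&
    make test
  )
}
```

For example, missing_tools gs dvips pnmtopng should print nothing when all three dependencies are present, after which build_latex2html latex2html-2008 followed by sudo make install in that directory finishes the job.  The install step is left out of the function on purpose, so that nothing gets written to the system until you have looked at the test output.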

Using latex2html:

At this point, you’re ready to convert your files from LaTeX to HTML, and then possibly to Word.  To invoke latex2html, switch to the directory with your .tex file, and type latex2html filename<enter>.  In its default state, latex2html will produce HTML that is broken up into multiple pages, usually one per section / subsection, much like the latex2html home page is.  If you want to import your document into Word, you may wish to suppress this tendency.  To do so, use the following command:

latex2html -split 0 -info 0 -no_navigation filename<enter>

-split 0 will make the entire LaTeX file into a single HTML page, while -info 0 will remove the information bar at the bottom of the page and -no_navigation will remove the navigational menus at the top and bottom.  This should produce a vanilla HTML file that Microsoft Word can read fairly easily.
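If you have several files to convert, the same flags can be wrapped in a small shell function.  This is a sketch rather than anything latex2html ships with:  convert_for_word is a name I’ve made up here, and it assumes latex2html is already on your PATH.

```shell
# Convert each .tex file given as an argument into a single, plain HTML page.
# (convert_for_word is a hypothetical helper name.)
convert_for_word() {
  for texfile in "$@"; do
    # Same flags as above: one page, no info section, no navigation menus
    latex2html -split 0 -info 0 -no_navigation "$texfile" || return 1
  done
}
```

Calling convert_for_word chapter1.tex chapter2.tex (hypothetical file names) runs the converter once per file and stops at the first one that fails.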

One thing to beware of at this point:  as I noticed on this blog post, Word will link to image files instead of including them in the document, which will mean that things like your equations will drop out if you send someone the .doc file without sending the image files as well.  To fix this, follow the procedure mentioned by PD:  go to Edit->Links, select all of the links in the dialog box, and click “Break Link”.  Once that is done, save the file and the images will be embedded into the document itself, ready for sending off to someone else.

And that’s it!  To get more information about latex2html command line options, use latex2html -h in your Terminal window, and if you have any other questions, I’d be happy to try and help if you leave a comment.  Happy TeXing!

A LaTeX goodie (and one or two for biologists as well).

September 26, 2008

A nice link today:

  • Via the Blog on Latex Matters comes a tip about creating margin comments using the \marginpar{…} command, which is a built-in way to add margin notes to a document as an alternative to using comments in the code.  Very nice!
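If you’d like to try \marginpar quickly, here is a minimal sketch that writes a throwaway test document from the shell.  The file name marginpar-demo.tex and the sample sentences are made up for illustration, and the typesetting step only runs if pdflatex happens to be on your PATH.

```shell
# Write a minimal test document (hypothetical file name) that uses \marginpar
cat > marginpar-demo.tex <<'EOF'
\documentclass{article}
\begin{document}
This sentence sits in the main text.%
\marginpar{And this note lands in the margin beside it.}
The text then carries on as normal.
\end{document}
EOF

# Typeset it only if pdflatex is available
if command -v pdflatex >/dev/null 2>&1; then
  pdflatex -interaction=nonstopmode marginpar-demo.tex
fi
```

The note is placed in the margin next to the line where the command appears, so it stays with its sentence as the surrounding text is edited.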

I was also poking around CTAN and came across a couple of biology-oriented packages:

  • BioTeX is a project to help expand the use of LaTeX in the biological sciences by bringing high quality packages to said sciences.  On their webpage you will find two packages which might be of use to the molecular biologists out there:  TeXshade, described as “a comprehensive program for displaying, shading and labeling of nucleotide and protein alignments”, and TeXtopo, “plotting of shaded membrane protein topology data”.  Since I’m working in behavioural ecology right now, I don’t expect to get much use from these myself, but hopefully someone else will find them useful!
  • For those of you with a regular need to typeset biological species names, the Biocon package, available through CTAN, might be of some assistance to you.  It allows you to define commands to refer to each species you reference, giving you the ability to flexibly automate the inclusion of those species names in your document.  For more information on the specifics of how to use this package, take a look at the manual.

Wait – Stephen Harper knows what an RSS feed is? *Really*?

September 24, 2008

I was heading over to Google to check something, and on the Google home page there was a link to see what Canadian politicians are reading with Google Reader.  Being a fan of the big G’s Reader myself, I wanted to see what this was all about.  As you can see yourself, the page purports to be a read-out of the reading lists of some of the more important people in Canadian politics, including the leaders of four Canadian political parties.  Now, I’m a pretty credulous guy, but even I’m having trouble swallowing the idea that Stephen Harper or Stéphane Dion are regular users of Google Reader.  Come to that, I’m having a hell of a time convincing myself that Harper even knows what an RSS feed is.  But hey, he’s welcome to subscribe to my blog – I’d be happy to give him some tips on how to fund Canadian science appropriately…

On a side note, the link to Elizabeth May’s Facebook page is kind of entertaining to me;  I had to laugh when I heard on NPR’s Wait Wait … Don’t Tell Me! about her apologizing for never having smoked pot.


Christians + crime = The Great Northern Texas.

September 24, 2008

So, it seems that the University of Alberta Atheists and Agnostics group (were they there when I attended?  Damn it, I missed out!) had a banner defaced by a group of hateful Christians recently (h/t Pharyngula).  Good to see that my alma mater has such tolerant people.

<sigh>

Update:  for those of you coming in from “Paul Lesoway”’s note on this subject – yes, since you linked to me and clicked on the link, I can see you – I implore you to grow up and perhaps even try to adopt someone else’s viewpoint for a moment;  as one of the commenters on the original post that I linked to above said, what would be your feelings if someone defaced and vandalized a poster advocating a Christian group on campus?  One of the reasons that you go to university is to be exposed to people who don’t necessarily share your beliefs, so this is an opportunity to become a better person.  Don’t blow it.


And…

September 20, 2008

… this is just hilarious.


Opposing views.

September 17, 2008

Via Panda’s Thumb, I came across a website that I didn’t know about, called Opposing Views, which is a regulated debate site where experts weigh in on each side of an issue and can object to each other’s arguments, while the public sits on the sidelines and comments.  I’m a little burnt-out on the whole “go on the internet and convince people” routine, because I’m frustrated at how useless most of these types of conversations are, but this site looks like a good alternative to the standard go-into-a-forum-and-scream-at-each-other thing.  Watching the (verified) experts lay out the case for either side is a good way to get a feeling for how strong the arguments really are;  for example, take a look at the beating that Steven Novella is giving Bill Reddy on acupuncture.  Here’s to the advancement of science!


Obama supporters – spam commenting?

September 16, 2008

Okay, this is a little strange.  I fired off a quick blog post earlier today about CNN and how it bolloxed up reporting its own poll results on a question about Obama and McCain, and I got this as a comment:

I agree ; we have to have Obama or at least the Democrats in power to bring the fisical HOUSE in order. Not only has the Bush/McCain Administration been running unabaited and above all laws but all their special interest are holding out their hands one more time before CHANGE arrives!

This comment, from one “Take It Back”, is a complete non sequitur.  If you read the original post *at all*, it’s clear that I’m not talking about either the Obama or McCain campaigns;  even a brief look at the title of the post would have told you that.  It’s really just a coincidence that Obama and McCain showed up in the post at all, since the poll could easily have been about cheese and the post would have been largely the same (if a little lacking in emotional punch).  For crying out loud, I’m not even American!  I’m Canadian!  We’ve got our own election to worry about up here, though I won’t be surprised if Americans don’t know that.

So, I can only assume that a).  the commenter was well-meaning but stupid, or b).  this is a spam comment, possibly automated, based on the keywords of “Obama” and “McCain” in my post.  Has anyone else seen this or anything like it?  A quick google search came up with nothing, but I’m wondering if this is a wide-spread problem or just a fluke here.

(Oh, and just to go on the record, I actually do hope that Obama wins in the United States;  as our large and somewhat bristly neighbour to the south, the election down there does have an unfortunate ripple effect up here.)

image:  david


CNN – dropping the ball on stats is our game!

September 15, 2008

Bonus question from today’s coverage on CNN of the American presidential candidates’ responses to the melt-down on Wall Street today:

If economic problems continue to dominate the headlines, they could help Obama in the tight race for the White House, the recent CNN/Opinion Research poll suggested.

According to the poll, Obama was viewed as being better at handling economic issues by 52 percent of the voters surveyed. In comparison, McCain was viewed as better on economic issues by 44 percent. The margin of error on that question was plus or minus 4.5 percentage points.

Take a good look, and decide for yourself what the problem is with the conclusion from those two paragraphs of text.

[ … ]

[ … ]

[ … ]

[ … ]

No dice?  Here, I’ll play it again, with some boldface on the font to help out:

If economic problems continue to dominate the headlines, they could help Obama in the tight race for the White House, the recent CNN/Opinion Research poll suggested.

According to the poll, Obama was viewed as being better at handling economic issues by 52 percent of the voters surveyed. In comparison, McCain was viewed as better on economic issues by 44 percent. The margin of error on that question was plus or minus 4.5 percentage points.

Got it yet?  Let’s assume the percentage plus or minus the margin of error is the 95% confidence interval on the true proportion of Americans who would rate Obama or McCain as being better on economic issues.  The 95% confidence interval is the range of values within which we are 95% sure that the true population value lies, and to get that range, you take the stated proportion and add or subtract the margin of error.  For Obama, that would be 52 +/- 4.5 = [47.5, 56.5].  For McCain, that would be 44 +/- 4.5 = [39.5, 48.5].  I could blather on about this for a while, but the short story is this:  based on their own numbers, CNN is saying that from the total American population, it could easily be the case that 47.5% think Obama is better on the economy and 48.5% think that McCain is better (the remaining 4% would be “neither” or what have you).  In other words, it’s a distinct possibility that Americans think that McCain is better than Obama on the economy, not the other way around!  This means that the poll does not support the conclusion drawn in the first quoted paragraph, and someone around there should get a whack upside the head.
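To spell out the arithmetic, here is a sketch with p_O and p_M standing for the reported percentages and m for the margin of error (symbols of my own choosing, not CNN’s; the aligned environment assumes the amsmath package):

```latex
\[
\begin{aligned}
p_O \pm m &= 52 \pm 4.5 \;\Rightarrow\; [47.5,\ 56.5] \\
p_M \pm m &= 44 \pm 4.5 \;\Rightarrow\; [39.5,\ 48.5] \\
|p_O - p_M| &= 8 \;<\; 2m = 9
\end{aligned}
\]
```

Two intervals with the same margin overlap whenever the gap between the estimates is smaller than twice that margin;  here the overlap is the stretch from 47.5 to 48.5.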

(Note: if you assume that there are only two poll choices – “Obama” or “McCain” but not “neither” or “no response”, or what have you – it can be more informative to look at the difference between the percentages and see if that is statistically greater than 0.  A few numbers on the back of an envelope is more than enough to show that p >> 0.25 here.)


A study in contrasts.

September 14, 2008

Observe:

Whinging hand-wringing:

Not all Montreal island residents are keen on the idea of island-wide wireless Internet access.

Megan Durnford, a Westmount writer and filmmaker, unplugged her own wireless router last spring after learning at a conference about potential health risks associated with exposure to radiofrequency radiation.

Durnford did some research on the issue and now she is trying to persuade her neighbours to unplug their routers, too. She says she will fight any move by Montreal authorities to expand the existing wireless network.

And now, let’s see the other side, shall we?  Ms. Durnford seems to think that she knows more than just about everybody in the world, but she’s not the first to ask this question:

Some common sense from a BBC news article on the effect of wi-fi in a public school:

Medical physics expert Professor Malcolm Sperrin told BBC News that the fact wi-fi radiation in a particular school was three times higher than a mobile phone mast was irrelevant, unless there was any evidence of a link to health effects.

“Wi-fi is a technique using very low intensity radio waves. Whilst similar in wavelength to domestic microwave radiation, the intensity of wi-fi radiation is 100,000 times less than that of a domestic microwave oven.

“Furthermore, tissue can only be effectively heated by a wavelength that is closely matched to the absorption, and there are strict guidelines for ensuring such absorption peaks are avoided.”

The type of radiation emitted by radio waves (wi-fi), visible light, microwaves and mobile phones has been shown to raise the temperature of tissue at very high levels of exposure – called a thermal interaction – but there is no evidence that low levels cause damage.

The Health Protection Agency has said that sitting in a wi-fi hotspot for a year results in receiving the same dose of radio waves as making a 20-minute mobile phone call.

Say it with me, people:  radio-wave (wi-fi) radiation is non-ionizing radiation.  This is the same type of radiation that visible light is composed of, and I haven’t seen anyone complaining about getting cancer from the colour red ( …. and now, somewhere on the InterTubes, a fruitcake registers redcausescancer.com …. ).  As the article says, radio wave energy can cause some thermal effects, but this has never been proven to have an effect on human tissue at the levels that you can experience from a wi-fi network. I would sorely love to see what “research” Ms. Durnford found to substantiate any of her claims, because I’ve never seen a single well-formed study to back her up.  And if you look at the mobile phone research, which as we saw above leads to much higher doses of RF than wi-fi ever will, the evidence is overwhelmingly negative for a link between mobile RF and cancer;  the Wikipedia article on this has a fair round-up of some of the large studies that have been done here, and if you’d like a basic course in the science and associated bull-@#$! that has gone on in this field, take a look at Orac’s analysis of a news event some time ago at Respectful Insolence.

And if you’re one of those people who claim to have “electromagnetic hypersensitivity”, I have one thing to say to you:  nocebo effect.  This is the reverse of the placebo effect, where a patient’s bad expectations about a treatment or drug cause those bad effects to occur, even if the patient receives nothing but a sham treatment or inert drug.  And until you can pony up some actual evidence beyond “I bought me one of them wi-fi routing thingies, and now my beloved cat Pookie won’t stop peeing on the carpet”, I’m going to have to say “nonsense”.

[ Yeah, I just portrayed wi-fi alarmists as rednecks.  Hey, I’m from Alberta – it’s my right to label idiots as rednecks. 🙂 ]


The lights are still on.

September 7, 2008

I know that things have been quiet around here for a little while.  This does not mean that I have abandoned the blog;  rather, it means that life has gotten pretty hectic.  With the start of the new academic year, I’ve been refocusing on my Ph.D. research and so free time has been pretty scarce.  I’ve also been thinking about the direction of this blog and the things that I want to focus on here, and I will be continuing to do so as I return to a regular posting schedule.

Long story short:  expect to see new stuff in this space soon!