The Homepage of tagurl.pl

Version 1.0 can be downloaded here. tagurl.pl is a Perl script that tags URLs in correctly encoded MIME email messages. It also generates a file that the mutt email client can source to define numeric macros for opening those URLs.

Why?

Urlview is a great program, and my extract_url.pl script works very well, but both require a multiple-step process and neither shows you the context of each link as you select it.

Dependencies

Mandatory (these usually come with Perl):

HTML::Parser

How to use it

This Perl script expects a valid email message to be piped in via STDIN. It can be used either as a prefilter for entries in your ~/.mailcap file or as mutt's $display_filter. As a mailcap prefilter it works well with HTML email, but can be a bit annoying for plain-text email. As a $display_filter it can only tag URLs that are displayed as literal URLs, so it is not a good option for HTML email. Here's how you can test it:

        cat message.html | tagurl.pl -o muttmacros > taggedmessage.html

The file muttmacros should then contain mutt-compatible macros for opening the first nine URLs in the HTML message.

Here's an example mailcap entry:

        text/html; cat %s | tagurl.pl -o ~/.muttmacros | elinks -dump -force-html; copiousoutput

Then all you have to do is get mutt to load the macros. The file has to be generated before mutt sources it, so something like <push> is necessary (but see Known Problems). For example:

        message-hook . 'push <enter-command>source ~/.muttmacros<enter>'

Using this script as a display filter is also really easy:

        set display_filter="tagurl.pl -o ~/.muttmacros"

The trick is in using it as a display filter for text/plain messages, but turning it off for text/html messages. Here's one way to do it: (see Known Problems)

        message-hook "~h 'Content-Type: [tT][eE][xX][tT]/[pP][lL][aA][iI][nN]'" \
                'set display_filter="tagurl.pl -o ~/.muttmacros"'
        message-hook "~h 'Content-Type: [tT][eE][xX][tT]/[hH][tT][mM][lL]'" \
                'unset display_filter'

The script accepts several options:

-o file
This specifies which file to store the mutt macros in. This file is overwritten every time the script is run.
-d digits
This specifies how many digits can be used in macros. By default, only one digit is used, so only the first nine URLs are tagged and have macros generated. Up to three digits can be used, allowing up to 999 URLs to be tagged and have macros generated (more than that is just silly). When more than one digit is used, low-numbered URLs are zero-prefixed, such as "001" for the first one and "032" for the thirty-second one.
-H
This forces the script to interpret the input as HTML, rather than guessing based on the presence of an <html> tag.
-T
This forces the script to interpret the input as plain text, rather than guessing based on the presence of an <html> tag.
-h
Prints usage information (i.e. this).
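
To illustrate the zero-prefixed labels described under -d, here is a small shell sketch. It is not taken from tagurl.pl; the function name tag_label is made up for this illustration, and it only shows how a URL's position would map to a fixed-width label.

```shell
#!/bin/bash
# Hypothetical helper (not part of tagurl.pl): print the label for the
# Nth URL when macros are DIGITS digits wide.
tag_label() {
    local n=$1 digits=$2
    # %0Nd pads the number with leading zeros to N digits.
    printf "%0${digits}d\n" "$n"
}

tag_label 1 3     # first URL with -d 3
tag_label 32 3    # thirty-second URL with -d 3
tag_label 7 1     # seventh URL with the default single digit
```

With three digits, every macro is the same length, so typing "03" is never ambiguous with "032".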

Config File

You can specify what command to use to view a URL by putting it in the ~/.tagurl file. So far, there are two kinds of lines you can have in this file:

COMMAND ...
This line specifies the command that will be used to view URLs. The command can contain a %s, which will be replaced by the URL inside single quotes. If it does not contain a %s, the URL will simply be appended to the command. If this line is not present, the command is assumed to be "open", which is the correct command for Mac OS X systems.
MACROFILE ...
This line specifies where the macros should be stored. This line is overridden by the -o flag. If the flag is not specified and this line doesn't exist in the file, the script assumes ~/.tagurl.macro-output.

Here is an example config file:

COMMAND mozilla-firefox -remote "openURL(%s,new-window)"
MACROFILE ~/.muttmacros
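
The COMMAND rule can be sketched in shell as follows. This is only an illustration of the behaviour described above, not the code tagurl.pl uses (the script itself is Perl). The function name build_command is made up, and quoting the URL in the "append" case is an assumption.

```shell
#!/bin/bash
# Hypothetical sketch of the COMMAND rule; tagurl.pl may do this differently.
build_command() {
    local template=$1 url=$2 prefix suffix
    if [[ $template == *%s* ]]; then
        # Split the template around the first %s and put the
        # single-quoted URL in its place.
        prefix=${template%%'%s'*}
        suffix=${template#*'%s'}
        printf '%s\n' "${prefix}'${url}'${suffix}"
    else
        # No %s: append the URL (quoted here by assumption).
        printf '%s\n' "${template} '${url}'"
    fi
}

build_command 'mozilla-firefox -remote "openURL(%s,new-window)"' 'http://example.com/?a=1&b=2'
build_command 'open' 'http://example.com/'
```

Splitting the template by hand, rather than using a pattern substitution, keeps characters such as & in the URL from being treated specially.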

Known Problems

Many messages are mis-labelled, un-labelled, or are simply too complex for the message-hook examples given above. I don't know any really bulletproof way to toggle the display_filter setting, so you're pretty much on your own there. If anyone comes up with a good way, please let me know! (For what it's worth, I currently prefer using the extract_url.pl script for this very reason.)

The message-hook suggested above that uses push to source the muttmacros file hits the very old bug #1365 in mutt. The problem stems from the fact that message-hooks are also triggered by the <save-message> command. Anything pushed by a message-hook therefore ends up being entered as the filename for any <save-message>, which completely breaks that command.

The best solution at the moment (that I know of) is to stop sourcing the tagurl-generated macro file and instead use another external script to open the URLs. That script extracts the nth URL from the tagurl-generated file (where n is passed to the script as an argument), then calls a browser command to open the URL. With that script, instead of the above message-hook, all that's required is to bind the number keys in .muttrc to macros that call this new script with the appropriate argument, for example:

macro pager 01 "<shell-escape>tagurl_open 1<enter>"
macro pager 02 "<shell-escape>tagurl_open 2<enter>"
...
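
If typing those macro lines by hand gets tedious, they can be generated. The sketch below just prints lines in the same pattern as the two shown above. The function name gen_macros and the count of 9 are assumptions for illustration; adjust the key format if you use a different number of digits.

```shell
#!/bin/bash
# Hypothetical generator: print one pager macro per URL number,
# following the "macro pager 01 ..." pattern shown above.
gen_macros() {
    local count=$1 i
    for ((i = 1; i <= count; i++)); do
        printf 'macro pager %02d "<shell-escape>tagurl_open %d<enter>"\n' "$i" "$i"
    done
}

gen_macros 9
```

The output could be redirected to a file of your choosing and sourced from .muttrc.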

The tagurl_open script can be very simple; here is an example. (If tagurl were modified to simply spit out URLs instead of macros, it could be even simpler.)

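#!/bin/bash
# Usage: tagurl_open N   (opens the Nth URL in the tagurl-generated file)
COMMAND="firefox"
MACROFILE="$HOME/.tagurl.macro-output"
# Take line N of the macro file and keep only the text between single quotes.
url=$(sed -n "$1 {s/.*'\([^']*\)'.*/\1/; p;}" "$MACROFILE")
$COMMAND "$url"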

Security

Before a URL is used in a shell, its dangerous shell characters (namely the single quote and the dollar sign) are transformed into percent-encoding. This should eliminate the possibility of a bad URL breaking the shell.
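
As a rough illustration of that transformation, the following shell sketch percent-encodes the two characters named above. It is not the code from tagurl.pl; the function name sanitize_url is made up, and only the single quote (%27) and the dollar sign (%24) are handled.

```shell
#!/bin/bash
# Hypothetical sketch: percent-encode the two characters named above.
sanitize_url() {
    # ' becomes %27 and $ becomes %24; everything else is left untouched.
    printf '%s\n' "$1" | sed -e "s/'/%27/g" -e 's/\$/%24/g'
}

sanitize_url "http://example.com/a'b\$(id)"
```

Once the single quote is encoded, a URL placed inside single quotes cannot close the quoting early.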