The Homepage of extract_url.pl

Version 1.3.2 can be downloaded here. This is a Perl script that extracts URLs from correctly-encoded MIME email messages. It can be used either as a pre-parser for urlview or as a complete replacement for it.

Why?

Urlview is a great program, but it has some deficiencies. In particular, it isn't very configurable, and it cannot handle URLs that have been broken across several lines in format=flowed delsp=yes email messages. Also, urlview doesn't eliminate duplicate URLs. This Perl script handles all of that. It also sanitizes URLs so that they can't break out of the command shell.
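
To make the line-break problem concrete, here is a minimal sketch in Python (not the script's own Perl code) of how format=flowed delsp=yes text can be rejoined before looking for URLs, and how duplicates can be dropped. The function names and the URL pattern are made up for illustration, and quoted text is ignored for brevity:

```python
import re

# Deliberately simple URL pattern; an assumption for this sketch only.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def unflow(text, delsp=True):
    """Join soft-wrapped lines of a format=flowed body (RFC 3676)."""
    paragraphs, current = [], ""
    for line in text.splitlines():
        if line.startswith(" "):
            line = line[1:]                  # undo space-stuffing
        if line.endswith(" ") and line != "-- ":
            # Soft break: the paragraph continues on the next line.
            # With delsp=yes the trailing space is dropped before joining.
            current += line[:-1] if delsp else line
        else:
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)

def extract_urls(text):
    """Return the URLs found in text, first occurrence only, in order."""
    return list(dict.fromkeys(URL_RE.findall(text)))
```

Calling extract_urls(unflow(body)) on a message body gives each URL once, even if it was split across lines or mentioned twice.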

This is designed primarily for use with the mutt emailer. The idea is that if you want to access a URL in an email, you pipe the email to a URL extractor (like this one) which then lets you select a URL to view in some third program (such as Firefox). An alternative design is to access URLs from within mutt's pager by defining macros and tagging the URLs in the display to indicate which macro to use. A script you can use to do that is tagurl.pl.

Dependencies

Mandatory (these usually come with Perl):

Optional:

How to use it

This Perl script expects a valid email to be piped in via STDIN. Its STDOUT can be piped into urlview (it will detect this). Here's how you can use it:

        cat message.txt | extract_url.pl
OR
        cat message.txt | extract_url.pl | urlview
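
As an illustration of what "detecting" a pipe can mean, here is a small Python sketch (again, not the script's own Perl code, and not necessarily the check it actually uses) that asks the operating system whether a file descriptor is a pipe. The function name is hypothetical:

```python
import os
import stat
import sys

def is_pipe(fd):
    """Return True if the file descriptor refers to a pipe (FIFO)."""
    return stat.S_ISFIFO(os.fstat(fd).st_mode)

def choose_mode():
    """Pick an output mode depending on where standard output goes."""
    if is_pipe(sys.stdout.fileno()):
        return "plain list"      # e.g. feeding another program such as urlview
    return "interactive menu"    # e.g. running directly in a terminal
```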

For use with mutt 1.4.x, here's a macro you can use:

        macro index,pager \cb "<enter-command> unset pipe_decode<enter><pipe-message>extract_url.pl<enter>" "get URLs"

For use with mutt 1.5.x, here's a more complicated macro you can use:

        macro index,pager \cb "<enter-command> set my_pdsave=\$pipe_decode<enter>\
        <enter-command> unset pipe_decode<enter>\
        <pipe-message>extract_url.pl<enter>\
        <enter-command> set pipe_decode=\$my_pdsave<enter>" "get URLs"

Here's a suggestion for how to handle encrypted email:

        macro index,pager ,b "<enter-command> set my_pdsave=\$pipe_decode<enter>\
        <enter-command> unset pipe_decode<enter>\
        <pipe-message>extract_url.pl<enter>\
        <enter-command> set pipe_decode=\$my_pdsave<enter>" "get URLs"

        macro index,pager ,B "<enter-command> set my_pdsave=\$pipe_decode<enter>\
        <enter-command> set pipe_decode<enter>\
        <pipe-message>extract_url.pl<enter>\
        <enter-command> set pipe_decode=\$my_pdsave<enter>" "decrypt message, then get URLs"

        message-hook .  'macro index,pager \cb ,b "URL viewer"'
        message-hook ~G 'macro index,pager \cb ,B "URL viewer"'

It's not perfect, but it works for me.

Config File

If you're using it with Curses::UI (i.e. as a standalone URL selector), this Perl script will try to figure out what command to use based on the contents of your ~/.urlview file. However, it also has its own configuration file (~/.extract_urlview), which will be used instead if it exists. So far, there are six kinds of lines you can have in this file:

Here is an example config file:

        SHORTCUT
        COMMAND mozilla-firefox -remote "openURL(%s,new-window)"
        HTML_TAGS a,iframe,link
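
To show how a file like this can be read, here is a minimal Python sketch (not the script's own Perl code). It only splits each line into a keyword and an optional argument, using the three lines from the example; it does not know what each keyword means. It also assumes that %s in the COMMAND line stands for the selected URL. Both function names are hypothetical:

```python
def parse_config(text):
    """Return (keyword, argument) pairs, one per non-blank line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything after the first space is the argument (may be empty).
        keyword, _, argument = line.partition(" ")
        entries.append((keyword, argument.strip()))
    return entries

def build_command(template, url):
    """Put an already-sanitized URL where the %s placeholder is."""
    return template.replace("%s", url)
```

With the example above, parse_config gives ("SHORTCUT", "") for the first line, and build_command fills the URL into the COMMAND argument.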

Security

Before any URL is used in a shell, its dangerous shell characters (namely the single quote and the dollar sign) are converted to percent-encoding. This should eliminate the possibility of a bad URL breaking out of the shell.
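
The transformation itself is simple. Here is a Python sketch of it (not the script's own Perl code; the function name is hypothetical), covering only the two characters named above:

```python
def sanitize_url(url):
    """Percent-encode the single quote and the dollar sign."""
    return url.replace("'", "%27").replace("$", "%24")

# A URL that tries to smuggle in a shell command substitution.
hostile = "http://example.com/'$(touch%20/tmp/x)'"
print(sanitize_url(hostile))
```

After this, the URL contains neither character, so it can no longer close a single-quoted string or start a variable or command substitution.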

Screenshots

Here's what it looks like for a standard email:
[Screenshot: Standard list of URLs]

If a URL is too long for your terminal, then when you select it, extract_url.pl will (by default) ask you to review it in a form that lets you see the whole thing. Here's what that looks like:
[Screenshot: Asking for confirmation on long URLs]

Changelog