March 12, 2008

Sorting Spaces

There seems to be some disagreement, at Apple Computer, about exactly what the definition of the word “ignore” is. From the “sort” man page:

-d Sort in `phone directory’ order: ignore all characters except letters, digits and blanks when sorting.

What does that suggest to you? Well, let’s compare it to the GNU “sort” man page:

-d, --dictionary-order
consider only blanks and alphanumeric characters

So you’d THINK that sorting with these two options would be equivalent, right?


Here’s a simple list:

- foo
- foo

How should these things be sorted when the -d option is in effect? You’ve got a conundrum: is a space sorted BEFORE a number or AFTER a number?

Curse you, alphabet! You’re never around when I need you!

And, of course, BSD and GNU answer that question differently. On GNU, the answer is AFTER, on BSD the answer is BEFORE! Oh goody.

Here’s a better way if you need the sorting results to be the same on both BSD and GNU: replace all spaces with something else non-alpha-numeric that isn’t used in the file (such as an underscore, or an ellipsis, or an em-dash). Then sort with -ds (no last-minute saving throws!), then replace the underscore (or whatever) with a space again.

And if you need it to be consistent on OSX platforms too, make it a -dfs sort (so that capitals and lower-case are considered the same).
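The workaround above can be sketched as a small pipeline. This is only a sketch under a few assumptions: the underscore is the placeholder and never occurs in the input, the function name portable_dsort is made up for illustration, and LC_ALL=C is added here (it is not part of the original recipe) to keep locale collation out of the picture.

```shell
# Sketch: sort stdin the same way under BSD and GNU sort.
# Assumes the input contains no underscores (the placeholder).
portable_dsort() {
    # spaces -> underscores, dictionary/fold-case/stable sort, and back
    tr ' ' '_' | LC_ALL=C sort -dfs | tr '_' ' '
}

printf '%s\n' 'b foo' 'a foo' 'a 1 foo' | portable_dsort
```

Because -d ignores the underscores, the space-versus-digit question never comes up, and -s keeps sort from falling back to a whole-line comparison for ties.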

March 13, 2008

w3m and MacPorts

For whatever reason, w3m refuses to build on my Intel OSX box with the latest boehmgc library. To get it to build, you must forcibly downgrade to boehmgc 6.8 or 6.7 or something earlier.

Also, I noticed that w3m isn’t marked as depending on gdk-pixbuf. Strictly speaking, it doesn’t, but it does if you have --enable-image=x11. :P Add this to your Portfile:

depends_lib lib:libgccpp.1:boehmgc bin:gdk-pixbuf-config:gdk-pixbuf

Also, it seems that either w3m or gdk-pixbuf-config appends an extra library to the config line for gdk-pixbuf-config (essentially, they specify -lgdk_pixbuf AND -lgdk_pixbuf_xlib). That extra library causes build problems for w3m; you can fix it by editing /opt/local/bin/gdk-pixbuf-config and removing the -lgdk_pixbuf from what it prints out (however, if you use other software that uses gdk-pixbuf-config, you may need to put it back once w3m has finished building).

My Bashrc

There are few things that, over my time using Unix-like systems, I have put more cumulative effort into than into my configuration files. I've been tweaking them since the day I discovered them, attempting to make my environment more and more to my liking. I have posted them on my other website (here), but it occurred to me that they've gotten sufficiently hoary and complex that a walkthrough might help someone other than myself.

Anyway, my bashrc is first on the list. (Or, if you like, the pure text version.)

The file is divided into several (kinda fuzzy) sections:
- Initialization & Other Setup
- Useful Functions
- Loading System-wide Bashrc
- Behavioral Settings
- Environment Variables
- Character Set Detection
- Aliases
- Tab-completion Options
- Machine-local settings
- Auto-logout

Let's take them one at a time.

Initialization & Other Setup

Throughout my bashrc, I use a function I define here ( dprint ) to allow me to quickly turn on debugging information, which includes printing the seconds-since-bash-started variable ( SECONDS ) in case something is taking too long and you want to find the culprit. Yes, my bashrc has a debug mode. This is essentially controlled by the KBWDEBUG environment variable. Then, because this has come in useful once or twice, I allow myself to optionally create a ~/.bashrc.local.preload file which is sourced now, before anything else. Here's the code:


function dprint {
if [[ "$KBWDEBUG" == "yes" && "$-" == *i* ]]; then
    #date "+%H:%M:%S $*"
    echo $SECONDS $*
fi
}
dprint alive
if [ -r "${HOME}/.bashrc.local.preload" ]; then
    dprint "Loading bashrc preload"
    source "${HOME}/.bashrc.local.preload"
fi

Useful Functions

This section started with some simple functions for PATH manipulation. Then those functions got a little more complicated, then I wanted some extra functions for keeping track of my config files (which were now in CVS), and then they got more complicated...

You'll notice something about these functions. Bash (these days) will accept function declarations in this form:

function fname()
{
    do stuff
}

But that wasn't always the case. To maintain compatibility with older bash versions, I avoid using the uselessly cosmetic parens and I make sure that the curly-braces are on the same line, like so:

function fname {
    do stuff
}

Anyway, the path manipulation functions are pretty typical — they're similar to the ones that Fink uses, but slightly more elegant. The idea is based on these rules of PATH variables:

  1. Paths must not have duplicate entries
  2. Paths are faster if they don't have symlinks in them
  3. Paths must not have "." in them
  4. All entries in a path must exist (usually)

There are two basic path manipulation functions: add_to_path and add_to_path_first. They do predictable things — the former appends something to a given path variable (e.g. PATH or MANPATH or LD_LIBRARY_PATH ) unless it's already in that path, and the latter function prepends something to the given PATH variable (or, if it's already in there, moves it to the beginning). Before they add a value to a path, they first check it to make sure it exists, is readable, that I can execute things that are inside it, and they resolve any symlinks in that path (more on that in a moment). Here's the code (ignore the reference to add_to_path_force in add_to_path for now; I'll explain shortly):

function add_to_path {
    local folder="${2%%/}"
    [ -d "$folder" -a -x "$folder" ] || return
    folder=`( cd "$folder" ; \pwd -P )`
    add_to_path_force "$1" "$folder"
}

function add_to_path_first {
    local folder="${2%%/}"
    [ -d "$folder" -a -x "$folder" ] || return
    folder=`( cd "$folder" ; \pwd -P )`
    # in the middle, move to front
    if eval '[[' -z "\"\${$1##*:$folder:*}\"" ']]'; then
        eval "$1=\"$folder:\${$1//:\$folder:/:}\""
        # at the end
    elif eval '[[' -z "\"\${$1%%*:\$folder}\"" ']]'; then
        eval "$1=\"$folder:\${$1%%:\$folder}\""
        # no path
    elif eval '[[' -z "\"\$$1\"" ']]'; then
        eval "$1=\"$folder\""
        # not in the path
    elif ! eval '[[' -z "\"\${$1##\$folder:*}\"" '||' \
      "\"\$$1\"" '==' "\"$folder\"" ']]'; then
        eval "export $1=\"$folder:\$$1\""
    fi
}

Then, because I was often logging into big multi-user Unix systems (particularly Solaris systems) with really UGLY PATH settings that had duplicate entries, often included ".", not to mention directories that either didn't exist or that I didn't have sufficient permissions to read, I added the function verify_path. All this function does is separates a path variable into its component pieces, eliminates ".", and then reconstructs the path using add_to_path, which handily takes care of duplicate and inaccessible entries. Here's that function:

function verify_path {
    # separating cmd out is stupid, but is compatible
    # with older, buggy, bash versions (2.05b.0(1)-release)
    local cmd="echo \$$1"
    local arg="`eval $cmd`"
    eval "$1=\"\""
    while [[ $arg == *:* ]] ; do
        # peel the first entry off the front of the list
        local dir="${arg%%:*}"
        arg="${arg#*:}"
        if [ "$dir" != "." -a -d "$dir" -a \
          -x "$dir" -a -r "$dir" ] ; then
            dir=`( \cd "$dir" ; \pwd -P )`
            add_to_path "$1" "$dir"
        fi
    done
    # whatever is left is the last entry
    if [ "$arg" != "." -a -d "$arg" -a -x "$arg" -a -r "$arg" ] ;
    then
        arg=`( cd "$arg" ; \pwd -P )`
        add_to_path "$1" "$arg"
    fi
}

Finally, I discovered XFILESEARCHPATH — a path variable that requires a strange sort of markup (it's for defining where your app-defaults files are for X applications). This wouldn't work for add_to_path, so I created add_to_path_force that still did duplicate checking but didn't do any verification of the things added to the path.

function add_to_path_force {
    if eval '[[' -z "\$$1" ']]'; then
        eval "export $1='$2'"
    elif ! eval '[[' \
        -z "\"\${$1##*:\$2:*}\"" '||' \
        -z "\"\${$1%%*:\$2}\"" '||' \
        -z "\"\${$1##\$2:*}\"" '||' \
        "\"\${$1}\"" '==' "\"$2\"" ']]'; then
        eval "export $1=\"\$$1:$2\""
    fi
}
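For comparison, here is a rough sketch of the same duplicate check written another way. It is not the code from the bashrc: the names path_contains and append_unique are made up for illustration. The idea is that the four separate pattern tests (middle, end, start, exact match) collapse into one if both the list and the entry are wrapped in colons first.

```shell
# path_contains LIST ENTRY: succeed if ENTRY is one of the
# colon-separated items in LIST (hypothetical helper).
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# append_unique VARNAME ENTRY: append ENTRY to the variable named
# VARNAME unless it is already present (hypothetical helper).
append_unique() {
    eval "_cur=\${$1}"
    if [ -z "$_cur" ]; then
        eval "$1=\$2"
    elif ! path_contains "$_cur" "$2"; then
        eval "$1=\$_cur:\$2"
    fi
    unset _cur
}
```

Quoting the entry inside the case pattern keeps glob characters in it literal, which matters for oddball entries like the %T/%N markup.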

I mentioned that I resolved symlinks before adding directories to path variables. This is a neat trick I discovered due to the existence of pwd -P and subshells. pwd -P will return the "real" path to the folder you're in, with all symlinks resolved. And it does so very efficiently (without actually resolving symlinks — it just follows all the ".." records). Since you can change directories in a subshell (i.e. between parentheses) without affecting the parent shell, a quick way to transform a folder's path into a resolved path is this: ( \cd "$folder"; pwd -P). I put the backslash in there to use the shell's builtin cd, just in case I'd somehow lost my mind and aliased cd to something else.
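Pulled out on its own, the trick looks roughly like this. The helper name resolve_dir is hypothetical, and the redirection on cd is an addition here to keep a CDPATH setting from printing anything.

```shell
# resolve_dir DIR: print DIR with all symlinks resolved, or fail
# if DIR cannot be entered (hypothetical helper name).
resolve_dir() {
    # the parentheses run cd in a subshell, so the caller's
    # working directory is left alone
    ( \cd "$1" >/dev/null 2>&1 && \pwd -P )
}
```

Calling resolve_dir on a symlinked directory prints the directory it really points at, and the current shell stays where it was.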

And then, just because it was convenient, I added another function: have, which detects whether a binary is accessible or not:

function have { type "$1" &>/dev/null ; }

Then I had to confront file paths, such as the MAILCAPS variable. A lot of the same logic (i.e. add_to_path_force), but entry validation is different:

function add_to_path_file {
    local file="${2}"
    [ -f "$file" -a -r "$file" ] || return
    # realpath alias may not be set up yet
    file=`realpath_func "$file"`
    add_to_path_force "$1" "$file"
}

You'll note the realpath_func line in there. realpath is a program that takes a filename or directory name and resolves the symlinks in it. Unfortunately, realpath is a slightly unusual program; I've only ever found it on OSX (it may be on other BSDs). But, with the power of my pwd -P trick, I can fake most of it. The last little piece (resolving a file symlink) relies on a tool called readlink ... but I can fake that too. Here are the two functions:

function readlink_func {
    if have readlink ; then
        readlink "$1"
    #elif have perl ; then # seems slower than alternative
    #    perl -e 'print readlink("'"$1"'") . "\n"'
    else
        \ls -l "$1" | sed 's/[^>]*-> //'
    fi
}

function realpath_func {
    local input="${1}"
    local output="/"
    if [ -d "$input" -a -x "$input" ] ; then
        # All too easy...
        output=`( cd "$input"; \pwd -P )`
    else
        # sane-itize the input to the containing folder
        local fname="${input##*/}"
        [[ $input == */* ]] && input="${input%/*}/" || input="./"
        if [ ! -d "$input" -o ! -x "$input" ] ; then
            echo "$input is not an accessible directory" >&2
            return 1
        fi
        output="`( cd "$input" ; \pwd -P )`/"
        input="$fname"
        # output is now the realpath of the containing folder
        # so all we have to do is handle the fname (aka "input)
        if [ ! -L "$output$input" ] ; then
            output="$output$input"
        else
            input="`readlink_func "$output$input"`"
            while [ "$input" ] ; do
                if [[ $input == /* ]] ; then
                    output="/" ; input="${input#/}"
                elif [[ $input == ../* ]] ; then
                    output="${output%/*/}/" ; input="${input#../}"
                elif [[ $input == ./* ]] ; then
                    input="${input#./}"
                elif [[ $input == */* ]] ; then
                    output="$output${input%%/*}/" ; input="${input#*/}"
                else
                    output="$output$input" ; input=""
                fi
                if [ -L "${output%%/}" ] ; then
                    if [ "$input" ] ; then
                        input="`readlink_func "${output%%/}"`/$input"
                    else
                        input="`readlink_func "${output%%/}"`"
                    fi
                    output="${output%/}" ; output="${output%/*}/"
                fi
            done
        fi
    fi
    echo "${output%%/}"
}

Loading System-wide Bashrc

This section isn't too exciting. According to the man page:

When bash is invoked as an interactive login shell, or as a non-interactive shell with the --login option, it first reads and executes commands from the file /etc/profile, if that file exists. After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable.

SOME systems have a version of bash that appears not to obey this rule. And some systems put crucial configuration settings in /etc/bashrc (why?!?). And some systems even do something silly like use /etc/bashrc to source ~/.bashrc (I did this myself, once upon a time, when I knew not-so-much). I've decided that this behavior cannot be relied upon, so I explicitly source these files myself. The only interesting bit is that I added a workaround so that systems that use /etc/bashrc to source ~/.bashrc won't get into an infinite loop. There's probably a lot more potential trouble here that I'm ignoring. But here's the code:

if [[ -r /etc/bashrc && $SYSTEM_BASHRC != 1 ]]; then
    dprint " - loading /etc/bashrc"
    # set the guard first, so a nested source of this file skips the block
    export SYSTEM_BASHRC=1
    . /etc/bashrc
fi

Behavioral Settings

This is basic stuff, but after you get used to certain behaviors (such as whether * matches . and ..), you often get surprised when they don't work that way on other systems. Some of this is because I found a system that did it another way by default; some is because I decided I like my defaults and I don't want to be surprised in the future.

The interactive-shell-detection here is nice. $- is a variable set by bash containing a set of letters indicating certain settings. It always contains the letter i if bash is running interactively. So far, this has been quite backwards-compatible.

shopt -s extglob # Fancy patterns, e.g. +()
# only interactive
if [[ $- == *i* ]]; then
    dprint setting the really spiffy stuff
    shopt -s checkwinsize # don't get confused by resizing
    shopt -s checkhash # if hash is broken, doublecheck it
    shopt -s cdspell # be tolerant of cd spelling mistakes
fi
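If you ever need the same interactive check in a shell that lacks the double-bracket test, a case statement on $- does the job. A minimal sketch, with a made-up function name:

```shell
# is_interactive: succeed only when the current shell is interactive.
is_interactive() {
    case "$-" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_interactive; then
    echo "interactive shell: prompt and completion tweaks are safe"
fi
```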

Environment Variables

There are a slew of standard environment variables that bash defines for you (such as HOSTNAME). There are even more standard environment variables that various programs pay attention to (such as EDITOR and PAGER). And there are a few others that are program-specific (such as PARINIT and CVSROOT).

Before I get going, though, let me show you a secret. Ssh doesn't like transmitting information from client to server shell... the only reliable way to do it that I've found is the TERM variable. So... I smuggle info through that way, delimited by colons. Before I set any other environment variables, first, I find my smuggled information:

if [[ $TERM == *:* && ( $SSH_CLIENT || $SSH_TTY || $SSH_CLIENT2 ) ]] ; then
    dprint "Smuggled information through the TERM variable!"
    term_smuggling=( ${TERM//:/ } )
    export SSH_LANG=${term_smuggling[1]}
    # put the real terminal type back
    export TERM=${term_smuggling[0]}
    unset term_smuggling
fi
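The unpacking step can also be done with plain parameter expansion instead of an array. This is just a sketch of the idea: split_term, TERM_NAME and TERM_LANG are hypothetical names, and it assumes a single smuggled field after the first colon.

```shell
# split_term VALUE: set TERM_NAME and TERM_LANG from "name:lang".
# Both variable names are hypothetical, chosen for this sketch.
split_term() {
    case "$1" in
        *:*) TERM_NAME=${1%%:*}; TERM_LANG=${1#*:} ;;
        *)   TERM_NAME=$1;       TERM_LANG= ;;
    esac
}

split_term "xterm-256color:en_US.UTF-8"
echo "TERM=$TERM_NAME LANG hint=$TERM_LANG"
```

Here ${1%%:*} keeps everything before the first colon and ${1#*:} keeps everything after it, so a TERM value with nothing smuggled in passes through untouched.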

I begin by setting GROUPNAME and USER in a standard way:

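if [[ $OSTYPE == solaris* ]] ; then
    idout=(`/bin/id -a`)
    # idout[0] looks like uid=100(name); strip it down to the name
    USER="${idout[0]%%\)*}" ; USER="${USER##*\(}"
    [[ $USER == ${idout[0]} ]] && USER="UnknownUser"
    # idout[1] looks like gid=10(group)
    GROUPNAME="${idout[1]%%\)*}" ; GROUPNAME="${GROUPNAME##*\(}"
    unset idout
else
    [[ -z $GROUPNAME ]] && GROUPNAME="`id -gn`"
    [[ -z $USER ]] && USER="`id -un`"
fi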

Then some standard things (MAILPATH is used by bash to check for mail, that kind of thing), including creating OS_VER and HOST to allow me to identify the system I'm running on:

# I tote my own terminfo files around with me
[ -d ~/.terminfo ] && export TERMINFO=~/.terminfo/
[ "$TERM_PROGRAM" == "Apple_Terminal" ] && \
    export TERM=nsterm-16color

add_to_path_file MAILPATH /var/spool/mail/$USER
add_to_path MAILPATH $HOME/Maildir/
[[ -z $MAILPATH ]] && unset MAILCHECK
[[ -z $HOSTNAME ]] && \
    export HOSTNAME=`/bin/hostname` && echo 'Fake Bash!'
[ -z "$OS_VER" ] && OS_VER=$( uname -r )
OS_VER=(${OS_VER//./ })
PARINIT="rTbgq B=.,?_A_a P=_s Q=>|}+"


I've also gotten myself into trouble in the past with UMASK being set improperly, so it's worth setting manually. Additionally, to head off trouble, I make it hard to leave myself logged in as root on other people's systems accidentally:

if [[ $GROUPNAME == $USER && $UID -gt 99 ]]; then
    umask 002
else
    umask 022
fi

if [[ $USER == root ]] ; then
    [[ $SSH_CLIENT || $SSH_TTY || $SSH_CLIENT2  ]] && \
        export TMOUT=600 || export TMOUT=3600
fi

if [[ -z $INPUTRC && ! -r $HOME/.inputrc && -r /etc/inputrc ]];
then
    export INPUTRC=/etc/inputrc
fi

It is at this point that we should pause and load anything that was in /etc/profile, just in case it was left out (and, if it's in there, maybe it should override what we've done so far):


if [[ -r /etc/profile && -z $SYSTEM_PROFILE ]]; then
    dprint "- loading /etc/profile ... "
    . /etc/profile
    export SYSTEM_PROFILE=1
fi

Now I set my prompt (but only if this is an interactive shell). There are several details here (obviously). The first is that, if I'm logged into another system, I want to see how long I've been idle. So I include a timestamp whenever I'm logged into a remote system. I also added color to my prompt in two ways, which has been very useful. First, it changes the color of the $ at the end of the prompt to red if the last command didn't exit cleanly. Second, remote systems have yellow prompts, whenever I'm root I have a red prompt, and I created commands to flip between a few other colors (blue, purple, cyan, green, etc.) in case I find that useful to quickly distinguish between terminals. Anyway, here's the code:

if [[ $- == *i* ]]; then
    if [[ $TERM == xterm* || $OSTYPE == darwin* ]]; then
        # This puts the term information into the title
        PSterminfo='\[\e]2;\u@\h: \w\a\]'
    fi
    PSparts[3]='(\d \T)\n'
    PSparts[2]='[\u@\h \W]'
    PSparts[1]='\$ '
    PScolors[2]='\[\e[34m\]' # Blue
    PScolors[3]='\[\e[35m\]' # Purple
    PScolors[4]='\[\e[36m\]' # Cyan
    PScolors[5]='\[\e[32m\]' # Green
    PScolors[6]='\[\e[33m\]' # Yellow
    PScolors[100]='\[\e[31m\]' # Badc
    PScolors[0]='\[\e[0m\]' # Reset
    if [[ $USER == root ]] ; then
        PScolors[1]='\[\e[31m\]' # Red
    elif [[ $SSH_CLIENT || $SSH_TTY || $SSH_CLIENT2 ]] ; then
        PScolors[1]="${PScolors[6]}" # yellow
    else
        if [[ $HOSTNAME == marvin ]] ; then
            PScolors[1]="${PScolors[5]}" # green
        fi
        unset PSparts[3]
    fi
    function bashrc_genps {
        # PSgood is the normal prompt; PSbad paints the final $ red
        if [ "$1" -a "${PScolors[$1]}" ] ; then
            PSgood="$PSterminfo${PScolors[$1]}${PSparts[3]}${PSparts[2]}${PSparts[1]}${PScolors[0]}"
            PSbad="$PSterminfo${PScolors[$1]}${PSparts[3]}${PSparts[2]}${PScolors[100]}${PSparts[1]}${PScolors[0]}"
        else
            PSgood="$PSterminfo${PSparts[3]}${PSparts[2]}${PSparts[1]}"
            PSbad="$PSterminfo${PSparts[3]}${PSparts[2]}${PScolors[100]}${PSparts[1]}${PScolors[0]}"
        fi
    }
    bashrc_genps 1
    function safeprompt {
        export PS1='{\u@\h \W}\$ '
        unset PROMPT_COMMAND
    }
    alias stdprompt='bashrc_genps 1'
    alias blueprompt='bashrc_genps 2'
    alias purpleprompt='bashrc_genps 3'
    alias cyanprompt='bashrc_genps 4'
    alias greenprompt='bashrc_genps 5'
    alias whiteprompt='bashrc_genps'
    # this is executed before every prompt is displayed
    # it changes the prompt based on the preceding command
    export PROMPT_COMMAND='[ $? = 0 ] && PS1=$PSgood || PS1=$PSbad'
fi
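Stripped of the color tables, the exit-status part of the prompt boils down to something like the following. Treat it as an illustrative sketch: PS_GOOD, PS_BAD and pick_prompt are hypothetical names, not the ones the bashrc uses.

```shell
# Hypothetical names for this sketch; two ready-made prompts,
# differing only in the color of the trailing $.
PS_GOOD='[\u@\h \W]\$ '
PS_BAD='[\u@\h \W]\[\e[31m\]\$\[\e[0m\] '

# pick_prompt STATUS: choose the prompt for the given exit status.
pick_prompt() {
    if [ "$1" -eq 0 ]; then
        PS1=$PS_GOOD
    else
        PS1=$PS_BAD
    fi
}

# bash runs PROMPT_COMMAND just before printing each prompt;
# $? is expanded first, before anything can overwrite it.
PROMPT_COMMAND='pick_prompt $?'
```

The \[ and \] markers around the escape sequences tell bash those bytes take up no room on screen, which keeps line editing from getting confused about where the cursor is.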

Now I set up the various paths. Note that it doesn't matter if these paths don't exist; they'll be checked and ignored if they don't exist:

verify_path PATH
add_to_path PATH "/usr/local/sbin"
add_to_path PATH "/usr/local/teTeX/bin"
add_to_path PATH "/usr/X11R6/bin"
add_to_path PATH "$HOME/bin"
add_to_path_first PATH "/sbin"

add_to_path_first PATH "/bin"
add_to_path_first PATH "/usr/sbin"
add_to_path_first PATH "/opt/local/bin"
add_to_path_first PATH "/usr/local/bin"

if [[ $OSTYPE == darwin* ]] ; then
    add_to_path PATH "$HOME/.conf/darwincmds"

    # The XFILESEARCHPATH (for app-defaults and such)
    # is a wonky kind of path
    [ -d /opt/local/lib/X11/app-defaults/ ] && \
        add_to_path_force XFILESEARCHPATH \
        /opt/local/lib/X11/%T/%N
    [ -d /sw/etc/app-defaults/ ] && \
        add_to_path_force XFILESEARCHPATH /sw/etc/%T/%N
    add_to_path_force XFILESEARCHPATH /private/etc/X11/%T/%N
fi

verify_path MANPATH
add_to_path MANPATH "/usr/man"
add_to_path MANPATH "/usr/share/man"
add_to_path MANPATH "/usr/X11R6/man"
add_to_path_first MANPATH "/opt/local/share/man"
add_to_path_first MANPATH "/opt/local/man"
add_to_path_first MANPATH "/usr/local/man"
add_to_path_first MANPATH "/usr/local/share/man"

verify_path INFOPATH
add_to_path INFOPATH "/usr/share/info"
add_to_path INFOPATH "/opt/local/share/info"

And now there are STILL MORE environment variables to set. This final group may rely on some of the previous paths being set (most notably, PATH).

export PAGER='less'
have vim && export EDITOR='vim' || export EDITOR='vi'
if [[ -z $DISPLAY && $OSTYPE == darwin* ]]; then
    processes=`ps ax`
    if [[ $processes == *xinit* || $processes == *quartz-wm* ]]; then
        export DISPLAY=:0
    else
        unset DISPLAY
    fi
fi
if [[ $HOSTNAME == wizard ]] ; then
    dprint Wizards X forwarding is broken
    unset DISPLAY
fi
export TZ="US/Central"
if [ "${BASH_VERSINFO[0]}" -le 2 ]; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL="ignorespace:erasedups"
fi
export HISTIGNORE="&:ls:[bf]g:exit"
export GLOBIGNORE=".:.."
export CVSROOT=kyle@cvs.memoryhole.net:/home/kyle/cvsroot
export CVS_RSH=ssh
export BASH_ENV=$HOME/.bashrc
add_to_path_file MAILCAPS $HOME/.mailcap
add_to_path_file MAILCAPS /etc/mailcap
add_to_path_file MAILCAPS /usr/etc/mailcap
add_to_path_file MAILCAPS /usr/local/etc/mailcap
export EMAIL='kyle-envariable@memoryhole.net'
export GPG_TTY=$TTY
export RSYNC_RSH="ssh -2 -c arcfour -o Compression=no -x"
if [ -d /opt/local/include -a -d /opt/local/lib ] ; then
    export CPPFLAGS="-I/opt/local/include $CPPFLAGS"
    export LDFLAGS="-L/opt/local/lib $LDFLAGS"
fi
if have glibtoolize ; then
    have libtoolize || export LIBTOOLIZE=glibtoolize
fi

One little detail that I rather like is the fact that xterm's window title often tells me exactly what user I am on what machine I am, particularly when I'm ssh'd into another host. This little bit of code ensures that this happens:

if [[ $TERM == xterm* || $OSTYPE == darwin* ]]; then
    export PROMPT_COMMAND='echo -ne "\033]0;${USER}@${HOSTNAME/.*/}: ${PWD/${HOME}/~}\007"'
fi

Character Set Detection

I typically work in a UTF-8 environment. MacOS X (my preferred platform for day-to-day stuff) has made this pretty easy with really excellent UTF-8 support, and Linux has come a long way (to parity, as far as I can tell) in the last few years. Most of my computing is done via a uxterm (aka. xterm with UTF-8 capability turned on), but I also occasionally dabble in other terminals (sometimes without realizing it). Despite the progress made, however, not all systems support UTF-8, and neither do all terminals. Some systems, including certain servers I've used, simply don't have UTF-8 support installed, even though they're quite capable of it.

The idea is that the LANG environment variable is supposed to reflect the language you want to use and character set your terminal can display. So, this is where I try and figure out what LANG should be.

The nifty xprop trick here is from a vim hint I found. I haven't used it for very long, but so far it seems to be a really slick way of finding out what sort of environment your terminal is actually providing, even if it hasn't set the right environment variables (e.g. LANG).

One of the more annoying details of this stuff is that ssh doesn't pass LANG (or any other locale information) along when you connect to a remote server. Granted, there are good reasons for this (just because my computer is happy when LANG=en_US.utf-8 doesn't mean any server I connect to would be), but at the same time, shouldn't the remote server be made aware of my local terminal's capabilities? Imagine if I connected to a server that defaults to Japanese, but I want it to know that I use English! Remember how I smuggled that information through in TERM and stuck it in the SSH_LANG variable? Here's where it becomes important.

I've also fiddled with different variations of this code to make it as compatible as possible. So far, this should work with Bash 2.05b and up... though that makes it slightly awkward-looking.

As a final note here, I discovered that less is capable of handling multibyte charsets (at least, recent versions of it are), but for whatever reason it doesn't always support LANG and other associated envariables. It DOES however support LESSCHARSET...

Anyway, here's the code:

if [[ -z $LANG ]] ; then
    dprint no LANG set
    if [[ $WINDOWID ]] && have xprop ; then
        dprint querying xprop
        __bashrc__wmlocal=(`xprop -id $WINDOWID -f WM_LOCALE_NAME 8s ' $0' -notype WM_LOCALE_NAME`)
        export LANG=`eval echo ${__bashrc__wmlocal[1]}`
        unset __bashrc__wmlocal
    elif [[ $OSTYPE == darwin* ]] ; then
        dprint "I'm on Darwin"
        if [[ ( $SSH_LANG && \
            ( $SSH_LANG == *.UTF* || $SSH_LANG == *.utf* ) || \
            $TERM_PROGRAM == Apple_Terminal ) && \
            -d "/usr/share/locale/en_US.UTF-8" ]] ; then
            export LANG='en_US.UTF-8'
        elif [ -d "/usr/share/locale/en_US" ] ; then
            export LANG='en_US'
        else
            export LANG=C
        fi
    elif [[ $TERM == linux || $TERM_PROGRAM == GLterm ]] ; then
        if [ -d "/usr/share/locale/en_US" ] ; then
            export LANG='en_US'
        else
            export LANG=C # last resort
        fi
    else
        if [[ $SSH_LANG == C ]] ; then
            export LANG=C
        elif have locale ; then
            dprint "checking locale from big list (A)"
            locales=`locale -a`
            locales="${locales//[[:space:]]/|}" # not +() because that's slow
            if [[ en_US.utf8 == @($locales) ]] ; then
                export LANG='en_US.utf8'
            elif [[ en_US.utf-8 == @($locales) ]] ; then
                export LANG='en_US.utf-8'
            elif [[ en_US.UTF8 == @($locales) ]] ; then
                export LANG='en_US.UTF8'
            elif [[ en_US.UTF-8 == @($locales) ]] ; then
                export LANG='en_US.UTF-8'
            elif [[ en_US == @($locales) ]] ; then
                export LANG='en_US'
            else
                export LANG=C
            fi
            unset locales
        fi
    fi
else
    dprint "- LANG IS ALREADY SET! ($LANG)"
    if [[ $SSH_LANG && $SSH_LANG != $LANG ]]; then
        if [[ $SSH_LANG == C ]] ; then
            export LANG=C
        else
            dprint "checking locale from big list (B)"
            locales=`locale -a`
            locales="${locales//[[:space:]]/|}" # not +() because that's slow
            if [[ $SSH_LANG == @(${locales}) ]] ; then
                dprint "- SSH_LANG is a valid locale, resetting LANG"
                export LANG=$SSH_LANG
            else
                dprint "- SSH_LANG is NOT a valid locale"
                if [[ $SSH_LANG == *.@(u|U)@(t|T)@(f|F)?(-)8 ]] ; then
                    wantutf8=yes
                    if [[ ! $LANG == *.@(u|U)@(t|T)@(f|F)?(-)8 ]] ; then
                        dprint "- want utf-8, but LANG is not utf8, unsetting"
                        unset LANG
                    fi
                else
                    dprint "- don't want utf-8"
                fi
                if [[ ! $LANG || ! $LANG == @($locales) ]] ; then
                    if [ "$wantutf8" = yes ] ; then
                        dprint "- finding a utf8 LANG"
                        if [[ en_US.utf8 == @($locales) ]] ; then
                            export LANG='en_US.utf8'
                        elif [[ en_US.utf-8 == @($locales) ]] ; then
                            export LANG='en_US.utf-8'
                        elif [[ en_US.UTF8 == @($locales) ]] ; then
                            export LANG='en_US.UTF8'
                        elif [[ en_US.UTF-8 == @($locales) ]] ; then
                            export LANG='en_US.UTF-8'
                        elif [[ en_US == @($locales) ]] ; then
                            export LANG='en_US'
                        else
                            export LANG=C
                        fi
                    else
                        dprint "- finding a basic LANG"
                        if [[ en_US == @($locales) ]] ; then
                            export LANG='en_US'
                        else
                            export LANG=C
                        fi
                    fi
                fi
                unset wantutf8
            fi
            unset locales
        fi
    else
        dprint "- ... without SSH_LANG, why mess with it?"
    fi
fi
dprint - LANG is $LANG
if [[ $LANG == *.@(u|U)@(t|T)@(f|F)?(-)8 ]] ; then
    export LESSCHARSET=utf-8
fi
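The repeated "is this name in the locale list?" chains can be boiled down to one helper. The sketch below is a different technique from the extglob match used above: it asks grep for a whole-line, fixed-string match. The name pick_locale is hypothetical, and the list is passed in as an argument so the function can be tried without touching the real locale database.

```shell
# pick_locale LIST CANDIDATE...: print the first candidate that
# appears as a whole line in LIST, or C if none does.
pick_locale() {
    _list=$1
    shift
    for _cand in "$@"; do
        if printf '%s\n' "$_list" | grep -qxF -- "$_cand"; then
            echo "$_cand"
            return 0
        fi
    done
    echo C
}

# Typical call (the result depends on the machine):
# pick_locale "$(locale -a)" en_US.utf8 en_US.utf-8 en_US.UTF8 en_US.UTF-8 en_US
```

This spawns a grep per candidate, so it is slower than the pattern match; the trade is readability for speed.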


Aliases

This is where a lot of the real action is, in terms of convenience settings. Like anyone who uses a computer every day, I type a lot; and if I can avoid it, so much the better. (I'm a lazy engineer.)

Sometimes I can't quite get what I want out of an alias. In csh aliases can specify what to do with their arguments. In bash, aliases are really more just shorthand — "pretend I really typed this" kind of stuff. Instead, if you want to be more creative with argument handling, you have to use functions (it's not a big deal, really). Here's a few functions I added just because they're occasionally handy to have the shell do for me:

function exec_cvim {
/Applications/Vim.app/Contents/MacOS/Vim -g "$@" &
}

function darwin_locate { mdfind "kMDItemDisplayName == '$@'wc"; }
if [[ $- == *i* && $OSTYPE == darwin* && ${OS_VER[0]} -ge 8 ]] ;
then
    alias locate=darwin_locate
fi

function printargs { for F in "$@" ; do echo "$F" ; done ; }
function psq { ps ax | grep -i $@ | grep -v grep ; }
function printarray {
for ((i=0;$i<`eval 'echo ${#'$1'[*]}'`;i++)) ; do
    echo $1"[$i]" = `eval 'echo ${'$1'['$i']}'`
done
}
alias back='cd $OLDPWD'
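To make the alias-versus-function point concrete, here is a tiny example that an alias can't express, because the argument has to be used twice. The name mkcd is a hypothetical helper, not something from the bashrc.

```shell
# mkcd DIR: create DIR (and any missing parents) and change into it.
# Hypothetical helper; an alias could not place "$1" in two spots.
mkcd() {
    mkdir -p -- "$1" && cd -- "$1"
}
```

An alias only substitutes text at the start of the command, so whatever you type after it just gets tacked onto the end.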

There are often a lot of things that I just expect to work. For example, when I type "ls", I want it to print out the contents of the current directory. In color if possible, without if necessary. It often annoys me, on Solaris systems, when the working version of ls is buried in the path, while a really lame version is up in /bin for me to find first. Here's how I fix that problem:

# GNU ls check
if [[ $OSTYPE == darwin* ]]; then
    dprint "- DARWIN ls"
    alias ls='/bin/ls -FG'
    alias ll='/bin/ls -lhFG'
elif have colorls ; then
    dprint "- BSD colorls"
    alias ls='colorls -FG'
    alias ll='colorls -lhFG'
else
    __kbwbashrc__lsarray=(`\type -ap ls`)
    __kbwbashrc__lsfound=no
    for ((i=0;$i<${#__kbwbashrc__lsarray[*]};i=$i+1)) ; do
        # only GNU ls understands --version
        if ${__kbwbashrc__lsarray[$i]} --version &>/dev/null ;
        then
            dprint "- found GNU ls: ${__kbwbashrc__lsarray[$i]}"
            alias ls="${__kbwbashrc__lsarray[$i]} --color -F"
            alias ll="${__kbwbashrc__lsarray[$i]} --color -F -lh"
            __kbwbashrc__lsfound=yes
            break
        fi
    done
    if [ "$__kbwbashrc__lsfound" == no ] ; then
        if ls -F &>/dev/null ; then
            dprint "- POSIX ls"
            alias ls='ls -F'
            alias ll='ls -lhF'
        else
            alias ll='ls -lh'
        fi
    fi
    unset __kbwbashrc__lsarray __kbwbashrc__lsfound
fi
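The probing idea generalizes to other tools. A rough sketch, with hypothetical helper names, and with the caveat that accepting --version is only a heuristic (a non-GNU program could accept it too):

```shell
# is_gnu_tool PATH: succeed if the program accepts --version.
# Heuristic only; a non-GNU tool could accept the flag too.
is_gnu_tool() {
    "$1" --version >/dev/null 2>&1
}

# first_gnu NAME DIR...: print the first DIR/NAME that looks GNU.
first_gnu() {
    _name=$1
    shift
    for _dir in "$@"; do
        if [ -x "$_dir/$_name" ] && is_gnu_tool "$_dir/$_name"; then
            echo "$_dir/$_name"
            return 0
        fi
    done
    return 1
}
```

Something like first_gnu ls /bin /usr/bin /usr/local/bin would then print the first GNU-looking ls, if any; those directories are only examples.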

Similar things are true of make and sed and such. I've gotten used to GNU's version, and if they exist on the machine I'd much rather automatically use them than have to figure out whether it's really called gnused or gsed or justtowasteyourtimesed all by myself:

if [[ $OSTYPE == linux* ]] ; then
    # actually, just Debian, but this works for now
    alias gv="gv --watch --antialias"
else
    alias gv="gv -watch -antialias"
fi
if have gsed ; then
    alias sed=gsed
elif have gnused ; then
    alias sed=gnused
fi
if have gmake ; then
    alias make=gmake
elif have gnumake ; then
    alias make=gnumake
fi

The rest of them are mostly boring, with one exception:

alias macfile="perl -pe 'tr/\x0d/\x0a/'"
have tidy && alias tidy='tidy -m -c -i'
have vim && alias vi='vim'
alias vlock='vlock -a'
alias fastscp='scp -c arcfour -o Compression=no' # yay speed!
alias startx='nohup ssh-agent startx & exit'
alias whatlocale='printenv | grep ^LC_'
alias fixx='xauth generate $DISPLAY'
alias whatuses='fuser -v -n tcp'
alias which=type
alias ssh='env TERM="$TERM:$LANG" ssh'
have realpath || alias realpath=realpath_func
if have readlink ; then
    unset -f readlink_func
else
    alias readlink=readlink_func
fi
if [[ $OSTYPE == darwin* ]]; then
    alias top='top -R -F -ocpu -Otime'
    alias cvim='exec_cvim'
    alias gvim='exec_cvim'
fi
Did you note that ssh alias? Heh.

Tab-completion Options

Bash has had, for a little while at least, the ability to do custom tab-completion. This is really convenient (for example, when I've typed cvs commit and I hit tab, bash can know that I really just want to tab-complete files that have been changed). However, I won't bore you with a long list of all the handy tab-completions that are out there. Most of mine are just copied from here anyway. But I often operate in places where that big ol' bash-completion file can be in multiple places. Here's the simple little loop I use. You'll notice that it only does the loop after ensuring that bash is of recent-enough vintage:

# completion_options is an array of candidate bash_completion paths
if [[ $BASH_VERSION && -z $BASH_COMPLETION && $- == *i* ]] ;
then
    bash=${BASH_VERSION%.*}; bmajor=${bash%.*}; bminor=${bash#*.}
    if [ $bmajor -eq 2 -a $bminor '>' 04 ] || [ $bmajor -gt 2 ] ;
    then
        for bc in "${completion_options[@]}" ; do
            if [[ -r $bc ]] ; then
                dprint Loading the bash_completion file
                if [ "$BASH_COMPLETION" ] ; then
                    BASH_COMPLETION="$bc"
                fi
                export COMP_CVS_ENTRIES=yes
                source "$bc"
                break
            fi
        done
    fi
    unset bash bminor bmajor
fi
unset completion_options
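The heart of that loop is "use the first candidate that is readable", which is easy to pull out into a helper. This is a sketch only: first_readable is a made-up name, and the two paths in the example call are illustrative assumptions, not the list the bashrc actually uses.

```shell
# first_readable FILE...: print the first FILE that is readable.
first_readable() {
    for _f in "$@"; do
        if [ -r "$_f" ]; then
            echo "$_f"
            return 0
        fi
    done
    return 1
}

# The candidate paths below are illustrative assumptions only.
if bc_file=$(first_readable "$HOME/.bash_completion" /etc/bash_completion); then
    echo "would source $bc_file"
fi
```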

Machine-local settings

You'd be surprised how useful this can be sometimes. Sometimes I need machine-specific settings. For example, on some machines there's a PGI compiler I want to use, and maybe it needs some environment variable set. Rather than put it in the main bashrc, I just put that stuff into ~/.bashrc.local and have it loaded:

dprint checking for bashrc.local in $HOME
if [ -r "${HOME}/.bashrc.local" ]; then
    dprint Loading local bashrc
    source "${HOME}/.bashrc.local"
fi


Auto-logout

Lastly, it is sometimes the case that the TMOUT variable has been set, either by myself, or by a sysadmin who doesn't like idle users (on a popular system, too many idle users can unnecessarily run you out of ssh sockets, for example). In any case, when my time is limited, I like being aware of how much time I have left. So I have my bashrc detect the TMOUT variable and print out a big banner so that I know what's up and how much time I have. Note that bash can do simple math all by itself with the $(( )) construction. Heheh. Anyway:

if [[ $TMOUT && "$-" == *i* ]]; then
    # break the timeout (in seconds) into friendlier units
    days=$((TMOUT/86400))
    hours=$((TMOUT%86400/3600))
    minutes=$((TMOUT%3600/60))
    seconds=$((TMOUT%60))
    echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    echo You will be autologged out after:
    echo -e -n '\t'
    [[ $days != 0 ]] && echo -n "$days days "
    [[ $hours != 0 ]] && echo -n "$hours hours "
    [[ $minutes != 0 ]] && echo -n "$minutes minutes "
    [[ $seconds != 0 ]] && echo -n "$seconds seconds "
    echo ... of being idle.
    unset days hours minutes seconds
fi
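As a worked example of the $(( )) arithmetic, here is the seconds-to-units breakdown as a standalone function. The name format_idle is hypothetical, and units that come out to zero are skipped, the same way the banner skips them.

```shell
# format_idle SECONDS: print a breakdown such as "10 minutes".
# Hypothetical helper; zero-valued units are skipped.
format_idle() {
    _d=$(( $1 / 86400 ))
    _h=$(( $1 % 86400 / 3600 ))
    _m=$(( $1 % 3600 / 60 ))
    _s=$(( $1 % 60 ))
    _out=
    [ "$_d" -ne 0 ] && _out="$_out$_d days "
    [ "$_h" -ne 0 ] && _out="$_out$_h hours "
    [ "$_m" -ne 0 ] && _out="$_out$_m minutes "
    [ "$_s" -ne 0 ] && _out="$_out$_s seconds "
    echo "${_out% }"
}

format_idle 600     # prints "10 minutes"
```

With 93784 seconds the same function prints "1 days 2 hours 3 minutes 4 seconds": one day is 86400, which leaves 7384, or two hours (7200), three minutes (180) and four seconds.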


While I'm at it, I suppose I should point out that I also have a ~/.bash_logout file that's got some niceness to it. If it's the last shell, it clears sudo's cache, empties the console's scrollback buffer, and clears the screen. Note: DO NOT PUT THIS IN YOUR BASHRC. You wouldn't like it in there.

if [ "$SHLVL" -eq 1 ] ; then
    sudo -k
    type -P clear_console &>/dev/null && clear_console 2>/dev/null
    clear
fi

And that's about it! Of course, I'm sure I'll add little details here and there and this blog entry will become outdated. But hopefully someone finds my bashrc useful. I know I've put a lot of time and effort into it. :)

