<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
   <channel>
      <title>Kyle</title>
      <link>http://www.memoryhole.net/kyle/</link>
      <description>High catecholamine levels</description>
      <language>en</language>
      <copyright>Copyright 2008</copyright>
      <lastBuildDate>Mon, 18 Aug 2008 18:01:12 -0600</lastBuildDate>
      <generator>http://www.sixapart.com/movabletype/</generator>
      <docs>http://blogs.law.harvard.edu/tech/rss</docs> 

            <item>
         <title>More Compiler Complaints: Sparc Edition</title>
         <description>Unlike my &quot;previous whining about compilers&quot;:http://www.memoryhole.net/kyle/2008/01/apples_compiler_idiocy.html, this one I have no explanation for. It&apos;s not me specifying things incorrectly, it&apos;s just the compiler being broken.

So, here&apos;s the goal: atomically increment a variable. On a Sparc (specifically, SparcV9), the function looks something like this:

bc[cc]. static inline int atomic_inc(int * operand)
{
    register uint32_t oldval, newval;
    newval = *operand;
    do {
        oldval = newval;
        newval++;
        __asm__ __volatile__ (&quot;cas [%1], %2, %0&quot;
            : &quot;=&amp;r&quot; (newval)
            : &quot;r&quot; (operand), &quot;r&quot;(oldval)
            : &quot;cc&quot;, &quot;memory&quot;);
    } while (oldval != newval);
    return oldval+1;
}

Seems trivial, right? We use the CAS instruction (compare and swap). Conveniently, @cas@ always leaves the value it found at @*operand@ in its destination register (i.e. %0 aka newval), so when the comparison fails we already hold the current value and there are no extraneous memory operations in this little loop. Right? Right. Does it work? _NO._

Let&apos;s take a look at the assembly that the compiler (gcc) generates with -O2 optimization:

bc[asm]. save    %sp, -0x60, %sp
ld      [%i0], %i5      /* newval = *operand; */
mov     %i0, %o1        /* operand is copied into %o1 */
mov     %i5, %o2        /* oldval = newval; */
cas     [%o1], %o2, %o0
ret
restore %i5, 0x1, %o0

Say what? Does that have ANYTHING to do with what I told it? Nope! Of course, gcc is awful, you say! Use SUN&apos;s compiler! Sorry, it&apos;s the EXACT SAME output.

But let&apos;s be a bit more explicit about the fact that the @newval@ register is an input to the assembly block:

bc[cc]. static inline int atomic_inc(int * operand)
{
    register uint32_t oldval, newval;
    newval = *operand;
    do {
        oldval = newval;
        newval++;
        __asm__ __volatile__ (&quot;cas [%1], %2, %0&quot;
            : &quot;=&amp;r&quot; (newval)
            : &quot;r&quot; (operand), &quot;r&quot;(oldval), &quot;0&quot;(newval)
            : &quot;cc&quot;, &quot;memory&quot;);
    } while (oldval != newval);
    return oldval+1;
}

Now, Sun&apos;s compiler complains: @warning: parameter in inline asm statement unused: %3@. But at least, gcc leaves the @add@ operation in:

bc[asm]. save    %sp, -0x60, %sp
ld      [%i0], %i5      /* oldval = *operand; */
mov     %i0, %o1        /* operand is copied to %o1 */
add     %i5, 0x1, %o0   /* newval = oldval + 1; */
mov     %i5, %o2        /* oldval is copied to %o2 */
cas     [%o1], %o2, %o0
ret
restore %i5, 0x1, %o0

Still, though, the @do{ }while()@ loop was optimized away, because the compiler refuses to acknowledge that @newval@ can change values! Sun&apos;s compiler will indeed leave the while loop in, but will often use the WRONG REGISTER for comparison (such as %i2 instead of %o0).

But check out this minor change:

bc[cc]. static inline int atomic_inc(int * operand)
{
    register uint32_t oldval, newval;
    do {
        newval = *operand;
        oldval = newval;
        newval++;
        __asm__ __volatile__ (&quot;cas [%1], %2, %0&quot;
            : &quot;=&amp;r&quot; (newval)
            : &quot;r&quot; (operand), &quot;r&quot;(oldval), &quot;0&quot;(newval)
            : &quot;cc&quot;, &quot;memory&quot;);
    } while (oldval != newval);
    return oldval+1;
}

Note that, rather than using the output of the @cas@ instruction, we&apos;re throwing it away and re-reading operand no matter what. And what happens:

bc[asm]. save     %sp, -0x60, %sp
ld       [%i0], %i5           /* oldval = *operand; */
add      %i5, 0x1, %o0        /* newval = oldval + 1; */
mov      %i0, %o1             /* operand is copied to %o1 */
mov      %i5, %o2             /* oldval is copied to %o2 */
cas      [%o1], %o2, %o0
cmp      %i5, %o0             /* if (oldval != newval) */
bne,a,pt %icc, atomic_inc+0x8 /* then go back and try again */
ld       [%i0], %i5
ret
restore  %i5, 0x1, %o0

AHA! The while loop returns! And best of all, both GCC and Sun&apos;s compiler suddenly, magically, use the right registers for the loop comparison! It&apos;s amazing!

And yet, also completely idiotic. So, we can get it to work... but we have to be inefficient in order to do it, because otherwise (inexplicably) the compiler refuses to acknowledge that our output register can change.

In case you&apos;re curious, the gcc version is:
@sparc-sun-solaris2.10-gcc (GCC) 4.0.4 (gccfss)@
and the Sun compiler is:
@cc: Sun C 5.9 SunOS_sparc 2007/05/03@</description>
         <link>http://www.memoryhole.net/kyle/2008/08/more_compiler_complaints_sparc.html</link>
         <guid>http://www.memoryhole.net/kyle/2008/08/more_compiler_complaints_sparc.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Research</category>
        
        
         <pubDate>Mon, 18 Aug 2008 18:01:12 -0600</pubDate>
      </item>
            <item>
         <title>I Hate Procmail</title>
         <description>Its error handling is *CRAP*.

I am coming to this realization because I recently lost a BUNCH of messages because of a bad delivery path (I told procmail to pipe messages to a non-existent executable). So what did procmail do? According to its log:

bc.. /bin/sh: /tmp/dovecot11/libexec/dovecot/deliver: No such file or directory
procmail: Error while writing to &quot;/tmp/dovecot11/libexec/dovecot/deliver&quot;

p. Well, sure, that&apos;s to be expected, right? So what happened to the email? VANISHED. Into the bloody ether.

Of course, determining that the message vanished is trickier than just saying &quot;hey, it&apos;s not in my mailbox.&quot; Oh no, there&apos;s a &quot;feature&quot;, called @ORGMAIL@. What is this? According to the procmailrc documentation (*that* collection of wisdom):

bc.. ORGMAIL     Usually the system  mailbox  (ORiGinal  MAIL‐
            box).   If,  for  some  obscure  reason (like
            ‘filesystem full’)  the  mail  could  not  be
            delivered, then this mailbox will be the last
            resort.  If procmail fails to save  the  mail
            in  here  (deep,  deep  trouble :-), then the
            mail will bounce back to the sender.

p. And so where is THAT? Why, @/var/mail/$LOGNAME@ of course, where else? And if @LOGNAME@ isn&apos;t set for some reason? Or what if @ORGMAIL@ is unset? Oh, well... nuts to you! Procmail will use @$SENDMAIL@ to _BOUNCE THE EMAIL_ rather than just try again later. That&apos;s what they mean by &quot;deep, deep trouble.&quot; Notice the smiley face? Here&apos;s why the manual has a smiley-face in it: to mock your pain.

But here&apos;s the real crux of it: procmail doesn&apos;t see delivery errors as FATAL. If one delivery instruction fails, it&apos;ll just keep going through the procmailrc, looking for anything else that might match. In other words, the logic of your procmailrc has to take into account the fact that sometimes mail delivery can fail. If you fail to do this, your mail CAN end up in *RANDOM LOCATIONS*, depending on how messages that were supposed to match earlier rules fare against later rules.

If you want &quot;first failure bail&quot; behavior (which makes the most sense, in my mind), you have to add an extra rule after EVERY delivery instruction. For example:

bc.. :0 H
* ^From: .*fred@there\.com
./from_fred

:0 e # handle failure
{
    EXITCODE=75 # set a non-zero exit code
    HOST # This causes procmail to stop, obviously
}

p. You agree that @HOST@ means &quot;stop processing and exit&quot;, right? Obviously. That&apos;s procmail for you. *Note that that second clause has gotta go after EVERY delivery instruction.* I hope you enjoy copy-and-paste.

Another way to handle errors, since *successful* delivery *does* stop procmail, is to add something like that to the end of your procmailrc, like so:

bc.. :0 # catch-all default delivery
${DEFAULT}

 # If we get this far, there must have been an error
EXITCODE=75
HOST

p. Of course, you could also send the mail to @/dev/null@ at that point, but unsetting the @HOST@ variable (which is what listing it does) does the same thing faster. Intuitive, right? Here&apos;s my smiley-face:

@&gt;:-P@</description>
         <link>http://www.memoryhole.net/kyle/2008/06/i_hate_procmail.html</link>
         <guid>http://www.memoryhole.net/kyle/2008/06/i_hate_procmail.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Commentary</category>
        
        
         <pubDate>Fri, 27 Jun 2008 10:33:38 -0600</pubDate>
      </item>
            <item>
         <title>YAASI: Yet Another Anti-Spam Idea</title>
         <description>Branden and I had an idea to help with the spam problem on our system, and it&apos;s proven particularly effective. How effective? Here&apos;s the graphs from the last year of email on my system. Can you tell when I started using the system?

!(scaled)http://www.memoryhole.net/kyle/archives/images/antispam.png!

!(scaled)http://www.memoryhole.net/kyle/archives/images/antispam2.png!

If you want to see the live images, check &quot;here&quot;:http://www.we-be-smart.org/rrd/.

The idea is based on the following observations: certain addresses on my domain ONLY get spam. This is generally because they either don&apos;t exist or because I stopped using them; for example, spammers often send email to buy@memoryhole.net. Branden and I also both use the user-tag@domain scheme, so we get a lot of disposable addresses that way. These addresses are such that we know for *certain* that anyone sending email to them is a spammer. Some of these addresses were already being rejected as invalid; some we hadn&apos;t gotten around to invalidating yet.

By simply rejecting emails sent to those addresses, we were able to reduce the spam load of our domains by a fair bit, and the false-positive rate is nil. But we took things a step further: since spammers rarely send only one message, often they will send spam to both invalid AND valid addresses.

If I view those known-bad addresses as, essentially, honeypots, I can say: aha! Any IP sending to a known-bad address is a spammer, and I can refuse (with a permanent fail) any email from that IP for some short time. I started with 5 minutes, but have moved to an exponentially increasing timeout system. Each additional spam increases the length of the timeout (5 minutes for the first spam, 6 for the second, 8 for the third, and so on). Longer-term bans, as a result of the exponentially increasing timeout, are made more efficient via the equivalent of @/etc/hosts.deny@. I haven&apos;t gotten into maintaining my spammer database much yet, but I think this may not be terribly important (I&apos;ll explain in a moment).
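
The three figures quoted above (5, 6, 8) fit the formula 4 + 2^(n-1) minutes for the n-th spam. That formula is an assumption made for illustration, not necessarily the rule the filter uses, and @jail_minutes@ is a hypothetical name. A minimal shell sketch of the arithmetic:

```shell
# Hypothetical helper: minutes of jail time after the n-th spam from one IP,
# ASSUMING the timeout is 4 + 2^(n-1) minutes (fits the quoted 5, 6, 8).
jail_minutes() {
    n=$1
    step=1
    # Double the step once for every spam after the first.
    while [ "$n" -gt 1 ]; do
        step=$(( step * 2 ))
        n=$(( n - 1 ))
    done
    echo $(( 4 + step ))
}

jail_minutes 1
jail_minutes 2
jail_minutes 3
```

Under this assumed formula, @jail_minutes 12@ gives 2052 minutes, a little over a day, so only persistent repeat offenders would reach the longer-term ban list.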

One of the best parts of the system is that it is *fast*: new spammers that identify themselves by sending to honeypot addresses get blocked quickly and without my intervention. So far this has been particularly helpful in eliminating spam spikes. Another feature that I originally thought would be useful, but hasn&apos;t really appeared to be (yet) is that it allows our multiple domains to share information about spam sources. Thus far, however, our domains seem to be plagued by different spammers.

Now, interestingly, about a week after we started using the system, our database of known spammers was wiped out (it&apos;s kept in /tmp, and we rebooted the system). Result? No noticeable change in effectiveness. How&apos;s that for a result? And, as you can see from the graph above, there&apos;s no obvious change in spam blocking over the course of a month that would indicate that the long-term history is particularly useful. So, it may be sufficient to keep a much shorter history. Maybe only a week is necessary, maybe two weeks, I haven&apos;t decided yet (and, as there hasn&apos;t yet been much of a speed penalty for it, there&apos;s no pressure to establish a cutoff). But, given that most spam is sent from botnets with dynamic IPs, this isn&apos;t a particularly surprising behavior.

Forkit.org and memoryhole.net have been using this filter for a month so far. The week before we started using this filter, memoryhole.net averaged around 262 emails per hour. The week after instituting this filter, the average was around 96 per hour (a 60+% reduction!). Before using the filter, forkit.org averaged 70 emails per hour; since starting to use the filter, that number is down to 27.4 per hour (also a 60+% reduction). We have recorded spams from over 33,000 IPs, most of which only ever sent one or two spams. We typically have between 100 and 150 IPs that are “in jail” at any one time (at this moment: 143), and most of those (at this moment 134) are blocked for sending more than ten spams (114 of them have a timeout measured in days rather than minutes).

Now, granted, I know that by simply dropping 60% of all connections we’d get approximately the same results. But I think our particular technique is superior to that because it&apos;s based on known-bad addresses. Anyone who doesn&apos;t send to invalid addresses will never notice the filter.

The biggest potential problem that I can see with this system is that of spammers who have taken over a normally friendly host, such as Gmail spam. I&apos;ve waffled on this potential problem: on the one hand, Gmail has so many outbound servers that it&apos;s unlikely to get caught (a couple bad emails won&apos;t have much of a penalty). Thus far, I&apos;ve seen a few yahoo servers in Japan sending us spam, but no Gmail servers. On the other hand, as long as I simply use temporary failures (at least for good addresses), and as long as ND doesn&apos;t retry in the same order every time, messages will get through.

I&apos;ve also begun testing a &quot;restricted sender&quot; feature to work with this. For example, I have the address kyle-slashdot@memoryhole.net that I use exclusively for my slashdot.org account. The only sender allowed to use that address is slashdot.org (e.g. if I forget my password). If anyone from any other domain attempts that address, well, then I know that sending IP is a spammer and I can treat it as if it were a known-bad address. Not applicable to *every* email address, obviously, but it&apos;s a start.

It&apos;s been pointed out that this system is, in some respects, a variant on greylisting. The major difference is that it&apos;s a penalty-based system, rather than a &quot;prove yourself worthy by following the RFC&quot; system, and I like that a bit better. I&apos;m somewhat tempted to define some bogus address (bogus@memoryhole.net) and sign it up for spam (via &quot;spamyourenemies.com&quot;:http://spamyourenemies.com or something similar), but given that part of the benefit here is due to spammers trying both valid and invalid addresses, I think it would probably just generate lots of extra traffic and not achieve anything particularly useful.

Now, this technique is simply one of many; it&apos;s not sufficient to guarantee a spam-free inbox. I use it in combination with several other antispam techniques, including a greet-delay system and a frequently updated SpamAssassin setup. But check out the difference it&apos;s made in our CPU utilization:

!(scaled)http://www.memoryhole.net/kyle/archives/images/antispam-cpu.png!

Okay, so, grand scheme of things: knocking the CPU use down three percentage points isn&apos;t huge, but knocking it down by 50%? That sounds better, anyway. And as long as it doesn&apos;t cause problems by making *valid* email disappear (possible, but rather unlikely), it seems to me to be a great way to cut my spam load relatively easily.</description>
         <link>http://www.memoryhole.net/kyle/2008/04/yaasi_yet_another_antispam_ide.html</link>
         <guid>http://www.memoryhole.net/kyle/2008/04/yaasi_yet_another_antispam_ide.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Cool Stuff</category>
        
        
         <pubDate>Thu, 24 Apr 2008 18:22:26 -0600</pubDate>
      </item>
            <item>
         <title>Leopard - Finally!</title>
         <description><![CDATA[So, I upgraded to MacOS 10.5 recently (from 10.4). Those of you who know me will doubtless be thinking "my god, man, what took so long?!?", and that's a longer story than I want to get into right now. Suffice to say: we're rocking and rolling now!

My impressions of the new OS are pretty favorable. I've read all the complaints about the UI changes, and they have some merit. By the time I upgraded, Apple had already released 10.5.2, which addressed many of the more unfortunate problems for people like me who put /Applications into the Dock.

I really DO like the "Fan" icon display, though, particularly for the new "Downloads" folder. Creating a folder just for downloads is something I could have done years ago, of course, but I hadn't - everything downloaded to the Desktop, which inevitably became incredibly cluttered. But I *love* the new approach, and part of what makes it especially useful is that things in the "Fan" display can be *dragged to the trash.* HA! I love it! It's the little things that make me happy. :)

The new X11 is a bit of a pain in the butt. I'd become very used to using @xterm@ - or more precisely, @uxterm@ - for all my terminal needs (which is to say, for 90% of what I do with my computer). That's not so tenable now, particularly since Apple has apparently decided that @uxterm@ was just too useful a shell script to let stand. I am keeping a copy of that shell script (which just runs @xterm@ with all the necessary utf-8 flags and sets the LANG appropriately) handy, just in case, but for the time being, I've decided to migrate to using Apple's Terminal full time now. Undoubtedly, it's still not as fast as @uxterm@, but since getting an Intel iMac, I don't really notice anymore (on the old dual 500MHz G4, it was definitely noticeable).

For migrating, I've had to create my own @nsterm-16color@ terminfo file (which I keep in @~/.terminfo/n/nsterm-16color@ ) in order to ensure that all the features I want work properly. I stole the file from ncurses 5.6, and modified it to add correct dual-mode swapping ( @smcup=\E7\E[?47h, rmcup=\E[2J\E[?47l\E8@ ) and then to support the home and end keys ( @khome=\E[H, kend=\E[F@ ). These are things that the native OSX dtterm/xterm/xterm-color/whatever terminfo settings don't do correctly. (<sigh> WHY???) ...And then, of course, I had to fix the key mapping of pageup/shift-pageup and pagedown/shift-pagedown and all the related keys, but that was easy to do in the Terminal.app's preferences. The defaults are sensible, just not for folks who are used to xterm's behavior. I also re-discovered that I hate Terminal.app's default blue (a dark, almost-midnight blue), and much prefer having a lighter one. Thankfully I'm not the only one - "Ciarán Walsh's update to the TerminalColors plugin":http://ciaranwal.sh/2007/11/01/customising-colours-in-leopard-terminal is solid and works well.

Other than that, things have been pretty smooth. I haven't experienced any really strange compatibility problems -- in large part, I think, because I keep my system pretty up-to-date, so I already had the "Leopard-compatible" versions of all the software I use (and all the Unix applications seem to work flawlessly without even needing a recompile - huzzah for that!).

The one application that needed SERIOUS fiddling is "VirtualBox":http://www.virtualbox.org. They have an OSX version, but only in beta form. I use it mostly so I can provide sensible Windows XP support to relatives who have computer questions (and for doing browser compatibility tests). I had been using Beta 2 (1.4.6), which had worked flawlessly for my needs. Unfortunately, Beta 2 isn't compatible with Leopard, so an upgrade to the latest (Beta 3) was necessary. THIS beta seems to have a few problems. For one thing, it can't understand all the old machine definitions (so when upgrading, make sure you don't have any important system snapshots or saved machine state that you need). However, it does understand the old disk files, so it's a simple matter to create a new machine definition using the old disk. The new machine still won't *BOOT,* though, and it took me an hour or so of fiddling to figure out how to fix it.

There are two major problems that crop up. First: *they changed the default IDE controller* for Windows XP guests. The old default was PIIX3; the new default is PIIX4. Either one will work, and if you install XP from scratch on a newly created XP guest, it will work with the PIIX4 controller just fine. But if you're booting from an XP that was created with Beta 2 (i.e. a WindowsXP installation that thinks you have a PIIX3 controller), it will blue-screen and reboot immediately after displaying the Microsoft logo: not good. Fixing it is easy, though: just change the IDE controller for your XP machine in the machine settings dialog.

The second problem is that *the network doesn't work.* Actually, that's not true, the network works just fine, it's DNS resolution that doesn't work (but one looks a lot like the other when you're not paying close attention to error messages). For whatever reason, when your XP system uses DHCP to get its network information, the information it receives from VirtualBox is _wrong._ Specifically, VirtualBox tells it to resolve DNS names by contacting 10.0.2.3; it should be contacting 10.0.2.2 (i.e. the same as the router). Fixing this was just a matter of changing Windows' network configuration to use a custom DNS server (10.0.2.2) rather than the one supplied by DHCP. Annoying, but nothing terrible.

The only other stumbling block in Leopard that I've come across is the iChat-vs-Internet-Sharing problem that other people have discovered. Essentially, if you have enabled Internet Sharing, iChat can't do video conferencing. Something to do with being able to remap ports... the explanations I've read are rather vague. It's not especially important to me, but came up when I was trying to demonstrate the virtues of Leopard to Emily.

Which reminds me: the new iChat is MUCH better for talking to multiple people at the same time. The "tabbed" chatting interface is *terrific.* The vaunted "Spaces" (virtual desktops) are nice, and implemented well, but I gotta say that I've gotten used to having just one desktop these days (I use Exposé *a lot*). Getting used to having the extra desktops will probably take a while.

Two more features I noticed were Quick Look (in Finder, press the space bar to quickly view something) and Web Clips (in Safari, you can take a snippet of a webpage and turn it into a Dashboard widget). Quick Look is pretty great, especially for folders full of PDFs, because you can leave it up and keep navigating around the Finder (the contents of the Quick Look window will track whatever you select in the Finder), but since I don't spend much time in the Finder, it's of limited use. If I could integrate it with my @~/.mailcap@ file, now THAT would be awesome. Web Clips are not *quite* as great as they could be. For one thing, they don't refresh quickly (but they DO refresh--at first I didn't think they did--and in the worst case, you can click on them and press Ctrl-R to force the issue), and for another, they can't scale -- many of the things I want to clip are large graphics that I wish to monitor. If OSX could scale clips down for me, that would make them much more useful.

Which reminds me -- one new feature of Leopard that I *adore* is their new built-in VNC viewer. It may not actually be VNC, but that's fine by me -- it's blazing fast, and best of all, it scales the screen down so that you can easily control a screen that's larger than the one you have. Chicken of the VNC used to be a must-have application for me, but Leopard's built-in screen viewer is *much* better for what I usually want to do (which is control the iMac upstairs from the laptop down on the couch).]]></description>
         <link>http://www.memoryhole.net/kyle/2008/04/leopard_finally.html</link>
         <guid>http://www.memoryhole.net/kyle/2008/04/leopard_finally.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Commentary</category>
        
        
         <pubDate>Tue, 08 Apr 2008 15:45:29 -0600</pubDate>
      </item>
            <item>
         <title>My Bashrc</title>
         <description><![CDATA[There are few things that, over my time using Unix-like systems, I have put more cumulative effort into than my configuration files. I've been tweaking them since the day I discovered them, attempting to make my environment more and more to my liking. I have posted them on my other website ("here":http://www.memoryhole.net/~kyle/index.php?section=reference), but it occurred to me that they've gotten sufficiently hoary and complex that a walkthrough might help someone other than myself.

Anyway, my "bashrc":http://www.memoryhole.net/~kyle/bashrc.html is first on the list.

The file is divided into several (kinda fuzzy) sections:
- "Initialization & Other Setup":#init
- "Useful Functions":#funcs
- "Loading System-wide Bashrc":#sysbashrc
- "Behavioral Settings":#behavioral
- "Environment Variables":#envariable
- "Character Set Detection":#charset
- "Aliases":#aliases
- "Tab-completion Options":#tab
- "Machine-local settings":#local
- "Auto-logout":#logout

Let's take them one at a time.

h3(#init). Initialization & Other Setup

Throughout my bashrc, I use a function I define here ( @dprint@ ) to allow me to quickly turn on debugging information, which includes printing the seconds-since-bash-started variable ( @SECONDS@ ) in case something is taking too long and you want to find the culprit. Yes, my bashrc has a debug mode. This is essentially controlled by the @KBWDEBUG@ environment variable. Then, because this has come in useful once or twice, I allow myself to optionally create a @~/.bashrc.local.preload@ file which is sourced now, before anything else. Here's the code:

bc[bash].. KBWDEBUG=${KBWDEBUG:-no}

function dprint {
if [[ "$KBWDEBUG" == "yes" && "$-" == *i* ]]; then
    #date "+%H:%M:%S $*"
    echo "$SECONDS $*"
fi
}
dprint alive
if [ -r "${HOME}/.bashrc.local.preload" ]; then
    dprint "Loading bashrc preload"
    source "${HOME}/.bashrc.local.preload"
fi
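
p. The @$-@ test is what keeps the debug output away from non-interactive shells (scp, cron jobs and the like): @$-@ holds the shell's option flags, and it contains an @i@ only when the shell is interactive. A minimal sketch of that check on its own follows; @is_interactive@ and @debug_note@ are hypothetical names used only for illustration.

```shell
# Hypothetical helper: succeed only when the current shell is interactive,
# i.e. when the option flags in $- include the letter i.
is_interactive() {
    case "$-" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Hypothetical stand-in for dprint: print only when debugging is switched
# on AND the shell is interactive.
debug_note() {
    if [ "${KBWDEBUG:-no}" = "yes" ]; then
        if is_interactive; then
            echo "debug: $*"
        fi
    fi
    return 0
}

debug_note "bashrc started"
```

Run from a script, this prints nothing even with @KBWDEBUG=yes@ exported, which is the point: debug chatter from a bashrc can confuse tools that expect a silent login.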

h3(#funcs). Useful Functions

This section started with some simple functions for PATH manipulation. Then those functions got a little more complicated, then I wanted some extra functions for keeping track of my config files (which were now in CVS), and then they got more complicated...

You'll notice something about these functions. Bash (these days) will accept function declarations in this form:

bc[bash]. function fname()
{
    do stuff
}

But that wasn't always the case. To maintain compatibility with older bash versions, I avoid using the uselessly cosmetic parens and I make sure that the opening curly brace stays on the same logical line as the function name (hence the backslash line continuation), like so:

bc[bash]. function fname \
{
    do stuff
}

Anyway, the path manipulation functions are pretty typical -- they're similar to the ones that "Fink":http://www.finkproject.org/ uses, but slightly more elegant. The idea is based on these rules of PATH variables:

# Paths must not have duplicate entries
# Paths are faster if they don't have symlinks in them
# Paths must not have "." in them
# All entries in a path must exist (usually)

There are two basic path manipulation functions: @add_to_path@ and @add_to_path_first@. They do predictable things -- the former appends something to a given path variable (e.g. @PATH@ or @MANPATH@ or @LD_LIBRARY_PATH@ ) unless it's already in that path, and the latter function prepends something to the given PATH variable (or, if it's already in there, moves it to the beginning). Before they add a value to a path, they first check it to make sure it exists, is readable, that I can execute things that are inside it, and they resolve any symlinks in that path (more on that in a moment). Here's the code (ignore the reference to @add_to_path_force@ in @add_to_path@ for now; I'll explain shortly):

bc[bash].. function add_to_path \
{
    local folder="${2%%/}"
    [ -d "$folder" -a -x "$folder" ] || return
    folder=`( cd "$folder" ; \pwd -P )`
    add_to_path_force "$1" "$folder"
}

function add_to_path_first \
{
    local folder="${2%%/}"
    [ -d "$folder" -a -x "$folder" ] || return
    folder=`( cd "$folder" ; \pwd -P )`
    if eval '[[' -z "\"\$$1\"" ']]'; then
        # no path yet (checked first, since the tests below
        # also succeed on an empty variable)
        eval "$1=\"$folder\""
    elif eval '[[' -z "\"\${$1##*:$folder:*}\"" ']]'; then
        # in the middle, move to front
        eval "$1=\"$folder:\${$1//:\$folder:/:}\""
    elif eval '[[' -z "\"\${$1%%*:\$folder}\"" ']]'; then
        # at the end, move to front
        eval "$1=\"$folder:\${$1%%:\$folder}\""
    elif ! eval '[[' -z "\"\${$1##\$folder:*}\"" '||' \
      "\"\$$1\"" '==' "\"$folder\"" ']]'; then
        # not in the path
        eval "export $1=\"$folder:\$$1\""
    fi
}
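
p. The core rules (no duplicates, entries must exist, symlinks resolved) can also be sketched without the nested @eval@ quoting. What follows is a simplified illustration, not the bashrc's actual code: @path_append@ and @DEMO_PATH@ are hypothetical names, and only the append case is handled.

```shell
# Hypothetical helper (not part of the original bashrc): append DIR to the
# colon-separated variable named by VAR unless it is already listed.
path_append() {
    local var="$1" dir="${2%/}" current
    # Skip anything that is not a directory we can enter.
    [ -d "$dir" ] || return 0
    [ -x "$dir" ] || return 0
    # Resolve symlinks in a subshell so the caller's cwd is untouched.
    dir=$( cd "$dir" && pwd -P ) || return 0
    # Fetch the current value of the variable named by $var.
    eval "current=\${$var}"
    case ":$current:" in
        *":$dir:"*) ;;                               # already present
        "::")       eval "$var=\$dir" ;;             # variable was empty
        *)          eval "$var=\$current:\$dir" ;;   # append
    esac
    export "$var"
}

DEMO_PATH=""
path_append DEMO_PATH /usr/bin
path_append DEMO_PATH /usr/bin/      # duplicate: ignored
path_append DEMO_PATH /no/such/dir   # missing: ignored
echo "$DEMO_PATH"
```

Wrapping the current value in colons before matching ( @":$current:"@ ) lets a single pattern cover the first, middle and last positions, which the @eval@ version has to test separately.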



p. Then, because I was often logging into big multi-user Unix systems (particularly Solaris systems) with really UGLY @PATH@ settings that had duplicate entries, often included ".", not to mention directories that either didn't exist or that I didn't have sufficient permissions to read, I added the function @verify_path@. All this function does is separate a path variable into its component pieces, eliminate ".", and then reconstruct the path using @add_to_path@, which handily takes care of duplicate and inaccessible entries. Here's that function:

bc[bash].. function verify_path \
{
    # separating cmd out is stupid, but is compatible
    # with older, buggy, bash versions (2.05b.0(1)-release)
    local cmd="echo \$$1"
    local arg="`eval $cmd`"
    local dir   # keep the loop variable out of the global namespace
    eval "$1=\"\""
    while [[ $arg == *:* ]] ; do
        dir="${arg%:${arg#*:}}"
        arg="${arg#*:}"
        if [ "$dir" != "." -a -d "$dir" -a \
          -x "$dir" -a -r "$dir" ] ; then
            dir=`( \cd "$dir" ; \pwd -P )`
            add_to_path "$1" "$dir"
        fi
    done
    if [ "$arg" != "." -a -d "$arg" -a -x "$arg" -a -r "$arg" ] ;
    then
        arg=`( cd "$arg" ; \pwd -P )`
        add_to_path "$1" "$arg"
    fi
}
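
p. The splitting step can be looked at in isolation. The sketch below peels entries off the front of a colon-separated list with @${rest%%:*}@ and @${rest#*:}@, which is a slightly different pair of expansions from the ones @verify_path@ uses; @split_path@ is a hypothetical name, and the existence checks are left out on purpose.

```shell
# Hypothetical helper: print each entry of a colon-separated list on its
# own line, skipping empty entries and ".".
split_path() {
    local rest="$1" piece
    while [ -n "$rest" ]; do
        piece="${rest%%:*}"            # text before the first colon
        case "$rest" in
            *:*) rest="${rest#*:}" ;;  # drop that entry and its colon
            *)   rest="" ;;            # last entry: nothing left
        esac
        if [ -n "$piece" ] && [ "$piece" != "." ]; then
            printf '%s\n' "$piece"
        fi
    done
}

split_path "/usr/bin:.:/bin::/usr/local/bin"
```

The sample call prints @/usr/bin@, @/bin@ and @/usr/local/bin@, one per line; the "." and the empty entry between the doubled colons are dropped.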

p. Finally, I discovered @XFILESEARCHPATH@ -- a path variable that requires a strange sort of markup (it's for defining where your app-defaults files are for X applications). This wouldn't work for @add_to_path@, so I created @add_to_path_force@ that still did duplicate checking but didn't do any verification of the things added to the path.

bc[bash].. function add_to_path_force \
{
    if eval '[[' -z "\$$1" ']]'; then
        eval "export $1='$2'"
    elif ! eval '[[' \
        -z "\"\${$1##*:\$2:*}\"" '||' \
        -z "\"\${$1%%*:\$2}\"" '||' \
        -z "\"\${$1##\$2:*}\"" '||' \
        "\"\${$1}\"" '==' "\"$2\"" ']]'; then
        eval "export $1=\"\$$1:$2\""
    fi
}

p. I mentioned that I resolved symlinks before adding directories to path variables. This is a neat trick I discovered due to the existence of @pwd -P@ and subshells. @pwd -P@ will return the "real" path to the folder you're in, with all symlinks resolved. And it does so very efficiently (without actually resolving symlinks -- it just follows all the ".." records). Since you can change directories in a subshell (i.e. between parentheses) without affecting the parent shell, a quick way to transform a folder's path into a resolved path is this: @( \cd "$folder"; pwd -P)@. The backslash suppresses alias expansion, just in case I'd somehow lost my mind and aliased cd to something else.
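
To see the trick in action, the sketch below builds a throwaway symlink and resolves it. The @resolve_dir@ wrapper and the temporary directory layout are assumptions made for the demonstration; only the @cd@ plus @pwd -P@ subshell comes from the text above.

```shell
# Hypothetical wrapper around the subshell trick described above.
resolve_dir() {
    ( \cd "$1" 2>/dev/null && \pwd -P )
}

# Demo with a throwaway symlink.
tmp=$(mktemp -d)
mkdir "$tmp/real"
ln -s "$tmp/real" "$tmp/link"

resolve_dir "$tmp/link"     # prints a path ending in /real
echo "still in: $PWD"       # the caller's directory is unchanged

rm -rf "$tmp"
```

Using @&&@ instead of @;@ between the two commands means a failed @cd@ prints nothing, rather than reporting the directory the caller happened to be in.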

And then, just because it was convenient, I added another function: @have@, which detects whether a command is available (a binary on the path, but also a builtin, function, or alias, since that's what @type@ reports):

bc[bash]. function have { type "$1" &>/dev/null ; }

Then I had to confront file paths, such as the @MAILCAPS@ variable. Much of the logic is the same (it ends up calling @add_to_path_force@), but entry validation is different:

bc[bash]. function add_to_path_file \
{
    local file="${2}"
    [ -f "$file" -a -r "$file" ] || return
    # realpath alias may not be set up yet
    file=`realpath_func "$file"`
    add_to_path_force "$1" "$file"
}

You'll note the @realpath_func@ line in there. @realpath@ is a program that takes a filename or directory name and resolves the symlinks in it. Unfortunately, @realpath@ is a slightly unusual program; I've only ever found it on OSX (it may be on other BSDs). But, with the power of my @pwd -P@ trick, I can fake most of it. The last little piece (resolving a file symlink) relies on a tool called @readlink@ ... but I can fake that too. Here are the two functions:

bc[bash].. function readlink_func \
{
    if have readlink ; then
        readlink "$1"
    #elif have perl ; then # seems slower than alternative
    #    perl -e 'print readlink("'"$1"'") . "\n"'
    else
        \ls -l "$1" | sed 's/[^>]*-> //'
    fi
}

function realpath_func \
{
    local input="${1}"
    local output="/"
    if [ -d "$input" -a -x "$input" ] ; then
        # All too easy...
        output=`( cd "$input"; \pwd -P )`
    else
        # sane-itize the input to the containing folder
        input="${input%%/}"
        local fname="${input##*/}"
        input="${input%/*}"
        if [ ! -d "$input" -o ! -x "$input" ] ; then
            echo "$input is not an accessible directory" >&2
            return 1
        fi
        output="`( cd "$input" ; \pwd -P )`/"
        input="$fname"
        # output is now the realpath of the containing folder
        # so all we have to do is handle the fname (aka "input")
        if [ ! -L "$output$input" ] ; then
            output="$output$input"
        else
            input="`readlink_func "$output$input"`"
            while [ "$input" ] ; do
                if [[ $input == /* ]] ; then
                    output="$input"
                    input=""
                elif [[ $input == ../* ]] ; then
                    output="${output%/*/}/"
                    input="${input#../}"
                elif [[ $input == ./* ]] ; then
                    input="${input#./}"
                elif [[ $input == */* ]] ; then
                    output="$output${input%${input#*/}}"
                    input="${input#*/}"
                else
                    output="$output$input"
                    input=""
                fi
                if [ -L "${output%%/}" ] ; then
                    if [ "$input" ] ; then
                        input="`readlink_func "${output%%/}"`/$input"
                    else
                        input="`readlink_func "${output%%/}"`"
                    fi
                    output="${output%%/}"
                    output="${output%/*}/"
                fi
            done
        fi
    fi
    echo "${output%%/}"
}
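
p. The @ls -l@ fallback in @readlink_func@ leans on the fact that a long listing of a symlink ends in @name -> target@. To see what the @sed@ expression does, here it is applied to a canned listing line (the listing is invented sample data, not real output):

bc[bash].. # Sample line in the shape ls -l prints for a symlink
listing='lrwxr-xr-x  1 kyle  staff  11 Mar 13 12:51 link -> /some/place'
# Delete everything up to and including the "-> " marker
target=$(echo "$listing" | sed 's/[^>]*-> //')
echo "$target"

p. A file name that itself contains a @>@ would confuse this, which is one reason to prefer a real @readlink@ when it exists.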

h3(#sysbashrc). Loading System-wide Bashrc

This section isn't too exciting. According to the man page:

bq. When bash is invoked as an interactive login shell, or as a non-interactive shell with the --login option, it first reads and executes commands from the file /etc/profile, if that file exists. After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable.

SOME systems have a version of bash that appears not to obey this rule. And some systems put crucial configuration settings in @/etc/bashrc@ (why?!?). And *some* systems even do something silly like use @/etc/bashrc@ to source @~/.bashrc@ (I did this myself, once upon a time, when I knew not-so-much). I've decided that this behavior cannot be relied upon, so I explicitly source these files myself. The only interesting bit is that I added a workaround so that systems that use @/etc/bashrc@ to source @~/.bashrc@ won't get into an infinite loop. There's probably a lot more potential trouble here that I'm ignoring. But here's the code:

bc[bash].. if [[ -r /etc/bashrc && $SYSTEM_BASHRC != 1 ]]; then
    dprint " - loading /etc/bashrc"
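    # set the guard before sourcing, so that an /etc/bashrc which
    # sources ~/.bashrc can't send us around again
    export SYSTEM_BASHRC=1
    . /etc/bashrc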
fi

h3(#behavioral). Behavioral Settings

This is basic stuff, but after you get used to certain behaviors (such as whether * matches . and ..), you often get surprised when they don't work that way on other systems. Some of this is because I found a system that did it another way by default; some is because I decided I like my defaults and I don't want to be surprised in the future.

The interactive-shell-detection here is nice. <code>$-</code> is a variable set by bash containing a set of letters indicating certain settings. It always contains the letter i if bash is running interactively. So far, this has been quite backwards-compatible.

bc[bash].. shopt -s extglob # Fancy patterns, e.g. +()
# only interactive
if [[ $- == *i* ]]; then
    dprint setting the really spiffy stuff
    shopt -s checkwinsize # don't get confused by resizing
    shopt -s checkhash # if hash is broken, doublecheck it
    shopt -s cdspell # be tolerant of cd spelling mistakes
fi

h3(#envariables). Environment Variables

There are a *slew* of standard environment variables that bash defines for you (such as @HOSTNAME@). There are even more standard environment variables that various programs pay attention to (such as @EDITOR@ and @PAGER@). And there are a few others that are program-specific (such as @PARINIT@ and @CVSROOT@).

Before I get going, though, let me show you a secret. Ssh doesn't like transmitting information from client to server shell... the only reliable way to do it that I've found is the @TERM@ variable. So... I smuggle info through that way, delimited by colons. Before I set any other environment variables, first, I find my smuggled information:

bc[bash]. if [[ $TERM == *:* && ( $SSH_CLIENT || $SSH_TTY || $SSH_CLIENT2 ) ]] ; then
    dprint "Smuggled information through the TERM variable!"
    term_smuggling=( ${TERM//:/ } )
    export SSH_LANG=${term_smuggling[1]}
    TERM=${term_smuggling[0]}
    unset term_smuggling
fi
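
To make the unpacking concrete, here is the same split applied to a canned value. It is only an illustration: @packed@ is a made-up variable standing in for the @TERM@ that arrives over ssh, and the terminal type and locale are sample data.

bc[bash]. packed="xterm-color:en_US.UTF-8"
# Colons become spaces, and the unquoted expansion splits into an array
fields=( ${packed//:/ } )
echo "TERM=${fields[0]} SSH_LANG=${fields[1]}"   # prints: TERM=xterm-color SSH_LANG=en_US.UTF-8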

I begin by setting @GROUPNAME@ and @USER@ in a standard way:

bc[bash].. if [[ $OSTYPE == solaris* ]] ; then
    idout=(`/bin/id -a`)
    USER="${idout[0]%%\)*}"
    USER="${USER##*\(}"
    [[ $USER == ${idout[0]} ]] && USER="UnknownUser"
    GROUPNAME="UnknownGroup"
    unset idout
else
    [[ -z $GROUPNAME ]] && GROUPNAME="`id -gn`"
    [[ -z $USER ]] && USER="`id -un`"
fi

p. Then some standard things (@MAILPATH@ is used by bash to check for mail, that kind of thing), including creating @OS_VER@ and @HOST@ to allow me to identify the system I'm running on:

bc[bash].. # I tote my own terminfo files around with me
[ -d ~/.terminfo ] && export TERMINFO=~/.terminfo/
[ "$TERM_PROGRAM" == "Apple_Terminal" ] && \
    export TERM=nsterm-16color

MAILPATH=""
MAILCHECK=30
add_to_path_file MAILPATH /var/spool/mail/$USER
add_to_path MAILPATH $HOME/Maildir/
[[ -z $MAILPATH ]] && unset MAILCHECK
[[ -z $HOSTNAME ]] && \
    export HOSTNAME=`/bin/hostname` && echo 'Fake Bash!'
HISTSIZE=1000
HOST=${OSTYPE%%[[:digit:]]*}
OS_VER=${OSTYPE#$HOST}
[ -z "$OS_VER" ] && OS_VER=$( uname -r )
OS_VER=(${OS_VER//./ })
TTY=`tty`
PARINIT="rTbgq B=.,?_A_a P=_s Q=>|}+"

export USER GROUPNAME MAILPATH HISTSIZE OS_VER HOST TTY PARINIT
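
p. To see what the @HOST@ and @OS_VER@ expansions produce, here they are run against a canned value instead of the real @OSTYPE@. The @demo_@ names are invented for this sketch, and @darwin9.0@ is just a sample string:

bc[bash].. demo_ostype="darwin9.0"
# Strip from the first digit onward to get the OS family
demo_host=${demo_ostype%%[[:digit:]]*}
# Whatever followed the family name is the version string
demo_ver=${demo_ostype#$demo_host}
# Split the version on dots into an array
demo_ver=( ${demo_ver//./ } )
echo "$demo_host ${demo_ver[0]} ${demo_ver[1]}"   # prints: darwin 9 0

p. A value with no digits in it, such as @linux-gnu@, leaves the version string empty, which is why the code above falls back to @uname -r@.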

p. I've also gotten myself into trouble in the past with the @umask@ being set improperly, so it's worth setting manually. Additionally, to head off trouble, I make it hard to leave myself logged in as root on other people's systems accidentally:

bc[bash].. if [[ $GROUPNAME == $USER && $UID -gt 99 ]]; then
    umask 002
else
    umask 022
fi

if [[ $USER == root ]] ; then
    [[ $SSH_CLIENT || $SSH_TTY || $SSH_CLIENT2  ]] && \
        export TMOUT=600 || export TMOUT=3600
fi

if [[ -z $INPUTRC && ! -r $HOME/.inputrc && -r /etc/inputrc ]];
then
    export INPUTRC=/etc/inputrc
fi

p. It is at this point that we should pause and load anything that was in @/etc/profile@, just in case it was left out (and, if it's in there, maybe it should override what we've done so far):

bc[bash].. export BASHRCREAD=1

if [[ -r /etc/profile && -z $SYSTEM_PROFILE ]]; then
    dprint "- loading /etc/profile ... "
    . /etc/profile
    export SYSTEM_PROFILE=1
fi

p. Now I set my prompt (but only if this is an interactive shell). The idea is that, if I'm logged into another system, I want to see how long I've been idle, so the remote prompt gets a date and time stamp on a line of its own. This works out well:

bc[bash].. if [[ $- == *i* ]]; then
    if [[ $SSH_CLIENT || $SSH_TTY || $SSH_CLIENT2 ]] ; then
        PS1='(\d \T)\n[\u@\h \W]\$ '
    else
        PS1='[\u@\h \W]\$ '
    fi
fi

p. Now I set up the various paths. Note that it doesn't matter if some of these directories don't exist; each one is checked and skipped if it's missing:

bc[bash].. verify_path PATH
add_to_path PATH "/usr/local/sbin"
add_to_path PATH "/usr/local/teTeX/bin"
add_to_path PATH "/usr/X11R6/bin"
add_to_path PATH "$HOME/bin"
add_to_path_first PATH "/sbin"

add_to_path_first PATH "/bin"
add_to_path_first PATH "/usr/sbin"
add_to_path_first PATH "/opt/local/bin"
add_to_path_first PATH "/usr/local/bin"

if [[ $OSTYPE == darwin* ]] ; then
    add_to_path PATH "$HOME/.conf/darwincmds"

    # The XFILESEARCHPATH (for app-defaults and such)
    # is a wonky kind of path
    [ -d /opt/local/lib/X11/app-defaults/ ] && \
        add_to_path_force XFILESEARCHPATH \
            /opt/local/lib/X11/%T/%N
    [ -d /sw/etc/app-defaults/ ] && \
        add_to_path_force XFILESEARCHPATH /sw/etc/%T/%N
    add_to_path_force XFILESEARCHPATH /private/etc/X11/%T/%N
fi

verify_path MANPATH
add_to_path MANPATH "/usr/man"
add_to_path MANPATH "/usr/share/man"
add_to_path MANPATH "/usr/X11R6/man"
add_to_path_first MANPATH "/opt/local/share/man"
add_to_path_first MANPATH "/opt/local/man"
add_to_path_first MANPATH "/usr/local/man"
add_to_path_first MANPATH "/usr/local/share/man"

verify_path INFOPATH
add_to_path INFOPATH "/usr/share/info"
add_to_path INFOPATH "/opt/local/share/info"

p. And now there are STILL MORE environment variables to set. This final group may rely on some of the previous paths being set (most notably, @PATH@).

bc[bash].. export PAGER='less'
have vim && export EDITOR='vim' || export EDITOR='vi'
if [[ -z $DISPLAY && $OSTYPE == darwin* ]]; then
    processes=`ps ax`
    # there are double-equals here, even though they don't show
    # on the webpage
    if [[ $processes == *xinit* || $processes == *quartz-wm* ]]; then
        export DISPLAY=:0
    else
        unset DISPLAY
    fi
fi
if [[ $HOSTNAME == wizard ]] ; then
    dprint Wizards X forwarding is broken
    unset DISPLAY
fi
export TZ="US/Central"
if [ "${BASH_VERSINFO[0]}" -le 2 ]; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL="ignorespace:erasedups"
fi
export HISTIGNORE="&:ls:[bf]g:exit"
export GLOBIGNORE=".:.."
export CVSROOT=kyle@cvs.memoryhole.net:/home/kyle/cvsroot
export CVS_RSH=ssh
export BASH_ENV=$HOME/.bashrc
add_to_path_file MAILCAPS $HOME/.mailcap
add_to_path_file MAILCAPS /etc/mailcap
add_to_path_file MAILCAPS /usr/etc/mailcap
add_to_path_file MAILCAPS /usr/local/etc/mailcap
export EMAIL='kyle-envariable@memoryhole.net'
export GPG_TTY=$TTY
export RSYNC_RSH="ssh -2 -c arcfour -o Compression=no -x"
if [ -d /opt/local/include -a -d /opt/local/lib ] ; then
    export CPPFLAGS="-I/opt/local/include $CPPFLAGS"
    export LDFLAGS="-L/opt/local/lib $LDFLAGS"
fi
if have glibtoolize ; then
    have libtoolize || export LIBTOOLIZE=glibtoolize
fi

p. One little detail that I rather like is having xterm's window title tell me exactly which user I am and which machine I'm on, particularly when I'm ssh'd into another host. This little bit of code ensures that this happens:

bc[bash]. if [[ $TERM == xterm* || $OSTYPE == darwin* ]]; then
    export PROMPT_COMMAND='echo -ne "\033]0;${USER}@${HOSTNAME/.*/}: ${PWD/${HOME}/~}\007"'
else
    unset PROMPT_COMMAND
fi
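
p. The escape sequence does the real work there: @ESC ] 0 ;@ opens an xterm title string and the bell character (@\007@) closes it. A stripped-down sketch, where @set_title@ is a hypothetical helper and the title text is sample data:

bc[bash]. function set_title {
    # OSC 0 sets both the icon name and the window title in xterm
    printf '\033]0;%s\007' "$1"
}
set_title "kyle@wizard: ~/src"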

h3(#charset). Character Set Detection

I typically work in a UTF-8 environment. MacOS X (my preferred platform for day-to-day stuff) has made this pretty easy with really excellent UTF-8 support, and Linux has come a long way (to near-parity) in the last few years. Most of my computing is done via a uxterm (aka. xterm with UTF-8 capability turned on), but I also occasionally dabble in other terminals (sometimes without realizing it). Despite the progress made, however, not all systems support UTF-8, and neither do all terminals. Some systems, including certain servers I've used, simply don't have UTF-8 support *installed*, even though they're quite capable of it.

The idea is that the @LANG@ environment variable is supposed to reflect the language and character set you're using. So, this is where I try and figure out what @LANG@ should be.

The nifty @xprop@ trick here is from a vim hint I found. I haven't tried it out for very long, but so far it seems to be a really slick way of finding out what sort of environment your terminal is using, even if it hasn't set the right environment variables (e.g. @LANG@).

One of the more annoying details of this stuff is that ssh doesn't pass LANG (or any other locale information) along when you connect to a remote server. Granted, there are good reasons for this (just because *my* computer is happy when @LANG=en_US.utf-8@ doesn't mean any server I connect to would be). But remember how I smuggled that info through and stuck it in the SSH_LANG variable? Here's where it becomes important.

As a final note here, I discovered that @less@ is capable of handling multibyte charsets (at least, recent versions of it are), but for whatever reason it doesn't always support @LANG@ and other associated envariables. It DOES however support @LESSCHARSET@...

Anyway, here's the code:

bc[bash].. if [[ -z $LC_ALL && -z $LC_CTYPE && -z $LANG ]] ; then
    dprint no LC_ALL or LC_CTYPE or LANG
    if [[ $WINDOWID ]] && have xprop ; then
        dprint querying xprop
        __bashrc__wmlocal=(`xprop -id $WINDOWID -f WM_LOCALE_NAME 8s ' $0' -notype WM_LOCALE_NAME`)
        export LANG=`eval echo ${__bashrc__wmlocal[1]}`
        unset __bashrc__wmlocal
    elif [[ $OSTYPE == darwin* ]] ; then
        dprint "I'm on Darwin"
        if [[ ( $SSH_LANG && \
            ( $SSH_LANG == *.UTF* || $SSH_LANG == *.utf* ) || \
            $TERM_PROGRAM == Apple_Terminal ) && \
            -d "/usr/share/locale/en_US.UTF-8" ]] ; then
            export LANG='en_US.UTF-8'
        elif [ -d "/usr/share/locale/en_US" ] ; then
            export LANG='en_US'
        else
            export LANG=C
        fi
    elif [[ $TERM == linux || $TERM_PROGRAM == GLterm ]] ; then
        if [ -d "/usr/share/locale/en_US" ] ; then
            export LANG='en_US'
        else
            export LANG=C # last resort
        fi
    else
        if have locale ; then
            locales=`locale -a`
            case "$locales" in
                *en_US.utf8[[:space:]]*|*en_US.utf8)
                export LANG='en_US.utf8'
                export LESSCHARSET=utf-8
                ;;
                *en_US.utf-8[[:space:]]*|*en_US.utf-8)
                export LANG='en_US.utf-8'
                export LESSCHARSET=utf-8
                ;;
                *en_US[[:space:]]*|*en_US)
                export LANG='en_US'
                ;;
                *)
                export LANG=C
                unset LESSCHARSET
                ;;
            esac
            unset locales
        fi
    fi
else
    dprint LANG IS ALREADY SET! $LANG
fi

h3(#aliases). Aliases

This is where a lot of the real action is, in terms of convenience settings. Like anyone who uses a computer every day, I type a lot; and if I can avoid it, so much the better. (I'm a lazy engineer.)

Sometimes I can't *quite* get what I want out of an alias. In _csh_, aliases can specify what to do with their arguments. In bash, aliases are really more just shorthand -- "pretend I really typed this" kind of stuff. Instead, if you want to be more creative with argument handling, you have to use functions (it's not a big deal, really). Here are a few functions I added just because they're occasionally handy to have the shell do for me:

bc[bash].. function exec_cvim {
/Applications/Vim.app/Contents/MacOS/Vim -g "$@" &
}

function darwin_locate { mdfind "kMDItemDisplayName == '$*'wc"; }
if [[ $- == *i* && $OSTYPE == darwin* && ${OS_VER[0]} -ge 8 ]] ;
then
alias locate=darwin_locate
fi

function printargs { for F in "$@" ; do echo "$F" ; done ; }
function psq { ps ax | grep -i "$@" | grep -v grep ; }
function printarray {
for ((i=0;$i<`eval 'echo ${#'$1'[*]}'`;i++)) ; do
    echo $1"[$i]" = `eval 'echo ${'$1'['$i']}'`
done
}
alias back='cd "$OLDPWD"'

p. There are often a lot of things that I just expect to *work*. For example, when I type "ls", I want it to print out the contents of the current directory. In color if possible, without if necessary. It often annoys me, on Solaris systems, when the *working* version of <code>ls</code> is buried in the path, while a really lame version is up in @/bin@ for me to find first. Here's how I fix that problem:

bc[bash].. # GNU ls check
if [[ $OSTYPE == darwin* ]]; then
    dprint "- DARWIN ls"
    alias ls='/bin/ls -FG'
    alias ll='/bin/ls -lhFG'
elif have colorls ; then
    dprint "- BSD colorls"
    alias ls='colorls -FG'
    alias ll='colorls -lhFG'
else
    __kbwbashrc__lsarray=(`\type -ap ls`)
    __kbwbashrc__lsfound=no
    for ((i=0;$i<${#__kbwbashrc__lsarray[*]};i=$i+1)) ; do
        if ${__kbwbashrc__lsarray[$i]} --version &>/dev/null ;
        then
            dprint "- found GNU ls: ${__kbwbashrc__lsarray[$i]}"
            alias ls="${__kbwbashrc__lsarray[$i]} --color -F"
            alias ll="${__kbwbashrc__lsarray[$i]} --color -F -lh"
            __kbwbashrc__lsfound=yes
            break
        fi
    done
    if [ "$__kbwbashrc__lsfound" == no ] ; then
        if ls -F &>/dev/null ; then
            dprint "- POSIX ls"
            alias ls='ls -F'
            alias ll='ls -lhF'
        else
            alias ll='ls -lh'
        fi
    fi
    unset __kbwbashrc__lsarray __kbwbashrc__lsfound
fi

p. Similar things are true of @make@ and @sed@ and such. I've gotten used to the GNU versions, and if they exist on the machine I'd much rather automatically use them than have to figure out whether it's really called @gnused@ or @gsed@ or @justtowasteyourtimesed@ all by myself:

bc[bash].. if [[ $OSTYPE == linux* ]] ; then
    # actually, just Debian, but this works for now
    alias gv="gv --watch --antialias"
else
    alias gv="gv -watch -antialias"
fi
if have gsed ; then
    alias sed=gsed
elif have gnused ; then
    alias sed=gnused
fi
if have gmake ; then
    alias make=gmake
elif have gnumake ; then
    alias make=gnumake
fi

p. The rest of them are mostly boring, with one exception:

bc[bash].. alias macfile="perl -pe 'tr/\x0d/\x0a/'"
have tidy && alias tidy='tidy -m -c -i'
have vim && alias vi='vim'
alias vlock='vlock -a'
alias fastscp='scp -c arcfour -o Compression=no' # yay speed!
alias startx='nohup ssh-agent startx & exit'
alias whatlocale='printenv | grep ^LC_'
alias fixx='xauth generate $DISPLAY'
alias whatuses='fuser -v -n tcp'
alias which=type
alias ssh='env TERM="$TERM:$LANG" ssh'
have realpath || alias realpath=realpath_func
if have readlink ; then
    unset -f readlink_func
else
    alias readlink=readlink_func
fi
if [[ $OSTYPE == darwin* ]]; then
    alias top='top -R -F -ocpu -Otime'
    alias cvim='exec_cvim'
    alias gvim='exec_cvim'
fi

p. Did you note that ssh alias? Heh.

h3(#tab). Tab-completion Options

Bash has had, for a little while at least, the ability to do custom tab-completion. This is *really* convenient (for example, when I've typed @cvs commit@ and I hit tab, bash can know that I really just want to tab-complete files that have been changed). However, I won't bore you with a long list of all the handy tab-completions that are out there. Most of mine are just copied from "here":http://www.caliban.org/bash/ anyway. But I often operate in places where that big ol' bash-completion file can be in multiple places. Here's the simple little loop I use. You'll notice that it only does the loop after ensuring that bash is of recent-enough vintage:

bc[bash].. completion_options=(
~/.conf/bash_completion
/etc/bash_completion
/opt/local/etc/bash_completion
)
if [[ $BASH_VERSION && -z $BASH_COMPLETION && $- == *i* ]] ;
then
    bash=${BASH_VERSION%.*}; bmajor=${bash%.*}; bminor=${bash#*.}
    if [ $bmajor -eq 2 -a $bminor '>' 04 ] || [ $bmajor -gt 2 ] ;
    then
        for bc in "${completion_options[@]}" ; do
            if [[ -r $bc ]] ; then
                dprint Loading the bash_completion file
                if [ -z "$BASH_COMPLETION" ] ; then
                    BASH_COMPLETION="$bc"
                fi
                #COMP_CVS_REMOTE=yes
                export COMP_CVS_ENTRIES=yes
                source "$bc"
                break
            fi
        done
    fi
    unset bash bminor bmajor
fi
unset completion_options
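
p. The version parsing is easier to follow with a concrete string. This sketch runs the same three expansions on a canned value (@demo_version@ and the other names are invented; the string is the old release mentioned earlier):

bc[bash].. demo_version="2.05b.0(1)-release"
short=${demo_version%.*}    # drop the patch level: 2.05b
major=${short%.*}           # text before the dot: 2
minor=${short#*.}           # text after the dot: 05b
echo "$major $minor"
# The minor number can carry a letter, so it is compared as a string
if [ "$major" -eq 2 -a "$minor" '>' 04 ] || [ "$major" -gt 2 ] ; then
    echo "new enough to load the completion file"
fi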

h3(#local). Machine-local settings

You'd be surprised how useful this can be sometimes. Sometimes I need machine-specific settings. For example, on some machines there's a PGI compiler I want to use, and maybe it needs some environment variable set. Rather than put it in the main bashrc, I just put that stuff into @~/.bashrc.local@ and have it loaded:

bc[bash].. dprint checking for bashrc.local in $HOME
if [ -r "${HOME}/.bashrc.local" ]; then
    dprint Loading local bashrc
    source "${HOME}/.bashrc.local"
fi

h3(#logout). Auto-logout

Lastly, it is sometimes the case that the @TMOUT@ variable has been set, either by myself, or by a sysadmin who doesn't like idle users (on a popular system, too many idle users can unnecessarily run you out of ssh sockets, for example). In any case, when my time is limited, I like being aware of how much time I have left. So I have my bashrc detect the @TMOUT@ variable and print out a big banner so that I know what's up and how much time I have. Note that bash can do simple math all by itself with the @$(( ))@ construction. Heheh. Anyway:

bc[bash].. if [[ $TMOUT && "$-" == *i* ]]; then
    echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    echo You will be autologged out after:
    echo -e -n '\t'
    seconds=$TMOUT
    days=$((seconds/60/60/24))
    seconds=$((seconds-days*24*60*60))
    hours=$((seconds/60/60))
    seconds=$((seconds-hours*60*60))
    minutes=$((seconds/60))
    seconds=$((seconds-minutes*60))
    [[ $days != 0 ]] && echo -n "$days days "
    [[ $hours != 0 ]] && echo -n "$hours hours "
    [[ $minutes != 0 ]] && echo -n "$minutes minutes "
    [[ $seconds != 0 ]] && echo -n "$seconds seconds "
    echo
    echo ... of being idle.
    unset days hours minutes seconds
fi

dprint BASHRC_DONE
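
p. The same breakdown can be written with the remainder operator, which avoids the running subtraction. This is a sketch rather than the code above: @format_idle@ is a hypothetical function name, and it takes the timeout as an argument instead of reading @TMOUT@:

bc[bash].. function format_idle {
    local total=$1 out=""
    local days=$(( total / 86400 ))
    local hours=$(( total % 86400 / 3600 ))
    local minutes=$(( total % 3600 / 60 ))
    local seconds=$(( total % 60 ))
    # Only mention the units that are non-zero
    [[ $days != 0 ]] && out="$out$days days "
    [[ $hours != 0 ]] && out="$out$hours hours "
    [[ $minutes != 0 ]] && out="$out$minutes minutes "
    [[ $seconds != 0 ]] && out="$out$seconds seconds "
    echo "${out% }"
}

format_idle 600      # prints: 10 minutes
format_idle 93784    # prints: 1 days 2 hours 3 minutes 4 seconds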

p. While I'm at it, I suppose I should point out that I also have a @~/.bash_logout@ file that's got some niceness to it. If it's the last shell, it clears sudo's cache, empties the console's scrollback buffer, and clears the screen. Note: **DO NOT PUT THIS IN YOUR BASHRC**. You wouldn't like it in there.

bc[bash]. if [ "$SHLVL" -eq 1 ] ; then
    sudo -k
    type -P clear_console &>/dev/null && clear_console 2>/dev/null
    clear
fi

p. And that's about it! Of course, I'm sure I'll add little details here and there and this blog entry will become outdated. But hopefully someone finds my bashrc useful. I know I've put a lot of time and effort into it. :)]]></description>
         <link>http://www.memoryhole.net/kyle/2008/03/my_bashrc.html</link>
         <guid>http://www.memoryhole.net/kyle/2008/03/my_bashrc.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Cool Stuff</category>
        
        
         <pubDate>Thu, 13 Mar 2008 12:51:37 -0600</pubDate>
      </item>
            <item>
         <title>w3m and MacPorts</title>
         <description>For whatever reason, w3m refuses to build on my Intel OSX box with the latest boehmgc library. To get it to build, you must forcibly downgrade to boehmgc 6.8 or 6.7 or something earlier.

Also, I noticed that w3m isn&apos;t marked as depending on gdk-pixbuf. Strictly speaking, it doesn&apos;t, but it does if you have @--enable-image=x11@. :P Add this to your Portfile:

bc. depends_lib lib:libgccpp.1:boehmgc bin:gdk-pixbuf-config:gdk-pixbuf

Also, it seems that either w3m or gdk-pixbuf-config appends an extra library to the config line for gdk-pixbuf-config (essentially, they specify @-lgdk_pixbuf@ AND @-lgdk_pixbuf_xlib@). That extra library causes build problems for w3m; you can fix it by editing @/opt/local/bin/gdk-pixbuf-config@ and removing the @-lgdk_pixbuf@ from what it prints out (however, if you use other software that uses gdk-pixbuf-config, you may need to put it back once w3m has finished building).</description>
         <link>http://www.memoryhole.net/kyle/2008/03/w3m_and_macports.html</link>
         <guid>http://www.memoryhole.net/kyle/2008/03/w3m_and_macports.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">People Suck</category>
        
        
         <pubDate>Thu, 13 Mar 2008 10:12:00 -0600</pubDate>
      </item>
            <item>
         <title>Sorting Spaces</title>
         <description>There seems to be some disagreement, at Apple Computer, about exactly what the definition of the word &quot;ignore&quot; is. From the &quot;sort&quot; man page:

bq.  -d     Sort  in  `phone  directory&apos; order: ignore all characters except letters, digits and blanks when sorting.

What does that suggest to you? Well, let&apos;s compare it to the GNU &quot;sort&quot; man page:

bq. -d, --dictionary-order
              consider only blanks and alphanumeric characters

So you&apos;d *THINK*, right, that sorting with these two options would be equivalent, right?

Nope!

Here&apos;s a simple list:

- 192.168.2.4 foo
- 192.168.2.42 foo

How should these things be sorted when the -d option is in effect? You&apos;ve got a conundrum: is a space sorted BEFORE a number or AFTER a number?

Curse you, alphabet! You&apos;re never around when I need you!

And, of course, BSD and GNU answer that question differently. On GNU, the answer is AFTER; on BSD, the answer is BEFORE! Oh goody.

Here&apos;s a better way if you need the sorting results to be the same on both BSD and GNU: replace all spaces with something else non-alpha-numeric that isn&apos;t used in the file (such as an underscore, or an ellipsis, or an em-dash). Then sort with -ds (no last-minute saving throws!), then replace the underscore (or whatever) with a space again.
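
A minimal sketch of that recipe as a shell filter, assuming underscores never appear in the input. @sort_portable@ is a made-up name, and the doubled backslash hands @tr@ the octal escape @\040@ for a space:

bc[bash]. sort_portable() {
    # space to underscore, stable dictionary sort, underscore back to space
    tr \\040 _ | sort -ds | tr _ \\040
}

It reads standard input and writes standard output, so it can be used as @sort_portable &lt; hosts.txt@, where @hosts.txt@ is a hypothetical input file.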

And if you need it to be consistent on OSX platforms too, make it a -dfs sort (so that capitals and lower-case are considered the same).</description>
         <link>http://www.memoryhole.net/kyle/2008/03/sorting_spaces.html</link>
         <guid>http://www.memoryhole.net/kyle/2008/03/sorting_spaces.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Commentary</category>
        
        
         <pubDate>Wed, 12 Mar 2008 18:08:44 -0600</pubDate>
      </item>
            <item>
         <title>Apple&apos;s Compiler Idiocy</title>
         <description>This is something that&apos;s been bugging me for a while here, and I might as well write it down since I finally found a solution.

I have an atomic-increment function. To make it actually atomic, it uses assembly. Here&apos;s the PPC version:

bc[cc]. static inline int atomic_inc(int * operand)
{
    int retval;
    register unsigned int incrd = incrd; // silence initialization complaints
    asm volatile (&quot;1:\n\t&quot;
                  &quot;lwarx  %0,0,%1\n\t&quot; /* reserve operand into retval */
                  &quot;addi   %2,%0,1\n\t&quot; /* increment */
                  &quot;stwcx. %2,0,%1\n\t&quot; /* un-reserve operand */
                  &quot;bne-   1b\n\t&quot; /* if it failed, try again */
                  &quot;isync&quot; /* make sure it wasn&apos;t all just a dream */
                  :&quot;=&amp;r&quot; (retval)
                  :&quot;r&quot; (operand), &quot;r&quot; (incrd)
                  :&quot;cc&quot;,&quot;memory&quot;);
    return retval;
}

Now, what exactly is wrong with that, eh? This works *great* on Linux. The general GCC compiles this just fine, as does the PGI compiler, IBM&apos;s compiler, and Intel&apos;s compiler.

Apple&apos;s compiler? Here&apos;s the error I get:

bc[cc]. gcc -c test.c
/var/tmp/ccqu2RmV.s:5949:Parameter error: r0 not allowed for parameter 2 (code as 0 not r0)

Okay, so, some kind of monkey business is going on. What does this look like in the .s file?

bc[cc]. 1:
    lwarx r0,0,r2
    addi   r3,r0,1
    stwcx. r3,0,r2
    bne-   1b
    isync
    mr r3,r0

It decided (retval) was going to be r0! Even though that&apos;s apparently not allowed! (FYI it&apos;s the @addi@ that generates the error).

The correct workaround is to use the *barely* documented &quot;b&quot; option, like this:

bc[cc]. static inline int atomic_inc(int * operand)
{
    int retval;
    register unsigned int incrd = incrd; // silence initialization complaints
    asm volatile (&quot;1:\n\t&quot;
                  &quot;lwarx  %0,0,%1\n\t&quot; /* reserve operand into retval */
                  &quot;addi   %2,%0,1\n\t&quot; /* increment */
                  &quot;stwcx. %2,0,%1\n\t&quot; /* un-reserve operand */
                  &quot;bne-   1b\n\t&quot; /* if it failed, try again */
                  &quot;isync&quot; /* make sure it wasn&apos;t all just a dream */
                  :&quot;=&amp;b&quot; (retval) /* note the b instead of the r */
                  :&quot;r&quot; (operand), &quot;r&quot; (incrd)
                  :&quot;cc&quot;,&quot;memory&quot;);
    return retval;
}

That ensures, on PPC machines, that the value is a &quot;base&quot; register (aka not r0).

How gcc on Linux gets it right all the time, I have no idea. But it does.</description>
         <link>http://www.memoryhole.net/kyle/2008/01/apples_compiler_idiocy.html</link>
         <guid>http://www.memoryhole.net/kyle/2008/01/apples_compiler_idiocy.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Research</category>
        
        
         <pubDate>Fri, 11 Jan 2008 16:30:29 -0600</pubDate>
      </item>
            <item>
         <title>Health Coverage &amp; Obama</title>
         <description>I was recently pointed at an Op-Ed in the NYT by Paul Krugman called &quot;The Mandate Muddle&quot;:http://www.nytimes.com/2007/12/07/opinion/07krugman.html?n=Top/Opinion/Editorials%20and%20Op-Ed/Op-Ed/Columnists/Paul%20Krugman that talks about Obama&apos;s health care plan versus Hillary&apos;s health care plan.

I must admit, I have difficulty understanding the whole &quot;universal health coverage&quot; thing. As I understand it, the way things currently work is that the advantage of health insurance is *almost* the same as every other kind of insurance: when something bad happens, you don&apos;t take the huge financial hit. This is essentially equivalent to simply socking away money somewhere useful (a mutual fund, say) that you&apos;ve specifically labelled as &quot;health care money&quot;, and whenever you need to pay for health care, you tap it. The primary difference being that if bad things happen that you don&apos;t have enough for, insurance will cover it while your personal health account wouldn&apos;t--and the reason it works that way is because rather than drawing on your own account, you&apos;re drawing on (essentially) the combined purchasing power of everyone else that&apos;s been chipping in to the same insurance company. Only, because it&apos;s a company rather than an account, they have a fair bit of overhead involved in verifying claims and making sure people aren&apos;t cheating. The next big detail, though, is that insurance companies negotiate lower rates with doctors: so they limit who you can go to, and they don&apos;t have to pay as much as you would have to if you self-insured. So far, this all makes a fair bit of sense: by purchasing insurance, you mitigate your risk, at the expense of most likely paying more into the system than you ever get out. Probably quite a lot more (otherwise it wouldn&apos;t be profitable for the insurance company). On the other hand, if you&apos;re wealthy, maybe you aren&apos;t so concerned about mitigating the risk, since you can stand to take the hit if and when you need health care.

So what&apos;s the big goal with mandating universal insurance? By making sure that everyone is mitigating risk, we legally guarantee that the insurance companies have customers. And what happens? Do we get better health care? Does the cost of health care go down? Well, we may get better health care, because people will go to more regular check-ups. And that may cause the cost to go down (due to the whole &quot;catch expensive problems before they become expensive&quot; thing), but costs also go up because now hospitals can&apos;t afford to give great prices to insurance companies and pass the buck on to the little guy without insurance. So what happens if you can&apos;t afford to buy insurance?

If you can&apos;t afford car insurance, these days, you just ride a bike (or you walk). But there&apos;s nothing you can give up (short of committing suicide) if you can&apos;t afford mandatory health insurance. So what happens? Legally, you&apos;re required to buy it, so suddenly purchasing insurance gets put first on the list, in front of things like food, shelter, keeping the lights on, keeping the toilets working, etc. Krugman says this is a false claim, but on what grounds? Only that the government will subsidize it for people... meaning what, that the government will buy it for you? Forcibly make it cheaper? This sounds *ripe* for abuse, and unless the government is buying it for you, doesn&apos;t actually solve the problem.

And the first false claim, that it&apos;s unenforceable... the answer is Switzerland and the Netherlands? Surely that can&apos;t be taken as proof positive. We&apos;re talking about America here, a country that rejected Kyoto, a country that still has capital punishment, a country where people (mostly) don&apos;t live close together and find it easy to ignore each other and revel in our ignorance and rudeness. And Krugman wants to compare this body of people to folks in Switzerland and the Netherlands? Countries known for their staunchly liberal politics? The *citizens* of America have a far different approach to &quot;government mandates&quot; than do citizens of Switzerland and the Netherlands.

Besides which, there is no enforcement mechanism in these plans. Whether or not it&apos;s even possible to enforce such a mandate, there&apos;s no provision for it in Hillary&apos;s plan. For car insurance, you&apos;re required to provide proof whenever you get pulled over by the cops. What happens for Hillary&apos;s care? Nothing. If anything, Edwards&apos; plan has at least approached this aspect, as his plan requires people to show proof of insurance when they submit their tax returns. But I think Timothy Noah makes a good point when he says &quot;this&quot;:http://www.slate.com/id/2178896/:

bq.. Enrolling people in a private health care plan isn&apos;t the hard part; forcing people to pay for a private health care plan is the hard part. Yes, the government has procedures to collect student loans and unpaid taxes, but it&apos;s understood that such payments are obligations. There&apos;s little disagreement that if you take out a loan, you&apos;re obliged to repay it, and only slightly more disagreement (mostly among crackpots) that as a citizen you are obliged to share in the cost of government. I believe there would be a lot of disagreement about whether the government could compel you to buy a private health insurance policy.

If you want to drive a car, it&apos;s accepted that you have to buy private auto insurance. But that&apos;s conditional on enjoying the societal privilege of driving a car; you can avoid the requirement by choosing not to drive one. A mandate to buy private health insurance, however, would be conditional on... being alive. I can&apos;t think of another instance in which the government says outright, &quot;You must buy this or that,&quot; independent of any special privilege or subsidy it may bestow on you. Even if such a requirement could pass muster in the courts--and I have my doubts--it seems to me that politically it would give the inevitable conservative opposition a nice fat target to rally around. _Big Brother will steal your wages if you don&apos;t buy a health insurance policy!_

p. More importantly, he keenly observes the following:

bq.. It may be necessary to achieve the goal of expanding government-administered health insurance in stages. All the health care plans of the major Democratic candidates are premised on that assumption, whether they acknowledge it or not. The only Democratic candidate I&apos;m aware of who dispenses with such gradualism is Dennis Kucinich, whose solution--&quot;Medicare For All&quot;--is the only one that will solve the health care mess in the long run. Clinton, Obama, and Edwards all have plans that would steadily enlarge the role of government health insurance. These are accommodations to political reality. I question the wisdom of including, within such an accommodation, a mandate that would render that accommodation unattractive to a large bloc of voters. If we&apos;re going to create a ruckus, better to do it in the service of a more comprehensive solution than either Clinton or Edwards has put forth. If we aren&apos;t, Obama&apos;s resistance to an individual mandate makes perfect sense.

p. That being said, I&apos;m with Krugman in being annoyed that Obama keeps making the claim that his health care plan covers everyone. The only thing I can imagine is that he must be playing some stupid semantic game: &quot;covers everyone&quot; (i.e. applies to everyone) versus &quot;provides health care coverage for everyone&quot;.</description>
         <link>http://www.memoryhole.net/kyle/2007/12/health_coverage_obama.html</link>
         <guid>http://www.memoryhole.net/kyle/2007/12/health_coverage_obama.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Politics</category>
        
        
         <pubDate>Fri, 07 Dec 2007 11:38:10 -0600</pubDate>
      </item>
            <item>
         <title>Moving Parts are Evil</title>
         <description>I recently was doing some work on the computer of an elderly friend of mine, and had a bit of a scare with a hard drive that appeared to have failed. Turns out the boot block had been corrupted somehow, which was easy enough to fix from another computer (yay Linux!). Anyway, this made me stick my nose into S.M.A.R.T. statistics on hard drives. There&apos;s a nice little tool for OS X that sits in the menu bar and keeps an eye on your disks for you (&quot;SMARTReporter&quot;:http://www.corecode.at/smartreporter/). I figured there *had* to be something similar for Windows. In the &quot;free&quot; department, there&apos;s very little available that&apos;s worth beans, but I was able to find something called &quot;HDD Health&quot;:http://www.panterasoft.com. No sooner had I installed it than it started telling me that the Seek Error Rate was fluctuating wildly (generally it would go from 100 to 200 and back again every couple of minutes). This was rather sudden! I got worried about the health of the drive, and started backing things up on it... then I looked it up on the internet. Apparently this is a common thing with Western Digital drives (which is what this computer had): their Seek Error Rate tends to fluctuate like that, and it doesn&apos;t mean anything at all. The general recommendation seems to be &quot;download the diagnostic tools from Western Digital; those will be authoritative&quot;. So I did, and they said the drive was in perfect health.

Well, so much for being worried!

It does seem to speak to the temperamental (and largely useless) nature of S.M.A.R.T. statistics. Thing to keep in mind: they don&apos;t always mean much.</description>
         <link>http://www.memoryhole.net/kyle/2007/11/moving_parts_are_evil.html</link>
         <guid>http://www.memoryhole.net/kyle/2007/11/moving_parts_are_evil.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Commentary</category>
        
        
         <pubDate>Tue, 27 Nov 2007 23:40:09 -0600</pubDate>
      </item>
            <item>
         <title>Concurrent Hash Table Tricks</title>
         <description><![CDATA[So, I'm working on qthreads (which is open-sourced, but currently lacks a webpage), and thinking about its Unix implementation.

The Unix implementation emulates initialization-free synchronization (address locks and FEBs) by storing addresses in a hash table (well, okay, a striped hash table, but if we make the stripe 1, then it's just a hash table). Let's take the simplest: address locks. The semantics of the hash table at the moment are really pretty basic: if an address is in the hash, it's locked. If it's not in the hash, it's not locked. The hash is the @cp_hashtable@ from "libcprops":http://cprops.sf.net, a library which I appreciate greatly for giving C programmers easy access to high-quality basic data structures (I've contributed some significant code to it as well). Anyway, the downside of using this hash table is that it's a bottleneck. The table is made thread-safe by simply wrapping it in a lock, and every operation (lock and unlock) requires locking the table to either insert an entry or remove an entry.
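
To make the bottleneck concrete, here's a minimal sketch of that scheme. It assumes a @std::unordered_set@ guarded by one mutex as a stand-in for the lock-wrapped @cp_hashtable@; the @AddressLockTable@ class and its method names are invented for illustration and are not the actual qthreads code.

```cpp
#include <condition_variable>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for the lock-wrapped hash table: an address is
// considered locked exactly when it is present in the set.
class AddressLockTable {
public:
    void lock(const void *addr) {
        std::unique_lock<std::mutex> guard(table_mutex_);
        // Block until nobody else holds this address, then claim it.
        released_.wait(guard, [&] { return locked_.count(addr) == 0; });
        locked_.insert(addr);
    }

    void unlock(const void *addr) {
        {
            std::lock_guard<std::mutex> guard(table_mutex_);
            locked_.erase(addr);
        }
        // Wake any threads waiting for an address to be released.
        released_.notify_all();
    }

    bool is_locked(const void *addr) {
        std::lock_guard<std::mutex> guard(table_mutex_);
        return locked_.count(addr) != 0;
    }

private:
    std::mutex table_mutex_;                   // the single table-wide lock
    std::condition_variable released_;
    std::unordered_set<const void *> locked_;  // addresses currently locked
};
```

Every @lock()@ and @unlock()@ call funnels through @table_mutex_@, which is the serialization point described above.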

So how could we do this with a more concurrent hash table? I've seen two hash table APIs that are concurrent: the lock-free hash in Java that I talked about previously, and the @concurrent_hash_map@ from Intel's "Threading Building Blocks":http://threadingbuildingblocks.org/ library (which, given that it's in C++, is something I can actually use).

The way the TBB hash works is that you can perform three basic operations on your hash: @find()@, @insert()@, and @erase()@. When you do either of the first two operations, you can lock that entry in the hash and prevent others from getting at it, or you can access it read-only. The erase function merely takes a key and removes it from the hash table, giving you no access to whatever might have been deleted from the hash table. Worse yet, you cannot erase something that you currently have a lock on, even if it's a write lock!

Using this hash the way that I currently use the cprops hash is thus impossible. Why? Because *erasing things from the TBB hash is ALWAYS a race condition*. Put another way, all TBB hash erase operations are "blind erase" operations, when what you really want is "erase if it's still in an erasable state". You can never be certain that erasing an entry from the hash table is a good idea, because you can never be certain that someone else didn't add something important to that entry in between the time that you decided the entry was erasable and the time you actually erased it. If I insert a value (to "lock" an address, say), I can associate that value with a queue of waiting threads (i.e. other threads that also want to lock that address), but I can never erase that entry in the hash table! The reason is that since I can't erase something that I have access to (i.e. have a write-lock on), there's a race condition between me fetching the contents of that hash table entry and me removing that entry from the hash table.

A different approach to this might be to simply never remove entries from the hash table, and to simply say that if the associated list of threads is empty (or NULL), then the lock is unlocked. That would work well, except for that tiny little problem of the hash table eternally growing and never reclaiming memory from unused entries. So, if I had an application that created lots of locks all over the place (i.e. inserted lots of different entries into the hash), but never had more than a handful locked (i.e. in the hash) at a time, I'd be wasting memory (and possibly, LOTS of it).

Is there another way to use such a hash table to implement locks more efficiently? I don't know, but I don't think so (I'd love to be proved wrong). Any way you slice it, you come back to the problem of deleting things that are in a deletable state, but not knowing if it's safe to do so.

The Azul Java-only hash is an interesting hash that behaves differently. It is based upon compare-and-swap (CAS) atomic operations. Thus, for a given key, you can atomically read the contents of a value, but there's no guarantee that that value isn't changed the *MOMENT* you read it. Deleting an entry, in this case, means swapping a tombstone marker into place where the entry's value is supposed to be, and the compare half of the CAS lets you skip the swap if that value changed before you got to it. Thus, after you've extracted the last thread that'd locked that address (i.e. you've set the value to NULL), you can avoid marking a thing as "deleted" when it has really just been re-locked: if the value changed to non-NULL (and the compare part of the CAS fails), you can simply ignore the failure and assume that whoever changed it knew what they were doing. Thus, you CAN safely delete elements from the hash table. Better still, it easily integrates with (and may even require) a lock-free CAS-based linked list for queueing blocked threads. (You may be saying to yourself "um, dude, a hash table entry with a tombstone as a value is still taking up memory", and I say to you: yeah? so? they get trimmed out of the hash table whenever the hash table is resized, thereby being an awesome idea.)
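
The conditional delete can be sketched in C++ with @std::atomic@. This is only an illustration of the idea, not the Azul implementation: the @Waiter@ type, the @TOMBSTONE@ sentinel, and @try_delete()@ are all names I'm making up, and a single atomic slot stands in for the value cell of one hash entry.

```cpp
#include <atomic>

// Hypothetical queue node for a thread blocked on an address.
struct Waiter {
    Waiter *next;
};

// Hypothetical sentinel: its address is used as a marker value that can
// never be mistaken for a real waiter.
static Waiter tombstone_storage = {nullptr};
static Waiter *const TOMBSTONE = &tombstone_storage;

// Try to retire a slot that we believe is empty (nullptr).
// Returns true if the tombstone was installed. Returns false if the value
// changed first (someone re-locked the address), in which case the slot is
// left untouched and the failure can simply be ignored.
bool try_delete(std::atomic<Waiter *> &slot) {
    Waiter *expected = nullptr;
    return slot.compare_exchange_strong(expected, TOMBSTONE);
}
```

The point is that the check and the write happen as one atomic step, so there is no window between deciding the entry is deletable and actually deleting it.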

And, as I think about it, forcing users to do blind erases makes Intel TBB hash tables *ALMOST* unusable for an entire class of problems and/or algorithms. That category of algorithms is any algorithm that needs to delete entries that could potentially be added back at any time. They really ought to provide an equivalent of a CAS: let the user say "delete this hash entry if the value is equal to this".
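
For the sake of argument, here's roughly what I mean by that operation, sketched on a plain mutex-guarded @std::unordered_map@. To be clear, this is not the TBB API: @CheckedEraseMap@ and @erase_if_equal()@ are hypothetical names, and the single mutex is there only to keep the example short.

```cpp
#include <mutex>
#include <unordered_map>

// Hypothetical map offering a compare-and-erase operation.
template <typename K, typename V>
class CheckedEraseMap {
public:
    void put(const K &key, const V &value) {
        std::lock_guard<std::mutex> guard(mutex_);
        map_[key] = value;
    }

    // Erase the entry only if its current value still equals `expected`.
    // Returns false (and changes nothing) if the key is missing or the
    // value was changed by someone else in the meantime.
    bool erase_if_equal(const K &key, const V &expected) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = map_.find(key);
        if (it == map_.end() || !(it->second == expected)) return false;
        map_.erase(it);
        return true;
    }

    bool contains(const K &key) {
        std::lock_guard<std::mutex> guard(mutex_);
        return map_.count(key) != 0;
    }

private:
    std::mutex mutex_;
    std::unordered_map<K, V> map_;
};
```

With something like this, a caller that saw an empty waiter queue could ask for the entry to be erased only if the queue is still empty, and a concurrent re-lock would make the erase fail harmlessly.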

I say "ALMOST unusable" because it's fixable. Consider the ramifications of having control over the comparison and equivalence functions: a key can be associated with a "deletable" flag that provides much of the needed functionality. With such a flag, the result of any @find()@ operation can be considered invalid not only if it returns false but also if the deletable flag associated with the result's key is true. Essentially, finding something in the hash becomes:

bc[cc].. while (hash.find(result, &findme) && result->first->deletable) {
    result.release();
}

p. It's an extra layer of indirection, and can cause something to spin once or twice, but it works. Your comparison struct functions must then be something like this:

bc[cc].. typedef struct evilptrabstraction {
    bool deletable;
    void * key;
} epa_s;

typedef epa_s * epa;

struct EPAHashCompare {
    static size_t hash(const epa &x) {
        return (size_t)x->key; // or a more complex hash
    }
    static bool equal (const epa &x, const epa &y) {
        if (x->deletable && y->deletable) return true;
        if (x->deletable || y->deletable) return false;
        return x->key == y->key;
    }
};

p. Note that anything marked deletable is equivalent, but doesn't match anything non-deletable. Thus, safely deleting something becomes the following (assuming @findme@ is a version of the @epa@ struct not marked deletable):

bc[cc].. accessor *result = new accessor();

bool found = hash.find(*result, &findme);
while (found && (*result)->first->deletable)  {
    result->release();
    found = hash.find(*result, &findme);
}

if (found) {
    (*result)->first->deletable = true;
    delete result; // release the lock
    findme.deletable = true;
    hash.erase(&findme);
} else {
    delete result;
}

p. This opens the question of inserting safely, though, because during the insertion process, your inserted object might have already existed, and if it already existed, it may have been in the process of being deleted (i.e. it might have been marked as deleted). There's the potential that your "freshly-inserted" item got marked deletable if it was already in the hash. So how do you insert safely?

bc[cc].. bool inserted = hash.insert(result, insertme);
// !inserted means insertme was already in the hash
while (!inserted && result->first->deletable) {
    result.release();
    inserted = hash.insert(result, insertme);
}
if (!inserted) delete insertme;

p. Note that we can't simply toggle the deletable mark, because an @erase()@ operation may already be waiting for the hash value, and it doesn't expect that the key for the item may have changed while it was waiting for the item to be locked (so changing the deletable flag won't stop it from being erased). The downside, of course, is that frequently erased/re-inserted items may cause a fair bit of memory churn, but that's unavoidable with the TBB's bare-bones @erase()@ interface.]]></description>
         <link>http://www.memoryhole.net/kyle/2007/10/intels_tbb_hash_is_severely_li.html</link>
         <guid>http://www.memoryhole.net/kyle/2007/10/intels_tbb_hash_is_severely_li.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Research</category>
        
        
         <pubDate>Tue, 09 Oct 2007 14:50:41 -0600</pubDate>
      </item>
            <item>
         <title>Come *on*, Apple</title>
         <description>This is just petty, but Apple? What&apos;s up with @libtoolize@? I know, I know, you decided you wanted to call it @glibtoolize@, and that&apos;s fine! That&apos;s fine, I don&apos;t mind. But why did you distribute an @autoreconf@ that still believed in @libtoolize@? That&apos;s just dumb.</description>
         <link>http://www.memoryhole.net/kyle/2007/10/come_on_apple.html</link>
         <guid>http://www.memoryhole.net/kyle/2007/10/come_on_apple.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Commentary</category>
        
        
         <pubDate>Wed, 03 Oct 2007 13:32:14 -0600</pubDate>
      </item>
            <item>
         <title>I wish I had a C-based lock-free hash table...</title>
         <description>I recently stumbled across a Google Tech Talk video by a man named Cliff Click Jr. He works for a company named Azul, and he has a blog &quot;here&quot;:http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html. This tech talk was all about a lock-free hash table that he&apos;d invented for Azul. The video is &quot;here&quot;:http://video.google.com/videoplay?docid=2139967204534450862.

**Lock free hash table? Heck yeah I want a lock free hash table!**

Sadly, Cliff&apos;s implementation is Java-only, and relies on some Java memory semantics, but it&apos;s on &quot;sourceforge&quot;:http://sourceforge.net/projects/high-scale-lib if anyone&apos;s interested.

So, as I began reading up on the subject, I discovered that he&apos;s not the only one interested. In fact, there&apos;s another fellow who has a C-based library &quot;here&quot;:http://www.sunrisetel.net/software/devtools/sunrise-data-dictionary.shtml. Only problem? IT&apos;S NOT ACTUALLY LOCK FREE!!! At least, not yet. At the moment it&apos;s a pthread/mutex-based hash table that happens to have all the pthreads stuff ifdef&apos;d out (joy). There are other people out there who talk about it. A fellow from IBM named &quot;Maged M. Michael&quot;:http://www.google.com/url?sa=t&amp;ct=res&amp;cd=5&amp;url=http%3A%2F%2Fportal.acm.org%2Fft_gateway.cfm%3Fid%3D564881%26type%3Dpdf%26coll%3D%26dl%3DACM%26&amp;ei=Q-HyRsrGG5q8gwPE7KSJDQ&amp;usg=AFQjCNFQGyOjRvt1BIjwPXLyn_Y7dEwkfA&amp;sig2=sP5hRneENo99e8MdspCR-A has a paper about how to do lock-free hash tables, and he even has a patent on his particular method, but no implementations appear to be available. Chris Purcell wrote a &quot;paper on the topic&quot;:http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-639.html, which contains pseudocode, but yet again, no implementation.

So it would appear that if I want a lock-free hash table, I&apos;m going to have to implement it myself. But boy, it gets me giddy just thinking about it. :) Pthreads, you&apos;re going down!</description>
         <link>http://www.memoryhole.net/kyle/2007/09/i_wish_i_had_a_cbased_lockfree.html</link>
         <guid>http://www.memoryhole.net/kyle/2007/09/i_wish_i_had_a_cbased_lockfree.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Cool Stuff</category>
        
        
         <pubDate>Thu, 20 Sep 2007 14:46:07 -0600</pubDate>
      </item>
            <item>
         <title>Superscript-One</title>
         <description><![CDATA[Some half-crazed moron at Microsoft, in an attempt to be helpful, made an idiotic decision.

Of what do I speak? Microsoft Entourage (11.3.6.070618) attempts to be both convenient and pretty by replacing apostrophes (&#x0027;) with curly quotes (&#x2019;). Ordinarily, I wouldn't complain. I like curly-quotes as much as the next guy, and I regularly use a vim plugin called UniCycle to achieve the same effect. HOWEVER, Entourage knows that it only wants to send text email in the ISO-8859-1 (aka "Latin1") character set, which does not contain a curly-quote. This presents the age-old conundrum: "wanna curly quote, can't have a curly quote". So Entourage must choose a different character from the ISO-8859-1 character set to use instead of the curly quote. The *obvious* choice would be the apostrophe (&#x0027;); people are used to it, and after all it _is_ a quote! But what does Entourage choose? A superscript 1, like this: &#x00b9;

What goon came up with this? A superscript 1, in most fonts (except at *very* small sizes) looks _nothing_ like a quotation mark. It looks like the number one! Which is exactly what it is! And, let's be honest here, how many fonts do you suppose have a superscript one character but NOT a curly quote or an apostrophe? Besides *looking* stupid, Microsoft isn't actually improving their compatibility at all!
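
For comparison, the obvious fallback is not hard to write. Here's a minimal sketch, assuming the text is already decoded into Unicode code points; @to_latin1()@ is a made-up helper, and the choice of a question mark for everything else is my own assumption, not anything Entourage does.

```cpp
#include <string>

// Hypothetical helper: encode Unicode code points as ISO-8859-1 bytes,
// folding curly quotes down to their plain ASCII equivalents.
std::string to_latin1(const std::u32string &text) {
    std::string out;
    for (char32_t c : text) {
        if (c <= 0xFF) {
            // Latin-1 maps one-to-one onto the first 256 code points.
            out.push_back(static_cast<char>(c));
        } else if (c == 0x2018 || c == 0x2019) {
            out.push_back('\'');  // curly single quotes become an apostrophe
        } else if (c == 0x201C || c == 0x201D) {
            out.push_back('"');   // curly double quotes become a straight quote
        } else {
            out.push_back('?');   // assumed placeholder for anything else
        }
    }
    return out;
}
```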

<sigh> But, I suppose, what did I expect from such an "innovative" company?]]></description>
         <link>http://www.memoryhole.net/kyle/2007/08/superscriptone.html</link>
         <guid>http://www.memoryhole.net/kyle/2007/08/superscriptone.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">People Suck</category>
        
        
         <pubDate>Tue, 21 Aug 2007 12:35:40 -0600</pubDate>
      </item>
            <item>
         <title>Another Review!</title>
         <description><![CDATA[I just got a link to another review of "my book":http://www.amazon.com/gp/product/1847191150?ie=UTF8&tag=kyle-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=1847191150!<img src="http://www.assoc-amazon.com/e/ir?t=kyle-20&amp;l=as2&amp;o=1&amp;a=1847191150" width="1" height="1" alt="" style="border:none !important; margin:0px !important;" /> This time, by a fellow named "James Craig Burley":http://jcb-sc.com/reviews/qmail-quickstarter/, who also frequents the qmail mailing list. Here's his intro and conclusion: <iframe src="http://rcm.amazon.com/e/cm?t=kyle-20&o=1&p=8&l=as1&asins=1847191150&fc1=000000&IS2=1&lt1=_blank&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0" align="right"></iframe>

bq. Email servers, also called Mail Transfer Agents (MTAs), today do much of the heavy lifting required to transport email from sender to recipient, ideally without the sender or recipient being particularly aware of them. They store (queue) incoming email, then forward it to user's mailboxes, sometimes via other email servers, while often trying to avoid accepting or sending out spam or viruses. They also allow users to read email waiting for them in their mailboxes. "qmail":http://cr.yp.to/qmail.html is a popular email server for Unix-based systems; <a href="http://www.amazon.com/gp/product/1847191150?ie=UTF8&tag=kyle-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=1847191150">"Qmail Quickstarter: Install, Set Up, and Run your own Email Server"</a> introduces the reader to qmail as an email-delivery architecture that provides the building blocks for an email server.

bq. "Qmail Quickstarter" is a good book for anyone wanting to come up to speed on qmail, whether as their email server of choice or as a means to better choose among the many email servers available. I give this book a 7 out of 10.]]></description>
         <link>http://www.memoryhole.net/kyle/2007/07/another_review.html</link>
         <guid>http://www.memoryhole.net/kyle/2007/07/another_review.html</guid>
                  <category domain="http://www.sixapart.com/ns/types#category">Cool Stuff</category>
        
        
         <pubDate>Tue, 10 Jul 2007 19:44:56 -0600</pubDate>
      </item>
      
   </channel>
</rss>
