
Concurrent Hash Table Tricks

So, I’m working on qthreads (which is open-sourced, but currently lacks a webpage), and thinking about its Unix implementation.

The Unix implementation emulates initialization-free synchronization (address locks and FEBs) by storing addresses in a hash table (well, okay, a striped hash table, but if we make the stripe 1, then it’s just a hash table). Let’s take the simplest: address locks. The semantics of the hash table at the moment are really pretty basic: if an address is in the hash, it’s locked. If it’s not in the hash, it’s not locked. The hash is the cp_hashtable from libcprops, a library which I appreciate greatly for giving C programmers easy access to high-quality basic data structures (I’ve contributed some significant code to it as well). Anyway, the downside of using this hash table is that it’s a bottleneck. The table is made thread-safe by simply wrapping it in a lock, and every operation (lock and unlock) requires locking the table to either insert an entry or remove an entry.
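
To make the bottleneck concrete, here’s roughly what that scheme looks like (just a sketch, not the actual qthreads code: a std::set stands in for the striped cp_hashtable, and the queue-of-waiting-threads bookkeeping is left out):

#include <pthread.h>
#include <set>

// One big lock around the table: every lock/unlock in the whole program
// serializes on table_lock, no matter which address it's for.
static std::set<void *> locked_addrs;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

bool try_lock_addr(void *addr)
{
    pthread_mutex_lock(&table_lock);
    bool acquired = locked_addrs.insert(addr).second; // in the set == locked
    pthread_mutex_unlock(&table_lock);
    return acquired; // false: someone else holds it (real code would queue up and block)
}

void unlock_addr(void *addr)
{
    pthread_mutex_lock(&table_lock);
    locked_addrs.erase(addr);
    pthread_mutex_unlock(&table_lock);
}

Every lock and unlock in the entire program fights over table_lock, no matter how unrelated the addresses are.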

So how could we do this with a more concurrent hash table? I’ve seen two hash table APIs that are concurrent: the lock-free hash in Java that I talked about previously, and the concurrent_hash_map from Intel’s Threading Building Blocks library (which, given that it’s in C++, is something I can actually use).

The way the TBB hash works is that you can perform three basic operations on your hash: find(), insert(), and erase(). When you do either of the first two operations, you can lock that entry in the hash and prevent others from getting at it, or you can access it read-only. The erase function merely takes a key and removes it from the hash table, giving you no access to whatever might have been deleted from the hash table. Worse yet, you cannot erase something that you currently have a lock on, even if it’s a write lock!
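
For reference, using that interface looks roughly like this (a sketch with an int-keyed table purely for illustration; the HashCompare requirements come up again below):

#include <tbb/concurrent_hash_map.h>

struct IntHashCompare {
    static size_t hash(const int &x) { return (size_t)x; }
    static bool equal(const int &x, const int &y) { return x == y; }
};

typedef tbb::concurrent_hash_map<int, int, IntHashCompare> table_t;

void demo(table_t &table)
{
    table_t::accessor a;        // holds a write lock on the entry while alive
    if (table.insert(a, 42)) {  // true if the key was newly inserted
        a->second = 1;          // a-> points at the (key, value) pair
    }
    a.release();                // give up the entry lock explicitly

    table_t::const_accessor ca; // read-only access
    if (table.find(ca, 42)) {
        // ca->second is the value, valid for as long as ca holds the entry
    }
    ca.release();               // can't be holding any lock on the entry...

    table.erase(42);            // ...because erase() takes only the key
}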

Using this hash the way that I currently use the cprops hash is thus impossible. Why? Because erasing things from the TBB hash is ALWAYS a race condition. Put another way, all TBB hash erase operations are “blind erase” operations, when what you really want is “erase if it’s still in an erasable state”. You can never be certain that erasing an entry from the hash table is a good idea, because you can never be certain that someone else didn’t add something important to that entry in between the time that you decided the entry was erasable and the time you actually erased it. If I insert a value (to “lock” an address, say), I can associate that value with a queue of waiting threads (i.e. other threads that also want to lock that address), but I can never erase that entry in the hash table! The reason is that since I can’t erase something that I have access to (i.e. have a write-lock on), there’s a race condition between me fetching the contents of that hash table entry and me removing that entry from the hash table.
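
To make the race concrete, here’s the unlock path that the erase()-only interface forces on you (the layout is hypothetical: each locked address maps to a queue of waiting threads, with void* standing in for a real thread handle):

#include <tbb/concurrent_hash_map.h>
#include <deque>

struct PtrHashCompare {
    static size_t hash(void *const &p) { return (size_t)p; }
    static bool equal(void *const &a, void *const &b) { return a == b; }
};

// Hypothetical layout: an entry's presence means "locked", and its value is
// the queue of other threads waiting to lock that address.
typedef std::deque<void *> waitqueue_t;
typedef tbb::concurrent_hash_map<void *, waitqueue_t, PtrHashCompare> locktable_t;

void unlock_addr(locktable_t &table, void *addr)
{
    locktable_t::accessor a;
    if (table.find(a, addr) && a->second.empty()) {
        // Nobody is waiting, so the entry looks erasable... but erase()
        // can't be called while we hold a lock on the entry, so we have
        // to let go first:
        a.release();
        // <-- right here another thread can find the entry, decide the
        //     address is locked, and queue itself up on it...
        table.erase(addr); // ...and this blind erase wipes out its "lock".
    }
}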

A different approach to this might be to simply never remove entries from the hash table, and to simply say that if the associated list of threads is empty (or NULL), then the lock is unlocked. That would work well, except for that tiny little problem of the hash table eternally growing and never reclaiming memory from unused entries. So, if I had an application that created lots of locks all over the place (i.e. inserted lots of different entries into the hash), but never had more than a handful locked (i.e. in the hash) at a time, I’d be wasting memory (and possibly, LOTS of it).
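
In sketch form, the no-erase variant of unlock just leaves the entry behind (here I’m assuming the queue holds the current owner at its front plus any waiters, so an empty queue means “unlocked”):

#include <tbb/concurrent_hash_map.h>
#include <deque>

struct PtrHashCompare {
    static size_t hash(void *const &p) { return (size_t)p; }
    static bool equal(void *const &a, void *const &b) { return a == b; }
};

// Hypothetical layout: front of the queue is the current owner, the rest are
// waiters; an empty queue means the address is unlocked.
typedef std::deque<void *> ownerqueue_t;
typedef tbb::concurrent_hash_map<void *, ownerqueue_t, PtrHashCompare> locktable_t;

void unlock_addr_no_erase(locktable_t &table, void *addr)
{
    locktable_t::accessor a;
    if (table.find(a, addr)) {
        a->second.pop_front(); // remove ourselves (the current owner)
        // If the queue is non-empty now, the new front owns the lock (real
        // code would wake that thread here).  If it's empty, the address is
        // simply "unlocked"; either way the entry stays in the table forever.
    }
}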

Is there another way to use such a hash table to implement locks more efficiently? I don’t know, but I don’t think so (I’d love to be proved wrong). Any way you slice it, you come back to the problem of deleting things that are in a deletable state, but not knowing if it’s safe to do so.

The Azul Java-only hash is an interesting hash that behaves differently. It is based upon compare-and-swap (CAS) atomic operations: for a given key, you can atomically read the contents of a value, but there’s no guarantee that that value isn’t changed the MOMENT you read it. Deleting an entry, in this case, means swapping a tombstone marker into place where the entry’s value is supposed to be, and you can skip doing that if the value changed before you got to the swap part (the C of the CAS). So, after you’ve extracted the last thread that had locked that address (i.e. you’ve set the value to NULL), you can’t accidentally mark an entry “deleted” when it has really just been re-locked: if the value changed to non-NULL, the compare part of the CAS fails, and you can simply ignore the failure and assume that whoever changed it knew what they were doing. Thus, you CAN safely delete elements from the hash table. Better still, it easily integrates with (and may even require) a lock-free CAS-based linked list for queueing blocked threads. (You may be saying to yourself “um, dude, a hash table entry with a tombstone as a value is still taking up memory,” and I say to you: yeah? so? Tombstoned entries get trimmed out whenever the hash table is resized, which makes the whole scheme an awesome idea after all.)
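
The delete path in that kind of scheme boils down to a single CAS, something like this (a sketch using GCC’s __sync_bool_compare_and_swap builtin and a made-up entry layout, not the actual Azul implementation):

#include <cstddef>

// Made-up sentinel and entry layout, purely for illustration.
static void *const TOMBSTONE = (void *)(-1);

struct entry {
    void *key;
    void *value; // queue of lock holders/waiters; NULL means "unlocked"
};

// Returns true if we tombstoned (deleted) the entry, false if someone
// re-locked it before we got there.
bool try_delete(entry *e)
{
    // Only tombstone the slot if the value is still NULL (still unlocked).
    // If another thread swapped in a new value in the meantime, the compare
    // fails and we simply walk away; whoever changed it knew what they
    // were doing.
    return __sync_bool_compare_and_swap(&e->value, (void *)0, TOMBSTONE);
}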

And, as I think about it, forcing users to do blind erases makes Intel TBB hash tables ALMOST unusable for an entire class of algorithms: any algorithm that needs to delete entries that could potentially be added back at any time. They really ought to provide an equivalent of a CAS: let the user say “delete this hash entry if its value is still equal to this.”

I say “ALMOST unusable” because it’s fixable. Consider the ramifications of having control over the hashing and equivalence functions: a key can be associated with a “deletable” flag that provides much of the needed functionality. With such a flag, the result of any find() operation can be considered invalid not only if find() returns false but also if the deletable flag associated with the result’s key is true. Essentially, finding something in the hash becomes:

while (hash.find(result, &findme) && result->first->deletable) {
    result.release(); // the entry we found is being deleted; let go and look again
}

It’s an extra layer of indirection, and it can cause a lookup to spin once or twice, but it works. The key type and its comparison struct must then look something like this:

typedef struct evilptrabstraction {
    bool deletable;
    void * key;
} epa_s;

typedef epa_s * epa;

struct EPAHashCompare {
    static size_t hash(const epa &x) {
        return (size_t)x->key; // or a more complex hash
    }
    static bool equal(const epa &x, const epa &y) {
        if (x->deletable && y->deletable) return true;  // deletable keys all match each other
        if (x->deletable || y->deletable) return false; // deletable never matches non-deletable
        return x->key == y->key;
    }
};

Note that anything marked deletable is equivalent, but doesn’t match anything non-deletable. Thus, safely deleting something becomes the following (assuming findme is a version of the epa struct not marked deletable):

accessor *result = new accessor();

bool found = hash.find(*result, &findme);
while (found && (*result)->first->deletable) {
    result->release(); // someone else is deleting this entry; drop it and retry
    found = hash.find(*result, &findme);
}

if (found) {
    (*result)->first->deletable = true; // concurrent find()s now treat it as gone
    delete result; // release the lock
    findme.deletable = true;
    hash.erase(&findme); // only matches entries already marked deletable
} else {
    delete result;
}

This opens the question of inserting safely, though: the key you’re inserting may already be in the hash, and if it is, it may be in the process of being deleted (i.e. it may already be marked deletable), or it may get marked deletable while you’re waiting for the lock on it. So how do you insert safely?

accessor result;
bool inserted = hash.insert(result, insertme);
// !inserted means insertme was already in the hash
while (!inserted && result->first->deletable) {
    // the entry we matched got marked deletable while we waited for its lock;
    // let go and re-insert (a deletable key won't match again, so this adds a fresh entry)
    result.release();
    inserted = hash.insert(result, insertme);
}
if (!inserted) delete insertme; // the address was already locked; discard our spare key

Note that we can’t simply toggle the deletable mark back off: an erase() operation may already have found the entry and be waiting for the lock on it, and it doesn’t expect the key to change out from under it while it waits, so flipping the flag back won’t stop the entry from being erased. The downside, of course, is that popularly erased and re-inserted items may cause a fair bit of memory churn, but that’s unavoidable with TBB’s bare-bones erase() interface.


Comments (3)

A:

I've been doing a lot of research into this subject lately. I've actually implemented my own HashTable based on CAS linked lists. Unfortunately, there are some serious problems one must deal with. The ABA problem forces most implementations to use a 64-bit CAS (cmpxchg8b), making ports to a 64-bit OS impossible. Others avoid this problem with costly garbage collection strategies. Another not-so-often-mentioned issue is that of memory barriers. Most CAS-based structures break on SMP systems that do out-of-order memory loads. Furthermore, all but the most simple structures are very hard to make lock-free. So far there are only reliable algorithms for queues, stacks, and linked lists. These are all roughly based on the same algorithm to boot.

a:

I would recommend taking a look at Doug Lea's ConcurrentHashMap implemented in Java. It uses an optimistic lock-free lookup; when met with contention, it acquires a traditional lock. Reads still lock the table; however, the table is split into several sub-tables to increase concurrency.

Kyle:

A: I'd be interested to see your CAS HashTable implementation. Even if it's only useful in limited circumstances, it may still be invaluable.

a: If reads still lock the table, I don't understand what the benefit of the optimistic design is, unless I misunderstand you. I'd be curious to see how he detects contention reliably... Striping (i.e. using sub-tables) is all well and good, and I already use it for qthreads, but generally the utility of the technique depends on the probability of contention, which in turn depends on how the hash is split up and what the typical behavior is. It feels like a hack rather than a solution, but maybe that's just me. :)

Thanks for responding!


