« Gmail, DKIM, and DomainKeys | Main | Google Breaks its own DKIM Signatures »

A C Lock-Free Hash Table Implementation

Well, I finally have one: a lock-free hash table implementation in C. The hash table I implemented is based on the work by Ori Shalev and Nir Shavit. Much of the code is similar to the example code they published in their paper Split-Ordered Lists: Lock-Free Extensible Hash Tables, back in 2006, but has been modified to support additional conveniences so it has a library-esque interface.

There is one problem unique to this algorithm: It doesn’t compact well if you go from lots and lots of entries back down to just a few entries. Not that it compacts horribly, but it doesn’t shrink all the way back down; it keeps some meta-data around (what the algorithm calls “dummy keys”). On the other hand, it does quite well if you tend to have a rapidly rotating set of keys (even with random-ish key values) but with a ceiling on the number of them in the hash table at any one time: it uses a static set of dummy keys in that case. In an effort to reduce the impact of the compaction problem, I didn’t implement the dynamic-array extension, so this implementation is performance-limited to around 2048 concurrent keys or so (though you can insert as many as you like, more than that will begin to get slow).

Of course, there are three problems with this algorithm (and implementation) that are pretty typical:

  1. It has the ABA problem. I think this can be solved (or at least mitigated) with generation counters, so I don’t see it as a big issue.
  2. It has the blind-delete problem. This is a really tough one. I think this can be solved, but my current best thinking on the matter ends up devolving to a highly-contended CAS operation, which is obviously sub-optimal.
  3. For simplicity, I didn’t use memory pools for the nodes, so I end up doing malloc/free calls on insertion and delete. This isn’t a correctness problem, but can become a performance bottleneck.

Anyway, without further ado, here’s the code. You can download it and compile it yourself; it is self-contained C99 code with a trivial (single-threaded) driver in main().

I’ve talked about concurrent hash tables before, here and here, and those discussions may be interesting to anyone who found this code interesting and/or useful. Having re-read Chris Purcells’ paper, I should point out that while he does have some very worthwhile code, his algorithm also has the compaction problem. He does, however, have a good summary of several designs, and of the drawbacks of each (primarily in the area of a need for garbage collection).

TrackBack

TrackBack URL for this entry:
https://www.we-be-smart.org/mt/mt-tb.cgi/782

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

About

This page contains a single entry from the blog posted on June 2, 2011 11:26 PM.

The previous post in this blog was Gmail, DKIM, and DomainKeys.

The next post in this blog is Google Breaks its own DKIM Signatures.

Many more can be found on the main index page or by looking through the archives.

Creative Commons License
This weblog is licensed under a Creative Commons License.
Powered by
Movable Type 3.34