PGI Compiler Bug

I ran across another PGI compiler bug that bears noting because it was so annoying to track down. Here’s the code:

static inline uint64_t qthread_cas64(
           volatile uint64_t *operand,
           const uint64_t newval,
           const uint64_t oldval)
{
    uint64_t retval;
    __asm__ __volatile__ ("lock; cmpxchg %1,(%2)"
        : "=&a"(retval) /* store from RAX */
        : "r"(newval),
          "r"(operand),
          "a"(oldval) /* load into RAX */
        : "cc", "memory");
    return retval;
}

Now, both GCC and the Intel compiler produce the code you would expect, something like this:

mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %rax,0xfffffffffffffff8(%rbp)

In essence, that’s:

  1. copy the newval into register %r12 (almost any register is fine)
  2. copy the operand into register %r13 (almost any register is fine)
  3. copy the oldval into register %rax (as I specified with “a”)
  4. execute the ASM I wrote (the compare-and-swap)
  5. copy register %rax to the variable I specified
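To make step 4 concrete, here is a plain-C model of what the compare-and-swap instruction does with RAX. This is only a sketch for reading along: it is not atomic, and the names `cmpxchg_model` and `cmpxchg_model_demo` are hypothetical, not part of the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Non-atomic model of `lock cmpxchg %src,(mem)`.
 * The `rax` parameter stands in for the RAX register. */
static inline uint64_t cmpxchg_model(uint64_t *mem, uint64_t src, uint64_t rax)
{
    if (*mem == rax) {
        *mem = src;   /* match: memory takes the new value, RAX is untouched */
    } else {
        rax = *mem;   /* mismatch: RAX receives the current memory contents */
    }
    return rax;       /* either way, RAX holds what was in memory beforehand */
}

/* Small self-check of both outcomes. */
static inline void cmpxchg_model_demo(void)
{
    uint64_t cell = 10;
    assert(cmpxchg_model(&cell, 20, 10) == 10 && cell == 20); /* swap happens */
    assert(cmpxchg_model(&cell, 30, 10) == 20 && cell == 20); /* swap refused */
}
```

This is why step 5 has to copy all of RAX: the caller compares the returned value against oldval to learn whether the swap happened.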

Here’s what PGI produces instead:

mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %eax,0xfffffffffffffff8(%rbp)

Notice the problem? The final store uses %eax instead of %rax, so only the lower 32 bits of my 64-bit CAS result get returned!
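A truncation like this is easy to miss with small test values, so a check needs operands whose upper 32 bits are nonzero. The following is a minimal sketch under some assumptions: the names `cas64` and `cas64_returns_full_width` are made up for illustration, the asm uses the conventional `=a`/`+m` constraint form rather than the exact constraints above, and the fallback for other targets assumes a compiler with the GCC-style `__sync_val_compare_and_swap` builtin.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit compare-and-swap; returns the previous contents of *operand. */
static inline uint64_t cas64(volatile uint64_t *operand,
                             uint64_t newval, uint64_t oldval)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t prev;
    __asm__ __volatile__ ("lock; cmpxchgq %2, %1"
        : "=a"(prev), "+m"(*operand)  /* result comes back in RAX */
        : "r"(newval), "0"(oldval)    /* oldval goes in via RAX */
        : "cc", "memory");
    return prev;
#else
    /* Assumption: GCC-style builtin, only so the sketch builds elsewhere. */
    return __sync_val_compare_and_swap(operand, oldval, newval);
#endif
}

/* Returns 1 if all 64 bits of the previous value come back, 0 otherwise. */
static inline int cas64_returns_full_width(void)
{
    volatile uint64_t cell = 0x1122334455667788ULL;
    uint64_t seen = cas64(&cell, 0xAABBCCDDEEFF0011ULL, 0x1122334455667788ULL);
    return seen == 0x1122334455667788ULL && cell == 0xAABBCCDDEEFF0011ULL;
}
```

With a 32-bit store like the one PGI emitted, only the low four bytes of the result variable get written, so the comparison against the full 64-bit constant should fail.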

The workaround is to do something stupid: be more explicit, and store %rax into the result by hand inside the asm. Like so:

static inline uint64_t qthread_cas64(
           volatile uint64_t *operand,
           const uint64_t newval,
           const uint64_t oldval)
{
    uint64_t retval;
    __asm__ __volatile__ ("lock; cmpxchg %1,(%2)\n\t"
            "mov %%rax,(%0)"
        :
        : "r"(&retval), /* address to store RAX into */
          "r"(newval),
          "r"(operand),
          "a"(oldval) /* load into RAX */
        : "cc", "memory");
    return retval;
}

This is stupid because it ties up an extra register just to hold the address of retval; it becomes something like this:

lea 0xfffffffffffffff8(%rbp),%rbx
mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %rax,(%rbx)

Obviously, not a killer (since it can be worked around), but annoying nevertheless.

A similar error happens in this code:

uint64_t retval;
__asm__ __volatile__ ("lock xaddq %0, (%1)"
    :"+r" (retval)
    :"r" (operand)
    :"memory");

It would appear that PGI completely ignores the bit width of output operands!
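The same explicit-store trick can presumably be applied to the xadd case. What follows is a sketch only, and it has not been tested against PGI: the name `fetch_add64` and the `incr` parameter are hypothetical, RAX is used as a scratch register (declared as a clobber) so that no input register is modified, and the fallback for other targets assumes the GCC-style `__sync_fetch_and_add` builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Atomically adds incr to *operand; returns the value *operand held before. */
static inline uint64_t fetch_add64(volatile uint64_t *operand, uint64_t incr)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t retval;
    __asm__ __volatile__ ("movq %1, %%rax\n\t"           /* scratch copy of incr */
                          "lock; xaddq %%rax, (%2)\n\t"  /* RAX <- old value */
                          "movq %%rax, (%0)"             /* explicit 64-bit store */
        :
        : "r"(&retval), /* address to store the old value into */
          "r"(incr),
          "r"(operand)
        : "rax", "cc", "memory");
    return retval;
#else
    /* Assumption: GCC-style builtin, only so the sketch builds elsewhere. */
    return __sync_fetch_and_add(operand, incr);
#endif
}

/* Self-check with a value that carries into the upper 32 bits. */
static inline void fetch_add64_demo(void)
{
    volatile uint64_t cell = 0x00000001FFFFFFFFULL;
    assert(fetch_add64(&cell, 1) == 0x00000001FFFFFFFFULL);
    assert(cell == 0x0000000200000000ULL);
}
```

As with the CAS workaround, the cost is extra registers: one for the address of retval and RAX as scratch.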

About

This page contains a single entry from the blog posted on June 16, 2010 11:01 AM.

Creative Commons License
This weblog is licensed under a Creative Commons License.