
June 2010 Archives

June 16, 2010

PGI Compiler Bug

I ran across another PGI compiler bug that bears noting because it was so annoying to track down. Here’s the code:

static inline uint64_t qthread_cas64(
           volatile uint64_t *operand,
           const uint64_t newval,
           const uint64_t oldval)
{
    uint64_t retval;
    __asm__ __volatile__ ("lock; cmpxchg %1,(%2)"
        : "=&a"(retval) /* store from RAX */
        : "r"(newval),
          "r"(operand),
          "a"(oldval) /* load into RAX */
        : "cc", "memory");
    return retval;
}

Now, both GCC and the Intel compiler produce the code you would expect, something like this:

mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %rax,0xfffffffffffffff8(%rbp)

In essence, that’s:

  1. copy the newval into register %r12 (almost any register is fine)
  2. copy the operand into register %r13 (almost any register is fine)
  3. copy the oldval into register %rax (as I specified with “a”)
  4. execute the ASM I wrote (the compare-and-swap)
  5. copy register %rax to the variable I specified
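To make step 4 concrete, here is a plain-C model of what the cmpxchg instruction does with its operands. It is only a sketch: it is not atomic, the name cas64_model is made up for illustration, and the local variable rax merely stands in for the register. The point is that RAX holds the previous contents of memory whether or not the swap happens, which is why the return value is read from RAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-atomic model of `lock cmpxchg newval,(operand)` (illustration only;
 * cas64_model is a hypothetical name, not part of qthreads). */
static uint64_t cas64_model(uint64_t *operand, uint64_t newval, uint64_t oldval)
{
    assert(operand != NULL);

    uint64_t rax = oldval;      /* the "a" input constraint loads oldval into RAX */

    if (*operand == rax) {
        *operand = newval;      /* match: memory receives newval, RAX is unchanged */
    } else {
        rax = *operand;         /* mismatch: RAX receives the current contents */
    }
    return rax;                 /* either way, RAX is what memory held before */
}
```

A caller can tell whether the swap happened by comparing the return value with the oldval it passed in.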

Here’s what PGI produces instead:

mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %eax,0xfffffffffffffff8(%rbp)

Notice the problem? The last instruction stores %eax instead of %rax, so only the lower 32 bits of my 64-bit CAS result get returned!
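A truncated return value like this is easy to miss when test values fit in 32 bits. The sketch below shows one way to catch it: run the CAS on a value whose upper half is non-zero and compare the full result. All names here (cas64_fn, reference_cas64, truncating_cas64, cas64_returns_full_width) are hypothetical. The reference implementation assumes the compiler provides the GCC-style __sync_val_compare_and_swap builtin, which may not hold for every compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature shared by every CAS implementation under test. */
typedef uint64_t (*cas64_fn)(volatile uint64_t *, uint64_t, uint64_t);

/* Reference CAS built on the GCC-style builtin (assumed to be available). */
static uint64_t reference_cas64(volatile uint64_t *operand,
                                uint64_t newval, uint64_t oldval)
{
    return __sync_val_compare_and_swap(operand, oldval, newval);
}

/* Deliberately broken CAS that keeps only the low 32 bits of the result.
 * Simplification: the real miscompile leaves the upper 4 bytes of the
 * result untouched rather than zeroing them. */
static uint64_t truncating_cas64(volatile uint64_t *operand,
                                 uint64_t newval, uint64_t oldval)
{
    return (uint32_t)__sync_val_compare_and_swap(operand, oldval, newval);
}

/* Returns 1 if `cas` hands back all 64 bits of the previous value, else 0. */
static int cas64_returns_full_width(cas64_fn cas)
{
    assert(cas != NULL);

    const uint64_t initial = 0xDEADBEEF00000001ULL; /* upper half is non-zero */
    const uint64_t updated = 0x0123456789ABCDEFULL;
    volatile uint64_t cell = initial;

    /* Failing CAS (oldval does not match): must return the whole value
     * and leave memory alone. */
    if (cas(&cell, 0, 0x1ULL) != initial) return 0;
    if (cell != initial) return 0;

    /* Succeeding CAS: must return the whole old value and store the new one. */
    if (cas(&cell, updated, initial) != initial) return 0;
    return cell == updated;
}
```

Passing the inline-assembly qthread_cas64 to cas64_returns_full_width in the same way would flag a compiler that drops the upper 32 bits.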

The workaround is to do something stupid: be more explicit. Like so:

static inline uint64_t qthread_cas64(
           volatile uint64_t *operand,
           const uint64_t newval,
           const uint64_t oldval)
{
    uint64_t retval;
    __asm__ __volatile__ ("lock; cmpxchg %1,(%2)\n\t"
            "mov %%rax,(%0)"
        :
        : "r"(&retval), /* address that RAX gets stored to */
          "r"(newval),
          "r"(operand),
          "a"(oldval) /* load into RAX */
        : "cc", "memory");
    return retval;
}

This is stupid because it requires an extra register to hold the address of retval; the generated code becomes this:

mov 0xfffffffffffffff8(%rbp),%rbx
mov 0xffffffffffffffe0(%rbp),%r12
mov 0xffffffffffffffe8(%rbp),%r13
mov 0xfffffffffffffff0(%rbp),%rax
lock cmpxchg %r12,0x0(%r13)
mov %rax,(%rbx)

Obviously, not a killer (since it can be worked around), but annoying nevertheless.

A similar error happens in this code:

uint64_t retval;
__asm__ __volatile__ ("lock xaddq %0, (%1)"
    :"+r" (retval)
    :"r" (operand)
    :"memory");

It would appear that PGI completely ignores the bit width of output data!
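The same kind of check works for the fetch-and-add case. xadd leaves the previous contents of memory in the register operand and the sum in memory, so a counter that crosses the 32-bit boundary makes a dropped upper half visible. This is a sketch under the same assumptions as before: fetch_add64 and fetch_add64_returns_full_width are made-up names, and the GCC-style __sync_fetch_and_add builtin stands in for the inline assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Atomic fetch-and-add: memory receives the sum, the caller gets the old
 * value. Uses the GCC-style builtin (assumed to be available). */
static uint64_t fetch_add64(volatile uint64_t *operand, uint64_t incr)
{
    assert(operand != NULL);
    return __sync_fetch_and_add(operand, incr);
}

/* Returns 1 if the old value comes back with all 64 bits intact, else 0. */
static int fetch_add64_returns_full_width(void)
{
    volatile uint64_t counter = 0xFFFFFFFFULL;   /* one below 2^32 */

    /* The first add carries into bit 32; the old value still fits in 32 bits. */
    if (fetch_add64(&counter, 1) != 0xFFFFFFFFULL) return 0;

    /* The old value is now 0x100000000. Its low 32 bits are all zero, so a
     * truncated return would not match. */
    if (fetch_add64(&counter, 1) != 0x100000000ULL) return 0;

    return counter == 0x100000001ULL;
}
```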
