At work we are getting a 64-bit version of our software up and running at the moment. Most of the usual culprits reared their heads, such as assuming that a pointer and an integer have the same size.

One more interesting one, which I'd not come across before, is related to using STL string::find and the special constant string::npos. A quick Google search shows that this is not unique to our code base, and it actually just boils down to data being truncated before a comparison. The nuances of the problem do lead on to a discussion about signed vs. unsigned integral types in C++ and the handling of comparisons between differently sized data types. I thought it was worth looking at a bit further, and it is definitely something to watch out for when doing code reviews.

It could also make for a particularly challenging interview question 😉

string::npos

The following code snippet exhibits the problem:

#include <string>
#include <iostream>
using namespace std;

int main(void)
{
   string a = "hello";
   unsigned int pos = a.find("foo");
   if(pos == string::npos)
      cout << "foo not found" << endl;
   else
      cout << "foo found" << endl; 
   return 0;
}

If you compile (g++ -m32 -o test test.cpp) and run that code at 32-bit it will output:

 $./test
 foo not found

If you compile (g++ -m64 -o test test.cpp) and run that code at 64-bit it will output:

 $./test
 foo found

Certainly not what was expected! gcc will actually warn you to expect a problem:

warning: comparison is always false due to limited range of data type

What is going on?

The problem comes from the fact that string::npos is of type size_t and not unsigned int. If you do the correct thing and change the type of pos to be size_t then the code works as expected.

If you read the documentation for string::npos, it contains two pieces of information relevant to this problem:

npos is a static member constant value with the greatest possible value for an element of type size_t.

and

This constant is actually defined with a value of -1 (for any trait), which because size_t is an unsigned integral type, becomes the largest possible representable value for this type.

From the definition of string::npos we can see that its value will be dependent upon sizeof(size_t). So what do we see at 32-bit and 64-bit:

  • At 32-bit sizeof(size_t)==4 and hence npos has a value of 0xFFFFFFFF
  • At 64-bit sizeof(size_t)==8 and hence npos has a value of 0xFFFFFFFFFFFFFFFF

From this point onwards it should hopefully be fairly obvious as to why our comparison in the code above is misbehaving. Putting the return value of string::find into an unsigned integer causes it to be truncated to 4 bytes when compiled at 64-bit, which will then cause the subsequent comparison to fail.

When comparing two unsigned integral types of differing sizes the smaller one is converted to match the larger (the "usual arithmetic conversions"), so the following happens:

  1. The 32-bit unsigned integer is converted to a 64-bit unsigned integer by padding with zeros
  2. The 32-bit value of 0xFFFFFFFF becomes 0x00000000FFFFFFFF
  3. 0x00000000FFFFFFFF is then compared to 0xFFFFFFFFFFFFFFFF, which fails

signed, unsigned conversions and promotion

If pos had been declared as an int rather than an unsigned int then this code actually behaves as expected, albeit whilst generating a compiler warning (more on that later). The reason for this requires digging down into a few details of the C++ language spec.

Two factors come into play when pos is an integer:

  1. We are comparing data types of a different size
  2. We are comparing a signed value against an unsigned value

A similar process to the one outlined above happens, but the crucial difference is the fact that we are now working with a signed type rather than an unsigned type. The following happens when using a signed integer:

  1. The 32-bit signed integer is converted to a 64-bit signed integer with the sign bit extended
  2. Hence the 32-bit value of 0xFFFFFFFF becomes 0xFFFFFFFFFFFFFFFF
  3. The 64-bit signed value is converted to unsigned, which doesn't actually change the bit pattern
  4. The comparison now behaves as expected

So the crucial difference here is how signed and unsigned values are treated:

  1. When an unsigned value is promoted up to a larger data type it is zero padded
  2. When a signed value is promoted up the sign bit is extended out if it is set

The other time the difference between signed and unsigned values becomes important is when doing a right bit shift operation. The same rules apply as to what the bits on the left hand side get set to. See Arithmetic Shift vs. Logical Shift for more details.

Conclusions

This leads onto an interesting point and one which I suspect explains how the code ended up the way it was. For people who aren't aware of the fact that string::find returns data of type size_t rather than an int, I imagine the following situation occurs:

  1. Write the code to store the value in a (signed) int
  2. The compiler issues a warning about signed/unsigned comparisons
  3. The code is changed to use an unsigned int and the warning goes away

It just goes to show that assumptions about data types can really come back to bite you. I guess the main thing to say about all of this is that compiler warnings are there for a reason. They are your friend: understand what they are complaining about and fix it! And fix it properly, don't just do something kludgy to silence the warning.

5 thoughts on “string::npos, integers and 64-bit applications”

    • Glad to help! It struck me as something worth writing about - mainly given the strange behaviour you can accidentally trigger.

  1. Actually there is a proper way to do this for both 32-bit and 64-bit, and that is to use the correct std::string::size_type. An example would be:

    #include <string>
    #include <iostream>
    using namespace std;

    int main(void)
    {
       std::string a = "hello";
       std::string::size_type pos = 0;

       // Now pos has the correct signedness and width for the platform
       pos = a.find("foo");
       if(pos == std::string::npos)
          cout << "foo not found" << endl;
       else
          cout << "foo found" << endl;
       return 0;
    }

    The issue is that std::string::npos is defined as -1 but its type is unsigned, so -1 wraps around to the maximum value of that type. A comparison with a narrower unsigned pos will not match properly because, and I quote, "unsigned types can not have negative values!".

    You would then need to:
    1. not use unsigned for std::string positions (pos).
    2. cast std::string::npos to signed.

    int pos = 0; // signed by default

    Example: (pos == (signed)std::string::npos).

    But that's dumb, so it's much easier and correct to use the std::string::size_type when using std::strings and indexing positions.

    Hope this helps.
