String’s length

Let’s start with a small test. Is the equivalence expressed by the following assertion always correct?

#include <cassert>
#include <cstring>
#include <string>

void test_length(std::string const& s)
{
  assert(s.length() == std::strlen(s.c_str()));
}

It is not (otherwise I wouldn’t be mentioning it in this post), but do you know why it is wrong?

If you have run a couple of tests and observed that the assertion doesn’t fail, you may have become convinced that it is fine. And it is, for some strings, but definitely not for every string.

The crux is in how the length of the string is computed. In C, you count characters until you reach the first null character. The same applies in C++ to types like const char*. For std::string, the length is stored separately and does not depend on the values of the contained characters. You can have as many null characters as you like, and none of them marks the end of the string! (Each '\0' does count as one character and adds 1 to the length, but it does not indicate where the string ends.)
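A minimal sketch of the difference: the function name embedded_null_demo is made up for illustration. It appends a '\0' into the middle of a std::string and compares the stored size with the C-style count obtained through strlen.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// The length of a std::string is stored separately from its characters,
// so a '\0' pushed into the middle is counted like any other character.
void embedded_null_demo()
{
    std::string s = "ab";      // size() == 2
    s.push_back('\0');         // size() == 3: the null is part of the contents
    s.push_back('c');          // size() == 4

    assert(s.size() == 4);                 // every character counts, '\0' included
    assert(std::strlen(s.c_str()) == 2);   // C-style counting stops at the first '\0'
    assert(s[3] == 'c');                   // the characters after the null are still there
}
```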

Surprised? You might be, because a std::string is often initialized from a null-terminated character string, and its value is often used as one when c_str is called; nonetheless, a std::string is not a null-terminated character string.

Is it possible to create a std::string with null characters inside? If you do this:

std::string s ("\0\0test");

The length of s is 0. That is because this constructor assumes its input is a null-terminated character string. Try the one that takes the length explicitly:

std::string s ("\0\0test", 6);

Now, its length is 6, and our test_length will fail.
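The two constructors can be compared side by side in a short sketch (constructor_demo is an illustrative name). It also uses the standard "s" suffix from std::literals::string_literals (C++14), which, like the (pointer, count) constructor, receives the literal's full length and therefore keeps the embedded nulls.

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;  // enables the standard "..."s suffix (C++14)

void constructor_demo()
{
    std::string a("\0\0test");     // const char* constructor: stops at the first '\0'
    std::string b("\0\0test", 6);  // (pointer, count) constructor: copies exactly 6 chars
    auto c = "\0\0test"s;          // the standard literal operator gets the length too

    assert(a.size() == 0);
    assert(b.size() == 6);
    assert(c == b);                // same six characters, nulls included
}
```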

But why would anyone put a null character inside a string? Well, this is what a string is supposed to be: a sequence of characters, and 0 is just one more character. A string is not necessarily meant to be displayed. A colleague of mine actually ran into this problem: a raw buffer obtained from a socket was converted to a string, and it had some header information in front that contained zeros. Trying to log the content produced empty text, even though there was obviously some content there.
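Logging such a buffer through c_str() stops at the first zero in the header. Below is a minimal sketch of one way around it, assuming a hypothetical helper escape_for_log (not part of any library) that writes non-printable bytes as \xHH escapes; the 4-byte array merely stands in for data read from a socket.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical logging helper: renders every byte of the string, writing
// non-printable bytes (such as '\0') as \xHH escapes.
std::string escape_for_log(std::string const& s)
{
    std::string out;
    for (unsigned char ch : s) {
        if (ch >= 0x20 && ch < 0x7f) {
            out += static_cast<char>(ch);       // printable ASCII: copy as-is
        } else {
            char buf[5];                        // "\xHH" plus terminator
            std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(ch));
            out += buf;
        }
    }
    return out;
}

void log_demo()
{
    // Simulated raw buffer: a 2-byte header containing a zero, then the payload.
    const char raw[] = {'\0', '\x01', 'h', 'i'};
    std::string msg(raw, sizeof raw);              // keeps all 4 bytes

    assert(msg.size() == 4);
    assert(std::string(msg.c_str()).empty());      // what a c_str()-based log shows
    assert(escape_for_log(msg) == "\\x00\\x01hi"); // every byte is visible
}
```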

Keeping this in mind helps to understand why a raw literal operator for numeric user-defined literals has this signature:

double operator "" _d (const char* value);

By contrast, a literal operator for string literals has this signature:

std::string operator "" _s (const char* text, size_t len);

This is because the characters of a numeric literal can never include '\0', whereas someone might want to create a string containing a null character:

test_length("one\0two"_s);
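A self-contained sketch of both literal operators follows. The names _d and _s come from the signatures above, but their bodies are assumptions: _d simply parses its text with std::strtod, and _s builds the string from the pointer and the length the compiler supplies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Raw literal operator for numeric literals: receives the spelling of the
// number (e.g. "2.5"), which can never contain '\0', so no length is needed.
double operator""_d(const char* value)
{
    return std::strtod(value, nullptr);
}

// Literal operator for string literals: the compiler also passes the length
// (excluding the implicit terminating '\0'), so embedded nulls survive.
std::string operator""_s(const char* text, std::size_t len)
{
    return std::string(text, len);
}

void literal_demo()
{
    std::string s = "one\0two"_s;
    assert(s.size() == 7);                // all seven characters, '\0' included
    assert(std::strlen(s.c_str()) == 3);  // C-style counting stops after "one"
    assert(2.5_d == 2.5);                 // the raw operator received the text "2.5"
}
```

With the string from literal_demo, test_length would compare 7 against 3 and its assertion would fire.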
This entry was posted in programming.

18 Responses to String’s length

  1. Mike says:

    > Is the following assertion correct?

    I think you should insert the word ‘always’ into this sentence somewhere.

  2. vinayakgarg says:

    Reblogged this on Tech Blog and commented:
    A simple, short gotcha, worth knowing.

  3. JD says:

    I always enjoy reading your blog. Thanks Andrzej!

  4. Casey says:

    > You can have as many null characters as you like and they do not affect the string length!

    I believe each additional null character will in fact increase the string length by one. (Yes, I understand from context that the meaning is “The length of the string isn’t affected by the values of the characters it contains, null or otherwise,” but that’s not what it actually says.)

  5. John says:

    Why would a string literal operator return double?

  6. Raphael Miedl says:

    I wonder if the string constructor and assignment operator should be changed to have a template overload taking const char arrays of any size. That would at least allow a string to determine the real length of a literal (even one with embedded nulls) when it’s passed directly, leaving an already-decayed char* as the only exception…

    • John says:

      This would break a lot of existing code. Consider a function which declares an array of char and manipulates it – perhaps it passes it as a decayed pointer to a function which will assign something into it – and then calls a function taking a const std::string& using this array as that parameter. That doesn’t sound like a crazy use case to me, and you just broke that code.
      No, the length needs to be a separate parameter, and it is. The only advantage I could see of deducing the array length would be that one could insist on not reading past the end of the array (perhaps throw an exception if the provided length is too long). But that would also break some (questionable) code that was intentionally overflowing a char array, perhaps within a packed struct (such code is out there in the wild – I’ve seen it).
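Both the suggestion and the objection can be seen in a small sketch. from_char_array is a hypothetical free function, not an actual std::string overload: it deduces the array size N and keeps all N - 1 characters before the final element, which suits literals with embedded nulls but misreports a partly filled buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of std::string): deduce the array size N
// and keep every character except the last element, embedded nulls included.
template <std::size_t N>
std::string from_char_array(const char (&arr)[N])
{
    return std::string(arr, N - 1);
}

void array_demo()
{
    // A literal passed directly: 7 characters, the embedded null included.
    assert(from_char_array("one\0two").size() == 7);

    // The objection: a buffer only partly filled with text.
    char buf[16] = "cat";                        // the remaining bytes are '\0'
    assert(from_char_array(buf).size() == 15);   // not 3: the unused tail is included
}
```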

  7. concerned reader says:

    I actually bumped into this while working with Protocol Buffers, where they use `std::string` as a generic-sequence-of-bytes type.

  8. cyber_fusion says:

    Would a sequence of characters not be better represented by vector<char>?

    • A fair question. They are quite similar, but there are small differences. std::string is allowed to use the small buffer optimization; vector is not. std::string is required to store an extra zero element at the end; vector is not. Is that enough to justify the existence of two separate abstractions? I do not know.
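The trailing-zero difference can be checked portably (the small buffer optimization cannot, since it is an implementation choice). A sketch, with terminator_demo as an illustrative name:

```cpp
#include <cassert>
#include <string>
#include <vector>

void terminator_demo()
{
    std::string s = "CAT";
    std::vector<char> v(s.begin(), s.end());   // same three characters

    assert(s.size() == 3 && v.size() == 3);
    // Since C++11, s[s.size()] is guaranteed to be '\0'.
    assert(s[s.size()] == '\0');
    // vector<char> makes no such promise (v[v.size()] is out of bounds),
    // so a terminator must be appended explicitly before passing it to C code.
    v.push_back('\0');
    assert(v.size() == 4);                     // the terminator now counts in size()
}
```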

  9. Rawnaqi Beer says:

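    You should change the return type of the function from void to int:

    #include <cassert>
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    #include <string>
    using namespace std;

    string makeText(); // defined in my next comment

    int test_length(std::string const& s)
    {
      assert(s.length() == strlen(s.c_str()));
      return static_cast<int>(s.length()); // return the length so main can print it
    }

    int main()
    {
      string t = makeText();
      cout << t << endl;
      int l = test_length(t);
      cout << "The Length of String is : " << l << endl;
      system("PAUSE"); // Windows-only: waits for a key press
      return 0;
    }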

  10. Rawnaqi Beer says:

    string makeText()
    {
    string s("Hello World!"); // C++11 initialization of string; alternatively s = "Hello World";
    // and likewise with braces { }
    return s;
    }

    int main()
    {
    string t = makeText();
    cout << t << endl;
    }

  11. Rawnaqi Beer says:

    string makeText()
    {
    string s("Hello World!"); // initialization of string; alternatively s = "Hello World";
    // but not with braces { }: that is an error
    return s;
    }

    int main()
    {
    string t = makeText();
    cout << t << endl;
    }

  12. Hi Andrzej,
    I came across this post.
    My question: in C++ we have both string and char arrays. A char array will be null-terminated, right?
    Also, when I have a string object and copy it into a char array, should I add a line that writes a null terminator? For example, see this code snippet:

       #include <iostream>
       #include <string>
       using namespace std;

       int main() {
         string str;
         // Declaring character array
         char ch[80];

         getline(cin, str);
         cout << str.length() << endl;
         // using copy() to copy elements into char array
         str.copy(ch, str.length(), 0);
         cout << ch << endl;
       }
    
    • If you are using C-strings, then you need the trailing zero to indicate the size of the string. This trailing zero occupies storage but does not count toward the string size:

      const char arr [] = "CAT";
      assert (sizeof(arr) == 4);
      assert (strlen(arr) == 3);
      

      The situation is quite similar with std::string: even though it stores its size separately, it still has the trailing zero:

      const std::string str = "CAT";
      assert (str.size() == 3);
      assert (strlen(str.c_str()) == 3);
      assert (str.c_str()[3] == '\0');
      

      If you need to convert a std::string into a C-string, you do need to make sure the trailing zero is written, but you can do it in one instruction by copying one more character from the source string:

      char cstr [16];
      const std::string str = "CAT";
      strncpy(cstr, str.c_str(), str.size() + 1);
      assert(strlen(cstr) == 3);
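The same can be done with std::string::copy, the function used in the question, but copy() never writes a terminator, so the caller has to. A sketch assuming a hypothetical helper copy_to_buffer that also truncates instead of overflowing a fixed-size buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy a std::string into a fixed-size char array,
// truncating if necessary, and always write the terminating '\0'.
template <std::size_t N>
std::size_t copy_to_buffer(std::string const& str, char (&buf)[N])
{
    std::size_t n = str.copy(buf, N - 1);  // copies at most N - 1 chars, returns the count
    buf[n] = '\0';                         // copy() itself does not add a terminator
    return n;
}

void copy_demo()
{
    char ch[80];
    assert(copy_to_buffer(std::string("CAT"), ch) == 3);
    assert(std::strlen(ch) == 3);

    char small[4];
    assert(copy_to_buffer(std::string("CATERPILLAR"), small) == 3);  // truncated
    assert(std::strcmp(small, "CAT") == 0);
}
```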
      
  13. Pingback: Does std::string have a null terminator? - Fallosweb.com
