Reversing c++ string and QString

After the overview of Rust strings and their internal substructures, let's see whether C++ QString storage is lighter. But first, let's take a look at the C++ standard string object:



At first sight we can see the allocation and deallocation code generated by clang++, and DAT_00400d34 holds the string.

If we implement the same algorithm as the Rust code, but in C++:



The decompilation layout is different. Note that Ghidra analyzes C++ binaries very quickly, whereas with Rust binaries it struggles for a while.
Locating main is also very simple in a compiled C++ binary; indeed, C++ is more low-level than Rust.


The byte array is initialized with a single move instruction:
        00400c4b 48 b8 68        MOV        RAX,0x6f77206f6c6c6568

Then basic_string builds the string. In Rust this was a crazy, endless chain of calls that Ghidra detected as runtime code; here, by contrast, basic_string is an external imported function that is not included in the binary.

(gdb) x/x 0x7fffffffe1d0
0x7fffffffe1d0: 0xffffe1e0        str ptr (low dword)
0x7fffffffe1d4: 0x00007fff        str ptr (high dword)
0x7fffffffe1d8: 0x0000000b        sz = 11 (low dword)
0x7fffffffe1dc: 0x00000000        sz (high dword)
0x7fffffffe1e0: 0x6c6c6568        "hell"  (start of "hello world")
0x7fffffffe1e4: 0x6f77206f        "o wo"
0x7fffffffe1e8: 0x00646c72        "rld" + null terminator
0x7fffffffe1ec: 0x00000000
(gdb) x/s 0x7fffffffe1e0
0x7fffffffe1e0: "hello world"

The string lives on the stack, and it is curious to see what happens when two strings are declared one after the other, like these:

  auto s = string(cstr);
  string s2 = "test";

Clang places both stack strings together:
[ptr1][sz1][string1][null][string2][null][ptr2][sz2]

C++ QString datatype

Let's look at the feature-rich QString class, defined in qstring.cpp and qstring.h.

Some QString methods use the QCharRef class whose definition is below:

class Q_EXPORT QCharRef {
    friend class QString;
    QString& s;
    uint p;
    // ... member functions omitted
};


Searching for the data members of the QString class, I realized that one improvement Rust and Go make is separating data from methods: in the large QString class the fields are hidden among hundreds of methods, but basically the storage is a single QStringData *.

After stripping the methods from the QStringData definition, we have this:

struct Q_EXPORT QStringData : public QShared {
    QChar *unicode;
    char *ascii;
#ifdef Q_OS_MAC9
    uint len;
#else
    uint len : 30;
#endif
    uint issimpletext : 1;
#ifdef Q_OS_MAC9
    uint maxl;
#else
    uint maxl : 30;
#endif
    uint islatin1 : 1;

};

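Which is pretty clear: there is a QChar pointer and a char pointer, and the width of the unsigned length len is decided at compile time (a full uint on Q_OS_MAC9, a 30-bit bit-field elsewhere; maxl, the allocated size, follows the same rule).

So the storage is two pointers plus the len and maxl fields with their two 1-bit flags, on top of the reference count inherited from QShared.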

QChar, in turn, holds a single ushort:

private:
    ushort ucs;


plus a set of enums describing the type of character (category, direction, and so on).

ucs is the two-byte Unicode (UCS-2) value. Latin-1 uses only one byte, so if the ushort is greater than 0xff the Latin-1 conversion returns 0.

The QChar class also implements some quite handy methods to classify the stored character.

But let's see the real storage of a QString object.
GDB does not recognize the object well: print and display show it incompletely.




KDbg displays some of the substructures, along with the text length:


In practice, the QString object holds a pointer (two dwords) to the metadata, followed by a pointer to the end and then the null-terminated string text.



The pointer 0x55555576f6a0 points to the metadata:



Here we have some metadata, like the string size, type of data, and so on.


So, is it possible to overwrite the metadata of a string and perform a QString overflow? The answer is yes: if the preceding string is a controlled char * buffer, overflowing it lets us modify the metadata pointer so that it points to our buffer.


Binary size benchmark:
Some curious stats about the binary file size. They are not conclusive, because some binaries include standard libraries inside the file.

Binary file size (bytes):
Turbo Pascal          2,928
C++ string           21,256
Free Pascal          32,256
C++ QString         28,0496
Go (golang)       2,020,112
Rust              2,625,776



The Rust hello world binary is about 2,625,776 bytes, while the C++ one is 21,064 bytes.
