A common data structure in C++ is the vector, which is a misnamed variable length array.
Here's a simplified common bit of code you might see:
void append(size_t n, std::vector &vec) {
for (size_t i = vec.size(); i < n; i++) {
vec.push_back(i);
}
}
I see this a lot, we all see this a lot, because programmers are lazy and don't think about memory allocation.
Why is this bad? Because Vector::pushback
checks to see if
vec.capacity() == vec.size()
and then reallocates the entire array if true.
You might reallocate multiple times in this loop depending on n, and you _know*
n
, so you may as well allocate the memory once, at the beginning.
void append(size_t n, std::vector &vec) {
size_t sz = vec.size();
vec.reserve(n);
for (size_t i = sz; i < n; i++) {
vec.push_back(i);
}
}
This is a pretty simple example, but I got in a discussion at work about it because it was discussed in This 2014 CppCon Talk by Chandler Carruth, one of the project leads on the new Carbon Language.
Specifically, I was bothered by the fact that Carruth, who is an expert on compiler optimization was recommending it. I expected that the compiler, which I worship like a God, could have made that change for you. Apparently not.
My boss, on the other hand, said he wouldn't trust anyone making that recommendation. He said you should use resize and bracket notation assignment.
I spent about 30-45 minutes with a co-worker trying to understand why resize would be faster than reserve and missed the forest for the trees.
There are two steps in both my boss's recommendation
void append(size_t n, std::vector &vec) {
size_t sz = vec.size();
vec.resize(n);
for (size_t i = sz; i < n; i++) {
vec[i] = i;
}
}
and Carruth's recommendation:
The difference between resize and reserve is that reserve allocates memory, but resize initializes it.
The key difference that we discovered, is that resize adjusts the size of the vector, where reserve does not. So
vec.size() == n / 10; // true
vec.resize(n);
vec.size() == n; // true
while
vec.size() == n / 10; // true
vec.reserve(n);
vec.size() == n / 10; // still true
So
vec.size() == n / 10; // true
vec.reserve(n);
vec.push_back(10);
vec.size() == (n / 10) + 1; // true
However, you cannot:
void append(size_t n, std::vector &vec) {
size_t sz = vec.size();
vec.reserve(n);
for (size_t i = sz; i < n; i++) {
vec[i] = i;
}
vec.size() == n; // false
}
But obviously, allocation + initialization (resize
) is slower than allocation
alone (reserve
), right?
With some common sense, it's obvious that bracket notation is faster than
push_back
: push_back
has to increment the size, check that it's still less
than capacity, and if its not, reallocate (the reallocation solved by the
reserve, but still). The bracket notation directly accesses the underlying
memory and just puts it in there.
The only way that resize + [] is faster than reserve + push_back is if the combined overhead of push_back overwhelms the initialization of resize, right? This is probably the case for some large n, but not for some small n I'd assume, since memory reallocation is expensive (you have to allocate a new block of memory and copy the whole array to the new block!).
Except, that's not true.
resize()
initializes to zero unless you specify a value. And here's the thing
about zero-initialization: you can use calloc:
For large allocations, most calloc implementations under mainstream OSes will get known-zeroed pages from the OS (e.g. via POSIX mmap(MAP_ANONYMOUS) or Windows VirtualAlloc) so it doesn't need to write them in user-space. This is how normal malloc gets more pages from the OS as well; calloc just takes advantage of the OS's guarantee.
That means zero-allocation is equivalent (or near equivalent) to uninitialized
allocation. So you get the initialization for free, malloc == calloc
. Now the
only difference between the two patterns is push_back
's overhead, which is
obviously greater than bracket notation.
Use
void append(size_t n, std::vector &vec) {
size_t sz = vec.size();
vec.resize(n);
for (size_t i = sz; i < n; i++) {
vec[i] = i;
}
}
not
void append(size_t n, std::vector &vec) {
size_t sz = vec.size();
vec.reserve(n);
for (size_t i = sz; i < n; i++) {
vec.push_back(i);
}
}
My boss was right, as usual.