Section 10:
[10.10] What about returning a local variable by value? Does the local exist as a separate object, or does it get optimized away?
When your code returns a local variable by value, your compiler might optimize away the local variable completely, with zero space cost and zero time cost: the local variable never actually exists as an object distinct from the caller's target variable (see below for specifics about exactly what this means). Other compilers do not optimize it away. These are some(!) of the compilers that optimize away the local variable completely:
These are some(!) of the compilers that do not optimize away the local variable:
Here is an example showing what we mean in this FAQ:
class Foo {
public:
    Foo(int a, int b);
    void some_method();
    ...
};

void do_something_with(Foo& z);

Foo rbv()
{
    Foo y = Foo(42, 73);
    y.some_method();
    do_something_with(y);
    return y;
}

void caller()
{
    Foo x = rbv();
    ...
}
The question addressed in this FAQ is this: How many Foo objects actually get created at runtime? Conceptually there could be as many as three distinct objects: the temporary created by Foo(42, 73), variable y (in rbv()), and variable x (in caller()). However, as we saw earlier, most compilers merge Foo(42, 73) and variable y into the same object, reducing the total number of objects from 3 to 2. This FAQ pushes it one step further: does y (in rbv()) exist at runtime as an object distinct from x (in caller())?
Some compilers, including but not limited to those listed above, completely optimize away local variable y. In those compilers, there is only one Foo object in the above code: caller()'s variable x is the very same object as rbv()'s variable y. They do this the same way as described earlier: the return-by-value in function rbv() is implemented as pass-by-pointer, where the pointer points to the location where the returned object is to be initialized. So instead of constructing y as a local object, these compilers simply construct *put_result_here, and every time they see variable y used in the original source code, they substitute *put_result_here instead. Then the line return y; becomes simply return; since the returned object has already been constructed in the location designated by the caller. Here is the resulting (pseudo)code:
// Pseudo-code
void rbv(void* put_result_here)                   // Original C++ code: Foo rbv()
{
    Foo_ctor((Foo*)put_result_here, 42, 73);      // Original C++ code: Foo y = Foo(42,73);
    Foo_some_method(*(Foo*)put_result_here);      // Original C++ code: y.some_method();
    do_something_with((Foo*)put_result_here);     // Original C++ code: do_something_with(y);
    return;                                       // Original C++ code: return y;
}

void caller()
{
    struct Foo x;                                 // Note: x is not initialized here!
    rbv(&x);                                      // Original C++ code: Foo x = rbv();
    ...
}
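One way to see what a particular compiler does with this is to instrument a class so it remembers where it was first constructed. The sketch below is only illustrative: Tracked, born_at, and local_was_elided() are hypothetical names that do not appear in the FAQ, and the result depends on the compiler, its version, and its flags.

```cpp
#include <cassert>

// Hypothetical instrumented class (not part of the FAQ's Foo).
struct Tracked {
    static int constructions;   // calls to Tracked(int, int)
    static int copies;          // calls to the copy constructor
    const Tracked* born_at;     // address where the original object was constructed

    Tracked(int, int) : born_at(this) { ++constructions; }
    Tracked(const Tracked& other) : born_at(other.born_at) { ++copies; }
};
int Tracked::constructions = 0;
int Tracked::copies = 0;

Tracked rbv()
{
    Tracked y = Tracked(42, 73);
    return y;   // the compiler may construct y directly in the caller's object
}

// Returns true when the caller's x is the very object that was y inside rbv().
bool local_was_elided()
{
    int copies_before = Tracked::copies;
    Tracked x = rbv();
    bool same_object = (x.born_at == &x);
    // If no copy constructor ran, x must be the object constructed as y.
    assert(Tracked::copies != copies_before || same_object);
    return same_object;
}
```

Calling local_was_elided() from main() and printing the result, together with Tracked::copies, shows whether your compiler merged y and x into a single object for this particular function.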
Caveat: this optimization can be applied only when all of a function's return statements return the same local variable. If one return statement in rbv() returned local variable y but another returned something else, such as a global or a temporary, the compiler could not alias the local variable into the caller's destination, x. Verifying that all the function's return statements return the same local variable requires extra work on the part of the compiler writers, which is usually why some compilers fail to implement the return-local-by-value optimization.
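The caveat can be sketched with two versions of the same function. The names fallback, mixed_returns(), and single_local() are hypothetical, and this Foo is a stripped-down stand-in for the FAQ's class. Whether the second form is actually cheaper depends on the class and the compiler, since it trades a copy on return for an assignment.

```cpp
#include <cassert>

// Simplified stand-in for the FAQ's Foo (hypothetical members).
struct Foo {
    int value;
    Foo(int a, int b) : value(a + b) {}
};

Foo fallback(0, 0);   // hypothetical global

// Mixed returns: one path returns the local y, another returns a global.
// Per the caveat above, y cannot simply be aliased to the caller's object.
Foo mixed_returns(bool use_fallback)
{
    Foo y = Foo(42, 73);
    if (use_fallback)
        return fallback;   // copies the global into the caller's object
    return y;
}

// Every return statement names the same local, so a compiler that implements
// the optimization is free to construct y directly in the caller's object.
Foo single_local(bool use_fallback)
{
    Foo y = Foo(42, 73);
    if (use_fallback)
        y = fallback;      // assignment into y instead of a second return
    return y;
}

// Both versions produce the same values; only the shape of the returns differs.
void check_equivalence()
{
    assert(mixed_returns(false).value == single_local(false).value);
    assert(mixed_returns(true).value == single_local(true).value);
}
```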
Final thought: this discussion was limited to whether there will be any extra copies of the returned object in a return-by-value call. Don't confuse that with other things that could happen in caller(). For example, if you changed caller() from Foo x = rbv(); to Foo x; x = rbv(); (note the ; after the declaration), the compiler is required to use Foo's assignment operator. Unless the compiler can prove that Foo's default constructor followed by its assignment operator is exactly the same as its copy constructor, the language requires it to put the returned object into an unnamed temporary within caller(), use the assignment operator to copy the temporary into x, then destroy the temporary. The return-by-value optimization still plays its part since there will be only one temporary, but by changing Foo x = rbv(); to Foo x; x = rbv();, you have prevented the compiler from eliminating that last temporary.
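The difference between the two forms of caller() can be made visible by counting calls to the assignment operator. This is a minimal sketch with a simplified Foo; the counter and the two helper functions are hypothetical additions for illustration.

```cpp
#include <cassert>

// Simplified stand-in for the FAQ's Foo, with a counter on operator=.
struct Foo {
    static int assignments;   // calls to the assignment operator
    int value;

    Foo() : value(0) {}
    Foo(int a, int b) : value(a + b) {}
    Foo(const Foo& other) : value(other.value) {}
    Foo& operator=(const Foo& other)
    {
        value = other.value;
        ++assignments;
        return *this;
    }
};
int Foo::assignments = 0;

Foo rbv()
{
    Foo y = Foo(42, 73);
    return y;
}

// Foo x = rbv(); is initialization: the assignment operator is never called.
int assignments_for_initialization()
{
    int before = Foo::assignments;
    Foo x = rbv();
    assert(x.value == 115);
    return Foo::assignments - before;
}

// Foo x; x = rbv(); default-constructs x, then assigns from the returned temporary.
int assignments_for_separate_assignment()
{
    int before = Foo::assignments;
    Foo x;
    x = rbv();
    assert(x.value == 115);
    return Foo::assignments - before;
}
```

Because the assignment operator here has an observable side effect (it bumps the counter), the compiler has to call it in the second form, which is why the returned temporary cannot be eliminated there.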