Early Local Variable Release: What and Why

The main entry point of this new "feature" is RSP-30050.
This entry describes the new behavior of the compiler. If you don't have access to the Quality portal, here are the highlights.

For years, if a function returned an interface instance, this instance remained alive until the end of the current function: the compiler generated a hidden := nil statement at its final end;.
This is a very common way of automating memory management in Delphi code. Many APIs return an interface instance, which manages its own lifetime through proper reference counting. There is no need for a try ... finally Free block, because the compiler generates it for you.

Some code relied on the hidden local variable being released only at the end of the function, e.g. to create "auto-free" class features, or to change the mouse pointer on screen during a process.
This RSP is about a change of lifetime: now the instance is released sooner, before the final end; statement of the function.

In practice, the compiler changed its behavior when compiling the following code:

procedure Test(anObject: TObject = nil);
begin
  if not Assigned(anObject) then
  begin
    AutoFree(anObject, TMyClass.Create);
  end; // Delphi 10.4 destroys IAutoFree here
  ... some other code
end; // Delphi 10.3 destroys IAutoFree here
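
To observe this on your own compiler, here is a minimal self-contained sketch. TScopeGuard and MakeGuard are hypothetical names made up for this illustration (they are not part of mORMot or the RTL). The comments describe the behavior reported in the RSP, so the position of the "guard destroyed" line in the output depends on the compiler version you use.

```pascal
program TempLifetimeDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

type
  // hypothetical guard class: writes a line when created and when released
  TScopeGuard = class(TInterfacedObject)
  public
    constructor Create;
    destructor Destroy; override;
  end;

constructor TScopeGuard.Create;
begin
  inherited Create;
  Writeln('guard created');
end;

destructor TScopeGuard.Destroy;
begin
  Writeln('guard destroyed');
  inherited Destroy;
end;

// returns an interface: the caller stores it in a hidden temporary variable
function MakeGuard: IInterface;
begin
  Result := TScopeGuard.Create;
end;

procedure Test(Flag: boolean);
begin
  if Flag then
  begin
    MakeGuard; // result is discarded: only the hidden temporary references it
  end; // Delphi 10.4 and later are reported to release the temporary here
  Writeln('some other code');
end; // Delphi 10.3 and older release the temporary here

begin
  Test(True);
end.
```

If "guard destroyed" is written before "some other code", your compiler applies the new, block-level lifetime.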

The final reasoning from Embarcadero, in the RSP, is the following:

  • Q/ "should we change our code to use Delphi 10.4 or newer?"
    A/ You should change your code. We have been considering options, but we need a better definition of the lifetime of temporaries (which was undefined in the past) and that's going to be at the most local scope level – like that of an inline variable. This is what most other programming languages do and helps the compiler optimize the generated code.
  • Sync status from internal system, internal issue closed as "Works As Expected" on Jul 26, 2021 with comment: The lifetime of temporaries (which was undefined in the past) and is at the most local scope level – like that of an inline variable. This is what most other programming languages do and helps the compiler optimize the generated code.

So this is the "Delphi 11 Alexandria" way of thinking.

Pretty clear, and it makes sense - at least from the Embarcadero team's point of view.
From the user's point of view, the benefit is not so obvious. Changing perfectly working code, on a huge project, with the risk of changed behavior, random GPF, exceptions or memory leaks, just to follow "what most other programming languages do" (tm) does not convince me.
They already did it with ARC or RawByteString... and they came back to common sense, after a few years.

Of course, here the impact is much less than with ARC. But it is the very same logic. Why waste our time and money?
My point is that Embarcadero should be more customer focused.

Better Performance?

Theoretically speaking, from better local variable management comes better code.
This is perfectly true for highly optimizing compilers like GCC or LLVM. They do wonders when generating code. I encourage you to watch Matt Godbolt's great talk, "What has my compiler done for me lately?". Fun and exciting for sure.

But the Delphi compiler, even with its LLVM backend, is not at this level of integration. For a regular VCL/FMX application using a database, the generated code is fast enough. Embarcadero would do better to fix the inlining issues, which sometimes induce performance problems - just check how functions returning floats are implemented. What is possible with a full LLVM stack is not possible with a Delphi front-end alone, because LLVM is complex and fast-changing, and requires a full compiler stack, from front-end to back-end, to leverage its full optimizing power.

What we did for years with Delphi, to leverage its performance, is to follow some simple rules, like:

  • Make it right, then make it fast;
  • Identify the real bottlenecks using a profiler: don't guess;
  • Avoid unneeded calls;
  • Use tuned libraries;
  • Avoid memory allocation;
  • Avoid copies or reference counting;
  • Avoid hidden try...finally;
  • Better register allocation by using a sub-function for loops.

Check our blog article and the slides and code proposed at Ekon 22.

The last point is what interests us.
Local variable allocations don't make a performance difference in normal code. They only make a difference within a loop of thousands of iterations. With a very small function, containing only a processing loop and a few input parameters, we ease register allocation, and performance is enhanced. For simple straight-line code, stack variable allocations do not matter much in terms of performance.

In practice, writing a SubCall() dedicated function is the way to go for performance:

procedure SubCall(p: PIntegerArray; n: integer);
var
  i: PtrInt;
begin
  for i := 0 to n - 1 do // here every variable will be in a register
    p[i] := i;
end;

procedure TTestEkon22Performance.BetterRegisterAllocation;
var
  ints: TIntegerDynArray;
  i, j, n: integer;
  timer: TPrecisionTimer;
begin
  SetLength(ints, 50000);
  n := 1000;
  timer.Start; // reset the timer before the first measurement
  for j := 1 to n do
    for i := 0 to high(ints) do // here some variables will be allocated on stack - even when inlining "for var i ..."
      ints[i] := i;
  NotifyTestSpeed('regular loop', length(ints) * n, length(ints) * n * SizeOf(Integer), @timer);
  timer.Start; // restart the timer for the second measurement
  for j := 1 to n do
    SubCall(pointer(ints), length(ints));
  NotifyTestSpeed('dedicated call', length(ints) * n, length(ints) * n * SizeOf(Integer), @timer);
end;

Of course, we may argue that local scope can increase performance, because initialization/finalization are delayed or by-passed.
This was the point of this good blog article.
But in practice, the benefit is not so obvious, because on some platforms, creating nested try..finally blocks for each local variable scope actually slows down the execution, or increases the linking time and executable size, because more exception traps have to be generated.

Inline variables have undoubted benefits, e.g. within a loop, or to reduce code verbosity when the type is known and complex - which is the case with generics.
So we will be fair, and read the Grijjy blog article until its conclusion: "Use With Care" and "there are some drawbacks too". No magic bullet.

Show Me The Code

We could find micro-benchmarks where this could make a difference.
But I don't like micro-benchmarks. Real code does not lie, and from what I have seen, in real production code, there has been almost no performance change since Delphi 2009: only a few percent more or less, depending on the use case. We did observe a noticeable boost in generics code. But it comes more from RTL optimization and (iterative) rewrites than from compiler improvements. The biggest performance boost was in Delphi 2006/2007, back when inlining was introduced in the compiler. Since then, some kinds of values (like floats) still have trouble being inlined. Also, the generated code is sometimes incorrect, or the compiler just triggers internal errors - just ask any library maintainer using Delphi generics and inlining...

About code, the main argument is that proper coding requires small functions. This has been true since the early days of programming.
If you have dozens of local variables and hundreds of lines of code within a function, it is really time to refactor it into a class or a record. Don't expect the compiler to make your code any faster or more maintainable.

So I doubt changing the local variable lifetime would make any difference for Delphi end-users.
I would be glad to be proven wrong - but don't show me micro-benchmarks, they are pointless. The mORMot regression tests, for instance, are more convincing. And they tend to be slower year after year due to Windows itself (background tasks like antivirus, slow NTFS, new CPU security mitigations...). For raw data processing, when the OS is not involved, they have given almost the same timing since Delphi 2007. The fastest execution is on FPC + Linux, mostly due to the OS itself - and slightly to FPC's better inlining abilities.

In Practice, For mORMot Users

As a consequence, the behavior of some widely used mORMot features changed with Delphi 10.4 and later:

  • TSynLog.Enter and automatic Leave generation in the logs;
  • TAutoFree and automatic memory management of classes;
  • _Safe() returning a PDocVariantData on a temporary variant.

We discussed this in our forum here and here.

About TSynLog.Enter and TAutoFree, it was already the case with FPC. So for cross-compiler code, you should already use a local variable, or an explicit with statement.
So I am fine with that.
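
The explicit local variable approach can be sketched as follows. TLogScope and EnterScope are hypothetical stand-ins written for this illustration, not the actual TSynLog or TAutoFree API. The point is only that a named interface variable declared in the var section is released in the function epilogue, whatever the compiler does with hidden temporaries.

```pascal
program ExplicitLocalDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

type
  // hypothetical stand-in for an Enter/Leave scope marker
  TLogScope = class(TInterfacedObject)
  private
    fName: string;
  public
    constructor Create(const aName: string);
    destructor Destroy; override;
  end;

constructor TLogScope.Create(const aName: string);
begin
  inherited Create;
  fName := aName;
  Writeln('enter ', fName);
end;

destructor TLogScope.Destroy;
begin
  Writeln('leave ', fName);
  inherited Destroy;
end;

// hypothetical factory returning the marker as an interface
function EnterScope(const aName: string): IInterface;
begin
  Result := TLogScope.Create(aName);
end;

procedure Process;
var
  scope: IInterface; // named local: keeps the instance alive until the final end;
begin
  scope := EnterScope('Process');
  Writeln('working');
end; // scope is released here, so "leave Process" comes last

begin
  Process;
end.
```

Since the named variable holds its own reference, this program writes "enter Process", then "working", then "leave Process", on old and new compilers alike.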

My concern is about _Safe() and a temporary variant. It works fine on FPC, but is broken on the latest Delphi. So we have introduced _DV(), which returns a TDocVariantData instead of a PDocVariantData: it is slower, but safer. For me, this is a regression, and it should be fixed. We will see what happens on the Embarcadero side.

To circumvent these issues, Eugene suggested that we could use Custom Managed Records as a replacement on newer versions of Delphi. But I am not sure we have any guarantee that they are not affected.

Feedback is welcome on our forum, as usual!