What extra data is stored by an anonymous function?

I recently learned that anonymous functions can carry around large amounts of extra data from the workspace that they don't use, even if this data is created after the anonymous function. The following example, together with the FUNCTIONS command, illustrates this:
function fun=test
a=1;
b=2;
c=3;
fun=@(x)x+b+a;
a=7;
b=rand(1000);
c=5;
q=3;
r=4;
end
Now, back in the base workspace, when I apply the functions() command to 'fun', I see
>> fun=test; s=functions(fun); s.workspace{:}
ans =
b: 2
a: 1
ans =
fun: @(x)x+b+a
a: 1
b: [1000x1000 double]
c: 3
I would like to understand (with official documentation if possible) what rules anonymous functions use to decide what data to carry around. The above seems to suggest that s.workspace{1} will always contain the external variables and their values that the anonymous function actually uses. Meanwhile s.workspace{2} seems to contain updates to variables that came into scope before fun was defined. Am I correct that these are the rules? But if so, then why, in the above, does s.workspace{2} contain an update to b, but not to a and c?

Accepted Answer

Philip Borghesani on 10 Feb 2014
Edited: Philip Borghesani on 10 Feb 2014
I will start my answer with a quote from the documentation of functions:
The functions function is used for internal purposes, and is provided
for querying and debugging purposes. Its behavior may change in
subsequent releases, so it should not be relied upon for programming
purposes.
The output from this function for this code WILL change in a future version of MATLAB and can change in your current version:
>> fun=test; s=functions(fun); s.workspace{:}
ans =
b: 2
a: 1
ans =
fun: @(x)x+b+a
a: 1
b: [1000x1000 double]
c: 3
>> feature accel off
>> fun=test; s=functions(fun); s.workspace{:}
ans =
b: 2
a: 1
ans =
fun: @(x)x+b+a
a: 7
b: [1000x1000 double]
c: 5
q: 3
r: 4
workspace{2} will contain the final state of the function workspace at exit, but it might not reflect non-visible changes to that workspace that are optimized away by the JIT.
The contents of workspace{2} should be considered completely version dependent and subject to removal or being inconsistent due to current and future optimizations.
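As a minimal sketch of the querying use that the documentation does allow, the snippet below inspects only workspace{1} of a handle created in a script. The variable names s and captured are illustrative, and the layout of the functions output is an assumption based on the listings above, so it should not be relied on in production code.

```matlab
% Captured values are fixed when the anonymous function is created.
a = 1;
b = 2;
f = @(x) x + b + a;

% functions() is intended for querying and debugging only.
s = functions(f);
captured = s.workspace{1};   % struct holding the variables f actually uses

disp(s.function)             % text of the anonymous function
disp(captured)               % fields a and b with their captured values
```

Only workspace{1} is read here because, as noted above, the presence and contents of workspace{2} vary between versions.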

3 comments

Matt J on 11 Feb 2014
Edited: Matt J on 11 Feb 2014
Thanks, Philip. So you're saying workspace{2} is version dependent, but not workspace{1}? You can say with some assurance that workspace{1} will always be there, even though FUNCTIONS is supposedly guaranteed for querying only?
Aside from this, can you explain why workspace{2} is even necessary? Of what use is it to store updates to the variables made after the anonymous function has been defined? From the user's point of view, the only external values the function is supposed to refer to are the ones in force at the moment of the function's creation.
Anything returned by functions, or even the existence of the function functions, is subject to change, but the contents or existence of workspace{2} is known to change. There is a slight difference there.
One note on the contents you are seeing in workspace{2}: this is not a copy of the variables in the function but a pointer to the actual workspace. Nested functions or multiple anonymous functions will see the same values in workspace{2}, even if they are changed by a nested function, so the memory used is not usually noticed, and this data causes little performance overhead as long as parfor is not in the equation.
Matt J on 11 Feb 2014
Edited: Matt J on 11 Feb 2014
I agree that that's true as long as the anonymous function is used transiently, i.e., that it goes out of scope in the same workspace where it was created. Admittedly, too, that's what you do most of the time.
However, parfor isn't the only exception to this, I don't think. If you return an anonymous function handle from a function to a calling workspace, it will prevent workspace{2} variables from going out of scope and its (potentially large) memory from being released. Similarly, when saving to a .mat file, deep copies will be made.
I think most users know how to navigate this when it comes to workspace{1} data. They know that the anonymous function uses that and so that it must be kept stored somewhere. However, workspace{2} data is data that anonymous functions never use, and the documentation doesn't warn that it is there. Thus, it seems very easy to lock large amounts of memory by accident.
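One possible way to limit what a handle can hold onto, not proposed anywhere in this thread, is to create the anonymous function inside a small helper whose workspace contains only the variables it needs. The sketch below is a single function file; testClean and makeHandle are hypothetical names, and whether this changes what functions reports or what save writes is an assumption that would need checking in your release.

```matlab
function fun = testClean
    a = 1;
    b = 2;
    fun = makeHandle(a, b);   % the handle is created in the helper's workspace
    b = rand(1000);           % this large array never enters that workspace
end

function fun = makeHandle(a, b)
    % Only a and b exist here when the anonymous function is created.
    fun = @(x) x + b + a;
end
```

Calling fun = testClean; fun(1) evaluates x+b+a with the values passed to the helper (a=1, b=2).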
I still do wonder why anonymous functions care about and keep track of workspace{2}...


More Answers (2)

Matt J on 4 Mar 2014
Edited: Matt J on 4 Mar 2014


I still do wonder why anonymous functions care about and keep track of workspace{2}...
Assuming workspace{2} really has no purpose, I've posted this cleaning tool as a potential remedy
It strips away workspace{2} data leaving only workspace{1}, which presumably contains all/only the variables that the function needs.

3 comments

Try your function with this code and examine workspace{2} after calling fun:
function fun=test
r=1;
fun=@(x) nest(x+r);
function out=nest(x)
r=r+1;
out=r+x;
end
end
Pretty tricky, Philip. Can I assume my function will act as intended if no nested functions (with externally scoped variables) are used by the anonymous function? If not, can you tell me more about when workspace{2} is used?
I can't imagine scenarios where someone would want to save an anonymous function to disk if it relied on externally scoped variables.
I can't guarantee it, but I believe you are correct: workspace{2} is only needed with nested functions.


James Tursa on 6 Mar 2014
Edited: James Tursa on 6 Mar 2014
This topic has already been addressed in this thread:
For convenience I will repeat my answer here:
When you create an anonymous function handle, all variables that are not part of the argument list are regarded as constants. Shared data copies of them are made at the time you create the function handle and actually stored inside the function handle itself. They retain their value and use up memory even if you change the source of the "constant" later on in your code. E.g., if you had done this:
A = v;
f = @(x) A*x; % could have done f = @(x) v*x; and got same result
A = 2*v;
the last line has no effect on the function handle f output (EDIT). Note that if A happens to be a very large variable, its memory effectively gets "locked up" inside f and can only be cleared by clearing (or re-defining) f. E.g., in the above code snippet, the 2nd line will put a shared data copy of A inside of f. The 3rd line will cause this shared data copy to essentially become a deep data copy (it gets unshared with A at that point).
Bottom line: once the anonymous function is created, the standard rules for shared data copies apply. At least that was the behavior I observed last year. I may need to re-examine this ...
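To make the value-capture behavior concrete, here is a small sketch in which v is given an arbitrary scalar value (an assumption; the snippet above leaves v unspecified):

```matlab
v = 3;          % illustrative value only
A = v;
f = @(x) A*x;   % the current value of A (3) is captured here
A = 2*v;        % rebinding A afterwards does not change what f computes
y = f(2);       % uses the captured A, so y is 6
```

The output of f depends only on the value A had when the handle was created, which is the sense of "no effect on the function handle f output".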

6 comments

Matt J on 6 Mar 2014
Edited: Matt J on 6 Mar 2014
the last line has no effect on the function handle f.
Not true, James (unfortunately).
As you can see from my posted example, the handle as returned by
>> fun=test;
is carrying around not only the value b=2 in force at the time the function handle was originally defined, but also the larger array value b=rand(1000) assigned when b is later redefined. If the handle 'fun' is now saved to a .mat file, the 1000x1000 value of b is also saved to that .mat file, consuming a lot more disk space than the data actually used by the function in its execution.
James Tursa on 6 Mar 2014
Edited: James Tursa on 6 Mar 2014
What version of MATLAB and OS are you running? In my versions (Win32 R2013a and Win64 R2012a) the behaviour is as I posted for the expression shown ... a large 2nd variable does not get saved to the mat file. I also ran the exact case in your FEX submission and got the same result ... the large 2nd variable does not get saved to the mat file. So perhaps the behavior of "save" or anonymous functions has recently changed, or maybe it's an OS issue. Offhand, saving variables that are not used by a function handle to a mat file would seem to be a "save" undesired new feature.
I will have to look at the nested function example above in more detail to see what is going on in that case.
Matt J on 6 Mar 2014
Edited: Matt J on 6 Mar 2014
I also ran the exact case in your FEX submission and got the same result ... the large 2nd variable does not get saved to the mat file.
If you got the same result as in the FEX submission, then the large variable was saved to the first .mat file (tst1.mat should be 259MB).
There are posts observing these effects both here and on stackoverflow going back at least a few years, e.g.,
I'm working under Windows 7 64-bit, and can reproduce the effects in R2012a and R2013b.
I was being naive. To be explicit, I executed the code in your FEX submission and the first mat file, tst1, did NOT get the large variable written to it, contrary to what I was expecting per your description. So I assumed it was a version or OS issue. However, I had just copied and pasted the innards of the function into the MATLAB command window. When I actually created the function and executed the code, voila, large file. So the behavior is apparently altered by the fact that the code is inside a function. Seems like an undesirable feature to me, but it is what it is.
Also, to be clear, everything I have written above is still correct if my words "has no effect on the function handle f" are replaced by "has no effect on the function handle f output", which was the meaning I originally intended by the phrase. All of the shared data copy stuff I wrote about above is still true.
Matt J on 6 Mar 2014
Edited: Matt J on 6 Mar 2014
Also, to be clear, everything I have written above is still correct if my words "has no effect on the function handle f" are replaced by "has no effect on the function handle f output"
Well, then I agree with all of it. However, it seems to pertain to a slightly different question from the one I posted.
Agreed.

