# Finding an analytical fit for a multi-dimensional function

8 views (last 30 days)
Daniel H on 22 Oct 2020
Commented: Daniel H on 25 Oct 2020
I have numerical data in a MATLAB multi-dimensional array f(a,b,c,d) which I would like to fit into an analytical function with 4 arguments: f(a,b,c,d).
For a one-dimensional funcion I am fairly comfortable: I could for example choose a Taylor series and use polyfit (just solve a fitting problem via Least Squares).
However, for such a high dimensional problem I am lacking the intuition how to best do this. Let's for simplicity start with just two parameters and keep a=b=const. This is how f looks like: 1. Pick one of those curves (=also fix d). How would you pick which function to fit with? (note the log-log plot!). Would you pick just a Taylor series with logf and logc? Or do these curves look like there would be a more natural option?
2. Now suppose I would fit f(a,b,c,d) with a=b=d=const. How would you proceed fitting a variable d?
3. I hope that I could tackle variable a and b in the same way as (2)
Note that the dimension of f is different, for example:
>> size(f)
ans =
10 7 42 10
PS: I found https://www.mathworks.com/matlabcentral/fileexchange/34765-polyfitn ... however, the independent variables have to be the same and the dependent variable (f in my case) has to be 1 x n. But for my case, it is a multidemensional array.

Walter Roberson on 22 Oct 2020
How would you pick which function to fit with? (note the log-log plot!). Would you pick just a Taylor series with logf and logc?
Unfortunately, no:
• any finite number of terms of a taylor series produces answers that are infinitely wrong for any periodic function and most major transcendental functions... including log. Remember that any finite taylor series is a polynomial, and that polynomials must go to +/- infinity at +/- infinity
• Given any finite set of points stated to finite precision, there are an infinite number of functions that fit the points exactly (to within round-off error.) If it is admitted that the experimental data is only approximate, then there would be one sense in which there would be even more (than infinite) number of functions that fit to within error, except that the first infinity is already the infinity of continuous values and the "larger" infinity would be the same infinity.
• I have posted constructive proof of the above, showing how to explicitly construct an infinite number of functions that fit exactly. It is not just one of those cases where you can demonstrate that such a function must exist "somewhere": I have posted instructions on how to build them.
• Because of the above, any function that you choose that fits the data, has a 100% probability of being the "wrong" function unless you have external reason to know that one function is more right than another. The probability that any one function is the "right" one is 1 out of infinity, and that is exactly 0.
Therefore, there is no possible way to correctly find a function that fits experimental data: not unless you have prior information constraining the form of the functions -- not unless you can use a proposed function to predict the response to experimental conditions and predict the response under alternative models, and can perform tests to check to see which version is correct.
Consider, though, even a very simple linear equation, predicting that an object would travel in a perfectly straight line. You might visually observe the object travel and sample it and find it has very good agreement. But did you think to check to see whether there is perhaps a very slight wave component to the travel, with the object perhaps not traveling exactly straight and inside has a very very small sine component to the path? Is a photon a wave or a particle?

Daniel H on 24 Oct 2020
Ok, that's really disappointing :( (BTW, these drop out for the trace of a specific random matrix after inverting it if that helps ... so it involves the characteristic polynomial and likely many terms).
The numerical issues for this specific problem aside, how would you tackle a multi-dimentional fit?
I get we would start with one line (say the blue one), in my case I have (with the mentioned toolbox):
Tce_inv1 = squeeze(mean(Tce_inv,1));
% size(f) = [ 10, 7, 42, 10 ]
% we start with fitting versus the third dimension ("c") on the plot, keeping a=b=d fixed
log_f = log10(f);
log_c = log10(c);
plot(log_c, squeeze(log_f(4,7,:,1))); % a=4, b=7, d=1
pFit = polyfitn(log_c, squeeze(log_f(4,7,:,1)), 5);
fit1 = polyvaln(pFit, log_c);
hold all;
plot(log_c, fit1);
But as I said, I am stuck also fitting this "d". Conceptually. Even if it won't work well for this instance of data...
Thanks!!
Walter Roberson on 24 Oct 2020
In your latest code you test a very specific model. If you have a specific model with unknown coefficients then you can fit the coefficients to the data. Sometimes that is fairly easy, but sometimes it can be pretty difficult to find the best fit (it becomes a global minimization problem.)
The part that is lacking is knowing that the function you are fitting has anything to do with the mathematics of the situation. Like why choose degree 5 and not degree 55? Why should you think that a polynomial is at all appropriate when there might be transcendental functions like exp involved?
Daniel H on 24 Oct 2020
there is no physical background (I said it’s coming from the trace of the inverse of a matrix).
And it’s quite easy:
1.) yes, ideally I would find the function that gives the best fit (with transcendental functions). But as a fallback, according to Weierstrass theorem I can always find a fit (subject to certain constraints) with a Taylor series. And that’s what I went for.
2.) I start with degree 2 and see it doesn’t fit. I increase the order and found 5 gives me a great fit. No need increase further.
In the end it’s an empiric fit and I need to cross check by plotting the fit on top of the fit data.

John D'Errico on 24 Oct 2020
Read again what Walter has said. There is much to learn for you.
You might also look at some of what is found in this post, which tries to teach people about how to build nonlinear functional forms, from primitive shapes:
But what you should consider are fundamental behaviors in the process you want to model.
Is this function asymptotically linear on a log scale? Then you want to choose a model that is asymptotically linear. What can I say? A polynomial does NOT have that character. Period.
Next, there are some tricks you can use that can work. First, fit each curve, using a basic model that fits that curve shape well.
Then you take the coefficients from each model. Now model those coefficients as a function of other parameters. For example, suppose you had an entire family of linear models? Thus
z = a1*x + b1 % curve 1
z = a2*x + b2 % curve 2
z = a3*x + b3 % curve 3
But assume that some other parameter was varied to create each of those curves? Now you would fit a as a function of y, as well, fit b as a function of y. So you now have a model of the form
z = a(y)*x + b(y)
Of course, this can get more complicated. But you are the one who understands what your data represents, what models would make sense.

#### 1 Comment

Daniel H on 25 Oct 2020
Regarding your first part: I am not sure if I have "fundamental behavior in the process that I want to model". All I have is the data itself. If it helps, the data comes from a an expression , where and is a random matrix accrding to some structure. In the above code, a denotes the variance of the random elements, b denotes the columns of , c denotes the rows of and d is another parameter (related to the matrix structure). I run the experiment thousands of times filling with random elements according to its structure and then take the mean value.
Hence I am fairly confident that indeed all functions get asymptotically linear on the log-log plot and even meet when c goes to infinity. My best bet so far was to fit what I have as well as possible with a finite polynomial.
Thanks for the link, I am going to have a look at it!
Regarding your last part: This comes close to what I am looking for! I am just trying to think how I would best implement this in MATLAB, given my multidimensional array f(a,b,c,d).
I got it for a 2D function (see my sample code below) but that's already 45 lines of code, not mentioning this is a toy example. And I am struggling to imagine how I could add yet another dimension. I guess I would need to wrap a for loop around the whole thing and repeat every step for each value in "b":
Is there any way to automate this?
b = [ 1 3 4 ];
c = linspace(0,1,1000);
d = [ 0.01 0.1 0.2 0.3 0.5 ];
F = zeros(length(b), length(c), length(d));
for ll=1:length(d)
for kk=1:length(b)
F(kk,:,ll) = (1 + d(ll)/3*c + 0.5*d(ll)*c.^2) * b(kk);
end
end
F = F + 0.01*randn(size(F));
bsel = 1; % parameter b fixed for now
% FIT 1: fit c
models1 = {};
coeffs1 = zeros(length(d), 3);
for ll=1:length(d)
models1{ll} = polyfitn(c, squeeze(F(bsel,:,ll)), 2);
coeffs1(ll, :) = models1{ll}.Coefficients;
end
plot(c, squeeze(F(bsel,:,:)));
hold all;
for ll=1:length(d)
plot(c, coeffs1(ll,3) + coeffs1(ll,2)*c + coeffs1(ll,1)*c.^2);
end
% FIT 2: fit d
models2 = {};
coeffs2 = zeros(3,3);
for ll=1:3
models2{ll} = polyfitn(d, coeffs1(:,ll), 2);
coeffs2(ll, :) = models2{ll}.Coefficients;
end
figure
plot(c, squeeze(F(bsel,:,:)));
hold all;
for ll=1:length(d)
f = (coeffs2(3,3) + coeffs2(3,2)*d(ll) + 1*coeffs2(3,1)*d(ll)^2) * 1 + ...
(coeffs2(2,3) + coeffs2(2,2)*d(ll) + 1*coeffs2(2,1)*d(ll)^2) * c + ...
(coeffs2(1,3) + coeffs2(1,2)*d(ll) + 1*coeffs2(1,1)*d(ll)^2) * c.^2;
plot(c, f);
end
% FIT 3: fit b
% ??