# How to determine the relationship between two variables from a plotted graph

Vellan on 20 Nov 2022
Commented: Steven Lord on 20 Nov 2022
Suppose I have two variables, M and N, and I plot a graph of their values (M on the x-axis and N on the y-axis). From that graph, how do I obtain an equation that represents the graph and shows the relationship between the two variables?
If the relationship between M and N is linear, then the equation that represents the relationship between those 2 variables can be determined from the straight-line equation, y = mx + c. But in this case the relationship between the two variables isn't linear.
For non-linear cases, how do we obtain an equation that represents the relationship between those 2 variables?
Thanks!

Walter Roberson on 20 Nov 2022
You cannot reliably determine the relationship between the variables from a graph.
Suppose you had a trial function N = f(M). Now consider N1 = f(M) * (1+1e-10) and N2 = f(M) + sin(2*pi*M*1e10)/1e10. Would you be able to tell the plots of N1 or N2 apart from the plot of N?
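To put rough numbers on this, here is a minimal sketch. The trial function f(M) = M.^2 and the sample grid are assumptions chosen only for illustration; N1 and N2 follow the definitions in the paragraph above.

```matlab
% Assumed trial function, chosen only for illustration
f = @(M) M.^2;
M = linspace(0, 1, 1001);

N  = f(M);
N1 = f(M) * (1 + 1e-10);
N2 = f(M) + sin(2*pi*M*1e10) / 1e10;

% Largest vertical gap between each variant and the original curve
gap1 = max(abs(N1 - N));
gap2 = max(abs(N2 - N));

fprintf('max |N1 - N| = %.3g\n', gap1);
fprintf('max |N2 - N| = %.3g\n', gap2);
```

Both gaps are on the order of 1e-10. On a y-axis that spans 0 to 1, that is many orders of magnitude smaller than one screen pixel, so the three curves would be drawn on top of each other.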
If the true relationship between the two involved a piecewise expression that only happened to involve a single branch in the part of the graph you see, would you be able to determine the true relationship?
If the true relationship involves an infinitely thin discontinuity, would you be able to determine the correct relationship? Suppose for example that N = M^2 except that N(pi) is defined to not exist -- would you be able to tell from the plot, even if you were able to zoom the plot arbitrarily far?
The situation is a bit different from the case where you are given a finite list of values of M and of corresponding N: that is a situation in which you can prove that there are an infinite number of different plots that fit the data, even if you are given a large list of points. When you are given a graph, then there is a sense in which you are being given an infinite number of points instead of a finite list... but instead you run into practical problems about not really being able to examine the information infinitely precisely.
Walter Roberson on 20 Nov 2022
Example:
The ancients, and scientists especially from Galileo onwards, measured the locations of Mercury and plotted them. Was that enough to arrive at the correct equations of motion? No! You needed Lorentz and Einstein to characterize relativity before you could create more accurate equations of motion: the data alone was not enough without the later theory.
Do we have the correct equations of motion for Mercury now, a century-ish after Einstein? We do not know: there is a lot of work being done to determine whether the Einstein-adjusted "Newtonian Gravity" is correct, with examination both at Planck scales (Quantum Gravity) and at intergalactic scales (various Modified Newtonian Gravity theories). Einstein-aware Newtonian Gravity together with galactic motion observations imply Dark Matter and Dark Energy, but we have not been able to find any candidates for Dark Matter or Dark Energy yet. There are serious disagreements between the two major ways of estimating galactic motion, and multiple studies hoping to unify the two by reducing measurement error have instead determined that the disagreement is larger than expected, and that the actual amount of disagreement is close to the upper bound of the previous error estimate: strong evidence that the disagreement between the two methods is real, not just measurement error. No one has been able to come up with a reason yet, though a paper released earlier this year said that you can explain the observations and the disagreement well if you use one of the Modified Newtonian Gravity theories. Unfortunately, the theory in question fails to explain some other observations that had been considered "settled"...
Thus, it is not just a matter of "academic interest" that there are hypothetical graphs in which the relationship between the variables cannot be calculated: there are multiple major international scientific investigations trying to figure out what the real relationship is between some things that have already been plotted.

John D'Errico on 20 Nov 2022
Walter is correct in everything said. My answer here is just to expand.
Once something becomes no longer linear, what is it? For example, suppose I were to tell you that my car is NOT red. Do you know what color it is? A flower in my garden is not yellow. What color is it?
But your problem is harder yet. You might at least guess what color that flower in my garden happens to be. (Right now, at this time of year, all flowers in my garden would be brown.) But in terms of a function that represents some arbitrary set of numbers, this is impossible to know, since there are infinitely many possible functions that would do so.
For example, what function represents this set of points:
x = [1 2 3 4];
y = [2 3 5 7];
plot(x,y,'o')
syms X
P3 = dot(polyfit(x,y,3),X.^[3 2 1 0])
So is the function the cubic polynomial I show there, or were the elements of y just chosen as the set of primes? In the latter case, there is no known function that can reliably return the set of primes as a function of N. Worse, there are infinitely many polynomials that can interpolate any list of points.
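The last claim can be checked numerically. The sketch below is an illustration, not part of the original answer: it takes the cubic through the four points and adds multiples of the quartic (t-1)(t-2)(t-3)(t-4), which is zero at every data point. The multipliers c are arbitrary.

```matlab
x = [1 2 3 4];
y = [2 3 5 7];

% Cubic through the four points (the same polyfit call as above)
p3 = polyfit(x, y, 3);

% poly(x) returns the coefficients of (t-1)(t-2)(t-3)(t-4),
% a quartic that vanishes at every data point
z = poly(x);

% Adding any multiple c of that quartic gives another interpolant.
% The values of c are arbitrary choices for illustration.
for c = [0 1 -2.5]
    q = [0 p3] + c*z;   % pad the cubic with a leading zero to degree 4
    fprintf('c = %5.1f   max misfit = %.2e   q(5) = %g\n', ...
        c, max(abs(polyval(q, x) - y)), polyval(q, 5));
end
```

All three polynomials reproduce the data to rounding error, yet at x = 5 they predict roughly 8, 32 and -52. None of them gives 11, the next prime.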
The point is, you cannot know, just from looking at some set of numbers, what the nonlinear function is. Nor can any mathematical operation tell you what that function should be, given only a list of values. At best, you can pose some form as a family of functions that you hope will represent the relationship in question, and then try to find which member of that family best approximates the data at hand. But in order to do this, you need to specify in advance what family of functions to consider.
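As a minimal sketch of that workflow, using only base MATLAB: the data below are made up, the exponential family N = a*exp(b*M) is an assumption chosen for the illustration, and fminsearch is just one of several ways to do the search (the toolboxes offer more specialised fitting functions).

```matlab
% Made-up data for illustration: roughly N = 2*exp(0.7*M) with a small wiggle
M = 0:0.5:4;
N = 2*exp(0.7*M) + 0.05*(-1).^(0:numel(M)-1);

% Step 1: choose the family. Assumed here: N = a*exp(b*M), with p = [a b]
model = @(p, M) p(1) * exp(p(2) * M);

% Step 2: measure the misfit as the sum of squared residuals
sse = @(p) sum((model(p, M) - N).^2);

% Step 3: search for the best member of the family from a rough starting guess
p0 = [1 1];
pBest = fminsearch(sse, p0);

fprintf('a = %.3f, b = %.3f, SSE = %.3g\n', pBest(1), pBest(2), sse(pBest));
```

The search should land close to a = 2 and b = 0.7. If you swapped in a different family, such as a power law, the same three steps would still return a "best" member, which is why the choice of family has to come from what you know about the problem and not from the fit itself.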
And of course, in all of this, it is far too easy for someone to decide to overfit the data, since you can always find a high order polynomial that will approximate literally any set of data. Unfortunately, the result will pretty much always be complete garbage.
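A small sketch of how badly this can go. Note that it uses Runge's classic noise-free example rather than a noisy data set: the "true" function 1/(1+25x^2) and the eleven equally spaced samples are assumptions chosen for illustration.

```matlab
% Assumed "true" relationship, picked because it is a classic hard case
f = @(x) 1 ./ (1 + 25*x.^2);

% Eleven equally spaced samples, with no noise at all
xs = linspace(-1, 1, 11);
ys = f(xs);

% A degree-10 polynomial passes through all eleven samples
p = polyfit(xs, ys, 10);
errAtSamples = max(abs(polyval(p, xs) - ys));

% Compare against the true function on a fine grid between the samples
xf = linspace(-1, 1, 2001);
errBetween = max(abs(polyval(p, xf) - f(xf)));

fprintf('max error at the samples:      %.2e\n', errAtSamples);
fprintf('max error between the samples: %.2f\n', errBetween);
```

The polynomial matches every sample almost exactly, but near the ends of the interval it swings away from a function that never leaves the range 0 to 1, with an error greater than 1. A perfect fit at the data points says very little about the fit anywhere else.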
So in order for you to find "the" function that best represents your data, you need to do some thinking about what family of functions would best represent it. You need to learn about modeling techniques to fit that function. That may take no more than learning how to use tools like the curve fitting toolbox, or the stats or optimization toolbox, or perhaps even how to use tools like neural nets, for truly complicated problems.
Steven Lord on 20 Nov 2022
Another example: here is a set of 5 points. If all you know is those 5 points, how can you distinguish between the three different functions I plot that go through all five?
x = [0, 1, 2, 3, 4];
y = [0, 0, 0, 0, 0];
Which of the following equations did I use to generate these plots? All three curves go exactly through the points defined by the x and y variables.
x1 = 0:(1/32):4;
plot(x, y, 'o')
hold on
plot(x1, zeros(size(x1)), 'DisplayName', 'y = 0')
plot(x1, sinpi(x1), 'DisplayName', 'y = sin(pi*x)')
plot(x1, (x1-0).*(x1-1).*(x1-2).*(x1-3).*(x1-4), ...
'DisplayName', 'y = x*(x-1)*(x-2)*(x-3)*(x-4)')
legend show 