How to determine the relationship between two variables from a plotted graph
Suppose I have two variables, M and N, and I plot N against M (M on the x-axis, N on the y-axis). From that graph, how do I obtain an equation that represents the curve and shows the relationship between the two variables?
If the relationship between M and N is linear, the equation can be determined from the straight-line equation, y = mx + c. But in this case the relationship between the two variables isn't linear.
For non-linear cases, how do we obtain an equation that represents the relationship between those 2 variables?
Walter Roberson on 20 Nov 2022
You cannot reliably determine the relationship between the variables from a graph.
Suppose you had a trial function N = f(M). Now consider N1 = f(M) * (1+1e-10) and N2 = f(M) + sin(2*pi*M*1e10)/1e10. Would you be able to tell the plots of N1 or N2 apart from the plot for N?
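To make the size of those perturbations concrete, here is a minimal sketch. The trial function f(M) = M.^2 and the interval [0, 1] are assumptions chosen only for illustration; they are not part of the original question.

```matlab
% Hypothetical trial function; any smooth f would do for this illustration.
f  = @(M) M.^2;
M  = linspace(0, 1, 1001);

N  = f(M);
N1 = f(M) * (1 + 1e-10);
N2 = f(M) + sin(2*pi*M*1e10) / 1e10;

% Largest pointwise differences from the original curve.
d1 = max(abs(N1 - N));
d2 = max(abs(N2 - N));
fprintf('max |N1 - N| = %.3g\n', d1);
fprintf('max |N2 - N| = %.3g\n', d2);

% All three curves overlap at any ordinary plotting resolution.
plot(M, N, M, N1, '--', M, N2, ':');
legend('N', 'N1', 'N2');
```

Both differences are on the order of 1e-10 or smaller, far below anything a plot can resolve, yet the three functions are genuinely different.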
If the true relationship between the two involved a piecewise expression that only happened to involve a single branch in the part of the graph you see, would you be able to determine the true relationship?
If the true relationship involves an infinitely thin discontinuity, would you be able to determine the correct relationship? Suppose for example that N = M^2 except that N(pi) is defined to not exist -- would you be able to tell from the plot, even if you were able to zoom the plot arbitrarily far?
The situation is a bit different from the case where you are given a finite list of values of M and of corresponding N: that is a situation in which you can prove that there are an infinite number of different plots that fit the data, even if you are given a large list of points. When you are given a graph, then there is a sense in which you are being given an infinite number of points instead of a finite list... but instead you run into practical problems about not really being able to examine the information infinitely precisely.
John D'Errico on 20 Nov 2022
Walter is correct in everything said. My answer here is just to expand.
Once something becomes no longer linear, what is it? For example, suppose I were to tell you that my car is NOT red. Do you know what color it is? A flower in my garden is not yellow. What color is it?
But your problem is harder yet. You might at least guess what color that flower in my garden happens to be. (Right now, at this time of year, all flowers in my garden would be brown.) But in terms of a function that represents some arbitrary set of numbers, this is impossible to know, since there are infinitely many possible functions that would do so.
For example, what function represents this set of points:
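x = [1 2 3 4];
y = [2 3 5 7];
syms X   % symbolic variable for display (requires Symbolic Math Toolbox)
P3 = dot(polyfit(x,y,3),X.^[3 2 1 0])   % cubic through the four points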
So is the function the cubic polynomial I show there, or were the elements of y just chosen as the first four primes? In the latter case, there is no known simple function that reliably returns the primes as a function of their index n. Worse, there are infinitely many polynomials that can interpolate any list of points.
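As a rough illustration of that last point, the sketch below builds a second polynomial through the same four points by adding a multiple of (x-1)(x-2)(x-3)(x-4), which is zero at every data point. The multiplier c = 1 is an arbitrary choice made for this example; any other value gives yet another interpolant.

```matlab
x = [1 2 3 4];
y = [2 3 5 7];

% Cubic through the four points (4 points, 4 coefficients: exact fit).
p3 = polyfit(x, y, 3);

% A quartic through the same points: add any multiple of the polynomial
% with roots at x, which vanishes at every data point.
c  = 1;                          % arbitrary, hypothetical choice
p4 = [0 p3] + c * poly(x);

% Both reproduce the data...
disp(polyval(p3, x));
disp(polyval(p4, x));

% ...but they disagree away from it. At x = 5 the cubic gives about 8
% and this quartic gives about 32, while the next prime is 11.
v3 = polyval(p3, 5);
v4 = polyval(p4, 5);
fprintf('cubic at 5: %.4f, quartic at 5: %.4f\n', v3, v4);
```

Neither extrapolation is "wrong" as far as the four data points are concerned, which is exactly why the data alone cannot pick the function.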
The point is, you cannot know, just from looking at some set of numbers, what the nonlinear function is. Nor can any mathematical operation tell you what that function should be, given only a list of values. At best, you can pose a family of functions that you hope will represent the relationship in question, and then try to find which member of that family best approximates the data at hand. But to do this, you need to specify in advance what family of functions to consider.
And of course, in all of this, it is far too easy to overfit the data, since you can always find a high-order polynomial that will approximate literally any set of data. Unfortunately, the result will pretty much always be complete garbage.
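One way to see this is with a small sketch. The data here are an assumption: 11 equally spaced samples of the smooth function 1/(1 + 25 t^2) (often called the Runge function), chosen because it is a standard case where a high-order polynomial passes through every point and still behaves badly in between.

```matlab
% Hypothetical smooth test data: the Runge function sampled at 11 points.
g  = @(t) 1 ./ (1 + 25*t.^2);
xd = linspace(-1, 1, 11);
yd = g(xd);

% Degree-10 polynomial: enough coefficients to pass through every point.
p = polyfit(xd, yd, 10);

% Error at the data points versus error between them.
xf = linspace(-1, 1, 2001);
errAtData  = max(abs(polyval(p, xd) - yd));
errBetween = max(abs(polyval(p, xf) - g(xf)));
fprintf('max error at the data points: %.2g\n', errAtData);
fprintf('max error between them:       %.2g\n', errBetween);
```

The polynomial matches the samples to near machine precision, but between the samples (mostly near the ends of the interval) it swings away from the underlying curve by more than the full range of the data.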
So in order to find "the" function that best represents your data, you need to do some thinking about what family of functions would best represent it, and you need to learn about modeling techniques to fit that function. That may take no more than learning how to use tools like the Curve Fitting Toolbox, the Statistics or Optimization toolboxes, or perhaps even tools like neural nets for truly complicated problems.
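As a minimal sketch of that workflow, suppose (purely as an assumption for this example) that the plot looks like exponential growth, so the chosen family is N = a*exp(b*M). The data below are synthetic stand-ins for the real (M, N) values. This particular family can be fitted with base MATLAB by taking logarithms; families that cannot be linearized this way would need a nonlinear least-squares tool instead.

```matlab
% Hypothetical data standing in for the plotted (M, N) values.
M = 0:0.5:5;
N = 2 * exp(0.5 * M) .* (1 + 0.01 * sin(7 * M));   % small deterministic wiggle

% Assumed family: N = a * exp(b * M). Taking logs gives a straight line,
% log(N) = log(a) + b * M, which polyfit can fit by least squares.
c = polyfit(M, log(N), 1);
b = c(1);
a = exp(c(2));
fprintf('a = %.3f, b = %.3f\n', a, b);

% Check the fitted member of the family against the data.
Nfit   = a * exp(b * M);
relErr = max(abs(Nfit - N) ./ N);
fprintf('max relative error = %.3g\n', relErr);
```

The fit recovers values close to a = 2 and b = 0.5 only because the family was a sensible guess for these data; with a poorly chosen family the same procedure would still return numbers, just not meaningful ones.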