Convolution of log-transformed random variables

BACKGROUND I have a random variable RV3 = RV1 + RV2 and i would like to determine p(RV3) by convolution, i.e. p(RV3) = conv(p(RV1), p(RV2)). Importantly, p(RV1) and p(RV2) are very skewed with extremely long tails. The set up is thus as follows:
X=linspace(0,1e5,1e6) % domain of Y1 and Y2
Y1=normhist(RV1,X) % pdf 1 (sum==1)
Y2=normhist(RV2,X) % pdf 2 (sum==1)
Y3=conv(Y1,Y2) % convolution, sum==1 if everything goes fine
PROBLEM Due to the skewness it is hard/impossible to choose an X that always works ("works" in this context means that sum(Y3)==1.00): i need both a fine spacing and a huge domain (for those interested: the RVs are likelihood ratios; about half the data is in [0,1] and the other half in [0,Inf)). For reasons that are not worth going into here, renormalizing after conv() is not an option.
QUESTION Since log(RV1) and log(RV2) are roughly normally distributed, I probably wouldn't get numerical problems if i could do the convolution on log-transformed RVs and then somehow "correct" for the log transform after the convolution, i.e.,
X_log=logspace(-10,10,1e5);
Y1_log=normhist(log(RV1),X_log);
Y2_log=normhist(log(RV2),X_log);
Y3_log=conv(Y1_log,Y2_log);
Y3_trans=do_some_magic(Y3_log)
So the essence of my question is: does a function do_some_magic() exist such that Y3_trans is identical to Y3 in the first snippet of code (but without the numerical problems)?
PS: Other solutions are welcome too of course PPS: I would also be happy knowing the distribution of log(RV3) instead

Réponses (1)

Matt J
Matt J le 7 Sep 2015
Modifié(e) : Matt J le 7 Sep 2015
I think we need to scrutinize the premise of the question a little more closely. I don't really understand why you are not reliably getting sum(Y3)=1.00. Maybe it has something to do with the fact that I'm using FFT-based convolution, but even with 1e7 elements and uniformly randomized Y1, Y2 (can't get heavier tails than that), the convolution output is well-normalized inherently:
>> Y1=rand(1,1e7); Y1=Y1/sum(Y1); Y2=rand(1,1e7); Y2=Y2/sum(Y2);
>> Y3=ifft(fft(Y1,2e7).*fft(Y2,2e7),1e7,'symmetric');
>> sum(Y3)-1
ans =
-1.3656e-14

2 commentaires

Ronald van den Berg
Ronald van den Berg le 8 Sep 2015
Modifié(e) : Ronald van den Berg le 8 Sep 2015
Matt, thanks for your response. We seem to have different definitions of "heavy tails". In your example, Y1 and Y2 only take values in [0,1], but what i meant was a distribution in which most of the probability mass is at low values (e.g. range [0,1]) but also some at extremely high values (likelihood ratios can easily go to Inf). E.g., Y1=Y2=hist(lognrnd(3,7,1,1e5),X).
To clarify matters, what i really need is a function Y3 = do_conv(X,Y1,Y2) with the following properties:
  • Y1 and Y2 are probability distributions with long tails, discretized (binned) on X and normalized to 1
  • Y3 is the convolution of these distributions, normalized and on the same domain X
Even when using fft-based convolution, it seems impossible to choose a proper choice for X, because:
  • X needs to cover 0 to near-infinity
  • X needs to be finely spaced, because large part of the probability mass is in [0,1]
If it's possible at all, i think they only way would be by intermediately either working on log-transformed Y1 and Y2 or using log-spaced X instead of linearly, but i don't see how this can be done :(
Matt J
Matt J le 8 Sep 2015
Modifié(e) : Matt J le 8 Sep 2015
Could you attach a .mat file with an example of data that "doesn't work"?

Connectez-vous pour commenter.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by