Efficient algorithm to compute only the most probable sequences of a random variable out of all possible sequences

1 vue (au cours des 30 derniers jours)
I'm looking for a better way to compute the possible sequences of a random variable whos value at time k is given by.
x(k) = 1 with probability p(1), 2 with probability p(2), ... np with probability p(np).
However, since the number of possible sequences increases exponentially with n and np, I only want to compute the most probable sequences which have a combined probability less than a threshold p_cut.
I've devised a function to do this below but it has two major drawbacks.
  1. It doesn't generalize to sequences longer than 3.
  2. It is quite memory inefficient since it always computes all possible permutations when only a subset may be needed.
function S = most_probable_sequences(p, n, p_cut)
assert(sum(p) == 1);
np = numel(p);
switch n
case 1
all_perms = (1:np)';
case 2
[X,Y] = ndgrid(1:np,1:np);
all_perms = [X(:) Y(:)];
case 3
[X,Y,Z] = ndgrid(1:np,1:np,1:np);
all_perms = [X(:) Y(:) Z(:)];
otherwise
error("n > 3 not implemented")
end
probs = prod(reshape(p(all_perms), [], n), 2);
[probs_sorted, order] = sort(probs, 'descend');
most_prob = cumsum(probs_sorted) <= p_cut;
S = all_perms(order(most_prob), :);
end
Examples of correct output of this function:
>> S = most_probable_sequences([0.95 0.04 0.01], 1, 0.99)
S =
1
2
>> S = most_probable_sequences([0.01 0.04 0.95], 2, 0.99)
S =
3 3
3 2
2 3
3 1
  4 commentaires
Bill Tubbs
Bill Tubbs le 23 Juin 2021
Thanks, yes I'm expecting some of the orderings to be arbitrary when p_cut < 1. A next step might be to think about what particular orderings I want but at the moment I'm just trying to cap the total number sequences. As long as the number of sequences is a minimum.
Matt J
Matt J le 23 Juin 2021
Modifié(e) : Matt J le 23 Juin 2021
It is still a question though, whether you need the sequences that differ only by ordering listed explicitly. You could save memory and computation time if instead of listing the solution as,
S =[ 3 3
3 2
2 3
3 1];
you instead listed the result in tabular form like,
C=cell2table({[3 3], 1; [3 2], 2; [3 1], 1},'VariableNames',{'Sequence','Permutations'})
C = 3×2 table
Sequence Permutations ________ ____________ 3 3 1 3 2 2 3 1 1

Connectez-vous pour commenter.

Réponse acceptée

Matt J
Matt J le 23 Juin 2021
Modifié(e) : Matt J le 24 Juin 2021
This might work, but it requires uniqueperms from the file exchange:
p=[0.01 0.04 0.95];
S = most_probable_sequences(p, 2, 0.99)
S = 4×2
3 3 2 3 3 2 1 3
function S=most_probable_sequences(p, n, p_cut)
assert( abs(sum(p)-1) <= 1e-12);
S = recursor(p, n, p_cut);
end
function S = recursor(p, n, p_cut,~)
if n==1
[p,S]=sort(p(:),'descend');
cut=cumsum(p)<=p_cut;
S=S(cut);
return
end
s=recursor(p, n-1, p_cut,[]);
s=unique(sort(s,2),'rows');
np=numel(p);
ns=size(s,1);
[I,J]=ndgrid(1:ns,1:np);
s=unique(sort([s(I(:),:),J(:)],2),'rows');
s=num2cell(s,2);
for i=1:numel(s), s{i}=uniqueperms( s{i} ); end
s=cell2mat(s);
sp=prod(p(s),2);
[Probs,is]=sort(sp(:),'descend');
cut=cumsum(Probs)<=p_cut;
S=s(is(cut),:);
end
  6 commentaires
Bill Tubbs
Bill Tubbs le 23 Juin 2021
What is uniqueperms()? I see more than one thing on File Exchange with this name.
Matt J
Matt J le 23 Juin 2021
I used this one,
but I suppose it doesn't matter, if they all do the same thing and are equally fast.

Connectez-vous pour commenter.

Plus de réponses (2)

Matt J
Matt J le 24 Juin 2021
Modifié(e) : Matt J le 24 Juin 2021
but it has two major drawbacks....It doesn't generalize to sequences longer than 3.
The first drawback at least is easy to fix.
function S = most_probable_sequences(p, n, p_cut)
assert( abs(sum(p)-1) <= 1e-12);
np = numel(p);
[all_perms{1:n}] = ndgrid(1:np);
all_perms=reshape( cat(n+1,all_perms{:}) , [],n);
probs = prod(reshape(p(all_perms), [], n), 2);
[probs_sorted, order] = sort(probs, 'descend');
most_prob = cumsum(probs_sorted) <= p_cut;
S = all_perms(order(most_prob), :);
end
This will be faster, but less memory efficient than my other answer.

Matt J
Matt J le 24 Juin 2021
Modifié(e) : Matt J le 24 Juin 2021
Here is a far more efficient version of my original answer based on my recommendation above to avoid explicitly listing permutations of the same sequence. It requires no File Exchange downloads and, as demonstrated below, easily handles up to n=12 and np=10.
%Small example
S = most_probable_sequences([0.01 0.04 0.95], 2, 0.99),
S = 3×2 table
Sequences NumPerms _________ ________ 3 3 1 2 3 2 1 3 1
%Large example
p=rand(1,10);p=p/sum(p);
tic
S = most_probable_sequences(p, 12, 0.99);
toc
Elapsed time is 1.789715 seconds.
function S=most_probable_sequences(p, n, p_cut)
assert( abs(sum(p)-1) <= 1e-12)
S = recursor(p(:), n, p_cut);
end
function S = recursor(p, n, p_cut)
if n==1
[p,S]=sort(p(:),'descend');
ncut=find(cumsum(p)<=p_cut,1,'last');
S=S(1:ncut);
S=table(S,ones(size(S)),'VariableNames',{'Sequences','NumPerms'});
return
end
s=recursor(p, n-1, p_cut);
s=s.Sequences;
np=numel(p);
ns=size(s,1);
[I,J]=ndgrid(1:ns,1:np);
s=unique( sort( [s(I(:),:), J(:)] ,2) ,'rows','stable'); clear I J
ns=size(s,1);
X=repmat((1:ns).',1,n);
K=histcounts2(X,s,1:ns+1,1:np+1);
F = cumprod([1 1 2:n]) ; % trick to get all needed factorials (note that all K < N)
w = F(n+1) ./ prod(F(K+1),2) ; % number of unique permutations
clear F K X
sProbs=prod(p(s),2); %single sequence probabilities
wProbs=w.*sProbs; %weighted probabilities
[wProbs,order]=sort(wProbs(:),'descend');
sProbs=sProbs(order);
c=cumsum(wProbs);
mcut=find( c<=p_cut,1,'last');
kcut=floor( (p_cut-c(mcut))/sProbs(mcut+1));
ncut=mcut+(kcut>0);
order=order(1:ncut);
S=s(order,:);
w=w(order);
if kcut>0, w(end)=kcut; end
S=table(S,w,'VariableNames',{'Sequences','NumPerms'});
end

Produits


Version

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by