Effacer les filtres
Effacer les filtres

Fastest possible code for AUC between a continuous predictor and a binary target

3 vues (au cours des 30 derniers jours)
dymitr ruta
dymitr ruta le 10 Oct 2021
Modifié(e) : dymitr ruta le 10 Oct 2021
Hi Folks,
I am on the hunt for the fastest possible Matlab code that computes empirical exact Area Under the Receiver/Operator Curve (AUC). The built-in Matlab's perfcurve returns auc as the 4th output but it is terribly slow. Here is my current fastest code, would apprreciate suggestions if/how to make it faster. cumsum(flipud(x)) appear optimizable, but I could not find any faster solution so far. Another lead could be that all variables involved are integer until the very last scaling normalization /(2*s0*s1).
%AUC - Area Under the Receiver/Operator Curve for binary target vector y
% and a continuous or binary predictor vector x
function a=auc(y,x)
n=numel(x); s1=sum(y); s0=n-s1;
if islogical(x) %Binary x case a = (1+tp-fp)/2
tp=sum(x & y); fp=(sum(x)-tp)/s0; tp=tp/s1; a=(1+tp-fp)/2;
else %Continuous numerical x case
[x,j]=sort(x);
i=[find(diff(x)~=0);n]; c=diff([0;i]);
y=cumsum((y(j,:))); y=diff([0;y(i,:)]);
tp=[0;cumsum(flipud(y))]; fp=[0;cumsum(flipud(c))]-tp;
a=sum(diff(fp).*(tp(1:end-1,:)+tp(2:end,:)))/(2*s0*s1);
end
a=max(a,1-a);
A quick speed test to beat (on a small portable laptop):
x=rand(1e7,1); y=rand(1e7,1)>.5; tic;a=auc(y,x);t=toc; [a t]
ans =
0.5001 1.7108

Réponses (1)

Walter Roberson
Walter Roberson le 10 Oct 2021
nnz() on a logical matrix is faster than sum() on the matrix
foo = rand(1,1e7) < 0.1;
N = 50;
t1 = zeros(N,1);
t2 = zeros(N,1);
for K = 1 : N; t1(K) = timeit(@() sum(foo), 0); end
for K = 1 : N; t2(K) = timeit(@() nnz(foo), 0); end
plot([t1, t2])
legend({'sum', 'nnz'})
  3 commentaires
Walter Roberson
Walter Roberson le 10 Oct 2021
The second section has no useful comments. I do not know what it is intended to do or why it is doing things that way.
dymitr ruta
dymitr ruta le 10 Oct 2021
Modifié(e) : dymitr ruta le 10 Oct 2021
Sorry, I did not expect to target the merit here but here you go:
%AUC - Area Under the Receiver/Operator Curve for binary target vector y
% and a continuous or binary predictor vector x
function a=auc1(y,x)
n=numel(x); s1=nnz(y); s0=n-s1;
if islogical(x) %Binary x case a = (1+tp-fp)/2
tp=nnz(x & y); fp=(nnz(x)-tp)/s0; tp=tp/s1; a=(1+tp-fp)/2;
else %Continuous numerical x case
[x,j]=sort(x); %Sort predictor values x and get sorted index j
i=[find(diff(x)~=0);n]); %Find indices of the last unique values of sorted x
c=diff([0;i]); %Compute the frequency (count) of all unique values of x
y=cumsum((y(j,:))); %Compute cumulative sum of targets along the order of sorted x
y=diff([0;y(i,:)]); %Compute the number of positive targets observed for each unique x
tp=[0;cumsum(flipud(y))]; %Compute true positives for the ROC curve i.e. the cumulative sum of positive targets observed along the inverse order of x
fp=[0;cumsum(flipud(c))]-tp; %Compute false positives for the ROC curve i.e. cumulative number of negative targets observed along the inverse order of x
%Compute AUC i.e. the area under the curve (integral) defined by the function: tp=f(fp), normalized within 0:1 limits
a = sum(diff(fp).*(tp(1:end-1,:)+tp(2:end,:)))/(2*s0*s1);
end
a=max(a,1-a); %Since AUC(y,x)=1-AUC(y,-x), we are looking for the maximum AUC achievable out of x predictor

Connectez-vous pour commenter.

Catégories

En savoir plus sur ROC - AUC dans Help Center et File Exchange

Tags

Produits


Version

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by