Fast data structure for tabu list?
2 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
I am looking for a fast way to implement a tabu list where each element in the list is a vector of integers.
I am writing some optimization code where evaluating the quality of a given candidate solution is expensive so I want to record that I have evaluated a given candidate before so as to avoid doing it a second time.
I am currently doing something very simple:
ismember(candidateSolution,tabuList,'rows')
where the tabuList is a matrix where each row corresponds to a previously evaluated candidateSolution. This works well enough for small problem instances but starts to bog down when the number of rows in tabuList gets into the tens of thousands.
Presumably a more sophisticated data structure would improve performance, but I'm not sure what to try. Maybe a sparse matrix?
2 commentaires
James Tursa
le 16 Sep 2017
How many elements per row? How many total candidate solutions are we talking about? What are the ranges of the integer values? Are you willing to consider mex functions?
Réponses (4)
Walter Roberson
le 16 Sep 2017
2 commentaires
Image Analyst
le 16 Sep 2017
I didn't know about this. thanks, it could be useful. Is it old? The documentation does not say when it was introduced.
I was going to suggest a table. Could containers be an alternative to a table? And iskey() an alternative to ismember()?
John D'Errico
le 15 Sep 2017
Sparse matrices are NOT the answer here. In fact, that would be an actively terrible reason to use a sparse matrix.
You might consider the memoize function, which allows MATLAB to recognize that it has seen a set of inputs to a specific function.
2 commentaires
John D'Errico
le 16 Sep 2017
When an old release is an issue, it is always a good idea to tell us. Saves us wasting time answering, and you wasting time to tell us.
Jan
le 16 Sep 2017
Modifié(e) : Jan
le 16 Sep 2017
You can use a hash, e.g. obtained by FEX: DataHash or FEX: GetMD5. The latter is faster, but the hashing will not be the bottleneck of the code.
tabuList = {};
hash8 = GetMD5(candidateSolution, 'Binary', 'uint8');
hash = char(typecast(hash8, 'uint16'));
if any(strcmp(hash, tabuList))
% Existing already
else
% New data, process...
tabuList{end+1} = hash;
end
This can be improved by pre-allocating the tabuList or perhaps by a binary search. But usually strcmp is fast, because it stops the comparison at the first not matching character.
[EDITED] The comparison is slightly faster with using all 16 bits of the CHAR type. Now this takes 37 seconds on my R2016b/64 system for 40'000 candidates with about 10'000 repeated keys. The main time is spent in any(strcmp), such that the drawback of the omitted pre-allocation is less important that the advantage of having a shorter tabuList. With a dedicated C-Mex function this can be accelerated:
tic;
tabuList = cell(1, 4e4); % Pre-allocate
iList = 0;
for k = 1:4e4
c = randi([1, 4], 1, 8);
% hash = GetMD5(c, 'Binary', 'base64');
hash8 = GetMD5(c, 'Binary', 'uint8');
hash = char(typecast(hash8, 'uint16'));
if anyStrcmp8(hash, tabuList))
% Existing already
else
% New data, process...
% Add hash to the list:
iList = iList + 1;
tabuList{iList} = hash;
end
end
disp(iList)
toc
And the Mex function anyStrcmp8.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint16_T *S, *CS;
size_t iC, nC, nS;
const mxArray *C, *aC;
// Get inputs:
S = (uint16_T *) mxGetData(prhs[0]);
nS = mxGetNumberOfElements(prhs[0]);
C = prhs[1];
nC = mxGetNumberOfElements(C);
// Loop over cell:
for (iC = 0; iC < nC; iC++) {
aC = mxGetCell(C, iC);
if (aC == NULL) { // Undeclared element reached:
plhs[0] = mxCreateLogicalScalar(false);
return;
}
CS = (uint16_T *) mxGetData(aC);
if (memcmp(S, CS, 8 * sizeof(uint16_T)) == 0) {
// Matching element found:
plhs[0] = mxCreateLogicalScalar(true);
return;
}
}
// No element found
plhs[0] = mxCreateLogicalScalar(false);
return;
}
This needs 13.6 sec for 40'000 keys with 25% repetitions.
0 commentaires
Jan
le 16 Sep 2017
It is much faster to store the hash keys in an UINT64 matrix than in a cell string:
tic;
nKey = 4e4;
List = zeros(2, nKey, 'uint64');
iList = 0;
for k = 1:nKey
c = randi([1, 4], 1, 8);
hash = typecast(GetMD5(c, 'Binary', 'uint8'), 'uint64').';
if ~SearchHashKey(hash, List, iList)
% New data, process...
% Append hash to the list:
iList = iList + 1;
List(:, iList) = hash;
end
end
disp(iList);
toc
And the C-Mex function SearchHashKey.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint64_T *V, *W;
size_t iW, nW;
if (nrhs != 3) {
mexErrMsgTxt("SearchHashKey.mex: 3 inputs required.");
}
if (!mxIsUint64(prhs[0]) || !mxIsUint64(prhs[1])) {
mexErrMsgTxt("SearchHashKey.mex: Inputs must be UINT64.");
}
V = (uint64_T *) mxGetData(prhs[0]);
W = (uint64_T *) mxGetData(prhs[1]);
nW = (size_t) mxGetScalar(prhs[2]);
if (mxGetNumberOfElements(prhs[0]) != 2 ||
mxGetNumberOfElements(prhs[1]) < 2 * nW) {
mexErrMsgTxt("SearchHashKey.mex: Inputs have bad sizes.");
}
for (iW = 0; iW < nW; iW++) {
if (V[0] == W[0] && V[1] == W[1]) {
plhs[0] = mxCreateLogicalScalar(true);
return;
}
W += 2;
}
plhs[0] = mxCreateLogicalScalar(false);
return;
}
This takes 1.3 seconds for 40'000 elements only.
0 commentaires
Voir également
Catégories
En savoir plus sur Resizing and Reshaping Matrices dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!