Can we manipulate a file without opening it?

Hello,
I have a question, which I explain below. Consider the following loop:
for i=1:10^6
    A = Read a csv file;
    A = perform some operations on A;
    A = save the performed operations;
end
Apparently, the most time-consuming part is reading the file. If I use A=csvread(); it is very time-consuming. If I use fopen and the related functions, it is computationally cheaper but still time-consuming.
Do you have an idea for reducing the computational time for what I intend to do?
I hope there is a way to do the above operations without actually opening any file (updating an existing file and saving the updates to the same file without opening it).
Any ideas?
Thanks in advance!
Babak

8 comments

You cannot touch a file without opening it in some way. Essentially, opening a file is how MATLAB needs to interact with it.
OMG!
Thanks John
Stephen23 on 24 Nov 2022
Do you always open the same file, or do you have 1e6 different files that you are opening in this loop?
Hi Stephen,
I just have one single file.
So, all the workers independently work but all of them read a single file and modify it.
I hope there is a way (just a hope~!~!~)
Stephen23 on 25 Nov 2022
"So, all the workers independently work but all of them read a single file and modify it."
Workers... are you actually using parallel processing (e.g. PARFOR), but did not tell us this important information?
Sorry,
I do not know why I left this out of my post. Yes, I meant parfor.
I am wondering whether there is a way to solve my problem?
Stephen23 on 25 Nov 2022
Don't read and write the file on every iteration. Just use an array and indexing.


Answers (2)

Matt J on 25 Nov 2022
Edited: Matt J on 25 Nov 2022
If you have one single file, the reading and saving of the file should probably happen outside the loop. Use the parfor loop to loop over sections of the data and keep them in MATLAB memory until you are ready to save all of the results.
A = Read a csv file;
parfor i=1:10^6
    A(i,:) = perform some operations on A(i,:);
end
A = save the performed operations;
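The pattern above could be sketched as a runnable script along the following lines. This is only an illustration: the temporary file name, the sample data, and the doubling of each row are assumptions standing in for the real CSV file and the real per-row operation.

```matlab
% Illustrative sketch: csvFile and the "2 * row" operation are placeholders.
csvFile = [tempname '.csv'];      % hypothetical stand-in for the real CSV file
writematrix(magic(4), csvFile);   % sample data so the sketch runs on its own

A = readmatrix(csvFile);          % read the file once, before the loop
parfor i = 1:size(A, 1)
    A(i, :) = 2 * A(i, :);        % placeholder for the real operation on row i
end
writematrix(A, csvFile);          % write the results once, after the loop
```

Because each iteration only touches row i, A is a sliced variable and the iterations stay independent. Without the Parallel Computing Toolbox the parfor loop simply runs serially.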

3 comments

Matt J on 25 Nov 2022
Edited: Matt J on 25 Nov 2022
It may be worth elaborating on the operations that your loop is performing, so we can see if there are opportunities for vectorization.
Well, it is difficult to show my code as it is long, but I will explain (I believe you do not need to know more than this).
The problem is that I have a very complex minimization problem that none of the MATLAB solvers can solve (in the vicinity of any feasible solution there are infinitely many feasible and infeasible solutions). I am therefore using a sophisticated algorithm called 'Grey-Wolf Optimizer' (GWO). GWO can solve my problem, but sometimes it gets trapped (stagnation). The workaround is to re-run it several times, which is time-consuming. I therefore wish to run it several times at once using several workers.
My optimization never stops (I set Iter_no = 10^9). Whenever there is a better solution, it appears in the command window. However, it might get trapped, and then I have to stop the code and re-run it in the hope that it does not get trapped again. Now I want to do the same job using multiple workers to save time.
Of course, each worker can save its results in a separate csv file, and at the end I can check which file has the better result. However, this is not what I want. What I want is to do exactly what I did for a single worker (no parfor): I would like to see each new update from all workers in the command window. I do not care about the order at all.
How to do this? I would need a single csv file (let's call it Results.csv), use parfor, and send it to, say, 10 workers. More details are shown below:
parfor n=1:10
    % Run the optimizer
    for iteration = 1:10^9
        % .................
        % foundBetter is a placeholder for the test "there is a better solution".
        % Call the solution BetterS (a vector) and its objective value BetterObj (a scalar).
        if foundBetter
            fileID = fopen('Results.csv');
            A = str2double(strsplit(fgetl(fileID), ','));
            fclose(fileID);
            A = [A; [BetterS BetterObj]];
            A = sortrows(A, size(A,2), 'descend');   % sort by the objective (last column)
            A = A(end,:);                            % keep the row with the smallest objective
            disp('Estimated parameters : ');
            disp(num2str(A))
            writematrix(A, 'Results.csv');
        end
    end
end
I hope there is a way to do this!
Matt J on 25 Nov 2022
Edited: Matt J on 25 Nov 2022
I don't think you should be using files to store and retrieve optimization results. I would structure the loops like this,
I=1000;   % number of batches
J=300;    % optimizations per batch
bestVal=inf;
bestSolution=[];
for i=1:I %Loop over batches
    s=repmat(struct('Value',nan,'Solution',nan),1,J);   % reset the batch results
    parfor j=1:J %Do a batch of optimizations in parallel
        [x,fval,exitflag]=Run the optimizer   % pseudocode: call the optimizer here
        if exitflag<0 %optimization failed
            continue
        end
        s(j).Value=fval;
        s(j).Solution=x;
    end
    [minf,k]=min([s.Value]);   % min ignores the NaN entries of failed runs
    if minf<bestVal
        bestVal=minf;
        bestSolution=s(k).Solution;
    end
end


I suggest that you switch to using parfeval(). Approximately:
while you haven't gotten tired of it all
    while number of active workers is less than number of cores
        use parfeval() to create a new worker, passing in a different initial condition
    end
    wait for a worker to finish, using a timeout
    if any worker has been active longer than you want, cancel() the worker, end
    if any workers have finished, fetch their results and update the notion of best, end
end
when you get tired of it all, cancel all remaining workers
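A simplified, runnable variant of this outline might look like the sketch below. It is not the full scheme: it submits all runs up front and lets the pool queue them, and it uses one overall time budget instead of per-worker time limits. The function names, the parameters, and the toy objective x^4 - 3x^2 + x are hypothetical, and it assumes R2021b or newer for backgroundPool (with the Parallel Computing Toolbox, a pool from gcp could be passed to parfeval instead).

```matlab
function [bestObj, bestSol] = parfevalRestartSketch(nStarts, budgetSeconds)
% Hedged sketch of multi-start optimization with parfeval; all names are hypothetical.
starts = linspace(-2, 2, nStarts);            % one initial condition per run
futures(1:nStarts) = parallel.FevalFuture;    % preallocate the array of futures
for k = 1:nStarts
    futures(k) = parfeval(backgroundPool, @toyDescent, 2, starts(k));
end
bestObj = inf;
bestSol = [];
nDone = 0;
t0 = tic;
while nDone < nStarts && toc(t0) < budgetSeconds
    [idx, x, f] = fetchNext(futures, 1);      % wait at most 1 s for the next result
    if isempty(idx)
        continue                              % timeout: nothing new has finished
    end
    nDone = nDone + 1;
    if f < bestObj                            % update the notion of "best"
        bestObj = f;
        bestSol = x;
        fprintf('Run %d improved the best objective to %g\n', idx, f);
    end
end
cancel(futures);                              % stop anything still queued or running
end

function [x, f] = toyDescent(x0)
% Toy stand-in for the optimizer: gradient descent on x^4 - 3x^2 + x,
% which has a local minimum near 1.13 and a global minimum near -1.30.
x = x0;
for it = 1:2000
    x = x - 0.01 * (4*x^3 - 6*x + 1);
end
f = x^4 - 3*x^2 + x;
end
```

Calling `[bestObj, bestSol] = parfevalRestartSketch(8, 60)` prints a line each time a finished run beats the current best. Runs that start on the right get trapped in the local minimum, which is the situation the restarts are meant to work around.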

1 comment

Hi Walter,
This is a nice approach, indeed! (I did not know about things like parfeval before.)
However, the problem is that in my optimization problem I do not set any termination criterion (it runs forever). The reason is that my problem is very complex, and it is difficult to know how much time the optimization solver needs; such things are problem-specific. Therefore, these are things that a 'human' should check rather than a 'machine' (I am not saying this is impossible to code, but I think that at the current state of knowledge about meta-heuristic search algorithms it would be difficult). Sometimes I get a solution rather fast (if stagnation does not occur); sometimes it is the opposite. It is difficult to tell a code to stop when it stagnates, as the concepts of 'slow' and 'fast' are relative (sure, for a single optimization problem I can come up with an approximate measure of slowness or fastness, but my code should solve any generic problem).
So, for me, if there is no way to see the best result (of all workers) in the command window, this is not useful. All the solutions proposed so far assume that there is a 'termination criterion' for a worker, and this is the bottleneck that prevents observing the best outcome in the command window while all workers are still working (actually, they never finish).
At this point, I admit that I cannot solve my problem using a single csv file. Therefore, I will use several csv files (one per worker).
Thanks a lot!
Babak
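The several-files fallback described above could be sketched as follows. This is a minimal illustration under stated assumptions: the output folder, the Results_<n>.csv naming, the worker count, and the toy objective are all hypothetical, and the short inner loop stands in for the never-ending optimizer.

```matlab
% Sketch only: file names, worker count, and the toy objective are assumptions.
nWorkers = 4;
outDir = tempname;                 % hypothetical output folder
mkdir(outDir);

parfor n = 1:nWorkers
    resultFile = fullfile(outDir, sprintf('Results_%d.csv', n));
    bestObj = inf;
    for iteration = 1:20           % stands in for the very long optimizer loop
        x = [n 1] / iteration;     % toy candidate solution
        obj = sum(x.^2);           % toy objective value
        if obj < bestObj
            bestObj = obj;
            writematrix([x obj], resultFile);   % each worker overwrites only its own file
        end
    end
end

% Merge step: read every worker's file and keep the row with the smallest objective.
rows = zeros(nWorkers, 3);
for n = 1:nWorkers
    rows(n, :) = readmatrix(fullfile(outDir, sprintf('Results_%d.csv', n)));
end
[~, k] = min(rows(:, end));
best = rows(k, :);
disp('Estimated parameters : ');
disp(num2str(best))
```

Since no two workers ever write to the same file, there is no contention between them. The merge step is separate from the parfor loop, so in principle it could be rerun from another MATLAB session while the workers are still going, with the caveat that a read might catch a file in the middle of being rewritten.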

