Big Data analysis using MATLAB and a database connection: is it possible?

Hi,
I have code that analyzes a large amount of data in a database, as follows:
for i=1:N %N is large
A=exec(con,query); % retrieving ONLY one row of data from the database
A=fetch(A);
A=A.data;
... %other calculations
end
It always stalls [GC/memory error] after a few hours. I have already tried both of these methods to get around it:
  1. The java.opts solution
  2. The JheapCl solution [garbage collector]
But I still have no luck. I have already increased my Java heap memory to 8 GB, and my machine has plenty of memory (32 GB). Furthermore, I retrieve only one row per iteration and reuse the same variable, so I don't think the cause is insufficient memory.
Can anybody help me with this issue? Any help will be greatly appreciated. Thanks.

Accepted Answer

Yair Altman on 15 Oct 2013
Edited: Yair Altman on 15 Oct 2013
1. Try to disconnect from the DB every now and then, ensuring that all references are explicitly cleared so that there are no dangling references that cannot be GC'ed. After disconnecting and before you reconnect, perform a manual GC (JheapCl makes this easy, or run the simpler java.lang.System.gc).
2. Try to fetch data in bulk rather than one row at a time. This should improve performance as well as reduce resource usage.
3. Ensure that you're connecting to the DB directly via JDBC rather than through an ODBC bridge.
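The three suggestions above might be combined roughly as follows. This is only a sketch, assuming the older cursor-based Database Toolbox API (database, exec, fetch, close) that the question uses. The driver, URL, table name, credentials, block size and reconnect interval are all placeholders, and the LIMIT/OFFSET syntax assumes a MySQL-style database. Closing each cursor after use is an extra precaution that is not mentioned in the thread.

```matlab
% Sketch only: bulk fetch + explicit cleanup + periodic reconnect.
% All names and connection details below are hypothetical placeholders.
dbName = 'mydb'; usr = 'user'; passwd = 'secret';
driver = 'com.mysql.jdbc.Driver';             % direct JDBC driver (assumed MySQL)
url    = 'jdbc:mysql://localhost:3306/mydb';  % hypothetical JDBC URL
N = 1e7; blockSize = 1000; reconnectEvery = 10;

con = database(dbName, usr, passwd, driver, url);
nBlocks = ceil(N / blockSize);
for k = 1:nBlocks
    offset = (k - 1) * blockSize;
    query = sprintf('SELECT * FROM mytable ORDER BY id LIMIT %d OFFSET %d', ...
                    blockSize, offset);
    curs = exec(con, query);   % run the query for one block of rows
    curs = fetch(curs);        % pull the whole block in one call
    data = curs.Data;          % copy the result into a MATLAB variable
    close(curs);               % release the cursor and its JDBC resources
    clear curs                 % drop the MATLAB reference to the cursor

    % ... process data ...

    if mod(k, reconnectEvery) == 0
        close(con); clear con           % disconnect and drop the reference
        java.lang.System.gc();          % request a JVM garbage collection
        con = database(dbName, usr, passwd, driver, url);  % reconnect
    end
end
close(con);
```

Whether this avoids the stall depends on where the leak actually is; the sketch only makes sure that no cursor or connection object outlives the iteration that created it.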
Yair Altman
  1 Comment
Taufik Sutanto on 16 Oct 2013
Thank you for the suggestions. I connect directly using a JDBC connection. I tried retrieving 1000 rows per query and closing the connection after every 10 queries. Between closing and re-opening the connection, I tried to clear the heap and even the workspace. I tried the following:
NN=ceil(N/blok_data);
for i=1:NN
a_function(con1,x,y,z,start,blok_data);
%the function retrieves data from the database & processes it
start=start+blok_data;
if mod(i,10)==0
close(con1); clear con1; warning off;
save('c:\tmp\RBCC_Data.mat');
clear all; clear java; jheapcl();warning on;
load('c:\tmp\RBCC_Data.mat');
con1=connection_function(usr,db,passwd);
end
end
It helps delay the error, but MATLAB still stops responding after about 5 million calls. [I am hoping it can reach at least 10 million calls.]
By the way, do you know why, even after I close the connection and clear the connection variable, I get the warning message "the Jdbc not serializable" when trying to reconnect? [Hence the "warning off" line in the code above.]
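One possible explanation, which the thread does not confirm, is that the warning comes from save trying to write Java objects that are still in the workspace. If that is the case, saving only the plain-data variables by name sidesteps it. A minimal sketch, with a hypothetical file name and example values:

```matlab
% Sketch only: persist selected plain-data variables instead of the whole
% workspace, so no Java objects (connections, cursors) are passed to save.
stateFile = fullfile(tempdir, 'loop_state.mat');  % hypothetical file location
start = 1; blok_data = 1000;                      % example values only

save(stateFile, 'start', 'blok_data');  % writes just these two variables
clear start blok_data                   % drop them from the workspace
java.lang.System.gc();                  % request a JVM garbage collection

S = load(stateFile);                    % restore into a struct
start = S.start;
blok_data = S.blok_data;
```

This also avoids clear all, which removes far more than the loop needs to release.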
