Searching for a given line in a text file

Ram on 28 Feb 2011
The file is a text file in SDF (structure-data file) format, which holds chemical structures. It looks something like this:
7 9 1 0 0 0 0
7 14 1 0 0 0 0
8 10 1 0 0 0 0
8 15 1 0 0 0 0
9 10 2 0 0 0 0
9 16 1 0 0 0 0
10 17 1 0 0 0 0
12 13 1 0 0 0 0
13 18 1 0 0 0 0
13 19 1 0 0 0 0
13 20 1 0 0 0 0
M END
> <PUBCHEM_COMPOUND_CID>
2244
> <PUBCHEM_COMPOUND_CANONICALIZED>
1
> <PUBCHEM_CACTVS_COMPLEXITY>
212
I need to extract only the value under the CID field (> <PUBCHEM_COMPOUND_CID>), and there can be multiple CID fields in a single file. How should I go about this? Any help would be appreciated.

Accepted Answer

Ram on 1 Mar 2011
I tried something like this:
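[A,B]=uigetfile('*.sdf','Select an SDF file');   % A = file name, B = folder
C=fopen(fullfile(B,A),'r');                       % open via the full path, not just the name
i=input('Number of structures: ');                % number of structures, obtained from the user
pubchem_id=[];
z=i*300;                 % rough approximation: 300 lines for each structure
for j=1:z
    D=fgetl(C);
    if ~ischar(D)        % fgetl returns -1 at end of file, so stop reading there
        break;
    end
    if strcmp('> <PUBCHEM_COMPOUND_CID>',strtrim(D))
        E=fgetl(C);      % the CID value is on the next line
        E=str2double(E);
        pubchem_id=[pubchem_id; E];
    end
end
fclose(C);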
and it worked :)
2 Comments
David Young on 1 Mar 2011
The for loop that reads only 300 lines per structure is a hostage to fortune: what if a structure has more than 300 lines? You could avoid this with a while loop that keeps reading until it either finds a particular line or reaches the end of the file, which would be far more robust.
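A minimal sketch of the while-loop approach described above, assuming (as in the sample in the question) that the CID value sits on the line directly after the > <PUBCHEM_COMPOUND_CID> tag. The loop relies on fgetl returning -1 instead of a char array at the end of the file, so neither a line count nor a structure count is needed. The temporary file and its toy contents (including the second CID value, which is arbitrary) are made up only so the snippet runs on its own; point fname at your own file instead.

```matlab
% Toy SDF tail for demonstration only; the second CID value is arbitrary.
fname = [tempname '.sdf'];
fid = fopen(fname, 'w');
fprintf(fid, ['M  END\n> <PUBCHEM_COMPOUND_CID>\n2244\n\n$$$$\n' ...
              'M  END\n> <PUBCHEM_COMPOUND_CID>\n1000\n\n$$$$\n']);
fclose(fid);

% Read line by line until fgetl signals end of file, keeping the line after each CID tag.
fid = fopen(fname, 'r');
pubchem_id = [];
tline = fgetl(fid);
while ischar(tline)                    % fgetl returns -1 (a number) at end of file
    % strtrim tolerates trailing spaces or a carriage return from Windows line endings
    if strcmp(strtrim(tline), '> <PUBCHEM_COMPOUND_CID>')
        value = fgetl(fid);            % the CID sits on the following line
        if ischar(value)
            pubchem_id(end+1, 1) = str2double(strtrim(value)); %#ok<AGROW>
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
disp(pubchem_id)
```

The loop stops on its own when the file runs out, however many structures or lines per structure the file contains.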
Ram on 4 Mar 2011
I didn't use a while loop because there is nothing in an SDF file that marks the end of the file. For instance, $$$$ marks the end of each structure, and there can be multiple $$$$ lines depending on the number of structures. A structure has about 180 lines on average, so 300 is already generous, and when a structure has more than 300 lines, the excess is compensated by the structures that have fewer than 300.
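The $$$$ delimiter mentioned here can itself drive the parsing. A sketch, assuming the whole file fits in memory: read it with fileread, split it into one record per structure on $$$$, and pull the CID out of each record with a regular expression. This also counts the structures, so the user does not have to enter that number, and it stores NaN for any structure that has no CID field. The temporary file and its two toy records are made up for demonstration.

```matlab
% Toy SDF content for demonstration only; the second record has no CID field.
fname = [tempname '.sdf'];
fid = fopen(fname, 'w');
fprintf(fid, ['M  END\n> <PUBCHEM_COMPOUND_CID>\n2244\n\n$$$$\n' ...
              'M  END\n> <PUBCHEM_CACTVS_COMPLEXITY>\n212\n\n$$$$\n']);
fclose(fid);

txt = fileread(fname);
delete(fname);

% Each structure ends with a $$$$ line, so split the text on that delimiter.
records = strsplit(txt, '$$$$');
% Drop the whitespace-only piece that follows the last $$$$.
records = records(~cellfun(@(r) isempty(strtrim(r)), records));
nStructures = numel(records);

cid = nan(nStructures, 1);            % NaN where a structure has no CID field
for k = 1:nStructures
    % Capture the digits on the line after the CID tag; [^\n]* absorbs trailing spaces or \r.
    tok = regexp(records{k}, '> <PUBCHEM_COMPOUND_CID>[^\n]*\n\s*(\d+)', 'tokens', 'once');
    if ~isempty(tok)
        cid(k) = str2double(tok{1});
    end
end
fprintf('%d structures found\n', nStructures);
disp(cid)
```

Keeping one output row per structure (rather than one per CID found) makes it easy to line the CIDs up with other per-structure results later.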


More Answers (1)

Walter Roberson on 28 Feb 2011
There is not much you can do except read through the file with fgetl() until you reach the M END line, and do the extraction work from there. How easy the extraction is after that depends on how regular the data is and on which fields you are interested in.
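A sketch of this suggestion, assuming the usual SDF layout: skip each structure's atom/bond block until the M END line, then treat every line that starts with > and contains <NAME> as a data header and the non-blank lines below it as that field's value, until $$$$ ends the structure. Fields are collected in a containers.Map per structure, so any field, not only the CID, can be looked up afterwards. The M END test accepts one or more spaces because the sample in the question shows a single space while SDF files normally use two. The temporary file and its contents are toy data for demonstration.

```matlab
% Toy SDF record for demonstration only (bond lines taken from the sample in the question).
fname = [tempname '.sdf'];
fid = fopen(fname, 'w');
fprintf(fid, ['13 18 1 0 0 0 0\n13 19 1 0 0 0 0\nM  END\n' ...
              '> <PUBCHEM_COMPOUND_CID>\n2244\n\n' ...
              '> <PUBCHEM_CACTVS_COMPLEXITY>\n212\n\n$$$$\n']);
fclose(fid);

records = {};               % one containers.Map (field name -> value text) per structure
data = containers.Map();
inData = false;             % becomes true once the M END line has been passed
fieldName = '';
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    t = strtrim(tline);
    if strcmp(t, '$$$$')                        % end of a structure: store it and reset
        records{end+1} = data; %#ok<AGROW>
        data = containers.Map();
        inData = false;
        fieldName = '';
    elseif ~inData
        % skip the atom/bond block; accept one or more spaces between M and END
        inData = ~isempty(regexp(t, '^M\s+END$', 'once'));
    else
        tok = regexp(t, '^>.*<([^>]+)>', 'tokens', 'once');
        if ~isempty(tok)                        % data header such as > <PUBCHEM_COMPOUND_CID>
            fieldName = tok{1};
            data(fieldName) = '';
        elseif ~isempty(t) && ~isempty(fieldName)
            % value line(s) under the current header; multi-line values are joined with spaces
            data(fieldName) = strtrim([data(fieldName) ' ' t]);
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

rec1 = records{1};
cid = str2double(rec1('PUBCHEM_COMPOUND_CID'));
disp(cid)
```

Parsing every field this way costs a little more than matching the CID tag alone, but the same loop then serves for any other PUBCHEM_* field.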
1 Comment
Ram on 1 Mar 2011
Thank you so much :) I built my code based on your reply.

