Searching NCBI by accession number

Elissa Moller on 8 June 2020
I am trying to search the NCBI database using accession numbers instead of GI numbers. The example function is:
function mapTaxoFile(taxonomyFilenameIn, taxonomyFilenameOut, blockSize)
%MAPTAXOFILE Helper function for METAGENOMICDEMO
% Copyright 2007-2016 The MathWorks, Inc.
fid1 = fopen(which(taxonomyFilenameIn),'rt'); % from NCBI TAXONOMY FTP site
if fid1<0
    error('bioinfo:mapTaxoFile:invalidFile','Cannot open input file.')
end
fid2 = fopen(taxonomyFilenameOut, 'w'); % binary file used for mapping
%=== create a map between gi numbers and taxids
curr = 1; % current gi to consider
while(~feof(fid1))
    data = textscan(fid1, '%d %d', blockSize);
    gi = data{1};
    taxa = data{2};
    gap = gi(1) - curr;
    %=== missing gi numbers between blocks are assigned a taxid = -1
    if gap
        D = -1 * ones(gap, 1);
        fwrite(fid2, D, 'int32');
    end
    %=== populate array D such that D(gi) = taxid of gi
    curr = gi(end) + 1; % current gi position in the final list
    offset = min(gi) - 1; % starting gi in the current block
    N = max(gi) - offset; % number of gi's to consider
    D = -1 * ones(N,1);
    D(gi - offset) = taxa;
    %=== write array D into binary file
    fwrite(fid2, D, 'int32');
end
fclose all;
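To make the indexing step concrete, here is the block-filling logic traced on a toy block. The GI numbers and taxids below are made up for illustration and are not taken from the NCBI file. The point is that the GI number itself serves as the array index, which is why this scheme depends on numeric keys.

```matlab
% Toy block (hypothetical values) showing how D(gi) = taxid is built.
gi   = int32([3; 5; 6]);            % GI numbers read in one block
taxa = int32([9606; 562; 10090]);   % matching taxids
curr = 1;                           % next GI position expected in the output

gap  = gi(1) - curr;                % GIs 1 and 2 are missing before this block
lead = -1 * ones(gap, 1);           % they are assigned taxid = -1

offset = min(gi) - 1;               % 2
N = max(gi) - offset;               % 4 slots, covering GIs 3..6
D = -1 * ones(N, 1);
D(gi - offset) = taxa;              % slots 1, 3, 4 filled; slot 2 (GI 4) stays -1

mapped = [lead; D];                 % contents of the binary file so far
```

Here `mapped(k)` is the taxid of GI `k`, or -1 where no entry exists. An accession such as `A00002.1` cannot be used as an index in this way, so a different layout is needed for accession keys.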
I successfully got the data-reading section to run with the following code:
data = textscan(fid1, '%s %s %s %s', blockSize, 'HeaderLines', 1);
accession = data{1,2};
taxa = data{1,3};
However, when I get to the part that populates the array, the input is a mixture of numbers and letters, so functions like max and min will not work. Is there another way to do this? I want to make sure each block is read starting from the correct point, and eventually to save the result in a memory map. The file is massive, so I don't want to load it all at once.
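One possible direction, as a minimal sketch rather than a tested solution: since an accession cannot serve as an array index, each row could be written as a fixed-width record (padded accession bytes followed by an int32 taxid) and the result mapped with memmapfile. The file names, the record width W, the tab delimiter, and the three sample rows are assumptions made for this example. The column order (accession, accession.version, taxid, gi) follows the indexing used above. As far as I understand textscan, each call resumes from the current file position, and 'HeaderLines', 1 inside a loop would skip one line at the start of every block, so the header is skipped once with fgetl here.

```matlab
% Sketch: accession2taxid-style text -> fixed-width binary records -> memory map.
W = 20;                                        % assumed maximum accession length
inFile  = fullfile(tempdir, 'sample.accession2taxid');   % hypothetical name
outFile = fullfile(tempdir, 'acc2taxid.bin');            % hypothetical name

% Tiny stand-in for the real file (made-up rows).
fid = fopen(inFile, 'wt');
fprintf(fid, 'accession\taccession.version\ttaxid\tgi\n');
fprintf(fid, 'A00001\tA00001.1\t10641\t58418\n');
fprintf(fid, 'A00002\tA00002.1\t9913\t2\n');
fprintf(fid, 'X17276\tX17276.1\t9646\t3\n');
fclose(fid);

blockSize = 2;
fid1 = fopen(inFile, 'rt');
fid2 = fopen(outFile, 'w');
fgetl(fid1);                                   % skip the header line once
while ~feof(fid1)
    % Each call continues where the previous block stopped.
    data = textscan(fid1, '%s %s %d %d', blockSize, 'Delimiter', '\t');
    acc  = data{2};                            % accession.version strings
    taxa = data{3};                            % int32 taxids
    for k = 1:numel(acc)
        rec = zeros(1, W, 'uint8');            % zero-padded fixed-width key
        n = min(numel(acc{k}), W);
        rec(1:n) = uint8(acc{k}(1:n));
        fwrite(fid2, rec, 'uint8');
        fwrite(fid2, taxa(k), 'int32');
    end
end
fclose(fid1);
fclose(fid2);

% Map the records without loading the whole file into memory.
m = memmapfile(outFile, 'Format', ...
    {'uint8', [1 W], 'acc'; 'int32', [1 1], 'taxid'});

% Look up one accession with a linear scan over the mapped records.
query = 'A00002.1';
key = zeros(1, W, 'uint8');
key(1:numel(query)) = uint8(query);
hit = 0;                                       % 0 means "not found" here
for k = 1:numel(m.Data)
    if isequal(m.Data(k).acc, key)
        hit = m.Data(k).taxid;
        break
    end
end
```

Writing one record at a time keeps the sketch short but would likely be slow on the full file, so assembling each block into a single byte matrix before one fwrite call is probably worthwhile. The linear scan is likewise only for illustration. If the records were sorted by accession, a binary search over the mapped records would avoid scanning the whole file, though whether the NCBI file is already sorted is not something assumed here.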

Answers (0)

Release

R2020a