Parsing or regexp HTML output from urlread
4 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
I need to extract the PubMed IDs from the below HTML, but I am not too fluent in the use of regexp.
Can anyone help with how I would extract the IDs from the below HTML, and store them in a vector?
I'm guessing there is some way to say: what is between '<Id>' and '</Id>' store in...
version="1.0" ? eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd" eSearchResult<Count>8</Count><RetMax>8</RetMax><RetStart>0</RetStart><IdList> href = "Id>16123227</Id">Id>16123227</Id</a href = "Id>9561342</Id">Id>9561342</Id</a href = "Id>8429296</Id">Id>8429296</Id</a href = "Id>1408722</Id">Id>1408722</Id</a href = "Id>2152845</Id">Id>2152845</Id</a href = "Id>2894889</Id">Id>2894889</Id</a href = "Id>2860133</Id">Id>2860133</Id</a href = "Id>6145799</Id">Id>6145799</Id</a /IdList<TranslationSet/><TranslationStack> TermSet Term"ulcerative colitis"[All Fields]</Term> href = "Field>All">Fields</Field</a href = "Count>33249</Count">Count>33249</Count</a href = "Explode>N</Explode">Explode>N</Explode</a /TermSet TermSet Term"Clonidine"[All Fields]</Term> href = "Field>All">Fields</Field</a href = "Count>16458</Count">Count>16458</Count</a href = "Explode>N</Explode">Explode>N</Explode</a /TermSet href = "OP>AND</OP">OP>AND</OP</a /TranslationStack<QueryTranslation>"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</QueryTranslation></eSearchResult>
0 commentaires
Réponse acceptée
Tom
le 24 Juin 2013
str = 'version="1.0" ? eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd" eSearchResult880<IdList> Id>16123227</Id Id>9561342</Id Id>8429296</Id Id>1408722</Id Id>2152845</Id Id>2894889</Id Id>2860133</Id Id>6145799</Id /IdList<TranslationSet/><TranslationStack> TermSet Term"ulcerative colitis"[All Fields]</Term> Fields</Field Count>33249</Count Explode>N</Explode /TermSet TermSet Term"Clonidine"[All Fields]</Term> Fields</Field Count>16458</Count Explode>N</Explode /TermSet OP>AND</OP /TranslationStack"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</eSearchResult>';
%isolate the ID list string
IDList = regexp(str,'(?<=IdList>).*(?=/IdList)','match');
disp(IDList{1})
%get the ID numbers from the string
IDno = textscan(IDList{1},'Id>%d</Id');
disp(IDno{1})
Plus de réponses (1)
Voir également
Catégories
En savoir plus sur Characters and Strings dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!