Effacer les filtres
Effacer les filtres

Parsing or regexp HTML output from urlread

2 vues (au cours des 30 derniers jours)
Philip  Spratt
Philip Spratt le 24 Juin 2013
I need to extract the PubMed IDs from the below HTML, but I am not too fluent in the use of regexp.
Can anyone help with how I would extract the IDs from the below HTML, and store them in a vector?
I'm guessing there is some way to say: what is between '<Id>' and '</Id>' store in...

Réponse acceptée

Tom
Tom le 24 Juin 2013
str = 'version="1.0" ? eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd" eSearchResult880<IdList> Id>16123227</Id Id>9561342</Id Id>8429296</Id Id>1408722</Id Id>2152845</Id Id>2894889</Id Id>2860133</Id Id>6145799</Id /IdList<TranslationSet/><TranslationStack> TermSet Term"ulcerative colitis"[All Fields]</Term> Fields</Field Count>33249</Count Explode>N</Explode /TermSet TermSet Term"Clonidine"[All Fields]</Term> Fields</Field Count>16458</Count Explode>N</Explode /TermSet OP>AND</OP /TranslationStack"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</eSearchResult>';
%isolate the ID list string
IDList = regexp(str,'(?<=IdList>).*(?=/IdList)','match');
disp(IDList{1})
%get the ID numbers from the string
IDno = textscan(IDList{1},'Id>%d</Id');
disp(IDno{1})
  1 commentaire
Philip  Spratt
Philip Spratt le 25 Juin 2013
Very much appreciated Tom
Thanks!

Connectez-vous pour commenter.

Plus de réponses (1)

Sean de Wolski
Sean de Wolski le 24 Juin 2013
  1 commentaire
Philip  Spratt
Philip Spratt le 24 Juin 2013
Must apologise, the output was HTML, hence the xml2struct didn't work.

Connectez-vous pour commenter.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by