What causes this inconsistency in XML parsing?

I have the following XML file:
<?xml version="1.0" encoding="utf-8"?>
<AddressBook>
<Entry>
<Name>Friendly J. Mathworker</Name>
<PhoneNumber>(508) 647-7000</PhoneNumber>
<Address hasZip="no" type="work">3 Apple Hill Dr, Natick MA</Address>
</Entry>
<Entry>
<Name>Joe P. Surname</Name>
<PhoneNumber>(334) 647-8898</PhoneNumber>
<Address hasZip="no" type="work">123 Addr Lane, Chicago NY</Address>
</Entry>
<Entry>
<Name>Mary Sue Lastname</Name>
<PhoneNumber>(508) 552-5698</PhoneNumber>
<Address hasZip="no" type="home">456 Another Addr, Boston OR</Address>
</Entry>
</AddressBook>
Which I capture in DOM form with
dom = xmlread(file)
xmlEl = dom.getDocumentElement();
There should be 3 first-level children, correct? But when I call
xmlEl.getLength();
the answer is 7!
It gets better: I then call
xmlwrite(dom);
After which xmlEl.getLength() returns 3! What is happening here? Any help would be appreciated.

4 comments

Guillaume on 9 Jul 2018
MATLAB defers the whole XML parsing to a Java implementation.
Unfortunately, its documentation is rather sparse and doesn't really explain how the DOM is parsed.
A lot of the child nodes are empty #text elements. No idea what that means.
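For anyone who wants to see these nodes directly, here is a minimal sketch that prints the name of every direct child of the root element and counts how many are real elements. The temporary file and its shortened three-entry content are stand-ins for the address book in the question, not the original file.

```matlab
% Write a shortened stand-in for the address book to a temporary file
% (file name and contents are illustrative assumptions).
xmlLines = {'<?xml version="1.0" encoding="utf-8"?>', '<AddressBook>', ...
    '<Entry><Name>A</Name></Entry>', ...
    '<Entry><Name>B</Name></Entry>', ...
    '<Entry><Name>C</Name></Entry>', '</AddressBook>'};
xmlFile = [tempname '.xml'];
fid = fopen(xmlFile, 'w');
fprintf(fid, '%s\n', xmlLines{:});
fclose(fid);

dom = xmlread(xmlFile);
root = dom.getDocumentElement();
children = root.getChildNodes();

% DOM node lists are zero-based, unlike MATLAB arrays.
numElements = 0;
for k = 0:children.getLength() - 1
    node = children.item(k);
    % Text nodes report the name '#text'; elements report their tag name.
    fprintf('%d: %s\n', k, char(node.getNodeName()));
    if node.getNodeType() == node.ELEMENT_NODE
        numElements = numElements + 1;
    end
end
delete(xmlFile);
```

Because there is a line break between each pair of tags, the loop should list whitespace-only #text nodes alternating with the Entry elements, while numElements counts only the entries.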
Amen to that. I've spent a few hours attempting to track down some sort of API for MATLAB's XML functionality, but it is surprisingly difficult to find any documentation. It seems like MW picked and chose what they wanted from the Xerces DOM controls without writing it down anywhere...
Do you have any suggestions on how to write a program that can handle this consistently? It would be an ugly hack to call xmlwrite in my function just to get the XML tree into a properly formatted state.
Guillaume on 9 Jul 2018
xml2struct seems to be able to handle that properly. From reading the code, it would appear that it just discards these #text empty children.
Thanks for the suggestion! It seems to work fine. Though it is a shame that a wrapper file has to be used for very simple information extraction. Would it be worth contacting MathWorks, or do they not guarantee any reliable performance for Java-based functions?
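If the goal is just to pull values out, one alternative to a wrapper is to ask the DOM for elements by tag name, which never returns text nodes. This is a rough sketch assuming the file layout from the question; the temporary file is only there to make the snippet self-contained. Note that getElementsByTagName searches all descendants, not just direct children, so it fits this file but not every schema.

```matlab
% Stand-in for the address book from the question (names only).
xmlLines = {'<?xml version="1.0" encoding="utf-8"?>', '<AddressBook>', ...
    '<Entry>', '<Name>Friendly J. Mathworker</Name>', '</Entry>', ...
    '<Entry>', '<Name>Joe P. Surname</Name>', '</Entry>', ...
    '<Entry>', '<Name>Mary Sue Lastname</Name>', '</Entry>', ...
    '</AddressBook>'};
xmlFile = [tempname '.xml'];
fid = fopen(xmlFile, 'w');
fprintf(fid, '%s\n', xmlLines{:});
fclose(fid);

dom = xmlread(xmlFile);

% Only element nodes named 'Entry' are returned; whitespace is skipped.
entries = dom.getElementsByTagName('Entry');
names = cell(1, entries.getLength());
for k = 0:entries.getLength() - 1
    entry = entries.item(k);
    nameList = entry.getElementsByTagName('Name');
    nameNode = nameList.item(0);
    % Convert the Java string to a MATLAB char array.
    names{k + 1} = char(nameNode.getTextContent());
end
delete(xmlFile);
```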


Accepted Answer

Nathan Jessurun on 17 Jul 2018


Per Guillaume's comment above (https://www.mathworks.com/matlabcentral/answers/409506-what-causes-this-inconsistency-in-xml-parsing#comment_587489), these extra children are text nodes, which hold the literal whitespace (such as line breaks) between elements in the DOM tree. The root element here has 3 Entry elements plus 4 whitespace-only text nodes around them, hence 7. Undocumented behavior in xmlwrite removes these whitespace nodes when printing, which alters the tree in place and yields the expected 3 children.
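To get the same child count without relying on the xmlwrite side effect, the whitespace-only text nodes can be removed explicitly. This is a minimal sketch: the temporary file and its shortened content are stand-ins, and the loop only cleans the direct children of the root, not deeper levels.

```matlab
% Shortened stand-in for the address book (illustrative contents).
xmlLines = {'<?xml version="1.0" encoding="utf-8"?>', '<AddressBook>', ...
    '<Entry><Name>A</Name></Entry>', ...
    '<Entry><Name>B</Name></Entry>', ...
    '<Entry><Name>C</Name></Entry>', '</AddressBook>'};
xmlFile = [tempname '.xml'];
fid = fopen(xmlFile, 'w');
fprintf(fid, '%s\n', xmlLines{:});
fclose(fid);

dom = xmlread(xmlFile);
root = dom.getDocumentElement();
children = root.getChildNodes();
numBefore = children.getLength();

% Walk backwards so that removing a node does not shift the
% (zero-based) indices that are still to be visited.
for k = children.getLength() - 1:-1:0
    node = children.item(k);
    if node.getNodeType() == node.TEXT_NODE && ...
            isempty(strtrim(char(node.getData())))
        root.removeChild(node);
    end
end

% The node list is live, so it reflects the removals.
numAfter = children.getLength();
delete(xmlFile);
```

Text nodes that contain actual content are left alone, since only nodes that are empty after strtrim are removed.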
