How to read, modify and write an XML file (in its original format)?

The purpose of trying to automate this step is to effortlessly work through over 1000 files. My intended workflow is:
  1. READ an XML file into a MATLAB struct using readstruct (xml2struct gives a similar output). I don't want to use xmlread, since manipulating data through the DOM becomes extremely cumbersome given the mild complexity of the documents I'm working with.
  2. MODIFY the contents of this Struct
  3. WRITE this struct to a (renamed) XML file with the same content structure as the original XML file.
Each file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<Profile xmlns="ABC.XYZ">
<Start>
"Content A"
</Start>
<Abra>
"Content B"
</Abra>
<Abra>
"Content C"
</Abra>
<Cadabra>
"Content D"
</Cadabra>
<Abra>
"Content E"
</Abra>
<Cadabra>
"Content F"
</Cadabra>
<Abra>
"Content G"
</Abra>
<Cadabra>
"Content H"
</Cadabra>
<End>
"Content I"
</End>
</Profile>
When I import this into MATLAB, I get a structure that looks like:
structDoc = readstruct("ABC.xml")
% structDoc is a 1x6
% structDoc.xmlnsAttribute = "ABC.XYZ"
% structDoc.Start is a 1x1 Struct (With Content A inside it)
% structDoc.Abra is a 1x4 Struct (With Content B,C,E and G inside it)
% structDoc.Cadabra is a 1x3 Struct (With Content D,F and H inside it)
% structDoc.End is a 1x1 Struct (With Content I inside it)
After modifying the contents of some of the nodes, or even manipulating the Abra or Cadabra struct arrays, I wish to export this structure to an XML file with the SAME element order, i.e., Start > Abra > Abra > Cadabra > Abra > Cadabra > Abra > Cadabra > End
Instead, the XML file I generated looks like Start > Abra > Abra > Abra > Abra > Cadabra > Cadabra > Cadabra > End
PS: I used both writestruct and struct2xml
Does anyone know of an alternative?

7 comments

Since the reading seems to merge the different instances already, you may need to use dummy names to make sure that structs are not merged. The writing should retain the original order as far as I understand the documentation.
You might need to write your own reader.
Do you mean using something like Abra1, Abra2, Abra3, Abra4, etc.?
Since you need to prevent overlap, you would need to use something like that, yes.
Okay, perhaps this is my only solution then. Do you have an idea of how to parse the XML (in or outside MATLAB) so that it is read sequentially? Or would I have to do this manually?
I don't know of any way myself. You could try one of the XML parsers on the FEX; there are probably several. It may be easier to modify one of those to generate unique field names.
As for the part of reading your file: you can probably use fileread and get most of the way there, but if you want to deal with rich text you need something more robust. You can get my readfile function from the FEX. If you are using R2017a or later, you can also get it through the AddOn-manager. For R2020b and later you can also use readfile=@(varargin)cellstr(readlines(varargin{:})) (note that readlines has slightly different optional parameters).
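For anyone taking the "outside MATLAB" route asked about above, the sequential read with unique names (Abra1, Abra2, ...) might be sketched in Python roughly as follows. This is only an illustration under assumptions, not a tested solution for the real files. The helper names `local_name` and `index_children` are made up for this example, and `SAMPLE` is a condensed copy of the file from the question with each element's text placed on one line.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = "ABC.XYZ"  # namespace taken from the example file in the question
ET.register_namespace("", NS)  # write it back as the default xmlns, not as a prefix

# Condensed copy of the example document (assumption: text kept on one line)
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Profile xmlns="ABC.XYZ">
<Start>"Content A"</Start>
<Abra>"Content B"</Abra>
<Abra>"Content C"</Abra>
<Cadabra>"Content D"</Cadabra>
<Abra>"Content E"</Abra>
<Cadabra>"Content F"</Cadabra>
<Abra>"Content G"</Abra>
<Cadabra>"Content H"</Cadabra>
<End>"Content I"</End>
</Profile>"""


def local_name(elem):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return elem.tag.split("}", 1)[-1]


def index_children(root):
    """Map unique keys (Abra1, Abra2, ...) to child elements in document order."""
    counts = Counter()
    indexed = {}
    for child in root:  # iteration follows the order in the file
        name = local_name(child)
        counts[name] += 1
        indexed[f"{name}{counts[name]}"] = child
    return indexed


root = ET.fromstring(SAMPLE)
children = index_children(root)

# Modify one node through its unique key; the element stays where it was in the tree
children["Abra3"].text = '"Content E (edited)"'

# The tree was never regrouped by name, so the original sequence is preserved
order = [local_name(child) for child in root]
xml_out = ET.tostring(root, encoding="unicode")
print(" > ".join(order))
```

Because the edits are made on the elements in place, the interleaved Abra/Cadabra sequence is kept on output. To save the result under a new name, something like `ET.ElementTree(root).write("ABC_modified.xml", encoding="UTF-8", xml_declaration=True)` should work, where the file name is just a placeholder.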
Quick update: I found a tool called xmltools on the FEX. It's actually pretty amazing. It does exactly this sort of sequential parsing, without needing any placeholders. I've tested it in read and write mode, and it seems to be fine now. Hope I don't hit any snags. Hehe. Thanks a lot though @Rik!! Your response gave me a clue on where to look :P
You're welcome, good luck.

Answers (0)

Version

R2020b

Commented: Rik on 22 Sep 2021
