How to read the PRE html tags and replace some white spaces

Question


I read data from an HTML file; the data are delimited by the following tags:

<pre>
  12.0  29132  -60.3  -91.4      1   0.01    260         753.2  753.3  753.2
  10.0  30260  -57.9             1   0.01    260     58  802.4  802.5  802.4
  9.8   30387  -57.7  -89.7      1   0.01    261     61  807.8  807.9  807.8
  6.0   33631  -40.4  -77.4      1   0.17    260     88 1004.0 1006.5 1004.1
  5.9  33746  -40.3  -77.3       1   0.17               1009.2 1011.8 1009.3
   </pre>
using the following code:
t = regexp(html, '<PRE[^>]*>(.*?)</PRE>', 'tokens', 'ignorecase');

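where t is a cell array with one element per match; each element t{k} is itself a cell holding the captured text as a char array.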
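A minimal sketch of the extraction step, assuming the page holds a single PRE block; the html string below is a made-up stand-in for the downloaded page. With 'once', regexp returns the first match's token directly instead of a nested cell, 'ignorecase' lets the pattern match both <pre> and <PRE>, and 'dotall' is passed explicitly so that .*? can span several lines.

```matlab
% Hypothetical stand-in for the downloaded page (7-character columns).
nl = char(10);
html = ['<html><body><PRE>' nl ...
        '   12.0  29132  -60.3' nl ...
        ['   10.0  30260' blanks(7)] nl ...   % last field is empty
        '</PRE></body></html>'];

% First match only; the token comes back as a 1x1 cell holding a char array.
tok = regexp(html, '<pre[^>]*>(.*?)</pre>', 'tokens', 'once', 'ignorecase', 'dotall');
content = tok{1};

% One cell per line; drop the empty pieces left by the newlines
% directly after <PRE> and before </PRE>.
rows = strsplit(content, nl);
rows = rows(~cellfun(@isempty, rows));
```

Splitting into rows here keeps each line intact (trailing spaces included), which matters if the columns are later read by position.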

Now I am trying to replace the blank fields with NaN to obtain:

  12.0  29132  -60.3  -91.4      1   0.01    260    NaN  753.2  753.3  753.2
  10.0  30260  -57.9    NaN      1   0.01    260     58  802.4  802.5  802.4
   9.8  30387  -57.7  -89.7      1   0.01    261     61  807.8  807.9  807.8
   6.0  33631  -40.4  -77.4      1   0.17    260     88 1004.0 1006.5 1004.1
   5.9  33746  -40.3  -77.3      1   0.17    NaN    NaN 1009.2 1011.8 1009.3

In this data set the columns are not always separated by the same amount of white space, and I do not know the length of the white-space runs.

For example, the last line of my first data set has two "empty places" that I would like to replace with 'NaN'. The position of the elements must not change (I think the textscan function is risky here).

Do you have any suggestions? Or should I read the PRE tags another way?
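One way to keep every value in its position is to cut each line at fixed character offsets and convert each field with str2double, which returns NaN for a blank field. A minimal sketch, assuming right-aligned columns exactly W = 7 characters wide; W, nCols and the two sample rows are illustrative, not taken from the real file, so adjust them to the actual layout.

```matlab
% Assumption: every column is exactly W characters wide and right-aligned.
W = 7;
nCols = 5;
% Hypothetical sample rows; the second one has an empty 4th field
% (blanks(13) = one empty 7-character field + the 6 leading spaces of '      1').
rows = {['   12.0  29132  -60.3  -91.4' blanks(6) '1'];
        ['   10.0  30260  -57.9' blanks(13) '1']};

data = NaN(numel(rows), nCols);
for r = 1:numel(rows)
    % Pad short rows with spaces so that every row is nCols*W characters.
    s = [rows{r} blanks(max(0, nCols*W - numel(rows{r})))];
    for c = 1:nCols
        field = s((c-1)*W + 1 : c*W);   % characters of column c
        data(r, c) = str2double(field); % blank field -> NaN
    end
end
```

Because the fields are located by offset rather than by delimiter, a missing value simply yields NaN in its own column and never shifts the values to its right.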

Thank you


Answer 1

Cedric on 20 June 2014

Edited: Cedric on 21 June 2014


I've got to run, but here is one way (I'll come back later to discuss further if needed).

EDIT: the first solution would not work; I will update it as soon as I have more information.


Cedric on 21 June 2014

Edited: Cedric on 21 June 2014


OK, this is a table with fixed-width columns of 7 characters, so you can process it as follows:

regexprep( content, ' {7}', ' NaN' )

where content is the token output by your first call to REGEXP (e.g. content = t{1}{1}). If each line starts with 7 or more white spaces, e.g. because of HTML indentation, we can refine the pattern to exclude them. Just let me know.
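A minimal sketch of this approach applied line by line, assuming right-aligned 7-character columns and two hypothetical sample rows. Under that assumption a non-empty field has at most 6 leading spaces, so a run of 7 spaces can only come from an empty field; after the replacement every field is separated by white space, and the rows can be split and converted with str2double.

```matlab
% Hypothetical sample rows (right-aligned, 7-character columns);
% the second row has an empty 4th field.
rows = {['   12.0  29132  -60.3  -91.4' blanks(6) '1'];
        ['   10.0  30260  -57.9' blanks(13) '1']};

% Each run of 7 spaces stands for one empty field: replace it by ' NaN'.
filled = regexprep(rows, ' {7}', ' NaN');

% Fields are now white-space separated, so a plain split is safe.
data = zeros(numel(filled), 5);
for r = 1:numel(filled)
    fields = strsplit(strtrim(filled{r}));  % split on runs of white space
    data(r, :) = str2double(fields);        % 'NaN' -> NaN
end
```

Note that an empty field in the last column only produces a NaN if the trailing spaces are still present in the line.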

Stefano on 23 June 2014

Ok, thank you! Good job! It's a perfect solution for my data. Answer accepted :)
