Importing table/array from website to Matlab

1 vue (au cours des 30 derniers jours)
Erik Börgesson
Erik Börgesson le 3 Juil 2019
Hello. I am trying to import the central table from this web url 'https://tennisabstract.com/reports/atp_elo_ratings.html', from the players name to the "grass" column into MatLab, as an array or table. I want to do it this way because it keeps updating every week, but I do not know how to approach this problem which makes me need your help.
Thank you, Erik

Réponse acceptée

Guillaume
Guillaume le 3 Juil 2019
First, note that importing data from html is always going to be very iffy. html is a presentation format designed to display things to humans, it's not design for data transfer and you're going to have to remove all the presentation cruft to get at your data.
So, the first thing you should try is contacting the website to see if they have a direct interface to the underlying database.
Bearing this in mind, the following will import your data with the current format of the website. Any change, even minor to the format of that page may break the code.
%definition of patterns used to locate required information with a table row:
intpattern = '<td[^>]*>(\d+)</td>';
linkpattern = '<td[^>]*><a[^>]+>([^<]+)</a></td>';
numberpattern = '<td[^>]*>(\d+(\.\d+)?)</td>';
anypattern = '<td[^>]*>([^<]+)</td>';
emptypattern = '<td[^>]*></td>';
%read and parse html
html = webread('https://tennisabstract.com/reports/atp_elo_ratings.html');
tabledata = regexp(html, ['<tr[^>]*>', ...
intpattern, ...
linkpattern, ...
numberpattern, ...
numberpattern, ...
emptypattern, ...
numberpattern, ...
numberpattern, ...
numberpattern, ...
emptypattern, ...
anypattern, ...
numberpattern, ...
numberpattern], 'tokens');
assert(~isempty(tabledata), 'Failed to parse html according to pattern. The format of the page may have changed');
tabledata = cell2table(vertcat(tabledata{:}), 'VariableNames', {'Rank', 'Player', 'Age', 'ELO', 'Hard', 'Clay', 'Grass', 'Peak_Match', 'Peak_Age', 'Peak_ELO'});
tabledata = convertvars(tabledata, [1, 3:7, 9, 10], @str2double);
tabledata.Player = strrep(tabledata.Player, '&nbsp;', ' ')
  5 commentaires
Guillaume
Guillaume le 3 Juil 2019
Don't use c as a variable name. It's meaningless and doesn't say anything about what it contains.
Assuming, you've imported the data as tabledata:
>> elo(tabledata, 'Ivan Nedelko', 'Kevin King')
ans =
0.230209216637309
0.769790783362691
Guillaume
Guillaume le 3 Juil 2019
Note: you could calculate the odds for matches of every player against any player with:
Q = 10 .^ (tabledata.ELO / 400);
odds = Q ./ (Q + Q.');
odds(r, c) is then the odds of tabledata.Player{r} winning against tabledata.Player{c}

Connectez-vous pour commenter.

Plus de réponses (1)

Erik Börgesson
Erik Börgesson le 3 Juil 2019
Thank you so much. I understand and it works just like I wanted it to.
Erik

Catégories

En savoir plus sur Data Import and Export dans Help Center et File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by