String Match for Plotting in Excel

Question

0 votes

I have a simple CSV Data file. See below. I need to extract data for each individual state with its temperature. For example, I need a script that can find the string New York and get the associated temperature data for New York, and put it in an array for plotting. So the output would be:

New_York = [83,55], Indiana = [70,60]

State      Temperature (headerline)
New York      83 
Indiana       70
Texas         72
California    80
Indiana       60
Texas         61
California    92
New York      55

Thanks, Amanda

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Follow Question

Answer 1

Cedric le 30 Avr 2013

Modifié(e) : Cedric le 30 Avr 2013

Ouvrir dans MATLAB Online

0 votes

If states had no space in their names, or if you had commas as delimiters in the CSV file, you could go for a variant of

 [state, temp] = textread('myFile.csv', '%s %d', 'delimiter', ',', ...
                          'headerlines', 1) ;

Now as it seems that there are spaces in names and no comma as delimiter, you can read line by line and extract states and temperatures with more specific functions (TEXTSCAN, FSCANF, FGETL+SSCANF, etc) and based on position if needed (e.g. start reading temperatures from char. 15 on). Then you can use STRCMPI to find indices of relevant states, and get corresponding temperatures based on these indices. But you could also go for a solution based on regular expressions (less common approach for this kind of structured data), that I illustrate below:

 >> buffer = fileread('myFile.csv') ;
 >> state = 'New York' ;
 >> temp = str2double(regexpi(buffer, sprintf('(?<=%s\\s*)\\d*', state), ...
                              'match'))
 temp =
    83    55
 >> state = 'California' ;
 >> temp = str2double(regexpi(buffer, sprintf('(?<=%s\\s*)\\d*', state), ...
                              'match'))
 temp =
    80    92

Note that I wrote this in a concise manner, but we do the following in fact:

 >> pattern = sprintf('(?<=%s\\s*)\\d*', state) ;
 >> match   = regexpi(buffer, pattern, 'match') ;
 >> temp    = str2double(match) ;

If you look at the pattern for New Work:

 >> pattern
 pattern =
 (?<=New York\s*)\d*

It tells regexp to match

As many numeric characters as possible: \d*
Preceded by (positive look behind (?<=)) the literal New York followed by as many white spaces as possible: New York\s*

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

Cedric le 30 Avr 2013

You're welcome! I updated my answer so you have a bit more explanations about REGEXP.

Amanda le 30 Avr 2013

Thanks for the more detail explanation. Some reason, data extraction from a file has always been a challenge.

Connectez-vous pour commenter.

String Match for Plotting in Excel

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Réponse acceptée

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

Plus de réponses (0)

Catégories

Tags

Community Treasure Hunt

String Match for Plotting in Excel

0 commentaires Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Réponse acceptée

3 commentaires Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

Plus de réponses (0)

Catégories

Tags

Voir également

Community Treasure Hunt

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien