Main Content

Import Mixed Data from Text File into Table

This example shows how to use the readtable function to import mixed text and numeric data into a table, specify the data types for the variables, and then append a new variable to the table.

Sample File Overview

The sample file, outages.csv, contains data representing electric utility outages in the US. The first few lines of the file are:

Region,OutageTime,Loss,Customers,RestorationTime,Cause

SouthWest,2002-01-20 11:49,672,2902379,2002-01-24 21:58,winter storm

SouthEast,2002-01-30 01:18,796,336436,2002-02-04 11:20,winter storm

SouthEast,2004-02-03 21:17,264.9,107083,2004-02-20 03:37,winter storm

West,2002-06-19 13:39,391.4,378990,2002-06-19 14:27,equipment fault

Read Text File

Import the data using readtable and display the first five rows. The readtable function automatically detects the delimiter and the variable types.

T = readtable('outages.csv');
head(T,5)
       Region           OutageTime        Loss     Customers     RestorationTime            Cause       
    _____________    ________________    ______    __________    ________________    ___________________

    {'SouthWest'}    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    {'winter storm'   }
    {'SouthEast'}    2003-01-23 00:49    530.14    2.1204e+05                 NaT    {'winter storm'   }
    {'SouthEast'}    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    {'winter storm'   }
    {'West'     }    2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    {'equipment fault'}
    {'MidWest'  }    2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    {'severe storm'   }

Specify Variable Data Types Before Import

Updating the variable data types to the appropriate MATLAB® data types can benefit your data, based on the type of variables in your file. For example, the first and sixth columns in outages.csv are categorical. By designating these two columns as categorical arrays you can leverage MATLAB functions for processing categorical data.

Designate and specify the data types of the variables in one of these ways:

  • Specify the Format name-value pair in readtable

  • Set the VariableTypes property of the import options for the file

Use the Format name-value pair to specify the variable data types, read the data, and display the first five rows. In the %{yyyy-MM-dd HH:mm}D part of the formatSpec specifier, the text between the curly braces describes the format of the date and time data. The values specified in Format designate the:

  • First and last columns in the file as categorical data

  • Second and fifth columns as formatted date and time data

  • Third and fourth columns as floating-point values

formatSpec = '%C%{yyyy-MM-dd HH:mm}D%f%f%{yyyy-MM-dd HH:mm}D%C';
T = readtable('outages.csv','Format',formatSpec);
head(T,5) 
     Region         OutageTime        Loss     Customers     RestorationTime          Cause     
    _________    ________________    ______    __________    ________________    _______________

    SouthWest    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    winter storm   
    SouthEast    2003-01-23 00:49    530.14    2.1204e+05                 NaT    winter storm   
    SouthEast    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    winter storm   
    West         2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    equipment fault
    MidWest      2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    severe storm   

Alternatively, specify the data types for the variables by using the setvartype function of the import options. First, create an import options object for the file. The data file contains different types of variables. Designate the first and last variables as categorical arrays, the second and fifth variables as datetime arrays, and the remaining variables as double.

opts = detectImportOptions('outages.csv');
varNames = opts.VariableNames ;
varTypes = {'categorical','datetime','double',...   
                'double','datetime','categorical'}; 
opts = setvartype(opts,varNames,varTypes);                      

Import the data using readtable with opts, and then display the first five rows.

T = readtable('outages.csv',opts);
head(T,5) 
     Region         OutageTime        Loss     Customers     RestorationTime          Cause     
    _________    ________________    ______    __________    ________________    _______________

    SouthWest    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    winter storm   
    SouthEast    2003-01-23 00:49    530.14    2.1204e+05                 NaT    winter storm   
    SouthEast    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    winter storm   
    West         2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    equipment fault
    MidWest      2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    severe storm   

Append New Variable to Table

Table T contains OutageTime and RestorationTime. Calculate the duration of each electrical outage and append this data to the table.

T.Duration = T.RestorationTime - T.OutageTime;
head(T,5) 
     Region         OutageTime        Loss     Customers     RestorationTime          Cause         Duration 
    _________    ________________    ______    __________    ________________    _______________    _________

    SouthWest    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    winter storm       148:32:00
    SouthEast    2003-01-23 00:49    530.14    2.1204e+05                 NaT    winter storm             NaN
    SouthEast    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    winter storm       226:59:00
    West         2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    equipment fault     00:26:00
    MidWest      2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    severe storm        65:05:00

See Also

| | | | | |

Related Topics