Import Mixed Data from Text File into Table
This example shows how to use the readtable
function to import mixed text and numeric data into a table, specify the data types for the variables, and then append a new variable to the table.
Sample File Overview
The sample file, outages.csv
, contains data representing electric utility outages in the US. The first few lines of the file are:
Region,OutageTime,Loss,Customers,RestorationTime,Cause
SouthWest,2002-01-20 11:49,672,2902379,2002-01-24 21:58,winter storm
SouthEast,2002-01-30 01:18,796,336436,2002-02-04 11:20,winter storm
SouthEast,2004-02-03 21:17,264.9,107083,2004-02-20 03:37,winter storm
West,2002-06-19 13:39,391.4,378990,2002-06-19 14:27,equipment fault
Read Text File
Import the data using readtable
and display the first five rows. The readtable
function automatically detects the delimiter and the variable types.
T = readtable('outages.csv');
head(T,5)
Region OutageTime Loss Customers RestorationTime Cause _____________ ________________ ______ __________ ________________ ___________________ {'SouthWest'} 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 {'winter storm' } {'SouthEast'} 2003-01-23 00:49 530.14 2.1204e+05 NaT {'winter storm' } {'SouthEast'} 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 {'winter storm' } {'West' } 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 {'equipment fault'} {'MidWest' } 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 {'severe storm' }
Specify Variable Data Types Before Import
Updating the variable data types to the appropriate MATLAB® data types can benefit your data, based on the type of variables in your file. For example, the first and sixth columns in outages.csv
are categorical. By designating these two columns as categorical
arrays you can leverage MATLAB functions for processing categorical data.
Designate and specify the data types of the variables in one of these ways:
Specify the
Format
name-value pair inreadtable
Set the
VariableTypes
property of the import options for the file
Use the Format
name-value pair to specify the variable data types, read the data, and display the first five rows. In the %{yyyy-MM-dd HH:mm}D
part of the formatSpec
specifier, the text between the curly braces describes the format of the date and time data. The values specified in Format
designate the:
First and last columns in the file as categorical data
Second and fifth columns as formatted date and time data
Third and fourth columns as floating-point values
formatSpec = '%C%{yyyy-MM-dd HH:mm}D%f%f%{yyyy-MM-dd HH:mm}D%C'; T = readtable('outages.csv','Format',formatSpec); head(T,5)
Region OutageTime Loss Customers RestorationTime Cause _________ ________________ ______ __________ ________________ _______________ SouthWest 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 winter storm SouthEast 2003-01-23 00:49 530.14 2.1204e+05 NaT winter storm SouthEast 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 winter storm West 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 equipment fault MidWest 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 severe storm
Alternatively, specify the data types for the variables by using the setvartype
function of the import options. First, create an import options object for the file. The data file contains different types of variables. Designate the first and last variables as categorical
arrays, the second and fifth variables as datetime
arrays, and the remaining variables as double
.
opts = detectImportOptions('outages.csv'); varNames = opts.VariableNames ; varTypes = {'categorical','datetime','double',... 'double','datetime','categorical'}; opts = setvartype(opts,varNames,varTypes);
Import the data using readtable
with opts
, and then display the first five rows.
T = readtable('outages.csv',opts);
head(T,5)
Region OutageTime Loss Customers RestorationTime Cause _________ ________________ ______ __________ ________________ _______________ SouthWest 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 winter storm SouthEast 2003-01-23 00:49 530.14 2.1204e+05 NaT winter storm SouthEast 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 winter storm West 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 equipment fault MidWest 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 severe storm
Append New Variable to Table
Table T
contains OutageTime
and RestorationTime
. Calculate the duration of each electrical outage and append this data to the table.
T.Duration = T.RestorationTime - T.OutageTime; head(T,5)
Region OutageTime Loss Customers RestorationTime Cause Duration _________ ________________ ______ __________ ________________ _______________ _________ SouthWest 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 winter storm 148:32:00 SouthEast 2003-01-23 00:49 530.14 2.1204e+05 NaT winter storm NaN SouthEast 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 winter storm 226:59:00 West 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 equipment fault 00:26:00 MidWest 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 severe storm 65:05:00
See Also
readtimetable
| readtable
| detectImportOptions
| setvaropts
| setvartype
| preview
| head