
Analyze Data Using MDF Datastore and Tall Arrays

This example shows how to work with a big data set using tall arrays and the MDF datastore feature. Tall arrays are commonly used to perform calculations on different types of data that do not fit in memory.

This example first operates on a small subset of the data and then scales up to analyze the entire data set. Although the data set used here is smaller than those in many real-world applications, the same analysis technique scales to data sets too large to read into memory.

To learn more about tall arrays, see the example Analyze Big Data in MATLAB Using Tall Arrays.

Introduction to Tall Arrays

Tall arrays and tall tables are used to work with out-of-memory data that has any number of rows. Using tall arrays and tables, you can work with large data sets in a manner similar to in-memory MATLAB arrays.

The difference is that tall arrays typically remain unevaluated until you request that the calculations be performed. This deferred evaluation enables MATLAB to combine queued calculations where possible and take the minimum number of passes through the data.
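The following is a minimal sketch of deferred evaluation that needs only core MATLAB. It assumes that tall can wrap an ordinary in-memory array, which lets you observe the behavior without a datastore. The data and variable names are hypothetical.

```matlab
% Hypothetical in-memory data wrapped as a tall array (no datastore needed).
x = tall((1:1000)');

y = 2 * x;                % queued, not computed yet
m = mean(y);              % also queued; m is an unevaluated tall array
isDeferred = istall(m);   % true until the result is gathered

% gather evaluates the queued operations and returns an ordinary double.
m = gather(m);
fprintf("Mean = %g\n", m)
```

Both operations run only when gather is called. MATLAB can then plan them as a single evaluation.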

Create an MDF Datastore

An MDF datastore can read and process homogeneous data stored in multiple MDF files as a single entity. If the data set is too large to fit in memory, a datastore also lets you work with it in smaller blocks that individually fit in memory. Tall arrays extend this capability by letting you apply common functions to out-of-memory data backed by a datastore.
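This sketch shows the block-wise reading pattern that a datastore enables. It uses arrayDatastore (core MATLAB, R2020b or later) over a small, hypothetical in-memory vector as a stand-in for an MDF file. The sketch assumes that an MDF datastore supports the same hasdata, read, and reset loop.

```matlab
% Hypothetical torque values standing in for a signal stored on disk.
torque = [0; 47.2; 48.7; 232.9; 116.3; 23.1];

% Serve the data two rows at a time; "same" returns doubles, not cells.
ds = arrayDatastore(torque, "ReadSize", 2, "OutputType", "same");

runningMax = -Inf;
while hasdata(ds)
    block = read(ds);                          % at most 2 rows per read
    runningMax = max(runningMax, max(block));  % fold the block into the result
end
reset(ds);   % rewind so the datastore can be read again
```

Only one block is held in memory at a time. Tall arrays automate this kind of loop for you.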

Create an MDF datastore using the mdfDatastore function, specifying the MDF file EngineData_MDF_TallArray.mf4 in the current working folder. This file contains time-stamped data logged from a Simulink model of an engine plant and controller connected to a dynamometer.

mds = mdfDatastore("EngineData_MDF_TallArray.mf4")
mds = 
  MDFDatastore with properties:

  Datastore Details
                         Files: {
                                ' ...\michellw.Bdoc24a_MDF\vnt-ex08773747\EngineData_MDF_TallArray.mf4'
                                }
                 ChannelGroups: 
                                  GroupNumber    AcquisitionName     Comment          ... and 10 more columns    
                                  ___________    _______________    __________                

                                       1         {[<undefined>]}    {[Python]}                


                      Channels: 
                                       Name          GroupNumber    DisplayName        ... and 17 more columns    
                                  _______________    ___________    ___________                

                                  "EngineSpeed"           1             ""                     
                                  "EngineTorque"          1             ""                     
                                  "TorqueCommand"         1             ""                     

                                ... and 1 more rows

  Options
          SelectedChannelNames: {
                                'EngineSpeed';
                                'EngineTorque';
                                'TorqueCommand'
                                 ... and 1 more
                                }
    SelectedChannelGroupNumber: 1
                      ReadSize: "file"
                       ReadRaw: 0

You can further configure the MDF datastore to control which data is read from the MDF file, and how. By default, the first channel group is selected and all channels in that group are read.

mds.SelectedChannelGroupNumber
ans = 1
mds.SelectedChannelNames
ans = 4×1 string
    "EngineSpeed"
    "EngineTorque"
    "TorqueCommand"
    "t"

Configure the MDF datastore to select only three variables of interest: EngineSpeed, TorqueCommand, and EngineTorque.

mds.SelectedChannelNames = ["EngineSpeed", "TorqueCommand", "EngineTorque"]
mds = 
  MDFDatastore with properties:

  Datastore Details
                         Files: {
                                ' ...\michellw.Bdoc24a_MDF\vnt-ex08773747\EngineData_MDF_TallArray.mf4'
                                }
                 ChannelGroups: 
                                  GroupNumber    AcquisitionName     Comment          ... and 10 more columns    
                                  ___________    _______________    __________                

                                       1         {[<undefined>]}    {[Python]}                


                      Channels: 
                                       Name          GroupNumber    DisplayName        ... and 17 more columns    
                                  _______________    ___________    ___________                

                                  "EngineSpeed"           1             ""                     
                                  "EngineTorque"          1             ""                     
                                  "TorqueCommand"         1             ""                     

                                ... and 1 more rows

  Options
          SelectedChannelNames: {
                                'EngineSpeed';
                                'TorqueCommand';
                                'EngineTorque'
                                }
    SelectedChannelGroupNumber: 1
                      ReadSize: "file"
                       ReadRaw: 0

Preview the selected data using the preview function.

preview(mds)
ans=8×3 timetable
          t           EngineSpeed    TorqueCommand    EngineTorque
    ______________    ___________    _____________    ____________

    0 sec                     0              0           47.153   
    0 sec              2.37e-26              0           47.153   
    1.47e-05 sec        0.11056         47.158           47.158   
    8.85e-05 sec        0.66312         48.708           48.708   
    0.00010107 sec      0.75762          49.77            49.77   
    0.00010107 sec      0.75762          49.77            49.77   
    0.0001405 sec         1.053         39.967           39.967   
    0.00017993 sec       1.3482         23.143           23.143   

Create Tall Array

Tall arrays are similar to in-memory MATLAB arrays, except that they can have any number of rows. Because the MDF datastore mds contains time-stamped tabular data, the tall function returns a tall timetable containing data from the datastore.

tt = tall(mds)
tt =

  M×3 tall timetable

          t           EngineSpeed    TorqueCommand    EngineTorque
    ______________    ___________    _____________    ____________

    0 sec                     0              0           47.153   
    0 sec              2.37e-26              0           47.153   
    1.47e-05 sec        0.11056         47.158           47.158   
    8.85e-05 sec        0.66312         48.708           48.708   
    0.00010107 sec      0.75762          49.77            49.77   
    0.00010107 sec      0.75762          49.77            49.77   
    0.0001405 sec         1.053         39.967           39.967   
    0.00017993 sec       1.3482         23.143           23.143   
          :                :               :               :
          :                :               :               :

The display includes the first several rows of data. The timetable size may display as M×3 to indicate that the number of rows is not yet known to MATLAB.
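To obtain the row count, request it and then gather it. The following is a small sketch that uses a hypothetical 500-row in-memory table wrapped with tall. For datastore-backed data such as tt, the count is computed only when gather runs.

```matlab
% Hypothetical 500-row table wrapped as a tall table.
T = tall(array2table(rand(500, 3)));

n = height(T);   % a tall scalar; for datastore-backed data its value is not known yet
n = gather(n);   % evaluation returns the actual number of rows
```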

Perform Calculations on Tall Array

You can work with tall arrays and tall tables much as you work with in-memory MATLAB arrays and tables. However, MATLAB does not perform most operations on tall arrays immediately. Instead, it defers the actual computations until you request the output.

It is common to work with unevaluated tall arrays and request output only when required. MATLAB does not know the content or size of an unevaluated tall array until you request that it be evaluated and displayed.

Calculate the median, minimum, and maximum values of the TorqueCommand variable. Note that the results are not evaluated immediately.

medianTorqueCommand = median(tt.TorqueCommand)
medianTorqueCommand =

  tall double

    ?

Preview deferred.
minTorqueCommand = min(tt.TorqueCommand)
minTorqueCommand =

  tall double

    ?

Preview deferred.
maxTorqueCommand = max(tt.TorqueCommand)
maxTorqueCommand =

  tall double

    ?

Preview deferred.

Gather Results into Workspace

The gather function forces evaluation of all queued operations and brings the resulting output back into memory.

Perform the queued operations (median, min, and max) and evaluate the results. If the calculation requires several passes through the data, MATLAB minimizes the number of passes to save execution time and displays the pass information at the command line.
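The same pattern can be tried on a few hypothetical values wrapped with tall. This is a sketch only: the values stand in for tt.TorqueCommand and are not taken from the MDF file.

```matlab
% Hypothetical values standing in for tt.TorqueCommand.
torque = tall([0; 15; 47.2; 116.3; 232.9]);

% Each statistic is queued, not computed.
medianTorque = median(torque);
minTorque = min(torque);
maxTorque = max(torque);

% A single gather call evaluates all three, so MATLAB can plan the
% passes through the data jointly instead of once per statistic.
[medianTorque, minTorque, maxTorque] = gather(medianTorque, minTorque, maxTorque);
```

Gathering each result in a separate call would also work. However, each call would then start its own evaluation.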

[medianTorqueCommand, minTorqueCommand, maxTorqueCommand] = gather(medianTorqueCommand, minTorqueCommand, maxTorqueCommand)
Evaluating tall expression using the Parallel Pool 'Processes':
- Pass 1 of 4: Completed in 0.74 sec
- Pass 2 of 4: Completed in 0.37 sec
- Pass 3 of 4: Completed in 0.61 sec
- Pass 4 of 4: Completed in 0.47 sec
Evaluation completed in 3.2 sec
medianTorqueCommand = 116.2799
minTorqueCommand = 0
maxTorqueCommand = 232.9807

Select Subset of Tall Array

Use head to select the first 10,000 rows of the data for prototyping code before scaling to the full data set.
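In the following sketch, head runs on a hypothetical 100,000-row tall table. The variable name Sample is illustrative only. The result of head is still tall until it is gathered, but it is small enough to bring into memory.

```matlab
% Hypothetical tall table with 100,000 rows.
tData = tall(array2table((1:100000)', "VariableNames", {'Sample'}));

tFirst = head(tData, 10);     % still a tall table, limited to the first 10 rows
firstRows = gather(tFirst);   % small enough to bring into memory for prototyping
```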

ttSubset = head(tt, 10000)
ttSubset =

  10,000×3 tall timetable

          t           EngineSpeed    TorqueCommand    EngineTorque
    ______________    ___________    _____________    ____________

    0 sec                     0              0           47.153   
    0 sec              2.37e-26              0           47.153   
    1.47e-05 sec        0.11056         47.158           47.158   
    8.85e-05 sec        0.66312         48.708           48.708   
    0.00010107 sec      0.75762          49.77            49.77   
    0.00010107 sec      0.75762          49.77            49.77   
    0.0001405 sec         1.053         39.967           39.967   
    0.00017993 sec       1.3482         23.143           23.143   
          :                :               :               :
          :                :               :               :

Remove Duplicate Rows in Tall Array

Timetable rows are duplicates if they have the same row times and the same data values. Use the unique function to remove duplicate rows from the tall timetable subset.
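This sketch illustrates the duplicate rule on a small, hypothetical in-memory timetable. Two rows share the time 0 s but have different data, so both are kept. Exact copies are collapsed to one row.

```matlab
% Hypothetical timetable: two identical rows at 0 s, one row at 0 s with
% different data, and two identical rows at 1 s.
t = seconds([0; 0; 0; 1; 1]);
EngineSpeed = [0; 0; 2.37e-26; 5; 5];
TT = timetable(EngineSpeed, 'RowTimes', t);

U = unique(TT);   % keeps one copy of each exact duplicate row
```

U has three rows. The row times alone do not decide whether a row is a duplicate; the data values must also match.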

ttSubset = unique(ttSubset)
ttSubset =

  9,968×3 tall timetable

          t           EngineSpeed    TorqueCommand    EngineTorque
    ______________    ___________    _____________    ____________

    0 sec                     0              0            47.153  
    0 sec              2.37e-26              0            47.153  
    1.47e-05 sec        0.11056         47.158            47.158  
    8.85e-05 sec        0.66312         48.708            48.708  
    0.00010107 sec      0.75762          49.77             49.77  
    0.0001405 sec         1.053         39.967            39.967  
    0.00017993 sec       1.3482         23.143            23.143  
    0.00037708 sec       2.8228         23.143         -0.021071  
          :                :               :               :
          :                :               :               :

Calculate Engine Power

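Calculate engine power in kilowatts (kW) from EngineSpeed and EngineTorque using the formula

$$P_{\mathrm{kW}} = \frac{\pi \, N_{\mathrm{rpm}} \, T_{\mathrm{Nm}}}{30 \cdot 1000}$$

where N is the engine speed in rpm and T is the engine torque in Nm. The factor π/30 converts rpm to rad/s (ω = 2πN/60), and dividing by 1000 converts watts to kilowatts. Save the results to a new variable named EnginePower in the tall timetable.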
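The following arithmetic check wraps the formula in a hypothetical anonymous function, enginePowerKW. The check evaluates the function for one row of the top-20 results in this example (750 rpm and 78.052 Nm), for which the results list an EnginePower value of 6.1302 kW.

```matlab
% Engine power in kW from speed in rpm and torque in Nm:
% omega [rad/s] = 2*pi*N/60, so P [kW] = omega*T/1000 = pi*N*T/(30*1000).
% Element-wise operators let the same function work on whole columns.
enginePowerKW = @(speedRpm, torqueNm) pi .* speedRpm .* torqueNm ./ (30 * 1000);

% Check against one row of the results: 750 rpm and 78.052 Nm.
P = enginePowerKW(750, 78.052);
fprintf("%.4f kW\n", P)
```

This prints 6.1302 kW, which matches the value in the table.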

ttSubset.EnginePower = (pi * ttSubset.EngineSpeed .* ttSubset.EngineTorque) / (30 * 1000)
ttSubset =

  9,968×4 tall timetable

          t           EngineSpeed    TorqueCommand    EngineTorque    EnginePower
    ______________    ___________    _____________    ____________    ___________

    0 sec                     0              0            47.153                0
    0 sec              2.37e-26              0            47.153       1.1703e-28
    1.47e-05 sec        0.11056         47.158            47.158       0.00054599
    8.85e-05 sec        0.66312         48.708            48.708        0.0033824
    0.00010107 sec      0.75762          49.77             49.77        0.0039487
    0.0001405 sec         1.053         39.967            39.967        0.0044072
    0.00017993 sec       1.3482         23.143            23.143        0.0032675
    0.00037708 sec       2.8228         23.143         -0.021071      -6.2287e-06
          :                :               :               :               :
          :                :               :               :               :

The topkrows function returns the top k rows sorted in descending order of the specified variable, and it supports tall timetables. Obtain the 20 rows with the largest EnginePower values.
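This sketch applies topkrows to a small, hypothetical in-memory timetable. The call matches the tall usage in this example. The result keeps the row times of the selected rows, and the largest value comes first.

```matlab
% Hypothetical timetable with five EnginePower samples.
t = seconds((0:4)');
EnginePower = [1.5; 6.1; 3.2; 6.0; 0.4];
TT = timetable(EnginePower, 'RowTimes', t);

% Rows with the two largest EnginePower values, largest first.
top2 = topkrows(TT, 2, "EnginePower");
```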

maxEnginePower = topkrows(ttSubset, 20, "EnginePower")
maxEnginePower =

  20×4 tall timetable

        t        EngineSpeed    TorqueCommand    EngineTorque    EnginePower
    _________    ___________    _____________    ____________    ___________

    15.17 sec        750           78.052           78.052         6.1302   
    15.16 sec        750           77.841           77.841         6.1136   
    15.15 sec        750           77.556           77.556         6.0912   
    15.14 sec        750           77.326           77.326         6.0732   
    15.18 sec        750           77.277           77.277         6.0693   
    15.13 sec        750           77.157           77.157         6.0599   
    15.12 sec        750           77.082           77.082          6.054   
    15.11 sec        750           77.067           77.075         6.0534   
        :             :               :               :               :
        :             :               :               :               :

Call the gather function to execute all queued operations and collect the results into memory.

[ttSubset, maxEnginePower] = gather(ttSubset, maxEnginePower)
ttSubset=9968×4 timetable
          t           EngineSpeed    TorqueCommand    EngineTorque    EnginePower
    ______________    ___________    _____________    ____________    ___________

    0 sec                     0              0            47.153                0
    0 sec              2.37e-26              0            47.153       1.1703e-28
    1.47e-05 sec        0.11056         47.158            47.158       0.00054599
    8.85e-05 sec        0.66312         48.708            48.708        0.0033824
    0.00010107 sec      0.75762          49.77             49.77        0.0039487
    0.0001405 sec         1.053         39.967            39.967        0.0044072
    0.00017993 sec       1.3482         23.143            23.143        0.0032675
    0.00037708 sec       2.8228         23.143         -0.021071      -6.2287e-06
    0.00076951 sec       5.7492             15         -0.042938      -2.5851e-05
    0.0014014 sec        10.437             15         -0.078013      -8.5265e-05
    0.0023449 sec        17.382             15          -0.13009      -0.00023679
    0.0036773 sec        27.079             15          -0.20304      -0.00057575
    0.0054808 sec            40             15          -0.30067       -0.0012595
    0.0072843 sec        52.691             15          -0.39703       -0.0021907
    0.01 sec             71.373             15          -0.53973       -0.0040341
    0.013562 sec         95.119             15            51.176          0.50976
      ⋮

maxEnginePower=20×4 timetable
        t        EngineSpeed    TorqueCommand    EngineTorque    EnginePower
    _________    ___________    _____________    ____________    ___________

    15.17 sec        750           78.052           78.052         6.1302   
    15.16 sec        750           77.841           77.841         6.1136   
    15.15 sec        750           77.556           77.556         6.0912   
    15.14 sec        750           77.326           77.326         6.0732   
    15.18 sec        750           77.277           77.277         6.0693   
    15.13 sec        750           77.157           77.157         6.0599   
    15.12 sec        750           77.082           77.082          6.054   
    15.11 sec        750           77.067           77.075         6.0534   
    15.1 sec         750           77.067           77.067         6.0528   
    15.09 sec        750           77.059           77.059         6.0522   
    15.08 sec        750           77.051           77.051         6.0516   
    15.07 sec        750           77.042           77.042         6.0509   
    15.06 sec        750           77.034           77.034         6.0502   
    15.05 sec        750           77.025           77.025         6.0495   
    15.04 sec        750           77.016           77.016         6.0488   
    15.03 sec        750           77.006           77.006         6.0481   
      ⋮

Visualize Data in Tall Array

Visualize the EngineTorque and EnginePower signals over time in a plot with two y-axes.

figure
yyaxis left
plot(ttSubset.t, ttSubset.EngineTorque)
title("Engine Torque and Engine Power Over Time")
xlabel("Time")
ylabel("Engine Torque [Nm]")

yyaxis right
plot(ttSubset.t, ttSubset.EnginePower)
ylabel("Engine Power [kW]")

Figure: Engine Torque and Engine Power Over Time (x-axis: Time; left y-axis: Engine Torque [Nm]; right y-axis: Engine Power [kW]).

Scale to Entire Data Set

Instead of using the subset returned by head, apply the same steps to the entire data set by using the complete tall timetable.
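Scaling up works because unique, element-wise arithmetic, and topkrows accept both in-memory and tall timetables. The following sketch runs identical code on a small, hypothetical in-memory timetable and on a tall version of the same data, and then compares the gathered results.

```matlab
% Hypothetical in-memory stand-in for the engine data.
t = seconds([0; 0; 1; 2; 3]);
EngineSpeed  = [0; 0; 750; 1000; 500];
EngineTorque = [47; 47; 78; 60; 90];
TT = timetable(EngineSpeed, EngineTorque, 'RowTimes', t);

% Run identical code on an in-memory timetable and on a tall timetable.
inputs = {TT, tall(TT)};
results = cell(1, 2);
for i = 1:numel(inputs)
    X = unique(inputs{i});
    X.EnginePower = (pi * X.EngineSpeed .* X.EngineTorque) / (30 * 1000);
    % gather returns in-memory input unchanged and evaluates tall input.
    results{i} = gather(topkrows(X, 2, "EnginePower"));
end
```

Both entries of results hold the rows for 1000 rpm and 750 rpm. Only the input type changed.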

tt = tall(mds)
tt =

  M×3 tall timetable

          t           EngineSpeed    TorqueCommand    EngineTorque
    ______________    ___________    _____________    ____________

    0 sec                     0              0           47.153   
    0 sec              2.37e-26              0           47.153   
    1.47e-05 sec        0.11056         47.158           47.158   
    8.85e-05 sec        0.66312         48.708           48.708   
    0.00010107 sec      0.75762          49.77            49.77   
    0.00010107 sec      0.75762          49.77            49.77   
    0.0001405 sec         1.053         39.967           39.967   
    0.00017993 sec       1.3482         23.143           23.143   
          :                :               :               :
          :                :               :               :

First, remove duplicate rows from the tall timetable.

tt = unique(tt)
tt =

  M×3 tall timetable

    t    EngineSpeed    TorqueCommand    EngineTorque
    _    ___________    _____________    ____________

    ?         ?               ?               ?      
    ?         ?               ?               ?      
    ?         ?               ?               ?      
    :         :               :               :
    :         :               :               :

Preview deferred.

Next, calculate engine power and obtain the 20 rows with the largest EnginePower values.

tt.EnginePower = (pi * tt.EngineSpeed .* tt.EngineTorque) / (30 * 1000)
tt =

  M×4 tall timetable

    t    EngineSpeed    TorqueCommand    EngineTorque    EnginePower
    _    ___________    _____________    ____________    ___________

    ?         ?               ?               ?               ?     
    ?         ?               ?               ?               ?     
    ?         ?               ?               ?               ?     
    :         :               :               :               :
    :         :               :               :               :

Preview deferred.
maxEnginePower = topkrows(tt, 20, "EnginePower")
maxEnginePower =

  M×4 tall timetable

    t    EngineSpeed    TorqueCommand    EngineTorque    EnginePower
    _    ___________    _____________    ____________    ___________

    ?         ?               ?               ?               ?     
    ?         ?               ?               ?               ?     
    ?         ?               ?               ?               ?     
    :         :               :               :               :
    :         :               :               :               :

Preview deferred.
[tt, maxEnginePower] = gather(tt, maxEnginePower)
Evaluating tall expression using the Parallel Pool 'Processes':
- Pass 1 of 1: Completed in 0.95 sec
Evaluation completed in 1.4 sec
tt=359326×4 timetable
          t           EngineSpeed    TorqueCommand    EngineTorque    EnginePower
    ______________    ___________    _____________    ____________    ___________

    0 sec                     0              0            47.153                0
    0 sec              2.37e-26              0            47.153       1.1703e-28
    1.47e-05 sec        0.11056         47.158            47.158       0.00054599
    8.85e-05 sec        0.66312         48.708            48.708        0.0033824
    0.00010107 sec      0.75762          49.77             49.77        0.0039487
    0.0001405 sec         1.053         39.967            39.967        0.0044072
    0.00017993 sec       1.3482         23.143            23.143        0.0032675
    0.00037708 sec       2.8228         23.143         -0.021071      -6.2287e-06
    0.00076951 sec       5.7492             15         -0.042938      -2.5851e-05
    0.0014014 sec        10.437             15         -0.078013      -8.5265e-05
    0.0023449 sec        17.382             15          -0.13009      -0.00023679
    0.0036773 sec        27.079             15          -0.20304      -0.00057575
    0.0054808 sec            40             15          -0.30067       -0.0012595
    0.0072843 sec        52.691             15          -0.39703       -0.0021907
    0.01 sec             71.373             15          -0.53973       -0.0040341
    0.013562 sec         95.119             15            51.176          0.50976
      ⋮

maxEnginePower=20×4 timetable
        t         EngineSpeed    TorqueCommand    EngineTorque    EnginePower
    __________    ___________    _____________    ____________    ___________

    3819.8 sec       5000           217.53           217.53          113.9   
    3819.8 sec       5000           217.53           217.53          113.9   
    3819.8 sec       5000           217.53           217.53          113.9   
    3819.8 sec       5000           217.53           217.53          113.9   
    3819.8 sec       5000           217.53           217.53          113.9   
    3819.9 sec       5000           217.53           217.53          113.9   
    3819.9 sec       5000           217.53           217.53          113.9   
    3819.9 sec       5000           217.53           217.53          113.9   
    3819.9 sec       5000           217.52           217.52         113.89   
    3819.9 sec       5000           217.52           217.52         113.89   
    3820 sec         5000           217.52           217.52         113.89   
    3820.1 sec       5000           217.52           217.52         113.89   
    3820.2 sec       5000           217.52           217.52         113.89   
    3820.3 sec       5000           217.52           217.52         113.89   
    3820.4 sec       5000           217.52           217.52         113.89   
    3820.5 sec       5000           217.52           217.52         113.89   
      ⋮

Finally, visualize the EngineTorque and EnginePower signals over time in a plot with two y-axes.

figure
yyaxis left
plot(tt.t, tt.EngineTorque)
title("Engine Torque and Engine Power Over Time")
xlabel("Time")
ylabel("Engine Torque [Nm]")

yyaxis right
plot(tt.t, tt.EnginePower)
ylabel("Engine Power [kW]")

Figure: Engine Torque and Engine Power Over Time (x-axis: Time; left y-axis: Engine Torque [Nm]; right y-axis: Engine Power [kW]).

Close MDF File

Close access to the MDF file by clearing the MDF datastore variable from the workspace.

clear mds