Main Content

stack

Stack data from input table or timetable into one variable of output table or timetable

Description

S = stack(U,vars) stacks data from multiple variables of the input table or timetable into one variable of the output table or timetable. The input argument vars specifies the variables to stack. The function interleaves values from variables of the input into the output. It also adds an indicator variable to the output. In each row of the output, the value of the indicator variable indicates which variable of the input provided the data.

If the input has N rows and vars specifies M variables, then the output has M*N rows. In general, the output contains fewer variables, but more rows, than the input.

The indicator variable of the output has the categorical data type. You can use the indicator variable as a grouping variable when you call functions such as groupsummary or varfun on the stacked data.

If vars does not specify all variables in the input table or timetable, then the values of the remaining variables are not stacked with values from the variables specified by vars. Instead, stack replicates the unstacked variables into their own variables in the output.

  • If the input is a table with row names, then you cannot stack its row names in the output.

  • If the input is a timetable, then you cannot stack its row times in the output.

example

S = stack(U,{vars1,...,varsN}) stacks values from multiple lists of variables, specified by {vars1,...,varsN}, into N stacked variables in the output.

example

S = stack(___,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, you can specify your own names for the new and stacked variables in the output.

example

[S,iu] = stack(___) also returns an index vector, iu, indicating the correspondence between rows in the output and rows in the input. You can use any of the input arguments in previous syntaxes.

example

Examples

collapse all

Create a table that contains temperature measurements from three separate weather stations. The table is in an unstacked format because each table variable corresponds to one of the stations.

ST1 = [93;57;87;89];
ST2 = [89;77;92;86];
ST3 = [95;62;89;91];

U = table(ST1,ST2,ST3)
U=4×3 table
    ST1    ST2    ST3
    ___    ___    ___

    93     89     95 
    57     77     62 
    87     92     89 
    89     86     91 

To stack the measurements into one variable in the output table, use the stack function. In each row, the indicator variable ST1_ST2_ST3_Indicator indicates the station that the corresponding measurement came from.

S = stack(U,1:3)
S=12×2 table
    ST1_ST2_ST3_Indicator    ST1_ST2_ST3
    _____________________    ___________

             ST1                 93     
             ST2                 89     
             ST3                 95     
             ST1                 57     
             ST2                 77     
             ST3                 62     
             ST1                 87     
             ST2                 92     
             ST3                 89     
             ST1                 89     
             ST2                 86     
             ST3                 91     

One use of the stacked format is that you can group rows using the indicator variable and perform calculations on the groups. For example, to calculate the mean temperature and the standard deviation of the temperature for each station, use the groupsummary function. With groupsummary you can calculate several statistics in one function call and return the results in one table.

weatherStats = groupsummary(S,"ST1_ST2_ST3_Indicator",["mean" "std"])
weatherStats=3×4 table
    ST1_ST2_ST3_Indicator    GroupCount    mean_ST1_ST2_ST3    std_ST1_ST2_ST3
    _____________________    __________    ________________    _______________

             ST1                 4               81.5              16.523     
             ST2                 4                 86              6.4807     
             ST3                 4              84.25              15.042     

Load a table from a sample file that lists the amount of snowfall in three towns from five different storms. The Storm variable has storm numbers in a categorical array because the table records a fixed set of storm numbers for this season.

load snowfallByTown.mat
U
U=5×5 table
       Date        Storm    Natick    Boston    Worcester
    ___________    _____    ______    ______    _________

    25-Dec-2024      1        20        18         26    
    02-Jan-2025      2         5         9         10    
    23-Jan-2025      3        13        21         16    
    07-Feb-2025      4         0         5          3    
    15-Feb-2025      5        17        12         15    

To stack data about the snowfall amounts, use the stack function. Stack values from the variables Natick, Boston, and Worcester into a single variable. Name the variable that contains the stacked data values Snowfall. Name the new indicator variable Town.

The output table contains three rows for each storm. The stack function treated the variables Date and Storm as constant variables because they were not specified as data or indicator variables. Instead of stacking values from Date and Storm, stack replicated them where needed in the rows of the output table. For example, the value of Storm is 1 in the first three rows of the output because those rows all have data about the same storm. For the same reason, the first three rows have the same date.

S = stack(U,["Natick" "Boston" "Worcester"], ...
          NewDataVariableName="Snowfall", ...
          IndexVariableName="Town")
S=15×4 table
       Date        Storm      Town       Snowfall
    ___________    _____    _________    ________

    25-Dec-2024      1      Natick          20   
    25-Dec-2024      1      Boston          18   
    25-Dec-2024      1      Worcester       26   
    02-Jan-2025      2      Natick           5   
    02-Jan-2025      2      Boston           9   
    02-Jan-2025      2      Worcester       10   
    23-Jan-2025      3      Natick          13   
    23-Jan-2025      3      Boston          21   
    23-Jan-2025      3      Worcester       16   
    07-Feb-2025      4      Natick           0   
    07-Feb-2025      4      Boston           5   
    07-Feb-2025      4      Worcester        3   
    15-Feb-2025      5      Natick          17   
    15-Feb-2025      5      Boston          12   
    15-Feb-2025      5      Worcester       15   

To include only Storm as a constant variable, specify the ConstantVariables name-value argument.

S = stack(U,["Natick" "Boston" "Worcester"], ...
          NewDataVariableName="Snowfall", ...
          IndexVariableName="Town", ...
          ConstantVariables="Storm")
S=15×3 table
    Storm      Town       Snowfall
    _____    _________    ________

      1      Natick          20   
      1      Boston          18   
      1      Worcester       26   
      2      Natick           5   
      2      Boston           9   
      2      Worcester       10   
      3      Natick          13   
      3      Boston          21   
      3      Worcester       16   
      4      Natick           0   
      4      Boston           5   
      4      Worcester        3   
      5      Natick          17   
      5      Boston          12   
      5      Worcester       15   

Load a table from a sample file that lists the number of power company customers who experienced electric power outages. The sample data are aggregated by month and by region of the United States. Convert the table to a timetable. The months become the row times of the timetable.

load customersByRegion.mat
customersByRegion = table2timetable(customersByRegion)
customersByRegion=6×3 timetable
        Month        NorthEast    MidWest    SouthEast
    _____________    _________    _______    _________

    October-2023        3492         565       1027   
    November-2023       2944        1293        834   
    December-2023       2559         936       1412   
    January-2024       12045       50117          0   
    February-2024       4931        1089        137   
    March-2024          3018         942        870   

Stack the variables NorthEast, MidWest, and SouthEast into a single variable called CustomersOutage. Name the new indicator variable Region. Also, return an index vector named indexCustomersOutage as the second output. It indicates the correspondence between rows in the unstacked input and the stacked output. (For the purpose of stacking, stack treats Month, which contains the row times, as though it were a constant variable. The function replicates the row times in the output timetable.)

[customersOutage,indexCustomersOutage] = stack(customersByRegion,1:3, ...
                                               NewDataVariableName="CustomersOutage", ...
                                               IndexVariableName="Region")
customersOutage=18×2 timetable
        Month         Region      CustomersOutage
    _____________    _________    _______________

    October-2023     NorthEast          3492     
    October-2023     MidWest             565     
    October-2023     SouthEast          1027     
    November-2023    NorthEast          2944     
    November-2023    MidWest            1293     
    November-2023    SouthEast           834     
    December-2023    NorthEast          2559     
    December-2023    MidWest             936     
    December-2023    SouthEast          1412     
    January-2024     NorthEast         12045     
    January-2024     MidWest           50117     
    January-2024     SouthEast             0     
    February-2024    NorthEast          4931     
    February-2024    MidWest            1089     
    February-2024    SouthEast           137     
    March-2024       NorthEast          3018     
      ⋮

indexCustomersOutage = 18×1

     1
     1
     1
     2
     2
     2
     3
     3
     3
     4
     4
     4
     5
     5
     5
      ⋮

You can use the index vector to get all rows of the stacked output that have data from specific rows of the unstacked input. For example, index into customersOutage to get all output rows with data that came from the first row of customersByRegion.

idx = (indexCustomersOutage == 1);
dataFromFirstRowCustomersByRegion = customersOutage(idx,:)
dataFromFirstRowCustomersByRegion=3×2 timetable
       Month         Region      CustomersOutage
    ____________    _________    _______________

    October-2023    NorthEast         3492      
    October-2023    MidWest            565      
    October-2023    SouthEast         1027      

You can stack values from the input table or timetable into multiple variables in the output table or timetable. To create multiple stacked variables in the output, use a cell array to specify multiple groups of variables from the input.

For example, create a table with four variables, where the variables contain two sets of temperature measurements.

D1 = [93;57;87;89];
D2 = [89;77;92;86];
T1 = [95;62;89;91];
T2 = [88;69;91;83];

U = table(D1,D2,T1,T2)
U=4×4 table
    D1    D2    T1    T2
    __    __    __    __

    93    89    95    88
    57    77    62    69
    87    92    89    91
    89    86    91    83

Suppose that D1 and D2 sampled one location with different instruments, and similarly that T1 and T2 sampled a second location. To keep measurements from the same locations together, use stack with a cell array that specifies two groups of input table variables. The stack function interleaves values from D1 and D2 into the output table variable D1_D2 and from T1 and T2 into T1_T2. The indicator variable shows this interleaving by indicating which input table variable in each group contributed values to the output table. When you specify multiple groups of variables, the groups must have the same number of variables.

S = stack(U,{["D1" "D2"],["T1" "T2"]})
S=8×3 table
    Indicator    D1_D2    T1_T2
    _________    _____    _____

        1         93       95  
        2         89       88  
        1         57       62  
        2         77       69  
        1         87       89  
        2         92       91  
        1         89       91  
        2         86       83  

To specify new variable names, use the NewDataVariableName name-value argument. The number of new names must equal the number of groups that you specify in the cell array.

S = stack(U,{["D1" "D2"],["T1" "T2"]}, ...
          NewDataVariableName=["Location 1" "Location 2"])
S=8×3 table
    Indicator    Location 1    Location 2
    _________    __________    __________

        1            93            95    
        2            89            88    
        1            57            62    
        2            77            69    
        1            87            89    
        2            92            91    
        1            89            91    
        2            86            83    

Input Arguments

collapse all

Input table, specified as a table or a timetable.

Variables of the input to stack, specified as a string array, character vector, cell array of character vectors, pattern scalar, positive integer, array of positive integers, or logical vector.

Example: S = stack(U,["Var1" "Var3" "Var5"]) stacks the variables of U that are named Var1, Var3, and Var5.

Example: S = stack(U,1:4) stacks the first four variables of U into one variable in S.

Name-Value Arguments

collapse all

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: NewDataVariableName="StackedData" names the new data variable StackedData.

Variables other than vars to replicate in the output, specified as a string array, character vector, cell array of character vectors, pattern scalar, positive integer, array of positive integers, or logical vector.

By default, the stack function stacks the variables specified by vars and then replicates values from the remaining variables to the appropriate rows of the output. If you specify ConstantVariables, then stack stacks the variables specified by vars, replicates values from the variables specified by ConstantVariables, and excludes the remaining variables.

The input can have row labels along its first dimension. If the input is a table, then it can have row names as the labels. If the input is a timetable, then it must have row times as the labels.

  • You can include the row names or row times when you specify the value of ConstantVariables.

  • stack replicates the row names or row times even when you do not include them in ConstantVariables.

Name for the new data variable in the output, specified as a string array, character vector, or cell array of character vectors. The default is a concatenation of the names of the variables from U that are stacked up.

Name for the new indicator variable in the output, specified as a string scalar or character vector. The default is a name based on NewDataVariableName.

Output Arguments

collapse all

Stacked table, returned as a table or a timetable. The output contains a stacked data variable, a categorical indicator variable, and any constant variables.

You can store additional metadata such as descriptions, variable units, variable names, and row names in the output. For more information, see the Properties sections of table or timetable.

stack assigns the variable units and variable description property values from the first variable listed in vars to the corresponding S.Properties.VariableUnits and S.Properties.VariableDescriptions values for the new data variable.

Row index to the input, returned as a column vector. The index vector iu identifies the row in the input that contains the corresponding data copied to the output. For example, if the value of iu(5) is 2, then the data in the fifth row of the output came from the second row of the input.

Extended Capabilities

expand all

Version History

Introduced in R2013b