Performance comparison among Struct Array, Cell Array and Table

229 views (last 30 days)
Kat Lee on 1 Nov 2018
Commented: Walter Roberson on 18 Mar 2022
I am struggling with when to use what. There are three common ways to store data in MATLAB: 1. cell arrays; 2. tables; 3. struct arrays.
I did some searching online on the performance of the three: structs are supposedly the fastest, but it is still not really clear to me when to use what.
Can someone give me a general concept of the performance among these MATLAB data structures?
Thanks :)
  3 comments
Stephen23 on 6 Nov 2018
"There are three common way to store data in MATLAB: 1. Cell array; 2. Tables; 3. Struct arrays."
You only list container classes. What about the simpler ways of storing data: the numeric array (single, double, uint*, and int*), the character array, and the logical array? These are faster to access than the ones that you list. Is there a reason why you do not list them?
Michael O'Brien on 18 Mar 2022
I found this super helpful because of the discussion it generated. Thanks, Kat Lee!


Answers (3)

Bruno Luong on 2 Nov 2018
My general rules of thumb:
  • Simple arrays are the fastest.
  • Use a cell array if you have no choice (mixed classes or non-uniform sizes) and don't care about "naming" the elements.
  • Next, I recommend a struct of arrays and/or cell arrays, which gives you meaningful field names and flexible data exchange.
  • Avoid arrays of structs at all costs for large numbers of records (say > 10); sooner or later they carry a big speed penalty. I can't remember the last time I used one, probably in my youth, and I never did it again.
  • Tables are a sort of object-oriented layer built on top of cell arrays; personally, I have never felt a need to use them. I recognize they are very attractive to people who like Excel spreadsheets. ;-)
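To illustrate the struct-of-arrays versus array-of-structs distinction in the list above, here is a minimal sketch using hypothetical fields `id` and `value` and made-up random data. In the struct of arrays, each field is one contiguous numeric vector; in the array of structs, every record is a separate struct element holding its own small arrays, so reading one field across all records means gathering n scattered values. The printed timings will vary by machine and MATLAB release.

```matlab
% Hypothetical example data: n records, each with an id and a value
n = 1e5;
ids = (1:n)';
vals = rand(n, 1);

% Struct of arrays: one struct, each field is a contiguous column vector
soa.id = ids;
soa.value = vals;

% Array of structs: n separate struct elements, one per record
aos = struct('id', num2cell(ids), 'value', num2cell(vals));

% Summing one field: direct vector operation vs. gathering n values first
tic; s1 = sum(soa.value); t1 = toc;
tic; s2 = sum([aos.value]); t2 = toc;

fprintf('struct of arrays: %.5f s, array of structs: %.5f s\n', t1, t2);
```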
  5 comments
Jaromir on 28 Nov 2019
Peter
Tables are built on top of cell arrays. Your example is misleading since you're comparing two very different things. Your cell array c is literally a 1000000-by-10 cell array. Your table t is built on top of a 1-by-1 cell array, where the entire numeric array x is placed in one cell. This is how tables work: each "variable" (in table terminology) is placed in its own cell. The table t is hence roughly equivalent to a cell array { x }.
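A small sketch of the distinction Jaromir draws, assuming a hypothetical 1000-by-10 matrix `x` (smaller than the one in the example under discussion). `num2cell(x)` puts every number in its own cell, whereas `{x}` wraps the whole matrix in a single cell, which is roughly what a one-variable table holds. In MATLAB, `whos` typically reports the element-wise cell array as much larger than `x`, because each cell carries its own array header; exact byte counts depend on the release.

```matlab
% Hypothetical numeric data: 1000 rows, 10 columns
x = rand(1000, 10);

% Element-wise cell array: every number becomes its own cell
c = num2cell(x);      % 1000-by-10 cell array, 10000 separate cells

% Single-cell wrapper: one cell holding the whole matrix
w = {x};              % 1-by-1 cell array

fprintf('num2cell(x): %d cells, {x}: %d cell\n', numel(c), numel(w));

% Memory reported by whos for each variable
info = whos('x', 'c', 'w');
for k = 1:numel(info)
    fprintf('%s: %d bytes\n', info(k).name, info(k).bytes);
end
```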
Walter Roberson on 28 Nov 2019
Notice Peter's phrase, "at least not in the way that you probably mean."
In particular, many people tend to think that a table with N rows and V variables is stored as an N-by-V cell array, but it is actually stored as a struct that contains a 1-by-V cell array, each entry of which is an object with N rows.
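The layout Walter describes can be mimicked with a plain struct, as a conceptual model only: the field names `varNames` and `data` below are hypothetical and are not the real internal properties of the `table` class. The sketch shows that extracting one variable returns a single N-row array, while extracting one row has to index into each of the V arrays, which is one reason column-wise operations on tables tend to be cheaper than row-by-row access.

```matlab
% Conceptual model only: hypothetical fields, NOT the real table internals
N = 5;                                              % number of rows
tableLike.varNames = {'Height', 'Weight', 'Name'};  % V = 3 variable names
tableLike.data = {rand(N, 1), rand(N, 1), {'a'; 'b'; 'c'; 'd'; 'e'}};  % 1-by-V cell

% One variable: a single lookup returns a whole N-row array
col = tableLike.data{strcmp(tableLike.varNames, 'Weight')};

% One row: every one of the V arrays must be indexed separately
row = cellfun(@(v) v(3, :), tableLike.data, 'UniformOutput', false);
```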



Matt J on 1 Nov 2018
Edited: Matt J on 1 Nov 2018
They should all be about the same speed. If speed matters and the data is large, however, you shouldn't be using any of these. You should be storing data in numeric arrays instead. That way the data will be held contiguously in RAM and accessing it will be very fast.
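A rough sketch of Matt J's point, using hypothetical random data: the same numbers are summed once from a numeric array with a single vectorized call, and once from a cell array by dereferencing each cell in a loop. Timings depend on the machine and release, so treat the printed values as indicative only.

```matlab
% Hypothetical data: 100000 random doubles
n = 1e5;
x = rand(n, 1);      % numeric array: one contiguous block of memory
c = num2cell(x);     % cell array: n separate 1-by-1 arrays

% Numeric array: a single vectorized call
tic
sx = sum(x);
tNumeric = toc;

% Cell array: dereference each cell in turn
tic
sc = 0;
for k = 1:n
    sc = sc + c{k};
end
tCell = toc;

fprintf('numeric sum: %.5f s, cell loop: %.5f s\n', tNumeric, tCell);
```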
  5 comments
Stephen23 on 6 Nov 2018
Edited: Stephen23 on 6 Nov 2018
"store in numeric array won't be applicable for me since I also need the fieldname to associate with number."
That is a very poor reason not to use numeric arrays, especially if you then ask about efficiency accessing data!
Simply keep an array of text data (e.g. cell array of char vectors, string array) and a corresponding array of numeric data (any numeric class). This will make your data processing much simpler and more efficient than messing about with numeric data pointlessly split up into a cell array.
A table might be a good solution (it effectively does the same thing).
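A minimal sketch of the parallel-array approach Stephen23 describes, with hypothetical names and values. The text lives in a cell array of char vectors (a string array would also work in recent releases) and the numbers live in a numeric array in the same order, so lookups use `strcmp` or `ismember` and arithmetic stays vectorized.

```matlab
% Hypothetical parameter names and their values, kept in two parallel arrays
names = {'alpha'; 'beta'; 'gamma'};   % text: cell array of char vectors
values = [0.5; 2.0; 7.25];            % numbers: column vector, same order

% Look up one value by name
v = values(strcmp(names, 'beta'));

% Look up several at once; idx holds the positions of the matches
[found, idx] = ismember({'gamma', 'alpha'}, names);
selected = values(idx(found));

% Vectorized arithmetic works directly on the numeric array
scaled = values * 10;
```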
Bruno Luong on 6 Nov 2018
Especially when one can put the numerical array inside a struct with a meaningful fieldname.
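A sketch of Bruno's suggestion, assuming hypothetical field names `temperature` and `pressure`: each field holds a whole numeric array, so a name is attached once per series rather than once per number. Dynamic field names (`s.(name)`) and `fieldnames` let code select fields from names held in variables.

```matlab
% Hypothetical measurement series, each stored as a numeric array in a named field
data.temperature = [20.1 20.4 21.0];
data.pressure    = [101.3 101.1 100.9];

% Static field access returns the whole contiguous array
meanT = mean(data.temperature);

% Dynamic field name: choose the field from a name held in a variable
name = 'pressure';
p = data.(name);

% Loop over all fields without hard-coding their names
fn = fieldnames(data);
for k = 1:numel(fn)
    fprintf('%s: %d samples\n', fn{k}, numel(data.(fn{k})));
end
```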



Peter Perkins on 6 Nov 2018
Kat, there's no way you are gonna get a useful answer without providing more information. The best representation of your data is gonna depend on your data and what you are doing, and how you plan on writing your code. Without knowing that, any answer is just guessing.
  5 comments
Matt J on 6 Nov 2018
Edited: Matt J on 6 Nov 2018
I really doubt Table's performance, since what I see before Table's performance is not very good
That's not a good reason in and of itself to doubt the performance of tables. The person who was demonstrating their performance to you may have been an inexperienced programmer who didn't use properly vectorized methods to get the best performance.
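To show what "properly vectorized" can mean for a table, here is a sketch with a hypothetical two-variable table: the first version computes a product row by row, indexing into the table on every iteration, and the second operates on whole variables at once. Both give the same result; the vectorized form is typically much faster, though the size of the gap depends on the machine and MATLAB release.

```matlab
% Hypothetical table with two numeric variables
n = 1e4;
t = table(rand(n, 1), rand(n, 1), 'VariableNames', {'x', 'y'});

% Row-by-row loop: indexes into the table n times
tic
zLoop = zeros(n, 1);
for k = 1:n
    zLoop(k) = t.x(k) * t.y(k);
end
tLoop = toc;

% Vectorized: operate on whole table variables at once
tic
t.z = t.x .* t.y;
tVec = toc;

fprintf('row loop: %.5f s, vectorized: %.5f s\n', tLoop, tVec);
```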
Walter Roberson on 18 Mar 2022
Also, MathWorks has improved table() performance over the years.


Category: Structures
