Best way to organize categorical data for plotting

Malin Abrahamsen
Malin Abrahamsen on 27 Jan 2021
I have wind speed data that I want to split and plot in several different ways. For instance, I may want to plot a 3-D bar plot with different combinations of wind speed, month, frequency count, and altitude, or a subsection of the data as a stacked bar plot or area plot. I've been looking at different ways of organizing the data to make plotting different variables and subsections easy, but I'm not used to working with categorical data in MATLAB, and I keep running into limitations with table/timetable. I'm coming from Python (and older versions of MATLAB) and I find MATLAB 2019 to be close enough to be just a bit frustrating.
I'm looking for advice on what might be the best way to organize data like this to make plotting quick and flexible. Clearly there's some logic to this aspect of MATLAB that I've missed. I've previously simply split the data into various matrices and manipulated them for plotting, which is plain and simple, but I'm sure there are better and more advanced ways to do this if one knows how.
I've attached the file GC.mat where the data is organized by month, wind speed (5 m/s bins), altitude (2 km bins) and count. I've also added part of the original timetable w_tt.csv with 2 s wind speed data (the original data covers 6 years).
I can provide examples of my plotting attempts and try to figure out why it's not working, but I have a feeling the main problem is that I've just not understood the best way to organize the data, so I'm starting with an attempt to learn why this is/isn't a good approach.
Example table:
166×6 table
monthname_Time    disc_Altitude    disc_Windspeed    GroupCount    norm          perc
______________    _____________    ______________    __________    __________    __________
January           [20000, Inf]     [0, 2)            503           4.3455e-05    0.0043455
January           [20000, Inf]     [2, 4)            1355          0.00011706    0.011706
January           [20000, Inf]     [4, 6)            2452          0.00021183    0.021183
January           [20000, Inf]     [6, 8)            2931          0.00025321    0.025321
January           [20000, Inf]     [8, 10)           3516          0.00030375    0.030375
January           [20000, Inf]     [10, 12)          3640          0.00031447    0.031447
January           [20000, Inf]     [12, 14)          3392          0.00029304    0.029304
January           [20000, Inf]     [14, 16)          2398          0.00020717    0.020717
January           [20000, Inf]     [16, 18)          2134          0.00018436    0.018436
January           [20000, Inf]     [18, 20)          2815          0.00024319    0.024319
January           [20000, Inf]     [20, 22)          3811          0.00032924    0.032924
January           [20000, Inf]     [22, 24)          4504          0.00038911    0.038911
Example timetable data:
Time                   Windspeed    Altitude
___________________    _________    ________
01/01/2015 00:17:01    27.3         18317
01/01/2015 00:17:03    27.3         18325
01/01/2015 00:17:05    27.2         18334
01/01/2015 00:17:07    27.1         18343
01/01/2015 00:17:09    27           18352
01/01/2015 00:17:11    26.9         18361
  1 comment
dpb
dpb on 27 Jan 2021
I think examples of where you got stuck would be more helpful... I'd probably keep the dates as datetime instead of converting them to categorical, and shorten the category names simply to reduce typing; you can use the optional valueset and catnames inputs to categorical for display purposes.
It could also be more advantageous to leave the altitude and wind speed data as numeric and use discretize and friends to do the binning on the fly instead.
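A minimal sketch of binning on the fly with discretize, using the six sample rows from the timetable in the question. The bin edges (wsEdges, altEdges) are assumptions chosen to resemble the bins in the example table, not values taken from the original script:

```matlab
% The six sample rows shown in the question, rebuilt as a timetable
Time = datetime(2015,1,1,0,17,1) + seconds(0:2:10)';
Windspeed = [27.3; 27.3; 27.2; 27.1; 27.0; 26.9];
Altitude  = [18317; 18325; 18334; 18343; 18352; 18361];
w_tt = timetable(Time, Windspeed, Altitude);

% Bin edges are picked at plot time (assumed values); the stored data stays numeric
wsEdges  = [0:2:40 Inf];        % 2 m/s wind speed bins plus an open-ended top bin
altEdges = [0:2000:20000 Inf];  % 2 km altitude bins plus an open-ended top bin

% discretize returns the bin index of each sample (NaN if outside the edges)
wsBin  = discretize(w_tt.Windspeed, wsEdges);
altBin = discretize(w_tt.Altitude,  altEdges);

% 2-D count matrix: rows = altitude bins, columns = wind speed bins
counts = accumarray([altBin wsBin], 1, [numel(altEdges)-1, numel(wsEdges)-1]);
```

Changing the bin width then only means changing the edge vectors; counts can go straight into bar3, bar, or area. Samples outside the edges would give NaN indices, which accumarray rejects, hence the open-ended top bins.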
rowfun is extremely powerful in conjunction with grouping variables for such things...
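As a rough illustration of rowfun with a grouping variable, here is a sketch on a small made-up table. The variable names Month and MeanMax and the values are hypothetical, chosen only for the example:

```matlab
% Made-up samples: numeric wind speed plus a month label per row
Month = categorical({'January'; 'January'; 'February'; 'February'; 'February'});
Windspeed = [27.3; 26.9; 12.0; 14.0; 16.0];
T = table(Month, Windspeed);

% With GroupingVariables, rowfun calls the function once per group and
% passes that group's Windspeed values as a single column vector
stats = rowfun(@(w) [mean(w) max(w)], T, ...
    'InputVariables', 'Windspeed', ...
    'GroupingVariables', 'Month', ...
    'OutputVariableNames', 'MeanMax');

disp(stats)   % one row per month: Month, GroupCount, MeanMax (mean and max)
```

Any function that reduces a vector to one row can be swapped in for the anonymous function, so the same pattern covers counts, percentiles, or custom statistics per month.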


Answers (1)

Avni Agrawal
Avni Agrawal on 16 May 2024
I understand that you are trying to organize and visualize your wind speed data in MATLAB.
Data Organization with Tables and Timetables:
  • Tables are versatile for mixed data types, ideal for categorizing wind speeds, altitudes, and counts. Timetables are perfect for time-series data, offering easy indexing and aggregation based on time.
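To illustrate the time-based indexing and aggregation mentioned above, here is a small sketch using the six sample rows from the question. The time window and the per-minute aggregation step are assumptions picked for the example:

```matlab
% The six sample rows from the question as a timetable
Time = datetime(2015,1,1,0,17,1) + seconds(0:2:10)';
Windspeed = [27.3; 27.3; 27.2; 27.1; 27.0; 26.9];
w_tt = timetable(Time, Windspeed);

% Index by row time: samples in [00:17:00, 00:17:06), end point excluded
window = timerange(datetime(2015,1,1,0,17,0), datetime(2015,1,1,0,17,6));
firstSamples = w_tt(window, :);

% Aggregate to a coarser time step: mean wind speed per minute
perMinute = retime(w_tt, 'minutely', 'mean');
```

For six years of 2 s data, the same retime call with 'hourly', 'daily', or 'monthly' would reduce the data before any plotting.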
Categorization and Grouping:
  • Convert relevant variables to categorical for meaningful grouping (e.g., month, wind speed bins). Use groupsummary for quick calculations within these groups.
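A minimal groupsummary sketch on a made-up table in the same long layout as the example table. The variable names Month, WindBin, and Count and the values are hypothetical:

```matlab
% Made-up long-format data: one row per (month, wind speed bin, altitude bin)
Month   = categorical({'January'; 'January'; 'January'; 'February'; 'February'});
WindBin = categorical({'[0, 2)'; '[2, 4)'; '[0, 2)'; '[0, 2)'; '[2, 4)'});
Count   = [503; 1355; 10; 7; 3];
T = table(Month, WindBin, Count);

% Sum Count within each month / wind speed bin combination
G = groupsummary(T, {'Month', 'WindBin'}, 'sum', 'Count');

disp(G)   % columns: Month, WindBin, GroupCount (rows per group), sum_Count
```

The result is still in long format, one row per group, which is the shape unstack expects when pivoting to a matrix for plotting.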
Reshaping for Visualization:
  • Reshape your data to fit the needs of different plots. Functions like unstack can pivot data for easier plotting, organizing rows and columns by categories such as month or wind speed bin.
Visualization Techniques:
  • 3D Bar Plots (bar3): Great for showing relationships between three variables (e.g., month, wind speed bin, and count).
  • Stacked Bar and Area Plots (bar, area): Useful for comparing parts of a whole over categories, with data organized in 2D matrices.
Here's a simple example of how you might start to organize and plot your data:
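% Load your data
load('GC.mat'); % Assuming this loads a table named 'GC'
% Convert to categorical if not already
GC.monthname_Time = categorical(GC.monthname_Time);
GC.disc_Altitude = categorical(GC.disc_Altitude);
GC.disc_Windspeed = categorical(GC.disc_Windspeed);
% Example: Plotting frequency count by month and wind speed bin
% Keep only the variables needed, so unstack sums GroupCount over the
% altitude bins and leaves no extra non-numeric columns in the result
sub = GC(:, {'monthname_Time', 'disc_Windspeed', 'GroupCount'});
pivotTable = unstack(sub, 'GroupCount', 'disc_Windspeed', 'GroupingVariables', 'monthname_Time');
% Column 1 holds the month; the remaining columns are the counts per bin
counts = pivotTable{:, 2:end};
counts(isnan(counts)) = 0; % month/bin combinations with no data
% Simple 3D bar plot example
figure;
bar3(counts);
xlabel('Wind Speed Bin');
ylabel('Month');
zlabel('Frequency Count');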
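The stacked bar and area plots mentioned above take the same kind of 2-D matrix. A rough sketch with a made-up count matrix (the numbers and labels are placeholders, not values from GC.mat):

```matlab
% Made-up count matrix: rows = months, columns = wind speed bins
counts = [503 1355 2452; 410 1200 2100; 300 900 1800];
monthNames = {'January', 'February', 'March'};
binNames = {'[0, 2)', '[2, 4)', '[4, 6)'};

figure;
subplot(1, 2, 1);
hb = bar(counts, 'stacked');   % one bar per month, one segment per bin
set(gca, 'XTick', 1:numel(monthNames), 'XTickLabel', monthNames);
ylabel('Frequency Count');
legend(binNames, 'Location', 'northwest');

subplot(1, 2, 2);
ha = area(counts);             % same matrix drawn as stacked areas
set(gca, 'XTick', 1:numel(monthNames), 'XTickLabel', monthNames);
ylabel('Frequency Count');
```

Transposing counts swaps the roles of month and wind speed bin, so one matrix covers both views.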
Please take a look at the documentation for the functions used above (categorical, unstack, bar3) for a better understanding.
I hope this helps!

Version

R2019b
