How can I find numeric values (stored as strings) in table of strings and convert them to double?

29 views (last 30 days)
I have some experience with Matlab and I've found a lot of solutions online so far, but I can't find anything that would solve my current problem. As there are some threads that are similar but not quite IT, I will try to be as specific as possible.
  • We have a really huge tool that processes our entire data. It reads a lot of input data (via readtable from Excel-Files), combines all the data, compares, calculates, appends etc. etc. In the end, there is a single table (a literal Matlab table) and we write it to an Excel-File with writetable.
  • We experienced over the course of building this tool that it's easier to force the VariavleType to be string. So basically this is our readtable Code:
opts = detectImportOptions(xls,'Sheet',sheet);
opts = setvartype(opts, opts.SelectedVariableNames, 'string'); % force VariableType to be string (both ext and numbers)
data = readtable(xls,opts,'ReadVariableNames',false,'basic',true);
  • It makes comparisons easier, but of course, if we want to calculate something, we have to use str2double and string again:
data.tq = string(str2double(data.x) + str2double(data.y));
So this table, we create, is a table full of strings, where also the numbers are stored as strings. With Matlab2018 writetable worked perfectly or let's say "forgivingly". After writetable, we open the Excel-File via ActiveX and do the formatting directly in Excel based on CustomProperties where we, e.g., defined the number of decimals. Even though, we have a table with numbers stored as strings, we've never had a problem to format them in Excel. They, somehow, were stored as numbers in Excel:
Now I'm testing our tool for Matlab2022 and I guess, Matlab has become more rigorous over time. So when we use writetable now, numbers stored as string in the Matlab table will be also be stored as string in Excel. Which is frustrating because now, the formatting via ActiveX doesn't work anymore .
I tried various things:
  1. PreserveFormat in writetable, but apparently it means preserving the Excel Format
  2. Converting the table to an array and/or cell, trying to convert strings to numbers where there are numbers. But converting a table into an array or cell will lead to numeric values stored as char - same problem.
  3. I tried this: How to extract the numerial values out of readtable output? but cellfun(@isnumeric,table2cell(data)) doesn't work because of issue in 2. Or maybe it does with some modifcations that I don't know yet.
  4. In one case, this workaround works. (But only because, in this case, I could set a CustomProperty that allows me to define the variables that are numbers and unfortunately, this is not always the case. So not a universal solution.)
idxDecNum = find(strcmp('num',data.Properties.CustomProperties.VarOutType)); % Find columns where the CustomProperty VarOutType is defined as 'num'
data = convertvars(out,[idxDecNum],'double'); % convert only those found to double
for i = 1:size(idxDecNum,2) % convert each column to cell and store them in table
data.(data.Properties.VariableNames{idxDecNum(i)}) = num2cell(data.(data.Properties.VariableNames{idxDecNum(i)}));
My question(s):
  • Is there a possibility to find numeric values that are stored as string/char within a cell array and to convert only them? To be clear, the columns either contain purely numeric values or purely text. There is no mix of numbers and text within a column. But, of course, entries might be empty. So it would not be possible to check the first row for numeric and text values as the respective value might just be empty in the first row and contain data in the second or any following row.
  • Or is there an ActiveX command that converts all numbers stored as text to numbers? (However, I'd rather fix this problem in Matlab.)
Any kind of ideas are appreciated! But no, I really would like to avoid changing the readtable-command. First, I want to try to find a solution for the end.
Thank you, Laura
Laura V.
Laura V. on 23 Nov 2022
@dpb @Stephen23 Thank you for your opinion! :) Voss's solution worked and I will use it for now. However, you are both absolutely right! We should definitely take the time to revise our code completely! :)

Sign in to comment.

Accepted Answer

Voss on 23 Nov 2022
"Is there a possibility to find numeric values that are stored as string/char within a cell array and to convert only them?"
% a cell array containing mixed types:
C = {1 '2' "green" []; 'forest' "" 5 "6"; '7' 'tree' 9 NaN}
C = 3×4 cell array
{[ 1]} {'2' } {["green"]} {0×0 double} {'forest'} {["" ]} {[ 5]} {["6" ]} {'7' } {'tree'} {[ 9]} {[ NaN]}
% find the chars and strings:
is_text = cellfun(@(x)ischar(x) || isstring(x),C);
% convert the char/string elements to numeric.
% use cellfun(@str2double,_) rather than str2double alone, in order to
% handle mixed chars and strings (e.g., str2double({'4' "6"}) returns [4 NaN]):
C_numeric = NaN(size(C));
C_numeric(is_text) = cellfun(@str2double,C(is_text));
% take the non-NaN elements of C_numeric (i.e., those that were in fact
% numbers stored as chars/strings) and put them where they belong in C:
is_numeric_text = ~isnan(C_numeric);
C(is_numeric_text) = num2cell(C_numeric(is_numeric_text))
C = 3×4 cell array
{[ 1]} {[ 2]} {["green"]} {0×0 double} {'forest'} {["" ]} {[ 5]} {[ 6]} {[ 7]} {'tree'} {[ 9]} {[ NaN]}
A very similar (simpler) alternative:
% a cell array containing mixed types:
C = {1 '2' "green" []; 'forest' "" 5 "6"; '7' 'tree' 9 NaN};
% apply str2double on all cells (returns NaN for those already numeric):
C_numeric = cellfun(@str2double,C);
% take the non-NaN elements of C_numeric (i.e., those that were in fact
% numbers stored as chars/strings) and put them where they belong in C:
is_numeric_text = ~isnan(C_numeric);
C(is_numeric_text) = num2cell(C_numeric(is_numeric_text))
C = 3×4 cell array
{[ 1]} {[ 2]} {["green"]} {0×0 double} {'forest'} {["" ]} {[ 5]} {[ 6]} {[ 7]} {'tree'} {[ 9]} {[ NaN]}

More Answers (1)

dpb on 23 Nov 2022
Edited: dpb on 23 Nov 2022
Well, against better judgement, if are adamant about not fixing the problem at its core, then given the previously stated condition that the table variables are all either string or numeric (and a MATLAB table can't mix data types in a variable anyway, other than using a cell array), then I'd do something more like--
The above will attempt to convert all columns first, but only those that are determined to be able to be will be put into the table...of course, you could have determined this when reading the file by using detectImportOptions that would have found out which were numeric and handled it right off the bat.
With a well-formed input file, using detectImportOptions is probably not required; the builtin scanning inside readtable and friends will almost certainly get it right unless the input file format isn't kosher.
It's certainly not clear what issue you may have run into originally to have chosen this route; it would still be helpful to see the input file format; it just doesn't seem like one would choose to do the force-to-string thing unless there's an issue in the file format.
dpb on 24 Nov 2022
If you're not careful, that 10% will keep growing to counteract the proliferation of tools and datasets...@Stephen23's points are well taken to heart.
One suggestion I'd make going forward in the interim to help would be to build a library of import options objects for a given product/version and save the library of those. Then you can have your main code detect the version/product and use the prebuilt options object for that particular file on import. Ideally, your product design built in a product ID/release that you can read; if not that's a second immediate thing to add going forward.

Sign in to comment.




Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by