Why is an empty string not empty? isempty('') returns true, but isempty("") returns false

(Updated to clarify the problem)
Strings, added in R2016b, are a great addition to MATLAB, but one aspect is a problem: testing for empty variables with isempty() no longer gives consistent results.
>> isempty('')
ans =
logical
1
>> isempty("")
ans =
logical
0
% even though comparison for equality is 'true':
>> '' == ""
ans =
logical
1
One would expect that, if the two values are considered equal, a built-in function like isempty() would produce the same result for both.
Presumably "" is not empty because it is a string object, while '' is a 0x0 char array and so, like [ ], has no elements. But this is a problem for code that tests for empty character input using isempty(...): it no longer works as expected if passed "" instead of ''. Is there a switch we can turn on or off so that isempty("") returns true?
A function written prior to R2016b never needed to check for empty "" (double-quoted) strings, because MATLAB would throw an exception if passed something in double quotes. Now that code produces incorrect results if a user passes in an empty double-quoted string.
One can, of course, test using strlength(...)==0, but if I recall correctly, the MATLAB IDE gave a warning about efficiency for this and recommended using isempty(...).
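A minimal sketch (assuming R2017a or later, where double-quoted literals are available) that compares the two values by container size rather than by text. It shows why isempty disagrees while strlength agrees:

```matlab
% '' is a 0x0 char array; "" is a 1x1 string array whose one element has no characters.
c = '';
s = "";

disp(size(c))        % 0 0
disp(size(s))        % 1 1
disp(isempty(c))     % 1: the char array has no elements
disp(isempty(s))     % 0: the string array has one element
disp(strlength(c))   % 0
disp(strlength(s))   % 0: the single element contains no characters
```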

4 comments

"but one aspect is a problem: testing for empty variables with isempty() no longer gives a consistent response"
Let's see if you notice the pattern here:
"ABCDE" % 1x1 string
"ABCD" % 1x1 string
"ABC" % 1x1 string
"AB" % 1x1 string
"A" % 1x1 string
"" % 1x1 string
What you are expecting is inconsistent, but MATLAB certainly isn't.
I understand that. As of R2016b, "" is a string object, not an empty array. The problem is that code written prior to R2016b never had to test for input of "", so that code now behaves incorrectly if someone passes in an empty double-quoted string.
Stephen23 on 19 Jan 2018
Edited: Stephen23 on 19 Jan 2018
" The problem is that code written prior to R2016b never had to test for input of "" "
The solution is not to create a totally misleading and inconsistent definition of empty. You need to bite the bullet and update your code. And probably improve the input checking to make sure that it only works with data types that work properly.
See if the anonymous function in my Comment does what you want.

Answers (6)

I think string arrays function more like cell arrays in this context... the string array itself isn't empty (it has one element), but the contents of that element are:
>> y = "";
>> whos y
Name Size Bytes Class Attributes
y 1x1 132 string
>> isempty(y)
ans =
logical
0
>> isempty(y{1})
ans =
logical
1
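A short sketch of the same idea, assuming R2017a or later: brace indexing pulls the text out of one element, and strlength applies the per-element test to a whole string array. The variable names are illustrative only.

```matlab
y = "";
inner = y{1};              % brace indexing extracts the text as a char vector
disp(class(inner))         % char
disp(isempty(inner))       % 1: the contents have no characters

% For a whole string array, strlength tests each element's text separately:
S = ["", "abc", ""];
disp(strlength(S) == 0)    % 1 0 1
```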

4 comments

Ian on 19 Jan 2018
Edited: Ian on 19 Jan 2018
Right. Strings are objects, not simple data types. The problem is that existing code which checks for empty strings using isempty(...) no longer works correctly if a user passes it an empty string (in double quotes) instead of a char array (in single quotes). This is my dilemma. Code written prior to R2016b would simply abort (throw an exception) if passed "". There was no such thing as a double-quoted string, so code never checked for it. Now, instead of aborting, it runs, but produces incorrect results because isempty(...) returns the wrong answer.
I think the easiest fix would be to check whether the user passed a string input, and convert it to a character array if so. You can check the MATLAB version (R2016b is version 9.1) to keep backward compatibility:
if ~verLessThan('matlab', '9.1')
    if isstring(x)
        x = char(x);
    end
end
Hmmm. That would work, unless x is potentially an array of strings. In that case the conversion needs to be to a cell array of chars.
if isstring(x)
    if isscalar(x)
        x = char(x);      % a single string becomes a char vector
    else
        x = cellstr(x);   % a string array becomes a cell array of char vectors
    end
end
Stephen23 on 19 Jan 2018
Edited: Stephen23 on 19 Jan 2018
"The problem is that existing code which checks for empty strings using isempty(...) no longer works correctly if a user passes it an empty string (in double quotes) instead of a char array (in single quotes)"
That isn't a "problem" at all, because that is actually the correct behavior: a scalar string is a scalar string, no matter how many characters it might contain.
"no longer works correctly"
MATLAB works correctly. What you are proposing is totally inconsistent, as the size of a string array is totally unrelated to how many characters might be contained in any one element of that array. That is in fact the whole point of string arrays.

Another option is the strlength function. It appears to give the correct results for both string and character arrays:
chr1 = '';
chr2 = 'ab';
str1 = "";
str2 = "ab";
lenstr1 = strlength(str1)
lenstr2 = strlength(str2)
lenchr1 = strlength(chr1)
lenchr2 = strlength(chr2)
lenstr1 =
0
lenstr2 =
2
lenchr1 =
0
lenchr2 =
2
so there is no need to test the variable type, only the length.
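A sketch extending the example above (assuming R2017a or later): strlength is element-wise, so on a string array or a cell array of char vectors it returns an array of lengths rather than a single value, and it errors on numeric input, which is the limitation raised in the reply below.

```matlab
lens1 = strlength(["", "ab", "xyz"]);   % one length per string element: [0 2 3]
lens2 = strlength({'', 'ab'});          % cell arrays of char vectors work too: [0 2]

% strlength does not accept numeric input, so callers must guard against it.
threw = false;
try
    strlength(42);
catch
    threw = true;
end
disp(threw)                             % 1
```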

3 comments

Ian on 19 Jan 2018
Edited: Ian on 19 Jan 2018
Yes, but unfortunately strlength(...) throws an exception if the input is not char or string, so a function expecting either character or numeric input will abort on numeric input when it tests whether the input is empty.
Sigh. I'm trying to find a clean solution that avoids going back through thousands of lines of code written prior to R2016b and rewriting all my tests for empty strings.
You could use isnumeric to test for numeric variables (including the empty array [], for which it returns true), and if that returns false, then test with strlength.
Experiment with this anonymous function with various arguments:
mtstr = @(x) ~isnumeric(x) && (strlength(x) == 0);
The ‘&&’ short-circuits the function so strlength will not ‘see’ numeric inputs.
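A quick check of the mtstr anonymous function above against a few inputs (a sketch, assuming R2017a or later). Note that [] is numeric, so mtstr([]) is false; logical inputs are not numeric, so mtstr(true) would still reach strlength and error; and non-scalar strings also fail, because && needs scalar operands.

```matlab
mtstr = @(x) ~isnumeric(x) && (strlength(x) == 0);

% Empty char, empty string, non-empty string, empty numeric, numeric scalar:
r = [mtstr(''), mtstr(""), mtstr("ab"), mtstr([]), mtstr(7)];
disp(r)   % 1 1 0 0 0
```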
" I'm trying to find a clean solution that avoids going back through thousands of lines of code written prior to R2016b and rewriting all my tests for empty strings."
You need to accept that different MATLAB versions have different features and can behave in different ways. You can either pick one version and stick with it, or accept that newer versions made some changes and ensure that your code matches that. The decision is yours. (This applies to all languages of course, not just MATLAB)
Your proposal of creating a totally inconsistent definition of empty is certainly not a "clean solution".

Ian on 19 Jan 2018
Edited: Ian on 19 Jan 2018
There have been many good suggestions above for workarounds for code going forward. Thanks all for your input. Having read and synthesized everyone's input, here are my (hopefully final) thoughts.
First, it seems that there are no perfect solutions here. On one hand, Cobeldick is right that changing isempty("") to return true would produce problems for someone who needs "" to not be empty; on the other hand, there is a huge codebase out there, including mine, which tests for empty strings with isempty(...) and which did not anticipate the introduction into matlab of string objects for which isempty("") would return false. That code base now is broken, and I and many others now need to update our code.
For my own part, I am going back through all my code and replacing all calls to isempty(...) with a call to a new function that returns the proper answer for my needs, and putting that function in a folder which is in the search path for all code I write.
In case it helps others, here is that function:
function tf = isempty_s(x)
    tf = isempty(x) || (isstring(x) && length(x) == 1 && strlength(x)==0);
end
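For a quick test without creating a function file, here is an anonymous-function equivalent of isempty_s (an illustrative sketch, not the poster's code; isscalar replaces length(x) == 1, which is equivalent here). Note that a 1x2 string array of empty strings is still reported as non-empty, which is the case raised in the replies below.

```matlab
isempty_s = @(x) isempty(x) || (isstring(x) && isscalar(x) && strlength(x) == 0);

results = [isempty_s(''), isempty_s(""), isempty_s([]), isempty_s(strings(1,0)), ...
           isempty_s("a"), isempty_s(["", ""])];
disp(results)   % 1 1 1 1 0 0
```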

11 comments

Stephen23 on 19 Jan 2018
Edited: Stephen23 on 19 Jan 2018
@Ian: just out of curiosity, how would your code handle a non-scalar string array? Presumably the code was written to handle some char vectors... but a non-scalar string is equivalent to multiple char vectors, so how do you handle such string arrays?
Let's say that an input (which you intended to be char) is checked to ensure that it is a row vector: now this might be a string row vector. What happens to the contents of each string element? You seem to be under the impression that as long as you can handle just one case (scalar strings with zero characters) then everything else will work perfectly. I would be interested to know how that goes when your users start supplying non-scalar string arrays.
That will not work if you are using the Control System Toolbox.
A typical use for me is for input to my modeling code to be either an array or table of inputs, or the name of a file containing the data. If the input is empty or missing, it branches to a function that provides a set of default values. I also have class objects with name/value pairs, where the constructors use if (isempty(...)) to populate the object with appropriate defaults. Before strings existed, my code would never even start if passed "" as an input, so users knew there was a problem with their input. Now it runs, but silently does not branch to the code that provides appropriate default values. Since this is complex climate modeling code, it can run for hours or days and then produce incorrect modeling results, and users may not realize that the appropriate initialization was not used.
I would still say the best solution is to check right up front that all the inputs are of the type you expect, and if they're not, either convert them appropriately or throw an error.
For example, say right now you expect input 1 to be a character array. You say the code runs, but incorrectly, if the user passes in an empty string. What if they pass a double? A structure? A complex custom object? In the latter case, it's theoretically possible that someone has written an object that shares all the same method names as a character array, so in theory your code would still run. But obviously you don't want that.
So, check up front. Make sure x is a character array. If it's a string, you can decide that's acceptable as long as it has length 1, and then immediately convert it to a character array. If the input is anything else, throw an error. Then move on to checking whether the input character array is empty, and substitute your default value if it is.
In some of my more complicated research-related code, I have entire subfunctions dedicated to this task. Parse input, check that everything is as expected, substitute defaults (loading datasets from file if necessary), etc. Never trust your users to not do something stupid... even if the only user is yourself.
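The "check up front" advice can be sketched as follows. This is an illustration under assumptions: the default file name and the error identifier demo:badSource are hypothetical, and the loop over samples just stands in for separate calls to a real function. Scalar strings are converted to char immediately, anything that is not then a char row vector (or '') is rejected, and only after that is isempty used to decide whether to apply the default.

```matlab
defaultSource = 'default_inputs.csv';   % hypothetical default data file

samples  = {'', "", 'run1.csv', "run2.csv", 42, ["a", "b"]};
resolved = cell(size(samples));

for k = 1:numel(samples)
    x = samples{k};
    try
        % 1. Accept a scalar string by converting it straight to char.
        if isstring(x) && isscalar(x)
            x = char(x);
        end
        % 2. Reject anything that is not an empty or row char array by now.
        if ~ischar(x) || ~(isempty(x) || isrow(x))
            error('demo:badSource', ...
                  'Source must be a character vector or a scalar string.');
        end
        % 3. Only now is isempty a meaningful "no input given" test.
        if isempty(x)
            x = defaultSource;
        end
        resolved{k} = x;
    catch err
        resolved{k} = err.identifier;   % record the rejection for display
    end
end
disp(resolved)
```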
Also, I highly recommend looking into inputParser objects. They take a little practice to get used to, but they make all the above checks pretty easy to accomplish. This link shows the input-parsing for a model of mine with particularly convoluted input... it has a bunch of inputs that are probably similar to yours (input should be either A or an empty array, with empty indicating to use loaded-from-file default)
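A minimal inputParser sketch of the same check, assuming R2017b or later (for convertStringsToChars). The function name runModel and the default file are hypothetical; the validator accepts a char row vector (or '') or a scalar string, and rejects everything else before any isempty test is made.

```matlab
% Accept '' or a char row vector, or any scalar string.
isTextScalar = @(v) (ischar(v) && (isempty(v) || isrow(v))) || (isstring(v) && isscalar(v));

p = inputParser;
p.FunctionName = 'runModel';                 % hypothetical function name for messages
addRequired(p, 'dataSource', isTextScalar);

parse(p, "");                                % simulate a caller passing ""
dataSource = convertStringsToChars(p.Results.dataSource);
if isempty(dataSource)
    dataSource = 'default_inputs.csv';       % hypothetical default
end

% Numeric input fails the validator, so parse throws an error.
rejected = false;
try
    parse(p, 42);
catch
    rejected = true;
end
```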
Thanks for the suggestions. My code generally does that, but because strings weren't in the language until recently, it relied on isempty() to determine whether the input was indeed empty. Rather than add a check for empty strings as well, I am now simply replacing any calls to isempty(...) that could receive a string with a call to my isempty_s(...) wrapper.
I'll look into inputParser. Thanks.
I argue that the problem isn't that isempty isn't properly checking for an empty input. The problem is that your logic is assuming that non-empty input (whether "", or anything else) is definitely a properly-formatted table, or file name, or whatever. Don't assume! Check!
"I am now simply replacing any calls to isempty(...) that could receive a string with a call to my isempty_s(...) wrapper."
You would be much better off doing proper input checking, exactly as Kelly Kearney suggested. Your wrapper is not going to make your code robust.
Mathworks uses
matlab.io.internal.utility.convertStringsToChars
to do the conversion on argument lists. That is, you can use
[varargin{:}] = matlab.io.internal.utility.convertStringsToChars(varargin{:});
You can also do
xChar = convertStringsToChars(x);
tf = isempty(xChar);
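The public, documented function convertStringsToChars (available since R2017b) does the same conversion without relying on the internal matlab.io.internal.utility package. A sketch of both the single-value and the argument-list forms; the sample arguments are made up for illustration:

```matlab
% Single value: an empty string becomes an empty char array.
x = "";
xChar = convertStringsToChars(x);
tf = isempty(xChar);                          % true

% Whole argument list, as a function would do with varargin:
args = {"", "abc", 42, ["a", "b"]};
[args{:}] = convertStringsToChars(args{:});
% args{1} is '', args{2} is 'abc', args{3} is unchanged (42),
% and args{4} is the cell array {'a','b'}.
```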
Thanks for posting this solution. We are still up against this in some larger code bases. For the most part, much of our legacy code still works OK, but it's important that the legacy code is fed char arrays, and that can become difficult as string types become more common. We've attempted to go back through the legacy code and add arguments blocks, etc., but it's difficult to catch everything in a large codebase. Something like this `isempty_s` is a nice crutch to help refactor these legacy codebases over to strings.

Hmmm. I suppose one could override isempty(...). Viewing the source of isempty() shows no code, only comments saying that isempty() is a built-in, but suggesting the equivalent is:
prod(size(x))==0
The following should do the trick, if placed in the search path ahead of the MATLAB folders. Then none of my code needs to be updated... just put it in my utils folder. Unfortunately, it produces a long string of warnings:
function tf = isempty(x)
    if isstring(x) && isscalar(x)
        tf = strlength(x) == 0;
    else
        tf = prod(size(x)) == 0;
    end
end

5 comments

"I suppose one could override isempty(...)..."
Well, that would be a really bad idea. Do you know how many toolbox functions rely on isempty? Do you know what side effects this would have on all of those functions?
I don't know that there are any good answers to this dilemma. As I said at the outset, the addition of strings is a great improvement over char arrays, just as C++ strings have made text handling much easier in that language. The problem is that one unintended consequence of the introduction of MATLAB's string objects is the sudden introduction of a bug into all existing prior code. MathWorks has generally been good about warning when part of their language is deprecated, and generally gives several releases' notice. This is one case where that hasn't happened.
Stephen23 on 19 Jan 2018
Edited: Stephen23 on 20 Jan 2018
"The problem is that one unintended consequence of the introduction Matlab's string objects is the sudden introduction of a bug into all existing prior code."
My functions check if an input needs to be class char, so I don't expect any bugs or "unintended consequences".
"Mathworks has generally been good about warning when part of their language is depracated... This is one case where that hasn't happened"
What has been deprecated?
the use of isempty(...) to test for empty strings
What has been deprecated?
"the use of isempty(...) to test for empty strings"
No, isempty correctly tests for empty strings, exactly as it should. It certainly has not been deprecated. A 2x1 string is NOT empty, a 1x1 string is NOT empty (no matter how many characters it contains), and a 1x0 string is empty. Try it:
isempty(strings(1,0)) % a 1x0 string really is empty!
What you are claiming is that
strings(1,1) % scalar string with no characters
should be classified as empty. Under your inconsistent definition of empty (where 1x1 string with zero characters is empty) this 1x4 string array would cause four iterations of this loop:
S = ["ABC","DEF","GHI","JK"] % 1x4 string array
while ~isempty(S), S=S(2:end), end
whereas this 1x4 string array would iterate three times:
S = ["ABC","DEF","GHI",""] % 1x4 string array
while ~isempty(S), S=S(2:end), end
and apparently this would not iterate even once!:
S = ["","","",""] % 1x4 string array.
while ~isempty(S), S=S(2:end), end
Even though all of them are 1x4 string arrays, you have just invented a definition of empty that magically changes how many times my loop iterates, not depending on the size of the array itself (which is what MATLAB currently does) but depending on the number of characters within the string array. If you then decide that a 1x4 string array should obviously cause it to iterate four times, then how many times should it for a 1x3 string? Or a 1x2 string? Or a 1x1 string? Or a 1x0 string?
Oh... Actually I think I like this more and more: I could use this to avoid doing lots of work: "sorry I could not process your data: you entered the test name as an empty string and as a result my code just skipped that iteration entirely, and so I never knew about it". You have me convinced!
Nothing has been deprecated. isempty is totally correct.
PS: actually I could not figure out how many iterations this string array ["","","",""] would result in using your inconsistent definition of isempty:
  • four times? (disagrees with you insisting that 1x1 can be empty)
  • three times? (because only the last iteration would be a 1x1 string with zero characters, fitting your definition).
  • zero times? (because all four of the string elements have zero characters).
Three times seems to fit your definition, but would mean that that loop runs a different number of times depending on what data is contained in the string, and not on the size of the string itself! So in some cases your definition causes my 1x4 string to be iterated over four times, sometimes maybe three times... and I cannot use size or numel to tell me!
Oh, if only MATLAB had implemented some consistent definition of isempty, so that my loop always iterated over all of my 1x4 arrays (no matter what class) consistently four times! Oh wait... that is exactly what MATLAB already does!
PPS: What about ["","","","ABC"]? Your special isempty definition has me totally flummoxed on this one. Please advise how it would work!
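Stephen's loops can be checked directly. In this sketch (assuming R2017a or later), every 1x4 string array is iterated exactly four times, whatever text its elements hold, because isempty only looks at the array size:

```matlab
arrays = {["ABC","DEF","GHI","JK"], ["ABC","DEF","GHI",""], ["","","",""]};
counts = zeros(1, numel(arrays));

for k = 1:numel(arrays)
    S = arrays{k};
    while ~isempty(S)        % stops only when S has become 1x0
        S = S(2:end);
        counts(k) = counts(k) + 1;
    end
end
disp(counts)                 % 4 4 4

% Only a string array with zero elements is empty:
disp([isempty(strings(1,0)), isempty(strings(1,1))])   % 1 0
```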

As near as I can tell, Matlab does not use the double quote character, so
isempty("")
is not a valid statement.

5 comments

Double quotes define string arrays, which were introduced in R2016b.
Ian on 19 Jan 2018
Edited: Ian on 19 Jan 2018
Strings are new; as of R2016b Matlab has added string objects. See https://www.mathworks.com/help/matlab/ref/string.html
In R2016b, string arrays, which use double quotes, were introduced. They behave differently than character arrays.
The use of double quotes on input was introduced in R2017a, but string objects were introduced in R2016b.
Thanks for the update.

I think I'm going to submit this to MathWorks as a bug, and see what they say.

6 comments

Errr....
A string array is much like a cell array, in the sense that it is like a container:
{[1,2,3,4]} % 1x1 cell contains 1x4 double
{[1,2,3]} % 1x1 cell contains 1x3 double
{[1,2]} % 1x1 cell contains 1x2 double
{[1]} % 1x1 cell contains 1x1 double
{[]} % 1x1 cell contains 1x0 double
Notice how the size of the cell array does not depend on the size of the contents. But you claim that
{[]} % empty cell !
How can it be an empty cell if it has size 1x1? How can it be an empty cell if it contains a numeric array? In all other cases the size of the contents is irrelevant, and now you state that in this one case it suddenly becomes relevant. How is adding a special case "consistent"? This concept would require slowing down MATLAB, because instead of isempty simply checking the size of the object it has been given, it would also have to check whether that object is a container of some kind, and then look at its contents and check their size too: most of the time the contents' size would then simply be discarded, except for the one special case when it is empty. Slow and inconsistent.
Why do you insist on something that is 1x1 to measure as empty?
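The same container-versus-contents distinction with plain cell arrays, as a short sketch: isempty looks at the cell itself, while cellfun applies isempty to each element's contents. The sample cell C is made up for illustration.

```matlab
emptyCell   = isempty({});     % true: a 0x0 cell has no elements
wrappedNull = isempty({[]});   % false: a 1x1 cell, whatever it contains

C = {[], 1:3, '', "x"};
contentEmpty = cellfun(@isempty, C);   % per-element test: 1 0 1 0
disp(contentEmpty)
```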
I understand what you are saying. The problem is that existing code now produces the wrong answer if passed "" instead of ''. Prior to R2016b, the proper way to test for an empty string was to use isempty(x), and there is a HUGE code base out there testing this way. Now that code is going to give the wrong answer if an unsuspecting user passes "" as input to a function which naively tests for isempty(x). I don't know how much code you have out there that others are using... I have tens or hundreds of thousands of lines of code in use by others that will now start giving incorrect results, and I'm trying to find a correct solution to this.
See my latest Comment. It might be what you want.
Stephen23 on 19 Jan 2018
Edited: Stephen23 on 19 Jan 2018
"The problem is that existing code now produces the wrong answer..."
Yep, and the code is the problem, not isempty. Fix the code.
Or tell your users to use a particular MATLAB version.
If the code is really that important then clearly you have done all of the things that are recommended when writing MATLAB functions, like writing documentation, input checking, and adding version numbers. Oh wait, the first two of those would alert your users to the fact that strings do not work with your functions...!
And of course this is no different to any other update, so I'm sure that for such important code you already have a process in place for giving your very important users updates or bugfixes, perhaps via Git or GitHub or something similar. Or do you simply distribute your code as if it was perfect and will never require updates and revisions? That would be... very optimistic.
There are reasons why programmers have invented the tedious and sometimes painful but incredibly practical and invaluable versioning systems: because code does change, bugs do get fixed, and people in different locations need to be kept updated. You are not the first person in the universe to find themselves in this position: the "correct solution" that you talk about is to improve your processes.
"I'm trying to find a correct solution to this."
It is not clear why you think that forcing TMW to re-define the currently consistent definition of isempty into a totally misleading and inconsistent definition of isempty would be the "correct solution" to the problem that your code needs to be updated to work with a new MATLAB version.
Before you start thinking that I am just being facetious for fun, my aim is actually to help you. It seems to me that focusing on isempty is a distraction and ultimately will just be a waste of time. You really would be much better off focusing on:
  • updating your code (e.g. adding input checks).
  • improving your code distribution processes.
These are "correct solutions" in exactly the sense that you meant: they are standard methods using standard tools and coding best practice. They are best practice because they are known to work.
Thanks for your comments. I appreciate your attempt to help me.
Please see my "accepted answer" above.
Note, BTW, that matlab considers "" and '' to be equal:
>> "" == ''
ans =
logical
1
Therefore I would not say that it is unreasonable to expect a builtin function like isempty(...) to produce identical results for the two inputs.
This is an unfortunate unintended consequence of the late introduction of strings to the matlab programming language, for which there is probably no good resolution.
Stephen23 on 20 Jan 2018
Edited: Stephen23 on 20 Jan 2018
"Note, BTW, that matlab considers "" and '' to be equal:"
"Therefore I would not say that it is unreasonable to expect a builtin function like isempty(...) to produce identical results for the two inputs."
eq for strings is very much an overloaded convenience operator, as the help describes: "If one input is a string array, the other input can be a string array, a character vector, or a cell array of character vectors. The corresponding elements of A and B are compared lexicographically".
Lexicographical equivalence does not imply that any of these different arrays are in any other ways identical.
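A short sketch (assuming R2017a or later) of why equality under == does not imply identical arrays: the comparison is element-wise on text, while class and size still differ.

```matlab
tf1 = ("" == '');                            % true: the texts match
tf2 = (["", "ab"] == '');                    % element-wise: 1 0
sameClass = strcmp(class(""), class(''));    % false: 'string' vs 'char'
sameSize  = isequal(size(""), size(''));     % false: [1 1] vs [0 0]
```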

Categories

Learn more about Characters and Strings in Help Center and File Exchange

Asked by: Ian on 19 Jan 2018

Last commented: 17 Jul 2025
