Replace values in matrix (text to value)

Hi,
I have a matrix with 44 columns with the following headings:
id trip time objID status1 status2 status3 ... status40
I used textscan to read the data into one block. The problem is the "status" columns have "NULL" values as their first few values, but the columns are all different, ie. status1 has NULL for its first 5 rows, status2 for its first 3 rows, status3 for its first 10 rows, etc. Textscan has the "headerlines" argument that removes a specified number of rows from the beginning of the matrix, but is there anything I can do to remove a varying amount of rows? To ensure all the columns have the same number of values, I was going to replace all of the NULL values with a 0. Any idea on how to do that?
Thanks for the help.

4 commentaires

Cedric
Cedric le 21 Oct 2013
Modifié(e) : Cedric le 21 Oct 2013
Attach the file to your question by clicking on the icon
in the editor, so we can have a look.
Muneer
Muneer le 21 Oct 2013
My m-file? or the data file?
Cedric
Cedric le 21 Oct 2013
Modifié(e) : Cedric le 21 Oct 2013
The data file. Or a sample with 20-50 lines if it's too large.
Muneer
Muneer le 21 Oct 2013
Here's an image of the data because to get a better idea. Unfortunately, I can't give you a file because of security issues. But hopefully this helps.
I appreciate the help.

Connectez-vous pour commenter.

 Réponse acceptée

Cedric
Cedric le 21 Oct 2013
Modifié(e) : Cedric le 21 Oct 2013
Here is a small example using the following content
vehicle trip_time A B C
C23432 1234556 NULL NULL NULL
C23432 1234557 1 NULL NULL
C23432 1234558 2 NULL 100
C23432 1234559 3 10 200
C23432 1234560 4 20 300
the question that remains is what you want/need NULL entries to be converted to.
fid = fopen( 'myFile.txt', 'r' ) ;
content = textscan( fid, '%s %f %s %s %s', 'headerlines', 1 ) ;
fclose( fid ) ;
for c = 3 : numel( content ), content{c} = str2double( content{c} ) ; end
vehicle = content{1} ;
data = [content{2:end}] ;
With that you get
>> vehicle
vehicle =
'C23432'
'C23432'
'C23432'
'C23432'
'C23432'
which is a cell array of strings, and a numeric array for the rest of the data, with NaN entries for NULL values (seems appropriate)..
>> data
data =
1234556 NaN NaN NaN
1234557 1 NaN NaN
1234558 2 NaN 100
1234559 3 10 200
1234560 4 20 300
I'd personally work with that, but if you wanted zeros instead of NaNs, you could proceed as follows:
>> data(isnan(data)) = 0
data =
1234556 0 0 0
1234557 1 0 0
1234558 2 0 100
1234559 3 10 200
1234560 4 20 300
It's more handy for e.g. plotting, but you loose the information about NULL entries, because you can't say anymore which were initially NULL and which were true zeros.
Let me know if it's not exactly what you wanted to achieve, and we can refine the answer.

10 commentaires

Muneer
Muneer le 21 Oct 2013
I think this will work. My problem was that the "NULL" values didn't allow the columns of data to be read in as numbers (%n) with textscan, I had to read the whole column in as strings (%s). Changing NULL to NaN seems to take care of that. If I wanted to, could I change the NaN value to a 0, for example?
Thanks for your help
Cedric
Cedric le 21 Oct 2013
Modifié(e) : Cedric le 21 Oct 2013
Yes, I updated my answer with an example showing hoe to replace NaNs with zeros, probably right after you saw it. Please have a look at the update.
Muneer
Muneer le 21 Oct 2013
Thanks for the help. I loaded a section of my data in and ran the for loop. After, I called one of the columns (entering "data" to display the matrix was much too large) to look for the NaN values and didn't see them. I called status2. Shouldn't I see NaN in the column of data? It seems to have cut off the first second or two
Muneer
Muneer le 21 Oct 2013
Modifié(e) : Muneer le 21 Oct 2013
here are the first 5 columns (time, id, status1, status2, status3). the data should look like the first picture I attached.
Cedric
Cedric le 21 Oct 2013
My solution should be adapted to your case; in my example, column 1 contains characters, so I read it a strings, column 2 contains only numbers so I read it as floats, and the next column contain numbers and NULL entries. I read them as strings as well before post-processing them with STR2DOUBLE. The latter converts to double what can be converted and to NaN the rest. At the end of this process, columns 2 to end are numeric arrays and column 1 is a cell array of strings. I concatenate columns 2 to end into an numeric array named data and I save column 1 into its own cell array named vehicle.
Now if the content is a bit different, the solution should be adapted accordingly.
Is the data file too confidential for sending it to me by email? If not, send it to me and I'll have a look.
Muneer
Muneer le 22 Oct 2013
Sure I can email you. I'll send you a portion of the file and my code. Is the email on your page the right one? It says "remove this" after it.
Muneer
Muneer le 22 Oct 2013
I just experimented with it and found that the headerlines argument with textscan is what is deleting the first few points of data. The actual data has time starting at 0, but when introducing 'Headerlines', '1' with textscan all of the data before time 2046 is deleted. Not sure why that is...
Cedric
Cedric le 22 Oct 2013
Yes, just remove the part ".REMOVETHIS" .. it's a cheap trick to avoid receiving too much spam.
Muneer
Muneer le 22 Oct 2013
Maybe the for loop is doing the headerlines operation multiple times and deleting the top row each time?
Thanks for your help
Cedric
Cedric le 22 Oct 2013
I just replied to your email.

Connectez-vous pour commenter.

Plus de réponses (0)

Catégories

En savoir plus sur Characters and Strings dans Centre d'aide et File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by