Map first field to second field in txt file

Hi,
I have this txt file:
1::Toy Story (1995) ::Animation|Children's|Comedy
2::Jumanji (1995) ::Adventure|Children's|Fantasy
8::Tom and Huck (1995) ::Adventure|Children's
I want to map, for example, 1 to animation, 2 to adventure and 8 to adventure. That is, I need to create a txt file with two columns: the first column contains 1, 2, 8 and the second column contains animation, adventure, adventure.
Please, how can I do that? Thanks in advance.

Accepted Answer

per isakson on 31 Jul 2012
Edited: per isakson on 5 Aug 2012
A slight modification of the textscan command I provided for your question the other day will read the file. (You never explained how "::" should be interpreted.) What do you mean by "I read each filed alone of a one row, textscan do not work with it."? If you don't need a column, add "*" after "%", e.g. "%*d" to suppress the first column.
Thus
>> cac = txt2m
cac =
[3x1 int32] {3x1 cell} {3x1 cell}
>> cac{:}
ans =
1
2
8
ans =
'Toy Story (1995) '
'Jumanji (1995) '
'Tom and Huck (1995) '
ans =
'Animation|Children's|Comedy'
'Adventure|Children's|Fantasy'
'Adventure|Children's'
>>
where the function, txt2m, is given by
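function cac = txt2m()
% Read cssm.txt into a 1x3 cell: IDs (int32), titles and genres (cellstr).
% Lines starting with "..." are disabled options: MATLAB ignores the rest
% of such a line and continues the statement on the next line.
fid = fopen('cssm.txt');
assert( fid > 0, 'Cannot open cssm.txt' )
cac = textscan( fid, '%d%s%s' ...
, 'Delimiter' , ':' ...
, 'CollectOutput' , false ...
... , 'EmptyValue' , -999 ...
... , 'ExpChars' , '' ...
, 'MultipleDelimsAsOne' , true ...
, 'Whitespace' , '' );
fclose( fid );
end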
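A minimal sketch of the "%*" suppression mentioned above, assuming the three sample rows from the question are held in a char array instead of cssm.txt (textscan also accepts a string as input). With "%*s" the title column is read but not returned, so cac has only two cells. The ':' delimiter is kept from txt2m, so a title that itself contains a ':' would still shift the fields.

```matlab
% Sketch: skip the title column with "%*s".
% Assumption: the question's three sample rows, inline instead of cssm.txt.
txt = sprintf(['1::Toy Story (1995) ::Animation|Children''s|Comedy\n', ...
               '2::Jumanji (1995) ::Adventure|Children''s|Fantasy\n', ...
               '8::Tom and Huck (1995) ::Adventure|Children''s\n']);

cac = textscan( txt, '%d%*s%s' ...
    , 'Delimiter'           , ':' ...
    , 'MultipleDelimsAsOne' , true ...
    , 'Whitespace'          , '' );

ids    = cac{1};   % int32 column of movie IDs
genres = cac{2};   % cellstr of genre lists; the title column was skipped
```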
Then use regexp and str2num to extract the year:
>> regexp( cac{2}, '\d{4}', 'match' )
ans =
{1x1 cell}
{1x1 cell}
{1x1 cell}
>> ans{:}
ans =
'1995'
ans =
'1995'
ans =
'1995'
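The str2num step itself is not shown above. A sketch, assuming titles is the cellstr in cac{2} (hard-coded here from the output above); it swaps in str2double for str2num, since str2double accepts a cell array directly and does not call eval. The pattern anchors on the parenthesised year at the end of the title, so a four-digit number elsewhere in a title is not picked up.

```matlab
% Sketch: numeric year per title (titles copied from the output above)
titles = {'Toy Story (1995) '; 'Jumanji (1995) '; 'Tom and Huck (1995) '};

% Four digits preceded by "(" and followed by ")" plus optional trailing
% blanks; 'once' returns one char array per title ('' if there is no match)
yearStr = regexp(titles, '(?<=\()\d{4}(?=\)\s*$)', 'match', 'once');

years = str2double(yearStr);   % 3x1 double, NaN where no year was found
```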
--- In response to the answer below ---
This modified function, txt2m, reads and parses your file. It reads the file into a string with the function fileread (thanks Walter, I didn't know of that one) and replaces "::" by "¤" (knock on wood). I just picked a character on the keyboard.
Try
>> cac = txt2m()
cac =
[13x1 int32] {13x1 cell} {13x1 cell}
>>
where
cssm.txt contains your 13 rows
and where
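function cac = txt2m()
% Read the whole file as one string, replace the two-character separator
% "::" by the single character "¤", and split on that instead of on ":".
str = fileread( 'cssm.txt' );
str = strrep( str, '::', '¤' );
cac = textscan( str, '%d%s%s' ...
, 'Delimiter' , '¤' ...
, 'CollectOutput' , false ...
... , 'EmptyValue' , -999 ...
... , 'ExpChars' , '' ...
, 'MultipleDelimsAsOne' , true ...
, 'Whitespace' , '' );
end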
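The thread does not show the final step the question asks for. A minimal sketch, assuming cac is the 1x3 cell returned by txt2m (hard-coded here for the three sample rows so the snippet runs on its own); the output file name id_genre.txt is hypothetical. strtok returns the text before the first '|', i.e. the first genre.

```matlab
% Sketch: map each ID to its first genre and write a two-column txt file.
% Assumption: cac mirrors txt2m's output; use cac = txt2m(); for real data.
cac = {int32([1; 2; 8]), ...
       {'Toy Story (1995) '; 'Jumanji (1995) '; 'Tom and Huck (1995) '}, ...
       {'Animation|Children''s|Comedy'; 'Adventure|Children''s|Fantasy'; ...
        'Adventure|Children''s'}};

ids    = cac{1};
genres = cac{3};

% Text before the first '|' is the first genre
firstGenre = cellfun(@(g) strtok(g, '|'), genres, 'UniformOutput', false);

fid = fopen('id_genre.txt', 'w');      % hypothetical output file name
assert(fid > 0, 'Cannot open id_genre.txt');
for k = 1:numel(ids)
    fprintf(fid, '%d\t%s\n', ids(k), firstGenre{k});   % ID <tab> genre
end
fclose(fid);
```

Wrap firstGenre in lower() if the second column should be lowercase, as written in the question.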

13 Comments

huda nawaf on 31 Jul 2012
Thanks. When I write c1=cac{1}, I get all the data of the three columns. Can we get c1=cac{1}; c2=cac{2}; c3=cac{3}?
Many thanks
per isakson on 31 Jul 2012
Edited: per isakson on 31 Jul 2012
If so, your textscan command is not identical to mine, or, less likely, the MATLAB version differs. I use R2012a. See above. Try
cac2 = cac{1};
Use the Variable Editor to inspect the content of variables. Try double-click "cac" in the Workspace window.
huda nawaf on 31 Jul 2012
Edited: Walter Roberson on 31 Jul 2012
Now I get c1=cac{1}
1
2
8
but when I write c2=cac{2},
I get the other two fields together, not separated.
Why?
Hard for me to guess. I don't know exactly what you are doing. Do you use this "line"?
'CollectOutput' , false ...
This is what I do:
f1=fopen('d:\matlab\r2011a\bin\movielens\1m_mov\movies.txt');
cac = textscan( f1, '%d %s %s' ...
, 'Delimiter' , ':' ...
, 'CollectOutput' , true ...
... , 'EmptyValue' , -999 ...
, 'ExpChars' , '' ...
, 'MultipleDelimsAsOne' , true ...
, 'Whitespace' , '' );
fclose( f1 )
c1=cac{1};c2=cac{2};c3=cac{3}
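Why c2 held both text fields: with 'CollectOutput' set to true, textscan concatenates consecutive output columns of the same class into one array, so the two %s columns come back as a single N-by-2 cell array and cac{3} does not exist. A small sketch comparing both settings on the question's three sample rows (held in a char array for illustration):

```matlab
% Sketch: effect of 'CollectOutput' on the '%d%s%s' format.
% Assumption: the question's three sample rows instead of movies.txt.
txt = sprintf(['1::Toy Story (1995) ::Animation|Children''s|Comedy\n', ...
               '2::Jumanji (1995) ::Adventure|Children''s|Fantasy\n', ...
               '8::Tom and Huck (1995) ::Adventure|Children''s\n']);
opts = {'Delimiter', ':', 'MultipleDelimsAsOne', true, 'Whitespace', ''};

collected = textscan(txt, '%d%s%s', 'CollectOutput', true,  opts{:});
separate  = textscan(txt, '%d%s%s', 'CollectOutput', false, opts{:});

% collected is 1x2: {int32 IDs, 3x2 cell of titles and genres}
% separate  is 1x3: {int32 IDs, 3x1 cell of titles, 3x1 cell of genres}
```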
huda nawaf on 31 Jul 2012
Now I get the three fields when I put false instead of true.
But I get just 12 values for each field, while I have 3000 values in each column. Why?
per isakson on 1 Aug 2012
Edited: per isakson on 1 Aug 2012
I cannot guess.
  1. Any error message?
  2. What do the data lines 12 to 14 look like?
huda nawaf on 2 Aug 2012
Thanks, there is no error message. These are the lines:
12::Dracula: Dead and Loving It (1995)::Comedy|Horror
13::Balto (1995)::Animation|Children's
14::Nixon (1995)::Drama
I will check my code with another txt file, then tell you.
huda nawaf on 2 Aug 2012
I tried another txt file, with 7 rows, and the same thing happened:
when I run it, it gives me the values of only 2 rows.
per isakson on 2 Aug 2012
Edited: per isakson on 2 Aug 2012
The row
12::Dracula: Dead and Loving It (1995)::Comedy|Horror
contains four values not three as assumed when designing the format string. The ":" after "Dracula" is interpreted as a delimiter.
There is no simple way AFAIK to prescribe "::" as a delimiter.
One way is to read the complete file as one string or one string per line and replace "::" by another character and use that as delimiter. Which character would be safe to use as delimiter?
You should have inspected the result, cac, of the read operation.
huda nawaf on 3 Aug 2012
Thanks, I corrected the error and did what you suggested, but the problem is not solved.
per isakson on 3 Aug 2012
Edited: per isakson on 3 Aug 2012
  1. What did you do? What does your new code look like?
  2. How does it behave? What output? What error message?
Why do you expect me to guess?
per isakson on 4 Aug 2012
Edited: per isakson on 4 Aug 2012
Why don't you care to respond?


More Answers (1)

huda nawaf on 4 Aug 2012
Edited: Walter Roberson on 4 Aug 2012
I just need to read a txt file with this format:
1::Toy Story (1995) ::Animation|Children's|Comedy
2::Jumanji (1995) ::Adventure|Children's|Fantasy
8::Tom and Huck (1995) ::Adventure|Children's
There is no error message, but I have 3000 rows, and when I read the file with the code you sent earlier I get only the first 12 rows.
I want to map the first field to the first word of the third field, e.g.
1 Animation
2 Adventure
8 Adventure
This is what I need. The first 13 rows of my file:
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller
11::American President, The (1995)::Comedy|Drama|Romance
12::Dracula: Dead and Loving It (1995)::Comedy|Horror
13::Balto (1995)::Animation|Children's
Thanks

3 Comments

Walter Roberson on 4 Aug 2012
textscan() will not work for this, at least not as-is. You can read the file (such as by using fileread() ) and then use regexp() to parse it.
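One way to follow this suggestion, as a sketch rather than Walter's own code: read the file into one string, then let regexp split each line at the first two "::" so that a single ':' inside a title, as in "Dracula: Dead and Loving It", stays part of the title. The sample text below is three rows from the question; for the real file, replace it with str = fileread('movies.txt');.

```matlab
% Sketch: parse "ID::Title::Genres" lines with regexp instead of textscan.
% Assumption: three rows from the question stand in for fileread('movies.txt').
str = sprintf(['11::American President, The (1995)::Comedy|Drama|Romance\n', ...
               '12::Dracula: Dead and Loving It (1995)::Comedy|Horror\n', ...
               '13::Balto (1995)::Animation|Children''s\n']);

% Per line: ID, title (lazy, up to the first "::"), genres (rest of line).
% [^\r\n] keeps each match on one line and drops Windows "\r" endings.
tok = regexp(str, '^(\d+)::([^\r\n]*?)::([^\r\n]*)', 'tokens', 'lineanchors');
tok = vertcat(tok{:});            % N-by-3 cell array of char

ids        = str2double(tok(:, 1));
titles     = tok(:, 2);
genres     = tok(:, 3);
firstGenre = cellfun(@(g) strtok(g, '|'), genres, 'UniformOutput', false);
```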
per isakson on 4 Aug 2012
Edited: per isakson on 4 Aug 2012
See my answer above. I hope the lines you don't show don't contain "¤".
huda nawaf on 5 Aug 2012
Thanks to both Walter and per. In the end, I got what I needed thanks to your efforts.
