MATLAB Answers

import csv file and use the data in Matlab

1 view (last 30 days)
ali bahramali
ali bahramali on 1 Dec 2019
Edited: dpb on 1 Dec 2019
I have a problem with one huge CSV file which I want to just have the import information as some variable for make more calculation. here I do not know how to get ride of quotations mark and other symbol after importing the csv file to matlab
T = readtable('C:\Users\mabah\Downloads\ip83_to_ip37_clean.csv')
I import csv file like this and the table is like :
'1578' '81.866260614' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4857'
'1579' '81.875650007' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4865'
'1580' '81.885960309' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4866'
'1581' '81.896168537' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4876'
'1582' '81.906376207' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4885'
..............
..................

  5 Comments

Show 2 older comments
ali bahramali
ali bahramali on 1 Dec 2019
thank you for your help. I stock with syntax . I do not need '....' I want just the data to make more calculation on it.
here is the first 20 rows :
'1578' '81.866260614' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4857'
'1579' '81.875650007' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4865'
'1580' '81.885960309' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4866'
'1581' '81.896168537' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4876'
'1582' '81.906376207' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4885'
'1583' '81.916706046' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4893'
'1584' '81.927024970' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4897'
'1585' '81.937320618' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4903'
'1586' '81.947794823' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4912'
'1587' '81.957728564' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4920'
'1588' '81.967817205' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4928'
'1589' '81.978317914' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4932'
'1590' '81.988481605' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4938'
'1591' '81.998799386' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4939'
'1592' '82.012383323' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4942'
'1593' '82.020065820' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4946'
'1594' '82.029564626' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4956'
'1595' '82.039907112' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4959'
'1596' '82.050066959' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0xc000' 'CS0' '0' '4964'
'1597' '82.060212786' '192.168.83.83' '192.168.83.37' 'UDP' '64' '60' '41761' '138' '' '4' '0x4000' 'CS0' '0' '4965'
the cyclist
the cyclist on 1 Dec 2019
The readtable function is typically smart enough to import and column of numbers as a numeric type. My best guess, without having seen the input file, is that you have some entries (even in the numeric columns) that are text. Therefore, MATLAB makes the whole column text.
We could take a look, if you upload the file, or a representative sample. (If you upload a sample, see if you get the same issue on it.)
ali bahramali
ali bahramali on 1 Dec 2019
I uplaod the file.
what I would really like to do is to use the diffrent column for the further calculation, especially on the IPFlags (column 12) and also time . regarthing the IPFlag, if the 0xc000 is IP Flag count it as 0 and if the IP Flag is 0x4000 count it as one and after that convert all the result to ASCII code.

Sign in to comment.

Accepted Answer

dpb
dpb on 1 Dec 2019
Edited: dpb on 1 Dec 2019
The problem is the file has explicit "" for string variables built into it...that overrides what readtable would otherwise do for numeric columns--it presumes since you wrote the file as character data explicitly, that's what you meant.
Either fix the file format to write numeric data as numeric or force the interpretation of the numeric columns as numeric via the import options object as suggested earlier...
opt=detectImportOptions('ip83_to_ip37_clean.csv'); % get default import object
opt.VariableTypes([1:2 6:11 14:15])={'double'}; % tell it which variables are numeric
opt.VariableTypes([3:5 12:13])={'categorical'}; % treat protocol data as categorical
t=readtable('ip83_to_ip37_clean.csv',opt);
results in
>> t(1:10,:)
ans =
10×15 table
No_ Time Source Destination IPProtocol IPTTL Length SrcPort DestPort TCPFlags IPVersion IPFlags IPDSCPValue IPFragOffset IPID
____ ______ _____________ _____________ __________ _____ ______ _______ ________ ________ _________ _______ ___________ ____________ ____
1578 81.866 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4857
1579 81.876 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4865
1580 81.886 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4866
1581 81.896 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4876
1582 81.906 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4885
1583 81.917 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4893
1584 81.927 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4897
1585 81.937 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4903
1586 81.948 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0x4000 CS0 0 4912
1587 81.958 192.168.83.83 192.168.83.37 UDP 64 60 41761 138 NaN 4 0xc000 CS0 0 4920
>>
As for the IPFlags,
>> categories(t.IPFlags)
ans =
2×1 cell array
{'0x4000'}
{'0xc000'}
>>
shows only the two values exist so I'd probably create a logical variable as you indicated...
>> t.IPFlag=(t.IPFlags=='0x4000');

  0 Comments

Sign in to comment.

More Answers (0)

Sign in to answer this question.

Tags