How to internally change decimal separator while importing data from a csv?

Hello,
I am currently working with very large CSV files (roughly 15000 rows and a couple of hundred columns). Since the data is produced by European software, the decimal separator is a comma, which I currently replace with a dot using a small script. The script works well, but the corrected data has to be stored in a new file, because the original file must not be modified. What I would like instead is to import the data, replace the commas with dots in memory, and store the result in a workspace variable. That would save me from generating a new file and loading it in manually every time I process the data further, which is quite time-consuming.
Current Process:
%% Small script that replaces comma by dot and saves corrected data to new csv-file with name [...]_dots
[filenames,path]=uigetfile('*.csv');
cd(path);
disp(filenames)
oldfile=filenames(1:end-4);
NewFileName = sprintf('%s_Dots.csv',oldfile);
Data = fileread(filenames);
Data = strrep(Data, ',', '.');
% Save corrected data to new file
FID = fopen(NewFileName, 'w');
fwrite(FID, Data, 'char');
fclose(FID);
With a dot as the decimal separator, the import is very easy and generates a handy structure array that is used in the further evaluation of the data, as shown below.
i_start_import=1;
datastruct=importdata(filenames,';',i_start_import);
However, importdata does not generate such a structure array when a comma is used as the decimal separator.
Experiments:
What I have tried so far, to obtain the same struct but with dots as the decimal separator, is to read in the original data with importdata, convert the commas to dots, and then split the data strings to obtain tabular data. Unfortunately, this is very slow for the thousands of cells in my CSV (especially the splitting step).
[Screenshots of the data after applying importdata, and after splitting, omitted.]
data=importdata(filenames,';',100000);
% Create structure array
datastruct=struct
datastruct.data=[];
% Replace comma by dot
for i=1:length(data)
a=strrep(data{i},',','.');
data{i}=a;
end
% Split data string
for i=1:length(data) %index rows
for j=1:516 %since there are 516 columns, index columns
str=strsplit(data{i},';');
datastruct.textdata{i,j}=str(1,j);
if j==516
break
else
continue
end
end
end
Ideally, the data should be stored in a structure array with a field textdata holding the values of the first column (TimeStamp) and the first row (variable names), and a field data holding all the numeric data.
I am now looking for a way to read in the original data with comma as decimal separator and convert the comma to dot internally and save it as a variable (struct) that I can use in my further examination in the script.
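One way to do this conversion entirely in memory, without writing a corrected file, is to read the raw text with fileread, replace the commas, and parse the corrected string with textscan. This is only a sketch: it assumes commas occur exclusively as decimal separators, that the first column is a text timestamp followed by 515 numeric columns, and that there is one header line.

```matlab
str = fileread(filenames);        % read the raw file contents as one char vector
str = strrep(str, ',', '.');      % decimal comma -> dot, in memory only
% Assumed layout: one text column (TimeStamp) followed by 515 numeric columns
fmt = ['%s', repmat('%f', 1, 515)];
C = textscan(str, fmt, 'Delimiter',';', 'HeaderLines',1, 'CollectOutput',true);
datastruct.textdata = C{1};       % timestamps as a cell array of char vectors
datastruct.data     = C{2};       % numeric data as a double matrix
```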

2 comments

Some pitfalls:
  • Do not use the name of the important function "path" as a variable name. This can cause extremely strange effects during debugging.
  • Avoid using cd() to change the current directory. The callbacks of GUIs or timers can change it too, and then the assumed files are no longer found. This is a frequent source of bugs. Use absolute path names instead, built with fullfile(folder, filename).
  • In:
for j=1:516 %since there are 516 columns, index columns
str=strsplit(data{i},';');
datastruct.textdata{i,j}=str(1,j);
if j==516 % from here
break %
else %
continue %
end % to here
end
the marked block is useless. Simply omit it.
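With that block omitted (and strsplit hoisted out of the inner loop, since the original re-splits the same row on every column iteration), the splitting loop could be reduced to something like this sketch:

```matlab
for i = 1:numel(data)
    str = strsplit(data{i}, ';');     % split each row once, not 516 times
    datastruct.textdata(i,:) = str;   % store all 516 fields in one assignment
end
```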
Hello Jan,
Thank you for your hints, I will change that! Do you have any other ideas for changing the decimal separator internally without having to create a new file?


Accepted Answer

Stephen23
Stephen23 on 6 May 2021
Edited: Stephen23 on 6 May 2021
Do NOT use CD to access data files: it is more efficient to use absolute/relative filenames (with FULLFILE).
To import a CSV file (actually your file appears to have semicolon-delimited fields) with a decimal comma, simply select the appropriate options with READTABLE or READMATRIX:
[F,P] = uigetfile('*.csv');
T = readtable(fullfile(P,F), 'Delimiter',';', 'DecimalSeparator',',', 'VariableNamingRule','preserve')
Adapt to suit your file. If you had uploaded an actual data file (instead of screenshots) by clicking the paperclip button then I would have tested this as well.

17 comments

Thank you very much, Stephen. I have tried readtable, but unfortunately I was not able to specify the range in release R2019b. Being able to specify the range is important for me, since I don't want all the data from the table. I read the documentation and tried the 'Range' parameter, but it did not work in R2019b. Is there a way to specify the range for readtable in R2019b?
The Range option has been supported by readtable ever since the function was introduced in R2013b.
If you show the code you used, I can probably help you to debug it. Debugging invisible code is much harder.
Thanks.
I've tried
[F,P] = uigetfile('*.csv');
T = readtable(fullfile(P,F), 'Delimiter',';','Range','10:20', 'DecimalSeparator',',', 'VariableNamingRule','preserve')
but it returns
Error using readtable (line 223)
Invalid parameter name: Range.
Same for DecimalSeparator. The file I imported is an Excel spreadsheet.
Stephen23
Stephen23 on 11 May 2021
Edited: Stephen23 on 11 May 2021
"The file I imported is an excel spreadsheet."
This is confusing: do you mean that the file is actually in a proprietary Excel format (e.g. .XLSX, .XLSB), or is the file a standard text file (with semicolon-delimited fields), as your code and question both indicate?
Apparently the Range option applies only to Excel file formats. I do not see any obvious way to restrict the importing range for text files. Unless you have extremely large files (many GB) you could just import all data and select what you need from within MATLAB.
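That import-then-select approach can be sketched as follows (assuming the file imports cleanly; the row range 10:20 is purely illustrative):

```matlab
% Import everything, then pick out the rows of interest within MATLAB:
T = readtable(fullfile(P,F), 'Delimiter',';', 'DecimalSeparator',',');
subset = T(10:20, :);   % selection happens after import, not during it
```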
Sorry, my bad, it's not a spreadsheet but a standard delimited text file, as you pointed out correctly. I tried the Range option because the readtable function gave me the following error:
[F,P] = uigetfile('*.csv');
T = readtable(fullfile(P,F));
Error using readtable (line 223)
Reading failed at line 26733. All lines of a text file must have the same number of delimiters. Line 26733 has 0 delimiters, while preceding lines have 515.
This is because the lines after line 26732 have fewer columns than the preceding rows (all my other files have this format).
I didn't find a different solution and then tried to specify the range. Do you have a solution that does not involve deleting the last rows? It would also be great if that solution could handle the decimal separator as well. Thanks for your time!
@Dennis B: please upload a sample file by clicking on the paperclip button. You can use fake data if required.
Hello Stephen,
I have been working on this problem over the last few days and figured out that it only occurs when I try to read the data from "fresh" files that have been created by the machine and have not been saved manually. As soon as I open one of these files and save it as a CSV, the problem disappears, and importing the data as a table via readtable works correctly without the aforementioned error. Nonetheless, I can't use the 'DecimalSeparator' parameter and receive the following error:
%Used Code
[F,P] = uigetfile('*.csv');
T = readtable(fullfile(P,F), 'Delimiter',';', 'DecimalSeparator',',', 'VariableNamingRule','preserve')
%Error:
Error using readtable (line 223)
Invalid parameter name: DecimalSeparator.
Do you have any solution for that?
I have also attached a file for you.
Thanks
Stephen23
Stephen23 on 14 May 2021
Edited: Stephen23 on 15 May 2021
"As soon as I open one of these files and save it as a csv, the problem disappears..."
What tool/app are you using to open and save the file?
Can you please upload two otherwise identical files: one "fresh" and one hand-saved one? I can compare them.
"This is because the next lines after line 26732 have less columns than the rows before..."
The file has several tables of data, each with their own header and preceding lede.
Which of those blocks/tables of data do you want to import?
Is the number of columns in the block that you want constant?
str = fileread('Example_Data.csv');
str = strrep(str,',','.');
[hdr,idx] = regexp(str,'^TimeStamp;[^\n]+','lineanchors','match','end','once');
hdr = regexp(hdr,'[^;]+','match')
hdr = 1×43 cell array
{'TimeStamp'} {'70788.1.C670010_V1:1'} {'70788.1.C670011_V1:1'} {'70788.1.C670012_V1:1'} … {'70788.1.E604033_X1:10'} {'70788.1.E604034_X1:10'}
opt = {'Delimiter',';', 'CollectOutput',true};
fmt = repmat('%f',1,numel(hdr)-1);
fmt = ['%{dd.MM.yyyy HH:mm:ss.SSSSSSSSS}D',fmt]; %01.01.1900 16:37:21.0000000
out = textscan(str(idx:end),fmt,opt{:})
out = 1×2 cell array
{33×1 datetime} {32×42 double}
out{1}
ans = 33×1 datetime array
   01.01.1900 16:37:21.000000000
   01.01.1900 16:37:21.000000100
   01.01.1900 16:37:21.000000200
   …
   01.01.1900 16:37:21.000002800
   NaT
out{2}
ans = 32×42
   0.9545   0.1770   0.9406   0.9716  30.0000   1.9600        0   0.0300   0.0060   0.0010   …
   (remaining rows of the 32×42 numeric output omitted)
I generally open the 'fresh' files in Excel and don't use any other tool. I sadly can't upload a fresh file due to non-disclosure issues, and I can't use fake data because the problem does not occur after the file has been saved. In your example you chose exactly the block I want to examine, starting from 'TimeStamp'. The number of columns in the block is constant.
Thanks for providing your solution; I'll work my way through your code and try it out. Thanks for your time.
@Dennis B: unfortunately (or fortunately, depending on how you view things) Excel does a lot of magic when it imports text data, including ignoring quotation marks and even changing data: https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates
Once the file has been opened and saved via Excel, the original file format is lost: Excel simply overwrites the file with what it considers to be CSV format, which can be quite different from what the file originally contained.
  • if you want to compare the two files then use a reliable file comparison tool (e.g. WinMerge).
  • if you want to know what the file really looks like then use a reliable text editor (e.g. notepad++).
@Stephen Cobeldick: Okay, that was very helpful, thanks!
@Dennis B, excuse me, did you find a solution for your problem? If yes, it would be very nice if you shared the exact function. Thanks
@Ahmed Ghouma, I am currently still using the small script I mentioned in the introduction (using 'strrep' to replace comma with dot), because otherwise I would have had to change the whole workflow I had already developed, sorry.
Greetings
@Dennis you can also take a look at the answers here, because I think we had the same problem: https://www.mathworks.com/matlabcentral/answers/864680-problem-by-using-str2double-with-csv-file?s_tid=srchtitle
Thanks Stephen for your useful tip. I have one additional question: I also need to change the readtable options using data = readtable(filename, opts) AND to read decimal-comma numbers using data = readtable(filename, 'DecimalSeparator',',').
But I didn't succeed in mixing both strategies, like data = readtable(filename, opts, 'DecimalSeparator',',').
Is there a way to do so?
@Louis-Marie: presumably the OPTS variable in your code is the object returned by DETECTIMPORTOPTIONS, in which case you can specify the 'DecimalSeparator' option there. It does not really make sense to specify the 'DecimalSeparator' option in READTABLE after calling DETECTIMPORTOPTIONS, because DETECTIMPORTOPTIONS will also need to know the decimal separator to detect e.g. which data are numeric.
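A minimal sketch of that suggestion, assuming OPTS is the delimited-text import-options object for the file:

```matlab
% Tell detectImportOptions about the decimal comma up front, so that
% comma-decimal fields are already detected as numeric:
opts = detectImportOptions(filename, 'Delimiter',';', 'DecimalSeparator',',');
data = readtable(filename, opts);
```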


More Answers (0)

Products

Version

R2019b
