detecting Excel files without Excel extension
4 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
I need to be able to identify an Excel file without using xlsfinfo. It is a VERY slow function. Without it, my algorithm runs faster. Any ideas?
Background: Our test setup outputs text files that are named with the ".xls" extension. Back in Excel 2003 and before, if you named a tab delimited text file with "xls", Excel would open the file no problem and properly interpret it. Now, it is causing me headaches with importing data into MATLAB. MATLAB uses the ".xls" extension to check for wrong input to "textscan" and "importdata." Here is where it gets interesting. When you open a text file with the ".xls" extension and then save it, it is now an excel file and no longer tab delimited text file. Basically, I need to be able to tell the difference between text and Excel files without xlsfinfo because xlsfinfo is very slow.
0 commentaires
Réponse acceptée
Iain
le 23 Mai 2013
You can try something like
fid = fopen(filename,'r');
fifty_bytes = fread(fid,50,'uint8'); fclose(fid);
Valid_characters = ['A':'Z' 'a':'z' ',./\ 0123456789' 9 10 13 '!"£$%^&*()[]{}-=_+ ... and any other valid characters you expect in your text files. ]
excel_file = false;
for i = 1:50
if any(fifty_bytes ~= Valid_characters)
excel_file = true;
end
end
Plus de réponses (2)
José-Luis
le 23 Mai 2013
Modifié(e) : José-Luis
le 23 Mai 2013
You could always check the file signature. You would need to read the file as a binary file and look into specific locations, using hexadecimal values.
Here is a snippet that will give you a string of the hexadecimal values:
fid = fopen('/path/to/test.xls','r');
uid = fread(fid,4,'ushort','b');
fseek(fid,520,'bof');
ftype = fread(fid,4,'ushort','b');
fclose(fid);
s1 = sprintf('%X',uid)
s2 = sprintf('%X',ftype)
If the file is an office document, s1 should be:
D0 CF 11 E0 A1 B1 1A E1 without the spaces
It it is an excel spreasheet, then s2 should be:
FD FF FF FF nn xx where the last two values (nn and xx) can change within a certain range.
It would be a matter of using strcmp(). You could of also compare hexadecimal values directly with fi(). For your purpose, it sounds like it would be enough to check if it is an Office document.
Text files carry no file identifier.
4 commentaires
José-Luis
le 24 Mai 2013
What are you getting? Maybe it is a little endian system? You could try with:
uid = fread(fid,4,'ushort','l');
Same thing for ftype.
Voir également
Catégories
En savoir plus sur Spreadsheets dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!