Reading Parquet Key/Value Metadata
Is there a recommended method for reading the key/value metadata pairs in the footer of a Parquet file?
I'm using MATLAB to extract JSON metadata that was added with Python's pyarrow. The footer layout is documented at https://github.com/apache/parquet-format#file-format. The following code works, but I would prefer a more robust solution in the future.
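function metadata = parquetmeta(filename)
% Return the JSON object embedded in a Parquet footer as a struct.
% File tail layout: <footer> <uint32 footer length, little-endian> 'PAR1'
fid = fopen(filename);
% The last 8 bytes are the 4-byte footer length followed by the magic 'PAR1'.
fseek(fid, -8, 'eof');
footer_bytes = fread(fid, 1, 'uint32', 'ieee-le');
% The footer ends 8 bytes before EOF, so it starts footer_bytes + 8 from EOF.
fseek(fid, -(footer_bytes + 8), 'eof');
% Read raw bytes and map each one to a char, independent of file encoding.
footer = fread(fid, [1, footer_bytes], 'uint8=>char');
fclose(fid);
% Heuristic: treat the outermost {...} span in the footer as the JSON payload.
start_idx = find(footer == '{', 1, 'first');
end_idx = find(footer == '}', 1, 'last');
metadata = struct;
if ~isempty(start_idx) && ~isempty(end_idx)
    try %#ok<TRYNC>
        metadata = jsondecode(footer(start_idx:end_idx));
    end
end
end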
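For cross-checking the layout assumptions outside MATLAB, here is a minimal Python sketch of the same trailer logic, using only the standard library. It assumes an unencrypted file that ends with the 4-byte little-endian footer length followed by the magic bytes PAR1, as described in the parquet-format README. The names read_footer_bytes and extract_json are hypothetical helpers made up for this illustration, and the brace search is the same fragile heuristic as in the MATLAB function, not a real Thrift parser.

```python
import json
import struct

PARQUET_MAGIC = b"PAR1"  # magic bytes at the start and end of an unencrypted file


def read_footer_bytes(path):
    """Return the raw (Thrift-encoded) footer bytes of a Parquet file."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # 2 == os.SEEK_END
        file_size = f.tell()
        # Smallest possible file: leading magic + 4-byte length + trailing magic.
        if file_size < 12:
            raise ValueError("file is too small to be a Parquet file")
        f.seek(-8, 2)
        trailer = f.read(8)
        footer_len = struct.unpack("<I", trailer[:4])[0]
        if trailer[4:] != PARQUET_MAGIC:
            raise ValueError("trailing magic bytes are not PAR1")
        if footer_len + 12 > file_size:
            raise ValueError("footer length exceeds file size")
        # The footer sits immediately before the 8-byte trailer.
        f.seek(-(footer_len + 8), 2)
        return f.read(footer_len)


def extract_json(footer):
    """Heuristic: decode the outermost {...} span in the footer as JSON.

    Returns None when no such span exists or it is not valid JSON.
    """
    start = footer.find(b"{")
    end = footer.rfind(b"}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(footer[start:end + 1].decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return None
```

Calling extract_json(read_footer_bytes(path)) mirrors parquetmeta(filename). Like the MATLAB version, it can return the wrong span if the footer holds more than one JSON value or if other footer bytes happen to contain braces.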
I'm hoping MathWorks will add support for reading Parquet file metadata in a future release. Interestingly, it appears this used to be available before MATLAB 2019, but I cannot verify that on my version of MATLAB.
jobj = com.mathworks.bigdata.parquet.Reader;
md = jobj.getParquetFileReader.getFileMetaData.getKeyValueMetaData;
Thanks!