Reading Parquet Key/Value Metadata

Matthew Clause on 22 Sep 2021
Is there a recommended method for reading the key/value metadata pairs in the footer of a Parquet file?
I'm using MATLAB to extract JSON metadata added by Python's pyarrow. The footer layout is documented here: https://github.com/apache/parquet-format#file-format. I have the following code, which works, but I would prefer a more robust solution in the future.
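function metadata = parquetmeta(filename)
% Read the Parquet footer and decode the first {...} span in it as JSON.
fid = fopen(filename, 'r');
if fid < 0
    error('parquetmeta:open', 'Cannot open "%s".', filename);
end
% A Parquet file ends with: footer, 4-byte little-endian footer length, 'PAR1'.
fseek(fid, -8, 'eof');
footer_bytes = fread(fid, 1, 'uint32', 'ieee-le');
% The footer starts footer_bytes + 8 bytes before the end of the file.
fseek(fid, -(footer_bytes + 8), 'eof');
footer = fread(fid, [1, footer_bytes], 'uint8=>char');   % one char per byte
fclose(fid);
start_idx = find(footer == '{', 1, 'first');
end_idx = find(footer == '}', 1, 'last');
metadata = struct;
if ~isempty(start_idx) && ~isempty(end_idx)
    try %#ok<TRYNC>
        % Index with a range (start:end), then decode those bytes as UTF-8.
        json = native2unicode(uint8(footer(start_idx:end_idx)), 'UTF-8');
        metadata = jsondecode(json);
    end
end
end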
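A more robust option than searching the footer for braces is to decode it. Per the parquet-format specification linked above, the footer is a FileMetaData struct serialized with Thrift's compact protocol, and its field 5, key_value_metadata, is a list of KeyValue structs (field 1 key, field 2 value). The sketch below rests on a few assumptions: the footer is unencrypted (trailing magic PAR1), the name parquetkv and its helper functions are hypothetical, and it returns a containers.Map rather than a struct because keys such as ARROW:schema, which pyarrow typically writes, are not valid MATLAB field names. All other fields are skipped generically by type, so the code does not depend on the rest of the FileMetaData layout.

```matlab
function kv = parquetkv(filename)
%PARQUETKV Sketch (hypothetical helper): Parquet footer key/value metadata.
%   Decodes the Thrift compact FileMetaData; assumes an unencrypted 'PAR1' footer.
fid = fopen(filename, 'r');
if fid < 0, error('parquetkv:open', 'Cannot open "%s".', filename); end
closer = onCleanup(@() fclose(fid));        % close the file on any exit
fseek(fid, -8, 'eof');                      % tail: uint32 footer length + 'PAR1'
len = fread(fid, 1, 'uint32', 0, 'ieee-le');
if ~strcmp(fread(fid, [1 4], 'uint8=>char'), 'PAR1'), error('parquetkv:magic', 'No PAR1 magic (encrypted footer or not Parquet).'); end
fseek(fid, -(len + 8), 'eof');              % footer sits just before the tail
b = fread(fid, [1 len], 'uint8=>double');   % footer bytes as values 0..255
kv = containers.Map('KeyType', 'char', 'ValueType', 'char');
[id, t, pos] = fieldHeader(b, 1, 0);
while t ~= 0                                % type 0 (STOP) ends FileMetaData
    if id == 5 && t == 9                    % key_value_metadata: list<KeyValue>
        [n, ~, pos] = listHeader(b, pos);
        for k = 1:n, [key, val, pos] = readKeyValue(b, pos); kv(key) = val; end
    else
        pos = skipValue(b, pos, t, 0);      % any other field
    end
    [id, t, pos] = fieldHeader(b, pos, id);
end
end

function [v, pos] = readVarint(b, pos)
% Unsigned varint: 7 data bits per byte; a set high bit means more bytes follow.
v = 0; scale = 1; byte = 128;
while byte >= 128
    byte = b(pos); pos = pos + 1;
    v = v + mod(byte, 128) * scale; scale = scale * 128;
end
end

function [id, t, pos] = fieldHeader(b, pos, lastId)
% High nibble = field-id delta, low nibble = compact type (0 = STOP).
h = b(pos); pos = pos + 1;
t = mod(h, 16); id = lastId + floor(h / 16);
if t ~= 0 && h < 16                         % delta 0: zigzag varint id follows
    [u, pos] = readVarint(b, pos);
    if mod(u, 2) == 0, id = u / 2; else, id = -(u + 1) / 2; end
end
end

function [n, et, pos] = listHeader(b, pos)
% High nibble = size (15 means a varint size follows), low nibble = element type.
h = b(pos); pos = pos + 1;
n = floor(h / 16); et = mod(h, 16);
if n == 15, [n, pos] = readVarint(b, pos); end
end

function [key, val, pos] = readKeyValue(b, pos)
% KeyValue struct: field 1 = key, field 2 = optional value (both strings).
key = ''; val = '';
[id, t, pos] = fieldHeader(b, pos, 0);
while t ~= 0
    if t == 8 && (id == 1 || id == 2)
        [len, pos] = readVarint(b, pos);
        s = native2unicode(uint8(b(pos:pos + len - 1)), 'UTF-8');
        pos = pos + len;
        if id == 1, key = s; else, val = s; end
    else
        pos = skipValue(b, pos, t, 0);
    end
    [id, t, pos] = fieldHeader(b, pos, id);
end
end

function pos = skipValue(b, pos, t, elem)
% Advance pos past one value of compact type t (elem = 1 inside a list/map).
switch t
    case {1, 2}, pos = pos + elem;          % bool: 1 byte in containers, 0 as a field
    case 3, pos = pos + 1;                  % i8
    case {4, 5, 6}, [~, pos] = readVarint(b, pos);   % i16/i32/i64 as zigzag varints
    case 7, pos = pos + 8;                  % double
    case 8                                  % binary/string: varint length, then bytes
        [len, pos] = readVarint(b, pos); pos = pos + len;
    case {9, 10}                            % list/set
        [n, et, pos] = listHeader(b, pos);
        for k = 1:n, pos = skipValue(b, pos, et, 1); end
    case 11                                 % map: varint size, then key/value type byte
        [n, pos] = readVarint(b, pos);
        if n > 0, kt = b(pos); pos = pos + 1; end
        for k = 1:n
            pos = skipValue(b, pos, floor(kt / 16), 1);
            pos = skipValue(b, pos, mod(kt, 16), 1);
        end
    case 12                                 % nested struct: skip fields until STOP
        [~, ft, pos] = fieldHeader(b, pos, 0);
        while ft ~= 0
            pos = skipValue(b, pos, ft, 0);
            [~, ft, pos] = fieldHeader(b, pos, 0);
        end
    otherwise
        error('parquetkv:type', 'Unsupported compact type %d.', t);
end
end
```

For example, md = parquetkv('data.parquet') followed by keys(md) lists the stored keys (data.parquet is a placeholder). For a file written from a pandas DataFrame, jsondecode(md('pandas')) should decode the JSON that pyarrow typically stores under the key pandas.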
I'm hoping MathWorks will add support for reading Parquet file metadata in a future release. Interestingly, it appears this used to be included as a feature before MATLAB 2019, via the call below, but I cannot verify this on my version of MATLAB.
jobj = com.mathworks.bigdata.parquet.Reader;
md = jobj.getParquetFileReader.getFileMetaData.getKeyValueMetaData;
Thanks!

Version: R2020b
