How can I append to a Parquet file?

8eodosis on 13 May 2020
Hello,
I have a Parquet file that I wish to append to. I looked at the documentation of parquetwrite, but it doesn't provide any info on appending. It looks like this was possible in the old interface by setting the 'AppendData' option to true.

Accepted Answer

Kevin Gurney on 10 Sep 2020
The version of parquetwrite introduced in R2019a does not currently support appending to preexisting Parquet files on disk.
The "AppendData" name-value pair that you referenced in the Parquet Support Package does not append to a preexisting file, but rather incrementally writes chunks of data to an open Parquet file output stream.
The Support Package uses a "stateful Writer object" in conjunction with multiple write() calls to achieve this. The Parquet file output stream is closed when a call to finish() is made.
There is currently no equivalent ParquetWriter object shipping in MATLAB.
----------
An alternative to appending chunks of data to a preexisting Parquet file would be to write out new Parquet files and then "emulate" the behavior of having one contiguous Parquet file using parquetDatastore.
If you write multiple Parquet files to disk in sequence (one for each chunk) with consecutive, zero-padded numeric suffixes (e.g. data_01.parquet, data_02.parquet, ..., data_0N.parquet), you can use parquetDatastore to treat these files as though they were one contiguous Parquet file. With this approach, you can call readall on the datastore to read the entire sequence of Parquet file "chunks" in one function call.
An example:
% Assuming the current directory contains data_01.parquet, data_02.parquet, ..., data_0N.parquet.
>> data = readall(parquetDatastore("data*.parquet"));
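For completeness, here is a minimal sketch of the writing side of this workflow, which the example above assumes has already happened. The table T, the chunk size, and the temporary output folder are all made up for illustration, and the sketch assumes the datastore lists the files in name order, as described above.

```matlab
% Illustrative only: T, chunkSize and outDir are assumptions, not part of the original answer.
outDir = tempname;          % fresh temporary folder, so no stale chunk files are picked up
mkdir(outDir);

% Example data to be written out in chunks.
T = table((1:10)', (101:110)', 'VariableNames', {'id', 'value'});
chunkSize = 4;
numChunks = ceil(height(T) / chunkSize);

for k = 1:numChunks
    rows = ((k-1)*chunkSize + 1) : min(k*chunkSize, height(T));
    % Zero-padded suffix so that name order matches chunk order.
    fileName = fullfile(outDir, sprintf("data_%02d.parquet", k));
    parquetwrite(fileName, T(rows, :));
end

% Read every chunk back as a single table.
pds = parquetDatastore(fullfile(outDir, "data_*.parquet"));
data = readall(pds);
```

To "append" later, write the next chunk as a new file with the next suffix in the same folder and create the datastore again. Note that a two-digit suffix only keeps name order and chunk order aligned for up to 99 files; widen the padding if more chunks are expected.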
