Overview

Request 1130498 accepted

- update to 2023.8.0:
* More general timestamp units (#874)
* ReadTheDocs V2 (#871)
* Better roundtrip dtypes (#861, #859)
* No convert when computing bytes-per-item for str (#858)

- Add patch to fix the test test_delta_from_def_2 on
* row-level filtering of the data. Whereas previously, only full
row-groups could be excluded on the basis of their parquet
metadata statistics (if present), filtering can now be done
within row-groups too. The syntax is the same as before,
allowing for multiple column expressions to be combined with
AND|OR, depending on the list structure. This mechanism
requires two passes: one to load the columns needed to create
the boolean mask, and another to load the columns actually
needed in the output. This will not be faster, and may be
slower, but in some cases can save significant memory
footprint, if a small fraction of rows are considered good and
the columns for the filter expression are not in the output.
* DELTA integer encoding (read-only): experimentally working,
but we only have one test file to verify against, since it is
not trivial to persuade Spark to produce files encoded this
way. DELTA can be an extremely compact representation for
* nanosecond resolution times: the new extended "logical" types
system supports nanoseconds alongside the previous millis and
micros. We now emit these for the default pandas time type,
and produce full parquet schema including both "converted" and
"logical" type information. Note that all output has
isAdjustedToUTC=True, i.e., these are timestamps rather than
local time. The time-zone is stored in the metadata, as
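
The two-pass row-level filtering described in the changelog above can be illustrated with a minimal sketch. This is not fastparquet's implementation: a plain dict of lists stands in for a parquet file, and the names build_mask and select_rows are hypothetical helpers introduced only for this example. It assumes the filter list structure in which tuples inside an inner list are combined with AND and the inner lists are combined with OR.

```python
import operator

# Comparison operators accepted in filter tuples (illustrative subset).
OPS = {
    "==": operator.eq, "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
    "in": lambda a, b: a in b, "not in": lambda a, b: a not in b,
}

def build_mask(table, filters):
    """Pass 1: read only the filter columns and build a per-row boolean mask."""
    # A flat list of tuples is treated as a single AND group.
    if filters and isinstance(filters[0], tuple):
        filters = [filters]
    n_rows = len(next(iter(table.values())))
    mask = [False] * n_rows
    for and_group in filters:  # the groups are OR-ed together
        for i in range(n_rows):
            if all(OPS[op](table[col][i], val) for col, op, val in and_group):
                mask[i] = True
    return mask

def select_rows(table, columns, mask):
    """Pass 2: read only the output columns and keep the rows the mask selects."""
    return {c: [v for v, keep in zip(table[c], mask) if keep] for c in columns}

# Hypothetical in-memory data standing in for a parquet file.
table = {
    "id": [1, 2, 3, 4, 5],
    "score": [0.1, 0.9, 0.5, 0.95, 0.2],
    "label": ["a", "b", "a", "c", "b"],
}

# (score > 0.8) OR (id == 1 AND label == "a")
filters = [[("score", ">", 0.8)], [("id", "==", 1), ("label", "==", "a")]]

mask = build_mask(table, filters)
result = select_rows(table, ["label"], mask)
print(mask)
print(result)
```

In this sketch the filter columns (id, score) are touched only in the first pass and the output column (label) only in the second, which is the situation in which the changelog says the memory footprint can shrink.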

Request History
dirkmueller created request


factory-auto added opensuse-review-team as a reviewer

Please review sources


factory-auto accepted review

Check script succeeded


staging-bot added as a reviewer

Being evaluated by staging project "openSUSE:Factory:Staging:adi:10"


staging-bot accepted review

Picked "openSUSE:Factory:Staging:adi:10"


licensedigger accepted review

The legal review is accepted preliminary. The package may require actions later on.


dimstar accepted review


anag+factory accepted review

Staging Project openSUSE:Factory:Staging:adi:10 got accepted.


anag+factory approved review

Staging Project openSUSE:Factory:Staging:adi:10 got accepted.


anag+factory accepted request

Staging Project openSUSE:Factory:Staging:adi:10 got accepted.
