Revisions of python-fastparquet

Markéta Machová (mcalabkova) accepted request 1032127 from Benjamin Greiner (bnavigator) (revision 40)
- Update to 0.8.3
  * improved key/value handling and rejection of bad types
  * fix regression in consolidate_cats (caught in dask tests)
- Release 0.8.2
  * datetime indexes initialised to 0 to prevent overflow from
    random memory
  * fix case from csv_to_parquet where stats exists but has no
    nulls entry
  * define len and bool for ParquetFile
  * maintain int types of optional data that came from pandas
  * fix for delta encoding
- Add fastparquet-pr813-updatefixes.patch gh#dask/fastparquet#813
buildservice-autocommit accepted request 972913 from Markéta Machová (mcalabkova) (revision 39)
baserev update by copy to link target
Markéta Machová (mcalabkova) accepted request 972857 from Benjamin Greiner (bnavigator) (revision 38)
- Update to 0.8.1
  * fix critical buffer overflow crash for large number of columns
    and long column names
  * metadata handling
  * thrift int32 for list
  * avoid error storing NaNs in column stats
Matej Cepl (mcepl) accepted request 950136 from Benjamin Greiner (bnavigator) (revision 37)
- Update to 0.8.0
  * our own cythonic thrift implementation (drop thrift dependency)
  * more in-place dataset editing and reordering
  * python 3.10 support
  * fixes for multi-index and pandas types
- Clean test skips
Matej Cepl (mcepl) accepted request 946801 from Benjamin Greiner (bnavigator) (revision 36)
- Clean specfile from unused python36 conditionals
- Require thrift 0.15.0 (+patch) for Python 3.10 compatibility
  * gh#dask/fastparquet#514
Dirk Mueller (dirkmueller) accepted request 934308 from Arun Persaud (apersaud) (revision 35)
- still some failing builds, but they also occur in the current package (and I don't know how to fix them)

- update to version 0.7.2:
  * Ability to remove row-groups in-place for multifile datasets
  * Accept pandas nullable Float type
  * allow empty strings and fix min/max when there is no data
  * make writing statistics optional
  * row selection in to_pandas()
Matej Cepl (mcepl) accepted request 910725 from Benjamin Greiner (bnavigator) (revision 34)
- Update to version 0.7.1
  * Back compile for older versions of numpy
  * Make pandas nullable types opt-out. The old behaviour (casting
    to float) is still available with ParquetFile(...,
    pandas_nulls=False); a short sketch follows this list.
  * Fix time field regression: IsAdjustedToUTC will be False when
    there is no timezone
  * Micro improvements to the speed of ParquetFile creation by
    using simple string ops instead of regex and regularising
    filenames once at the start. Affects datasets with many
    files.
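
  A minimal sketch of the opt-out described above; "data.parquet"
  is a placeholder path:

      from fastparquet import ParquetFile

      # pandas_nulls=False restores the pre-0.7 behaviour: nullable
      # int/bool columns are cast to float, with NULL read as NaN
      pf = ParquetFile("data.parquet", pandas_nulls=False)
      df = pf.to_pandas()
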
- Release 0.7.0
  * This version institutes major, breaking changes, listed here,
    and incremental fixes and additions.
  * Reading a directory without a _metadata summary file now works
    by providing only the directory, instead of a list of
    constituent files. This change also makes direct use of fsspec
    filesystems, if given, to load the footer metadata areas of
    the files concurrently where the storage backend supports it,
    without instantiating intermediate ParquetFile instances (see
    the sketch after this list).
  * row-level filtering of the data. Whereas previously, only full 
    row-groups could be excluded on the basis of their parquet 
    metadata statistics (if present), filtering can now be done 
    within row-groups too. The syntax is the same as before, 
    allowing for multiple column expressions to be combined with 
    AND|OR, depending on the list structure. This mechanism 
    requires two passes: one to load the columns needed to create 
    the boolean mask, and another to load the columns actually 
    needed in the output. This will not be faster, and may be 
    slower, but in some cases can save significant memory 
    footprint, if a small fraction of rows are considered good and 
    the columns for the filter expression are not in the output. 
    Not currently supported for reading with DataPageV2.
  * DELTA integer encoding (read-only): experimentally working, 
    but we only have one test file to verify against, since it is 
    not trivial to persuade Spark to produce files encoded this 
    way. DELTA can be an extremely compact representation for 
    slowly varying and/or monotonically increasing integers.
  * nanosecond resolution times: the new extended "logical" types 
    system supports nanoseconds alongside the previous millis and 
    micros. We now emit these for the default pandas time type, 
    and produce full parquet schema including both "converted" and 
    "logical" type information. Note that all output has 
    isAdjustedToUTC=True, i.e., these are timestamps rather than 
    local time. The time-zone is stored in the metadata, as 
    before, and will be successfully recreated only in fastparquet 
    and (py)arrow. Otherwise, the times will appear to be UTC. For 
    compatibility with Spark, you may still want to use 
    times="int96" when writing.
  * DataPageV2 writing: now we support both reading and writing. 
    For writing, it can be enabled with the environment variable 
    FASTPARQUET_DATAPAGE_V2 or the module global 
    fastparquet.writer.DATAPAGE_VERSION, and is off by default (a 
    toggle sketch follows this list). It will become on by default 
    in the future. In many cases, V2 will result in better read 
    performance, because the data and page headers are encoded 
    separately, so data can be read directly into the output 
    without additional allocation/copies. This feature is 
    considered experimental, but we believe it works well for 
    most use cases (i.e., our test suite) and should be readable 
    by all modern parquet frameworks including arrow and spark.
  * pandas nullable types: pandas supports "masked" extension 
    arrays for types that previously could not support NULL at 
    all: ints and bools. Fastparquet used to cast such columns to 
    float, so that we could represent NULLs as NaN; now we use the 
    new(er) masked types by default. This means faster reading of 
    such columns, as there is no conversion. If the metadata 
    guarantees that there are no nulls, we still use the 
    non-nullable variant unless the data was written with 
    fastparquet/pyarrow, and the metadata indicates that the 
    original datatype was nullable. We already handled writing of 
    nullable columns.
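
  A hypothetical sketch of the directory reading and row-level
  filtering described above; "dataset_dir/", "x", "y", and the
  threshold are placeholders:

      from fastparquet import ParquetFile

      # a directory of parquet files, no _metadata summary needed
      pf = ParquetFile("dataset_dir/")

      # pass one loads column "x" to build the boolean mask, pass
      # two loads only the output column "y"; tuples in a list are
      # ANDed, lists of lists are ORed
      df = pf.to_pandas(
          columns=["y"],
          filters=[("x", ">", 100)],
          row_filter=True,  # filter inside row-groups, not just between them
      )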
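
  A sketch of turning on the experimental DataPageV2 writer via the
  module global named above, assuming the global takes the
  page-format version number (2 for V2):

      import pandas as pd
      import fastparquet
      import fastparquet.writer

      # alternatively, export FASTPARQUET_DATAPAGE_V2 in the environment
      fastparquet.writer.DATAPAGE_VERSION = 2
      fastparquet.write("out.parquet", pd.DataFrame({"a": [1, 2, 3]}))
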
buildservice-autocommit accepted request 894287 from Matej Cepl (mcepl) (revision 33)
baserev update by copy to link target
Matej Cepl (mcepl) accepted request 894265 from Benjamin Greiner (bnavigator) (revision 32)
- Update to version 0.6.3
  * no release notes
  * new requirement: cramjam instead of separate compression libs
    and their bindings
  * switch from numba to Cython
buildservice-autocommit accepted request 871464 from Dirk Mueller (dirkmueller) (revision 31)
baserev update by copy to link target
Dirk Mueller (dirkmueller) committed (revision 30)
- skip python 36 build
buildservice-autocommit accepted request 870700 from Matej Cepl (mcepl) (revision 29)
baserev update by copy to link target
Matej Cepl (mcepl) accepted request 869540 from Jan Engelhardt (jengelh) (revision 28)
- Use of "+=" in %check warrants bash as buildshell.
buildservice-autocommit accepted request 869528 from Matej Cepl (mcepl) (revision 27)
baserev update by copy to link target
Matej Cepl (mcepl) accepted request 869041 from Benjamin Greiner (bnavigator) (revision 26)
- Skip the import-without-warning test gh#dask/fastparquet#558
- Apply the Cepl-Strangelove-Parameter to pytest
  (--import-mode append)
buildservice-autocommit accepted request 859938 from Matej Cepl (mcepl) (revision 25)
baserev update by copy to link target
Matej Cepl (mcepl) accepted request 859934 from Benjamin Greiner (bnavigator) (revision 24)
- update to version 0.5
  * no changelog
- update test suite setup -- install the .test module
buildservice-autocommit accepted request 821679 from Todd R (TheBlackCat) (revision 23)
baserev update by copy to link target
Todd R (TheBlackCat) accepted request 821674 from Arun Persaud (apersaud) (revision 22)
update to latest version
buildservice-autocommit accepted request 819826 from Tomáš Chvátal (scarabeus_iv) (revision 21)
baserev update by copy to link target