File apache-arrow.changes of Package apache-arrow
-------------------------------------------------------------------
Fri Jun 13 18:22:55 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 20.0.0
## Bug Fixes
* GH-30302 - [C++][Parquet] Preserve the bitwidth of integer
dictionary indices on round-trip to Parquet (#45685)
* GH-31992 - [C++][Parquet] Handling the special case when
DataPageV2 values buffer is empty (#45252)
* GH-37630 - [C++][Python][Dataset] Allow disabling fragment
metadata caching (#45330)
* GH-39023 - [C++][CMake] Add missing launcher path conversion
for ExternalPackage (#45349)
* GH-43057 - [C++] Thread-safe AesEncryptor / AesDecryptor
(#44990)
* GH-45048 - [C++][Parquet] Deprecate unused chunk_size parameter
in parquet::arrow::FileWriter::NewRowGroup() (#45088)
* GH-45129 - [Python][C++] Fix usage of deprecated C++
functionality on pyarrow (#45189)
* GH-45132 - [C++][Gandiva] Update LLVM to 18.1 (#45114)
* GH-45185 - [C++][Parquet] Raise an error for invalid repetition
levels when delimiting records (#45186)
* GH-45254 - [C++][Acero] Fix the row offset truncation in row
table merge (#45255)
* GH-45266 - [C++][Acero] Fix the running tasks count of
Scheduler when get error tasks in multi-threads (#45268)
* GH-45270 - [C++][CI] Disable mimalloc on Valgrind builds
(#45271)
* GH-45301 - [C++] Change PrimitiveArray ctor to protected
(#45444)
* GH-45334 - [C++][Acero] Fix swiss join overflow issues in row
offset calculation for fixed length and null masks (#45336)
* GH-45362 - [C++] Fix identity cast for time and list scalar
(#45370)
* GH-45371 - [C++] Fix data race in SimpleRecordBatch::columns
(#45372)
* GH-45393 - [C++][Compute] Fix wrong decoding for 32-bit column
in row table (#45473)
* GH-45396 - [C++] Use Boost with ARROW_FUZZING (#45397)
* GH-45423 - [C++] Don’t require Boost library with
ARROW_TESTING=ON/ARROW_BUILD_SHARED=OFF (#45424)
* GH-45497 - [C++][CSV] Avoid buffer overflow when a line has too
many columns (#45498)
* GH-45510 - [CI][C++] Fix LLVM APT repository preparation on
Debian (#45511)
* GH-45512 - [C++] Clean up undefined symbols in libarrow without
IPC (#45513)
* GH-45514 - [CI][C++][Docs] Set CUDAToolkit_ROOT explicitly in
debian-docs (#45520)
* GH-45537 - [CI][C++] Add missing includes (iwyu) to
file_skyhook.cc (#45538)
* GH-45541 - [Doc][C++] Render ASCII art as-is (#45542)
* GH-45545 - [C++][Parquet] Add missing includes (#45554)
* GH-45564 - [C++][Acero] Add size validation for names and
expressions vectors in ProjectNode (#45565)
* GH-45568 - [C++][Parquet][CMake] Enable zlib automatically when
Thrift is needed (#45569)
* GH-45578 - [C++] Use max not min in
MakeStatisticsArrayMaxApproximate test (#45579)
* GH-45587 - [C++][Docs] Fix the statistics schema link in
arrow::RecordBatch::MakeStatisticsArray()’s docstring (#45588)
* GH-45614 - [C++] Use Boost’s CMake packages instead of
FindBoost.cmake in CMake (#45623)
* GH-45628 - [C++] Ensure specifying Boost include directory for
bundled Thrift (#45637)
* GH-45669 - [C++][Parquet] Add missing
ParquetFileReader::GetReadRanges() definition (#45684)
* GH-45693 - [C++][Gandiva] Fix aes_encrypt/decrypt algorithm
selection (#45695)
* GH-45700 - [C++][Compute] Added nullptr check in Equals method
to handle null impl_ pointers (#45701)
* GH-45733 - [C++][Python] Add biased/unbiased toggle to skew and
kurtosis functions (#45762)
* GH-45739 - [C++][Python] Fix crash when calling
hash_pivot_wider without options (#45740)
* GH-45788 - [C++][Acero] Fix data race in aggregate node
(#45789)
* GH-45868 - [C++][CI] Fix test for ambiguous initialization on
C++ 20 (#45871)
* GH-45905 - [C++][Acero] Enlarge the timeout in ConcurrentQueue
test to reduce sporadic failures (#45923)
* GH-45930 - [C++] Don’t use ICU C++ API in Azure SDK C++
(#45952)
* GH-45939 - [C++][Benchmarking] Fix compilation failures
(#45942)
* GH-45959 - [C++][CMake] Fix Protobuf dependency in
Arrow::arrow_static (#45960)
* GH-45980 - [C++] Bump Bundled Snappy version to 1.2.2 (#45981)
* GH-45999 - [C++][Gandiva] Fix crashes on LLVM 20.1.1 (#46000)
* GH-46022 - [C++] Fix build error with g++ 7.5.0 (#46028)
* GH-46067 - [CI][C++] Remove system Flatbuffers from macOS
(#46105)
* GH-46077 - [CI][C++] Disable -Werror on macos-13 (#46106)
* GH-46111 - [C++][CI] Fix boost 1.88 on MinGW (#46113)
* GH-46123 - [C++] Undefined behavior in compare_internal.cc and
light_array_internal.cc (#46124)
* GH-46134 - [CI][C++] Explicit conversion of possible
absl::string_view on protobuf methods to std::string (#46136)
* GH-46159 - [CI][C++] Stop using possibly missing
boost/process/v2.hpp on boost 1.88 and use individual includes
(#46160)
* GH-46195 - [Release][C++] verify-rc-source-cpp-macos-amd64
failed to build googlemock
## New Features and Improvements
* GH-26648 - [C++] Optimize union equality comparison (#45384)
* GH-33592 - [C++] support casting nullable fields to
non-nullable if there are no null values (#43782)
* GH-41764 - [Parquet][C++] Support future logical types in the
Parquet reader (#41765)
* GH-41816 - [C++] Add Minimal Meson Build of libarrow (#45441)
* GH-43296 - [C++][FlightRPC] Remove Flight UCX transport
(#43297)
* GH-43573 - [C++] Copy bitmap when casting from string-view to
offset string and binary types (#44822)
* GH-44042 - [C++][Parquet] Limit num-of row-groups when building
parquet for encrypted file (#44043)
* GH-44393 - [C++][Compute] Vector selection functions
inverse_permutation and scatter (#44394)
* GH-44615 - [C++][Compute] Add extract_regex_span function
(#45577)
* GH-44629 - [C++][Acero] Use implicit_ordering for asof_join
rather than require_sequenced_output (#44616)
* GH-44950 - [C++] Bump minimum CMake version to 3.25 (#44989)
* GH-45045 - [C++][Parquet] Add a benchmark for
size_statistics_level (#45085)
* GH-45190 - [C++][Compute] Add rank_quantile function (#45259)
* GH-45196 - [C++][Acero] Small refinement to hash join (#45197)
* GH-45206 - [C++][CMake] Add sanitizer presets (#45207)
* GH-45209 - [C++][CMake] Fix the issue that allocator not
disabled for sanitizer cmake presets (#45210)
* GH-45215 - [C++][Acero] Export SequencingQueue and
SerialSequencingQueue (#45221)
* GH-45216 - [C++][Compute] Refactor Rank implementation (#45217)
* GH-45219 - [C++][Examples] Update examples to disable mimalloc
(#45220)
* GH-45225 - [C++] Upgrade ORC to 2.1.0 (#45226)
* GH-45227 - [C++][Parquet] Enable Size Stats and Page Index by
default (#45249)
* GH-45269 - [C++][Compute] Add “pivot_wider” and
“hash_pivot_wider” functions (#45562)
* GH-45279 - [C++][Compute] Move all Grouper tests to
grouper_test.cc (#45280)
* GH-45344 - [C++][Testing] Generic StepGenerator (#45345)
* GH-45358 - [C++][Python] Add MemoryPool method to print
statistics (#45359)
* GH-45361 - [CI][C++] Curate ci/vcpkg/vcpkg.json (#45081)
* GH-45366 - [C++][Parquet] Set is_compressed to false when data
page v2 is not compressed (#45367)
* GH-45416 - [CI][C++][Homebrew] Backport the latest formula
changes (#45460)
* GH-45478 - [CI][C++] Drop support for Ubuntu 20.04 (#45519)
* GH-45506 - [C++][Acero] More overflow-safe Swiss table (#45515)
* GH-45551 - [C++][Acero] Release temp states of Swiss join
building hash table to reduce memory consumption (#45552)
* GH-45563 - [C++][Compute] Split up hash_aggregate.cc (#45725)
* GH-45566 - [C++][Parquet][CMake] Remove a workaround for
Windows in FindThriftAlt.cmake (#45567)
* GH-45572 - [C++][Compute] Add rank_normal function (#45573)
* GH-45584 - [C++][Thirdparty] Bump zstd to v1.5.7 (#45585)
* GH-45589 - [C++] Enable singular test in Meson configuration
(#45596)
* GH-45591 - [C++][Acero] Refine hash join benchmark and remove
openmp from the project (#45593)
* GH-45605 - [R][C++] Fix identifier … preceded by whitespace
warnings (#45606)
* GH-45611 - [C++][Acero] Improve Swiss join build performance by
partitioning batches ahead to reduce contention (#45612)
* GH-45620 - [CI][C++] Use Visual Studio 2022 not 2019 (#45621)
* GH-45652 - [C++][Acero] Unify ConcurrentQueue and
BackpressureConcurrentQueue API (#45421)
* GH-45676 - [C++][Python][Compute] Add skew and kurtosis
functions (#45677)
* GH-45680 - [C++][Python] Remove deprecated functions in 20.0
* GH-45689 - [C++][Thirdparty] Bump Apache ORC to 2.1.1 (#45600)
* GH-45694 - [C++] Bump vendored flatbuffers to 24.3.6 (#45687)
* GH-45696 - [C++][Gandiva] Accept LLVM 20.1 (#45697)
* GH-45732 - [C++][Compute] Accept more pivot key types (#45945)
* GH-45744 - [C++] Remove deprecated GetNextSegment (#45745)
* GH-45746 - [C++] Remove deprecated functions in 20.0 (C++
subset) (#45748)
* GH-45755 - [C++][Python][Compute] Add winsorize function
(#45763)
* GH-45771 - [C++] Add tests to top level Meson configuration
(#45773)
* GH-45772 - [C++] Export Arrow as dependency from Meson
configuration (#45774)
* GH-45775 - [C++] Use dict.get() in Meson configuration (#45776)
* GH-45779 - [C++] Add testing directory to Meson configuration
(#45780)
* GH-45784 - [C++] Unpin LLVM and OpenSSL in Brewfile (#45785)
* GH-45792 - [C++] Add benchmarks to Meson configuration (#45793)
* GH-45816 - [C++] Make VisitType() fallback branch unreachable
(#45815)
* GH-45820 - [C++] Add optional out_offset for Buffer-returning
CopyBitmap function (#45852)
* GH-45821 - [C++][Compute] Grouper improvements (#45822)
* GH-45825 - [C++] Add c directory to Meson configuration
(#45826)
* GH-45827 - [C++] Add io directory to Meson configuration
(#45828)
* GH-45831 - [C++] Add CSV directory to Meson configuration
(#45832)
* GH-45848 - [C++][Python][R] Remove deprecated PARQUET_2_0
(#45849)
* GH-45877 - [C++][Acero] Cleanup 64-bit temp states of Swiss
join by using 32-bit (#45878)
* GH-45917 - [C++][Acero] Add flush taskgroup to enable
parallelization (#45918)
* GH-45922 - [C++][Flight] Remove deprecated Authenticate and
StartCall (#45932)
* GH-45953 - [C++] Use lock to fix atomic bug in
ReadaheadGenerator (#45954)
* GH-45986 - [C++] Update bundled GoogleTest (#45996)
* GH-45987 - [C++] Set CMAKE_POLICY_VERSION_MINIMUM=3.5 for
bundled dependencies (#45997)
-------------------------------------------------------------------
Mon Apr 21 14:34:37 UTC 2025 - Friedrich Haubensak <hsk17@mail.de>
- to fix cmake-4 build problems, upgrade bundled mimalloc from
2.0.6 to 2.0.9 and add apache-arrow-19.0.1-mimalloc-version.patch;
mimalloc changes according to readme.md:
* 2.0.9:
- Supports building with asan and improved Valgrind support.
- Support arbitrarily large alignments, in particular for
`std::pmr` pools.
- Added C++ STL allocators attached to a specific heap.
- Heap walks now visit all objects (including huge objects).
- Support Windows nano server containers.
- Various small bug fixes.
* 2.0.7:
- Initial support for Valgrind for leak testing and heap
block overflow detection.
- Initial support for attaching heaps to a specific memory area.
- Fix `realloc` behavior for zero size blocks.
- Remove restriction to integral multiple of the alignment in
`alloc_align`.
- Improved aligned allocation performance.
- Reduced contention with many threads on few processors.
- VS2022 support.
- Support `pkg-config`.
-------------------------------------------------------------------
Fri Mar 28 08:47:10 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Re-enable flight, grpc has been fixed boo#1237422
-------------------------------------------------------------------
Thu Mar 13 18:57:51 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Add missing dependencies for libboost_process explicitly
boo#1239599
-------------------------------------------------------------------
Wed Feb 19 15:58:28 UTC 2025 - Ben Greiner <code@bnavigator.de>
- disable flight because of gh#grpc/grpc#37968 boo#1237422
-------------------------------------------------------------------
Mon Feb 17 19:17:26 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 19.0.1
## Bug Fixes
* [C++] Fix overflow issues for large build side in swiss join
(#45108)
* [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181)
* [C++][Parquet] Omit level histogram when max level is 0
(#45285)
* [Parquet][C++] Fix statistics load logic for no row group and
multiple row groups (#45350)
* [C++] Disable Flight test (#45232)
## Improvements
* [C++][Parquet] Improve performance of generating size
statistics (#45202)
* [C++][S3] Workaround compatibility issue between AWS SDK and
MinIO (#45310)
- Release 19.0.0
## New Features and Improvements
* [CI][C++] Add a nightly job to test offline build (#44721)
* [C++] GcsFileSystem::Make should return Result (#44503)
* [C++][Parquet] Implement SizeStatistics (#40594)
* [C++] Reduce string inlining in Substrait serde (#45174)
* [C++][Acero] Enhance asof_join to work in multi-threaded
execution by sequencing input (#44083)
* [C++] Support the AWS S3 SSE-C encryption (#43601)
* [C++][Parquet] Parquet Metadata Printer supports print
sort-columns (#43599)
* [C++] Add C++ implementation of Async C Data Interface (#44495)
* [C++][Acero] Support AVX2 swiss join decoding (#43832)
* [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621)
* [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252)
* [C++] Improve merge step in chunked sorting (#44217)
* [C++][Parquet] Tools: Debug Print for Json should be valid JSON
(#44532)
* [C++][FS][Azure] Implement SAS token authentication (#45021)
* [C++] Don’t export template class (#44365)
* [C++][Docs] Update the URL to C++ Development in README.md
(#44427)
* [C++] Added rvalue-reference-qualified overload for
arrow::Result::status() returning value instead of reference
(#44477)
* [C++] StatusConstant: cheaply copied const Status (#44493)
* [C++][Compute] Allow casting struct to bigger nullable struct
(#44587)
* [C++] Use array type to compute min/max statistics Arrow type
(#45094)
* [C++] Minor: ArrayData ctor can assign null_count directly
(#44582)
* [C++] Add const and & to arrow::Array::statistics() return type
(#44592)
* [Python][C++] Add version suffix to libarrow_python* libraries
(#44702)
* [C++] NumericBuilder::AppendValues append vector prevent from
ub (#44794)
* [C++][Parquet] Remove obsolete parquet_constants generated
files from old thrift (#44772)
* [Docs][C++] Add arrow::ArrayStatistics to API doc (#44764)
* [C++] Upgrade ORC to 2.0.3 (#44745)
* [C++][Parquet] Add arrow::Result version of
parquet::arrow::OpenFile() (#44785)
* [C++] Fix a couple of maybe-uninitialized warnings (#44789)
* [C++] Use arrow::util::span on
arrow::util::bitmap_builders_utilities instead of std::vector
(#44796)
* [C++][Parquet] Add arrow::Result version of
parquet::arrow::FileReader::GetRecordBatchReader() (#44809)
* [C++] minor optimize cancel and thread pool (#44812)
* [C++][Parquet] Add an example to dump statistics read as
arrow::ArrayStatistics (#44816)
* [C++] Add the Expm1(exponent) scalar arithmetic function
(#44904)
* [C++] Add WithinUlp testing functions (#44906)
* [C++][Python] Add Hyperbolic Trig functions (#44630)
* [C++] Enable mimalloc by default, disable jemalloc by default
and more (#44951)
* [C++] Add support for building system OpenTelemetry (#44983)
* [C++][CMake] Use librt only for Linux (#44984)
* [C++] Support for fixed-size list in conversion of range tuple
(#45008)
* [C++][Parquet] Allow configuring the default footer read size
(#45016)
* [C++] Remove result_internal.h (#45066)
* [FlightRPC][C++] Deprecate InitializeFlightUcx before removing
UCX (#45080)
* [C++][Parquet] Add GetReadRanges function to FileReader
(#45093)
* [C++] Apply a cstdint patch to bundled Thrift for GCC 15
(#45097)
* [C++] Remove useless “hash table ready” states in swiss join
(#45136)
* [CI][C++] Add a GCC 15 job (#45138)
* [C++] Ensure using cpp/cmake_modules/*.cmake (#45143)
* [CI][C++] Upgrade Alpine Linux to 3.18 from 3.16 (#45168)
## Bug Fixes
* [C++] Fix CopyFiles when destination is a FileSystem with
background_writes (#44897)
* [C++][Python] Fix ORC crash when file contains unknown timezone
(#45051)
* [C++] Replace std::aligned_storage that is deprecated in C++23
(#45019)
* [C++][Parquet] Refuse writing non-nullable column that contains
nulls (#44921)
* [C++] Initialize offset vector head as 0 after memory allocated
in grouper.cc (#43123)
* [C++] io::BufferedInput: Fix invalid state after SetBufferSize
(#44387)
* [C++][Parquet] Fix schema conversion from two-level encoding
nested list (#43995)
* [C++] Use “lib” for generating bundled dependencies even with
“clang-cl” (#44391)
* [C++] Fix unaligned load/store implementation for clang-18
(#44468)
* [C++] Use CMAKE_LIBTOOL on macOS (#44385)
* [CI][C++] Use setup-python on hosted runner (#44411)
* [C++] Update vendored date to 3.0.3 (#44482)
* [GLib][C++] Meson searches libraries with specific versions.
(#44475)
* [C++][Acero] Fix crash when thread in asof_join is not running
(#44584)
* [C++] NumericArray should not use ctor from parent directly
(#44542)
* [C++] FunctionOptions::{Serialize,Deserialize}() return an
error without ARROW_IPC (#45171)
* [C++][Acero] Enhance partition sort example (#44678)
* [C++][Python] Fix Flight Timestamp precision, revert workaround
from #43537 (#44681)
* [C++] Add S3 option to ignore SIGPIPE signals (#44735)
* [C++] Keep field metadata for keys and values when importing a
map type via the C data interface (#44715)
* [C++][CI] Fix arrow-c-bridge-test timeout with threading
disabled (#44737)
* [C++] Use lowercased windows.h to enable cross-platform builds
(#44755)
* [C++] Fix Float16.To{Little,Big}Endian on big endian machines
(#44768)
* [C++][Parquet] Fix read/write of metadata length footer on
big-endian systems (#44787)
* [C++][CI] Migrate to arrow::Result based
parquet::arrow::OpenFile() API in example tutorials (#44807)
* [C++] Fix thread-unsafe access in ConcurrentQueue::UnsyncFront
(#44849)
* [C++] Fix compilation error on GCC 8 (#44899)
* [C++][CI] Silence protobuf-generated deprecations (#44955)
* [C++] Use recommended downloads URLs for ORC and Thrift
(#44977)
* [C++] Include path in the documentation is wrong (#45031)
* [C++] Remove Parquet requirement from Arrow Acero and from
Arrow Dataset when not necessary (#45035)
* [C++] Add support for Boost 1.87.0 (#45057)
* [C++][CI] Fix test-build-cpp-fuzz failures (#45060)
* [C++][Parquet] Fix generation of repetition levels for
encryption test data (#45074)
* [C++] Avoid static const variable in the status.h (#45100)
* [C++][Parquet] Fix Null-dereference READ in
parquet::arrow::ListToSchemaField (#45152)
* [C++][Release] Add llvm-dev back to setup-ubuntu.sh (#45184)
* [C++][Parquet] test-conda-cpp-valgrind fails on
arrow-dataset-file-parquet-encryption-test
- Release 18.1.0
## Bug Fixes
* [C++] Add support for overwriting grpc_cpp_plugin path for
cross-compiling (#44507)
* [Docs][C++] Fix documentation directive for ChunkLocation
(#44505)
* [C++] Add find module for abseil that handles missing version
(#44613)
* [C++][Dev] Update bundled Thrift, update mirrors to use CDN
(#44685)
## New Features and Improvements
* [C++] Move ChunkResolver to the public API (#44357)
- Release 18.0.0
## Bug Fixes
* [C++] data corruption when using `group_by` and `aggregate` on
large data sets
* [C++] Use PutObject request for S3 in OutputStream when only
uploading small data (#41564)
* [C++] Clean up implicit fallthrough warnings (#41892)
* [C++] Fix avx2 gather rows more than 2^31 issue in
CompareColumnsToRows (#43065)
* [C++][ArrowFlight] Crash due to UCS thread mode
* [C++] Add workaround for missing Boost dependency of Thrift
(#43328)
* [C++] Skip not Emscripten ready tests in CSV tests (#43724)
* [C++] Add date{32,64} to date{32,64} cast (#43192)
* [C++][Compute] Detect and explicit error for offset overflow in
row table (#43226)
* [C++] Fix decimal benchmarks to avoid out-of-bounds accesses
(#43212)
* [C++] Resolve Abseil like any other dependency in the build
system (#43219)
* [C++][Parquet] Refactor parquet::encryption::AesEncryptor to
use unique_ptr (#43222)
* [C++] Fix Abseil compile error on GCC 13 (#43157)
* [C++] Add missing serde methods to Location (#43332)
* [C++][Parquet] min-max Statistics doesn’t work well when one of
min-max is truncated (#43383)
* [C++][Parquet] parquet-dump-footer: Remove redundant link and
fix --debug processing (#43375)
* [C++] Ensure using bundled GoogleTest when we use bundled
GoogleTest (#43465)
* [C++][Compute] Fix invalid memory access when resizing
var-length buffer in row table (#43415)
* [C++][FlightRPC] Fix Flight UCX build issues (#43430)
* [C++] Filter out zero length buffers on gRPC transport (#43448)
* [C++][Gandiva] Always use gdv_function_stubs.h in
context_helper.cc (#43464)
* [C++] Add support for the official LZ4 CMake package (#43468)
* [C++] Register the new Opaque extension type by default
(#43788)
* [C++][Acero] Fix typos in join benchmark (#43871)
* [C++][CI] Catch potential integer overflow in PoolBuffer
(#43886)
* [C++] Leak S3 structures if finalization happens too late
(#44090)
* [C++][Parquet] Fix reported metrics in
parquet-arrow-reader-writer-benchmark (#44082)
* [C++] Don’t use Boost.Process with Emscripten (#44097)
* [C++] Add home made _mm256_set_m128i for compilers who are
missing it (#44116)
* [C++] JsonExtensionType equality check ignores storage type
(#44215)
* [CI][C++][AppVeyor] Use conda instead of Mamba (#44235)
* [C++][FS][Azure] Fix edge case where GetFileInfo incorrectly
returns NotFound on flat namespace and Azurite (#44302)
* [C++][FS][Azure] Catch missing exceptions on HNS support check
(#44274)
* [C++][FS][Azure] Fix minor hierarchical namespace bugs (#44307)
* [C++] Fix S3 error handling in ObjectOutputStream (#44335)
* [C++] Disable jemalloc by default on ARM (#44380)
## New Features and Improvements
* [C++][Python] Native support for UUID (#37298)
* [C++][Python] Bool8 Extension Type Implementation (#43488)
* [C++][Parquet] Add JSON canonical extension type (#13901)
* [C++][Compute] Replace explicit checking with DCHECK for
invariants in row segmenter (#44236)
* [C++][CI] Improve IPC fuzzing seed corpus (#43621)
* [Documentation][C++] Explicitly note that compute is optional
(#43629)
* [C++] Azure file system write buffering & async writes (#43096)
* [C++][Parquet] Separate encoders and decoder (#43972)
* [C++][Python][Parquet] Support reading/writing key-value
metadata from/to ColumnChunkMetaData (#41580)
* [Docs][C++] Is arrow::dataset namespace still experimental?
* [C++] Add arrow::ArrayStatistics (#43273)
* [CI][C++] Update Minio version (#44225)
* [C++][Parquet] Add binary that extracts a footer from a parquet
file (#42174)
* [C++] Support casting to and from utf8_view/binary_view
(#43302)
* [C++] Update bundled vendor/datetime to support for building
with libc++ and C++20 (#43094)
* [C++] Implement PathFromUri support for Azure file system
(#43098)
* [C++][Compute] Fix the unnecessary allocation of extra bytes
when encoding row table (#43125)
* [C++][Parquet] Replace use of int with int32_t in the internal
Parquet encryption APIs (#43413)
* [C++][Parquet] Refactor Encryptor API to use arrow::util::span
instead of raw pointers (#43195)
* [C++][Parquet] Default initialize some parquet metadata
variables (#43144)
* [C++] Fix CMake link order for AWS SDK (#43230)
* [C++] Suggest a cast when Concatenate fails due to offsets
overflow (#43190)
* [C++] Support basic is_in predicate simplification (#43761)
* [C++][AzureFS] Ignore password field in URI (#44220)
* [C++] Add lint for DCHECK in public headers (#43248)
* [C++][FlightRPC] Reduce repetition in flight/types.cc in serde
functions (#43237)
* [C++][Parquet] remove useless template parameter of
DeltaLengthByteArrayEncoder (#43250)
* [C++] Always prefer mimalloc to jemalloc (#40875)
* [C++][Flight] Use a Base CRTP type for the types used in RPC
calls (#43255)
* [C++] Expand the ‘take’ function tests to cover more
chunked-array cases (#43292)
* [C++][Parquet] Enhance the comment for ColumnReader/Decoder
(#44003)
* [C++] Order classes in flight/types.h according to Flight.proto
(#43330)
* [C++][Parquet] Deprecate ColumnChunk::file_offset field and no
longer write Metadata at end of Chunk (#43428)
* [C++] Add benchmark for binary view builder (#43445)
* [C++][Python] Add Opaque canonical extension type (#43458)
* [Java][C++] Support more CsvFragmentScanOptions in JNI call
(#43482)
* [C++] Thirdparty: Bump lz4 to 1.10.0 (#43493)
* [C++][Compute] Widen the row offset of the row table to 64-bit
(#43389)
* [C++] Use ViewOrCopyTo instead of CopyTo when pretty printing
non-CPU data (#43508)
* [FlightRPC][C++] Reduce the number of references to
protobuf::Any (#43544)
* [C++] Simplify arrow::ArrayStatistics::ValueType (#43581)
* [C++][GLib] Don’t install arrow-cuda.pc/arrow-cuda-glib.pc on
Windows (#43593)
* [C++] Remove redundant default constructor/destructor in
arrow::ArrayStatistics (#43579)
* [C++] Remove std::optional from
arrow::ArrayStatistics::is_{min,max}_exact (#43595)
* [C++][FlightRPC] Move the FlightTestServer to its own .cc and
.h files (#43678)
* [C++] Compute: fix register kernel SimdLevel for
AddMinMax512AggKernels (#43704)
* [C++] Prevent Snappy from disabling RTTI when bundled (#43706)
* [C++][FS][Azure] Use the latest Azurite and update the bundled
Azure SDK for C++ to azure-identity_1.9.0 (#43723)
* [C++][Parquet][CI] Parquet: Introducing more bad_data for
testing (#43708)
* [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly
when !HasNullCount() (#43726)
* [C++] Clarify the way SIMD-enabled agg kernels come from the
same code in different compilation units (#43720)
* [C++] Fix Scalar boolean handling in row encoder (#43734)
* [C++] Add support for Boost 1.86 (#43766)
* [C++] Compute: More comment in RowEncoder (#43763)
* [C++] Acero: Minor code enhancement for Join (#43760)
* [C++] Fix the case when boolean_{any,all} meets constant input
with length in Acero (#43799)
* [C++] Add chunked Take benchmarks with a small selection factor
(#43772)
* [C++] Indent preprocessor directives (#43798)
* [C++] Attach arrow::ArrayStatistics to arrow::ArrayData
(#43801)
* [C++] Enable filesystem automatically when one of
ARROW_{AZURE,GCS,HDFS,S3}=ON is specified (#43806)
* [C++] Expose the set of device types where a ChunkedArray is
allocated (#43853)
* [C++] Make ChunkResolver::ResolveMany output a list of
ChunkLocations (#43928)
* [C++][Parquet] Add support for arrow::ArrayStatistics: non
zero-copy int based types (#43945)
* [C++][Parquet] Guard against use of cleared decryptor/encryptor
(#43947)
* [C++] Add tests based on random data and benchmarks to
ChunkResolver::ResolveMany (#43954)
* [C++] Enhance error message for URI parsing (#43938)
* [CI][C++][Dev] Add cpplint to pre-commit (#43982)
* [C++][Parquet] Add support for arrow::ArrayStatistics:
zero-copy types (#43984)
* [C++][Acero] Some code cleanup to Grouper (#43988)
* [C++] Add missing std::move() in array_nested.cc (#43993)
* [C++][Docs] Add missing install command in building docs
(#44000)
* [C++][Parquet] Add support for arrow::ArrayStatistics: boolean
(#44009)
* [C++] IPC: ipc reader/writer code enhancement (#44019)
* [C++][Compute] Reduce the complexity of row segmenter (#44053)
* [C++][Parquet] Add Float16 reading benchmarks (#44073)
* [C++][Parquet] Remove deprecated APIs (#44080)
* [C++][Acero] Add more row segmenter tests (#44166)
* [C++][Parquet] Fix typo in parquet/column_writer.cc (#40856)
* [C++] Avoid repeated ArrayData::offset lookups (#44190)
* [C++][Gandiva] Accept LLVM 19.1 (#44233)
* [C++] Unify SIMD header includes (#44250)
* [C++][Decimal] Use 0E+1 not 0.E+1 for broader compatibility
(#44275)
* [Packaging][C++] Enable Azure file system for deb/rpm (#44348)
- Drop apache-arrow-pr43766-boost1_86.patch
- Release notes for 18.0.0 and 19.0.0
-------------------------------------------------------------------
Fri Sep 27 05:31:41 UTC 2024 - Guang Yee <gyee@suse.com>
- Set the appropriate C++ compiler for the given platform so
it will compile on Leap 15.x.
-------------------------------------------------------------------
Wed Sep 18 06:59:36 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Add apache-arrow-pr43766-boost1_86.patch for Boost 1.86
* gh#apache/arrow#43766
-------------------------------------------------------------------
Mon Aug 12 17:11:06 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 17.0.0
## Bug Fixes
* [C++] Add option to string ‘center’ kernel to control
left/right alignment on odd number of padding (#41449)
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [C++] Replace null_count with MayHaveNulls in
ListArrayFromArray and MapArray (#41957)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [C++][Parquet] Timestamp conversion from Parquet to Arrow does
not follow compatibility guidelines for convertedType
* [C++] Use LargeStringArray for casting when writing tables to
CSV (#40271)
* [C++][Python] Map child Array constructed from keys and items
shouldn’t have offset (#40871)
* [C++] Fix compile warning with ‘implicitly-defined constructor
does not initialize’ in encoding_benchmark (#41060)
* [C++] Get null_bit_id according to are_cols_in_encoding_order
in NullUpdateColumnToRow_avx2 (#40998)
* [C++] Clean up unused parameter warnings (#41111)
* [C++][Acero] Fix asof join race (#41614)
* [C++] support for single threaded joins (#41125)
* [C++] Fix hashjoin benchmark failed at make utf8’s random
batches (#41195)
* [C++] Check to avoid copying when NullBitmapBuffer is Null
(#41452)
* [C++] Fix crash on invalid Parquet file (#41366)
* [C++][Parquet] More strict Parquet level checking (#41346)
* [C++][Gandiva] Fix gandiva cache size env var (#41330)
* [C++][CMake][Windows] Remove needless .dll suffix from link
libraries (#41341)
* [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
* [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
* [C++] [Large]ListView and Map nested types for scalar_if_else’s
kernel functions (#41419)
* [C++][Gandiva] Fix ascii_utf8 function to return same result on
x86 and Arm (#41434)
* [C++] Reuse deduplication logic for direct registration
(#41466)
* [C++] Clean up more redundant move warnings (#41487)
* [C++][Compute] Remove redundant logic for ArrayData as
ExecResults in ExecScalarCaseWhen (#41380)
* [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
* [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
* [C++][Acero] Remove a useless parameter for QueryContext::Init
called in hash_join_benchmark (#41716)
* [C++] Fix the issue that temp vector stack may be undersized
(#41746)
* [C++] Check that extension metadata key is present before
attempting to delete it (#41763)
* [C++] Iterator releases its resource immediately when it reads
all values (#41824)
* [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
* [C++] Fix avx2 gather offset larger than 2GB in
CompareColumnsToRows (#42188)
* [C++][S3] Fix potential deadlock when closing output stream
(#41876)
* [CI][C++] Clear cache for mamba on AppVeyor (#41977)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [C++] Support list-views on list_slice (#42067)
* [C++] Fix an OTel test failure and remove needless logs
(#42122)
* [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol
(#42108)
* [C++] Support list-view typed arrays in array_take and
array_filter (#42117)
* [C++] Fix some potential uninitialized variable warnings
(#42207)
* [C++] Avoid invalid accesses in parquet-encoding-benchmark
(#42141)
* [C++] Use FetchContent for bundled ORC (#43011)
* [C++] Fix GetRecordBatchPayload crashes for device data
(#42199)
* [C++] Use non-stale c-ares download URL (#42250)
* [C++][Parquet] Check for valid ciphertext length to prevent
segfault (#43071)
* [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as
large memory test (#43128)
* [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
## New Features and Improvements
* [C++][Compute] Implement Grouper::Reset (#41352)
* [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
* [C++][FS][Azure] Support azure cli auth (#41976)
* [C++][FS][Azure] Add support for environment credential
(#41715)
* [C++] Optimize Take for fixed-size types including nested
fixed-size lists (#41297)
* [C++][Device] Add Copy/View slice functions to a CPU pointer
(#41477)
* [C++] Add support for OpenTelemetry logging (#39905)
* [C++] Import/Export ArrowDeviceArrayStream (#40807)
* [C++] move LocalFileSystem to the registry (#40356)
* [C++] Make flatbuffers serialization more deterministic
(#40392)
* [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like
function (#40970)
* [C++] Introduce portable compiler assumptions (#41021)
* [C++] Add a grouper benchmark for preventing performance
regression (#41036)
* [C++] Support flatten for combining nested list related types
(#41092)
* [C++] Clean up remaining tasks related to half float casts
(#41084)
* [C++][FS][Azure] Add support for CopyFile with hierarchical
namespace support (#41276)
* [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
* [C++] IO: enhance boundary checking in CompressedInputStream
(#41117)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst
(#41187)
* [C++] Extract the kernel loops used for PrimitiveTakeExec and
generalize to any fixed-width type (#41373)
* [C++][Acero] Use per-node basis temp vector stack to mitigate
overflow (#41335)
* [C++][Parquet] Optimize DelimitRecords by batch execution when
max_rep_level > 1 (#41362)
* [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API
reference (#41411)
* [C++] Use ASAN to poison temp vector stack memory (#41695)
* [C++][S3] Add a new option to check existence before CreateDir
(#41822)
* [C++][Parquet] Fix
DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
* [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
* [C++] Improve fixed_width_test_util.h (#41575)
* [C++] ChunkResolver: Implement ResolveMany and add unit tests
(#41561)
* [C++] fixed_width_internal.h: Simplify docstring and support
bit-sized types (BOOL) (#41597)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [C++][CMake][Windows] Don’t build needless object libraries
(#41658)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [C++][Parquet] Thrift: generate template method to accelerate
reading thrift (#41703)
* [C++][Parquet] Minor: moving EncodedStats by default rather
than copying (#41727)
* [C++][ORC] Ensure setting detected ORC version (#41767)
* [C++][Parquet] Add file metadata read/write benchmark (#41761)
* [C++] Make git-dependent definitions internal (#41781)
* [C++][S3] Remove GetBucketRegion hack for newer AWS SDK
versions (#41798)
* [C++][Parquet] normalize dictionary encoding to use
RLE_DICTIONARY (#41819)
* [C++] IPC: Minor enhance the code of writer (#41900)
* [C++] Fix ExecuteScalar deduce all_scalar with chunked_array
(#41925)
* [C++] Minor enhance code style for FixedShapeTensorType
(#41954)
* [C++] Follow up of adding null_bitmap to MapArray::FromArrays
(#41956)
* [C++] Misc changes making code around list-like types and
list-view types behave the same way (#41971)
* [C++] kernel.cc: Remove defaults on switch so that compiler
can check full enum coverage for us (#41995)
* [C++][Parquet] ParquetFilePrinter::JSONPrint print length of
FLBA (#41981)
* [C++][CMake] Add preset for Valgrind (#42110)
* [C++] Move TakeXXX free functions into TakeMetaFunction and
make them private (#42127)
* [C++][FS][Azure] Validate
AzureOptions::{blob,dfs}_storage_scheme (#42135)
* [C++] list_parent_indices: Add support for list-view types
(#42236)
* [C++] Reduce the recursion of many-join test (#43042)
* [C++] Limit buffer size in BufferedInputStream::SetBufferSize
with raw_read_bound (#43064)
- Require cmake lz4 for 1.10
-------------------------------------------------------------------
Sun Apr 21 16:35:21 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 16.0.0
## Bug Fixes
* [C++][ORC] Catch all ORC exceptions to avoid crash (#40697)
* [C++][S3] Handle conventional content-type for directories
(#40147)
* [C++] Strengthen handling of duplicate slashes in S3, GCS
(#40371)
* [C++] Avoid hash_mean overflow (#39349)
* [C++] Fix spelling (array) (#38963)
* [C++][Parquet] Fix crash in Modular Encryption (#39623)
* [C++][Dataset] Fix failures in dataset-scanner-benchmark
(#39794)
* [C++][Device] Fix Importing nested and string types for
DeviceArray (#39770)
* [C++] Use correct (non-CPU) address of buffer in
ExportDeviceArray (#39783)
* [C++] Improve error message for "chunker out of sync" condition
(#39892)
* [C++] Use make -j1 to install bundled bzip2 (#39956)
* [C++] DatasetWriter avoid creating zero-sized batch when
max_rows_per_file enabled (#39995)
* [C++][CI] Disable debug memory pool for ASAN and Valgrind
(#39975)
* [C++][Gandiva] Make Gandiva's default cache size to be 5000 for
object code cache (#40041)
* [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash
issues on hierarchical namespace accounts (#40054)
* [C++][FS][Azure] Validate containers in
AzureFileSystem::Impl::MovePaths() (#40086)
* [C++] Decimal types with different precisions and scales bind
failed in resolve type when call arithmetic function (#40223)
* [C++][Docs] Correct the console emitter link (#40146)
* [C++][Python] Fix test_gdb failures on 32-bit (#40293)
* [Python][C++] Fix large file handling on 32-bit Python build
(#40176)
* [C++] Support glog 0.7 build (#40230)
* [C++] Fix cast function bind failed after add an alias name
through AddAlias (#40200)
* [C++] TakeCC: Concatenate only once and delegate to TakeAA
instead of TakeCA (#40206)
* [C++] Fix an abort on asof_join_benchmark run caused by a
missing arg (#40234)
* [C++] Fix a simple buffer-overflow case in decimal_benchmark
(#40277)
* [C++] Reduce S3Client initialization time (#40299)
* [C++] Fix a wrong total_bytes to generate StringType's test
data in vector_hash_benchmark (#40307)
* [C++][Gandiva] Add support for compute module's decimal
promotion rules (#40434)
* [C++][Parquet] Add missing config.h include in
key_management_test.cc (#40330)
* [C++][CMake] Add missing glog::glog dependency to arrow_util
(#40332)
* [C++][Gandiva] Add missing OpenSSL dependency to
encrypt_utils_test.cc (#40338)
* [C++] Remove const qualifier from Buffer::mutable_span_as
(#40367)
* [C++] Avoid simplifying expressions which call impure functions
(#40396)
* [C++] Expose protobuf dependency if opentelemetry or ORC are
enabled (#40399)
* [C++][FlightRPC] Add missing expiration_time arguments (#40425)
* [C++] Move key_hash/key_map/light_array related files to
internal to prevent use by users (#40484)
* [C++] Add missing Threads::Threads dependency to arrow_static
(#40433)
* [C++] Fix static build on Windows (#40446)
* [C++] Ensure using bundled FlatBuffers (#40519)
* [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559)
* [C++] Repair FileSystem merge error (#40564)
* [C++] Fix 3.12 Python support (#40322)
* [C++] Move mold linker flags to variables (#40603)
* [C++] Enlarge dest buffer according to dest offset for
CopyBitmap benchmark (#40769)
* [C++][Gandiva] 'ilike' function does not work (#40728)
* [C++] Fix protobuf package name setting for builds with
substrait (#40753)
* [C++][ORC] Fix std::filesystem related link error with ORC
2.0.0 or later (#41023)
* [C++] Fix TSAN link error for module library (#40864)
* [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with
Valgrind (#41163)
* [C++] Fix null count check in BooleanArray.true_count()
(#41070)
* [C++] IO: fixing compiling in gcc 7.5.0 (#41025)
* [C++][Parquet] Bugfixes and more tests in boolean arrow
decoding (#41037)
* [C++] formatting.h: Make sure space is allocated for the 'Z'
when formatting timestamps (#41045)
* [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12
(#41062)
* [C++] Fix: left anti join filter empty rows. (#41122)
* [CI][C++] Don't use CMake 3.29.1 with vcpkg (#41151)
* [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150)
* [CI][R][C++] test-r-linux-valgrind has started failing
* [C++][Python] Sporadic asof_join failures in PyArrow
* [C++] Fix Valgrind error in string-to-float16 conversion
(#41155)
* [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake
(#41177)
* [C++] Fix mistake in integration test. Explicitly cast
std::string to avoid compiler interpreting char* -> bool
(#41202)
## New Features and Improvements
* [C++] Filesystem implementation for Azure Blob Storage
* [C++] Implement cast to/from halffloat (#40067)
* [C++] Add residual filter support to swiss join (#39487)
* [C++] Add support for building with Emscripten (#37821)
* [C++][Python] Add missing methods to RecordBatch (#39506)
* [C++][Java][Flight RPC] Add Session management messages
(#34817)
* [C++] build filesystems as separate modules (#39067)
* [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations
using xsimd (#40335)
* [C++] Add support for service-specific endpoint for S3 using
AWS_ENDPOINT_URL_S3 (#39160)
* [C++][FS][Azure] Implement DeleteFile() (#39840)
* [C++] Implement Azure FileSystem Move() via Azure DataLake
Storage Gen 2 API (#39904)
* [C++] Add ImportChunkedArray and ExportChunkedArray to/from
ArrowArrayStream (#39455)
* [CI][C++][Go] Don't run jobs that use a self-hosted GitHub
Actions Runner on fork (#39903)
* [C++][FS][Azure] Use the generic filesystem tests (#40567)
* [C++][Compute] Add binary_slice kernel for fixed size binary
(#39245)
* [C++] Avoid creating memory manager instance for every buffer
view/copy (#39271)
* [C++][Parquet] Minor: Style enhancement for
parquet::FileMetaData (#39337)
* [C++] IO: Reuse same buffer in CompressedInputStream (#39807)
* [C++] Use more permissible return code for rename (#39481)
* [C++][Parquet] Use std::count in ColumnReader ReadLevels
(#39397)
* [C++] Support cast kernel from large string, (large) binary to
dictionary (#40017)
* [C++] Pass -jN to make in external projects (#39550)
* [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT
(#39570)
* [C++] Ensure top-level benchmarks present informative metrics
(#40091)
* [C++] Ensure CSV and JSON benchmarks present a bytes/s or
items/s metric (#39764)
* [C++] Ensure dataset benchmarks present a bytes/s or items/s
metric (#39766)
* [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or
items/s metric (#40435)
* [C++][Parquet] Benchmark levels decoding (#39705)
* [C++][FS][Azure] Remove StatusFromErrorResponse as it's not
necessary (#39719)
* [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic
(#39748)
* [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types
(#39772)
* [C++] Document and micro-optimize ChunkResolver::Resolve()
(#39817)
* [C++] Allow building cpp/src/arrow/**/*.cc without waiting
bundled libraries (#39824)
* [C++][Parquet] Parquet binary length overflow exception should
contain the length of binary (#39844)
* [C++][Parquet] Minor: avoid creating a new Reader object in
Decoder::SetData (#39847)
* [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878)
* [C++] DataType::ToString support optionally show metadata
(#39888)
* [C++][Gandiva] Accept LLVM 18 (#39934)
* [C++] Use Requires instead of Libs for system RE2 in arrow.pc
(#39932)
* [C++] Small CSV reader refactoring (#39963)
* [C++][Parquet] Expand BYTE_STREAM_SPLIT to support
FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094)
* [C++][FS][Azure] Add support for reading user defined metadata
(#40671)
* [C++][FS][Azure] Add AzureFileSystem support to
FileSystemFromUri() (#40325)
* [C++][FS][Azure] Make attempted reads and writes against
directories fail fast (#40119)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor
(#40064)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add support for different data types (#40359)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add option to cast NULL to NaN (#40803)
* [C++][FS][Azure] Implement DeleteFile() for flat-namespace
storage accounts (#40075)
* [CI][C++] Add a job on ARM64 macOS (#40456)
* [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT
encoding (#40127)
* [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length
(#40132)
* [C++] Make S3 narrative test more flexible (#40144)
* [C++] Remove redundant invocation of BatchesFromTable (#40173)
* [C++][CMake] Use "RapidJSON" CMake target for RapidJSON
(#40210)
* [C++][CMake] Use arrow/util/config.h.cmake instead of
add_definitions() (#40222)
* [C++] Fix: improve the backpressure handling in the dataset
writer (#40722)
* [C++][CMake] Improve description why we need to initialize AWS
C++ SDK in arrow-s3fs-test (#40229)
* [C++] Add support for system glog 0.7 (#40275)
* [C++] Specialize ResolvedChunk::Value on value-specific types
instead of entire class (#40281)
* [C++][Docs] Add documentation of array factories (#40373)
* [C++][Parquet] Allow use of FileDecryptionProperties after the
CryptoFactory is destroyed (#40329)
* [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection
(#40084)
* [C++] Add benchmark for ToTensor conversions (#40358)
* [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372)
* [C++] Add support for mold (#40397)
* [C++] Add support for LLD (#40927)
* [C++] Produce better error message when Move is attempted on
flat-namespace accounts (#40406)
* [C++][ORC] Upgrade ORC to 2.0.0 (#40508)
* [CI][C++] Don't install FlatBuffers (#40541)
* [C++] Ensure pkg-config flags include -ldl for static builds
(#40578)
* [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
* [C++] Rename Function::is_impure() to is_pure() (#40608)
* [C++] Add missing util/config.h in arrow/io/compressed_test.cc
(#40625)
* [Python][C++] Support conversion of pyarrow.RunEndEncodedArray
to numpy/pandas (#40661)
* [C++] Expand Substrait type support (#40696)
* [C++] Create registry for Devices to map DeviceType to
MemoryManager in C Device Data import (#40699)
* [C++][Parquet] Minor enhancement code of encryption (#40732)
* [C++][Parquet] Simplify PageWriter and ColumnWriter creation
(#40768)
* [C++] Re-order loads and stores in MemoryPoolStats update
(#40647)
* [C++] Revert changes from PR #40857 (#40980)
* [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857)
* [C++] Thirdparty: bump zstd to 1.5.6 (#40837)
* [Docs][C++][Python] Add initial documentation for
RecordBatch::Tensor conversion (#40842)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add support for row-major (#40867)
* [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap)
for PlainBooleanDecoder (#40876)
* [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes
(#40883)
* [C++] Fix unused function build error (#40984)
* [C++][Parquet] RleBooleanDecoder supports DecodeArrow with
nulls (#40995)
* [C++][FS][Azure] Adjust
DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors
against Azure for generic filesystem tests (#41068)
* [C++][Parquet] Avoid allocating buffer object in RecordReader's
SkipRecords (#39818)
- Drop apache-arrow-pr40230-glog-0.7.patch
- Drop apache-arrow-pr40275-glog-0.7-2.patch
- Belated inclusion of submission without changelog by
Shani Hadiyanto <shanipribadi@gmail.com>
* disable static devel packages by default: The CMake targets
require them for all builds, if not disabled
* Add subpackages for Apache Arrow Flight and Flight SQL
-------------------------------------------------------------------
Sat Mar 23 15:23:23 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 15.0.2
## Bug Fixes
* [C++][Acero] Increase size of Acero TempStack (#40007)
* [C++][Dataset] Add missing Protobuf static link dependency
(#40015)
* [C++] Possible data race when reading metadata of a parquet
file (#40111)
* [C++] Make span SFINAE standards-conforming to enable
compilation with nvcc (#40253)
-------------------------------------------------------------------
Wed Feb 28 08:08:44 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Reenable logging
* Add apache-arrow-pr40230-glog-0.7.patch
* Add apache-arrow-pr40275-glog-0.7-2.patch
* now requires glog devel files to be present for
apache-arrow-devel; ArrowConfig.cmake fails otherwise
* gh#apache/arrow#40181
* gh#apache/arrow#40230
* gh#apache/arrow#40275
-------------------------------------------------------------------
Fri Feb 23 17:35:45 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 15.0.1
## Bug Fixes
* [C++] "iso_calendar" kernel returns incorrect results for array
length > 32 (#39360)
* [C++] Explicit error in ExecBatchBuilder when appending var
length data exceeds offset limit (int32 max) (#39383)
* [C++][Parquet] Pass memory pool to decoders (#39526)
* [C++][Parquet] Validate page sizes before truncating to int32
(#39528)
* [C++] Fix tail-word access cross buffer boundary in
`CompareBinaryColumnToRow` (#39606)
* [C++] Fix the issue of ExecBatchBuilder when appending
consecutive tail rows with the same id may exceed buffer
boundary (for fixed size types) (#39585)
* [Release] Update platform tags for macOS wheels to macosx_10_15
(#39657)
* [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711)
* [C++] Fix tail-byte access cross buffer boundary in key hash
avx2 (#39800)
* [C++][Acero] Fix AsOfJoin with differently ordered schemas than
the output (#39804)
* [C++] Expression ExecuteScalarExpression execute empty args
function with a wrong result (#39908)
* [C++] Strip extension metadata when importing a registered
extension (#39866)
* [C#] Restore support for .NET 4.6.2 (#40008)
* [C++] Fix out-of-line data size calculation in
BinaryViewBuilder::AppendArraySlice (#39994)
* [C++][CI][Parquet] Fixing parquet column_writer_test building
(#40175)
## New Features and Improvements
* [C++] PollFlightInfo does not follow rule of 5
* [C++] Fix filter and take kernel for month_day_nano intervals
(#39795)
* [C++] Thirdparty: Bump zlib to 1.3.1 (#39877)
* [C++] Add missing "#include <algorithm>" (#40010)
- Release 15.0.0
## Bug Fixes
* [C++] Bring back case_when tests for union types (#39308)
* [C++] Fix the issue of ExecBatchBuilder when appending
consecutive tail rows with the same id may exceed buffer
boundary (#39234)
* [C++][Python] Add a no-op kernel for
dictionary_encode(dictionary) (#38349)
* [C++] Use the latest tagged version of flatbuffers (#38192)
* [C++] Don't use MSVC_VERSION to determine
-fms-compatibility-version (#36595)
* [C++] Optimize hash kernels for Dictionary ChunkedArrays
(#38394)
* [C++][Gandiva] Avoid registering exported functions multiple
times in gandiva (#37752)
* [C++][Acero] Fix race condition caused by straggling input in
the as-of-join node (#37839)
* [C++][Parquet] add more closed file checks for
ParquetFileWriter (#38390)
* [C++][FlightRPC] Add missing app_metadata arguments (#38231)
* [C++][Parquet] Fix Valgrind memory leak in
arrow-dataset-file-parquet-encryption-test (#38306)
* [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL
1.1 (#38379)
* [C++] Re-generate flatbuffers C++ for Skyhook (#38405)
* [C++] Avoid passing null pointer to LZ4 frame decompressor
(#39125)
* [C++] Add missing explicit size_t cast for i386 (#38557)
* [C++] Fix: add TestingEqualOptions for gtest functions.
(#38642)
* [C++][Gandiva] Use arrow io util to replace
std::filesystem::path in gandiva (#38698)
* [C++] Protect against PREALLOCATE preprocessor defined on macOS
(#38760)
* [C++] Check variadic buffer counts in bounds (#38740)
* [C++][FS][Azure] Do nothing for CreateDir("/container", true)
(#38783)
* Fix TestArrowReaderAdHoc.ReadFloat16Files to use new
uncompressed files (#38825)
* [C++] S3FileSystem export s3 sdk config
"use_virtual_addressing" to arrow::fs::S3Options (#38858)
* [C++][Gandiva] Fix Gandiva to_date function's validation for
suppress errors parameter (#38987)
* [C++][Parquet] Fix spelling (#38959)
* [C++] Fix spelling (acero) (#38961)
* [C++] Fix spelling (compute) (#38965)
* [C++] Fix spelling (util) (#38967)
* [C++] Fix spelling (dataset) (#38969)
* [C++] Fix spelling (filesystem) (#38972)
* [C++] Fix spelling (#38978)
* [C++] Fix spelling (#38980)
* [C++][Acero] union node output batches should be unordered
(#39046)
* [C++][CI] Fix Valgrind failures (#39127)
* [C++] Remove needless system Protobuf dependency with
-DARROW_HDFS=ON (#39137)
* [C++][Compute] Fix negative duration division (#39158)
* [C++] Add missing data copy in StreamDecoder::Consume(data)
(#39164)
* [C++] Remove compiler warnings with -Wconversion
-Wno-sign-conversion in public headers (#39186)
* [C++][Benchmarking] Remove hardcoded min times (#39307)
* [C++] Don't use "if constexpr" in lambda (#39334)
* [C++] Disable -Werror=attributes for Azure SDK's identity.hpp
(#39448)
* [C++] Fix compile warning (#39389)
* [CI][JS] Force node 20 on JS build on arm64 to fix build issues
(#39499)
* [C++] Disable parallelism for jemalloc external project
(#39522)
* [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering
(#39632)
* [C++] Disable parallelism for all `make`-based externalProjects
when CMake >= 3.28 is used
## New Features and Improvements
* [C++][JSON] Change the max rows to Unlimited(int_32) (#38582)
* [C++][Python] Add "Z" to the end of timestamp print string when
tz defined (#39272)
* [C++][Python] DLPack implementation for Arrow Arrays (producer)
(#38472)
* [C++] Diffing of Run-End Encoded arrays (#35003)
* [C++][Python][R] Allow users to adjust S3 log level by
environment variable (#38267)
* [C++][Format] Implementation of the LIST_VIEW and
LARGE_LIST_VIEW array formats (#35345)
* [C++] Use Cast() instead of CastTo() for Scalar in test
(#39044)
* [C++][Python][Parquet] Implement Float16 logical type (#36073)
* [C++] Add Utf8View and BinaryView to the c ABI (#38443)
* [C++][Parquet] Add api to get RecordReader from RowGroupReader
(#37003)
* [C++] Expose a span converter for Buffer and ArraySpan (#38027)
* [C++] Add A Dictionary Compaction Function For DictionaryArray
(#37418)
* [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970)
* [C++] Implement file reads for Azure filesystem (#38269)
* [C++][Integration] Add C++ Utf8View implementation (#37792)
* [C++][Gandiva] Add external function registry support (#38116)
* [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC
v2/LLJIT (#39098)
* [C++] Feature: support concatenate recordbatches. (#37896)
* [C++] Add support for specifying custom Array opening and
closing delimiters to arrow::PrettyPrintDelimiters (#38187)
* [R] Allow code() to return package name prefix. (#38144)
* [C++][Benchmark] Add non-stream Codec Compression/Decompression
(#38067)
* [C++][Parquet] Change DictEncoder dtor checking to warning log
(#38118)
* [C++][Parquet] Support reading parquet files with multiple gzip
members (#38272)
* [C++][Parquet] Check that the decompressed page size matches
the size in the page header (#38327)
* [C++][Azure] Use properties for input stream metadata (#38524)
* [C++][FS][Azure] Implement file writes (#38780)
* [C++] Implement GetFileInfo for a single file in Azure
filesystem (#38505)
* [C++][CMake] Use transitive dependency for system GoogleTest
(#38340)
* [C++][Parquet] Use new encrypted files for page index
encryption test (#38347)
* Add validation logic for offsets and values to
arrow.array.ListArray.fromArrays (#38531)
* [C++][Acero] Create a sorted merge node (#38380)
* [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression
(#38453)
* [C++] Support LogicalNullCount for DictionaryArray (#38681)
* [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529)
* [C++][Gandiva] Support registering external C functions
(#38632)
* [C++] Implement GetFileInfo(selector) for Azure filesystem
(#39009)
* [C++][FS][Azure] Implement CreateDir() (#38708)
* [C++][FS][Azure] Implement DeleteDir() (#38793)
* [C++][FS][Azure] Implement DeleteDirContents() (#38888)
* [C++] Implement AzureFileSystem::DeleteRootDirContents
(#39151)
* [C++][FS][Azure] Implement CopyFile() (#39058)
* [C++][Go][Parquet] Add tests for reading Float16 files in
parquet-testing (#38753)
* [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773)
* [C++] Implement directory semantics even when the storage
account doesn't support HNS (#39361)
* [C++][Parquet] Update parquet.thrift to sync with 2.10.0
(#38815)
* [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to
ARROW_WITH_ZLIB (#38853)
* [C++][Parquet] Using length to optimize bloom filter read
(#38863)
* [C++][Parquet] Minor: making parquet TypedComparator operation
as const method (#38875)
* [C++] DatasetWriter release rows_in_flight_throttle when
allocate writing failed (#38885)
* [C++][Parquet] Move EstimatedBufferedValueBytes from
TypedColumnWriter to ColumnWriter (#39055)
* [C++] Stop installing internal bpacking_simd* headers (#38908)
* [C++][Gandiva] Refactor function holder to return arrow Result
(#38873)
* [C++] Use Cast() instead of CastTo() for Dictionary Scalar in
test (#39362)
* [C++] Use Cast() instead of CastTo() for Timestamp Scalar in
test (#39060)
* [C++] Use Cast() instead of CastTo() for List Scalar in test
(#39353)
* [C++][Parquet] Support row group filtering for nested paths for
struct fields (#39065)
* [C++] Refactor the Azure FS tests and filesystem class
instantiation (#39207)
* [C++][Parquet] Optimize FLBA record reader (#39124)
* Create module info compiler plugin (#39135)
* [C++] Try to make Buffer::device_type_ non-optional (#39150)
* [C++][Parquet] Remove deprecated AppendRowGroup(int64_t
num_rows) (#39209)
* [C++][Parquet] Prevent WriteRecordBatch from producing a
zero-sized RowGroup (#39211)
* [C++] Support binary to fixed_size_binary cast (#39236)
* [C++][Azure][FS] Add default credential auth configuration
(#39263)
* [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+
(#39269)
* [C++][FS] Remove the AzureBackend enum and add more flexible
connection options (#39293)
* [C++][FS] Inform caller of container not-existing when
checking for HNS support (#39298)
* [C++][FS][Azure] Add workload identity auth configuration
(#39319)
* [C++][FS][Azure] Add managed identity auth configuration
(#39321)
* [C++] Forward arguments to ExceptionToStatus all the way to
Status::FromArgs (#39323)
* [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure
test (#39379)
* [C++] Add ForceCachedHierarchicalNamespaceSupport to help with
testing (#39340)
* [C++][FS][Azure] Add client secret auth configuration (#39346)
* [C++] Reduce function.h includes (#39312)
* [C++] Use Cast() instead of CastTo() for Parquet (#39364)
* [C++][Parquet] Vectorize decode plain on FLBA (#39414)
* [C++][Parquet] Style: Using arrow::Buffer data_as api rather
than reinterpret_cast (#39420)
* [C++][ORC] Upgrade ORC to 1.9.2 (#39431)
* [C++] Use default Azure credentials implicitly and support
anonymous credentials explicitly (#39450)
* [C++][Parquet] Allow reading dictionary without reading data
via ByteArrayDictionaryRecordReader (#39153)
- Disable logging until compatibility with glog is restored
gh#apache/arrow#40181
-------------------------------------------------------------------
Mon Jan 15 20:38:45 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 14.0.2
## New Features and Improvements
* GH-38449 - [Release][Go][macOS] Use local test data if possible
(#38450)
* GH-38591 - [Parquet][C++] Remove redundant open calls in
ParquetFileFormat::GetReaderAsync (#38621)
## Bug Fixes
* GH-38345 - [Release] Use local test data for verification if
possible (#38362)
* GH-38438 - [C++] Dataset: Trying to fix the async bug in
Parquet dataset (#38466)
* GH-38577 - Reading parquet file behavior change from 13.0.0 to
14.0.0
* GH-38618 - [C++] S3FileSystem: fix regression in deleting
explicitly created sub-directories (#38845)
* GH-38861 - [C++] Add missing “-framework Security” to
Libs.private in arrow.pc (#38869)
* GH-39072 - [Release][CI] Python3.11-devel is required for the
verification job on AlmaLinux 8 (#39073)
* GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS
(#39082)
-------------------------------------------------------------------
Thu Jan 11 20:27:13 UTC 2024 - pgajdos@suse.com
- disable some tests for s390x [bsc#1218592]
-------------------------------------------------------------------
Mon Nov 13 23:51:00 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>
- update to 14.0.1
* GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
* GH-38607 - [Python] Disable PyExtensionType autoload
- update to 14.0.0
* very long list of changes can be found here:
https://arrow.apache.org/release/14.0.0.html
-------------------------------------------------------------------
Fri Aug 25 09:05:09 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 13.0.0
## Acero
* Handling of unaligned buffers in input nodes can be configured
programmatically or by setting the environment variable
ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when
an unaligned buffer is detected GH-35498.
## Compute
* Several new functions have been added:
- aggregate functions “first”, “last”, “first_last” GH-34911;
- vector functions “cumulative_prod”, “cumulative_min”,
“cumulative_max” GH-32190;
- vector function “pairwise_diff” GH-35786.
* Sorting now works on dictionary arrays, with a much better
performance than the naive approach of sorting the decoded
dictionary GH-29887. Sorting also works on struct arrays, and
nested sort keys are supported using FieldRef GH-33206.
* The check_overflow option has been removed from
CumulativeSumOptions as it was redundant with the availability
of two different functions: “cumulative_sum” and
“cumulative_sum_checked” GH-35789.
* Run-end encoded filters are efficiently supported GH-35749.
* Duration types are supported with the “is_in” and “index_in”
functions GH-36047. They can be multiplied with all integer
types GH-36128.
* “is_in” and “index_in” now cast their inputs more flexibly:
they first attempt to cast the value set to the input type,
then in the other direction if the former fails GH-36203.
* Multiple bugs have been fixed in “utf8_slice_codeunits” when
the stop option is omitted GH-36311.
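
The new vector functions listed above can be sketched in plain Python
to show what each one computes. This is only an illustrative model,
not the Arrow implementation: the helper names mirror the kernel names
but are hypothetical stand-ins, null handling is ignored, and the
first `period` slots of the pairwise difference are modelled as None
because they have no predecessor.

```python
from itertools import accumulate
import operator


def cumulative_prod(values):
    """Running product: out[i] = values[0] * ... * values[i]."""
    return list(accumulate(values, operator.mul))


def cumulative_min(values):
    """Running minimum: out[i] = min(values[0..i])."""
    return list(accumulate(values, min))


def cumulative_max(values):
    """Running maximum: out[i] = max(values[0..i])."""
    return list(accumulate(values, max))


def pairwise_diff(values, period=1):
    """out[i] = values[i] - values[i - period]; None where no predecessor exists."""
    return [
        None if i < period else values[i] - values[i - period]
        for i in range(len(values))
    ]


data = [3, 1, 4, 1, 5]
print(cumulative_prod(data))  # [3, 3, 12, 12, 60]
print(cumulative_min(data))   # [3, 1, 1, 1, 1]
print(cumulative_max(data))   # [3, 3, 4, 4, 5]
print(pairwise_diff(data))    # [None, -2, 3, -3, 4]
```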
## Dataset
* A custom schema can now be passed when writing a dataset
GH-35730. The custom schema can alter nullability or metadata
information, but is not allowed to change the datatypes
written.
## Filesystems
* The S3 filesystem now writes files in equal-sized chunks, for
compatibility with Cloudflare’s “R2” Storage GH-34363.
* A long-standing issue where S3 support could crash at shutdown
because of resources still being alive after S3 finalization
has been fixed GH-36346. Now, attempts to use S3 resources
(such as making filesystem calls) after S3 finalization should
result in a clean error.
* The GCS filesystem accepts a new option to set the project id
GH-36227.
## IPC
* Nullability and metadata information for sub-fields of map
types is now preserved when deserializing Arrow IPC GH-35297.
## Orc
* The Orc adapter now maps Arrow field metadata to Orc type
attributes when writing, and vice-versa when reading GH-35304.
## Parquet
* It is now possible to write additional metadata while a
ParquetFileWriter is open GH-34888.
* Writing a page index can be enabled selectively per-column
GH-34949. In addition, page header statistics are not written
anymore if the page index is enabled for the given column
GH-34375, as the information would be redundant and less
efficiently accessed.
* Parquet writer properties allow specifying the sorting columns
GH-35331. The user is responsible for ensuring that the data
written to the file actually complies with the given sorting.
* CRC computation has been implemented for v2 data pages
GH-35171. It was already implemented for v1 data pages.
* Writing compliant nested types is now enabled by default
GH-29781. This should not have any negative implication.
* Attempting to load a subset of an Arrow extension type is now
forbidden GH-20385. Previously, if an extension type’s storage
is nested (for example a “Point” extension type backed by a
struct<x: float64, y: float64>), it was possible to load
selectively some of the columns of the storage type.
## Substrait
* Support for various functions has been added: “stddev”,
“variance”, “first”, “last” (GH-35247, GH-35506).
* Deserializing sorts is now supported GH-32763. However, some
features, such as clustered sort direction or custom sort
functions, are not implemented.
## Miscellaneous
* FieldRef sports additional methods to get a flattened version
of nested fields GH-14946. Compared to their non-flattened
counterparts, the methods GetFlattened, GetAllFlattened,
GetOneFlattened and GetOneOrNoneFlattened combine a child’s
null bitmap with its ancestors’ null bitmaps so as to compute
the field’s overall logical validity bitmap.
* In other words, given the struct array [null, {'x': null},
{'x': 5}], FieldRef("x")::Get might return [0, null, 5] while
FieldRef("x")::GetFlattened will always return [null, null, 5].
* Scalar::hash() has been fixed for sliced nested arrays
GH-35360.
* A new floating-point to decimal conversion algorithm exhibits
much better precision GH-35576.
* It is now possible to cast between scalars of different
list-like types GH-36309.
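The difference between the plain and the flattened FieldRef lookups
described above can be sketched in Python with explicit validity
lists. This is a model of the idea only: the functions get and
get_flattened and the list-based layout are hypothetical stand-ins,
not the Arrow C++ API.

```python
# Model of the struct array [null, {'x': null}, {'x': 5}].
parent_valid = [False, True, True]   # validity of the struct slots
child_values = [0, 0, 5]             # physical storage of field "x"
child_valid = [True, False, True]    # validity of field "x" alone


def get(values, valid):
    """Apply only the child's own validity bitmap."""
    return [v if ok else None for v, ok in zip(values, valid)]


def get_flattened(values, valid, ancestors_valid):
    """Combine the child's validity with every ancestor's validity."""
    combined = [
        ok and all(anc[i] for anc in ancestors_valid)
        for i, ok in enumerate(valid)
    ]
    return get(values, combined)


plain = get(child_values, child_valid)
flattened = get_flattened(child_values, child_valid, [parent_valid])
```

In this model the slot under the null parent still exposes its
physical value in the plain lookup, while the flattened lookup
reports it as null.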
-------------------------------------------------------------------
Mon Jun 12 12:13:18 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 12.0.1
* [GH-35423] - [C++][Parquet] Parquet PageReader Force
decompression buffer resize smaller (#35428)
* [GH-35498] - [C++] Relax EnsureAlignment check in Acero from
requiring 64-byte aligned buffers to requiring value-aligned
buffers (#35565)
* [GH-35519] - [C++][Parquet] Fixing exception handling in parquet
FileSerializer (#35520)
* [GH-35538] - [C++] Remove unnecessary status.h include from
protobuf (#35673)
* [GH-35730] - [C++] Add the ability to specify custom schema on a
dataset write (#35860)
* [GH-35850] - [C++] Don't disable optimization with
RelWithDebInfo (#35856)
- Drop cflags.patch -- fixed upstream
-------------------------------------------------------------------
Thu May 18 07:00:43 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 12.0.0
* Run-End Encoded Arrays have been implemented and are accessible
(GH-32104)
* The FixedShapeTensor Logical value type has been implemented
using ExtensionType (GH-15483, GH-34796)
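As a rough illustration of run-end encoding, the Python sketch below
stores one value per run together with the logical index at which
each run ends (exclusive). The function names ree_encode and
ree_decode are made up for this example, and plain lists stand in
for Arrow arrays.

```python
def ree_encode(values):
    """Return (run_ends, run_values) for a list of values."""
    run_ends, run_values = [], []
    for i, v in enumerate(values):
        if run_values and run_values[-1] == v:
            run_ends[-1] = i + 1      # extend the current run
        else:
            run_values.append(v)      # start a new run
            run_ends.append(i + 1)
    return run_ends, run_values


def ree_decode(run_ends, run_values):
    """Expand (run_ends, run_values) back into the full list."""
    out, start = [], 0
    for end, v in zip(run_ends, run_values):
        out.extend([v] * (end - start))
        start = end
    return out


data = ["a", "a", "a", "b", "c", "c"]
run_ends, run_values = ree_encode(data)
```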
## Compute
* New kernel to convert timestamp with timezone to wall time
(GH-33143)
* Cast kernels are now built into libarrow by default (GH-34388)
## Acero
* Acero has been moved out of libarrow into its own shared
library, allowing for smaller builds of the core libarrow
(GH-15280)
* Exec nodes now can have a concept of “ordering” and will reject
non-sensible plans (GH-34136)
* New exec nodes: “pivot_longer” (GH-34266), “order_by”
(GH-34248) and “fetch” (GH-34059)
* Breaking Change: Reorder output fields of “group_by” node so
that keys/segment keys come before aggregates (GH-33616)
## Substrait
* Add support for the round function GH-33588
* Add support for the cast expression element GH-31910
* Added API reference documentation GH-34011
* Added an extension relation to support segmented aggregation
GH-34626
* The output of the aggregate relation now conforms to the spec
GH-34786
## Parquet
* Added support for DeltaLengthByteArray encoding to the Parquet
writer (GH-33024)
* NaNs are correctly handled now for Parquet predicate push-downs
(GH-18481)
* Added support for reading Parquet page indexes (GH-33596) and
writing page indexes (GH-34053)
* Parquet writer can write columns in parallel now (GH-33655)
* Fixed incorrect number of rows in Parquet V2 page headers
(GH-34086)
* Fixed incorrect Parquet page null_count when stats are disabled
(GH-34326)
* Added support for reading BloomFilters to the Parquet Reader
(GH-34665)
* Parquet File-writer can now add additional key-value metadata
after it has been opened (GH-34888)
* Breaking Change: The default row group size for the Arrow
writer changed from 64Mi rows to 1Mi rows. GH-34280
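To get a feel for the effect of the new default row group size, the
Python sketch below counts the row groups for a table, assuming the
writer splits purely by row count and that no explicit size is
passed. The 10,000,000-row table and the helper name row_group_count
are hypothetical.

```python
OLD_DEFAULT_ROWS = 64 * 1024 * 1024   # 64Mi = 67,108,864 rows
NEW_DEFAULT_ROWS = 1 * 1024 * 1024    # 1Mi  =  1,048,576 rows


def row_group_count(num_rows: int, rows_per_group: int) -> int:
    """Number of row groups when splitting only by row count."""
    return -(-num_rows // rows_per_group)  # ceiling division


table_rows = 10_000_000  # example table size
old_groups = row_group_count(table_rows, OLD_DEFAULT_ROWS)
new_groups = row_group_count(table_rows, NEW_DEFAULT_ROWS)
```

Under these assumptions the same table goes from a single row group
to ten, which is why code relying on the old default may want to set
the size explicitly.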
## ORC
* Added support for the union type in ORC writer (GH-34262)
* Fixed ORC CHAR type mapping with Arrow (GH-34823)
* Fixed timestamp type mapping between ORC and Arrow (GH-34590)
## Datasets
* Added support for reading JSON datasets (GH-33209)
* Dataset writer now supports specifying a function callback to
construct the file name in addition to the existing file name
template (GH-34565)
## Filesystems
* GcsFileSystem::OpenInputFile avoids unnecessary downloads
(GH-34051)
## Other changes
* Convenience Append(std::optional...) methods have been added to
array builders (GH-14863)
* A deprecated OpenTelemetry header was removed from the Flight
library (GH-34417)
* Fixed crash in “take” kernels on ExtensionArrays with an
underlying dictionary type (GH-34619)
* Fixed bug where the C-Data bridge did not preserve nullability
of map values on import (GH-34983)
* Added support for EqualOptions to RecordBatch::Equals
(GH-34968)
* zstd dependency upgraded to v1.5.5 (GH-34899)
* Improved handling of “logical” nulls such as with union and
RunEndEncoded arrays (GH-34361)
* Fixed incorrect handling of uncompressed body buffers in IPC
reader, added IpcWriteOptions::min_space_savings for optional
compression optimizations (GH-15102)
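The idea behind a minimum space savings threshold can be sketched in
Python as follows: a buffer is kept compressed only if compression
saves at least the requested fraction of its size. This is a rough
model, not the Arrow implementation. zlib stands in for the IPC
codecs, and the function name maybe_compress and the exact
comparison are assumptions.

```python
import zlib


def maybe_compress(body: bytes, min_space_savings: float):
    """Return (is_compressed, payload) for one body buffer."""
    if not body:
        return False, body
    compressed = zlib.compress(body)
    # Fraction of the original size that compression saves.
    savings = 1.0 - len(compressed) / len(body)
    if savings >= min_space_savings:
        return True, compressed
    return False, body  # not worth it, keep the raw bytes


repetitive = b"a" * 1000          # compresses very well
distinct = bytes(range(256))      # no repetition to exploit
kept_compressed, _ = maybe_compress(repetitive, 0.1)
kept_raw, raw_payload = maybe_compress(distinct, 0.1)
```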
-------------------------------------------------------------------
Mon Apr 3 11:09:06 UTC 2023 - Andreas Schwab <schwab@suse.de>
- cflags.patch: fix option order to compile with optimisation
- Adjust constraints
-------------------------------------------------------------------
Wed Mar 29 13:13:13 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Remove gflags-static. It was only needed due to a packaging error
with gflags which is about to be fixed in Tumbleweed
- Disable build of the jemalloc memory pool backend
* It requires every consuming application to LD_PRELOAD
libjemalloc.so.2, even when it is not set as the default memory
pool, due to static TLS block allocation errors
* Usage of the bundled jemalloc as a workaround is not desired
(gh#apache/arrow#13739)
* jemalloc does not seem to have a clear advantage over the
system glibc allocator:
https://ursalabs.org/blog/2021-r-benchmarks-part-1
* This overrides the default behavior documented in
https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool
-------------------------------------------------------------------
Sun Mar 12 04:28:52 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to v11.0.0
* ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
* ARROW-11776 - [C++][Java] Support parquet write from ArrowReader
to file (#14151)
* ARROW-13938 - [C++] Date and datetime types should autocast from
strings
* ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
* ARROW-14999 - [C++] Optional field name equality checks for map
and list type (#14847)
* ARROW-15538 - [C++] Expanding coverage of math functions from
Substrait to Acero (#14434)
* ARROW-15592 - [C++] Add support for custom output field names in
a substrait::PlanRel (#14292)
* ARROW-15732 - [C++] Do not use any CPU threads in execution plan
when use_threads is false (#15104)
* ARROW-16782 - [Format] Add REE definitions to FlatBuffers
(#14176)
* ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656)
* ARROW-17301 - [C++] Implement compute function "binary_slice"
(#14550)
* ARROW-17509 - [C++] Simplify async scheduler by removing the
need to call End (#14524)
* ARROW-17520 - [C++] Implement SubStrait SetRel (UnionAll)
(#14186)
* ARROW-17610 - [C++] Support additional source types in
SourceNode (#14207)
* ARROW-17613 - [C++] Add function execution API for a
preconfigured kernel (#14043)
* ARROW-17640 - [C++] Add File Handling Test cases for GlobFile
handling in Substrait Read (#14132)
* ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to
Parquet writer (#14191)
* ARROW-17825 - [C++] Allow the possibility to write several
tables in ORCFileWriter (#14219)
* ARROW-17836 - [C++] Allow specifying alignment of buffers
(#14225)
* ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext
that will store a plan's shared data structures (#14227)
* ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource
(#14250)
* ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in
Flight SQL (#14266)
* ARROW-17932 - [C++] Implement streaming RecordBatchReader for
JSON (#14355)
* ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395)
* ARROW-17966 - [C++] Adjust to new format for Substrait optional
arguments (#14415)
* ARROW-17975 - [C++] Create at-fork facility (#14594)
* ARROW-17980 - [C++] As-of-Join Substrait extension (#14485)
* ARROW-17989 - [C++][Python] Enable struct_field kernel to accept
string field names (#14495)
* ARROW-18008 - [Python][C++] Add use_threads to
run_substrait_query
* ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425)
* ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139
* ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723)
* ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be
uninitialized (#14480)
* ARROW-18144 - [C++] Improve JSONTypeError error message in
testing (#14486)
* ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552)
* ARROW-18206 - [C++][CI] Add a nightly build for C++20
compilation (#14571)
* ARROW-18235 - [C++][Gandiva] Fix the like function
implementation for escape chars (#14579)
* ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0
* ARROW-18253 - [C++][Parquet] Add additional bounds safety checks
(#14592)
* ARROW-18259 - [C++][CMake] Add support for system Thrift CMake
package (#14597)
* ARROW-18280 - [C++][Python] Support slicing to end in list_slice
kernel (#14749)
* ARROW-18282 - [C++][Python] Support step >= 1 in list_slice
kernel (#14696)
* ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc
provided by vcpkg (#14609)
* ARROW-18342 - [C++] AsofJoinNode support for Boolean data field
(#14658)
* ARROW-18350 - [C++] Use std::to_chars instead of std::to_string
(#14666)
* ARROW-18367 - [C++] Enable the creation of named table relations
(#14681)
* ARROW-18373 - Fix component drop-down, add license text (#14688)
* ARROW-18377 - MIGRATION: Automate component labels from issue
form content (#15245)
* ARROW-18395 - [C++] Move select-k implementation into separate
module
* ARROW-18402 - [C++] Expose DeclarationInfo (#14765)
* ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu
20.04 (#14735)
* ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in
building plasma-glib (#14739)
* ARROW-18413 - [C++][Parquet] Expose page index info from
ColumnChunkMetaData (#14742)
* ARROW-18419 - [C++] Update vendored fast_float (#14817)
* ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex
(#14803)
* ARROW-18421 - [C++][ORC] Add accessor for stripe information in
reader (#14806)
* ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode
(#14934)
* ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942)
* GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in.
(#14900)
* GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake
package (#15251)
* GH-14937 - [C++] Add rank kernel benchmarks (#14938)
* GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED
encoding (#15140)
* GH-15072 - [C++] Move the round functionality into a separate
module (#15073)
* GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit
(#15182)
* GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097)
* GH-15100 - [C++][Parquet] Add benchmark for reading strings from
Parquet (#15101)
* GH-15151 - [C++] Adding RecordBatchReaderSource to solve an
issue in R API (#15183)
* GH-15185 - [C++][Parquet] Improve documentation for Parquet
Reader column_indices (#15184)
* GH-15199 - [C++][Substrait] Allow
AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198)
* GH-15200 - [C++] Created benchmarks for round kernels. (#15201)
* GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch
(#15240)
* GH-15226 - [C++] Add DurationType to hash kernels (#33685)
* GH-15237 - [C++] Add ::arrow::Unreachable() using
std::string_view (#15238)
* GH-15239 - [C++][Parquet] Parquet writer writes decimal as
int32/64 (#15244)
* GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case
when the scalar is null (#15291)
* GH-33607 - [C++] Support optional additional arguments for
inline visit functions (#33608)
* GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc
without ARROW_PARQUET=ON (#33665)
* PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated
fields (#14366)
* PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader
(#14142)
* PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should
reuse scratch space (#14509)
* PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader
ReadBatch and Skip (#14523)
* PARQUET-2209 - [parquet-cpp] Optimize skip for the case that
number of values to skip equals page size (#14545)
* PARQUET-2210 - [C++][Parquet] Skip pages based on header
metadata using a callback (#14603)
* PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field
(#14556)
- Remove unused python3-arrow package declaration
* Add options as recommended for python support
- Provide test data for unittests
- Don't use system jemalloc but bundle it in order to avoid
static TLS errors in consuming packages like python-pyarrow
* gh#apache/arrow#13739
-------------------------------------------------------------------
Sun Aug 28 19:30:50 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>
- Revert ccache change, using ccache in a pristine buildroot
just slows down OBS builds (use --ccache for local builds).
- Remove unused gflags-static-devel dependency.
-------------------------------------------------------------------
Mon Aug 22 06:22:43 UTC 2022 - John Vandenberg <jayvdb@gmail.com>
- Speed up builds with ccache
-------------------------------------------------------------------
Sat Aug 6 01:59:08 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>
- Update to v9.0.0
No (current) changelog provided
- Spec file cleanup:
* Remove lots of duplicate, unused, or wrong build dependencies
* Do not package outdated Readmes and Changelogs
- Enable tests, disable ones requiring external test data
-------------------------------------------------------------------
Sat Nov 14 09:07:59 UTC 2020 - John Vandenberg <jayvdb@gmail.com>
- Update to v2.0.0
-------------------------------------------------------------------
Wed Nov 13 21:14:00 UTC 2019 - TheBlackCat <toddrme2178@gmail.com>
- Initial spec for v0.12.0