close
Skip to content

GH-29309: [C++] Preserve BinaryBuilder data type#50049

Open
BITree2004 wants to merge 2 commits into
apache:mainfrom
BITree2004:ARROW-29309-binary-builder-type
Open

GH-29309: [C++] Preserve BinaryBuilder data type#50049
BITree2004 wants to merge 2 commits into
apache:mainfrom
BITree2004:ARROW-29309-binary-builder-type

Conversation

@BITree2004
Copy link
Copy Markdown

@BITree2004 BITree2004 commented May 27, 2026

Rationale for this change

BinaryBuilder accepted a DataType in its constructor but discarded it, so finishing through ArrayBuilder::Finish() always produced a plain binary array instead of preserving the supplied type, such as an extension type backed by binary storage.

What changes are included in this PR?

  • Store the supplied DataType in BaseBinaryBuilder.
  • Use the stored type for BinaryBuilder and LargeBinaryBuilder output.
  • Add a regression test covering BinaryBuilder with a binary-backed extension type.

Are these changes tested?

Yes.

ninja -C cpp/build arrow-array-test
./cpp/build/release/arrow-array-test --gtest_filter=BinaryBuilder.PreservesDataType
./cpp/build/release/arrow-array-test

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #29309 has been automatically assigned in GitHub to PR creator.

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #29309 has been automatically assigned in GitHub to PR creator.

1 similar comment
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #29309 has been automatically assigned in GitHub to PR creator.

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #29309 has been automatically assigned in GitHub to PR creator.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes BinaryBuilder / LargeBinaryBuilder so that when they’re constructed with an explicit DataType (notably binary-backed extension types), the finished array preserves that type instead of always producing a plain (large) binary array.

Changes:

  • Store the provided DataType inside BaseBinaryBuilder and return it from type().
  • Remove BinaryBuilder / LargeBinaryBuilder’s hardcoded type() overrides so FinishInternal() uses the stored type.
  • Add a regression test ensuring a binary-backed ExtensionType is preserved through BinaryBuilder::Finish().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
cpp/src/arrow/array/builder_binary.h Persist builder DataType in BaseBinaryBuilder and use it for finished array type.
cpp/src/arrow/array/array_binary_test.cc Adds a regression test covering preservation of a binary-backed extension type.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cpp/src/arrow/array/builder_binary.h Outdated
Comment thread cpp/src/arrow/array/builder_binary.h Outdated
Comment thread cpp/src/arrow/array/builder_binary.h
Comment thread cpp/src/arrow/array/builder_binary.h Outdated
@BITree2004
Copy link
Copy Markdown
Author

Addressed the Copilot review comments in 6ad9de9:

  • validate the stored binary/large-binary builder type, including extension storage type
  • use the type singleton for default binary builders
  • guard typed Finish(...) overloads so extension-typed builders return TypeError instead of casting to BinaryArray/LargeBinaryArray
  • add regression coverage for BinaryBuilder and LargeBinaryBuilder

Local verification:

  • cmake --build cpp/build-pr-50049 --target arrow-array-test
  • ./cpp/build-pr-50049/release/arrow-array-test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants