close
Skip to content

fix(controller): patch CR metadata without owning spec#10264

Open
weicao wants to merge 1 commit into
mainfrom
lily/metadata-only-cr-updates
Open

fix(controller): patch CR metadata without owning spec#10264
weicao wants to merge 1 commit into
mainfrom
lily/metadata-only-cr-updates

Conversation

@weicao
Copy link
Copy Markdown
Contributor

@weicao weicao commented May 19, 2026

Summary

  • replace whole-object finalizer updates with merge patches that only change metadata
  • update ComponentVersion compatible ComponentDefinition labels and annotations through merge patch instead of whole-object update
  • keep chart-rendered spec fields owned by the chart manager during controller metadata reconciliation

Context

Helm server-side apply upgrades can conflict when controller metadata updates are submitted as full-object updates. The apiserver may then record the controller manager as owning spec fields that were rendered by the chart, such as ComponentVersion.spec.releases or OpsDefinition.spec.actions.

This change keeps the controller write path scoped to metadata for those reconciliation steps. HandleCRDeletion is used by multiple CR reconcilers, including ComponentVersion, OpsDefinition, OpsRequest, Addon, Restore, BackupRepo, ActionSet, BackupSchedule, BackupPolicy, and legacy parameter config rendering paths, so the fix prevents the same managedFields ownership drift across those metadata-only finalizer paths.

Validation

  • go test ./pkg/controllerutil -run TestHandleCRDeletionPatchesFinalizerMetadata -count=1 -v
  • go test ./controllers/apps -run TestComponentVersionSupportedCompDefLabelsUsesMetadataPatch -count=1 -v
  • go test -c ./controllers/apps
  • go test -c ./controllers/operations
  • go test -c ./pkg/controllerutil
  • git diff --check

Boundary

Local tests assert the generated patch payload contains metadata changes and does not contain spec. Live managedFields and Helm in-place upgrade validation should be run with a patch image against the addon upgrade scenario.

@weicao weicao requested review from a team and leon-ape as code owners May 19, 2026 11:56
@apecloud-bot
Copy link
Copy Markdown
Collaborator

Auto Cherry-pick Instructions

Usage:
  - /nopick: Not auto cherry-pick when PR merged.
  - /pick: release-x.x [release-x.x]: Auto cherry-pick to the specified branch when PR merged.

Example:
  - /nopick
  - /pick release-1.1

@github-actions github-actions Bot added the size/L Denotes a PR that changes 100-499 lines. label May 19, 2026
@weicao
Copy link
Copy Markdown
Contributor Author

weicao commented May 19, 2026

LGTM (peer review, 不能 formal approve 因为共享 GitHub identity).

Code-level second review verified:

  • HandleCRDeletion two UpdatePatch with MergeFrom(cr.DeepCopyObject()) — patch payload only contains finalizer metadata
  • updateSupportedCompDefLabels: client.Clientclient.Writer (least-privilege) + MergeFrom(compVersion.DeepCopy()) patch — payload only contains labels + compatibleDefinitionsKey annotation
  • TestComponentVersionSupportedCompDefLabelsUsesMetadataPatch directly asserts !strings.Contains(patchData, "spec") + strings.Contains(patchData, "metadata") at byte level

Scope verification (high-leverage fix):

  • HandleCRDeletion is called by 10+ reconcilers (CMPV, OpsDefinition, OpsRequest, Addon, Restore, BackupRepo, ActionSet, BackupSchedule, BackupPolicy, LegacyParamConfigRenderer) — all chart-rendered CRs benefit
  • Verified OpsDefinition controller has no other spec write path; HandleCRDeletion fix is sufficient for OpsDef .spec.actions SSA conflict

Non-blocker observations (future follow-up):

  • Production code does not set explicit client.FieldOwner("kubeblocks"); relies on default UserAgent → fieldManager which matches existing manager=kubeblocks in managedFields. Adding explicit FieldOwner later would make the contract code-visible.

Test boundary: plain Go tests assert patch-data contract at byte level (envtest BeforeSuite unavailable due to missing /usr/local/kubebuilder/bin/etcd locally — same constraint for me and PR author). Live managedFields + helm in-place upgrade validation provided by sideload patch image test against the addon upgrade scenario.

mergeStateStatus=BLOCKED expected (only formal review gate); CI just queued.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.69%. Comparing base (c1eb0c6) to head (a961065).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/controllerutil/controller_common.go 50.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10264      +/-   ##
==========================================
- Coverage   52.73%   52.69%   -0.05%     
==========================================
  Files         534      534              
  Lines       61231    61234       +3     
==========================================
- Hits        32288    32265      -23     
- Misses      25691    25706      +15     
- Partials     3252     3263      +11     
Flag Coverage Δ
unittests 52.69% <66.66%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@weicao
Copy link
Copy Markdown
Contributor Author

weicao commented May 19, 2026

Live verification cross-reference (release-1.0 backport PR #10265) — release-1.0 backport with identical production diff was patch-imaged, sideloaded to idc2, and validated against the SQL Server addon's mix-A-1 v18 in-place helm upgrade scenario.

  • helm install (chart 1.0.4-with-fix1268) → REVISION=1 deployed
  • helm upgrade in-place to chart 1.0.5 → STATUS=deployed, REVISION=2, rc=0, 5s, NO SSA conflict (previously failed with 4 .spec.releases / .spec.actions[*] Apply conflicts)
  • managedFields after-state: manager=kubeblocks no longer claims any f:spec.* for any of the 4 target objects (cmpv/mssql + 3 mssql-dynamic-* OpsDefinition); helm retains f:spec ownership as expected
  • Details + evidence pack: see fix(controller): patch CR metadata without owning spec (release-1.0) #10265 (comment)

Since this PR (main) and #10265 (release-1.0) carry identical production diff (verified via git diff origin/release-1.0..PR-10265-head -- pkg/controllerutil/controller_common.go controllers/apps/componentversion_controller.go equivalence with this PR), the mechanism validation on release-1.0 extends to main behavior.

@weicao
Copy link
Copy Markdown
Contributor Author

weicao commented May 19, 2026

/pick release-1.1 release-1.0

@apecloud-bot apecloud-bot added pick-1.1 Auto cherry-pick to release-1.1 when PR merged pick-1.0 Auto cherry-pick to release-1.0 when PR merged labels May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pick-1.0 Auto cherry-pick to release-1.0 when PR merged pick-1.1 Auto cherry-pick to release-1.1 when PR merged size/L Denotes a PR that changes 100-499 lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants