close
Skip to content

refactor/consolidated JSON IO routines#3998

Open
d-v-b wants to merge 11 commits into
zarr-developers:mainfrom
d-v-b:refactor/json-io
Open

refactor/consolidated JSON IO routines#3998
d-v-b wants to merge 11 commits into
zarr-developers:mainfrom
d-v-b:refactor/json-io

Conversation

@d-v-b
Copy link
Copy Markdown
Contributor

@d-v-b d-v-b commented May 22, 2026

This is an internal refactor that does a few things related to JSON IO:

  • removes the _get_json / _get_bytes methods from the store ABCs. we never used them, and that functionality has proven to be better served via helper functions that take a store as a parameter.
  • adds helper functions that take a store as a parameter for doing useful stuff with JSON (reading it and writing it)
  • adds helper functions that handle buffer -> JSON and vice versa

none of these things touch public APIs. the goal is removing dead code and boilerplate.

d-v-b and others added 5 commits May 22, 2026 15:41
Add four free functions for moving JSON documents in and out of stores,
plus a thin StorePath.get_json wrapper:

- buffer_to_json / json_to_buffer: convert between a Buffer and a parsed
  JSON value. The buffer prototype and the serialization policy
  (indent from config, allow_nan=True) live here, in one place.
- get_json / set_json: compose those with Store.get / Store.set.
  get_json returns None for a missing key, matching what most callers
  want; callers needing presence check for None themselves.

These are free functions, not Store ABC methods, so stores cannot (and
need not) override them and the Store contract gains no dependency on
the buffer prototype or global config. Subsequent commits sweep the
hand-rolled JSON I/O sites onto these and delete the old private
_get_bytes/_get_json store methods.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweep the hand-rolled `json.dumps`/`json.loads` + buffer construction in
the metadata write paths and the metadata/node read paths onto the free
functions added in the previous commit.

Write sites (`to_buffer_dict` in group/metadata) now call `json_to_buffer`,
which centralizes the `json_indent`/`allow_nan=True` serialization policy.
Read sites parse buffers through `buffer_to_json_object`, a new helper that
narrows the parsed `JSON` to `dict[str, JSON]` once (every metadata document
is an object) rather than relying on `json.loads` returning `Any`.

The compact (no-indent) consolidated-metadata blob in `GroupMetadata.to_buffer_dict`
is intentionally left as a raw `json.dumps` to preserve its byte layout, and
`_read_metadata_v2` keeps its `asyncio.gather` of three `store.get` calls.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`Store._get_bytes`, `_get_bytes_sync`, `_get_json`, and `_get_json_sync`
had no production callers — the metadata read/write paths now go through
the free functions in `zarr.core._json`. Delete the four ABC methods, the
eight per-store overrides on `LocalStore`/`MemoryStore` (which existed only
to make the `prototype` argument optional), and the tests that exercised
them (the shared `StoreTests` methods plus the per-store prototype=None
tests). Drop the now-unused `json`/`sync`/`Any` imports they pulled in.

These methods were always private; removing them is not a public API change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`json_to_buffer` (and `set_json`) now take the JSON encoding parameters
`indent` and `allow_nan` as explicit keyword arguments instead of reading
`config.get("json_indent")` internally. The functions in `zarr.core._json`
are now pure: they depend only on their arguments, not the global config.

Each `to_buffer_dict` caller (group, metadata v2/v3) reads
`config.get("json_indent")` and passes it as `indent=`. This re-localizes
the config read to the metadata layer that owns the policy, and keeps the
I/O helpers free of any config dependency.

With `indent` now explicit, the compact consolidated-metadata blob in
`GroupMetadata.to_buffer_dict` also routes through `json_to_buffer`
(indent defaults to None == compact), removing the last raw `json.dumps`
in that method and the `json` import from group.py. Output bytes are
unchanged (the consolidated/metadata regression tests pass).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.51%. Comparing base (c0e2afa) to head (72235a3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3998      +/-   ##
==========================================
+ Coverage   93.49%   93.51%   +0.01%     
==========================================
  Files          88       89       +1     
  Lines       11873    11825      -48     
==========================================
- Hits        11101    11058      -43     
+ Misses        772      767       -5     
Files with missing lines Coverage Δ
src/zarr/abc/store.py 96.12% <ø> (-0.30%) ⬇️
src/zarr/core/_json.py 100.00% <100.00%> (ø)
src/zarr/core/array.py 97.78% <100.00%> (ø)
src/zarr/core/group.py 94.97% <100.00%> (-0.01%) ⬇️
src/zarr/core/metadata/v2.py 89.44% <100.00%> (+0.05%) ⬆️
src/zarr/core/metadata/v3.py 93.94% <100.00%> (-0.02%) ⬇️
src/zarr/storage/_common.py 89.52% <100.00%> (+3.05%) ⬆️
src/zarr/storage/_local.py 95.04% <100.00%> (-0.37%) ⬇️
src/zarr/storage/_memory.py 96.52% <ø> (-0.23%) ⬇️
src/zarr/testing/store.py 99.43% <ø> (-0.04%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

d-v-b and others added 2 commits May 22, 2026 16:34
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The except clauses in `contains_array`, `contains_group`, and
`_contains_node_v3` (widened to catch `TypeError` in the JSON I/O refactor)
had no test exercising them — a stored `zarr.json` that is malformed bytes,
valid-but-non-object JSON, or an object missing `node_type` now has a test
asserting each function reports the node as absent. One test per failure
mode; the two `contains_*` functions are parametrized over `func`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@d-v-b d-v-b requested review from chuckwondo and maxrjones May 22, 2026 15:31
Copy link
Copy Markdown
Contributor

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I made some trivial/non-blocking comments.

Comment thread src/zarr/core/_json.py

Every metadata document zarr reads is a JSON object, so this narrows the
`JSON` union to `dict[str, JSON]` once, here, instead of at each call site.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems trivial, but do you want to add a Parameters section?

Comment thread src/zarr/core/_json.py Outdated

Parameters
----------
obj : JSON
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all new code, do you want to stop including types in docstrings?

Comment thread src/zarr/core/_json.py Outdated
Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com>
@maxrjones
Copy link
Copy Markdown
Member

It looks like this will break Icechunk's tests (https://github.com/earth-mover/icechunk/blob/6c504cf377db2e9559703467adfa435a845ce14d/icechunk-python/tests/test_zarr/test_store/test_icechunk_store.py#L573-L583), but likely not its functionality. I'd like to delegate my review request to an Icechunk dev if possible since there's an an impact on downstream maintenance and I hold no opinion about this PR. @dcherian would you mind reviewing this one?

d-v-b and others added 3 commits May 22, 2026 19:55
Add a Parameters section to buffer_to_json_object and drop the
redundant type annotations from the numpydoc Parameters sections,
per review feedback on zarr-developers#3998.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-v-b
Copy link
Copy Markdown
Contributor Author

d-v-b commented May 22, 2026

It looks like this will break Icechunk's tests (https://github.com/earth-mover/icechunk/blob/6c504cf377db2e9559703467adfa435a845ce14d/icechunk-python/tests/test_zarr/test_store/test_icechunk_store.py#L573-L583), but likely not its functionality. I'd like to delegate my review request to an Icechunk dev if possible since there's an an impact on downstream maintenance and I hold no opinion about this PR. @dcherian would you mind reviewing this one?

ouch, we really shouldn't have downstream projects relying on private methods like this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants