Add: per-task and scope-filter granularity to scope_stats #976

Open
doraemonmj wants to merge 1 commit into hw-native-sys:main from doraemonmj:scope-stats-doc

Conversation

@doraemonmj
Contributor

Extend scope_stats (#858) with two orthogonal, opt-in axes on top of the existing per-scope begin/end sampling (#902):

  • Scope filter (--enable-scope-stats <lines>): restrict collection to the listed PTO2_SCOPE line numbers via CallConfig.scope_stats_scope. Empty = every scope (unchanged default). The orchestration is one source file, so a line uniquely identifies a scope.
  • Per-task sampling (--scope-stats-task): emit one phase="task" record per submit_task carrying task_id and ring/heap occupancy, attributed to the enclosing scope and subject to the scope filter.

Both travel through CallConfig to the device collector. The runtime only calls the scope_stats_* interface (one weak-gated scope_stats_record_task in submit_task_common); the collector parses the filter CSV and gates record emission, leaving depth/site bookkeeping intact so nesting stays correct. Filtered-out scopes are not appended and do not count toward total. The implicit top-level scope now renders as "<root>" not "(unknown)". jsonl gains "phase" (begin/end/task) and "task_id" fields.

Mirrored across a2a3/a5 and onboard/sim; docs/dfx/scope-stats.md updated.
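
The filter semantics described above can be modelled in a few lines. This is a minimal Python sketch, not the collector's actual C++ code: the names `parse_site_filter_csv` and `site_passes_filter` mirror helpers mentioned in the review walkthrough, while `ScopeCollectorModel`, the `site_line` and `depth` record keys, and the choice to silently skip non-numeric tokens are assumptions made for illustration.

```python
from __future__ import annotations


def parse_site_filter_csv(csv: str) -> frozenset[int]:
    """Parse "12,40" into {12, 40}. An empty string yields an empty set."""
    lines = set()
    for token in csv.split(","):
        token = token.strip()
        if token.isdigit():  # assumption: non-numeric tokens are ignored
            lines.add(int(token))
    return frozenset(lines)


def site_passes_filter(site_line: int | None, site_filter: frozenset[int]) -> bool:
    """An empty filter admits every scope (the unchanged default)."""
    return not site_filter or site_line in site_filter


class ScopeCollectorModel:
    """Hypothetical model: bookkeeping always runs, only emission is gated."""

    def __init__(self, filter_csv: str = "") -> None:
        self.site_filter = parse_site_filter_csv(filter_csv)
        self.site_stack: list[int] = []  # depth/site bookkeeping
        self.records: list[dict] = []    # only records that pass the filter

    def begin(self, site_line: int) -> None:
        self.site_stack.append(site_line)  # pushed even when filtered out
        self._emit("begin")

    def end(self) -> None:
        self._emit("end")
        self.site_stack.pop()

    def record_task(self, task_id: int) -> None:
        self._emit("task", task_id=task_id)

    def _emit(self, phase: str, **extra) -> None:
        # None stands in for the implicit top-level scope.
        site = self.site_stack[-1] if self.site_stack else None
        if site_passes_filter(site, self.site_filter):
            record = {"phase": phase, "site_line": site,
                      "depth": len(self.site_stack), **extra}
            self.records.append(record)


# Only the scope opened at line 40 is collected; its nesting depth is still 2.
collector = ScopeCollectorModel("40")
collector.begin(12)
collector.record_task(1)  # attributed to line 12, filtered out
collector.begin(40)
collector.record_task(2)  # attributed to line 40, kept
collector.end()
collector.end()
```

Because the stack is pushed and popped regardless of the filter, the kept records report the true nesting depth, and the filtered-out scope contributes nothing to the record count.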

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This PR adds selective scope-site collection and per-task scope-stats record emission. Users can now filter scope-stats collection to specific PTO2_SCOPE line numbers via --enable-scope-stats <lines> and emit per-task occupancy snapshots via --scope-stats-task. The changes extend CallConfig with two new fields, update the shared header and record schema to include filtering configuration and task-phase records, thread the new parameters through device runner initialization and C APIs, conditionally record per-task samples in the orchestrator, filter host-side collection by site line, and produce task-timeline visualizations in the scope-stats output.

Changes

Scope Stats Per-Task & Filtering Support

| Layer / File(s) | Summary |
|---|---|
| **Data contract: phase, task_id, filter configuration**<br>`src/common/platform/include/common/scope_stats.h`, `src/common/task_interface/call_config.h`, `src/common/platform/include/aicpu/scope_stats_collector_aicpu.h` | Introduces `SCOPE_STATS_PHASE_TASK` phase type, adds `task_id` field to `ScopeStatsRecord`, extends `ScopeStatsDataHeader` with site-filter and task-enabled configuration, adds `scope_stats_scope` (64-byte filter) and `scope_stats_task` fields to `CallConfig`, and declares new API functions `scope_stats_record_task()` and `is_scope_stats_task_enabled()`. |
| **Host configuration APIs: setters/getters in DeviceRunnerBase**<br>`src/common/platform/onboard/host/device_runner_base.h`, `src/common/platform/sim/host/device_runner_base.h`, `src/common/platform/include/host/scope_stats_collector.h` | Adds public `set_scope_stats_scope()`, `scope_stats_scope()`, `set_scope_stats_task()`, and `scope_stats_task()` methods to both onboard and sim `DeviceRunnerBase`; updates `ScopeStatsCollector::init()` signature to accept `site_filter_csv` and `task_enabled` parameters. |
| **C API and function pointer signatures**<br>`src/common/platform/onboard/host/c_api_shared.cpp`, `src/common/platform/sim/host/c_api_shared.cpp`, `src/common/worker/pto_runtime_c_api.h`, `src/common/worker/chip_worker.h`, `src/common/worker/chip_worker.cpp` | Updates `run_prepared()` C ABI to include `scope_stats_scope` and `scope_stats_task` parameters; updates function pointer type; passes new fields through implementation and into device runner setters. |
| **Device runner initialization: pass scope config to collector**<br>`src/a2a3/platform/onboard/host/device_runner.cpp`, `src/a2a3/platform/sim/host/device_runner.cpp`, `src/a5/platform/onboard/host/device_runner.cpp`, `src/a5/platform/sim/host/device_runner.cpp` | Updates `DeviceRunner::init_scope_stats()` to pass `scope_stats_scope().c_str()` and `scope_stats_task()` into `scope_stats_collector_.init()` across both a2a3 and a5 platforms. |
| **Device-side per-task recording in orchestrators**<br>`src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp`, `src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp` | Adds weak, hidden `is_scope_stats_task_enabled()` predicate and conditionally emits per-task scope-stats records in `submit_task_common()` when gated by the predicate; records task id and current allocator/tensormap occupancy. |
| **Host-side collection: filtering and task record handling**<br>`src/common/platform/shared/aicpu/scope_stats_collector_aicpu.cpp`, `src/common/platform/shared/host/scope_stats_collector.cpp` | Implements site-filter gating via `site_passes_filter()` predicate; caches task-enabled state for fast hot-path checking; conditionally emits BEGIN/END/TASK records only when site filter passes; implements CSV filter parsing in `parse_site_filter_csv()` helper; updates NDJSON output to include three phases (begin/task/end) with `task_id` in task records. |
| **Python bindings and mailbox deserialization**<br>`python/bindings/task_interface.cpp`, `python/simpler/worker.py` | Extends Python `CallConfig` nanobind bindings with `scope_stats_scope` (string) and `scope_stats_task` (boolean) properties; updates `_CFG_FMT` mailbox struct format and `_read_config_from_mailbox()` to unpack and populate new fields. |
| **Test framework: CLI parsing and parameter threading**<br>`conftest.py`, `simpler_setup/scene_test.py`, `tests/st/a2a3/tensormap_and_ringbuffer/dfx/scope_stats/test_scope_stats.py` | Updates pytest and standalone CLI parsing to make `--enable-scope-stats` accept optional comma-separated value and `--scope-stats-task` to enable per-task records; extends `run_class_cases()`, `_build_config()`, and test pipeline with `scope_stats_scope` and `scope_stats_task` parameters; implements optional-value semantics and disablement logic; updates test gating to correctly interpret the optional flag. |
| **Visualization and documentation: task timeline and guides**<br>`simpler_setup/tools/scope_stats_plot.py`, `docs/dfx/scope-stats.md` | Adds new task timeline visualization showing per-record occupancy with phase-colored markers and task-id tooltips; updates `_ring_section()` to include timeline as first chart per ring; extends documentation to cover site-filtering, per-task collection, expanded schema with task phase and `task_id` field, and device-side filtering behavior. |
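
The optional-value semantics listed for the test framework can be sketched with plain `argparse`. This is an illustrative sketch under stated assumptions, not the repository's `conftest.py`: the helper names `build_parser` and `resolve` are hypothetical, and treating `--scope-stats-task` as implying collection follows the Gemini review summary rather than confirmed code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # nargs="?" gives three states:
    #   flag absent          -> None   (scope stats disabled)
    #   flag without a value -> ""     (enabled, every scope)
    #   flag with a value    -> "12,40" (enabled, listed lines only)
    parser.add_argument("--enable-scope-stats", nargs="?", const="",
                        default=None, metavar="LINES")
    parser.add_argument("--scope-stats-task", action="store_true")
    return parser


def resolve(argv: list[str]) -> tuple[bool, str, bool]:
    """Return (enabled, scope_stats_scope, scope_stats_task)."""
    args = build_parser().parse_args(argv)
    # Assumption: per-task sampling on its own also turns collection on.
    enabled = args.enable_scope_stats is not None or args.scope_stats_task
    return enabled, args.enable_scope_stats or "", args.scope_stats_task


print(resolve([]))
print(resolve(["--enable-scope-stats"]))
print(resolve(["--enable-scope-stats", "12,40"]))
print(resolve(["--scope-stats-task"]))
```

One caveat of `nargs="?"` is that the flag consumes the next token when that token does not look like an option, so a bare `--enable-scope-stats` is safest when placed last or followed by another flag.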

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 Hops through rings of memory clear,
Task timelines bloom, phase-colored cheer,
Filtering scopes with CSV lines,
Per-task records paint occupancy designs,
Now scope-stats shows the dancing flow!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.54% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding per-task and scope-filter granularity to scope_stats, which is the central focus of this large multi-file PR. |
| Description check | ✅ Passed | The description provides clear context about the two new features (scope filter and per-task sampling), how they propagate through the system, and notes the mirrored changes across platforms. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces two optional refinements to the scope_stats profiling tool: a scope site filter (--enable-scope-stats <lines>) to restrict collection to specific line numbers, and per-task sampling (--scope-stats-task) to emit fine-grained occupancy records during task submission. It updates the HTML plotting tool to render a detailed task timeline, and propagates these new configurations through CallConfig, Python bindings, and the device-side collectors. One issue was identified in the test suite where test_scope_stats.py may skip validation if only --scope-stats-task is specified, because it does not account for the fact that this flag implies --enable-scope-stats.


Comment thread tests/st/a2a3/tensormap_and_ringbuffer/dfx/scope_stats/test_scope_stats.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/dfx/scope-stats.md (1)

55-61: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the documented input path for scope_stats_plot.py.

Line 56 uses <output_prefix>/scope_stats.jsonl, but the generated file is under scope_stats/ (<output_prefix>/scope_stats/scope_stats.jsonl).

Suggested fix
-python simpler_setup/tools/scope_stats_plot.py <output_prefix>/scope_stats.jsonl
-# writes <output_prefix>/scope_stats.html
+python simpler_setup/tools/scope_stats_plot.py <output_prefix>/scope_stats/scope_stats.jsonl
+# writes <output_prefix>/scope_stats/scope_stats.html (unless --out-dir is provided)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/dfx/scope-stats.md` around lines 55 - 61, The docs show the wrong input
path for the scope stats plotting script; update the examples referencing
scope_stats_plot.py to point to the generated file under the scope_stats
subdirectory (i.e., use <output_prefix>/scope_stats/scope_stats.jsonl instead of
<output_prefix>/scope_stats.jsonl) and adjust the alternate example similarly
(e.g., path/to/scope_stats/scope_stats.jsonl or path/to/scope_stats.jsonl within
a scope_stats directory) so the README matches the actual output layout for
scope_stats_plot.py.
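
To make the corrected layout concrete, here is a small Python sketch that builds the path under the `scope_stats/` subdirectory and tallies records by phase. The `phase` and `task_id` keys come from the PR description; the helper names, the sample records, and the `out` prefix are hypothetical.

```python
import json
from pathlib import Path


def scope_stats_jsonl(output_prefix: str) -> Path:
    """Location of the generated file, per the path fix suggested above."""
    return Path(output_prefix) / "scope_stats" / "scope_stats.jsonl"


def count_phases(lines) -> dict:
    """Count NDJSON records per phase, skipping blank lines."""
    counts: dict = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        phase = record.get("phase", "unknown")
        counts[phase] = counts.get(phase, 0) + 1
    return counts


# Made-up records; a real file carries more fields than these two.
SAMPLE = [
    '{"phase": "begin"}',
    '{"phase": "task", "task_id": 0}',
    '{"phase": "task", "task_id": 1}',
    '{"phase": "end"}',
]

print(scope_stats_jsonl("out").as_posix())
print(count_phases(SAMPLE))
```

With a real run, the same function can be fed an open file handle, for example `count_phases(scope_stats_jsonl(prefix).open())`.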
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp`:
- Around line 684-694: The scope stats snapshot is taken too late (after
prepare_task/tensormap work and scheduler push) using
task_allocator.task_tail()/heap_tail(), so make the allocator snapshot
immediately after allocation and reuse it at submit-time: capture
alloc.task_tail(), alloc.task_head(), alloc.heap_tail(), alloc.heap_top() (and
orch->tensor_map.current_used()) into a small TaskAllocationSnapshot stored on
the allocated slot or returned by the allocator when the slot is reserved (e.g.
in the code path around prepare_task()/task_allocator allocation), and replace
the late calls in the is_scope_stats_task_enabled() block that calls
scope_stats_record_task with reads from that cached snapshot so the PHASE_TASK
record reflects the state at allocation time.

In `@src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp`:
- Around line 682-692: The current scope stats sample calls
task_tail()/heap_tail() late (in the submit path) so occupancy can change; move
sampling to immediately after allocation and reuse that cached snapshot here:
when the slot is allocated by orch->rings[ring_id].task_allocator (e.g., in the
allocation/prepare_task code path), read and store task_tail(), task_head(),
heap_tail(), heap_top(), and orch->tensor_map.current_used() into a small
Snapshot struct (or into fields on the task descriptor), and then change this
block to call scope_stats_record_task(...) with the cached snapshot instead of
calling task_tail()/heap_tail() live; keep the is_scope_stats_task_enabled()
guard and reference the same symbols (is_scope_stats_task_enabled(),
orch->rings[ring_id].task_allocator, scope_stats_record_task,
tensor_map.current_used()) so the recorded PHASE_TASK reflects the allocator
state at allocation time.
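
The timing concern in the two orchestrator comments can be illustrated with a toy model. The runtime itself is C++; the Python sketch below only demonstrates why a late read can differ from a snapshot taken at allocation. `RingModel`, `AllocationSnapshot`, and the `head - tail` occupancy formula are assumptions for illustration and do not reflect the real allocator's layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationSnapshot:
    """Allocator state captured at the moment a slot is reserved."""
    task_head: int
    task_tail: int

    @property
    def occupancy(self) -> int:
        return self.task_head - self.task_tail


class RingModel:
    """Hypothetical ring: head advances on allocate, tail on retire."""

    def __init__(self) -> None:
        self.task_head = 0
        self.task_tail = 0

    def allocate(self) -> tuple[int, AllocationSnapshot]:
        slot = self.task_head
        self.task_head += 1
        # Snapshot immediately, before any later work can move the tail.
        return slot, AllocationSnapshot(self.task_head, self.task_tail)

    def retire(self) -> None:
        self.task_tail += 1


ring = RingModel()
ring.allocate()                   # an earlier task is still in flight
slot, snapshot = ring.allocate()  # the task being sampled
ring.retire()                     # progress made between allocation and submit

late_occupancy = ring.task_head - ring.task_tail  # read at submit time
cached_occupancy = snapshot.occupancy             # state at allocation time
print(late_occupancy, cached_occupancy)
```

In this model the late read under-reports occupancy by one, because a slot was retired between allocation and submit; the cached snapshot keeps the value the task actually observed when it was allocated.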

In `@src/common/task_interface/call_config.h`:
- Around line 61-67: The CSV buffer scope_stats_scope in call_config.h is too
small (64 bytes) to hold the advertised 16-filter CSV from scope_stats.h;
increase its size to match the 16-entry contract (e.g., at least 96 bytes, or
128 for headroom) so init_scope_stats can parse full input, and make the same
change for the other occurrence noted (the second scope_stats_scope
declaration). Update the declaration(s) of scope_stats_scope to the larger fixed
size and keep the comment about comma-separated PTO2_SCOPE line numbers in sync.
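
A quick worked calculation shows where sizes such as 64 and 96 bytes come from. The 16-entry limit is taken from the comment above; the digit widths per line number are assumptions, and `csv_buffer_bytes` is a hypothetical helper.

```python
def csv_buffer_bytes(entries: int, max_digits: int) -> int:
    """Worst-case bytes for a NUL-terminated CSV of line numbers.

    digits for every entry + one comma between entries + trailing NUL
    """
    return entries * max_digits + (entries - 1) + 1


for digits in (3, 4, 5):
    print(digits, csv_buffer_bytes(16, digits))
```

Under these assumptions a 64-byte buffer holds exactly 16 three-digit line numbers, 16 four-digit numbers need 80 bytes, and 16 five-digit numbers need 96 bytes, which matches the lower bound suggested in the comment.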

In `@tests/st/a2a3/tensormap_and_ringbuffer/dfx/scope_stats/test_scope_stats.py`:
- Around line 101-102: The test gate currently returns when
request.config.getoption("--enable-scope-stats", default=None) is None, but it
should also consider the presence of the related CLI flag "--scope-stats-task";
update the condition in the setup/guard so it only returns when both
request.config.getoption("--enable-scope-stats", default=None) is None AND
request.config.getoption("--scope-stats-task", default=None) is None, i.e.,
treat either option as enabling scope stats; locate the check using
request.config.getoption("--enable-scope-stats") in the test setup (around the
current if) and add the additional getoption("--scope-stats-task") check so
assertions run when either flag is provided.
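
A sketch of the suggested gate, written so it can run without pytest. `FakeConfig` is a stand-in for `request.config` and `scope_stats_requested` is a hypothetical helper. One detail worth noting: if `--scope-stats-task` is registered with `action="store_true"`, `getoption` returns `False` rather than `None` when the flag is absent, so the sketch tests truthiness for that flag instead of comparing against `None`.

```python
class FakeConfig:
    """Minimal stand-in exposing the getoption(name, default=...) call."""

    def __init__(self, options: dict) -> None:
        self._options = options

    def getoption(self, name: str, default=None):
        return self._options.get(name, default)


def scope_stats_requested(config) -> bool:
    """True when either flag asks for scope-stats collection."""
    enable_value = config.getoption("--enable-scope-stats", default=None)
    task_flag = config.getoption("--scope-stats-task", default=False)
    # "" is a valid value (every scope), so compare against None explicitly.
    return enable_value is not None or bool(task_flag)


neither = FakeConfig({"--scope-stats-task": False})
bare_enable = FakeConfig({"--enable-scope-stats": "", "--scope-stats-task": False})
task_only = FakeConfig({"--scope-stats-task": True})

print(scope_stats_requested(neither))
print(scope_stats_requested(bare_enable))
print(scope_stats_requested(task_only))
```

In the test's guard, the early return would then become `if not scope_stats_requested(request.config): return`, so validation runs when either flag is provided.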

---

Outside diff comments:
In `@docs/dfx/scope-stats.md`:
- Around line 55-61: The docs show the wrong input path for the scope stats
plotting script; update the examples referencing scope_stats_plot.py to point to
the generated file under the scope_stats subdirectory (i.e., use
<output_prefix>/scope_stats/scope_stats.jsonl instead of
<output_prefix>/scope_stats.jsonl) and adjust the alternate example similarly
(e.g., path/to/scope_stats/scope_stats.jsonl or path/to/scope_stats.jsonl within
a scope_stats directory) so the README matches the actual output layout for
scope_stats_plot.py.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 05113caa-c38d-46c4-8416-e09fa9259a8d

📥 Commits

Reviewing files that changed from the base of the PR and between 4898057 and 7fe8921.

📒 Files selected for processing (26)
  • conftest.py
  • docs/dfx/scope-stats.md
  • python/bindings/task_interface.cpp
  • python/simpler/worker.py
  • simpler_setup/scene_test.py
  • simpler_setup/tools/scope_stats_plot.py
  • src/a2a3/platform/onboard/host/device_runner.cpp
  • src/a2a3/platform/sim/host/device_runner.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
  • src/a5/platform/onboard/host/device_runner.cpp
  • src/a5/platform/sim/host/device_runner.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
  • src/common/platform/include/aicpu/scope_stats_collector_aicpu.h
  • src/common/platform/include/common/scope_stats.h
  • src/common/platform/include/host/scope_stats_collector.h
  • src/common/platform/onboard/host/c_api_shared.cpp
  • src/common/platform/onboard/host/device_runner_base.h
  • src/common/platform/shared/aicpu/scope_stats_collector_aicpu.cpp
  • src/common/platform/shared/host/scope_stats_collector.cpp
  • src/common/platform/sim/host/c_api_shared.cpp
  • src/common/platform/sim/host/device_runner_base.h
  • src/common/task_interface/call_config.h
  • src/common/worker/chip_worker.cpp
  • src/common/worker/chip_worker.h
  • src/common/worker/pto_runtime_c_api.h
  • tests/st/a2a3/tensormap_and_ringbuffer/dfx/scope_stats/test_scope_stats.py

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp Outdated
Comment thread src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp Outdated
Comment thread src/common/task_interface/call_config.h
Comment thread tests/st/a2a3/tensormap_and_ringbuffer/dfx/scope_stats/test_scope_stats.py Outdated
@doraemonmj doraemonmj force-pushed the scope-stats-doc branch 3 times, most recently from 51036e5 to 784d1bb Compare June 3, 2026 03:27
Extend scope_stats (hw-native-sys#858) with two orthogonal, opt-in axes on top of the
existing per-scope begin/end sampling (hw-native-sys#902):

- Scope filter (--enable-scope-stats <lines>): restrict collection to the
  listed PTO2_SCOPE line numbers via CallConfig.scope_stats_scope. Empty =
  every scope (unchanged default). The orchestration is one source file,
  so a line uniquely identifies a scope.
- Per-task sampling (--scope-stats-task): emit one phase="task" record per
  submit_task carrying task_id and ring/heap occupancy, attributed to the
  enclosing scope and subject to the scope filter.

Both travel through CallConfig to the device collector. The runtime only
calls the scope_stats_* interface (one weak-gated scope_stats_record_task
in submit_task_common); the collector parses the filter CSV and gates
record emission, leaving depth/site bookkeeping intact so nesting stays
correct. Filtered-out scopes are not appended and do not count toward
total. The implicit top-level scope now renders as "<root>" not
"(unknown)". jsonl gains "phase" (begin/end/task) and "task_id" fields.

Mirrored across a2a3/a5 and onboard/sim; docs/dfx/scope-stats.md updated.
