Closes #21870: Optimized Prefix and IP Range availability and utilization calculations#22211
Closes #21870: Optimized Prefix and IP Range availability and utilization calculations#22211pheus wants to merge 1 commit into
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
331d99c to
031aad9
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration. |
|
Claude finished @pheus's task in 4m 50s —— View job PR Review: Optimized Prefix/IPRange Availability & Utilization
Issues of Concern1. SQL Explosion in
|
031aad9 to
80fa30c
Compare
Replace netaddr.IPSet-based availability and utilization calculations with SQL-side distinct host counts and Python-side interval merging. This avoids materializing large address sets in memory when rendering prefix and range summaries. The new implementation: - counts distinct IPAddress host values in the database - merges populated/utilized IPRanges before counting address space - avoids double-counting IPs covered by populated/utilized ranges - finds the first available IP by streaming sorted occupied intervals - updates the prefix detail panel to use available_ip_count - adds host-expression indexes for IPRange start/end addresses Also update IPRange child-IP matching and populated-range validation to compare host portions instead of full address/mask values, preserve rebuild_prefixes() compatibility while allowing queryset input, and fix BaseIPField.get_prep_value() to handle zero addresses such as 0.0.0.0 and ::.
80fa30c to
b16dd9e
Compare
|
@claude /review I agree with points 2, 4, and 5. For point 2, I’ve added equivalence coverage between For For the For point 1, I think the current interval-exclusion approach is still the right trade-off for this PR. The intervals are merged before building the predicate, and the expected hot path is many IPs with comparatively few populated/utilized ranges. A very high number of disjoint ranges could justify a different strategy later, but I’d prefer to keep this implementation readable unless benchmarks show that case is a real bottleneck. For point 3, the template already avoids calling |
|
Claude finished @pheus's task in 1m 25s —— View job Re-review: Verifying Addressed Concerns
FindingsPoint 2 — Equivalence coverage: well done. The Point 4 — if value is None or value == '' or (type(value) is int and value == 0):Using Point 5 — Fast-path comment: addressed. "one merged occupied interval covers the entire usable span" is unambiguous. No confusion for future readers. Point 1 — SQL explosion: accepted trade-off. The reasoning is sound for the expected workload. Merging intervals before building the predicate is the right mitigation, and the scenario where it degrades (hundreds of disjoint non-overlapping ranges) is unlikely in practice. Acceptable to revisit if benchmarks show otherwise. Point 3 — Duplicate queries in template: accepted trade-off. The existing short-circuit when No remaining issues of concern. The three addressed points are handled correctly and the test coverage added is solid. |
jeremystretch
left a comment
There was a problem hiding this comment.
I've done an initial review, but I'm finding it difficult to keep straight the hierarchy of all the new utility functions. It seems like a lot of them could be moved to model or queryset methods to keep things better organized. What do you think?
| # Use an explicit None / empty-string check; `not value` incorrectly treats | ||
| # the valid zero addresses 0.0.0.0 and :: as empty. Raw int 0 is preserved | ||
| # as "empty" for backward compatibility (Django's ORM does not pass it here). | ||
| if value is None or value == '' or (type(value) is int and value == 0): |
There was a problem hiding this comment.
This seems equivalent but simpler:
| if value is None or value == '' or (type(value) is int and value == 0): | |
| if value in (None, '', 0): |
|
|
||
| return available_ips | ||
|
|
||
| @property |
There was a problem hiding this comment.
Given that accessing this property triggers potentially expensive database queries (get_child_ranges(), get_child_ips()), we should probably implement it as a regular method, e.g. get_available_ip_count(). (Alternatively, we could use @cached_property but then we need to worry about stale value.)
| get_available_ips() remains available for callers that need the actual set. | ||
| """ | ||
| first_ip, last_ip = get_usable_ip_bounds(self) | ||
| usable_size = int(last_ip) - int(first_ip) + 1 |
There was a problem hiding this comment.
Might be worth exposing this as a cached property on Prefix, but we also want to be careful to avoid confusion with self.prefix.size.
| return sum(end - start + 1 for start, end in intervals) | ||
|
|
||
|
|
||
| def count_distinct_ip_hosts_outside_intervals(queryset, intervals, version): |
There was a problem hiding this comment.
Can this be collapsed into count_distinct_ip_hosts() above by making intervals and version optional?
| return netaddr.IPNetwork(text).ip | ||
|
|
||
|
|
||
| def annotate_host_address(queryset): |
There was a problem hiding this comment.
It might make sense to introduce a custom subclass of QuerySet for the IPAddress model to hold some of these functions, so that they can be executed on a queryset directly.
| ) | ||
|
|
||
|
|
||
| def filter_ip_hosts_between(queryset, first_ip, last_ip): |
There was a problem hiding this comment.
Did you happen to explore implementing this as a lookup? It would be nice to do e.g. .filter(address__host_between=[x, y]) if feasible.
| # Infer the Prefix model from the queryset/list so this helper does not need | ||
| # a model import at module scope. | ||
| prefix_model = getattr(prefix_list, 'model', None) or prefix_list[0].__class__ |
There was a problem hiding this comment.
Could we use apps.get_model() here instead?
|
|
||
| return max(self.size - count_distinct_ip_hosts(self.get_child_ips()), 0) | ||
|
|
||
| @property |
There was a problem hiding this comment.
We probably want to keep this as a method, both to avoid a breaking change and because it effects a database query.
| @property | ||
| def available_ip_count(self): |
There was a problem hiding this comment.
Same concern here as with available_ip_count() on IPRange. Just calling out for consistency.
Fixes: #21870
This PR provides an alternative approach to #21954 for improving prefix detail performance. Instead of deferring expensive rendering work, it reduces the cost of the underlying IP availability and utilization calculations.
The main change is replacing
netaddr.IPSet-based counting with SQL-side distinct host counts and Python-side interval merging. This avoids materializing large address sets in memory when rendering prefix and range summaries, especially for large prefixes with many child IPs.The new implementation:
IPAddresshost values in the databaseIPRangeintervals before counting address spaceavailable_ip_countIPRangestart/end addressesThis also includes a few related correctness improvements:
IPRange.get_child_ips()now compares host portions instead of full address/mask valuesrebuild_prefixes()can accept querysets while preserving compatibility with existing callersBaseIPField.get_prep_value()now handles zero addresses such as0.0.0.0and::Overall, this aims to improve the performance of the hot paths identified in #21870 while preserving the existing address-space semantics.