fix(fuzz): harden XSS context analyzer edge cases (#7086)#7279
Conversation
… cases Add WHATWG-compliant URI scheme normalization that strips ASCII tab, newline, and carriage return characters before scheme detection, closing a bypass where obfuscated URIs like "java\tscript:" would evade classification as executable context. Add 19 regression tests covering all 4 edge cases from projectdiscovery#7086: 1. javascript: URI classification with scheme obfuscation variants 2. JSON script block non-executable classification 3. Case-insensitive reflection detection across all contexts 4. srcdoc attribute nested HTML context handling Fixes projectdiscovery#7086 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WalkthroughEnhanced XSS analyzer to detect obfuscated Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/fuzz/analyzers/xss/analyzer.go (1)
390-430:⚠️ Potential issue | 🟡 Minor
strings.TrimSpacediverges from WHATWG URL preprocessing, causing both false negatives and false positives.The helper function at line 430 uses
strings.TrimSpace, which trims Unicode whitespace (including U+00A0 NBSP). However, WHATWG URL preprocessing removes only leading/trailing C0 controls (U+0000–U+001F) and U+0020. This creates two issues:
\x01javascript:will not be trimmed bystrings.TrimSpacebut should be per WHATWG, leaving the dangerous prefix undetected (false negative).\u00A0javascript:will be trimmed bystrings.TrimSpacebut should not be per WHATWG, potentially causing a false positive.Replace the final trim with explicit ASCII C0+space handling:
Suggested fix
- return strings.TrimSpace(strings.ToLower(b.String())) + return strings.TrimFunc(strings.ToLower(b.String()), func(r rune) bool { + return r <= 0x20 + })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 390 - 430, The normalizeURIScheme function currently removes tab/newline/carriage returns from anywhere and then calls strings.TrimSpace, which diverges from WHATWG preprocessing; change normalizeURIScheme to trim only leading and trailing ASCII C0 controls (rune values U+0000–U+001F) and ASCII space (U+0020) and then lowercase the remaining string for prefix checks (do not strip interior whitespace or NBSP). Locate normalizeURIScheme and replace the current rune loop + strings.TrimSpace call with logic that finds the first and last rune indices that are not (r <= 0x1F || r == 0x20), slice the original string to that range, and return strings.ToLower of that slice so schemes like "\x01javascript:" get trimmed but "\u00A0javascript:" do not.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 390-430: The normalizeURIScheme function currently removes
tab/newline/carriage returns from anywhere and then calls strings.TrimSpace,
which diverges from WHATWG preprocessing; change normalizeURIScheme to trim only
leading and trailing ASCII C0 controls (rune values U+0000–U+001F) and ASCII
space (U+0020) and then lowercase the remaining string for prefix checks (do not
strip interior whitespace or NBSP). Locate normalizeURIScheme and replace the
current rune loop + strings.TrimSpace call with logic that finds the first and
last rune indices that are not (r <= 0x1F || r == 0x20), slice the original
string to that range, and return strings.ToLower of that slice so schemes like
"\x01javascript:" get trimmed but "\u00A0javascript:" do not.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c27cde5f-5831-48c4-9cf4-277f6b272165
📒 Files selected for processing (2)
pkg/fuzz/analyzers/xss/analyzer.gopkg/fuzz/analyzers/xss/analyzer_test.go
|
Thank you for your contribution. Issue #7086 was originally filed against PR #7076's XSS context analyzer implementation, which was not merged. We adopted a different implementation (PR #7164) for issue #5838, which handles XSS context analysis differently. Since #7086's reported edge cases were specific to the unmerged PR #7076's approach, the issue itself is invalid against the current codebase. Closing this PR accordingly. |
Neo - PR Security ReviewCritical: 1 · Medium: 1 Highlights
Critical (1)
Medium (1)
Security ImpactXSS bypass via leading/trailing C0 control characters in javascript: URIs ( Missing test coverage for C0 control character bypass vectors ( Attack ExamplesXSS bypass via leading/trailing C0 control characters in javascript: URIs ( Suggested FixesXSS bypass via leading/trailing C0 control characters in javascript: URIs ( Missing test coverage for C0 control character bypass vectors ( 🤖 Prompt for AI AgentsHardening Notes
Comment |
| // per the WHATWG URL spec, then lowercases the result for prefix comparison. | ||
| // This ensures that obfuscated URIs like "java\tscript:" or "java\nscript:" | ||
| // are correctly identified as dangerous schemes. | ||
| func normalizeURIScheme(val string) string { |
There was a problem hiding this comment.
🔴 XSS bypass via leading/trailing C0 control characters in javascript: URIs (CWE-79) — The normalizeURIScheme function only strips tab (0x09), LF (0x0A), and CR (0x0D) from anywhere in the URL, then uses strings.TrimSpace to remove leading/trailing whitespace. However, the WHATWG URL spec requires browsers to strip ALL C0 control characters (0x00-0x1F) from leading and trailing positions before parsing. This includes NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), and other control characters. An attacker can prepend these characters to bypass detection.
Attack Example
<a href="\x00javascript:alert(document.cookie)">click</a> — Browser strips leading NULL byte and executes the javascript: URI, but analyzer fails to detect it because \x00javascript: doesn't match the prefix check.
Suggested Fix
Extend normalizeURIScheme to strip ALL C0 control characters (0x00-0x1F, not just tab/LF/CR) from leading and trailing positions. Replace strings.TrimSpace with a custom trim that removes C0 controls: for c >= 0x00 && c <= 0x1F || c == ' ', strip from both ends.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/fuzz/analyzers/xss/analyzer.go` at line 420 in the normalizeURIScheme
function, replace the final `strings.TrimSpace(strings.ToLower(b.String()))`
line with a custom trim that removes ALL C0 control characters (0x00-0x1F) plus
space (0x20) from leading and trailing positions, per WHATWG URL spec section
4.3 which requires 'Remove any leading and trailing C0 control or space from
input'. Add a helper like `trimC0ControlsAndSpace(s string) string` that strips
runes where `c >= 0x00 && c <= 0x20` from both ends of the lowercased string.
| } | ||
| } | ||
|
|
||
| func TestNormalizeURIScheme(t *testing.T) { |
There was a problem hiding this comment.
🟡 Missing test coverage for C0 control character bypass vectors (CWE-1357) — The TestNormalizeURIScheme function tests tab, newline, and CR stripping, but does not test other C0 control characters like NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), bell (0x07), or backspace (0x08) in leading/trailing positions. These are stripped by browsers per WHATWG spec but not tested here, leaving the bypass vulnerability undetected.
Suggested Fix
Add test cases to TestNormalizeURIScheme for leading/trailing C0 control characters: {"leading NULL byte", "\x00javascript:alert(1)", "javascript:alert(1)"}, {"leading form feed", "\x0Cjavascript:alert(1)", "javascript:alert(1)"}, {"trailing NULL", "javascript:\x00", "javascript:"}, {"multiple C0 at start", "\x00\x0B\x0Cjavascript:x", "javascript:x"}. Also add integration tests to TestAnalyzeReflectionContext that verify these URIs are classified as ContextScript.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/fuzz/analyzers/xss/analyzer_test.go` at line 602 in
TestNormalizeURIScheme, add 5 new test cases after line 617 to verify
leading/trailing C0 control character stripping: (1) leading NULL byte \x00, (2)
leading form feed \x0C, (3) leading vertical tab \x0B, (4) trailing NULL byte,
(5) multiple C0 controls at start. Each should verify the normalized output
matches the expected javascript: scheme without the control characters. Also add
corresponding integration tests in TestAnalyzeReflectionContext around line 435
that verify `<a href="\x00javascript:alert('FUZZ1337MARKER')">` returns
ContextScript.
Summary
Fixes #7086 — XSS Context Analyzer misclassifies javascript: URIs and JSON script blocks.
/claim #7086
Changes
1. WHATWG-compliant URI scheme normalization (
normalizeURIScheme)Per the WHATWG URL spec, browsers strip ASCII tab (0x09), newline (0x0A), and carriage return (0x0D) from URL schemes before parsing. This means URIs like
java\tscript:alert(1)execute identically tojavascript:alert(1)in navigable contexts (<a href>,<iframe src>, etc.).The existing code used
strings.TrimSpace+strings.ToLowerwhich correctly handles leading whitespace and case, but did not strip embedded control characters. The newnormalizeURISchemefunction handles this per spec.2. Regression tests for all 4 edge cases from #7086
Added 19 new test cases explicitly labeled with
#7086:prefix:Plus a unit test for
normalizeURISchemewith 10 cases.Files changed
pkg/fuzz/analyzers/xss/analyzer.go— AddednormalizeURIScheme()+ use it inclassifyAttributeContextpkg/fuzz/analyzers/xss/analyzer_test.go— 19 regression tests for XSS Context Analyzer misclassifies javascript: URIs and JSON script blocks #7086 + 10 unit tests for URI normalizationProof
All 73 tests pass (54 existing + 19 new):
Checklist
devbranchgo vetpasses cleanlySummary by CodeRabbit
Bug Fixes
Tests