fix(fuzz): harden XSS context analyzer edge cases (#7086) #7279

Closed
mendarb wants to merge 1 commit into projectdiscovery:dev from mendarb:fix/xss-context-edge-cases-7086
@mendarb mendarb commented Mar 20, 2026

Summary

Fixes #7086 — XSS Context Analyzer misclassifies javascript: URIs and JSON script blocks.

/claim #7086

Note: PR #7208 and other previous attempts were closed. This is a fresh, minimal implementation that builds on the existing tokenizer-based analyzer already on dev.

Changes

1. WHATWG-compliant URI scheme normalization (normalizeURIScheme)

Per the WHATWG URL spec, browsers remove ASCII tab (0x09), newline (0x0A), and carriage return (0x0D) from the entire URL input before parsing, which includes the scheme. This means URIs like java\tscript:alert(1) execute identically to javascript:alert(1) in navigable contexts (<a href>, <iframe src>, etc.).

The existing code used strings.TrimSpace + strings.ToLower, which handles leading whitespace and case but does not strip embedded control characters. The new normalizeURIScheme function strips them as the spec describes.
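
The normalization described above can be sketched as follows. This is a minimal reconstruction from the description, not the code in the PR: the function body and the hasJavaScriptScheme helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeURIScheme is a hypothetical reconstruction of the behaviour
// described in the PR text: drop ASCII tab, LF, and CR anywhere in the
// value, then lowercase and trim surrounding whitespace.
func normalizeURIScheme(val string) string {
	var b strings.Builder
	b.Grow(len(val))
	for _, r := range val {
		switch r {
		case '\t', '\n', '\r':
			continue // skip characters that browsers remove before parsing
		}
		b.WriteRune(r)
	}
	return strings.TrimSpace(strings.ToLower(b.String()))
}

// hasJavaScriptScheme is an illustrative helper (not from the PR) that
// applies the prefix check to the normalized value.
func hasJavaScriptScheme(val string) bool {
	return strings.HasPrefix(normalizeURIScheme(val), "javascript:")
}

func main() {
	inputs := []string{"java\tscript:alert(1)", " JavaScript:alert(1)", "https://example.com/"}
	for _, in := range inputs {
		fmt.Printf("%q -> javascript scheme: %v\n", in, hasJavaScriptScheme(in))
	}
}
```

Stripping before lowercasing or after gives the same result here, since the removed characters have no case; the order only matters for the final trim.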

2. Regression tests for all 4 edge cases from #7086

Added 19 new test cases, each explicitly labeled with a #7086: prefix:

| Edge case | Tests added |
|---|---|
| javascript: URI classification | 7 tests: tab/newline/CR in scheme, formaction, iframe src, object data |
| JSON script blocks (non-executable) | 5 tests: application/json, ld+json, charset params, mixed blocks, empty type |
| Case-insensitive reflection detection | 4 tests: body, script, attribute, comment contexts |
| srcdoc attribute context | 3 tests: basic, script injection, complex nested HTML |

Plus a unit test for normalizeURIScheme with 10 cases.
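
To illustrate the JSON script block row above, here is a rough sketch of how a script element's type attribute might be sorted into executable and data-only. The function name, the type list, and the exact-match rule are assumptions for illustration and are not taken from the analyzer; the list is a subset of the values browsers treat as JavaScript.

```go
package main

import (
	"fmt"
	"strings"
)

// executableScriptTypes is an illustrative subset of type attribute values
// that browsers run as script. An empty type counts as executable.
var executableScriptTypes = map[string]bool{
	"":                       true,
	"module":                 true,
	"text/javascript":        true,
	"application/javascript": true,
	"text/ecmascript":        true,
	"application/ecmascript": true,
}

// isExecutableScriptType (hypothetical name) trims ASCII whitespace,
// lowercases the value, and requires an exact match against the set.
// Anything else, such as application/json with or without parameters,
// is treated as a non-executable data block.
func isExecutableScriptType(typeAttr string) bool {
	t := strings.ToLower(strings.Trim(typeAttr, " \t\n\f\r"))
	return executableScriptTypes[t]
}

func main() {
	for _, t := range []string{"", "Text/JavaScript", "application/json", "application/ld+json; charset=utf-8"} {
		fmt.Printf("type=%q executable=%v\n", t, isExecutableScriptType(t))
	}
}
```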

Files changed

Proof

All 73 tests pass (54 existing + 19 new):

$ go test ./pkg/fuzz/analyzers/xss/... -v -count=1
--- PASS: TestAnalyzeReflectionContext (0.00s)
    --- PASS: #7086:_javascript:_URI_with_tab_in_scheme_(WHATWG_normalization) (0.00s)
    --- PASS: #7086:_javascript:_URI_with_newline_in_scheme (0.00s)
    --- PASS: #7086:_javascript:_URI_with_CR_in_scheme (0.00s)
    --- PASS: #7086:_javascript:_URI_with_mixed_whitespace_in_scheme (0.00s)
    --- PASS: #7086:_javascript:_URI_in_formaction_on_button (0.00s)
    --- PASS: #7086:_javascript:_URI_in_iframe_src (0.00s)
    --- PASS: #7086:_javascript:_URI_in_object_data_attr (0.00s)
    --- PASS: #7086:_script_type=application/json_is_ScriptData (0.00s)
    --- PASS: #7086:_script_type=application/ld+json_is_ScriptData (0.00s)
    --- PASS: #7086:_script_type=application/json_with_charset_param_is_ScriptData (0.00s)
    --- PASS: #7086:_mixed_JSON_and_executable_script_blocks (0.00s)
    --- PASS: #7086:_script_type_empty_string_is_executable (0.00s)
    --- PASS: #7086:_marker_fully_uppercased_in_body_still_detected (0.00s)
    --- PASS: #7086:_marker_with_random_casing_in_script (0.00s)
    --- PASS: #7086:_marker_case-insensitive_in_attribute_value (0.00s)
    --- PASS: #7086:_marker_case-insensitive_in_comment (0.00s)
    --- PASS: #7086:_srcdoc_on_iframe_is_HTMLBody_context (0.00s)
    --- PASS: #7086:_srcdoc_with_script_injection_is_HTMLBody (0.00s)
    --- PASS: #7086:_srcdoc_with_complex_nested_HTML (0.00s)
--- PASS: TestNormalizeURIScheme (0.00s)
PASS
ok  github.com/projectdiscovery/nuclei/v3/pkg/fuzz/analyzers/xss  0.818s

Checklist

  • PR created against dev branch
  • All existing tests still pass
  • Tests added that prove the fix is effective
  • go vet passes cleanly
  • Minimal diff — only touches the 2 XSS analyzer files

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced XSS analyzer to detect obfuscated JavaScript URIs with embedded whitespace characters (tabs, newlines, carriage returns), providing improved protection against evasion-based XSS attacks.
  • Tests

    • Significantly expanded test coverage for obfuscated scheme detection patterns, non-executable script MIME types, and case-insensitive marker detection across various contexts to ensure comprehensive validation.

… cases

Add WHATWG-compliant URI scheme normalization that strips ASCII tab,
newline, and carriage return characters before scheme detection, closing
a bypass where obfuscated URIs like "java\tscript:" would evade
classification as executable context.

Add 19 regression tests covering all 4 edge cases from projectdiscovery#7086:
1. javascript: URI classification with scheme obfuscation variants
2. JSON script block non-executable classification
3. Case-insensitive reflection detection across all contexts
4. srcdoc attribute nested HTML context handling

Fixes projectdiscovery#7086

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot commented Mar 20, 2026

Walkthrough

Enhanced XSS analyzer to detect obfuscated javascript: URI schemes by normalizing whitespace characters (tab, newline, carriage return) in URL attribute values before checking for dangerous scheme prefixes, improving classification accuracy for malicious URIs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| URI Scheme Normalization (pkg/fuzz/analyzers/xss/analyzer.go) | Added normalizeURIScheme function to collapse embedded whitespace characters in URI values. Updated URL-scheme detection logic for attributes in urlAttrs to use the new normalization function instead of basic string trimming and lowercasing, enabling detection of schemes obfuscated with tabs, newlines, or carriage returns. |
| Test Coverage for Scheme Normalization & Context Classification (pkg/fuzz/analyzers/xss/analyzer_test.go) | Added TestNormalizeURIScheme unit test validating whitespace collapsing and case normalization in javascript: schemes. Expanded TestAnalyzeReflectionContext with regression cases for issue #7086, including assertions for executable-context detection in javascript: URIs with mixed whitespace, executable sink attributes (formaction, iframe src, object data), non-executable JSON MIME types in script blocks, case-insensitive marker detection, and iframe srcdoc classification. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A scheme once hidden by \t and \n,
Now caught by normalization's keen sight,
Whitespace stripped, obfuscation breaks thin,
XSS detection shines ever more bright! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(fuzz): harden XSS context analyzer edge cases (#7086)' accurately describes the main change: addressing XSS context analyzer edge cases documented in issue #7086. |
| Linked Issues check | ✅ Passed | The PR fully implements all coding requirements from #7086: normalizeURIScheme handles obfuscated javascript: URIs [#7086], JSON script blocks with non-executable MIME types are classified as ContextScriptData [#7086], case-insensitive reflection detection [#7086], and srcdoc attribute context as HTMLBody [#7086]. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly scoped to the four objectives from #7086: URI scheme normalization, JSON script handling, case-insensitive reflection, and srcdoc context classification. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/fuzz/analyzers/xss/analyzer.go (1)

390-430: ⚠️ Potential issue | 🟡 Minor

strings.TrimSpace diverges from WHATWG URL preprocessing, causing both false negatives and false positives.

The helper function at line 430 uses strings.TrimSpace, which trims Unicode whitespace (including U+00A0 NBSP). However, WHATWG URL preprocessing removes only leading/trailing C0 controls (U+0000–U+001F) and U+0020. This creates two issues:

  • \x01javascript: will not be trimmed by strings.TrimSpace but should be per WHATWG, leaving the dangerous prefix undetected (false negative).
  • \u00A0javascript: will be trimmed by strings.TrimSpace but should not be per WHATWG, potentially causing a false positive.

Replace the final trim with explicit ASCII C0+space handling:

Suggested fix
-	return strings.TrimSpace(strings.ToLower(b.String()))
+	return strings.TrimFunc(strings.ToLower(b.String()), func(r rune) bool {
+		return r <= 0x20
+	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 390 - 430, the
normalizeURIScheme function removes tab/newline/carriage return characters from
anywhere in the value and then calls strings.TrimSpace, which diverges from
WHATWG preprocessing at the trim step. Keep the removal of tab, newline, and
carriage return (WHATWG removes these from anywhere in the input), but replace
the final strings.TrimSpace call with a trim of only leading and trailing ASCII
C0 controls (rune values U+0000–U+001F) and ASCII space (U+0020), for example
strings.TrimFunc with r <= 0x20 as in the suggested fix, and lowercase the
result for prefix checks, so that schemes like "\x01javascript:" get trimmed but
"\u00A0javascript:" do not.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c27cde5f-5831-48c4-9cf4-277f6b272165

📥 Commits

Reviewing files that changed from the base of the PR and between d5eafeb and 7745da0.

📒 Files selected for processing (2)
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/analyzer_test.go

@Mzack9999
Member

Thank you for your contribution. Issue #7086 was originally filed against PR #7076's XSS context analyzer implementation, which was not merged. We adopted a different implementation (PR #7164) for issue #5838, which handles XSS context analysis differently. Since #7086's reported edge cases were specific to the unmerged PR #7076's approach, the issue itself is invalid against the current codebase. Closing this PR accordingly.

@Mzack9999 Mzack9999 closed this Mar 20, 2026

neo-by-projectdiscovery-dev Bot commented Mar 20, 2026

Neo - PR Security Review

Critical: 1 · Medium: 1

Highlights

Critical (1)
  • XSS bypass via leading/trailing C0 control characters in javascript: URIs (pkg/fuzz/analyzers/xss/analyzer.go:420)
    The normalizeURIScheme function only strips tab (0x09), LF (0x0A), and CR (0x0D) from anywhere in the URL, then uses strings.TrimSpace to remove leading/trailing whitespace. However, the WHATWG URL spec requires browsers to strip ALL C0 control characters (0x00-0x1F) from leading and trailing positions before parsing. This includes NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), and other control characters. An attacker can prepend these characters to bypass detection.
Medium (1)
  • Missing test coverage for C0 control character bypass vectors (pkg/fuzz/analyzers/xss/analyzer_test.go:602)
    The TestNormalizeURIScheme function tests tab, newline, and CR stripping, but does not test other C0 control characters like NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), bell (0x07), or backspace (0x08) in leading/trailing positions. These are stripped by browsers per WHATWG spec but not tested here, leaving the bypass vulnerability undetected.
Security Impact

XSS bypass via leading/trailing C0 control characters in javascript: URIs (pkg/fuzz/analyzers/xss/analyzer.go:420):
Attacker can bypass XSS context detection by prepending C0 control characters like NULL byte to javascript: URIs. The analyzer will classify these as safe ContextHTMLAttributeURL instead of dangerous ContextScript, allowing XSS payloads to execute in browsers that strip the control characters per WHATWG spec.

Missing test coverage for C0 control character bypass vectors (pkg/fuzz/analyzers/xss/analyzer_test.go:602):
Missing test cases allow the C0 control character bypass vulnerability to remain undetected during development and code review, increasing the risk that attackers discover and exploit this XSS detection bypass in production.

Attack Examples

XSS bypass via leading/trailing C0 control characters in javascript: URIs (pkg/fuzz/analyzers/xss/analyzer.go:420):

<a href="\x00javascript:alert(document.cookie)">click</a> — Browser strips leading NULL byte and executes the javascript: URI, but analyzer fails to detect it because \x00javascript: doesn't match the prefix check.
Suggested Fixes

XSS bypass via leading/trailing C0 control characters in javascript: URIs (pkg/fuzz/analyzers/xss/analyzer.go:420):

Extend normalizeURIScheme to strip ALL C0 control characters (0x00-0x1F, not just tab/LF/CR) from leading and trailing positions. Replace strings.TrimSpace with a custom trim that removes C0 controls: for c >= 0x00 && c <= 0x1F || c == ' ', strip from both ends.

Missing test coverage for C0 control character bypass vectors (pkg/fuzz/analyzers/xss/analyzer_test.go:602):

Add test cases to TestNormalizeURIScheme for leading/trailing C0 control characters: {"leading NULL byte", "\x00javascript:alert(1)", "javascript:alert(1)"}, {"leading form feed", "\x0Cjavascript:alert(1)", "javascript:alert(1)"}, {"trailing NULL", "javascript:\x00", "javascript:"}, {"multiple C0 at start", "\x00\x0B\x0Cjavascript:x", "javascript:x"}. Also add integration tests to TestAnalyzeReflectionContext that verify these URIs are classified as ContextScript.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` at line 420 in the normalizeURIScheme
function, replace the final `strings.TrimSpace(strings.ToLower(b.String()))`
line with a custom trim that removes ALL C0 control characters (0x00-0x1F) plus
space (0x20) from leading and trailing positions, per WHATWG URL spec section
4.3 which requires 'Remove any leading and trailing C0 control or space from
input'. Add a helper like `trimC0ControlsAndSpace(s string) string` that strips
runes where `c >= 0x00 && c <= 0x20` from both ends of the lowercased string.

In `@pkg/fuzz/analyzers/xss/analyzer_test.go` at line 602 in
TestNormalizeURIScheme, add 5 new test cases after line 617 to verify
leading/trailing C0 control character stripping: (1) leading NULL byte \x00, (2)
leading form feed \x0C, (3) leading vertical tab \x0B, (4) trailing NULL byte,
(5) multiple C0 controls at start. Each should verify the normalized output
matches the expected javascript: scheme without the control characters. Also add
corresponding integration tests in TestAnalyzeReflectionContext around line 435
that verify `<a href="\x00javascript:alert('FUZZ1337MARKER')">` returns
ContextScript.
Hardening Notes
  • Consider adding detection for blob: URLs in executable contexts — <iframe src="blob:..."> can load attacker-controlled HTML if the blob URL is XSS-injectable
  • The srcdoc test at line 545 treats <iframe srcdoc="<script>alert(MARKER)</script>"> as HTMLBody context — consider documenting that nested document contexts are not recursively analyzed, as this may affect payload generation strategies
  • Add a test case for mixed-case dangerous MIME types like text/JavaScript or APPLICATION/json to verify case normalization is consistent across script type detection

// per the WHATWG URL spec, then lowercases the result for prefix comparison.
// This ensures that obfuscated URIs like "java\tscript:" or "java\nscript:"
// are correctly identified as dangerous schemes.
func normalizeURIScheme(val string) string {

🔴 XSS bypass via leading/trailing C0 control characters in javascript: URIs (CWE-79) — The normalizeURIScheme function only strips tab (0x09), LF (0x0A), and CR (0x0D) from anywhere in the URL, then uses strings.TrimSpace to remove leading/trailing whitespace. However, the WHATWG URL spec requires browsers to strip ALL C0 control characters (0x00-0x1F) from leading and trailing positions before parsing. This includes NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), and other control characters. An attacker can prepend these characters to bypass detection.

Attack Example
<a href="\x00javascript:alert(document.cookie)">click</a> — Browser strips leading NULL byte and executes the javascript: URI, but analyzer fails to detect it because \x00javascript: doesn't match the prefix check.
Suggested Fix
Extend normalizeURIScheme to strip ALL C0 control characters (0x00-0x1F, not just tab/LF/CR) from leading and trailing positions. Replace strings.TrimSpace with a custom trim that removes C0 controls: for c >= 0x00 && c <= 0x1F || c == ' ', strip from both ends.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` at line 420 in the normalizeURIScheme
function, replace the final `strings.TrimSpace(strings.ToLower(b.String()))`
line with a custom trim that removes ALL C0 control characters (0x00-0x1F) plus
space (0x20) from leading and trailing positions, per WHATWG URL spec section
4.3 which requires 'Remove any leading and trailing C0 control or space from
input'. Add a helper like `trimC0ControlsAndSpace(s string) string` that strips
runes where `c >= 0x00 && c <= 0x20` from both ends of the lowercased string.

}
}

func TestNormalizeURIScheme(t *testing.T) {

🟡 Missing test coverage for C0 control character bypass vectors (CWE-1357) — The TestNormalizeURIScheme function tests tab, newline, and CR stripping, but does not test other C0 control characters like NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), bell (0x07), or backspace (0x08) in leading/trailing positions. These are stripped by browsers per WHATWG spec but not tested here, leaving the bypass vulnerability undetected.

Suggested Fix
Add test cases to TestNormalizeURIScheme for leading/trailing C0 control characters: {"leading NULL byte", "\x00javascript:alert(1)", "javascript:alert(1)"}, {"leading form feed", "\x0Cjavascript:alert(1)", "javascript:alert(1)"}, {"trailing NULL", "javascript:\x00", "javascript:"}, {"multiple C0 at start", "\x00\x0B\x0Cjavascript:x", "javascript:x"}. Also add integration tests to TestAnalyzeReflectionContext that verify these URIs are classified as ContextScript.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer_test.go` at line 602 in
TestNormalizeURIScheme, add 5 new test cases after line 617 to verify
leading/trailing C0 control character stripping: (1) leading NULL byte \x00, (2)
leading form feed \x0C, (3) leading vertical tab \x0B, (4) trailing NULL byte,
(5) multiple C0 controls at start. Each should verify the normalized output
matches the expected javascript: scheme without the control characters. Also add
corresponding integration tests in TestAnalyzeReflectionContext around line 435
that verify `<a href="\x00javascript:alert('FUZZ1337MARKER')">` returns
ContextScript.
