fix(fuzz): harden XSS context analyzer edge cases (#7086) by mendarb · Pull Request #7279 · projectdiscovery/nuclei

mendarb · 2026-03-20T15:25:59Z

Summary

Fixes #7086 — XSS Context Analyzer misclassifies javascript: URIs and JSON script blocks.

Note: PR #7208 and other previous attempts were closed. This is a fresh, minimal implementation that builds on the existing tokenizer-based analyzer already on dev.

Changes

1. WHATWG-compliant URI scheme normalization (normalizeURIScheme)

Per the WHATWG URL spec, browsers remove ASCII tab (0x09), newline (0x0A), and carriage return (0x0D) from anywhere in a URL string before parsing it, including inside the scheme. As a result, a URI like java\tscript:alert(1) behaves identically to javascript:alert(1) in navigable contexts (<a href>, <iframe src>, etc.).

The existing code used strings.TrimSpace + strings.ToLower, which correctly handles leading whitespace and case but does not strip embedded tab, newline, or carriage return characters. The new normalizeURIScheme function removes those characters before the scheme prefix check.
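The normalization described above can be sketched roughly as follows. This is a reconstruction from the PR description and the review comments, not the code from the diff; the byte-wise loop and the sample inputs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeURIScheme is a sketch reconstructed from the PR description,
// not the actual diff. It drops ASCII tab, LF, and CR anywhere in the
// value, lowercases the rest, and trims surrounding whitespace.
func normalizeURIScheme(val string) string {
	var b strings.Builder
	b.Grow(len(val))
	for i := 0; i < len(val); i++ {
		switch c := val[i]; c {
		case '\t', '\n', '\r':
			// Skipped: these bytes are removed wherever they appear.
		default:
			b.WriteByte(c)
		}
	}
	return strings.TrimSpace(strings.ToLower(b.String()))
}

func main() {
	inputs := []string{
		"java\tscript:alert(1)",
		"  JaVa\r\nScRiPt:alert(1)",
		"https://example.com/",
	}
	for _, in := range inputs {
		out := normalizeURIScheme(in)
		fmt.Printf("%q -> %q (javascript scheme: %v)\n",
			in, out, strings.HasPrefix(out, "javascript:"))
	}
}
```

Working on bytes rather than runes is safe here because the three removed characters are single-byte ASCII, so multi-byte UTF-8 sequences pass through unchanged.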

2. Regression tests for all 4 edge cases from #7086

Added 19 new test cases, each labeled with a #7086: prefix:

Edge case	Tests added
javascript: URI classification	7 tests — tab/newline/CR in scheme, formaction, iframe src, object data
JSON script blocks (non-executable)	5 tests — application/json, ld+json, charset params, mixed blocks, empty type
Case-insensitive reflection detection	4 tests — body, script, attribute, comment contexts
srcdoc attribute context	3 tests — basic, script injection, complex nested HTML

Plus a unit test for normalizeURIScheme with 10 cases.

Files changed

pkg/fuzz/analyzers/xss/analyzer.go — Added normalizeURIScheme() + use it in classifyAttributeContext
pkg/fuzz/analyzers/xss/analyzer_test.go — 19 regression tests for #7086 + 10 unit tests for URI normalization

Proof

All 73 tests pass (54 existing + 19 new):

$ go test ./pkg/fuzz/analyzers/xss/... -v -count=1
--- PASS: TestAnalyzeReflectionContext (0.00s)
    --- PASS: #7086:_javascript:_URI_with_tab_in_scheme_(WHATWG_normalization) (0.00s)
    --- PASS: #7086:_javascript:_URI_with_newline_in_scheme (0.00s)
    --- PASS: #7086:_javascript:_URI_with_CR_in_scheme (0.00s)
    --- PASS: #7086:_javascript:_URI_with_mixed_whitespace_in_scheme (0.00s)
    --- PASS: #7086:_javascript:_URI_in_formaction_on_button (0.00s)
    --- PASS: #7086:_javascript:_URI_in_iframe_src (0.00s)
    --- PASS: #7086:_javascript:_URI_in_object_data_attr (0.00s)
    --- PASS: #7086:_script_type=application/json_is_ScriptData (0.00s)
    --- PASS: #7086:_script_type=application/ld+json_is_ScriptData (0.00s)
    --- PASS: #7086:_script_type=application/json_with_charset_param_is_ScriptData (0.00s)
    --- PASS: #7086:_mixed_JSON_and_executable_script_blocks (0.00s)
    --- PASS: #7086:_script_type_empty_string_is_executable (0.00s)
    --- PASS: #7086:_marker_fully_uppercased_in_body_still_detected (0.00s)
    --- PASS: #7086:_marker_with_random_casing_in_script (0.00s)
    --- PASS: #7086:_marker_case-insensitive_in_attribute_value (0.00s)
    --- PASS: #7086:_marker_case-insensitive_in_comment (0.00s)
    --- PASS: #7086:_srcdoc_on_iframe_is_HTMLBody_context (0.00s)
    --- PASS: #7086:_srcdoc_with_script_injection_is_HTMLBody (0.00s)
    --- PASS: #7086:_srcdoc_with_complex_nested_HTML (0.00s)
--- PASS: TestNormalizeURIScheme (0.00s)
PASS
ok  github.com/projectdiscovery/nuclei/v3/pkg/fuzz/analyzers/xss  0.818s

Checklist

PR created against dev branch
All existing tests still pass
Tests added that prove the fix is effective
go vet passes cleanly
Minimal diff — only touches the 2 XSS analyzer files

Summary by CodeRabbit

Bug Fixes
- Enhanced XSS analyzer to detect obfuscated JavaScript URIs with embedded whitespace characters (tabs, newlines, carriage returns), providing improved protection against evasion-based XSS attacks.
Tests
- Significantly expanded test coverage for obfuscated scheme detection patterns, non-executable script MIME types, and case-insensitive marker detection across various contexts to ensure comprehensive validation.

… cases

Add WHATWG-compliant URI scheme normalization that strips ASCII tab, newline, and carriage return characters before scheme detection, closing a bypass where obfuscated URIs like "java\tscript:" would evade classification as executable context.

Add 19 regression tests covering all 4 edge cases from projectdiscovery#7086:
1. javascript: URI classification with scheme obfuscation variants
2. JSON script block non-executable classification
3. Case-insensitive reflection detection across all contexts
4. srcdoc attribute nested HTML context handling

Fixes projectdiscovery#7086

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-03-20T15:26:17Z

Walkthrough

Enhanced XSS analyzer to detect obfuscated javascript: URI schemes by normalizing whitespace characters (tab, newline, carriage return) in URL attribute values before checking for dangerous scheme prefixes, improving classification accuracy for malicious URIs.

Changes

Cohort / File(s)	Summary
URI Scheme Normalization `pkg/fuzz/analyzers/xss/analyzer.go`	Added `normalizeURIScheme` function to collapse embedded whitespace characters in URI values. Updated URL-scheme detection logic for attributes in `urlAttrs` to use the new normalization function instead of basic string trimming and lowercasing, enabling detection of schemes obfuscated with tabs, newlines, or carriage returns.
Test Coverage for Scheme Normalization & Context Classification `pkg/fuzz/analyzers/xss/analyzer_test.go`	Added `TestNormalizeURIScheme` unit test validating whitespace collapsing and case normalization in `javascript:` schemes. Expanded `TestAnalyzeReflectionContext` with regression cases for issue `#7086`, including assertions for executable-context detection in `javascript:` URIs with mixed whitespace, executable sink attributes (`formaction`, `iframe src`, `object data`), non-executable JSON MIME types in script blocks, case-insensitive marker detection, and `iframe srcdoc` classification.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A scheme once hidden by \t and \n,
Now caught by normalization's keen sight,
Whitespace stripped, obfuscation breaks thin,
XSS detection shines ever more bright! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(fuzz): harden XSS context analyzer edge cases (`#7086`)' accurately describes the main change: addressing XSS context analyzer edge cases documented in issue `#7086`.
Linked Issues check	✅ Passed	The PR fully implements all coding requirements from `#7086`: normalizeURIScheme handles obfuscated javascript: URIs [`#7086`], JSON script blocks with non-executable MIME types are classified as ContextScriptData [`#7086`], case-insensitive reflection detection [`#7086`], and srcdoc attribute context as HTMLBody [`#7086`].
Out of Scope Changes check	✅ Passed	All code changes are directly scoped to the four objectives from `#7086`: URI scheme normalization, JSON script handling, case-insensitive reflection, and srcdoc context classification.


coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/fuzz/analyzers/xss/analyzer.go (1)
390-430: ⚠️ Potential issue | 🟡 Minor

strings.TrimSpace diverges from WHATWG URL preprocessing, causing both false negatives and false positives.

The helper function at line 430 uses strings.TrimSpace, which trims Unicode whitespace (including U+00A0 NBSP). However, WHATWG URL preprocessing removes only leading/trailing C0 controls (U+0000–U+001F) and U+0020. This creates two issues:

\x01javascript: will not be trimmed by strings.TrimSpace but should be per WHATWG, leaving the dangerous prefix undetected (false negative).

\u00A0javascript: will be trimmed by strings.TrimSpace but should not be per WHATWG, potentially causing a false positive.

Replace the final trim with explicit ASCII C0+space handling:
Suggested fix
-	return strings.TrimSpace(strings.ToLower(b.String()))
+	return strings.TrimFunc(strings.ToLower(b.String()), func(r rune) bool {
+		return r <= 0x20
+	})
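The difference between the two trims can be checked with a small standalone program. This is only an illustration of the standard library behavior the comment relies on; the helper name trimBoth and the sample inputs are made up for the example and do not appear in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// trimBoth is a hypothetical helper that returns the input trimmed two ways:
// with strings.TrimSpace (Unicode whitespace) and with a TrimFunc that removes
// only runes in U+0000..U+0020 (C0 controls and ASCII space).
func trimBoth(in string) (unicodeTrim, c0Trim string) {
	unicodeTrim = strings.TrimSpace(in)
	c0Trim = strings.TrimFunc(in, func(r rune) bool { return r <= 0x20 })
	return unicodeTrim, c0Trim
}

func main() {
	for _, in := range []string{"\x01javascript:alert(1)", "\u00a0javascript:alert(1)"} {
		u, c := trimBoth(in)
		fmt.Printf("input %q\n  TrimSpace: %q\n  TrimFunc:  %q\n", in, u, c)
	}
}
```

With the U+0001 prefix, only the TrimFunc variant removes the leading byte; with the NBSP prefix, only strings.TrimSpace does.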
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 390 - 430, The
normalizeURIScheme function currently removes tab/newline/carriage returns from
anywhere and then calls strings.TrimSpace, which diverges from WHATWG
preprocessing at the ends of the string. Keep the removal of tab, newline, and
carriage return (WHATWG removes these anywhere in the input), but replace the
strings.TrimSpace call with a trim that removes only leading and trailing ASCII
C0 controls (rune values U+0000–U+001F) and ASCII space (U+0020), for example
strings.TrimFunc with r <= 0x20 as in the suggested fix, applied to the
lowercased string, so that schemes like "\x01javascript:" get trimmed but
"\u00A0javascript:" do not.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c27cde5f-5831-48c4-9cf4-277f6b272165

📥 Commits

Reviewing files that changed from the base of the PR and between d5eafeb and 7745da0.

📒 Files selected for processing (2)

pkg/fuzz/analyzers/xss/analyzer.go
pkg/fuzz/analyzers/xss/analyzer_test.go

Mzack9999 · 2026-03-20T16:07:12Z

Thank you for your contribution. Issue #7086 was originally filed against PR #7076's XSS context analyzer implementation, which was not merged. We adopted a different implementation (PR #7164) for issue #5838, which handles XSS context analysis differently. Since #7086's reported edge cases were specific to the unmerged PR #7076's approach, the issue itself is invalid against the current codebase. Closing this PR accordingly.

neo-by-projectdiscovery-dev · 2026-03-20T17:29:54Z

Neo - PR Security Review

Critical: 1 · Medium: 1

Highlights

Adds WHATWG-compliant URI normalization to detect obfuscated javascript: URIs with embedded tab/newline/CR characters
Comprehensive test coverage with 19 new regression tests for issue #7086 edge cases
Correctly distinguishes executable vs non-executable script blocks (JSON vs JavaScript MIME types)

Critical (1)

XSS bypass via leading/trailing C0 control characters in javascript: URIs — pkg/fuzz/analyzers/xss/analyzer.go:420
The normalizeURIScheme function only strips tab (0x09), LF (0x0A), and CR (0x0D) from anywhere in the URL, then uses strings.TrimSpace to remove leading/trailing whitespace. However, the WHATWG URL spec requires browsers to strip ALL C0 control characters (0x00-0x1F) from leading and trailing positions before parsing. This includes NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), and other control characters. An attacker can prepend these characters to bypass detection.

Medium (1)

Missing test coverage for C0 control character bypass vectors — pkg/fuzz/analyzers/xss/analyzer_test.go:602
The TestNormalizeURIScheme function tests tab, newline, and CR stripping, but does not test other C0 control characters like NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), bell (0x07), or backspace (0x08) in leading/trailing positions. These are stripped by browsers per WHATWG spec but not tested here, leaving the bypass vulnerability undetected.

Security Impact

XSS bypass via leading/trailing C0 control characters in javascript: URIs (pkg/fuzz/analyzers/xss/analyzer.go:420):
Attacker can bypass XSS context detection by prepending C0 control characters like NULL byte to javascript: URIs. The analyzer will classify these as safe ContextHTMLAttributeURL instead of dangerous ContextScript, allowing XSS payloads to execute in browsers that strip the control characters per WHATWG spec.

Missing test coverage for C0 control character bypass vectors (pkg/fuzz/analyzers/xss/analyzer_test.go:602):
Missing test cases allow the C0 control character bypass vulnerability to remain undetected during development and code review, increasing the risk that attackers discover and exploit this XSS detection bypass in production.

Attack Examples

XSS bypass via leading/trailing C0 control characters in javascript: URIs (pkg/fuzz/analyzers/xss/analyzer.go:420):

<a href="\x00javascript:alert(document.cookie)">click</a> — Browser strips leading NULL byte and executes the javascript: URI, but analyzer fails to detect it because \x00javascript: doesn't match the prefix check.

Suggested Fixes

XSS bypass via leading/trailing C0 control characters in javascript: URIs (pkg/fuzz/analyzers/xss/analyzer.go:420):

Extend normalizeURIScheme to strip ALL C0 control characters (0x00-0x1F, not just tab/LF/CR) from leading and trailing positions. Replace strings.TrimSpace with a custom trim that removes C0 controls: for c >= 0x00 && c <= 0x1F || c == ' ', strip from both ends.

Missing test coverage for C0 control character bypass vectors (pkg/fuzz/analyzers/xss/analyzer_test.go:602):

Add test cases to TestNormalizeURIScheme for leading/trailing C0 control characters: {"leading NULL byte", "\x00javascript:alert(1)", "javascript:alert(1)"}, {"leading form feed", "\x0Cjavascript:alert(1)", "javascript:alert(1)"}, {"trailing NULL", "javascript:\x00", "javascript:"}, {"multiple C0 at start", "\x00\x0B\x0Cjavascript:x", "javascript:x"}. Also add integration tests to TestAnalyzeReflectionContext that verify these URIs are classified as ContextScript.
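The two suggested fixes can be combined in a short sketch. The function name normalizeURISchemeC0 and the helper isC0OrSpace are hypothetical and not part of the PR; the four table entries are the ones listed in the suggestion above. It assumes the intended behavior is to trim C0 controls and space at both ends, remove tab/LF/CR anywhere, and lowercase.

```go
package main

import (
	"fmt"
	"strings"
)

// isC0OrSpace reports whether c is a C0 control (0x00-0x1F) or space (0x20).
func isC0OrSpace(c byte) bool { return c <= 0x20 }

// normalizeURISchemeC0 is a hypothetical variant of normalizeURIScheme:
// trim C0 controls and space at both ends, drop tab/LF/CR anywhere, lowercase.
func normalizeURISchemeC0(val string) string {
	start, end := 0, len(val)
	for start < end && isC0OrSpace(val[start]) {
		start++
	}
	for end > start && isC0OrSpace(val[end-1]) {
		end--
	}
	var b strings.Builder
	for i := start; i < end; i++ {
		if c := val[i]; c != '\t' && c != '\n' && c != '\r' {
			b.WriteByte(c)
		}
	}
	return strings.ToLower(b.String())
}

type schemeCase struct {
	name, in, want string
}

// c0Cases holds the cases proposed in the review comment.
var c0Cases = []schemeCase{
	{"leading NULL byte", "\x00javascript:alert(1)", "javascript:alert(1)"},
	{"leading form feed", "\x0Cjavascript:alert(1)", "javascript:alert(1)"},
	{"trailing NULL", "javascript:\x00", "javascript:"},
	{"multiple C0 at start", "\x00\x0B\x0Cjavascript:x", "javascript:x"},
}

// failedCases returns the names of cases whose output differs from want.
func failedCases() []string {
	var failed []string
	for _, tc := range c0Cases {
		if got := normalizeURISchemeC0(tc.in); got != tc.want {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", failedCases())
}
```

In the real test file these entries would more likely sit in the existing table inside TestNormalizeURIScheme; a standalone main is used here only to keep the sketch self-contained.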

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` at line 420 in the normalizeURIScheme
function, replace the final `strings.TrimSpace(strings.ToLower(b.String()))`
line with a custom trim that removes ALL C0 control characters (0x00-0x1F) plus
space (0x20) from leading and trailing positions, per WHATWG URL spec section
4.3 which requires 'Remove any leading and trailing C0 control or space from
input'. Add a helper like `trimC0ControlsAndSpace(s string) string` that strips
runes where `c >= 0x00 && c <= 0x20` from both ends of the lowercased string.

In `@pkg/fuzz/analyzers/xss/analyzer_test.go` at line 602 in
TestNormalizeURIScheme, add 5 new test cases after line 617 to verify
leading/trailing C0 control character stripping: (1) leading NULL byte \x00, (2)
leading form feed \x0C, (3) leading vertical tab \x0B, (4) trailing NULL byte,
(5) multiple C0 controls at start. Each should verify the normalized output
matches the expected javascript: scheme without the control characters. Also add
corresponding integration tests in TestAnalyzeReflectionContext around line 435
that verify `<a href="\x00javascript:alert('FUZZ1337MARKER')">` returns
ContextScript.

Hardening Notes

Consider adding detection for blob: URLs in executable contexts — <iframe src="blob:..."> can load attacker-controlled HTML if the blob URL is XSS-injectable
The srcdoc test at line 545 treats <iframe srcdoc="<script>alert(MARKER)</script>"> as HTMLBody context — consider documenting that nested document contexts are not recursively analyzed, as this may affect payload generation strategies
Add a test case for mixed-case dangerous MIME types like text/JavaScript or APPLICATION/json to verify case normalization is consistent across script type detection
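The mixed-case MIME type note can be illustrated with a small classifier. This is a sketch under stated assumptions, not the analyzer's implementation: isExecutableScriptType and jsMIMETypes are hypothetical names, the MIME list is an illustrative subset, and the rule assumed is an exact, case-insensitive match after trimming ASCII whitespace, with an empty type treated as executable as in the regression tests.

```go
package main

import (
	"fmt"
	"strings"
)

// jsMIMETypes is an illustrative subset of JavaScript MIME types,
// not an exhaustive list.
var jsMIMETypes = map[string]bool{
	"text/javascript":        true,
	"application/javascript": true,
	"text/ecmascript":        true,
	"application/ecmascript": true,
}

// isExecutableScriptType is a hypothetical helper: it reports whether a
// <script type="..."> value should be treated as executable. An empty type
// counts as executable; anything that is not "module" or a listed JavaScript
// MIME type (compared case-insensitively) is treated as a data block.
func isExecutableScriptType(typeAttr string) bool {
	if typeAttr == "" {
		return true
	}
	t := strings.ToLower(strings.Trim(typeAttr, " \t\n\f\r"))
	return t == "module" || jsMIMETypes[t]
}

func main() {
	for _, v := range []string{
		"", "text/JavaScript", "APPLICATION/json",
		"application/ld+json", "application/json; charset=utf-8",
	} {
		fmt.Printf("type=%q executable=%v\n", v, isExecutableScriptType(v))
	}
}
```

Lowercasing before the lookup is what keeps text/JavaScript executable and APPLICATION/json non-executable, which is the consistency the note asks the tests to pin down.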


neo-by-projectdiscovery-dev · 2026-03-20T17:43:28Z

+// per the WHATWG URL spec, then lowercases the result for prefix comparison.
+// This ensures that obfuscated URIs like "java\tscript:" or "java\nscript:"
+// are correctly identified as dangerous schemes.
+func normalizeURIScheme(val string) string {


🔴 XSS bypass via leading/trailing C0 control characters in javascript: URIs (CWE-79) — The normalizeURIScheme function only strips tab (0x09), LF (0x0A), and CR (0x0D) from anywhere in the URL, then uses strings.TrimSpace to remove leading/trailing whitespace. However, the WHATWG URL spec requires browsers to strip ALL C0 control characters (0x00-0x1F) from leading and trailing positions before parsing. This includes NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), and other control characters. An attacker can prepend these characters to bypass detection.

Attack Example

<a href="\x00javascript:alert(document.cookie)">click</a> — Browser strips leading NULL byte and executes the javascript: URI, but analyzer fails to detect it because \x00javascript: doesn't match the prefix check.

Suggested Fix

Extend normalizeURIScheme to strip ALL C0 control characters (0x00-0x1F, not just tab/LF/CR) from leading and trailing positions. Replace strings.TrimSpace with a custom trim that removes C0 controls: for c >= 0x00 && c <= 0x1F || c == ' ', strip from both ends.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/analyzer.go` at line 420 in the normalizeURIScheme function, replace the final `strings.TrimSpace(strings.ToLower(b.String()))` line with a custom trim that removes ALL C0 control characters (0x00-0x1F) plus space (0x20) from leading and trailing positions, per WHATWG URL spec section 4.3 which requires 'Remove any leading and trailing C0 control or space from input'. Add a helper like `trimC0ControlsAndSpace(s string) string` that strips runes where `c >= 0x00 && c <= 0x20` from both ends of the lowercased string.

neo-by-projectdiscovery-dev · 2026-03-20T17:43:28Z

 	}
 }

+func TestNormalizeURIScheme(t *testing.T) {


🟡 Missing test coverage for C0 control character bypass vectors (CWE-1357) — The TestNormalizeURIScheme function tests tab, newline, and CR stripping, but does not test other C0 control characters like NULL byte (0x00), form feed (0x0C), vertical tab (0x0B), bell (0x07), or backspace (0x08) in leading/trailing positions. These are stripped by browsers per WHATWG spec but not tested here, leaving the bypass vulnerability undetected.

Suggested Fix

Add test cases to TestNormalizeURIScheme for leading/trailing C0 control characters: {"leading NULL byte", "\x00javascript:alert(1)", "javascript:alert(1)"}, {"leading form feed", "\x0Cjavascript:alert(1)", "javascript:alert(1)"}, {"trailing NULL", "javascript:\x00", "javascript:"}, {"multiple C0 at start", "\x00\x0B\x0Cjavascript:x", "javascript:x"}. Also add integration tests to TestAnalyzeReflectionContext that verify these URIs are classified as ContextScript.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/analyzer_test.go` at line 602 in TestNormalizeURIScheme, add 5 new test cases after line 617 to verify leading/trailing C0 control character stripping: (1) leading NULL byte \x00, (2) leading form feed \x0C, (3) leading vertical tab \x0B, (4) trailing NULL byte, (5) multiple C0 controls at start. Each should verify the normalized output matches the expected javascript: scheme without the control characters. Also add corresponding integration tests in TestAnalyzeReflectionContext around line 435 that verify `<a href="\x00javascript:alert('FUZZ1337MARKER')">` returns ContextScript.

algora-pbc Bot added the 🙋 Bounty claim label Mar 20, 2026

auto-assign Bot requested a review from Mzack9999 March 20, 2026 15:26

mendarb mentioned this pull request Mar 20, 2026

XSS Context Analyzer misclassifies javascript: URIs and JSON script blocks #7086

Closed

1 task

coderabbitai Bot reviewed Mar 20, 2026


Mzack9999 closed this Mar 20, 2026

neo-by-projectdiscovery-dev Bot reviewed Mar 20, 2026


mendarb mentioned this pull request Mar 20, 2026

fix(xss): strip all C0 control chars from URI schemes per WHATWG spec #7280

Open

3 tasks

mendarb wants to merge 1 commit into projectdiscovery:dev from mendarb:fix/xss-context-edge-cases-7086
