close
Skip to content

fix(xss): fix 4 context classification edge cases (#7086)#7208

Closed
Gengyscan wants to merge 12 commits into
projectdiscovery:devfrom
Gengyscan:fix/xss-context-edge-cases
Closed

fix(xss): fix 4 context classification edge cases (#7086)#7208
Gengyscan wants to merge 12 commits into
projectdiscovery:devfrom
Gengyscan:fix/xss-context-edge-cases

Conversation

@Gengyscan
Copy link
Copy Markdown
Contributor

@Gengyscan Gengyscan commented Mar 13, 2026

Fix: 4 XSS Context Analyzer Edge Cases

Fixes #7086

Changes

Based on the XSS context analyzer from #7076, this PR fixes all 4 misclassification edge cases:

1. javascript: URIs in href/action/formaction

Attribute values starting with javascript: are now classified as ContextScript instead of ContextAttribute.

2. JSON script blocks

<script type="application/json"> and similar non-executable types are no longer flagged as ContextScript. Added isExecutableScriptType() helper that defers the inScript decision until after reading the ype attribute. #### 3. Case-insensitive marker detection The fast-path exit (strings.Contains) now uses strings.ToLower for both body and marker, preventing missed reflections when case differs. #### 4. srcdoc attribute context srcdoc attributes are classified as ContextHTMLText since they contain nested HTML that gets parsed by the browser. ### Testing - 4 new test cases in context_test.go, one per edge case - All existing tests preserved ### Files - pkg/fuzz/analyzers/xss/context.go — Core fixes in DetectReflections tokenizer - pkg/fuzz/analyzers/xss/context_test.go — 4 new tests - pkg/fuzz/analyzers/xss/analyzer.go — Analyzer integration (from #7076) - pkg/fuzz/analyzers/xss/types.go — Type definitions (from #7076) - pkg/fuzz/analyzers/analyzers.go — Analyzer registry (from #7076) - pkg/protocols/http/*.go — HTTP protocol integration (from #7076) ## Summary by CodeRabbit * **New Features** * Added an XSS context analyzer ("xss_context") that detects reflected markers across HTML (tags, attributes, scripts, comments) and replays context‑appropriate payloads to confirm issues. * **Improvements** * Analyzer now records response body, headers, and status for richer results. * Improved thread‑safety for random utilities used during concurrent fuzzing. * Ensured analyzer parameters are initialized to avoid nil‑map issues. * Enhanced request/fuzz integration and payload handling. * **Tests** * Added extensive tests and benchmarks for XSS detection and context handling.

@auto-assign auto-assign Bot requested a review from dwisiswant0 March 13, 2026 18:48
@neo-by-projectdiscovery-dev
Copy link
Copy Markdown

neo-by-projectdiscovery-dev Bot commented Mar 13, 2026

Neo - PR Security Review

No security issues found

Highlights

  • Commit abf46b5 contains code cleanup/refactoring (net -17 lines across 3 files)
  • All 4 XSS context analyzer edge case fixes verified intact
  • Both critical security fixes remain properly implemented
Hardening Notes
  • Consider adding text/xml and application/xml to the dangerous data URI MIME types list in pkg/fuzz/analyzers/xss/context.go line 95 for defense in depth, as these may be rendered as XHTML in some browser contexts

Comment @pdneo help for available commands. · Open in Neo

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 13, 2026

Walkthrough

Adds a new XSS context analyzer (canary‑based reflection detection, context classification, payload selection, replay verification) with extensive tests and benchmarks; introduces thread‑safe random utilities (exported RandStringBytesMask, mutex‑guarded GetRandomInteger) and new response fields on Options; registers the analyzer via side‑effect import and guards analyzer parameter map initialization.

Changes

Cohort / File(s) Summary
Analyzer core & utilities
pkg/fuzz/analyzers/analyzers.go
Added package-level randomMu (sync.Mutex), mutex‑guarded random helpers (GetRandomInteger, exported RandStringBytesMask), replaced private random usage, and added new Options fields: ResponseBody, ResponseHeaders, ResponseStatusCode.
XSS analyzer implementation
pkg/fuzz/analyzers/xss/analyzer.go
New Analyzer type registered as "xss_context" with ApplyInitialTransformation, Analyze, canary generation, character survival detection, context‑aware payload selection, and replayAndVerify.
HTML reflection tokenizer/classifier
pkg/fuzz/analyzers/xss/context.go
New DetectReflections(body, marker) and helpers to classify reflections across tag names, attributes (quoted/unquoted, event handlers, javascript: URIs, srcdoc), scripts/styles/comments/RCDATA; includes BestReflection.
XSS types & helpers
pkg/fuzz/analyzers/xss/types.go
Introduced Context enum (String()/priority()), ReflectionInfo, CharacterSet, event handler/script‑attribute maps, isEventHandler/isScriptURLAttribute, and normalizeURIScheme.
XSS tests & benchmarks
pkg/fuzz/analyzers/xss/context_test.go
Extensive unit tests and benchmarks covering reflection detection, script‑string contexts, payload selection, character survival, HTML/CSP checks, edge cases, and performance.
HTTP integration
pkg/protocols/http/http.go
Added side‑effect import _ "github.com/projectdiscovery/nuclei/v3/pkg/fuzz/analyzers/xss" to register the analyzer and integrated analyzer usage into request flow.
Request fuzzing init guard
pkg/protocols/http/request_fuzz.go
Ensure request.Analyzer.Parameters map is initialized when applying an analyzer to avoid nil‑map assignments.
Module manifest
go.mod
Module file updated to include new analyzer package dependencies.

Sequence Diagram

sequenceDiagram
    participant Client
    participant FuzzEngine as Fuzz Engine
    participant Analyzer as XSS Analyzer
    participant HTMLParser as HTML Parser
    participant Target as Target Server

    Client->>FuzzEngine: start fuzz run
    FuzzEngine->>Analyzer: ApplyInitialTransformation([XSS_CANARY])
    Analyzer->>Analyzer: generateCanary(), store in params
    FuzzEngine->>Target: send fuzzed request
    Target->>FuzzEngine: response (body, headers)
    FuzzEngine->>Analyzer: Analyze(options with response)
    Analyzer->>HTMLParser: DetectReflections(body, canary)
    HTMLParser-->>Analyzer: []ReflectionInfo
    Analyzer->>Analyzer: BestReflection + detectCharacterSurvival
    Analyzer->>Analyzer: selectPayloads(reflection, charset)
    loop for each payload
        Analyzer->>Target: replayAndVerify(payload)
        Target->>Analyzer: response
        Analyzer->>Analyzer: verify unencoded reflection, check CSP
        alt confirmed
            Analyzer->>FuzzEngine: report positive result (details)
        end
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐇 I nibble canaries in HTML light,
mutex‑safe hops keep randomness bright,
tokens and tags twirl through the night,
payloads replay until reflections bite,
a rabbit cheers — fuzzing takes flight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the four context classification edge cases documented in issue #7086 (javascript: URIs, non-executable script types, case-sensitive reflection detection, and srcdoc handling).
Linked Issues check ✅ Passed The PR implements all four coding objectives from #7086: javascript: URI script context classification [#7086], non-executable script type detection [#7086], case-insensitive reflection detection [#7086], and srcdoc as HTML context [#7086]. Additional enhancements align with security review findings.
Out of Scope Changes check ✅ Passed All changes directly support the four edge-case fixes from #7086 and their integration. HTTP module imports and request fuzzing guards enable the analyzer registration and proper parameter initialization without introducing unrelated functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/fuzz/analyzers/xss/context.go (1)

299-312: Pointer aliasing: BestReflection returns pointer to slice element.

The function returns &reflections[i] which points into the original slice. If the caller modifies the slice or it gets garbage collected, this pointer may become invalid. Since reflections is typically returned from DetectReflections and used immediately, this is likely safe in practice.

♻️ Safer alternative returning a copy
 func BestReflection(reflections []ReflectionInfo) *ReflectionInfo {
 	if len(reflections) == 0 {
 		return nil
 	}
 
-	best := &reflections[0]
+	bestIdx := 0
 	for i := 1; i < len(reflections); i++ {
-		if reflections[i].Context.priority() > best.Context.priority() {
-			best = &reflections[i]
+		if reflections[i].Context.priority() > reflections[bestIdx].Context.priority() {
+			bestIdx = i
 		}
 	}
-	return best
+	result := reflections[bestIdx]
+	return &result
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/context.go` around lines 299 - 312, BestReflection
currently returns a pointer to a slice element (&reflections[i]) which can alias
the backing array; instead allocate a new ReflectionInfo, copy the chosen
element into it and return its address so the caller gets an independent copy.
Locate BestReflection and change the return to create a local var like selected
:= reflections[i] (or reflections[0] for the first) and return &selected; keep
the same logic comparing Context.priority() so behavior is unchanged.
pkg/fuzz/analyzers/xss/analyzer.go (1)

218-227: Character survival detection assumes specific canary+chars ordering.

The logic checks for sequential patterns like canary+"<" and canary+"<>". This works because canaryChars is <>"'/ and ApplyInitialTransformation appends it directly after the canary. The checks progressively verify each character's survival by checking cumulative prefixes.

However, the GreaterThan check (line 222) has redundant logic—it checks for both canary+"<>" and canary+">". The second check (canary+">") would match even if < was encoded but > wasn't, which may not be the intended semantic.

♻️ Clarify greater-than detection
 func detectCharacterSurvival(body string, canary string) CharacterSet {
 	return CharacterSet{
 		LessThan:     strings.Contains(body, canary+"<"),
-		GreaterThan:  strings.Contains(body, canary+"<>") || strings.Contains(body, canary+">"),
+		GreaterThan:  strings.Contains(body, canary+">"),
 		DoubleQuote:  strings.Contains(body, canary+`<>"`),
 		SingleQuote:  strings.Contains(body, canary+`<>"'`),
 		ForwardSlash: strings.Contains(body, canary+canaryChars), // full canary+chars survived
 	}
 }

This simplifies the check while still detecting if > survived (regardless of <).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 218 - 227, The GreaterThan
detection in detectCharacterSurvival is redundant and can produce misleading
matches; change the GreaterThan field to test only for the canary immediately
followed by '>' (i.e., strings.Contains(body, canary+">")) rather than checking
both canary+"<>" and canary+">", so GreaterThan reflects whether '>' survived
regardless of '<' and removes the cumulative-prefix check that can produce false
positives.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 218-227: The GreaterThan detection in detectCharacterSurvival is
redundant and can produce misleading matches; change the GreaterThan field to
test only for the canary immediately followed by '>' (i.e.,
strings.Contains(body, canary+">")) rather than checking both canary+"<>" and
canary+">", so GreaterThan reflects whether '>' survived regardless of '<' and
removes the cumulative-prefix check that can produce false positives.

In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 299-312: BestReflection currently returns a pointer to a slice
element (&reflections[i]) which can alias the backing array; instead allocate a
new ReflectionInfo, copy the chosen element into it and return its address so
the caller gets an independent copy. Locate BestReflection and change the return
to create a local var like selected := reflections[i] (or reflections[0] for the
first) and return &selected; keep the same logic comparing Context.priority() so
behavior is unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 20931324-7215-409d-bf89-d19a7bd5ea64

📥 Commits

Reviewing files that changed from the base of the PR and between 979c867 and 696a00e.

📒 Files selected for processing (8)
  • pkg/fuzz/analyzers/analyzers.go
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/context.go
  • pkg/fuzz/analyzers/xss/context_test.go
  • pkg/fuzz/analyzers/xss/types.go
  • pkg/protocols/http/http.go
  • pkg/protocols/http/request.go
  • pkg/protocols/http/request_fuzz.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/fuzz/analyzers/xss/context.go (1)

170-188: Optional: Consolidate duplicate text handling logic.

The RCDATA and regular text branches have identical logic. Consider consolidating:

Proposed refactor
-			} else if inRCDATA {
-				tag := ""
-				if len(tagStack) > 0 {
-					tag = tagStack[len(tagStack)-1]
-				}
-				reflections = append(reflections, ReflectionInfo{
-					Context: ContextHTMLText,
-					TagName: tag,
-				})
-			} else {
+			} else {
+				// Both RCDATA and regular text are HTML text contexts
 				tag := ""
 				if len(tagStack) > 0 {
 					tag = tagStack[len(tagStack)-1]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/context.go` around lines 170 - 188, The two branches
handling inRCDATA and the plain text fallback contain identical logic; refactor
by merging them into a single branch: compute tag from tagStack (using tag := ""
and if len(tagStack) > 0 { tag = tagStack[len(tagStack)-1] }) once and then
append the ReflectionInfo with Context: ContextHTMLText and TagName: tag to
reflections, replacing the duplicated code in the inRCDATA and else blocks
(symbols: inRCDATA, tagStack, reflections, ReflectionInfo, ContextHTMLText,
TagName).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 276-297: detectAttrQuoting can mis-match substrings (e.g.,
"dataclass" vs "class"); change the single strings.Index lookup to loop
searching for attrAssign in rawLower and for each match verify a word-boundary
before the attr name (e.g., idx==0 or the preceding rune is not a letter, digit,
'-', '_' or ':', or is whitespace or '<'/'/'), only accepting the match when
that boundary check passes; use the updated index offset when calling
strings.Index again to continue searching; keep the rest of the logic (afterEq
checks and quote determinations) the same and reference the detectAttrQuoting
function, attrAssign, rawLower, idx and afterEq symbols.

---

Nitpick comments:
In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 170-188: The two branches handling inRCDATA and the plain text
fallback contain identical logic; refactor by merging them into a single branch:
compute tag from tagStack (using tag := "" and if len(tagStack) > 0 { tag =
tagStack[len(tagStack)-1] }) once and then append the ReflectionInfo with
Context: ContextHTMLText and TagName: tag to reflections, replacing the
duplicated code in the inRCDATA and else blocks (symbols: inRCDATA, tagStack,
reflections, ReflectionInfo, ContextHTMLText, TagName).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 05ae5fd8-2ae3-4c1f-8f06-aa66bfbcf24a

📥 Commits

Reviewing files that changed from the base of the PR and between 696a00e and 5804653.

📒 Files selected for processing (2)
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/context.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/fuzz/analyzers/xss/analyzer.go

Comment thread pkg/fuzz/analyzers/xss/context.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 202-216: isExecutableScriptType incorrectly treats parameterized
types and an empty essence as executable and omits many legacy JS essences;
change it to reject any type string that contains a ';' (i.e., treat
parameterized types as non-executable), trim and lower-case the raw scriptType
into an essence but return false if the essence is empty, and replace the
current switch with an exact-match whitelist containing all MIME essences from
the spec: text/javascript, text/ecmascript, text/javascript1.0,
text/javascript1.1, text/javascript1.2, text/javascript1.3, text/javascript1.4,
text/javascript1.5, text/jscript, text/livescript, text/x-ecmascript,
text/x-javascript, application/javascript, application/ecmascript,
application/x-ecmascript, application/x-javascript, and the literal "module";
update the logic in isExecutableScriptType to only return true for exact matches
of those essences and false otherwise.
- Around line 92-94: The code currently promotes any attribute value beginning
with "javascript:" to ContextScript; restrict this to only executable URL
attributes by first checking the attribute name (e.g., attrName) against a
whitelist of navigation-target URL attributes ("href", "action", "formaction",
"src", "xlink:href") before setting ctx = ContextScript, following the existing
isEventHandler() pattern; implement the whitelist as a small helper or map
lookup and use case-insensitive comparison (strings.EqualFold) when matching the
attribute name so non-executable attributes like data-* or aria-* are not
upgraded to ContextScript.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2ae5597c-f505-45c0-a399-588e1044a72e

📥 Commits

Reviewing files that changed from the base of the PR and between 5804653 and d8171af.

📒 Files selected for processing (2)
  • pkg/fuzz/analyzers/xss/context.go
  • pkg/fuzz/analyzers/xss/context_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/fuzz/analyzers/xss/context_test.go

Comment thread pkg/fuzz/analyzers/xss/context.go Outdated
Comment thread pkg/fuzz/analyzers/xss/context.go
@Gengyscan
Copy link
Copy Markdown
Contributor Author

Pushed commit 2ae3b64 addressing all CodeRabbit findings plus two additional issues we discovered:

CodeRabbit findings addressed:

  1. detectAttrQuoting word boundary (Minor) — ✅ Already fixed in d8171af
  2. javascript: URI gating (Major) — ✅ Added isScriptURLAttribute() whitelist
  3. isExecutableScriptType MIME essence (Major) — ✅ Full 16-essence WHATWG whitelist + proper essence extraction

Additional fixes:
4. detectCharacterSurvival false negatives (Critical) — Characters were concatenated (canary<>"'/) so only < was ever independently detectable. Now each character is interleaved with its own canary prefix.
5. Template literal payload (High) — selectPayloads for ContextScriptString now includes ${alert(1)} for backtick template literal injection.

All fixes include corresponding test cases.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/fuzz/analyzers/xss/context.go (1)

315-322: Consider adding < to attribute boundary characters.

The boundary check handles common whitespace but may miss the case where the attribute immediately follows the tag name opening (e.g., <divclass=... - malformed but possible). Adding < would handle this edge case.

However, since properly formatted HTML will always have whitespace between the tag name and attributes, this is a minor edge case that likely won't occur in practice.

Optional fix for edge case
 func isAttrBoundary(ch byte) bool {
 	switch ch {
-	case ' ', '\t', '\n', '\r', '/':
+	case ' ', '\t', '\n', '\r', '/', '<':
 		return true
 	default:
 		return false
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/context.go` around lines 315 - 322, The isAttrBoundary
function currently treats space, tab, newline, carriage return and '/' as
attribute boundary characters; update it to also consider the '<' character as a
boundary to catch malformed cases like "<divclass=..." by adding '<' to the
switch/case list in isAttrBoundary so it returns true for '<' as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 218-227: The detectCharacterSurvival function is checking for
canary+char but the canary is generated with the critical character before the
canary (char+canary); update detectCharacterSurvival (and the CharacterSet
fields LessThan, GreaterThan, DoubleQuote, SingleQuote, ForwardSlash) to search
for the pattern where the character precedes the canary (e.g., "<"+canary,
">"+canary, `"`+canary, '\''+canary, "/"+canary) so detection matches the actual
generated canary sequence; alternatively, if you prefer, change the canary
generation to place the character after the canary but be consistent between
canary creation and the detectCharacterSurvival checks.

In `@pkg/fuzz/analyzers/xss/types.go`:
- Around line 197-207: The scriptURLAttributes map currently includes the
"poster" key which leads to false positives; remove the "poster" entry from the
scriptURLAttributes map in pkg/fuzz/analyzers/xss/types.go (the map named
scriptURLAttributes) so poster is no longer treated as a javascript-executing
attribute, and run/update any affected tests or comments that assumed poster was
executable.

---

Nitpick comments:
In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 315-322: The isAttrBoundary function currently treats space, tab,
newline, carriage return and '/' as attribute boundary characters; update it to
also consider the '<' character as a boundary to catch malformed cases like
"<divclass=..." by adding '<' to the switch/case list in isAttrBoundary so it
returns true for '<' as well.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e639f060-119d-4285-802a-bff2c49a0c66

📥 Commits

Reviewing files that changed from the base of the PR and between d8171af and 2ae3b64.

📒 Files selected for processing (4)
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/context.go
  • pkg/fuzz/analyzers/xss/context_test.go
  • pkg/fuzz/analyzers/xss/types.go

Comment on lines +218 to +227
// detectCharacterSurvival checks which XSS-critical characters survived server-side encoding
func detectCharacterSurvival(body string, canary string) CharacterSet {
return CharacterSet{
LessThan: strings.Contains(body, canary+"<"),
GreaterThan: strings.Contains(body, canary+">"),
DoubleQuote: strings.Contains(body, canary+`"`),
SingleQuote: strings.Contains(body, canary+`'`),
ForwardSlash: strings.Contains(body, canary+"/"),
}
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: Character survival detection pattern doesn't match canary generation.

The canary is generated as canary + "<" + canary + ">" + ... which produces patterns like ABC<ABC>ABC".... However, detectCharacterSurvival looks for canary + ">" (i.e., ABC>) which doesn't exist in the generated string.

The actual pattern has >canary (e.g., >ABC), not canary> (e.g., ABC>).

Only LessThan and ForwardSlash will be detected correctly:

  • canary< → exists (ABC<ABC)
  • canary> → doesn't exist (pattern is >ABC, not ABC>)
  • canary" → doesn't exist (pattern is "ABC, not ABC")
  • canary' → doesn't exist (pattern is 'ABC, not ABC')
  • canary/ → exists (ABC/)
Proposed fix

Either change the canary generation to put the character after canary:

 func (a *Analyzer) ApplyInitialTransformation(data string, params map[string]interface{}) string {
 	if strings.Contains(data, "[XSS_CANARY]") {
 		canary := generateCanary()
 		if params != nil {
 			params["xss_canary"] = canary
 		}
-		// Each character gets its own canary prefix for independent detection.
-		canaryWithChars := canary + "<" + canary + ">" + canary + `"` + canary + "'" + canary + "/"
+		// Each character is immediately followed by canary for independent detection.
+		canaryWithChars := "<" + canary + ">" + canary + `"` + canary + "'" + canary + "/" + canary
 		data = strings.ReplaceAll(data, "[XSS_CANARY]", canaryWithChars)
 	}

And update detection to look for char + canary:

 func detectCharacterSurvival(body string, canary string) CharacterSet {
 	return CharacterSet{
-		LessThan:     strings.Contains(body, canary+"<"),
-		GreaterThan:  strings.Contains(body, canary+">"),
-		DoubleQuote:  strings.Contains(body, canary+`"`),
-		SingleQuote:  strings.Contains(body, canary+`'`),
-		ForwardSlash: strings.Contains(body, canary+"/"),
+		LessThan:     strings.Contains(body, "<"+canary),
+		GreaterThan:  strings.Contains(body, ">"+canary),
+		DoubleQuote:  strings.Contains(body, `"`+canary),
+		SingleQuote:  strings.Contains(body, `'`+canary),
+		ForwardSlash: strings.Contains(body, "/"+canary),
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 218 - 227, The
detectCharacterSurvival function is checking for canary+char but the canary is
generated with the critical character before the canary (char+canary); update
detectCharacterSurvival (and the CharacterSet fields LessThan, GreaterThan,
DoubleQuote, SingleQuote, ForwardSlash) to search for the pattern where the
character precedes the canary (e.g., "<"+canary, ">"+canary, `"`+canary,
'\''+canary, "/"+canary) so detection matches the actual generated canary
sequence; alternatively, if you prefer, change the canary generation to place
the character after the canary but be consistent between canary creation and the
detectCharacterSurvival checks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Respectfully, this is a false positive. The interleaved canary pattern does work correctly.

The generated string is: canary<canary>canary"canary'canary/

For example with canary=ABC: ABC<ABC>ABC"ABC'ABC/

detectCharacterSurvival checks:

  • strings.Contains(body, canary+"<") — matches at position 0 (ABC<) ✅
  • strings.Contains(body, canary+">") — matches at position 4 (ABC>) ✅
  • strings.Contains(body, canary+'"') — matches at position 8 (ABC") ✅
  • strings.Contains(body, canary+"'") — matches at position 12 (ABC') ✅
  • strings.Contains(body, canary+"/") — matches at position 16 (ABC/) ✅

Each character is preceded by its own canary in the interleaved string, so canary + char is independently findable for all 5 characters. This was the whole point of the interleaving fix — the old code concatenated all chars after a single canary (canary<>"'/), making only < detectable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Respectfully, this is a false positive. The interleaved canary pattern does work correctly.

The generated string is: canary<canary>canary"canary'canary/

For example with canary=ABC: ABC<ABC>ABC"ABC'ABC/

detectCharacterSurvival checks:

  • strings.Contains(body, canary+"<") — matches at position 0 (ABC<) ✅
  • strings.Contains(body, canary+">") — matches at position 4 (ABC>) ✅
  • strings.Contains(body, canary+'"') — matches at position 8 (ABC") ✅
  • strings.Contains(body, canary+"'") — matches at position 12 (ABC') ✅
  • strings.Contains(body, canary+"/") — matches at position 16 (ABC/) ✅

Each character is preceded by its own canary in the interleaved string, so canary + char is independently findable for all 5 characters. This was the whole point of the interleaving fix — the old code concatenated all chars after a single canary (canary<>"'/), making only < detectable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gengyscan, you are absolutely right — I apologize for the incorrect review comment. Tracing through the generated string manually confirms your explanation:

With canary = "ABC", canaryWithChars = ABC<ABC>ABC"ABC'ABC/

Each canary + char pair is independently present:

  • ABC< at position 0 ✅
  • ABC> at position 4 ✅
  • ABC" at position 8 ✅
  • ABC' at position 12 ✅
  • ABC/ at position 16 ✅

The interleaving design is sound — each character is preceded by its own copy of the canary, so detectCharacterSurvival correctly finds all five patterns via canary + char. My original analysis was in error. This comment is a false positive.


✏️ Learnings added
Learnt from: Gengyscan
Repo: projectdiscovery/nuclei PR: 7208
File: pkg/fuzz/analyzers/xss/analyzer.go:218-227
Timestamp: 2026-03-13T21:42:14.374Z
Learning: In projectdiscovery/nuclei pkg/fuzz/analyzers/xss/analyzer.go, the XSS canary is generated as an interleaved string: `canary + "<" + canary + ">" + canary + '"' + canary + "'" + canary + "/"`. This means `detectCharacterSurvival` correctly uses `canary+char` patterns (e.g., `strings.Contains(body, canary+"<")`), because each special character is immediately preceded by its own copy of the canary in the generated string. This is intentional and correct — do not flag it as a mismatch between canary generation and detection.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ayuxsec
Repo: projectdiscovery/nuclei PR: 6769
File: pkg/installer/template.go:217-217
Timestamp: 2026-01-15T10:44:22.186Z
Learning: In projectdiscovery/gologger, the Print() function writes to stderr and Silent() writes to stdout. During reviews, ensure code uses Print() for error/output going to stderr and Silent() for stdout; verify the chosen function matches the intended destination and that logs are directed to the correct stream for proper piping/redirects.

Learnt from: orbin123
Repo: projectdiscovery/nuclei PR: 6800
File: pkg/fuzz/analyzers/xss/analyzer.go:100-136
Timestamp: 2026-02-02T18:03:19.054Z
Learning: In analyzers under pkg/fuzz/analyzers/, the Component interface is stateful and should be reused per request. For each request, call gr.Component.SetValue(gr.Key, payload) and then gr.Component.Rebuild() without cloning the component. This avoids state leakage; this pattern is used by existing analyzers (e.g., time_delay) and should be followed for all analyzers in this directory.

Learnt from: orbin123
Repo: projectdiscovery/nuclei PR: 6800
File: pkg/fuzz/analyzers/xss/context_detector.go:7-75
Timestamp: 2026-02-02T18:08:18.736Z
Learning: In code reviews of the XSS context analyzer (pkg/fuzz/analyzers/xss/), preserve the design intent: use an exact-canary search during detection, because if the entire canary is heavily encoded there may be no exploitable reflection. The detection phase should only identify reflection points where the canary appears with sufficient unencoded characters to indicate exploitation potential. The implementation should follow the two-phase approach: 1) detect, 2) analyze available characters, 3) select payloads. Ensure changes do not alter this core design and verify this two-phase flow across files in pkg/fuzz/analyzers/xss/.

Learnt from: k-xeno
Repo: projectdiscovery/nuclei PR: 6824
File: pkg/fuzz/parts.go:138-138
Timestamp: 2026-02-11T11:55:52.862Z
Learning: In Go code related to fuzz analyzers (e.g., XSS analyzer in pkg/fuzz/analyzers), extract the canary from FuzzGenerated.Value (the final reflected string) rather than from GeneratedRequest.OriginalPayload. This ensures compatibility across all fuzzing modes, including KV-mode, and makes the canary source consistent and robust for all fuzz runs. Apply this pattern to all files under the pkg/fuzz directory that implement or reference canary extraction for fuzzed inputs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gengyscan, you're absolutely right — I apologize for the incorrect analysis. Tracing through the full string makes it clear:

With canary ABC, the generated payload is:

ABC<ABC>ABC"ABC'ABC/

And detectCharacterSurvival correctly finds all five patterns:

Check Substring sought Present in string Result
LessThan ABC< at pos 0
GreaterThan ABC> at pos 4
DoubleQuote ABC" at pos 8
SingleQuote ABC' at pos 12
ForwardSlash ABC/ at pos 16

Each character is immediately preceded by its own copy of the canary, so canary+char is independently detectable for all five cases. My original comment was wrong. The implementation is correct.


✏️ Learnings added
Learnt from: Gengyscan
Repo: projectdiscovery/nuclei PR: 7208
File: pkg/fuzz/analyzers/xss/analyzer.go:218-227
Timestamp: 2026-03-13T21:42:12.740Z
Learning: In pkg/fuzz/analyzers/xss/analyzer.go (projectdiscovery/nuclei), the canary payload is generated as `canary+"<"+canary+">"+canary+'"'+canary+"'"+canary+"/"`, producing an interleaved string like `ABC<ABC>ABC"ABC'ABC/`. detectCharacterSurvival correctly searches for `canary+char` (e.g., `ABC<`, `ABC>`, `ABC"`, `ABC'`, `ABC/`), all of which exist independently in the interleaved string. Do not flag this as a mismatch between generation and detection — it is intentional and correct.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ayuxsec
Repo: projectdiscovery/nuclei PR: 6769
File: pkg/installer/template.go:217-217
Timestamp: 2026-01-15T10:44:22.186Z
Learning: In projectdiscovery/gologger, the Print() function writes to stderr and Silent() writes to stdout. During reviews, ensure code uses Print() for error/output going to stderr and Silent() for stdout; verify the chosen function matches the intended destination and that logs are directed to the correct stream for proper piping/redirects.

Learnt from: orbin123
Repo: projectdiscovery/nuclei PR: 6800
File: pkg/fuzz/analyzers/xss/analyzer.go:100-136
Timestamp: 2026-02-02T18:03:19.054Z
Learning: In analyzers under pkg/fuzz/analyzers/, the Component interface is stateful and should be reused per request. For each request, call gr.Component.SetValue(gr.Key, payload) and then gr.Component.Rebuild() without cloning the component. This avoids state leakage; this pattern is used by existing analyzers (e.g., time_delay) and should be followed for all analyzers in this directory.

Learnt from: orbin123
Repo: projectdiscovery/nuclei PR: 6800
File: pkg/fuzz/analyzers/xss/context_detector.go:7-75
Timestamp: 2026-02-02T18:08:18.736Z
Learning: In code reviews of the XSS context analyzer (pkg/fuzz/analyzers/xss/), preserve the design intent: use an exact-canary search during detection, because if the entire canary is heavily encoded there may be no exploitable reflection. The detection phase should only identify reflection points where the canary appears with sufficient unencoded characters to indicate exploitation potential. The implementation should follow the two-phase approach: 1) detect, 2) analyze available characters, 3) select payloads. Ensure changes do not alter this core design and verify this two-phase flow across files in pkg/fuzz/analyzers/xss/.

Learnt from: k-xeno
Repo: projectdiscovery/nuclei PR: 6824
File: pkg/fuzz/parts.go:138-138
Timestamp: 2026-02-11T11:55:52.862Z
Learning: In Go code related to fuzz analyzers (e.g., XSS analyzer in pkg/fuzz/analyzers), extract the canary from FuzzGenerated.Value (the final reflected string) rather than from GeneratedRequest.OriginalPayload. This ensures compatibility across all fuzzing modes, including KV-mode, and makes the canary source consistent and robust for all fuzz runs. Apply this pattern to all files under the pkg/fuzz directory that implement or reference canary extraction for fuzzed inputs.

Comment thread pkg/fuzz/analyzers/xss/types.go Outdated
Comment thread pkg/fuzz/analyzers/xss/types.go Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/types.go`:
- Around line 197-206: Remove the obsolete "codebase" key from the
scriptURLAttributes map literal in types.go: locate the scriptURLAttributes
variable and delete the "codebase": {} entry (and update the surrounding
comma/whitespace as needed), and optionally adjust the comment if it claims
these attributes execute javascript: URIs; run/update tests that might assert
scriptURLAttributes contents to avoid false positives.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8627cd1e-873b-416d-8d1a-56adc87815a2

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae3b64 and ad292c5.

📒 Files selected for processing (1)
  • pkg/fuzz/analyzers/xss/types.go

Comment thread pkg/fuzz/analyzers/xss/types.go Outdated
Comment thread pkg/fuzz/analyzers/xss/context.go Outdated
Comment thread pkg/fuzz/analyzers/xss/context.go Outdated
Comment thread pkg/fuzz/analyzers/xss/context.go Outdated
@Gengyscan
Copy link
Copy Markdown
Contributor Author

Pushed commit 5a0fac6 addressing all 3 Neo security review findings:

1. data: URI XSS bypass (Critical) — Now detects data:text/html and data:image/svg+xml URIs in navigation-target attributes as ContextScript.

2. Tab/newline injection bypass (Major) — Added normalizeURIScheme() that strips internal ASCII whitespace (\t, \n, \r, \f) before scheme prefix matching, matching browser URL parsing behavior.

3. vbscript: scheme (Minor) — Added detection for legacy IE vbscript: URIs.

All fixes include tests (6 new test cases: 3 positive, 1 negative for data:, 2 whitespace variants).

Comment thread pkg/fuzz/analyzers/xss/context.go
Comment thread pkg/fuzz/analyzers/xss/context.go Outdated
@Gengyscan
Copy link
Copy Markdown
Contributor Author

@coderabbitai resume

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 17, 2026

✅ Actions performed

Reviews resumed.

@Gengyscan
Copy link
Copy Markdown
Contributor Author

/claim #7086

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/fuzz/analyzers/xss/context_test.go (1)

739-781: Add a regression case for percent-encoded control-whitespace in scheme prefixes.

Please add a case like href="java%0ascript:...#nucleiXSScanary" (and %09) so decode+whitespace normalization behavior is locked by tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/context_test.go` around lines 739 - 781, Update the
TestDetectReflections_EncodedSchemesAndXHTMLDataURI table to include regression
cases for percent-encoded control-whitespace in scheme prefixes: add at least
two new entries (e.g., name "percent-encoded LF in javascript scheme" with body
containing href="java%0ascript:alert(1)#nucleiXSScanary" and name
"percent-encoded TAB in javascript scheme" with
href="java%09script:alert(1)#nucleiXSScanary") and attrName "href"; keep them
inside the same loop so DetectReflections is called and the existing assertion
that a reflection with AttrName == "href" and Context == ContextScript still
holds. Ensure the marker "nucleiXSScanary" remains present in each body so the
detection logic triggers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/types.go`:
- Around line 223-239: The normalizeURIScheme function can reintroduce
whitespace via percent-decoding (url.PathUnescape) which bypasses the initial
uriSchemeWhitespaceRemover; update normalizeURIScheme so that after decoding the
scheme prefix (decodedPrefix) you run the same whitespace removal and
trimming/lowercasing (using uriSchemeWhitespaceRemover.Replace,
strings.TrimSpace and strings.ToLower) on decodedPrefix before concatenating
with normalized[colon:], ensuring any decoded \t/\n/\r/\f characters are
stripped and the scheme check remains reliable.

---

Nitpick comments:
In `@pkg/fuzz/analyzers/xss/context_test.go`:
- Around line 739-781: Update the
TestDetectReflections_EncodedSchemesAndXHTMLDataURI table to include regression
cases for percent-encoded control-whitespace in scheme prefixes: add at least
two new entries (e.g., name "percent-encoded LF in javascript scheme" with body
containing href="java%0ascript:alert(1)#nucleiXSScanary" and name
"percent-encoded TAB in javascript scheme" with
href="java%09script:alert(1)#nucleiXSScanary") and attrName "href"; keep them
inside the same loop so DetectReflections is called and the existing assertion
that a reflection with AttrName == "href" and Context == ContextScript still
holds. Ensure the marker "nucleiXSScanary" remains present in each body so the
detection logic triggers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f9582e0f-6fd0-4b5f-bd70-daea80ad9c7e

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae3b64 and 5136b78.

📒 Files selected for processing (3)
  • pkg/fuzz/analyzers/xss/context.go
  • pkg/fuzz/analyzers/xss/context_test.go
  • pkg/fuzz/analyzers/xss/types.go

Comment thread pkg/fuzz/analyzers/xss/types.go Outdated
Gengyscan and others added 9 commits March 18, 2026 03:38
- restrict javascript: URI promotion to executable URL attributes only (href,
  action, formaction, src, xlink:href, data, poster, codebase)
- expand isExecutableScriptType to full WHATWG 16-essence whitelist + module,
  extract MIME essence before matching (text/javascript;charset=utf-8 is valid)
- fix detectCharacterSurvival: interleave canary per character for independent
  detection instead of concatenating all critical chars
- add template literal payload (dollar-brace interpolation) for ContextScriptString
- add tests for all four fixes
…javascript: URIs in video poster

The poster attribute on <video> specifies a preview image URL. Modern browsers
do not execute javascript: URIs in the poster attribute - it is treated as a
URL to an image resource, similar to <img src>. Including it caused false
positives where reflections in poster attributes were incorrectly classified
as executable script context.

Addresses CodeRabbit review feedback.
…olver

codebase on <object> is a base-URL resolver for relative URIs, not a
navigation target. Modern browsers do not execute javascript: URIs in
codebase. Per MDN, genuine javascript: navigation targets are href,
action, formaction, and iframe src. Firefox bug #1973900 confirms
codebase is effectively dead.

Addresses CodeRabbit follow-up suggestion.
Address Neo security review findings:

1. Tab/newline injection bypass (CWE-79): Browsers strip ASCII
   whitespace (tab, LF, CR) from URI schemes during parsing, so
   'java\tscript:' executes as 'javascript:'. Added normalizeURIScheme()
   helper that strips internal control whitespace before prefix check.

2. data: URI XSS bypass (CWE-79): data:text/html and data:image/svg+xml
   URIs execute scripts when loaded in navigation-target attributes
   (iframe src, object data, a href). Now detected as ContextScript.

3. vbscript: URI scheme (CWE-79): Added detection for vbscript: scheme
   used in IE/legacy enterprise environments.

Tests added for all three vectors plus negative case (data:image/png
should not be classified as script context).
…l+xml detection

- normalizeURIScheme now decodes percent-encoded prefix via url.PathUnescape
- Added application/xhtml+xml to executable data: URI MIME types
- Added 3 tests for encoded schemes and XHTML data URIs

Addresses neo-bot review: percent-encoded javascript:/data: bypass (CWE-79)
and missing xhtml+xml MIME type detection.
@Gengyscan Gengyscan force-pushed the fix/xss-context-edge-cases branch from 52f1d86 to d826c01 Compare March 18, 2026 03:51
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 79-82: The current canary presence and survival checks use
case-sensitive string containment (e.g., checking if body contains canary via
strings.Contains), which can miss reflections that change casing; update those
checks (the initial "if !strings.Contains(body, canary)" check and the later
survival/reflection checks around the canary comparison) to perform
case-insensitive containment comparisons (for example by normalizing both body
and canary to the same case with strings.ToLower before using strings.Contains,
or using strings.ContainsFold if your Go version supports it) so detection does
not fail when the reflected canary’s case is altered.
- Around line 171-173: The CSP note currently checks
hasCSP(options.ResponseHeaders) which can reflect the initial response; change
this to check the headers from the replay response instead (call hasCSP on the
replay response headers variable used elsewhere in this analyzer, e.g.,
replay.ResponseHeaders or replayHeaders) inside the same if block so the CSP
note reflects the replay response rather than options.ResponseHeaders.

In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 96-99: The code currently uses strings.Contains on the entire
data: URI (normalizedVal) which can match MIME substrings in the payload body;
instead extract only the media type portion immediately after "data:" (take the
substring after "data:", split on the first ',' to isolate the mediatype+params,
then split on ';' to drop parameters) and perform a case-insensitive
exact/prefix match against the known executable types ("text/html",
"image/svg+xml", "application/xhtml+xml") before setting ctx = ContextScript;
update the logic around normalizedVal (the data: branch) to use this parsed
mediaType and use strings.EqualFold or strings.HasPrefix as appropriate to avoid
misclassifying payload contents.
- Around line 289-321: detectAttrQuoting currently searches for attrName + "="
literally, missing cases like "href = ..." and misclassifying context; update
the search to locate the attrName using rawLower and isAttrBoundary for the char
before the name, then after finding attrName advance past the name and any
intervening whitespace, ensure the next non-space char is '=', then advance past
'=' and any whitespace to inspect the first char after the equals for '"' or
'\''; keep using rawToken, rawLower, attrName, isAttrBoundary and return the
same byte/boolean semantics when quoted vs unquoted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3a2d7f0e-e337-47b2-a2f7-762a0149de70

📥 Commits

Reviewing files that changed from the base of the PR and between 52f1d86 and d826c01.

📒 Files selected for processing (8)
  • pkg/fuzz/analyzers/analyzers.go
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/context.go
  • pkg/fuzz/analyzers/xss/context_test.go
  • pkg/fuzz/analyzers/xss/types.go
  • pkg/protocols/http/http.go
  • pkg/protocols/http/request.go
  • pkg/protocols/http/request_fuzz.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/protocols/http/request_fuzz.go
  • pkg/fuzz/analyzers/xss/types.go
  • pkg/fuzz/analyzers/xss/context_test.go

Comment thread pkg/fuzz/analyzers/xss/analyzer.go
Comment on lines +171 to +173
if hasCSP(options.ResponseHeaders) {
details += " [note: CSP header present, may limit exploitability]"
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

CSP note should use replay response headers, not the initial response headers.

At Line 171, hasCSP(options.ResponseHeaders) can report stale/mismatched CSP status for the replay request.

🔧 Proposed fix
-		if hasCSP(options.ResponseHeaders) {
+		if hasCSP(resp.Header) {
 			details += " [note: CSP header present, may limit exploitability]"
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if hasCSP(options.ResponseHeaders) {
details += " [note: CSP header present, may limit exploitability]"
}
if hasCSP(resp.Header) {
details += " [note: CSP header present, may limit exploitability]"
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 171 - 173, The CSP note
currently checks hasCSP(options.ResponseHeaders) which can reflect the initial
response; change this to check the headers from the replay response instead
(call hasCSP on the replay response headers variable used elsewhere in this
analyzer, e.g., replay.ResponseHeaders or replayHeaders) inside the same if
block so the CSP note reflects the replay response rather than
options.ResponseHeaders.

Comment thread pkg/fuzz/analyzers/xss/context.go
Comment thread pkg/fuzz/analyzers/xss/context.go
Gengyscan added 2 commits March 18, 2026 19:37
…survival

Addresses CodeRabbit review on commit d826c01:
- Early exit: strings.ToLower on both body and canary
- detectCharacterSurvival: use bodyL/canaryL for case-insensitive comparisons
…uoting detection

Addresses CodeRabbit review on commit d826c01:
- data: URI: parse media-type before comma/semicolon; prevents false positives when
  payload data contains MIME strings (e.g. data:text/plain,...text/html...)
- detectAttrQuoting: skip optional whitespace between attr name and = so
  href = "val" is correctly identified as double-quoted
@Gengyscan
Copy link
Copy Markdown
Contributor Author

All 4 findings from the latest CodeRabbit review (commit d826c01) have been addressed:

analyzer.go — commit 66cf8d9

  1. Early exit case-insensitive: strings.ToLower applied to both body and canary before the strings.Contains check.
  2. detectCharacterSurvival case-insensitive: introduced bodyL/canaryL lowercase vars; all five character checks (<, >, ", ', /) now use them.

context.go — commit c554dbc
3. data: URI MIME parsing: media-type is extracted before the first , or ;, preventing false positives when the payload data portion contains a MIME string like text/html.
4. detectAttrQuoting whitespace: the search loop now skips optional whitespace between the attribute name and =, so href = "val" is handled correctly.

Minor (CSP): Properly threading replay response headers through replayAndVerify requires a broader refactor; tracked as a follow-up to keep this PR focused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
pkg/fuzz/analyzers/xss/analyzer.go (1)

171-173: ⚠️ Potential issue | 🟡 Minor

Use replay response headers for CSP note, not initial response headers.

Line [171] checks hasCSP(options.ResponseHeaders), which may not match the replayed request that confirmed reflection.

🔧 Proposed fix
-		if hasCSP(options.ResponseHeaders) {
+		if hasCSP(resp.Header) {
 			details += " [note: CSP header present, may limit exploitability]"
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 171 - 173, The CSP note
currently checks hasCSP(options.ResponseHeaders) which inspects the initial
response headers; change this to check the headers from the replayed/confirmed
response instead (e.g., call hasCSP on the replay response object such as
replayResponse.ResponseHeaders or replayResp.Headers used in the confirmation
step) so the "[note: CSP header present...]" annotation reflects the actual
headers seen when the reflection was confirmed.
pkg/fuzz/analyzers/xss/context.go (1)

306-313: ⚠️ Potential issue | 🟠 Major

detectAttrQuoting still misclassifies quoted attrs when whitespace appears after =.

Line [322] inspects the byte immediately after =, so inputs like href = "x" get treated as unquoted. This can downgrade context detection and payload choice.

🔧 Proposed fix (skip full ASCII whitespace around and after `=`)
-			for i < len(rawLower) && (rawLower[i] == ' ' || rawLower[i] == '\t') {
+			for i < len(rawLower) && (rawLower[i] == ' ' || rawLower[i] == '\t' || rawLower[i] == '\n' || rawLower[i] == '\r' || rawLower[i] == '\f') {
 				i++
 			}
@@
 	afterEq := idx + eqOffset + 1
+	for afterEq < len(rawLower) && (rawLower[afterEq] == ' ' || rawLower[afterEq] == '\t' || rawLower[afterEq] == '\n' || rawLower[afterEq] == '\r' || rawLower[afterEq] == '\f') {
+		afterEq++
+	}
 	if afterEq >= len(rawToken) {
 		return '"', false
 	}

Also applies to: 322-333

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/context.go` around lines 306 - 313, detectAttrQuoting
misclassifies attributes with spaces after '=' (e.g. `href = "x"`) because it
only skips whitespace before '=' but then checks the byte immediately after '='
for a quote; update detectAttrQuoting to, once it finds '=' (using variables
rawLower, absPos, attrLower and setting idx and eqOffset), advance a new pointer
(e.g. j) past any ASCII whitespace (spaces and tabs) after the '=' before
inspecting for a quote character, and ensure eqOffset still points to the '='
position relative to absPos while quote detection uses the post-'=' pointer so
quoted attributes with spaces around '=' are detected correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 131-143: The code mutates component state with
gr.Component.SetValue(gr.Key, payload) but only registers the restore defer
after gr.Component.Rebuild(), so if Rebuild() fails the mutated state leaks; to
fix, install the defer to restore the original value (using
gr.Component.SetValue(gr.Key, gr.OriginalValue)) immediately after a successful
SetValue call and before calling gr.Component.Rebuild(), so the original state
is always restored even when Rebuild() returns an error.

In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 24-25: Remove the dead inRCDATA state: delete the inRCDATA
variable declaration and remove the assignments to it (the writes that set
inRCDATA) in pkg/fuzz/analyzers/xss/context.go, and ensure any html.TextToken
handling does not reference or expect that state (adjust the html.TextToken
branch to use the existing context state instead). This eliminates the unused
local that causes a compile error while preserving correct token handling.

---

Duplicate comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 171-173: The CSP note currently checks
hasCSP(options.ResponseHeaders) which inspects the initial response headers;
change this to check the headers from the replayed/confirmed response instead
(e.g., call hasCSP on the replay response object such as
replayResponse.ResponseHeaders or replayResp.Headers used in the confirmation
step) so the "[note: CSP header present...]" annotation reflects the actual
headers seen when the reflection was confirmed.

In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 306-313: detectAttrQuoting misclassifies attributes with spaces
after '=' (e.g. `href = "x"`) because it only skips whitespace before '=' but
then checks the byte immediately after '=' for a quote; update detectAttrQuoting
to, once it finds '=' (using variables rawLower, absPos, attrLower and setting
idx and eqOffset), advance a new pointer (e.g. j) past any ASCII whitespace
(spaces and tabs) after the '=' before inspecting for a quote character, and
ensure eqOffset still points to the '=' position relative to absPos while quote
detection uses the post-'=' pointer so quoted attributes with spaces around '='
are detected correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ba2edd6-f7d8-4023-8957-781a7fd27816

📥 Commits

Reviewing files that changed from the base of the PR and between d826c01 and c554dbc.

📒 Files selected for processing (2)
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/context.go

Comment thread pkg/fuzz/analyzers/xss/analyzer.go Outdated
Comment thread pkg/fuzz/analyzers/xss/context.go Outdated
- analyzer.go: move defer restore before Rebuild() to prevent component
  state mutation leaks when Rebuild fails early (finding projectdiscovery#6)
- context.go: remove inRCDATA dead variable and its associated writes;
  both RCDATA and regular text are correctly handled as ContextHTMLText
  by the existing else branch (finding projectdiscovery#7)
- types.go: remove rcdataElements map, now unreferenced after inRCDATA
  dead code removal
- findings projectdiscovery#1-projectdiscovery#5 (isExecutableScriptType MIME whitelist, normalizeURIScheme
  decode whitespace strip, case-insensitive canary checks, data: URI
  media-type parsing, detectAttrQuoting word boundary) were already
  addressed in prior commits
@Gengyscan
Copy link
Copy Markdown
Contributor Author

Addressed all remaining CodeRabbit findings in commit abf46b5e:

Already fixed (from previous commits, CR comments referenced stale code):

  • isExecutableScriptType — full MIME whitelist + param stripping via ; split ✓
  • normalizeURIScheme — whitespace re-stripped after percent-decode ✓
  • Case-insensitive canary — strings.ToLower on both early exit and detectCharacterSurvival
  • data: URI — media-type parsed before ,/; only ✓
  • detectAttrQuoting — word boundary check + whitespace tolerance around =

Fixed in this commit:

  • analyzer.go:143 — moved defer restore immediately after SetValue, before Rebuild(). A Rebuild() failure no longer leaks the mutated component state.
  • context.go:25 — removed dead inRCDATA variable (written but never read) and unused rcdataElements map from types.go.

All tests pass (go test ./pkg/fuzz/... -timeout 120s).

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/fuzz/analyzers/xss/analyzer.go (1)

171-173: ⚠️ Potential issue | 🟡 Minor

Use replay response headers for CSP note.

This still checks options.ResponseHeaders (initial response), so the note can be stale/mismatched for the replay request result.

🔧 Proposed fix
-		if hasCSP(options.ResponseHeaders) {
+		if hasCSP(resp.Header) {
 			details += " [note: CSP header present, may limit exploitability]"
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/analyzer.go` around lines 171 - 173, The CSP note
currently inspects the initial request headers via options.ResponseHeaders;
change it to inspect the headers from the replay result so the note reflects the
replayed response. Replace the hasCSP(options.ResponseHeaders) check and the
details append to use the replay response headers variable (the replay/result
response object used in this analyzer, e.g. replayResp.ResponseHeaders or
replayResult.ResponseHeaders) while keeping the hasCSP(...) call and the details
string unchanged otherwise so the message accurately reflects the replay
outcome.
pkg/fuzz/analyzers/xss/context.go (1)

298-303: ⚠️ Potential issue | 🟠 Major

Quoted attributes are still misdetected when whitespace follows =.

detectAttrQuoting handles spaces before =, but it reads the byte immediately after = without skipping whitespace. Cases like href = "x" are treated as unquoted, which can downgrade context accuracy and payload choice.

🔧 Proposed fix
 	for {
 		pos := strings.Index(rawLower[searchStart:], attrLower)
 		if pos < 0 {
 			break
 		}
 		absPos := searchStart + pos
 		if absPos == 0 || isAttrBoundary(rawLower[absPos-1]) {
 			// skip optional whitespace after attr name, then require =
 			i := absPos + len(attrLower)
-			for i < len(rawLower) && (rawLower[i] == ' ' || rawLower[i] == '\t') {
+			for i < len(rawLower) && (rawLower[i] == ' ' || rawLower[i] == '\t' || rawLower[i] == '\n' || rawLower[i] == '\r' || rawLower[i] == '\f') {
 				i++
 			}
 			if i < len(rawLower) && rawLower[i] == '=' {
 				idx = absPos
 				eqOffset = i - absPos
 				break
 			}
 		}
 		searchStart = absPos + 1
 	}
@@
 	afterEq := idx + eqOffset + 1
+	for afterEq < len(rawToken) && (rawToken[afterEq] == ' ' || rawToken[afterEq] == '\t' || rawToken[afterEq] == '\n' || rawToken[afterEq] == '\r' || rawToken[afterEq] == '\f') {
+		afterEq++
+	}
 	if afterEq >= len(rawToken) {
 		return '"', false
 	}

Also applies to: 312-317

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/context.go` around lines 298 - 303, detectAttrQuoting
currently skips whitespace before '=' but when it finds '=' it immediately
inspects the next byte and misclassifies cases like `href = "x"` as unquoted;
update the logic in detectAttrQuoting to advance i past any spaces/tabs after
the '=' before inspecting the next byte so it correctly detects quoted
attributes (i.e., after setting idx and eqOffset, skip whitespace: while i <
len(rawLower) && (rawLower[i] == ' ' || rawLower[i] == '\t') { i++ } then check
rawLower[i] for quotes), and apply the same whitespace-after-equals fix to the
analogous check around the other occurrence referenced (the block at the second
spot ~312-317) so both places treat `=` followed by spaces then a quote as
quoted attributes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 207-225: The code currently strips parameters from the MIME type
(variable normalized) before checking types, causing inputs like "module;foo" to
be treated as the special non-MIME token "module"; to fix, perform an ASCII
case-insensitive check for the exact token "module" (e.g.,
strings.EqualFold(normalized, "module") or compare lowercased) immediately
before the block that trims parameters and, if it matches, return true; only
then proceed to strip parameters and run the existing switch that lists MIME
types.

---

Duplicate comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 171-173: The CSP note currently inspects the initial request
headers via options.ResponseHeaders; change it to inspect the headers from the
replay result so the note reflects the replayed response. Replace the
hasCSP(options.ResponseHeaders) check and the details append to use the replay
response headers variable (the replay/result response object used in this
analyzer, e.g. replayResp.ResponseHeaders or replayResult.ResponseHeaders) while
keeping the hasCSP(...) call and the details string unchanged otherwise so the
message accurately reflects the replay outcome.

In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 298-303: detectAttrQuoting currently skips whitespace before '='
but when it finds '=' it immediately inspects the next byte and misclassifies
cases like `href = "x"` as unquoted; update the logic in detectAttrQuoting to
advance i past any spaces/tabs after the '=' before inspecting the next byte so
it correctly detects quoted attributes (i.e., after setting idx and eqOffset,
skip whitespace: while i < len(rawLower) && (rawLower[i] == ' ' || rawLower[i]
== '\t') { i++ } then check rawLower[i] for quotes), and apply the same
whitespace-after-equals fix to the analogous check around the other occurrence
referenced (the block at the second spot ~312-317) so both places treat `=`
followed by spaces then a quote as quoted attributes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 829bb8ae-e546-4ac6-a2df-b0d0e7cbf4c0

📥 Commits

Reviewing files that changed from the base of the PR and between c554dbc and abf46b5.

📒 Files selected for processing (3)
  • pkg/fuzz/analyzers/xss/analyzer.go
  • pkg/fuzz/analyzers/xss/context.go
  • pkg/fuzz/analyzers/xss/types.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/fuzz/analyzers/xss/types.go

Comment on lines +207 to +225
// Extract MIME essence by stripping parameters after ';'.
// Per WHATWG spec, "text/javascript; charset=utf-8" has essence "text/javascript" → executable.
if idx := strings.IndexByte(normalized, ';'); idx >= 0 {
normalized = strings.TrimSpace(normalized[:idx])
if normalized == "" {
return false // e.g. ";charset=utf-8" — invalid MIME type, not executable
}
}

switch normalized {
case "text/javascript", "text/ecmascript",
"text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
"text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
"text/jscript", "text/livescript",
"text/x-ecmascript", "text/x-javascript",
"application/javascript", "application/ecmascript",
"application/x-ecmascript", "application/x-javascript",
"module":
return true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

According to the WHATWG HTML Standard, is <script type="module;foo">executable, or must the value be an exact ASCII case-insensitive match ofmodule?

💡 Result:

According to the WHATWG HTML Standard, <script type="module;foo"> is NOT executable as a module script. The value must be an exact ASCII case-insensitive match of "module".

Citations:


🏁 Script executed:

cat -n pkg/fuzz/analyzers/xss/context.go | sed -n '190,230p'

Repository: projectdiscovery/nuclei

Length of output: 1578


type="module;foo" is incorrectly classified as executable.

The current code universally strips parameters after ; before checking the script type. For MIME types like "text/javascript; charset=utf-8", this is correct. However, module is not a MIME type—according to the WHATWG HTML Standard, it requires an exact ASCII case-insensitive match. The current logic would incorrectly classify type="module;foo" as executable when it is not.

Fix: Check for "module" before stripping parameters.

Proposed fix
 func isExecutableScriptType(hasType bool, scriptType string) bool {
 	if !hasType {
 		return true
 	}
 
 	normalized := strings.ToLower(strings.TrimSpace(scriptType))
 	if normalized == "" {
 		return true
 	}
+	// Module scripts require exact token match.
+	if normalized == "module" {
+		return true
+	}
 	// Extract MIME essence by stripping parameters after ';'.
 	// Per WHATWG spec, "text/javascript; charset=utf-8" has essence "text/javascript" → executable.
 	if idx := strings.IndexByte(normalized, ';'); idx >= 0 {
 		normalized = strings.TrimSpace(normalized[:idx])
 		if normalized == "" {
 			return false // e.g. ";charset=utf-8" — invalid MIME type, not executable
 		}
 	}
 
 	switch normalized {
 	case "text/javascript", "text/ecmascript",
 		"text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
 		"text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
 		"text/jscript", "text/livescript",
 		"text/x-ecmascript", "text/x-javascript",
 		"application/javascript", "application/ecmascript",
-		"application/x-ecmascript", "application/x-javascript",
-		"module":
+		"application/x-ecmascript", "application/x-javascript":
 		return true
 	default:
 		return false
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fuzz/analyzers/xss/context.go` around lines 207 - 225, The code currently
strips parameters from the MIME type (variable normalized) before checking
types, causing inputs like "module;foo" to be treated as the special non-MIME
token "module"; to fix, perform an ASCII case-insensitive check for the exact
token "module" (e.g., strings.EqualFold(normalized, "module") or compare
lowercased) immediately before the block that trims parameters and, if it
matches, return true; only then proceed to strip parameters and run the existing
switch that lists MIME types.

@Mzack9999
Copy link
Copy Markdown
Member

Closing this PR as the edge cases reported in Issue #7086 targeted the XSS context analyzer from PR #7076, which was not merged. The XSS context analyzer has been implemented via PR #7164 which correctly handles all reported edge cases by design. This PR also includes extensive reformatting of core HTTP protocol files. Thank you for your contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

XSS Context Analyzer misclassifies javascript: URIs and JSON script blocks

2 participants