Encoding Markup in .NET
There are at least four ways to escape HTML or XML characters to entities in .NET, but they work quite differently.
TL;DR: In #PowerShell, the best options to encode to HTML or XML are `[Security.SecurityElement]::Escape()` (minimal) or `[Text.Encodings.Web.HtmlEncoder]::Default.Encode()` (comprehensive).
Escape
`System.Security.SecurityElement.Escape()` is the simplest encoder, escaping only `&`, `<`, `>`, `"`, and `'`, and passing all other characters through unchanged.
This is fine if you want something lightweight (though not as lightweight as doing the five search-and-replace operations yourself), and you don't want or need any special encoding for any other characters.
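For comparison, the five search-and-replace operations mentioned above might look like the following minimal sketch. The function name `ConvertTo-MinimalEntity` is hypothetical and only used for illustration; the one real subtlety is that the ampersand has to be replaced first.

```powershell
# Hypothetical helper (not part of .NET): escape the five markup characters by hand.
function ConvertTo-MinimalEntity {
    param([string] $Text)
    # Replace & first, so the ampersands introduced by the later
    # replacements are not escaped a second time.
    $Text = $Text.Replace('&', '&amp;')
    $Text = $Text.Replace('<', '&lt;')
    $Text = $Text.Replace('>', '&gt;')
    $Text = $Text.Replace('"', '&quot;')
    $Text = $Text.Replace("'", '&apos;')
    $Text
}

ConvertTo-MinimalEntity 'Tom & Jerry''s <b>"show"</b>'
# → Tom &amp; Jerry&apos;s &lt;b&gt;&quot;show&quot;&lt;/b&gt;
```

For these five characters the result should match what `[Security.SecurityElement]::Escape()` returns for the same input.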
Escape Effect
codepoint(s) | name | encoded? | format |
---|---|---|---|
U+0022 | QUOTATION MARK | ✔️ | `&quot;` |
U+0026 | AMPERSAND | ✔️ | `&amp;` |
U+0027 | APOSTROPHE | ✔️ | `&apos;` |
U+003C | LESS-THAN SIGN | ✔️ | `&lt;` |
U+003E | GREATER-THAN SIGN | ✔️ | `&gt;` |
all others | Basic Multilingual Plane (remaining) | ❌ | (unescaped) |
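A quick way to see the table in action is to run a string containing all five characters, plus a couple of non-ASCII ones, through `Escape()`. The sample text and variable names are only illustrative.

```powershell
# Illustrative sample: all five escaped characters plus non-ASCII text.
$sample  = 'Tom & Jerry''s <b>"show"</b> café ☃'
$escaped = [Security.SecurityElement]::Escape($sample)
$escaped
# → Tom &amp; Jerry&apos;s &lt;b&gt;&quot;show&quot;&lt;/b&gt; café ☃
```

The accented letter and the snowman pass through untouched, so the output is only safe where the document encoding can represent them directly.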
Encode
`System.Text.Encodings.Web.HtmlEncoder.Default.Encode()` is a more comprehensive encoder: beyond the bare minimum characters, it also encodes everything outside 7-bit ASCII for compatibility. Apart from `&quot;`, `&amp;`, `&lt;`, and `&gt;`, it emits hex code point entities rather than named entities, so its output works in both HTML and XML.
Encode Effect
codepoint(s) | name | encoded? | format |
---|---|---|---|
U+0000–U+001F | C0 Controls | ✔️ | `&#x0;` – `&#x1F;` |
U+0020 | SPACE | ❌ | |
U+0021 | EXCLAMATION MARK | ❌ | ! |
U+0022 | QUOTATION MARK | ✔️ | `&quot;` |
U+0023 | NUMBER SIGN | ❌ | # |
U+0024 | DOLLAR SIGN | ❌ | $ |
U+0025 | PERCENT SIGN | ❌ | % |
U+0026 | AMPERSAND | ✔️ | `&amp;` |
U+0027 | APOSTROPHE | ✔️ | `&#x27;` |
U+0028 | LEFT PARENTHESIS | ❌ | ( |
U+0029 | RIGHT PARENTHESIS | ❌ | ) |
U+002A | ASTERISK | ❌ | * |
U+002B | PLUS SIGN | ✔️ | `&#x2B;` |
U+002C–U+003B | Basic Latin (partial) | ❌ | , – ; |
U+003C | LESS-THAN SIGN | ✔️ | `&lt;` |
U+003D | EQUALS SIGN | ❌ | = |
U+003E | GREATER-THAN SIGN | ✔️ | `&gt;` |
U+003F–U+007E | Basic Latin (remaining printable) | ❌ | ? – ~ |
U+007F–U+FFFF | Basic Multilingual Plane (remaining) | ✔️ | `&#x7F;` – `&#xFFFF;` |
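The following sketch exercises a few of the rows above. It assumes PowerShell 7 or later, where the `System.Text.Encodings.Web` types resolve without any extra setup; the sample string and variable names are illustrative.

```powershell
# Assumes PowerShell 7+, where System.Text.Encodings.Web is available by default.
$encoder = [Text.Encodings.Web.HtmlEncoder]::Default
$encoded = $encoder.Encode('1 + 1 < 3 & "café" isn''t naïve')
$encoded
# → 1 &#x2B; 1 &lt; 3 &amp; &quot;caf&#xE9;&quot; isn&#x27;t na&#xEF;ve
```

Note that the plus sign and apostrophe come out as hex entities, while the four basic symbols keep their named forms.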
HtmlEncode
`System.Web.HttpUtility.HtmlEncode()` encodes only the minimal symbols as named entities (except the apostrophe, which becomes a decimal entity for maximum compatibility with extremely old browsers and HTML parsers) and the Latin-1 Supplement as decimal entities. It doesn't encode control characters or anything in the Basic Multilingual Plane beyond U+00FF.
HtmlEncode Effect
codepoint(s) | name | encoded? | format |
---|---|---|---|
U+0000–U+001F | C0 Controls | ❌ | ␀ – ␟ |
U+0020 | SPACE | ❌ | |
U+0021 | EXCLAMATION MARK | ❌ | ! |
U+0022 | QUOTATION MARK | ✔️ | `&quot;` |
U+0023 | NUMBER SIGN | ❌ | # |
U+0024 | DOLLAR SIGN | ❌ | $ |
U+0025 | PERCENT SIGN | ❌ | % |
U+0026 | AMPERSAND | ✔️ | `&amp;` |
U+0027 | APOSTROPHE | ✔️ | `&#39;` |
U+0028–U+003B | Basic Latin (partial) | ❌ | ( – ; |
U+003C | LESS-THAN SIGN | ✔️ | `&lt;` |
U+003D | EQUALS SIGN | ❌ | = |
U+003E | GREATER-THAN SIGN | ✔️ | `&gt;` |
U+003F–U+007F | Basic Latin (remaining) | ❌ | ? – ␡ |
U+0080–U+009F | C1 Controls | ❌ | PAD – APC |
U+00A0–U+00FF | Latin-1 Supplement | ✔️ | `&#160;` – `&#255;` |
U+0100–U+FFFF | Basic Multilingual Plane (remaining) | ❌ | Ā – U+FFFF† |
† U+FFFF is a noncharacter, not a valid character
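As a rough illustration of the table, the sketch below encodes one string containing the minimal symbols, a Latin-1 letter, and a character beyond U+00FF. The `Add-Type` guard is an assumption aimed at Windows PowerShell, where `System.Web` may not be loaded yet; in PowerShell 7 the type usually resolves on its own.

```powershell
# Load System.Web only if HttpUtility is not already resolvable
# (assumption: this is needed in Windows PowerShell, not in PowerShell 7).
if (-not ('System.Web.HttpUtility' -as [type])) {
    Add-Type -AssemblyName System.Web
}

$html = [System.Web.HttpUtility]::HtmlEncode('1 + 1 < 3 & "café" isn''t ☃')
$html
# → 1 + 1 &lt; 3 &amp; &quot;caf&#233;&quot; isn&#39;t ☃
```

Here the plus sign and the snowman are left alone, the `é` becomes a decimal entity, and the apostrophe is the one minimal symbol that is not given a named entity.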
🏚️ AntiXssEncoder
`System.Web.Security.AntiXss.AntiXssEncoder` offered a variety of encoding choices, but was discontinued after .NET Framework 4.8.1. Encoding is accomplished with any of three methods: `HtmlEncode()`, `XmlAttributeEncode()`, or `XmlEncode()`.
These methods only encode selected code points up through U+00AD, and then everything from U+0370 up, as decimal entities, and they mangle a number of the characters in the ranges they don't encode, so this class is a bad choice for several reasons.
AntiXssEncoder Effect
codepoint(s) | name | encoded? | format | notes |
---|---|---|---|---|
U+0000–U+001F | C0 Controls | ✔️ | `&#0;` – `&#31;` | |
U+0020 | SPACE | ✔️ | `&#32;` | XmlAttributeEncode() |
U+0020 | SPACE | ❌ | | HtmlEncode() & XmlEncode() |
U+0021 | EXCLAMATION MARK | ❌ | ! | |
U+0022 | QUOTATION MARK | ✔️ | `&quot;` | |
U+0023 | NUMBER SIGN | ❌ | # | |
U+0024 | DOLLAR SIGN | ❌ | $ | |
U+0025 | PERCENT SIGN | ❌ | % | |
U+0026 | AMPERSAND | ✔️ | `&amp;` | |
U+0027 | APOSTROPHE | ✔️ | `&apos;` | XmlAttributeEncode() & XmlEncode() |
U+0027 | APOSTROPHE | ✔️ | `&#39;` | HtmlEncode() |
U+0028–U+003B | Basic Latin (partial) | ❌ | ( – ; | some printable 7-bit ASCII |
U+003C | LESS-THAN SIGN | ✔️ | `&lt;` | |
U+003D | EQUALS SIGN | ❌ | = | |
U+003E | GREATER-THAN SIGN | ✔️ | `&gt;` | |
U+003F–U+007E | Basic Latin (remaining printable) | ❌ | ? – ~ | |
U+007F | DELETE | ✔️ | `&#127;` | |
U+0080–U+009F | C1 Controls | ✔️ | `&#128;` – `&#159;` | |
U+00A0 | NO-BREAK SPACE | ✔️ | `&#160;` | |
U+00A1–U+00AC | Latin-1 Supplement (partial) | ❌ | ¡ – ¬ | |
U+00AD | SOFT HYPHEN | ✔️ | `&#173;` | |
U+00AE–U+036F | Latin (remaining), various extensions | ❌ | ® – ͯ | see † |
U+0370–U+FFFF | Basic Multilingual Plane (remaining) | ✔️ | `&#880;` – `&#65535;` | |
† these blocks:
- Latin-1 Supplement (remaining)
- Latin Extended-A
- Latin Extended-B
- IPA Extensions
- Spacing Modifier Letters
- Combining Diacritical Marks
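For completeness, the three AntiXssEncoder methods can be called as in the sketch below. Because the class exists only in .NET Framework, the sketch is guarded so that it runs only in Windows PowerShell (the `Desktop` edition) and does nothing elsewhere. The sample string and variable names are illustrative, and no output is shown since it depends on the method used.

```powershell
# AntiXssEncoder exists only in .NET Framework, so limit this to
# Windows PowerShell (the 'Desktop' edition); elsewhere nothing runs.
if ($PSVersionTable.PSEdition -eq 'Desktop') {
    Add-Type -AssemblyName System.Web
    $sample  = 'Tom & Jerry''s "show"'
    $antiXss = [System.Web.Security.AntiXss.AntiXssEncoder]

    $antiXss::HtmlEncode($sample, $false)   # second argument: useNamedEntities
    $antiXss::XmlEncode($sample)
    $antiXss::XmlAttributeEncode($sample)   # also encodes spaces
}
```

Comparing the three results side by side shows the per-method differences noted in the table for the space and the apostrophe.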