🐛 Bug Fixes
- 🛠 Fixed an HTML sanitization bypass that could allow XSS. This issue affects Sanitize versions 3.0.0 through 5.2.0.
When HTML was sanitized using the "relaxed" config or a custom config that allows certain elements, some content in an `<svg>` element may not have been sanitized correctly even if `svg` was not in the allowlist. This could allow carefully crafted input to sneak arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
You are likely to be vulnerable to this issue if you use Sanitize's relaxed config or a custom config that allows one or more of the following HTML elements:
  - `iframe`
  - `math`
  - `noembed`
  - `noframes`
  - `noscript`
  - `plaintext`
  - `script`
  - `style`
  - `svg`
  - `xmp`
See the security advisory for more details, including a workaround if you're not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
Many thanks to Michał Bentkowski of Securitum for reporting this issue and helping to verify the fix.
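As a rough way to audit a custom config against the element list above, the following sketch checks a config hash's `:elements` array for any of the affected names. The `affected_elements` helper and the sample config are hypothetical illustrations, not part of Sanitize; they assume a config shaped like Sanitize's, with element names stored as strings under `:elements`.

```ruby
# Element names listed in the advisory text above.
AFFECTED_ELEMENTS = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Hypothetical helper: returns the affected element names that a config allows.
# Assumes the config stores allowed element names under :elements.
def affected_elements(config)
  allowed = Array(config[:elements]).map { |name| name.to_s.downcase }
  allowed & AFFECTED_ELEMENTS
end

# Hypothetical custom config, for illustration only.
custom_config = { elements: %w[b em iframe p style] }

risky = affected_elements(custom_config)
puts "Config allows affected elements: #{risky.join(', ')}" unless risky.empty?
```

An empty result only means the config doesn't name any of the listed elements; it isn't a substitute for upgrading.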
- The term "whitelist" has been replaced with "allowlist" throughout Sanitize's source and documentation.
While the etymology of "whitelist" may not be explicitly racist in origin or intent, there are inherent racial connotations in the implication that white is good and black (as in "blacklist") is not.
This is a change I should have made long ago, and I apologize for not making it sooner.
- In transformer input, the `:node_whitelist` keys are now deprecated. New `:node_allowlist` keys have been added. The old keys will continue to work in order to avoid breaking existing code, but they are no longer documented and may be removed in a future semver major release.
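For transformers that need to run against both older and newer Sanitize releases, one possible approach is to read the new key first and fall back to the old one. This is a minimal sketch: the `allowlisted_nodes` helper is hypothetical, and the `env` hashes below are stand-ins built by hand rather than hashes produced by Sanitize.

```ruby
require 'set'

# Hypothetical helper: prefer the new :node_allowlist key, and fall back to
# the deprecated :node_whitelist key when running on an older release.
def allowlisted_nodes(env)
  env[:node_allowlist] || env[:node_whitelist] || Set.new
end

# Stand-ins for the hash a transformer receives (built by hand for illustration).
old_style_env = { node_whitelist: Set.new([:some_node]) }
new_style_env = { node_allowlist: Set.new([:other_node]) }

puts allowlisted_nodes(old_style_env).include?(:some_node)  # prints "true"
puts allowlisted_nodes(new_style_env).include?(:other_node) # prints "true"
```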
- ➕ Added a `:parser_options` config hash, which makes it possible to pass custom parsing options to Nokogumbo. @austin-wang - #194
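As an illustration of the new option, a config might look like the sketch below. The `:max_errors` and `:max_tree_depth` names are assumed here to be parsing options accepted by Nokogumbo 2.x, and the values are arbitrary; check Nokogumbo's documentation for the options your version supports. The call to Sanitize is left commented out so the fragment stands alone without the gem installed.

```ruby
# Sketch of a custom config that forwards parsing options to Nokogumbo.
# :max_errors and :max_tree_depth are assumed option names; values are arbitrary.
CUSTOM_CONFIG = {
  elements: %w[b em p],
  parser_options: {
    max_errors: 10,
    max_tree_depth: 200
  }
}.freeze

# With the sanitize gem loaded, the config would be passed as usual:
#   require 'sanitize'
#   Sanitize.fragment('<p>Hello <b>world</b></p>', CUSTOM_CONFIG)
```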
🐛 Bug Fixes
- Non-characters and non-whitespace control characters are now stripped from HTML input before parsing to comply with the HTML Standard's [preprocessing guidelines][html-preprocessing]. Prior to this Sanitize had adhered to older W3C guidelines that have since been withdrawn. #179
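To make the preprocessing step concrete, here is a rough sketch of what stripping these characters could look like. It is not Sanitize's actual implementation: the `preprocess_html` name and both helper methods are hypothetical, and the ranges follow the HTML Standard's definitions of controls, ASCII whitespace, and noncharacters. This sketch also treats U+0000 as a control character to strip, which may differ from how the library handles it.

```ruby
# ASCII whitespace control characters that should be preserved: tab, LF, FF, CR.
WHITESPACE_CONTROLS = [0x09, 0x0A, 0x0C, 0x0D].freeze

# True for C0 controls and U+007F..U+009F, except ASCII whitespace.
def non_whitespace_control?(codepoint)
  return false if WHITESPACE_CONTROLS.include?(codepoint)
  codepoint <= 0x1F || (0x7F..0x9F).cover?(codepoint)
end

# True for U+FDD0..U+FDEF and for the last two code points of every plane
# (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFF).
def noncharacter?(codepoint)
  (0xFDD0..0xFDEF).cover?(codepoint) || (codepoint & 0xFFFE) == 0xFFFE
end

# Hypothetical preprocessing step: drop unwanted code points before parsing.
def preprocess_html(html)
  html.each_codepoint
      .reject { |cp| non_whitespace_control?(cp) || noncharacter?(cp) }
      .pack('U*')
end

# "<b>" + U+0001 + "hi" + U+FDD0 + LF
dirty = [0x3C, 0x62, 0x3E, 0x01, 0x68, 0x69, 0xFDD0, 0x0A].pack('U*')
puts preprocess_html(dirty).inspect # prints "<b>hi\n"
```

Working on code points rather than a regular expression keeps the ranges easy to compare against the standard, at the cost of some speed on large inputs.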
⬆️ For most users, upgrading from 4.x shouldn't require any changes. However, the minimum required Ruby version has changed, and Sanitize 5.x's HTML output may differ in some small ways from 4.x's output. If this matters to you, please review the changes below carefully.
Potentially Breaking Changes
💎 Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely no longer works in Ruby 1.9.x.
⬆️ Upgraded to Nokogumbo 2.x, which fixes various bugs and adds standard-compliant HTML serialization. @stevecheckoway - #189
🚚 Children of the following elements are now removed by default when these elements are removed, rather than being preserved and escaped:
🚚 Children of allowlisted `iframe` elements are now always removed. In modern HTML, `iframe` elements should never have children. In HTML 4 and earlier, `iframe` elements were allowed to contain fallback content for legacy browsers, but it's been almost two decades since that was useful.
🛠 Fixed a bug that caused `:remove_contents` to behave as if it were set to `true` when it was actually an Array.
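The distinction this fix restores can be sketched as follows. This illustrates the intended semantics as described above and is not Sanitize's internal code; the `remove_contents_of?` helper is hypothetical and ignores any elements whose contents are removed by default.

```ruby
require 'set'

# Hypothetical helper: should the contents of a removed element also be removed?
# - true          => applies to every removed element
# - Array or Set  => applies only to the listed element names
# - anything else => contents are kept
def remove_contents_of?(setting, element_name)
  case setting
  when true then true
  when Array, Set then setting.include?(element_name)
  else false
  end
end

puts remove_contents_of?(true, 'div')                # prints "true"
puts remove_contents_of?(%w[script style], 'script') # prints "true"
puts remove_contents_of?(%w[script style], 'div')    # prints "false"
```

The third call is the case the bug affected: with an Array setting, an element that isn't listed should keep its contents.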