  • v1.12.5 Changes

    September 27, 2021

    Security

    [JRuby] Address CVE-2021-41098 (GHSA-2rr5-8q37-2w7h).

    In Nokogiri v1.12.4 and earlier, on JRuby only, the SAX parsers resolve external entities (XXE) by default. This fix turns off entity-resolution-by-default in the JRuby SAX parsers to match the CRuby SAX parsers' behavior.

    CRuby users are not affected by this CVE.

    Fixed

    • [CRuby] Document#to_xhtml properly serializes self-closing tags in libxml > 2.9.10. A behavior change introduced in libxml 2.9.11 caused previous Nokogiri versions to emit start and end tags (e.g., <br></br>) instead of a self-closing tag (e.g., <br/>). [#2324]
  • v1.12.4 Changes

    August 29, 2021

    Notable fix: Namespace inheritance

    Namespace behavior when reparenting nodes has historically been poorly specified and the behavior diverged between CRuby and JRuby. As a result, making this behavior consistent in v1.12.0 introduced a breaking change.

    This patch release reverts the Builder behavior present in v1.12.0..v1.12.3 but keeps the Document behavior. This release also introduces a Document attribute to allow affected users to easily change this behavior for their legacy code without invasive changes.

    Compensating Feature in XML::Document

    This release of Nokogiri introduces a new Document boolean attribute, namespace_inheritance, which controls whether children should inherit a namespace when they are reparented. Nokogiri::XML::Document defaults this attribute to false, meaning "do not inherit," thereby making explicit the behavior change introduced in v1.12.0.

    CRuby users who desire the pre-v1.12.0 behavior may set document.namespace_inheritance = true before reparenting nodes.

    See for example usage.
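
    A minimal sketch of the attribute in use, assuming Nokogiri v1.12.4 or later. The namespace URI, element names, and the helper child_prefix_after_reparenting are hypothetical, and the require is guarded so the snippet is a no-op where the gem is not installed.

```ruby
begin
  require "nokogiri"
rescue LoadError
  warn "nokogiri is not installed; skipping this sketch"
end

FOO_NS = "http://example.com/foo" # hypothetical namespace URI

# Reparent a namespace-less <child> under <foo:parent> and return the
# namespace prefix the child ends up with (nil when it has no namespace).
def child_prefix_after_reparenting(inherit)
  doc = Nokogiri::XML(%(<root xmlns:foo="#{FOO_NS}"><foo:parent/></root>))
  doc.namespace_inheritance = inherit
  parent = doc.at_xpath("//foo:parent", "foo" => FOO_NS)
  child = Nokogiri::XML::Node.new("child", doc)
  parent.add_child(child)
  child.namespace&.prefix
end

if defined?(Nokogiri)
  p child_prefix_after_reparenting(false) # Document default: no inheritance
  p child_prefix_after_reparenting(true)  # opt in to the pre-v1.12.0 CRuby behavior
end
```

    The attribute is set before add_child because it is consulted at the moment a node is reparented.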

    Fix for XML::Builder

    However, recognizing that we want Builder-created children to inherit namespaces, Builder now will set namespace_inheritance=true on the underlying document for both JRuby and CRuby. This means that, on CRuby, the pre-v1.12.0 behavior is restored.

    Users who want to turn this behavior off may pass a keyword argument to the Builder constructor like so: Nokogiri::XML::Builder.new(namespace_inheritance: false)

    See for example usage.
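
    As a rough illustration of the Builder behavior described above, the sketch below builds the same small document with and without the option. The element names (outer, inner), the namespace URI, and the helper builder_child_prefix are hypothetical; the require is guarded so the snippet does nothing where the gem is absent.

```ruby
begin
  require "nokogiri"
rescue LoadError
  warn "nokogiri is not installed; skipping this sketch"
end

BUILDER_NS = "http://example.com/ns" # hypothetical namespace URI

# Build <foo:outer><inner/></foo:outer> and return the namespace prefix
# that <inner> ends up with (nil when it has no namespace).
def builder_child_prefix(options = {})
  builder = Nokogiri::XML::Builder.new(options) do |xml|
    xml["foo"].outer("xmlns:foo" => BUILDER_NS) do
      xml.inner
    end
  end
  builder.doc.at_xpath("//*[local-name()='inner']").namespace&.prefix
end

if defined?(Nokogiri)
  p builder_child_prefix                               # Builder default: children inherit
  p builder_child_prefix(namespace_inheritance: false) # opt out of inheritance
end
```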

    Downstream gem maintainers

    Note that any downstream gems may want to specifically omit Nokogiri v1.12.0--v1.12.3 from their dependency specification if they rely on child namespace inheritance:

    Gem::Specification.new do |gem|
      # ...
      gem.add_runtime_dependency 'nokogiri', '!=1.12.3', '!=1.12.2', '!=1.12.1', '!=1.12.0'
      # ...
    end
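
    To sanity-check which releases such a constraint admits, the same version strings can be fed to RubyGems' Gem::Requirement. This is only a sketch; the constant and helper names are made up for illustration.

```ruby
require "rubygems" # usually loaded already; shown for completeness

# Same exclusions as the gemspec snippet: every release from v1.12.0 to v1.12.3.
NOKOGIRI_REQUIREMENT = Gem::Requirement.new(
  "!=1.12.3", "!=1.12.2", "!=1.12.1", "!=1.12.0"
)

# True when the given Nokogiri version string satisfies the requirement.
def nokogiri_version_allowed?(version)
  NOKOGIRI_REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end

%w[1.11.7 1.12.0 1.12.3 1.12.4 1.12.5].each do |version|
  status = nokogiri_version_allowed?(version) ? "allowed" : "excluded"
  puts "#{version}: #{status}"
end
```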

    Fixed

    • [JRuby] Fix NPE in Schema parsing when an imported resource doesn't have a systemId. [#2296] (Thanks, @pepijnve!)
  • v1.12.3 Changes

    August 10, 2021

    Fixed

    • [CRuby] Fix compilation of libgumbo on older systems with versions of GCC that give errors on C99-isms. Affected systems include RHEL6, RHEL7, and SLES12. [#2302]
  • v1.12.2 Changes

    August 04, 2021

    Fixed

    • [CRuby] Ensure that C extension files in non-native gem installations are loaded using require and rely on $LOAD_PATH instead of using require_relative. This issue only exists when deleting shared libraries that exist outside the extensions directory, something users occasionally do to conserve disk space. [#2300]
  • v1.12.1 Changes

    August 03, 2021

    Fixed

    • [CRuby] Fix compilation of libgumbo on BSD systems by avoiding GNU-isms. [#2298]
  • v1.12.0 Changes

    August 02, 2021

    Notable Addition: HTML5 Support (CRuby only)

    HTML5 support has been added (to CRuby only) by merging Nokogumbo into Nokogiri. The Nokogumbo public API has been preserved, so this functionality is available under the Nokogiri::HTML5 namespace. [#2204]

    Please note that HTML5 support is not available for JRuby in this version. However, we feel it is important to think about JRuby and we hope to work on this in the future. If you're interested in helping with HTML5 support on JRuby, please reach out to the maintainers by commenting on issue #2227.

    Many thanks to Sam Ruby, Steve Checkoway, and Craig Barnes for creating and maintaining Nokogumbo and supporting the Gumbo HTML5 parser. They're now Nokogiri core contributors with all the powers and privileges pertaining thereto. 🙌
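
    A small sketch of parsing through the Nokogiri::HTML5 namespace, assuming Nokogiri v1.12.0 or later on CRuby. The helper first_paragraph_text and the sample markup are hypothetical; the guards make the snippet a no-op on JRuby or where the gem is not installed.

```ruby
begin
  require "nokogiri"
rescue LoadError
  warn "nokogiri is not installed; skipping this sketch"
end

# Parse the markup with the HTML5 parser and return the text of the
# first <p> element, or nil if there is none.
def first_paragraph_text(html)
  doc = Nokogiri::HTML5(html)
  paragraph = doc.at_css("p")
  paragraph && paragraph.text
end

# Nokogiri::HTML5 is only defined on CRuby in the v1.12 series.
if defined?(Nokogiri::HTML5)
  puts first_paragraph_text("<!DOCTYPE html><title>Demo</title><p>Hello</p>")
end
```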

    Notable Change: Nokogiri::HTML4 module and namespace

    Nokogiri::HTML has been renamed to Nokogiri::HTML4, and Nokogiri::HTML is aliased to preserve backwards-compatibility. Nokogiri::HTML and Nokogiri::HTML4 parse methods still use libxml2's (or NekoHTML's) HTML4 parser in the v1.12 release series.

    Take special note that if you rely on the class name of an object in your code, objects will now report a class of Nokogiri::HTML4::Foo where they previously reported Nokogiri::HTML::Foo. Instead of relying on the string returned by Object#class, prefer Class#=== or Object#is_a? or Object#instance_of?.

    Future releases of Nokogiri may deprecate HTML methods or otherwise change this behavior, so please start using HTML4 in place of HTML.
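
    The effect of a rename plus alias on class-name checks can be seen with plain Ruby. The Markup module below is a hypothetical stand-in that mimics the HTML/HTML4 arrangement so the sketch runs without Nokogiri; it is not Nokogiri's own code.

```ruby
# Stand-in modules (hypothetical) that mimic the rename and the alias.
module Markup
  module HTML4
    class Document; end
  end

  HTML = HTML4 # old name kept as an alias for backwards compatibility
end

doc = Markup::HTML::Document.new

# The class reports the new name, so string comparisons against the old one fail.
puts doc.class.name                             # Markup::HTML4::Document
puts doc.class.name == "Markup::HTML::Document" # false

# Checks against the constant itself keep working under either name.
puts doc.is_a?(Markup::HTML::Document)          # true
puts Markup::HTML::Document === doc             # true
puts doc.instance_of?(Markup::HTML4::Document)  # true
```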

    Added

    • [CRuby] Nokogiri::VERSION_INFO["libxslt"]["datetime_enabled"] is a new boolean value which describes whether libxslt (or, more properly, libexslt) has compiled-in datetime support. This is generally going to be true, but some distros ship without this support (e.g., some mingw UCRT-based packages). See #2272 for more details.

    Changed

    • Introduce a new constant, Nokogiri::XML::ParseOptions::DEFAULT_XSLT, which adds the libxslt-preferred options of NOENT | DTDLOAD | DTDATTR | NOCDATA to ParseOptions::DEFAULT_XML.
    • Nokogiri.XSLT parses stylesheets using ParseOptions::DEFAULT_XSLT, which should make some edge-case XSL transformations match libxslt's default behavior. [#1940]
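
    The relationship between the two constants can be checked with ordinary bit arithmetic. This is a sketch assuming Nokogiri v1.12.0 or later; the helper xslt_extra_flags is hypothetical, and the require is guarded so the snippet does nothing where the gem is absent.

```ruby
begin
  require "nokogiri"
rescue LoadError
  warn "nokogiri is not installed; skipping this sketch"
end

# The four libxslt-preferred flags, combined into a single bit mask.
def xslt_extra_flags
  opts = Nokogiri::XML::ParseOptions
  opts::NOENT | opts::DTDLOAD | opts::DTDATTR | opts::NOCDATA
end

if defined?(Nokogiri)
  opts = Nokogiri::XML::ParseOptions
  # DEFAULT_XSLT should equal DEFAULT_XML with the extra flags switched on.
  puts opts::DEFAULT_XSLT == (opts::DEFAULT_XML | xslt_extra_flags)
end
```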

    Fixed

    • [CRuby] Namespaced attributes are handled properly when their parent node is reparented into another document. Previously, the namespace may have gotten dropped. [#2228]
    • [CRuby] Reparented nodes no longer inherit their parent's namespace. Previously, a node without a namespace was forced to adopt its parent's namespace. [#1712, #425]

    Improved

    • [CRuby] Speed up (slightly) the compile time of packaged libraries libiconv, libxml2, and libxslt by using autoconf's --disable-dependency-tracking option. ("ruby" platform gem only.)

    Deprecated

    • Deprecating Nokogumbo's Nokogiri::HTML5.get. This method will be removed in a future version of Nokogiri.

    Dependencies

    • [CRuby] Upgrade mini_portile2 dependency from ~> 2.5.0 to ~> 2.6.1. ("ruby" platform gem only.)
  • v1.11.7 Changes

    June 02, 2021

    Fixed

    • [CRuby] Backporting an upstream fix to XPath recursion depth limits which impacted some users of complex XPath queries. This issue is present in libxml 2.9.11 and 2.9.12. [#2257]
  • v1.11.6 Changes

    May 26, 2021

    Fixed

    • [CRuby] DocumentFragment#path now does proper error-checking to handle behavior introduced in libxml > 2.9.10. In v1.11.4 and v1.11.5, calling DocumentFragment#path could result in a segfault.
  • v1.11.5 Changes

    May 19, 2021

    Fixed

    [Windows CRuby] Work around segfault at process exit on Windows when using libxml2 system DLLs.

    libxml 2.9.12 introduced new behavior to avoid memory leaks when unloading libxml2 shared libraries (see libxml/!66). Early testing caught this segfault on non-Windows platforms (see #2059) but it was incompletely fixed and is still an issue on Windows platforms that are using system DLLs.

    We work around this by configuring libxml2 in this situation to use its default memory management functions. Note that if Nokogiri is not on Windows, or is not using shared system libraries, it will continue to configure libxml2 to use Ruby's memory management functions. Nokogiri::VERSION_INFO["libxml"]["memory_management"] will allow you to verify when the default memory management functions are being used. [#2241]

    Added

    Nokogiri::VERSION_INFO["libxml"] now contains the key "memory_management" to declare whether libxml2 is using its default memory management functions, or whether it uses the memory management functions from Ruby. See above for more details.
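
    A short sketch of reading the new key, assuming Nokogiri v1.11.5 or later. The helper libxml_memory_management is hypothetical; it uses Hash#dig so that a missing "libxml" entry (for example on JRuby) yields nil instead of raising, and the require is guarded so the snippet does nothing where the gem is absent.

```ruby
begin
  require "nokogiri"
rescue LoadError
  warn "nokogiri is not installed; skipping this sketch"
end

# Return the reported memory management setting, or nil when the key
# (or the whole "libxml" entry) is not present.
def libxml_memory_management
  Nokogiri::VERSION_INFO.dig("libxml", "memory_management")
end

if defined?(Nokogiri)
  puts "libxml2 memory management: #{libxml_memory_management.inspect}"
end
```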

  • v1.11.4 Changes

    May 14, 2021

    Security

    [CRuby] Vendored libxml2 upgraded to v2.9.12, which addresses several CVEs.

    Note that two additional CVEs were addressed upstream but are not relevant to this release. CVE-2021-3516 via xmllint is not present in Nokogiri, and CVE-2020-7595 has been patched in Nokogiri since v1.10.8 (see #1992).

    Please see nokogiri/GHSA-7rrm-v45f-jp64 or #2233 for a more complete analysis of these CVEs and patches.

    Dependencies

    • [CRuby] vendored libxml2 is updated from 2.9.10 to 2.9.12. (Note that 2.9.11 was skipped because it was superseded by 2.9.12 a few hours after its release.)