Changelog History
-
v1.11.3 Changes
April 07, 2021
Fixed
- [CRuby] Passing non-`Node` objects to `Document#root=` now raises an `ArgumentError` exception. Previously this likely segfaulted. [#1900]
- [JRuby] Passing non-`Node` objects to `Document#root=` now raises an `ArgumentError` exception. Previously this raised a `TypeError` exception.
- [CRuby] arm64/aarch64 systems (like Apple's M1) can now compile libxml2 and libxslt from source (though we continue to strongly advise users to install the native gems for the best possible experience).
-
v1.11.2 Changes
March 11, 2021
Fixed
- [CRuby] `NodeSet` may now safely contain `Node` objects from multiple documents. Previously the GC lifecycle of the parent `Document` objects could lead to nodes being GCed while still in scope. [#1952]
- [CRuby] Patch libxml2 to avoid "huge input lookup" errors on large CDATA elements. (See upstream GNOME/libxml2#200 and GNOME/libxml2!100.) [#2132]
- [CRuby+Windows] Enable Nokogumbo (and other downstream gems) to compile and link against `nokogiri.so` by including `LDFLAGS` in `Nokogiri::VERSION_INFO`. [#2167]
- [CRuby] `{XML,HTML}::Document.parse` now invokes `#initialize` exactly once. Previously `#initialize` was invoked twice on each object.
- [JRuby] `{XML,HTML}::Document.parse` now invokes `#initialize` exactly once. Previously `#initialize` was not called, which was a problem for subclassing such as done by `Loofah`.
Improved
- Reduce the number of object allocations needed when parsing an `HTML::DocumentFragment`. [#2087] (Thanks, @ashmaroli!)
- [JRuby] Update the algorithm used to calculate `Node#line` to be wrong less often. The underlying parser, Xerces, does not track line numbers, and so we've always used a hacky solution for this method. [#1223, #2177]
- Introduce `--enable-system-libraries` and `--disable-system-libraries` flags to `extconf.rb`. These flags provide the same functionality as `--use-system-libraries` and the `NOKOGIRI_USE_SYSTEM_LIBRARIES` environment variable, but are more idiomatic. [#2193] (Thanks, @eregon!)
- [TruffleRuby] `--disable-static` is now the default on TruffleRuby when the packaged libraries are used. This is more flexible and compiles faster. (Note, though, that the default on TR is still to use system libraries.) [#2191, #2193] (Thanks, @eregon!)
Changed
- `Nokogiri::XML::XPath` is now a Module (previously it was a Class). It has been acting solely as a Module since v1.0.0. See 8461c74.
-
v1.11.1 Changes
January 06, 2021
Fixed
- [CRuby] If `libxml-ruby` is loaded before `nokogiri`, the SAX and Push parsers no longer call `libxml-ruby`'s handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]
-
v1.11.0 Changes
January 03, 2021
Notes
Faster, more reliable installation: Native Gems for Linux and OSX/Darwin
"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need to compile the C extension and the packaged libraries. The result is a much faster and more reliable installation, which, as you probably know, addresses the biggest headaches for Nokogiri users.
We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:
- Linux: `x86-linux` and `x86_64-linux` -- including musl platforms like alpine
- OSX/Darwin: `x86_64-darwin` and `arm64-darwin`
We'd appreciate your thoughts and feedback on this work at #2075.
Dependencies
Ruby
This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.
This release ends support for:
- Ruby 2.3, for which official support ended on 2019-03-31 [#1886] (Thanks @ashmaroli!)
- Ruby 2.4, for which official support ended on 2020-04-05
- JRuby 9.1, which is the Ruby 2.3-compatible release.
Gems
- Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)
- [MRI] Upgrade mini_portile2 dependency from `~> 2.4.0` to `~> 2.5.0` [#2005] (Thanks, @alejandroperea!)
Security
See the note below about CVE-2020-26247 in the "Changed" subsection on `XML::Schema` input being "untrusted" by default.
Added
- Add Node methods for manipulating "keyword attributes" (for example, `class` and `rel`): `#kwattr_values`, `#kwattr_add`, `#kwattr_append`, and `#kwattr_remove`. [#2000]
- Add support for CSS queries `a:has(> b)`, `a:has(~ b)`, and `a:has(+ b)`. [#688] (Thanks, @jonathanhefner!)
- Add `Node#value?` to better match expected semantics of a Hash-like object. [#1838, #1840] (Thanks, @MatzFan!)
- [CRuby] Add `Nokogiri::XML::Node#line=` for use by downstream libs like nokogumbo. [#1918] (Thanks, @stevecheckoway!)
- `nokogiri.gemspec` is back after a 10-year hiatus. We still prefer you use the official releases, but `main` is pretty stable these days, and YOLO.
Performance
- [CRuby] The CSS `~=` operator and class selector `.` are about 2x faster. [#2137, #2135]
- [CRuby] Patch libxml2 to call `strlen` from `xmlStrlen` rather than the naive implementation, because `strlen` is generally optimized for the architecture. [#2144] (Thanks, @ilyazub!)
- Improve performance of some namespace operations. [#1916] (Thanks, @ashmaroli!)
- Remove unnecessary array allocations from Node serialization methods [#1911] (Thanks, @ashmaroli!)
- Avoid creation of unnecessary zero-length String objects. [#1970] (Thanks, @ashmaroli!)
- Always compile libxml2 and libxslt with '-O2' [#2022, #2100] (Thanks, @ilyazub!)
- [JRuby] Lots of code cleanup and performance improvements. [#1934] (Thanks, @kares!)
- [CRuby] `RelaxNG.from_document` no longer leaks memory. [#2114]
Improved
- [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for browsers. [#2058] (Thanks to HackerOne user mayflower for reporting this!)
- `{HTML,XML}::Document#parse` now accepts `Pathname` objects. Previously this worked only if the referenced file was less than 4096 bytes long; longer files resulted in undefined behavior because the `read` method would be repeatedly invoked. [#1821, #2110] (Thanks, @doriantaylor and @phokz!)
- [CRuby] Nokogumbo builds faster because it can now use header files provided by Nokogiri. [#1788] (Thanks, @stevecheckoway!)
- Add `frozen_string_literal: true` magic comment to all `lib` files. [#1745] (Thanks, @oniofchaos!)
- [JRuby] Clean up deprecated calls into JRuby. [#2027] (Thanks, @headius!)
Fixed
- HTML Parsing in "strict" mode (i.e., the `RECOVER` parse option not set) now correctly raises a `XML::SyntaxError` exception. Previously the value of the `RECOVER` bit was being ignored by CRuby and was misinterpreted by JRuby. [#2130]
- The CSS `~=` operator now correctly handles non-space whitespace in the `class` attribute. commit e45dedd
- The switch to turn off the CSS-to-XPath cache is now thread-local, rather than being shared mutable state. [#1935]
- The Node methods `add_previous_sibling`, `previous=`, `before`, `add_next_sibling`, `next=`, `after`, `replace`, and `swap` now correctly use their parent as the context node for parsing markup. These methods now also raise a `RuntimeError` if they are called on a node with no parent. [nokogumbo#160]
- [JRuby] `XML::Schema` XSD validation errors are captured in `XML::Schema#errors`. These errors were previously ignored.
- [JRuby] Standardize reading from IO-like objects, including StringIO. [#1888, #1897]
- [JRuby] Fix how custom XPath function namespaces are inferred to be less naive. [#1890, #2148]
- [JRuby] Clarify exception message when custom XPath functions can't be resolved.
- [JRuby] Comparison of Node to Document with `Node#<=>` now matches CRuby/libxml2 behavior.
- [CRuby] Syntax errors are now correctly captured in `Document#errors` for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
- [CRuby] Fixed installation on AIX with respect to `vasprintf`. [#1908]
- [CRuby] On some platforms, avoid symbol name collision with glibc's `canonicalize`. [#2105]
- [Windows Visual C++] Fixed compiler warnings and errors. [#2061, #2068]
- [CRuby] Fixed Nokogumbo integration which broke in the v1.11.0 release candidates. [#1788] (Thanks, @stevecheckoway!)
- [JRuby] Fixed document encoding regression in v1.11.0 release candidates. [#2080, #2083] (Thanks, @thbar!)
Removed
- The internal method `Nokogiri::CSS::Parser.cache_on=` has been removed. Use `.set_cache` if you need to muck with the cache internals.
- The class method `Nokogiri::CSS::Parser.parse` has been removed. This was originally deprecated in 2009 in 13db61b. Use `Nokogiri::CSS.parse` instead.
Changed
`XML::Schema` input is now "untrusted" by default
Address CVE-2020-26247.
In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by `Nokogiri::XML::Schema` were trusted by default, allowing external resources to be accessed over the network, potentially enabling XXE or SSRF attacks.
This behavior is counter to the security policy intended by Nokogiri maintainers, which is to treat all input as untrusted by default whenever possible.
Please note that this security fix was pushed into a new minor version, 1.11.x, rather than a patch release to the 1.10.x branch, because it is a breaking change for some schemas and the risk was assessed to be "Low Severity".
More information and instructions for enabling "trusted input" behavior in v1.11.0.rc4 and later are available at the public advisory.
HTML parser now obeys the `strict` or `norecover` parsing option
(Also noted above in the "Fixed" section) HTML Parsing in "strict" mode (i.e., the `RECOVER` parse option not set) now correctly raises a `XML::SyntaxError` exception. Previously the value of the `RECOVER` bit was being ignored by CRuby and was misinterpreted by JRuby.
If you're using the default parser options, you will be unaffected by this fix. If you're passing `strict` or `norecover` to your HTML parser call, you may be surprised to see that the parser now fails to recover and raises a `XML::SyntaxError` exception. Given the number of HTML documents on the internet that libxml2 would consider to be ill-formed, this is probably not what you want, and you can omit setting that parse option to restore the behavior that you have been relying upon.
Apologies to anyone inconvenienced by this breaking bugfix being present in a minor release, but I felt it was appropriate to introduce this fix because it's straightforward to fix any code that has been relying on this buggy behavior.
`VersionInfo`, the output of `nokogiri -v`, and related constants
This release changes the metadata provided in `Nokogiri::VersionInfo`, which also affects the output of `nokogiri -v`. Some related constants have also been changed. If you're using `VersionInfo` programmatically, or relying on constants related to underlying library versions, please read the detailed changes for `Nokogiri::VersionInfo` at #2139 and accept our apologies for the inconvenience.
-
v1.11.0.rc3 Changes
September 08, 2020
v1.11.0.rc3 / 2020-09-08
To try out release candidates, use `gem install --prerelease` or `gem install nokogiri -v1.11.0.rc3`.
If you're using bundler, try updating your Gemfile with:
`gem "nokogiri", "~> 1.11.0.rc3"`
Delta since v1.11.0.rc2:
Notes
Added precompiled native gem support for OSX/Darwin platform `x86_64-darwin19`.
Fixed
-
v1.11.0.rc2 Changes
April 01, 2020
v1.11.0.rc2 / 2020-04-01
To try out release candidates, use `gem install --prerelease`. Latest is `v1.11.0.rc2`.
Delta since v1.11.0.rc1:
Notes
Note that the linux-native gems for v1.11.0.rc2 and later support musl systems (e.g., alpine).
Dependencies
- [MRI] Upgrade mini_portile2 dependency from `~> 2.4.0` to `~> 2.5.0` [#2005] (Thanks, @alejandroperea!)
Added
- Add Node methods for manipulating keyword attributes (like `class` and `rel`): `#kwattr_values`, `#kwattr_add`, `#kwattr_append`, and `#kwattr_remove`. [#2000]
Fixed
- The switch to turn off the CSS-to-XPath cache is now thread-local, rather than being shared mutable state. [#1935]
Removed
- The internal method `Nokogiri::CSS::Parser.cache_on=` has been removed. Use `.set_cache` if you need to muck with the cache internals.
- The method `Nokogiri::CSS::Parser.parse` has been removed. This was originally deprecated in 2009 in 13db61b.
-
v1.11.0.rc1 Changes
February 03, 2020
v1.11.0.rc1 / 2020-02-02
To try out release candidates, use `gem install --prerelease`.
Notes
Experiment: Pre-Compiled Native Linux Gems
With the v1.11.0 release candidates, we are experimenting with shipping pre-compiled native Linux gems for the `x86-linux` and `x86_64-linux` platforms.
If this works properly for you, it will speed up installation time on Linux.
If this doesn't work for you, please drop us a note at #1983; we may reach out to you for more information on your distro and configuration.
Either way, we'd appreciate some feedback at #1983.
Dependencies
This release introduces support for:
- Ruby 2.7, including the precompiled native binary gems for Windows.
This release ends support for:
- Ruby 2.3, for which official support ended on 2019-03-31 [#1886] (Thanks @ashmaroli!)
- JRuby 9.1, which is the Ruby 2.3-compatible release.
Added
- Add support for CSS queries "a:has(> b)", "a:has(~ b)", and "a:has(+ b)". [#688] (Thanks, @jonathanhefner!)
- Add `Node#value?` to better match expected semantics of a Hash-like object. [#1838, #1840] (Thanks, @MatzFan!)
- [MRI] Add `Nokogiri::XML::Node#line=` for use by downstream libs like nokogumbo. [#1918] (Thanks, @stevecheckoway!)
Improved
- Add `frozen_string_literal: true` magic comment to all `lib` files. [#1745] (Thanks, @oniofchaos!)
- Improve performance of some namespace operations. [#1916] (Thanks, @ashmaroli!)
- Remove unnecessary array allocations from Node serialization methods [#1911] (Thanks, @ashmaroli!)
- Avoid creation of unnecessary zero-length String objects. [#1970] (Thanks, @ashmaroli!)
- [JRuby] Lots of code cleanup and performance improvements. [#1934] (Thanks, @kares!)
Fixed
- [JRuby] Standardize reading from IO-like objects, including StringIO. [#1888, #1897]
- [JRuby] Change `NodeSet#to_a` to return a RubyArray instead of Object, for compilation under JRuby 9.2.9 and later. [#1968, #1969] (Thanks, @headius!)
Changed
`VersionInfo` and the output of `nokogiri -v`
This release changes the information provided in `Nokogiri::VersionInfo`; see #1482 and #1974 for background. Note that the output of `nokogiri -v` will also reflect these changes.
`Nokogiri::VersionInfo` will no longer contain the following keys (previously these were set only when vendored libraries were being used):
- `libxml/libxml2_path`
- `libxml/libxslt_path`
`Nokogiri::VersionInfo` now contains version metadata for libxslt:
- `libxslt/source` (either "packaged" or "system", similar to `libxml/source`)
- `libxslt/compiled` (the version of libxslt compiled at installation time, similar to `libxml/compiled`)
- `libxslt/loaded` (the version of libxslt loaded at runtime, similar to `libxml/loaded`)
- `libxslt/patches` moved from `libxml/libxslt_patches`
`Nokogiri::VersionInfo` key `libxml/libxml2_patches` has been renamed to `libxml/patches`.
These C macros will no longer be defined:
- `NOKOGIRI_LIBXML2_PATH`
- `NOKOGIRI_LIBXSLT_PATH`
These global variables will no longer be defined:
- `NOKOGIRI_LIBXML2_PATH`
- `NOKOGIRI_LIBXSLT_PATH`
These constants have been renamed:
- `Nokogiri::LIBXML_VERSION` is now `Nokogiri::LIBXML_COMPILED_VERSION`
- `Nokogiri::LIBXML_PARSER_VERSION` is now `Nokogiri::LIBXML_LOADED_VERSION`
These methods have been renamed and the return type changed from `String` to `Gem::Version`:
- `VersionInfo#loaded_parser_version` is now `#loaded_libxml_version`
- `VersionInfo#compiled_parser_version` is now `#compiled_libxml_version`
`Nokogiri.uses_libxml?` now accepts an optional requirement string which is interpreted as a `Gem::Requirement` and tested against the loaded libxml2 version (the value in `VersionInfo` key `libxml/loaded`). This greatly simplifies much of the version-dependent branching logic in both the implementation and the tests.
To sum these changes up, the output from CRuby when using vendored libraries was something like:
    # Nokogiri (1.10.7)
    ---
    warnings: []
    nokogiri: 1.10.7
    ruby:
      version: 2.7.0
      platform: x86_64-linux
      description: ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/home/flavorjones/.rvm/gems/ruby-2.7.0/gems/nokogiri-1.10.7/ports/x86_64-pc-linux-gnu/libxml2/2.9.10"
      libxslt_path: "/home/flavorjones/.rvm/gems/ruby-2.7.0/gems/nokogiri-1.10.7/ports/x86_64-pc-linux-gnu/libxslt/1.1.34"
      libxml2_patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      - 0002-Remove-script-macro-support.patch
      - 0003-Update-entities-to-remove-handling-of-ssi.patch
      - 0004-libxml2.la-is-in-top_builddir.patch
      libxslt_patches: []
      compiled: 2.9.10
      loaded: 2.9.10
but now looks like:
    # Nokogiri (1.11.0)
    ---
    warnings: []
    nokogiri: 1.11.0
    ruby:
      version: 2.7.0
      platform: x86_64-linux
      description: ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
      engine: ruby
    libxml:
      source: packaged
      patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      - 0002-Remove-script-macro-support.patch
      - 0003-Update-entities-to-remove-handling-of-ssi.patch
      - 0004-libxml2.la-is-in-top_builddir.patch
      compiled: 2.9.10
      loaded: 2.9.10
    libxslt:
      source: packaged
      patches: []
      compiled: 1.1.34
      loaded: 1.1.34
and the output from using system libraries now looks like:
    # Nokogiri (1.11.0)
    ---
    warnings: []
    nokogiri: 1.11.0
    ruby:
      version: 2.7.0
      platform: x86_64-linux
      description: ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
      engine: ruby
    libxml:
      source: system
      compiled: 2.9.4
      loaded: 2.9.4
    libxslt:
      source: system
      compiled: 1.1.29
      loaded: 1.1.29
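The requirement-string check that `Nokogiri.uses_libxml?` now accepts can be approximated with RubyGems classes alone. The sketch below is not Nokogiri's implementation: `version_satisfies?` is a hypothetical helper, and the version string stands in for the `libxml/loaded` value from the packaged-library output.

```ruby
require "rubygems"

# Hypothetical helper mirroring the described behavior: with no
# requirement it answers true, otherwise the requirement string is
# interpreted as a Gem::Requirement and tested against the version.
def version_satisfies?(loaded_version, requirement = nil)
  return true if requirement.nil?

  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(loaded_version))
end

loaded = "2.9.10" # stand-in for the "libxml/loaded" value

checks = {
  "~> 2.9.0"  => version_satisfies?(loaded, "~> 2.9.0"),
  ">= 2.9.11" => version_satisfies?(loaded, ">= 2.9.11"),
  "< 2.10"    => version_satisfies?(loaded, "< 2.10"),
}
```

Because `Gem::Version` compares segments numerically, `2.9.10` sorts above `2.9.4` and below `2.10`, which plain String comparison would get wrong.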
-
v1.10.10 Changes
July 06, 2020
1.10.10 / 2020-07-06
Features
- [MRI] Cross-built Windows gems now support Ruby 2.7 [#2029]. Note that prior to this release, the v1.11.x prereleases provided this support.
-
v1.10.9 Changes
March 01, 2020
1.10.9 / 2020-03-01
Fixed
- [MRI] Raise an exception when Nokogiri detects a specific libxml2 edge case involving blank Schema nodes wrapped by Ruby objects that would cause a segfault. Currently no fix is available upstream, so we're preventing a dangerous operation and informing users to code around it if possible. [#1985, #2001]
- [JRuby] Change `NodeSet#to_a` to return a RubyArray instead of Object, for compilation under JRuby 9.2.9 and later. [#1968, #1969] (Thanks, @headius!)
-
v1.10.8 Changes
February 10, 2020
1.10.8 / 2020-02-10
Security
[MRI] Pulled in upstream patch from libxml that addresses CVE-2020-7595. Full details are available in #1992. Note that this patch is not yet (as of 2020-02-10) in an upstream release of libxml.