Changelog History

  • v0.2.1 Changes

    March 02, 2015

    Proper HTML serializing support for script tags

    When serializing an HTML document back to HTML (as a String), the contents of <script> tags are now serialized correctly. Previously, XML-unsafe characters (e.g. <) would be converted to XML entities, resulting in invalid JavaScript syntax. <script> tags in HTML documents no longer have their contents converted, ensuring valid JavaScript syntax in the output.

    See commit 874d7124af540f0bc78e6c586868bbffb4310c5d and issue https://gitlab.com/yorickpeterse/oga/issues/79 for more information.
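    The rule can be pictured with a small sketch, assuming a serializer that escapes text node contents with entities. `serialize_text` and `ENTITY_MAP` are hypothetical names used for illustration, not Oga's API:

```ruby
# Hypothetical sketch, not Oga's serializer: text is entity-escaped on output,
# except the body of a <script> element in an HTML document, which is raw text.
ENTITY_MAP = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def serialize_text(text, parent_name, html:)
  # Escaping a script body would turn "a < b" into invalid JavaScript.
  return text if html && parent_name.casecmp?('script')

  text.gsub(/[&<>]/, ENTITY_MAP)
end

serialize_text('if (a < b) {}', 'script', html: true)  # => "if (a < b) {}"
serialize_text('if (a < b) {}', 'script', html: false) # => "if (a &lt; b) {}"
```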

    Proper lexing support for script tags

    When lexing HTML documents, the XML lexer is now capable of lexing the contents of <script> tags properly. Previously, input such as <script>x >y</script> would result in incorrect tokens being emitted. See commit ba2177e2cfda958ea12c5b04dbf60907aaa8816d and issue https://gitlab.com/yorickpeterse/oga/issues/70 for more information.
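    A rough sketch of the idea using Ruby's StringScanner, assuming a hand-written scanner rather than Oga's native lexer. `lex_script` and its token format are hypothetical and only handle <script> tags:

```ruby
require 'strscan'

# Hypothetical sketch, not Oga's lexer. After an opening <script> tag,
# everything up to the next </script> becomes a single text token, so "<"
# and ">" inside the script are never mistaken for markup.
def lex_script(input)
  scanner = StringScanner.new(input)
  tokens  = []

  until scanner.eos?
    if scanner.scan(/<script>/i)
      tokens << [:open, 'script']
      # Zero-width lookahead: stop right before the closing tag.
      body = scanner.scan_until(/(?=<\/script>)/i)
      if body.nil? # no closing tag: the rest of the input is script text
        body = scanner.rest
        scanner.terminate
      end
      tokens << [:text, body] unless body.empty?
    elsif scanner.scan(/<\/script>/i)
      tokens << [:close, 'script']
    else
      # Simplified: plain text up to the next "<" (or a lone "<").
      tokens << [:text, scanner.scan(/[^<]+|</)]
    end
  end

  tokens
end

lex_script('<script>x >y</script>')
# => [[:open, "script"], [:text, "x >y"], [:close, "script"]]
```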

    Element Inner Text

    When setting the inner text of an element using Oga::XML::Element#inner_text=, all child nodes of the element are now removed first, instead of only its text nodes.

    See https://gitlab.com/yorickpeterse/oga/issues/64 for more information.

    Support for extra XML entities

    Support for encoding/decoding extra XML entities was added by Dmitry Krasnoukhov. This includes entities such as &#60, &#34, etc. See commit 26baf89440d97bd9dd5e50ec3d6d9b7ab3bdf737 for more information.
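    As an illustration of what decoding such character references involves, here is a minimal sketch using a hypothetical `decode_numeric_entities` helper (not Oga's code). It handles decimal and hexadecimal references terminated by a semicolon, as XML writes them:

```ruby
# Hypothetical sketch: replace decimal (&#60;) and hexadecimal (&#x3C;)
# character references with the characters they encode.
def decode_numeric_entities(text)
  text.gsub(/&#(x[0-9a-fA-F]+|[0-9]+);/) do
    ref  = Regexp.last_match(1)
    code = ref.start_with?('x') ? ref[1..].to_i(16) : ref.to_i
    code.chr(Encoding::UTF_8)
  end
end

decode_numeric_entities('&#60;p&#62; &#34;hi&#x22;') # => "<p> \"hi\""
```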

    Support for inline doctypes with newlines in IO input

    The XML lexer (and thus the parser) can now handle inline doctypes containing newlines when using an IO object as the input. For example:

    <!DOCTYPE html[foo
    bar]>
    

    Previously this would result in incorrect tokens being emitted by the lexer. See commit cbb2815146a79805b8da483d2ef48d17e2959e72 for more information.

  • v0.2.0 Changes

    November 17, 2014

    CSS Selector Support

    ๐Ÿš€ Probably the biggest feature of this release: support for querying documents ๐Ÿ‘ using CSS selectors. Oga supports a subset of the CSS3 selector specification, ๐Ÿ‘ in particular the following selectors are supported:

    • Element, class and ID selectors
    • Attribute selectors (e.g. foo[x ~= "y"])

    The following pseudo classes are supported:

    • :root
    • :nth-child(n)
    • :nth-last-child(n)
    • :nth-of-type(n)
    • :nth-last-of-type(n)
    • :first-child
    • :last-child
    • :first-of-type
    • :last-of-type
    • :only-child
    • :only-of-type
    • :empty
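    The :nth-* pseudo classes take an argument which, in the CSS3 Selectors specification, has the form an+b (e.g. 2n+1 for odd positions). Assuming that notation, a 1-based position matches when some integer n >= 0 satisfies a*n + b == position. `nth_match?` is a hypothetical helper sketching that check, not part of Oga:

```ruby
# Hypothetical helper: does a 1-based sibling position match an+b
# for some integer n >= 0?
def nth_match?(position, a, b)
  return position == b if a.zero?

  diff = position - b
  # diff must be a non-negative multiple of a (n = diff / a >= 0).
  (diff % a).zero? && (diff / a) >= 0
end

(1..6).select { |i| nth_match?(i, 2, 1) }  # => [1, 3, 5]  (2n+1, odd positions)
(1..6).select { |i| nth_match?(i, -1, 3) } # => [1, 2, 3]  (-n+3, the first three)
```

    For :nth-last-child and :nth-last-of-type the same check applies to positions counted from the last sibling instead of the first.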

    CSS selectors can be used via the methods css and at_css on an instance of Oga::XML::Document or Oga::XML::Element. For example:

    document = Oga.parse_xml('<people><person>Alice</person></people>')
    
    document.css('people person') # => NodeSet(Element(name: "person" ...))
    

    The architecture behind this is quite similar to that used for parsing XPath. There's a lexer (Oga::CSS::Lexer) and a parser (Oga::CSS::Parser). Unlike Nokogiri (and perhaps other libraries), the parser does not output XPath expressions as a String or a CSS-specific AST. Instead, it directly emits an XPath AST, which allows the result to be evaluated directly by Oga::XPath::Evaluator.

    See https://gitlab.com/yorickpeterse/oga/issues/11 for more information.
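    To illustrate emitting an XPath AST directly instead of an XPath string, here is a toy translator for selectors made only of element names joined by the descendant combinator. The node names (:path, :axis, :test) are invented for this sketch and are not Oga::XPath's actual AST format:

```ruby
# Toy translator (hypothetical AST shape): "people person" becomes the
# equivalent of the XPath descendant::people/descendant::person, built
# as nested arrays rather than as a String that would need re-parsing.
def css_descendants_to_ast(selector)
  steps = selector.split.map do |name|
    [:axis, 'descendant', [:test, nil, name]] # nil = no namespace prefix
  end

  [:path, *steps]
end

css_descendants_to_ast('people person')
# => [:path, [:axis, "descendant", [:test, nil, "people"]], [:axis, "descendant", [:test, nil, "person"]]]
```

    An evaluator can then walk this structure directly, which is the benefit the paragraph above describes: no intermediate XPath string has to be generated and parsed again.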

    Multi-line Attribute Support

    Oga can now lex/parse elements that have attributes with newlines in them. Previously this would trigger memory allocation errors.

    See https://gitlab.com/yorickpeterse/oga/issues/58 for more information.

    SAX after_element

    The after_element method in the SAX parsing API now always takes two arguments: the namespace name and the element name. Previously, this method would always receive a single nil value as its argument, which is rather pointless.

    See https://gitlab.com/yorickpeterse/oga/issues/54 for more information.
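    A handler sketch for the two-argument after_element. The on_element signature is taken from the v0.1.2 SAX example; the `DepthTracker` class and its depth bookkeeping are purely illustrative:

```ruby
# Illustrative SAX handler: tracks nesting depth and records the names of
# closed elements, relying on after_element receiving (namespace, name).
class DepthTracker
  attr_reader :max_depth, :closed

  def initialize
    @depth     = 0
    @max_depth = 0
    @closed    = []
  end

  def on_element(namespace, name, attrs = {})
    @depth += 1
    @max_depth = [@max_depth, @depth].max
  end

  def after_element(namespace, name)
    @depth -= 1
    @closed << name
  end
end
```

    With Oga installed, such a handler would be passed as Oga.sax_parse_xml(DepthTracker.new, '<foo><bar></bar></foo>').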

    XPath Grouping

    XPath expressions can now be grouped together using parentheses. This allows one to specify a custom operator precedence.

    Enumerator Parsing Input

    Enumerator instances can now be used as input for Oga.parse_xml and friends. This can be used to download and parse XML files on the fly. For example:

    enum = Enumerator.new do |yielder|
      HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
        yielder << chunk
      end
    end
    
    document = Oga.parse_xml(enum)
    

    See https://gitlab.com/yorickpeterse/oga/issues/48 for more information.
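    The example above depends on an HTTP client. To experiment without a network, the same pattern can be sketched with a StringIO standing in for the download; the `chunked` helper and the chunk size are assumptions for illustration:

```ruby
require 'stringio'

# Wrap any IO in an Enumerator that yields fixed-size chunks, mimicking a
# streamed download. IO#read(size) returns nil at end of input.
def chunked(io, size = 16)
  Enumerator.new do |yielder|
    while (chunk = io.read(size))
      yielder << chunk
    end
  end
end

enum = chunked(StringIO.new('<people><person>Alice</person></people>'), 8)
enum.to_a.first # => "<people>"
```

    The resulting enumerator can be handed to Oga.parse_xml in the same way as enum above. Note that a chunk boundary can fall in the middle of a tag: with 8-byte chunks, "</person>" is split across two chunks.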

    Removing Attributes

    Element attributes can now be removed using Oga::XML::Element#unset:

    element = Oga::XML::Element.new(:name => 'foo')
    
    element.set('class', 'foo')
    element.unset('class')
    

    XPath Attributes

    XPath predicates are now evaluated for every context node, as opposed to being evaluated once for the entire context. This ensures that expressions such as descendant-or-self::node()/foo[1] are evaluated correctly.
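    The difference can be modelled in plain Ruby, assuming a toy tree where each parent maps to the names of its child elements (hypothetical data and helpers, not Oga's evaluator). foo[1] should select the first foo child of every context node, not just the first foo overall:

```ruby
# Toy tree: parent name => child element names (0-based positions below).
TREE = {
  'a' => %w[foo foo bar],
  'b' => %w[foo],
  'c' => %w[bar]
}.freeze

def foo_positions(children)
  children.each_index.select { |i| children[i] == 'foo' }
end

# Correct: the [1] predicate is applied separately for each context node.
def first_foo_per_context(tree)
  tree.flat_map { |parent, children| foo_positions(children).first(1).map { |i| [parent, i] } }
end

# Old behaviour: the predicate is applied once to the combined result.
def first_foo_global(tree)
  tree.flat_map { |parent, children| foo_positions(children).map { |i| [parent, i] } }.first(1)
end

first_foo_per_context(TREE) # => [["a", 0], ["b", 0]]
first_foo_global(TREE)      # => [["a", 0]]
```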

    Available Namespaces

    Calling Oga::XML::Element#available_namespaces would modify the Hash returned by Oga::XML::Element#namespaces in place. This bug has been fixed in this release.
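    A sketch of the non-mutating approach, using a hypothetical `Elem` struct rather than Oga's classes: the available namespaces are built in a fresh Hash, walking from the outermost element inwards so that inner declarations override outer ones (the ordering described for v0.1.2):

```ruby
# Hypothetical element model: a Hash of prefix => URI plus a parent link.
Elem = Struct.new(:namespaces, :parent)

def available_namespaces(element)
  chain = []
  current = element
  while current
    chain.unshift(current) # outermost ancestor ends up first
    current = current.parent
  end

  # Hash#merge returns a new Hash on every step; using merge! on an
  # element's own namespaces would mutate it, which is the bug described.
  chain.reduce({}) { |acc, el| acc.merge(el.namespaces) }
end

root      = Elem.new({ 'foo' => 'bar' }, nil)
container = Elem.new({ 'foo' => 'baz' }, root)
text      = Elem.new({}, container)

available_namespaces(text) # => {"foo"=>"baz"}
```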

    NodeSets

    NodeSet instances can now be compared with each other using ==. Previously, two instances would always be considered different from each other due to the use of the default Object#== method.
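    A minimal stand-in showing the idea, using a hypothetical `SimpleNodeSet` class rather than Oga's NodeSet: == compares the contained nodes instead of falling back to Object#==, which only checks identity:

```ruby
# Hypothetical node set: equal when both are SimpleNodeSets holding
# equal nodes in the same order.
class SimpleNodeSet
  attr_reader :nodes

  def initialize(nodes = [])
    @nodes = nodes
  end

  def ==(other)
    other.is_a?(SimpleNodeSet) && nodes == other.nodes
  end
end

a = SimpleNodeSet.new(%w[foo bar])
b = SimpleNodeSet.new(%w[foo bar])
a == b      # => true
a.equal?(b) # => false (still two distinct objects)
```

    If such objects were also used as Hash keys, eql? and hash would need matching definitions; == alone does not cover that.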

    XML Entities

    XML entities such as &amp; and &lt; are now encoded/decoded by the lexer, string and text nodes.

    See https://gitlab.com/yorickpeterse/oga/issues/49 for more information.
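    For reference, XML predefines five named entities (&amp;, &lt;, &gt;, &quot; and &apos;). A minimal encode/decode sketch, with hypothetical helper names rather than Oga's API:

```ruby
# Decoding covers all five predefined XML entities.
DECODE = { '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&apos;' => "'" }.freeze
# In text content only &, < and > need escaping (quotes matter in attributes).
ENCODE = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def encode_entities(text)
  text.gsub(/[&<>]/, ENCODE)
end

def decode_entities(text)
  # A single gsub pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  text.gsub(/&(?:amp|lt|gt|quot|apos);/, DECODE)
end

encode_entities('a < b & c')  # => "a &lt; b &amp; c"
decode_entities('&amp;lt;')   # => "&lt;"
```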

    General

    Source lines are no longer included in error messages generated by the XML parser. This simplifies the code and removes the need to re-read the input (in the case of IO/Enumerable inputs).

    XML Lexer Newlines

    Newlines in the XML lexer are now counted in native code (C/Java). On MRI and JRuby the improvement is quite small, but on Rubinius it's a massive improvement. See commit 8db77c0a09bf6c996dd2856a6dbe1ad076b1d30a for more information.

    HTML Void Element Performance

    Performance for detecting HTML void elements (e.g. <br> and <link>) has been improved by removing unneeded String allocations.
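    One way to avoid per-lookup String allocations is sketched below, assuming a precomputed Set of names; it is an illustration, not Oga's actual implementation. Calling name.downcase on every check would allocate a new String each time, whereas looking the name up directly does not:

```ruby
require 'set'

# HTML5 void elements, precomputed once in lower- and upper-case form.
VOID_ELEMENTS = %w[
  area base br col embed hr img input keygen link meta param source track wbr
].flat_map { |name| [name, name.upcase] }.map(&:freeze).to_set.freeze

# No String is allocated here: the name is looked up as-is.
def void_element?(name)
  VOID_ELEMENTS.include?(name)
end

void_element?('br')     # => true
void_element?('BR')     # => true
void_element?('script') # => false
```

    Precomputing the upper-case variants also covers capitalized void elements such as <BR> (supported since v0.1.2), though a mixed-case name like "Br" would not match in this sketch.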

  • v0.1.3 Changes

    September 24, 2014

    This release fixes a problem with serializing attributes using the namespace prefix "xmlns". See https://gitlab.com/yorickpeterse/oga/issues/47 for more information.

  • v0.1.2 Changes

    September 23, 2014

    SAX API

    A SAX parser/API has been added. This API is useful when even the memory overhead of the pull parser is too much. Example:

    class ElementNames
      attr_reader :names
    
      def initialize
        @names = []
      end
    
      def on_element(namespace, name, attrs = {})
        @names << name
      end
    end
    
    handler = ElementNames.new
    
    Oga.sax_parse_xml(handler, '<foo><bar></bar></foo>')
    
    handler.names # => ["foo", "bar"]
    

    Racc Gem

    Oga will now always use the Racc gem instead of the version shipped with the Ruby standard library.

    Error Reporting

    XML parser errors have been made a little bit more user-friendly, though they can still be quite cryptic.

    Serializing Elements

    Elements serialized to XML/HTML will use self-closing tags whenever possible. When parsing HTML documents, only HTML void elements will use self-closing tags (e.g. <link> tags). Example:

    Oga.parse_xml('<foo></foo>').to_xml        # => "<foo />"
    Oga.parse_html('<script></script>').to_xml # => "<script></script>"
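
    The rule can be summarised in a small sketch, assuming only childless elements are considered; `serialize_empty_element` and `HTML_VOID` are hypothetical, not Oga's serializer:

```ruby
require 'set'

# HTML5 void elements; only these self-close when serializing HTML.
HTML_VOID = Set.new(%w[area base br col embed hr img input keygen link meta param source track wbr]).freeze

# In XML any childless element self-closes; in HTML only void elements do,
# so an empty <script> keeps its explicit closing tag.
def serialize_empty_element(name, html:)
  self_closing = html ? HTML_VOID.include?(name.downcase) : true
  self_closing ? "<#{name} />" : "<#{name}></#{name}>"
end

serialize_empty_element('foo', html: false)   # => "<foo />"
serialize_empty_element('script', html: true) # => "<script></script>"
serialize_empty_element('link', html: true)   # => "<link />"
```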
    

    Default Namespaces

    Namespaces are no longer removed from the attributes list when an element is created.

    Default XML namespaces can now be registered using xmlns="...". Previously this would be ignored. Example:

    document = Oga.parse_xml('<root xmlns="baz"></root>')
    root     = document.children[0]
    
    root.namespace # => Namespace(name: "xmlns" uri: "baz")
    

    Lexing Incomplete Input

    Oga can now lex input such as </ without entering an infinite loop. Example:

    Oga.parse_xml('</') # => Document(children: NodeSet(Text("</")))
    

    Absolute XPath Paths

    Oga can now parse and evaluate the XPath expression "/" (that is, just "/"). This will return the root node (usually a Document instance). Example:

    document = Oga.parse_xml('<root></root>')
    
    document.xpath('/') # => NodeSet(Document(children: NodeSet(Element(name: "root"))))
    

    Namespace Ordering

    Namespaces available to an element are now returned in the correct order. Previously, outer namespaces would take precedence over inner namespaces instead of the other way around. Example:

    document = Oga.parse_xml <<-EOF
    <root xmlns:foo="bar">
      <container xmlns:foo="baz">
        <foo:text>Text!</foo:text>
      </container>
    </root>
    EOF
    
    foo = document.at_xpath('root/container/foo:text')
    
    foo.namespace # => Namespace(name: "foo" uri: "baz")
    

    Parsing Capitalized HTML Void Elements

    Oga is now capable of parsing capitalized HTML void elements (e.g. <BR>). Previously it could only parse lower-cased void elements. Thanks to Tero Tasanen for fixing this. Example:

    Oga.parse_html('<BR>') # => Document(children: NodeSet(Element(name: "BR")))
    

    Node Type Method Removed

    The node_type method has been removed and its purpose has been moved into the XML::PullParser class itself. This method was solely used by the pull parser to provide shorthands for node classes, so it doesn't make sense to expose it to the outside world as a public method.

  • v0.1.1 Changes

    September 13, 2014

    This release fixes a problem where element attributes were not separated by spaces. Thanks to Jonathan Rochkind for reporting it and Bill Dueber for providing an initial patch for this problem.

  • v0.1.0 Changes

    September 12, 2014

    The first public release of Oga. This release contains support for parsing XML, basic support for parsing HTML, support for querying documents using XPath and more.