- Minor enhancements
- Added Mechanize#ignore_bad_chunking for working around servers that don't terminate chunked transfer-encoding properly. Enabling this may cause data loss. Issue #116
- Removed content-type check from Mechanize::Page allowing forced parsing of incorrect or missing content-types. Issue #221 by GarthSnyder
- 🐛 Bug fixes
- Fixed typos in EXAMPLES and GUIDES. Pull Request #213 by Erkan Yilmaz.
- Fixed handling of a quoted content-disposition size. Pull Request #220 by Jason Rust
- Mechanize now ignores a missing gzip footer like browsers do. Issue #224 by afhbl
- Mechanize handles saving of files with the same name better now. Pull Request #223 by Godfrey Chan, Issue #219 by Jon Hart
- Mechanize now sends headers across redirects. Issue #215 by Chris Gahan
- Mechanize now raises Mechanize::ResponseReadError when the server does not terminate chunked transfer-encoding properly. Issue #116
- Mechanize no longer raises an exception when multiple identical radio buttons are checked. Issue #214 by Matthias Guenther
- Fixed documentation for pre_connect_hooks and post_connect_hooks. Issue #226 by Robert Poor
- Worked around ruby 1.8 run with -Ku and ISO-8859-1 encoded characters in URIs. Issue #228 by Stanislav O.Pogrebnyak
- Minor enhancements
- 🔒 Security fix:
Mechanize#auth and Mechanize#basic_auth allowed disclosure of passwords to malicious servers and have been deprecated.
In prior versions of mechanize, only one set of HTTP authentication credentials was allowed for all connections. If a mechanize instance connected to more than one server, a malicious server detecting mechanize could ask for HTTP Basic authentication. This would expose the username and password intended for only one server.
Mechanize#auth and Mechanize#basic_auth now warn when used.
To fix the warning, switch to Mechanize#add_auth, which requires the URI the credentials are intended for, the username and the password. Optionally, an HTTP authentication realm or NTLM domain may be provided.
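The difference between one global set of credentials and credentials tied to a URI can be sketched in plain Ruby. The `CredentialStore` class below is hypothetical and is not Mechanize's implementation; it only illustrates why scoping credentials to the server they are intended for keeps them away from any other server.

```ruby
require 'uri'

# Hypothetical store, for illustration only: credentials are keyed by the
# scheme, host and port of the server they were registered for.
class CredentialStore
  def initialize
    @credentials = {}
  end

  # Register a username and password for one server.
  def add(uri, user, password)
    @credentials[key_for(uri)] = [user, password]
  end

  # Return [user, password] only when the request targets a registered server.
  def lookup(uri)
    @credentials[key_for(uri)]
  end

  private

  def key_for(uri)
    uri = URI.parse(uri.to_s)
    [uri.scheme, uri.host.downcase, uri.port]
  end
end

store = CredentialStore.new
store.add('http://intranet.example/', 'alice', 'secret')

store.lookup('http://intranet.example/reports') # credentials for the intended server
store.lookup('http://malicious.example/')       # nil, nothing is disclosed
```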
- Minor enhancement
- Improved exception messages for 401 Unauthorized responses. Mechanize now tells you if you were missing credentials, had an incorrect password, etc.
- Add support for the Max-Age attribute in the Set-Cookie header.
- Added Mechanize::Download#body for compatibility with Mechanize::File when using Mechanize#get_file with Mechanize::Image or another Download-based pluggable parser. Issue #202 by angas
- Mechanize#max_file_buffer may be set to nil to disable creation of Tempfiles.
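As a rough illustration of the Max-Age support, the sketch below computes an expiry time from already-parsed cookie attributes. The helper name and the attribute hash are hypothetical, and Max-Age is assumed to take precedence over Expires, as RFC 6265 specifies; this is not Mechanize's own code.

```ruby
require 'time'

# Hypothetical helper, for illustration only. Returns the expiry Time of a
# cookie, or nil for a session cookie. received_at is the time the response
# carrying the Set-Cookie header arrived.
def cookie_expiry(attributes, received_at)
  max_age = attributes['max-age'].to_s
  expires = attributes['expires'].to_s

  if max_age.match?(/\A-?\d+\z/)
    received_at + max_age.to_i   # Max-Age is a number of seconds from now
  elsif !expires.empty?
    Time.httpdate(expires)       # Expires is an absolute HTTP date
  end
end

now = Time.utc(2012, 1, 1, 0, 0, 0)
cookie_expiry({ 'max-age' => '3600' }, now)                       # one hour after now
cookie_expiry({ 'expires' => 'Sun, 01 Jan 2012 01:00:00 GMT' }, now)
cookie_expiry({}, now)                                            # nil, a session cookie
```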
🐛 Bug fixes
- Applied Mechanize#max_file_buffer to the Content-Encoding handlers as well to prevent extra Tempfiles for small gzip or deflate responses.
- Increased the default Mechanize#max_file_buffer to 100,000 bytes. This gives ~5MB of response bodies in memory with the default history setting of 50 pages (depending on GC behavior).
- Ignore empty path/domain attributes.
- Cookies with an empty Expires attribute value were stored as session cookies but cookies without the Expires attribute were not. Issue #78
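The ~5MB figure quoted for the new Mechanize#max_file_buffer default follows from simple arithmetic, sketched here with the two defaults named above. The variable names are only for illustration.

```ruby
max_file_buffer = 100_000 # bytes, the new default buffer size
max_history     = 50      # pages, the default history size

# Worst case: every page in the history holds a body just under the buffer limit.
worst_case_bytes = max_file_buffer * max_history
worst_case_mb    = worst_case_bytes / 1_000_000.0

puts "#{worst_case_mb} MB" # prints "5.0 MB"
```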
- 🐛 Bug fixes
- Add missing file to the gem, ensure that missing files won't cause failures again. Issue #201 by Alex
- Fix minor grammar issue in README. Issue #200 by Shane Becker.
- 🐛 Bug fixes
- MetaRefresh#href is now left as the original value instead of being normalized to an absolute URL, and is resolved later. It is set to nil when the Refresh URL is unspecified or empty.
- Expose ssl_version from net-http-persistent. Patch by astera.
- SSL parameters and proxy may now be set at any time. Issue #194 by dsisnero.
- Improved Mechanize::Page with #image_with and #images_with, and Mechanize::Page::Image with various img element attribute accessors, #caption, #extname, #mime_type and #fetch. Pull request #173 by kitamomonga
- Added MIME type parsing for content-types in Mechanize::PluggableParser for fine-grained parser choices. Parsers will be chosen based on exact match, simplified type or media type in that order. See Mechanize::PluggableParser#[]=.
- Added Mechanize#download which downloads a response body to an IO-like object or filename.
- Added Mechanize::DirectorySaver which saves responses in a single directory. Issue #187 by yoshie902a.
- Added Mechanize::Page::Link#noreferrer?
- The documentation for Mechanize::Page#search and #at now shows that both XPath and CSS expressions are allowed. Issue #199 by Shane Becker.
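The parser lookup order described above (exact match, then simplified type, then media type) can be sketched with a plain Hash. The `choose_parser` helper is hypothetical, and "simplified" is assumed here to mean the type with "x-" prefixes removed; the real lookup may differ in detail.

```ruby
# Hypothetical lookup, for illustration only. Parsers are represented by
# symbols; content-type parameters such as charset are dropped first.
def choose_parser(parsers, content_type, default)
  exact      = content_type.split(';').first.strip.downcase
  simplified = exact.gsub(%r{(\A|/)x-}, '\1') # assumed meaning of "simplified"
  media_type = exact.split('/').first

  parsers[exact] || parsers[simplified] || parsers[media_type] || default
end

parsers = {
  'text/html'        => :page_parser,
  'application/gzip' => :gzip_parser,
  'image'            => :image_parser
}

choose_parser(parsers, 'text/html; charset=utf-8', :file_parser) # exact match
choose_parser(parsers, 'application/x-gzip', :file_parser)       # simplified type match
choose_parser(parsers, 'image/png', :file_parser)                # media type match
choose_parser(parsers, 'application/pdf', :file_parser)          # falls back to the default
```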
🐛 Bug fixes
- Fixed handling of a HEAD request with Accept-Encoding: gzip. Issue #198 by Oleg Dashevskii
- Use #resolve for resolving a Location header value. Fixes #197
- A Refresh value can have whitespace around the semicolon and equal sign.
- MetaRefresh#click no longer sends out Referer.
- A link with an empty href is now resolved correctly where previously the query part was dropped.
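A Refresh value that tolerates whitespace around the semicolon and equal sign, and yields nil for a missing or empty URL, could be parsed along these lines. The `parse_refresh` helper and its regular expression are a hypothetical sketch, not the expression Mechanize uses.

```ruby
# Hypothetical parser, for illustration only. Returns [delay, url], where url
# is nil when the Refresh value has no URL or an empty one, or nil when the
# value cannot be parsed at all.
def parse_refresh(content)
  match = /\A\s*(\d+(?:\.\d+)?)\s*(?:;\s*url\s*=\s*(.*?))?\s*\z/i.match(content)
  return nil unless match

  url = match[2]
  url = nil if url && url.empty?
  [match[1].to_f, url]
end

parse_refresh('5;url=http://example.com/')     # => [5.0, "http://example.com/"]
parse_refresh('5 ; URL = http://example.com/') # => [5.0, "http://example.com/"]
parse_refresh('0')                             # => [0.0, nil]
parse_refresh('3; url=')                       # => [3.0, nil]
```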
- 🐛 Bug fixes
- Set missing idle_timeout default. Issue #196
- Meta refresh URIs are now escaped (excluding %). Issue #177
- Fix charset name extraction. Issue #180
- A Referer URI sent on request no longer includes user information or fragment part.
- Tempfiles for storing response bodies are unlinked upon creation to avoid possible lack of finalization. Issue #183
- The default maximum history size is now 50 pages to avoid filling up a disk with tempfiles accidentally. Related to Issue #183
- Errors in bodies with deflate and gzip responses now result in a Mechanize::Error instead of silently being ignored and causing future errors. Issue #185
- Mechanize now raises an UnauthorizedError instead of crashing when a 403 response does not contain a www-authenticate header. Issue #181
- Mechanize gives a useful exception when attempting to click buttons across pages. Issue #186
- Added note to Mechanize#cert_store describing how to add certificates in case your system does not come with a default set. Issue #179
- Invalid content-disposition headers are now ignored. Issue #191
- Fix NTLM by recognizing the "Negotiation" challenge instead of endlessly looping. Issue #192
- Allow specification of the NTLM domain through Mechanize#auth. Issue #193
- Documented how to convert a Mechanize::ResponseReadError into a File or Page, along with a new method #force_parse. Issue #176
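The Referer change above (no user information or fragment) can be illustrated with Ruby's standard URI library. The `sanitized_referer` helper is hypothetical and only handles http and https URIs; it is a sketch of the idea rather than the code Mechanize runs.

```ruby
require 'uri'

# Hypothetical helper, for illustration only: rebuilds the URI from the parts
# that are safe to send, leaving out user information and the fragment.
def sanitized_referer(uri)
  uri = URI.parse(uri.to_s)
  uri.class.build(host: uri.host, port: uri.port,
                  path: uri.path, query: uri.query).to_s
end

sanitized_referer('http://user:secret@example.com/page?q=1#section')
# the result contains neither "user:secret@" nor "#section"
```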
- 🐛 Bug fixes
- Mechanize#get no longer accepts an options hash.
- Mechanize::Util::to_native_charset has been removed.
- Mechanize now depends on net-http-persistent 2.3+. This new version brings idle timeouts to help with the dreaded "too many connection resets" issue when POSTing to a closed connection. Issue #123
- SSL connections will be verified against the system certificate store by default.
- Added Mechanize#retry_change_requests to allow mechanize to retry POST and other non-idempotent requests when you know it is safe to do so. Issue #123
- Mechanize can now stream files directly to disk without loading them into memory first through Mechanize::Download, a pluggable parser for downloading files.
All responses larger than Mechanize#max_file_buffer are downloaded to a Tempfile. For backwards compatibility Mechanize::File subclasses still load the response body into memory.
To force all unknown content types to download to disk instead of memory, set:
agent.pluggable_parser.default = Mechanize::Download
- Added Mechanize#content_encoding_hooks which allow handling of non-standard content encodings like "agzip". Patch #125 by kitamomonga
- Added dom_class to elements and the element matcher like dom_id. Patch #156 by Dan Hansen.
- Added support for the HTML5 keygen form element. See http://dev.w3.org/html5/spec/Overview.html#the-keygen-element Patch #157 by Victor Costan.
- Mechanize no longer follows meta refreshes that have no "url=" in the content attribute to avoid infinite loops. To follow a meta refresh to the same page set Mechanize#follow_meta_refresh_self to true. Issue #134 by Jo Hund.
- Updated 'Mac Safari' User-Agent alias to Safari 5.1.1. 'Mac Safari 4' can be used for the old 'Mac Safari' alias.
- When given multiple HTTP authentication options mechanize now picks the strongest method.
- Improvements to HTTP authorization:
- mechanize raises Mechanize::UnauthorizedError for 401 responses, which is a subclass of Mechanize::ResponseCodeError.
- Added support for NTLM authentication, but this has not been tested.
- Mechanize::Cookie.new accepts attributes in a hash.
- Mechanize::CookieJar#<<(cookie) (alias: add!) is added. Issue #139
- Different mechanize instances may now have different loggers. Issue #122
- Mechanize now accepts a proxy port as a service name or number string. Issue #167
🐛 Bug fixes
- Mechanize now handles cookies just as most modern browsers do, roughly based on RFC 6265.
- domain=.example.com (which is invalid) is considered identical to domain=example.com.
- A cookie with domain=example.com is sent to host.sub.example.com as well as host.example.com and example.com.
- A cookie with domain=TLD (no dots) is accepted and sent if the host name is TLD, and rejected otherwise. To retain compatibility and convention, host/domain names starting with "local" are exempt from this rule.
- A cookie with no domain attribute is only sent to the original host.
- A cookie with an Effective TLD is rejected based on the public suffix list. (cf. http://publicsuffix.org/)
- "Secure" cookies are not sent over non-https connections.
- Subdomain match is not performed against an IP address.
- It is recommended that you clear out existing cookie jars for regeneration because previously saved cookies may not have been parsed correctly.
- Mechanize takes more care to avoid saving files with certain unsafe names. You should still take care not to use mechanize to save files directly into your home directory ($HOME). Issue #163.
- Mechanize#cookie_jar= works again. Issue #126
- The original Referer value persists on redirection. Issue #150
- Do not send a referer on a Refresh header based redirection.
- Fixed encoding error in tests when LANG=C. Patch #142 by jinschoi.
- The order of items in a form submission now matches the DOM order. Patch #129 by kitamomonga
- Fixed proxy example in EXAMPLE. Issue #146 by NielsKSchjoedt
- Mechanize now uses minitest to avoid differences in assertion availability between test/unit on ruby 1.9 and 1.8.
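Several of the cookie rules listed above (the leading dot, subdomain matching, host-only cookies, "Secure" cookies and IP addresses) can be sketched as one predicate. The helpers below are hypothetical and deliberately leave out the public suffix list and the TLD rule; they are not Mechanize's cookie jar.

```ruby
require 'ipaddr'

# Hypothetical helpers, for illustration only.

# domain=.example.com is treated the same as domain=example.com.
def normalize_domain(domain)
  domain.downcase.sub(/\A\./, '')
end

def ip_address?(host)
  IPAddr.new(host)
  true
rescue IPAddr::Error
  false
end

# host:        host of the request about to be sent
# https:       true when the request uses https
# origin_host: host that originally set the cookie
# domain:      the cookie's domain attribute, or nil when it had none
def send_cookie?(host, https, origin_host, domain: nil, secure: false)
  host = host.downcase
  return false if secure && !https                    # "Secure" needs https
  return host == origin_host.downcase if domain.nil?  # host-only cookie

  domain = normalize_domain(domain)
  return true if host == domain
  return false if ip_address?(host)                   # no subdomain match for IPs
  host.end_with?(".#{domain}")
end

send_cookie?('host.sub.example.com', false, 'example.com', domain: '.example.com') # sent
send_cookie?('www.example.com', false, 'example.com')                              # not sent
```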
- 🐛 Bug Fixes
- Restored Mechanize#set_proxy. Issue #117, #118, #119
- Mechanize::CookieJar#load now lazy-loads YAML. Issue #118
- Mechanize#keep_alive_time no longer crashes but does nothing as net-http-persistent does not support HTTP/1.0 keep-alive extensions.
- 🐛 Bug Fixes
- Mechanize is now under the MIT license.
- WWW::Mechanize has been removed. Use Mechanize.
- Pre connect hooks are now called with the agent and the request. See Mechanize#pre_connect_hooks.
- Post connect hooks are now called with the agent and the response. See Mechanize#post_connect_hooks.
- Mechanize::Chain is gone, as an internal API this should cause no problems.
- Mechanize#fetch_page no longer accepts an options Hash.
- Mechanize#put now accepts headers instead of an options Hash as the last argument
- Mechanize#delete now accepts headers instead of an options Hash as the last argument
- Mechanize#request_with_entity now accepts headers instead of an options Hash as the last argument
- Mechanize no longer raises RuntimeError directly, Mechanize::Error or ArgumentError are raised instead.
- The User-Agent header has changed. It no longer includes the WWW- prefix and now includes the ruby version. The URL has been updated as well.
- Mechanize now requires ruby 1.8.7 or newer.
- Hpricot support has been removed as webrobots requires nokogiri.
- Mechanize#get no longer accepts the referer as the second argument.
- Mechanize#get no longer allows the HTTP method to be changed (:verb option).
- Mechanize::Page::Meta is now Mechanize::Page::MetaRefresh to accurately depict its responsibilities.
- Mechanize::Page#meta is now Mechanize::Page#meta_refresh as it only contains meta elements with http-equiv of "refresh"
- Mechanize::Page#charset is now Mechanize::Page::charset. GH #112, patch by Godfrey Chan.
- Mechanize#get with an options hash is deprecated and will be removed after October, 2011.
- Mechanize::Util::to_native_charset is deprecated as it is no longer used by Mechanize.
🆕 New Features
- Add header reference methods to Mechanize::File so that a response object is compatible with Net::HTTPResponse.
- Mechanize#click accepts a regexp or string to click a button/link in the current page. It works as expected when not passed a string or regexp.
- Provide a way to only follow permanent redirects (301) automatically: agent.redirect_ok = :permanent GH #73
- Mechanize now supports HTML5 meta charset. GH #113
- Documented various Mechanize accessors. GH #66
- Mechanize now uses net-http-digest_auth. GH #31
- Mechanize now implements session cookies. GH #78
- Mechanize now implements deflate decoding. GH #40
- Mechanize now allows a certificate and key to be passed directly. GH #71
- Mechanize::Form::MultiSelectList now implements #option_with and #options_with. GH #42
- Add Mechanize::Page::Link#rel and #rel?(kind) to read and test the rel attribute.
- Add Mechanize::Page#canonical_uri to read a `<link rel="canonical">` tag.
- Add support for Robots Exclusion Protocol (i.e. robots.txt) and nofollow/noindex in meta tags and the rel attribute. Automatic exclusion can be turned on by setting: agent.robots = true
- Manual robots.txt test can be performed with Mechanize#robots_allowed? and #robots_disallowed?.
- Mechanize::Form now supports the accept-charset attribute. GH #96
- Mechanize::ResponseReadError is raised if there is an exception while reading the response body. This allows recovery from broken HTTP servers (or connections). GH #90
- Mechanize#follow_meta_refresh set to :anywhere will follow meta refresh found outside of a document's head. GH #99
- Add support for HTML5's rel="noreferrer" attribute which indicates no "Referer" information should be sent when following the link.
- A frame will now load its content when #content is called. GH #111
- Added Mechanize#default_encoding to provide a default for pages with no encoding specified. GH #104
- Added Mechanize#force_default_encoding which only uses Mechanize#default_encoding for parsing HTML. GH #104
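The `agent.redirect_ok = :permanent` setting mentioned above amounts to a small decision rule. The `follow_redirect?` function below is a hypothetical sketch of that rule under the assumption that any 3xx status counts as a redirect; it is not taken from Mechanize.

```ruby
# Hypothetical decision function, for illustration only.
# redirect_ok is true, false or :permanent; status is the HTTP status code.
def follow_redirect?(redirect_ok, status)
  return false unless (300..399).cover?(status)

  case redirect_ok
  when :permanent then status == 301 # only permanent redirects
  when true       then true          # every redirect
  else                 false         # redirects disabled
  end
end

follow_redirect?(:permanent, 301) # => true
follow_redirect?(:permanent, 302) # => false
follow_redirect?(true, 302)       # => true
```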
🐛 Bug Fixes:
- Fixed a bug where Referer was not sent when accessing a relative URI starting with "http".
- Fix handling of Meta Refresh with relative paths. GH #39
- Mechanize::CookieJar now supports RFC 2109 correctly. GH #85
- Fixed typo in EXAMPLES.rdoc. GH #74
- The base element is now handled correctly for images. GH #72
- Image buttons with no name attribute are now included in the form's button list. GH #56
- Improved handling of non ASCII-7bit compatible characters in links (only an issue on ruby 1.8). GH #36, GH #75
- Loading cookies.txt is faster. GH #38
- Mechanize no longer sends cookies for a.b.example to axb.example. GH #41
- Mechanize no longer sends the button name as a form field for image buttons. GH #45
- Blank cookie values are now skipped. GH #80
- Mechanize now adds a '.' to cookie domains if no '.' was sent. This is not allowed by RFC 2109 but does appear in RFC 2965. GH #86
- file URIs are now read in binary mode. GH #83
- Content-Encoding: x-gzip is now treated like gzip per RFC 2616.
- Mechanize now unescapes URIs for meta refresh. GH #68
- Mechanize now has more robust HTML charset detection. GH #43
- Mechanize::Form::Textarea is now created from a textarea element. GH #94
- A meta content-type now overrides the HTTP content type. GH #114
- Mechanize::Page::Link#uri now handles both escaped and unescaped hrefs. GH #107
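The a.b.example versus axb.example entry above describes a mismatch that can arise when a domain is interpolated into a regular expression without escaping. The snippet below is an assumption about that class of bug, shown for illustration; it does not claim to reproduce the original Mechanize code.

```ruby
cookie_domain = 'a.b.example'

# Without escaping, each "." in the domain matches any character.
unescaped = /#{cookie_domain}\z/i
# With escaping, each "." matches only a literal dot.
escaped   = /#{Regexp.escape(cookie_domain)}\z/i

unescaped.match?('axb.example') # => true, the cookie would reach the wrong host
escaped.match?('axb.example')   # => false
escaped.match?('a.b.example')   # => true
```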
🆕 New Features:
- An optional verb may be passed to Mechanize#get. GH #26
- The WWW constant is deprecated. Switch to the top level constant Mechanize
- SelectList#option_with and options_with for finding options
🐛 Bug Fixes:
- Rescue errors from bogus encodings
- 7bit content-encoding support. Thanks sporkmonger! GH #2
- Fixed a bug with iconv conversion. Thanks awesomeman! GH #9
- meta redirects outside the head are not followed. GH #13
- Form submissions work with nil page encodings. GH #25
- Fixing default values with serialized cookies. GH #3
- Checkboxes and fields are sorted by page appearance before submitting. GH #11