Changes

2.0.0a2 (unreleased)

  • Drop support for Python 3.6.
  • Revert API to OOP style.
  • Move type annotations from stub into source.

2.0.0a1 (2019-07-23)

  • Remove reducing functions; selected texts will always be concatenated (using an optional separator).
  • Convert string normalization and cleaning into transformers.
  • Add support for chaining transformers.
  • Change chaining symbol from “->” to “|”.

2.0.0a0 (2019-06-28)

  • Drop support for Python 2 and 3.4.
  • Add support for absolute XPath queries in ElementTree.
  • Add support for XPath queries that start with a parent axis in ElementTree.
  • Add shorthand notation for path extractors in specifications.
  • Cache compiled XPath expressions.
  • Remove HTML charset detection.
  • Command line operations now read only from stdin.
  • Simplify CLI commands.

1.0.1 (2019-02-07)

  • Accept both .yaml and .yml as valid YAML file extensions.
  • Documentation fixes.

1.0 (2018-05-25)

  • Bumped version to 1.0.

1.0b7 (2018-03-21)

  • Dropped support for Python 3.3.
  • Fixes for handling Unicode data in HTML for Python 2.
  • Added registry for preprocessors.

1.0b6 (2018-01-17)

  • Support for writing specifications in YAML.

1.0b5 (2018-01-16)

  • Added a class-based API for writing specifications.
  • Added predefined transformation functions.
  • Removed callables from specification maps. Use the new API instead.
  • Added support for registering new reducers and transformers.
  • Added support for defining sections in a document.
  • Refactored XPath evaluation method in order to parse path expressions once.
  • Preprocessing will be done only once when the tree is built.
  • Concatenation is now the default reducing operation.

1.0b4 (2018-01-02)

  • Added “--version” option to command line arguments.
  • Added option to force the use of lxml’s HTML builder.
  • Fixed an error where falsy values were excluded from the result.
  • Added support for transforming node text during preprocessing.
  • Added separate preprocessing function to API.
  • Renamed the “join” reducer to “concat”.
  • Renamed the “foreach” keyword for keys to “section”.
  • Removed some low level debug messages to substantially increase speed.

1.0b3 (2017-07-25)

  • Removed the caching feature.

1.0b2 (2017-06-16)

  • Added helper function for getting cache hash keys of URLs.

1.0b1 (2017-04-26)

  • Added optional value transformations.
  • Added support for custom reducer callables.
  • Added command-line option for scraping documents from local files.

1.0a2 (2017-04-04)

  • Added support for Python 2.7.
  • Fixed lxml support.

1.0a1 (2016-08-24)

  • First release on PyPI.