Changes

2.0 (2025-11-24)

  • Complete rewrite.

2.0.0a1 (2019-07-23)

  • Remove reducing functions; selected texts will always be concatenated (using an optional separator).

  • Convert string normalization and cleaning into transformers.

  • Add support for chaining transformers.

  • Change chaining symbol from “->” to “|”.

2.0.0a0 (2019-06-28)

  • Drop support for Python 2 and 3.4.

  • Add support for absolute XPath queries in ElementTree.

  • Add support for XPath queries that start with a parent axis in ElementTree.

  • Add shorthand notation for path extractors in specifications.

  • Cache compiled XPath expressions.

  • Remove HTML charset detection.

  • Command line operations now read only from stdin.

  • Simplify CLI commands.

1.0.1 (2019-02-07)

  • Accept both .yaml and .yml as valid YAML file extensions.

  • Documentation fixes.

1.0 (2018-05-25)

  • Bumped version to 1.0.

1.0b7 (2018-03-21)

  • Dropped support for Python 3.3.

  • Fixes for handling Unicode data in HTML for Python 2.

  • Added registry for preprocessors.

1.0b6 (2018-01-17)

  • Support for writing specifications in YAML.

1.0b5 (2018-01-16)

  • Added a class-based API for writing specifications.

  • Added predefined transformation functions.

  • Removed callables from specification maps. Use the new API instead.

  • Added support for registering new reducers and transformers.

  • Added support for defining sections in a document.

  • Refactored the XPath evaluation method to parse path expressions only once.

  • Preprocessing is now done only once, when the tree is built.

  • Concatenation is now the default reducing operation.

1.0b4 (2018-01-02)

  • Added “--version” option to command line arguments.

  • Added option to force the use of lxml’s HTML builder.

  • Fixed an error where falsy values were excluded from the result.

  • Added support for transforming node text during preprocessing.

  • Added separate preprocessing function to API.

  • Renamed the “join” reducer to “concat”.

  • Renamed the “foreach” keyword for keys to “section”.

  • Removed some low level debug messages to substantially increase speed.

1.0b3 (2017-07-25)

  • Removed the caching feature.

1.0b2 (2017-06-16)

  • Added helper function for getting cache hash keys of URLs.

1.0b1 (2017-04-26)

  • Added optional value transformations.

  • Added support for custom reducer callables.

  • Added command-line option for scraping documents from local files.

1.0a2 (2017-04-04)

  • Added support for Python 2.7.

  • Fixed lxml support.

1.0a1 (2016-08-24)

  • First release on PyPI.