Piculet

Piculet is a Python module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file with no dependencies outside the standard library, which makes it easy to integrate into applications. It also provides a command line interface.

Piculet is used for the parsers of the IMDbPY project.

Getting started

Piculet works with Python 3.7 and later versions. You can install it using pip:

pip install piculet

Installing Piculet creates a script named piculet which can be used to invoke the command line interface:

$ piculet -h
usage: piculet [-h] [--version] [--html] (-s SPEC | --h2x)

For example, say you want to extract some data from the file shining.html. An example specification is given in movie.json. Download both of these files and run the command:

$ cat shining.html | piculet -s movie.json
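
To give a feel for what XPath-based extraction looks like, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree module rather than Piculet's own API. The document snippet, the field names, and the extract helper are all hypothetical; they are not taken from shining.html or movie.json, and Piculet's specification format may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a movie page (not the real shining.html).
DOCUMENT = """
<html>
  <body>
    <h1 class="title">The Shining</h1>
    <span class="year">1980</span>
    <ul class="cast">
      <li>Jack Nicholson</li>
      <li>Shelley Duvall</li>
    </ul>
  </body>
</html>
"""

# Hypothetical mapping from output keys to XPath queries.
RULES = {
    "title": ".//h1[@class='title']",
    "year": ".//span[@class='year']",
}


def extract(root, rules):
    """Return the stripped text of the first element matched by each query."""
    data = {}
    for key, path in rules.items():
        element = root.find(path)
        if element is not None and element.text:
            data[key] = element.text.strip()
    return data


root = ET.fromstring(DOCUMENT)
result = extract(root, RULES)

# A query can also match several elements at once.
cast = [li.text for li in root.findall(".//ul[@class='cast']/li")]

print(result)
print(cast)
```

Note that ElementTree supports only a limited subset of XPath and requires well-formed input, so this sketch illustrates the idea of mapping keys to queries and nothing more.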

Getting help

The documentation is available at: https://tekir.org/piculet/

The source code can be obtained from: https://github.com/uyar/piculet

License

Copyright (C) 2014-2022 H. Turgut Uyar <uyar@tekir.org>

Piculet is released under the LGPL, version 3 or later. Read the included LICENSE.txt file for details.
