Piculet¶
Piculet is a module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file with no dependencies other than the standard library, which makes it very easy to integrate into applications. It also provides a command line interface.
Piculet is used for the parsers of the IMDbPY project.
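To give a feel for what XPath-based extraction means, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree module. It does not use Piculet's API or its specification format: the QUERIES mapping, the extract function, and the sample document are all hypothetical, and ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOCUMENT = """<html>
  <head><title>The Shining</title></head>
  <body>
    <span class="year">1980</span>
  </body>
</html>"""

# Hypothetical mapping of result keys to XPath queries.
# This is NOT Piculet's specification format.
QUERIES = {
    "title": ".//title",
    "year": ".//span[@class='year']",
}


def extract(document, queries):
    """Return a dict with the text of the first match for each query."""
    root = ET.fromstring(document)
    result = {}
    for key, path in queries.items():
        element = root.find(path)
        # Skip keys whose query matches nothing or matches an empty element.
        if element is not None and element.text:
            result[key] = element.text.strip()
    return result


data = extract(DOCUMENT, QUERIES)
print(data)
```

The general idea is that a declarative mapping from keys to queries drives the extraction, so the queries can change without touching the traversal code.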
Getting started¶
Piculet works with Python 3.7 and later versions.
You can install it using pip:
pip install piculet
Installing Piculet creates a script named piculet, which can be used to invoke the command line interface:
$ piculet -h
usage: piculet [-h] [--version] [--html] (-s SPEC | --h2x)
For example, say you want to extract some data from the file shining.html. An example specification is given in movie.json. Download both of these files and run the command:
$ cat shining.html | piculet -s movie.json
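The same pipeline can be driven from Python with the subprocess module. The sketch below assumes that the piculet script is on the PATH and relies only on the -s option shown in the usage line above. The helper names build_command and run_piculet are hypothetical, and nothing is assumed about the format of the output, which is returned as raw bytes.

```python
import subprocess


def build_command(spec_path):
    """Build the argument list for the piculet script (hypothetical helper)."""
    return ["piculet", "-s", spec_path]


def run_piculet(document_path, spec_path):
    """Feed a document to piculet on standard input and return its raw output."""
    with open(document_path, "rb") as document:
        completed = subprocess.run(
            build_command(spec_path),
            stdin=document,       # equivalent of "cat shining.html |"
            capture_output=True,  # requires Python 3.7 or later
            check=True,           # raise CalledProcessError on a non-zero exit
        )
    return completed.stdout


if __name__ == "__main__":
    # Assumes shining.html and movie.json are in the current directory.
    print(run_piculet("shining.html", "movie.json").decode("utf-8"))
```

Passing the command as a list avoids going through a shell, so file names with spaces need no quoting.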
Getting help¶
The documentation is available at: https://tekir.org/piculet/
The source code can be obtained from: https://github.com/uyar/piculet
License¶
Copyright (C) 2014-2022 H. Turgut Uyar <uyar@tekir.org>
Piculet is released under the LGPL license, version 3 or later. Read the included LICENSE.txt file for details.