Subcommand: scrap
Scrap a single remote page or multiple local pages
Usage: sws scrap [OPTIONS] --script <SCRIPT> <--url <URL>|--files <GLOB>>
Options:
  -s, --script <SCRIPT>            Path to the Lua script that defines the scraping logic
      --url <URL>                  A remote HTML page to scrap
      --files <GLOB>               A glob pattern selecting local files to scrap
  -o, --output-file <OUTPUT_FILE>  Optional file for the scraped data (defaults to stdout)
      --append                     Append to the output file
      --truncate                   Truncate the output file
      --num-workers <NUM_WORKERS>  Number of CPU workers used when scraping local files
      --on-error <ON_ERROR>        Error-handling strategy when scraping local files [possible values: fail, skip-and-log]
  -q, --quiet                      Don't output logs
  -h, --help                       Print help information
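For example, quickly testing a script against a single remote page might look like this (the script path and URL are placeholders):

  sws scrap --script ./scraper.lua --url https://example.com/page.html

Scraped records are written to stdout unless --output-file is specified.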
The --url and --files parameters are mutually exclusive: only one of them can be specified.
This subcommand is meant to either:

- Quickly test a Lua script on a given URL (with --url)
- Process HTML pages that have been previously stored on disk (with --files)
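For the second use case, a batch run over previously downloaded pages might look like this (the script path, glob, worker count, and output file name are all illustrative):

  sws scrap --script ./scraper.lua --files "./pages/**/*.html" \
      --num-workers 4 --on-error skip-and-log \
      --output-file results.csv --append

The glob is quoted so the shell passes it to sws unexpanded. With --on-error skip-and-log, pages that fail to scrap are logged and skipped rather than aborting the whole run.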