Getting Started
Get the binary
Download the latest standalone binary for your OS from the release page, and put it in a location available in your PATH.
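Installing the binary means copying the downloaded file into a directory, making it executable, and checking that the directory is on your PATH. The sketch below makes some assumptions. The download path and the `install_binary` helper are hypothetical, since asset names vary by release and OS. The default target `~/.local/bin` is also only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a downloaded binary into a directory and make it
# executable. The default target (~/.local/bin) is an assumption; any
# directory listed in PATH works.
install_binary() {
  local src="$1"
  local dest_dir="${2:-$HOME/.local/bin}"
  local name="${3:-sws}"

  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/$name"
  chmod +x "$dest_dir/$name"

  # Warn when the target directory is not in PATH, since the shell
  # would then not find the command by name
  case ":$PATH:" in
    *":$dest_dir:"*) ;;
    *) echo "warning: $dest_dir is not in PATH" >&2 ;;
  esac
}

# Example (hypothetical download location):
# install_binary ~/Downloads/sws
```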
Basic example
Let's create a simple urbandict_demo.lua scraper for Urban Dictionary. Copy and paste the following command:
cat << 'EOF' > urbandict_demo.lua
sws.seedPages = {
  "https://www.urbandictionary.com/define.php?term=Lua"
}

function scrapPage(page, context)
  for defIndex, def in page:select("section .definition"):enumerate() do
    -- The word link may sit in an h1 or an h2 depending on the layout
    local word = def:select("h1 a.word"):iter()()
    if not word then
      word = def:select("h2 a.word"):iter()()
    end
    local contributor = def:select(".contributor"):iter()()
    local meaning = def:select(".meaning"):iter()()
    local example = def:select(".example"):iter()()

    -- Skip definitions missing an expected element, so that no method
    -- is ever called on a nil value
    if not (word and contributor and meaning and example) then
      goto continue
    end

    -- The date is the text that follows the contributor's link
    local date = string.match(contributor:innerHtml(), ".*\\?</a>%s*(.*)\\?")
    if not date then
      goto continue
    end

    word = word:innerHtml()
    date = sws.Date(date, "%B %d, %Y"):format("%Y-%m-%d")
    -- Collapse line breaks so each field stays on a single CSV line
    meaning = meaning:innerText():gsub("[\n\r]+", " ")
    example = example:innerText():gsub("[\n\r]+", " ")

    local record = sws.Record()
    record:pushField(word)
    record:pushField(defIndex)
    record:pushField(date)
    record:pushField(meaning)
    record:pushField(example)
    context:sendRecord(record)

    ::continue::
  end
end
EOF
You can then run it with:
sws crawl --script urbandict_demo.lua
Because sws.seedPages is defined as a single page (Urban Dictionary's Lua definition), the scrapPage function runs on that page only. Several other seeding options are detailed in the Lua scraper - Seed definition section.
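The date extraction in the scraper relies on string.match with a greedy `.*`, so it keeps whatever follows the last `</a>` inside the contributor element. To check that behaviour on a sample snippet, the same idea can be mimicked in the shell with a sed substitution. This is only an illustration of the pattern, not what sws runs. The `extract_date` helper is hypothetical. The sample markup is an assumption about the shape of Urban Dictionary's contributor block and was not captured from the site.

```shell
#!/usr/bin/env bash
# Hypothetical helper mimicking the core of the Lua pattern (.*</a>%s*(.*)):
# drop everything up to and including the last "</a>", plus any whitespace
# that follows it.
extract_date() {
  sed -E 's|.*</a>[[:space:]]*||'
}

# Assumed contributor markup, used only to illustrate the pattern
sample='by <a class="contributor" href="/author.php?author=someone">someone</a> May 10, 2005'
printf '%s\n' "$sample" | extract_date   # prints: May 10, 2005
```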
By default, the resulting CSV is written to stdout; the -o (or --output-file) option lets you specify an output file instead. That file can also be appended to or truncated, using the additional --append or --truncate flags respectively. See the crawl subcommand section for more details.
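For example, you might collect results across several runs by creating the file on the first run and appending to it on later ones. The `crawl_into` wrapper below is a hypothetical helper and not part of sws. It only uses the -o and --append flags described above. The `SWS` variable is an assumption added so that the command can be swapped out, for example for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: crawl with a Lua script into a CSV file, creating the
# file on the first run and appending to it on subsequent runs.
# SWS defaults to the real "sws" binary but can point to another command.
crawl_into() {
  local script="$1"
  local out="$2"
  local sws="${SWS:-sws}"

  if [ -e "$out" ]; then
    "$sws" crawl --script "$script" -o "$out" --append
  else
    "$sws" crawl --script "$script" -o "$out"
  fi
}

# Usage:
# crawl_into urbandict_demo.lua urbandict.csv
```

Replacing --append with --truncate in the first branch would instead overwrite the file on every run.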
Bash completion
You can source the completion script in your ~/.bashrc file with:
echo 'source <(sws completion)' >> ~/.bashrc
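Running that echo command more than once adds the line again each time. A slightly more careful sketch appends it only when it is not already present. The `add_sws_completion` function name is hypothetical, and the target file defaults to ~/.bashrc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the sws completion line to a bashrc file
# unless that exact line is already there.
add_sws_completion() {
  local rc="${1:-$HOME/.bashrc}"
  local line='source <(sws completion)'

  touch "$rc"
  # -x matches whole lines, -F treats the pattern as a fixed string
  if ! grep -qxF "$line" "$rc"; then
    # Start on a fresh line if the file lacks a trailing newline
    if [ -s "$rc" ] && [ -n "$(tail -c 1 "$rc")" ]; then
      printf '\n' >> "$rc"
    fi
    printf '%s\n' "$line" >> "$rc"
  fi
}

# Usage:
# add_sws_completion
```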