Sunday, September 20, 2015

Command line search.

 
           ______  _     _  ______   _______  ______   _______  _  _  _
          / _____)(_)   (_)(_____ \ (_______)(_____ \ (_______)(_)(_)(_)
         ( (____   _     _  _____) ) _____    _____) ) _______  _  _  _
          \____ \ | |   | ||  __  / |  ___)  |  __  / |  ___  || || || |
          _____) )| |___| || |  \ \ | |      | |  \ \ | |   | || || || |
         (______/  \_____/ |_|   |_||_|      |_|   |_||_|   |_| \_____/
         Surfraw - Shell Users' Revolutionary Front Rage Against the Web

I have done a lot of web page scraping, but was never aware of a tool called surfraw. Surfraw takes your search criteria as command-line arguments, builds the matching search URL, and opens the results in your configured browser (with a text browser, everything stays in the terminal). You do not have to load the browser, go to the search engine, and enter the search criteria. This saves a lot of keystrokes, and the results can be easily manipulated. For example, say you wanted to look up ectopic pregnancy on Wikipedia:

$ surfraw wikipedia what is ectopic pregnancy


or, using the shorter sr alias:

$ sr ask why am i pregnant?

So the basic format is:

 $ surfraw [elvi] [search criteria]
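
To make that a bit more concrete, here is a rough sketch of the kind of work an elvis does: join the search words, percent-encode them, and drop them into a search URL. The function names urlencode and wikipedia_url are made up for this post, the encoder only handles ASCII input, and the Wikipedia URL is just one plausible search address; none of this is copied from surfraw's own scripts.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode an ASCII string for a URL query.
# The body runs in a subshell so its variables do not leak to the caller.
urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9._~-]) out=$out$c ;;                 # safe as-is
            ' ')             out=$out+ ;;                  # space becomes +
            *)               out=$out$(printf '%%%02X' "'$c") ;;  # %XX escape
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Hypothetical stand-in for an elvis: turn the search words into a URL.
wikipedia_url() {
    printf 'https://en.wikipedia.org/wiki/Special:Search?search=%s\n' \
        "$(urlencode "$*")"
}

wikipedia_url what is ectopic pregnancy
# prints: https://en.wikipedia.org/wiki/Special:Search?search=what+is+ectopic+pregnancy
```

The real tool then hands a URL like this to whichever browser you have configured, which is the part that saves the keystrokes.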

Possible elvi choices (as listed by surfraw -elvi) include:
codesearch    -- Search source code using Google Code Search (www.google.fr/codesearch)
comlaw        -- Search Australian Law using Comlaw (www.comlaw.gov.au)
ctan        -- Search the Comprehensive TeX Archive Network (ctan.org)
currency    -- Convert currencies with the Universal Currency Converter (www.xe.net/ucc)
cve        -- Search for CAN assignments in CVE
debbugs        -- Search the debian BTS (bugs.debian.org)
debcontents    -- Search contents of debian/ubuntu packages (packages.debian.org/packages.ubuntu.com)
deblists    -- Search debian mailing lists (lists.debian.org/search.html)
deblogs        -- Show changelogs for a package in Debian main (changelogs.debian.net)
debpackages    -- Search debian/ubuntu packages (packages.debian.org/packages.ubuntu.com)
debpkghome    -- Visit the home page for a Debian package
debpts        -- Search the Debian Package Tracking System (packages.qa.debian.org)
debsec        -- Search the Debian Security Tracker for CVE ids or package names
debvcsbrowse    -- Browse the VCS repository for a Debian package
debwiki        -- Search the Debian Wikis (wiki.debian.org & women.debian.org/wiki)
deja        -- Search usenet using Google Groups (groups.google.com)
deli        -- Search Delicious bookmarks
discogs        -- Search the Discogs database of music information (www.discogs.com)
dmoz        -- Search the Open Directory Project web directory (dmoz.org)
duckduckgo    -- Securely search the web using duckduckgo (www.duckduckgo.com)
ebay        -- Search the Ebay auction site
etym        -- Look up word origins at www.etymonline.com
excite        -- Search on Excite (www.excite.com)
finkpkg        -- Search Fink packages (pdb.finkproject.org)
foldoc        -- The Free On-Line Dictionary Of Computing (foldoc.org)
freebsd        -- Search FreeBSD related information (www.freebsd.org)
freedb        -- Search for cd track listings in FreeDB (www.freedb.org)
freshmeat    -- Search Freshmeat (www.freshmeat.net)
fsfdir        -- Search the FSF/UNESCO Free Software Directory (directory.fsf.org)
gcache        -- Search the web using Google cache (www.google.com)
genbugs        -- Search the Gentoo bug tracker (bugs.gentoo.org)
genportage    -- Search gentoo-portage.com for packages
google        -- Search the web using Google (www.google.com)
gutenberg    -- Search for books on Project Gutenberg (gutenberg.org)
happypenguin    -- Search the Linux Game Tome (www.happypenguin.org)
imdb        -- Search the Internet Movie Database (www.imdb.com)
ixquick        -- Search the web using ixquick [HTTPS] (www.ixquick.com)
jamendo        -- Search Jamendo: free music with Creative Commons licenses (www.jamendo.com)
javasun        -- Search Java API docs (java.sun.com)
l1sp        -- Search lisp documentation
lastfm        -- Search last.fm
leodict        -- Search Leo's German <-> English dictionary (dict.leo.org)
lsm        -- Search the Linux Software Map
macports    -- Search macports packages (macports.org)
mathworld    -- Search Wolfram MathWorld
mininova    -- Search the mininova BitTorrent source.
musicbrainz    -- Search MusicBrainz (musicbrainz.org)
netbsd        -- Search NetBSD related information (www.netbsd.org)
ntrs        -- Search the NASA Technical Report Server
openbsd        -- Search OpenBSD related information (www.openbsd.org)
openports    -- search openports for OpenBSD packages
opensearch    -- Search an OpenSearch-enabled website
pasearch    -- Search the unofficial Penny Arcade archives (pipefour.org/pa)
pgpkeys        -- Search the PGP key database
piratebay    -- Search thepiratebay.org for torrents
pubmed        -- Search medical/molbio databases (www.ncbi.nlm.nih.gov)
rae        -- Search the dictionary of the Real Academia de la Lengua Española (Spanish Dictionary)
rfc        -- Search RFCs (internet standards documents)
rhyme        -- Search for rhymes et al using Lycos Rhyme (rhyme.lycos.com)
rpmsearch    -- Search for RPMs in various distros
scholar        -- Search Google Scholar (scholar.google.com)
scicom        -- Search Scientific Commons
scirus        -- Search for science using Scirus (scirus.com)
scitopia    -- Search for science with scitopia.org
scpan        -- Search the Comprehensive Perl Archive Network (search.cpan.org)
scroogle    -- Search Google anonymously via Scroogle (www.scroogle.org)
slashdot    -- Search stories on Slashdot (www.slashdot.org)
slinuxdoc    -- Search entries in LDP (www.linuxdoc.org)
sourceforge    -- Search SourceForge (www.sourceforge.net)
springer    -- Search Springer for Books and Articles
stack        -- Search Stack Overflow
stockquote    -- Get a single stock quote (multiple providers)
sunonesearch    -- Search Sun One Search (onesearch.sun.com)
thesaurus    -- Look up word in Merriam-Webster's Thesaurus (www.m-w.com)
translate    -- Translate human languages (various providers)
urban        -- Search urbandictionary.com for a definition
W        -- Activate Surfraw defined web-browser
w3css        -- Validate a CSS URL with the w3c CSS validator (jigsaw.w3.org/css-validator)
w3html        -- Validate a web page URL with the w3c validator (validator.w3.org)
w3link        -- Check web page links with the w3c linkchecker (validator.w3.org/checklink)
w3rdf        -- Validate a RDF URL with the w3c RDF validator (validator.w3.org)
wayback        -- Search The Internet Archive's Wayback Machine for a URL (archive.org)
webster        -- Look up word in Merriam-Webster's Dictionary (www.m-w.com)
wetandwild    -- Real time weather information (many sources)
wikipedia    -- Search the free encyclopedia wikipedia
woffle        -- Search the web using Woffle (localhost:8080)
worldwidescience    -- Search for science with www.worldwidescience.org
yahoo        -- Search Yahoo categories (www.yahoo.com)
yandex        -- Search the web using Yandex (yandex.ru)
youtube        -- Search YouTube (www.youtube.com)
yubnub        -- Use the social command-line for the web (yubnub.org)

Installation:

Arch: $ sudo pacman -S surfraw
Debian: $ sudo apt-get install surfraw

Configuration:

Surfraw gets its configuration from three sources, in order:
  1. Environment variables
  2. /etc/surfraw.conf
  3. $HOME/.surfraw.conf
/etc/surfraw.conf and $HOME/.surfraw.conf are both fragments of Bourne-shell style shell script.
/etc/surfraw.conf should use def and defyn to define variables. These functions set variables unless they are already set by the environment. defyn is used for boolean configuration variables, def for all others. For instance:
 def     SURFRAW_text_browser /usr/bin/lynx
 defyn   SURFRAW_graphical  no
$HOME/.surfraw.conf should use sh-style entries, e.g.:
 SURFRAW_text_browser=/usr/bin/lynx
 SURFRAW_graphical=no
Plain assignments are used here because you want your per-user settings to override environment variables unconditionally.
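
The difference between the two styles is easier to see when you run it. Below is a minimal sketch of "set unless already set" helpers and the three configuration sources applied in order. The def and defyn bodies here are my own guess at the behaviour described above, not surfraw's actual source, and the w3m path is only an example value.

```shell
#!/bin/sh
# Sketch only: set a variable unless it is already set.
# $1 = variable name, $2 = default value
def() {
    eval "[ -n \"\${$1+set}\" ] || $1=\$2"
}

# Sketch only: same as def, but accepts nothing except yes or no.
defyn() {
    case $2 in
        yes|no) def "$1" "$2" ;;
        *) echo "defyn: $1 must be yes or no" >&2; return 1 ;;
    esac
}

# 1. Environment: pretend the user already picked a text browser.
SURFRAW_text_browser=/usr/bin/w3m

# 2. /etc/surfraw.conf style: only fills in what is still unset.
def   SURFRAW_text_browser /usr/bin/lynx
defyn SURFRAW_graphical    no

# 3. $HOME/.surfraw.conf style: a plain assignment always wins.
SURFRAW_graphical=yes

echo "$SURFRAW_text_browser $SURFRAW_graphical"
# prints: /usr/bin/w3m yes
```

The text browser keeps the value from the environment because def refuses to overwrite it, while the graphical setting ends up as yes because the last plain assignment replaces whatever came before.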
