Time saver with page scraping.
Say you just need one bit of information off the web, such as the latest football score. Normally you would have to start your browser, use a bookmark (or type in the URL), and then scan the page for the information you need. Or you may be logged into a server from the command line, where no graphical web browser is available. Page scraping solves both problems: a script fetches the page and pulls out just the part you want. This may not be the best example, but say you want your horoscope for the day. All you need to do is open a terminal (if you are not already in one), type a few letters, and boom, there it is. The data comes from the horoscope page on creators.com; its URL is in the script below.
A small bit of program code solves the problem. Use your favorite editor to create the program (called ghp) and the data files, then make ghp executable with "chmod +x ghp".
ghp:
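[code]
#!/bin/bash
# Get today's horoscope
# Usage: ./ghp <sign>   (e.g. ./ghp virgo)
if [ -z "$1" ]; then
    echo "Usage: $0 <sign>" >&2
    exit 1
fi
echo "--------------------------------------------"
# character width
cw=60
# convert the sign to upper case (ARIES ... PISCES)
hsign="$(echo "$1" | tr '[:lower:]' '[:upper:]')"
# print the sign's logo from the data file of the same name
cat "$hsign"
echo -n "Today's date: "
date +%D
echo "Today's horoscope for:"
# dump the page as plain text, keep the lines mentioning the sign, wrap them
lynx -width 1000 -dump "http://www.creators.com/lifestylefeatures/horoscopes/horoscopes-by-holiday.html" | grep "$hsign" | fold -sw "$cw"
echo "--------------------------------------------"
[/code]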
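If lynx is not installed on the server, curl can fetch the page instead, with sed stripping out the HTML tags. The sketch below assumes curl is available. The helper names strip_tags and pick_sign are hypothetical, and it uses grep -i rather than converting the sign to capitals. The tag stripping is naive: it does not decode entities such as &amp; and misses tags split across lines. Expect messier output than lynx -dump gives.

```shell
#!/bin/bash
# Sketch of a lynx-free variant of the ghp pipeline (assumption: curl is
# installed). strip_tags and pick_sign are hypothetical helper names.

cw=60  # character width, as in ghp

# Crudely remove HTML tags from stdin. This does not decode entities
# such as &amp; and misses tags that are split across lines.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# Keep only the lines that mention the sign (any case) and wrap them at cw.
pick_sign() {
    grep -i -- "$1" | fold -sw "$cw"
}

main() {
    local url="http://www.creators.com/lifestylefeatures/horoscopes/horoscopes-by-holiday.html"
    # -s silences the progress meter; -L follows redirects
    curl -sL "$url" | strip_tags | pick_sign "$1"
}

# Only fetch when a sign is given, e.g. ./ghp-curl virgo
if [ "$#" -gt 0 ]; then
    main "$1"
fi
```

Because the filtering is split into functions, you can try it on a saved copy of the page (for example with `strip_tags < page.html | pick_sign virgo`) without hitting the site each time.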
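You also need a data file holding the logo for each sign, named after the sign in capital letters (the script prints it with cat). You will need all twelve, ARIES through PISCES. For example, the logo for Virgo goes in a file named VIRGO.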
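Creating twelve files by hand is tedious. A loop can seed them with placeholder text that you then replace with real logos one at a time. This is a minimal sketch: the helper name make_logo_stubs and the "=== SIGN ===" placeholder format are made up for illustration. Files that already exist are skipped, so real logos are never overwritten.

```shell
#!/bin/bash
# Sketch: create placeholder logo files ARIES ... PISCES so ghp has something
# to cat for every sign. make_logo_stubs is a hypothetical helper name.
# Existing files are left alone, so real logos are never overwritten.

signs=(ARIES TAURUS GEMINI CANCER LEO VIRGO LIBRA SCORPIO
       SAGITTARIUS CAPRICORN AQUARIUS PISCES)

make_logo_stubs() {
    local dir="${1:-.}" s
    for s in "${signs[@]}"; do
        if [ ! -e "$dir/$s" ]; then
            printf '=== %s ===\n' "$s" > "$dir/$s"
        fi
    done
}

# Run with a directory argument, e.g. ./make-stubs .  (where ghp runs)
if [ "$#" -gt 0 ]; then
    make_logo_stubs "$1"
fi
```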
Then just run the program, giving your sign as the argument, for example "./ghp virgo" (the script converts it to capitals). If you prefer using the mouse, you could set up a launcher (or, for MS Windows users, a shortcut) to run the program for you.
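ghp greps for whatever it is given, so a typo such as "virgp" simply prints no horoscope, and cat complains about a missing file. A check like the one below could reject bad input before the page is fetched. It is a sketch only; the function name is_sign is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Sketch: accept only the twelve sign names, in any letter case.
# is_sign is a hypothetical helper, not part of the original ghp.

is_sign() {
    local s
    s="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
    case "$s" in
        ARIES|TAURUS|GEMINI|CANCER|LEO|VIRGO|LIBRA|SCORPIO|SAGITTARIUS|CAPRICORN|AQUARIUS|PISCES)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example: ./check-sign virgo
if [ "$#" -gt 0 ]; then
    if is_sign "$1"; then
        echo "ok: $1"
    else
        echo "unknown sign: $1" >&2
        exit 1
    fi
fi
```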
You can then run this every day to get your horoscope, or even someone else's, such as your boss's or partner's. Your homework is to research the grep command. See http://www.instructables.com/id/Web-page-scraping-via-Linux/ for more info. Have fun.
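As a head start on that homework, here are two grep options that often matter when scraping. The sample text is made up for illustration and is not copied from the horoscope page. The helper names match_word and match_with_next are hypothetical.

```shell
#!/bin/bash
# Sketch: two grep options worth knowing for scraping. The sample text is
# made up for illustration; it is not copied from the horoscope page.

# -i ignores case; -w matches whole words only, so LEO does not hit LEOPARD.
match_word() {
    grep -iw -- "$1"
}

# -A 1 also prints the line after each match, for pages that put the sign
# name on one line and the reading on the next.
match_with_next() {
    grep -A 1 -- "$1"
}

sample='ARIES
You will have a busy day.
LEO
LEOPARD sightings are up.
Leo: relax tonight.'

printf '%s\n' "$sample" | match_word leo          # LEO, Leo: relax tonight.
printf '%s\n' "$sample" | match_with_next '^ARIES$'  # ARIES plus the next line
```

The plain grep in ghp would match LEOPARD when looking for LEO, which is why -w can be worth adding.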
(Note: this was done using Ubuntu 10.04; on MS Windows you will need bash-compatible software for this to work. You could also use other programming languages, such as VB, PHP, and a host of others, to do the same thing.)
Update: there is a new version that works from a web page, so you do not have to remember all the commands. You can find it at http://www.instructables.com/id/Web-page-scraping-fromto-a-web-page/. Though it is rather remedial for a good webmaster, it leaves the door open to more advanced projects.