Web::Scraper

February 27, 2011

Lately I’ve been using the nifty Web::Scraper by the prolific Tatsuhiko Miyagawa. It exposes a compact DSL of three keywords (process, process_first and result) for scraping sites with XPath expressions or CSS selectors:


use Data::Dumper;
use Web::Scraper;
use URI;

# Find the <title> element and place its content
# in a hash reference with the key 'title'

my $scraper = scraper {
    process '//title', title => 'TEXT';
};

my $data = $scraper->scrape(
    URI->new('http://www.google.com')
);

warn Dumper $data;

Gets you

$VAR1 = {
          'title' => 'Google'
        };

You can also have it return an arrayref for an element, pass in callbacks, or nest scrapers.
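
A minimal sketch of those three features, run against an inline HTML snippet rather than a live site. The markup, selectors and key names (titles, links, posts) are made up for illustration:

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical markup standing in for a real page
my $html = <<'HTML';
<html><body>
<ul>
  <li class="post"><a href="/one">First</a> <span class="date">2011-02-25</span></li>
  <li class="post"><a href="/two">Second</a> <span class="date">2011-02-26</span></li>
</ul>
</body></html>
HTML

my $scraper = scraper {
    # A key ending in [] collects every match into an arrayref
    process 'li.post a', 'titles[]' => 'TEXT';

    # A callback receives the matched element and returns the value to store
    process 'li.post a', 'links[]' => sub { $_[0]->attr('href') };

    # A nested scraper runs against each matched element in turn
    process 'li.post', 'posts[]' => scraper {
        process 'a',         title => 'TEXT';
        process 'span.date', date  => 'TEXT';
    };
};

# scrape() also accepts a reference to a string of HTML
my $data = $scraper->scrape(\$html);

print "$_->{title} ($_->{date})\n" for @{ $data->{posts} };
```

Passing a scalar reference instead of a URI keeps the example offline, which is also handy when testing selectors against saved pages.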

I forked the code and added a hashref option, nice when paired with a callback that returns a hash.

scraper (a disguised constructor) and the keywords are exported into the caller’s namespace. If the potential for collision concerns you, wrap up scraper instantiation in a separate module.
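
One way to do that, as a rough sketch: the package name My::TitleScraper and the method new_scraper are hypothetical, and both packages share one file here only to keep the example short.

```perl
use strict;
use warnings;

package My::TitleScraper;    # hypothetical wrapper module

# scraper, process and friends are imported into this package only
use Web::Scraper;

sub new_scraper {
    return scraper {
        process '//title', title => 'TEXT';
    };
}

package main;

# The calling code never imports the DSL keywords itself
my $scraper = My::TitleScraper->new_scraper;
my $data    = $scraper->scrape(
    \'<html><head><title>Hello</title></head><body></body></html>'
);

print $data->{title}, "\n";
```

The caller only sees an object with a scrape method, so its own subs named process or result stay out of harm’s way.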

The documentation is a bit thin, so in addition check out Miyagawa’s presentation slides.

First foray into Moose

February 26, 2011

As a startup’s sole Perl programmer I have been feeling the pressure to get things done more quickly. Whipping up a codebase from scratch, or from a chicken-scratch prototype, is very different from propping up a brittle hulking legacy codebase. I began to feel limited by Perl’s object model simply because of all the keystroking and the attention to implementation details. So yesterday I gave Moose a little try.

A well-designed toolkit always gives the new user one very useful thing right away, a small but immediate reward for the effort of trying out a new way of doing things. Moose does this with the keyword has. Describe your attributes in excruciating detail and poof! you’ve done away with mounds of keystroking. Very nice. Already a quick little boost before lunchtime, and this time not from my coffee cup.
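
To give a flavor of what has buys you, here is a small sketch. The Article class and its attributes are invented for illustration and are not from my actual codebase:

```perl
use strict;
use warnings;

package Article;    # hypothetical class
use Moose;

# Each has line generates an accessor, a constructor argument and a type check
has title     => (is => 'ro', isa => 'Str', required => 1);
has tags      => (is => 'rw', isa => 'ArrayRef[Str]', default => sub { [] });
has published => (is => 'rw', isa => 'Bool', default => 0);

no Moose;
__PACKAGE__->meta->make_immutable;

package main;

my $post = Article->new(title => 'First foray into Moose');
$post->published(1);
push @{ $post->tags }, 'perl';

print $post->title, ' [', join(', ', @{ $post->tags }), "]\n";
```

The hand-rolled equivalent would need a constructor, three accessors and argument validation, which is exactly the keystroking that disappears.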

I also noticed another side benefit. Writing classes declaratively is an aid to thinking more abstractly. The little word has puts the emphasis directly on the interface. A language and its user have a symbiotic relationship: the user writes in the language and the language guides the user’s thinking. Moose enhances Perl’s ability to lead, follow or get out of the way at the right times by letting you shift gears to greater abstraction. I found this very pleasant.


!=

February 25, 2011

Information is not knowledge. Knowledge is not wisdom. Wisdom is not truth. Truth is not beauty. Beauty is not love. Love is not Perl. Perl is THE BEST.

D’après Frank Zappa, who originally said this of music.


Still around

February 25, 2011

This past year I emigrated to France and married another Perl hacker. I recently got a new job as a Perl programmer, this time in Paris, so I’ll post again from time to time.