Controlling Firefox from Perl with MozRepl

June 24, 2012

On Friday I wrote my first program using AnyEvent and Coro, and it was so nifty that I decided to revive my moribund blog to describe it. This little program solved a whole raft of problems.

The problem involved determining certain properties of each of a list of URLs and updating a database. The rub is that some properties, such as redirects, are most reliably observed in the browser, and some, such as whether the URL is hosted at $WORK, are best determined using $WORK’s Perl modules. Now how can we put these two together to get all the info we need?

The approach I hit on involves using Perl to drive an instance of Firefox configured with some helpful extensions. MozRepl is a cool extension that lets you telnet into Firefox and program it from the inside, with access to the entire browser and the Mozilla API. NetExport is an add-on to the Firebug extension that generates HAR (HTTP Archive) files, which capture all the data needed to analyze front-end performance. HAR files also serve as input to the command-line versions of PageSpeed and YSlow.
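To give a flavor of what talking to MozRepl looks like, here is a minimal sketch that uses a plain blocking socket rather than the AnyEvent-based client in the script. It rests on two assumptions: that MozRepl is listening on its default port 4242, and that assigning to content.location.href in the REPL navigates the current tab. The helper names js_load_page and mozrepl_send are made up for illustration.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use JSON::PP;

# Build the JavaScript that asks the current tab to navigate to $url.
# Encoding the URL as JSON yields a safely quoted JavaScript string literal.
sub js_load_page {
    my ($url) = @_;
    my $literal = JSON::PP->new->allow_nonref->encode($url);
    return "content.location.href = $literal;";
}

# Send one line of JavaScript to a MozRepl listener.
# Host and port are assumptions (MozRepl's default port is 4242).
# Returns false, rather than dying, when nothing is listening.
sub mozrepl_send {
    my ($js, %opt) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $opt{host} // '127.0.0.1',
        PeerPort => $opt{port} // 4242,
        Proto    => 'tcp',
        Timeout  => 2,
    ) or return;
    print {$sock} "$js\n";
    close $sock;
    return 1;
}

my $js = js_load_page('http://www.google.com');
print "$js\n";
mozrepl_send($js) or warn "MozRepl not reachable; is Firefox running?\n";
```

A real client would also read the REPL's reply before sending the next command, which is where an event loop starts to earn its keep.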

NetExport exports its results either to a file or via HTTP POST. So in my program I start an embedded HTTP server, fire up Firefox, telnet into it, load a URL, and let the httpd capture and process the JSON posted by NetExport. Think of the setup as making Firefox puke into a bucket set out for that purpose. A bit imagé but you get the idea.
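As a rough illustration of the "process the JSON" step (not the actual handler used here), the sketch below pulls the redirects out of a HAR document. It assumes the POST body is the HAR JSON itself and relies on the standard HAR layout, where each entry under log.entries carries a request.url, a response.status and a response.redirectURL. The function name redirects_in_har and the sample body are invented for the example.

```perl
use strict;
use warnings;
use JSON::PP;

# Return [url, status, target] for every entry in a HAR document
# whose response names a redirect target.
sub redirects_in_har {
    my ($json) = @_;
    my $har = decode_json($json);
    my @hops;
    for my $entry (@{ $har->{log}{entries} || [] }) {
        my $res    = $entry->{response} || {};
        my $target = $res->{redirectURL} // '';
        next unless length $target;    # not a redirect
        push @hops, [ $entry->{request}{url}, $res->{status}, $target ];
    }
    return @hops;
}

# Made-up sample body, trimmed to the fields used above.
my $body = <<'JSON';
{"log":{"entries":[
  {"request":{"url":"http://example.com/"},
   "response":{"status":301,"redirectURL":"http://www.example.com/"}},
  {"request":{"url":"http://www.example.com/"},
   "response":{"status":200,"redirectURL":""}}
]}}
JSON

my @hops = redirects_in_har($body);
printf "%s -> %s (%d)\n", $_->[0], $_->[2], $_->[1] for @hops;
```

In the real setup a function like this would be called from the web server's POST handler, with the request body in place of the sample string.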

How do AnyEvent and Coro enter the picture? I use AnyEvent condition variables to coordinate the work between page loading with MozRepl and output handling with the web server, and a Coro thread to tell the web server to kindly stand over there while I continue with my main line of work. The hardest part is reorienting one’s brain towards the asynchronous way of thinking, not so easy after a lifetime of vanilla scripting.
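The condition-variable handshake can be tried in isolation. In this minimal sketch a timer stands in for the POST handler: the main loop creates a fresh condvar per URL, waits on recv, and the stand-in handler calls send a moment later. The name fake_post_handler and the returned strings are purely illustrative; only AnyEvent->condvar, AnyEvent->timer, send and recv are real AnyEvent calls.

```perl
use strict;
use warnings;
use AnyEvent;

# Hypothetical stand-in for the POST handler: a little later it "receives"
# a result and wakes up whoever is waiting on the condition variable.
sub fake_post_handler {
    my ($cv, $url) = @_;
    my $timer;
    $timer = AnyEvent->timer(
        after => 0.05,
        cb    => sub {
            undef $timer;               # one-shot: drop the watcher
            $cv->send("HAR for $url");  # hand the result to recv()
        },
    );
    return;
}

my @seen;
for my $url (qw(http://www.google.com http://www.yahoo.com)) {
    my $cv = AnyEvent->condvar;   # fresh condvar for each page
    fake_post_handler($cv, $url); # kick off the asynchronous work
    push @seen, $cv->recv;        # runs the event loop until send() is called
}
print "$_\n" for @seen;
```

Each recv returns whatever was passed to send, so the loop processes the URLs strictly one at a time even though the work in between is asynchronous.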

So let’s see the script itself. It is delightfully short and sweet; the annotated version follows. The name of my employer has been changed to protect the innocent. And of course I use strict and warnings.


use AnyEvent;
use Coro;
use WORK::Config;
use WORK::AnyEvent::HTTPD; 
use WORK::AnyEvent::HTTPD::Handler::NetExport;  
use WORK::AnyEvent::MozRepl;
use Log::Log4perl qw(:easy);

my $cfg = WORK::Config::get_config();

# AnyEvent condition variable 

my $cv  = \$WORK::AnyEvent::HTTPD::Handler::NetExport::cv;

my @urls = qw(http://www.google.com http://www.yahoo.com);

# Tell NetExport where to post its results.
my $beacon = "http://localhost:9090/netexport";

# Handlers to process POSTed results.

my $h = WORK::AnyEvent::HTTPD::Handler::NetExport->new;

# Fire up a server and ask it to step out of the way.
async {
    start_httpd($h);
};

# Fire up FF with good ol' system().

start_firefox();

# Connect to MozRepl with AnyEvent::Socket

mozrepl_connect();

# Set the POST URL and turn on auto export.

set_netexport_prefs($beacon);

while (@urls) {

    $$cv = AnyEvent->condvar; 

    my $url = shift @urls;

    load_page( $url ); #  Using MozRepl

    $$cv->recv;   # The POST handler will send().
    
    clear_cache(); 
}

kill_firefox();  # From within MozRepl.

Neat, huh? There are lots of blanks to fill in, but as this post is already getting a bit long, I will do that in posts to follow.