=head1 NAME

lwpcook - The libwww-perl cookbook

=head1 DESCRIPTION

This document contains some examples that show typical usage of the
libwww-perl library.  You should consult the documentation for the
individual modules for more detail.

All examples should be runnable programs. You can, in most cases, test
the code sections by piping the program text directly to perl.

=head1 GET

It is very easy to use this library to just fetch documents from the
net.  The LWP::Simple module provides the get() function that returns
the document specified by its URL argument:

  use LWP::Simple;
  $doc = get 'http://www.linpro.no/lwp/';

or, as a perl one-liner using the getprint() function:

  perl -MLWP::Simple -e 'getprint "http://www.linpro.no/lwp/"'

or, how about fetching the latest perl by running this command:

  perl -MLWP::Simple -e '
    getstore "ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz",
             "perl.tar.gz"'

You will probably first want to find a CPAN site closer to you by
running something like the following command:

  perl -MLWP::Simple -e 'getprint "http://www.perl.com/perl/CPAN/CPAN.html"'

Enough of this simple stuff!  The LWP object oriented interface gives
you more control over the request sent to the server.  Using this
interface you have full control over headers sent and how you want to
handle the response returned.

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->agent("$0/0.1 " . $ua->agent);
  # $ua->agent("Mozilla/8.0") # pretend we are a very capable browser

  $req = HTTP::Request->new(GET => 'http://www.linpro.no/lwp');
  $req->header('Accept' => 'text/html');

  # send request
  $res = $ua->request($req);

  # check the outcome
  if ($res->is_success) {
      print $res->decoded_content;
  }
  else {
      print "Error: " . $res->status_line . "\n";
  }

The lwp-request program (alias GET) that is distributed with the
library can also be used to fetch documents from WWW servers.

=head1 HEAD

If you just want to check if a document is present (i.e. the URL is
valid), try to run code that looks like this:

  use LWP::Simple;

  if (head($url)) {
      # ok document exists
  }

The head() function actually returns a list of meta-information about
the document.  The first three values of the list returned are the
document type, the size of the document, and the age of the document.

More control over the request or access to all header values returned
requires that you use the object oriented interface described for GET
above.  Just s/GET/HEAD/g.

=head1 POST

There is no simple procedural interface for posting data to a WWW server.  You
must use the object oriented interface for this.  The most common POST
operation is to access a WWW form application:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(POST => 'http://www.perl.com/cgi-bin/BugGlimpse');
  $req->content_type('application/x-www-form-urlencoded');
  $req->content('match=www&errors=0');

  my $res = $ua->request($req);
  print $res->as_string;

Lazy people use the HTTP::Request::Common module to set up a suitable
POST request message.  It handles all the escaping issues and has a
suitable default for the content_type:

  use HTTP::Request::Common qw(POST);
  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                [ search => 'www', errors => 0 ];

  print $ua->request($req)->as_string;

The lwp-request program (alias POST) that is distributed with the
library can also be used for posting data.

=head1 PROXIES

Some sites use proxies to go through firewall machines, or just as a
cache in order to improve performance.
Proxies can also be used for
accessing resources through protocols not supported directly (or
supported badly :-) by the libwww-perl library.

You should initialize your proxy settings before you start sending
requests:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->env_proxy; # initialize from environment variables
  # or
  $ua->proxy(ftp  => 'http://proxy.myorg.com');
  $ua->proxy(wais => 'http://proxy.myorg.com');
  $ua->no_proxy(qw(no se fi));

  my $req = HTTP::Request->new(GET => 'wais://xxx.com/');
  print $ua->request($req)->as_string;

The LWP::Simple interface will call env_proxy() for you automatically.
Applications that use the $ua->env_proxy() method will normally not
use the $ua->proxy() and $ua->no_proxy() methods.

Some proxies also require that you send them a username/password in
order to let requests through.  You should be able to supply the
credentials in the proxy URL, with something like this:

  use LWP::UserAgent;

  $ua = LWP::UserAgent->new;
  $ua->proxy(['http', 'ftp'] => 'http://username:password@proxy.myorg.com');

  $req = HTTP::Request->new('GET', "http://www.perl.com");

  $res = $ua->request($req);
  print $res->decoded_content if $res->is_success;

Replace C<proxy.myorg.com>, C<username> and
C<password> with something suitable for your site.

=head1 ACCESS TO PROTECTED DOCUMENTS

Documents protected by basic authorization can easily be accessed
like this:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $req = HTTP::Request->new(GET => 'http://www.linpro.no/secret/');
  $req->authorization_basic('aas', 'mypassword');
  print $ua->request($req)->as_string;

The other alternative is to provide a subclass of I<LWP::UserAgent> that
overrides the get_basic_credentials() method.  Study the I<lwp-request>
program for an example of this.
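A minimal sketch of such a subclass might look like the following.  The
package name C<MyAgent>, the realm name C<secret> and the
C<%credentials> table are assumptions made for illustration only; they
are not taken from I<lwp-request>, which prompts the user instead.

```perl
use strict;
use warnings;

# Hypothetical subclass name; any package name would do.
package MyAgent;
use parent 'LWP::UserAgent';

# Hypothetical lookup table: realm name => [user, password].
my %credentials = (
    'secret' => [ 'aas', 'mypassword' ],
);

# LWP calls this method when a server (or proxy) challenges a
# request with Basic authentication.
sub get_basic_credentials {
    my ($self, $realm, $uri, $isproxy) = @_;
    my $entry = $credentials{$realm} or return;  # empty list: give up
    return @$entry;                              # (user, password)
}

package main;

my $ua = MyAgent->new;

# Call the method directly to show what LWP would receive.
my ($user, $pass) = $ua->get_basic_credentials('secret', undef, 0);
print "user for realm 'secret': $user\n";
```

Requests sent through C<$ua> should then pick up the credentials
whenever a server challenges with a realm listed in the table.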

=head1 COOKIES

Some sites like to play games with cookies.  By default LWP ignores
cookies provided by the servers it visits.  LWP will collect cookies
and respond to cookie requests if you set up a cookie jar.

  use LWP::UserAgent;
  use HTTP::Cookies;

  $ua = LWP::UserAgent->new;
  $ua->cookie_jar(HTTP::Cookies->new(file => "lwpcookies.txt",
                                     autosave => 1));

  # and then send requests just as you used to do
  $res = $ua->request(HTTP::Request->new(GET => "http://www.yahoo.no"));
  print $res->status_line, "\n";

As you visit sites that send you cookies to keep, the file
F<lwpcookies.txt> will grow.

=head1 HTTPS

URLs with the https scheme are accessed in exactly the same way as
those with the http scheme, provided that an SSL interface module for
LWP has been properly installed (see the F<README.SSL> file found in the
libwww-perl distribution for more details).  If no SSL interface is
installed for LWP to use, then you will get "501 Protocol scheme
'https' is not supported" errors when accessing such URLs.
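To find out up front whether https is usable, a short check along these
lines may help.  It is only a sketch and relies on the
is_protocol_supported() method of LWP::UserAgent; the result depends on
which protocol modules are installed on your system.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Report, for each scheme, whether LWP can find a protocol handler.
for my $scheme (qw(http https)) {
    my $ok = $ua->is_protocol_supported($scheme);
    print "$scheme: ", ($ok ? "supported" : "not supported"), "\n";
}
```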

Here's an example of fetching and printing a WWW page using SSL:

  use LWP::UserAgent;

  my $ua = LWP::UserAgent->new;
  my $req = HTTP::Request->new(GET => 'https://www.helsinki.fi/');
  my $res = $ua->request($req);
  if ($res->is_success) {
      print $res->as_string;
  }
  else {
      print "Failed: ", $res->status_line, "\n";
  }

=head1 MIRRORING

If you want to mirror documents from a WWW server, then try to run
code similar to this at regular intervals:

  use LWP::Simple;

  %mirrors = (
     'http://www.sn.no/'             => 'sn.html',
     'http://www.perl.com/'          => 'perl.html',
     'http://www.sn.no/libwww-perl/' => 'lwp.html',
     'gopher://gopher.sn.no/'        => 'gopher.html',
  );

  while (($url, $localfile) = each(%mirrors)) {
      mirror($url, $localfile);
  }

Or, as a perl one-liner:

  perl -MLWP::Simple -e 'mirror("http://www.perl.com/", "perl.html")';

The document will not be transferred unless it has been updated.

=head1 LARGE DOCUMENTS

If the document you want to fetch is too large to be kept in memory,
then you have two alternatives.
You can instruct the library to write
the document content to a file (the second $ua->request() argument is
a file name):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(GET =>
     'http://www.linpro.no/lwp/libwww-perl-5.46.tar.gz');
  $res = $ua->request($req, "libwww-perl.tar.gz");
  if ($res->is_success) {
      print "ok\n";
  }
  else {
      print $res->status_line, "\n";
  }

Or you can process the document as it arrives (the second $ua->request()
argument is a code reference):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $URL = 'ftp://ftp.unit.no/pub/rfc/rfc-index.txt';

  my $expected_length;
  my $bytes_received = 0;
  my $res =
     $ua->request(HTTP::Request->new(GET => $URL),
               sub {
                   my($chunk, $res) = @_;
                   $bytes_received += length($chunk);
                   unless (defined $expected_length) {
                      $expected_length = $res->content_length || 0;
                   }
                   if ($expected_length) {
                        printf STDERR "%d%% - ",
                                  100 * $bytes_received / $expected_length;
                   }
                   print STDERR "$bytes_received bytes received\n";

                   # XXX Should really do something with the chunk itself
                   # print $chunk;
               });
  print $res->status_line, "\n";

=head1 COPYRIGHT

Copyright 1996-2001, Gisle Aas

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.