RISC World

Programming in PERL

Programming interactive sites #1 - a "crash" course in Perl programming, by Richard Goodwin.

These days simply creating a static web page no longer seems to be enough. Everyone wants information to be up to the minute, interactive and as detailed as possible. So how do you go about creating such content without paying a fortune to a programmer? Fortunately there are a number of solutions to web programming that are simple enough for most people to pick up without needing a degree in computer science.

The main contenders are Perl, PHP and ASP (VBScript), which can be used in a variety of ways - for instance as stand-alone programs, embedded inside an HTML page, or a combination of the two (embedding the output of a program in the middle of an HTML page). If you can edit HTML tags then you should have no problems writing useful web pages in these languages; if you can write simple programs in BASIC then you're more than qualified. In many ways these languages are Web designer BASIC - in fact ASP files are usually written in Visual BASIC script (VBScript). They're much easier to get into than, for instance, JavaScript, which is only a scripting language. Here I'll be talking about Perl, with PHP and hopefully ASP to follow.

Rest assured that I will not be attempting to turn you all into programmers - I'll provide useful snippets of code and example programs that you can use on your own sites with a minimum of modification. If you're interested in dabbling in Perl - or already write your own code - then the code here might be of interest to add to your own library of code. Perl can be used to do some very complex things, and is used by system administrators as well as professional web designers to run many aspects of the Internet, but it has the virtue of being very easy to get into. As such I'll draw some comparisons with the usual beginner's language on RISC OS machines, BBC BASIC. Perl also "borrows" from other languages such as C, but if you've taken the time to learn C I'll assume you don't need someone like me pointing out how to do the basics.

If you want to test the following programs, you can download a copy of Perl for RISC OS from the following address: http://www.perl.com/CPAN-local/ports/#acorn

Although the RISC OS version of Perl lacks some of the more advanced extras that a Unix implementation might have - for instance it doesn't appear to include the LWP module (which allows you to fetch files over the Internet), a standard fixture of most Unix installations - it's still a pretty fully-featured version of Perl and in general you should be able to test your programs on your desktop. It will help if you have a RISC OS web server as well, such as Netplex or WebJames, which will allow you to see the result of your programs in a web browser without the expense of staying online while you're testing your program.

On your RISC OS machine, all you need to do is make sure that the file is settyped to &102 (Perl) and double-click on it like you would to run any other program - so long as the main Perl application has been found it'll run your program in the same way as a BASIC program, with any output appearing in a white box in the middle of the screen. If you're using a Web server see the instructions that came with that program. If you're uploading (or writing) on a Unix machine, such as your ISP or university's web server, then you need to make sure it's executable - that is, the server needs to know it's supposed to be run. To do this enter the command

chmod 777 <filename>
where <filename> is, obviously, the name you gave the program when you saved it. 777 means that anyone can do anything to the file - read, write and run it - so you should probably use 755 on a public web server (everyone can read and run it, but only you can alter it). FTPc can set permissions for you using the Site option.

The "Hello World" program

#!/usr/bin/perl
print "Hello World!\n";
 
That's all there is to it!

The first line - #!/usr/bin/perl - tells the computer you're running the program on where to find a copy of Perl, and it's almost always /usr/bin/perl on Unix systems. On RISC OS of course this isn't needed at all - settyping a file allows the parent program, be it Impression or Perl, to react to someone double-clicking on a file - but you'll need it when you eventually move your code onto the ISP/university web server. On PCs #!perl is usually used.

The second line - print "Hello World!\n"; - is, fairly obviously, a print statement that outputs Hello World! However, there are a couple of items that are worth pointing out. The \n part means that a linefeed is added to the end of the text. This is handy if, for instance, you're outputting HTML and want to examine the source - it's better if it's not all in one lump - but vital for certain later operations. The semi-colon on the very end of the line tells Perl that this is the end of the command. You can split commands over multiple lines, or have more than one on a line, but if you forget to put that semi-colon at the end of each command then they'll run together and you'll get errors. Note also that you need speech marks around text; if you want to have speech marks inside a piece of text you need to add a \ character in front, such as
print "say \"hello\" folks!";

If you want to get a little more fancy, here's a loop:
for ($loop=1;$loop<=10;$loop++) {
  print "$loop. Hello World!\n";
}
This will print Hello World! ten times, with a number in front of each one. There are a few interesting things introduced here.

$loop is a variable - it holds a number, but could equally hold a piece of text. In BASIC you'd expect to find loop$ being used to hold text, and loop% for numbers, but in Perl you just use $variable_name for any type of variable. It can even convert between numbers and text automatically depending on what you try to do with it. If you are a BASIC programmer you might be interested to note that in the print statement the variable is inside the speechmarks along with the plain text - that's fine, and more of that later.

The for loop is almost exactly the same as a FOR...NEXT loop in BASIC; it's just that all the attributes are kept within brackets, and are separated by semi-colons. This loop starts at one ($loop=1) and will continue while $loop is less than or equal to ten ($loop<=10). It goes up in increments of one, which is the last command: $loop++. If you wanted the loop to go backwards - for instance from ten down to one - you could use something like this:
for ($loop=10;$loop>=1;$loop--) {...}
or
for ($loop=10;$loop>0;$loop=$loop-1) {...}
$loop++ adds one to $loop, and $loop-- decreases it by one, but you could jump in bigger steps using $loop=$loop+2 or the like - this is a slightly different way of writing the STEP command in BASIC. Notice how after the loop there's a curly bracket, the next line's indented and then on the final line there's a closing curly bracket? These curly brackets are the standard way you mark out a block of code in Perl - the body of a loop, a condition (for example, if... elsif... else...), or a sub procedure. Unlike BASIC there's no ENDIF or NEXT; the closing bracket ends the block. You don't have to indent the lines inside the brackets, or put the opening bracket on the same line as the condition, but I write my code like that so it's easier to read.
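
To make the stepping idea concrete, here's a small sketch of a loop that counts up in twos, much like FOR loop%=2 TO 10 STEP 2 in BASIC. The $total variable is just something I've added for the example so there's a result to print at the end.

```perl
#!/usr/bin/perl
# Count from 2 to 10 in steps of two, adding up as we go
$total=0;
for ($loop=2;$loop<=10;$loop=$loop+2) {
  print "$loop\n";
  $total=$total+$loop;
}
print "Total: $total\n";
```

This should print 2, 4, 6, 8 and 10 on separate lines, followed by Total: 30.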

Web Pages - First Steps

#!/usr/bin/perl
$date=`date`;
print "Content-type: text/html\n\n";
print "The date is $date\n";
 
It might not look much like a web page (Oregano won't display it properly!), but this is the first step towards adding content to your site that isn't just static. The $date=`date`; line is a bit of a cheat - the back-quotes (`) mean that Perl should run the program called date and return the results to the $date variable. If that sounds complicated then forget it, we'll be looking at better ways of getting the date later. It also probably won't work on your RISC OS desktop machine!

The important part is the line print "Content-type: text/html\n\n"; - this lets the world know that you're sending HTML content out. If you're viewing the content with a web browser, or including this as part of an existing HTML page, the program that's handling it won't know what to do with this data unless you say what content you are sending. Notice that there's a double linefeed (\n\n) at the end - this is used to say the details are ending, and the content will begin below. You can have other commands - for instance, to set cookies - in these headers, but there must always be a double linefeed afterwards or you'll get an error from your web browser.

The output from this program will look something like this:

Fri Jan 12 15:21:52 GMT 2001
I think that's a little long-winded, so we could just chop this down a bit:
#!/usr/bin/perl
$date=`date`;
$date=substr($date,0,19);
print "Content-type: text/html\n\n";
print "$date\n";
The substr($date,0,19) command takes $date, starts at the first character (0) and keeps a certain number of characters (19). In BASIC this is the same as the MID$ command, except you start counting from zero not one. So, you should just get
Fri Jan 12 15:21:52

However, you're still getting the same lame layout, just without the timezone and year on the end. Let's take control.
 
#!/usr/bin/perl
$hour=(localtime(time))[2];
$minute=(localtime(time))[1];
$second=(localtime(time))[0];
print "Content-Type: text/html\n\n";
print "$hour:$minute.$second\n";
 
This will print a 24 hour clock. Unix systems have pretty good time facilities, but if you write a program that just does print time; you'll see that it returns a big number, not something human readable. This number is the number of seconds since midnight GMT on January 1st 1970 (which was a Thursday apparently!) - not very useful you might think, hence the localtime and gmtime commands to convert it to something more useful. localtime takes into account the current timezone, including daylight savings time - useful for printing the current time if your target audience is a human, but if you want to use the current time and date as part of a unique ID - perhaps as a filename or serial number for a piece of data - then you want to use gmtime. Why? Think what might happen when the clocks go back...

(As it stands the program above isn't quite right - it will miss off the leading zeroes in front of the numbers (for instance 12:00.00 will come out as 12:0.0). To get the best out of this code, add some formatting commands:
$hour=sprintf ("%-.2d",((localtime(time))[2]));
$minute=sprintf ("%-.2d",((localtime(time))[1]));
$second=sprintf ("%-.2d",((localtime(time))[0]));
...for a version that forces two digit output)

localtime can return nine elements -
($second, $minute, $hour, $monthdate, $month, $year, $weekday, $yearday, $is_daylightsavings) = localtime(time);
- which is why you use $hour=(localtime(time))[2]; meaning "give me element 2 from the result of localtime" (again, start counting at 0). However, as all results are numeric, and some of them start counting from zero, you probably won't want to print them straight out. So, here's some code that sets up a human-readable date too:
 
$monthdate=(qw(0 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th 16th 17th 18th 19th 20th 21st 22nd 23rd 24th 25th 26th 27th 28th 29th 30th 31st))[(localtime(time))[3]];

$month=(qw(January February March April May June July August September October November December))[(localtime(time))[4]];

$year=1900+(localtime(time))[5];

$weekday=(qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday))[(localtime(time))[6]];

$date="$weekday, $monthdate $month $year";

As you can see, one of Perl's strengths - and some might say weaknesses - is that There's Always More Than One Way To Do It(tm).
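
In that spirit, here's yet another way, offered as a sketch rather than the definitive method: call localtime just once and collect the elements in one go, so every part of the date comes from the same instant (with separate calls, the second could in theory tick over between them). The $clock and $stamp names are my own, and "%02d" is another common way of forcing two digits with a leading zero.

```perl
#!/usr/bin/perl
# Call localtime once so every element comes from the same instant
($second, $minute, $hour, $monthdate, $month, $year) = localtime(time);

# %02d pads with a leading zero; months count from 0 and years from 1900
$clock = sprintf("%02d:%02d.%02d", $hour, $minute, $second);
$stamp = sprintf("%04d-%02d-%02d", $year+1900, $month+1, $monthdate);
print "$stamp $clock\n";
```

Swap localtime for gmtime and the same code gives you a timestamp that doesn't jump around when the clocks change.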

File Handling

Being able to handle files is very important if you want to write simple Perl scripts - as a language it was designed to process textual documents, and part of that is being able to load and save that text. Until you reach the dizzy heights of being able to build and maintain a database system (which generally isn't necessary for small amounts of data) then storing data in text files will allow you to get at information quickly and easily, be it a list of visitors' comments, a batch of news stories, or the data for a web counter.

Again there are several ways of dealing with files in Perl, but I'll stick to the tried and tested easy ways.

#!/usr/bin/perl
$text="";
open FILEHANDLE, "<datafile.txt"
     or die("Unable to open the file for reading");
while ($line=<FILEHANDLE>) {
  $text.=$line;
}
close FILEHANDLE;
print $text;

File reading in Perl

First of all we start off by setting up $text to be blank (="") just so that, later on, we're not operating on a variable Perl's not seen before - it can still cope with this, but it's bad programming practice (and if we were using stricter error reporting it'd generate a warning). Next we open a file for reading - the open command, not surprisingly. This takes two parameters, a file handle and the name of the file. The file handle can be any bit of text, but it's usually written in CAPITALS just so you can tell it from a command when you're looking through your code. The file name has a < sign on the front of it to tell Perl that you want to open the file for reading; a > sign is to open a file for writing, and a double - >> - would mean you're appending to the end of a file (if the file doesn't exist it'll just create it without giving an error). You can see another command tacked on to the end - the die statement means that if the file can't be opened for whatever reason the program will report an error and then stop.

The next couple of lines will load the entire file in and store it in $text. The while ($line=<FILEHANDLE>) basically means that while anything can be read from the file attached to FILEHANDLE, it will be read into $line. I've used $line as the variable name as this method loads the file in a line at a time. One thing to watch for: if the file was created with PC-style line endings and you read it on a Unix machine, each line will still have a carriage return lurking in front of the linefeed, so you may want to strip these out before using the text. The $text.=$line; part means that $line is added to the end of $text; the .= operator means "add to the end of this variable", in the same way that in BASIC you can have number%+=4 instead of number%=number%+4 (equally in Perl you could write $text=$text.$line for this textual addition). By the time that this program is finished, the entire contents of the file will be contained in $text.

Finally we close the file - not strictly necessary as Perl will do this when it finishes running the program, but you should get in the habit in case you want to access the same file more than once in the same program.
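
The die message in the program above tells you that the open failed, but not why. As a sketch of one way to find out, Perl keeps the system's reason for the last failure in the special variable $!. The file name here is made up and assumed not to exist, and $result is just a variable I've added to hold the message; if and else are explained properly later on.

```perl
#!/usr/bin/perl
$filename="no-such-file.txt";   # hypothetical name, assumed not to exist
if (open FILEHANDLE, "<$filename") {
  close FILEHANDLE;
  $result="Opened $filename";
} else {
  # $! holds the system's reason for the failure
  $result="Unable to open $filename: $!";
}
print "$result\n";
```

You could equally put $! straight into the text you hand to die.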

To compare and contrast, if you were writing the same code in BASIC you'd use something like the following:
text$=""
filehandle%=OPENIN "ADFS::4.$.file"
IF filehandle%<1 THEN PRINT "Unable to open file for reading":END
REPEAT
  IF NOT EOF#filehandle% THEN
    line$=GET$#filehandle%
    text$=text$+line$+CHR$(10)+CHR$(13)
  ENDIF
UNTIL EOF#filehandle%
CLOSE#filehandle%
PRINT text$

The same file reading code in BASIC

Of course this code is not very useful in BASIC as you only have 255 characters to play with - if the text file is a long one you'll get an error, and if you got around that (by printing the line immediately rather than adding it to another variable) you'd get linefeeds in the middle of long paragraphs as BASIC chops your paragraphs into 255 character chunks. This is one reason I've switched to Perl for text handling jobs - as well as superior pattern matching and text changing abilities (more of which later), you don't have to worry about memory allocation, variable lengths and so on.

In the examples above there's a simple print command to show us the content of the file on screen, but with the addition of the content header, you could output it to a web page:

#!/usr/bin/perl
$text="";
open FILEHANDLE, "<datafile.txt" or die("Unable to open the file for reading");
while ($line=<FILEHANDLE>) {
  $text.=$line;
}
close FILEHANDLE;
print "Content-type: text/html\n\n";
print "<pre>";
print $text;
print "</pre>";

This version has the mime type header to denote that this is HTML content, and puts <pre> tags around the text so that a text file can be displayed as typewriter-style formatted text. However, you could create a web page, cut it up into several pieces, and then be able to produce a CGI program (or several) that use these files to output something with the same look and feel as the rest of your site. For instance, have a standard header file, a standard footer file, and put your dynamic CGI content in the middle like the meat in a sandwich!
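
Here's a rough sketch of that sandwich. It assumes you've saved the top and bottom of your page as header.txt and footer.txt (names I've made up), and it wraps the file reading code in a sub-procedure called file_contents, also my own invention; sub-procedures are covered properly later in this article.

```perl
#!/usr/bin/perl
# Return the whole of a file as one piece of text, or nothing if it's missing
sub file_contents {
  my $filename=$_[0];        # "my" keeps these variables private to the sub
  my $text="";
  my $line;
  open PIECE, "<$filename" or return "";
  while ($line=<PIECE>) {
    $text.=$line;
  }
  close PIECE;
  return $text;
}

print "Content-type: text/html\n\n";
print file_contents("header.txt");     # top slice of bread
print "<p>The dynamic content goes here</p>\n";
print file_contents("footer.txt");     # bottom slice
```

Returning an empty piece of text when a file is missing means the page still appears, just without its header or footer; you may prefer to die instead so the problem is obvious.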

Here's a program to write data to a file:

#!/usr/bin/perl
$text="";
open OUTPUT, ">datafile2.txt" or die("Unable to open the file for writing");
print OUTPUT "some text\n";
print OUTPUT "some more text\n";
print OUTPUT "even more text\n";
close OUTPUT;

Okay, so it outputs a pretty boring file, but you get the picture. The print statement has OUTPUT - the file handle name I've used this time - on each line before the actual text to be sent to the file. That print command's pretty useful huh? You can use it for printing to the screen, to a web page, and now a file! Notice that it doesn't have a comma between the file handle and the content to be printed - you can comma separate more than one piece of text to be printed so having one after the file handle would confuse it greatly.
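
The >> form mentioned earlier works in exactly the same way, except that each run adds to the end of the file instead of wiping it. As a minimal sketch, here's a visitor log; visits.txt is a made-up file name, and the big number from time is used as a crude timestamp.

```perl
#!/usr/bin/perl
# ">>" appends, so each run adds a line instead of wiping the file
open LOG, ">>visits.txt" or die("Unable to open the log for appending");
print LOG "Visit at ", time, "\n";    # commas between items, none after LOG
close LOG;
```

Run it a few times and the file should grow by one line each time.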

It can get a bit boring if you have a few hundred lines of HTML to print out, having to have the file handle on every line, so here's a bit of a cheat:

#!/usr/bin/perl
$text="";
open OUTPUT, ">datafile2.txt" or die("Unable to open the file for writing");
select OUTPUT;
print "some text\n";
print "some more text\n";
print "even more text\n";
select STDOUT;
close OUTPUT;

By selecting OUTPUT you're telling Perl that you want everything you're about to print to go to that file handle. When you're done you then select the standard output "file handle" called STDOUT (which, of course, stands for STandarD OUTput); I know it's not strictly a file but that's how it works. There's an even cooler trick you can use to get rid of the print statements too:

print <<END;
Mary had a little bear
It was the loving kind
And every where that Mary went
She had a bear behind.
END

Here you have one print statement that says "continue printing everything that follows until you find the end marker", which in this case happens to be the word END in capital letters (which has to come at the very start of a line). It doesn't have to be the word END, and it doesn't have to be in caps, but it's easy to see that way.

Finally, you can use a very simple test to see if a file exists: -e as in
if (-e "file.txt") {
...
}
-r is to check if the file's readable, and -w checks for writable.
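
Putting -e together with the reading and writing code gives you the bones of a web counter. This is only a sketch: counter.txt is a made-up file name, and there's no protection against two visitors arriving at exactly the same moment, which a busy site would need.

```perl
#!/usr/bin/perl
$counterfile="counter.txt";   # hypothetical file name
$count=0;
if (-e $counterfile) {
  open COUNTER, "<$counterfile" or die("Unable to read the counter");
  $count=<COUNTER>;           # reads just the first line
  close COUNTER;
}
$count=$count+1;              # the text from the file is treated as a number
open COUNTER, ">$counterfile" or die("Unable to save the counter");
print COUNTER "$count\n";
close COUNTER;
print "You are visitor number $count\n";
```

The first run finds no file, so it starts from zero and saves 1; every later run picks up where the last one left off.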

Taking User Input

Now you know how to print, load and save data it's time to get some input from your visitors. You'll probably already be familiar with forms, which is how the information is entered and sent off; now you can see the other side, how that data is received and used.

Fortunately there's a very easy way to find out what a user's sending to your program - you can use a Perl module, a chunk of code already provided, to do all of the hard work. The module in question is called CGI, and can basically do pretty much everything you need to construct a web page, but there's a LOT to learn if you start down that road. For instance, consider this program:

#!/usr/bin/perl
 use CGI;                                # load CGI routines
 $html = new CGI;                        # create new CGI object
 print $html->header,                    # create the HTTP header
       $html->start_html('hello world'), # start the HTML
       $html->h1('hello world'),         # level 1 header
       $html->end_html;                  # end the HTML
It's a very simple way to construct a web page using very few commands, but personally I find it a bit remote, and as the CGI Perl module is so large it could easily be the subject of a whole other article or series of articles. So, we'll just be using a simple part of this module.

#!/usr/bin/perl
use CGI qw(param);
$name=param("name");

This tells Perl to load the bit of the CGI module we need - the (param) part - and then it looks for a parameter called "name" and stores it in $name. That's all you need to collect the input from a simple form, and it has the benefit of working for both POST and GET method forms.

There now follows an example web page containing a form, and a simple CGI to respond to the data sent via the form.


<html><head><title>Form</title></head>
<body text="#000000" bgcolor="#FFFFFF">

<form method="GET" action="/cgi-bin/getname">
Enter your name: <input type="text" name="name">
<br><input type="submit">
</form>

</body>
</html>


#!/usr/bin/perl
use CGI qw(param);
$name=param("name");

print <<END;
Content-type: text/html

<html><head><title>Form Response</title></head>
<body text="#000000" bgcolor="#FFFFFF">

<h1>Hello, $name!</h1>
Hi!  You said that your name was $name, and I believe you!

</body></html>
END
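
One word of caution: the program above prints whatever the visitor typed straight back into the page, so a name containing HTML tags would be treated as part of your page. A common precaution is to swap the special characters for their HTML codes first. The escape_html procedure below is a hypothetical helper of my own, not part of the CGI module, and it uses the search and replace command that's explained in the regular expressions section later on.

```perl
#!/usr/bin/perl
# Hypothetical helper: make a piece of text safe to drop into HTML
sub escape_html {
  my $text=$_[0];          # "my" keeps the variable private to the sub
  $text=~ s/&/&amp;/g;     # must come first, or we'd re-escape the others
  $text=~ s/</&lt;/g;
  $text=~ s/>/&gt;/g;
  $text=~ s/"/&quot;/g;
  return $text;
}

$name=escape_html("<b>Rich</b> & \"friends\"");
print "Hello, $name!\n";
```

In a real CGI you'd wrap the call around the form data, as in $name=escape_html(param("name"));.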


Of course there's no check being done to make sure that something valid was entered, so you might want to make sure:

#!/usr/bin/perl
use CGI qw(param);
$name=param("name");

print <<END;
Content-type: text/html

<html><head><title>Form Response</title></head>
<body text="#000000" bgcolor="#FFFFFF">
END

if ($name ne "") {
  print "<h1>Hello, $name!</h1>\n";
  print "Hi! You said that your name was $name!\n";
} else {
  print "<h1>No name entered</h1>\n";
  print "You didn't enter a name - don't you know what it is?\n";
  print "<br><a href=\"/formpage.html\">Click here to go back to form</a>\n";
}

print "</body></html>";

When checking against bits of text you can't use the equals sign, as this is what sets variables; to check, you have to use eq and ne, which mean equals and not-equals. So, in the example above I checked to see if there was anything entered in $name (not-equal to empty - ""). With numbers you'd use == or !=.
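
The difference matters more than you might think. In this little sketch (the variable names are just for illustration) the same two values are equal as numbers but not as text:

```perl
#!/usr/bin/perl
$first="1.0";
$second="1";
$samenumber="no";
$sametext="no";
if ($first == $second) { $samenumber="yes"; }   # compared as numbers
if ($first eq $second) { $sametext="yes"; }     # compared as text
print "Same number: $samenumber, same text: $sametext\n";
```

This prints Same number: yes, same text: no, because 1.0 and 1 are the same value but different strings of characters.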

if and else were used in that example; to have more than two alternatives you need elsif (note the lack of second "e"). For example:

if ($name eq "Rich") {
  print "Greetings, oh master!\n";
} elsif ($name eq "Tim") {
  print "Wotcha Timmy!\n";
} elsif ($name eq "David") {
  print "Oi, Dave!\n";
} else {
  print "Do I know you?\n";
}

Processing content - Regular Expressions

Regular expressions are a powerful way of looking for patterns within data and acting upon them. In BASIC you really just have the INSTR command to look for a specific piece of text inside a larger bit of text, but in regular expressions you can match variable length bits of arbitrary data such as zero or more letters, or at least one non-space character, or between 2 and 4 numbers.

The basic match is as follows:

$maintext = "this is some text, and here are some numbers: 12345";
if ($maintext =~ m/(\d+)/) {
  print "The numbers were '$1'\n";
}

$variable =~ m// does the match against $variable; \d means a digit (number), the + means there has to be at least one, and the round brackets around the \d+ means that this is a group, the contents of which will be returned in $1 when the match is made. If you have more than one group the results will be available in $2, $3 and so on.
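
As a sketch of several groups at once, here's a match that pulls a date apart. The text being searched is made up, and because the / character is also the marker around the pattern, the ones inside have to be written as \/.

```perl
#!/usr/bin/perl
$maintext = "Posted on 12/01/2001 by Rich";
if ($maintext =~ m/(\d+)\/(\d+)\/(\d+)/) {
  # copy the groups straight away; the next match will overwrite them
  $day=$1; $month=$2; $year=$3;
  print "Day $day, month $month, year $year\n";
}
```

This prints Day 12, month 01, year 2001.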

Here's another one:

while ($webpagetext =~ m~(<table.*?</table>)~sgi) {
  # do something with the contents of that table
}

This matches the opening and closing tags of a table, and all the contents in between. Taking the new things in order, notice how I'm using m~~ to do the match this time instead of m// - you can use pretty much any character after the m so long as you match it again at the end, and of course you can't use that character in the pattern match in between without escaping it - putting a \ in front. Basically the / character is used so much in HTML that you'd start to get zig-zag patterns as you write \/ all the time, so I use ~ from the start as it's very rarely used in web pages (apart from some URLs).

.*? is actually three things in one - the full stop (.) means any character; the star (*) means zero or more times; and in this context the question mark means "as soon as possible", which basically means that the match stops at the first closing </table> it finds rather than carrying on to any others in the page (this of course means that the match won't work correctly in a page that uses nested tables, but this is just a simple example). Note that there are quite a few characters that are used in a special way, and to use them in plain text you have to escape them - add \ to the front.

Here's a list of special characters available in Perl regular expressions. It's not an exhaustive list but should give you enough to experiment with:
.      any character
\s     any whitespace character (tabs, linefeeds etc.)
\n     linefeed
\t     tab
\d     any digit (number)
\D     any non-digit
\w     any character valid in words
*      zero or more times
+      one or more times
?      optional
()     match group
[]     group of characters (such as [a-z] meaning any (lowercase) letter)
{}     specific number of times (as in {1,2} for 1 or 2 times, {0,4} for up to 4 times, {100,} for at least 100 times)
And don't forget that $, @ and \ characters all have to be escaped because they're used by Perl to mean something else.

After the pattern has been closed, notice the sgi on the end. These modify how the match is done; i means case-insensitive, so it'll match <TABLE as well as <table; because we're in a while loop we use the g option, which means that it will remember the position of the last match and start from after that next time - so you don't keep matching the same piece of text over and over; and the s treats the whole piece of text as one big lump, rather than as a series of lines - this means the match on "." (any character) will also match linefeeds instead of ending a match at the end of a line.
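
To see the three options working together, here's a sketch that counts the tables in a made-up scrap of HTML; $tables is just a counter I've added. One table is in capitals and both are spread over several lines, so the match relies on i and s as well as g.

```perl
#!/usr/bin/perl
$webpagetext="<TABLE>\n<tr><td>one</td></tr>\n</TABLE>\n<table>\n<tr><td>two</td></tr>\n</table>\n";
$tables=0;
while ($webpagetext =~ m~<table.*?</table>~sgi) {
  $tables++;     # g makes each pass carry on from the end of the last match
}
print "Found $tables tables\n";
```

This prints Found 2 tables. Take the s off the end and it finds none, because "." will no longer step over the linefeeds inside each table.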

If you want to do substitutions instead of just matches, use the s/// method instead. This can search and replace every instance of a match with the minimum of effort. For instance:
$webpagetext=~ s/Man Utd/Nottm Forest/g;
The g means that it'll search and replace them all, without having to set up a loop!

The following will match all opening and closing <i> tags and replace them with <em> tags instead. Note that there's an optional match for / - (/?) - in the first part, and in the second part \1 (the same as using $1 outside the match) is used to put the value of what was found into the replacement. That means that if <i> is found it's replaced with <em> (because \1 is empty), but </i> is replaced with </em> (because \1 will contain the "/" character).
$webpagetext=~ s~<(/?)i>~<\1em>~gi;
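
A handy extra, shown here as a sketch on some made-up text: a substitution hands back the number of replacements it made, so you can tell whether anything changed. The second substitution strips the carriage returns that PC-style line endings leave behind; $changed is a name of my own.

```perl
#!/usr/bin/perl
$webpagetext="Man Utd won. Man Utd lost.\r\nMan Utd drew.\r\n";
$changed = ($webpagetext =~ s/Man Utd/Nottm Forest/g);   # how many were replaced
$webpagetext =~ s/\r//g;                                 # strip carriage returns
print "Replaced $changed\n";
print $webpagetext;
```

This reports three replacements, and the text is left with plain linefeeds at the end of each line.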

That's quite a lot to take in, and this is probably too large a subject area for this article to cover in any more depth, so here's a usable example:

File latest.txt:

<newsitem>
 <headline>A headline</headline>
 <date>the date</date>
 <author>the author</author>
 <article>some text about the news item</article>
</newsitem>

<newsitem>
 <headline>Another headline</headline>
 <date>another date</date>
 <author>same author</author>
 <article>some more text about another news item</article>
</newsitem>

#!/usr/bin/perl

$text="";
open INPUT, "<latest.txt";
while ($line = <INPUT>) {
  $text.=$line;
}
close INPUT;

while ($text=~ m~<newsitem>(.*?)</newsitem>~sgi){
  $newsitemtext=$1;
  if ($newsitemtext =~ m~ <headline>(.*?)</headline>~si) {
    print "<h2>$1</h2>\n";
  }
  if ($newsitemtext =~ m~ <article>(.*?)</article>~si) {
    print "$1\n\n";
  }
}
What this does is go through the latest.txt file and isolate each news item (anything inside <newsitem>...</newsitem>). It then does a couple more checks to pick out and print the headline and article text from each of these news items. This is essentially how the Icon Bar's news service works (download http://www.iconbar.com/news/latest.txt to see how similar the data is!).

Example code

By now you should have some of the building blocks you need to start writing some of your own programs. At the end of this article I'll set you a couple of tasks, but first, as I feel like I know you and can trust you, here are some of my simple programs that you can use in your own creations.

Random pick

#!/usr/local/bin/perl

print "Content-type: text/html\n\n";
print "<html><head><title>BOFH-style excuses</title></head>\n";
print "<body bgcolor=\"#ffffff\" text=\"#000000\">\n\n";
print "<h1>BOFH Excuse Server</h1>\n";

open (BOFHEXCUSES, "<excuses.txt")
      or die("Sorry, no excuses...");
 while ($line = <BOFHEXCUSES>) {
    chomp $line;
    push @excuses, $line;
 }
close BOFHEXCUSES;

$rnd = int(rand(@excuses));
print "The cause of the problem was<br>\n $excuses[$rnd]\n";
print "</body></html>";

This is a program to load in a file called excuses.txt. The chomp command is used to trim any line feeds off the end of each line, and then that line is stored in an array called excuses - push @excuses, $line; puts the data in $line into the next available slot in the excuses array.

If you do a mathematical type check on @excuses (such as $arraynumber=@excuses) then you get the number of items stored in the array; remember though that if there are ten items, they'll be numbered 0-9. By using the rand function you get a number between 0 and the number of items in the array (minus a small fraction), and int makes this value an integer, so $rnd should hold a whole number that corresponds to an item in the array. $excuses[$rnd] is the way of getting one particular item out of that array - for instance $excuses[0] will get the first item, $excuses[1] the second and so on.
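
If you'd like to try the array handling without a data file, here's a cut-down sketch with three excuses typed straight into the program (two of them made up for the occasion):

```perl
#!/usr/bin/perl
@excuses=("It was cosmic rays","Solar flares","The cat sat on the keyboard");
$arraynumber=@excuses;           # the number of items, 3 in this case
$rnd=int(rand(@excuses));        # a whole number: 0, 1 or 2
print "There are $arraynumber excuses; I picked number $rnd: $excuses[$rnd]\n";
```

Run it a few times and you should see the pick change while the count stays at three.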

This program is one I've been running on the Acorn Arcade website since the original stopped working (although it did allow me to pinch the data file), at http://www.acornarcade.com/cgi-bin/BOFH; this version loads the data using

open (BOFHEXCUSES, "</virtual/www.acornarcade.com/webpages/ssi/excuses.txt")
(etc.), and so if you go to http://www.acornarcade.com/ssi/excuses.txt you should be able to download this file for yourself.

The upshot is that you have a file containing small bits of data which you load in and pick one at random. In this case it was an "excuse" - for instance something like "It was cosmic rays" or "The Internet hasn't paid its bill and so it's been cut off", so someone in technical support can have a new plausible excuse for your internet connection not working - and this was displayed straight away. It could be something else like a poem (a random haiku is used on the Acorn Arcade website if you try to access a file that isn't there) or a wise saying (such as the Pearl of Wisdom at the top of the Icon Bar front page); but it could also be a piece of HTML - perhaps to display a random image? - or even a colour (or set of colours) which you can use throughout a web page so that when someone goes to your site for the first time the whole page is in blue, but next time it's all green and so on.

Fetch a file

#!/usr/local/bin/perl
use LWP;

fetch_file("http://www.iconbar.com/news/latest.txt");
print $bodytext;

sub fetch_file {
 $url = $_[0];
 $browser = LWP::UserAgent->new();
 $browser->agent("Mozilla/4.0 (compatible; MSIE 4.01; Windows95)");
 $webpage=$browser->request(HTTP::Request->new(GET => $url));
 $bodytext=$webpage->content;
}

This uses the LWP module (not present in RISC OS Perl but included as standard in other versions) to fetch a page over the Internet. I've made the code to do this into a sub-procedure, so you can see it uses fetch_file("http://www.iconbar.com/news/latest.txt");, which jumps to sub fetch_file { - the start of the procedure called fetch_file. These procedures can be sprinkled anywhere within the program, but I put them near the end for easier reading of the code. You can however use the require or use commands at the start of your program to include snippets of code from an external file, which essentially dumps new code into the start of your program, but you can still have procedures in these files.

The underscore array (@_) contains whatever's passed to this procedure, so $url = $_[0]; picks out the first parameter sent to this procedure. You could send more than one parameter - for instance make_headline("Some Text","red","center") would need $text=$_[0]; $colour=$_[1]; and $align=$_[2] to pick them all up, or in the file fetch example you could just use a loop to find out what's in every position in the @_ array and fetch every page requested.
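To make that concrete, here is a minimal sketch of both ideas. The text only gives the call to make_headline, so the HTML it returns here is an assumption, and show_all is a made-up procedure that simply lists every parameter it was given instead of fetching anything:

```perl
#!/usr/bin/perl
# Sketch only: the HTML returned by make_headline is an assumption,
# and show_all is a name invented for this example.

sub make_headline {
  # "my" keeps these variables private to the procedure
  my $text   = $_[0];
  my $colour = $_[1];
  my $align  = $_[2];
  return "<h1 align=\"$align\"><font color=\"$colour\">$text</font></h1>";
}

sub show_all {
  my $list = "";
  # step through every parameter that was passed in
  foreach my $item (@_) {
    $list .= "Asked for: $item\n";
  }
  return $list;
}

$headline = make_headline("Some Text", "red", "center");
print "$headline\n";
print show_all("http://www.iconbar.com/", "http://www.acornarcade.com/");
```

In the file fetch example, the body of the loop would run the LWP request for each address rather than adding to a string.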

You can see that the script is pretending to be Internet Explorer 4 running on Windows 95 (so that your script doesn't fail for not being the right browser type, or more likely so that you don't show up on someone's web logs for pinching one of their pages!), and that at the very end of the procedure $bodytext is set up to contain the contents of the web page. The procedure ends and we then jump back up to the top of the program, which prints $bodytext (not very useful as the file in question is not HTML, but never mind).

You can also check to see if the fetch was successful by using if($webpage->is_success) as in:
fetch_file("http://www.iconbar.com/news/latest.txt");
if($webpage->is_success) {
  print "I got it!";
} else {
  print "Sorry, fetch failed.";
}

You should then use regular expressions etc. to process the data and print out the bits you need.
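As a small sketch of that last step, suppose the fetched file held one headline per line in a "date|title" layout. That layout is purely an assumption for the example (the real latest.txt may look nothing like it), extract_titles is a made-up name, and the sample text stands in for $bodytext so that the example runs without a network connection:

```perl
#!/usr/bin/perl
# Sketch only: the "date|title" layout and extract_titles are assumptions.

sub extract_titles {
  my $text = $_[0];
  my @titles = ();
  # look at the text one line at a time
  foreach my $line (split(/\n/, $text)) {
    # on lines that start with a date, keep whatever follows the first |
    if ($line =~ /^\d+\/\d+\/\d+\|(.+)$/) {
      push(@titles, $1);
    }
  }
  return @titles;
}

# stand-in for the $bodytext that fetch_file would set up
$bodytext = "01/02/2003|New version released\nnot a headline\n03/02/2003|Show report\n";

foreach my $title (extract_titles($bodytext)) {
  print "<li>$title</li>\n";
}
```

Lines that don't match the pattern are simply skipped, so stray text in the file does no harm.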

Mail handling

What follows is a complete mail handling script, so that people can enter their email address and a short message into a web page form and have the results emailed to you. To get this program working you need a web page with a text input box with the name email (to collect the email address of whoever is sending the message), and another text entry box (or preferably a text area) called message. Oh yes, and don't forget a submit button to send it off! It also requires a program called sendmail to be in the directory /usr/sbin, which is where it is on my Unix system (try /usr/lib if it's not found).

#!/usr/bin/perl
use CGI qw(param);

$owner="webmaster\@richardgoodwin.net";
$subject="message from website";
$from=param("email");
$message=param("message");

# Make sure the parameters are defined. They should be if you
# have the right inputs in your web page, even when correct
# values are not supplied, and the program won't fail if you
# don't do this, but it's better practice.
$from="" unless(defined($from));
$message="" unless(defined($message));

# keep only the first line of the address - it ends up in the mail
# headers, and a line break would let a sender add headers of their own
$from=~s/[\r\n].*//s;

# if the sender did not specify an email address, use this instead
$from="webmaster\@richardgoodwin.net" if ($from eq "");

send_email($from,$owner,$subject,$message);

# Return a web page or the browser will complain -
# it doesn't have to be fancy, but you might want to give
# the user a link back to your site, and maybe jazz it up
# to look more in keeping with the rest of your site's design
print <<EOF;
Content-type: text/html

<html><head><title>Thanks!</title></head>
<body>
<h1>Thanks!</h1>
Thank you for sending me an email -
your message has been sent on to $owner
and should arrive shortly.
</body></html>
EOF

sub send_email {
$frm=$_[0];
$to=$_[1];
$sub=$_[2];
$msg=$_[3];
open SENDMAIL, "|/usr/sbin/sendmail -oi -t -odq"
     or die("Can't run sendmail\n");
print SENDMAIL <<EOF;
From: $frm
To: $to
Subject: $sub

$msg
-- 
$frm doesn't have a .sig
 
EOF
close SENDMAIL;
}

You can use the send_email procedure to send email in a variety of other ways - to automatically thank people for their messages/requests, to warn you if there's an error in one of your scripts, to be able to send email when you aren't near your email client and so on. Notice how, when you're specifying an email address inside double quotes ($owner in this case), you have to escape the @ sign as it's a special character - otherwise Perl might confuse part of your email address for an array.
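The point about the @ sign is easy to check for yourself. A short sketch, using a made-up address at example.com: in double quotes the @ needs a backslash, whereas single quotes switch off interpolation altogether so no escape is needed.

```perl
#!/usr/bin/perl
# Sketch only: the address is made up for this example.

$escaped = "webmaster\@example.com";   # double quotes: escape the @
$single  = 'webmaster@example.com';    # single quotes: nothing is interpolated

if ($escaped eq $single) {
  print "Both strings hold $single\n";
}

# without the escape, an array called @example is slotted into the string
@example = ("oops");
$broken = "webmaster@example.com";
print "Without the escape: $broken\n";
```

Single quotes are the simpler choice when the string is just an address, but the double-quoted form is needed if you want other variables filled in as well.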

Your turn

Now you've got the basics of Perl, how about trying some of these projects?
  • Write a simple script to display something based on time - check the hour so you can say good morning, good afternoon, good evening and good night at the correct time, or change the background colour to be appropriate to the time of day.
  • Write something to display a random image, perhaps with a link to a page or site appropriate to that image.
  • Write a simple text web counter - save the counter value to a file, then load it, use $counter++ to increment it, display this in a page and then save the value back to disc.
  • Create a simple guestbook - take user input, make sure it isn't blank, and add this to a file (either append it to the end using >> to open the file, or load the file into memory, add the new data to the front - $data=$newdata.$data - and then save it back to disc). Create a new program to display the results. If you're really feeling cocky try doing both receiving and displaying in one program, using a parameter to switch between the two actions (example: $action=param("action"); $action="printform" if ($action ne "submitdata"); ).
  • Think about more complex versions of the counter and guestbook where you make sure the file isn't being accessed by two copies of the program at the same time - can you use locking (either a command, or by saving a temporary file to disc) to make sure any copies of the program know another copy is using it? Can you use the sleep command to try again in a little while? In a while loop so you can try five times and then quit? Use a parameter so more than one website can use your script? Use something like the GD library to create a graphical counter?
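If you get stuck on the locking question, here is one possible starting point rather than the definitive answer. It assumes a Unix-style server where Perl's flock command works, and bump_counter and the file name counter.txt are made up for this sketch:

```perl
#!/usr/bin/perl
# Sketch only: bump_counter and counter.txt are invented for this example,
# and flock is assumed to be available on the server.
use Fcntl qw(:DEFAULT :flock);   # provides O_RDWR, O_CREAT and LOCK_EX

sub bump_counter {
  my $file = $_[0];
  # open for reading and writing, creating the file if it isn't there
  sysopen(COUNT, $file, O_RDWR|O_CREAT) or die("Can't open $file\n");
  # wait here until no other copy of the program holds the lock
  flock(COUNT, LOCK_EX) or die("Can't lock $file\n");
  my $line = <COUNT>;
  my $counter = 0;
  # use the saved number if there is one, otherwise start from zero
  $counter = $1 if (defined($line) && $line =~ /^(\d+)/);
  $counter++;
  # go back to the start, empty the file and save the new value
  seek(COUNT, 0, 0);
  truncate(COUNT, 0);
  print COUNT "$counter\n";
  close(COUNT);   # closing the file also releases the lock
  return $counter;
}

$visits = bump_counter("counter.txt");
print "Content-type: text/html\n\n";
print "<html><body>You are visitor number $visits</body></html>\n";
```

This version simply waits for the lock. Trying a few times with sleep in between, then giving up, is left as part of the exercise.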

Further reading

O'Reilly are the experts in printing quality programming reference books, and the three books most people should take a look at are Learning Perl (a beginner's guide), Programming Perl (if you're confident enough to go straight for a "proper" programming book or have spent a while learning Perl and want to move on) and Perl in a Nutshell (a reference of all the commands in Perl and the most commonly installed modules).

The Perl Cookbook has lots of recipes, sorry, useful example code, but if money's tight or you're not sure you want to make that level of commitment then there are some good websites you might want to take a look at first. http://www.perl.com/ is the obvious contender - which also happens to be an O'Reilly production but for free (I really should have asked for commission from O'Reilly, shouldn't I? :) The Perl Monks (http://www.perlmonks.org/) is a home for all sorts of Perl-related stuff from code tips and examples to writing Perl poetry(!). And finally the Webmonkey over at Wired has all sorts of good things at http://hotwired.lycos.com/webmonkey/programming/perl_cgi/.

Richard Goodwin
