Cultured Perl: One-liners 101
Perl as a command-line utility
Those who use Perl as a programming language frequently forget that it is just as useful as a quick and dirty scripting engine for command-line operations. From the command line Perl can accomplish, in just a single line, tasks that require pages of code in most other languages. Join Teodor as he takes you through some useful examples.

In order to complete this how-to, you'll need to have Perl 5.6.0 installed on your system. Preferably, your system should be a recent (2000 or later) Linux or Unix installation, but other operating systems may work as well. The examples all use the tcsh shell (though bash and others will work too). Although these examples may work with earlier versions of Perl, Linux, and other operating systems, if they fail, their failure to function should be considered an exercise for the reader to solve.

The first point I'd like to make is that quick and dirty solutions shouldn't be shunned by the experienced programmer. In other columns I have emphasized documentation and thoroughness. This column will concentrate on the dark side of programming, where documentation is optional and caffeine isn't. We've all been there.

The second point, just as important as the first, is that quick and dirty solutions are hard to do right. If you know how to document, test, and debug a complete script, you have a much greater chance of succeeding at one-liners. If you don't, this will be like trying to cut down a redwood tree with a herring (your skills being the herring).

For the first step, you should learn your shell's peculiarities, the way that Unix passes command-line arguments to Perl and Perl's interpretation of those arguments.

The essentials of the command line

Processes get command-line arguments. So "perl" and "perl -w" for instance are two different invocations of the same program. Internally, Perl (similar to C) passes arguments to the script it interprets in the @ARGV array.
But unlike C, Perl steals some of those arguments from the script for its own purposes. For instance, the script being interpreted does not see the "-w" parameter to the Perl interpreter, unless the script appears to want it.

The shell separates arguments on space characters. The "-e" argument to Perl tells it to take whatever follows the "-e" on the command line and run it as a script. The "-M" argument says to take whatever follows and import it as a module, like a "use ModuleName" in a regular script. See the perldoc perlrun page for further information on the switches that Perl has to offer from the command line.

Perhaps some examples would be best at this point. In the spirit of this column, let's use one-liners. The -MData::Dumper -e'print Dumper \@ARGV' part of the script simply prints out the contents of the @ARGV array.

Listing 1. Command-line arguments
You can pass as many arguments as you want to Perl, unless your shell limits their number or length. Reading from the magical filehandle <> in Perl opens every argument passed to Perl as a filename and reads the contents of each file line by line. The $_ variable will hold each line, by default.

Shells make everything between quotes a single argument. That's why in Listing 1 we could say -e'print Dumper \@ARGV' and Perl saw it as a single one-liner script. Single quotes are better, because then you can use double quotes inside the one-liner. Double quotes in Perl interpolate variables and escape sequences between them. Perhaps another example will help to illustrate further:

Listing 2. Single vs. double quotes
Things are a little better in bash than tcsh, because bash allows the inside double quotes to be escaped with a \ character. But the shell still interprets $$ inside double quotes before it passes the script to Perl. The bottom line is, don't use double quotes to specify your -e one-line script argument. See perldoc perlrun for more details, but basically you should find out what works on your system and stick with that.

So far you have seen the -e and -M switches in action: import a module, and run a statement. Below I've listed a few other useful switches; the more complex ones have been omitted in the interest of sanity. See perldoc perlrun for the complete list and some usage ideas.

Cleanliness
Data
Execution control
File operations

See Listing 3 for a one-line script that renames files from aaa to bbb. The find . command prints out the list of all files and directories in and under the current directory. Give find the "-type f" parameter if you want only the files. Take the output of find, a list of files, and pass it to the one-liner. The one-line script uses the -ne parameters, which means that it could be rewritten as:

Listing 4. Renaming files from aaa to bbb, decomposed
As you can see, this is a fairly complex seven-line script. The -n switch simplified things. But still, you must know the $_ variable and the s/// and -e operators (see the perldoc perlop page for details).

The File::Find standard Perl module could have been used to do the file find instead of the Unix find command, but then the script would have probably been too large to be a one-liner. One-liners are a delicate balance between usefulness and obfuscation, and you have to be prepared to rewrite them as real scripts if necessary instead of keeping baby Frankenstein programs around.

Here's another example of file processing: look through a directory of MP3 files with a known naming structure and extract the album name. Let's assume that the name of the file is "Artist-Album-Track#-Song.mp3".

Listing 5. Finding album names for Artist-Album-Track#-Song.mp3
This script is very simple. It relies on find's behavior of always printing a "./" before each filename. It then substitutes $_ with only the album name, and the -p switch automatically prints the album names. Finally, sort and uniq in sequence ensure that repeated album names will be printed only once.

All the find, sort and uniq invocations could have been done with Perl, but why bother when the operating system already has those written for us? It is interesting as an exercise, but in practice the one-liner would become 20-30 lines of unnecessary code.

Let's decompose the Perl script (in a simplified fashion, omitting some of the complexities of the -p switch):

Listing 6. Finding album names for Artist-Album-Track#-Song.mp3, decomposed
Again, note how Perl was an intermediate tool between find, sort and uniq. Don't try to write everything in Perl. You can, and sometimes you should, but one-liners are about reuse.

Also, see how simple the regular expression is. Sure, we may get a few aberrant album names if the MP3 files are not named right, but is it worth the effort to perfect that regular expression? If you need to do that much work, you probably should be using a CPAN MP3 ID3 tag module instead of parsing filenames.

Know when one-liners are becoming a nuisance, rather than a tool. This is what I meant earlier when I said that you should know Perl well before starting on the one-liners. Using all your tools in your programming approach will make you a good Perl programmer, and a good programmer altogether.

Data operations
We could use any regular expression instead of "aaa", of course. Note that we use the -p switch to print $_ for every line. That's necessary because the output of the Perl script is what goes inside the file! This means we can do some interesting tricks. For instance:

Listing 8. Inserting line numbers in a file
This script inserts 4-digit line numbers before every line in the file. If you get a headache looking at the syntax, focus on the nearest person and ask them if they know the joke about the two camels in the zoo. They'll hit you over the head with something heavy, distracting you from your headache for a while, after which you can get back to work.

Now for something more involved. We'll use Uri Guttman's excellent File::ReadBackwards module to look through a log file backwards for interesting events. (You have to install File::ReadBackwards from CPAN.) We'll search for the string "sshd" to see all notices from the sshd daemon.

Listing 9. Looking through a file backwards for sshd messages
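A sketch of how such a search might be written for a POSIX shell, where a single-quoted script can span lines. The sample log lines are invented, and the guard only checks that File::ReadBackwards is installed before using it.

```shell
# Made-up three-line log in a scratch file.
log=$(mktemp)
printf '%s\n' 'Jan 1 host sshd[1]: first' \
              'Jan 1 host cron[2]: noise' \
              'Jan 2 host sshd[3]: second' > "$log"

if perl -MFile::ReadBackwards -e1 2>/dev/null; then
    # readline returns the lines last to first, then undef.
    perl -MFile::ReadBackwards \
         -e'$bw = File::ReadBackwards->new(shift) or die "cannot open: $!";
            while (defined($_ = $bw->readline)) { print if /sshd/ }' \
         "$log"
else
    echo 'File::ReadBackwards is not installed' >&2
fi
```

With the module installed, the "second" line prints before the "first" line, and the cron line is skipped.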
The \ characters at the end of each line tell the shell that more is coming; the line is not over yet. This 3-line script is about as large as a one-liner could be before you have to rewrite it as a real script.

The same effect could be achieved with less code by saving all the lines in the file and then printing them backwards, but that's a lot less efficient than File::ReadBackwards, which actually will read the file backwards and stop at newlines. This efficiency is something you could not achieve easily from the command line.

But why stop here? Let's extract all IP addresses mentioned in the sshd log messages.

Listing 10. Looking through a file backwards for IPs in sshd messages
This is getting ugly! We should be moving this into a real script just about now. Note how the regular expression above captures only digits and dots, after "connection from" and a string of non-digits. This is not perfect, but it works just fine in the real world with IPv4 addresses.

You should understand what's needed from your one-liner, and do exactly that. Don't over-engineer a throwaway script. You'll be sorry. On the other hand, know when a script will not be thrown away, and write your code accordingly!

A real-world example

Instead of fixing indexpage.pl myself (a fine exercise, but one I had no time for at 2 AM), I used a one-liner. See Listing 11 for a one-line script that renames JPG files. It was tricky, because I could not use a single quote inside the script. I finally used the ASCII value for a single quote, 39, to put a single quote in the $quote variable, and use it indirectly in the substitution. This printed out a series of "mv" commands that I could examine to make sure I was doing the right thing. Finally, I saved the commands to a file and used the shell "source" command to run every command in that file.

Listing 12 shows the JPG renaming in action. After the renaming, the indexpage.pl script ran fine.

Conclusion

Balance power with legibility. One-liners should be thrown away, like prototypes. Otherwise you will see them again, like a nice poodle wandering off and coming back as Cujo.

A few exceptions to the general usage of one-liners are acceptable. They are throwaways, not pyramids.

You should never run a one-liner outright; always print out what it will do before you actually run the command. You'll save yourself a lot of white hairs.

Use your one-liner skills sparingly. It's better to err on the side of caution when dealing with such wild beasts.

Finally, have a lot of fun. One-liners are the best way to make Perl do the dirty work for you. Look on the Usenet newsgroups and mailing lists dedicated to Perl for ideas and critiques.