Recently I needed to pick up Perl for some of my projects. After going through some websites and books, I started to see why some tasks are much easier in Perl than in Java, the language I am most familiar with. Perl is loosely typed, does not enforce OO, has no explicit subroutine signatures, and its syntax is full of symbols, so programmers who take that freedom and power without following best practices can write very illegible code. It runs, but it can be very hard to maintain. For those who feel the pain, you may like this post. However, I don't want to go to extremes on this. I do see Perl's power in text processing and system integration, and after taking a look at the extensive CPAN library, I decided to give it a fair trial. Not only that, I will write a series of posts to help shorten your learning curve as well. In this post I will wrap my head around Perl data types and their usage.
Data Type
Number and String
use Math::BigInt;
# numeric
$var = 123;
$var = 123.34; # float or double
# very large unsigned integer, like an MD5 value (the hex prefix is 0x, with a zero)
$bignum = Math::BigInt->new("0x1821231223238234234");
# string - single quote: no interpolation; double quote: variables are interpolated at runtime
$myVar = 'abc';
$myVar = "product price is $price";
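To make the big-number case concrete, here is a minimal sketch using the core Math::BigInt module. The sample values are made up for illustration; the point is only that hex strings need the 0x prefix and that arithmetic stays exact beyond native integer range.

```perl
use strict;
use warnings;
use Math::BigInt;   # core module; must be loaded before calling new()

# A hex string needs a leading "0x" (zero, not the letter O) to be parsed as hex.
my $small = Math::BigInt->new("0xff");
print $small->bstr(), "\n";          # prints 255

# Arithmetic stays exact beyond the range of native integers.
my $big = Math::BigInt->new("123456789012345678901234567890");
my $sum = $big->copy()->badd(1);     # copy() first, because badd() modifies in place
print $sum->bstr(), "\n";            # prints 123456789012345678901234567891
```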
Array
@myarray = (1,2,4, 'abc', "abc", $myvar);
print $myarray[0]; #1 is shown
@coins = ("Quarter","Dime","Nickel"); # quotation could be hassle!
@coins = qw(Quarter Dime Nickel); # remove the quotation headache easily
# To get the number of elements in an array, you can use the scalar() function
# or assign the array to a scalar variable (which puts it in scalar context).
$coinlength = @coins;
print scalar(@coins);
print $coinlength;
# arrays in Perl can grow and shrink dynamically
push(@coins, "Penny"); #add element to the end of the array
unshift(@coins, "Dollar"); #add element to the front of the array
pop(@coins); #remove last element of the array
shift(@coins); #remove first element of the array
splice(@coins, 1, 1); #remove the element at index 1 (delete on an array element would only leave an undef hole)
@names = split(',',$namelist); #convert csv line in string to array
$namelist = join(",",@names); #reconstruct via joining them back
@Foods = qw(Pizza Steak chicken burgers);
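@foods = sort(@Foods); #sort by character code, so uppercase letters come before lowercase
# transform to lowercase first, since mixed case can mess up the order
@foods = (); #start from an empty array so the entries are not duplicated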
foreach $food (@Foods) {
push(@foods, "\L$food");
}
# sort it
@foods = sort(@foods);
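Lowercasing before sorting changes the stored strings. As a sketch of an alternative, assuming you want to keep the original spelling, you can pass a comparison block to sort and lowercase only inside the comparison (the variable names @by_code and @by_letter are mine):

```perl
use strict;
use warnings;

my @Foods = qw(Pizza Steak chicken burgers);

# Default sort compares character codes, so the capitalized words come first.
my @by_code = sort @Foods;                          # Pizza Steak burgers chicken

# Compare lowercased copies, but keep the original spelling in the result.
my @by_letter = sort { lc($a) cmp lc($b) } @Foods;  # burgers chicken Pizza Steak

print "@by_code\n";
print "@by_letter\n";
```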
my $count = 0;
for (@list) {
$count++ if $_ eq "apple"; # $_ is assigned the current element of the loop
}
# slice the array
my @stuff = qw/everybody wants a rock/;
my @rock = @stuff[1 .. $#stuff]; # @rock is qw/wants a rock/
my @want = @stuff[ 0 .. 1]; # @want is qw/everybody wants/
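Slices copy elements out; to remove elements from the middle of an array, the built-in splice is the usual tool. A small sketch with made-up data:

```perl
use strict;
use warnings;

my @coins = qw(Dollar Quarter Dime Nickel Penny);

# splice(ARRAY, OFFSET, LENGTH) removes LENGTH elements starting at OFFSET
# and returns them; the remaining elements shift down to close the gap.
my @removed = splice(@coins, 1, 2);      # removes Quarter and Dime

print "removed: @removed\n";             # removed: Quarter Dime
print "left:    @coins\n";               # left:    Dollar Nickel Penny
print "count:   ", scalar(@coins), "\n"; # count:   3
print "last:    $coins[-1]\n";           # a negative index counts from the end: Penny
```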
Hash
A hash, also called an associative array, is a collection of key-value pairs in which the keys are unique. It is like Map in Java.
%myhash = ('key1' => 'abc', 'key2' => 5, 'key3' => $myvar);
# a list with an even number of elements can be converted into a hash:
# element[i] => element[i+1] where i is 0, 2, 4, etc.
%coins = ( "Quarter", 25, "Dime", 10, "Nickel", 5 );
$myKey = "Dime";
$coins{$myKey} = 15; # assign using variable
print $coins{"Dime"}; # output is 15;
print $coins{Dime}; # for a simple key you can drop the quotes
foreach $key (sort keys %coins) {
print "$key: $coins{$key} \n";
}
$coins{HalfDollar} = .50; #add new element
delete($coins{HalfDollar}); #delete an element
if ( exists $hash{key} ){
#retrieve value here
}
# slice a hash
my %table = qw/schmoe joe smith john simpson bart/;
my @friends = @table{'schmoe', 'smith'}; # @friends has qw/joe john/
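One detail that is easy to trip over is the difference between a key that is missing and a key whose value is undef. The sketch below uses a made-up Token entry to show exists versus defined, and then sorts the keys by value:

```perl
use strict;
use warnings;

my %coins = (Quarter => 25, Dime => 10, Nickel => 5, Token => undef);

# exists asks "is the key present?"; defined asks "is the value not undef?"
print "Token exists\n"       if exists $coins{Token};
print "Token has no value\n" unless defined $coins{Token};
print "Penny is absent\n"    unless exists $coins{Penny};

# Sort keys by their numeric value, highest first, skipping undefined values.
my @by_value = sort { $coins{$b} <=> $coins{$a} }
               grep { defined $coins{$_} } keys %coins;
print "@by_value\n";   # Quarter Dime Nickel
```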
Subroutine
Parameters passed to a subroutine are stored in @_. You can also create a reference to a subroutine and pass it around like a function pointer.
sub my_sub{
my($msg) = @_;
print "this is my own message: $msg";
}
&my_sub(); #call with the optional & in front
my_sub('hello world'); #call without & in front; any number of arguments can be passed.
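Since everything arrives through @_, it helps to see how several arguments are unpacked and what happens when arrays are passed directly. A minimal sketch (total_price and count_args are hypothetical names of my own):

```perl
use strict;
use warnings;

# Unpack named parameters from @_ and return a value.
sub total_price {
    my ($price, $quantity) = @_;
    return $price * $quantity;
}

# Arrays passed directly are flattened into one list in @_.
sub count_args {
    return scalar(@_);
}

my @a = (1, 2);
my @b = (3, 4, 5);

print total_price(2.5, 4), "\n";   # 10
print count_args(@a, @b), "\n";    # 5: the boundary between @a and @b is lost
```

The flattening in the second call is the practical reason references matter: without them a subroutine cannot tell where one array ends and the next begins.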
Reference and Dereference
So far so good? Now I am going to show you how to pass by reference rather than by value. The reason to pass by reference is that it is more memory efficient, since a large array or hash does not have to be copied. However, to do that, I have to take you on a trip through some messy syntax. It gave me a hard time before, and I hope I can make it clear so you will not suffer the pain I had. OK, let's start! A reference is a scalar that refers to the data stored in another variable of any type, or to a subroutine or method. It gives you the ability to pass a large variable to a function by reference instead of by value.
Reference
$varRef = \$myvar; #$varRef now holds a reference to $myvar (think of it as its address)
$arrayRef = \@myarray;
$subRef = \&my_sub;
$arrayRef = [1, 2, 3]; #arrayRef is pointing to the address of an anonymous array
%hash = ('key1'=> 'var1', 'key2' => 'var2'); #regular hash
$rhash = \%hash; #create reference to the hash
$href = {'key1'=> 'var1', 'key2' => 'var2'}; #reference to anonymous hash
Dereference
To access the content of a reference, there are special ways to do that:
my $name = "Test Users";
my $rname = \$name;
sub my_func{
my ($ref) = @_;
print "Scalar ref value is $$ref"; #dereference
}
my_func($rname);
my_func(\$name);
# an array reference can be dereferenced in 2 ways
$arrayRef->[1];
$$arrayRef[1];
@array = @$arrayRef; #the whole array available again for regular usage
print $hash{key1}; # Ordinary hash lookup
print $$rhash{key1}; # hash replaced by $rhash
print $rhash->{key1}; # hash replaced by $rhash
# if the dereference shorthand gets confusing, wrap the reference in curly
# brackets to get back the actual value:
@{$array_reference}
%{$hash_reference}
${$scalar_reference}
print Dumper($rhash); # dump the whole hash (needs "use Data::Dumper;"); %$rhash is also valid syntax without the brackets
for(keys %{$rhash}) # for more clarity, enclose it in curly brackets.
{
...
}
# map of anonymous subroutines
my %hash = (add => sub {my ($var1) = @_; print "Add $var1\n"},
subtract => sub {my ($var1) = @_; print "Subtract $var1\n"} );
# look up a subroutine reference by key and invoke it with -> to pass parameter(s).
$hash{add}->(5);
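Putting the pieces together, here is a sketch of passing an array and a hash to subroutines by reference. The subroutine names longer_of and add_tax and the sample data are hypothetical, chosen only to illustrate the syntax.

```perl
use strict;
use warnings;

# Receive two array references, so each array keeps its identity.
sub longer_of {
    my ($first, $second) = @_;
    return @$first >= @$second ? $first : $second;
}

# Modify the caller's hash through its reference.
sub add_tax {
    my ($prices, $rate) = @_;
    $prices->{$_} *= (1 + $rate) for keys %$prices;
    return;
}

my @a = (1, 2);
my @b = (3, 4, 5);
my $longer = longer_of(\@a, \@b);
print "longer has ", scalar(@$longer), " elements\n";   # 3

my %prices = (tea => 10, cake => 20);
add_tax(\%prices, 0.5);
print "tea is now $prices{tea}\n";    # 15
```

Note that add_tax changes the caller's %prices, which is the other side of passing by reference: no copy is made, so changes are visible outside the subroutine.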
Confused! Give me a chart!!
I hope that you are still OK. Sometimes you see @varArray in code and it could mean the array itself or its size, depending on the context. In scalar context, @varArray evaluates to the length of the array; otherwise it means the array itself. To avoid confusion, you can always use scalar(@varArray) to get the size of the array. Some people use $#varArray+1 for the length instead. $#varArray is the last index of the array, and since array indexes start from 0, you add 1 to get the actual size. Below is a chart that I find quite helpful when I get lost in Perl syntax.
=========================================================================================================================
Expression      Context  Variable            Evaluates to
=========================================================================================================================
$scalar         scalar   $scalar, a scalar   the value held in $scalar
@array          list     @array, an array    the list of values (in order) held in @array
@array          scalar   @array, an array    the total number of elements in @array (same as $#array + 1)
$array[$x]      scalar   @array, an array    the ($x+1)th element of @array
$#array         scalar   @array, an array    the subscript of the last element in @array (same as scalar(@array) - 1)
@array[$x, $y]  list     @array, an array    a slice, listing two elements from @array (same as ($array[$x], $array[$y]))
"$scalar"       scalar   $scalar, a scalar   a string containing the contents of $scalar
"@array"        scalar   @array, an array    a string containing the elements of @array, separated by spaces
%hash           list     %hash, a hash       a list of alternating keys and values from %hash
$hash{$x}       scalar   %hash, a hash       the element from %hash with the key of $x
@hash{$x, $y}   list     %hash, a hash       a slice, listing two elements from %hash (same as ($hash{$x}, $hash{$y}))
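A few rows of the chart can be checked directly. This is a small sketch with a made-up four-element array:

```perl
use strict;
use warnings;

my @array = qw(a b c d);

my $count    = @array;         # scalar context: number of elements
my ($first)  = @array;         # list context (note the parentheses): first element
my $last_idx = $#array;        # index of the last element
my @slice    = @array[1, 2];   # slice: a list of two elements
my $joined   = "@array";       # interpolation joins the elements with spaces

print "$count $first $last_idx @slice | $joined\n";   # 4 a 3 b c | a b c d
```

The only difference between the first two assignments is the parentheses on the left-hand side, which switch the assignment from scalar to list context.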
Real Life Usage
You have now gone through the most common Perl syntax. If you are still reading this, congratulations! You can move a step further and see how it applies to some real-life examples. I am going to show how people use hashes, because I find that more fun.
#--------- counting ---------
How do you generate a histogram of terms from a text file?
(1) Convert the text to a list of terms (you could strip punctuation, normalize case, and drop stop words as you wish).
(2) Walk through the list and build a map from each term to its count. How? In Java, I would loop through the list and check whether the term is already in the map. If not, put it there with count=1. If yes, pull the current count of the term and increment it. In Perl, the syntax is much easier:
my %histogram;
$histogram{$_}++ for @list; # loop over the list; each element is assigned to $_ by default
$unique = keys %histogram; # obtain the number of unique terms
@unique = keys %histogram; # obtain the list of the keys
@popular = (sort { $histogram{$b} <=> $histogram{$a} } @unique)[0..4]; # obtain top 5 based on the count
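Here is an end-to-end sketch of both steps on a made-up sentence. It assumes terms are separated by non-word characters; a real tokenizer and stop-word list would be more involved.

```perl
use strict;
use warnings;

my $text = "the cat and the hat and the bat";   # sample input, made up for illustration

# Step 1: normalize case and split on anything that is not a word character.
my @list = grep { length } split /\W+/, lc $text;

# Step 2: count each term; a missing key starts as undef, which ++ treats as 0.
my %histogram;
$histogram{$_}++ for @list;

# Sort by count (descending), breaking ties alphabetically for a stable result.
my @ranked = sort { $histogram{$b} <=> $histogram{$a} or $a cmp $b } keys %histogram;

print "$_: $histogram{$_}\n" for @ranked;
```

The tie-breaker on the term itself is my own addition; without it, terms with equal counts come out in whatever order the hash happens to return them.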
#--------- searching ---------
my $index;
for ($index = 0; $index < @chambers; $index++) { # linear search; a C-style loop keeps $index after the loop ends
last if $chambers[$index] == $bullet; # exit the loop once the condition holds
}
}
print "Found at index $index" if $index < @chambers;
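The same linear search can also be written with first from the core List::Util module, which avoids managing the loop variable by hand. A sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util qw(first);   # core module

my @chambers = (7, 3, 9, 4);   # sample data, made up for illustration
my $bullet   = 9;

# first() returns the first index for which the block is true, or undef if none matches.
my $index = first { $chambers[$_] == $bullet } 0 .. $#chambers;

print defined($index) ? "Found at index $index\n" : "Not found\n";   # Found at index 2
```

Checking with defined matters here, because a match at index 0 is a valid result but is false in a plain boolean test.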
NOTICE: many Perl statements can be controlled by a trailing condition (a statement modifier). If the condition is false, the statement is not executed.
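A short sketch of the common statement modifiers, with a made-up @log array to record which statements actually ran:

```perl
use strict;
use warnings;

my @log;
my $debug = 0;

push @log, "debug on"  if $debug;       # skipped: $debug is false
push @log, "debug off" unless $debug;   # runs: unless means "if not"
push @log, "item $_"   for 1 .. 2;      # runs once per element, with $_ set each time

print join(", ", @log), "\n";           # debug off, item 1, item 2
```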
#---------- dispatch table -----------
# Suppose you have a script that does several related things: it manages your to-do list by adding, editing, listing, and deleting to-do items:
>> todo add "Email Samuel about photos"
Output: Todo item 129 created
>> todo done 129
Output: Item 129 marked as done
# You might expect the script to look like:
my $command = shift @ARGV;
if ($command eq "add") { add(@ARGV) }
elsif ($command eq "list") { list(@ARGV) }
elsif ($command eq "done") { done(@ARGV) }
elsif ($command eq "edit") { edit(@ARGV) }
...
else { die "Unknown command: $command" }
# That is quite tedious. Since Perl has no traditional 'switch' statement, you could use a hash to deal with this:
%commands = (
add => \&add,
list => \&list,
edit => \&edit,
done => \&done,
);
my $action = shift @ARGV;
if (!exists $commands{$action}) { die "Unknown command: $action" }
$commands{$action}->(@ARGV); #dereference the subroutine and use -> for argument passing
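To see the dispatch table run without a real to-do script behind it, here is a self-contained sketch. The handlers, the in-memory @todo list, and the run_command helper are all hypothetical stand-ins of my own, not the original script.

```perl
use strict;
use warnings;

my @todo;   # in-memory list, a stand-in for real storage

my %commands = (
    add  => sub { push @todo, "@_"; return "Todo item " . scalar(@todo) . " created" },
    list => sub { return join(", ", @todo) },
);

# Look up the handler by name and call it with the remaining arguments.
sub run_command {
    my ($action, @args) = @_;
    my $handler = $commands{$action}
        or return "Unknown command: $action";
    return $handler->(@args);
}

print run_command("add", "Email Samuel about photos"), "\n";   # Todo item 1 created
print run_command("list"), "\n";                               # Email Samuel about photos
print run_command("explode"), "\n";                            # Unknown command: explode
```

Adding a new command is then a one-line change to %commands, with no extra branch in the control flow.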
Conclusion
Thanks for reading such a long post. Buy yourself a coffee, because you are now equipped with an essential skill for writing Perl. You may wonder why I haven't shown you how to set up your environment and write your first Hello World program in Perl. I didn't, because there are tons of articles showing how to do that. I also skipped some basics like conditional control, packaging, library usage, and exception handling, because I find their syntax quite easy to grasp. In the next article, I am planning to cover the power of text processing, with a deep dive into regular expressions. Stay tuned! There will be more symbols to get familiar with. By the way, if you see some weird syntax related to data structures, leave a comment on my blog so we can learn from each other.
