CIS 2.55 - Notes 0004

As far as data structures are concerned, Perl has a huge chunk of them built in, and whatever is not built in, can easily be built up from the parts that are built in. We will start our talk on plain lists, etc., and finish up with ?????.

Scalars

Scalars are used a lot when dealing with data structures. All references are scalars. Scalars is also where the data of our data structures is stored (when we store a list, the list is an array of scalars).

One thing to remember is that scalars are not arrays. Even when a scalar is storing a string, it is not an array of characters. It is simply a single scalar value.

Lists

Lists are basically arrays (and arrays are basically lists). You can declare an array simply by specifying it, for example (from notes 2):

@names = ('john','jane','bob','bill');

Perl provides functions to access such a list, and depending on the functions, you can think of it as different data structures.

Stacks

These are FILO data structures.

Well, lists are stacks. You just access them from one end. That's just about it. Perl provides the push and pop functions. Here's the description of push right from perlfunc man page:

push ARRAY,LIST

Treats ARRAY as a stack, and pushes the values of LIST onto the end of ARRAY. The length of ARRAY increases by the length of LIST. Has the same effect as

for $value (LIST) {
    $ARRAY[++$#ARRAY] = $value;
}

but is more efficient. Returns the new number of elements in the array.

Now, don't let that push ARRAY,LIST throw you off. You can still push scalars without any problems. The LIST there is just to make the function more general (ie: you can push many things at a time).

There is also the pop (also from perlfunc man page):

pop ARRAY pop

Pops and returns the last value of the array, shortening the array by one element. Has an effect similar to

$ARRAY[$#ARRAY--]

If there are no elements in the array, returns the undefined value (although this may happen at other times as well). If ARRAY is omitted, pops the @ARGV array in the main program, and the @_ array in subroutines, just like shift.

That's about it for stacks.

Lists

Lists are generally stacks with a few extra operations on other ends of the stack. So, while with push and pop we work with the end of the list, there is also the shift and unshift that allow us to work with the beginning of the list.

Again, perlfunc documentation coming up:

shift ARRAY shift

Shifts the first value of the array off and returns it, shortening the array by 1 and moving everything down. If there are no elements in the array, returns the undefined value. If ARRAY is omitted, shifts the @_ array within the lexical scope of subroutines and formats, and the @ARGV array at file scopes or within the lexical scopes established by the eval '', BEGIN {}, INIT {}, CHECK {}, and END {} constructs.

See also unshift, push, and pop. shift and unshift do the same thing to the left end of an array that pop and push do to the right end.

And for unshift:

unshift ARRAY,LIST

Does the opposite of a shift. Or the opposite of a push, depending on how you look at it. Prepends list to the front of the array, and returns the new number of elements in the array.

unshift(ARGV, '-e') unless $ARGV[0] =~ /^-/;

Note the LIST is prepended whole, not one element at a time, so the prepended elements stay in the same order. Use reverse to do the reverse.

Well, with these operations, we can manipulate both ends of the list; and we can always access the middle as a regular array. Flexible, isn't it?

There are also other operations, like splice that we won't go into, but you should still look them up and learn them on your own (they're very useful).

Queues

These are FIFO data structures.

It shouldn't even have to be mentioned that you can easily implement a queue in Perl simply by combining the push/pop with shift/unshift. You push from one end, and shift, from another, or... you unshift from one end, and pop from another.

Hashes

Associative arrays are ones that store a name associated with a value. Just as with arrays, there are a few built in operations that we won't really go into. The primary one that you'll be using on a regular basis is: keys, which simply returns the list of key values from a hash.

Again, lets take the keys description right from perlfunc man pages:

keys HASH

Returns a list consisting of all the keys of the named hash. (In scalar context, returns the number of keys.) The keys are returned in an apparently random order. The actual random order is subject to change in future versions of perl, but it is guaranteed to be the same order as either the values or each function produces (given that the hash has not been modified). As a side effect, it resets HASH's iterator.

Here is yet another way to print your environment:

    @keys = keys %ENV;
    @values = values %ENV;
    while (@keys) {
        print pop(@keys), '=', pop(@values), "\n";
    }

    or how about sorted by key:

    foreach $key (sort(keys %ENV)) {
        print $key, '=', $ENV{$key}, "\n";
    }

The returned values are copies of the original keys in the hash, so modifying them will not affect the original hash. Compare values.

To sort a hash by value, you'll need to use a sort function. Here's a descending numeric sort of a hash by its values:

    foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
        printf "%4d %s\n", $hash{$key}, $key;
    }

As an lvalue keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big. (This is similar to pre-extending an array by assigning a larger number to $#array.) If you say

keys %hash = 200;

then %hash will have at least 200 buckets allocated for it--256 of them, in fact, since it rounds up to the next power of two. These buckets will be retained even if you do %hash = (), use undef %hash if you want to free the storage while %hash is still in scope. You can't shrink the number of buckets allocated for the hash using keys in this way (but you needn't worry about doing this by accident, as trying has no effect).

Slices

There are also more interesting ways of accessing (and manipulating) arrays than just 1 element at a time. We can grab (or set) several values at once! Take this list as an example:

@names = (bill, john, jane, wall, tom);

Now, to display the 1st, 3rd, and 4th elements in the array, we could do something like this:

for(@names[1,3,4]){
   print $_ . "\n";
}

Notice how @names[1,3,4] picked out the exact values we needed. We can do a similar thing with hashes, take this hash as an example:

%names = (
   bill => 23,
   john => 45,
   jane => 23,
   wall => 34,
   tom => 18
);

The code:

for(@names{john, jane, tom}){
   print $_ . "\n";
}

accesses just the john, jane, and tom 's age.

?????

Now we are prepared to construct other data structures from the basic ones defined so far. For example, how do we create a list of lists?

List of Lists

We define it as:

@listoflists = (
   [ 'hi', 'hello', 'howdy' ],
   [ 'bye', 'good bye', 'l8r']
);

Notice that we're using [ and ] to declare the inner list. This is because [] defines a list reference, so in a sense, we're creating a list of references, which are references to lists.

To access it, we can simply:

$listoflists[1][2]

For example, what will the following code display?

print $listoflists[1][2] . "\n";

List of Hashes

Well, we define a list of hashes similarly to how we define a list of lists:

@listofhashes = (
   {  name => john,
      age => 23
   },
   {
      name => jane,
      age => 22
   }
);

Now, guess what the following code will display?

print $listofhashes[1]{name} . " " . $listofhashes[1]{age} . "\n";

Hash of Lists

Here's how we declare a hash of lists:

%hashoflists = (
   names => [ john, jane, bill ],
   ages => [ 23, 22, 32 ]
);

Guess the output of:

print $hashoflists{names}[2] . "\n";

Hash of Hashes

Well, the really freaky data structure is a hash of hashes. In Perl, it's actually quite easy to setup:

%hashofhashes = (
   names => {
      john => 23,
      jane => 22,
      bill => 32
   },
   ages => {
      23 => john,
      22 => jane,
      32 => bill
   }
);

And as before, try to figure out what this code does:

print $hashofhashes{names}{jane} . "\n";