Perl language structure: Difference between revisions

From Wikipedia, the free encyclopedia
m WPCleaner v1.31b - Fixed using WP:WCW - HTML named entities without semicolon
m →‎Data types: {{code}}
 
(41 intermediate revisions by 29 users not shown)
Line 1: Line 1:
{{original research|date=July 2017}}
The '''structure of the [[Perl]] programming language''' encompasses both the syntactical rules of the language and the general ways in which programs are organized. Perl's design philosophy is expressed in the commonly cited motto "[[there's more than one way to do it]]". As a [[programming paradigm|multi-paradigm]], dynamically [[type system|typed]] language, Perl allows a great degree of flexibility in program design. Perl also encourages modularization; this has been attributed to the component-based design structure of its Unix roots,<ref>{{cite book | last1 = Orwant | first1 = Jon | title = Games, diversions, and Perl culture: best of the Perl journal | year = 2003 | accessdate = 2011-01-08 | isbn = 978-0-596-00312-8}}</ref> and is responsible for the size of the [[CPAN]] archive, a community-maintained repository of more than 100,000 modules.<ref name="home">{{cite web |title=CPAN front page|url=http://www.cpan.org/|accessdate=2011-12-09}}</ref>
The '''structure of the [[Perl]] programming language''' encompasses both the syntactical rules of the language and the general ways in which programs are organized. Perl's design philosophy is expressed in the commonly cited motto "[[there's more than one way to do it]]". As a [[programming paradigm|multi-paradigm]], dynamically [[type system|typed]] language, Perl allows a great degree of flexibility in program design. Perl also encourages modularization; this has been attributed to the component-based design structure of its Unix roots{{when|date=November 2018}},<ref>{{cite book | last1 = Orwant | first1 = Jon | title = Games, diversions, and Perl culture: best of the Perl journal | year = 2003 | isbn = 978-0-596-00312-8}}</ref> and is responsible for the size of the [[CPAN]] archive, a community-maintained repository of more than 100,000 modules.<ref name="home">{{cite web |title=CPAN front page|url=http://www.cpan.org/|accessdate=2011-12-09}}</ref>


== Basic syntax ==
== Basic syntax ==
In Perl, the minimal [[Hello world]] program may be written as follows:
In Perl, the minimal [["Hello, World!" program|Hello World]] program may be written as follows:
<source lang="perl">
<syntaxhighlight lang="perl">
print "Hello, world!\n"
print "Hello, World!\n"
</syntaxhighlight>
</source>
This [[Input/output|prints]] the [[String (computer science)|string]] ''Hello, world!'' and a [[newline]], symbolically expressed by an <code>n</code> character whose interpretation is altered by the preceding [[escape character]] (a backslash). Since version 5.10, the new 'say' builtin produces the same effect even more simply:
This [[Input/output|prints]] the [[String (computer science)|string]] ''Hello, World!'' and a [[newline]], symbolically expressed by an <code>n</code> character whose interpretation is altered by the preceding [[escape character]] (a backslash). Since version 5.10, the new 'say' builtin<ref>{{cite web|url=http://perldoc.perl.org/feature.html#The-'say'-feature|title=Features|work=Perldoc|publisher=Perl.org|accessdate=24 July 2017}}</ref> produces the same effect even more simply:
<source lang="perl">
<syntaxhighlight lang="perl">
say "Hello, world!"
say "Hello, World!"
</syntaxhighlight>
</source>


An entire Perl program may also be specified as a command-line parameter to Perl, so the same program can also be executed from the command line (example shown for Unix):
An entire Perl program may also be specified as a command-line parameter to Perl, so the same program can also be executed from the command line (example shown for Unix):
<source lang="perl">
<syntaxhighlight lang="perl">
$ perl -e 'print "Hello, world!\n"'
$ perl -e 'print "Hello, World!\n"'
</syntaxhighlight>
</source>


The canonical form of the program is slightly more verbose:
The canonical form of the program is slightly more verbose:


<source lang="perl">
<syntaxhighlight lang="perl">
#!/usr/bin/perl
#!/usr/bin/perl
print "Hello, world!\n";
print "Hello, World!\n";
</syntaxhighlight>
</source>


The hash mark character introduces a [[comment (computer programming)|comment]] in Perl, which runs up to the end of the line of code and is ignored by the compiler. The comment used here is of a special kind: it is called the [[Shebang (Unix)|shebang]] line. This tells Unix-like operating systems to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning <code>perl</code>. (On [[Microsoft Windows]] systems, where Perl programs are typically invoked by associating the <code>.pl</code> [[Filename extension|extension]] with the Perl interpreter, the operating system does not use the shebang line; <code>perl</code> itself nevertheless detects the line and parses it for switches.<ref>{{cite web | url = http://perldoc.perl.org/perlrun.html | title = perlrun | accessdate = 2011-01-08 | publisher = perldoc.perl.org - Official documentation for the Perl programming language}}</ref>)
The hash mark character introduces a [[comment (computer programming)|comment]] in Perl, which runs up to the end of the line of code and is ignored by the compiler. The comment used here is of a special kind: it is called the [[Shebang (Unix)|shebang]] line. This tells Unix-like operating systems to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning <code>perl</code>. (On [[Microsoft Windows]] systems, where Perl programs are typically invoked by associating the <code>.pl</code> [[Filename extension|extension]] with the Perl interpreter, the operating system does not use the shebang line; <code>perl</code> itself nevertheless detects the line and parses it for switches.<ref>{{cite web | url = http://perldoc.perl.org/perlrun.html | title = perlrun | accessdate = 2011-01-08 | publisher = perldoc.perl.org - Official documentation for the Perl programming language}}</ref>)


The second line in the canonical form includes a semicolon, which is used to separate statements in Perl. With only a single statement in a block or file, a separator is unnecessary, so it can be omitted from the minimal form of the program—or more generally from the final statement in any block or file. The canonical form includes it because it is common to terminate every statement even when it is unnecessary to do so, as this makes editing easier: code can be added to, or moved away from, the end of a block or file without having to adjust semicolons.
The second line in the canonical form includes a semicolon, which is used to separate statements in Perl. With only a single statement in a block or file, a separator is unnecessary, so it can be omitted from the minimal form of the program—or more generally from the final statement in any block or file. The canonical form includes it, because it is common to terminate every statement even when it is unnecessary to do so, as this makes editing easier: code can be added to, or moved away from, the end of a block or file without having to adjust semicolons.


Version 5.10 of Perl introduces a <code>say</code> function that implicitly appends a newline character to its output, making the minimal "Hello world" program even shorter:
Version 5.10 of Perl introduces a <code>say</code> function that implicitly appends a newline character to its output, making the minimal "Hello World" program even shorter:


<source lang="perl">
<syntaxhighlight lang="perl">
use 5.010; # must be present to import the new 5.10 functions, notice that it is 5.010 not 5.10
use 5.010; # must be present to import the new 5.10 functions, notice that it is 5.010 not 5.10
say 'Hello, world!'
say 'Hello, World!'
</syntaxhighlight>
</source>


==Data types==
==Data types==
Line 39: Line 40:


{| class="wikitable"
{| class="wikitable"
|-
|-
! Type
! Type
! Sigil
! Sigil
Line 46: Line 47:
|-
|-
|[[Scalar (computing)|Scalar]]
|[[Scalar (computing)|Scalar]]
|$
|{{tt|$}}
|$foo
|{{code|$foo}}
|A single value; it may be a number, a [[String (computer science)|string]], a filehandle, or a [[Reference (computer science)|reference]].
|A single value; it may be a number, a [[String (computer science)|string]], a filehandle, or a [[Reference (computer science)|reference]].
|-
|-
|[[Array data type|Array]]
|[[Array data type|Array]]
|@
|{{tt|@}}
|@foo
|{{code|@foo}}
|An ordered collection of scalars.
|An ordered collection of scalars.
|-
|-
|[[Associative array|Hash]]
|[[Associative array|Hash]]
|%
|{{tt|%}}
|%foo
|{{code|%foo}}
|A map from strings to scalars; the strings are called ''keys'', and the scalars are called ''values''. Also known as an ''associative array''.
|A map from strings to scalars; the strings are called ''keys'', and the scalars are called ''values''. Also known as an ''associative array''.
|-
|-
|[[Filehandle]]
|[[File handle]]
|none
|{{CNone|none}}
|$foo or FOO
|{{code|$foo}} or {{code|FOO}}
|An opaque representation of an open file or other target for reading, writing, or both.
|An opaque representation of an open file or other target for reading, writing, or both.
|-
|-
|[[Subroutine]]
|[[Subroutine]]
|&
|{{tt|&}}
|&foo
|{{code|&foo}}
|A piece of code that may be passed arguments, be executed, and return data.
|A piece of code that may be passed arguments, be executed, and return data.
|-
|-
|[[Perl#Typeglob_values|Typeglob]]
|[[Perl language structure#Typeglob values|Typeglob]]
|*
|{{tt|*}}
|*foo
|{{code|*foo}}
|The symbol table entry for all types with the name 'foo'.
|The [[symbol table]] entry for all types with the name 'foo'.
|}
|}


===Scalar values===
===Scalar values===


String values (literals) must be enclosed by quotes. Enclosing a string in double quotes allows the values of variables whose names appear in the string to automatically replace the variable name (or be '''[[Variable_interpolation#Interpolation|interpolated]]''') in the string. Enclosing a string in single quotes prevents variable interpolation.
String values (literals) must be enclosed by quotes. Enclosing a string in double quotes allows the values of variables whose names appear in the string to automatically replace the variable name (or be '''[[Variable interpolation#Interpolation|interpolated]]''') in the string. Enclosing a string in single quotes prevents variable interpolation.


For example, if <code>$name</code> is <code>"Jim"</code>:
For example, if <code>$name</code> is <code>"Jim"</code>:

<ul>
<li>then <code>print("My name is $name")</code> will print <code>"My name is Jim"</code> (interpolation within double quotes),</li>
*then <code>print("My name is $name")</code> will print <code>"My name is Jim"</code> (interpolation within double quotes),
<li>but <code>print('My name is $name')</code> will print <code>"My name is $name"</code> (no interpolation within single quotes).</li></ul>
*but <code>print('My name is $name')</code> will print <code>"My name is $name"</code> (no interpolation within single quotes).


To include a double quotation mark in a string, precede it with a backslash or enclose the string in single quotes. To include a single quotation mark, precede it with a backslash or enclose the string in double quotes.
To include a double quotation mark in a string, precede it with a backslash or enclose the string in single quotes. To include a single quotation mark, precede it with a backslash or enclose the string in double quotes.


Strings can also be quoted with the <code>q</code> and <code>qq</code> quote-like operators:
Strings can also be quoted with the <code>q</code> and <code>qq</code> quote-like operators:
<ul><li><code>'this'</code> and <code>q(this)</code> are identical,</li>
*<code>'this'</code> and <code>q(this)</code> are identical,
<li><code>"$this"</code> and <code>qq($this)</code> are identical.</li></ul>
*<code>"$this"</code> and <code>qq($this)</code> are identical.
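
The quoting rules above can be illustrated with a minimal sketch. The variable names and sample text are hypothetical and were chosen only for this example:

<syntaxhighlight lang="perl">
use strict;
use warnings;

my $name = 'Jim';                     # hypothetical value for illustration

my $double = "My name is $name";      # double quotes: $name is interpolated
my $single = 'My name is $name';      # single quotes: no interpolation
my $qq     = qq(My name is $name);    # qq() behaves like double quotes
my $q      = q(My name is $name);     # q() behaves like single quotes

# Two ways to embed a double quotation mark in a string
my $escaped = "She said \"hi\"";      # backslash escape inside double quotes
my $wrapped = 'She said "hi"';        # single quotes around the whole string

print "$double\n";                    # prints: My name is Jim
print "$single\n";                    # prints: My name is $name
</syntaxhighlight>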


Finally, multiline strings can be defined using [[here document]]s:
Finally, multiline strings can be defined using [[here document]]s:


<source lang="perl">
<syntaxhighlight lang="perl">
$multilined_string = <<EOF;
$multilined_string = <<EOF;
This is my multilined string
This is my multilined string
note that I am terminating it with the word "EOF".
note that I am terminating it with the word "EOF".
EOF
EOF
</syntaxhighlight>
</source>


Numbers (numeric constants) do not require quotation. Perl will convert numbers into strings and vice versa depending on the context in which they are used. When strings are converted into numbers, trailing non-numeric parts of the strings are discarded. If no leading part of a string is numeric, the string will be converted to the number 0. In the following example, the strings <code>$n</code> and <code>$m</code> are treated as numbers. This code prints the number '5'. The values of the variables remain the same. Note that in Perl, <code>+</code> is always the numeric addition operator. The string concatenation operator is the period.
Numbers (numeric constants) do not require quotation. Perl will convert numbers into strings and vice versa depending on the context in which they are used. When strings are converted into numbers, trailing non-numeric parts of the strings are discarded. If no leading part of a string is numeric, the string will be converted to the number 0. In the following example, the strings <code>$n</code> and <code>$m</code> are treated as numbers. This code prints the number '5'. The values of the variables remain the same. Note that in Perl, <code>+</code> is always the numeric addition operator. The string concatenation operator is the period.


<source lang="perl">
<syntaxhighlight lang="perl">
$n = '3 apples';
$n = '3 apples';
$m = '2 oranges';
$m = '2 oranges';
print $n + $m;
print $n + $m;
</syntaxhighlight>
</source>
Functions are provided for the [[rounding]] of fractional values to integer values: <code>int</code> chops off the fractional part, rounding towards zero; <code>POSIX::ceil</code> and <code>POSIX::floor</code> always round up and always round down, respectively. The number-to-string conversions performed by <code>printf "%f"</code> and <code>sprintf "%f"</code> round half to even, that is, they use [[Rounding#Round half to even|bankers' rounding]].
Functions are provided for the [[rounding]] of fractional values to integer values: <code>int</code> chops off the fractional part, rounding towards zero; <code>POSIX::ceil</code> and <code>POSIX::floor</code> always round up and always round down, respectively. The number-to-string conversions performed by <code>printf "%f"</code> and <code>sprintf "%f"</code> round half to even, that is, they use [[Rounding#Round half to even|bankers' rounding]].
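
The conversions and rounding functions described above can be sketched as follows. The variable names are hypothetical, and the result of <code>sprintf</code> on a half-way value may depend on the platform's C library, so it is shown without a guaranteed output:

<syntaxhighlight lang="perl">
use strict;
use warnings;
use POSIX qw(ceil floor);

my $n = '3 apples';
my $m = '2 oranges';

my $sum;
{
    no warnings 'numeric';     # the strings are only partly numeric
    $sum = $n + $m;            # numeric addition: 3 + 2 = 5
}
my $joined = $n . $m;          # concatenation: '3 apples2 oranges'

my $truncated = int(-3.7);     # -3: int rounds towards zero
my $up        = ceil(3.2);     #  4: ceil always rounds up
my $down      = floor(-3.2);   # -4: floor always rounds down

# Half-way case; typically '2' where the C library rounds half to even
my $rounded = sprintf('%.0f', 2.5);

print "$sum $truncated $up $down\n";
</syntaxhighlight>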


Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl:
Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl:


<source lang="perl">
<syntaxhighlight lang="perl">
$false = 0; # the number zero
$false = 0; # the number zero
$false = 0.0; # the number zero as a float
$false = 0.0; # the number zero as a float
Line 118: Line 119:
$false = '0'; # the string zero
$false = '0'; # the string zero
$false = ""; # the empty string
$false = ""; # the empty string
$false = (); # the empty list
$false = undef; # the return value from undef
$false = undef; # the return value from undef
$false = 2-3+1 # computes to 0 which is converted to "0" so it is false
$false = 2-3+1 # computes to 0 that is converted to "0" so it is false
</syntaxhighlight>
</source>


All other (non-zero evaluating) values evaluate to true. This includes the odd self-describing literal string of "0 but true", which in fact is 0 as a number, but true when used as a boolean. All non-numeric strings also have this property, but this particular string is truncated by Perl without a numeric warning. A less explicit but more conceptually portable version of this string is '0E0' or '0e0', which does not rely on characters being evaluated as 0, because '0E0' is literally zero times ten to the power zero.
All other (non-zero evaluating) values evaluate to true. This includes the odd self-describing literal string of "0 but true", which in fact is 0 as a number, but true when used as a boolean. All non-numeric strings also have this property, but this particular string is truncated by Perl without a numeric warning. A less explicit but more conceptually portable version of this string is '{{mono|0E0}}' or '{{mono|0e0}}', which does not rely on characters being evaluated as 0, because '0E0' is literally zero times ten to the power zero. A reference to an empty hash, <code>{}</code>, is also true, as is any reference; in this context <code>{}</code> is not an empty block, because <code>perl -e 'print ref {}'</code> prints <code>HASH</code>.


Evaluated boolean expressions are also scalar values. The documentation does not promise which ''particular'' value of true or false is returned. Many boolean operators return 1 for true and the empty-string for false. The ''defined()'' function determines whether a variable has any value set. In the above examples, ''defined($false)'' is true for every value except ''undef''.
Evaluated boolean expressions are also scalar values. The documentation does not promise which ''particular'' value of true or false is returned. Many boolean operators return 1 for true and the empty-string for false. The {{code|defined()}} function determines whether a variable has any value set. In the above examples, {{code|defined($false)}} is true for every value except {{code|undef}}.


If either 1 or 0 is specifically needed, an explicit conversion can be done using the [[conditional operator]]:
If either 1 or 0 is specifically needed, an explicit conversion can be done using the [[conditional operator]]:


<source lang="perl">
<syntaxhighlight lang="perl">
my $real_result = $boolean_result ? 1 : 0;
my $real_result = $boolean_result ? 1 : 0;
</syntaxhighlight>
</source>
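
As a minimal sketch of boolean context, the following code tests a hypothetical list of sample values and keeps those that evaluate as true. The list and the variable names are assumptions made for this example:

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Hypothetical sample values chosen for this sketch
my @candidates = (0, '0', '', '0.0', '00', '0E0', '0 but true', ' ', 'a');

# grep keeps only the values that are true in boolean context
my @true_values = grep { $_ } @candidates;

my $zero_but_true = '0 but true';
my $as_number     = $zero_but_true + 0;        # numerically 0, without a warning
my $as_boolean    = $zero_but_true ? 1 : 0;    # 1: the string itself is true

my $unset;                                     # never assigned, so undef
my $is_defined = defined($unset) ? 1 : 0;      # 0

# prints: '0.0', '00', '0E0', '0 but true', ' ', 'a'
print join(', ', map { "'$_'" } @true_values), "\n";
</syntaxhighlight>

Only the number zero, the strings <code>'0'</code> and <code>''</code>, and <code>undef</code> are false here; strings such as <code>'0.0'</code> and <code>'00'</code> are true even though they are numerically zero.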


===Array values===
===Array values===
Line 136: Line 138:
An [[Array data type|array value]] (or list) is specified by listing its elements, separated by commas, enclosed by parentheses (at least where required by operator precedence).
An [[Array data type|array value]] (or list) is specified by listing its elements, separated by commas, enclosed by parentheses (at least where required by operator precedence).


<source lang="perl">
<syntaxhighlight lang="perl">
@scores = (32, 45, 16, 5);
@scores = (32, 45, 16, 5);
</syntaxhighlight>
</source>


The qw() quote-like operator allows the definition of a list of strings without typing of quotes and commas. Almost any delimiter can be used instead of parentheses. The following lines are equivalent:
The qw() quote-like operator allows the definition of a list of strings without typing of quotes and commas. Almost any delimiter can be used instead of parentheses. The following lines are equivalent:


<source lang="perl">
<syntaxhighlight lang="perl">
@names = ('Billy', 'Joe', 'Jim-Bob');
@names = ('Billy', 'Joe', 'Jim-Bob');
@names = qw(Billy Joe Jim-Bob);
@names = qw(Billy Joe Jim-Bob);
</syntaxhighlight>
</source>


The split function returns a list of strings, which are split from a string expression using a delimiter string or regular expression.
The split function returns a list of strings, which are split from a string expression using a delimiter string or regular expression.


<source lang="perl">
<syntaxhighlight lang="perl">
@scores = split(',', '32,45,16,5');
@scores = split(',', '32,45,16,5');
</syntaxhighlight>
</source>


Individual elements of a list are accessed by providing a numerical index in square brackets. The scalar sigil must be used. Sublists (array slices) can also be specified, using a range or list of numeric indices in brackets. The array sigil is used in this case. For example, $month[3] is "April" (the first element in an array has an index value of 0), and @month[4..6] is ("May", "June", "July").
Individual elements of a list are accessed by providing a numerical index in square brackets. The scalar [[sigil (computer programming)|sigil]] must be used. Sublists (array slices) can also be specified, using a range or list of numeric indices in brackets. The array sigil is used in this case. For example, <code>$month[3]</code> is <code>"April"</code> (the first element in an array has an index value of 0), and <code>@month[4..6]</code> is <code>("May", "June", "July")</code>.
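
A minimal sketch of element access and slicing, assuming a <code>@month</code> array that holds the twelve English month names:

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Assumed contents of @month for this illustration
my @month = qw(January February March April May June
               July August September October November December);

my $fourth = $month[3];        # scalar sigil: a single element, 'April'
my @summer = @month[4..6];     # array sigil and a range: ('May', 'June', 'July')
my @ends   = @month[0, 11];    # array sigil and a list: ('January', 'December')

print "$fourth\n";             # prints: April
print "@summer\n";             # prints: May June July
</syntaxhighlight>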


===Hash values===
===Hash values===
Line 162: Line 164:
|first= Peter
|first= Peter
|title= Pro Perl
|title= Pro Perl
|url= http://books.google.com/books?id=1bbjLxkBLaMC
|url= https://books.google.com/books?id=1bbjLxkBLaMC
|accessdate= 2010-08-03
|accessdate= 2010-08-03
|series= Pro to Expert Series
|series= Pro to Expert Series
Line 169: Line 171:
|isbn= 978-1-59059-438-4
|isbn= 978-1-59059-438-4
|page= 64
|page= 64
|quote= [...] a string without quotes, known as a bareword string [...]
|quote= […] a string without quotes, known as a bareword string […]
}}
}}
</ref>). The following lines are equivalent:
</ref>). The following lines are equivalent:


<source lang="perl">
<syntaxhighlight lang="perl">
%favorite = ('joe', "red", 'sam', "blue");
%favorite = ('joe', "red", 'sam', "blue");
%favorite = (joe => 'red', sam => 'blue');
%favorite = (joe => 'red', sam => 'blue');
</syntaxhighlight>
</source>


Individual values in a hash are accessed by providing the corresponding key, in curly braces. The <code>$</code> sigil identifies the accessed element as a scalar. For example, $favorite{joe} equals 'red'. A hash can also be initialized by setting its values individually:
Individual values in a hash are accessed by providing the corresponding key, in curly braces. The <code>$</code> sigil identifies the accessed element as a scalar. For example, {{code|$favorite{joe} }} equals {{code|'red'}}. A hash can also be initialized by setting its values individually:


<source lang="perl">
<syntaxhighlight lang="perl">
$favorite{joe} = 'red';
$favorite{joe} = 'red';
$favorite{sam} = 'blue';
$favorite{sam} = 'blue';
$favorite{oscar} = 'green';
$favorite{oscar} = 'green';
</syntaxhighlight>
</source>


Multiple elements may be accessed using the <code>@</code> sigil instead (identifying the result as a list). For example,
Multiple elements may be accessed using the <code>@</code> sigil instead (identifying the result as a list). For example,
@favorite{'joe', 'sam'} equals ('red', 'blue').
{{code|@favorite{'joe', 'sam'} }} equals {{code|('red', 'blue')}}.
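
The two access forms can be sketched together as follows, reusing the <code>%favorite</code> hash from the examples above; the other variable names are hypothetical:

<syntaxhighlight lang="perl">
use strict;
use warnings;

my %favorite = (joe => 'red', sam => 'blue');
$favorite{oscar} = 'green';                # add one more pair individually

my $joes_color = $favorite{joe};           # $ sigil: a single value, 'red'
my @two_colors = @favorite{'joe', 'sam'};  # @ sigil: a hash slice, ('red', 'blue')

print "$joes_color\n";                     # prints: red
print "@two_colors\n";                     # prints: red blue
</syntaxhighlight>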


===Filehandles===
===Filehandles===


Filehandles provide read and write access to resources. These are most often files on disk, but can also be a device, a [[Pipeline (Unix)|pipe]], or even a scalar value.
Filehandles provide read and write access to resources. These are most often files on disk, but can also be a device, a [[Pipeline (Unix)|pipe]], or even a scalar value.


Originally, filehandles could only be created with package variables, using the ALL_CAPS convention to distinguish them from other variables. Perl 5.6 and newer also accept a scalar variable, which will be set ([[Autovivification|autovivified]]) to a reference to an anonymous filehandle, in place of a named filehandle.<!--Using the ALL_CAPS method for filehandles is considered deprecated by the community.<ref>[http://www.modernperlbooks.com/mt/2010/04/three-arg-open-migrating-to-modern-perl.html Three Arg Open: Migrating to Modern Perl]</ref>:::blogs aren't a reliable source:::-->
Originally, filehandles could only be created with package variables, using the ALL_CAPS convention to distinguish them from other variables. Perl 5.6 and newer also accept a scalar variable, which will be set ([[Autovivification|autovivified]]) to a reference to an anonymous filehandle, in place of a named filehandle.<!--Using the ALL_CAPS method for filehandles is considered deprecated by the community.<ref>[http://www.modernperlbooks.com/mt/2010/04/three-arg-open-migrating-to-modern-perl.html Three Arg Open: Migrating to Modern Perl]</ref>:::blogs aren't a reliable source:::-->
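
Both styles can be illustrated with a minimal sketch. To keep the example self-contained, the filehandles are opened on scalar values rather than on files on disk; the variable names, the handle name <code>OUT</code>, and the sample text are assumptions made for this example:

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Lexical filehandle: $fh is autovivified by open()
my $text = "first line\nsecond line\n";    # hypothetical in-memory data
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my @lines = <$fh>;                          # read all lines in list context
close($fh);

# Named (package) filehandle, written in ALL_CAPS by convention
my $buffer = '';
open(OUT, '>', \$buffer) or die "Cannot open in-memory file: $!";
print OUT "hello\n";
close(OUT);

print scalar(@lines), " lines read\n";      # prints: 2 lines read
print $buffer;                              # prints: hello
</syntaxhighlight>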


===Typeglob values===
===Typeglob values===
Line 199: Line 201:
A typeglob value is a symbol table entry. The main use of typeglobs is creating symbol table aliases. For example:
A typeglob value is a symbol table entry. The main use of typeglobs is creating symbol table aliases. For example:


<source lang="perl">
<syntaxhighlight lang="perl">
*PI = \3.141592653; # creating constant scalar $PI
*PI = \3.141592653; # creating constant scalar $PI
*this = *that; # creating aliases for all data types 'this' to all data types 'that'
*this = *that; # creating aliases for all data types 'this' to all data types 'that'
</syntaxhighlight>
</source>


===Array functions===
===Array functions===


The number of elements in an array can be determined either by evaluating the array in scalar context or with the help of the <code>$#</code> sigil. The latter gives the index of the last element in the array, not the number of elements. The expressions scalar(@array) and ($#array&nbsp;+&nbsp;1) are equivalent.
The number of elements in an array can be determined either by evaluating the array in scalar context or with the help of the <code>$#</code> sigil. The latter gives the index of the last element in the array, not the number of elements. The expressions scalar({{code|@array}}) and (<code>$#array&nbsp;+&nbsp;1</code>) are equivalent.
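
A minimal sketch of the two approaches, using a hypothetical <code>@scores</code> array:

<syntaxhighlight lang="perl">
use strict;
use warnings;

my @scores = (32, 45, 16, 5);

my $count      = scalar(@scores);   # 4: the array in scalar context
my $last_index = $#scores;          # 3: the index of the last element
my $also_count = @scores;           # 4: assignment to a scalar imposes scalar context

print "$count $last_index\n";       # prints: 4 3
</syntaxhighlight>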


===Hash functions===
===Hash functions===
Line 212: Line 214:
There are a few functions that operate on entire hashes. The ''keys'' function takes a hash and returns the list of its keys. Similarly, the ''values'' function returns a hash's values. Note that the keys and values are returned in a consistent but arbitrary order.
There are a few functions that operate on entire hashes. The ''keys'' function takes a hash and returns the list of its keys. Similarly, the ''values'' function returns a hash's values. Note that the keys and values are returned in a consistent but arbitrary order.


<source lang="perl">
<syntaxhighlight lang="perl">
# Every call to each returns the next key/value pair.
# Every call to each returns the next key/value pair.
# All values will be eventually returned, but their order
# All values will be eventually returned, but their order
# cannot be predicted.
# cannot be predicted.
while (($name, $address) = each %addressbook) {
while (($name, $address) = each %addressbook) {
    print "$name lives at $address\n";
    print "$name lives at $address\n";
}
}


# Similar to the above, but sorted alphabetically
# Similar to the above, but sorted alphabetically
foreach my $next_name (sort keys %addressbook) {
foreach my $next_name (sort keys %addressbook) {
    print "$next_name lives at $addressbook{$next_name}\n";
    print "$next_name lives at $addressbook{$next_name}\n";
}
}
</syntaxhighlight>
</source>


==Control structures==
==Control structures==
Line 233: Line 235:
It has block-oriented control structures, similar to those in the C, [[JavaScript]], and [[Java (programming language)|Java]] programming languages. Conditions are surrounded by parentheses, and controlled blocks are surrounded by braces:
It has block-oriented control structures, similar to those in the C, [[JavaScript]], and [[Java (programming language)|Java]] programming languages. Conditions are surrounded by parentheses, and controlled blocks are surrounded by braces:


''label'' while ( ''cond'' ) { ... }
''label'' while ( ''cond'' ) { … }
''label'' while ( ''cond'' ) { ... } continue { ... }
''label'' while ( ''cond'' ) { … } continue { … }
''label'' for ( ''init-expr'' ; ''cond-expr'' ; ''incr-expr'' ) { ... }
''label'' for ( ''init-expr'' ; ''cond-expr'' ; ''incr-expr'' ) { … }
''label'' foreach ''var'' ( ''list'' ) { ... }
''label'' foreach ''var'' ( ''list'' ) { … }
''label'' foreach ''var'' ( ''list'' ) { ... } continue { ... }
''label'' foreach ''var'' ( ''list'' ) { … } continue { … }
if ( ''cond'' ) { ... }
if ( ''cond'' ) { … }
if ( ''cond'' ) { ... } else { ... }
if ( ''cond'' ) { … } else { … }
if ( ''cond'' ) { ... } elsif ( ''cond'' ) { ... } else { ... }
if ( ''cond'' ) { … } elsif ( ''cond'' ) { … } else { … }


Where only a single statement is being controlled, statement modifiers provide a more-concise syntax:
Where only a single statement is being controlled, statement modifiers provide a more-concise syntax:
Line 263: Line 265:
Perl also has two implicit looping constructs, each of which has two forms:
Perl also has two implicit looping constructs, each of which has two forms:


''results'' = grep { ... } ''list''
''results'' = grep { … } ''list''
''results'' = grep ''expr'', ''list''
''results'' = grep ''expr'', ''list''
''results'' = map { ... } ''list''
''results'' = map { … } ''list''
''results'' = map ''expr'', ''list''
''results'' = map ''expr'', ''list''
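
The block and expression forms of <code>grep</code> and <code>map</code> can be sketched as follows, using a hypothetical <code>@scores</code> array and an arbitrary threshold:

<syntaxhighlight lang="perl">
use strict;
use warnings;

my @scores = (32, 45, 16, 5);

# grep keeps the elements for which the block or expression is true;
# each element is available as $_
my @high_block = grep { $_ > 20 } @scores;   # (32, 45)
my @high_expr  = grep $_ > 20, @scores;      # same result, expression form

# map evaluates the block or expression for every element
# and collects the results
my @doubled_block = map { $_ * 2 } @scores;  # (64, 90, 32, 10)
my @doubled_expr  = map $_ * 2, @scores;     # same result, expression form

print "@high_block\n";                       # prints: 32 45
print "@doubled_block\n";                    # prints: 64 90 32 10
</syntaxhighlight>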


Line 272: Line 274:
Up until the 5.10.0 release, there was no [[switch statement]] in Perl 5. From 5.10.0 onward, a multi-way branch statement called <code>given</code>/<code>when</code> is available, which takes the following form:
Up until the 5.10.0 release, there was no [[switch statement]] in Perl 5. From 5.10.0 onward, a multi-way branch statement called <code>given</code>/<code>when</code> is available, which takes the following form:


use v5.10; # must be present to import the new 5.10 functions
use v5.10; <u># must be present to import the new 5.10 functions</u>
given ( ''expr'' ) { when ( ''cond'' ) { ... } default { ... } }
given ( ''expr'' ) { when ( ''cond'' ) { … } default { … } }


Syntactically, this structure behaves similarly to [[switch statement]]s found in other languages, but with a few important differences. The largest is that unlike switch/case structures, given/when statements break execution after the first successful branch, rather than waiting for explicitly defined break commands. Conversely, explicit continues are instead necessary to emulate switch behavior.
Syntactically, this structure behaves similarly to [[switch statement]]s found in other languages, but with a few important differences. The largest is that unlike switch/case structures, given/when statements break execution after the first successful branch, rather than waiting for explicitly defined break commands. Conversely, explicit <code>continue</code>s are instead necessary to emulate switch behavior.


For those not using Perl 5.10, the Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures. There is also a Switch module, which provides functionality modeled on the forthcoming [[Perl 6]] re-design. It is implemented using a [[source filter]], so its use is unofficially discouraged.<ref>[http://www.perlmonks.org/?node_id=496084 using switch<!-- Bot generated title -->]</ref>
For those not using Perl 5.10, the Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures. There is also a Switch module, which provides functionality modeled on that of sister language [[Raku (programming language)|Raku]]. It is implemented using a [[source filter]], so its use is unofficially discouraged.<ref>[http://www.perlmonks.org/?node_id=496084 using switch<!-- Bot generated title -->]</ref>
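
One such alternative can be sketched as a single-pass <code>for</code> loop that aliases the value to <code>$_</code> and leaves the loop with <code>last</code> after the first matching branch. The subroutine name <code>classify</code> and its categories are hypothetical, chosen only for this illustration:

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Hypothetical helper: sorts a string into one of three categories
sub classify {
    my ($value) = @_;
    my $kind;
    for ($value) {                                   # aliases $_ to $value
        if (/^\d+$/) { $kind = 'integer'; last; }    # first branch
        if (/^\s*$/) { $kind = 'blank';   last; }    # second branch
        $kind = 'other';                             # default branch
    }
    return $kind;
}

print classify('42'), "\n";     # prints: integer
print classify(''), "\n";       # prints: blank
print classify('abc'), "\n";    # prints: other
</syntaxhighlight>

Each <code>last</code> plays the role of the implicit break after a successful branch, so at most one branch takes effect.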


Perl includes a <code>goto label</code> statement, but it is rarely used. Situations where a <code>goto</code> is called for in other languages don't occur as often in Perl because of its breadth of flow control options.
Perl includes a <code>goto label</code> statement, but it is rarely used. Situations where a <code>goto</code> is called for in other languages don't occur as often in Perl, because of its breadth of flow control options.


There is also a <code>goto </code> statement that performs a [[tail call]]. It terminates the current subroutine and immediately calls the specified <code>''sub''</code>. This is used in situations where a caller can perform more-efficient [[Call stack|stack]] management than Perl itself (typically because no change to the current stack is required), and in deep recursion, tail calling can have substantial positive impact on performance because it avoids the overhead of scope/stack management on return.
There is also a <code>goto &amp;sub</code> statement that performs a [[tail call]]. It terminates the current subroutine and immediately calls the specified <code>''sub''</code>. This is used in situations where a caller can perform more-efficient [[Call stack|stack]] management than Perl itself (typically because no change to the current stack is required), and in deep recursion, tail calling can have substantial positive impact on performance, because it avoids the overhead of scope/stack management on return.
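
The tail-call form can be sketched with an accumulator-style factorial. The subroutine names <code>fact_acc</code> and <code>factorial</code> are hypothetical and chosen only for illustration; the sketch shows the mechanics of replacing the current call frame, not a measured performance gain.

```perl
use strict;
use warnings;

# Hypothetical helper: carries the running product in a second argument.
sub fact_acc {
    my ($n, $acc) = @_;
    return $acc if $n <= 1;

    # The callee receives whatever is in @_ at the time of the goto.
    @_ = ($n - 1, $acc * $n);
    goto &fact_acc;    # leaves this call and enters fact_acc again
}

# Hypothetical entry point that seeds the accumulator.
sub factorial {
    my ($n) = @_;
    @_ = ($n, 1);
    goto &fact_acc;
}

print factorial(10), "\n";    # 3628800
```

Because each <code>goto &amp;fact_acc</code> replaces the running call rather than nesting a new one, the recursion depth stays constant however large <code>$n</code> is.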


==Subroutines==
==Subroutines==
[[Subroutine]]s are defined with the <code>sub</code> keyword and are invoked simply by naming them. If the subroutine in question has not yet been declared, invocation requires either parentheses after the function name or an ampersand ('''&''') before it. But using '''&''' without parentheses will also implicitly pass the arguments of the current subroutine to the one called, and using '''&''' with parentheses will bypass prototypes.
[[Subroutine]]s are defined with the <code>sub</code> keyword and are invoked simply by naming them. If the subroutine in question has not yet been declared, invocation requires either parentheses after the function name or an ampersand ('''&''') before it. But using '''&''' without parentheses will also implicitly pass the arguments of the current subroutine to the one called, and using '''&''' with parentheses will bypass prototypes.


<source lang="perl">
<syntaxhighlight lang="perl">
# Calling a subroutine
# Calling a subroutine


Line 294: Line 296:


# Defining a subroutine
# Defining a subroutine
sub foo { ... }
sub foo { }


foo; # Here parentheses are not required
foo; # Here parentheses are not required
</syntaxhighlight>
</source>


A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.
A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.


<source lang="perl">
<syntaxhighlight lang="perl">
foo $x, @y, %z;
foo $x, @y, %z;
</syntaxhighlight>
</source>
The parameters to a subroutine do not need to be declared as to either number or type; in fact, they may vary from call to call. Any validation of parameters must be performed explicitly inside the subroutine.
The parameters to a subroutine do not need to be declared as to either number or type; in fact, they may vary from call to call. Any validation of parameters must be performed explicitly inside the subroutine.


Line 312: Line 314:
Elements of <code>@_</code> may be accessed by subscripting it in the usual way.
Elements of <code>@_</code> may be accessed by subscripting it in the usual way.


<source lang="perl">
<syntaxhighlight lang="perl">
$_[0], $_[1]
$_[0], $_[1]
</syntaxhighlight>
</source>


However, the resulting code can be difficult to read, and the parameters have [[Evaluation strategy#Call by reference|pass-by-reference]] semantics, which may be undesirable.
However, the resulting code can be difficult to read, and the parameters have [[Evaluation strategy#Call by reference|pass-by-reference]] semantics, which may be undesirable.
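
A minimal sketch of what these pass-by-reference semantics mean in practice: the elements of <code>@_</code> are aliases for the caller's variables, so modifying them modifies the caller's data. The subroutine and variable names are hypothetical.

```perl
use strict;
use warnings;

# Modifies the caller's variables through the aliases in @_.
sub double_in_place {
    $_ *= 2 for @_;
    return;
}

# Copies the arguments first, so the caller's variables are untouched.
sub double_copy {
    my @copy = @_;
    $_ *= 2 for @copy;
    return @copy;
}

my ($first, $second) = (2, 3);
double_in_place($first, $second);
print "$first $second\n";             # 4 6

my @doubled = double_copy($first, $second);
print "$first $second -> @doubled\n"; # 4 6 -> 8 12
```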
Line 320: Line 322:
One common idiom is to assign <code>@_</code> to a list of named variables.
One common idiom is to assign <code>@_</code> to a list of named variables.


<source lang="perl">
<syntaxhighlight lang="perl">
my ($x, $y, $z) = @_;
my ($x, $y, $z) = @_;
</syntaxhighlight>
</source>


This provides mnemonic parameter names and implements [[Evaluation strategy#Call by value|pass-by-value]] semantics. The <code>my</code> keyword indicates that the following variables are lexically scoped to the containing block.
This provides mnemonic parameter names and implements [[Evaluation strategy#Call by value|pass-by-value]] semantics. The <code>my</code> keyword indicates that the following variables are lexically scoped to the containing block.
Line 328: Line 330:
Another idiom is to shift parameters off of <code>@_</code>. This is especially common when the subroutine takes only one argument or for handling the <code>$self</code> argument in object-oriented modules.
Another idiom is to shift parameters off of <code>@_</code>. This is especially common when the subroutine takes only one argument or for handling the <code>$self</code> argument in object-oriented modules.


<source lang="perl">
<syntaxhighlight lang="perl">
my $x = shift;
my $x = shift;
</syntaxhighlight>
</source>


Subroutines may assign <code>@_</code> to a hash to simulate named arguments; this is recommended in ''Perl Best Practices'' for subroutines that are likely to ever have more than three parameters.<ref>
Subroutines may assign <code>@_</code> to a hash to simulate named arguments; this is recommended in ''[[Perl Best Practices]]'' for subroutines that are likely to ever have more than three parameters.<ref>
Damian Conway, ''[http://www.oreilly.com/catalog/perlbp/chapter/ch09.pdf Perl Best Practices]'', p.182</ref>
Damian Conway, ''[http://www.oreilly.com/catalog/perlbp/chapter/ch09.pdf Perl Best Practices] {{webarchive|url=https://web.archive.org/web/20110918134430/http://oreilly.com/catalog/perlbp/chapter/ch09.pdf |date=2011-09-18 }}'', p.182</ref>


<source lang="perl">
<syntaxhighlight lang="perl">
sub function1 {
sub function1 {
my %args = @_;
my %args = @_;
print "'x' argument was '$args{x}'\n";
print "'x' argument was '$args{x}'\n";
}
}
function1( x => 23 );
function1( x => 23 );
</syntaxhighlight>
</source>


Subroutines may return values.
Subroutines may return values.


<source lang="perl">
<syntaxhighlight lang="perl">
return 42, $x, @y, %z;
return 42, $x, @y, %z;
</syntaxhighlight>
</source>


If the subroutine does not exit via a <code>return</code> statement, then it returns the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.
If the subroutine does not exit via a <code>return</code> statement, it returns the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.


The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.
The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.


<source lang="perl">
<syntaxhighlight lang="perl">
sub list { (4, 5, 6) }
sub list { (4, 5, 6) }
sub array { @x = (4, 5, 6); @x }
sub array { @x = (4, 5, 6); @x }
Line 361: Line 363:
@x = list; # returns (4, 5, 6)
@x = list; # returns (4, 5, 6)
@x = array; # returns (4, 5, 6)
@x = array; # returns (4, 5, 6)
</syntaxhighlight>
</source>


A subroutine can discover its calling context with the <code>wantarray</code> function.
A subroutine can discover its calling context with the <code>wantarray</code> function.


<source lang="perl">
<syntaxhighlight lang="perl">
sub either {
sub either {
return wantarray ? (1, 2) : 'Oranges';
return wantarray ? (1, 2) : 'Oranges';
}
}


$x = either; # returns "Oranges"
$x = either; # returns "Oranges"
@x = either; # returns (1, 2)
@x = either; # returns (1, 2)
</syntaxhighlight>
</source>
===Anonymous functions===
{{Excerpt|Anonymous function|Perl 5|subsections=yes}}


==Regular expressions==
==Regular expressions==
The Perl language includes a specialized syntax for writing [[regular expression]]s (RE, or regexes), and the interpreter contains an engine for matching strings to regular expressions. The regular-expression engine uses a [[backtracking]] algorithm, extending its capabilities from simple pattern matching to string capture and substitution. The regular-expression engine is derived from regex written by [[Henry Spencer]].
The Perl language includes a specialized syntax for writing [[regular expression]]s (RE, or regexes), and the interpreter contains an engine for matching strings to regular expressions. The regular-expression engine uses a [[backtracking]] algorithm, extending its capabilities from simple pattern matching to string capture and substitution. The regular-expression engine is derived from regex written by [[Henry Spencer]].


The Perl regular-expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl and has since grown to include far more features. Many other languages and applications are now adopting [[PCRE|Perl compatible regular expressions]] over [[POSIX]] regular expressions, such as [[PHP]], [[Ruby programming language|Ruby]], [[Java (programming language)|Java]], Microsoft's [[.NET Framework]],<ref>Microsoft Corp., ".NET Framework Regular Expressions", ''.NET Framework Developer's Guide'', [http://msdn2.microsoft.com/en-us/library/hs600312(VS.71).aspx]</ref> and the [[Apache HTTP server]].
The Perl regular-expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl and has since grown to include far more features. Many other languages and applications are now adopting [[Perl Compatible Regular Expressions]] over [[POSIX]] regular expressions, such as [[PHP]], [[Ruby programming language|Ruby]], [[Java (programming language)|Java]], Microsoft's [[.NET Framework]],<ref>Microsoft Corp., ".NET Framework Regular Expressions", ''.NET Framework Developer's Guide'', [http://msdn2.microsoft.com/en-us/library/hs600312(VS.71).aspx]</ref> and the [[Apache HTTP server]].


Regular-expression syntax is extremely compact, owing to history. The first regular-expression dialects were only slightly more expressive than [[Glob (programming)|globs]], and the syntax was designed so that an expression would resemble the text that it matches.{{Citation needed|date=June 2007}} This meant using no more than a single punctuation character or a pair of delimiting characters to express the few supported assertions. Over time, the expressiveness of regular expressions grew tremendously, but the syntax design was never revised and continues to rely on punctuation. As a result, regular expressions can be cryptic and extremely dense.
Regular-expression syntax is extremely compact, owing to history. The first regular-expression dialects were only slightly more expressive than [[Glob (programming)|globs]], and the syntax was designed so that an expression would resemble the text that it matches.{{Citation needed|date=June 2007}} This meant using no more than a single punctuation character or a pair of delimiting characters to express the few supported assertions. Over time, the expressiveness of regular expressions grew tremendously, but the syntax design was never revised and continues to rely on punctuation. As a result, regular expressions can be cryptic and extremely dense.
Line 383: Line 387:
===Uses===
===Uses===


The <code>m//</code> (match) operator introduces a regular-expression match. (If it is delimited by slashes, as in all of the examples here, then the leading <code>m</code> may be omitted for brevity. If the <code>m</code> is present, as in all of the following examples, other delimiters can be used in place of slashes.) In the simplest case, an expression such as
The <code>m//</code> (match) operator introduces a regular-expression match. (If it is delimited by slashes, as in all of the examples here, the leading <code>m</code> may be omitted for brevity. If the <code>m</code> is present, as in all of the following examples, other delimiters can be used in place of slashes.) In the simplest case, an expression such as


<source lang="perl">
<syntaxhighlight lang="perl">
$x =~ /abc/;
$x =~ /abc/;
</syntaxhighlight>
</source>


evaluates to true [[if and only if]] the string <code>$x</code> matches the regular expression <code>abc</code>.
evaluates to true [[if and only if]] the string <code>$x</code> matches the regular expression <code>abc</code>.
Line 393: Line 397:
The <code>s///</code> (substitute) operator, on the other hand, specifies a search-and-replace operation:
The <code>s///</code> (substitute) operator, on the other hand, specifies a search-and-replace operation:


<source lang="perl">
<syntaxhighlight lang="perl">
$x =~ s/abc/aBc/; # upcase the b
$x =~ s/abc/aBc/; # upcase the b
</syntaxhighlight>
</source>


Another use of regular expressions is to specify delimiters for the <code>split</code> function:
Another use of regular expressions is to specify delimiters for the <code>split</code> function:


<source lang="perl">
<syntaxhighlight lang="perl">
@words = split /,/, $line;
@words = split /,/, $line;
</syntaxhighlight>
</source>


The <code>split</code> function creates a list of the parts of the string that are separated by what matches the regular expression. In this example, a line is divided into a list of its own comma-separated parts, and this list is then assigned to the <code>@words</code> array.
The <code>split</code> function creates a list of the parts of the string that are separated by what matches the regular expression. In this example, a line is divided into a list of its own comma-separated parts, and this list is then assigned to the <code>@words</code> array.
Line 411: Line 415:
Perl regular expressions can take ''modifiers''. These are single-letter suffixes that modify the meaning of the expression:
Perl regular expressions can take ''modifiers''. These are single-letter suffixes that modify the meaning of the expression:


<source lang="perl">
<syntaxhighlight lang="perl">
$x =~ /abc/i; # case-insensitive pattern match
$x =~ /abc/i; # case-insensitive pattern match
$x =~ s/abc/aBc/g; # global search and replace
$x =~ s/abc/aBc/g; # global search and replace
</syntaxhighlight>
</source>


Because the compact syntax of regular expressions can make them dense and cryptic, the <code>/x</code> modifier was added in Perl to help programmers write more-legible regular expressions. It allows programmers to place whitespace and comments ''inside'' regular expressions:
Because the compact syntax of regular expressions can make them dense and cryptic, the <code>/x</code> modifier was added in Perl to help programmers write more-legible regular expressions. It allows programmers to place whitespace and comments ''inside'' regular expressions:


<source lang="perl">
<syntaxhighlight lang="perl">
$x =~ /
$x =~ /
a # match 'a'
a # match 'a'
Line 424: Line 428:
c # then followed by the 'c' character
c # then followed by the 'c' character
/x;
/x;
</syntaxhighlight>
</source>


====Capturing====
====Capturing====


Portions of a regular expression may be enclosed in parentheses; corresponding portions of a matching string are ''captured''. Captured strings are assigned to the sequential built-in variables <code>$1, $2, $3, ...</code>, and a list of captured strings is returned as the value of the match.
Portions of a regular expression may be enclosed in parentheses; corresponding portions of a matching string are ''captured''. Captured strings are assigned to the sequential built-in variables <code>$1, $2, $3, </code>, and a list of captured strings is returned as the value of the match.


<source lang="perl">
<syntaxhighlight lang="perl">
$x =~ /a(.)c/; # capture the character between 'a' and 'c'
$x =~ /a(.)c/; # capture the character between 'a' and 'c'
</syntaxhighlight>
</source>


Captured strings <code>$1, $2, $3, ...</code> can be used later in the code.
Captured strings <code>$1, $2, $3, </code> can be used later in the code.
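
Both ways of retrieving captures can be sketched as follows. The date string and variable names are illustrative assumptions; the second match reuses the <code>/a(.)c/</code> pattern from above.

```perl
use strict;
use warnings;

my $date = '2024-06-04';

# In list context the match returns the captured strings.
my ($year, $month, $day) = $date =~ /(\d+)-(\d+)-(\d+)/;
print "$year/$month/$day\n";    # 2024/06/04

# After a successful match, the same captures are available as $1, $2, ...
my $middle = '';
if ('abc' =~ /a(.)c/) {
    $middle = $1;
}
print "$middle\n";              # b
```

Testing the match before reading <code>$1</code> matters, because a failed match leaves the capture variables from an earlier successful match in place.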


Perl regular expressions also allow built-in or user-defined functions to apply to the captured match, by using the <code>/e</code> modifier:
Perl regular expressions also allow built-in or user-defined functions to apply to the captured match, by using the <code>/e</code> modifier:


<source lang="perl">
<syntaxhighlight lang="perl">
$x = "Oranges";
$x = "Oranges";
$x =~ s/(ge)/uc($1)/e; # OranGEs
$x =~ s/(ge)/uc($1)/e; # OranGEs
$x .= $1; # append $x with the contents of the match in the previous statement: OranGEsge
$x .= $1; # append $x with the contents of the match in the previous statement: OranGEsge
</syntaxhighlight>
</source>


==Objects==
==Objects==
Line 448: Line 452:
There are many ways to write [[Object-oriented programming|object-oriented]] code in Perl. The most basic is using "blessed" [[Reference (computer science)|references]]. This works by identifying a reference of any type as belonging to a given package, and the package provides the methods for the blessed reference. For example, a two-dimensional point could be defined this way:
There are many ways to write [[Object-oriented programming|object-oriented]] code in Perl. The most basic is using "blessed" [[Reference (computer science)|references]]. This works by identifying a reference of any type as belonging to a given package, and the package provides the methods for the blessed reference. For example, a two-dimensional point could be defined this way:


<source lang="perl">
<syntaxhighlight lang="perl">
sub Point::new {
sub Point::new {
# Here, Point->new(4, 5) will result in $class being 'Point'.
# Here, Point->new(4, 5) will result in $class being 'Point'.
# It's a variable to support subclassing (see the perloop manpage).
# It's a variable to support subclassing (see the perloop manpage).
my ($class, $x, $y) = @_;
my ($class, $x, $y) = @_;
bless [$x, $y], $class; # Implicit return
bless [$x, $y], $class; # Implicit return
}
}


sub Point::distance {
sub Point::distance {
my ($self, $from) = @_;
my ($self, $from) = @_;
my ($dx, $dy) = ($$self[0] - $$from[0], $$self[1] - $$from[1]);
my ($dx, $dy) = ($$self[0] - $$from[0], $$self[1] - $$from[1]);
sqrt($dx * $dx + $dy * $dy);
sqrt($dx * $dx + $dy * $dy);
}
}
</syntaxhighlight>
</source>


This class can be used by invoking <code>new()</code> to construct instances, and invoking <code>distance</code> on those instances.
This class can be used by invoking <code>new()</code> to construct instances, and invoking <code>distance</code> on those instances.


<source lang="perl">
<syntaxhighlight lang="perl">
my $p1 = Point->new(3, 4);
my $p1 = Point->new(3, 4);
my $p2 = Point->new(0, 0);
my $p2 = Point->new(0, 0);
print $p1->distance($p2); # Prints 5
print $p1->distance($p2); # Prints 5
</syntaxhighlight>
</source>


Many modern Perl applications use the [[Moose (Perl)|Moose]] object system.{{Citation needed|date=June 2010}} Moose is built on top of Class::MOP, a meta-object protocol, providing complete introspection for all Moose-using classes. Thus you can ask classes about their attributes, parents, children, methods, etc. using a simple API.
Many modern Perl applications use the [[Moose (Perl)|Moose]] object system.{{Citation needed|date=June 2010}} Moose is built on top of Class::MOP, a meta-object protocol, providing complete introspection for all Moose-using classes. Thus you can ask classes about their attributes, parents, children, methods, etc. using a simple API.
Line 493: Line 497:
An example of a class written using the MooseX::Declare<ref>[http://search.cpan.org/perldoc?MooseX::Declare MooseX::Declare documentation]</ref> extension to Moose:
An example of a class written using the MooseX::Declare<ref>[http://search.cpan.org/perldoc?MooseX::Declare MooseX::Declare documentation]</ref> extension to Moose:


<source lang="perl">
<syntaxhighlight lang="perl">
use MooseX::Declare;
use MooseX::Declare;


Line 508: Line 512:
}
}
}
}
</syntaxhighlight>
</source>


This is a class named <code>Point3D</code> that extends another class named <code>Point</code> explained in [[Moose (Perl)#Examples|Moose examples]]. It adds to its base class a new attribute <code>z</code>, redefines the method <code>set_to</code> and extends the method <code>clear</code>.
This is a class named <code>Point3D</code> that extends another class named <code>Point</code> explained in [[Moose (Perl)#Examples|Moose examples]]. It adds to its base class a new attribute <code>z</code>, redefines the method <code>set_to</code> and extends the method <code>clear</code>.


==References==
==References==
{{reflist}}
<div class='references-small'>
<references/>
</div>


==External links==
==External links==
Line 522: Line 524:
* [http://perlmonks.org/ PerlMonks] A community committed to sharing Perl knowledge and coding tips.
* [http://perlmonks.org/ PerlMonks] A community committed to sharing Perl knowledge and coding tips.


[[Category:Perl]]
[[Category:Articles with example Perl code]]
[[Category:Articles with example Perl code]]
[[Category:Perl]]

Latest revision as of 06:40, 4 June 2024

The structure of the Perl programming language encompasses both the syntactical rules of the language and the general ways in which programs are organized. Perl's design philosophy is expressed in the commonly cited motto "there's more than one way to do it". As a multi-paradigm, dynamically typed language, Perl allows a great degree of flexibility in program design. Perl also encourages modularization; this has been attributed to the component-based design structure of its Unix roots[when?],[1] and is responsible for the size of the CPAN archive, a community-maintained repository of more than 100,000 modules.[2]

Basic syntax

In Perl, the minimal Hello World program may be written as follows:

print "Hello, World!\n"

This prints the string Hello, World! and a newline, symbolically expressed by an n character whose interpretation is altered by the preceding escape character (a backslash). Since version 5.10, the new 'say' builtin[3] produces the same effect even more simply:

say "Hello, World!"

An entire Perl program may also be specified as a command-line parameter to Perl, so the same program can also be executed from the command line (example shown for Unix):

$ perl -e 'print "Hello, World!\n"'

The canonical form of the program is slightly more verbose:

#!/usr/bin/perl
print "Hello, World!\n";

The hash mark character introduces a comment in Perl, which runs up to the end of the line of code and is ignored by the compiler (except on Windows). The comment used here is of a special kind: it’s called the shebang line. This tells Unix-like operating systems to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning perl. (Note that, on Microsoft Windows systems, Perl programs are typically invoked by associating the .pl extension with the Perl interpreter. In order to deal with such circumstances, perl detects the shebang line and parses it for switches.[4])

The second line in the canonical form includes a semicolon, which is used to separate statements in Perl. With only a single statement in a block or file, a separator is unnecessary, so it can be omitted from the minimal form of the program—or more generally from the final statement in any block or file. The canonical form includes it, because it is common to terminate every statement even when it is unnecessary to do so, as this makes editing easier: code can be added to, or moved away from, the end of a block or file without having to adjust semicolons.

Version 5.10 of Perl introduces a say function that implicitly appends a newline character to its output, making the minimal "Hello World" program even shorter:

use 5.010; # must be present to import the new 5.10 functions; note that it is 5.010, not 5.10
say 'Hello, World!'

Data types

Perl has a number of fundamental data types. The most commonly used and discussed are scalars, arrays, hashes, filehandles, and subroutines:

Type         Sigil  Example      Description
Scalar       $      $foo         A single value; it may be a number, a string, a filehandle, or a reference.
Array        @      @foo         An ordered collection of scalars.
Hash         %      %foo         A map from strings to scalars; the strings are called keys, and the scalars are called values. Also known as an associative array.
File handle  none   $foo or FOO  An opaque representation of an open file or other target for reading, writing, or both.
Subroutine   &      &foo         A piece of code that may be passed arguments, be executed, and return data.
Typeglob     *      *foo         The symbol table entry for all types with the name 'foo'.

Scalar values

String values (literals) must be enclosed by quotes. Enclosing a string in double quotes allows the values of variables whose names appear in the string to automatically replace the variable name (or be interpolated) in the string. Enclosing a string in single quotes prevents variable interpolation.

For example, if $name is "Jim":

  • then print("My name is $name") will print "My name is Jim" (interpolation within double quotes),
  • but print('My name is $name') will print "My name is $name" (no interpolation within single quotes).

To include a double quotation mark in a string, precede it with a backslash or enclose the string in single quotes. To include a single quotation mark, precede it with a backslash or enclose the string in double quotes.

Strings can also be quoted with the q and qq quote-like operators:

  • 'this' and q(this) are identical,
  • "$this" and qq($this) are identical.
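
The quoting rules above can be checked with a short sketch. The value of $name comes from the example in the text; the other variable names are illustrative.

```perl
use strict;
use warnings;

my $name = 'Jim';

# Double quotes and qq() interpolate variables.
my $double = "My name is $name";
my $qq     = qq(My name is $name);

# Single quotes and q() leave the text untouched.
my $single = 'My name is $name';
my $q      = q(My name is $name);

print "$double\n";    # My name is Jim
print "$single\n";    # My name is $name
print "equivalent\n" if $double eq $qq and $single eq $q;
```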

Finally, multiline strings can be defined using here documents:

$multilined_string = <<EOF;
This is my multilined string
note that I am terminating it with the word "EOF".
EOF

Numbers (numeric constants) do not require quotation. Perl will convert numbers into strings and vice versa depending on the context in which they are used. When strings are converted into numbers, trailing non-numeric parts of the strings are discarded. If no leading part of a string is numeric, the string will be converted to the number 0. In the following example, the strings $n and $m are treated as numbers. This code prints the number '5'. The values of the variables remain the same. Note that in Perl, + is always the numeric addition operator. The string concatenation operator is the period.

$n = '3 apples';
$m = '2 oranges';
print $n + $m;

Functions are provided for rounding fractional values to integer values: int chops off the fractional part, rounding towards zero; POSIX::ceil and POSIX::floor always round up and always round down, respectively. The number-to-string conversions performed by printf "%f" and sprintf "%f" round half to even, that is, they use bankers' rounding.
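
A small sketch contrasting the three integer-rounding functions on positive and negative inputs; the sample values are arbitrary. The final lines format a few exact ties with sprintf so that the bankers' rounding mentioned above can be observed on the local build.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

my @rounded = (
    int(-3.7),      # -3: truncates towards zero
    ceil(-3.7),     # -3: always rounds up
    floor(-3.7),    # -4: always rounds down
    int(3.7),       #  3
    ceil(3.2),      #  4
    floor(3.7),     #  3
);
print "@rounded\n";    # -3 -3 -4 3 4 3

# 0.5, 1.5 and 2.5 are exactly representable, so they are genuine ties.
my @ties = map { sprintf '%.0f', $_ } 0.5, 1.5, 2.5;
print "@ties\n";
```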

Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl:

$false = 0; # the number zero
$false = 0.0; # the number zero as a float
$false = 0b0; # the number zero in binary
$false = 0x0; # the number zero in hexadecimal
$false = '0'; # the string zero
$false = ""; # the empty string
$false = (); # the empty list
$false = undef; # the return value from undef
$false = 2-3+1; # evaluates to 0, which is converted to "0", so it is false

All other (non-zero evaluating) values evaluate to true. This includes the odd self-describing literal string of "0 but true", which in fact is 0 as a number, but true when used as a boolean. All non-numeric strings also have this property, but this particular string is truncated by Perl without a numeric warning. A less explicit but more conceptually portable version of this string is '0E0' or '0e0', which does not rely on characters being evaluated as 0, because '0E0' is literally zero times ten to the power zero. The empty hash {} is also true; in this context {} is not an empty block, because perl -e 'print ref {}' returns HASH.
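
These rules can be explored with a small sketch. The helper truth is hypothetical and exists only to print how a value behaves in boolean context.

```perl
use strict;
use warnings;

# Hypothetical helper: report how Perl treats a value in boolean context.
sub truth {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

my $empty_hash = {};

print truth('0'),          "\n";    # false: the string zero
print truth('0.0'),        "\n";    # true: a non-empty string other than '0'
print truth('0 but true'), "\n";    # true
print truth('0E0'),        "\n";    # true
print truth($empty_hash),  "\n";    # true: a reference to an empty hash

# Both special strings still count as zero in arithmetic.
print '0 but true' + 5, "\n";       # 5
print '0E0' + 5,        "\n";       # 5
```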

Evaluated boolean expressions are also scalar values. The documentation does not promise which particular value of true or false is returned. Many boolean operators return 1 for true and the empty-string for false. The defined() function determines whether a variable has any value set. In the above examples, defined($false) is true for every value except undef.

If either 1 or 0 is specifically needed, an explicit conversion can be done using the conditional operator:

my $real_result = $boolean_result ? 1 : 0;

Array values

An array value (or list) is specified by listing its elements, separated by commas, enclosed by parentheses (at least where required by operator precedence).

@scores = (32, 45, 16, 5);

The qw() quote-like operator allows the definition of a list of strings without typing of quotes and commas. Almost any delimiter can be used instead of parentheses. The following lines are equivalent:

@names = ('Billy', 'Joe', 'Jim-Bob');
@names = qw(Billy Joe Jim-Bob);

The split function returns a list of strings, which are split from a string expression using a delimiter string or regular expression.

@scores = split(',', '32,45,16,5');

Individual elements of a list are accessed by providing a numerical index in square brackets. The scalar sigil must be used. Sublists (array slices) can also be specified, using a range or list of numeric indices in brackets. The array sigil is used in this case. For example, $month[3] is "April" (the first element in an array has an index value of 0), and @month[4..6] is ("May", "June", "July").
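
The indexing rules can be sketched as follows, assuming @month holds the twelve English month names (the text does not show how it was initialized).

```perl
use strict;
use warnings;

my @month = qw(January February March April May June July
               August September October November December);

# A single element takes the scalar sigil.
print "$month[3]\n";                # April

# A slice takes the array sigil; the indices may be a range...
my @summer = @month[4 .. 6];
print "@summer\n";                  # May June July

# ...or an explicit list of indices.
my @ends = @month[0, 11];
print "@ends\n";                    # January December
```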

Hash values

Perl programmers may initialize a hash (or associative array) from a list of key/value pairs. If the keys are separated from the values with the => operator (sometimes called a fat comma), rather than a comma, they may be unquoted (barewords[5]). The following lines are equivalent:

%favorite = ('joe', "red", 'sam', "blue");
%favorite = (joe => 'red', sam => 'blue');

Individual values in a hash are accessed by providing the corresponding key, in curly braces. The $ sigil identifies the accessed element as a scalar. For example, $favorite{joe} equals 'red'. A hash can also be initialized by setting its values individually:

$favorite{joe}   = 'red';
$favorite{sam}   = 'blue';
$favorite{oscar} = 'green';

Multiple elements may be accessed using the @ sigil instead (identifying the result as a list). For example, @favorite{'joe', 'sam'} equals ('red', 'blue').
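
A short sketch of element and slice access on the %favorite hash from the examples above. The assignment through a slice at the end is an addition for illustration and uses arbitrary values.

```perl
use strict;
use warnings;

my %favorite = (joe => 'red', sam => 'blue', oscar => 'green');

# One element: scalar sigil, key in curly braces.
print "$favorite{joe}\n";                   # red

# Several elements at once: array sigil, list of keys.
my @colors = @favorite{'joe', 'sam'};
print "@colors\n";                          # red blue

# A slice can also be assigned to, updating several keys together.
@favorite{'joe', 'sam'} = ('black', 'white');
print "$favorite{joe} $favorite{sam}\n";    # black white
```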

Filehandles

Filehandles provide read and write access to resources. These are most often files on disk, but can also be a device, a pipe, or even a scalar value.

Originally, filehandles could only be created with package variables, using the ALL_CAPS convention to distinguish them from other variables. Perl 5.6 and newer also accept a scalar variable, which will be set (autovivified) to a reference to an anonymous filehandle, in place of a named filehandle.
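
A minimal sketch of scalar-held filehandles. To stay self-contained it opens the handles on in-memory scalars rather than on files on disk, using the "scalar value" case mentioned above; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# open() stores a reference to an anonymous filehandle in $in.
# Passing a reference to a scalar reads from that scalar instead of a file.
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";
my @lines = <$in>;
close($in);

# Writing works the same way, here into the scalar $buffer.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory file: $!";
print {$out} "captured\n";
close($out);

print scalar(@lines), " lines read; buffer holds: $buffer";
```

Replacing the scalar reference with a file name gives the usual on-disk form of the same calls.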

Typeglob values

A typeglob value is a symbol table entry. The main use of typeglobs is creating symbol table aliases. For example:

*PI = \3.141592653; # creating constant scalar $PI
*this = *that; # creating aliases for all data types 'this' to all data types 'that'

Array functions

The number of elements in an array can be determined either by evaluating the array in scalar context or with the help of the $# sigil. The latter gives the index of the last element in the array, not the number of elements. The expressions scalar(@array) and ($#array + 1) are equivalent.
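
The two ways of measuring an array can be compared in a short sketch; the array contents reuse the @scores example from earlier, and the empty array is added to show the edge case.

```perl
use strict;
use warnings;

my @scores = (32, 45, 16, 5);

my $count      = @scores;     # scalar context yields the number of elements
my $last_index = $#scores;    # index of the last element

print "count=$count last_index=$last_index\n";    # count=4 last_index=3

# For an empty array the count is 0 and the last index is -1,
# so the relation count == last index + 1 still holds.
my @empty;
my $empty_count = scalar(@empty);
my $empty_last  = $#empty;
print "count=$empty_count last_index=$empty_last\n";    # count=0 last_index=-1
```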

Hash functions

There are a few functions that operate on entire hashes. The keys function takes a hash and returns the list of its keys. Similarly, the values function returns a hash's values. Note that the keys and values are returned in a consistent but arbitrary order.

# Every call to each returns the next key/value pair.
# All values will be eventually returned, but their order
# cannot be predicted.
while (($name, $address) = each %addressbook) {
    print "$name lives at $address\n";
}

# Similar to the above, but sorted alphabetically
foreach my $next_name (sort keys %addressbook) {
    print "$next_name lives at $addressbook{$next_name}\n";
}

Control structures

Perl has several kinds of control structures.

It has block-oriented control structures, similar to those in the C, JavaScript, and Java programming languages. Conditions are surrounded by parentheses, and controlled blocks are surrounded by braces:

label while ( cond ) { … }
label while ( cond ) { … } continue { … }
label for ( init-expr ; cond-expr ; incr-expr ) { … }
label foreach var ( list ) { … }
label foreach var ( list ) { … } continue { … }
if ( cond ) { … }
if ( cond ) { … } else { … }
if ( cond ) { … } elsif ( cond ) { … } else { … }

Where only a single statement is being controlled, statement modifiers provide a more-concise syntax:

statement if cond ;
statement unless cond ;
statement while cond ;
statement until cond ;
statement foreach list ;

Short-circuit logical operators are commonly used to affect control flow at the expression level:

expr and expr
expr && expr
expr or expr
expr || expr

(The "and" and "or" operators are similar to && and || but have lower precedence, which makes it easier to use them to control entire statements.)

The flow control keywords next (corresponding to C's continue), last (corresponding to C's break), return, and redo are expressions, so they can be used with short-circuit operators.
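The following sketch combines a statement modifier with the short-circuit forms inside a loop, and shows the precedence difference between || and or. The input list and the %config hash are illustrative assumptions.

```perl
use strict;
use warnings;

my @input = (3, -1, 8, 0, 12, 7);
my @kept;

for my $n (@input) {
    next if $n < 0;      # statement modifier: skip negative numbers
    $n == 0 and next;    # short-circuit form: skip zero
    $n > 10 and last;    # leave the loop at the first value above 10
    push @kept, $n;
}
print "@kept\n";         # 3 8

# || binds more tightly than =, so it supplies a default value;
# 'or' binds more loosely, so it governs the whole statement.
my %config = (name => '');
my $name = $config{name} || 'anonymous';
print "$name\n";         # anonymous
```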

Perl also has two implicit looping constructs, each of which has two forms:

results = grep { … } list
results = grep expr, list
results = map { … } list
results = map expr, list

grep returns all elements of list for which the controlled block or expression evaluates to true. map evaluates the controlled block or expression for each element of list and returns a list of the resulting values. These constructs enable a simple functional programming style.
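A short sketch of all four forms on an illustrative list of numbers; within the block or expression, $_ is set to each element in turn.

```perl
use strict;
use warnings;

my @numbers = 1 .. 10;

my @even    = grep { $_ % 2 == 0 } @numbers;   # block form
my @big     = grep $_ > 7, @numbers;           # expression form
my @squares = map { $_ * $_ } @even;           # block form
my @labels  = map "n=$_", @big;                # expression form

print "@even\n";      # 2 4 6 8 10
print "@big\n";       # 8 9 10
print "@squares\n";   # 4 16 36 64 100
print "@labels\n";    # n=8 n=9 n=10
```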

Up until the 5.10.0 release, there was no switch statement in Perl 5. From 5.10.0 onward, a multi-way branch statement called given/when is available; it was marked experimental in Perl 5.18 and deprecated in Perl 5.38. It takes the following form:

use v5.10; # must be present to import the new 5.10 functions
given ( expr ) { when ( cond ) { … } default { … } }

Syntactically, this structure behaves similarly to switch statements found in other languages, but with a few important differences. The most significant is that, unlike C-style switch/case structures, given/when leaves the construct after the first successful branch rather than falling through until an explicit break. Conversely, an explicit continue is necessary to emulate the fall-through behavior of switch.

For those not using Perl 5.10 or later, the Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures. There is also a Switch module, which provides functionality modeled on that of sister language Raku. It is implemented using a source filter, so its use is unofficially discouraged.[6]
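One such alternative, sketched here under the assumption that a single value is being tested against several patterns, uses a one-element for loop as a topicalizer and last to leave after the first matching branch. The classify subroutine and its categories are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a label describing the form of $value.
sub classify {
    my ($value) = @_;
    my $kind;
    for ($value) {    # aliases $_ to $value for the duration of the block
        if (/^\d+$/) { $kind = 'integer'; last }
        if (/^\w+$/) { $kind = 'word';    last }
        $kind = 'other';    # reached only if no branch matched
    }
    return $kind;
}

print join(' ', map { classify($_) } '42', 'perl', '?!'), "\n";   # integer word other
```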

Perl includes a goto label statement, but it is rarely used. Situations where a goto is called for in other languages don't occur as often in Perl, because of its breadth of flow control options.

There is also a goto &sub statement that performs a tail call: it terminates the current subroutine and immediately calls the specified sub, passing it the current @_. This is used where the caller can manage the stack more efficiently than Perl itself (typically because no change to the current stack is required). In deep recursion, tail calling can substantially improve performance, because it avoids the overhead of scope and stack management on return.
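A minimal sketch of a tail call written with goto &sub follows. The sum_to subroutine is a hypothetical example; the arguments for the next call are placed in @_ before the jump, because the called subroutine receives the current @_.

```perl
use strict;
use warnings;

# Hypothetical example: adds the integers from 1 to $n using an accumulator.
sub sum_to {
    my ($n, $total) = @_;
    $total = 0 unless defined $total;
    return $total if $n <= 0;

    @_ = ($n - 1, $total + $n);   # arguments for the next call
    goto &sum_to;                 # replaces the current call instead of nesting
}

print sum_to(10_000), "\n";   # 50005000
```

Because each call replaces the previous one, the recursion depth does not grow with $n.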

Subroutines


Subroutines are defined with the sub keyword and are invoked simply by naming them. If the subroutine in question has not yet been declared, invocation requires either parentheses after the function name or an ampersand (&) before it. However, calling with & and no parentheses also implicitly passes the arguments of the current subroutine (its @_) to the one called, and calling with & and parentheses bypasses prototypes.

# Calling a subroutine

# Parentheses are required here if the subroutine is defined later in the code
foo();
&foo; # (this also works, but has other consequences regarding arguments passed to the subroutine)

# Defining a subroutine
sub foo {  }

foo; # Here parentheses are not required
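The consequence of calling with & can be made visible with a subroutine that reports how many arguments it received. The subroutine names below are hypothetical.

```perl
use strict;
use warnings;

sub count_args       { return scalar @_ }
sub with_ampersand   { return &count_args; }    # no parentheses: shares the caller's @_
sub with_parentheses { return count_args(); }   # explicit empty argument list
sub with_both        { return &count_args(); }  # empty list; prototypes are bypassed

print with_ampersand(1, 2, 3), " ",
      with_parentheses(1, 2, 3), " ",
      with_both(1, 2, 3), "\n";                 # 3 0 0
```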

A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.

foo $x, @y, %z;

The parameters to a subroutine do not need to be declared as to either number or type; in fact, they may vary from call to call. Any validation of parameters must be performed explicitly inside the subroutine.

Arrays are expanded to their elements; hashes are expanded to a list of key/value pairs; and the whole lot is passed into the subroutine as one flat list of scalars.
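The flattening can be observed by counting the arguments that arrive. In this sketch (with invented data), passing references instead is shown as one way to keep the array and hash distinct.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

my $x = 'a';
my @y = ('b', 'c');
my %z = (key => 'value');

# One scalar + two array elements + one key/value pair = five scalars.
print count_args($x, @y, %z), "\n";     # 5

# References are single scalars, so the structure survives the call.
print count_args($x, \@y, \%z), "\n";   # 3
```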

Whatever arguments are passed are available to the subroutine in the special array @_. The elements of @_ are aliases for the actual arguments; changing an element of @_ changes the corresponding argument.

Elements of @_ may be accessed by subscripting it in the usual way.

$_[0], $_[1]

However, the resulting code can be difficult to read, and the parameters have pass-by-reference semantics, which may be undesirable.

One common idiom is to assign @_ to a list of named variables.

my ($x, $y, $z) = @_;

This provides mnemonic parameter names and implements pass-by-value semantics. The my keyword indicates that the following variables are lexically scoped to the containing block.
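The difference between the two styles can be sketched with two hypothetical subroutines: one modifies its argument through @_, and the other works on a copy.

```perl
use strict;
use warnings;

# Modifies the caller's variable, because $_[0] aliases the argument.
sub double_in_place { $_[0] *= 2; return }

# Works on a lexical copy, leaving the caller's variable untouched.
sub double_copy {
    my ($n) = @_;
    $n *= 2;
    return $n;
}

my $value  = 21;
my $result = double_copy($value);
print "$value $result\n";   # 21 42

double_in_place($value);
print "$value\n";           # 42
```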

Another idiom is to shift parameters off of @_. This is especially common when the subroutine takes only one argument or for handling the $self argument in object-oriented modules.

my $x = shift;

Subroutines may assign @_ to a hash to simulate named arguments; this is recommended in Perl Best Practices for subroutines that are likely to ever have more than three parameters.[7]

sub function1 {
    my %args = @_;
    print "'x' argument was '$args{x}'\n";
}
function1( x => 23 );

Subroutines may return values.

return 42, $x, @y, %z;

If the subroutine does not exit via a return statement, it returns the value of the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.

The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.

sub list { (4, 5, 6) }
sub array { @x = (4, 5, 6); @x }

$x = list; # returns 6 - last element of list
$x = array; # returns 3 - number of elements in list
@x = list; # returns (4, 5, 6)
@x = array; # returns (4, 5, 6)

A subroutine can discover its calling context with the wantarray function.

sub either {
    return wantarray ? (1, 2) : 'Oranges';
}

$x = either; # returns "Oranges"
@x = either; # returns (1, 2)

Anonymous functions


Perl 5 supports anonymous functions,[8] as follows:

(sub { print "I got called\n" })->();         # 1. fully anonymous, called as created

my $squarer = sub { my $x = shift; $x * $x }; # 2. assigned to a variable

sub curry {
    my ($sub, @args) = @_;
    return sub { $sub->(@args, @_) };         # 3. as a return value of another function
}

# example of currying in Perl programming
sub sum { my $tot = 0; $tot += $_ for @_; $tot } # returns the sum of its arguments
my $curried = curry \&sum, 5, 7, 9;
print $curried->(1,2,3), "\n";    # prints 27 ( = 5 + 7 + 9 + 1 + 2 + 3 )

Other constructs take bare blocks as arguments. These serve a function similar to lambda functions of one parameter, but do not have the same parameter-passing convention as functions: @_ is not set, and the current element is available in $_ instead.

my @squares = map { $_ * $_ } 1..10;   # map and grep don't use the 'sub' keyword
my @square2 = map $_ * $_, 1..10;      # braces unneeded for one expression

my @bad_example = map { print for @_ } 1..10; # values not passed like normal Perl function

Regular expressions


The Perl language includes a specialized syntax for writing regular expressions (RE, or regexes), and the interpreter contains an engine for matching strings to regular expressions. The regular-expression engine uses a backtracking algorithm, extending its capabilities from simple pattern matching to string capture and substitution. The regular-expression engine is derived from regex written by Henry Spencer.

The Perl regular-expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl and has since grown to include far more features. Many other languages and applications, such as PHP, Ruby, Java, Microsoft's .NET Framework,[9] and the Apache HTTP server, have adopted Perl Compatible Regular Expressions in preference to POSIX regular expressions.

Regular-expression syntax is extremely compact, owing to history. The first regular-expression dialects were only slightly more expressive than globs, and the syntax was designed so that an expression would resemble the text that it matches.[citation needed] This meant using no more than a single punctuation character or a pair of delimiting characters to express the few supported assertions. Over time, the expressiveness of regular expressions grew tremendously, but the syntax design was never revised and continues to rely on punctuation. As a result, regular expressions can be cryptic and extremely dense.

Uses


The m// (match) operator introduces a regular-expression match. (If it is delimited by slashes, as in all of the examples here, the leading m may be omitted for brevity. If the m is present, other delimiters can be used in place of slashes.) In the simplest case, an expression such as

$x =~ /abc/;

evaluates to true if and only if the string $x matches the regular expression abc.

The s/// (substitute) operator, on the other hand, specifies a search-and-replace operation:

$x =~ s/abc/aBc/; # upcase the b

Another use of regular expressions is to specify delimiters for the split function:

@words = split /,/, $line;

The split function creates a list of the parts of the string that are separated by what matches the regular expression. In this example, a line is divided into a list of its own comma-separated parts, and this list is then assigned to the @words array.
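The three uses can be combined in a short sketch. The sample strings are invented; the braces in m{...} illustrate the alternative delimiters permitted when the leading m is written, which is convenient when the pattern itself contains slashes.

```perl
use strict;
use warnings;

my $line = 'alpha,beta,gamma';
my $path = '/usr/local/bin';

# Match: braces as delimiters avoid escaping the slashes in the pattern.
my $under_usr = $path =~ m{^/usr/} ? 'yes' : 'no';

# Substitute: copy the string first, then modify the copy.
(my $shouted = $line) =~ s/beta/BETA/;

# Split: the pattern describes the separator, not the fields.
my @words = split /,/, $line;

print "$under_usr\n";             # yes
print "$shouted\n";               # alpha,BETA,gamma
print scalar(@words), " words: @words\n";   # 3 words: alpha beta gamma
```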

Syntax


Modifiers


Perl regular expressions can take modifiers. These are single-letter suffixes that modify the meaning of the expression:

$x =~ /abc/i; # case-insensitive pattern match
$x =~ s/abc/aBc/g; # global search and replace

Because the compact syntax of regular expressions can make them dense and cryptic, the /x modifier was added in Perl to help programmers write more-legible regular expressions. It allows programmers to place whitespace and comments inside regular expressions:

$x =~ /
 a   # match 'a'
 .   # followed by any character
 c   # then followed by the 'c' character
 /x;

Capturing


Portions of a regular expression may be enclosed in parentheses; corresponding portions of a matching string are captured. Captured strings are assigned to the sequential built-in variables $1, $2, $3, …, and, in list context, a list of the captured strings is returned as the value of the match.

$x =~ /a(.)c/; # capture the character between 'a' and 'c'

Captured strings $1, $2, $3, … can be used later in the code.
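Both ways of retrieving captures are sketched below on an invented date string: through the numbered variables after a successful match, and through the list returned by a match in list context.

```perl
use strict;
use warnings;

my $date = '2011-01-08';

# Numbered variables are only meaningful after a successful match.
if ($date =~ /^(\d{4})-(\d{2})-(\d{2})$/) {
    print "year $1, month $2, day $3\n";   # year 2011, month 01, day 08
}

# In list context the match returns the captured strings directly.
my ($year, $month, $day) = $date =~ /^(\d{4})-(\d{2})-(\d{2})$/;
print "$day/$month/$year\n";               # 08/01/2011

# A failed match returns an empty list.
my @none = 'no date here' =~ /(\d{4})-(\d{2})/;
print scalar(@none), "\n";                 # 0
```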

Perl regular expressions also allow built-in or user-defined functions to apply to the captured match, by using the /e modifier:

$x = "Oranges";
$x =~ s/(ge)/uc($1)/e; # OranGEs
$x .= $1; # append the text captured by the previous statement to $x: OranGEsge

Objects


There are many ways to write object-oriented code in Perl. The most basic is using "blessed" references. This works by identifying a reference of any type as belonging to a given package, and the package provides the methods for the blessed reference. For example, a two-dimensional point could be defined this way:

sub Point::new {
    # Here, Point->new(4, 5) will result in $class being 'Point'.
    # It's a variable to support subclassing (see the perloop manpage).
    my ($class, $x, $y) = @_;
    bless [$x, $y], $class;  # Implicit return
}

sub Point::distance {
    my ($self, $from) = @_;
    my ($dx, $dy) = ($$self[0] - $$from[0], $$self[1] - $$from[1]);
    sqrt($dx * $dx + $dy * $dy);
}

This class can be used by invoking new() to construct instances, and invoking distance on those instances.

my $p1 = Point->new(3, 4);
my $p2 = Point->new(0, 0);
print $p1->distance($p2); # Prints 5
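Because the constructor blesses into $class rather than a hard-coded package name, it can be inherited. The following self-contained sketch repeats the Point class and adds a hypothetical subclass, NamedPoint, that inherits through @ISA.

```perl
use strict;
use warnings;

package Point;

sub new {
    my ($class, $x, $y) = @_;
    return bless [$x, $y], $class;
}

sub distance {
    my ($self, $from) = @_;
    my ($dx, $dy) = ($self->[0] - $from->[0], $self->[1] - $from->[1]);
    return sqrt($dx * $dx + $dy * $dy);
}

package NamedPoint;          # hypothetical subclass, for illustration only
our @ISA = ('Point');        # method lookup falls back to Point

sub describe {
    my ($self) = @_;
    return sprintf '%s(%s, %s)', ref($self), $self->[0], $self->[1];
}

package main;

my $p = NamedPoint->new(3, 4);            # inherited constructor; $class is 'NamedPoint'
print $p->describe, "\n";                 # NamedPoint(3, 4)
print $p->distance(Point->new(0, 0)), "\n";   # 5
```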

Many modern Perl applications use the Moose object system.[citation needed] Moose is built on top of Class::MOP, a meta-object protocol, providing complete introspection for all Moose-using classes. Classes can thus be queried about their attributes, parents, children, methods, etc. through a simple API.

Moose classes:

  • A class has zero or more attributes.
  • A class has zero or more methods.
  • A class has zero or more superclasses (aka parent classes). A class inherits from its superclass(es).
  • A class does zero or more roles, which make it possible to add pre-defined functionality to classes without subclassing.
  • A class has a constructor and a destructor.
  • A class has a metaclass.
  • A class has zero or more method modifiers. These modifiers can apply to its own methods, methods that are inherited from its ancestors, or methods that are provided by roles.

Moose roles:

  • A role is something that a class does, somewhat like mixins or interfaces in other object-oriented programming languages. Unlike mixins and interfaces, roles can be applied to individual object instances.
  • A role has zero or more attributes.
  • A role has zero or more methods.
  • A role has zero or more method modifiers.
  • A role has zero or more required methods.

Examples


An example of a class written using the MooseX::Declare[10] extension to Moose:

use MooseX::Declare;

class Point3D extends Point {
    has 'z' => (isa => 'Num', is => 'rw');

    after clear {
        $self->z(0);
    }
    method set_to (Num $x, Num $y, Num $z) {
        $self->x($x);
        $self->y($y);
        $self->z($z);
    }
}

This is a class named Point3D that extends another class named Point explained in Moose examples. It adds to its base class a new attribute z, redefines the method set_to and extends the method clear.

References

  1. ^ Orwant, Jon (2003). Games, diversions, and Perl culture: best of the Perl journal. ISBN 978-0-596-00312-8.
  2. ^ "CPAN front page". Retrieved 2011-12-09.
  3. ^ "Features". Perldoc. Perl.org. Retrieved 24 July 2017.
  4. ^ "perlrun". perldoc.perl.org - Official documentation for the Perl programming language. Retrieved 2011-01-08.
  5. ^ Wainwright, Peter (2005). Pro Perl. Pro to Expert Series. Apress. p. 64. ISBN 978-1-59059-438-4. Retrieved 2010-08-03. […] a string without quotes, known as a bareword string […]
  6. ^ using switch
  7. ^ Damian Conway, Perl Best Practices Archived 2011-09-18 at the Wayback Machine, p.182
  8. ^ "perlsub - Perl subroutines - Perldoc Browser". perldoc.perl.org. Retrieved 2020-11-24.
  9. ^ Microsoft Corp., ".NET Framework Regular Expressions", .NET Framework Developer's Guide, [1]
  10. ^ MooseX::Declare documentation