Perl virtual machine: Difference between revisions

From Wikipedia, the free encyclopedia
D3xter (talk | contribs)
Data stuctures: padlist. Variables.
m Disambiguating links to Hello World (disambiguation) (link changed to "Hello, World!" program) using DisamAssist.
 
(26 intermediate revisions by 22 users not shown)

Latest revision as of 09:32, 17 February 2024

The Perl virtual machine is a stack-based process virtual machine implemented as an opcode interpreter which runs previously compiled programs written in the Perl language. The opcode interpreter is a part of the Perl interpreter, which also contains a compiler (lexer, parser and optimizer) in one executable file, commonly /usr/bin/perl on various Unix-like systems or perl.exe on Microsoft Windows systems.

Implementation

Opcodes

The Perl compiler outputs a compiled program into memory as an internal structure which can be represented as a tree graph in which each node represents an opcode. Opcodes are represented internally by typedefs. Each opcode has next / other and first / sibling pointers, so the opcode tree can be drawn either as a basic OP tree starting from the root node or as a flat OP list in the order the opcodes would normally execute from the start node. The opcode tree can be mapped back to the source code, so it is possible to decompile it to high-level source code.[1]

Perl's opcode interpreter is implemented as a tree walker which travels the opcode tree in execution order from the start node, following the next or other pointers. Each opcode has a function pointer to a pp_opname function, e.g. the say opcode calls the pp_say function of the internal Perl API.
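
The walking scheme described above can be sketched as a toy model in Python. This is not Perl's actual C code: the Op class, the run function and the pp_const / pp_say helpers below are hypothetical stand-ins that only mirror the roles of the next pointer and the per-opcode function pointer. The model assumes that each op function returns the op to run next, which is what would let a branching op follow its other pointer instead.

```python
# Toy model of an opcode run loop (illustrative only, not Perl's real implementation).
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Op:
    name: str
    ppaddr: Callable                 # models the pointer to a pp_* function
    next: Optional["Op"] = None      # models the "next" pointer
    value: object = None             # payload for const-like ops


def pp_const(op, stack):
    stack.append(op.value)           # push the constant onto the argument stack
    return op.next


def pp_say(op, stack):
    text = stack.pop()
    stack.append(str(text) + "\n")   # the toy "say" leaves its output on the stack
    return op.next


def run(start):
    """Follow the chain of ops from the start op until no op is left."""
    stack = []
    op = start
    while op is not None:
        op = op.ppaddr(op, stack)    # each op function returns the op to run next
    return stack


# Build the two-op program "const -> say" and execute it.
say = Op("say", pp_say)
const = Op("const", pp_const, next=say, value="Hello, world!")
result = run(const)
```

In this sketch the loop itself knows nothing about individual opcodes; all behaviour lives in the functions that the ops point to.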

The phase of compiling a Perl program is hidden from the end user, but it can be exposed with the B Perl module[2] or other specialized modules, such as the B::Concise Perl module.[3]

An example of a simple compiled Hello world program dumped in execution order (with the B::Concise Perl module):

$ perl -MO=Concise,-exec -E 'say "Hello, world!"'
1  <0> enter
2  <;> nextstate(main 46 -e:1) v:%,{
3  <0> pushmark s
4  <$> const[PV "Hello, world!"] s
5  <@> say vK
6  <@> leave[1 ref] vKP/REFC

Some opcodes (entereval, dofile, require) call Perl compiler functions which in turn generate other opcodes in the same Perl virtual machine.

Variables

Perl variables can be global, dynamic (local keyword), or lexical (my and our keywords).

Global variables are accessible via the stash and the corresponding typeglob.

Local variables are the same as global variables, but a special opcode is generated to save their values on the savestack and restore them later.

Lexical variables are stored in the padlist.

Data structures

Perl VM data structures are represented internally by typedefs.

The internal data structures can be examined with the B Perl module[2] or other specialized tools like the Devel::Peek Perl module.[4]

data types

Perl has three typedefs that handle Perl's three main data types: Scalar Value (SV), Array Value (AV), Hash Value (HV). Perl uses a special typedef for the simple signed integer type (IV), unsigned integers (UV), floating point numbers (NV) and strings (PV).

Perl uses a reference count-driven garbage collection mechanism. SVs, AVs, or HVs start their life with a reference count of 1. If the reference count of a data value drops to 0, then it will be destroyed and its memory is made available for reuse.
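
The life cycle described above can be illustrated with a minimal Python sketch. The Value class and the inc / dec helpers are hypothetical names chosen for this example and do not correspond to Perl's internal API; setting the destroyed flag merely stands in for releasing memory.

```python
# Toy model of reference counting (illustrative only).
class Value:
    def __init__(self, payload):
        self.payload = payload
        self.refcnt = 1            # a new value starts life with a count of 1
        self.destroyed = False


def inc(value):
    """Record one more reference to the value."""
    value.refcnt += 1
    return value


def dec(value):
    """Drop one reference; destroy the value when none are left."""
    value.refcnt -= 1
    if value.refcnt == 0:
        value.destroyed = True     # stands in for freeing the memory
        value.payload = None


sv = Value("hello")
inc(sv)                            # a second reference appears, count is 2
dec(sv)                            # one reference goes away, count is 1
alive_after_first_dec = not sv.destroyed
dec(sv)                            # the last reference goes away, count is 0
```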

Other typedefs are the Glob Value (GV), which contains named references to various objects; the Code Value (CV), which contains a reference to a Perl subroutine; the I/O Handler (IO); a reference to a regular expression (REGEXP; RV in Perl before 5.11); a reference to a compiled format for output records (FM); and the simple reference (RV), a special type of scalar that points to other data types.

stash

A stash is a special Hash Value: a hash that contains all variables that are defined within a package. Each value in this hash table is a Glob Value (GV).
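
As a rough illustration, a stash can be modelled as a dictionary from names to glob objects, each of which holds one slot per kind of variable. The Glob class, the fetch_glob helper and the choice of slots shown here are assumptions made for this sketch, not Perl's internal layout.

```python
# Toy model of stashes and globs (illustrative only).
class Glob:
    def __init__(self, name):
        self.name = name
        # One slot per kind of thing that can live under the same name.
        self.slots = {"SCALAR": None, "ARRAY": None, "HASH": None, "CODE": None}


stashes = {}   # package name -> stash (a dict mapping names to Glob objects)


def fetch_glob(package, name):
    """Return the glob for a name in a package, creating it on first use."""
    stash = stashes.setdefault(package, {})
    if name not in stash:
        stash[name] = Glob(name)
    return stash[name]


# A scalar and an array named "x" in package "main" share a single glob.
fetch_glob("main", "x").slots["SCALAR"] = 42
fetch_glob("main", "x").slots["ARRAY"] = [1, 2, 3]
```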

padlist

A padlist is a special Array Value: an array of arrays. Its 0th element points to an AV containing all lexical variable names (with prefix symbols) used within that subroutine. The padlist's first element points to a scratchpad AV, whose elements contain the values corresponding to the lexical variables named in the 0th row. Further elements of the padlist are created when the subroutine recurses or a new thread is created.
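
The layout described above can be sketched with nested Python lists. The helper names (new_padlist, pad_for_depth, lookup) are hypothetical, and the sketch covers only the recursion case, not threads.

```python
# Toy model of a padlist (illustrative only).
def new_padlist(names):
    """Row 0 holds the variable names, row 1 the values for the first call."""
    return [list(names), [None] * len(names)]


def pad_for_depth(padlist, depth):
    """Return the scratchpad for a recursion depth, creating rows as needed."""
    while len(padlist) <= depth:
        padlist.append([None] * len(padlist[0]))
    return padlist[depth]


def lookup(padlist, depth, name):
    """Find a variable's slot index by name in row 0, then read its value."""
    index = padlist[0].index(name)
    return pad_for_depth(padlist, depth)[index]


padlist = new_padlist(["$n", "@items"])
pad_for_depth(padlist, 1)[0] = 3   # $n in the outermost call
pad_for_depth(padlist, 2)[0] = 2   # $n in a recursive call gets its own slot
```

Because each recursion depth has its own row, the recursive call does not overwrite the outer call's $n.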

Stacks

Perl has a number of stacks to store things it is working on.

Argument stack

Arguments are passed to an opcode and returned from an opcode using the argument stack. The typical way to handle arguments is to pop them off the stack, and then push the result back onto the stack.

Mark stack

This stack saves bookmarks to locations in the argument stack usable by each function, so a function does not necessarily get the whole argument stack to itself.
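
How the argument stack and the mark stack cooperate can be sketched in Python as follows. The function names (pushmark, push, list_op) are chosen for this illustration only; the point is that an op consumes just the items above its own mark and leaves everything below untouched.

```python
# Toy model of the argument stack and the mark stack (illustrative only).
arg_stack = []
mark_stack = []


def pushmark():
    mark_stack.append(len(arg_stack))   # remember where this op's arguments begin


def push(value):
    arg_stack.append(value)


def list_op(func):
    """Pop the arguments above the latest mark and push a single result."""
    mark = mark_stack.pop()
    args = arg_stack[mark:]             # only items above the mark belong to this op
    del arg_stack[mark:]
    arg_stack.append(func(args))


push("outer")                           # left on the stack by an enclosing expression
pushmark()
push(1)
push(2)
push(3)
list_op(sum)                            # consumes 1, 2, 3 and pushes 6
```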

Save stack

This stack is used for saving and restoring values of dynamically scoped local variables.
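
A minimal Python sketch of this save-and-restore behaviour is given below. The names globals_table, enter_scope, local and leave_scope are hypothetical and model only the idea, under the assumption that leaving a scope unwinds the save stack back to the position it had when the scope was entered.

```python
# Toy model of the save stack used for dynamically scoped variables (illustrative only).
globals_table = {"x": 1}
save_stack = []


def enter_scope():
    return len(save_stack)              # remember how deep the save stack was


def local(name, new_value):
    save_stack.append((name, globals_table[name]))   # save the old value first
    globals_table[name] = new_value


def leave_scope(base):
    """Restore saved values in reverse order until the stack is back at base."""
    while len(save_stack) > base:
        name, old_value = save_stack.pop()
        globals_table[name] = old_value


base = enter_scope()
local("x", 2)
inside = globals_table["x"]             # value seen while the scope is active
leave_scope(base)
outside = globals_table["x"]            # value seen after the scope has ended
```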

Scope stack

This stack stores information about the current scope and is used only for debugging purposes.

Other implementations

There is no standardization for the Perl language and Perl virtual machine. The internal API is considered non-stable and changes from version to version. The Perl virtual machine is tied closely to the compiler.

The best-known and most stable implementation is the B::C Perl module,[5] which translates the opcode tree to a representation in the C programming language and adds its own tree walker.

Another implementation is the Acme::Perl::VM Perl module,[6] which is written in pure Perl but is still tied to the original Perl virtual machine via the B:: modules.

See also

References

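  1. "B::Deparse - Perl compiler backend to produce perl code". http://perldoc.perl.org/B/Deparse.html
  2. "B - The Perl Compiler Backend". https://metacpan.org/module/B
  3. "B::Concise - Walk Perl syntax tree, printing concise info about ops". http://perldoc.perl.org/B/Concise.html
  4. "Devel::Peek - A data debugging tool for the XS programmer". https://metacpan.org/module/Devel::Peek
  5. "B::C - Perl compiler's C backend". https://metacpan.org/module/B::C
  6. "Acme::Perl::VM - A Perl5 Virtual Machine in Pure Perl (APVM)". https://metacpan.org/module/Acme::Perl::VM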