Chapter 5. The SCI virtual machine

Table of Contents
Introduction
Interpreter initialization and the main execution loop
The SCI Heap
The Sierra PMachine
Kernel functions

Introduction

Script resources

Like any processor, the SCI virtual machine is virtually useless without code to execute. This code is provided by script resources, which constitute the logic behind any SCI game.

Before the VM can operate on script resources, they have to be loaded onto the heap. The heap is the only memory space that the VM can work on directly (with some restrictions); all other memory spaces have to be accessed, implicitly or explicitly, through kernel calls. The heap also contains a stack, which is heavily used by SCI bytecode.

Each script resource may contain one or more blocks of the following types:

Type 1: Object
Type 2: Code
Type 3: Synonym word lists
Type 4: Said specs
Type 5: Strings
Type 6: Class
Type 7: Exports
Type 8: Relocation table
Type 9: Preload text (a flag, rather than a real section)
Type 10: Local variables

Standard SCI0 scripts (roughly, those of SCI0 interpreters after version 0.000.396) consist of a sequence of blocks, each starting with a four-byte header followed by the block data:

[00][01]: Block type as LE 16 bit value, or 0 to terminate script resource
[02][03]: Block size as LE 16 bit value; includes header size
[04]...: Data
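
The block layout above can be walked with a few lines of code. The following is a minimal sketch, not FreeSCI's implementation; the names BLOCK_TYPES and iter_blocks are made up for illustration, and the input is assumed to be the raw script data starting at the first block header.

```python
import struct

# Block type names as listed above; the dictionary itself is illustrative.
BLOCK_TYPES = {
    1: "Object", 2: "Code", 3: "Synonym word lists", 4: "Said specs",
    5: "Strings", 6: "Class", 7: "Exports", 8: "Relocation table",
    9: "Preload text", 10: "Local variables",
}


def iter_blocks(script: bytes):
    """Yield (block_type, payload) pairs until the terminating type 0."""
    pos = 0
    while pos + 2 <= len(script):
        (block_type,) = struct.unpack_from("<H", script, pos)
        if block_type == 0:
            break  # a zero block type terminates the script resource
        if pos + 4 > len(script):
            raise ValueError(f"truncated block header at offset {pos}")
        (block_size,) = struct.unpack_from("<H", script, pos + 2)
        # The size includes the four header bytes, so it can never be < 4.
        if block_size < 4 or pos + block_size > len(script):
            raise ValueError(f"bad block size {block_size} at offset {pos}")
        yield block_type, script[pos + 4 : pos + block_size]
        pos += block_size
```

Because the block size includes the header, advancing by the size lands directly on the next block's header.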

The code blocks contain the SCI bytecode that actually gets executed. The export block, of which there may be at most one per script, contains script-relative pointers to exported functions, which can be called by the SCI operations calle and callb. The local variables block, which stores one of the four variable types, is used to share variables among the objects and classes of one script.

But the most important script members are Objects and Classes. In the usual OOP sense, Classes are object prototypes, and Objects are instantiated Classes. However, unlike most OOP languages, SCI treats base classes very similarly to objects, so that they may actually be called by SCI bytecode. Therefore, they also have their own space for selectors (see below). Also, each object or class knows which class it inherits from and, in the case of objects, which class it was instantiated from.

Note that all script segments are optional and 16-bit aligned. They are described in more detail below.

Object segments

Objects look like this (LE 16 bit values):

[00][01]: Magic number 0x1234
[02][03]: Local variable offset (filled in at run-time)
[04][05]: Offset of the function selector list, relative to its own position
[06][07]: Number of variable selectors (= #vs)
[08][09]: The 'species' selector
[0a][0b]: The 'superClass' selector
[0c][0d]: The '--info--' selector
[0e][0f]: The 'name' selector (object/class name)
[10]...: (#vs-4) more variable selectors
[08+ #vs*2][09+ #vs*2]: Number of function selectors (= #fs)
[0a+ #vs*2]...: Selector IDs for the functions
[0a+ #vs*2 + #fs*2][0b+ #vs*2 + #fs*2]: Zero
[0c+ #vs*2 + #fs*2]...: Function selector code pointers
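
As an illustration, the object layout can be decoded as follows. This is a hedged sketch rather than FreeSCI code: parse_object and the dictionary keys are hypothetical names, the input is the block payload with the four-byte block header already stripped, and the zero word is assumed to follow directly after the last function selector ID.

```python
import struct


def parse_object(data: bytes) -> dict:
    """Decode an object block payload (block header already removed)."""

    def word(offset: int) -> int:
        return struct.unpack_from("<H", data, offset)[0]

    if word(0x00) != 0x1234:
        raise ValueError("missing object magic number 0x1234")
    num_vars = word(0x06)
    # Variable selector values start at 0x08; the first four are
    # species, superClass, --info-- and name.
    variables = [word(0x08 + 2 * i) for i in range(num_vars)]
    pos = 0x08 + 2 * num_vars
    num_funcs = word(pos)
    func_ids = [word(pos + 2 + 2 * i) for i in range(num_funcs)]
    # Assumption: exactly one zero word separates IDs and code pointers.
    ptr_base = pos + 2 + 2 * num_funcs + 2
    func_ptrs = [word(ptr_base + 2 * i) for i in range(num_funcs)]
    return {
        "species": variables[0],
        "superClass": variables[1],
        "info": variables[2],
        "name": variables[3],
        "variables": variables,
        "methods": dict(zip(func_ids, func_ptrs)),
    }
```

The "methods" entry maps each function selector ID to its script-relative code pointer, which is the pairing a send operation ultimately needs.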

For objects, the variable selectors are simply the values for the selector IDs specified in their species class. The species is stored either as an offset (in memory) or as a class ID (in the script resource); the same holds for the species' superclass (the superClass selector). The '--info--' selector typically has one of the following values (although this does not appear to be relevant for SCI):

0x0000: Normal (static) object
0x0001: Clone
0x8000: Class

Other values are used, but do not appear to be relevant[1].
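
A small lookup table is enough to name the documented '--info--' values. The sketch below is illustrative only; INFO_VALUES and describe_info are hypothetical names, and any value not listed above is simply reported in hexadecimal.

```python
INFO_VALUES = {
    0x0000: "static object",
    0x0001: "clone",
    0x8000: "class",
}


def describe_info(info: int) -> str:
    """Name the documented '--info--' values; report anything else raw."""
    return INFO_VALUES.get(info, f"other (0x{info:04x})")
```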

Class segments

Classes look similar to objects:

[00][01]: Magic number 0x1234
[02][03]: Local variable offset (filled in at run-time)
[04][05]: Offset of the function selector list, relative to its own position
[06][07]: Number of variable selectors (= #vs)
[08][09]: The 'species' selector
[0a][0b]: The 'superClass' selector
[0c][0d]: The '--info--' selector
[0e][0f]: The 'name' selector (object/class name)
[10]...: (#vs-4) more variable selectors
[08+ #vs*2][09+ #vs*2]: Selector ID of the first varselector (0)
[0a+ #vs*2]...: Selector ID of the second etc. varselectors
[08+ #vs*4][09+ #vs*4]: Number of function selectors (#fs)
[0a+ #vs*4]...: Function selector code pointers
[0a+ #vs*4 + #fs*2][0b+ #vs*4 + #fs*2]: 0
[0c+ #vs*4 + #fs*2]...: Selector ID of the first etc. funcselectors

Simply put, they look like objects with each selector section followed by a list of selector IDs.
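
The variable selector part of the class layout pairs each value with its selector ID. A minimal sketch of that pairing follows, with the block header assumed to be stripped already; parse_class_variables is a hypothetical helper name, and the function selector part is deliberately left out.

```python
import struct


def parse_class_variables(data: bytes) -> dict:
    """Map variable selector IDs to their values for one class block."""

    def word(offset: int) -> int:
        return struct.unpack_from("<H", data, offset)[0]

    if word(0x00) != 0x1234:
        raise ValueError("missing class magic number 0x1234")
    num_vars = word(0x06)
    values_base = 0x08                  # selector values come first
    ids_base = 0x08 + 2 * num_vars      # followed by the selector IDs
    values = [word(values_base + 2 * i) for i in range(num_vars)]
    ids = [word(ids_base + 2 * i) for i in range(num_vars)]
    return dict(zip(ids, values))
```

An object block lacks the ID list, so the same mapping for an object has to borrow the IDs from its species class.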

Selectors

Selectors are very important in SCI. They can be either methods or object/class-relative variables, and this makes the interpretation of SCI operations like send a bit tricky.

Each class comes with two two-dimensional tables. The first table contains selector values and selector indices[4] for each variable selector. The second table contains selector indices and script-relative method offsets. Objects look nearly identical, but they do not contain the list of selector indices for variable selectors, since those can be looked up in the class they were instantiated from (their "species", which happens to be one of the variable selectors).

Whenever a selector is sent to an object, the engine has to determine the right action to take. FreeSCI first determines whether the selector is a variable selector by looking for it in the list of variable selector indices of the species class of the object that the "send" was addressed to (classes use their own class number as their species class) [5]. If it is, the selector value is either read (if no parameter was provided to the send call) or set (if one parameter was provided). If the selector is not among the variable selectors of the object, the object's methods are checked for this selector index. If they do not contain it either, FreeSCI recurses into the method selectors of the object's superclasses. If it finds the selector index there, it calls the heap address corresponding to that index.
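
The lookup order described above can be sketched as follows. This is a simplified model under stated assumptions, not FreeSCI's data structures: SciObject, lookup_selector and send are hypothetical names, objects and classes are plain Python objects rather than heap addresses, and a method "call" is represented by returning the code address instead of transferring control.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SciObject:
    var_values: list                               # one value per variable selector
    methods: dict = field(default_factory=dict)    # selector index -> code address
    var_ids: list = field(default_factory=list)    # only present on classes
    species: Optional["SciObject"] = None          # None: a class, its own species
    super_class: Optional["SciObject"] = None


def lookup_selector(obj: SciObject, selector: int):
    """Return ("variable", slot) or ("method", address), or raise LookupError."""
    species = obj.species if obj.species is not None else obj
    # Step 1: variable selectors are listed in the species class.
    if selector in species.var_ids:
        return "variable", species.var_ids.index(selector)
    # Step 2: the object's own methods, then those of its superclasses.
    current = obj
    while current is not None:
        if selector in current.methods:
            return "method", current.methods[selector]
        current = current.super_class
    raise LookupError(f"selector {selector} not found")


def send(obj: SciObject, selector: int, *args):
    kind, where = lookup_selector(obj, selector)
    if kind == "variable":
        if not args:
            return obj.var_values[where]       # no parameter: read
        if len(args) == 1:
            obj.var_values[where] = args[0]    # one parameter: set
            return None
        raise ValueError("variable selectors take at most one parameter")
    return "call", where                       # a real VM would jump here
```

Variable selectors are checked before methods, so a selector that appears in both tables is always treated as a variable in this model.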

Function invocation

SCI provides three distinct ways of invoking a function[6]:

Calling exported functions (calle, callb)
Calling selector methods (send, self, super)
Calling PC-relative addresses (call)

Exported functions are called by providing a script number and an exported function number (which is then looked up in the script's Type 7 block). They use the object they were called from to look up local variables and selectors for self and super.

Selector methods are called by providing an object and a selector index. The selector index gets looked up in the object's selector tables, and, if it is used for a method, this method gets invoked. The provided object is used for local references.

PC-relative calls only make sense inside scripts, since they jump to a position relative to the call opcode. The calling object is used for local references.
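
For the two address-based call forms, the target computation can be sketched as below. Everything here is an assumption made for illustration: resolve_export and resolve_relative are hypothetical helpers, the export table is modelled as a list of script-relative offsets, heap addresses are taken to be 16 bits wide, and the exact base address of a PC-relative call is left to the caller.

```python
def resolve_export(script_base: int, exports: list, export_nr: int) -> int:
    """Heap address of an exported function, as needed by calle/callb."""
    if not 0 <= export_nr < len(exports):
        raise IndexError(f"script has no export {export_nr}")
    # Export entries are script-relative, so add the script's heap position.
    return (script_base + exports[export_nr]) & 0xFFFF


def resolve_relative(base_pc: int, offset: int) -> int:
    """Target of a PC-relative call; offset may be negative."""
    return (base_pc + offset) & 0xFFFF
```

For example, resolve_export(0x1000, [0x20, 0x84], 1) yields 0x1084 for a script loaded at heap address 0x1000.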

Notes

[1]

See SQ3's inventory objects for an example

[2]

Thanks to Francois Boyer for this information

[3]

This is ignored by FreeSCI at the moment, since all resources are present in memory all the time.

[4]

These can be used as an index into vocab.997, where the selector names are stored as strings.

[5]

In practice, send looks up the heap position of the requested class in a global class table.

[6]

Of course, "manual" invocation (using push and jump operations) could also be used, but there are no special provisions for it, and it does not appear to be used in the existing SCI bytecode.

[7]

Obviously, SCI uses a call-by-value model for primitives and call-by-reference for objects.