Earlier today, some 19 hours ago at the time of writing, Eric Oakford opened an issue on the SCI Companion GitHub repo. Eric is working on a big decompilation project, taking mostly demo versions of SCI games and trying to wrangle them into a recompilable state.
In the issue he’d opened, Eric described how the demo version of Leisure Suit Larry 3 didn’t decompile right, while the actual LSL3 worked fine. Here’s an example of the problem:
One of the first things a
Room instance would normally do in its
init method is to call
(super init:), letting the
Room class itself do its setup before anything specific to that room is done, like setting up actors, scripts, features, and walk polygons. In PMachine byte code that statement looks like this:
39 57 pushi 87 // the "init" selector 76 push0 // init takes no parameters 57 36 04 super Rm 4 // four bytes (two words) worth of stack to send
Now, the SCI PMachine is a stack machine, with two parts that are important to know about: the stack (because duh) and the accumulator. You can load numbers onto the accumulator, push them directly onto the stack, push the accumulator onto the stack, duplicate the top item, etcetera. Every value on that stack is a 16-bit number, as is the accumulator. Pointers? Just 16-bit numbers. Characters returned by
StrAt? 16-bit numbers, even if it’s just ASCII codes. Properties and methods to invoke in a send? Yup.
There’s a separate table, vocabulary 997, that lists the name of every selector — every method and property of a class or instance. And that’s where it says that selector #87 is
send are all three sides of the same weirdly-shaped coin, the decompiler can tell that there should be a
(super ...) command in the output, four bytes of stack space back. Since it’s not actually running anything it has no actual stack, but it can look back to find two push operations that’ll fit the bill just as well. It can tell that the first value should be a selector, so whatever is being pushed is taken to be one, which is correct — 87 is the
init selector. Then the next value is the amount of arguments given to
init, which is zero.
But everything went wrong in the demo. This is how
rm200, the overlook with the binoculars and memorial plaque, starts in the demo:
35 57 ldi 87 36 push 39 00 pushi 0 57 36 04 super Rm 4
The actual values on the stack stay the same — 87 0 — but the way they’re put on there is subtly different, and that tripped SCI Companion up.
(super init:), it decompiled the above as
In fact, all selectors in code blocks were
species. Six hours ago at the time of writing, I figured out the problem.
Now, the SCI Companion code is super hairy and I really couldn’t do it justice by just including snippets here but the gist of it?
Every operation may have a couple operands. One method in the decompiler returns what the first operand for a given operation in the byte code may be. For
pushi, that’d be the immediate value to push. For
push2, that’d be zero, one, and two. For
ldi it’s exactly the same as
pushi, just that the operand is to be put in the accumulator, not the stack. By default, this method just assumes zero.
push has no operands at all? Why would it? It pushes the accumulator’s value, after all. The only difference between
pushi 87 and
ldi 87, push is that in the latter case, the accumulator is also 87. The accumulator doesn’t matter to a send, only the contents of the stack. And
pushi 0 is just
push0 with one extra byte. And that makes these two snippets effectively the same with regards to actual execution in an SCI interpreter.
So what happens when the decompiler sees the LSL3 demo’s scripts, is that it looks back for two pushes, as it should. It finds the first,
push, which should be a selector. But the helper method that returns the value being pushed can’t return any operand — there are no operands here! So it returns the default, zero. Some confusion about it possibly being a variable later, it decides that it must be
species. And then it does this for all the sends in the demo, because they all push their values the same way.
The fix came to me when I saw the case for the
dup operation in that very same method that’s supposed to return the value that’d be pushed. It too takes no operands, yet does return a value that’s only zero if it should be. Turns out it scans back a bit, looking at the previous operation in the bytecode stream, and steals its value by calling the same helper method again, but aimed at the previous operation. The fix then is to make
push also steal its predecessor’s value. I did decide to special-case things for now, though. It’ll only do the stealy thing if it’s an
push pair, like in the LSL3 demo.
But it does work.
Addendum: You might wonder why the incorrect decompilation was
(super species?) with a question mark instead of a colon. The decompiler and interpreter alike can tell which selectors on a given class are supposed to be properties, and which are methods. When invoking a method or setting a property, the standard is to use a colon, like in
(theSong number: 4 play:), which is a property set followed by an arg-less method, and to use a question mark for property gets, like in
(= theX (gEgo x?)). And since
species is a property and there was no argument, it was taken to be a property get.