
Rules for UTF-8 in SCI11+

  1. It won’t work in ScummVM yet, as nothing uses it yet so I see no reason to add it. Most of the rest of SCI11+’s gimmicks do though.
  2. The internal “draw a string” function, used to write literally anything on to the screen, is where the magic happens: if the current port’s current font has more than 256 glyphs in it, the input string is interpreted as UTF-8. If it does not, things work exactly as usual.
  3. Because combining characters and glyph substitution are not supported, and general punctuation like “” is all the way in the 2000-2044 range, the General Punctuation block’s glyphs take the place of Combining Diacritical Marks at 0300-0344.
  4. Similarly, CJK Symbols, Hiragana, and Katakana are moved from 3000-30FF to 0200-02FF, where some Latin Extended-B, IPA Extensions, and Spacing Modifiers should go.
  5. Those last two points apply to the font data, not the actual text.
  6. The new kernel functions UTF8to16 and UTF16to8 will always consider their inputs to be in Unicode, no matter what the current port’s current font says. Unless you built an SCI11+ with UTF-8 support disabled, in which case none of the above applies and all these two functions do is turn 8-bit values into 16-bit.
  7. The kernel function to turn a string lower or upper case, StrCase, unlike the two I just described, will check the current font, same as the “draw a string” function, and act like it used to if the font has 256 glyphs or fewer.
  8. The functions to get the lower or upper case version of a character that StrCase ends up using, tolower and toupper, have been extended to cover the full 256-character range. Several maps are included and one can be chosen at build time. We have maps for code page 437, Win-1252, ISO 8859-1, and a fair bit of Unicode.
  9. In general, SCI11+ can be considered to use Unicode 1.1 on account of SCI 1.001.100 dating from 1993, going by the version numbers and release dates for Freddy Pharkas (1.001.095) and Leisure Suit Larry 6 (1.001.115).
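
Rules 2 through 5 can be sketched in a few lines of Python. This is only a toy model of the decision described above, not the actual SCI11+ code: the function names `glyph_index` and `string_to_glyphs` are made up for this example, and the idea that a non-Unicode font simply uses each byte as a glyph number is an assumption.

```python
def glyph_index(cp: int) -> int:
    """Map a Unicode code point to its slot in the font data (hypothetical helper)."""
    if 0x2000 <= cp <= 0x2044:
        # General Punctuation glyphs live where Combining Diacritical Marks would be.
        return cp - 0x2000 + 0x0300
    if 0x3000 <= cp <= 0x30FF:
        # CJK Symbols, Hiragana, and Katakana glyphs live at 0200-02FF.
        return cp - 0x3000 + 0x0200
    return cp


def string_to_glyphs(raw: bytes, font_glyph_count: int) -> list:
    """Decide how to read the string based on the current font's size."""
    if font_glyph_count > 256:
        # Big font: the bytes are UTF-8, and only the font lookup is remapped.
        return [glyph_index(ord(ch)) for ch in raw.decode("utf-8")]
    # Small font: assumed here to be one byte per glyph, as usual.
    return list(raw)
```

Note that the text itself stays proper Unicode; only the lookup into the font is shifted.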

Sluiceboxes and SetPorts

Today I got a delightful (and long) email from sluicebox, of the ScummVM SCI team. He wrote about a lot of things but one thing stood out and he’s right, I should write about it.

Remember when I fixed the imitation AGI windows in Space Quest 4? There’s something very strange going on there that sluicebox pointed out in the email.

If you’ll remember:

(method (open &tmp port temp1)
  ; temp0 was unused so we're taking it for proper SetPorting.
  (= color gColor)
  (= back gBack)
  ; Set our type to ONLY wCustom, not wCustom|wNoSave, and open.
  (= type 128)
  (super open:)
  ; Nothing will have appeared because wCustom don't draw anything, but a port has been set up!
  ; Switch to drawing on the whole screen but also *save the window's port*.
  (= port (SetPort 0))
  (= temp1 1)
  ; ...
  (Graph grUPDATE_BOX lsTop lsLeft lsBottom lsRight 1)
  ; Reset to the window's port.
  (SetPort port)

But if you look at this GitHub commit from ScummVM you’ll see this interesting description:

SSCI doesn’t return zero; it doesn’t return anything. This shouldn’t affect any games since no scripts should depend on a non-existent return value, but this discrepancy came up while investigating a fan script that accidentally relies on this.

So I checked the leaked source code that I made SCI11+ from.

global KERNEL(SetPort)
	if (argCount >= 6)
		picWind->port.portRect.top = arg(1);
		picWind->port.portRect.left = arg(2);
		picWind->port.portRect.bottom = arg(3);
		picWind->port.portRect.right = arg(4);
		picWind->port.origin.v = arg(5);
		picWind->port.origin.h = arg(6);
		if (argCount >= 7)
		if (arg(1))
			if ((arg(1)) == -1)

No return value, which is obvious really because the KERNEL define expands to a void function. Return values are instead handled by setting the acc global variable. So let’s dig a little deeper.

RSetPort proc	pPtr:word
	mov	ax, pPtr
	mov	rThePort, ax
RSetPort endp

Nothing. It sets the rThePort global and that’s all. There’s an RGetPort function right above that does the opposite, but nothing in the kernel function calls that.

Looking back at my description of BorderWindows, there’s an important difference:

(= oldPort (GetPort))
(SetPort 0)
(Graph grUPDATE_BOX lsTop lsLeft lsBottom lsRight VISUAL)
(SetPort oldPort)

It’s very interesting indeed how this happened to Just Work. Even so, I should probably go back and correct that SQ4 script.
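
For the curious, here’s a toy model in Python of why `(= port (SetPort 0))` can’t be trusted. It is a sketch under assumptions, not the interpreter’s code: the `ToyVM` class and its method names are invented, and the only facts taken from above are that kernel calls report results through acc and that SetPort never writes to it.

```python
class ToyVM:
    def __init__(self):
        self.acc = 0        # accumulator; kernel calls return values through this
        self.the_port = 0   # stand-in for the rThePort global

    def k_get_port(self):
        # Assumed behaviour: GetPort reports the current port via acc.
        self.acc = self.the_port

    def k_set_port(self, port):
        # SetPort only changes the port; acc is left untouched.
        self.the_port = port


vm = ToyVM()
vm.the_port = 0x1234   # pretend a window just opened this port
vm.acc = 0xBEEF        # leftover from some earlier expression

# (= port (SetPort 0)): the assignment stores whatever acc happened to hold.
vm.k_set_port(0)
broken_port = vm.acc

# (= oldPort (GetPort)) followed by (SetPort 0): the real port is saved first.
vm.the_port = 0x1234
vm.k_get_port()
old_port = vm.acc
vm.k_set_port(0)
```

In this model `broken_port` ends up as the stale value and `old_port` as the actual port, which is the difference between the SQ4 script and BorderWindow.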


Pattern pen implementation differences

While looking into something unrelated in Space Quest 3, I noticed that the dirt on the right of the starting screen was drawn differently between DOS SCI and ScummVM. Today I looked into it a little closer, comparing SCI proper, ScummVM, SCI Companion, and SCI Viewer.

Damn, that’s some really tiny differences that you’re not gonna spot just like this. But here they are:

  • In most of them, the mound on the right looks like this:
  • Except in ScummVM, where it looks like this: (and now you know why I looked into this)
  • Below the column in the middle looks like this in SCI and SV:
  • But it looks more like this in SCI Companion and ScummVM alike:
  • The heap on the left is also mildly affected, looking like this in SCI and SV:
  • But it looks like this in SCI Companion and ScummVM:
  • And finally, SV, renowned for being Very Good At This, breaks the one rule — you don’t get to draw white on non-white:

So yeah, a slight difference in where a window border is drawn is the least of your problems.

Update: ScummVM had its pattern table corrected this week. Guess I’ll have to check out the latest nightly, huh? And yes, it does match SCI proper now. Good job everyone!

Bonus update: sluicebox suggested comparing against SCI Studio. Here you go, friend: the mound on the right looks like this in SCI Studio, a distinctive variation, and the bit under the pillar and to the left matched SCI Companion and ScummVM (past tense now).


String literals

In various programming languages, different quotation marks and such can mean different things. In C/C++ for example, "this" is considered a plain string literal. It can contain various escape codes (\x69, \n, et al), and escaped double quotes (\"), but not raw newlines. A single-quoted literal is not a string at all, but a character literal. In C# meanwhile we have the plain double-quoted string and single-quoted character but also @"this", a variation that does allow raw newlines at the cost of not allowing escapes. In PHP meanwhile, we have double-quoted strings that, in contrast to C, can have raw newlines but also have variable interpolation — "Hi $name" will appear as Hi Mark, assuming that is that variable’s value at the time. Single-quoted strings in PHP don’t do escape sequences or interpolation, and then there’s “heredoc” strings.

SCI also has different types of string literals. Or rather, had, depending on which version you targeted. Double-quoted strings could have escape codes (\42, note the lack of a letter, and of course \n) and raw newlines, but whitespace was folded away on compilation so raw newlines and tabs were entirely for code readability’s sake, requiring a \n at the end of each line that had to have a line break. You could also use curly braces for strings, {like this}, that had exactly the same rules and limitations as double-quotes, except for one difference in storage.

Any string literal in curly braces would be stored as-is in the script resource, while double-quote strings would be stored in a matching text resource and replaced in the compiled code with a look-up key:

(Print "This is an example" #title {Kawa says})

(Yes, I’m aware that the syntax highlighter doesn’t pick up on the braces.)

Assuming this is script #42 just as an example, and this is the first place a double-quoted string appears, the above would be transformed like so:

(Print 42 0 #title {Kawa says})

The original string will be stored in a separate text resource with the same number as the script. This helps cut down memory use.
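
As a rough illustration of that transformation, here’s a Python sketch. It is not how the real compiler is written; the function name `extract_far_text` and the regular expression are assumptions made for this example, and it simply numbers the double-quoted strings in order of appearance.

```python
import re

# A double-quoted literal, allowing backslash escapes inside it.
STRING_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_far_text(source: str, script_number: int):
    """Replace each "string" with a 'module index' pair and collect the text."""
    texts = []

    def replace(match):
        texts.append(match.group(1))
        return f"{script_number} {len(texts) - 1}"

    return STRING_RE.sub(replace, source), texts


code, text_resource = extract_far_text(
    '(Print "This is an example" #title {Kawa says})', 42
)
```

The curly-brace string is left alone, so it stays in the script itself, while `text_resource` holds what would go into text resource 42.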

(Bonus banter: there’s a bug in the original SCI interpreters that was introduced when they added the Message resource format involving hexadecimal numbers where they accidentally used "01234567890ABCDEF", with an extra zero. This messes up any attempt to use a good third of the character set in a message resource, but not an inline string literal, so having \0E in a string literal will produce the intended character while the same thing in a message will produce ¤ instead. In SCI11+, this has been corrected to just "0123456789ABCDEF".)
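
Here’s a small Python sketch of how an extra zero in the digit table shifts things. It assumes the interpreter converts each hex digit by looking up its position in that string, which is my guess at the mechanism; `parse_hex_pair` is a made-up name.

```python
GOOD_DIGITS = "0123456789ABCDEF"
BAD_DIGITS = "01234567890ABCDEF"   # the extra zero after the 9


def parse_hex_pair(pair: str, digits: str) -> int:
    """Turn two hex digits into a value by position in the digit table."""
    high = digits.index(pair[0].upper())
    low = digits.index(pair[1].upper())
    return high * 16 + low


intended = parse_hex_pair("0E", GOOD_DIGITS)
actual = parse_hex_pair("0E", BAD_DIGITS)
```

With the bad table every letter digit lands one position too far, so `0E` comes out as 0x0F, while pairs made only of 0 through 9 are unaffected.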


Tracing in SCI2

Where 16-bit versions of SCI had two blank spaces in their PMachine instruction set, SCI2 introduced the _line_ and _file_ instructions, which the compiler could inject into the final bytecode so that the built-in debugger could then work out exactly which source code line matched the current instruction. Here’s how that goes down in practice.

First we make two test scripts, 42.SC and 69.SC:

(script# 42)
(public
	Test 1
)
(procedure (Test)
	(Display "This is Test, (42 1).")
)

(script# 69)
(public
	TestA 0
	TestB 1
)
(extern
	Test 42 1
)
(procedure (TestA)
	(Display "This is TestA, about to call TestB.")
	(TestB)
	(Display "Back in TestA, gonna call (42 1).")
	(Test)
	(Display "Back in TestA.")
)
(procedure (TestB)
	(Display "This is TestB.")
)

This may look a mite different from SCI Companion code because despite everything, they are not the same. Now, compiling them both in SC version 4.100, from January 12 1995, and then pulling them back through a disassembler and annotating it a bit, we get this output:

; 42.SC
Test:	_line_	11	;(procedure (Test)
	_file_	"42.sc"
	_line_	12	;	(Display "This is Test, (42 1).")
	lofsa	$6
	callk	Display, 2
	_line_	13	;)
; 69.SC
TestA:	_line_	17	;(procedure (TestA)
	_file_	"69.sc"
	_line_	18	;	(Display "This is TestA, about to call TestB.")
	lofsa	$6
	callk	Display, 2
	_line_	19	;	(TestB)
	call	TestB, 0
	_line_	20	;	(Display "Back in TestA, gonna call (42 1).")
	lofsa	$2a
	callk	Display, 2
	_line_	21	;	(Test)
	calle	Test, 0
	_line_	22	;	(Display "Back in TestA.")
	lofsa	$4a
	callk	Display, 2
	_line_	23	;)
TestB:	_line_	25	;(procedure (TestB)
	_file_	"69.sc"
	_line_	26	;	(Display "This is TestB.")
	lofsa	$59
	callk	Display, 2
	_line_	27	;)

Every time the PMachine encounters a _file_ opcode, it grabs a null-terminated string from the bytecode stream and places it into pm.curSourceFile. Likewise, _line_ takes a 16-bit number and places it into pm.curSourceLineNum. The built-in debugger can then notice when these two values change, find the source file, and display the correct line of code.

But there’s one tiny detail that threw me off initially. Can you see it?

When TestA calls Test, the current source file changes to 42.sc, but it doesn’t change back to 69.sc afterwards.

Although the SCI2 source I have here doesn’t seem to call them, there is in fact a pair of functions to push and pop debug state, preserving the value of pm.curSourceFile and pm.curSourceLineNum across module calls. Which is quite obvious when you think about it. The alternative I can see would be to insert another _file_ opcode after each out-of-module call.
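
To make the problem concrete, here’s a toy Python model of just the debugger-facing state. Everything in it is an assumption for illustration: `ToyPMachine` and its method names are invented, the two “procedures” only replay the _file_ and _line_ opcodes from the listing above, and the push/pop pair is placed around the call purely to show what it would fix.

```python
class ToyPMachine:
    """Tracks only the current source file and line, nothing else."""

    def __init__(self):
        self.cur_source_file = ""
        self.cur_source_line = 0
        self.saved = []   # stack used by the push/pop pair

    def op_file(self, name):
        self.cur_source_file = name

    def op_line(self, number):
        self.cur_source_line = number

    def push_debug_state(self):
        self.saved.append((self.cur_source_file, self.cur_source_line))

    def pop_debug_state(self):
        self.cur_source_file, self.cur_source_line = self.saved.pop()


def run_test(pm):
    # Test, from 42.sc
    pm.op_line(11)
    pm.op_file("42.sc")
    pm.op_line(12)
    pm.op_line(13)


def run_test_a(pm, preserve):
    # TestA, from 69.sc, trimmed down to the out-of-module call.
    pm.op_line(17)
    pm.op_file("69.sc")
    pm.op_line(21)
    if preserve:
        pm.push_debug_state()
    run_test(pm)          # calle Test, 0
    if preserve:
        pm.pop_debug_state()
    pm.op_line(22)
    return (pm.cur_source_file, pm.cur_source_line)


without_stack = run_test_a(ToyPMachine(), preserve=False)
with_stack = run_test_a(ToyPMachine(), preserve=True)
```

Without the push and pop, line 22 is reported against 42.sc; with them, it is back in 69.sc where it belongs.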


Script resources – a dyad in the Force, as it were

ZvikaZ recently ran into an issue trying to hack Quest for Glory 1 VGA: they edited a particular script and it worked fine, but when they then exported the .scr file and put it in a clean QFG1 folder, it broke in a particular way. One phrase in particular stood out to me:

There are ‘ch’ strings instead of the numerical values

I had a feeling what the problem might’ve been when I started reading the post but when I saw that part I knew exactly what happened.

Quest for Glory 1 VGA is an SCI11 game. That means the scripts are split up into .scr and .hep pairs, and ZvikaZ only copied the one file instead of both. One of them contains the actual script bytecode, but the other contains the number of local variables, their default values, information on all the objects in the script, and all the text string literals in the script. It’s called a heap resource because that’s where it’s loaded.

Originally, the script and heap resources were one and the same. When a given script needed to be loaded, it would be loaded into heap memory and kept there until unloaded. And as explained before, a saved game is basically a compressed dump of the entire heap memory area, while hunk space contains all the other resources that the scripts, in turn, refer to. Now imagine for a second a script resource with a single class in it, with a single particularly big method, so that a mere fraction of the script resource describes the class, and contains any near strings and such, and all the rest of it is bytecode. Once loaded, the bytecode can’t be changed — only the class properties and any local variables can be, but all of that bytecode is still part of the heap. There’s only so much heap space available to a game, so as long as that script is resident, that bytecode will take up precious space.

SCI11 split the script resources up so that the bytecode parts would be kept in hunk space instead, swapped in from disk when actually needed by something from the script definitions in heap space. All that space taken up by PMachine bytecode is suddenly no longer part of the heap and this bad boy can fit so many script resources at once. And if your scripts use far text instead of near (text resources referenced by a module/line tuple that get loaded into hunk space, instead of "quoted strings like this" that are part of the script’s heap resource), anything those scripts try to say automatically also doesn’t take as much space. You trade a two-byte pointer for a four-byte tuple, but those numbers in turn may refer to a string of who knows what length. Savings!

ZvikaZ’s target was the script resource for QFG1’s character creation screen, whose first class is a Room named chAlloc. That name appears in the heap resource. When ZvikaZ changed the script code and recompiled, the heap resource had its contents changed, including where exactly in the file the room’s definition started. Whatever mixed-up monstrosity resulted when ZvikaZ then tried to run the altered 203.scr against an untouched 203.hep didn’t function and notably printed ch instead of numerical statistics.

I’m honestly a little impressed it didn’t “oops” on the spot.


SCI versions and naming

Did Sierra ever call the various versions of SCI the same names we use? We being the fans, the tool creators, and the ScummVM developers?

It’s unlikely.

One thing to keep in mind is that the interpreter was in near-constant development by one team, while other teams made the games. Every so often the game developers would pull in the latest interpreter and system scripts from a network share. Another thing to keep in mind is that the version numbers are a little weird in places, and that the games themselves had their own version numbers on top of that, so for example you could have King’s Quest 4 version 1.000.106 running on SCI 0.000.274, but also KQ4 1.000.111 on the same interpreter, released five days later, and the later update with the changed graphics that was version 1.006.003 running on SCI 0.000.502.

The first generation of SCI, the one we call “SCI0”, had versions starting with “0.000”, such as the KQ4 example above. This covers every single 16-color, parser-based, English-only game, with the lone exception of the Police Quest 2 PC-98 release. That was version “x.yyy.zzz”, no joke. This generation can also be subdivided into two blocks, where versions up to 0.000.343 had green button controls instead of using whatever the window color was set to, covering the ’88 versions of KQ4 and the first version of LSL2, and the rest covered all the other games.

What we call SCI01 had versions starting with “S.old”. At least “x.yyy.zzz” has the placeholder excuse but whatever. SCI01 games were just like SCI0 on the surface, but had support for multiple languages (previously introduced in version x.yyy.zzz), and saw no more releases than ’88 SCI0: six of ’em. So technically there’s nothing about that version string to inspire “SCI01”, besides perhaps KQ1 using “S.old.010” 🤔

Next up was SCI1, which came in both EGA and VGA and usually had versions starting with “1.000”. There is one game, Quest for Glory 2, with five different interpreter versions that still had the text parser (and technically one Christmas card) before it was removed in favor of the icon bar. Some SCI1 games again have interpreters with very strange versions: it appears Eco Quest and Space Quest 4, among others, had some Special Needs™, given interpreter versions “1.ECO.013” and “1.SQ4.057”. But on the whole you could still tell from the first character in the version that these were SCI1 interpreters.

SCI11 removed the multi-language support in favor of things like scaling sprites and the Message resource type. All SCI11 interpreters in the wild use versions starting with “1.001”, except for the ones used in Laura Bow 2 (“2.000.274”), Quest for Glory 3 (“L.rry.083”), and Freddy Pharkas (“l.cfs.081”), among a straggler or three.

Up to now these were 16-bit real-mode applications. SCI2, with versions starting “2.000”, was a 32-bit protected mode application instead, with the ability to use much more memory and run in a SuperVGA video mode. No SCI2 interpreter found in the wild seems to stray from this version pattern, mostly because all SCI2 games use version 2.000.000. SCI21, in turn, runs on interpreter version 2.100.002, although there are technically three different sub-versions of 2.100.002. That’s not confusing at all. And finally, SCI3 was only seen in interpreter version 3.000.000.

I’m thinking after the switch to 32 bits, they must’ve stopped automatically bumping version numbers on build.

So what does Sierra call them, then? Well, sources say that Sierra called the 32-bit interpreters SCI32, and the source code archive that I based SCI11+ on was SCI16.ZIP. But none of the changelogs and such seem to refer to SCI0, SCI1, or whatever.



Happy slightly belated new year 🥂


Objects, functions, properties, and methods

Whether you’re trying to interpret SCI code in its source form, or compile it into bytecode, there are some inferences to make. Consider the following statements:

(foo1 bar:)
(foo2 bar: 42)
(foo3 69)

You can have object references, kernel calls, and local function calls, and those object references can be local instances or pointers which in turn can be stored in global variables, local variables, variables temporary to the current function or method, or properties of the current method’s object. How would you determine what each foo is?

First, you can see if there is a second item in the expression. If that item ends in :, like in the first two cases, you know that’s a selector so the identifier at the start must be an object reference of some sort. If it’s not, like in the third case, or if there is no second item at all, it must be a function or kernel call since anything else would be an error.

For the first two examples, we now know that foo must be an object. Having looked through the whole script before, we already have a list of all the parameters, temporary variables, local variables, and those from script 0, which are global. If any of those contains an item by that name, we know it’s a pointer to dereference. If it’s a local or imported object’s name, we’d be able to find that as well and can continue on. If it doesn’t appear in any of these six lists, the source code is in error.

For the third example, we know it must be a function or kernel call. There are three lists to check this time, being the local functions, imported functions, and kernels. Other than that, things are the same as before.

That leaves the matter of selectors. They can refer to either properties or methods, which are… actually rather trivial to tell apart considering the objects have two dictionaries, one for each type. Objects may have superclass chains reaching all the way to the Base Object and inherit properties and methods from those superclasses, but you might consider folding those superclasses’ dictionaries into the object instance’s so there’s only two to scan through.

Let’s say bar is a property. The first example would then mean “take the foo object and return its bar property’s value.” Likewise the second would mean “set it to this expression.” If it’s a method, you’re given a pointer to that method’s code (which may be unique to that instance, having overwritten whatever the superclass chain started with) and you can pass it each non-selector argument in turn, until the next selector. I wrote about that before.
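
The lookup order described above can be sketched in Python like so. This is a simplified illustration rather than code from any real compiler: the `Scope` class, its field names, and `classify_head` are all invented here, and an expression is just a list of tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    # The six places an object reference can come from.
    params: set = field(default_factory=set)
    temps: set = field(default_factory=set)
    locals_: set = field(default_factory=set)
    globals_: set = field(default_factory=set)
    local_objects: set = field(default_factory=set)
    imported_objects: set = field(default_factory=set)
    # The three places a callable can come from.
    local_procs: set = field(default_factory=set)
    imported_procs: set = field(default_factory=set)
    kernels: set = field(default_factory=set)


def classify_head(expr: list, scope: Scope) -> str:
    """Work out what the first identifier of an expression refers to."""
    head = expr[0]
    is_send = len(expr) > 1 and isinstance(expr[1], str) and expr[1].endswith(":")
    if is_send:
        candidates = [
            ("parameter", scope.params),
            ("temporary variable", scope.temps),
            ("local variable", scope.locals_),
            ("global variable", scope.globals_),
            ("local object", scope.local_objects),
            ("imported object", scope.imported_objects),
        ]
    else:
        candidates = [
            ("local function", scope.local_procs),
            ("imported function", scope.imported_procs),
            ("kernel call", scope.kernels),
        ]
    for kind, names in candidates:
        if head in names:
            return kind
    raise NameError(f"{head} is not defined anywhere we can see")


scope = Scope(temps={"foo1"}, local_objects={"foo2"}, kernels={"foo3"})
```

With that scope, `(foo1 bar:)` resolves to a temporary variable holding a pointer, `(foo2 bar: 42)` to a local object, and `(foo3 69)` to a kernel call.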


SCI decompilation and Weird Loops

They’re not really that weird on the face of it, but that depends on who’s looking.

The decompiler in SCI Companion is a work of art. You can tell because I didn’t write it. (I only fixed a thing or two.) But there are some things that it can’t figure out, and when that happens the function or method body is replaced with a raw asm block. For example, the copy protection in Laura Bow 2 – The Dagger of Amon Ra has a loop in it that SCI Companion can’t hack.

It’s a bit much to take in but the important bits are as follows: this code block (rm18::init) has four discrete segments. The first isn’t shown here and sets up a few simple things. The second (code_0087) is a regular loop, where temp0 counts up from zero to eleven. When it hits twelve, the loop is broken and we go to section three, code_00ad. Section three is a weird loop. If you look at the check at the top we see this:

pushi #size
pushi 0
lofsa tempList
send 4
bnt code_0116

Which basically means that when (tempList size?) returns zero/is false, we skip to section four. At the bottom of the section, right before the label for code_0116, there’s the command that makes section three a loop; jmp code_00ad.

So that means that section three keeps repeating until tempList is out of items. Section two put a bunch of values in it, and section three then takes items out at random and puts them into goodList, effectively randomizing the order. The items, incidentally, are the tiles depicting the various Egyptian gods that the copy protection is all about, clones of egyptProp given increasing cel values. Section three positions them as they’re added to goodList. It’s a good routine, Brent.
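
In Python terms, sections two and three boil down to something like the sketch below. It’s an approximation for illustration only: `deal_tiles` is a made-up name, tiles are reduced to their cel numbers, and the coordinates are borrowed from the decompiled method later in this post.

```python
import random


def deal_tiles(count=12, per_row=6, rng=random):
    """Pick tiles out of a bag at random until it is empty, laying them out in rows."""
    temp_list = list(range(count))   # section two: one tile per cel number
    good_list = []
    x, y = -32, 46
    while temp_list:                 # section three: loop until the bag is empty
        i = rng.randint(0, len(temp_list) - 1)
        tile = temp_list[i]
        x += 48
        good_list.append((tile, x, y))
        if len(good_list) == per_row:
            # Halfway through, carriage return to the second row.
            x, y = -32, 111
        del temp_list[i]             # never pick the same tile twice
    return good_list


tiles = deal_tiles(rng=random.Random(1))
```

The loop condition is simply “is there anything left in the bag”, which is neither a counter nor an iteration over a collection.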

The problem that trips up SCI Companion and makes it spit out the stuff in those two pictures is that it doesn’t recognize the second loop for what it is. Counting from one value to another by a given increment? Easy. Iterating over a collection? It can figure those out. But picking items from a bag until it’s empty? That’s not on the menu.

To make this decompile, then, we first need to break the loop by commenting out that last jmp command. A single ; suffices. Compile the script resource, then go back and re-decompile it. A conditional loop, of course, consists of a check and a jump. We removed the jump so now it’s just the check:

(method (init &tmp i theTile theX theY)
  (LoadMany rsVIEW 18) ; load the tiles
  (super init:)
  (gGame handsOn:)
  (gIconBar disable: 0 1 3 4 5 6 7)
  (goodList add:)
  (tempList add:)
  (= theX -32)
  (= theY 46)
  (= i 0)
  ; Instantiate twelve tiles, with increasing cel numbers.
  (while (< i 12)
    (tempList add: ((egyptProp new:) cel: i yourself:))
    (++ i)
  )
  ; This should be "(while (tempList size?)" but we removed the jump, remember?
  (if (tempList size?)
    ; Pick a tile number.
    (= i (Random 0 (- (tempList size?) 1)))
    ; Get the i-th tile.
    (= theTile (tempList at: i))
    ; Set up the tile's position on the grid and add it to goodList.
    (goodList add:
      (theTile
        x: (= theX (+ theX 48))
        y: theY
        yourself:
      )
    )
    ; Once we're halfway through, CRLF to the next row.
    (if (== (goodList size?) 6)
      (= theX -32)
      (= theY 111)
    )
    ; Actually remove the tile from tempList so we won't pick it again.
    (tempList delete: theTile)
  )
  ; Section four
  (gGame handsOff:)
  (self setScript: sInitEm)
)

The cool part is that once we replace that if with a while and compile the script, the result is effectively the same as the original. Only some of the specific opcode choices are different. For example, the original uses the two-byte pushi 1 throughout (also 0 and 2), but SCI Companion’s script compiler prefers to use the one-byte push1 there. The same values are pushed regardless.


Print, PrintD, and the other Print

So as established, there are three different Prints in SCI.

  • Print the function, in SCI0
  • PrintD the function, supplementing Print, in SCI1
  • Print the class, in SCI11.

They each have their own strengths and weaknesses, of course.

Feature              Print (SCI0)   PrintD (SCI1)   Print class (SCI11)
Max text items       1              infinite        infinite
Max buttons          6              infinite        infinite
Max icons            1              infinite        infinite
Max input fields     1              infinite        infinite
Animated icons¹      yes            no              yes
Size to fit          yes            yes             yes
Size to max width    yes            no              yes
Auto-dismiss         yes            no              yes
Auto-layout²         yes            yes             no
Position             yes            yes             yes
Font                 yes            yes             yes
Text tuples³         yes            no              yes

¹: Animated icons require the ability to pass a reference to a DCIcon object instead of a view/loop/cel tuple.

²: Items added by PrintD flow to the right with a four pixel margin. Pass the #new command argument to reset the flow to the left edge and below the last item, or the #x/#y modifiers to shift the last item’s position. In the Print class, every item added must be manually positioned as everything defaults to the top-left. The Print function has its limits specifically because it automatically lays out the controls.

³: The Print class being from SCI11, it takes noun/verb/case/seq tuples.
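
The PrintD flow from the second note can be pictured with this little Python sketch. It’s a guess at the behaviour, not a port of the real script: the `layout` function and its item format are invented, and applying the same four pixel margin vertically after a “new” is my assumption.

```python
MARGIN = 4


def layout(items, left=0, top=0):
    """Place ("item", width, height) entries left to right; ("new",) starts a new row."""
    x, y = left, top
    last_bottom = top
    placed = []
    for entry in items:
        if entry[0] == "new":
            # Back to the left edge, below the last item (vertical margin assumed).
            x, y = left, last_bottom + MARGIN
            continue
        _, width, height = entry
        placed.append((x, y))
        x += width + MARGIN
        last_bottom = y + height
    return placed


positions = layout([("item", 40, 10), ("item", 20, 10), ("new",), ("item", 30, 10)])
```

Two items share the first row and the third lands at the left edge of the second row, which is the kind of thing you’d have to position by hand with the Print class.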

That comes down to the following actions:

SCI0 Print: required string or tuple for text (may be empty), mode, font, width, time, title, at, draw, edit, button (up to six times), icon, dispose, window, first

SCI1 PrintD: new, at, title, first, text, button, icon, edit, x, y

SCI11 Print: addButton, addEdit, addIcon, addText, addTextF, addTitle, posn methods, plus mode, font, width, ticks, modeless, and saveCursor properties

…And then I messed everything up by rewriting PrintD as a wrapper around the Print class so it runs on SCI11, adding everything but auto-dismiss and animated icon support. It’s available from my SCI stash, of course. Hell, by this time tomorrow those last two things may well be included.

SCI11 PrintD: all of SCI1’s, plus modNum, cue, font, and width
