

Rules for UTF-8 in SCI11+

  1. It won’t work in ScummVM yet, as nothing uses it yet so I see no reason to add it. Most of the rest of SCI11+’s gimmicks do though.
  2. The internal “draw a string” function, used to write literally anything on to the screen, is where the magic happens: if the current port’s current font has more than 256 glyphs in it, the input string is interpreted as UTF-8. If it does not, things work exactly as usual.
  3. Because combining characters and glyph substitution are not supported, and general punctuation like “” is all the way in the 2000–2044 range, the General Punctuation block’s glyphs take the place of Combining Diacritical Marks at 0300–0344.
  4. Similarly, CJK Symbols, Hiragana, and Katakana are moved from 3000–30FF to 0200–02FF, where some Latin Extended-B, IPA Extensions, and Spacing Modifiers should go.
  5. Those last two points apply to the font data, not the actual text.
  6. The new kernel functions UTF8to16 and UTF16to8 will always consider their inputs to be in Unicode, no matter what the current port’s current font says. Unless you built an SCI11+ with UTF-8 support disabled, in which case none of the above applies and all these two functions do is turn 8-bit values into 16-bit.
  7. The kernel function to turn a string lower or upper case, StrCase, unlike the two I just described, will check the current font and act like it used to, same as the “draw a string” function.
  8. The functions to get the lower or upper case version of a character that StrCase ends up using, tolower and toupper, have been extended to cover the full 256-character range. Several maps are included and one can be chosen at build time. We have maps for code page 437, Win-1252, ISO 8859-1, and a fair bit of Unicode.
  9. In general, SCI11+ can be considered to use Unicode 1.1 on account of SCI 1.001.100 dating from 1993, going by the version numbers and release dates for Freddy Pharkas (1.001.095) and Leisure Suit Larry 6 (1.001.115).
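
Rules 2 through 5 can be sketched roughly like this in Python. This is not SCI11+'s actual code: the function names are made up, and the plain offset arithmetic for the moved blocks is an assumption based only on the ranges listed above.

```python
def text_to_codepoints(raw: bytes, glyph_count: int) -> list[int]:
    """Rule 2: only a font with more than 256 glyphs makes the string UTF-8."""
    if glyph_count > 256:
        return [ord(ch) for ch in raw.decode("utf-8")]
    # Classic behaviour: one byte, one character.
    return list(raw)


def glyph_index(codepoint: int) -> int:
    """Rules 3 to 5: where a character's glyph sits in the font data.

    Assumption: the moved blocks keep their internal order, so a fixed
    offset is enough.
    """
    if 0x2000 <= codepoint <= 0x2044:   # General Punctuation
        return codepoint - 0x2000 + 0x0300
    if 0x3000 <= codepoint <= 0x30FF:   # CJK Symbols, Hiragana, Katakana
        return codepoint - 0x3000 + 0x0200
    return codepoint
```

The text itself stays ordinary Unicode; only the glyph lookup is shifted, which is what rule 5 is getting at.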

Sluiceboxes and SetPorts

Today I got a delightful (and long) email from sluicebox, of the ScummVM SCI team. He wrote about a lot of things but one thing stood out and he’s right, I should write about it.

Remember when I fixed the imitation AGI windows in Space Quest 4? There’s something very strange going on there that sluicebox pointed out in the email.

If you’ll remember:

(method (open &tmp port temp1)
  ; temp0 was unused so we're taking it for proper SetPorting.
  (= color gColor)
  (= back gBack)
  ; Set our type to ONLY wCustom, not wCustom|wNoSave, and open.
  (= type 128)
  (super open:)
  ; Nothing will have appeared because wCustom don't draw anything, but a port has been set up!
  ; Switch to drawing on the whole screen but also *save the window's port*.
  (= port (SetPort 0))
 
  (= temp1 1)
  ; ...
  (Graph grUPDATE_BOX lsTop lsLeft lsBottom lsRight 1)
 
  ; Reset to the window's port.
  (SetPort port)
)

But if you look at this GitHub commit from ScummVM, you’ll see this interesting description:

SSCI doesn’t return zero; it doesn’t return anything. This shouldn’t affect any games since no scripts should depend on a non-existent return value, but this discrepancy came up while investigating a fan script that accidentally relies on this.

So I checked the leaked source code that I made SCI11+ from.

global KERNEL(SetPort)
{
	if (argCount >= 6)
	{
		picWind->port.portRect.top = arg(1);
		picWind->port.portRect.left = arg(2);
		picWind->port.portRect.bottom = arg(3);
		picWind->port.portRect.right = arg(4);
		picWind->port.origin.v = arg(5);
		picWind->port.origin.h = arg(6);
		if (argCount >= 7)
			InitPicture();
	}
	else
	{
		if (arg(1))
		{
			if ((arg(1)) == -1)
				RSetPort(menuPort);
			else
				RSetPort((RGrafPort*)Native(arg(1)));
		}
		else
		{
			RSetPort((RGrafPort*)RGetWmgrPort());
		}
	}
}

No return value, which is obvious really because the KERNEL define expands to a void function. Return values are instead handled by setting the acc global variable. So let’s dig a little deeper.

RSetPort proc	pPtr:word
	mov	ax, pPtr
	mov	rThePort, ax
	ret
RSetPort endp

Nothing. It sets the rThePort global and that’s all. There’s an RGetPort function right above that does the opposite, but nothing in the kernel function calls that.

Looking back at my description of BorderWindows, there’s an important difference:

(= oldPort (GetPort))
(SetPort 0)
(Graph grUPDATE_BOX lsTop lsLeft lsBottom lsRight VISUAL)
(SetPort oldPort)

It’s very interesting indeed how this happened to Just Work. Even so, I should probably go back and correct that SQ4 script.
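
Here is a toy model in Python of the difference between the two snippets. It assumes that a kernel call which never writes acc simply leaves the previous value there; the class, the method names, and the port values are all invented for illustration and say nothing about how the interpreter is really structured.

```python
class ToyPMachine:
    """Hypothetical stand-in: one accumulator, one current port."""

    def __init__(self):
        self.acc = 0
        self.current_port = "window port"

    def get_port(self):
        # GetPort puts its result in acc, so a script can use it.
        self.acc = self.current_port
        return self.acc

    def set_port(self, port):
        # SetPort switches ports and never touches acc.
        self.current_port = "whole screen" if port == 0 else port
        # Whatever a script "receives" is just the stale accumulator.
        return self.acc


# (= port (SetPort 0)) picks up whatever was left in acc.
pm = ToyPMachine()
pm.acc = "leftover from an earlier call"
wrong = pm.set_port(0)

# (= oldPort (GetPort)) followed by (SetPort oldPort) restores the real port.
pm = ToyPMachine()
old_port = pm.get_port()
pm.set_port(0)
pm.set_port(old_port)
```

If acc happened to still hold the window's port when SetPort was called, the stale value would accidentally be the right one, which may be how the SQ4 script got away with it.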


Pattern pen implementation differences

While looking into something unrelated in Space Quest 3, I noticed that the dirt on the right of the starting screen was drawn differently between DOS SCI and ScummVM. Today I looked into it a little closer, comparing SCI proper, ScummVM, SCI Companion, and SCI Viewer.

Damn, that’s some really tiny differences that you’re not gonna spot just like this. But here they are:

  • In most of them, the mound on the right looks like this:
  • Except in ScummVM, where it looks like this: (and now you know why I looked into this)
  • Below the column in the middle looks like this in SCI and SV:
  • But it looks more like this in SCI Companion and ScummVM alike:
  • The heap on the left is also mildly affected, looking like this in SCI and SV:
  • But it looks like this in SCI Companion and ScummVM:
  • And finally, SV, renowned for being Very Good At This, breaks the one rule — you don’t get to draw white on non-white:

So yeah, a slight difference in where a window border is drawn is the least of your problems.

Update: ScummVM had its pattern table corrected this week. Guess I’ll have to check out the latest nightly, huh? And yes, it does match SCI proper now. Good job everyone!

Bonus update: sluicebox suggested comparing against SCI Studio. Here you go, friend: the mound on the right looks like this in SCI Studio, a distinctive variation, and the bit under the pillar and to the left matched SCI Companion and ScummVM (past tense now).


String literals

In various programming languages, different quotation marks and such can mean different things. In C/C++ for example, "this" is considered a plain string literal. It can contain various escape codes (\x69, \n, et al), and escaped double quotes (\"), but not raw newlines. A single-quoted literal is not a string at all, but a character literal. In C# meanwhile we have the plain double-quoted string and single-quoted character but also @"this", a variation that does allow raw newlines at the cost of not allowing escapes. In PHP meanwhile, we have double-quoted strings that, in contrast to C, can have raw newlines but also have variable interpolation — "Hi $name" will appear as Hi Mark, assuming that is that variable’s value at the time. Single-quoted strings in PHP don’t do escape sequences or interpolation, and then there’s “heredoc” strings.

SCI also has different types of string literals. Or rather, had, depending on which version you targeted. Double-quoted strings could have escape codes (\42, note the lack of a letter, and of course \n) and raw newlines, but whitespace was folded away on compilation so raw newlines and tabs were entirely for code readability’s sake, requiring a \n at the end of each line that had to have a line break. You could also use curly braces for strings, {like this}, that had exactly the same rules and limitations as double-quotes, except for one difference in storage.
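
A minimal sketch of that folding in Python, assuming that every run of whitespace collapses to a single space and that only the \n escape matters here. The function name is made up, and numeric escapes like \42 are left alone.

```python
import re


def fold_literal(source: str) -> str:
    # Collapse every run of spaces, tabs and raw newlines into one space...
    folded = re.sub(r"\s+", " ", source)
    # ...and only then turn the two-character escape \n into a line break.
    return folded.replace("\\n", "\n")
```

The order matters: expanding \n first would let the folding step swallow the line break again.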

Any string literal in curly braces would be stored as-is in the script resource, while double-quote strings would be stored in a matching text resource and replaced in the compiled code with a look-up key:

(Print "This is an example" #title {Kawa says})

(Yes, I’m aware that the syntax highlighter doesn’t pick up on the braces.)

Assuming this is script #42 just as an example, and this is the first place a double-quoted string appears, the above would be transformed like so:

(Print 42 0 #title {Kawa says})

The original string will be stored in a separate text resource with the same number as the script. This helps cut down memory use.
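
As a rough illustration of that transformation, here is a Python sketch. It is not the real compiler: the function name is hypothetical, and it ignores escaped quotes and anything else a real parser would have to handle.

```python
import re


def extract_far_text(source: str, script_number: int):
    """Move "double-quoted" literals into a text list; leave {braced} ones alone."""
    texts = []

    def replace(match):
        texts.append(match.group(1))
        # The literal becomes "<text resource number> <entry index>".
        return f"{script_number} {len(texts) - 1}"

    compiled = re.sub(r'"([^"]*)"', replace, source)
    return compiled, texts
```

Feeding it the Print example with script number 42 gives back the transformed line plus a one-entry text list, which would become text resource 42.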

(Bonus banter: there’s a bug in the original SCI interpreters, introduced when they added the Message resource format, involving hexadecimal numbers: they accidentally used "01234567890ABCDEF", with an extra zero. This messes up any attempt to use a good third of the character set in a message resource, but not an inline string literal, so having \0E in a string literal will produce the intended character while the same thing in a message will produce ¤ instead. In SCI11+, this has been corrected to just "0123456789ABCDEF".)
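
The effect of that extra zero is easy to reproduce in Python. The sketch assumes a digit's value is simply its first position in the lookup string, which is a guess at how the original code used the table; the function name is invented.

```python
BUGGY_DIGITS = "01234567890ABCDEF"   # extra zero after the 9
FIXED_DIGITS = "0123456789ABCDEF"


def hex_escape_value(two_digits: str, table: str) -> int:
    # Assumption: each digit's value is its first position in the table.
    high, low = (table.index(d) for d in two_digits.upper())
    return (high * 16 + low) & 0xFF
```

With the buggy table, every letter digit A through F comes out one higher than it should, so \0E turns into character 0x0F, while purely numeric escapes such as \42 are unaffected.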


Tracing in SCI2

Where 16-bit versions of SCI had two blank spaces in their PMachine instruction set, SCI2 introduced the _line_ and _file_ instructions, which the compiler could inject into the final bytecode so that the built-in debugger could then work out exactly which source code line matched the current instruction. Here’s how that goes down in practice.

First we make two test scripts, 42.SC and 69.SC:

(script# 42)
 
(procedure
	Test
)
 
(public
	Test 1
)
 
(procedure (Test)
	(Display "This is Test, (42 1).")
)

(script# 69)
 
(procedure
	TestA
	TestB
)
 
(public
	TestA 0
	TestB 1
)
 
(extern
	Test 42 1
)
 
(procedure (TestA)
	(Display "This is TestA, about to call TestB.")
	(TestB)
	(Display "Back in TestA, gonna call (42 1).")
	(Test)
	(Display "Back in TestA.")
)
 
(procedure (TestB)
	(Display "This is TestB.")
)

This may look a mite different from SCI Companion code because despite everything, they are not the same. Now, compiling them both in SC version 4.100, from January 12 1995, and then pulling them back through a disassembler and annotating it a bit, we get this output:

; 42.SC
;-------
Test:	_line_	11	;(procedure (Test)
	_file_	"42.sc"
	_line_	12	;	(Display "This is Test, (42 1).")
	push1
	lofsa	$6
	push
	callk	Display, 2
	bnot
	_line_	13	;)
	ret
 
; 69.SC
;-------
TestA:	_line_	17	;(procedure (TestA)
	_file_	"69.sc"
	_line_	18	;	(Display "This is TestA, about to call TestB.")
	push1
	lofsa	$6
	push
	callk	Display, 2
	_line_	19	;	(TestB)
	push0
	call	TestB, 0
	_line_	20	;	(Display "Back in TestA, gonna call (42 1).")
	push1
	lofsa	$2a
	push
	callk	Display, 2
	_line_	21	;	(Test)
	push0
	calle	Test, 0
	_line_	22	;	(Display "Back in TestA.")
	push1
	lofsa	$4a
	push
	callk	Display, 2
	_line_	23	;)
	ret
 
TestB:	_line_	25	;(procedure (TestB)
	_file_	"69.sc"
	_line_	26	;	(Display "This is TestB.")
	push1
	lofsa	$59
	push
	callk	Display, 2
	_line_	27	;)
	ret

Every time the PMachine encounters a _file_ opcode, it grabs a null-terminated string from the bytecode stream and places it into pm.curSourceFile. Likewise, _line_ takes a 16-bit number and places it into pm.curSourceLineNum. The built-in debugger can then notice when these two values change, find the source file, and display the correct line of code.

But there’s one tiny detail that threw me off initially. Can you see it?

When TestA calls Test, the current source file changes to 42.sc, but it doesn’t change back to 69.sc afterwards.

Although the SCI2 source I have here doesn’t seem to call it, there is in fact a pair of functions to push and pop debug state, preserving the value of pm.curSourceFile and pm.curSourceLineNum across module calls. Which is quite obvious when you think about it. The alternative I can see would be to insert another _file_ opcode after each out-of-module call.
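
Here is a small Python model of the problem and of what such a push/pop pair would do about it. The class and method names are hypothetical; only the two pm fields and the opcode behavior come from the description above.

```python
class ToyDebugState:
    """Hypothetical model of pm.curSourceFile and pm.curSourceLineNum."""

    def __init__(self):
        self.cur_source_file = None
        self.cur_source_line = None
        self._saved = []

    def op_file(self, name):       # _file_ opcode
        self.cur_source_file = name

    def op_line(self, number):     # _line_ opcode
        self.cur_source_line = number

    def push_debug_state(self):    # before calling into another module
        self._saved.append((self.cur_source_file, self.cur_source_line))

    def pop_debug_state(self):     # after that call returns
        self.cur_source_file, self.cur_source_line = self._saved.pop()


def run_test_a(state, preserve):
    """Replay the opcodes around TestA's call to Test in 42.SC."""
    state.op_line(17)              # TestA prologue
    state.op_file("69.sc")
    state.op_line(21)              # (Test)
    if preserve:
        state.push_debug_state()
    state.op_line(11)              # inside Test
    state.op_file("42.sc")
    state.op_line(13)
    if preserve:
        state.pop_debug_state()
    state.op_line(22)              # back in TestA
    return state.cur_source_file, state.cur_source_line
```

Without the push and pop, the debugger would look for line 22 in 42.sc; with them, it lands on line 22 of 69.sc as intended.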


Separating the Game and Engine

(Originally written 2017-10-07)

To most children of the nineties, “adventure game” was synonymous with “Sierra” and “LucasArts”. When you look at Sierra games like the King’s Quest, Space Quest, Police Quest, and Leisure Suit Larry series, you notice they’re all alike in one particular aspect beyond mere presentation, beyond the fact that they all control the same way: they are all third-person point-and-click adventure games.

Now that’s simplifying a little, since the earliest entries in those series weren’t point-and-click but instead had a text parser. But that doesn’t matter here, not very much.

King's Quest 5 - Crispin's house

We have our background image, our animated main character (henceforth referred to as Ego, in keeping with the script code), other animated characters where called for, a mouse cursor, and the status line. In many Sierra games from King’s Quest 5 on the status line is blank. In earlier games it was not, often displaying the game name and player’s score.

Regardless, we can control Ego in a particular way, and by placing the mouse cursor on the status line, we can summon an icon bar:

Leisure Suit Larry 1 VGA remake, Lefty's Bar exterior. The icon bar is showing.

Click a verb on the left half, then click somewhere in the game screen to act accordingly. Simple. The other three icons open your inventory, allow you to change settings and save/restore, and explain the other buttons. Almost all point-and-click Sierra games work like this.

The important question is, how much of all this is part of the engine, and how much is part of the game? Let’s find out!

As it turns out, the only common aspects that are part of the engine are:

  • The ability to draw backgrounds.
  • The ability to draw animated elements on those backgrounds, while also letting parts of the background obscure them.
  • The ability to play sound effects and music on a variety of sound hardware of the time.
  • The ability to draw text.
  • Mouse, keyboard, and joystick input.
  • The ability to save and restore the game state.
  • A simple graphical user interface — windows, buttons, text fields and such.
  • The ability to overrule the way windows look, to change their border style in script.
  • The ability to draw a status line.
  • The ability to tell how to go from one point to another, in various ways.
  • For the later games, the ability to do various color tricks.
  • For the old SCI0 games, the ability to use a pull-down menu bar.
  • For the old SCI0 games, the ability to try and transform an English sentence into recognizable keywords.

That leaves this plethora as purely scripted:

  • What an animated character is.
  • What Ego is and what they can do that other animated characters can’t.
  • What an inventory item is.
  • How to respond to a click.
  • What the icon bar is, and what buttons are on it.
  • What a room is.
  • What literally anything being an abstract “object” is. This covers a lot of ground.
  • For the old SCI0 games, what goes into the menu bar and how to react to its use.
  • For the old SCI0 games, how to invoke the text parser, and how to respond to its results.

The game code can’t tell if you’re using the 320×200 256-color VGA driver or the 640×480 dithered EGA driver, only the abstract “how many colors do we have”. Likewise it only gets “how many voices can we play at once”, instead of knowing that we specifically use the PC Speaker for music output, or the Roland MT-32.

Those games I listed at the start are all third-person point-and-click adventure games because they all share a common set of scripts that define what an adventure game is. As such, not all Sierra SCI games are adventure games:

Jones in the Fast Lane, employment office
Castle of Dr. Brain, robot programming puzzle

One is a board game, the other a series of puzzles. Dr. Brain shares the user interface scripts common to the rest of them, but Jones is nothing alike. And yet, all of them run on the same game engine. The script code is in the same format throughout, as are the audiovisual elements. System calls for the script code to use are the same, all throughout.

Sierra’s Creative Interpreter is not a third-person point-and-click adventure game engine, is what I’m saying.

You could remake Myst in it, after all. If you wanted.


Script resources – a dyad in the Force, as it were

ZvikaZ recently ran into an issue trying to hack Quest for Glory 1 VGA where they edited a particular script, and it worked fine, but when they then exported the .scr file and put it in a clean QFG1 folder, it broke in a particular way. One phrase in particular stood out to me:

There are ‘ch’ strings instead of the numerical values

I had a feeling what the problem might’ve been when I started reading the post but when I saw that part I knew exactly what happened.

Quest for Glory 1 VGA is an SCI11 game. That means the scripts are split up into .scr and .hep pairs, and ZvikaZ only copied the one file instead of both. The .scr contains the actual script bytecode, while the .hep contains the number of local variables, their default values, information on all the objects in the script, and all the text string literals in the script. It’s called a heap resource because that’s where it’s loaded.

Originally, the script and heap resources were one and the same. When a given script needed to be loaded, it would be loaded into heap memory and kept there until unloaded. And as explained before, a saved game is basically a compressed dump of the entire heap memory area, while hunk space contains all the other resources that the scripts, in turn, refer to. Now imagine for a second a script resource with a single class in it, with a single particularly big method, so that a mere fraction of the script resource describes the class, and contains any near strings and such, and all the rest of it is bytecode. Once loaded, the bytecode can’t be changed — only the class properties and any local variables can be, but all of that bytecode is still part of the heap. There’s only so much heap space available to a game, so as long as that script is resident, that bytecode will take up precious space.

SCI11 split the script resources up so that the bytecode parts would be kept in hunk space instead, swapped in from disk when actually needed by something from the script definitions in heap space. All that space taken up by PMachine bytecode is suddenly no longer part of the heap and this bad boy can fit so many script resources at once. And if your scripts use far text instead of near (text resources referenced by a module/line tuple that get loaded into hunk space, instead of "quoted strings like this" that are part of the script’s heap resource), anything those scripts try to say automatically also doesn’t take as much space. You trade a two-byte pointer for a four-byte tuple, but those numbers in turn may refer to a string of who knows what length. Savings!
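
To put some numbers on that, here is a back-of-the-envelope Python sketch. The sizes are entirely made up and the function is hypothetical; it only shows which parts count against the heap before and after the split.

```python
def heap_bytes_used(scripts, split):
    """scripts: list of (bytecode_size, heap_data_size) pairs, in bytes."""
    if split:
        # SCI11 style: only the heap resource lives in heap space.
        return sum(heap_data for _bytecode, heap_data in scripts)
    # Older style: the whole script resource is loaded into the heap.
    return sum(bytecode + heap_data for bytecode, heap_data in scripts)


# Invented sizes: lots of bytecode, very little object data.
scripts = [(6000, 400), (2500, 300)]
```

With these invented figures the two resident scripts cost 9200 bytes of heap the old way and 700 bytes after the split, with the bytecode living in hunk space instead.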

ZvikaZ’s target was the script resource for QFG1’s character creation screen, whose first class is a Room named chAlloc. That name appears in the heap resource. When ZvikaZ changed the script code and recompiled, the heap resource had its contents changed, including where exactly in the file the room’s definition started. Whatever mixed-up monstrosity resulted when ZvikaZ then tried to run the altered 203.scr against an untouched 203.hep didn’t function and notably printed ch instead of numerical statistics.

I’m honestly a little impressed it didn’t “oops” on the spot.


SCI versions and naming

Did Sierra ever call the various versions of SCI the same names we use? We being the fans, the tool creators, and the ScummVM developers?

It’s unlikely.

One thing to keep in mind is that the interpreter was in near-constant development by one team, while other teams made the games. Every so often the game developers would pull in the latest interpreter and system scripts from a network share. Another thing to keep in mind is that the version numbers are a little weird in places, and that the games themselves had their own version numbers on top of that, so for example you could have King’s Quest 4 version 1.000.106 running on SCI 0.000.274, but also KQ4 1.000.111 on the same interpreter, released five days later, and the later update with the changed graphics that was version 1.006.003 running on SCI 0.000.502.

The first generation of SCI, the one we call “SCI0”, had versions starting with “0.000”, such as the KQ4 example above. This covers every single 16-color, parser-based, English-only game, with the lone exception of the Police Quest 2 PC-98 release. That was version “x.yyy.zzz”, no joke. This generation can also be subdivided into two blocks, where versions up to 0.000.343 had green button controls instead of using whatever the window color was set to, covering the ’88 versions of KQ4 and the first version of LSL2, and the rest covered all the other games.

What we call SCI01 had versions starting with “S.old”. At least “x.yyy.zzz” has the placeholder excuse but whatever. SCI01 games were just like SCI0 on the surface, but had support for multiple languages (previously introduced in version x.yyy.zzz), and saw no more releases than ’88 SCI0: six of ’em. So technically there’s nothing about that version string to inspire “SCI01”, besides perhaps KQ1 using “S.old.010” 🤔

Next up was SCI1, which came in both EGA and VGA and usually had versions starting with “1.000”. There is one game, Quest for Glory 2, with five different interpreter versions that still had the text parser (and technically one Christmas card) before it was removed in favor of the icon bar. Some SCI1 games again have interpreters with very strange versions: it appears Eco Quest and Space Quest 4, among others, had some Special Needs™, given interpreter version “1.ECO.013” and “1.SQ4.057”. But on the whole you could still tell from the first character in the version that these were SCI1 interpreters.

SCI11 removed the multi-language support in favor of things like scaling sprites and the Message resource type. All SCI11 interpreters in the wild use versions starting with “1.001”, except for the ones used in Laura Bow 2 (“2.000.274”), Quest for Glory 3 (“L.rry.083”), and Freddy Pharkas (“l.cfs.081”), among a straggler or three.

Up to now these were 16-bit real-mode applications. SCI2, with versions starting “2.000”, was a 32-bit protected mode application instead, with the ability to use much more memory and run in a SuperVGA video mode. No SCI2 interpreter found in the wild seems to stray from this version pattern, mostly because all SCI2 games use version 2.000.000. SCI21, in turn, runs on interpreter version 2.100.002, although there are technically three different sub-versions of 2.100.002. That’s not confusing at all. And finally, SCI3 was only seen in interpreter version 3.000.000.
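
The prefixes and oddballs mentioned so far can be collected into a small Python lookup. The function name and the "unknown" fallback are invented, the generation names are the fan ones, and the tables only cover versions named in this post, so real-world stragglers will fall through.

```python
# Oddball interpreter versions named above, checked before the prefixes.
EXCEPTIONS = {
    "1.ECO.013": "SCI1",
    "1.SQ4.057": "SCI1",
    "2.000.274": "SCI11",   # Laura Bow 2
    "L.rry.083": "SCI11",   # Quest for Glory 3
    "l.cfs.081": "SCI11",   # Freddy Pharkas
}

PREFIXES = [
    ("0.000", "SCI0"),
    ("S.old", "SCI01"),
    ("1.000", "SCI1"),
    ("1.001", "SCI11"),
    ("2.000", "SCI2"),
    ("2.100", "SCI21"),
    ("3.000", "SCI3"),
]


def sci_generation(version: str) -> str:
    if version in EXCEPTIONS:
        return EXCEPTIONS[version]
    for prefix, name in PREFIXES:
        if version.startswith(prefix):
            return name
    return "unknown"
```

The exceptions have to come first, otherwise Laura Bow 2's 2.000.274 would be filed under SCI2.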

I’m thinking after the switch to 32 bits, they must’ve stopped automatically bumping version numbers on build.

So what does Sierra call them, then? Well, sources say that Sierra called the 32-bit interpreters SCI32, and the source code archive that I based SCI11+ on was SCI16.ZIP. But none of the changelogs and such seem to refer to SCI0, SCI1, or whatever.

 

 

Happy slightly belated new year 🥂


Server move?

Well, that took the better part of my evening!

So I got an email earlier today from the provider behind my VPS saying they’re gonna stop doing VPS, that I should switch to a KVM, and they left me a coupon for six billing periods (that is, months) of free service for said KVM. Which is nicer than what Centarra did way back then, keeping a hundred euros in service credit for themselves.

And even though I barely have a clue what the difference is between a VPS and a KVM, I did manage to do it with hopefully as few hiccups as I can manage.

Thanks to Dark Kirb for putting my mind at ease, and to Emuz for handling the domain part.
