Logo Pending

Linguistics, my weakness!

This topic was suggested by Phil Fortier.

I don’t normally bother with SCI0 except to gush over the artwork, but what can you do? Phil suggested I explain the parser, so I’ll try to explain the goddamn parser.

Let’s look at something more basic first. In AGI, there was little to no sense of grammar so your input had to be more rigid. This does make it simpler to explain and thus get something to work up from.

We start by defining a dictionary of words, mapping groups of synonyms to number values:

0 (a whole bunch of “filler” words)
1 anyword
2 check, examine, look, see
3 swim, swimming, wade, wading
4 enter, go
5 acquire, get, pick, pluck, rob, swipe, take
6 climb, scale
21 building, castle, cottage, fort, house, leanto, palace, tower
22 door, doors
23 dragon

The “anyword” entry is literally that. We’ll get back to that.

In the script code, system scripts and the current room’s script alike can at any point check to see if you said something:

if (said("check", "room"))
  print("You are standing outside a castle surrounded by an alligator filled moat.");
if (said("pet", "alligator"))
  print("What!  Are you crazy?");

It should be noted that the way it’s presented here is but an illusion, granted us by the decompiler. The said function’s parameters are actually a straight series of numbers, referencing the dictionary. This way it can match both >CHECK ROOM and >LOOK ROOM. But what if you typed >LOOK AT ROOM? There’s no check for that.

When the AGI engine parses your input, it goes through it word for word and tries to match everything to the dictionary. It first sees look, matching that to entry 2 and remembering it as such. It then sees at which is one of the “filler” words and skips it. The last word found is room, which is matched to 137. So now the “said buffer” as it were contains 2 137 and the said function can compare its parameters against it. Two special meta-words, anyword and rol, are available to match literally anything and to ignore the rest of the buffer if any.

If you were to tell King’s Quest 1 to >LOOK AT THE CROCODILES it’d tell you it doesn’t understand “crocodiles”. Those aren’t in the dictionary so there’s nothing to match them against. Those are alligators in the moat.

SCI is a bit more involved than that, but you should have a good basis to work from now.

In SCI0, the dictionary is extended to include grammatical types for each entry. For example, the first item might be marked “article”. Indeed, it’s a group of words like “the”, “a”, “an’, and for our Spanish players “el”, “la”, and “los”. Then you might have a numbered entry “give, offer” marked as being a verb. Words like “lock” would be marked as being both verbs and nouns.

All that so a very nifty state machine can determine what you want to do, what you want to do it to, and what you want to do it with, among other phrases.

Every time you enter a command, a said event is fired. Through the usual systems, this event is passed down to the current room’s handleEvent method, which can tell it’s a said event. So now we know that something was said in the first place, but what exactly? At this point things turn a little AGI-like again.

(switch (pEvent type?)
    ((Said 'close/door')
      (Print "Check again! It IS closed.")
    ((Said 'look>')
        ((Said '<in,through/craft,pod,pane[<escape]')
          (Print "This task is impossible since the door is sealed from the inside.")
        ((Said '/pane')
          (Print "The window is clear enough to reveal the blackness inside.")
        ((Said '/door,door')
          (Print "The solidly built door looks to be locked in place.")
        ((Said '/nozzle')
          (Print "The pod's thrusters are very small. They have been cold for a long time.")
        ((Said '/craft,pod[<escape]')
          (Print "This is the escape pod which safely whisked you away from Vohaul's burning asteroid fortress.")
        ((Said '[<at,around,in][/area,!*]')
          (Print "You are standing in a debris-cluttered junk bay.")
    ; ...

First of all, don’t be fooled. Those strings have single quotes around them for a reason. They too are stored as numbers, and so are the other characters. For example, 'close/door' is 1157, 242, 2110. The actual order of things might be a little off to see though. Obviously the > at the end of the look clause means that the rest of the sentence is to be considered separately. That’s all good. But the / does not mean to continue from here, since 'close/door' has it right there in the middle. No, the / means that the next word is to be the direct object. The thing you want to close. The first word is the verb, considering the imperative nature of these games. If there’s a second /, that’ll be the indirect object, but we don’t have one here. The < in '/craft,pod[<escape]' modifies the object it succeeds, while the [] around it make it optional. Thus, 'look/craft,pod[<escape]' matches >LOOK CRAFT>LOOK AT ESCAPE SHUTTLE, and various other ways to phrase it. Oddly, even though the description calls it an escape pod, “pod” is not a valid synonym. Oh well.

Update some two years later, turns out Space Quest 3 version 1.018 has a massive script bug involving them adding another word group, but neglecting to recompile all the scripts. As threepwang put it, “30 scripts still reference the old vocab group ids from 0x953 to 0x990 in their Said strings.” Oof. So I decompiled the Amiga version and instead of “chute” it said “leech”. Because alphabetical order. So in the end “escape pod” would be valid, but only on any version other than the very last PC release that’s also in the SQ collections. Great.

But how does the engine know what the verb and objects are? Well, it figures that out via that state machine I mentioned. When you start to enter a command in SCI0, the User script handles showing you the command window and such, then passes what you entered to the Parse kernel command, hereafter “the parser”.

One particular resource lists all the possible types of phrase, such as “shoot”, “talk to dwarf”, or “hit redneck with plank”, but stored in the sense that a verb phrase can be just a verb, a verb followed by a direct object, or a verb followed by a direct object and an indirect object. The parser then tries to find the verb phrase that best fits the input, filling in placeholders until done.

For example, say we entered >LOOK AT HOBO. The parser will try to find the best-fitting verb phrase, starting with a bare core verb. In turn, it will consider the various core verb structures listed in the grammar, eventually finding a bare “just a verb”, which “look” matches, but there might still be a better match. The very next option is indeed “a verb followed by a position”, which matches “look” and “at” in that order. No other core verb structures match, so we can write that down and continue with the rest of our verb phrase. Which is now done.

But again, there might be a verb phrase that better matches our input and we do have more words to consider — we’ve only looked at “look at”, not the whole “look at hobo”!

The next verb phrase is “a core verb followed by a noun phrase.” Hmm. Again, we look through the various grammatical definitions of a “noun phrase”. The first one is “an article followed by a core noun.” Well, we didn’t say to look at the hobo, so that one’s out. The next one wants a hobo with an article and an adjective, then a hobo with only an adjective… but we do eventually get a noun phrase that’s just a core noun, and in turn that core noun is reduced to just the single word “hobo”.

So now we know that our verb is “look” and our direct object is “hobo”. The parser can now generate something from this information that Said can compare against, keeping it in mind until the event is eventually claimed, be it by a successful Said match or the game giving up on your strange input.

[ , , ]

Leave a Reply

Your email address will not be published. Required fields are marked *