Logo Pending

Selectors and different ways to push them

Earlier today, some 19 hours ago at the time of writing, Eric Oakford opened an issue on the SCI Companion GitHub repo. Eric is working on a big decompilation project, taking mostly demo versions of SCI games and trying to wrangle them into a recompilable state.

In the issue he’d opened, Eric described how the demo version of Leisure Suit Larry 3 didn’t decompile right, while the actual LSL3 worked fine. Here’s an example of the problem:

One of the first things a Room instance would normally do in its init method is to call (super init:), letting the Room class itself do its setup before anything specific to that room is done, like setting up actors, scripts, features, and walk polygons. In PMachine byte code that statement looks like this:

39 57       pushi 87    // the "init" selector
76          push0       // init takes no parameters
57 36 04    super Rm 4  // four bytes (two words) worth of stack to send

Now, the SCI PMachine is a stack machine, with two parts that are important to know about: the stack (because duh) and the accumulator. You can load numbers onto the accumulator, push them directly onto the stack, push the accumulator onto the stack, duplicate the top item, etcetera. Every value on that stack is a 16-bit number, as is the accumulator. Pointers? Just 16-bit numbers. Characters returned by StrAt? 16-bit numbers, even if it’s just ASCII codes. Properties and methods to invoke in a send? Yup.

There’s a separate table, vocabulary 997, that lists the name of every selector — every method and property of a class or instance. And that’s where it says that selector #87 is init.

Noting that super, self, and send are all three sides of the same weirdly-shaped coin, the decompiler can tell that there should be a (super ...) command in the output, with four bytes of stack space to look back over. Since it’s not actually running anything it has no actual stack, but it can look back to find two push operations that’ll fit the bill just as well. It can tell that the first value should be a selector, so whatever is being pushed is taken to be one, which is correct, since 87 is the init selector. Then the next value is the number of arguments given to init, which is zero.

But everything went wrong in the demo. This is how rm200, the overlook with the binoculars and memorial plaque, starts in the demo:

35 57       ldi 87
36          push 
39 00       pushi 0
57 36 04    super Rm 4

The actual values on the stack stay the same — 87 0 — but the way they’re put on there is subtly different, and that tripped SCI Companion up.

Instead of (super init:), it decompiled the above as (super species?).

In fact, all selectors in code blocks were species. Six hours ago at the time of writing, I figured out the problem.

Now, the SCI Companion code is super hairy and I really couldn’t do it justice by just including snippets here but the gist of it?

Every operation may have a couple operands. One method in the decompiler returns what the first operand for a given operation in the byte code may be. For pushi, that’d be the immediate value to push. For push0, push1, and push2, that’d be zero, one, and two. For ldi it’s exactly the same as pushi, just that the operand is to be put in the accumulator, not the stack. By default, this method just assumes zero.

Notice how push has no operands at all? Why would it? It pushes the accumulator’s value, after all. The only difference between pushi 87 and ldi 87, push is that in the latter case, the accumulator is also 87. The accumulator doesn’t matter to a send, only the contents of the stack. And pushi 0 is just push0 with one extra byte. And that makes these two snippets effectively the same with regards to actual execution in an SCI interpreter.
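To make that equivalence concrete, here’s a minimal sketch of a toy stack machine in Python. It only knows the five opcodes that appear in the two listings above, using the byte values shown there; the function name and structure are made up for illustration and have nothing to do with how a real SCI interpreter is organized.

```python
# Opcode bytes exactly as they appear in the two listings above.
LDI, PUSH, PUSHI, PUSH0, SUPER = 0x35, 0x36, 0x39, 0x76, 0x57

def run_until_send(code):
    """Run a tiny subset of PMachine byte code (hypothetical helper).

    Returns (stack, accumulator) as they are when the first super is reached.
    """
    stack, acc, pc = [], 0, 0
    while pc < len(code):
        op = code[pc]
        if op == LDI:        # load immediate into the accumulator
            acc = code[pc + 1]
            pc += 2
        elif op == PUSH:     # push the accumulator
            stack.append(acc)
            pc += 1
        elif op == PUSHI:    # push an immediate value
            stack.append(code[pc + 1])
            pc += 2
        elif op == PUSH0:    # push a literal zero
            stack.append(0)
            pc += 1
        elif op == SUPER:    # stop here; the send itself is not modelled
            break
        else:
            raise ValueError(f"opcode {op:#04x} is not part of this sketch")
    return stack, acc

RETAIL = bytes([0x39, 0x57, 0x76, 0x57, 0x36, 0x04])
DEMO = bytes([0x35, 0x57, 0x36, 0x39, 0x00, 0x57, 0x36, 0x04])

print(run_until_send(RETAIL))  # ([87, 0], 0)
print(run_until_send(DEMO))    # ([87, 0], 87)
```

Same stack, different accumulator, and the send only cares about the stack.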

So what happens when the decompiler sees the LSL3 demo’s scripts, is that it looks back for two pushes, as it should. It finds the first, push, which should be a selector. But the helper method that returns the value being pushed can’t return any operand — there are no operands here! So it returns the default, zero. Some confusion about it possibly being a variable later, it decides that it must be species. And then it does this for all the sends in the demo, because they all push their values the same way.

The fix came to me when I saw the case for the dup operation in that very same method that’s supposed to return the value that’d be pushed. It too takes no operands, yet does return a value that’s only zero if it should be. Turns out it scans back a bit, looking at the previous operation in the bytecode stream, and steals its value by calling the same helper method again, but aimed at the previous operation. The fix then is to make push also steal its predecessor’s value. I did decide to special-case things for now, though. It’ll only do the stealy thing if it’s an ldi/push pair, like in the LSL3 demo.
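Since I’m not including the real code, here’s the idea as a rough Python sketch. To be clear, this is not SCI Companion’s code (that’s C++ and far hairier); the instruction tuples, the pushed_value name, and the steal_for_push switch are all invented for illustration. The only things taken from the story above are the default of zero, the dup behavior, and that selector 0 is species while 87 is init.

```python
# Only the two selectors that matter for this story.
SELECTORS = {0: "species", 87: "init"}

def pushed_value(ops, index, steal_for_push=True):
    """Guess which value the operation at ops[index] puts on the stack.

    ops is a list of (mnemonic, operand) tuples, a made-up representation.
    steal_for_push=False mimics the behavior before the fix.
    """
    name, operand = ops[index]
    if name in ("pushi", "ldi"):
        return operand
    if name in ("push0", "push1", "push2"):
        return int(name[-1])
    if name == "dup" and index > 0:
        # dup has no operand, so borrow the previous operation's value.
        return pushed_value(ops, index - 1, steal_for_push)
    if name == "push" and steal_for_push and index > 0 and ops[index - 1][0] == "ldi":
        # The fix: special-cased to an ldi directly followed by push.
        return pushed_value(ops, index - 1, steal_for_push)
    return 0  # the default that turned everything into species

demo = [("ldi", 87), ("push", None), ("pushi", 0), ("super", "Rm")]

before = SELECTORS[pushed_value(demo, 1, steal_for_push=False)]
after = SELECTORS[pushed_value(demo, 1)]
print(before, after)  # species init
```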

But it does work.


Addendum: You might wonder why the incorrect decompilation was (super species?) with a question mark instead of a colon. The decompiler and interpreter alike can tell which selectors on a given class are supposed to be properties, and which are methods. When invoking a method or setting a property, the standard is to use a colon, like in (theSong number: 4 play:), which is a property set followed by an arg-less method, and to use a question mark for property gets, like in (= theX (gEgo x?)). And since species is a property and there was no argument, it was taken to be a property get.
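That rule is simple enough to sketch in a few lines of Python. The function names here are hypothetical and the real decompiler surely has more cases to deal with; this only covers the three situations described above.

```python
def format_selector(name, is_property, args=()):
    """Render one selector of a send (hypothetical helper)."""
    if is_property and not args:
        return f"{name}?"  # property get
    # Property set or method call: colon, then any arguments.
    return " ".join([f"{name}:", *map(str, args)])

def format_send(receiver, *parts):
    """Wrap the rendered selectors in a send expression."""
    return f"({receiver} {' '.join(parts)})"

print(format_send("theSong",
                  format_selector("number", True, [4]),
                  format_selector("play", False)))
print(format_send("super", format_selector("species", True)))
print(format_send("super", format_selector("init", False)))
```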


No start!

Back in February there was this whole deal on Twitter about the Konami Code, when the person who introduced the thing passed away, and I wondered if I was an asshole for being bothered by all of these people including the Konami PR guys posting the Code… with a start at the end.

Because the Code, as Kazuhisa Hashimoto originally introduced it, never included the start button. And I brought receipts! So here’s a quick rehash of what I wrote the day after.

This is part of the 6502 code for Contra, where it determines how many lives you should start the game with. As you can see the amount of lives is stored at address $32 and is set to either 2 or 29 depending on the value of address $24, where it tracks if the Code had been entered.

This is a view of the actual memory of the NES, or at least the relevant part of it. The right half had been cut off for size in the original tweets, but none of the values we’re looking at are on that side anyway. Focus is on $24, the Code flag, which is unset.

Entering the Code flips $24 to 1 when you press A. The only thing the start button does is… start the game. That includes running the code above to initialize the lives counter.

“But Kawa, what about Gradius?” you might ask. Well, you go start up Gradius and enter the code, and tell me what you see. You begin the game, press start to pause, enter the code, and press start to unpause. Don’t enter those button presses too quickly or you might not catch on and fool yourself! Turns out the options and such appear the moment you press A.

Here’s the RAM for Gradius during a pause. $33 tracks how far along the Code entry you are and ranges from zero up to nine. When it’s zero, the next button expected is up. When it’s nine, the next expected input is A. You finish, $33 equals ten, and your loadout changes on the spot:

Also the tracker is reset to zero so you can enter it again. Now, in Gradius you can freely make a mistake and try again because the tracker just resets to zero when you mistype. In Contra, the tracker ($3F so it’s not visible in those screenshots) is set to 255 as a “lock” value when you mess up and you don’t get to try again.
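Here’s the difference between the two trackers as a small Python sketch. This is a model of the behavior described above, not a transcription of either game’s 6502 code. The function names are made up, and what Contra’s tracker does after a successful entry is my assumption (the sketch just stops listening).

```python
KONAMI = ["up", "up", "down", "down", "left", "right", "left", "right", "B", "A"]
LOCKED = 255  # Contra's "you messed up" value

def gradius_step(tracker, button):
    """Return (new_tracker, unlocked_now) for one button press."""
    if button == KONAMI[tracker]:
        tracker += 1
        if tracker == len(KONAMI):
            return 0, True  # loadout changes on the spot, tracker resets
        return tracker, False
    return 0, False  # mistake: just start over

def contra_step(tracker, flag, button):
    """Return (new_tracker, new_flag) for one button press."""
    if tracker == LOCKED:
        return tracker, flag  # no second chances
    if button == KONAMI[tracker]:
        tracker += 1
        if tracker == len(KONAMI):
            return LOCKED, 1  # flag set on A; locking here is an assumption
        return tracker, flag
    return LOCKED, flag

def starting_lives(flag):
    """What the start button eventually leads to: initializing the lives."""
    return 29 if flag else 2
```

Note that start appears nowhere in KONAMI. Pressing it after the flag is set only gets you to starting_lives.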

But yeah, no start in the Konami Code. Your tattoo is now ruined.



On object lists, meta-tiles, and Mario

A fun fact about most 2D Super Mario platform games is that they all share a common way of storing their level data. A common paradigm as it were. Only the Game Boy games don’t.

If ROM-based games load so fast compared to disk-based games, why does Super Mario Bros 1 make you wait on a mostly-black screen before you get to play a given level? Why does Super Mario World? Surely it’s doing more than just sitting idly?

The answer? Besides graphics in the case of later games, it’s converting the level map from one format to another. From a list of objects to a tile map, to be precise. That brick-block-brick-block-brick line we all know and love from SMB1‘s world 1-1 for example? Five tiles, but only three objects. First, a brick object set to five tiles wide. Then two separate question block objects that overlap the five bricks. On load, these objects are rendered into a tile map.

(Now, SMB1 didn’t have the space to hold an entire converted level in memory and only had a screen or two at once, which is why you can’t backtrack. So in SMB1‘s case, it does in fact sit idly. Thanks to NovaSquirrel for mentioning that.)

While the NES has 8×8 pixel tiles, the map this object list is rendered to has 16×16 pixel tiles. It is what some would call a meta tile map, where each entry itself refers to a different data structure that says “meta tile 2 has this color palette and is built from these four tiles”. That’s the map format a great many games of the era and later use. When an area is about to scroll into view, that tile map is then quickly converted to VRAM-native tiles. And that’s how you can have a set of three coins be defined as one object, yet pick each coin up separately, or have a strip of bricks that you can individually break. And since that alters the big tile map in memory, if you were to backtrack (even though you can’t do that in SMB1, as mentioned, but you can in the later games) those coins and bricks would not reappear.
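As a rough illustration of the object-list-to-tile-map step, here’s a Python sketch. The object tuple layout, the tile characters, and the coordinates are all invented for the example and don’t match any actual Mario game’s data format. It only shows the idea: later objects overwrite earlier ones, and once rendered, every tile is on its own.

```python
EMPTY, BRICK, QUESTION, COIN = ".", "B", "?", "o"

def render(objects, width, height):
    """Render a list of (kind, x, y, width) objects into a 2D meta tile map."""
    tiles = [[EMPTY] * width for _ in range(height)]
    for kind, x, y, w in objects:
        for dx in range(w):
            tiles[y][x + dx] = kind  # later objects overwrite earlier ones
    return tiles

objects = [
    (BRICK, 2, 1, 5),     # one brick object, five tiles wide
    (QUESTION, 3, 1, 1),  # two question blocks overlapping the bricks
    (QUESTION, 5, 1, 1),
    (COIN, 2, 0, 3),      # three coins defined as a single object
]

level = render(objects, 10, 3)
level[0][3] = EMPTY  # pick up just the middle coin

for row in level:
    print("".join(row))
```

The object list itself is never touched. Only the rendered map changes, which is why collected coins stay gone for as long as that map stays in memory.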

Sprite objects come in a separate list, usually after the level geometry, and at least for the “classic” games they are subdivided into pages, about a screen wide. They’re only instantiated when their page is just off-screen, and they’re not marked as properly dead. Which is why if you knock out a koopa trooper but leave him there, go about a screen away then double back, the trooper will be back in his starting position and perfectly fine. Your leaving that page made him despawn without being marked as properly dead.
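The koopa’s miraculous recovery falls right out of that design. Here’s a minimal Python sketch of it, with the caveat that the class names, the “within one page of the camera” rule, and the state fields are my own simplifications and not how any particular game stores things.

```python
from dataclasses import dataclass

@dataclass
class SpriteEntry:
    kind: str
    page: int           # which screen-wide page the sprite belongs to
    start_x: int
    dead: bool = False  # only set when the sprite is properly defeated

class SpriteManager:
    def __init__(self, entries):
        self.entries = entries
        self.active = {}  # entry index -> live state

    def update(self, camera_page):
        for i, entry in enumerate(self.entries):
            near = abs(entry.page - camera_page) <= 1  # assumed spawn range
            if near and i not in self.active and not entry.dead:
                # Spawn fresh from the level data.
                self.active[i] = {"x": entry.start_x, "stunned": False}
            elif not near and i in self.active:
                # Despawn: the live state is simply thrown away.
                del self.active[i]

mgr = SpriteManager([SpriteEntry("koopa", page=3, start_x=50)])
mgr.update(3)
mgr.active[0].update(x=70, stunned=True)  # knock him out, kick him along
mgr.update(5)                             # wander about two screens away
mgr.update(3)                             # double back
print(mgr.active[0])  # {'x': 50, 'stunned': False}
```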

Now, the Super Mario Land games… they do what they want. SML1 for example subdivides levels into screens, which are lists of strips that can be reused at will. I think the screens themselves can also be reused. And that is then converted to a regular tile map. The original Legend of Zelda used a similar strip-based layout.

I think I remember Super Mario Land 2 used straight-up 16×16 pixel tile maps for its level geometry. Both of these methods are still better than storing several screens worth of tile map in its native size.

Using straight-up tile maps of any resolution is of course a common technique used by many games. As a rule, the larger your levels can be the larger you want your meta tiles to be. Sonic the Hedgehog has positively huge meta tiles, themselves defined in terms of smaller tiles, since your average speed almost requires levels be large to accommodate. And it makes constructing those loops easy as a bonus. Most NES games tend to have 16×16 pixel meta tiles though, because of the attribute map being that size.


More pronoun problems in Ranma fanfics

(Edited from a Twitter rant in ten parts.)

One thing I find linguistically interesting about Ranma ½ fan fiction, especially most of the more recent works, is that they make a big fucking deal of Ranma’s pronouns. Mind you, these stories are set in Japan, starring Japanese characters, speaking Japanese. It’s all just rendered in English because Internet.

Third-person pronouns in English have genders. He/she and such, you know the ones. Since these stories are written in English, you’ll often find characters refer to Ranma with one pronoun or the other. No problem there, the original manga and anime do it too. But there’s a twist.

There are lots of stories about Ranma being transgender, especially in recent years as far as I’ve seen. Which is totally understandable, really. That’s not the problem. Write about transgender Ranma all you want. The problem, at least to me, is when characters start mentioning how other characters use this or that pronoun to refer to Ranma.

(This of course applies not just to Ranma but to any other character who shares the same curse. Let’s keep it simple, though.)

Worse, for the purpose of this rant, is this one story where Ranma joins a support group for LGBTQ people and the members all introduce themselves and state their pronouns. See, if these are Japanese characters (they are) speaking Japanese (this is implied), and my research is correct (I can only hope), that is literally not a thing they could do.

Where in English it is the third person pronouns that are gendered (he/him, she/her), Japanese has them in the first person. Ano hito, yatsu, and koitsu, those are all gender-neutral. Boku, watashi, and atai, are all gendered. And that’s just a small sample of first person pronouns.

So the very first time someone like Ranma opens his pie hole and speaks of himself, he’ll use whatever pronoun he wants. That’d be ore, a very manly one, as in “ore wa otoko da,” “I’m a guy.” It’s when Ranma uses a feminine pronoun that the eyebrows rise. Mind the phrasing there!

I was reminded of the Twitter rant this post is adapted from by another fanfic I read last night, where Genma caught himself thinking about his recently-cursed child with female pronouns. As in, the English third-person ones. It didn’t do much to damage the scene or anything but I felt mildly distracted by the idea that a Japanese man would think in English terms.

There are in fact fanfics, written in English, where Ranma will say something and maybe there’s something about the phrasing in English, and another character remarks that Ranma used a feminine pronoun, perhaps even saying the pronoun itself in Japanese, in the middle of a story otherwise written in English. Just as an example: “Ranma used atashi just now instead of ore, and he’s not trying to trick Ryōga. Something’s going on here.” Something like that. It’s quite interesting how you might handle this difference.

The episode Am I Pretty comes to mind, where Ranma’s entire way of talking changes right along with his first-person pronoun. I only watched it in Japanese, but I’d imagine the dub only has his way of talking change. If you’ve seen it dubbed, feel free to let me know how they handled it in the comments.

Suffice it to say, as weird as pronouns can get in one language, it gets so much weirder when there’s two in play.


Shampoo, Cologne, Mousse

(Edited from a Twitter rant in nine parts.)

I’ve complained about Shampoo’s name and how fanfic writers tend to write her “original Chinese name” often enough. I’d like to discuss Cologne and Mousse this time.

Now then. Knowing that Shampoo is the only one of the three with a name in actual Chinese characters, Cologne and Mousse are only ever written in katakana: コロン and ムース. Koron and Mūsu. Simple, right? That’s basically exactly how the products would be pronounced, just like with Shampoo.

Problem #1: Cologne’s fandom name is usually written as Khu Lon. Sometimes I think without the H. First of all, I can find no romanization scheme where khu is a valid sound, written that way. There’s khu but that’s in a scheme that’s not even used, and kuh isn’t quite it either. Second, it seems to me to be simply the wrong sound. The vowel is all wrong!

Problem #2: Mousse is usually given the name Mu Tsu, Mu Tse, and in this one doorstopper I’m slogging through, Mse Tsu. I already covered how absurdly wrong that third one is. If these were at all right, wouldn’t his name be written ムー, tsu? But it’s not. There is no T sound in either his established katakana writing or in his name as spoken.

Shampoo’s name, as covered before, is shanpū in Japanese, with a lengthened -u. It’s shānpū in Chinese, where I believe the accent marks denote tone? Correct me on that if I’m wrong. So at worst the length of the -u is different. Likewise, Ranma’s name “translates” to Chinese as luanma. Exactly the same kanji (乱馬) and all that. Readings are cool like that. Between the well-known L/R difficulty and an easily drowned out u, that’s also basically the same when pronounced.

Why then, in the name of logic, should Cologne and Mousse have their names so different? At least back when they started, fanfic authors had no way to look this shit up; there was no Wiktionary or such back then. You really have no excuse now. You can do better than this. If you were to tell me “but those are their names”, you basically admit to being both uninformed and too lazy. “That’s how we’ve always called it in the fandom” is arguably better, but again you can do better.

Since we can look shit up, let’s actually do look this shit up! What is “eau de cologne” in Mandarin Chinese? It’s 科隆香水, kēlóng xiāngshuǐ. “Mousse” as in the hair product? That’s 摩絲, mósī.

Shānpū, Kēlóng, Mósī. Research over.

But wait. Kēlóng? Ke? Didn’t I say earlier that the vowel is all wrong? I did. It’s a romanization difference. The zhuyin is ㄎㄜ. I gave their names just now in hanyu pinyin. In Wade-Giles, it’s written with an o. Simple. Mousse’s ㄇㄛ is mo in both.

Just for fun, I thought I’d look up some common fanmazon names, but reconsidered when I found Chinese has a perfectly good word for perfume. Oh well, I’d made my point already 🤷‍♀️


More regarding Interrupt 21

Last time I explained how your standard file rename function as seen in MS-DOS worked. You’d set up two CPU registers with pointers to the old and new names, set AH to 0x56, and called Int 0x21. Easy, right? And then I went into detail on how malformed inputs were handled. They weren’t handled too well, and DOSBox does it differently from MS-DOS on top of that.

But what if we had a file system and rename function that did support spaces? Maybe more than eight characters, even? In mixed case?

That is of course VFAT, an extension to regular FAT16 available in Windows 95, NT 3.5, and later. With a VFAT driver, most of the old file operations available from Int 0x21 had counterparts installed that generally took the same arguments and had the same numbers, but accepted long filenames.

So to rename a file with long filename support, you’d do exactly what you’d do before but instead of setting AH to 0x56 you’d set AX to 0x7156. Assuming Windows is running and we use the same inputs as last time, your file will now be named hello world.txt. And that’s all it takes, even if it’s a pure DOS program doing it.

Which raises a question. How do you make a pure DOS program that handles files that may have long names, may be run from Windows, and should not drop any of those long names if it is in fact running in Windows? Well, it turns out all those LFN functions, the ones starting with 0x71, reset AX to 0x7100 if they’re not installed. A trick of the system, I suppose. So what you could do for your LFN-enabled rename function is try to use 0x7156, see if AX has reset to 0x7100, and if it has, you try again with AH set to 0x56. In other words, it’s time to bring back the rename function from SCI11… or rather a branch of SCI11+ that I’ve been working on.

rename	proc	oldName:ptr byte, newName:ptr byte
	mov	dx, oldName	; ds:dx = old name
	push	ds
	pop	es
	mov	di, newName	; es:di = new name
	mov	ax, 7156h	; LFN Rename
	int	21h
	.if	ax == 7100h	; LFN failed, try DOS 2.0 version
		mov	ah, 56h
		int	21h
	.endif
	.if	carry?
		xor	ax, ax
	.endif
	ret
rename	endp

It’s that easy. Of course, this is old-school MASM code which has some nice things like .if but that’s just sugar to avoid having to write compares and branches — the concept should be clear enough. An attempt to rename a file to Introduction.txt will result in exactly that on Windows, or transparently collapse to introduc.txt on plain DOS.

Note that in the actual SCI11+ code, if you’re crazy enough to look it up, there’s an extra function I made that’s called right before the DOS 2.0 rename call and replaces all spaces with underscores, which makes the resulting file names far less confusing and untouchable than the one shown last time. I left that part out for brevity.


This is why you sanitize your inputs, 1983 edition

(This is heavily expanded from a few Twitter posts of mine.)

When you write an application that has to rename a file, you have your chosen language and platform’s standard library to do the heavy lifting for you. For example in C it’s usually int rename(const char* oldName, const char* newName), and a bunch of other languages follow suit. Why not, it’s a good function! But what does rename actually do?

In MS-DOS, this’d be handled by Interrupt 0x21, subfunction AH 0x56. By which I mean it’d set two specific processor registers (as mentioned in Save Early, Save How) to point to the old and new file names, set the AH register to 0x56, and execute the INT 0x21 instruction. A function installed by MS-DOS will then take over, doing the actual renaming, possibly returning an error value which the C function can immediately use as its return value. Since SCI has its own “need-to-use” library…

rename	proc	oldName:ptr byte, newName:ptr byte
	mov	dx, oldName	; ds:dx = old name
	push	ds
	pop	es
	mov	di, newName	; es:di = new name
	mov	ah, 56h		; DOS 2.0 Rename
	int	21h
	.if	carry?
		xor	ax, ax
	.endif
	ret
rename	endp

(Full disclosure: the SCI code actually includes a dos macro to save the programmers some typing. I unrolled it here for illustration purposes.)

All of this pretty much matches what you can find on Ralf Brown’s list. Given a suitable function prototype in C such as the one in the second paragraph, SCI can now call its own rename function as it desires.

Enough about SCI though, its function as a practical example is at an end.

But what if you gave it bad inputs? Sure, if the old name doesn’t refer to an existing file it will return 2 “file not found”, but what if the new name isn’t quite valid? Remember, this is MS-DOS; we don’t have the luxury of long file names here. It’s 8.3 or bust. I don’t see any sanity checks in the above function, and Brown’s documentation only speaks of splats.

So what happens if we have a file boop.txt and call rename("boop.txt", "hello world.txt")?

In DOSBox, you’d end up with a file hellowor.txt. You are free to further manipulate this file in any way you please. The command line won’t choke on it, file managers won’t get confused. If you wanted to manually rename it back to boop.txt from the command line, ren hellowor.txt boop.txt will work perfectly fine.

This is actually not true in real MS-DOS. If your program were to run on a real MS-DOS installation, you’d end up with hello wo.txt, an 8.3 file with a space in it. And no contemporary file manager I’ve seen can handle that. The ren command built into command.com can’t parse it — ren hello wo.txt boop.txt is three arguments where ren expects only two, and the first isn’t an existing file’s name that it can change to wo.txt.
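The two outcomes can be modelled in a few lines of Python. Big caveat: I haven’t gone through what either DOSBox or MS-DOS actually does internally here. These two hypothetical functions only reproduce the results described above for this one kind of input, and the argument splitting is plain whitespace splitting, standing in for why ren gets confused.

```python
def shorten_like_msdos(name):
    """Cut the base to eight characters and the extension to three."""
    base, _, ext = name.partition(".")
    return base[:8] + ("." + ext[:3] if ext else "")

def shorten_like_dosbox(name):
    """Same, but drop spaces before cutting (matches the observed result)."""
    base, _, ext = name.partition(".")
    base = base.replace(" ", "")
    return base[:8] + ("." + ext[:3] if ext else "")

print(shorten_like_dosbox("hello world.txt"))  # hellowor.txt
print(shorten_like_msdos("hello world.txt"))   # hello wo.txt

# Why ren chokes: splitting on whitespace yields three arguments, not two.
args = "hello wo.txt boop.txt".split()
print(len(args), args)  # 3 ['hello', 'wo.txt', 'boop.txt']
```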

In cmd.exe of course you can use double quotes to make it unambiguously two arguments, but this isn’t cmd.exe. What about some file managers though? I have two, Norton Commander and its big brother Norton Desktop.

In Norton Commander, the file list shows hello wo.txt, and its rename function can handle it. So can the built-in editor and viewer. Top marks for Norton Commander!

Norton Desktop on the other hand is not so sturdy. It can show the file in the list but that’s all. Trying to rename it back to boop.txt reveals the incorrectness of the source file’s name quite succinctly:

Technically, this is true. You’re not supposed to have spaces in the middle of a FAT 8.3 file name. If a file has fewer than eight characters before the dot, it’s secretly padded with spaces, and so are the three extension characters. And the dot isn’t even stored: the true name as written in the FAT directory would be BOOP    TXT. But that’s just one way Norton Desktop trips. Its viewer seems to be passed the nonexistent hello. It shrugs and asks which existing file we want to open. Its editor is given the same argument(s?) and lets us edit a brand new file named hello. In Norton Desktop’s world, it can see the file, but it can’t do much with it.
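For the curious, the padding works out like this. It’s a minimal Python sketch with a made-up function name that only handles the simple “base, one dot, extension” case and ignores everything else a real FAT driver has to worry about.

```python
def fat_dir_name(name):
    """Return the 11-character name as it would sit in a FAT directory entry."""
    base, _, ext = name.partition(".")
    # Eight characters of base and three of extension, space-padded, no dot.
    return base.upper().ljust(8)[:8] + ext.upper().ljust(3)[:3]

print(repr(fat_dir_name("boop.txt")))      # 'BOOP    TXT'
print(repr(fat_dir_name("hello wo.txt")))  # 'HELLO WOTXT'
```

Seen like that, the space in the middle of hello wo looks exactly like padding that ended up in the wrong place.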

What about a contemporary Windows? Can, let’s say, the Notepad from Windows 3.1 handle this file? Okay, so technically this is commdlg.dll talking, but we’re playing for effect here.

Of course not, what did you expect by now!? Norton Commander only worked because it didn’t care enough! Would you really think one of the companies who made the FAT file system would blithely ignore one of the cardinal rules at the time?


Next time, we gettin’ hacky.


…Wait, hold up. Why does it say 1983 in the title? Well, if you notice on Ralf Brown’s site the rename function was introduced in DOS 2, which was first released in 1983. And so was I.


Snappy and Clear

(This was originally posted as a Twitter thread.)

I’d like to talk about bad words for a moment. Specifically, words used as tags and such for porn site content. So there’s your content warning right there.

“Dickgirl”. (pause for audience gasp) There are those who consider this word to be a Bad Word™. I respectfully disagree. I think this word is snappy and clear, like a content tag word ought to be.

(Now, I’d like to interrupt this repost to clarify that this is not about how it’s somehow not bad. It’s about how on another level it’s good. This was never meant to convince anyone otherwise, as we all know this to be impossible. With that in mind, back to the repost.)

Imagine, if you will, a fresh-faced pervert’s first go on the Internet. Our freshly-hatched pervert finds a porn site with a comprehensive tagging system, that includes hentai and its related genres and tropes. He finds an image set or a comic or such tagged “dickgirl, bukkake” among other things. What do you suppose this person, who has never seen these two words before (I know, incredible), expects to find upon reading these tags and opening the comic?

At least one chick with a dick, and the other thing is a surprise.

Y’see, when you see the word “dickgirl”, there’s only a few things you can take it to possibly mean, and only one or two make enough sense to likely be correct. It refers, of course, to a girl with a dick. It’s a hentai comic, it’s allowed to have weird shit okay?

But yeah. Two short syllables, each a perfectly clear word on their own, and there’s very little doubt as to what it means. The opposite, “cuntboy”, is exactly the same in all regards. No need to repeat myself there. “Bukkake” on the other hand… oh boy. Oooh boy! First of all, it’s Japanese. Our (I’ll remind you) uninformed perv doesn’t speak the language, so he has nothing to go on. Second of all, it’s got a non-sexy meaning too, that came earlier. Something about pouring water on your noodles from higher up? It’s not that snappy, and it’s certainly unclear as all get out. If our perv were Japanese or otherwise familiar with the other meaning, he might justifiably think “oh, the dickgirls are gonna eat noodles afterwards.” Boy is he in for a surprise!

You could say the same about “futanari”. Again, Japanese. Means, roughly, “two forms”. And frankly, reading that I’d sooner think of giant fighting robots that turn into jet planes and such than a girl with too many bits down her panties.

And please, don’t even think of suggesting “pre-op transwoman”. It’s not at all snappy, and it’s frankly too specific. Surprising, I know. Not all female-presenting dick-having characters in these stories are trans, okay? Like 99% aren’t!

(“you could also argue that while it’s definitely a bad idea to call real-life trans women by porn terms, it’s probably a lesser bad idea to call porn by the current words used by trans women, in that it may lead for those terms to become sexualized/pornographic.” “Exactly why you just keep calling dickgirl porn dickgirl porn.”)

Anyway thanks for listening to my TED talk. I’m Kawa, hobbyist linguist, and all-round lazy bastard.
