Thread review | |
---|---|
Kawaoneechan | I'd damn well hope so considering the amount of research time I put in that post. |
neologix | Can confirm firsthand what Kawa wrote about Chrono Trigger text being stored with a tailored dictionary. One of the things I wrote into my nascent (and recently untouched) CT ROM parser is the ability to read the stored text data. I read about the compression/decompression method early on in Geiger's notes, and it checked out. |
Kawaoneechan | Yup, this is a byuuboard alright. |
wareya | formally numberless conceptual-morphemic orthographical unit of the middle empire |
Kakashi | Yeah, why can't we just say singular adopted logographic Chinese character? |
wareya | love 2 unironically write the string of text "discrete kanji logogram" |
Kawaoneechan |
What's worse is, with all this talk of specific encoding schemes... repointing to make room for trivially-encoded strings is the easiest way, even on systems with "nasty" pointers. Seriously though, there's basically only one reason for Chrono Trigger to have dictionary lookup bytes and that's "ROM is expensive". |
Kakashi |
Posted by sureanem So you basically don't know what you're talking about. Gotcha. |
Kawaoneechan |
Chrono Trigger's text encoding included a large swathe of dictionary lookup bytes, mapping one byte value to two or more characters, along with the general "insert name here" bytes. This would let entire parts of words like "pedia" be saved as one byte in the original text string, but decode into the full version for display in the dialogue box. The dictionary is not based on the top 30 of a given language, but tailored to the needs of the game. Unfortunately, I don't know what the Japanese version's text encoding is like, only that the names and dictionary lookups are there too, so I don't know how it handles kanji. Does CT have kanji? |
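A minimal sketch of how dictionary lookup bytes like the ones described above could be decoded. The byte values, the dictionary entries, the name slot, and the use of ASCII for ordinary characters are all made up for illustration; they are not Chrono Trigger's real table.

```python
# Hypothetical tables: these values are NOT taken from the real game.
DICTIONARY = {0x80: "pedia", 0x81: "the ", 0x82: "ing"}  # one byte -> several characters
NAME_SLOTS = {0xF0: "Crono"}                             # "insert name here" bytes


def decode(data: bytes) -> str:
    """Expand dictionary and name bytes; treat everything else as plain ASCII."""
    out = []
    for b in data:
        if b in DICTIONARY:
            out.append(DICTIONARY[b])
        elif b in NAME_SLOTS:
            out.append(NAME_SLOTS[b])
        else:
            out.append(chr(b))  # assumption: single-byte characters are ASCII
    return "".join(out)


encoded = b"encyclo" + bytes([0x80])   # 8 bytes in ROM
decoded = decode(encoded)              # 12 characters on screen
print(decoded, len(encoded), len(decoded))
```

Under these assumptions the stored string is 8 bytes while the displayed word is 12 characters, which is the whole point of the scheme when ROM is expensive.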
CaptainJistuce |
Because translation isn't an exact science. Some statements will be shorter, others longer. Some will have to be reworked to fit the game's output, which can change the length. Some games just have insane Lovecraftian nightmares where one would expect the text engine code to be. I don't know what specific games you're thinking of, so I can only speak in vague generalities. |
strfry("emanresu") |
Posted by Kawa Well, yeah, then, uh, case closed. Personally, I think UCS-2 should be called either Unicode or wchar_t, but "classic UTF-16" seems like a reasonable compromise to minimize confusion. So why is repointing so important, then? It's a slight improvement, but you sure could do without it if the ratios are as you say. How come it can preclude games from getting translated? |
strfry("emanresu") |
I'm not talking about any radicals or compounds, man. That sounds like something you'd want to advertise your sports drink has got lots of (or none at all, I wouldn't know - free radicals cause cancer, right?). You can fit more than two (2) alphabetic characters in the area of the sprite that would ordinarily be used to render one (1) discrete kanji logogram. Provided a kanji logogram is encoded with two bytes and an alphabetic character ordinarily would be encoded with one, this reduces the amount of space needed to encode a given sequence of alphabetic characters. |
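The byte arithmetic in this post can be sketched as follows. It assumes a hypothetical scheme where every two-byte code stands for a fixed chunk of `chars_per_code` letters, and it ignores whether the font has room for a tile for every chunk that occurs; the function names are illustrative only.

```python
import math


def bytes_one_per_char(text: str) -> int:
    """Plain single-byte encoding: one byte per character."""
    return len(text)


def bytes_packed(text: str, chars_per_code: int, code_size: int = 2) -> int:
    """Hypothetical packing: each code_size-byte code covers chars_per_code characters."""
    return math.ceil(len(text) / chars_per_code) * code_size


line = "Chrono Trigger"  # 14 characters
for k in (2, 3, 4):
    print(k, bytes_one_per_char(line), bytes_packed(line, k))
```

With two letters per two-byte code the packed size equals the one-byte-per-character size, so any saving only appears from three letters per code upward.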
Kakashi |
Posted by sureanem You're clearly trying to compare radicals to kanji compounds, which is just hilarious scrambling considering how radicals work. This isn't Hangul, buddy. $gf tells me you have too much time on your hands. |
Kawaoneechan |
Unfortunately, the number of 90s/00s console and handheld games that have used Unicode in any form since Unicode's inception (1991) can be counted on one hand, and UCS-2 (as "classic UTF-16", aka "Unicode", is properly called) is considered wasteful. Fun fact about dedicating tile space to digraphs: at least one Final Fantasy fan translation that I've seen did this, with at most about ten character values being digraphs like 'll', 'il', or 'th'. I'll bet biscuits to an asskicking that this was done primarily for the visual aspect, and that the actual text was still repointed to fit. |
strfry("emanresu") |
Posted by KingMike You can put more than two characters in a kanji. Anyhow, if you're using a variable-width encoding there's not much of a point, that's true. (Quoting KingMike: "Also, I thought about using two characters on tile early in my 'career' but I stopped when I realized it looks like shit.") Even if it's consistently kerned? It should at least be readable, and better than gargantuan inter-letter spacing. Posted by Kawa Yeah, it's about storage space all right, I just used the word "physically" poorly. By variable-width encoding, I mean an encoding like UTF-8 or modern UTF-16 where a character may be one or several bytes long, unlike an encoding such as ASCII, ISO 8859-1 (aka Latin-1), or classic UTF-16 (aka "Unicode") where every character is the same number of bytes (e.g. one char/wchar_t). |
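The fixed-width versus variable-width distinction drawn above can be checked with Python's built-in codecs. Python has no dedicated UCS-2 codec, so this sketch uses `utf-16-be`, which produces the same two bytes as UCS-2 for characters in the Basic Multilingual Plane; the sample characters are arbitrary.

```python
def widths(ch: str) -> tuple[int, int]:
    """Return (bytes in UTF-8, bytes in UTF-16) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ["A", "é", "漢", "😀"]:
    utf8_len, utf16_len = widths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")
```

Only the last sample, which lies outside the Basic Multilingual Plane, needs four bytes in UTF-16 (a surrogate pair); the "classic" two-bytes-per-character form cannot represent it at all.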
Kawaoneechan |
Trick question: repointing is a lot of work regardless, even if it's as simple a format as the GBA's. Maybe you have a banked system and each bank has its own list of pointers to its constituent strings. And then in one particular bank there's a list of pointers to where each individual bank's list starts, and each of those pointers, unlike the GBA's 32-bit ones, is only 16 bits wide. As for variable-width character encoding, what do you mean exactly? I feel like I have to ask just so we're all clear, considering earlier revelations. Edit: hah, didn't see KingMike's post. Again, it's not about screen real estate, but about storage space. The former wouldn't need pointer rework but UI element resizing and moving at worst. On a typical tilemap with 8x8 pixel tiles, kanji are almost always four tiles (2x2) in size simply because they're too intricate compared to kana and romaji. |
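A rough sketch of the two-level banked pointer layout described above. Everything concrete here is an assumption made for illustration: the 0x4000-byte bank size, little-endian 16-bit pointers that are offsets within a bank, zero-terminated ASCII strings, and the table positions in the toy ROM.

```python
import struct

BANK_SIZE = 0x4000  # assumed bank size, not tied to any specific console


def read_u16(rom, offset: int) -> int:
    """Read a little-endian 16-bit value."""
    return struct.unpack_from("<H", rom, offset)[0]


def string_offset(rom, master_table: int, bank: int, index: int) -> int:
    """Follow master table -> bank's string list -> string, all via 16-bit pointers."""
    list_start = read_u16(rom, master_table + 2 * bank)  # where this bank's list begins
    bank_base = bank * BANK_SIZE
    ptr = read_u16(rom, bank_base + list_start + 2 * index)
    return bank_base + ptr  # a 16-bit pointer cannot reach outside its own bank


def read_string(rom, offset: int) -> str:
    end = rom.index(b"\x00", offset)
    return rom[offset:end].decode("ascii")


# Build a toy two-bank ROM.
rom = bytearray(2 * BANK_SIZE)
struct.pack_into("<HH", rom, 0x0000, 0x0010, 0x0020)              # master table in bank 0
struct.pack_into("<H", rom, 0x0010, 0x0200)                       # bank 0 string list
rom[0x0200:0x0203] = b"Hi\x00"
struct.pack_into("<HH", rom, BANK_SIZE + 0x0020, 0x0100, 0x0106)  # bank 1 string list
rom[BANK_SIZE + 0x0100:BANK_SIZE + 0x010C] = b"Hello\x00World\x00"

off = string_offset(rom, 0x0000, bank=1, index=1)
print(hex(off), read_string(rom, off))
```

In this toy layout, lengthening "Hello" by even one byte pushes "World" along, so the second entry in bank 1's list has to be rewritten, and a string that no longer fits in its bank cannot simply be pointed at from another one.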
KingMike |
Also, putting two characters on one kanji is a brilliant idea, you say? You know that games with lots of kanji (or more than a handful) usually use two bytes per kanji? So how is using two bytes to represent a two-character tile saving space compared to one byte for one character? Also, I thought about using two characters on tile early in my "career" but I stopped when I realized it looks like shit. (Other people say they're okay with limiting it to pairs like il and ll, but I think it still looks ugly when all other characters are evenly spaced.) Of course using VWF would fix the looking like shit part, but then you're back to the first problem. I've only done that with the RPG Maker games where I'm FORCED to fit text within the available space (due to the specific nature that the games are to be editable in-game and thus compatible with the in-game engine; also because the menu text is embedded within some custom programming language, and I sure don't want to spend more time fully reverse-engineering that language so I can write a re-encoder). See, that's another thing that comes up in SNES games; I heard the Romancing SaGa games did that. |
strfry("emanresu") |
Posted by Kawa You might have the pointers indirectly computed through some hell of lookup tables and offsets, so while technically possible it'd be a lot of work. I reckoned the developers would use something like old-style UTF-16 where everything is two bytes for simplicity. If they use a var-width character encoding, then getting the translations to fit without relocation shouldn't really be a big problem, right? |
Kawaoneechan |
Why would you not be able to move the pointers around? Some systems may have bankswitching limitations but that hardly means you can't do it. Also please address the part where your Japanese example sentence takes more bytes of storage than the English equivalent. |
strfry("emanresu") |
Posted by CaptainJistuce Well, all that makes sense. But how come it never gets used in games where you can't move the pointers around? I get that there are better compression methods, but those aren't used either. For Nintendo 64, the strings usually have printf-esque format specifiers and are stored uncompressed, so I'd imagine those issues you outline wouldn't be an issue for N64, PSX, and later. Not saying they're leaving optimizations on the table, of course they're not. There must be a good reason why it's not a good way to go at things. |