Rules for UTF-8 in SCI11+

This won’t work in ScummVM yet: nothing uses it, so I see no reason to add it. Most of the rest of SCI11+’s gimmicks do work there, though.
The internal “draw a string” function, used to write literally anything onto the screen, is where the magic happens: if the current port’s current font has more than 256 glyphs in it, the input string is interpreted as UTF-8. If it does not, things work exactly as usual.
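As a rough illustration of that rule, here is a minimal Python sketch. The function name and the idea of passing the glyph count in as a plain number are my own stand-ins for this example, not the interpreter's actual code, and how malformed UTF-8 is handled is an assumption.

```python
def decode_for_drawing(raw: bytes, glyph_count: int) -> list[int]:
    """Return the character values a draw-string routine would work with.

    Hypothetical helper: glyph_count stands in for "how many glyphs the
    current port's current font has".
    """
    if glyph_count > 256:
        # Large font: the bytes are read as UTF-8, giving code points.
        # Replacing malformed sequences is an assumption of this sketch.
        return [ord(ch) for ch in raw.decode("utf-8", errors="replace")]
    # 256 glyphs or fewer: one byte is one character, as it always was.
    return list(raw)
```

The same two bytes therefore mean one character with a large font and two characters with a small one.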
Because combining characters and glyph substitution are not supported, and general punctuation like “” and … is all the way up in the 2000–2044 range, the General Punctuation block’s glyphs take the place of Combining Diacritical Marks at 0300–0344.
Similarly, CJK Symbols, Hiragana, and Katakana are moved from 3000–30FF to 0200–02FF, where some Latin Extended-B, IPA Extensions, and Spacing Modifier Letters should go.
Those last two points apply to the font data, not the actual text.
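To make the relocation concrete, here is a small Python sketch of the lookup it implies: the text keeps its real Unicode code points, and only the index used to find the glyph in the font changes. The function name is made up for this example, and passing every other code point through unchanged is an assumption.

```python
def glyph_index(code_point: int) -> int:
    """Map a Unicode code point to the slot its glyph occupies in the font."""
    if 0x2000 <= code_point <= 0x2044:
        # General Punctuation sits where Combining Diacritical Marks would be.
        return code_point - 0x2000 + 0x0300
    if 0x3000 <= code_point <= 0x30FF:
        # CJK Symbols, Hiragana, and Katakana sit at 0200-02FF.
        return code_point - 0x3000 + 0x0200
    # Assumption: everything else is looked up at its own code point.
    return code_point
```

For example, a left curly quote (U+201C) in the text would be drawn with the glyph stored at 031C, and the ellipsis (U+2026) with the one at 0326.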
The new kernel functions UTF8to16 and UTF16to8 always treat their inputs as Unicode, no matter what the current port’s current font says. The exception is an SCI11+ built with UTF-8 support disabled, in which case none of the above applies and all these two functions do is turn 8-bit values into 16-bit.
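Here is a hedged Python sketch of what such a pair of conversions amounts to. These are not the kernel functions' real signatures: the names, the utf8_enabled flag standing in for the build-time switch, and the restriction to the Basic Multilingual Plane (no surrogate pairs) are all assumptions of this example.

```python
def utf8_to_16(raw: bytes, utf8_enabled: bool = True) -> list[int]:
    """Turn UTF-8 bytes into a list of 16-bit values (BMP only)."""
    if not utf8_enabled:
        # Build without UTF-8 support: each byte is simply widened.
        return list(raw)
    return [ord(ch) for ch in raw.decode("utf-8")]


def utf16_to_8(units: list[int]) -> bytes:
    """Turn a list of 16-bit values (BMP only) back into UTF-8 bytes."""
    return "".join(chr(u) for u in units).encode("utf-8")
```

Note that neither function looks at a font: the input is decoded the same way regardless of what is currently selected for drawing.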
StrCase, the kernel function that turns a string lower or upper case, is unlike the two I just described: it checks the current font the same way the “draw a string” function does, and acts like it used to when that font does not have more than 256 glyphs.
The functions to get the lower or upper case version of a character that StrCase ends up using, tolower and toupper, have been extended to cover the full 256-character range. Several maps are included and one can be chosen at build time. We have maps for code page 437, Win-1252, ISO 8859-1, and a fair bit of Unicode.
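To show what a full 256-entry case map looks like, here is a Python sketch that derives one from a code page using Python's own codecs and case rules. This is only an illustration of the idea; it is not how the maps shipped with SCI11+ were made, and the function name is invented for this example.

```python
def build_case_maps(codec: str) -> tuple[bytes, bytes]:
    """Build 256-entry lower and upper case tables for a single-byte codec."""
    lower = bytearray(range(256))  # start with every byte mapping to itself
    upper = bytearray(range(256))
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page
        for table, folded in ((lower, ch.lower()), (upper, ch.upper())):
            try:
                mapped = folded.encode(codec)
            except UnicodeEncodeError:
                continue  # the other case does not exist in this code page
            if len(mapped) == 1:  # skip expansions such as sharp s to "SS"
                table[byte] = mapped[0]
    return bytes(lower), bytes(upper)


lower437, upper437 = build_case_maps("cp437")
lower1252, upper1252 = build_case_maps("cp1252")
```

With tables like these, a tolower or toupper over the whole 8-bit range is a single array lookup, and switching code pages means swapping the table.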
In general, SCI11+ can be considered to use Unicode 1.1, on account of SCI 1.001.100 dating from 1993, going by the version numbers and release dates for Freddy Pharkas (1.001.095) and Leisure Suit Larry 6 (1.001.115).
