Screwtape |
Posted on 19-09-18, 12:32 in delta patching, bsdiff edition
|
Full mod
Post: #342 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
byuu's been tinkering recently with beat, his patch tool for the BPS file format, and I've been testing it out. The big change in the recent versions is that instead of brute-force searching for common parts in the old and new files, it uses a suffix array (and other things) to quickly and efficiently find the best matches. I've been benchmarking it against other tools, but although it's an improvement, it's still nowhere near as efficient as bsdiff. I've tried before to understand how bsdiff manages to be so efficient, and failed every time. Partially this is because I was reading the doctoral thesis containing "a far more sophisticated algorithm", which is hella dense. I'm not sure if I noticed it before, but the bsdiff webpage also links to a much smaller paper titled "Naïve differences of executable code", which describes a much simpler algorithm. Probably you should go read it yourself to understand it properly, but for practice I want to try describing it in my own words, and I figured I'd do it publicly so Google can find it and maybe help somebody out later. Like modern beat, bsdiff starts with suffix arrays. Unlike beat, it doesn't just start at the beginning of the target and blindly encode every byte as best it can. Instead, it maps out the common chunks of each file and analyses them. Let's look at an example:
The content "aaa" is the same between source and target, as is the content "ccc..." and "eee...". However, the target replaces "bb" with "X", "ddd" with "YYY", "ff" with "ZZZ", and moves the "aaa" block from the beginning to the end. A tool like beat would encode this diff with instructions like this: - write "X" - copy 7 bytes from source offset 5 - write "YYY" - copy 5 bytes from source offset 16 - write "ZZZ" - copy 3 bytes from source offset 0 However, bsdiff is a bit smarter. There's not much that bsdiff can do about "aaa" moving, or the changed content in "X" or "ZZZ", so it handles them in much the same way as beat would. However, it treats "ccc...", "YYY" and "eee..." specially. You see, the important thing about "ccc..." and "eee..." is not only that they exactly correspond between the source and target, but that they are the same distance apart in the source and target. When a program is updated, usually only a small part of it is updated. Most of the code remains the same, and therefore the compiler generally produces the same instructions. However, small changes often ripple through the entire program - if a data structure gains an extra field, making it 16 bytes large instead of 12, a whole bunch of 12s will change to 16s throughout the code even if the surrounding code stays the same. The key point is this: when an instruction stream like "load address; add 12; push" changes to "load address; add 16; push", the parts that stay the same stay the same distance apart, and so these kinds of changes get picked up by the "same distance apart" logic described in the previous paragraph. Here's the clever bit: when bsdiff detects a sequence of same-distance-apart matches, it doesn't record a "copy the source" or "write literal bytes" instruction, it records a "subtraction" instruction. It takes the whole consistent-length sequence, mismatches included, subtracts the source from the target, and records the subtracted data in the patch. If our source is "load address; add 12; push" and our target is "load address; add 16; push", the result of the subtraction might be 0, 0, 0, 0, 0, 0, 4, 0, 0. That is, since a whole bunch of 12s got changed to 16s, the subtraction is going to produce a whole bunch of 0s and 4s. By itself, that's not much of an improvement, but the thing is that this kind of encoding compresses really, really well. So, the bsdiff encoding of our original patch might look something like this: - write "X" - add 15 bytes to source: 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0 - write "ZZZ" - copy 3 bytes from source offset 0 The bsdiff file-format includes bzip2 compression, squeezing down those "add" operations significantly. The bad news is that because this "add" operation is a new and different thing, BPS patches can't take advantage of it. The good news is that the kinds of changes bsdiff is optimised for are fairly specific to code compiled from a high-level language. Code written in assembly generally doesn't make changes like this because it's a total pain in the ass, and I imagine things like fan-translations generally leave the original code alone or replace it wholesale, and BPS was really designed for that scenario. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-21, 12:38 in delta patching, bsdiff edition
|
Full mod
Post: #343 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
Posted by sureanem xdelta3 uses rolling hashes, and does pretty well - but not as well as bsdiff. The problem with using a hash to index the contents of the source is that it's not practical to record a hash for *every* offset (that would take even more memory than a suffix array) so you pick some subset of offsets (say, the ones whose hash has 5 trailing zero bits) and hope that that's enough to find all the matches. But it's not guaranteed; there's always a chance that one of the matches has 4 or 3 trailing zero bits, or even that there's matches smaller than the rolling-hash width. If I ever write another BPS patcher, I'll start with a suffix array, and see if I can figure a way to get, say, 10 near-matches rather than the algorithmically "best" match. Because of the way BPS works, a not-quite-optimal match nearby can actually be more efficient than a slightly longer match further away. I still think there's a lot of room to make even more-efficient BPS patches. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-21, 12:41 in Official Discord
|
Full mod
Post: #344 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
byuu has set up an Official Discord for discussing higan and bsnes. I think https://discordapp.com/invite/kpd975Q is the link he posted to Twitter, I am not good with twoots. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-21, 12:50 in bsnes v109 released
|
Full mod
Post: #345 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
Posted by jarsonic I'm going to guess it's the always-on variable-refresh-rate support. I suspect something about that requires bypassing the Windows compositor (which tends to lock every app at a consistent frame-rate) and therefore Windows doesn't have access to the image to copy it to the clipboard. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-22, 07:26 in delta patching, bsdiff edition
|
Full mod
Post: #346 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
Posted by sureanem One of the nice things about the BPS format is that it supports files larger than 16MB. Arbitrarily sized, even, given it uses variable-length integers. It would be a shame to have a file-format that supports patching terabyte disk images, but the patch-creation tools only support files of a few megabytes. How do you mean 'further away' and 'nearby'? A BPS patching tool maintains "source offset" and "target offset" variables. When you copy a byte at offset X in the source to offset Y in the target, the "source offset" and "target offset" variables are updated appropriately. The source copy instruction stores its offsets relative to the current values of the source and target offset variables. Since they're stored as variable length integers, making the next copy instruction start close to where the last one left off can be a significant savings. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-24, 11:29 in How do CRCs work?
|
Full mod
Post: #347 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
A fifty-minute video explaining the mathematics behind CRCs, including polynomial division and finite fields: https://www.youtube.com/watch?v=izG7qT0EpBw Another fifty-minute video, explaining how all that mathematics boils away into three 74LS chips on a breadboard: https://www.youtube.com/watch?v=sNkERQlK8j8 Just for completeness, here's a ridiculously complete article that covers the same basic information, except it tries to explain every possible variant of CRC algorithm: http://ross.net/crc/crcpaper.html The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-25, 09:51 in Delta Compression tests, 2019 edition
|
Full mod
Post: #348 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
https://zork.net/~st/jottings/delta-compression-tests-2019.htmlNearly a decade ago, I spent some time learning about delta-compression tools and writing a test-harness to compare them. Recently, some of those tools were updated so I decided to update my tests and run them again. This update features Alcaro's "Flips", and byuu's new "beat" patcher. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-25, 23:31 in Computer Hardware News
|
Full mod
Post: #349 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
I'd heard there was a problem, I hadn't heard it had been diagnosed. Fascinating. > Don't ask me how pros are wiring modern GPUs to their trashcan Mac Pros As I understand it, the Trashcan has a bunch of Thunderbolt or USB-C ports or something, which can carry a PCI Express signal, so you can buy an external PCI enclosure. It's basically a box with a PCI slot in it, and a cable that connects to your Mac, much like an external hard-drive enclosure is a box with a SATA connector in it. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-27, 02:44 in Computer Hardware News
|
Full mod
Post: #350 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
That thread has a whole bunch of people who apparently assume that "silicon PCB" means "the entire computer has to be fabricated at the same time, on the same wafer, and with the same super-expensive process as the CPU". The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-28, 07:20 in Games You Played Today REVENGEANCE
|
Full mod
Post: #351 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
I've now achieved three endings for Bloodstained - the "kill Gebel too early", "kill Gebel the wrong way" and "kill Gebel the right way" endings, and got 99.7% completion (of the map, I guess, much less of items and monsters and alchemy recipes and all that). I love the game's music, the game is quite pretty (although I had the graphics on "low" so I'm sure it gets much prettier), I loved the exploration and the variety of zones, I loved that even coming back through a zone after beating the boss, I was much better at fighting all the monsters than I had been going in. My biggest problem, though, was myself rather than the game. Previously, I played the GBA Castlevanias on the way to and from work, so I if I got stuck I'd very quickly have to stop playing and do something else for a bit. On PC, when I get stuck I get stuck for a long time, and that makes me frustrated and less willing to resume playing later. Also, when I got stuck on the GBA, I was stuck on a train, so I couldn't do anything else but read item descriptions, try stupid things, etc. When I get stuck on PC, I can just look up a wiki or something, and while that's immediately gratifying, it also makes the game less fun. Bloodstained is also a good deal more obscure than the GBA Castlevanias, and has a lot more dead-ends to exhaust, making this an even bigger problem. I wish I'd first encountered this game when I was (let's say) 12, and had a whole summer to pour into it, exploring, grinding, messing with gameplay systems, and all that. Or even that I still had a regular commute - I hundred-percent-cleared Aria of Sorrow, after all. I'm sure I would have had a *blast* and made a lot of fond memories. As it is, though, even though I immediately started NG+, I think I might uninstall it and grab the next game from my overflowing "to play" queue. Who knows, I might come back to it later. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-09-29, 11:41 in [higan] WIP news
|
Full mod
Post: #352 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
Just in case there's anybody here who wanted to try out a recent higan WIP, but was put off by how strange and different the UI is, I wrote up a brief tutorial on getting higan v106.217 (the latest WIP as of this post) to launch a SNES game: https://zork.net/~st/jottings/Getting_Started_with_higan_v106.217.html Currently the instructions are fairly specific to Linux, because that's what I have access to, but if somebody on Windows or macOS can figure out how to where to put the various pieces, I'd love to add that information. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-10-02, 00:18 in (Mis)adventures on Debian ((old)stable|testing|aghmyballs)
|
Full mod
Post: #353 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
Recently, Debian Testing finally updated to gnome-terminal 3.34, in which they finally got around to making all the various menu-options accessible from title-bar controls, so you can use them even when the menu-bar is hidden. For a straight-up GNOME 3 desktop, that's an improvement. For people using gnome-terminal on a non-GNOME 3 desktop, it can be a drawback. Title-bar controls mean a client-drawn title-bar, so window-managers that actually manage windows (like i3, dwm or awesome) now have extra, redundant title-bars hanging around eating up space. I'm not angry, because it's reasonable for the GNOME people to make gnome-terminal better for GNOME at the expense of other environments, but I'm now in the market for an alternative, visually-lighter-weight terminal emulator. Unfortunately, it seems that gnome-terminal is absolutely the most polished terminal emulator around, and everything else is a step down. My current shortlist of alternatives is: - xterm is a bit primitive, but rock-solid, and I already have it configured to my liking. - pterm is a straight-up port of PuTTY to POSIX, and it feels a bit weird to use it as a local terminal - QTerminal feels weird because it's Qt-based, and sometimes it lays out Unicode text oddly I'm definitely not interested in: - st, because I don't want anything to do with the suckless "community" - kitty, because I'm not interested in managing Yet Another Config File - alacritty, ditto The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-10-03, 01:29 in Mozilla, *sigh*
|
Full mod
Post: #354 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
Posted by Kawa It's not illegal to have a monopoly; that kind of thing can happen quite naturally without any wrongdoing or ill-intent. What's illegal is to use a monopoly in one field as leverage in another. For example, using a monopoly on oil refining to prevent competitors from buying oil pipelines or rail transport, using a monopoly on the telephone network to prevent competitors from using their own equipment, or using a monopoly on computer operating systems to prevent competitors from signing distribution deals with computer manufacturers. Notably, you don't need a 100% guaranteed monopoly to abuse your power illegally, you just need to have enough market share that your suppliers and customers will think twice about doing business with your competitors. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-10-16, 13:14 in Games You Played Today REVENGEANCE
|
Full mod
Post: #355 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
If you want to be an obnoxious goose, but don't want to shell out for a Switch, or pay money to Epic, may I present: Office Goose. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-10-22, 12:35 in Making interactive fiction
|
Full mod
Post: #356 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
I think just about every thread on this board is complaining about *some* kind of technology, but what the heck, let's add another to the pile. Recently I got a hankerin' to write a bunch of room descriptions, and so I thought I'd play with some interactive fiction (i.e. text-adventure) tools. My first thought was good ol' Inform for the classic STAB TROLL WITH SWORD adventuring, but Inform 7 is still in rewrite-limbo and Inform 6 is kind of intimidating. Since I didn't want to faff about too much with Tech just to write a few paragraphs, I decided to go with the darling of the "everyone can make games" movement, Twine. I started by downloading the Official Twine App and poking around, but it turns out to just be the website packaged in Electron, which is cool, but not *quite* my preferred environment. Also, the format used for storing games on-disk appears to be the same as the published format, minified embedded JS and all, which isn't great for source-control. Luckily I found a thing called Tweego, which is a command-line tool that takes plain-text source in a reasonable format, combines it with the Twine engine and a stylesheet to produce a distributable HTML file. The source file format has an actual specification, although Tweego is the only tool I've found that reads or writes it — Twine certainly doesn't. But this is a rant thread, so of course I'm still not quite happy. Twine has a strange idea of story formats, which appear to control everything above the level of "a game is a collection of passages". That is, how markup is parsed, what effects you can do, how scripting works, what graphical effects are available and the user-interface presented to the player. The docs suggest that the "Harlowe" format is designed to be easy to get started with, so that's what I picked. Unfortunately, it's kind of infuriating for a bunch of reasons. - The markup syntax is almost like Markdown, but not identical. - It seems to be based on the formatting model half-assed forum software uses, that just translates newlines to <br> tags instead of making actual paragraphs. That means I can't hard-wrap my source file to a convenient width, and it means any markup/scripting I use has to be contorted to use minimal whitespace, lest a newline get through and screw up the spacing of the resulting HTML. - It also seems to be based on the easiest possible scripting model, where script instructions in the markup cause JS code to execute when each passage is displayed, and so things like "what variables exist" and "how are they updated" get scattered across the game in the most spaghetti-code style imaginable. It's fundamentally designed to mix logic and presentation in the most annoying way. A while ago I played with writing a web app in Elm, and it was almost the exact opposite, with a beautiful separation of logic and presentation. Unfortunately, Elm doesn't provide Markdown or any other lightweight prose-friendly markup, and it definitely doesn't automatically provide the kind of user-interface affordances people expect a game to have, like undo, redo, saving and loading. It seems to me like there's a niche for an interactive-fiction tool that's easy to get started with *and* easy to extend, but maybe I'm failing to take Worse Is Better fully into account, and Twine really is the optimal solution. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-10-24, 10:00 in Making interactive fiction
|
Full mod
Post: #357 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
I hope you mean picking up keys to be used in locks, not picking up the locks themselves - that would make authoring puzzles quite difficult. :P (I think a version of Zork had a bug where you could take compass directions and put them in your inventory - in order to allow the player to GO NORTH there had to be a thing called "NORTH" in each room, and if it didn't have the "immovable object" flag set you could take it with you and cause havoc.) Something that requires a PHP backend might make it easier to for people to run on older, simpler devices without JS, but I feel like these days there's a lot more people without reliable network connectivity than without JS support, and I really like the idea of sticking a static HTML site on a webserver that people can save and run offline, or even make it a progressive web app so people can install it on their phones without me having to package it separately for each OS and device. > I prefer Inform 7 myself, really. I did play with Inform 7 back when it was still getting regular updates, and it was quite fun, but right now there's only pre-built binaries for the command-line tools, and the Linux GUI is old enough it doesn't install on my system. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-10-25, 14:13 in Making interactive fiction (revision 1)
|
Full mod
Post: #358 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
Posted by wertigon Back in the day when iPods were a thing — I mean the iPods with the spinny wheel on the front, not the iPod Touch that was just an iPhone without a SIM card, or the iPod Nano that was just a USB stick with a headphone jack and a play button — it had a "Notes" feature, where you could stick a bunch of text-files in a directory and they'd show up in the "Notes" folder so you could browse them while you were out and about. And in fact, they supported a limited amount of HTML markup - basically just hyperlinks. I started writing an app that would take a config file of rooms and exits, and write out all the notes required to represent that as a text adventure. For example, here's an excerpt from one of my test files:
(for the heck of it, I put up the full output of this unfinished, half-assed project) Although I never quite got that far, my intention was to let the author define a bunch of binary flags, and declare that particular passages set or cleared them. For example, a passage named "takeKey" might set the "hasKey" flag. A passage could have different content depending on what flags were set, and the compiler would generate a copy of each passage for every possible combination of flags, and make sure each copy linked to the correct places. For example, let's say you have 16 passages, so we can number them in hexadecimal 0-F. Let's also say we can have four binary flags, so every possible state of every possible room can be represented by a number 00-FF. If you're in the Key Room (passage C) and the hasKey flag is not set, you're looking at 0C.html. The "take key" link in that room points at a version of the room where hasKey is true, perhaps 1C.html, and all the links in that version of the Key Room need to keep the hasKey bit set (unless one of them is "drop key"). The big problem with such a scheme is the combinatorial explosion of versions of a room. Even a basic hallway that never changes its description needs to have a bunch of different versions so that the player can pass through without the game forgetting any state. I had some ideas to limit the explosion by having the compiler only generate pages that were actually *reachable*, but I lost interest before I actually tried anything out. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-10-27, 04:47 in Stupid computer bullcrap we put up with.
|
Full mod
Post: #359 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
Posted by tomman I think this is one of those unfortunate path-dependent things. TeX itself is rather small, but it supports macros, and people wrote macros for all kinds of crazy and useful things. Back before reliable, high-speed internet access was widespread, people still wanted all the useful macros and fonts and other addons, so "TeX distributions" competed for who could bundle the most and most-useful stuff. Later on, when software documentation needed to be written, it was often written in TeX because that was the technical-writing system people were familiar with. Even later, when users wanted "online" documentation as well as books on shelves, people made documentation systems like TeXinfo and Sphinx that would take a master document and render it to multiple backends, including TeX. Even if you don't particularly want a 700-page manual on your shelf, or even a PDF version of the manual (because you've got a nice, searchable HTML version), the documentation tool you use probably has a TeX backend, and so there's a dependency, and so a simple task like "convert this lightweight markup file to a man page" involves downloading gigabytes of TeX that will never be used. The ending of the words is ALMSIVI. |
Screwtape |
Posted on 19-10-27, 06:02 in Stupid computer bullcrap we put up with.
|
Full mod
Post: #360 of 443 Since: 10-30-18 Last post: 1103 days Last view: 174 days |
What would extracting a DLL with 7-zip do? Generally "strip" removes everything but the actual executable code, and things the executable code directly needs; it's pretty common for strip to break things if your executable has some kind of metadata section or extra resources. I don't know about DWARF debug symbols specifically, but if you consider that it has to represent the filename and line number of every single instruction, plus the layout and field names of every struct and every stack frame, the names and values of every enum, and probably multiple copies of all of those things once C++ templates get involved... it's pretty reasonable for debug symbols to be a hefty multiple of the size of the executable size. The ending of the words is ALMSIVI. |