Threading limitations 

Joined: 2014-09-25 13:52
Posts: 8294
 Threading limitations
The thing that really holds back the design of higan to this day is threading.

Obviously, we have libco, which runs each "processor" in its own thread (sometimes the line is fuzzy, and I'll split integrated APUs from CPU cores).

But this really isn't enough. When you look at processors, it's obvious many of them are doing multiple operations in parallel. But instead of running those operations in parallel with their own unique subthreads, we instead add lots of if((cyclecount&granularity)==0) checks in the main time functions, and then design hacky functions that act like mini state machines to advance the operation of things.
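For anyone following along, here is a minimal sketch of that pattern. It is not higan's actual code: `Fetcher`, `Chip`, `runCycles`, the four states, and the every-fourth-cycle mask are all made up for illustration. The point is only that the position inside the operation has to live in a `state` member rather than on a stack.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sub-operation that would ideally be its own thread.
// Instead, its progress is stored explicitly in `state`.
struct Fetcher {
  enum class State { Idle, FetchLow, FetchHigh, Output };
  State state = State::Idle;
  uint32_t outputs = 0;  // number of completed fetch sequences

  // Performs exactly one sub-step per call, then returns to the caller.
  void step() {
    switch (state) {
      case State::Idle:      state = State::FetchLow;  break;
      case State::FetchLow:  state = State::FetchHigh; break;
      case State::FetchHigh: state = State::Output;    break;
      case State::Output:    outputs++; state = State::Idle; break;
    }
  }
};

// Hypothetical chip whose main tick function drives the fetcher.
struct Chip {
  static constexpr uint32_t granularity = 3;  // mask: fires on every 4th cycle
  uint32_t cyclecount = 0;
  Fetcher fetcher;

  void tick() {
    // The kind of check described above: only advance on aligned cycles.
    if ((cyclecount & granularity) == 0) fetcher.step();
    cyclecount++;
  }
};

// Runs the chip for n cycles and reports how many fetches completed.
inline uint32_t runCycles(Chip& chip, uint32_t n) {
  for (uint32_t i = 0; i < n; i++) chip.tick();
  return chip.fetcher.outputs;
}
```

With these made-up numbers, one full fetch takes four steps, so sixteen calls to `tick()` complete exactly one fetch. Every additional parallel operation needs another check in `tick()` and another hand-written state machine like `Fetcher`.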

The reason we do this is that even cooperative threads just don't scale. For whatever reason, most likely pipeline stalls caused by changing the stack pointer, they're just far too slow. And now I have a situation where I pretty much require them, but can't have them because they're too slow. If you look at the GBA PPU, the object renderer has variable timing per sprite based on whether the sprite is affine (rotozoomed) or not. This is not the kind of thing I can make work the way I have in the past with other chips, via annoying clock-counting kludges and mini state machines. Trying to break the GBA PPU object renderer up into little one-cycle chunks would result in unreadable, sloppy code.

I've been trying to think of alternatives ... like maybe a libco-lite that doesn't swap out the stack frame. But the problem is we really need the stack frame, or all the local variables mean nothing. I could pass along a struct that contains all of the variables used in said function, and then prefix them all with "context." or something, but even that would make the code a whole lot uglier.
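A rough sketch of what the "context." idea could look like, assuming a switch-based resume point (a protothread-style trick chosen here for illustration, not something libco does). `Sprite`, `RenderContext`, `renderStep`, and the cycle costs are all hypothetical and do not reflect real GBA timing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sprite description; costs below are invented, not GBA-accurate.
struct Sprite { bool affine; uint32_t width; };

// Everything that would normally be a local variable lives here instead,
// because there is no private stack to keep it alive between calls.
struct RenderContext {
  uint32_t resumePoint = 0;  // which "yield" to continue from
  uint32_t index = 0;        // formerly the outer loop variable
  uint32_t x = 0;            // formerly the inner loop variable
  uint32_t pixelsDrawn = 0;
  uint32_t cyclesUsed = 0;
};

// Runs until one cycle has been consumed, then returns true.
// Returns false once every sprite has been processed.
bool renderStep(RenderContext& context, const Sprite* sprites, uint32_t count) {
  switch (context.resumePoint) {
  case 0:
    for (context.index = 0; context.index < count; context.index++) {
      for (context.x = 0; context.x < sprites[context.index].width; context.x++) {
        context.pixelsDrawn++;
        // "Yield" after one cycle; the next call jumps back in at case 1.
        context.cyclesUsed++; context.resumePoint = 1; return true;
  case 1:;
        if (sprites[context.index].affine) {
          // Affine sprites cost one extra (made-up) cycle per pixel.
          context.cyclesUsed++; context.resumePoint = 2; return true;
  case 2:;
        }
      }
    }
  }
  context.resumePoint = 3;  // no matching case: later calls fall straight through
  return false;
}
```

A caller would loop on `while (renderStep(context, sprites, count)) { /* one cycle passed */ }`. The variable per-sprite timing falls out naturally, but every local has become `context.something` and the control flow is threaded through case labels, which is the ugliness described above.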

And I'm not even sure that'd work. Running these threads would require a scheduler that keeps track of how much time passes in each thread. And even then, serialization would become an even greater challenge, as we can't really preserve the position inside of a suspended 'libco-lite subthread' in a portable manner to be restored later. Maybe not even at all with truly position-independent code.
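To illustrate the scheduler part only, here is a minimal sketch of one way to track elapsed time per thread: always run whichever threadlet is furthest behind. `Threadlet`, `Scheduler`, `add`, and `runUntil` are hypothetical names, and this is not how higan's scheduler is written. It says nothing about the serialization problem.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical subthread: a clock plus a function that runs one slice of
// work and reports how many cycles that slice consumed.
struct Threadlet {
  uint64_t clock = 0;
  std::function<uint32_t()> step;
};

struct Scheduler {
  std::vector<Threadlet> threads;

  void add(std::function<uint32_t()> step) {
    Threadlet t;
    t.step = std::move(step);
    threads.push_back(std::move(t));
  }

  // Repeatedly runs the threadlet with the smallest clock until every
  // threadlet has reached (or overshot) the target time.
  void runUntil(uint64_t target) {
    while (true) {
      Threadlet* next = nullptr;
      for (auto& t : threads) {
        if (t.clock < target && (!next || t.clock < next->clock)) next = &t;
      }
      if (!next) return;       // everyone has caught up
      uint32_t used = next->step();
      assert(used > 0);        // a slice must consume time, or we never finish
      next->clock += used;
    }
  }
};
```

The bookkeeping itself is cheap. The cost the post is worried about is in switching into each threadlet, which this sketch sidesteps by using plain function calls.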

I really don't think C++'s proposed coroutines would save us either. I can't rely on a feature that isn't in the standard yet and only exists in experimental development branches of Clang. But even if we had them, I have really strong doubts that they'd magically be faster than the extremely minimal and optimized functionality I already have in libco, and they would likely have the same issues with serialization.

Sigh. I really wish computers' exponential processing power would've kept going. I would so love to run the emulation with dozens of threadlets.

_________________
What the hell's going on? Can someone tell me please?
Why I'm switching faster than the channels on TV.
I'm black, then I'm white. No, something isn't right.
My enemy's invisible, I don't know how to fight.


2017-06-17 22:52

Joined: 2014-09-27 09:23
Posts: 2201
Location: Germany
 Re: Threading limitations
In other words, you need to make the GBA PPU a state machine.

_________________
My setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → Multi Out to SCART cable → EuroSCART to Mini cable → Framemeister (with Firebrandx' profiles) → AVerMedia Live Gamer Extreme capture unit → RECentral 4 viewing/recording software


2017-06-18 00:15

Joined: 2017-06-02 01:15
Posts: 20
Location: Yes
 Re: Threading limitations
byuu wrote:
Sigh. I really wish computers' exponential processing power would've kept going. I would so love to run the emulation with dozens of threadlets.


Do not lose hope, friend! If we can find a non-silicon material suitable for making transistors, we just might be able to break the barrier and make processors with theoretical limits upwards of 500GHz with materials like graphene! This is of course assuming they can figure out the whole lack-of-bandgap snag for graphene. Right now we're stuck with analogue graphene chips (which might be interesting in and of themselves).

Supposedly carbon nanotubes will work too, can have a bandgap, and might be able to get close to graphene in terms of theoretical limits. My personal bet is on the nanotubes, and I'd sincerely doubt it's practical to achieve anywhere close to 50GHz any time soon.

_________________
Warning: Krayzar is 75% weasels by volume.


2017-06-20 02:11

Joined: 2014-09-27 09:22
Posts: 5157
Location: A chair.
 Re: Threading limitations
[Krayzar] wrote:
byuu wrote:
Sigh. I really wish computers' exponential processing power would've kept going. I would so love to run the emulation with dozens of threadlets.


Do not lose hope, friend! If we can find a non-silicon material suitable for making transistors, we just might be able to break the barrier and make processors with theoretical limits upwards of 500GHz with materials like graphene! This is of course assuming they can figure out the whole lack-of-bandgap snag for graphene. Right now we're stuck with analogue graphene chips (which might be interesting in and of themselves).

Supposedly carbon nanotubes will work too, can have a bandgap, and might be able to get close to graphene in terms of theoretical limits. My personal bet is on the nanotubes, and I'd sincerely doubt it's practical to achieve anywhere close to 50GHz any time soon.

Also, now that Intel has some real competition, they have a reason to stop selling us incremental variations.
It did not escape my notice that they raised the roof on a higher top-end while slashing prices. Though they're still gating features to protect the profit margin on the Xeons.

_________________
Just in case you thought something could EVER be straightforward, and needed someone to dash your hopes across the rocky shoals of harsh reality.

; write !!!


2017-06-20 03:20