Monday, September 25, 2023

I made the TI-84 emulator 10x faster by replacing the switch statement

There is a JavaScript emulator for the TI-83+, TI-84+ and TI-84+ CSE calculators called jsTIfied, written by Christopher Mitchell, founder of cemetech.net, a website for calculator enthusiasts. If you have a native emulator available, there's not much reason to use it, but if you don't, it's great. I was interested in it because it was the first emulator to support the TI-84+ CSE when that calculator came out in the early 2010s. The CSE was exciting because it added a 320×240 color display to the 84+SE hardware platform; apart from graphics, all the hardware and OS interfaces were the same. I wanted to be one of the first game developers for the CSE, but developing for a calculator without an emulator and debugger was quite a pain, so I tried jsTIfied with my old calculator's ROM to get a feel for it.

jsTIfied had a problem: it was too slow. These calculators use the Z80 processor and are very simple to emulate. But jsTIfied couldn't even emulate the 6 MHz models at full speed, and the CSE's processor is clocked at 15 MHz, so there it was even worse. jsTIfied is closed source, but I decided to try to do something about it anyway.

Of course, the first thing you do when debugging a slow web application is open the profiler. One hotspot dwarfed all the others: the big switch statement that decodes and executes instructions. That's what you'd expect, since these calculators have no other sophisticated hardware to emulate, like graphics processors or audio chips, but it still seemed a little suspicious. Yes, JavaScript is slow, but computers made in the early 2000s can emulate this calculator at full speed in native code. JavaScript overhead alone isn't enough to explain it.

So I started digging into the actual code. I had to un-minify it first, but I'm used to dealing with obfuscated code from Minecraft. The instruction decoder is one huge switch statement, with additional nested switch statements for multibyte instructions. In most languages this is fine because the compiler turns it into a jump table, so why wasn't I seeing jump-table performance here? I was a little obsessed with JavaScript performance at the time because of my WebGL experiments, and I had learned that JS engines of that era wouldn't optimize functions beyond a certain size. Knowing this, I split each nested switch statement into its own function, called from the parent switch, to see if that solved the problem.
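Here is a minimal sketch of that first attempt, assuming a toy CPU state with a Uint16Array register file and a flat 64 KiB memory. It is not jsTIfied's code: every name (makeCpu, readByte, readWord, execDD, step, and the PC/BC/IX register indices) is hypothetical, and only three real Z80 instructions are decoded (NOP, LD BC,nn, and the DD-prefixed LD IX,nn). The point is only the shape: the nested prefix switch lives in its own function so the top-level function stays small.

```javascript
// Hypothetical toy CPU; indices into a 16-bit register file.
const PC = 0, BC = 1, IX = 2;

function makeCpu(program) {
  const mem = new Uint8Array(0x10000); // 64 KiB address space
  mem.set(program);                    // load the program at address 0
  return { regs: new Uint16Array(3), mem };
}

function readByte(cpu, addr) {
  return cpu.mem[addr & 0xFFFF];
}

function readWord(cpu, addr) {
  // Z80 stores 16-bit immediates little-endian
  return readByte(cpu, addr) | (readByte(cpu, addr + 1) << 8);
}

// DD-prefixed (IX) instructions, split out of the main switch
function execDD(cpu) {
  switch (readByte(cpu, cpu.regs[PC]++)) {
    case 0x21: // LD IX,nn
      cpu.regs[IX] = readWord(cpu, cpu.regs[PC]);
      cpu.regs[PC] += 2;
      break;
    default: // everything else is unimplemented in this sketch
      throw new Error("unimplemented DD opcode");
  }
}

// Top-level decoder: the prefix case just delegates to execDD
function step(cpu) {
  switch (readByte(cpu, cpu.regs[PC]++)) {
    case 0x00: // NOP
      break;
    case 0x01: // LD BC,nn
      cpu.regs[BC] = readWord(cpu, cpu.regs[PC]);
      cpu.regs[PC] += 2;
      break;
    case 0xDD: // index register prefix
      execDD(cpu);
      break;
    default:
      throw new Error("unimplemented opcode");
  }
}
```

Running NOP, LD BC,0x1234 and LD IX,0x5678 through step three times leaves BC = 0x1234, IX = 0x5678 and PC = 8.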

Now I needed a way to actually load my code. I quickly started a web server on my computer that passed requests through to the upstream website but intercepted the request for the emulator engine and returned my modified code instead. I edited /etc/hosts to point the upstream domain at 127.0.0.1 and loaded the page.
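A minimal sketch of that kind of setup, assuming Node.js and a plain-HTTP upstream (an assumption, not something stated above). The host name, IP address, and file paths are hypothetical placeholders, not jsTIfied's real ones. The one subtle point: once /etc/hosts maps the upstream domain to 127.0.0.1, the proxy itself has to connect to the site's real IP address, or it would just loop back to itself.

```javascript
// Hypothetical placeholders, not the real site's values.
const UPSTREAM_HOST = "www.example.com";  // domain remapped to 127.0.0.1 in /etc/hosts
const UPSTREAM_IP = "203.0.113.10";       // its real address, so the proxy bypasses /etc/hosts
const ENGINE_PATH = "/js/engine.js";      // the one request we want to replace
const LOCAL_ENGINE_FILE = "./engine.modified.js";

// Decide whether a request should be served from disk instead of upstream.
function shouldIntercept(url) {
  const path = new URL(url, "http://" + UPSTREAM_HOST).pathname; // ignores ?query
  return path === ENGINE_PATH;
}

function startProxy(port = 80) {
  const http = require("http");
  const fs = require("fs");
  const server = http.createServer((req, res) => {
    if (shouldIntercept(req.url)) {
      // Serve the locally modified emulator engine
      fs.readFile(LOCAL_ENGINE_FILE, (err, body) => {
        if (err) { res.writeHead(500); res.end(String(err)); return; }
        res.writeHead(200, { "Content-Type": "application/javascript" });
        res.end(body);
      });
      return;
    }
    // Forward everything else to the real site; req.headers keeps the original Host header
    const upstream = http.request(
      { host: UPSTREAM_IP, port: 80, path: req.url, method: req.method, headers: req.headers },
      (upRes) => {
        res.writeHead(upRes.statusCode, upRes.headers);
        upRes.pipe(res);
      }
    );
    upstream.on("error", () => {
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });
    req.pipe(upstream);
  });
  server.listen(port);
  return server;
}
```

With a line such as 127.0.0.1 www.example.com in /etc/hosts, calling startProxy() (as root, since it binds port 80) makes the browser load the modified engine while every other file still comes from the real site.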

Unfortunately, I saw basically no speedup. At this point I was pretty sure my functions were within the size limit, so something else had to be going on. I went looking for low-level details about how JavaScript engines implement switch statements. Eventually I found a Stack Overflow post where someone was trying to do exactly what I was: optimize a (different) Z80 emulator. That's where I saw a deeply disturbing comment that cited Chrome's V8 source code directly:

@LGB Actually, in V8 (the JS engine used by Google Chrome) you need to jump through a lot of hoops to get switch cases optimized: all cases must be of the same type, all cases must be either string literals or 31-bit signed integer literals, and there must be fewer than 128 cases. Even after all those hoops, all you get is what you would have gotten with if-else anyway (i.e. no jump tables or anything like that). True story.

See the post for yourself here: https://stackoverflow.com/questions/18830626/should-i-use-big-switch-statements-in-javascript-without-performance-problems#comment27798374_18830724

This is not what you want to hear when you're looking at an emulator built out of switch statements, especially when every one of them has more than 128 cases. The optimizer would run on my function, but when it reached the switch statement it said, "thanks, but no thanks, I'm good." That left me with only one option: wrap each case of each switch in a function, dump them all into an array, and do the lookup myself. So I wrote a script to do exactly that.

The original code looks like this:

```javascript
switch (read_byte(z8.r2[Regs2_PC]++)) {
  case 0x00: // nop
    break;
  case 0x01: // do something?
    break;
  // ...
  case 0xDD: // index register prefix
    switch (read_byte(z8.r2[Regs2_PC]++)) {
      case 0x00: // do something
        break;
      case 0x01: // do something
        break;
      // ...
      case 0xFF:
        break;
    }
    break;
  // ...
  case 0xFF:
    break;
}
```

Then I translate it to something like this:

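```javascript
let instr_table = new Array(256);
let instr_subtable_DD = new Array(256);

instr_table[0x00] = function () { /* nop */ };
instr_table[0x01] = function () { /* do something, maybe */ };
// ...
instr_table[0xDD] = function () {
  // index register prefix: fetch the next opcode byte and dispatch through the sub-table
  return instr_subtable_DD[read_byte(z8.r2[Regs2_PC]++)]();
};
// ...

// the outer switch becomes a single indexed call
instr_table[read_byte(z8.r2[Regs2_PC]++)]();
```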
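To make the table idea concrete, here is a runnable sketch of the same dispatch for a tiny hypothetical CPU. It is an illustration, not the author's generated code: the names (makeCpu, instrTable, instrSubtableDD, step, the PC/BC/IX indices) are made up, only NOP, LD BC,nn and the DD-prefixed LD IX,nn are implemented, and the CPU state is passed as an argument instead of living in a global like z8, which makes it easier to test.

```javascript
// Hypothetical toy CPU; indices into a 16-bit register file.
const PC = 0, BC = 1, IX = 2;

function makeCpu(program) {
  const mem = new Uint8Array(0x10000); // 64 KiB address space
  mem.set(program);
  return { regs: new Uint16Array(3), mem };
}

const readByte = (cpu, addr) => cpu.mem[addr & 0xFFFF];
// Z80 stores 16-bit immediates little-endian
const readWord = (cpu, addr) => readByte(cpu, addr) | (readByte(cpu, addr + 1) << 8);

function unimplemented() {
  throw new Error("unimplemented opcode");
}

// One handler per opcode byte; every slot starts as a fallback that throws.
const instrTable = Array.from({ length: 256 }, () => unimplemented);
const instrSubtableDD = Array.from({ length: 256 }, () => unimplemented);

instrTable[0x00] = function nop(cpu) {};
instrTable[0x01] = function ldBcNn(cpu) {
  cpu.regs[BC] = readWord(cpu, cpu.regs[PC]);
  cpu.regs[PC] += 2;
};
instrTable[0xDD] = function prefixDD(cpu) {
  // fetch the next byte and dispatch through the DD sub-table
  instrSubtableDD[readByte(cpu, cpu.regs[PC]++)](cpu);
};
instrSubtableDD[0x21] = function ldIxNn(cpu) {
  cpu.regs[IX] = readWord(cpu, cpu.regs[PC]);
  cpu.regs[PC] += 2;
};

// The whole top-level switch collapses into one indexed call.
function step(cpu) {
  instrTable[readByte(cpu, cpu.regs[PC]++)](cpu);
}
```

The bytes 00 01 34 12 DD 21 78 56 take three calls to step and leave BC = 0x1234, IX = 0x5678 and PC = 8, exactly as a switch-based decoder would.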

I did all that, and it worked! The emulator went from being as slow as molasses to roughly 10x faster:

[Figure: jsarray vs switch speed comparison]

One more trick: notice that those program counter increments aren't followed by & 0xFFFF to keep the value within 16 bits. That's because I switched to storing the registers in a Uint16Array, which wraps on its own (about as close as JavaScript gets to real u16 support). I think the overflow behavior is defined in the spec, but I don't actually remember; either way, it has worked fine everywhere I've tried it, but please do me a favor and check for yourself. In the end this gives a marginal performance benefit at best, but it removes a lot of bitwise operations from the code and makes it generally harder to get the range of register values wrong.
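For what it's worth, the ECMAScript specification does define this: storing a number into a Uint16Array element converts it with ToUint16, which reduces it modulo 2^16 (negative values included). A small check of that behavior follows; the variable names are just for illustration. It also shows that a postfix increment on a typed-array element still yields the old, pre-increment value, which is what an expression like read_byte(z8.r2[Regs2_PC]++) relies on.

```javascript
// A one-element register file, standing in for a real one.
const regs = new Uint16Array(1);

regs[0] = 0xFFFF;
const before = regs[0]++; // postfix yields 0xFFFF, then stores 0x10000 -> 0
const afterInc = regs[0];

regs[0]--;                // 0 - 1 = -1, stored as 0xFFFF
const afterDec = regs[0];

regs[0] += 0x12345;       // 0xFFFF + 0x12345 = 0x22344, stored as 0x2344
const afterAdd = regs[0];

// With a plain number, the same wraparound needs an explicit mask every time.
let pc = 0xFFFF;
pc = (pc + 1) & 0xFFFF;
```

After running it, before is 0xFFFF, afterInc is 0, afterDec is 0xFFFF, afterAdd is 0x2344, and the masked plain number pc is 0.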

Before I close, a warning: I don't recommend this as modern performance advice for your own JavaScript without first checking the V8 and SpiderMonkey sources. 2013 was a different era, and JS engines have come a long way since then. I really hope they handle this better now.
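If you'd rather measure than read engine sources, a rough micro-benchmark along these lines compares a generated 256-case switch against a table of 256 distinct functions on your own browser or Node version. It is a sketch with obvious limits: the per-opcode work is trivial, all names are hypothetical, and micro-benchmarks can mislead, so treat any timings as a hint rather than a verdict.

```javascript
// Build one function containing a 256-case switch from generated source.
function buildSwitchStep() {
  let src = "switch (op) {\n";
  for (let i = 0; i < 256; i++) {
    src += `  case ${i}: return (acc + ${i * 7 + 1}) & 0xFFFF;\n`;
  }
  src += "}\nreturn acc;";
  return new Function("op", "acc", src);
}

// Build 256 separate functions (distinct code, like real opcode handlers).
function buildTableStep() {
  const table = Array.from({ length: 256 }, (_, i) =>
    new Function("acc", `return (acc + ${i * 7 + 1}) & 0xFFFF;`)
  );
  return (op, acc) => table[op](acc);
}

// Deterministic pseudo-random opcode stream (a simple LCG) so runs are repeatable.
function makeOps(n, seed = 12345) {
  const ops = new Uint8Array(n);
  let x = seed;
  for (let i = 0; i < n; i++) {
    x = (Math.imul(x, 1103515245) + 12345) >>> 0;
    ops[i] = x >>> 24; // top 8 bits -> opcode 0..255
  }
  return ops;
}

// Apply every opcode in order; returns a checksum so the work can't be skipped.
function run(step, ops) {
  let acc = 0;
  for (let i = 0; i < ops.length; i++) acc = step(ops[i], acc);
  return acc;
}

function time(label, step, ops) {
  run(step, ops); // warm-up pass so the JIT has a chance to kick in
  const t0 = performance.now();
  const acc = run(step, ops);
  const ms = performance.now() - t0;
  console.log(`${label}: ${ms.toFixed(1)} ms (checksum ${acc})`);
  return acc;
}

function benchmark(n = 1e7) {
  const ops = makeOps(n);
  const a = time("switch", buildSwitchStep(), ops);
  const b = time("table ", buildTableStep(), ops);
  if (a !== b) throw new Error("dispatchers disagree");
}
```

Calling benchmark() prints one timing per dispatcher; the checksum comparison at the end guards against the two versions silently doing different work.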
