Raiders of the Lost ROM part 2

In my previous post, I documented the process of expanding Quadromania from 128KB to 256KB, and adding music to the menu screens. This time I’ll walk through my attempts to modify in-game audio.

Naturally it would be fun to have in-game music. So, I started with what I had: a digital audio player already baked into the ROM. Invoking the launch_sound routine (discussed in part 1) was pretty easy since I could mimic other instances of this operation, but the way the music plays on the menu screens is not directly suited for in-game music. For example, on the game over screen, there is a static display of the final score and level, and the only looping at this point is the music player and checking for a button press to exit.

So the music player has complete control over the looping in this instance. But that’s not the case in the game loop, where pieces are falling constantly and the game is checking the state of the entire playfield in each loop. Innovation is required here, and that means patching new code into the main game loop. Let’s talk about how that process works.

Patching binary code (meaning pre-compiled / pre-assembled code, as opposed to re-assembling from complete source code) with new features requires the addition of new code to perform said features. Thus, we must establish a few ground rules:

  1. There must be ROM space available to hold the new code.
  2. There must be RAM space available to hold the new code (the Lynx, unlike other systems such as the Jaguar, cannot execute code in ROM).
  3. There must be RAM space available to hold any new variables or fixed data tables required by the new code.

ROM space is not a problem, as I doubled the space from 128KB to 256KB, and most Lynx games only need a few KB of RAM for code; the rest of the 64KB is occupied by (typically two) 8KB display buffers, sometimes an 8KB collision buffer, and any graphical or audio assets utilized by the code. So I need to focus my detective work on identifying what RAM is available for code, variables, and data.

Most Lynx games place the display double-buffers at the top of RAM. This has a benefit in that the buffers can be mapped over the Suzy and Mikey HW register space without causing any issues, so you effectively get to double-purpose the high end of the Lynx 16-bit address window. A quick check of the Quadromania source confirmed that, yes, the game is using the top of RAM for this:

DISPLAY_BUFSIZE	.eq	$1FE0
Buffer1		.eq	$fff8-DISPLAY_BUFSIZE
Buffer2		.eq	Buffer1-DISPLAY_BUFSIZE

According to the source, Quadromania does not use the collision buffer, so no worries about that. Next I looked at the ROM directory to determine the size of the single-load of code + data. Using my Python script, I decoded the corresponding ROM entry into this:

Entry # 1: 06 C0 00 00 00 08 CC 8F
 Block # / Offset    0x6 / 0xc0
 RAM address         0x800
 Length              0x8fcc
 File start address  0x1900
 File end address    0xa8cb
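
A minimal Python sketch of that kind of decoder follows. The field layout (block, offset, a flag byte, RAM address, length, all little-endian) is inferred from the bytes shown above. The 64-byte file header and the 1024-byte block size are assumptions that happen to reproduce the file start and end addresses in the dump, and the function name is hypothetical.

```python
import struct

LNX_HEADER_SIZE = 64   # assumption: emulator file header that precedes the cart data
BLOCK_SIZE = 1024      # assumption: 256KB cart split into 256 blocks


def decode_entry(raw: bytes) -> dict:
    """Decode one 8-byte ROM directory entry (layout inferred from the dump above)."""
    block, offset, flag, ram_addr, length = struct.unpack("<BHBHH", raw)
    file_start = LNX_HEADER_SIZE + block * BLOCK_SIZE + offset
    return {
        "block": block,
        "offset": offset,
        "flag": flag,
        "ram_address": ram_addr,
        "length": length,
        "file_start": file_start,
        "file_end": file_start + length - 1,
    }


entry = decode_entry(bytes.fromhex("06 C0 00 00 00 08 CC 8F"))
for name, value in entry.items():
    print(f"{name:12} {value:#x}")
```

Run against the entry above, this gives a RAM address of 0x800, a length of 0x8fcc, and a file range of 0x1900 through 0xa8cb, matching the decoded listing.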

So the code is being loaded to address $800 and has a length of $8fcc. That means the end of all code and data is $97cc. Working backwards from the end of RAM ($ffff technically, but in reality $fff8 is where Buffer1 ends), the two buffers start at $c038. Subtracting those two yields $286c or 10,348 bytes free. Perfect! That’s plenty of space for some ROM hacking trickery.
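
The same arithmetic, written out as a short Python check. The constants come straight from the source snippet and the directory entry above; only the variable names are made up for illustration.

```python
# Display buffers, as defined in the Quadromania source
DISPLAY_BUFSIZE = 0x1FE0
buffer1 = 0xFFF8 - DISPLAY_BUFSIZE
buffer2 = buffer1 - DISPLAY_BUFSIZE

# Single load of code + data, from the ROM directory entry
load_address = 0x0800
load_length = 0x8FCC
end_of_load = load_address + load_length

# Everything between the end of the load and the lower buffer is unclaimed
free_bytes = buffer2 - end_of_load

print(f"Buffer1 starts at ${buffer1:04x}")
print(f"Buffer2 starts at ${buffer2:04x}")
print(f"Code and data end at ${end_of_load:04x}")
print(f"Free: ${free_bytes:04x} ({free_bytes} bytes)")
```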

But what about variables? Think of the poor variables! Yes, I need to find a scratchpad in RAM for any new variables my patch code might require. In the old Epyx tools, they used a compiler directive to indicate that some variables belong in the lowest part of RAM starting at address zero, which is appropriately known as the ZPAGE. So I found this in the partial source (variable names redacted to protect the innocent):

    BEGIN_ZPAGE
    variable1 .ds 1
    variable2 .ds 1
    ...
    END_ZPAGE

Turns out the game was only using about 25 bytes of ZPAGE, which means I have the rest of the 256-byte ZPAGE all to myself. Plenty of space!

I decided to place the new patch code at the end of the ROM file. When I expanded the ROM to 256KB, I intentionally left roughly 1KB between each ROM file; I figured there was no need to pack everything tightly. I had 820 bytes of pad between ROM file #1 and file #2, which I assumed was plenty of space for the patch code. I updated the length field shown above to $9300 and confirmed that loading additional garbage into RAM did not adversely affect the game. It did not, so now it was time to actually a) write some new code, and b) figure out how to patch it into the main game loop.
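
As a rough illustration, rewriting the length field of the 8-byte directory entry could look like this in Python. It assumes the length is the last two bytes of the entry, low byte first, which is what the decoded entry above suggests; the helper name is hypothetical, and locating the entry inside the ROM file is left out.

```python
import struct


def set_entry_length(entry: bytes, new_length: int) -> bytes:
    """Return a copy of an 8-byte directory entry with a new length field."""
    if len(entry) != 8:
        raise ValueError("directory entries are expected to be 8 bytes")
    if not 0 <= new_length <= 0xFFFF:
        raise ValueError("length must fit in 16 bits")
    # Assumption: length lives in bytes 6-7, little-endian
    return entry[:6] + struct.pack("<H", new_length)


old_entry = bytes.fromhex("06 C0 00 00 00 08 CC 8F")
new_entry = set_entry_length(old_entry, 0x9300)

old_length = struct.unpack_from("<H", old_entry, 6)[0]
extra = 0x9300 - old_length

print(" ".join(f"{b:02X}" for b in new_entry))
print(f"Extra bytes loaded: {extra}")
```

The extra amount works out to 820 bytes, so the longer load consumes exactly the pad between file #1 and file #2.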

Writing some simple new code wasn’t too hard. I created a new quadro_patch.src and confirmed I could generate something simple like a few nop instructions followed by rts, to simulate a new subroutine being invoked from the existing code. The 65c02 assembler allows for an ORG directive to place code at a specific address. This only matters if any of my code is going to call other patch routines since it will need to know the absolute address for a jsr, as opposed to a relative offset for something like a beq or bge. (Side note: if you’d like to learn about 6502 opcodes, I recommend this and this.) I placed my code at $9800 to keep the math simple in my head.
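
To make the absolute-versus-relative point concrete, here is a small Python sketch that encodes both kinds of instruction by hand. The opcodes ($20 for jsr absolute, $F0 for beq) are standard 6502; the helper names and the example addresses other than $9800 are made up for illustration.

```python
def encode_jsr(target: int) -> bytes:
    """jsr absolute: opcode $20, then the 16-bit target address, low byte first."""
    return bytes([0x20, target & 0xFF, (target >> 8) & 0xFF])


def encode_beq(branch_addr: int, target: int) -> bytes:
    """beq relative: opcode $F0, then a signed 8-bit offset counted from
    the address of the instruction that follows the 2-byte branch."""
    offset = target - (branch_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range")
    return bytes([0xF0, offset & 0xFF])


# The jsr bytes depend on where the routine lives, hence the need for ORG
print(encode_jsr(0x9800).hex())

# The beq bytes are identical wherever the code is placed,
# as long as the distance to the target stays the same
print(encode_beq(0x9810, 0x9820).hex())
print(encode_beq(0xA010, 0xA020).hex())
```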

Next, I need to find a relatively harmless spot to insert a jsr $9800 into the existing ROM, just to demonstrate patching works. The safest way to do this is to find a spot where a register or variable is being loaded, like this:

    lda #2
    sta start_level
    jmp restart_game

The first two innocuous instructions accomplish one task: storing the hard-coded value of 2 (the number 2, not the address 2) into a variable called start_level. The next is a jump to another routine (akin to GOTO in ye olde BASIC lingo). Using my detective skills honed in my previous post, I built that tiny code segment with a fake address for start_level and a fake address for restart_game.

    lda #2
    sta $1111 ; Fake address
    jmp $2222 ; Fake address

    Result:
    A9 02 8D 11 11 4C 22 22

Using a hex editor, I searched the ROM for the bytes $A9028D and found about 10 instances. Hmm, OK, not as easy as the last time I hunted in the ROM for a byte string. But still, 10 is not too many. I scanned all 10 instances visually, looking for a 4C right after the two address bytes that follow the 8D. (Yes, I could write a Python script to do it, but it was quicker by hand.) Turns out there was only one. Yay! Now I have learned some particularly useful things: the ROM address where that code snippet resides and the actual addresses for start_level and restart_game. In my patch code, I can take advantage of this knowledge:
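
For anyone who would rather script the hunt, a sketch along these lines could do it. The byte pattern comes from the assembled snippet above, with the two address fields left as wildcards. The function name and the demo bytes are made up for illustration (the demo reuses the fake $2222 jump address).

```python
import re

# lda #2 / sta <any 16-bit address> / jmp <any 16-bit address>
# re.DOTALL lets "." match every byte value, including 0x0A
PATTERN = re.compile(rb"\xA9\x02\x8D(..)\x4C(..)", re.DOTALL)


def find_candidates(rom: bytes):
    """Return (offset, sta address, jmp address) for every match in the ROM image."""
    hits = []
    for match in PATTERN.finditer(rom):
        store_addr = int.from_bytes(match.group(1), "little")
        jump_addr = int.from_bytes(match.group(2), "little")
        hits.append((match.start(), store_addr, jump_addr))
    return hits


# Synthetic data: one near miss (sta followed by rts) and one full match
demo = bytes.fromhex("EA A9 02 8D 34 12 60 EA A9 02 8D 85 7A 4C 22 22 EA")
for offset, store_addr, jump_addr in find_candidates(demo):
    print(f"offset {offset:#x}: sta ${store_addr:04x}, jmp ${jump_addr:04x}")
```

On a real ROM, the demo bytes would be replaced by the contents of the file read in binary mode, and the reported offset is where the lda #2 begins.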

start_level .eq $7a85

.ORG $9800

; Start of patch routine

; Do some stuff
    nop
    nop
    nop

; Restore patched instructions and return
    lda #2
    sta start_level
    rts

The block of nop instructions, which are literally “no operation”, is a placeholder where I can put real code that does stuff. The last bit of the routine simply mimics the behavior of the code where I will insert the patch.

Back in the ROM, I now need to patch the code where $A9028D857A occurs (note the endian swap on the start_level address, from $7a85 to $857a) with a jsr $9800, which translates to three simple bytes: $200098 (the jsr opcode $20, then the address low byte first). I decided to overwrite the sta $7a85 operation because it also occupies 3 bytes. I did this by hand (later I would learn a better method). That means the patched ROM will now execute lda #2, then jsr $9800, then jmp restart_game. I tried this out on the Handy emulator and it worked! Very cool to know I can control my own destiny now by hacking code. 🙂
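
The hand edit can also be expressed as a small Python sketch. This is only an illustration, not the better method alluded to above: it swaps three bytes at a known offset and refuses to touch the ROM if the bytes already there are not the expected sta instruction. The function name is hypothetical, and the demo bytes use the fake $2222 jump address from the earlier snippet.

```python
def apply_patch(rom: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Swap `expected` for `replacement` at `offset`, keeping the ROM size unchanged."""
    if len(expected) != len(replacement):
        raise ValueError("replacement must be the same size as the original bytes")
    if rom[offset:offset + len(expected)] != expected:
        raise ValueError("unexpected bytes at patch offset, refusing to patch")
    return rom[:offset] + replacement + rom[offset + len(expected):]


STA_START_LEVEL = bytes.fromhex("8D 85 7A")  # sta $7a85
JSR_PATCH = bytes.fromhex("20 00 98")        # jsr $9800

# Demo on the 8-byte snippet; offset 2 skips past the lda #2
demo = bytes.fromhex("A9 02 8D 85 7A 4C 22 22")
patched = apply_patch(demo, 2, STA_START_LEVEL, JSR_PATCH)
print(" ".join(f"{b:02X}" for b in patched))
```

Checking the original bytes before writing is a cheap guard against patching the wrong one of those 10 near-identical hits.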

So… that was a very long tangent on how ROM patching is done. Now let’s get back to inserting music during gameplay. Using the partial source, I was able to identify where the main game loop was by using some unique looking markers in the code (loading unusual values to registers, multiple jmp or jsr instructions in a row, etc.). I then picked a spot to patch with a jsr $9800 to some newly written music player code. Again, the partial source helped here because I could see how other songs were set up in the A / X / Y regs, and what the loopback code did to keep the music playing. So I essentially copied this to my new patch just as a brute force test. It would not recognize Opt 2 to mute the song, and in fact it might end up playing at unintended times such as between levels, but it’s enough to demonstrate proof-of-concept.

This process went… OK. After a number of tries and massaging the code so it knew when to start playing a song and how many ROM segments to load before looping back, I actually got in-game digitized music working. But there were three big problems:

  1. Music was pretty much destroying all sound f/x.
  2. Music slowed down the game by about 25%.
  3. Music couldn’t be very long, as ~20 seconds requires ~125KB ROM space.

Starting with the first problem, I knew there must be a conflict in how the audio channels are being allocated. The Lynx has four audio channels, and by inspecting digi.src and hsfx.src (both from the original Epyx developer libraries), I could see they were intended to work together to manage both f/x and music across the channels. However there were a couple of tiny snippets of code that bothered me in digi.src:

#IFNDEF HSFX_ACTIVE
   stz digichannel
#ENDIF

; lots more code...

#IFDEF HSFX_ACTIVE
  jsr AllocAudio
  bcc .1
  pla
  rts
.1 
  stx digichannel
#ENDIF

This suggested that if Quadromania was not built with the HSFX_ACTIVE directive, then it would assume digichannel was always zero. A little detective work to hunt down the location of digichannel revealed it too was in the ZPAGE, and a quick play on the debugger version of Handy confirmed that digichannel was always zero regardless of where I stopped to check it. On the menu screens, that’s not a problem because there aren’t any competing sounds. But in-game, that’s a problem.

I could have tried to rebuild the whole digi.src and hsfx.src code, and swap them out of the ROM, but I took a more mundane approach: since audio channels were being allocated from low to high numbers, why not force digichannel to always be 3? It’s pretty unlikely for there to be 4+ simultaneous f/x, after all. This ended up being a pretty reliable fix, as now I could have music play simultaneously with f/x and only very rarely have a conflict where a particular f/x is drowned out.

However, the bigger issue is #2, performance. Also, the digitized audio doesn’t sound the best in-game, but that could be an artifact of how I compressed the music. Regardless, I honestly didn’t spend a lot of time on this issue, because I did not want to learn the entire game loop from top to bottom to see if I could make it efficient enough to avoid the 25% performance drop. The performance drop has a big impact on how the game plays, so that’s a no-go. Of course, I could try writing a really lightweight MIDI-style music player, or incorporating another existing music player, but given the performance issues I didn’t think it was worth dozens of additional hours just to experiment here.

The only way to “fix” issue #3 would be to further extend the ROM from 256KB to 512KB. This is very doable, but given I didn’t want to invest the time in the fix for #2, it’s a moot point.
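
For a rough sense of scale, here is a back-of-the-envelope Python estimate. It takes the approximate figures quoted earlier (about 125KB for about 20 seconds) at face value and assumes storage scales linearly with duration, which may not hold for other compression settings.

```python
KB = 1024

# Approximate figures from the problem list above
bytes_per_second = 125 * KB / 20


def seconds_for(rom_bytes: int) -> float:
    """Estimated playback time for a given amount of ROM, assuming linear scaling."""
    return rom_bytes / bytes_per_second


# The extra 256KB gained by going from 256KB to 512KB,
# if every added byte were spent on music
extra_seconds = seconds_for(256 * KB)
print(f"~{bytes_per_second:.0f} bytes per second of audio")
print(f"~{extra_seconds:.0f} more seconds of music from the extra 256KB")
```

So even a doubled ROM would only buy something on the order of 40 more seconds of digitized audio.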

Feature #3: FAILED

Guess it wasn’t my finest hour! But maybe someday I’ll post a video of the in-game music test, just to show it really happened. 🙂 Tune in next time to hear me talk about stereo and panning on the Lynx and why I looked up 20-year-old source code to Ponx!