I’ve seen it argued online that the 6502 processor is not well suited to the C programming language – or vice versa. Working on a project like this certainly makes you sensitive to what the compiler is doing.
As an example, I recently made a very minor change to how a pair of structures in memory were being addressed. I went from:
MapInfo* gMap = (MapInfo*)0xB400;
MapDirectory *gMapDirectory = (struct MapDirectory*)0xB500;
to:
#define gMap (*(struct MapInfo*)0xB400)
#define gMapDirectory (*(struct MapDirectory*)0xB500)
The theory behind this change was to tell the compiler that instead of dereferencing a pointer, it could treat these two names as global variables of their respective types. You might not think that would make any real difference, but it reduced the size of the compiled game code by 500 bytes. On a C64, 500 bytes is enough to care about.
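To make the difference concrete, here is a minimal sketch of the two forms side by side. The MapInfo fields (width, height), the names gMapPtr, MAP_ADDRESS and map_storage, and the two helper functions are all assumptions for illustration. A static variable stands in for the block at 0xB400 so the sketch also runs on a desktop compiler.

```c
#include <stdint.h>

/* Hypothetical layout; the real MapInfo fields are not shown in the post. */
struct MapInfo {
    uint8_t width;
    uint8_t height;
};

/* Stand-in for the fixed block at 0xB400 so the sketch runs anywhere.
   On the C64 target this would be ((struct MapInfo*)0xB400) instead. */
static struct MapInfo map_storage;
#define MAP_ADDRESS (&map_storage)

/* Form 1: a pointer variable. Its value lives in RAM, so each access
   has to fetch the pointer before it can reach the field. */
struct MapInfo *gMapPtr = MAP_ADDRESS;

/* Form 2: a macro that dereferences a constant address. Each field's
   address is known at compile time, so no pointer fetch is needed. */
#define gMap (*MAP_ADDRESS)

uint8_t map_area_via_pointer(void) {
    return (uint8_t)(gMapPtr->width * gMapPtr->height);
}

uint8_t map_area_via_macro(void) {
    return (uint8_t)(gMap.width * gMap.height);
}
```

Both functions compute the same result. The saving comes from how the compiler can reach the fields, which is why the source code barely changes while the generated code shrinks.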
It turns out that dereferencing pointers on a 6502 is actually kind of weird. The easiest way to do it requires the pointer value to be stored in zero page. Then you can do an “indirect indexed”* memory operation, like this:
LDA (ptr), Y
Of course, the maximum offset you can put in the Y register is 255. If you have a bigger struct or array than that, you’re going to have to do some more math.
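The "more math" can be modelled roughly like this in C. This is a sketch of the idea rather than what any particular compiler emits: the memory array and both function names are assumptions, and the 64K array simply stands in for the 6502 address space.

```c
#include <stdint.h>

/* Hypothetical flat model of the 64K address space. */
static uint8_t memory[65536];

/* Models LDA (ptr),Y: the index is a single byte, so it can only
   reach ptr+0 through ptr+255. */
uint8_t read_indirect_indexed(uint16_t ptr, uint8_t y) {
    return memory[(uint16_t)(ptr + y)];
}

/* For a 16-bit offset, first add the offset's high byte to the
   pointer's high byte, then use the low byte as the Y index. */
uint8_t read_far(uint16_t base, uint16_t offset) {
    uint16_t adjusted = (uint16_t)(base + (offset & 0xFF00));
    return read_indirect_indexed(adjusted, (uint8_t)(offset & 0x00FF));
}
```

For example, reading offset 300 from a base of 0xB400 becomes a read from pointer 0xB500 with Y set to 44.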
This also affects some of your strategies for storing data. Instead of arrays of structs like you might use in a more typical C environment, you can get better performance by breaking things into individual arrays. For example, my data storage for NPCs in the game looks like this:
// Entity List
#define MAX_ENTITIES 30
#define ENTITY_TYPE ENTITY_LISTS
#define ENTITY_X (ENTITY_TYPE + MAX_ENTITIES)
#define ENTITY_Y (ENTITY_X + MAX_ENTITIES)
#define ENTITY_NEXT (ENTITY_Y + MAX_ENTITIES) // entity below this one on square
#define ENTITY_VALUE (ENTITY_NEXT + MAX_ENTITIES) // hp for things that can be attacked
#define NPC_ID (ENTITY_VALUE + MAX_ENTITIES) // index for NPC name and conversation
#define ENTITY_FLAGS (NPC_ID + MAX_ENTITIES)
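So instead of saying “ENTITIES[3].x” to get the position of the entity at index 3 in the list, you’d say ENTITY_X[3]. Doing it this way is a lot less math for the CPU to deal with: the array-of-structs form has to multiply the index by the struct size before it can find the field, while the separate-array form can use the index directly.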
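Here is a small sketch of the two layouts, showing where the extra math comes from. The struct Entity definition, the lowercase array names and the helper functions are assumptions for illustration, not the game's actual declarations.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTITIES 30

/* Array-of-structs: entity i starts at base + i * sizeof(struct Entity),
   so every access needs a multiply (or shifts and adds) first. */
struct Entity {
    uint8_t type;
    uint8_t x;
    uint8_t y;
};
struct Entity entities_aos[MAX_ENTITIES];

/* Parallel arrays: field x of entity i sits at base + i, which maps
   straight onto an indexed load such as LDA entity_x,X. */
uint8_t entity_type[MAX_ENTITIES];
uint8_t entity_x[MAX_ENTITIES];
uint8_t entity_y[MAX_ENTITIES];

/* Byte offset of field x for entity i in each layout. */
uint16_t aos_offset_of_x(uint8_t i) {
    return (uint16_t)(i * sizeof(struct Entity) + offsetof(struct Entity, x));
}

uint16_t soa_offset_of_x(uint8_t i) {
    return i; /* the index is already the offset */
}

/* Storing a position in the parallel-array layout. */
void place_entity(uint8_t i, uint8_t x, uint8_t y) {
    entity_x[i] = x;
    entity_y[i] = y;
}
```

With at most 30 entities, every index fits in a single byte, so each field can be reached with one 8-bit index register and no 16-bit arithmetic.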
A related issue to all this is how the 6502 manages the stack. A “typical” C implementation makes heavy use of the stack for function parameters and local variables, as well as return addresses. But the 6502’s hardware stack has a fixed position and a fixed size: the 256 bytes of page one, $0100–$01FF (the stack pointer register is basically just another 8-bit index register like X or Y). That’s not going to go very far.
The CC65 compiler actually implements a separate software stack to take pressure off the hardware stack. If you look at the assembly output for a CC65 C program you’ll see a ton of subroutine calls to things like “pusha” and “pushax”. There’s a tradeoff here – you gain some space, but you lose speed from all those subroutine calls.
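The idea of a software stack can be modelled in a few lines of C. This is only a sketch of the concept, not CC65's runtime code: the names soft_stack, soft_sp, push_byte and friends are assumptions, the size is arbitrary, and there is no overflow checking.

```c
#include <stdint.h>

#define SOFT_STACK_SIZE 256 /* hypothetical size for this sketch */

static uint8_t soft_stack[SOFT_STACK_SIZE];
static uint16_t soft_sp = SOFT_STACK_SIZE; /* grows downward; starts empty */

/* Push one byte: move the stack pointer down, then store. */
void push_byte(uint8_t value) {
    soft_stack[--soft_sp] = value;
}

/* Push a 16-bit value so the low byte ends up at the lower address. */
void push_word(uint16_t value) {
    push_byte((uint8_t)(value >> 8));
    push_byte((uint8_t)(value & 0xFF));
}

/* Pop one byte: read, then move the stack pointer back up. */
uint8_t pop_byte(void) {
    return soft_stack[soft_sp++];
}

/* Pop a 16-bit value: low byte first, then high byte. */
uint16_t pop_word(void) {
    uint16_t lo = pop_byte();
    uint16_t hi = pop_byte();
    return (uint16_t)(lo | (hi << 8));
}
```

Each argument costs a call into a helper like these, which is where the speed loss comes from.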
Because of this, I’ve also found myself avoiding passing and returning values, something that I’d never worry about when writing for a modern CPU. For example, when working through the details of the combat system, and all the utility functions for finding nearby enemies, animating projectiles, and so on, I eventually tore out all the parameter passing and stuffed that data into another global struct:
struct Action {
    uchar Entity;
    uchar Action;       // attack, spell, etc.
    uchar Direction;
    uchar Range;        // max range for attack
    uchar dx;           // x/y representation of direction
    uchar dy;           // e.g. (0,-1) for south
    uchar TargetEntity; // target of attack
    uchar AttackBonus;
    uchar Damage;
};
struct Action gCurrentAction;
All the parameters that I’d ordinarily fill in – and some of the local variables that I might reuse, like dx/dy – are stored in fixed locations here. Looking at the assembly output from CC65, this technique is going to pay off – and it’ll also make it easier to interact with hand-written assembly, which I’m going to need to do sooner or later**.
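As a rough illustration of the pattern, here is the same small calculation written both ways. The two helper functions are hypothetical and not taken from the game, and uchar is assumed to be a typedef for an unsigned 8-bit type.

```c
#include <stdint.h>

typedef uint8_t uchar; /* assumption: uchar is an unsigned 8-bit type */

struct Action {
    uchar Entity;
    uchar Action;
    uchar Direction;
    uchar Range;
    uchar dx;
    uchar dy;
    uchar TargetEntity;
    uchar AttackBonus;
    uchar Damage;
};
struct Action gCurrentAction;

/* Conventional style: every value travels through the argument list
   and the result comes back as a return value. */
uchar total_damage_params(uchar damage, uchar attack_bonus) {
    return (uchar)(damage + attack_bonus);
}

/* Global-block style: the caller fills in gCurrentAction first, and the
   helper takes no arguments and returns nothing. Every field sits at a
   fixed address, so nothing is pushed or popped. */
void apply_attack_bonus(void) {
    gCurrentAction.Damage =
        (uchar)(gCurrentAction.Damage + gCurrentAction.AttackBonus);
}
```

The cost is that helpers like this are no longer reentrant, since they all share one block of state.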
*By the way, there’s also an “indexed indirect” addressing mode. Don’t ask.
**Probably sooner.
Does the 6502 provide stack-indexed addressing (specify a constant offset from SP), or are you restricted to pushes and pops? I’ve been wrestling with the 68hc11, which only provides the latter directly; the former has to be simulated by copying SP into the 16-bit (slow) Y index register, which is a major hit and a nuisance.
Not directly – you have to copy the stack pointer to the X register to do anything with it. For example, you could load the byte at SP+3 with:
TSX
LDA $0103, X
which takes 6 clock cycles.