A couple days ago, I said the 6502 can’t multiply. I also said it can’t deal with numbers wider than eight bits. So here’s a trivial example where I get it to do both. It even almost fits on one screen.

The code above – the center columns are the actual assembly – is a multiplication algorithm for the 6502. It multiplies a 16-bit multiplicand by a 16-bit multiplier, giving me a 32-bit result.

(Actually, the multiplicand is stored 32 bits wide, because it gets shifted left on every pass through the loop and needs room to grow. So you *could* multiply a 32-bit number by a 16-bit number, but then you’d have to worry about overflowing the 32-bit result.)

I don’t intend to make this a tutorial, but let me comment on what’s going on here. The two numbers we’re multiplying, and the result, are just going to exist in the computer’s memory. The multiplier is at $c104 and $c105, the multiplicand is at $c100 through $c103, and the result will go in $c106 through $c109. That’s all hexadecimal, of course, and the numbers are little-endian: the low byte of each number sits at the lowest address.
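To make that layout concrete, here is a small sketch in Python (not 6502 assembly, so it runs anywhere). The helper names `poke_le` and `peek_le` and the example values are mine, purely for illustration; only the addresses come from the description above.

```python
# Model the 64 KB address space as a flat array of bytes.
memory = bytearray(0x10000)

MULTIPLICAND = 0xC100  # 4 bytes: $c100-$c103
MULTIPLIER = 0xC104    # 2 bytes: $c104-$c105
RESULT = 0xC106        # 4 bytes: $c106-$c109


def poke_le(mem, addr, value, nbytes):
    """Store value as nbytes little-endian bytes (low byte first) at addr."""
    for i in range(nbytes):
        mem[addr + i] = (value >> (8 * i)) & 0xFF


def peek_le(mem, addr, nbytes):
    """Read nbytes little-endian bytes starting at addr back into an integer."""
    return sum(mem[addr + i] << (8 * i) for i in range(nbytes))


# Example values (arbitrary, not from the screenshot).
poke_le(memory, MULTIPLICAND, 0x1234, 4)
poke_le(memory, MULTIPLIER, 0x00FF, 2)
```

After this runs, $c100 holds $34 and $c101 holds $12: the low byte comes first.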

Basically, we have a loop that examines each bit of the 16-bit multiplier. If that bit is 1, we add the multiplicand to the result. Of course, we have to add each byte of the 32-bit multiplicand and result separately – that’s what the whole middle section of the screen is doing.

Also, each time through the loop, we shift the multiplicand one bit to the left – that’s the ASL and ROL stuff at the bottom of the screen. A left shift is the same as multiplying by 2, so each time through we’re adding larger numbers to the result.
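Here is a rough Python model of that loop, working one byte at a time with an explicit carry the way the 6502 has to. It is a sketch of the technique, not a transcription of my assembly: the function names are made up, and I am assuming the multiplier bits are examined lowest bit first by shifting the multiplier right.

```python
def adc(a, b, carry):
    """Add two bytes plus a carry-in, like ADC; return (byte, carry-out)."""
    total = a + b + carry
    return total & 0xFF, total >> 8


def shift_left(data):
    """Shift a little-endian byte list left one bit in place (ASL, then ROLs)."""
    carry = 0
    for i in range(len(data)):
        shifted = (data[i] << 1) | carry
        data[i] = shifted & 0xFF
        carry = shifted >> 8


def multiply16(multiplicand, multiplier):
    """Multiply two 16-bit values and return the 32-bit product."""
    # 4-byte little-endian multiplicand and result, as in the memory layout.
    mcand = [(multiplicand >> (8 * i)) & 0xFF for i in range(4)]
    result = [0, 0, 0, 0]
    for _ in range(16):              # one pass per multiplier bit
        if multiplier & 1:           # ASSUMED: lowest bit is tested first
            carry = 0                # the CLC before the first ADC
            for i in range(4):       # add each byte, carrying into the next
                result[i], carry = adc(result[i], mcand[i], carry)
        multiplier >>= 1
        shift_left(mcand)            # multiplicand doubles every pass
    return sum(b << (8 * i) for i, b in enumerate(result))
```

Calling `multiply16(1234, 5678)` gives the same answer as `1234 * 5678`, just with a lot more ceremony.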

(The part of the code that got cut off the bottom of my screenshot, by the way, is just where we decrement our loop variable and then branch back up to the top of the algorithm at $c002.)

Really, this works the same way that people multiply large numbers by hand – it’s just using binary math instead of normal base-10 numbers. You look at each digit of the multiplier separately, generating a bunch of partial results, each with another zero at the end, that all add together for a final result.
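If you want to see those partial results lined up the way you would write them on paper, this little Python sketch prints them in binary. The function name `partial_products` and the 13 × 11 example are mine, chosen only because the numbers are small.

```python
def partial_products(multiplicand, multiplier):
    """Return the shifted partial products, one per 1 bit in the multiplier."""
    parts = []
    shift = 0
    while multiplier:
        if multiplier & 1:
            parts.append(multiplicand << shift)  # zeros appended on the right
        multiplier >>= 1
        shift += 1
    return parts


parts = partial_products(0b1101, 0b1011)  # 13 x 11
for p in parts:
    print(f"{p:08b}")
print(f"{sum(parts):08b}")  # the column total
```

The multiplier 1011 has three 1 bits, so there are three partial products (13, 26 and 104), and they add up to 143.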

Of course, this extends to arbitrarily large numbers. You could modify it to handle 64-bit or even 256-bit integers just by using this same pattern of instructions, with more bytes in each add and shift and more trips through the loop.
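As a sketch of what "the same pattern" means, here is a Python version where the widths are just the lengths of two little-endian byte lists. The name `multiply_bytes` is hypothetical, and sizing the result as the sum of the two widths is my choice so that it can never overflow.

```python
def multiply_bytes(mcand_bytes, mplier_bytes):
    """Multiply two little-endian byte lists; return the product as bytes."""
    width = len(mcand_bytes) + len(mplier_bytes)
    mcand = list(mcand_bytes) + [0] * len(mplier_bytes)  # room to shift left
    result = [0] * width
    for bit in range(8 * len(mplier_bytes)):
        if (mplier_bytes[bit // 8] >> (bit % 8)) & 1:
            carry = 0
            for i in range(width):   # multi-byte add with carry
                total = result[i] + mcand[i] + carry
                result[i], carry = total & 0xFF, total >> 8
        carry = 0
        for i in range(width):       # multi-byte shift left
            shifted = (mcand[i] << 1) | carry
            mcand[i], carry = shifted & 0xFF, shifted >> 8
    return result


a = 0xFFFFFFFFFFFFFFFF
b = 0x123456789ABCDEF0
product = multiply_bytes(list(a.to_bytes(8, "little")),
                         list(b.to_bytes(8, "little")))
```

The catch is that the loop runs once per multiplier bit and every add touches every byte, so the work grows roughly with the square of the width.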

It’s astonishing just how much work is going on here! The loop executes *16 times*. If the bit of the multiplier we’re looking at is a 1, we have to do the add, which takes 13 instructions – and then we still have to bit-shift the multiplicand. Think about this if you’re writing a game and have to refresh the screen 60 times a second: It wouldn’t take much more than a dozen of these multiplications to use up all the time you have for doing your graphics, sound, and game logic.
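For a back-of-the-envelope check on that, here is the arithmetic as a Python sketch. The per-instruction cycle counts are the standard 6502 timings for absolute addressing, but everything else is an assumption on my part: that the 13-instruction add is a CLC plus four LDA/ADC/STA groups, that the bit test shifts the two multiplier bytes and branches, that the loop counter lives in a register, and that the clock is roughly 1 MHz.

```python
# Standard 6502 cycle counts (absolute addressing).
CLC = 2
LDA_ABS = ADC_ABS = STA_ABS = 4
SHIFT_ABS = 6                    # ASL, ROL, LSR or ROR on a memory location
BRANCH_NOT_TAKEN, BRANCH_TAKEN = 2, 3
DEX = 2

add = CLC + 4 * (LDA_ABS + ADC_ABS + STA_ABS)  # ASSUMED shape of the 13 instructions
shift_multiplicand = 4 * SHIFT_ABS             # one ASL plus three ROLs
test_bit = 2 * SHIFT_ABS + BRANCH_NOT_TAKEN    # ASSUMED: shift multiplier, branch
loop = DEX + BRANCH_TAKEN                      # ASSUMED: counter in a register

worst_case_iteration = add + shift_multiplicand + test_bit + loop
worst_case_multiply = 16 * worst_case_iteration

CPU_HZ = 1_000_000                             # ASSUMED: roughly 1 MHz
cycles_per_frame = CPU_HZ // 60

print(worst_case_multiply)
print(cycles_per_frame // worst_case_multiply)
```

Under those assumptions a worst-case multiply (every multiplier bit set) costs 1,488 cycles, and only 11 of them fit in one 60 Hz frame before anything else gets a turn.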

And that’s the other really fun part of a project like this – seeing what you can do with limited resources. In my day job, a quad-core processor with 6 GB of RAM is limited resources. This is… a little tighter. I’m going to be working on some graphic hacks next, and I think then we’ll see just how tight these things can get.

The thing that thrills me, though, isn’t the complexity of the algorithm – it’s pretty straightforward once you’re thinking in binary. The thing I love about it is the visceral satisfaction that what I’m coding is exactly what the processor ends up doing. It’s kind of like watching a record on a turntable. There’s a feeling of the mechanical nature of what’s going on, down where the software is changing things in the real world, that can get lost in the everyday jumble of JIT compilers, high-level languages, and virtualization. Down at the bottom, it’s all electrons moving across silicon. Think about *that* the next time Windows crashes on you.

Oh, and it would be terribly rude of me not to name my sources. My main reference for 6502 assembly is Jim Butterfield’s classic *Machine Language for the Commodore 64, 128, and other Commodore Computers* – it’s an easy read but a little elementary, and he actually shies away from describing a general multiplication algorithm. For that, I went to Derek Bush and Peter Holmes’s *Commodore 64 Assembly Language Programming*.