When I first built the Project:65 computer, I was pretty much a novice at 6502 assembly language. So when it came to the hard parts, I usually opted for “easy to understand” over “fast”. Since then I’ve had more practice, I have better tools, and I’ve read a lot of material that’s given me ideas for better bit manipulation.
Bit manipulation is what this project is all about. I’m using a 65c22 VIA to communicate over SPI with a MAX3100 UART. The 6522 has a couple of parallel ports that are user-controlled. Arduino or Raspberry Pi aficionados might think of them as collections of bidirectional GPIO pins. I’m using four of these to implement the SPI communication between the two chips.
    SPI CONNECTION: 6522 VIA PORT B to MAX3100

    ________                             ______
            |                           |
         PB7|<----- RECEIVE (NEW) --+--<|DOUT
         PB6|                       |   |
    6522 PB5|                       |   | MAX3100
    VIA  PB4|                       |   | UART
         PB3|<----- RECEIVE (OLD) --+   |
         PB2|>------- TRANSMIT -------->|DIN
         PB1|>------ CHIP SELECT ------>|/CS
         PB0|>--------- CLOCK --------->|SCLK
            |                           |
         CB2|<------- INTERRUPT --------|/IRQ
         CB1|                           |
    ________|                           |______

    THE 6522 PARALLEL PORT CONTAINS 8 BIDIRECTIONAL DATA PINS (PB0 TO PB7)
    AND 2 "CONTROL" PINS (CB1 AND CB2). ARROWS INDICATE THE DIRECTIONS WE
    ARE USING TO SEND AND RECEIVE DATA.
Every command to the MAX3100 is a two-byte sequence and is accompanied by a two-byte response. Rather than worry about the high-level protocol, though, I started out by looking at a routine I wrote called “spibyte”. This is a very generic routine that simply sends a single data byte out the SPI port while simultaneously reading back the response from the 3100; it doesn’t do anything to interpret those responses. Still, this is where the actual bits are being banged, so it’s where most of the time is being used.
The starting version of this code was pretty simple and had a lot of branching and subroutine calls in it – one for each bit:
    ; sends the value in the accumulator thru SPI.
    ; returns the value it reads back.
    spibyte:
            sta writebuffer
            jsr rwbit
            jsr rwbit
            jsr rwbit
            jsr rwbit
            jsr rwbit
            jsr rwbit
            jsr rwbit
            jsr rwbit
            lda readbuffer
            rts

    ; rwbit writes one bit of writebuffer and reads
    ; one bit into readbuffer. Both buffers are
    ; left shifted by one place.
    rwbit:
            rol writebuffer
            bcs nonzero
            lda VIA_DATAB
            and #%11111011  ; set output low
    nzret:
            sta VIA_DATAB   ; set output bit
            inc VIA_DATAB   ; set clock high
            ; receive bit
            asl readbuffer
            lda VIA_DATAB   ; read input bit
            and #$8
            beq reczero
            inc readbuffer  ; add a 1 bit
    reczero:
            dec VIA_DATAB   ; set clock low
            rts
    nonzero:
            lda VIA_DATAB
            ora #4          ; set output high
            jmp nzret
So as you can see, that’s a lot of instructions that have to be executed every time I want to send out a single byte. If I were using a UART directly on the CPU bus, this could all be done in a handful of instructions instead. Still, I could make it better.
Those subroutine calls definitely had to go. Since I wrote the original code, I’ve switched to using the ca65 assembler, which has a directive (.repeat) that assembles a block of code n times. That lets me remove the subroutine call without having eight copies of the “rwbit” routine muddying up my source code.
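As a rough illustration of the directive (this is not part of the Project:65 code), the sketch below unrolls a loop body eight times with .repeat. The routine name popcount and the label skip are made up for the example. The anonymous .scope gives each copy its own skip label, so the eight copies don’t collide.

```asm
; Hypothetical example: count the 1 bits in A, result in X.
; The block between .repeat and .endrepeat is assembled 8 times,
; so there is no loop counter and no branch back.
.proc popcount
        ldx #0
.repeat 8
.scope
        asl a           ; shift the top bit of A into the carry flag
        bcc skip        ; carry clear: that bit was 0
        inx             ; carry set: count it
skip:
.endscope
.endrepeat
        rts
.endproc
```

The trade-off is the usual one for unrolling: the source stays short, but the assembled code is eight copies long.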
My original implementation used the first four bits of the 6522’s parallel port (VIA_DATAB). This allowed me to make some smart moves – for example, I can toggle the status of the clock line with a single INC (increment) or DEC (decrement) operation, without even loading a value into the CPU’s accumulator.
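To make the INC/DEC trick concrete, here is a small sketch comparing it with the general read-modify-write way of raising one pin. The VIA_DATAB address and the two routine names are placeholders for the example, not the real Project:65 memory map. The shortcut only works because the clock sits on bit 0 and is known to be low before the INC (and high before the DEC); otherwise the increment would carry into bit 1, which is the chip select.

```asm
VIA_DATAB = $6000       ; placeholder address (an assumption); use your board's

; General way to raise one pin: read, modify, write.
; Three instructions, and the accumulator is clobbered.
clock_high_rmw:
        lda VIA_DATAB
        ora #%00000001  ; set bit 0 (clock)
        sta VIA_DATAB
        rts

; Shortcut when the clock is on bit 0 and currently low.
; The accumulator is left untouched.
clock_pulse:
        inc VIA_DATAB   ; xxxxxxx0 -> xxxxxxx1: clock high
        dec VIA_DATAB   ; xxxxxxx1 -> xxxxxxx0: clock low
        rts
```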
I was also originally very careful not to change the values of the other bits on the port. I’ve since decided that this is unnecessary: if I do use those bits in the future, it’ll probably be to support other SPI devices using the same interface.
However, this still wasn’t an optimal choice. Something I read that went over my head originally was that if the RECEIVE line (which carries the 3100’s response back to the 6522) is in bit 7 instead of bit 3, I can get rid of some logic operations and branches: most of the stuff after the “receive bit” comment. Instead, I can load the port value into the accumulator, rotate it to the left (so that bit 7 gets copied to the CPU’s carry flag), and then rotate the read buffer (so the carry flag value gets copied into its least significant bit). This way I can copy the incoming bit to the destination with three instructions, and without ever caring what the value of that bit actually is. Of course, this required moving a wire on the breadboard, but it’s neat to think that you’re modifying the hardware to support a software optimization.
Here’s the improved “spibyte” routine:
    ;; spibyte
    ;; Sends the byte in accumulator and receives a
    ;; new byte into accumulator.
    .proc spibyte
            sta spi_writebuffer
    .repeat 8                       ; copy the next section 8 times
    .scope
            lda #%01111000          ; base DATAB value with chip select for
                                    ; MAX3100 and a zero bit in the output
                                    ; line.
            rol spi_writebuffer
            bcc writing_zero_bit
            ora #%00000100          ; write a 1 bit to the output line.
    writing_zero_bit:
            sta VIA_DATAB           ; write data back to the port
            inc VIA_DATAB           ; set clock high
            lda VIA_DATAB           ; Read input bit
            rol                     ; Shift input bit to carry flag
            rol spi_readbuffer      ; Shift carry into readbuffer
            dec VIA_DATAB           ; set clock low
    .endscope
    .endrepeat
            lda spi_readbuffer      ; result goes in A
            rts
    .endproc
The core of this routine is down from 17 instructions (including the “jsr rwbit”) to 10, which is pretty satisfying. I’m pretty sure there’s still an opportunity to bum a few more instructions out of the routine by using some of the newer features of the 65c02, but this is a good baseline to work off of.
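Since every MAX3100 command is a two-byte exchange, a caller might wrap the improved spibyte along the following lines. This is only a sketch under assumptions: the name spiword and its register convention are invented here, it assumes VIA_DATAB and the improved spibyte are already defined in the same source file, and it assumes the MAX3100 wants /CS held low for all 16 bits and raised afterwards (check the datasheet before relying on that).

```asm
; Hypothetical wrapper, not part of the original code.
; In:  A = first (high) command byte, X = second (low) command byte
; Out: Y = first response byte, A = second response byte
; Relies on spibyte leaving X and Y untouched, which holds for the
; improved version since it only uses A and memory.
.proc spiword
        jsr spibyte     ; send high byte; /CS goes low on its first write
        tay             ; keep the first response byte in Y
        txa
        jsr spibyte     ; send low byte; A = second response byte
        ldx #%01111010  ; spibyte's base value, but with PB1 (/CS) high
        stx VIA_DATAB   ; deselect the MAX3100; clock stays low
        rts
.endproc
```

Using X for the final port write keeps the second response byte in A, at the cost of destroying the caller’s X.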