In the source article for my PISC project, “A Minimal TTL Processor for Architecture Exploration” by Bradford J. Rodriguez, Brad wrote the following:
Glaring Deficiencies
Many weaknesses of the PISC become evident after a short period of use, including:
a) no conditional branch microinstruction — an important need [6];
b) no provision for literal values in the microinstruction;
c) no ALU logic for multiply, divide, and right shift;
d) no logic for decoding of macroinstructions;
e) no provision for interrupts;
f) sparse coding of the ALU function select; and
g) two clocks required per microinstruction.
I have been successful in correcting a good number of these “deficiencies”, in many cases by using Brad’s own suggestion for a solution from the original article. So as a starting point towards documenting the changes I made between the original PISC 1.0a circuit and my PISC 1.0c derivative, I’ll outline which of these have been solved and which remain “Glaring”. I’ll also provide some of my thoughts on the need for each of these features.
Glaring Deficiency “A” – No conditional branch micro-instruction. := SOLVED
Yeah, this is a biggie. Although I’ll quickly point out that I have discovered it is not totally impossible to write a useful program without such an instruction. It turns out that some very early computers built just after WWII didn’t have such an instruction either, so the early pioneers of computer programming resorted to self-modifying code, where the program would overwrite the jump address in store. Just the thought of coding like that makes my head hurt!
So I provided some circuitry that would inhibit the write back of the ALU output to the Register File dependent on the state of the selected flag (EQ or CY). This essentially sets up the conditions required for an instruction like this:-
MOV R7, $0200 IF CY
Translated, this copies address $0200 to Register R7, which is used as the Program Counter (PC), but only if the CY flag is set. If the CY flag is not set then the move operation (really a copy) is inhibited. The result is a conditional jump.
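To make the idea concrete, here is a minimal Python sketch of the write-back inhibit. It is only a software model, not the actual circuit: the function name `conditional_move`, the dictionary of flags and the eight-entry register list are all assumptions made for illustration.

```python
def conditional_move(regs, flags, dest, value, condition=None):
    """Model of a register write that can be inhibited by a flag.

    regs      -- list of 16 bit register values (R0..R7, hypothetical model)
    flags     -- dict such as {"CY": True, "EQ": False}
    condition -- name of the flag to test, or None for an unconditional move
    Returns True if the write-back happened, False if it was inhibited.
    """
    if condition is not None and not flags[condition]:
        return False                      # flag clear: write-back inhibited
    regs[dest] = value & 0xFFFF           # flag set (or no condition): write
    return True


regs = [0] * 8
regs[7] = 0x0100                          # R7 acts as the Program Counter
flags = {"CY": False, "EQ": False}

conditional_move(regs, flags, 7, 0x0200, "CY")   # CY clear: R7 is unchanged
flags["CY"] = True
conditional_move(regs, flags, 7, 0x0200, "CY")   # CY set: R7 now holds $0200
```

Because the Program Counter is just another register, inhibiting the write is all that is needed to turn a plain move into a conditional jump.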
Glaring Deficiency “B” – No provision for literal values in the micro-instruction := REMAINS (do we care?)
I started off thinking that this was a pretty important issue, and that it should be pretty easy to solve. After spending a lot of time considering various ways to add this functionality, I eventually came to the conclusion that there probably isn’t much point.
Any circuit that I could dream up was overly expensive in terms of complexity and chip count. And I had already added one too many IC’s to Brad’s otherwise elegant design. But the real kicker was that I could not find a way to sensibly do this in a single execute cycle. It would need at least two.
Now here’s the thing. PISC v1.0a stock standard can already load a 16 bit literal value into a register in a single execute cycle. The only thing dubious about this is that it needs an entire word in memory to store the literal following the actual load instruction. Granted most of the literals programmers use are apparently quite small. So it would be a more efficient use of available memory if we could load-up say 8 bits (0-255) of data alongside an 8 bit instruction for a one word opcode. But memory in my PISC really isn’t in short supply (128K ROM, 128K RAM in 16 bit words). So the hardware complexity required to add the feature just didn’t seem worth the effort.
Since making this decision I’ve done a fair bit of coding and kept a concerned eye on the resulting object code sizes. Yep, they are most likely larger than the same project done in Z80 or 6502 Assembler. But not orders of magnitude larger. So I stopped worrying about it.
Later I started coding in PL/0+ which generates byte-code. Which should probably be called Word-code in the context of PISC. Since my PL/0+ word-code is being interpreted by a virtual machine this allows me to encode small literals into a single word along with an instruction. The code density of the PL/0+ object files is very high. The trade off being a reduction in execution speed for an increase in code density. Problem, if there ever was one, solved.
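As a rough illustration of packing a small literal alongside an instruction in one 16 bit word, here is a Python sketch. The layout (opcode in the high byte, literal in the low byte) and the opcode value used are assumptions for illustration only; the real PL/0+ word-code format is not described here.

```python
def pack(opcode, literal):
    """Pack an 8 bit opcode and an 8 bit literal into one 16 bit word.

    Assumed layout: opcode in bits 15..8, literal in bits 7..0.
    """
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= literal <= 0xFF:
        raise ValueError("literal must fit in 8 bits")
    return (opcode << 8) | literal


def unpack(word):
    """Split a 16 bit word back into (opcode, literal)."""
    return (word >> 8) & 0xFF, word & 0xFF


LOAD_LITERAL = 0x01                 # hypothetical opcode number
word = pack(LOAD_LITERAL, 42)       # one word instead of two
opcode, literal = unpack(word)      # the virtual machine does this at run time
```

The unpacking step is the price paid at run time: the interpreter spends extra cycles separating the fields, which is the speed for density trade-off mentioned above.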
Glaring Deficiency “C” – No ALU logic for multiply, divide, and right shift := SOLVED (Partially)
I have added a simple Shift Register board in the data path between A-Bus and the “A” input to the ALU. This is just a small bunch of 74ALS245 buffers switched around to create one of four possible outcomes:
- Data path normal
- Data is Logical Shifted Right one bit
- Data is Arithmetic Shifted Right (a variant of the Logical Shift Right case above, requiring no additional buffers).
- High Byte for Low Byte swap.
So I have solved the issue of the missing right shift. The high for low byte swap in one cycle makes coding for those pesky ASCII bytes so much easier. And is the only hardware nod my PISC gives to byte sized chunks of data.
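The four outcomes can be modelled in a few lines of Python. This is a behavioural sketch of a 16 bit data path only; the mode names and the function `shift_unit` are made up for illustration and say nothing about how the 74ALS245 buffers are actually wired.

```python
MASK = 0xFFFF  # 16 bit data path


def shift_unit(value, mode):
    """Behavioural model of the unit between the A-Bus and the ALU "A" input."""
    value &= MASK
    if mode == "pass":                       # data path normal
        return value
    if mode == "lsr":                        # logical shift right, zero into bit 15
        return value >> 1
    if mode == "asr":                        # arithmetic shift right, bit 15 is kept
        return (value >> 1) | (value & 0x8000)
    if mode == "swap":                       # high byte for low byte swap
        return ((value << 8) | (value >> 8)) & MASK
    raise ValueError("unknown mode: " + mode)


packed = 0x4142                  # two ASCII bytes, "A" and "B", in one word
swapped = shift_unit(packed, "swap")
low_char = chr(packed & 0xFF)    # "B"
high_char = chr(swapped & 0xFF)  # "A", reachable after a single swap
```

The only difference between the two right shifts is what arrives in bit 15, which fits the remark that the arithmetic version needs no extra buffers.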
As for the missing Multiply and Divide instructions? Well I seem to remember that the Z80 and 6502 didn’t have any either. In fact I don’t think Intel gave us this until the 8086! So I just coded my own routines 🙂
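For readers who have not written such routines, here is a Python sketch of the classic shift-and-add multiply and restoring (shift-and-subtract) divide. These are the textbook algorithms, not a transcription of my PISC assembler routines, and the function names are illustrative only.

```python
def multiply(a, b):
    """Unsigned 16 x 16 -> 32 bit multiply using only shifts and adds."""
    a &= 0xFFFF
    b &= 0xFFFF
    product = 0
    while b:
        if b & 1:            # lowest multiplier bit set: add the multiplicand
            product += a
        a <<= 1              # multiplicand moves left one place
        b >>= 1              # multiplier moves right (needs the right shift!)
    return product & 0xFFFFFFFF


def divide(n, d):
    """Unsigned 16 bit restoring division, returns (quotient, remainder)."""
    n &= 0xFFFF
    d &= 0xFFFF
    if d == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for bit in range(15, -1, -1):
        remainder = (remainder << 1) | ((n >> bit) & 1)  # bring down next bit
        if remainder >= d:                               # does the divisor fit?
            remainder -= d
            quotient |= 1 << bit
    return quotient, remainder
```

Note that the multiply loop shifts the multiplier right on every pass, which is one reason a hardware right shift is so handy on a machine without a multiply instruction.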
Shift Register Circuit coming soon…
Glaring Deficiency “D” – No logic for decoding of macro-instructions := REMAINS (at the hardware level)
Yes, PISC has no micro-code and no micro-code sequencer. This keeps the hardware solution so miraculously, elegantly simple. So for those instructions where it would be really nice to have a “macro-instruction” I just coded this into the Assembler. So the assembler now offloads the complexity of sequencing several instructions together.
Turns out there aren’t that many instructions where I needed to do this. CALL, RET, PUSH and POP come to mind quickly but after that I’m struggling to think of another example. To give a quick illustration of what I’m talking about, if I were to code:
RET
My PISC Assembler would generate something like:
00A3: ACEF 328: RET { rdd r5,r4 }
00A4: 2080 + inc r4
00A5: 349A + jmp r4
The three instructions required for a Return instruction.
- Read with post decrement the R5 stack pointer address contents into R4. (Stack grows upwards in PISC)
- Increment R4 so that it points to the location just after the original CALL instruction.
- Jump to the Address held in R4.
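The three steps can be traced with a small Python model. It assumes that R5 points at the stack entry holding the saved address and that memory can be modelled as a dictionary; both are simplifications for illustration, as are the example addresses.

```python
def ret(regs, memory):
    """Trace the three instructions the assembler emits for RET."""
    # rdd r5,r4 : read the word addressed by R5 into R4, then decrement R5
    regs[4] = memory[regs[5]]
    regs[5] = (regs[5] - 1) & 0xFFFF      # stack grows upwards, so a pop moves down
    # inc r4    : step past the saved address
    regs[4] = (regs[4] + 1) & 0xFFFF
    # jmp r4    : copy R4 into R7, the Program Counter
    regs[7] = regs[4]


regs = [0] * 8
regs[5] = 0x8001                 # hypothetical stack pointer value
memory = {0x8001: 0x0041}        # hypothetical saved address left by CALL
ret(regs, memory)
# regs[7] is now 0x0042 and regs[5] is back at 0x8000
```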
Of course all this would normally take place inside the CPU. But it would be a similar process in silicon and would likely take a similar number of cycles. In fact when I quickly checked the number of clock cycles required by my CALL and RET instructions, they seemed more or less on par with a 6502 and significantly better than a Z80. By way of quick example here are the number of clock cycles required for the return instruction for each of these three architectures:
Arch.   Mnemonic   Cycles
PISC    RET        6 (includes the Fetch cycles)
Z80     RET        10
6502    RTS        6
(The Z80 likely needs more cycles as internally it only has a 4 bit ALU).
Of course we have once again traded hardware simplicity for increased code size. But writing the Assembler source code is no more difficult, and the resulting object code is just as fast (or faster) in terms of clock cycles.
Bottom line? So long as one has sufficient memory it would seem that not having the microcode baked into silicon is not such a bad thing and it sure makes it easy to change macro-instruction definitions.
Glaring Deficiency “E” – No provision for interrupts := SOLVED
Solved by implementation of one of Brad’s own suggestions in the original article. Additional hardware allows the I/O cards in the 8 bit expansion bus to raise a single shared interrupt. On detection of the Interrupt the machine switches from using R7 as the Program Counter to R6.
This means that R6 has to be pre-loaded with the memory address of the Interrupt Service Routine before interrupts are enabled. Once the ISR (Interrupt Service Routine) has run its course you need R6 positioned back again at the start of the ISR ready for the next cycle. Then on completion we switch back to using R7 again. Interrupt priority is supported by the slot number the card is installed in.
Sounds simple? It wasn’t! In fact this was the hardest part of the entire project. At several dark moments I almost gave up in despair of ever getting it working. After all, my Monitor program was working just fine without the darn interrupts. Does a single user, non-multitasking machine really need this?
It was quite a challenge working out how to save the CPU flags. Everything, both hardware and software, needed to be spot on before it would ‘fly’. I spent much time debugging hardware when in fact the particular problem I was chasing was a software bug. We got there in the end and I don’t regret the time spent. It really is quite nice to press Ctrl-C and have the machine jump back into the BIOS (without having to poll the keyboard). The things we take for granted when using PC’s!
Glaring Deficiency “F” – Sparse coding of the ALU function := SOLVED (sort of)
I took Brad’s own suggestion of adding a single 74138 that provides eight supplementary control signals. I use the 3 bits in the control word that normally set the D-Reg (Data Bus Register) value as the selection input to the 74138. I inhibit both memory and register file read/write operations for this special “Control” function cycle.
I encode this “Control” function into the instruction by setting both MRD (Memory Read) and MWR (Memory Write) at the same time. Which would normally be quite illogical (if not damaging).
So I have not changed the actual ALU encoding as such. But I have managed to squeeze in 8 additional functions and they are:-
- Logical Shift Right
- Swap high byte for low byte
- Invert the Flags on the next cycle
- Memory Banking
- IRQ Enable/Disable
- IRQ Clear
- Arithmetic Shift Right
- Halt
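The decode can be sketched in Python as follows. The bit positions of MRD, MWR and the D-Reg field, and the assignment of select codes 0 to 7 to the functions in the order listed, are assumptions made purely for illustration; they are not taken from the real control word.

```python
MRD_BIT = 1 << 15      # hypothetical position of Memory Read
MWR_BIT = 1 << 14      # hypothetical position of Memory Write
DREG_SHIFT = 0         # hypothetical position of the 3 bit D-Reg field

# Assumed mapping of 74138 outputs to functions (list order from the text).
CONTROL_FUNCTIONS = [
    "Logical Shift Right",
    "Swap high byte for low byte",
    "Invert the Flags on the next cycle",
    "Memory Banking",
    "IRQ Enable/Disable",
    "IRQ Clear",
    "Arithmetic Shift Right",
    "Halt",
]


def decode_control(word):
    """Return the control function selected by word, or None.

    A "Control" cycle is flagged by MRD and MWR both being set, a
    combination that no ordinary instruction would use.
    """
    if (word & MRD_BIT) and (word & MWR_BIT):
        select = (word >> DREG_SHIFT) & 0b111    # the 3 inputs to the 74138
        return CONTROL_FUNCTIONS[select]
    return None                                  # ordinary instruction


halt_word = MRD_BIT | MWR_BIT | 7
decoded = decode_control(halt_word)              # "Halt" under this mapping
```

Reusing an otherwise meaningless bit combination means no extra bits are needed in the instruction word, and the 74138 only has to be enabled when both signals are active.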
Glaring Deficiency “G” – Two clocks required per micro-instruction := REMAINS
PISC runs two cycles: Fetch and Execute. In the Fetch clock cycle a fixed, hardware-encoded instruction is executed which loads the next program instruction from memory into a special Instruction Register. On the next clock cycle, the Execute cycle, the instruction held in the Instruction Register is executed.
So 50% of the time PISC is not executing your program code but a fetch instruction to get your next program instruction. Not very efficient 🙁 But it sure is simple! 🙂
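A tiny Python model makes the arithmetic visible. It only counts clocks for a straight-line run of instruction words and does not model what the instructions do; the function name `count_clocks` is made up for illustration.

```python
def count_clocks(program):
    """Count clocks for a straight-line sequence of instruction words."""
    fetch_clocks = 0
    execute_clocks = 0
    for word in program:
        instruction_register = word   # Fetch: hard-wired load of the IR
        fetch_clocks += 1
        _ = instruction_register      # Execute: the IR contents run here
        execute_clocks += 1
    return fetch_clocks, execute_clocks


# The three words the assembler generated for RET earlier in this article.
fetches, executes = count_clocks([0xACEF, 0x2080, 0x349A])
total = fetches + executes            # 3 fetches + 3 executes = 6 clocks
```

Three instructions at two clocks each gives the 6 cycles quoted for RET in the comparison table, with exactly half of them spent fetching.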
And besides, no machine of mine is going to have this “delayed branch” NOP instruction nonsense littered all throughout otherwise perfectly clean and logical code 😉