ALU
Gate-Level CPU Simulator
The ALU adds, subtracts, and performs the bitwise operations AND, OR, XOR, and NOR. Depending on the ISA, it may also handle comparison (which uses subtraction), shifts, and sometimes multiplication. At the gate level, the ALU is wide and parallel: every operation runs concurrently on the same inputs, and a multiplexer at the end selects which result actually leaves the ALU.
class ALU:
    """A 32-bit ALU. The 'op' control bits select which result the
    ALU outputs, but every result is computed in parallel; there
    is no branching at the gate level.
    """

    def __init__(self, width=32):
        self.w = width

    def __call__(self, A, B, op):
        # All possible results, all computed regardless of op:
        sum_ = ripple_carry_add(A, B)[:self.w]
        diff = ripple_subtract(A, B)
        and_ = [AND(a, b) for a, b in zip(A, B)]
        or_ = [OR(a, b) for a, b in zip(A, B)]
        xor_ = [XOR(a, b) for a, b in zip(A, B)]
        # ... etc.
        # A multiplexer (built from gates) picks the right one:
        return mux([sum_, diff, and_, or_, xor_, ...], op)

For a single-cycle CPU, the only knob you have is how much you can do in one cycle, and parallel evaluation is how you maximize it.
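The class above calls helpers (AND, OR, XOR, ripple_carry_add, ripple_subtract) that are not defined in this section. The following is a minimal sketch of what they might look like, not the simulator's actual implementation. It assumes bit lists are stored least-significant bit first and that ripple_carry_add appends the final carry, which would explain the [:self.w] slice above. NOT, full_adder, to_bits, and from_bits are hypothetical names introduced only for this illustration.

```python
# Hypothetical gate primitives: each takes and returns single bits (0 or 1).
def AND(a, b): return a & b
def OR(a, b):  return a | b
def XOR(a, b): return a ^ b
def NOT(a):    return 1 - a

def full_adder(a, b, cin):
    """One-bit full adder built only from the gates above."""
    partial = XOR(a, b)
    total = XOR(partial, cin)
    cout = OR(AND(a, b), AND(partial, cin))
    return total, cout

def ripple_carry_add(A, B, cin=0):
    """Add two equal-width bit lists (LSB first, an assumption).

    Returns the sum bits with the final carry appended, so the
    result is one bit longer than the inputs.
    """
    out, carry = [], cin
    for a, b in zip(A, B):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

def ripple_subtract(A, B):
    """A - B in two's complement: A + NOT(B) + 1, carry-out discarded."""
    not_b = [NOT(b) for b in B]
    return ripple_carry_add(A, not_b, cin=1)[:len(A)]

# Hypothetical conversion helpers for experimenting with integers.
def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

a, b = to_bits(5, 8), to_bits(3, 8)
print(from_bits(ripple_carry_add(a, b)[:8]))  # 5 + 3
print(from_bits(ripple_subtract(a, b)))       # 5 - 3
```

Subtraction reuses the adder with B inverted and the carry-in set to 1, which is also why comparison can be built on top of it.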
Why are all four operations always running?
Hardware doesn't branch the way software does. There's no if op == ADD then add() else and() at the gate level. Every cycle, every operation circuit is computing on every input, all the time. The MUX at the end picks one result and drives Y from it; the rest get computed and thrown away. Click each of the demo buttons above with A=5, B=3: the four op modules' values stay 8, 1, 7, 6 the whole time. The only thing that changes between clicks is which bus the MUX gates through.
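The A=5, B=3 behaviour can be mimicked in a few lines. This is only an integer-level sketch of the idea, not the demo's code: the op names, their ordering, and the 4-bit width are assumptions, and the list indexing stands in for the MUX.

```python
WIDTH = 4                    # assumed width, just wide enough for the demo values
MASK = (1 << WIDTH) - 1

OPS = ["ADD", "AND", "OR", "XOR"]  # hypothetical op names and ordering

def alu(a, b, op):
    """Compute every result, then select one."""
    # All four "modules" evaluate unconditionally, as the hardware does.
    results = [(a + b) & MASK, a & b, a | b, a ^ b]
    # Only this selection step depends on op; it plays the role of the MUX.
    return results[OPS.index(op)], results

for op in OPS:
    y, modules = alu(5, 3, op)
    print(f"{op:>3}: Y={y}  modules={modules}")
```

Every printed line shows the same modules list, [8, 1, 7, 6]; only Y changes with the op.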
Isn't that wasteful?
In gate count, yes. The silicon for the unused operations is occupied every cycle whether you use it or not. But the alternative, running operations one at a time, would itself need extra gates and an extra cycle to sequence them. For a single-cycle CPU, the only knob you have is "how much can we do in one cycle," and parallel evaluation is how you turn that knob up. Power-conscious designs add clock gating or operand isolation to keep unused units from switching and reclaim some power, but the gates themselves are always there.
What's inside the MUX?
More gates. A 4-to-1 multiplexer is a small AND-OR network: each operation's output bus is ANDed with an "is-this-one-selected" signal decoded from the op-control bits, and the OR of all four gated buses drives the output. Same primitives as the operations themselves: gates all the way down.
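As a rough sketch of that AND-OR network, assuming a 2-bit select and LSB-first bit lists (mux4 and its argument order are hypothetical names, not the simulator's API):

```python
# Hypothetical single-bit gate primitives.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a

def mux4(buses, s1, s0):
    """4-to-1 bus multiplexer as an AND-OR network.

    buses: four equal-width bit lists; s1, s0: select bits (s1 is the MSB).
    """
    # Decode the two select bits into four one-hot "is selected" signals.
    sel = [
        AND(NOT(s1), NOT(s0)),  # select = 00
        AND(NOT(s1), s0),       # select = 01
        AND(s1, NOT(s0)),       # select = 10
        AND(s1, s0),            # select = 11
    ]
    out = []
    for bits in zip(*buses):    # one column of bits per output position
        # Gate each bus's bit with its select signal, then OR the four together.
        gated = [AND(bit, en) for bit, en in zip(bits, sel)]
        out.append(OR(OR(gated[0], gated[1]), OR(gated[2], gated[3])))
    return out

# The four results for A=5, B=3 as 4-bit LSB-first lists: 8, 1, 7, 6.
buses = [[0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0]]
print(mux4(buses, 1, 0))  # select = 10 picks the third bus (the value 7)
```

Because exactly one select signal is 1, three of the four gated bits in each column are forced to 0 and the OR passes the remaining one through unchanged.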
About 15 AND 10 = 10 — that's a bit-mask
In binary, 15 = 1111 and 10 = 1010, so 15 AND 10 = 1010 = 10. ANDing a value with a mask keeps every bit where the mask is 1 and zeros the rest. It's the same trick a RISC instruction decoder uses to pull out individual fields (opcode, register indices, immediate values) from a packed 32-bit instruction word. The same AND module sitting in the ALU is doing real work elsewhere in the CPU.
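To make the masking trick concrete, here is a small sketch of field extraction. The text only says "RISC", so the MIPS R-type field layout used here is an assumption chosen for illustration, and field and decode_r_type are hypothetical helper names.

```python
# The bit-mask example from the text: 1111 AND 1010 = 1010.
assert 0b1111 & 0b1010 == 0b1010

def field(word, shift, width):
    """Shift the field down to bit 0, then AND with a mask of `width` ones."""
    return (word >> shift) & ((1 << width) - 1)

def decode_r_type(word):
    """Split a 32-bit word using the MIPS R-type layout (assumed here)."""
    return {
        "opcode": field(word, 26, 6),
        "rs":     field(word, 21, 5),
        "rt":     field(word, 16, 5),
        "rd":     field(word, 11, 5),
        "shamt":  field(word, 6, 5),
        "funct":  field(word, 0, 6),
    }

# 0x012A4020 encodes `add $t0, $t1, $t2` in MIPS.
print(decode_r_type(0x012A4020))
```

In hardware the shift costs nothing (it is just wiring), so the decoder's only real work is the AND with the mask.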
The multiplexer is itself just gates (a tree of ANDs and ORs fed by the control bits). Crucially: there is no if statement in hardware. All branches of the computation happen in every cycle; the control signals select which one is allowed to drive the output. This is a fundamental difference from software: the cost of "dead" computations is real silicon, not zero.