• Single cycle datapaths have only one instruction in the datapath at a time.

  • Not all stages of a datapath are used at the same time.

  • So we can do stage-by-stage pipelining. Having multiple instructions in the datapath, but at different stages, to increase throughput.

  • Now, instead of the clock frequency being limited by the critical path of slowest instruction, it’s now limited by the critical path of the slowest stage.

  • 1 stage per clock cycle.

  • Split the datapath and add registers between each stage to hold a signal until next clock cycle.

Control Signals

  • Multiple instructions in the datapath at the same time, can’t just generate and forget.
  • Eg: the control inputs into EX will be for instruction 1 vs the control inputs into regfile may be for instr 2.

Two approaches:

  • Generate the control signals at each stage.
  • Generate the control signal during instruction decode and forward them along with the decoded instruction. ==<-- we do this==

List of changes

  • dest_addr input into the regfile now comes from the rd field from the WB stage.
    • By the time an instruction’s WB stage comes along, some other instruction will be in the IF/ID stage.
    • The regfile can read and write from different addresses in a single clock cycle.
  • MX Forwarding: alu output directly into ALU input.
  • WX forwarding: DMEM output directly into ALU input.
  • WM forwarding: DMEM output directly into DMEM input.
  • Instruction scheduling for replacing nops when loading.
  • Branch prediction (naive for now).

Pipelining Hazards

Structural Hazard

  • A required resource is busy and needed in multiple stages.
  • Solutions:
    • Instructions take turns/wait to use it.
    • Add more hardware.
    • Design ISA to prevent hazards - RISC-V way.

Regfile

  • regfile in riscv has separate ports for reading multiple registers.
  • Hypothetical issue: what if an instruction needed to write to two registers?

Memory

  • IMEM and DMEM will almost certainly need to be accessed together. While accessed separately, they are the same physical memory.
  • Solutions: stall or abstractions (caches) to make it such that they both appear/function more like separate mems.

Data Hazard

Issue: sub reads regfile at t = 3 and add writes to the regfile at t = 5.

Stalling

  • The following instruction so that it reads from regfile after WB of the instr it depends on.
  • This is done by adding nops1 to the code. Instructions where nothing happens.
  • The delay is cascaded to all following instructions.

MX Forwarding

  • Instead of waiting for WB and then reading from the regfile, we can try reading from the stage the value is calculated.
  • The earliest stage where we know the value of s0 is EX (T = 3).
  • We can add hardware to forward the value directly to sub’s EX (T = 4).
  • This is called MX forwarding (M EX). Changes:
  • More wires.
  • Wider muxes, since we have more inputs to select from.
  • We need extra logic to check if we need forwarding.

Forwarding Condition

  • Forwarding is done when one of the source registers of an instruction is the destination register of the previous instruction.
  • We also ignore writes to x0.

WX Forwarding

  • WB is too far along anyway.
  • But, t0 receives the value after M stage (T = 4) and add needs t0 at T = 4 for EX.
  • Even with forwarding we need to stall for one cycle.

Scheduling

  • Instead of inserting nops to stall instructions.
  • We can intelligently rearrange instructions such that some instructions which don’t rely/cause hazards can be placed after hazard-causing instructions.
  • So we still get work done in those dead cycles instead of nop.
  • Here each of the load instruction would require a stall after it (along with WX forwarding).
  • Instead of inserting a nop after each one, we see that the relevant registers do not cross-interact. The second lw doesn’t depend on updated registers.
  • So we move the second load to right below the first load. The second load replaces the first nop and the add replaces the second nop.

Control Hazard

  • Branch instruction determine the control flow of the program.
  • The next instruction depends on whether the branch was taken or not.
  • We get the result for the next PC after the EX (3rd) stage. By this time, two instructions have already entered the pipeline.
  • If the branch is taken, we have to flush the pipeline by conventing those instructions to nops.
  • We can try branch prediction! Naive prediction: branch is not taken.
  • More depth in upper years.

Footnotes

  1. nop <=> addi x0, x0, 0