soc.scoreboard package

Submodules

soc.scoreboard.addr_match module

Load / Store partial address matcher

Related bugreports:

  • http://bugs.libre-riscv.org/show_bug.cgi?id=216

Loads and Stores do not need a full match (CAM); they need “good enough” avoidance. Around 11 bits of a 64-bit address is “good enough”.

The simplest way to use this module is to ignore not only the top bits but also the bottom bits: in this case (this RV64 processor), enough of them to cover a DWORD (64-bit). That means ignoring the bottom 4 bits, because a 64-bit LD/ST may be misaligned.

To reiterate: this module is an optimisation. All it has to do is catch every case that is definitely a match (by checking 11 bits or so). If a few opportunities for parallel LD/STs are missed because the top (or bottom) bits weren’t checked, so what: all that happens is that the falsely-matched addresses are LD/STed in separate single cycles. Big deal.
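As a rough behavioural illustration, the matching rule above can be modelled in a few lines of plain Python. This is not the module's nmigen code. The function name `partial_match` and its defaults (ignore the bottom 4 bits, compare the next 11) are assumptions chosen to mirror the prose, not parameters of the real classes.

```python
def partial_match(addr_a: int, addr_b: int, lsb_ignore: int = 4, bitwid: int = 11) -> bool:
    """Hypothetical model: treat two addresses as a possible clash if the
    `bitwid` bits just above the ignored LSBs are equal. Top bits are ignored."""
    mask = (1 << bitwid) - 1
    return ((addr_a >> lsb_ignore) & mask) == ((addr_b >> lsb_ignore) & mask)


# same 16-byte block: flagged as a clash
print(partial_match(0x1000, 0x100F))               # True
# differ only above the compared window: a false positive, which is safe
# (the two LD/STs are merely serialised)
print(partial_match(0x1000, 0x1000 + (1 << 15)))   # True
# differ inside the compared window: no clash, may proceed in parallel
print(partial_match(0x1000, 0x1010))               # False
```

The second case is the "so what" case from the text: a missed opportunity for parallelism, never a correctness problem.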

However, if we wanted to enhance this algorithm (without using a CAM and without using expensive comparators), probably the best way to do so would be to turn the bottom 4 address bits (which span 16 bytes) into a 16-bit byte-level bitmap. A LD/ST on a byte would have 1 of the 16 bits set; a LD/ST on a DWORD would have 8 of the 16 bits set (offset if the LD/ST was misaligned). TODO.

Notes:

> I have used bits <11:6> as they are not translated (4KB pages)
> and larger than a cache line (64 bytes).
> I have used bits <11:4> when the L1 cache was QuadW sized and
> the L2 cache was Line sized.

class soc.scoreboard.addr_match.LenExpand(bit_len, cover=1)

Bases: nmigen.hdl.ir.Elaboratable

LenExpand: expands binary length (and LSBs of an address) into unary

this basically produces a bitmap of which bytes are to be read (written) in memory. examples:

(bit_len=4) len=4, addr=0b0011 => 0b1111 << addr
=> 0b1111000
(bit_len=4) len=8, addr=0b0101 => 0b11111111 << addr
=> 0b1111111100000

note: by setting cover=8, this can also be used as a shift-mask: the bit-mask is replicated (expanded out), with each bit expanded to “cover” bits.
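A small behavioural model reproduces the examples above. This is a sketch in plain Python, not the module's nmigen code. The function name `len_expand` is hypothetical, and the output width is left unbounded instead of being limited by bit_len as the hardware would be.

```python
def len_expand(length: int, addr_lsbs: int, cover: int = 1) -> int:
    """Hypothetical behavioural model of LenExpand (plain Python, not nmigen).

    Builds a unary mask of `length` ones, shifted left by the address LSBs.
    With cover > 1, every bit of that mask is replicated `cover` times,
    e.g. cover=8 turns a byte mask into a bit-level shift-mask.
    """
    mask = ((1 << length) - 1) << addr_lsbs
    if cover == 1:
        return mask
    out = 0
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # each set bit becomes `cover` consecutive set bits
            out |= ((1 << cover) - 1) << (bit * cover)
    return out


print(bin(len_expand(4, 0b0011)))       # 0b1111000
print(bin(len_expand(8, 0b0101)))       # 0b1111111100000
print(hex(len_expand(1, 1, cover=8)))   # 0xff00
```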

elaborate(platform)
llen(cover)
ports()
class soc.scoreboard.addr_match.PartialAddrBitmap(n_adr, lsbwid, bitlen)

Bases: soc.scoreboard.addr_match.PartialAddrMatch

PartialAddrBitmap

makes two comparisons for each address, with each (addr,len) being extended to a unary byte-map.

two comparisons are needed because when an address is misaligned, the byte-map is split into two halves. example:

address = 0b1011011, len=8 => 0b101 and shift of 11 (0b1011)
len in unary is 0b0000 0000 1111 1111, which, when shifted, becomes TWO addresses:
  • 0b101 and a byte-map of 0b1111 1000 0000 0000 (len-mask shifted by 11)
  • 0b101+1 and a byte-map of 0b0000 0000 0000 0111 (overlaps onto next 16)

therefore, because this now covers two addresses, we need two comparisons per address not one.
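The worked example above can be checked with a small Python model. `split_addr` and `bitmap_clash` are hypothetical names, and lsbwid=4 (16-byte rows) is assumed from the example. The clash check compares every half against every half. That is a simplification and may not mirror the comparator arrangement inside PartialAddrBitmap.

```python
def split_addr(addr: int, length: int, lsbwid: int = 4):
    """Hypothetical model: split (addr, len) into two (row, bytemap) pairs."""
    width = 1 << lsbwid                   # bytes per row (16 for lsbwid=4)
    row = addr >> lsbwid                  # upper address bits
    shift = addr & (width - 1)            # byte offset within the row
    full = ((1 << length) - 1) << shift   # unary byte-map, may exceed `width` bits
    lo = full & ((1 << width) - 1)        # part that falls in this row
    hi = full >> width                    # part that spills into row + 1
    return (row, lo), (row + 1, hi)


def bitmap_clash(a, b) -> bool:
    """Two LD/STs may clash if any halves share a row and overlap by a byte."""
    return any(ra == rb and (ma & mb) != 0 for ra, ma in a for rb, mb in b)


a = split_addr(0b1011011, 8)
print([(bin(r), bin(m)) for r, m in a])
# [('0b101', '0b1111100000000000'), ('0b110', '0b111')]
print(bitmap_clash(a, split_addr(0b1100000, 1)))   # True: byte 0 of row 0b110
print(bitmap_clash(a, split_addr(0b1100100, 1)))   # False: byte 4 of row 0b110
```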

elaborate(platform)
is_match(i, j)
ports()
class soc.scoreboard.addr_match.PartialAddrMatch(n_adr, bitwid)

Bases: nmigen.hdl.ir.Elaboratable

A partial address matcher

elaborate(platform)
is_match(i, j)
ports()
class soc.scoreboard.addr_match.TwinPartialAddrBitmap(n_adr, lsbwid, bitlen)

Bases: soc.scoreboard.addr_match.PartialAddrMatch

TwinPartialAddrBitmap

designed to be fed from LDSTSplitter, which generates pairs of addresses and handles misalignment across cache line boundaries in the splitter itself. LDSTSplitter also takes care of expanding the LSBs of each address into a bitmap.

the key difference between this and PartialAddrBitmap is the knowledge that the pair of addresses from the same LDSTSplitter are 1 apart, so comparing those two addresses against each other is guaranteed to be a miss. is_match therefore takes that into account specially.
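The special-casing in is_match can be sketched as follows. This sketch assumes, hypothetically, that the two halves produced by one LDSTSplitter occupy adjacent slots 2k and 2k+1. The real slot layout in TwinPartialAddrBitmap may differ, and `twin_is_match` and `raw_match` are illustrative names only.

```python
from typing import Callable


def twin_is_match(i: int, j: int, raw_match: Callable[[int, int], bool]) -> bool:
    """Hypothetical model of TwinPartialAddrBitmap.is_match.

    raw_match(i, j) stands in for the ordinary bitmap comparison.
    Slots 2k and 2k+1 are assumed to be the pair from one LDSTSplitter:
    they are known to be one row apart, so they can never clash.
    """
    if i == j:              # an address never clashes with itself
        return False
    if i // 2 == j // 2:    # twin from the same splitter: guaranteed miss
        return False
    return raw_match(i, j)


always = lambda i, j: True   # pretend every raw comparison hits
print(twin_is_match(0, 1, always))   # False: twins
print(twin_is_match(1, 2, always))   # True: different splitters
```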

elaborate(platform)
is_match(i, j)
ports()
soc.scoreboard.addr_match.part_addr_bit(dut)
soc.scoreboard.addr_match.part_addr_byte(dut)
soc.scoreboard.addr_match.part_addr_sim(dut)
soc.scoreboard.addr_match.test_lenexpand_byte()
soc.scoreboard.addr_match.test_part_addr()

soc.scoreboard.addr_split module

Links:

  • https://libre-riscv.org/3d_gpu/architecture/6600scoreboard/
  • http://bugs.libre-riscv.org/show_bug.cgi?id=257
  • http://bugs.libre-riscv.org/show_bug.cgi?id=216

class soc.scoreboard.addr_split.LDData(dwidth, name=None)

Bases: nmigen.hdl.rec.Record

class soc.scoreboard.addr_split.LDLatch(dwidth, awidth, mlen)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
ports()
class soc.scoreboard.addr_split.LDSTSplitter(dwidth, awidth, dlen, pi=None)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
ports()
soc.scoreboard.addr_split.byteExpand(signal)
soc.scoreboard.addr_split.sim(dut)

soc.scoreboard.dependence_cell module

class soc.scoreboard.dependence_cell.DependencyRow(n_reg, n_src, cancel_mode=False)

Bases: nmigen.hdl.ir.Elaboratable

implements 11.4.7 mitch alsup dependence cell, p27, adjusted to be clock-sync’d on the rising edge only. mitch’s design (like the 6600) requires an alternating rising/falling clock.

  • SET mode: issue_i HI, go_i LO, reg_i HI - register is captured
      • FWD is DISABLED (~issue_i)
      • RSEL DISABLED
  • QRY mode: issue_i LO, go_i LO, haz_i HI - FWD is ASSERTED
      • reg_i HI - ignored
  • GO mode: issue_i LO, go_i HI - RSEL is ASSERTED
      • haz_i HI - FWD still can be ASSERTED

FWD assertion (hazard protection) therefore still occurs in both Query and Go modes for this cycle, due to the cq register.

GO mode works for one cycle, again due to the cq register capturing the latch output. Without the cq register, the SR Latch (which is asynchronous) would be reset at the exact moment that GO was requested, and the RSEL would be garbage.
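The timing argument can be illustrated with a cycle-level sketch of a single cell bit. This is a simplified plain-Python model, not the DependencyRow code: one register, one source, and the signal names follow the mode list above. The `rsel_without_cq` column shows what RSEL would be if it were taken straight from the asynchronous latch.

```python
def simulate_dep_cell(cycles):
    """Hypothetical cycle-level model of one dependence cell bit.

    cycles: list of (issue_i, go_i, reg_i, haz_i) tuples, one per clock.
    Returns a list of (fwd_o, rsel_o, rsel_without_cq) per cycle.
    """
    latch = 0  # asynchronous SR latch: set by issue_i & reg_i, reset by go_i
    cq = 0     # copy of the latch captured on the previous rising edge
    trace = []
    for issue_i, go_i, reg_i, haz_i in cycles:
        if issue_i and reg_i:
            latch = 1              # SET mode: register captured
        if go_i:
            latch = 0              # GO resets the latch immediately
        fwd_o = int(bool(cq and haz_i and not issue_i))
        rsel_o = int(bool(cq and go_i))
        rsel_without_cq = int(bool(latch and go_i))  # already reset: garbage
        trace.append((fwd_o, rsel_o, rsel_without_cq))
        cq = latch                 # rising clock edge
    return trace


SET, QRY, GO, AFTER = (1, 0, 1, 0), (0, 0, 0, 1), (0, 1, 0, 1), (0, 0, 0, 1)
print(simulate_dep_cell([SET, QRY, GO, AFTER]))
# [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 0)]
```

FWD is asserted in both the Query and Go cycles. RSEL fires for exactly one cycle, and without cq it would never fire at all.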

elaborate(platform)
ports()
soc.scoreboard.dependence_cell.dcell_sim(dut)
soc.scoreboard.dependence_cell.test_dcell()

soc.scoreboard.fn_unit module

soc.scoreboard.fu_dep_cell module

class soc.scoreboard.fu_dep_cell.FUDependenceCell(dummy, n_fu=1)

Bases: nmigen.hdl.ir.Elaboratable

implements 11.4.7 mitch alsup dependence cell, p27

elaborate(platform)
ports()
soc.scoreboard.fu_dep_cell.dcell_sim(dut)
soc.scoreboard.fu_dep_cell.test_dcell()

soc.scoreboard.fu_fu_matrix module

class soc.scoreboard.fu_fu_matrix.FUFUDepMatrix(n_fu_row, n_fu_col)

Bases: nmigen.hdl.ir.Elaboratable

implements 11.4.7 mitch alsup FU-to-FU Dependency Matrix, p26

elaborate(platform)
ports()
soc.scoreboard.fu_fu_matrix.d_matrix_sim(dut)

XXX TODO

soc.scoreboard.fu_fu_matrix.test_fu_fu_matrix()

soc.scoreboard.fu_mem_matrix module

class soc.scoreboard.fu_mem_matrix.FUMemDepMatrix(n_fu_row, n_fu_col)

Bases: nmigen.hdl.ir.Elaboratable

implements FU-to-FU Memory Dependency Matrix

elaborate(platform)
ports()
soc.scoreboard.fu_mem_matrix.d_matrix_sim(dut)

XXX TODO

soc.scoreboard.fu_mem_matrix.test_fu_fu_matrix()

soc.scoreboard.fu_mem_picker_vec module

class soc.scoreboard.fu_mem_picker_vec.FUMem_Pick_Vec(fu_row_n)

Bases: nmigen.hdl.ir.Elaboratable

these are allocated per-FU (horizontally), and are of length fu_row_n

elaborate(platform)

soc.scoreboard.fu_picker_vec module

class soc.scoreboard.fu_picker_vec.FU_Pick_Vec(fu_row_n)

Bases: nmigen.hdl.ir.Elaboratable

these are allocated per-FU (horizontally), and are of length fu_row_n

elaborate(platform)

soc.scoreboard.fu_reg_matrix module

class soc.scoreboard.fu_reg_matrix.FURegDepMatrix(n_fu_row, n_reg_col, n_src, cancel=None)

Bases: nmigen.hdl.ir.Elaboratable

implements 11.4.7 mitch alsup FU-to-Reg Dependency Matrix, p26

elaborate(platform)
ports()
soc.scoreboard.fu_reg_matrix.d_matrix_sim(dut)

XXX TODO

soc.scoreboard.fu_reg_matrix.test_d_matrix()

soc.scoreboard.fu_wr_pending module

class soc.scoreboard.fu_wr_pending.FU_RW_Pend(reg_count, n_src)

Bases: nmigen.hdl.ir.Elaboratable

these are allocated per-FU (horizontally), and are of length reg_count

elaborate(platform)

soc.scoreboard.fumem_dep_cell module

class soc.scoreboard.fumem_dep_cell.FUMemDependenceCell(dummy, n_fu=1)

Bases: nmigen.hdl.ir.Elaboratable

implements 11.4.7 mitch alsup dependence cell, p27

elaborate(platform)
ports()
soc.scoreboard.fumem_dep_cell.dcell_sim(dut)
soc.scoreboard.fumem_dep_cell.test_dcell()

soc.scoreboard.global_pending module

class soc.scoreboard.global_pending.GlobalPending(dep, fu_vecs, sync=False)

Bases: nmigen.hdl.ir.Elaboratable

implements Global Pending Vector, basically ORs all incoming Function Unit vectors together. Can be used for creating Read or Write Global Pending. Can be used for INT or FP Global Pending.

Inputs:

  • dep: register file depth
  • fu_vecs: a Python list of function unit “pending” vectors, each vector being a Signal whose width equals the register file depth.

Notes:

  • the regfile may be Int or FP; this code doesn’t care which. Obviously, do not put a mixture of regfiles into fu_vecs.
  • this code also doesn’t care whether it’s used for Read Pending or Write Pending; it can be used for both. Again, obviously, do not put a mixture of read and write pending vectors in.
  • if some Function Units happen not to be uniform (i.e. they don’t operate on a particular register, which is extremely unusual), they must set a Const zero bit in the vector.
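
Functionally, the Global Pending Vector is a wide OR. The plain-Python sketch below is illustrative only. It models the combinatorial case and ignores the `sync` option, which presumably registers the output.

```python
from functools import reduce
from operator import or_
from typing import Iterable


def global_pending(dep: int, fu_vecs: Iterable[int]) -> int:
    """Behavioural sketch of GlobalPending: OR all per-FU vectors together.

    Each vector is `dep` bits wide (one bit per register). A Function Unit that
    never uses a given register simply has a constant 0 in that position.
    """
    mask = (1 << dep) - 1
    return reduce(or_, (v & mask for v in fu_vecs), 0)


# three FUs, 8-entry register file
print(bin(global_pending(8, [0b00000101, 0b00010000, 0b00000100])))  # 0b10101
```
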
elaborate(platform)
ports()
soc.scoreboard.global_pending.g_vec_sim(dut)
soc.scoreboard.global_pending.test_g_vec()

soc.scoreboard.group_picker module

Group Picker

Its purpose is to select an instruction that is permitted to read (or write), based on the Function Unit expressing a desire to read (or write).

The job of the Group Picker is extremely simple yet extremely important. It sits in front of a register file port (read or write) and stops it from being corrupted. It’s a “port contention selector”, basically.

The way it works is:

  • Function Units need to read from (or write to) the register file, in order to get (or store) their operands, so they each have a signal, readable (or writable), which “expresses” this need. This is a unary encoding.
  • The Function Units also have a signal which indicates that they are requesting “release” of the register file port (this is because in the scoreboard, readable/writable can be permanently HI even if the FU is idle, whereas the “release” signal is very specifically only HI if the read (or write) latch is still active).
  • The Group Picker takes this unary encoding of the desire to read (or write) and, on a priority basis, activates one and only one of those signals, again as a unary output.
  • Due to the way that the Computation Unit works, that signal (Go_Read or Go_Write) will fire for one (and only one) cycle, and can be used to enable the register file port read (or write) lines. The Go_Read/Wr signal basically loops back to the Computation Unit and resets the “desire-to-read/write-expressing” latch.
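The picking step can be sketched in plain Python as "AND the two unary vectors, then isolate one bit". This is a behavioural sketch, not the GroupPicker code. Two assumptions are made: the readable and release vectors are ANDed, and the lowest-numbered Function Unit has the highest priority.

```python
def group_pick(readable: int, rel_req: int) -> int:
    """Hypothetical model of one Group Picker port.

    readable: unary vector, bit i set if FU i wants the port.
    rel_req:  unary vector, bit i set if FU i's read/write latch is active.
    Returns a unary vector with at most one bit set (Go_Read or Go_Write).
    """
    want = readable & rel_req
    return want & -want   # isolate the lowest set bit (0 if nothing wanted)


print(bin(group_pick(0b1111, 0b1010)))   # 0b10: FU 1 wins over FU 3
print(bin(group_pick(0b0110, 0b1001)))   # 0b0: nobody both wants and is active
```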

In theory (and in practice!) the following is possible:

  • Separate src1 and src2 Group Pickers. This would stop instructions that have only one operand to read from blocking other instructions, and would also allow 3-operand instructions to be interleaved with 1- and 2-operand instructions.
  • Multiple Group Pickers (multi-issue). This would require a corresponding increase in the number of register file ports, either 4R2W (or more), or “striping” of the register file into split banks (a strategy best deployed on Vector Processors).
class soc.scoreboard.group_picker.GroupPicker(wid, n_src, n_dst)

Bases: nmigen.hdl.ir.Elaboratable

implements 10.5 mitch alsup group picker, p27

elaborate(platform)
ports()
soc.scoreboard.group_picker.grp_pick_sim(dut)
soc.scoreboard.group_picker.test_grp_pick()

soc.scoreboard.instruction_q module

class soc.scoreboard.instruction_q.Instruction(name=None, asmcode=True, opkls=None, do=None, regreduce_en=False)

Bases: openpower.decoder.decode2execute1.Decode2ToExecute1Type

class soc.scoreboard.instruction_q.InstructionQ(wid, opwid, iqlen, n_in, n_out)

Bases: nmigen.hdl.ir.Elaboratable

contains a queue of (part-decoded) instructions.

output is copied combinatorially from the front of the queue, for easy access on the clock cycle. Only “n_out” instructions are made available this way.

input and shifting occur on sync.
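The queue behaviour can be sketched as a small Python class. This is a hypothetical model, not the nmigen code. It assumes the combinatorial view at the front is n_out entries wide, up to n_in new entries are accepted per clock, and the consumer reports how many entries it took (`n_taken`) in that cycle.

```python
from collections import deque


class InstructionQModel:
    """Hypothetical behavioural model of InstructionQ.

    peek():  combinatorial view of up to n_out instructions at the front.
    clock(): the synchronous step; shifts out the n_taken consumed entries,
             then appends up to n_in new ones (space permitting).
    """

    def __init__(self, iqlen: int, n_in: int, n_out: int):
        self.iqlen, self.n_in, self.n_out = iqlen, n_in, n_out
        self.q = deque()

    def peek(self):
        return list(self.q)[: self.n_out]

    def clock(self, new_insns, n_taken: int) -> int:
        for _ in range(min(n_taken, len(self.q))):
            self.q.popleft()          # shift out what was consumed
        space = self.iqlen - len(self.q)
        accepted = list(new_insns)[: min(self.n_in, space)]
        self.q.extend(accepted)
        return len(accepted)          # how many inputs were accepted


iq = InstructionQModel(iqlen=8, n_in=3, n_out=2)
iq.clock(["add", "sub", "ld"], n_taken=0)
print(iq.peek())   # ['add', 'sub']
iq.clock(["st"], n_taken=2)
print(iq.peek())   # ['ld', 'st']
```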

elaborate(platform)
ports()
soc.scoreboard.instruction_q.instruction_q_sim(dut)
soc.scoreboard.instruction_q.test_instruction_q()

soc.scoreboard.issue_unit module

class soc.scoreboard.issue_unit.IntFPIssueUnit(n_int_insns, n_fp_insns)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
ports()
class soc.scoreboard.issue_unit.IssueUnit(n_insns)

Bases: nmigen.hdl.ir.Elaboratable

implements 11.4.14 issue unit, p50

Inputs

  • n_insns: number of instructions in this issue unit.
elaborate(platform)
ports()
class soc.scoreboard.issue_unit.IssueUnitArray(units)

Bases: nmigen.hdl.ir.Elaboratable

Convenience module that amalgamates the issue and busy signals

unit issue_i is to be set externally, at the same time as the ALU group oper_i

elaborate(platform)
ports()
class soc.scoreboard.issue_unit.IssueUnitGroup(n_insns)

Bases: nmigen.hdl.ir.Elaboratable

Manages a batch of Computation Units all of which can do the same task

A priority picker will allocate one instruction in this cycle based on whether the others are busy.

insn_i indicates to this module that there is an instruction to be issued which this group can handle

busy_i is a vector of signals that indicate, in this cycle, which of the units are currently busy.

busy_o indicates whether it is “safe to proceed” i.e. whether there is a unit here that can be issued an instruction

fn_issue_o indicates, out of the available (non-busy) units, which one may be selected
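The priority-picking behaviour can be sketched in plain Python. This is a hypothetical model, not the IssueUnitGroup code. Two assumptions are made: busy_o is modelled as "every unit is busy" (i.e. not safe to proceed), and fn_issue_o selects the lowest-numbered free unit only while insn_i is asserted.

```python
def issue_unit_group(insn_i: bool, busy_i: int, n_units: int):
    """Hypothetical model of IssueUnitGroup's priority picking.

    busy_i: bit k set if Computation Unit k is busy this cycle.
    Returns (busy_o, fn_issue_o), where fn_issue_o is a unary vector.
    """
    free = ~busy_i & ((1 << n_units) - 1)
    busy_o = free == 0                       # no unit can accept an instruction
    fn_issue_o = (free & -free) if insn_i else 0
    return busy_o, fn_issue_o


print(issue_unit_group(True, 0b0101, 4))    # (False, 2): unit 1 selected
print(issue_unit_group(True, 0b1111, 4))    # (True, 0): all busy
```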

elaborate(platform)
ports()
class soc.scoreboard.issue_unit.RegDecode(wid)

Bases: nmigen.hdl.ir.Elaboratable

decodes registers into unary

Inputs

  • wid: register file width
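
Decoding a register number "into unary" means producing a one-hot vector of width wid. A minimal plain-Python sketch of that mapping follows. The name `reg_decode` is hypothetical, and the real module's ports are not modelled.

```python
def reg_decode(regnum: int, wid: int) -> int:
    """Hypothetical model of RegDecode: binary register number -> one-hot."""
    if not 0 <= regnum < wid:
        raise ValueError("register number out of range")
    return 1 << regnum


print(bin(reg_decode(5, 32)))   # 0b100000
```
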
elaborate(platform)
ports()
soc.scoreboard.issue_unit.issue_unit_sim(dut)
soc.scoreboard.issue_unit.test_issue_unit()

soc.scoreboard.ldst_dep_cell module

Mitch Alsup 6600-style LD/ST scoreboard Dependency Cell

Relevant bugreports:

class soc.scoreboard.ldst_dep_cell.LDSTDepCell(n_ls=1)

Bases: nmigen.hdl.ir.Elaboratable

implements 11.4.12 mitch alsup load/store dependence cell, p45

elaborate(platform)
ports()
soc.scoreboard.ldst_dep_cell.dcell_sim(dut)
soc.scoreboard.ldst_dep_cell.test_dcell()

soc.scoreboard.ldst_matrix module

Mitch Alsup 6600-style LD/ST Memory Scoreboard Matrix (sparse vector)

6600 LD/ST Dependency Table Matrix inputs / outputs

Relevant comments (p45-46):

  • If there are no WAR dependencies on a Load instruction with a computed address, it can assert Bank_Addressable and Translate_Addressable.
  • If there are no RAW dependencies on a Store instruction with both a write permission and store data present, it can assert Bank_Addressable.
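
The two quoted rules can be restated as a small boolean function, purely as an illustration. The function and argument names are hypothetical. The quoted text gives Translate_Addressable only for Loads, so the sketch returns None for Stores rather than guessing.

```python
from typing import Optional, Tuple


def ldst_addressable(is_load: bool, addr_computed: bool, war_dep: bool,
                     raw_dep: bool, write_perm: bool,
                     data_present: bool) -> Tuple[bool, Optional[bool]]:
    """Hypothetical restatement of the p45-46 rules quoted above.

    Returns (bank_addressable, translate_addressable).
    """
    if is_load:
        ok = addr_computed and not war_dep
        return ok, ok
    # Store: Translate_Addressable is not covered by the quoted rules
    return (write_perm and data_present and not raw_dep), None


print(ldst_addressable(True, True, False, False, False, False))   # (True, True)
print(ldst_addressable(False, True, False, False, True, False))   # (False, None)
```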

Relevant bugreports:

Notes:

  • Load Hit (or Store Hit with Data) is asserted by the LD/ST Computation Unit when it has data and address ready.
  • Asserting ld_hit_i (or stwd_hit_i) requires that the output be captured, or at least taken into consideration for the next LD/STs, right then. Failure to observe the xx_hold_xx_o outputs will result in data corruption, as they are only asserted if xx_hit_i is asserted.
  • The hold signals still have to go through “maybe address clashes” detection, they cannot just be used as-is to stop a LD/ST.
class soc.scoreboard.ldst_matrix.LDSTDepMatrix(n_ldst)

Bases: nmigen.hdl.ir.Elaboratable

implements 11.4.12 mitch alsup LD/ST Dependency Matrix, p46. It is actually a sparse matrix along the diagonal.

load-hold-store and store-hold-load accumulate in a priority-picking fashion, ORed together. The OR gate from the dependency cell is here.

elaborate(platform)
ports()
soc.scoreboard.ldst_matrix.d_matrix_sim(dut)

XXX TODO

soc.scoreboard.ldst_matrix.test_d_matrix()

soc.scoreboard.mdm module

class soc.scoreboard.mdm.FUMemMatchMatrix(n_fu, addrbitwid)

Bases: soc.scoreboard.fu_reg_matrix.FURegDepMatrix, soc.scoreboard.addr_match.PartialAddrMatch

implement a FU-Regs overload with memory-address matching

elaborate(platform)

soc.scoreboard.mem_dependence_cell module

class soc.scoreboard.mem_dependence_cell.MemDepRow(n_reg)

Bases: nmigen.hdl.ir.Elaboratable

implements 1st phase Memory Dependency cell

elaborate(platform)
ports()
soc.scoreboard.mem_dependence_cell.dcell_sim(dut)
soc.scoreboard.mem_dependence_cell.test_dcell()

soc.scoreboard.mem_fu_matrix module

class soc.scoreboard.mem_fu_matrix.MemFUDepMatrix(n_fu_row, n_reg_col)

Bases: nmigen.hdl.ir.Elaboratable

implements 1st phase Memory-to-FU Dependency Matrix

elaborate(platform)
ports()
soc.scoreboard.mem_fu_matrix.d_matrix_sim(dut)

XXX TODO

soc.scoreboard.mem_fu_matrix.test_d_matrix()

soc.scoreboard.mem_fu_pending module

class soc.scoreboard.mem_fu_pending.MemFU_Pend(reg_count)

Bases: nmigen.hdl.ir.Elaboratable

these are allocated per-FU (horizontally), and are of length reg_count

elaborate(platform)

soc.scoreboard.mem_select module

class soc.scoreboard.mem_select.Mem_Rsv(fu_count)

Bases: nmigen.hdl.ir.Elaboratable

these are allocated per-Register (vertically), and are each of length fu_count

elaborate(platform)

soc.scoreboard.memfu module

class soc.scoreboard.memfu.MemFunctionUnits(n_ldsts, addrbitwid)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
ports()

soc.scoreboard.old_score6600 module

soc.scoreboard.reg_select module

class soc.scoreboard.reg_select.Reg_Rsv(fu_count, n_src)

Bases: nmigen.hdl.ir.Elaboratable

these are allocated per-Register (vertically), and are each of length fu_count

elaborate(platform)

soc.scoreboard.shadow module

class soc.scoreboard.shadow.BranchSpeculationRecord(n_fus)

Bases: nmigen.hdl.ir.Elaboratable

A record of which function units will be cancelled and which allowed to proceed, on a branch.

Whilst the input is a pair that says whether the instruction is under the “success” branch shadow (good_i) or the “fail” shadow (fail_i path), when the branch result is known, the “good” path must be cancelled if “fail” occurred, and the “fail” path cancelled if “good” occurred.

therefore, use “good|~fail” and “fail|~good” respectively as output.

elaborate(platform)
ports()
class soc.scoreboard.shadow.ShadowMatrix(n_fus, shadow_wid=0, syncreset=False)

Bases: nmigen.hdl.ir.Elaboratable

Matrix of Shadow Functions. One per FU.

Inputs:

  • n_fus: number of Function Units (one Shadow Function per FU)
  • shadow_wid: number of shadow/fail/good/go_die sets

Notes:

  • Shadow enable/fail/good are all connected to all Shadow Functions (incoming at the top)

  • Output is an array of “shadow active” (schroedinger wires: neither alive nor dead) and an array of “go die” signals, one per FU.

  • the shadown output must be connected to the Computation Unit’s write release request, ANDed with it to prevent it from firing (and thus preventing Writable; this, by the way, is the whole point of having the Shadow Matrix…)

  • go_die_o must be connected to both the Computation Unit’s src-operand and result-operand latch resets, causing both of them to reset.

  • go_die_o also needs to be wired into the Dependency and Function Unit Matrices by way of over-enabling (ORing) into Go_Read and Go_Write, resetting every cell that is required to “die”
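The behaviour of a single FU's shadow function can be sketched in plain Python. This is a hypothetical model, not the ShadowFn/ShadowMatrix code, and it makes three assumptions. First, "shadown" is taken to be active-low (HI when the FU is *not* under any shadow). That fits it being ANDed into the write release request, and it fits it being a Const when shadow_wid = 0. Second, each shadow set is captured at issue. Third, either good or fail releases that shadow.

```python
class ShadowFnModel:
    """Hypothetical model of one FU's shadow function with `shadow_wid` sets."""

    def __init__(self, shadow_wid: int):
        self.active = [False] * shadow_wid   # one latch per shadow set

    def issue(self, shadow_i):
        # at issue, capture which shadows this instruction sits under
        for k, s in enumerate(shadow_i):
            self.active[k] = self.active[k] or bool(s)

    def resolve(self, good_i, fail_i) -> bool:
        """Returns go_die: True if any active shadow has failed."""
        go_die = any(a and f for a, f in zip(self.active, fail_i))
        # success or failure both release the corresponding shadow
        self.active = [a and not g and not f
                       for a, g, f in zip(self.active, good_i, fail_i)]
        return go_die

    @property
    def shadown(self) -> bool:
        # assumed active-low: True means "not shadowed", so write release may fire
        return not any(self.active)


fn = ShadowFnModel(2)
fn.issue([1, 0])                                  # under shadow 0 (e.g. a branch)
print(fn.shadown)                                 # False: write release blocked
print(fn.resolve(good_i=[0, 0], fail_i=[1, 0]))   # True: go_die
print(fn.shadown)                                 # True: shadow released
```

With shadow_wid = 0 the model has no latches, so shadown is permanently True and go_die never fires, matching the "do nothing" note for ShadowFn.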

elaborate(platform)
ports()
class soc.scoreboard.shadow.WaWGrid(n_fus, shadow_wid)

Bases: nmigen.hdl.ir.Elaboratable

An NxM grid-selector which raises a 2D bit selected by N and M

elaborate(platform)
soc.scoreboard.shadow.shadow_sim(dut)
soc.scoreboard.shadow.test_shadow()

soc.scoreboard.shadow_fn module

class soc.scoreboard.shadow_fn.ShadowFn(slen, syncreset=False)

Bases: nmigen.hdl.ir.Elaboratable

implements shadowing 11.5.1, p55, just the individual shadow function

shadowing can be used for branches as well as exceptions (interrupts), load/store hold (exceptions again), and vector-element predication (once the predicate is known, which it may not be at instruction issue)

Inputs:

  • shadow_wid: number of shadow/fail/good/go_die sets

Notes:

  • when shadow_wid = 0, recover and shadown are Consts (i.e. do nothing)

elaborate(platform)
ports()
soc.scoreboard.shadow_fn.shadow_fn_unit_sim(dut)
soc.scoreboard.shadow_fn.test_shadow_fn_unit()

soc.scoreboard.test_iq module

testing of InstructionQ

class soc.scoreboard.test_iq.IQSim(dut, iq, n_in, n_out)

Bases: object

rcv()
send()
soc.scoreboard.test_iq.mk_insns(n_insns, wid, opwid)
soc.scoreboard.test_iq.test_iq()

soc.scoreboard.test_mem2_fu_matrix module

class soc.scoreboard.test_mem2_fu_matrix.MemSim(regwid, addrw)

Bases: object

ld(addr)
st(addr, data)
class soc.scoreboard.test_mem2_fu_matrix.Memory(regwid, addrw)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
class soc.scoreboard.test_mem2_fu_matrix.Scoreboard(rwid, n_regs)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
ports()
soc.scoreboard.test_mem2_fu_matrix.create_random_ops(dut, n_ops, shadowing=False, max_opnums=3)
soc.scoreboard.test_mem2_fu_matrix.int_instr(dut, op, imm, src1, src2, dest, branch_success, branch_fail)
soc.scoreboard.test_mem2_fu_matrix.mem_sim(dut)
soc.scoreboard.test_mem2_fu_matrix.print_reg(dut, rnums)
soc.scoreboard.test_mem2_fu_matrix.scoreboard_sim(dut, alusim)
soc.scoreboard.test_mem2_fu_matrix.test_mem_fus()
soc.scoreboard.test_mem2_fu_matrix.test_scoreboard()

soc.scoreboard.test_mem_fu_matrix module

class soc.scoreboard.test_mem_fu_matrix.MemFunctionUnits(n_int_alus)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
ports()
class soc.scoreboard.test_mem_fu_matrix.MemSim(regwid, addrw)

Bases: object

ld(addr)
st(addr, data)
class soc.scoreboard.test_mem_fu_matrix.Memory(regwid, addrw)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
class soc.scoreboard.test_mem_fu_matrix.Scoreboard(rwid, n_regs)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
ports()
soc.scoreboard.test_mem_fu_matrix.create_random_ops(dut, n_ops, shadowing=False, max_opnums=3)
soc.scoreboard.test_mem_fu_matrix.int_instr(dut, op, imm, src1, src2, dest, branch_success, branch_fail)
soc.scoreboard.test_mem_fu_matrix.mem_sim(dut)
soc.scoreboard.test_mem_fu_matrix.print_reg(dut, rnums)
soc.scoreboard.test_mem_fu_matrix.scoreboard_sim(dut, alusim)
soc.scoreboard.test_mem_fu_matrix.test_mem_fus()
soc.scoreboard.test_mem_fu_matrix.test_scoreboard()

Module contents