soc.experiment package¶

Subpackages¶

Submodules¶

soc.experiment.alu_fsm module¶

Simple example of a FSM-based ALU

This demonstrates a design that follows the valid/ready protocol of the ALU but uses an FSM implementation instead of a pipeline. It is also intended to comply with both the CompALU API and the nmutil Pipeline API (Liskov Substitution Principle).

The basic rules are:

p.ready_o is asserted in the initial (“Idle”) state, otherwise it stays low.
n.valid_o is asserted in the final (“Done”) state, otherwise it stays low.
The FSM stays in the Idle state while p.valid_i is low, otherwise it accepts the input data and moves on.
The FSM stays in the Done state while n.ready_i is low, otherwise it releases the output data and goes back to the Idle state.
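The four rules above can be sketched as a plain-Python behavioural model. This is not the nmigen implementation: the class name `FsmAluModel`, the `compute` callback and the direct Idle-to-Done transition (with no intermediate working states) are assumptions made purely for illustration.

```python
class FsmAluModel:
    """Behavioural (non-HDL) sketch of the Idle/Done handshake rules."""

    def __init__(self, compute):
        self.compute = compute  # hypothetical: function applied to accepted input
        self.state = "IDLE"
        self.result = None

    @property
    def p_ready_o(self):
        # Asserted only in the Idle state
        return self.state == "IDLE"

    @property
    def n_valid_o(self):
        # Asserted only in the Done state
        return self.state == "DONE"

    def tick(self, p_valid_i=False, data_i=None, n_ready_i=False):
        """Advance the model by one clock cycle."""
        if self.state == "IDLE":
            if p_valid_i:
                # Accept the input data and move on (simplified: straight to Done)
                self.result = self.compute(data_i)
                self.state = "DONE"
        elif self.state == "DONE":
            if n_ready_i:
                # Output has been taken: go back to Idle
                self.state = "IDLE"


fsm = FsmAluModel(lambda x: x + 1)
fsm.tick(p_valid_i=True, data_i=41)   # input accepted, result now valid
fsm.tick(n_ready_i=True)              # result released, back to Idle
```

A real FSM-based ALU would normally spend one or more cycles in intermediate states between Idle and Done; the model collapses them because the rules above only constrain the two end states.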

class soc.experiment.alu_fsm.CompFSMOpSubset(name=None)¶: Bases: soc.fu.base_input_record.CompOpSubsetBase

class soc.experiment.alu_fsm.Dummy¶: Bases: object

class soc.experiment.alu_fsm.Shifter(width)¶

Bases: nmigen.hdl.ir.Elaboratable

Simple sequential shifter

Prev port data:

* p.data_i.data: value to be shifted
* p.data_i.shift: shift amount
  * When zero, no shift occurs.
  * On POWER, range is 0 to 63 for 32-bit, and 0 to 127 for 64-bit.
  * Other values wrap around.

Operation type:

* op.sdir: shift direction (0 = left, 1 = right)

Next port data:

* n.data_o.data: shifted value
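A reference model in plain Python can make the port semantics concrete. The function below is a sketch under stated assumptions, not the module's code: it assumes the shift amount wraps modulo twice the data width, that amounts at or beyond the data width produce zero, and that the right shift is logical (zero-filling). The name `shift_model` is hypothetical.

```python
def shift_model(data, shift, sdir, width=64):
    """Sketch of the Shifter ports: data/shift in, shifted value out.

    Assumptions (not confirmed by the source): the amount wraps modulo
    2 * width, amounts >= width yield zero, and right shifts are logical.
    """
    mask = (1 << width) - 1
    data &= mask
    amount = shift % (2 * width)   # "other values wrap around"
    if amount >= width:
        return 0                   # everything has been shifted out
    if sdir == 0:
        return (data << amount) & mask   # shift left
    return data >> amount                # shift right (logical)
```

Such a model is mainly useful as the "expected value" side of a simulation test, compared against n.data_o.data once n.valid_o is seen.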

class NextData(width)¶: Bases: object

class PrevData(width)¶: Bases: object

elaborate(platform)¶

ports()¶

soc.experiment.alu_fsm.test_shifter()¶

soc.experiment.alu_hier module¶

Experimental ALU: based on nmigen alu_hier.py, includes branch-compare ALU

This ALU is deliberately designed to add (unnecessary) delays to different operations so as to be able to test the 6600-style matrices and the CompUnits. Countdown timers wait for (defined) periods before indicating that the output is valid.

A “real” integer ALU would place the answers onto the output bus after only one cycle (sync)

class soc.experiment.alu_hier.ALU(width)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

class soc.experiment.alu_hier.Adder(width)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.alu_hier.BranchALU(width)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

class soc.experiment.alu_hier.BranchOp(width, op)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.alu_hier.Dummy¶: Bases: object

class soc.experiment.alu_hier.DummyALU(width)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

class soc.experiment.alu_hier.Multiplier(width)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.alu_hier.Shifter(width)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.alu_hier.SignExtend(width)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.alu_hier.Subtractor(width)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

soc.experiment.alu_hier.alu_sim(dut)¶

soc.experiment.alu_hier.run_op(dut, a, b, op, inv_a=0)¶

soc.experiment.alu_hier.test_alu()¶

soc.experiment.alu_hier.test_alu_parallel()¶

soc.experiment.alu_hier.write_alu_gtkw(gtkw_name, clk_period=1e-06, sub_module=None, pysim=True)¶: Common function to write the GTKWave documents for this module

soc.experiment.cache_ram module¶

class soc.experiment.cache_ram.CacheRam(ROW_BITS=16, WIDTH=64, TRACE=True, ADD_BUF=False)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

soc.experiment.compalu module¶

class soc.experiment.compalu.ComputationUnitNoDelay(rwid, alu)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

soc.experiment.compalu.op_sim(dut, a, b, op, inv_a=0, imm=0, imm_ok=0)¶

soc.experiment.compalu.scoreboard_sim(dut)¶

soc.experiment.compalu.test_scoreboard()¶

soc.experiment.compalu_multi module¶

Computation Unit (aka “ALU Manager”).

Manages a Pipeline or FSM, ensuring that the start and end time are 100% monitored. At no time may the ALU proceed without this module notifying the Dependency Matrices. At no time is a result production “abandoned”. This module blocks (indicates busy) starting from when it first receives an opcode until it receives notification that its result(s) have been successfully stored in the regfile(s)

Documented at http://libre-soc.org/3d_gpu/architecture/compunit
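The busy rule described above (busy from first receipt of an opcode until every result is confirmed stored) can be illustrated with a toy model. This is only a sketch: `BusyTracker`, `issue` and `result_stored` are hypothetical names, and it assumes every destination must be written before busy drops.

```python
class BusyTracker:
    """Toy model: busy is set on issue and cleared once all results are stored."""

    def __init__(self, n_dst=1):
        self.n_dst = n_dst
        self.busy = False
        self.outstanding = set()

    def issue(self):
        # A new opcode may only be accepted when the unit is idle
        if self.busy:
            raise RuntimeError("cannot issue while busy")
        self.busy = True
        self.outstanding = set(range(self.n_dst))

    def result_stored(self, idx):
        # Notification that destination `idx` has reached the regfile
        self.outstanding.discard(idx)
        if not self.outstanding:
            self.busy = False


unit = BusyTracker(n_dst=2)
unit.issue()
unit.result_stored(0)   # still busy: one result outstanding
unit.result_stored(1)   # all results stored: busy drops
```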

class soc.experiment.compalu_multi.CompUnitRecord(subkls, rwid, n_src=None, n_dst=None, name=None)¶

Bases: soc.fu.regspec.RegSpec, nmutil.iocontrol.RecordObject

Base class for Computation Units, providing a uniform API and allowing “record.connect” etc. to be used, particularly when it comes to connecting multiple Computation Units up as a block (which is otherwise very laborious).

LDSTCompUnitRecord should derive from this class and add the additional signals it requires

Subkls:	the class (not an instance) needed to construct the opcode
Rwid:	either an integer (specifies width of all regs) or a “regspec”

see https://libre-soc.org/3d_gpu/architecture/regfile/ section on regspecs

class soc.experiment.compalu_multi.MultiCompUnit(rwid, alu, opsubsetkls, n_src=2, n_dst=1, name=None)¶

Bases: soc.fu.regspec.RegSpecALUAPI, nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

get_fu_out(i)¶

ports()¶

soc.experiment.compalu_multi.find_ok(fields)¶: find_ok helper function - finds field ending in “_ok”

soc.experiment.compalu_multi.go_record(n, name)¶

soc.experiment.compldst_multi module¶

LOAD / STORE Computation Unit.

This module covers POWER9-compliant Load and Store operations, with selection on each between immediate and indexed mode as options for the calculation of the Effective Address (EA), and also “update” mode which optionally stores that EA into an additional register.

take the time to review the links, video, and diagram.¶

Stores are activated when Go_Store is enabled, and use a sync’d “ADD” to compute the “Effective Address”. When ready, the operand (src3_i) is stored at the computed address (passed through to the PortInterface).

Loads are activated when Go_Write[0] is enabled. The EA is computed, and (as long as there was no exception) the data comes out (at any time from the PortInterface), and is captured by the LDSTCompUnit.

Both LD and ST may request that the address be computed from summing operand1 (src[0]) with operand2 (src[1]) or by summing operand1 with the immediate (from the opcode).

Both LD and ST may also request “update” mode (op_is_update) which activates the use of Go_Write[1] to control storage of the EA into a second operand in the register file.
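The address calculation described in the two paragraphs above amounts to a single add with a selectable second operand. The sketch below models it in plain Python; the function name `effective_address`, the parameter names and the truncation to 64 bits are assumptions for illustration only.

```python
def effective_address(src0, src1, imm, imm_ok, width=64):
    """Sketch of the EA calculation: operand1 plus either operand2 or the immediate.

    `imm_ok` selects immediate mode; otherwise indexed mode is used.
    Truncation to `width` bits is an assumption of this sketch.
    """
    operand2 = imm if imm_ok else src1
    return (src0 + operand2) & ((1 << width) - 1)


ea_imm = effective_address(0x1000, 0x20, 0x8, imm_ok=True)    # immediate mode
ea_idx = effective_address(0x1000, 0x20, 0x8, imm_ok=False)   # indexed mode
```

In "update" mode the same value is what would additionally be written back to the register file.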

Thus this module has TWO write-requests to the register file and THREE read-requests to the register file (not all at the same time!) The regfile port usage is:

LD-imm 1R1W

LD-imm-update 1R2W

LD-idx 2R1W

LD-idx-update 2R2W

ST-imm 2R

ST-imm-update 2R1W

ST-idx 3R

ST-idx-update 3R1W
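The port-usage table above follows a simple pattern: one read for the base operand, one more for indexed mode, one more for the store data; one write for load data and one more for update mode. A short sketch that reproduces the table (the function name and argument names are hypothetical):

```python
def regfile_ports(is_store, indexed, update):
    """Return (reads, writes) for one LD/ST variant, matching the table above."""
    reads = 1 + int(indexed) + int(is_store)     # base, index, store data
    writes = int(not is_store) + int(update)     # load data, updated EA
    return reads, writes


# Print the table in the same order as listed above
for is_store in (False, True):
    for indexed in (False, True):
        for update in (False, True):
            name = "{}-{}{}".format("ST" if is_store else "LD",
                                    "idx" if indexed else "imm",
                                    "-update" if update else "")
            reads, writes = regfile_ports(is_store, indexed, update)
            print(name, "{}R{}W".format(reads, writes) if writes else "{}R".format(reads))
```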

It’s a multi-level Finite State Machine that (unfortunately) nmigen.FSM is not suited to (nmigen.FSM is clock-driven, and some aspects of the nested FSMs below are combinatorial).

One FSM covers Operand collection and communication address-side with the LD/ST PortInterface. Its role ends when “RD_DONE” is asserted.

A second FSM activates to cover LD. It activates if op_is_ld is true.

A third FSM activates to cover ST. It activates if op_is_st is true.

The “overall” (fourth) FSM coordinates the progression and completion of the three other FSMs, firing “WR_RESET” which switches off “busy”

Full diagram:

https://libre-soc.org/3d_gpu/ld_st_comp_unit.jpg

Links including to walk-through videos:

https://libre-soc.org/3d_gpu/architecture/6600scoreboard/

http://libre-soc.org/openpower/isa/fixedload

http://libre-soc.org/openpower/isa/fixedstore

Related Bugreports:

https://bugs.libre-soc.org/show_bug.cgi?id=302

https://bugs.libre-soc.org/show_bug.cgi?id=216

Terminology:

EA - Effective Address

LD - Load

ST - Store

class soc.experiment.compldst_multi.LDSTCompUnit(pi=None, rwid=64, awid=48, opsubset=<class 'soc.fu.ldst.ldst_input_record.CompLDSTOpSubset'>, debugtest=False, name=None)¶

Bases: soc.fu.regspec.RegSpecAPI, nmigen.hdl.ir.Elaboratable

LOAD / STORE Computation Unit

pi: a PortInterface to the memory subsystem (read-write capable)
rwid: register width
awid: address width

src_i: Source Operands (RA/RB/RC) - managed by rd[0-3] go/req

data_o: Dest out (LD) - managed by wr[0] go/req
addr_o: Address out (LD or ST) - managed by wr[1] go/req
exception_o: Address/Data Exception occurred. LD/ST must terminate

TODO: make exception_o a data-type rather than a single-bit signal: (see bug #302)

oper_i: operation being carried out (POWER9 decode LD/ST subset)
issue_i: LD/ST is being “issued”.
shadown_i: Inverted-shadow is being held (stops STORE and WRITE)
go_rd_i: read is being actioned (latches in src regs)
go_wr_i: write mode (exactly like ALU CompUnit)
go_ad_i: address is being actioned (triggers actual mem LD)
go_st_i: store is being actioned (triggers actual mem STORE)
go_die_i: resets the unit back to “wait for issue”

busy_o: function unit is busy
rd_rel_o: request src1/src2
adr_rel_o: request address (from mem)
sto_rel_o: request store (to mem)
req_rel_o: request write (result)
load_mem_o: activate memory LOAD
stwd_mem_o: activate memory STORE

Note: load_mem_o, stwd_mem_o and req_rel_o MUST all be acknowledged in a single cycle and the CompUnit set back to doing another op. This means deasserting go_st_i, go_ad_i or go_wr_i as appropriate depending on whether the operation is a ST or LD.

Note: LDSTCompUnit takes care of LE/BE normalisation:

* LD data is normalised after receipt from the PortInterface
* ST data is normalised prior to sending onto the PortInterface

TODO: use one module for the byte-reverse as it’s quite expensive in gates
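The byte-reverse underlying LE/BE normalisation is easy to state in software terms. The helper below is a plain-Python sketch (the name `byte_reverse` and its `nbytes` parameter are assumptions, not part of the module):

```python
def byte_reverse(value, nbytes):
    """Reverse the byte order of an `nbytes`-wide unsigned value."""
    return int.from_bytes(value.to_bytes(nbytes, "little"), "big")


swapped = byte_reverse(0x11223344, 4)
print(hex(swapped))
```

Applying the function twice returns the original value, which is why the same operation can serve both the LD and the ST direction.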

elaborate(platform)¶

get_fu_out(i)¶

get_out(i)¶: make LDSTCompUnit look like RegSpecALUAPI

ports()¶

class soc.experiment.compldst_multi.LDSTCompUnitRecord(rwid, opsubset=<class 'soc.fu.ldst.ldst_input_record.CompLDSTOpSubset'>, name=None)¶: Bases: soc.experiment.compalu_multi.CompUnitRecord

class soc.experiment.compldst_multi.TestLDSTCompUnit(rwid)¶

Bases: soc.experiment.compldst_multi.LDSTCompUnit

elaborate(platform)¶

class soc.experiment.compldst_multi.TestLDSTCompUnitRegSpec¶

Bases: soc.experiment.compldst_multi.LDSTCompUnit

elaborate(platform)¶

soc.experiment.compldst_multi.ldst_sim(dut)¶

soc.experiment.compldst_multi.load(dut, src1, src2, imm, imm_ok=True, update=False, zero_a=False, byterev=True)¶

soc.experiment.compldst_multi.store(dut, src1, src2, src3, imm, imm_ok=True, update=False, byterev=True)¶

soc.experiment.compldst_multi.test_scoreboard()¶

soc.experiment.compldst_multi.test_scoreboard_regspec()¶

soc.experiment.compldst_multi.wait_for(sig, wait=True, test1st=False)¶

soc.experiment.cscore module¶

soc.experiment.cxxsim module¶

soc.experiment.dcache module¶

DCache

based on Anton Blanchard microwatt dcache.vhdl

soc.experiment.dcache.CacheRamOut()¶

soc.experiment.dcache.CacheTagArray()¶

soc.experiment.dcache.CacheValidBitsArray()¶

class soc.experiment.dcache.DCache¶

Bases: nmigen.hdl.ir.Elaboratable

Set associative dcache write-through

TODO (in no specific order):

* See list in icache.vhdl
* Complete load misses on the cycle when WB data comes instead of at the end of line (this requires dealing with requests coming in while not idle…)

cache_tag_read(m, r0_stall, req_index, cache_tag_set, cache_tags)¶: Cache tag RAM read port

dcache_fast_hit(m, req_op, r0_valid, r0, r1, req_hit_way, req_index, req_tag, access_ok, tlb_hit, tlb_hit_way, tlb_req_index)¶

dcache_log(m, r1, valid_ra, tlb_hit_way, stall_out)¶

dcache_request(m, r0, ra, req_index, req_row, req_tag, r0_valid, r1, cache_valids, replace_way, use_forward1_next, use_forward2_next, req_hit_way, plru_victim, rc_ok, perm_attr, valid_ra, perm_ok, access_ok, req_op, req_go, tlb_pte_way, tlb_hit, tlb_hit_way, tlb_valid_way, cache_tag_set, cancel_store, req_same_tag, r0_stall, early_req_row)¶: Cache request parsing and hit detection

dcache_slow(m, r1, use_forward1_next, use_forward2_next, cache_valids, r0, replace_way, req_hit_way, req_same_tag, r0_valid, req_op, cache_tags, req_go, ra)¶

elaborate(platform)¶

maybe_plrus(m, r1, plru_victim)¶: Generate PLRUs

maybe_tlb_plrus(m, r1, tlb_plru_victim)¶: Generate TLB PLRUs

rams(m, r1, early_req_row, cache_out_row, replace_way)¶

Generate a cache RAM for each way. This handles the normal reads, writes from reloads and the special store-hit update path as well.

Note: the BRAMs have an extra read buffer, meaning the output is pipelined an extra cycle. This differs from the icache. The writeback logic needs to take that into account by using 1-cycle delayed signals for load hits.

reservation_comb(m, cancel_store, set_rsrv, clear_rsrv, r0_valid, r0, reservation)¶: Handle load-with-reservation and store-conditional instructions

reservation_reg(m, r0_valid, access_ok, set_rsrv, clear_rsrv, reservation, r0)¶

stage_0(m, r0, r1, r0_full)¶: Latch the request in r0.req as long as we’re not stalling

tlb_read(m, r0_stall, tlb_valid_way, tlb_tag_way, tlb_pte_way, dtlb_valid_bits, dtlb_tags, dtlb_ptes)¶: TLB Operates in the second cycle on the request latched in r0.req. TLB updates write the entry at the end of the second cycle.

tlb_search(m, tlb_req_index, r0, r0_valid, tlb_valid_way, tlb_tag_way, tlb_hit_way, tlb_pte_way, pte, tlb_hit, valid_ra, perm_attr, ra)¶

tlb_update(m, r0_valid, r0, dtlb_valid_bits, tlb_req_index, tlb_hit_way, tlb_hit, tlb_plru_victim, tlb_tag_way, dtlb_tags, tlb_pte_way, dtlb_ptes)¶

writeback_control(m, r1, cache_out_row)¶: Return data for loads & completion control logic

class soc.experiment.dcache.DCachePendingHit(tlb_pte_way, tlb_valid_way, tlb_hit_way, cache_valid_idx, cache_tag_set, req_addr, hit_set)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.dcache.DTLBUpdate¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

soc.experiment.dcache.HitWaySet()¶

class soc.experiment.dcache.MemAccessRequest(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.dcache.Op¶

Bases: enum.Enum

An enumeration.

OP_BAD = 1¶

OP_LOAD_HIT = 3¶

OP_LOAD_MISS = 4¶

OP_LOAD_NC = 5¶

OP_NONE = 0¶

OP_STCX_FAIL = 2¶

OP_STORE_HIT = 6¶

OP_STORE_MISS = 7¶

soc.experiment.dcache.PLRUOut()¶

class soc.experiment.dcache.PermAttr(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.dcache.RegStage0(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.dcache.RegStage1(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.dcache.Reservation¶: Bases: nmutil.iocontrol.RecordObject

soc.experiment.dcache.RowPerLineValidArray()¶

class soc.experiment.dcache.State¶

Bases: enum.Enum

An enumeration.

IDLE = 0¶

NC_LOAD_WAIT_ACK = 3¶

RELOAD_WAIT_ACK = 1¶

STORE_WAIT_ACK = 2¶

soc.experiment.dcache.TLBPLRUOut()¶

soc.experiment.dcache.TLBPtesArray()¶

soc.experiment.dcache.TLBTagEAArray()¶

soc.experiment.dcache.TLBTagsArray()¶

soc.experiment.dcache.TLBValidBitsArray()¶

soc.experiment.dcache.dcache_load(dut, addr, nc=0)¶

soc.experiment.dcache.dcache_random_sim(dut)¶

soc.experiment.dcache.dcache_sim(dut)¶

soc.experiment.dcache.dcache_store(dut, addr, data, nc=0)¶

soc.experiment.dcache.extract_perm_attr(pte)¶

soc.experiment.dcache.get_index(addr)¶

soc.experiment.dcache.get_row(addr)¶

soc.experiment.dcache.get_row_of_line(row)¶

soc.experiment.dcache.get_tag(addr)¶

soc.experiment.dcache.is_last_row(row, last)¶

soc.experiment.dcache.is_last_row_addr(addr, last)¶

soc.experiment.dcache.ispow2(x)¶

soc.experiment.dcache.next_row(row)¶

soc.experiment.dcache.read_tag(way, tagset)¶

soc.experiment.dcache.read_tlb_pte(way, ptes)¶

soc.experiment.dcache.read_tlb_tag(way, tags)¶

soc.experiment.dcache.test_dcache(mem, test_fn, test_name)¶

soc.experiment.dcache.write_tlb_pte(way, ptes, newpte)¶

soc.experiment.dcache.write_tlb_tag(way, tags, tag)¶

soc.experiment.icache module¶

ICache

based on Anton Blanchard microwatt icache.vhdl

Set associative icache

TODO (in no specific order):

* Add debug interface to inspect cache content
* Add snoop/invalidate path
* Add multi-hit error detection
* Pipelined bus interface (wb or axi)
* Maybe add parity? There’s a few bits free in each BRAM row on Xilinx
* Add optimization: service hits on partially loaded lines
* Add optimization: (maybe) interrupt reload on flush/redirect
* Check if playing with the geometry of the cache tags allows for more efficient use of distributed RAM and less logic/muxes. Currently we write TAG_BITS width which may not match full ram blocks and might cause muxes to be inferred for “partial writes”.
* Check if making the read size of PLRU a ROM helps utilization

soc.experiment.icache.CacheRamOut()¶

soc.experiment.icache.CacheTagArray()¶

soc.experiment.icache.CacheValidBitsArray()¶

class soc.experiment.icache.ICache¶

Bases: nmigen.hdl.ir.Elaboratable

64 bit direct mapped icache. All instructions are 4B aligned.

elaborate(platform)¶

icache_comb(m, use_previous, r, req_index, req_row, req_hit_way, req_tag, real_addr, req_laddr, cache_valid_bits, cache_tags, access_ok, req_is_hit, req_is_miss, replace_way, plru_victim, cache_out_row)¶

icache_hit(m, use_previous, r, req_is_hit, req_hit_way, req_index, req_tag, real_addr)¶

icache_log(m, req_hit_way, ra_valid, access_ok, req_is_miss, req_is_hit, lway, wstate, r)¶

icache_miss(m, cache_valid_bits, r, req_is_miss, req_index, req_laddr, req_tag, replace_way, cache_tags, access_ok, real_addr)¶

icache_miss_clr_tag(m, r, replace_way, cache_valid_bits, req_index, tagset, cache_tags)¶

icache_miss_idle(m, r, req_is_miss, req_laddr, req_index, req_tag, replace_way, real_addr)¶

icache_miss_wait_ack(m, r, replace_way, inval_in, stbs_done, cache_valid_bits)¶

itlb_lookup(m, tlb_req_index, itlb_ptes, itlb_tags, real_addr, itlb_valid_bits, ra_valid, eaa_priv, priv_fault, access_ok)¶

itlb_update(m, itlb_valid_bits, itlb_tags, itlb_ptes)¶

maybe_plrus(m, r, plru_victim)¶

rams(m, r, cache_out_row, use_previous, replace_way, req_row)¶

soc.experiment.icache.PLRUOut()¶

class soc.experiment.icache.RegInternal¶: Bases: nmutil.iocontrol.RecordObject

soc.experiment.icache.RowPerLineValidArray()¶

class soc.experiment.icache.State¶

Bases: enum.Enum

An enumeration.

CLR_TAG = 1¶

IDLE = 0¶

WAIT_ACK = 2¶

soc.experiment.icache.TLBPtesArray()¶

soc.experiment.icache.TLBTagArray()¶

soc.experiment.icache.TLBValidBitsArray()¶

soc.experiment.icache.get_index(addr)¶

soc.experiment.icache.get_row(addr)¶

soc.experiment.icache.get_row_of_line(row)¶

soc.experiment.icache.get_tag(addr)¶

soc.experiment.icache.hash_ea(addr)¶

soc.experiment.icache.icache_sim(dut)¶

soc.experiment.icache.is_last_row(row, last)¶

soc.experiment.icache.is_last_row_addr(addr, last)¶

soc.experiment.icache.ispow2(n)¶

soc.experiment.icache.next_row(row)¶

soc.experiment.icache.read_insn_word(addr, data)¶

soc.experiment.icache.read_tag(way, tagset)¶

soc.experiment.icache.test_icache(mem)¶

soc.experiment.icache.write_tag(way, tagset, tag)¶

soc.experiment.imem module¶

class soc.experiment.imem.TestMemFetchUnit(pspec)¶

Bases: soc.minerva.units.fetch.FetchUnitInterface, nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

soc.experiment.l0_cache module¶

L0 Cache/Buffer

This first version is intended for prototyping and test purposes: it has “direct” access to Memory.

The intention is that this version remains an integral part of the test infrastructure, and, just as with minerva’s memory arrangement, a dynamic runtime config selects alternative memory arrangements rather than replaces and discards this code.

Links:

class soc.experiment.l0_cache.CacheRecord(name=None)¶: Bases: nmigen.hdl.rec.Record

class soc.experiment.l0_cache.DataMerger(array_size)¶

Bases: nmigen.hdl.ir.Elaboratable

Merges data based on an address-match matrix. Identifies (picks) one (any) row, then uses that row, based on matching address bits, to merge (OR) all data rows into the output.

Basically, by the time DataMerger is used, all of its incoming data is determined not to conflict. The last step before actually submitting the request to the Memory Subsystem is to work out which requests, on the same 128-bit cache line, can be “merged” due to them being:

(A) on the same address (bits 4 and above)
(B) having byte-enable lines that (as previously mentioned) do not conflict.

Therefore, put simply, this module will:

(1) pick a row (any row) and identify it by an index labelled “idx”
(2) merge all byte-enable lines which are on that same address, as indicated by addr_match_i[idx], onto the output
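The pick-then-merge behaviour can be sketched in plain Python. This is a behavioural illustration, not the nmigen code: `merge_rows` is a hypothetical name, each row is modelled as a `(data, byte_enable)` tuple, the address-match matrix as a list of lists of booleans, and "pick any row" is simplified to "pick the first row with a match".

```python
def merge_rows(rows, addr_match):
    """Pick one row with an address match and OR together all rows matching it.

    rows:       list of (data, byte_enable) integer pairs
    addr_match: addr_match[i][j] is True when rows i and j share an address
    Returns the merged (data, byte_enable), or (0, 0) if nothing matches.
    """
    for idx, row_hits in enumerate(addr_match):
        if any(row_hits):
            break              # step (1): this is the chosen "idx"
    else:
        return 0, 0            # no active rows at all

    data, byte_enable = 0, 0
    for j, hit in enumerate(row_hits):
        if hit:                # step (2): merge every row on the same address
            data |= rows[j][0]
            byte_enable |= rows[j][1]
    return data, byte_enable


rows = [(0x00FF, 0b0001), (0xFF00, 0b0010), (0xAA0000, 0b0100)]
match = [[True, True, False],
         [True, True, False],
         [False, False, True]]
merged = merge_rows(rows, match)   # rows 0 and 1 merge; row 2 is left for later
```

Plain OR is sufficient only because the inputs are already known not to conflict: no two merged rows drive the same byte lane.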

elaborate(platform)¶

class soc.experiment.l0_cache.DataMergerRecord(name=None)¶

Bases: nmigen.hdl.rec.Record

{data: 128 bit, byte_enable: 16 bit}

class soc.experiment.l0_cache.L0CacheBuffer(n_units, pimem, regwid=64, addrwid=48)¶

Bases: nmigen.hdl.ir.Elaboratable

L0 Cache / Buffer

Note that the final version will have two interfaces per LDSTCompUnit, to cover mis-aligned requests, as well as two 128-bit L1 Cache interfaces: one for odd (addr[4] == 1) and one for even (addr[4] == 0).

This version is to be used for test purposes (and actively maintained for such, rather than “replaced”)

There are much better ways to implement this. However, it is only a “demo” / “test” class, and it has one important property: it responds combinatorially, whereas an nmigen FSM’s state changes only take effect on clock-sync boundaries.

Note: the data byte-order is not expected to be normalised (LE/BE) by this class. That task is taken care of by LDSTCompUnit.

elaborate(platform)¶

ports()¶

class soc.experiment.l0_cache.L0CacheBuffer2(n_units=8, regwid=64, addrwid=48)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.l0_cache.TestDataMerger(methodName='runTest')¶

Bases: unittest.case.TestCase

test_data_merger()¶

class soc.experiment.l0_cache.TestDualPortSplitter(methodName='runTest')¶

Bases: unittest.case.TestCase

test_dual_port_splitter()¶

class soc.experiment.l0_cache.TestL0Cache(methodName='runTest')¶

Bases: unittest.case.TestCase

test_l0_cache_test_bare_wb()¶

test_l0_cache_testpi()¶

class soc.experiment.l0_cache.TstDataMerger2¶

Bases: nmigen.hdl.ir.Elaboratable

addr_match(j, addr)¶

elaborate(platform)¶

class soc.experiment.l0_cache.TstL0CacheBuffer(pspec, n_units=3)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

soc.experiment.l0_cache.data_merger_merge(dut)¶

soc.experiment.l0_cache.data_merger_test2(dut)¶

soc.experiment.l0_cache.l0_cache_ld(dut, addr, datalen, expected)¶

soc.experiment.l0_cache.l0_cache_ldst(arg, dut)¶

soc.experiment.l0_cache.l0_cache_st(dut, addr, data, datalen)¶

soc.experiment.l0_cache.wait_addr(port)¶

soc.experiment.l0_cache.wait_busy(port, no=False)¶

soc.experiment.l0_cache.wait_ldok(port)¶

soc.experiment.lsmem module¶

class soc.experiment.lsmem.TestMemLoadStoreUnit(pspec)¶

Bases: soc.minerva.units.loadstore.LoadStoreUnitInterface, nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

soc.experiment.mem_types module¶

mem_types

based on Anton Blanchard microwatt common.vhdl

class soc.experiment.mem_types.DCacheToLoadStore1Type(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mem_types.DCacheToMMUType(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mem_types.Fetch1ToICacheType(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mem_types.ICacheToDecode1Type(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mem_types.LoadStore1ToDCacheType(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mem_types.LoadStore1ToMMUType(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mem_types.MMUToDCacheType(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mem_types.MMUToICacheType(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mem_types.MMUToLoadStore1Type(name=None)¶: Bases: nmutil.iocontrol.RecordObject

soc.experiment.mmu module¶

MMU

based on Anton Blanchard microwatt mmu.vhdl

class soc.experiment.mmu.MMU¶

Bases: nmigen.hdl.ir.Elaboratable

Radix MMU

Supports 4-level trees as in arch 3.0B, but not the two-step translation for guests under a hypervisor (i.e. there is no gRA -> hRA translation).

elaborate(platform)¶

mmu_0(m, r, rin, l_in, l_out, d_out, addrsh, mask)¶

proc_tbl_wait(m, v, r, data)¶

radix_read_wait(m, v, r, d_in, data)¶

radix_tree_idle(m, l_in, r, v)¶

segment_check(m, v, r, data, finalmask)¶

class soc.experiment.mmu.RegStage(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.mmu.State¶

Bases: enum.Enum

An enumeration.

DO_TLBIE = 1¶

IDLE = 0¶

PROC_TBL_READ = 3¶

PROC_TBL_WAIT = 4¶

RADIX_FINISH = 9¶

RADIX_LOAD_TLB = 8¶

RADIX_LOOKUP = 6¶

RADIX_READ_WAIT = 7¶

SEGMENT_CHECK = 5¶

TLB_WAIT = 2¶

soc.experiment.mmu.dcache_get(dut)¶: simulator process for getting memory load requests

soc.experiment.mmu.mmu_sim(dut)¶

soc.experiment.mmu.mmu_wait(dut)¶

soc.experiment.mmu.test_mmu()¶

soc.experiment.pi2ls module¶

PortInterface to LoadStoreUnitInterface adapter

PortInterface      LoadStoreUnitInterface
-------------      ----------------------
is_ld_i/1          x_ld_i
is_st_i/1          x_st_i

data_len/4         x_mask/16 (translate using LenExpand)

busy_o/1           most likely to be x_busy_o
go_die_i/1         rst?
addr.data/48       x_addr_i (x_addr_i[:4] goes into LenExpand)
addr.ok/1          probably x_valid_i & ~x_stall_i

addr_ok_o/1        no equivalent. might work using x_stall_i
exception_o/2(?)   m_load_err_o and m_store_err_o

ld.data/64         m_ld_data_o
ld.ok/1            probably implicit, when x_busy drops low
st.data/64         x_st_data_i
st.ok/1            probably kinda redundant, set to x_st_i
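The data_len to x_mask translation in the mapping above can be pictured as expanding a byte count into a run of byte-enable bits, positioned by the low address bits. The function below is a plain-Python sketch of that idea only; it is not the LenExpand implementation, and it assumes that data_len is a byte count and that bits beyond the 16-bit mask are simply dropped.

```python
def len_expand(data_len, addr_low, mask_bits=16):
    """Sketch: expand a byte count into a byte-enable mask.

    data_len: number of bytes accessed (assumed to be a plain byte count)
    addr_low: low address bits giving the starting byte lane
    Bits shifted past `mask_bits` are discarded in this sketch.
    """
    mask = ((1 << data_len) - 1) << addr_low
    return mask & ((1 << mask_bits) - 1)


print(bin(len_expand(4, 2)))   # four bytes starting at byte lane 2
```

A request whose mask would run past the top lane is a misaligned access; how that case is handled is outside the scope of this sketch.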

class soc.experiment.pi2ls.Pi2LSUI(name, lsui=None, data_wid=64, mask_wid=8, addr_wid=48)¶

Bases: soc.experiment.pimem.PortInterfaceBase

elaborate(platform)¶

get_rd_data(m)¶

set_rd_addr(m, addr, mask)¶

set_wr_addr(m, addr, mask)¶

set_wr_data(m, data, wen)¶

class soc.experiment.pi2ls.Pi2LSUI1(name, pi=None, lsui=None, data_wid=64, mask_wid=8, addr_wid=48)¶

Bases: nmigen.hdl.ir.Elaboratable

connect_port(inport)¶

elaborate(platform)¶

splitaddr(addr)¶: split the address into top and bottom bits of the memory granularity

soc.experiment.pimem module¶

L0 Cache/Buffer

This first version is intended for prototyping and test purposes: it has “direct” access to Memory.

The intention is that this version remains an integral part of the test infrastructure, and, just as with minerva’s memory arrangement, a dynamic runtime config selects alternative memory arrangements rather than replaces and discards this code.

Links:

class soc.experiment.pimem.PortInterface(name=None, regwid=64, addrwid=48)¶

Bases: nmutil.iocontrol.RecordObject

Defines the interface - the API - that the LDSTCompUnit connects to. Note that this is NOT a “fire-and-forget” interface. The LDSTCompUnit must be kept apprised that the request is in progress, and only when it has a 100% successful completion can the notification be given (busy dropped).

The interface FSM rules are as follows:

* if busy_o is asserted, a LD/ST is in progress. further requests may not be made until busy_o is deasserted.

* only one of is_ld_i or is_st_i may be asserted. busy_o will immediately be asserted and remain asserted.

* addr.ok is to be asserted when the LD/ST address is known. addr.data is to be valid on the same cycle.

  addr.ok and addr.data must REMAIN asserted until busy_o is de-asserted. this ensures that there is no need for the L0 Cache/Buffer to have an additional address latch (because the LDSTCompUnit already has it)

* addr_ok_o (or exception.happened) must be waited for. these will be asserted for one cycle and one cycle only.

* exception.happened will be asserted if there is no chance that the memory request may be fulfilled.

  busy_o is deasserted on the same cycle as exception.happened is asserted.

* conversely: addr_ok_o must ONLY be asserted if there is a HUNDRED PERCENT guarantee that the memory request will be fulfilled.

* for a LD, ld.ok will be asserted - for only one clock cycle - at any point in the future that is acceptable to the underlying Memory subsystem. the recipient MUST latch ld.data on that cycle.

  busy_o is deasserted on the same cycle as ld.ok is asserted.

* for a ST, st.ok may be asserted only after addr_ok_o had been asserted, alongside valid st.data at the same time. st.ok must only be asserted for one cycle.

  the underlying Memory is REQUIRED to pick up that data and guarantee its delivery. no back-acknowledgement is required.

  busy_o is deasserted on the cycle AFTER st.ok is asserted.
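Some of the rules above lend themselves to a simple per-cycle trace check. The sketch below covers only three of them (mutual exclusion of is_ld_i and is_st_i, the single-cycle addr_ok_o pulse, and busy_o dropping on the ld.ok cycle). The function name, the trace representation as a list of dictionaries, and the flattened key `ld_ok` standing in for ld.ok are all assumptions for illustration.

```python
def check_port_trace(trace):
    """Check a per-cycle signal trace against a subset of the PortInterface rules.

    trace: list of dicts, one per clock cycle; missing signals read as 0.
    Returns a list of (cycle, message) tuples, empty if no rule is violated.
    """
    violations = []
    prev_addr_ok_o = 0
    for cycle, signals in enumerate(trace):
        if signals.get("is_ld_i", 0) and signals.get("is_st_i", 0):
            violations.append((cycle, "is_ld_i and is_st_i both asserted"))
        if signals.get("addr_ok_o", 0) and prev_addr_ok_o:
            violations.append((cycle, "addr_ok_o held for more than one cycle"))
        if signals.get("ld_ok", 0) and signals.get("busy_o", 0):
            violations.append((cycle, "busy_o still asserted on the ld.ok cycle"))
        prev_addr_ok_o = signals.get("addr_ok_o", 0)
    return violations


# A well-formed load: request, address accepted for one cycle, data returned
good_load = [
    {"is_ld_i": 1, "busy_o": 1},
    {"busy_o": 1, "addr_ok_o": 1},
    {"busy_o": 1},
    {"busy_o": 0, "ld_ok": 1},
]
print(check_port_trace(good_load))
```

The remaining rules (for example, addr.ok staying asserted until busy_o drops, or the one-cycle delay after st.ok) could be added in the same style by tracking more previous-cycle state.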

connect_port(inport)¶

class soc.experiment.pimem.PortInterfaceBase(regwid=64, addrwid=4)¶

Bases: nmigen.hdl.ir.Elaboratable

Base class for PortInterface-compliant Memory read/writers

addrbits¶

connect_port(inport)¶

elaborate(platform)¶

get_rd_data(m)¶

ports()¶

set_rd_addr(m, addr, mask)¶

set_wr_addr(m, addr, mask)¶

set_wr_data(m, data, wen)¶

splitaddr(addr)¶: split the address into top and bottom bits of the memory granularity
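Splitting an address at the memory granularity is a shift and a mask. The sketch below shows the idea on plain integers; the function name `split_addr`, the explicit `addrbits` argument and the (top, bottom) return order are assumptions, and the real method operates on nmigen signals rather than Python integers.

```python
def split_addr(addr, addrbits):
    """Split an integer address into (top, bottom) parts.

    `addrbits` is the number of low bits that select a byte within one
    memory word; the remaining high bits select the word itself.
    """
    bottom = addr & ((1 << addrbits) - 1)   # byte offset within the word
    top = addr >> addrbits                  # word index
    return top, bottom


word, offset = split_addr(0x1234, 3)   # 8-byte words: 3 low bits are the offset
```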

class soc.experiment.pimem.TestMemoryPortInterface(regwid=64, addrwid=4)¶

Bases: soc.experiment.pimem.PortInterfaceBase

This is a test class for simple verification of the LDSTCompUnit and for the simple core, to be able to run unit tests rapidly and with less other code in the way.

Versions of this which are compatible (conform with PortInterface) will include augmented-Wishbone Bus versions, including ones that connect to L1, L2, MMU, etc. However, this is the “base lowest possible version that complies with PortInterface”.

elaborate(platform)¶

get_rd_data(m)¶

ports()¶

set_rd_addr(m, addr, mask)¶

set_wr_addr(m, addr, mask)¶

set_wr_data(m, data, wen)¶

soc.experiment.plru module¶

class soc.experiment.plru.PLRU(BITS=2)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

soc.experiment.score6600 module¶

class soc.experiment.score6600.CompUnitALUs(rwid, opwid, n_alus)¶

Bases: soc.experiment.score6600.CompUnitsBase

elaborate(platform)¶

class soc.experiment.score6600.CompUnitBR(rwid, opwid)¶

Bases: soc.experiment.score6600.CompUnitsBase

elaborate(platform)¶

class soc.experiment.score6600.CompUnitLDSTs(rwid, opwid, n_ldsts, mem)¶

Bases: soc.experiment.score6600.CompUnitsBase

elaborate(platform)¶

class soc.experiment.score6600.CompUnitsBase(rwid, units, ldstmode=False)¶

Bases: nmigen.hdl.ir.Elaboratable

Computation Unit Base class.

Amazingly, this class works recursively. It’s supposed to just look after some ALUs (that can handle the same operations), grouping them together, however it turns out that the same code can also group groups of Computation Units together as well.

Basically it was intended just to concatenate the ALU’s issue, go_rd etc. signals together, which start out as bits and become sequences. Turns out that the same trick works just as well on Computation Units!

So this class may be used recursively to present a top-level sequential concatenation of all the signals in and out of ALUs, whilst at the same time making it convenient to group ALUs together.

At the lower level, the intent is that groups of (identical) ALUs may be passed the same operation. Even beyond that, the intent is that that group of (identical) ALUs actually share the same pipeline and as such become a “Concurrent Computation Unit” as defined by Mitch Alsup (see section 11.4.9.3)

elaborate(platform)¶

class soc.experiment.score6600.FunctionUnits(n_regs, n_int_alus)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.score6600.IssueToScoreboard(qlen, n_in, n_out, rwid, opwid, n_regs)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

class soc.experiment.score6600.Scoreboard(rwid, n_regs)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

soc.experiment.score6600.create_random_ops(dut, n_ops, shadowing=False, max_opnums=3)¶

soc.experiment.score6600.disable_issue(dut)¶

soc.experiment.score6600.instr_q(dut, op, funit, op_imm, imm, src1, src2, dest, branch_success, branch_fail)¶

soc.experiment.score6600.int_instr(dut, op, imm, src1, src2, dest, branch_success, branch_fail)¶

soc.experiment.score6600.power_instr_q(dut, pdecode2, ins, code)¶

soc.experiment.score6600.power_sim(m, dut, pdecode2, instruction, alusim)¶

soc.experiment.score6600.print_reg(dut, rnums)¶

soc.experiment.score6600.scoreboard_branch_sim(dut, alusim)¶

soc.experiment.score6600.scoreboard_sim(dut, alusim)¶

soc.experiment.score6600.test_scoreboard()¶

soc.experiment.score6600.wait_for_busy_clear(dut)¶

soc.experiment.score6600.wait_for_issue(dut, dut_issue)¶

soc.experiment.score6600_multi module¶

class soc.experiment.score6600_multi.CompUnitALUs(rwid, opwid, n_alus)¶

Bases: soc.experiment.score6600_multi.CompUnitsBase

elaborate(platform)¶

class soc.experiment.score6600_multi.CompUnitBR(rwid, opwid)¶

Bases: soc.experiment.score6600_multi.CompUnitsBase

elaborate(platform)¶

class soc.experiment.score6600_multi.CompUnitLDSTs(rwid, opwid, n_ldsts, l0)¶

Bases: soc.experiment.score6600_multi.CompUnitsBase

elaborate(platform)¶

class soc.experiment.score6600_multi.CompUnitsBase(rwid, units, ldstmode=False)¶

Bases: nmigen.hdl.ir.Elaboratable

Computation Unit Base class.

Amazingly, this class works recursively. It was originally meant only to look after a set of ALUs that handle the same operations by grouping them together; it turns out that the same code can also group groups of Computation Units.

The original intent was simply to concatenate the ALUs’ issue, go_rd and similar signals, which start out as single bits and become sequences. The same trick works just as well on Computation Units.

So this class may be used recursively to present a top-level, sequential concatenation of all the signals into and out of the ALUs, while also making it convenient to group ALUs together.

At the lower level, the intent is that a group of (identical) ALUs may be passed the same operation. Beyond that, the intent is that such a group actually shares the same pipeline and so becomes a “Concurrent Computation Unit” as defined by Mitch Alsup (see section 11.4.9.3).

elaborate(platform)¶

class soc.experiment.score6600_multi.FunctionUnits(n_reg, n_int_alus, n_src, n_dst)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

class soc.experiment.score6600_multi.IssueToScoreboard(qlen, n_in, n_out, rwid, opwid, n_regs)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

class soc.experiment.score6600_multi.Scoreboard(rwid, n_regs)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

soc.experiment.score6600_multi.create_random_ops(dut, n_ops, shadowing=False, max_opnums=3)¶

soc.experiment.score6600_multi.disable_issue(dut)¶

soc.experiment.score6600_multi.instr_q(dut, op, funit, op_imm, imm, src1, src2, dest, branch_success, branch_fail)¶

soc.experiment.score6600_multi.int_instr(dut, op, imm, src1, src2, dest, branch_success, branch_fail)¶

soc.experiment.score6600_multi.power_instr_q(dut, pdecode2, ins, code)¶

soc.experiment.score6600_multi.power_sim(m, dut, pdecode2, instruction, alusim)¶

soc.experiment.score6600_multi.print_reg(dut, rnums)¶

soc.experiment.score6600_multi.scoreboard_branch_sim(dut, alusim)¶

soc.experiment.score6600_multi.scoreboard_sim(dut, alusim)¶

soc.experiment.score6600_multi.test_scoreboard()¶

soc.experiment.score6600_multi.wait_for_busy_clear(dut)¶

soc.experiment.score6600_multi.wait_for_issue(dut, dut_issue)¶

soc.experiment.sim module¶

class soc.experiment.sim.MemSim(regwid, addrw)¶

Bases: object

ld(addr)¶

st(addr, data)¶

class soc.experiment.sim.RegSim(rwidth, nregs)¶

Bases: object

check(dut)¶

dump(dut)¶

op(op, op_imm, imm, src1, src2, dest)¶

setval(dest, val)¶

soc.experiment.testmem module¶

class soc.experiment.testmem.TestMemory(regwid, addrw, granularity=None, init=True, readonly=False)¶

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)¶

ports()¶

soc.experiment.wb_types module¶

wb_types

Based on wishbone_types.vhdl from Anton Blanchard’s microwatt.

soc.experiment.wb_types.WBAddrType()¶

soc.experiment.wb_types.WBDataType()¶

class soc.experiment.wb_types.WBIOMasterOut(name=None)¶: Bases: nmutil.iocontrol.RecordObject

class soc.experiment.wb_types.WBIOSlaveOut(name=None)¶: Bases: nmutil.iocontrol.RecordObject

soc.experiment.wb_types.WBIOSlaveOutInit()¶

class soc.experiment.wb_types.WBMasterOut(name=None)¶: Bases: nmutil.iocontrol.RecordObject

soc.experiment.wb_types.WBMasterOutInit()¶

soc.experiment.wb_types.WBMasterOutVector()¶

soc.experiment.wb_types.WBSelType()¶

class soc.experiment.wb_types.WBSlaveOut(name=None)¶: Bases: nmutil.iocontrol.RecordObject

soc.experiment.wb_types.WBSlaveOutInit()¶

soc.experiment.wb_types.WBSlaveOutVector()¶

Module contents¶