ieee754.part_mul_add package

Submodules

ieee754.part_mul_add.adder module

Partitioned Integer Addition.

See:

  • https://libre-riscv.org/3d_gpu/architecture/dynamic_simd/add/

class ieee754.part_mul_add.adder.FullAdder(width)

Bases: nmigen.hdl.ir.Elaboratable

Full Adder.

Attribute in0: the first input
Attribute in1: the second input
Attribute in2: the third input
Attribute sum: the sum output
Attribute carry: the carry output

Rather than do individual full adders (and have an array of them, which would be very slow to simulate), this module can specify the bit width of the inputs and outputs: in effect it performs multiple Full 3-2 Add operations “in parallel”.
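The word-wide 3-2 behaviour described above can be modelled with plain Python integers. This is only a sketch of the textbook full-adder equations applied to every bit position at once, not the nmigen implementation; the function name full_adder_3to2 is hypothetical.

```python
def full_adder_3to2(in0, in1, in2, width):
    """Model `width` independent full adders, one per bit position.

    Returns (sum, carry). Carry bit i is the carry OUT of bit i, so the
    arithmetic total of the three inputs is sum + (carry << 1).
    """
    mask = (1 << width) - 1
    s = (in0 ^ in1 ^ in2) & mask                           # per-bit sum
    c = ((in0 & in1) | (in1 & in2) | (in2 & in0)) & mask   # per-bit majority
    return s, c


s, c = full_adder_3to2(0b1011, 0b0110, 0b1101, width=4)
# Three operands have been compressed into two without changing the total.
assert s + (c << 1) == 0b1011 + 0b0110 + 0b1101
```

Because no carry ripples between bit positions, all bits are computed in the same gate delay, which is why this structure suits reduction trees.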

elaborate(platform)

Elaborate this module.

class ieee754.part_mul_add.adder.MaskedFullAdder(width)

Bases: nmigen.hdl.ir.Elaboratable

Masked Full Adder.

Attribute mask: the carry partition mask
Attribute in0: the first input
Attribute in1: the second input
Attribute in2: the third input
Attribute sum: the sum output
Attribute mcarry: the masked carry output

FullAdders are always used with a “mask” on the output. To keep the graphviz “clean”, this class performs the masking here rather than inside a large for-loop.

See the following discussion as to why this is no longer derived from FullAdder. Each carry is shifted here before being ANDed with the mask, so that an AOI cell may be used (which is more gate-efficient):

  • https://en.wikipedia.org/wiki/AND-OR-Invert
  • https://groups.google.com/d/msg/comp.arch/fcq-GLQqvas/vTxmcA0QAgAJ
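As an illustration of "shift first, then AND with the mask", here is a plain-Python model. It assumes the usual majority-function carry and a mask whose bits are clear at enabled partition boundaries; the name masked_full_adder and the exact ordering of terms are assumptions, not the module's code.

```python
def masked_full_adder(in0, in1, in2, mask, width):
    """Return (sum, mcarry); mcarry is already shifted left by one bit."""
    wmask = (1 << width) - 1
    s = (in0 ^ in1 ^ in2) & wmask
    # Shift each input up by one so that every pairwise AND lands on the
    # bit position that the carry feeds into, then gate it with the mask.
    s0 = (in0 << 1) & wmask
    s1 = (in1 << 1) & wmask
    s2 = (in2 << 1) & wmask
    mcarry = (s0 & s1 & mask) | (s1 & s2 & mask) | (s2 & s0 & mask)
    return s, mcarry


# With an all-ones mask nothing is blocked: sum + mcarry is the true total.
s, mc = masked_full_adder(3, 5, 6, mask=0xFF, width=8)
assert s + mc == 3 + 5 + 6

# Clearing mask bit 4 stops the carry from bit 3 crossing into bit 4.
assert masked_full_adder(0x08, 0x08, 0, mask=0xEF, width=8) == (0, 0)
```

Each term is a three-input AND feeding an OR, which is the shape an AND-OR-Invert cell implements.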

elaborate(platform)

Elaborate this module.

class ieee754.part_mul_add.adder.PartitionedAdder(width, part_pts, partition_step=1)

Bases: nmigen.hdl.ir.Elaboratable

Partitioned Adder.

Performs the final add. The partition points are included in the actual add (in one of the operands only), which causes a carry over to the next bit. Then the final output removes the extra bits from the result.

In the case of no carry:

    partition: .... P... P... P... P... (32 bits)
    a        : .... .... .... .... .... (32 bits)
    b        : .... .... .... .... .... (32 bits)
    exp-a    : ....P....P....P....P.... (32+4 bits, P=1 if no partition)
    exp-b    : ....0....0....0....0.... (32 bits plus 4 zeros)
    exp-o    : ....xN...xN...xN...xN... (32+4 bits - x to be discarded)
    o        : .... N... N... N... N... (32 bits - x ignored, N is carry-over)

However, with carry the behavior is a little different:

    partition:      p    p    p    p      (4 bits)
    carry-in :      c    c    c    c    c (5 bits)
    C = c & P:      C    C    C    C    c (5 bits)
    I = P=>c :      I    I    I    I    c (5 bits)
    a        :  AAAA AAAA AAAA AAAA AAAA  (32 bits)
    b        :  BBBB BBBB BBBB BBBB BBBB  (32 bits)
    exp-a    : 0AAAACAAAACAAAACAAAACAAAAc (32+4+2 bits, P=1 if no partition)
    exp-b    : 0BBBBIBBBBIBBBBIBBBBIBBBBc (32+2 bits plus 4 zeros)
    exp-o    : o....oN...oN...oN...oN...x (32+4+2 bits - x to be discarded)
    o        :  .... N... N... N... N...  (32 bits - x ignored, N is carry-over)
    carry-out: o    o    o    o    o      (5 bits)

A couple of differences should be noted:
  • The expanded a/b/o have 2 extra bits added to them. These bits allow the carry-in for the least significant partition to be injected, and the carry out for the most significant partition to be extracted.
  • The partition bits P and 0 in the first example have been replaced with bits C and I. Bits C and I are set to 1 when there is a partition and a carry-in at that position. This has the effect of creating a carry at that position in the expanded adder, while preventing carries from the previous partition from propagating through to the next. These bits are also used to extract the carry-out information for each partition, as when there is a carry out in a partition, the next most significant partition bit will be set to 1.

Additionally, the carry-out bits must be rearranged before being output to move the most significant carry bit for each partition into the least significant bit for that partition, as well as to ignore the other carry bits in that partition. This is accomplished by the MoveMSBDown module.
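The no-carry scheme (the first diagram) can be sketched in plain Python as follows. This models only the idea of inserting one extra bit per partition point, adding once, and dropping the extra bits; it ignores carry-in and carry-out entirely. The function name partitioned_add and the use of a plain dict of booleans in place of Signals are assumptions made for illustration.

```python
def partitioned_add(a, b, width, part_pts):
    """part_pts maps bit index -> True when the lanes are split there."""
    exp_a = exp_b = 0
    pos = 0        # next free bit position in the expanded operands
    keep = []      # expanded positions that hold real data bits
    for i in range(width):
        if i in part_pts:
            # Extra bit in exp-a only: 1 lets a carry ripple through to the
            # next lane (no partition), 0 absorbs it (partition enabled).
            if not part_pts[i]:
                exp_a |= 1 << pos
            pos += 1
        exp_a |= ((a >> i) & 1) << pos
        exp_b |= ((b >> i) & 1) << pos
        keep.append(pos)
        pos += 1
    exp_o = exp_a + exp_b            # one ordinary wide add
    out = 0
    for i, p in enumerate(keep):     # discard the extra bits
        out |= ((exp_o >> p) & 1) << i
    return out


# Two 8-bit lanes: the carry out of the low byte is dropped.
assert partitioned_add(0x12FF, 0x3401, 16, {8: True}) == 0x4600
# One 16-bit lane: the same carry propagates into the high byte.
assert partitioned_add(0x12FF, 0x3401, 16, {8: False}) == 0x4700
```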

Attribute width: the bit width of the input and output. Read-only.
Attribute a: the first input to the adder
Attribute b: the second input to the adder
Attribute output: the sum output
Attribute part_pts: the input partition points. Modification not supported, except for by Signal.eq.
elaborate(platform)

Elaborate this module.

ieee754.part_mul_add.mul_pipe module

Integer Multiplication.

class ieee754.part_mul_add.mul_pipe.AllTermsPipe(pspec, n_inputs)

Bases: nmutil.pipemodbase.PipeModBaseChain

get_chain()

gets module

class ieee754.part_mul_add.mul_pipe.MulPipe_8_16_32_64(id_wid=0, op_wid=0)

Bases: nmutil.singlepipe.ControlBase

Signed/Unsigned 8/16/32/64-bit partitioned integer multiplier pipeline

elaborate(platform)

handles case where stage has dynamic ready/valid functions

ispec()
ospec()
class ieee754.part_mul_add.mul_pipe.MulStages(pspec, part_pts)

Bases: nmutil.pipemodbase.PipeModBaseChain

get_chain()

ieee754.part_mul_add.multiply module

Integer Multiplication.

class ieee754.part_mul_add.multiply.AddReduce(inputs, output_width, register_levels, part_pts, part_ops, partition_step=1)

Bases: ieee754.part_mul_add.multiply.AddReduceInternal, nmigen.hdl.ir.Elaboratable

Recursively Add list of numbers together.

Attribute inputs: input Signals to be summed. Modification not supported, except for by Signal.eq.
Attribute register_levels: List of nesting levels that should have pipeline registers.
Attribute output: output sum.
Attribute partition_points: the input partition points. Modification not supported, except for by Signal.eq.
elaborate(platform)

Elaborate this module.

static get_max_level(input_count)
static next_register_levels(register_levels)

Iterable of register_levels for next recursive level.

class ieee754.part_mul_add.multiply.AddReduceData(part_pts, n_inputs, output_width, n_parts)

Bases: object

eq(rhs)
eq_from(part_pts, inputs, part_ops)
class ieee754.part_mul_add.multiply.AddReduceInternal(pspec, n_inputs, part_pts, partition_step=1)

Bases: object

Iteratively Add list of numbers together.

Attribute inputs: input Signals to be summed. Modification not supported, except for by Signal.eq.
Attribute register_levels: List of nesting levels that should have pipeline registers.
Attribute output: output sum.
Attribute partition_points: the input partition points. Modification not supported, except for by Signal.eq.
create_levels()

creates reduction levels

class ieee754.part_mul_add.multiply.AddReduceSingle(pspec, lidx, n_inputs, partition_points, partition_step=1)

Bases: nmutil.pipemodbase.PipeModBase

Add list of numbers together.

Attribute inputs: input Signals to be summed. Modification not supported, except for by Signal.eq.
Attribute register_levels: List of nesting levels that should have pipeline registers.
Attribute output: output sum.
Attribute partition_points: the input partition points. Modification not supported, except for by Signal.eq.
static calc_n_inputs(n_inputs, groups)
create_next_terms()

create next intermediate terms, for linking up in elaborate, below

elaborate(platform)

Elaborate this module.

static full_adder_groups(input_count)

Get inputs indices for which a full adder should be built.

static get_max_level(input_count)

Get the maximum level.

All register_levels must be less than or equal to the maximum level.
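One plausible reading of these two helpers, sketched in plain Python: each reduction level takes inputs three at a time, every complete group of three becomes two terms (a sum and a carry), and leftover inputs pass through unchanged. The grouping rule and the function bodies below are assumptions for illustration; the actual static methods may differ.

```python
def full_adder_groups(input_count):
    """Start index of each complete group of three inputs (assumed rule)."""
    return range(0, input_count - 2, 3)


def max_level(input_count):
    """Number of 3-2 reduction levels needed until no group of three remains."""
    levels = 0
    while True:
        groups = full_adder_groups(input_count)
        if len(groups) == 0:
            return levels
        # Each group of 3 shrinks to 2 terms; up to 2 leftovers carry over.
        input_count = input_count % 3 + 2 * len(groups)
        levels += 1


assert list(full_adder_groups(7)) == [0, 3]   # inputs 0-2 and 3-5; input 6 is left over
assert max_level(2) == 0                      # two terms go straight to the final add
assert max_level(9) == 4                      # 9 -> 6 -> 4 -> 3 -> 2
```

Under this reading, a register_levels entry would name one of the levels 0..max_level at which a pipeline register is placed.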

ispec()
ospec()
class ieee754.part_mul_add.multiply.AllTerms(pspec, n_inputs)

Bases: nmutil.pipemodbase.PipeModBase

Set of terms to be added together

elaborate(platform)
ispec()
ospec()
class ieee754.part_mul_add.multiply.FinalAdd(pspec, lidx, n_inputs, partition_points, partition_step=1)

Bases: nmutil.pipemodbase.PipeModBase

Final stage of add reduce

elaborate(platform)

Elaborate this module.

ispec()
ospec()
class ieee754.part_mul_add.multiply.FinalOut(pspec, part_pts)

Bases: nmutil.pipemodbase.PipeModBase

selects the final output based on the partitioning.

each byte is selectable independently, i.e. it is possible that some partitions requested 8-bit computation whilst others requested 16 or 32 bit.

elaborate(platform)
ispec()
ospec()
class ieee754.part_mul_add.multiply.FinalReduceData(part_pts, output_width, n_parts)

Bases: object

eq(rhs)
eq_from(part_pts, output, part_ops)
class ieee754.part_mul_add.multiply.InputData

Bases: object

eq(rhs)
eq_from(part_pts, a, b, part_ops)
class ieee754.part_mul_add.multiply.IntermediateData(part_pts, output_width, n_parts)

Bases: object

eq(rhs)
eq_from(part_pts, outputs, intermediate_output, part_ops)
class ieee754.part_mul_add.multiply.IntermediateOut(width, out_wid, n_parts)

Bases: nmigen.hdl.ir.Elaboratable

selects the HI/LO part of the multiplication for a given bit-width. the output is also reconstructed in its SIMD (partition) lanes.

elaborate(platform)
class ieee754.part_mul_add.multiply.Intermediates(pspec, part_pts)

Bases: nmutil.pipemodbase.PipeModBase

Intermediate output modules

elaborate(platform)
ispec()
ospec()
class ieee754.part_mul_add.multiply.LSBNegTerm(bit_width)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
class ieee754.part_mul_add.multiply.Mul8_16_32_64(register_levels=())

Bases: nmigen.hdl.ir.Elaboratable

Signed/Unsigned 8/16/32/64-bit partitioned integer multiplier.

XXX NOTE: this class is intended for unit test purposes ONLY.

Supports partitioning into any combination of 8, 16, 32, and 64-bit partitions on naturally-aligned boundaries. Supports the operation being set for each partition independently.

Attribute part_pts: the input partition points. Has a partition point at multiples of 8 in 0 < i < 64. Each partition point’s associated Value is a Signal. Modification not supported, except for by Signal.eq.

Attribute part_ops: the operation for each byte. The operation for a particular partition is selected by assigning the selected operation code to each byte in the partition. The allowed operation codes are:

  • OP_MUL_LOW: the LSB half of the product. Equivalent to RISC-V’s mul instruction.
  • OP_MUL_SIGNED_HIGH: the MSB half of the product where both a and b are signed. Equivalent to RISC-V’s mulh instruction.
  • OP_MUL_SIGNED_UNSIGNED_HIGH: the MSB half of the product where a is signed and b is unsigned. Equivalent to RISC-V’s mulhsu instruction.
  • OP_MUL_UNSIGNED_HIGH: the MSB half of the product where both a and b are unsigned. Equivalent to RISC-V’s mulhu instruction.
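For reference, the four operations can be modelled for a single n-bit partition in plain Python. The string keys below are stand-ins for the OP_MUL_* codes, whose numeric values are not given here, and the helper names are hypothetical.

```python
def to_signed(x, n):
    """Interpret the n-bit pattern x as a two's complement integer."""
    return x - (1 << n) if x & (1 << (n - 1)) else x


def mul_op(op, a, b, n):
    """Reference result of one partition; a and b are n-bit patterns."""
    mask = (1 << n) - 1
    if op == "mul":                       # OP_MUL_LOW
        return (a * b) & mask
    if op == "mulh":                      # OP_MUL_SIGNED_HIGH
        prod = to_signed(a, n) * to_signed(b, n)
    elif op == "mulhsu":                  # OP_MUL_SIGNED_UNSIGNED_HIGH
        prod = to_signed(a, n) * b
    elif op == "mulhu":                   # OP_MUL_UNSIGNED_HIGH
        prod = a * b
    else:
        raise ValueError(op)
    # Python's >> on negative ints is arithmetic, so masking afterwards
    # yields the two's complement bit pattern of the high half.
    return (prod >> n) & mask


# 8-bit example with a = b = 0xFF (255 unsigned, -1 signed).
assert mul_op("mul", 0xFF, 0xFF, 8) == 0x01
assert mul_op("mulhu", 0xFF, 0xFF, 8) == 0xFE
assert mul_op("mulh", 0xFF, 0xFF, 8) == 0x00
assert mul_op("mulhsu", 0xFF, 0xFF, 8) == 0xFF
```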
elaborate(platform)
ispec()
ospec()
class ieee754.part_mul_add.multiply.OrMod(wid)

Bases: nmigen.hdl.ir.Elaboratable

ORs four values together in a hierarchical tree

elaborate(platform)
class ieee754.part_mul_add.multiply.OutputData

Bases: object

eq(rhs)
class ieee754.part_mul_add.multiply.Part(part_pts, width, n_parts, pbwid)

Bases: nmigen.hdl.ir.Elaboratable

a key class which, depending on the partitioning, will determine what action to take when parts of the output are signed or unsigned.

this requires 2 pieces of data per operand, per partition: whether the MSB is HI/LO (per partition!), and whether a signed or unsigned operation has been requested.

once that is determined, signed is basically carried out by splitting 2’s complement into 1’s complement plus one. 1’s complement is just a bit-inversion.

the extra terms - as separate terms - are then thrown at the AddReduce alongside the multiplication part-results.
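The arithmetic behind those extra terms can be checked in plain Python. For an n-bit operand whose sign bit is set, the signed value is the unsigned value minus 2^n, so the unsigned product must be corrected by subtracting the other operand shifted left by n. Writing that subtraction as "invert, then add one" gives two separate addends. This is a sketch of the identity only, with hypothetical names; it does not reproduce how the Part class wires its terms.

```python
def to_signed(x, n):
    return x - (1 << n) if x & (1 << (n - 1)) else x


def product_with_terms(a, b, n, a_signed, b_signed):
    """2n-bit product built from the unsigned product plus correction terms."""
    mask = (1 << n) - 1
    terms = [a * b]                            # plain unsigned product
    if a_signed and (a >> (n - 1)) & 1:
        terms.append(((~b) & mask) << n)       # 1's complement of b, shifted
        terms.append(1 << n)                   # the "plus one"
    if b_signed and (b >> (n - 1)) & 1:
        terms.append(((~a) & mask) << n)
        terms.append(1 << n)
    return sum(terms) & ((1 << (2 * n)) - 1)   # everything is just added


# -128 * 2 = -256, which is 0xFF00 as a 16-bit pattern.
assert product_with_terms(0x80, 0x02, 8, True, True) == 0xFF00
```

Keeping the inverted operand and the "+1" as independent addends means no dedicated negation adder is needed; the reduction tree absorbs them.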

elaborate(platform)
class ieee754.part_mul_add.multiply.Parts(pbwid, part_pts, n_parts)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
class ieee754.part_mul_add.multiply.ProductTerm(width, twidth, pbwid, a_index, b_index)

Bases: nmigen.hdl.ir.Elaboratable

this class creates a single product term (a[..]*b[..]). it has a design flaw: it is the output that is selected, so the multiplication(s) are combinatorially generated all the time.

elaborate(platform)
class ieee754.part_mul_add.multiply.ProductTerms(width, twidth, pbwid, a_index, blen)

Bases: nmigen.hdl.ir.Elaboratable

creates a bank of product terms. also performs the actual bit-selection. this class is to be wrapped with a for-loop on the “a” operand. it creates a second-level for-loop on the “b” operand.

elaborate(platform)
class ieee754.part_mul_add.multiply.Signs

Bases: nmigen.hdl.ir.Elaboratable

determines whether a or b are signed numbers based on the required operation type (OP_MUL_*)

elaborate(platform)
ieee754.part_mul_add.multiply.get_term(value, shift=0, enabled=None)

ieee754.part_mul_add.multiplytmp module

Integer Multiplication.

class ieee754.part_mul_add.multiplytmp.AddReduce(inputs, output_width, register_levels, partition_points, part_ops)

Bases: nmigen.hdl.ir.Elaboratable

Recursively Add list of numbers together.

Attribute inputs: input Signals to be summed. Modification not supported, except for by Signal.eq.
Attribute register_levels: List of nesting levels that should have pipeline registers.
Attribute output: output sum.
Attribute partition_points: the input partition points. Modification not supported, except for by Signal.eq.
create_levels()

creates reduction levels

elaborate(platform)

Elaborate this module.

static get_max_level(input_count)
static next_register_levels(register_levels)

Iterable of register_levels for next recursive level.

class ieee754.part_mul_add.multiplytmp.AddReduceSingle(inputs, output_width, register_levels, partition_points, part_ops)

Bases: nmigen.hdl.ir.Elaboratable

Add list of numbers together.

Attribute inputs: input Signals to be summed. Modification not supported, except for by Signal.eq.
Attribute register_levels: List of nesting levels that should have pipeline registers.
Attribute output: output sum.
Attribute partition_points: the input partition points. Modification not supported, except for by Signal.eq.
create_next_terms()
elaborate(platform)

Elaborate this module.

static full_adder_groups(input_count)

Get inputs indices for which a full adder should be built.

static get_max_level(input_count)

Get the maximum level.

All register_levels must be less than or equal to the maximum level.

class ieee754.part_mul_add.multiplytmp.FinalOut(out_wid)

Bases: nmigen.hdl.ir.Elaboratable

selects the final output based on the partitioning.

each byte is selectable independently, i.e. it is possible that some partitions requested 8-bit computation whilst others requested 16 or 32 bit.

elaborate(platform)
class ieee754.part_mul_add.multiplytmp.FullAdder(width)

Bases: nmigen.hdl.ir.Elaboratable

Full Adder.

Attribute in0: the first input
Attribute in1: the second input
Attribute in2: the third input
Attribute sum: the sum output
Attribute carry: the carry output

Rather than do individual full adders (and have an array of them, which would be very slow to simulate), this module can specify the bit width of the inputs and outputs: in effect it performs multiple Full 3-2 Add operations “in parallel”.

elaborate(platform)

Elaborate this module.

class ieee754.part_mul_add.multiplytmp.IntermediateOut(width, out_wid, n_parts)

Bases: nmigen.hdl.ir.Elaboratable

selects the HI/LO part of the multiplication for a given bit-width. the output is also reconstructed in its SIMD (partition) lanes.

elaborate(platform)
class ieee754.part_mul_add.multiplytmp.LSBNegTerm(bit_width)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
class ieee754.part_mul_add.multiplytmp.MaskedFullAdder(width)

Bases: nmigen.hdl.ir.Elaboratable

Masked Full Adder.

Attribute mask: the carry partition mask
Attribute in0: the first input
Attribute in1: the second input
Attribute in2: the third input
Attribute sum: the sum output
Attribute mcarry: the masked carry output

FullAdders are always used with a “mask” on the output. To keep the graphviz “clean”, this class performs the masking here rather than inside a large for-loop.

See the following discussion as to why this is no longer derived from FullAdder. Each carry is shifted here before being ANDed with the mask, so that an AOI cell may be used (which is more gate-efficient):

  • https://en.wikipedia.org/wiki/AND-OR-Invert
  • https://groups.google.com/d/msg/comp.arch/fcq-GLQqvas/vTxmcA0QAgAJ

elaborate(platform)

Elaborate this module.

class ieee754.part_mul_add.multiplytmp.Mul8_16_32_64(register_levels=())

Bases: nmigen.hdl.ir.Elaboratable

Signed/Unsigned 8/16/32/64-bit partitioned integer multiplier.

Supports partitioning into any combination of 8, 16, 32, and 64-bit partitions on naturally-aligned boundaries. Supports the operation being set for each partition independently.

Attribute part_pts: the input partition points. Has a partition point at multiples of 8 in 0 < i < 64. Each partition point’s associated Value is a Signal. Modification not supported, except for by Signal.eq.

Attribute part_ops: the operation for each byte. The operation for a particular partition is selected by assigning the selected operation code to each byte in the partition. The allowed operation codes are:

  • OP_MUL_LOW: the LSB half of the product. Equivalent to RISC-V’s mul instruction.
  • OP_MUL_SIGNED_HIGH: the MSB half of the product where both a and b are signed. Equivalent to RISC-V’s mulh instruction.
  • OP_MUL_SIGNED_UNSIGNED_HIGH: the MSB half of the product where a is signed and b is unsigned. Equivalent to RISC-V’s mulhsu instruction.
  • OP_MUL_UNSIGNED_HIGH: the MSB half of the product where both a and b are unsigned. Equivalent to RISC-V’s mulhu instruction.
elaborate(platform)
class ieee754.part_mul_add.multiplytmp.OrMod(wid)

Bases: nmigen.hdl.ir.Elaboratable

ORs four values together in a hierarchical tree

elaborate(platform)
class ieee754.part_mul_add.multiplytmp.Part(epps, width, n_parts, n_levels, pbwid)

Bases: nmigen.hdl.ir.Elaboratable

a key class which, depending on the partitioning, will determine what action to take when parts of the output are signed or unsigned.

this requires 2 pieces of data per operand, per partition: whether the MSB is HI/LO (per partition!), and whether a signed or unsigned operation has been requested.

once that is determined, signed is basically carried out by splitting 2’s complement into 1’s complement plus one. 1’s complement is just a bit-inversion.

the extra terms - as separate terms - are then thrown at the AddReduce alongside the multiplication part-results.

elaborate(platform)
class ieee754.part_mul_add.multiplytmp.PartitionPoints(partition_points=None)

Bases: dict

Partition points and corresponding Values.

The points at where an ALU is partitioned along with Values that specify if the corresponding partition points are enabled.

For example: {1: True, 5: True, 10: True} with width == 16 specifies that the ALU is split into 4 sections:

  • bits 0 <= i < 1
  • bits 1 <= i < 5
  • bits 5 <= i < 10
  • bits 10 <= i < 16

If the partition_points were instead {1: True, 5: a, 10: True} where a is a 1-bit Signal:

  • If a is asserted:
    • bits 0 <= i < 1
    • bits 1 <= i < 5
    • bits 5 <= i < 10
    • bits 10 <= i < 16
  • Otherwise:
    • bits 0 <= i < 1
    • bits 1 <= i < 10
    • bits 10 <= i < 16
as_mask(width)

Create a bit-mask from self.

Each bit in the returned mask is clear only if the partition point at the same bit-index is enabled.

Parameters: width – the bit width of the resulting mask
eq(rhs)

Assign PartitionPoints using Signal.eq.

fits_in_width(width)

Check if all partition points are smaller than width.

get_max_partition_count(width)

Get the maximum number of partitions.

Gets the number of partitions when all partition points are enabled.

like(name=None, src_loc_at=0, mul=1)

Create a new PartitionPoints with Signals for all values.

Parameters:
  • name – the base name for the new Signals.
  • mul – a multiplication factor on the indices
part_byte(index, mfactor=1)
class ieee754.part_mul_add.multiplytmp.PartitionedAdder(width, partition_points)

Bases: nmigen.hdl.ir.Elaboratable

Partitioned Adder.

Performs the final add. The partition points are included in the actual add (in one of the operands only), which causes a carry over to the next bit. Then the final output removes the extra bits from the result.

    partition: .... P... P... P... P... (32 bits)
    a        : .... .... .... .... .... (32 bits)
    b        : .... .... .... .... .... (32 bits)
    exp-a    : ....P....P....P....P.... (32+4 bits, P=1 if no partition)
    exp-b    : ....0....0....0....0.... (32 bits plus 4 zeros)
    exp-o    : ....xN...xN...xN...xN... (32+4 bits - x to be discarded)
    o        : .... N... N... N... N... (32 bits - x ignored, N is carry-over)

Attribute width: the bit width of the input and output. Read-only.
Attribute a: the first input to the adder
Attribute b: the second input to the adder
Attribute output: the sum output
Attribute partition_points: the input partition points. Modification not supported, except for by Signal.eq.
elaborate(platform)

Elaborate this module.

class ieee754.part_mul_add.multiplytmp.Parts(pbwid, epps, n_parts)

Bases: nmigen.hdl.ir.Elaboratable

elaborate(platform)
class ieee754.part_mul_add.multiplytmp.ProductTerm(width, twidth, pbwid, a_index, b_index)

Bases: nmigen.hdl.ir.Elaboratable

this class creates a single product term (a[..]*b[..]). it has a design flaw: it is the output that is selected, so the multiplication(s) are combinatorially generated all the time.

elaborate(platform)
class ieee754.part_mul_add.multiplytmp.ProductTerms(width, twidth, pbwid, a_index, blen)

Bases: nmigen.hdl.ir.Elaboratable

creates a bank of product terms. also performs the actual bit-selection. this class is to be wrapped with a for-loop on the “a” operand. it creates a second-level for-loop on the “b” operand.

elaborate(platform)
class ieee754.part_mul_add.multiplytmp.Signs

Bases: nmigen.hdl.ir.Elaboratable

determines whether a or b are signed numbers based on the required operation type (OP_MUL_*)

elaborate(platform)
ieee754.part_mul_add.multiplytmp.get_term(value, shift=0, enabled=None)

ieee754.part_mul_add.partpoints module

Integer Multiplication.

class ieee754.part_mul_add.partpoints.PartitionPoints(partition_points=None)

Bases: dict

Partition points and corresponding Values.

The points at where an ALU is partitioned along with Values that specify if the corresponding partition points are enabled.

For example: {1: True, 5: True, 10: True} with width == 16 specifies that the ALU is split into 4 sections:

  • bits 0 <= i < 1
  • bits 1 <= i < 5
  • bits 5 <= i < 10
  • bits 10 <= i < 16

If the partition_points were instead {1: True, 5: a, 10: True} where a is a 1-bit Signal:

  • If a is asserted:
    • bits 0 <= i < 1
    • bits 1 <= i < 5
    • bits 5 <= i < 10
    • bits 10 <= i < 16
  • Otherwise:
    • bits 0 <= i < 1
    • bits 1 <= i < 10
    • bits 10 <= i < 16
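The two examples above can be reproduced with a small plain-Python helper that treats each partition point's value as a fixed boolean rather than a Signal. The function name sections is hypothetical and is not part of the class.

```python
def sections(partition_points, width):
    """Return the (start, end) bit ranges implied by the enabled points."""
    enabled = sorted(i for i, on in partition_points.items() if on)
    bounds = [0] + enabled + [width]
    return list(zip(bounds[:-1], bounds[1:]))


# All three points enabled: four sections.
assert sections({1: True, 5: True, 10: True}, 16) == [(0, 1), (1, 5), (5, 10), (10, 16)]
# Point 5 driven low (the "Otherwise" case): bits 1..9 merge into one section.
assert sections({1: True, 5: False, 10: True}, 16) == [(0, 1), (1, 10), (10, 16)]
```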
as_mask(width, mul=1)

Create a bit-mask from self.

Each bit in the returned mask is clear only if the partition point at the same bit-index is enabled.

Parameters:
  • width – the bit width of the resulting mask
  • mul – a “multiplier” which in-place expands the partition points; typically set to “2” when used for multipliers
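A plain-Python model of the mask rule stated above, covering only the default mul=1 case and using fixed booleans instead of Signals. The function name as_mask_model is hypothetical, and how mul rescales the indices is deliberately not modelled.

```python
def as_mask_model(partition_points, width):
    """Bit i is 0 only when a partition point exists at i and is enabled."""
    mask = 0
    for i in range(width):
        if not partition_points.get(i, False):
            mask |= 1 << i
    return mask


# Points at bits 1, 5 and 10, all enabled: those three bits are clear.
assert as_mask_model({1: True, 5: True, 10: True}, 16) == 0xFBDD
# Disabling the point at bit 5 sets that mask bit again.
assert as_mask_model({1: True, 5: False, 10: True}, 16) == 0xFBFD
```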
as_sig()

Create a straight concatenation of self signals

eq(rhs)

Assign PartitionPoints using Signal.eq.

fits_in_width(width)

Check if all partition points are smaller than width.

get_max_partition_count(width)

Get the maximum number of partitions.

Gets the number of partitions when all partition points are enabled.

like(name=None, src_loc_at=0, mul=1)

Create a new PartitionPoints with Signals for all values.

Parameters:
  • name – the base name for the new Signals.
  • mul – a multiplication factor on the indices
part_byte(index, mfactor=1)
ieee754.part_mul_add.partpoints.make_partition(mask, width)

Creates partition points from a mask and a bit width. The mask is assumed to indicate breakpoints at regular intervals, so the last bit (MSB) of the mask is ignored.

mask len = 4, width == 16 will return:

{4: mask[0], 8: mask[1], 12: mask[2]}

mask len = 8, width == 64 will return:

{8: mask[0], 16: mask[1], 24: mask[2], …. 56: mask[6]}
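The two documented results are consistent with the following plain-Python sketch, which uses a list in place of a Signal mask. It assumes the spacing between breakpoints is width divided by the mask length; both examples happen to satisfy that, but the real function may compute the step differently. The name make_partition_sketch is hypothetical.

```python
def make_partition_sketch(mask, width):
    """Map evenly spaced bit positions to mask entries, skipping the last one."""
    step = width // len(mask)      # assumption: evenly spaced breakpoints
    return {step * (i + 1): mask[i] for i in range(len(mask) - 1)}


# Strings stand in for the individual mask bits so the mapping is visible.
pts = make_partition_sketch(["m0", "m1", "m2", "m3"], 16)
assert pts == {4: "m0", 8: "m1", 12: "m2"}
```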

Module contents