Assembler for imaginary single-instruction CPU

From Electriki
Jump to navigationJump to search


NOTE: this page is copied here because this is (sort of) electronics-related - the original page is here.

Macro-assembler for an imaginary single-instruction CPU

This page demonstrates a macro-assembler for a CPU that does not exist.

Background

I'm (slowly) making a hardware-implementation alongside a simple simulator for a 1-instruction CPU, simple enough to make from scratch, using only transistors and passives.

The main goal is to have fun along the way. Eventually, perhaps, this could be made into an educational tool - every signal is accessible, and there is no "black box" element, as is the case with pretty much any devboard I saw.

Because of the primitive nature of the CPU (1 instruction operating on 1 data-bit), there's a real need for an assembler to create compound statements to implement something resembling native instructions in "normal" CPUs.

CPU

From an assembly point of view, about the only thing worth mentioning is the way the CPU interfaces with the outside world, to give a bit of context for appreciating the semantics of the only instruction available.

Other hardware-properties such as timing, power-supply, subsystems and pin-out are not discussed here.

Summary

The CPU is based around a D-latch and a mux:

CPU

(Additional latches and signals are omitted for convenience. The "+1"-block is meant to increment a 16-bit word. Visible I/O lines are discussed in the following section.)

The latch reads and writes data, 1 bit at a time: it inverts its input-value, and outputs it on the same pin using a relatively high output-impedance.

The mux uses the latch-output/-state to decide between incremented program-counter and branch-address from a dedicated input: if the data-bit is clear after inverting, take the branch, else resume execution at the next instruction.

Peripheral memory

From the CPU's point of view, there are 3 buses as can be seen in the previous figure:

  • address-output, used for indexing a program-ROM
  • branch-address input: alternate next-program-counter value, depending on data-value
  • data-input/-output, used to interface to RAM

Note that, contrary to the ROM, the mentioned RAM is not indexed by the CPU itself.

Although it's possible to create simple state-machines or encoders using e.g. a switch on the 1-bit data-line and connecting address-output to branch-address input in a clever way, the intended use for the CPU is to cooperate with an external RAM as well as a ROM, where part of the ROM-data indexes the RAM:

CPU-system

NOTE: I screwed up the ROM's bitfields in this figure - lower 16 bits are connected to CPU's branch-address input, while higher 16 bits are connected to the RAM's address-input.

As can be seen, the ROM uses half of its data-width to index a RAM, the data of which connects back to the CPU. Reads and writes on the CPU's data-input/-output line thus take place on the RAM.

Bitness..?

To be honest, I don't know how to classify this thing in terms of bitness (i.e. "8-bit", "16-bit", etc.).

If bitness depends on the total width of data-lines going in or out, well, then this would be a 17-bit CPU (16 bits for branch-address input, and 1 bit for true data-input/-output).

Alternatively, one could probably say that the 16 branch-address input bits are a dedicated part of the addressing scheme - they can not be used for anything else. Furthermore, if the latch is considered the only item connected to the data-bus, then this would be a 1-bit CPU.

What do you think..?

Native instruction: IBC (Invert and Branch if Clear)

The only "native instruction" implements a conditional branch, depending on the state of data-input/-output after the latch has inverted it. (There is nothing else this CPU can do, and therefore, the term "instruction" may make little sense; the term "opcode" makes no sense at all here.)

I call this instruction IBC, with semantics "invert data-bit, and take branch if data-bit is clear/zero".

This is the only instruction implemented natively by the assembler; everything else has to come from user-defined macros.

If you play along at home, the format of the IBC-instruction in the examples below is:

ibc <data> <branch>

...where <data> evaluates to the location of the data to operate on (i.e. the bit to flip), and <branch> evaluates to the branch-address to take, in case the bit is clear/zero after inverting. Operands can be literal values, or variables/expressions or labels, respectively.

A program for this CPU ultimately consists only of operand fields for the branch-address input (<branch>) and RAM's address-input (<data>): 16 bits each, making for 32-bit program-words.

Assembler

Although possible, it quickly becomes tedious to program using only IBC. However, more complex pseudo-instructions can be defined in terms of IBC.

To do this, I made a macro-assembler in Tcl (wiki).

I find Tcl to be a brilliant language to implement DSLs, as I have done on a few occasions. The DSL used here and elsewhere is itself Tcl, exploiting its very minimal syntax, so parsing and allowing for inline Tcl-snippets come for free.

Using this tool, compound instructions can be defined as macros, to be instantiated or nested at will. Expansion of all macros in a program would yield a program containing only IBC-instructions.

The following sections give a summary of its features. This is by no means a professional or stable product - I'm just toying around, and pretty new to Tcl myself. Error-checking is currently not implemented, and testing has been pretty minimal so far.

Labels

Labels are placeholders for program-addresses.

Below follows a typical macro-definition, making use of a local label "done":

labels

The label can be referenced in branch-like compound instructions, or in IBC itself. Forward references are allowed. Labels are defined using a colon, e.g. ": here" (note the whitespace).

Note that variables - discussed hereafter - can be dereferenced using the normal Tcl "$", labels can not. This has to do with the fact that it's impossible to know the address corresponding to a label in a forward reference.

Variables

A variable can be declared inside a macro-body, before its use as alternative for literal data-address in an instruction. Variable-declaration is done using a period, e.g. ". mydata" (note the whitespace). Variables are typeless.

An example of declaration and use:

variables

Note that, since the DSL is Tcl, it's perfectly valid to introduce additional (helper-)variables using the normal "set" constructs.

Where "DSL-variables" (declared using a period) get assigned the next free data-address, it's possible to set "Tcl-variables" to arbitrary values, and use them as opcode-fields.

This can for example be useful for specifying address-constants or offsets.

Declaration of a DSL-variable results in a Tcl-variable with the same name coming into existence at that point. Therefore, there is no difference in referencing a DSL-variable or Tcl-variable: both can be used with the usual "$" notation, e.g. "$myvar".

Macro-arguments

Macros can take arguments, if so specified in the corresponding macro-definition.

Arguments can be literals, variables, expressions containing either or both, or labels (but not expressions using labels).

Use of an argument passed to a macro is shown below:

arguments

Expressions

Not much to say here - everything that's allowed in Tcl, is allowed here, since the DSL is Tcl.

Beware that it's not possible to use expressions containing labels.

Result/output and possible follow-up

At this point, assembler-output is a (debug-)dump of the program-memory resulting from expanding all macros, top-down. Generated program-memory contents start at offset 0.

In its current form, the assembler cannot be used for anything useful - output is only generated to verify it would produce a valid ROM-image, if it could.

Simulation/validation

The next step could be to make a simple PC-side simulator/validator for generated ROM-images (after assembly) and RAM-images (after running a program): expected behaviour of code-snippets could be described in some form, and checked against the generated data.

Examples of checks could be:

  • verify the program does what it should do - check data-memory after execution against expected values
  • verify the program doesn't access data-memory outside an expected region
  • compare raw ROM-images against previously generated versions as part of regression-testing
  • validate actual execution-time against expected values in terms of CPU-cycles

Such a tool is IMHO pretty much necessary before attempting to run programs on actual hardware - nobody needs assembler- or user-bugs at that point.

Higher-level language

Although assembly is fine for smaller programs and proof-of-concepts, I would like to make a higher-level language.

Given its limited code-space, combined with the extreme RISC nature of the CPU, it's unlikely that complex programs will ever be made for it.

This somehow puts a practical limit on the need for features in a higher-level language. For example, it probably makes little sense to implement complex types, classes, functional programming, memory-management, etc.

I played around with a small imperative language with either 4- or 8-bit fixed-type variables, simple looping-constructs, conditionals, and function-calls.

An interesting challenge is to simulate a call-stack. The CPU has nothing like indirect addressing or computed addresses. Something resembling a function-call mechanism with data-passing back and forth is possible, and it should be fun to see something come alive that now only exists on paper.

Oh well, we'll see.

Examples: compound instructions

As mentioned before, compound pseudo-instructions can be defined in terms of IBC.

I made a minimal instruction-set using macros, somehow resembling instructions on actual simple CPUs.

This instruction-set still operates on single bits at a time. This is inconvenient for any practical use. Therefore, it makes sense to implement a 4- or 8-bit instruction-set on top of these 1-bit instructions - it's just a matter of effort.

The current 1-bit set of pseudo-instructions is displayed below. An arrow indicates dependence, i.e. every instruction eventually depends on the canonical instruction IBC:

1-bit instructions

A few instructions "close to IBC" are displayed in grey. These instructions only make use of IBC:

    # Invert a bit.

    macro not1 reg {
    
                ibc1    $reg    done
        : done
    
    }



    # Set bit.

    macro set1 reg {
    
        : repeat
                ibc1    $reg    repeat
    }



    # Clear and unconditionally branch.

    macro cb1 reg branch {
    
        ibc1    $reg    $branch
        ibc1    $reg    $branch
    }

Apart from that, they are in no way special.

For each of the following examples, a sub-graph of instruction-dependency is given. The implemented compound-instruction is coloured green, IBC-instruction is coloured yellow, and direct dependencies of the implemented instruction in question are given in salmon.

salmon-coloured shirt

Ross agrees - salmon is definitely not pink!

B: unconditional Branch

b-instruction

    macro b1 branch {
    
        . tmp
    
        cb1     $tmp    $branch
    }

BC: Branch if bit is Clear

bc-instruction

    macro bc1 reg branch {
    
        not1    $reg 
        ibc1    $reg    $branch
    }

CLR: clear bit

clr-instruction

    macro clr1 reg {
    
                ibc1    $reg    done
                not1    $reg  
        : done
    }

AND: bitwise AND

and-instruction

    macro and1 reg mask {
    
                ibc1    $mask   x
                not1    $mask
                cb1     $reg    done
        : x
                not1    $mask
        : done
    }

OR: bitwise OR

or-instruction

    macro or1 reg mask {
    
                ibc1    $mask   x
                ibc1    $mask   done
        : x
                not1    $mask
                set1    $reg 
        : done
    }

MOV: move (copy) bit

mov-instruction

    macro mov1 from to {
    
                ibc1    $from   a
                cb1     $to     b
        : a
                set1    $to
        : b
                not1    $from
    }

BS: Branch if bit is Set

bs-instruction

    macro bs1 reg branch {
    
                bc1     $reg    done
                b1              $branch
        : done
    }

XOR: bitwise XOR

xor-instruction

    macro xor1 reg mask {
    
        . tmp1
        . tmp2
    
        mov1    $reg    $tmp1
        not1    $tmp1           ;#  tmp1 = ~reg
        and1    $tmp1   $mask   ;#  tmp1 = ~reg & mask
    
        mov1    $mask   $tmp2
        not1    $tmp2           ;#  tmp2 = ~mask
        and1    $reg    $tmp2   ;#  reg := reg & ~mask
    
        or1     $reg    $tmp1   ;#  reg := ( reg & ~mask ) | ( ~reg & mask )
    }

ADD: full adder

add-instruction

    macro add1 ci a b sum co {
    
        . tmp1
        . tmp2
    
        # Calculate sum = a ^ b ^ ci
    
        mov1    $a      $tmp1
        xor1    $tmp1   $b      ;#  tmp1 = a ^ b
        mov1    $tmp1   $sum
        xor1    $sum    $ci     ;#  sum = a ^ b ^ ci
    
        # Calculate carry-out = ( a & b ) | ( ci & ( a ^ b ) )
    
        mov1    $a      $tmp2
        and1    $tmp2   $b      ;#  tmp2 = a & b
        mov1    $ci     $co
        and1    $co     $tmp1   ;#  co = ci & ( a ^ b )
        or1     $co     $tmp2   ;#  co = ( a & b ) | ( ci & ( a ^ b ) )
    }

Assembling an example

An example code-snippet is given below:

    source 1bit.asm
    
    main {
    
        . ci
        . a
        . b
        . sum
        . co
    
        add1  $ci $a $b $sum $co  
    }

As can be seen, a single full-adder is instantiated.

All the 1-bit pseudo-instructions implemented in the previous examples are bundled up and included through file "1bit.asm". Macro-names are all suffixed "1" (e.g. "add" becomes "add1"), to indicate 1-bit width.

Assembling goes as follows:

    $ ./mas.tcl in.asm > out.lst

(Debug-)output looks like this (after splitting text into multiple columns):

    0000: 0001 0003  |  0010: 0002 0013  |  0020: 0003 0023  |  0030: 0008 0033  |  0040: 0006 0042  |  0050: 0004 0052
    0001: 0005 0004  |  0011: 0008 0014  |  0021: 0003 0023  |  0031: 0008 0033  |  0041: 0006 0041  |  0051: 0005 0052
    0002: 0005 0004  |  0012: 0008 0014  |  0022: 0003 0022  |  0032: 0008 0032  |  0042: 0001 0043  |  0052: 0006 0054
    0003: 0005 0003  |  0013: 0008 0013  |  0023: 0005 0024  |  0033: 0000 0034  |  0043: 0002 0047  |  0053: 0006 0056
    0004: 0001 0005  |  0014: 0002 0015  |  0024: 0003 0027  |  0034: 0008 0035  |  0044: 0002 0045  |  0054: 0006 0055
    0005: 0005 0008  |  0015: 0008 0016  |  0025: 0007 0028  |  0035: 0008 0039  |  0045: 0006 0048  |  0055: 0004 0055
    0006: 0007 0009  |  0016: 0008 001a  |  0026: 0007 0028  |  0036: 0008 0037  |  0046: 0006 0048  |  
    0007: 0007 0009  |  0017: 0008 0018  |  0027: 0007 0027  |  0037: 0003 003a  |  0047: 0002 0048  |  
    0008: 0007 0008  |  0018: 0005 001b  |  0028: 0003 0029  |  0038: 0003 003a  |  0048: 0000 004b  |  
    0009: 0005 000a  |  0019: 0005 001b  |  0029: 0007 002a  |  0039: 0008 003a  |  0049: 0004 004c  |  
    000a: 0007 000b  |  001a: 0008 001b  |  002a: 0000 002e  |  003a: 0007 003c  |  004a: 0004 004c  |  
    000b: 0002 000f  |  001b: 0007 001d  |  002b: 0000 002c  |  003b: 0007 003e  |  004b: 0004 004b  |  
    000c: 0002 000d  |  001c: 0007 001f  |  002c: 0007 002f  |  003c: 0007 003d  |  004c: 0000 004d  |  
    000d: 0007 0010  |  001d: 0007 001e  |  002d: 0007 002f  |  003d: 0003 003d  |  004d: 0005 0051  |  
    000e: 0007 0010  |  001e: 0005 001e  |  002e: 0000 002f  |  003e: 0001 0041  |  004e: 0005 004f  |  
    000f: 0002 0010  |  001f: 0005 0022  |  002f: 0000 0032  |  003f: 0006 0042  |  004f: 0004 0052  |  

Note that the amount of generated code is huge. This is only a single full adder!!!

I didn't pay particular attention to code-size for the bigger compound instructions (such as "XOR" and "ADD"), but nevertheless, 64k program-words may not be too much of a luxury...

Source

For reference, the assembler's own source-code is pasted below:

    #!/usr/bin/env tclsh
    
    
    
    foreach op { + - * / & | << >> < <= > >= && || } { proc $op { a b } [ list expr \$a $op \$b ] }
    
    proc ! x { expr { ! $x } }
    
    proc unless { cond script } { if { ! $cond } { uplevel $script } }
    
    proc die msg { puts "FATAL: $msg"; exit 1 }
    
    proc llast li { lindex $li end }
    
    proc lleader li { lrange $li 0 end-1 }
    
    
    
    # "With-for-lists": execute a script operating on a list-item. 
    #
    # The item is available as "objvar", and can be changed by the script.
    # Upon succesful execution of "script", the item will be replaced by the
    # possibly changed item.
    #
    # Return-value is the script's return-code.
    
    proc lwith { objvar listvar index script } {
    
        upvar $objvar  obj
        upvar $listvar li
    
        set obj [ lindex $li $index ]
    
        try        { set ret [ uplevel $script ]
        } on ok {} { lset li $index $obj 
        }
    
        return $ret
    }
    
    
    
    # Generates a proc, replacing tags in the given proc-body. 
    #
    # (This is probably more convenient than use of "list" or using escapes for longer proc-bodies.)
    #
    # Example:   makeproc myproc { a b } { puts $a<SEP>$b } { <SEP> *** }
    #               -->   myproc { a b } { puts $a***$b   }
    
    proc makeproc { name arglist body replace } {
    
        set body [ string map $replace $body ]
    
        proc $name $arglist $body
    }
    
    
    
    # "pword": operations on program-words (program-mem items)
    #
    # A program-word is a 32-bit entity, consisting of:
    #   - a data-address "daddr" (bits 31..16)
    #   - a branch-address "baddr" (bits 15..0)
    #
    # Semantics of these fields are described at the definition of the only instruction, "ibc".
    
    namespace eval pword {
    
        namespace export *
        namespace ensemble create
    
        proc daddr w { >> [ & $w 0xffff0000 ] 16 }
    
        proc baddr w {      & $w 0x0000ffff      }
    
        proc create { daddr baddr }  { | [ << $daddr 16 ] $baddr }
    }
    
    
    
    # "pmem": program-memory singleton and operations thereon
    #
    # Program-memory consists of 32-bit program-word ("pword") entries. 
    
    namespace eval pmem {
    
        namespace export *
        namespace ensemble create
    
        namespace eval our { variable pmem {} }
    
    
    
        proc dump {} { 
    
            set offs 0
    
            foreach w $our::pmem {
    
                puts  [ format "%04x: %04x %04x"  $offs  [ pword daddr $w ]  [ pword baddr $w ] ]
    
                incr offs
            }
        }
    
    
    
        proc set-baddr { offs baddr } {
    
            set daddr [ pword daddr [ lindex $our::pmem $offs ] ]
    
            lset our::pmem $offs [ pword create $daddr $baddr ]
        }
    
    
    
        proc here {} { 
    
            return [ llength $our::pmem ]
        }
    
    
    
        proc append { daddr baddr } { 
    
            lappend  our::pmem  [ pword create $daddr $baddr ] 
        }
    }
    
    
    
    # "label" - placeholder for program-memory address
    #
    # An instruction or macro-instance can refer to a label instead of an address-literal as 
    # branch-target. Each such reference is resolved to the actual program-address corresponding 
    # to the label.
    #
    # A label can either be "qualified" or "unqualified": qualified labels contain, apart from the
    # label-name, the macro nesting-level in which they were first defined - unqualified labels do not.
    #
    # Label-qualification is necessary to be able to reuse label-names across macros, which is extremely
    # convenient.
    #
    # label-layout:   [         $name ]   (unqualified), or
    #                 [ $level, $name ]   (qualified)
    #
    # (This namespace also contains some functionality to distinguish labels from literals.)
    
    namespace eval label {
    
        namespace export *
        namespace ensemble create
    
        proc is-literal x { && [ string is integer $x ] [ >= $x 0 ] }
    
        proc is-name x { ! [ is-literal [ lindex $x end ] ] }
    
        proc qualify label { lrange [ concat [ context top ] $label ] end-1 end }
    
        proc is-qualified label { expr { [ llength $label ] > 1 } }
    
    
    
        proc unpack { label "->" levelvar namevar } { 
            
            upvar  $levelvar level  $namevar name
    
            set label [ qualify $label ]
    
            set level [ lindex $label 0 ]
            set name  [ lindex $label 1 ]
        }
    }
    
    
    
    # "context" - macro-frame, containing housekeeping for inner-most (current) and upper macro nesting-levels
    #
    # Macros can be nested. That is, a macro-definition can contain macro-instances. 
    #
    # All code is contained inside some macro-body. There is no "global" scope, but instead a top-level macro-instance 
    # called "main", which is the only macro explicitly instantiated by the assembler.
    #
    # Each (nested) macro-instance corresponds to a certain context, storing information about that instance.
    # Contexts of nested macro-instances form a stack.
    #
    # Each context contains:
    #
    #   - macro-name, mainly for debugging
    #   - data-address of the next declared macro-local variable (variable-offset, or "varoffs")
    #   - name ("name") and program-memory ("addr") corresponding to labels/labels defined in this macro
    #   - for each such label, all instriction-addresses where the label is referenced ("reflocs"), to be resolved 
    #     later (see comment with "resolve-refs" hereafter)
    #
    # Macros can take arguments if so specified in the corresponding macro-definition. Literals, labels and arbitrary 
    # expressions not containing labels can be used as arguments.
    #
    # A typical use of arguments is to have macro-instances work with data-variables declared in upper macro-instances.
    
    namespace eval context {
    
        namespace export *
        namespace ensemble create
    
    
    
        # Layout:
        #
        #   "stack": 
        #   [
        #       "name"    : $name,
        #       "varoffs" : $varoffs,
        #       "labels"  : 
        #       {
        #           "name" : 
        #           {
        #               "addr"    : $addr,
        #               "reflocs" : [ ... ]
        #           }
        #       }
        #   ]
    
        namespace eval our { variable stack {} }
    
    
    
        proc nesting {} { llength $our::stack }
    
        proc top {} { - [ nesting ] 1 }
    
        proc at level { lindex $our::stack $level }
    
        proc from { level "get" propname } { dict get [ context at $level ] $propname }
    
        proc varoffs {} { expr { [ nesting ] ? [ from [ top ] get varoffs ] : 0 } }
    
        proc new { "with" "name" name "with" "vars" "at" varoffs } { dict create  name $name  varoffs $varoffs  labels {} }
    
        proc enter { "new" "with" "name" name } { lappend our::stack [ new with name $name with vars at [ varoffs ] ] }
    
        proc leave {} { set our::stack [ lleader $our::stack ] }
    
        proc new-label {} { dict create  addr {}  reflocs {} }
    
        proc with-context { objname "at" level script } { uplevel [ list lwith $objname our::stack $level $script ] }
    
        proc incr-varoffs {} { with-context c at [ top ] { dict incr c varoffs } }
    
    
    
        proc with-label { label script } { 
            
            label unpack $label -> level name
            
            uplevel  [ list with-context c at $level  [ list dict with c labels $name $script ] ] 
        }
    
    
    
        proc label-exists label { label unpack $label -> level name; dict exists [ from $level get labels ] $name }
    
    
    
        proc touch-label label {
    
            unless [ label-exists $label ] {
    
                label unpack $label -> level name
    
                with-context c at $level {
    
                    dict set c labels $name [ new-label ]
                }
            }
        }
    
    
    
        proc set-labeladdr { label addr } { touch-label $label; with-label $label [ list set addr $addr ] } 
    
        proc add-refloc { label paddr } { touch-label $label; with-label $label [ list lappend reflocs $paddr ] }
    
        proc get-labeladdr label { with-label $label { set addr } }
    
    
    
        # Resolve label-referenes.
        #
        # Labels (defined using a colon, e.g. ": branch_here") can be forward-referenced.
        #
        # Labels are defined using a colon, e.g. ": mylabel".
        #
        # Since the address corresponding to a label may not be known before it is used, all (forward) 
        # references to that label must be replaced with its actual address at some point.
        #
        # By definition, the address corresponding to each label defined in a macro-instance is known 
        # when that instance goes out of scope (since all code corresponding to the macro-instance
        # and all nested instances has been emitted at that point, and all code-locations are thus known).
        #
        # Therefore, when a macro-instance goes out of scope, all references to its labels can be resolved.
    
        proc resolve-refs {} {
    
            set labels [ from [ top ] get labels ]
    
            dict for { name props } $labels {
    
                set baddr [ dict get $props addr ]
    
                foreach refloc [ dict get $props reflocs ] {
    
                    pmem set-baddr $refloc $baddr
                }
            }
        }
    }
    
    
    
    # Canonical instruction: Invert & Branch if Clear.
    #
    # Semantics are roughly as follows: invert the bit at data-address or label "data", and
    # if after this inversion the bit is clear/zero, transfer execution to program-address or label 
    # "branch", else resume execution at the next program-instruction.
    #
    # (Execution is done by actual hardware or emulator - not shown here.)
    #
    # The "data"-argument can be a numeric literal, label, or expression containing either/both.
    #
    # The "branch"-argument can be a numeric literal, expression containing literals, or label,
    # but not an expression containing labels. 
    
    proc ibc1 { data branch } {
    
        if [ label is-name $data ] {
    
            # Data-references are resolved right here (and must be resolvable).
    
            set data [ context get-labeladdr $data ]
        }
    
    
        if [ label is-name $branch ] { 
    
            # Branch-references are only resolved when leaving the current context.
    
            context add-refloc $branch [ pmem here ]
    
            set branch 0xffff
        }
    
    
        pmem append $data $branch
    }
    
    
    
    # Label-definition.
    
    proc : labelname { context set-labeladdr $labelname [ pmem here ] }
    
    
    
    # Variable-declaration.
    #
    # When we accept the fact that (macro-local) variables need to be declared before they are referenced, 
    # it becomes possible to substitute variable-addresses for references on-the-fly (their address is
    # already known at the point of declaration). 
    #
    # Therefore, variables don't need to be resolved to data-addresses later on.
    #
    # For each declared variable, a Tcl-variable will be created at the stack-level of the macro-body.
    # Macro-variables can then be refered to in the normal Tcl way (i.e. "$my_var").
    
    proc . varname { 
    
        set addr [ context varoffs ]
        context incr-varoffs
    
        uplevel set $varname $addr
    }
    
    
    
    # Macro-definition: use this to define compound instructions. 
    #
    # Defining a macro this way generates a generator-proc with the same name as the macro.
    #
    # This proc basically does the following, in this order:
    #
    #   1) fully qualify all labels passed to it in arguments, while still in the parent-macro's context/nesting-level
    #   2) create and enter new context
    #   3) emit the macro-body (which may itself instantiate nested macros)
    #   4) resolve pending references to labels defined in this macro
    #   5) destroy the created context, i.e. exit the macro-instance
    #
    # (Last argument in "args" is the macro-body.)
    
    proc macro { name args } {
    
        set script   [ llast   $args ]
        set argnames [ lleader $args ]
    
        makeproc $name $argnames {
    
            # Before entering new context, qualify all label-type arguments:
            #
            # unqualified label-type formal parameters get re-assigned with their fully qualified form 
            # (using the nesting-level of the parent-context to qualify them), while address-literal 
            # parameters are left untouched.
            #
            # (There is probably interp-magic for this, but we use the formal parameter-list and 
            # string-substitution instead.)
    
            foreach _argname [ list <ARGNAMES> ] {
    
                set _argval [ set $_argname ]
    
                if [ label is-name $_argval ] {
    
                    context touch-label $_argval   ;# only for forward label-references
    
                    set addr [ context get-labeladdr $_argval ]
    
                    # If this is a data-label (i.e. address is known already), fill it in, else qualify the label.
    
                    set $_argname [ expr { $addr eq ""  ?  [ label qualify $_argval ]  :  $addr } ]
                }
            }
    
            # Enter new context and expand macro-contents.
    
            context enter new with name <NAME>
    
            <SCRIPT>
    
            # Addresses of all encountered labels in this context are known at this point, so let's resolve them before leaving.
    
            context resolve-refs
    
            context leave
    
        }  [ list <SCRIPT>    $script    \
                  <ARGNAMES>  $argnames  \
                  <NAME>      $name      ]
    } 
    
    
    
    ########################################################################################################################
    
    
    
    unless  [ llength $argv ]  { die "need argument <infile>" }
    lassign $argv infile
    
    
    
    # (All code to be assembled should occur within a "main"-block in order to be assembled.)
    
    proc main block { macro code_in_main $block }
    
    source $infile
    
    code_in_main
    
    
    
    pmem dump

For reference: graph-generation

Graphs in this text were made using Graphviz' "Dot"-tool, using something like this (for the complete graph at the top):

    digraph G {
    
        node [ fontname = Helvetica ]
    
        NOT -> IBC
        SET -> IBC
        CLR -> IBC, NOT
        CB  -> IBC
        B   -> CB
        BC  -> NOT, IBC
        BS  -> BC, B
        MOV -> IBC, CB, SET, NOT
        AND -> IBC, NOT, CB
        OR  -> IBC, NOT, SET
        XOR -> MOV, NOT, AND, OR
        ADD -> MOV, XOR, AND, OR
    
        IBC          [ fillcolor = yellow,    style = filled ]
        NOT, SET, CB [ fillcolor = "#e0e0e0", style = filled ]
    }

and processed as follows:

    dot -Tpng in.dot -o out.png

That's all!