Merge branch 'master' of dim13.org:j1

author: Dimitri Sokolyuk <demon@dim13.org> 2019-03-01 17:14:59 +0100
committer: Dimitri Sokolyuk <demon@dim13.org> 2019-03-01 17:14:59 +0100
commit: c0165d167d7cb40d80028bcf7a4a6b160b5a7e83 (patch)
tree: 2864f6706b18f1b7e9b91c58ce2f3f6db1681cff
parent: e4b7bcd0903b4d2bcfe305e4e78415e32a3a32b5 (diff)
parent: 7e6d8cad618d9b3b4d84b035067d2d672e02f573 (diff)
1 files changed, 805 insertions, 0 deletions
diff --git a/doc/tcjassem.txt b/doc/tcjassem.txt
new file mode 100644
index 0000000..97ed164
--- /dev/null
+++ b/doc/tcjassem.txt
@@ -0,0 +1,805 @@
+                         B.Y.O.ASSEMBLER
+                              -or-
+          Build Your Own (Cross-) Assembler....in Forth
+
+                        by Brad Rodriguez
+
+
+ A. INTRODUCTION
+
+    In a previous issue of this journal I described how to
+    "bootstrap" yourself into a new processor, with a simple
+    debug monitor.  But how do you write code for this new CPU,
+    when you can't find or can't afford an assembler?  Build
+    your own!
+
+    Forth is an ideal language for this.  I've written cross-
+    assemblers in as little as two hours (for the TMS320, over a
+    long lunch break).  Two days is perhaps more common; and one
+    processor (the Zilog Super8) took me five days.  But when
+    you have more time than money, this is a bargain.
+
+    In part 1 of this article I will describe the basic
+    principles of Forth-style assemblers -- structured,
+    single-pass, postfix.  Much of this will apply to any
+    processor, and these concepts are in almost every Forth
+    assembler.
+
+    In part 2 I will examine an assembler for a specific CPU:
+    the Motorola 6809.  This assembler is simple but not
+    trivial, occupying 15 screens of source code.  Among other
+    things, it shows how to handle instructions with multiple
+    modes (in this case, addressing modes).  By studying this
+    example, you can figure out how to handle the peculiarities
+    of your own CPU.
+
+ B. WHY USE FORTH?
+
+    I believe that Forth is the easiest language in which to
+    write assemblers.
+
+    First and foremost, Forth has a "text interpreter" designed
+    to look up text strings and perform some related action.
+    Turning text strings into bytes is exactly what is needed to
+    compile assembler mnemonics!  Operands and addressing modes
+    can also be handled as Forth "words."
+
+    Forth also includes "defining words," which create large
+    sets of words with a common action.  This feature is very
+    useful when defining assembler mnemonics.
+
+    Since every Forth word is always available, Forth's
+    arithmetic and logical functions can be used within the
+    assembler environment to perform address and operand
+    arithmetic.
+
+    Finally, since the assembler is entirely implemented in
+    Forth words, Forth's "colon definitions" provide a
+    rudimentary macro facility, with no extra effort.
+
+ C. THE SIMPLEST CASE: ASSEMBLING A NOP
+
+    To understand how Forth translates mnemonics to machine
+    code, consider the simplest case: the NOP instruction (12
+    hex on the 6809).
+
+    A conventional assembler, on encountering a NOP in the
+    opcode field, must append a 12H byte to the output file and
+    advance the location counter by 1.  Operands and comments
+    are ignored.  (I will ignore labels for the time being.)
+
+    In Forth, the memory-resident dictionary is usually the
+    output "file."  So, make NOP a Forth word, and give it an
+    action, namely, "append 12H to the dictionary and advance
+    the dictionary pointer."
+
+                   HEX
+                   : NOP,   12 C, ;
+
+    Assembler opcodes are often given Forth names which include
+    a trailing comma, as shown above.  This is because many
+    Forth words -- such as AND XOR and OR -- conflict with
+    assembler mnemonics.  The simplest solution is to change the
+    assembler mnemonics slightly, usually with a trailing comma.
+    (This comma is a Forth convention, indicating that something
+    is appended to the dictionary.)
+
+ D. THE CLASS OF "INHERENT" OPCODES
+
+    Most processors have many instructions, like NOP, which
+    require no operands.  All of these could be defined as Forth
+    colon definitions, but this duplicates code, and wastes a
+    lot of space.  It's much more efficient to use Forth's
+    "defining word" mechanism to give all of these words a
+    common action.  In object-oriented parlance, this builds
+    "instances" of a single "class."
+
+    This is done with Forth's CREATE and DOES>.  (In fig-Forth,
+    as used in the 6809 assembler, the words are <BUILDS and
+    DOES>.)
+
+         : INHERENT          ( Defines the name of the class)
+             CREATE          ( this will create an instance)
+                 C,          ( store the parameter for each
+                                                       instance)
+             DOES>           ( this is the class' common action)
+                 C@          ( get each instance's parameter)
+                 C,          ( the assembly action, as above)
+             ;               ( End of definition)
+
+         HEX
+         12  INHERENT NOP,   ( Defines an instance NOP, of class
+                                INHERENT, with parameter 12H.)
+         3A  INHERENT ABX,   ( Another instance - the ABX instr)
+         3D  INHERENT MUL,   ( Another instance - the MUL instr)
+
+    In this case, the parameter (which is specific to each
+    instance) is simply the opcode to be assembled for each
+    instruction.
+
+    This technique provides a substantial memory savings, with
+    almost no speed penalty.  But the real advantage becomes
+    evident when complex instruction actions -- such as required
+    for parameters, or addressing modes -- are involved.
+
+ E. HANDLING OPERANDS
+
+    Most assembler opcodes, it is true, require one or more
+    operands.  As part of the action for these instructions,
+    Forth routines could be written to parse text from the input
+    stream, and interpret this text as operand fields.  But why?
+    The Forth envrionment already provides a parse-and-interpret
+    mechanism!
+
+    So, Forth will be used to parse operands.  Numbers are
+    parsed normally (in any base!), and equates can be Forth
+    CONSTANTs.  But, since the operands determine how the opcode
+    is handled, they will be processed first.  The results of
+    operand parsing will be left on Forth's stack, to be picked
+    up by the opcode word.  This leads to Forth's unique postfix
+    format for assemblers: operands, followed by opcode.
+
+    Take, for example, the 6809's ORCC instruction, which takes
+    a single numeric parameter:
+
+                   HEX
+                   : ORCC,   1A C,  C, ;
+
+    The exact sequence of actions for ORCC, is:  1) put 1A hex
+    on the parameter stack; 2) append the top stack item (the
+    1A) to the dictionary, and drop it from the stack; 3) append
+    the new top stack item (the operand) to the dictionary, and
+    drop it from the stack.  It is assumed that a numeric value
+    was already on the stack, for the second C, to use.  This
+    numeric value is the result of the operand parsing, which,
+    in this case, is simply the parsing of a single integer
+    value:
+
+                   HEX
+                   0F ORCC,
+
+    The advantage here is that all of Forth's power to operate
+    on stack values, via both built-in operators and
+    newly-defined functions, can be employed to create and
+    modify operands.  For example:
+
+         HEX
+         01 CONSTANT CY-FLAG       ( a "named" numeric value)
+         02 CONSTANT OV-FLAG
+         04 CONSTANT Z-FLAG
+           ...
+         CY-FLAG Z-FLAG +  ORCC,   ( add 1 and 4 to get operand)
+
+    The extension of operand-passing to the defining words
+    technique is straightforward.
+
+
+ F. HANDLING ADDRESSING MODES
+
+    Rarely can an operand, or an opcode, be used unmodified.
+    Most of the instructions in a modern processor can take
+    multiple forms, depending on the programmer's choice of
+    addressing mode.
+
+    Forth assemblers have attacked this problem in a number of
+    ways, depending on the requirements of the specific
+    processor.  All of these techniques remain true to the Forth
+    methodology: the addressing mode operators are implemented
+    as Forth words.  When these words are executed, they alter
+    the assembly of the current instruction.
+
+    1. Leaving additional parameters on the stack.
+         This is most useful when an addressing mode must always
+         be specified.  The addressing-mode word leaves some
+         constant value on the stack, to be picked up by the
+         opcode word.  Sometimes this value can be a "magic
+         number" which can be added to the opcode to modify it
+         for the different mode.  When this is not feasible, the
+         addressing-mode value can activate a CASE statement
+         within the opcode, to select one of several actions.
+         In this latter case, instructions of different lengths,
+         possibly with different operands, can be assembled
+         depending on the addressing mode.
+
+    2. Setting flags or values in fixed variables.
+         This is most useful when the addressing mode is
+         optional.  Without knowing whether an addressing mode
+         was specified, you don't know if the value on the stack
+         is a "magic number" or just an operand value.  The
+         solution: have the addressing mode put its magic number
+         in a predefined variable (often called MODE).  This
+         variable is initialized to a default value, and reset
+         to this default value after each instruction is
+         assembled.  Thus, this variable can be tested to see if
+         an addressing mode was specified (overriding the
+         default).
+
+    3. Modifying parameter values already on the stack.
+         It is occasionally possible to implement addressing
+         mode words that work by modifying an operand value.
+         This is rarely seen.
+
+    All three of these techniques are used, to some extent,
+    within the 6809 assembler.
+
+    For most processors, register names can simply be Forth
+    CONSTANTs, which leave a value on the stack.  For some
+    processors it is useful to have register names specify
+    "register addressing mode" as well.  This is easily done by
+    defining register names with a new defining word, whose
+    run-time action sets the addressing mode (either on the
+    stack or in a MODE variable).
+
+    Some processors allow multiple addressing modes in a single
+    instruction.  If the number of addressing modes is fixed by
+    the instruction, they can be left on the stack.  If the
+    number of addressing modes is variable, and it is desired to
+    know how many have been specified, multiple MODE variables
+    can be used for the first, second, etc.  (In one case -- the
+    Super8 -- I had to keep track of not only how many
+    addressing modes were specified, but also where among the
+    operands they were specified.  I did this by saving the
+    stack position along with each addressing mode.)
+
+    Consider the 6809 ADD instruction.  To simplify things,
+    ignore the Indexed addressing modes for now, and just
+    consider the remaining three addressing modes: Immediate,
+    Direct, and Extended.  These will be specified as follows:
+
+                         source code     assembles as
+         Immediate:     number # ADD,       8B nn
+         Direct:        address <> ADD,     9B aa
+         Extended:      address ADD,        BB aa aa
+
+    Since Extended has no addressing mode operator, the
+    mode-variable approach seems to be indicated.  The Forth
+    words # and <> will set MODE.
+
+    Observe the regularity in the 6809 opcodes.  If the
+    Immediate opcode is the "base" value, then the Direct opcode
+    is this value plus 10 hex, and the Extended opcode is this
+    value plus 30 hex.  (And the Indexed opcode, incidentally,
+    is this value plus 20 hex.)  This applies uniformly across
+    almost all 6809 instructions which use these addressing
+    modes.  (The exceptions are those opcodes whose Direct
+    opcodes are of the form 0x hex.)
+
+    Regularities like this are made to be exploited!  This is a
+    general rule for writing assemblers: find or make an opcode
+    chart, and look for regularities -- especially those
+    applying to addressing modes or other instruction modifiers
+    (like condition codes).
+
+    In this case, appropriate MODE values are suggested:
+
+              VARIABLE MODE  HEX
+              : #        0 MODE ! ;
+              : <>      10 MODE ! ;
+              : RESET   30 MODE ! ;
+
+    The default MODE value is 30 hex (for Extended mode), so a
+    Forth word RESET is added to restore this value.  RESET will
+    be used after every instruction is assembled.
+
+    The ADD, routine can now be written.  Let's go ahead and
+    write it using a defining word:
+
+              HEX
+              : GENERAL-OP   \ base-opcode --
+                   CREATE C,
+                   DOES>     \ operand --
+                   C@             \ get the base opcode
+                   MODE @ +       \ add the "magic number"
+                   C,             \ assemble the opcode
+                   MODE @ CASE    
+                      0 OF C, ENDOF    \ byte operand
+                     10 OF C, ENDOF    \ byte operand
+                     30 OF , ENDOF     \ word operand
+                   ENDCASE 
+                   RESET ;
+
+              8B GENERAL-OP ADD,
+
+    Each "instance" of GENERAL-OP will have a different base
+    opcode.  When ADD, executes, it will fetch this base opcode,
+    add the MODE value to it, and assemble that byte.  Then it
+    will take the operand which was passed on the stack, and
+    assemble it either as a byte or word operand, depending on
+    the selected mode.  Finally, it will reset MODE.
+
+    Note that all of the code is now defined to create
+    instructions in the same family as ADD:
+
+              HEX 89 GENERAL-OP ADC,
+                  84 GENERAL-OP AND,
+                  85 GENERAL-OP BIT,
+                        etc.
+
+    The memory savings from defining words really become evident
+    now.  Each new opcode word executes the lengthy bit of DOES>
+    code given above; but each word is only a one-byte Forth
+    definition (plus header and code field, of course).
+
+    This is not the actual code from the 6809 assembler -- there
+    are additional special cases which need to be handled.  But
+    it demonstrates that, by storing enough mode information,
+    and by making liberal use of CASE statements, the most
+    ludicrous instruction sets can be assembled.
+
+
+ G. HANDLING CONTROL STRUCTURES
+
+    The virtues of structured programming, have long been sung
+    -- and there are countless "structured assembly" macro
+    packages for conventional assemblers.  But Forth assemblers
+    favor label-free, structured assembly code for a pragmatic
+    reason: in Forth, it's simpler to create assembler
+    structures than labels!
+
+    The structures commonly included in Forth assemblers are
+    intended to resemble the programming structures of
+    high-level Forth.  (Again, the assembler structures are
+    usually distinguished by a trailing comma.)
+
+ 1. BEGIN, ... UNTIL,
+
+    The BEGIN, ... UNTIL, construct is the simplest assembler
+    structure to understand.  The assembler code is to loop back
+    to the BEGIN point, until some condition is satisfied.  The
+    Forth assembler syntax is
+
+                   BEGIN,    more code     cc UNTIL,
+
+    where 'cc' is a condition code, which has presumably been
+    defined -- either as an operand or an addressing mode -- for
+    the jump instructions.
+
+    Obviously, the UNTIL, will assemble a conditional jump.  The
+    sense of the jump must be "inverted" so that if 'cc' is
+    satisfied, the jump does NOT take place, but instead the
+    code "falls through" the jump.  The conventional assembler
+    equivalent would be:
+
+                   xxx:  ...
+                         ...
+                         ...
+                         JR  ~cc,xxx
+
+    (where ~cc is the logical inverse of cc.)
+
+    Forth offers two aids to implementing BEGIN, and UNTIL,.
+    The word HERE will return the current location counter
+    value.  And values may be kept deep in the stack, with no
+    effect on Forth processing, then "elevated" when required.
+
+    So: BEGIN, will "remember" a location counter, by placing
+    its value on the stack.  UNTIL, will assemble a conditional
+    jump to the "remembered" location.
+
+              : BEGIN, ( - a)      HERE ;
+              : UNTIL, ( a cc - )  NOTCC  JR, ;
+
+    This introduces the common Forth stack notation, to indicate
+    that BEGIN, leaves one value (an address) on the stack.
+    UNTIL, consumes two values (an address and a condition code)
+    from the stack, with the condition code on top.  It is
+    presumed that a word NOTCC has been defined, which will
+    convert a condition code to its logical inverse.  It is also
+    presumed that the opcode word JR, has been defined, which
+    will expect an address and a condition code as operands.
+    (JR, is a more general example than the branch instructions
+    used in the 6809 assembler.)
+
+    The use of the stack for storage of the loop address allows
+    BEGIN, ... UNTIL, constructs to be nested, as:
+
+         BEGIN,  ...  BEGIN, ... cc UNTIL,  ...  cc UNTIL,
+
+    The "inner" UNTIL, resolves the "inner" BEGIN, forming a
+    loop wholly contained within the outer BEGIN, ... UNTIL,
+    loop.
+
+ 2. BEGIN, ... AGAIN,
+
+    Forth commonly provides an "infinite loop" construct,
+    BEGIN ... AGAIN , which never exits.  For the sake of
+    completeness, this is usually implemented in the assembler
+    as well.
+
+    Obviously, this is implemented in the same manner as BEGIN,
+    ... UNTIL, except that the jump which is assembled by AGAIN,
+    is an unconditional jump.
+
+ 3. DO, ... LOOP,
+
+    Many processors offer some kind of looping instruction.
+    Since the 6809 does not, let's consider the Zilog Super8;
+    its Decrement-and-Jump-Non-Zero (DJNZ) instruction can use
+    any of 16 registers as the loop counter.  This can be
+    written in structured assembler:
+
+              DO,    more code   r LOOP,
+
+    where r is the register used as the loop counter.  Once
+    again, the intent is to make the assembler construct
+    resemble the high-level Forth construct.
+
+              : DO,   ( - a)      HERE ;
+              : LOOP, ( a r - )   DJNZ, ;
+
+    Some Forth assemblers go so far as to make DO, assemble a
+    load-immediate instruction for the loop counter -- but this
+    loses flexibility.  Sometimes the loop count isn't a
+    constant.  So I prefer the above definition of DO, .
+
+ 4. IF, ... THEN,
+
+    The IF, ... THEN, construct is the simplest forward-
+    referencing construct.  If a condition is satisfied, the
+    code within the IF,...THEN, is to be executed; otherwise,
+    control is transferred to the first instruction after THEN,.
+
+    (Note that Forth normally employs THEN, where other
+    languages use "endif."  You can have both in your
+    assembler.)
+
+    The Forth syntax is
+
+              cc IF,  ... ... ...  THEN,
+
+    for which the "conventional" equivalent is
+
+                   JP   ~cc,xxx
+                   ...
+                   ...
+                   ...
+              xxx:
+
+    Note that, once again, the condition code must be inverted
+    to produce the expected logical sense for IF, .
+
+    In a single pass assembler, the requisite forward jump
+    cannot be directly assembled, since the destination address
+    of the jump is not known when IF, is encountered.  This
+    problem is solved by causing IF, to assemble a "dummy" jump,
+    and stack the address of the jump's operand field.  Later,
+    the word THEN, (which will provide the destination address)
+    can remove this stacked address and "patch" the jump
+    instruction accordingly.
+
+           : IF, ( cc - a)   NOT 0 SWAP JP,  ( conditional jump
+                             HERE 2 - ;     with 2-byte operand)
+           : THEN, ( a)   HERE SWAP ! ;   ( store HERE at the
+                                              stacked address)
+
+    IF, inverts the condition code, assembles a conditional jump
+    to address zero, and then puts on the stack the address of
+    the jump address field.  (After JP, is assembled, the
+    location counter HERE points past the jump instruction, so
+    we need to subtract two to get the location of the address
+    field.)   THEN, will patch the current location into the
+    operand field of that jump.
+
+    If relative jumps are used, additional code must be added to
+    THEN, to calculate the relative offset.
+
+ 5. IF, ... ELSE, ... THEN,
+
+    A refinement of the IF,...THEN, construct allows code to be
+    executed if the condition is NOT satisfied.  The Forth
+    syntax is
+
+              cc IF,  ... ...  ELSE,  ... ...  THEN,
+
+    ELSE, has the expected meaning: if the first part of this
+    statement is not executed, then the second part is.
+
+    The assembler code necessary to create this construct is:
+
+                   JP   ~cc,xxx
+                   ...            ( the "if" code)
+                   ...
+                   JP   yyy
+              xxx: ...            ( the "else" code)
+                   ...
+              yyy:
+
+    ELSE, must modify the actions of IF, and THEN, as follows:
+    a) the forward jump from IF, must be patched to the start of
+    the "else" code ("xxx"); and b) the address supplied by
+    THEN, must be patched into the unconditional jump
+    instruction at the end of the "if" code ("JP yyy").  ELSE,
+    must also assemble the unconditional jump.  This is done
+    thus:
+
+          : ELSE ( a - a)   0 T JP,    ( unconditional jump)
+                         HERE 2 -      ( stack its address
+                                            for THEN, to patch)
+                         SWAP          ( get the patch address
+                                            of the IF, jump)
+                         HERE SWAP !   ( patch it to the current
+                                            location, i.e., the
+                         ;                  next instruction)
+
+    Note that the jump condition 'T' assembles a "jump always"
+    instruction.  The code from IF, and THEN, can be "re-used"
+    if the condition 'F' is defined as the condition-code
+    inverse of 'T':
+
+         : ELSE ( a - a)   F IF,  SWAP  THEN, ;
+
+    The SWAP of the stacked addresses reverses the patch order,
+    so that the THEN, inside ELSE, patches the original IF; and
+    the final THEN, patches the IF, inside ELSE,.  Graphically,
+    this becomes:
+
+         IF,(1)   ...   IF,(2)  THEN,(1)   ...   THEN,(2)
+                        \______________/
+                          inside ELSE,
+
+    IF,...THEN, and IF,...ELSE,...THEN, structures can be
+    nested.  This freedom of nesting also extends to mixtures of
+    these and BEGIN,...UNTIL, structures.
+
+ 6. BEGIN, ... WHILE, ... REPEAT,
+
+    The final, and most complex, assembler control structure is
+    the "while" loop in which the condition is tested at the
+    beginning of the loop, rather than at the end.
+
+    In Forth the accepted syntax for this structure is
+
+              BEGIN,  evaluate  cc WHILE,  loop code  REPEAT,
+
+    In practice, any code -- not just condition evaluations --
+    may be inserted between BEGIN, and WHILE,.
+
+    What needs to be assembled is this:  WHILE, will assemble a
+    conditional jump, on the inverse of cc, to the code
+    following the REPEAT,.  (If the condition code cc is
+    satisfied, we should "fall through" WHILE, to execute the
+    loop code.)  REPEAT, will assemble an unconditional jump
+    back to BEGIN.  Or, in terms of existing constructs:
+
+          BEGIN,(1)  ...  cc IF,(2)  ...  AGAIN,(1)  THEN,(2)
+
+    Once again, this can be implemented with existing words, by
+    means of a stack manipulation inside WHILE, to re-arrange
+    what jumps are patched by whom:
+
+              : WHILE, ( a cc - a a)   IF, SWAP ;
+              : REPEAT, ( a a - )   AGAIN,  THEN, ;
+
+    Again, nesting is freely permitted.
+
+
+ H. THE FORTH DEFINITION HEADER
+
+    In most applications, machine code created by a Forth
+    assembler will be put in a CODE word in the Forth
+    dictionary.  This requires giving it an identifying text
+    "name," and linking it into the dictionary list.
+
+    The Forth word CREATE performs these functions for the
+    programmer.  CREATE will parse a word from the input stream,
+    build a new entry in the dictionary with that name, and
+    adjust the dictionary pointer to the start of the
+    "definition field" for this word.
+
+    Standard Forth uses the word CODE to distinguish the start
+    of an assembler definition in the Forth dictionary.  In
+    addition to performing CREATE, the word CODE may set the
+    assembler environment (vocabulary), and may reset variables
+    (such as MODE) in the assembler.  Some Forths may also
+    require a "code address" field; this is set by CREATE in
+    some systems, while others expect CODE to do this.
+
+
+ I. SPECIAL CASES
+
+ 1. Resident vs. cross-compilation
+
+    Up to now, it has been assumed that the machine code is to
+    be assembled into the dictionary of the machine running the
+    assembler.
+
+    For cross-assembly and cross-compilation, code is usually
+    assembled for the "target" machine into a different area of
+    memory.  This area may or may not have its own dictionary
+    structure, but it is separate from the "host" machine's
+    dictionary.
+
+    The most common and straightforward solution is to provide
+    the host machine with a set of Forth operators to access the
+    "target" memory space.  These are made deliberately
+    analogous to the normal Forth memory and dictionary
+    operators, and are usually distinguished by the prefix "T".
+    The basic set of operators required is:
+
+              TDP            target dictionary pointer DP
+              THERE          analogous to HERE, returns TDP
+              TC,            target byte append C,
+              TC@            target byte fetch C@
+              TC!            target byte store C!
+              T@             target word fetch @
+              T!             target word store !
+
+    Sometimes, instead of using the "T" prefix, these words will
+    be given identical names but in a different Forth
+    vocabulary.  (The vocabulary structure in Forth allows
+    unambiguous use of the same word name in multiple contexts.)
+    The 6809 assembler in Part 2 assumes this.
+
+ 2. Compiling to disk
+
+    Assembler output can be directed to disk, rather than to
+    memory.  This, too, can be handled by defining a new set of
+    dictionary, fetch, and store operators.  They can be
+    distinguished with a different prefix (such as "T" again),
+    or put in a distinct vocabulary.
+
+    Note that the "patching" manipulations used in the
+    single-pass control structures require a randomly-
+    accessible output medium.  This is not a problem with disk,
+    although heavy use of control structures may result in some
+    inefficient disk access.
+
+ 3. Compiler Security
+
+    Some Forth implementations include a feature known as
+    "compiler security," which attempts to catch mismatches of
+    control structures.  For example, the structure
+
+              IF, ... cc UNTIL,
+
+    would leave the stack balanced (UNTIL, consumes the address
+    left by IF,), but would result in nonsense code.
+
+    The usual method for checking the match of control
+    structures is to require the "leading" control word to leave
+    a code value on the stack, and the "trailing" word to check
+    the stack for the correct value.  For example:
+
+              IF,  leaves a 1;
+              THEN,  checks for a 1;
+              ELSE,  checks for a 1 and leaves a 1;
+              BEGIN,  leaves a 2;
+              UNTIL,  checks for a 2;
+              AGAIN,  checks for a 2;
+              WHILE,  checks for a 2 and leaves a 3;
+              REPEAT,  checks for a 3.
+
+    This will detect most mismatches.  Additional checks may be
+    included for the stack imbalance caused by "unmatched"
+    control words.  (The 6809 assembler uses both of these error
+    checks.)
+
+    The cost of compiler security is the increased complexity of
+    the stack manipulations in such words as ELSE, and WHILE,.
+    Also, the programmer may wish to alter the order in which
+    control structures are resolved, by manually re-arranging
+    the stack; compiler security makes this more difficult.
+
+ 4. Labels
+
+    Even in the era of structured programming, some programmers
+    will insist on labels in their assembler code.
+
+    The principal problem with named labels in a Forth assembler
+    definition is that the labels themselves are Forth words.
+    They are compiled into the dictionary -- usually at an
+    inconvenient point, such as inside the machine code.  For
+    example:
+
+              CODE TEST  ...  machine code  ...
+                   HERE CONSTANT LABEL1
+                   ...  machine code  ...
+                   LABEL1 NZ JP,
+
+    will cause the dictionary header for LABEL1 -- text, links,
+    and all -- to be inserted in the middle of CODE.  Several
+    solutions have been proposed:
+
+      a) define labels only "outside" machine code.
+         Occasionally useful, but very restricted.
+
+      b) use some predefined storage locations (variables) to
+         provide "temporary," or local, labels.
+
+      c) use a separate dictionary space for the labels, e.g.,
+         as provided by the TRANSIENT scheme [3].
+
+      d) use a separate dictionary space for the machine code.
+         This is common practice for meta-compilation; most
+         Forth meta- compilers support labels with little
+         difficulty.
+
+ 5. Table Driven Assemblers
+
+    Most Forth assemblers can handle the profusion of addressing
+    modes and instruction opcodes by CASE statements and other
+    flow-of-control constructs.  These may be referred to as
+    "procedural" assemblers.
+
+    Some processors, notably the Motorola 68000, have
+    instruction and addressing sets so complex as to render the
+    decision trees immense.  In such cases, a more "table-
+    driven" approach may save substantial memory and processor
+    time.
+
+    (I avoid such processors.  Table driven assemblers are much
+    more complex to write.)
+
+ 6. Prefix Assemblers
+
+    Sometimes a prefix assembler is unavoidable.  (One example:
+    I recently translated many K of Super8 assembler code from
+    the Zilog assembler to a Forth assembler.)  There is a
+    programming "trick" which simulates a prefix assembler,
+    while using the assembler techniques described in this
+    article.
+
+    Basically, this trick is to "postpone" execution of the
+    opcode word, until after the operands have been evaluated.
+    How can the assembler determine when the operands are
+    finished?  Easy: when the next opcode word is encountered.
+
+    So, every opcode word is modified to a) save its own
+    execution address somewhere, and b) execute the "saved"
+    action of the previous opcode word.  For example:
+
+              ...  JP  operand    ADD  operands  ...
+
+    JP stores its execution address (and the address of its
+    "instance" parameters) in a variable somewhere.  Then, the
+    operands are evaluated.  ADD will fetch the information
+    saved by JP, and execute the run-time action of JP.  The JP
+    action will pick up whatever the operands left on the stack.
+    When the JP action returns, ADD will save its own execution
+    address and instance parameters, and the process continues.
+    (Of course, JP would have executed its previous opcode.)
+
+    This is confusing.  Special care must be taken for the first
+    and last opcodes in the assembler code.  If mode variables
+    are used, the problem of properly saving and restoring them
+    becomes nightmarish.  I leave this subject as an exercise
+    for the advanced student...or for an article of its own.
+
+ J. CONCLUSION
+
+    I've touched upon the common techniques used in Forth
+    assemblers.  Since I believe the second-best way to learn is
+    by example, in part 2 I will present the full code for the
+    6809 assembler.  Studying a working assembler may give you
+    hints on writing an assembler of your own.
+
+    The BEST way to learn is by doing!
+
+ K. REFERENCES
+
+ 1. Curley, Charles, Advancing Forth.  Unpublished manuscript
+    (1985).
+
+ 2. Wasson, Philip, "Transient Definitions," Forth Dimensions
+    III/6 (Mar-Apr 1982), p.171.
+
+ L. ADDITIONAL SOURCES
+
+ 1. Cassady, John J., "8080 Assembler," Forth Dimensions III/6
+    (Mar-Apr 1982), pp. 180-181.  Noteworthy in that the entire
+    assembler fits in less than 48 lines of code.
+
+ 2. Ragsdale, William F., "A FORTH Assembler for the 6502," Dr.
+    Dobb's Journal #59 (September 1981), pp. 12-24.  A simple
+    illustration of addressing modes.
+
+ 3. Duncan, Ray, "FORTH 8086 Assembler," Dr. Dobb's Journal #64
+    (February 1982), pp. 14-18 and 33-46.
+
+ 4. Perry, Michael A., "A 68000 Forth Assembler," Dr. Dobb's
+    Journal #83 (September 1983), pp. 28-42.
+
+ 5. Assemblers for the 8080, 8051, 6502, 68HC11, 8086, 80386,
+    68000, SC32, and Transputer can be downloaded from the Forth
+    Interest Group (FORTH) conference on GEnie.
+
+\ No newline at end of file
author	Dimitri Sokolyuk <demon@dim13.org>	2019-03-01 17:14:59 +0100
committer	Dimitri Sokolyuk <demon@dim13.org>	2019-03-01 17:14:59 +0100
commit	c0165d167d7cb40d80028bcf7a4a6b160b5a7e83 (patch)
tree	2864f6706b18f1b7e9b91c58ce2f3f6db1681cff
parent	e4b7bcd0903b4d2bcfe305e4e78415e32a3a32b5 (diff)
parent	7e6d8cad618d9b3b4d84b035067d2d672e02f573 (diff)