6. The Assembler

This section will not attempt to teach you assembly language. Rather, it explains the syntax you are expected to use in your source files and documents the features available to you in the assembler.

6.1 Number Format

The assembler accepts decimal, hexadecimal, and binary numerical data. Hex numbers must be preceded by a "$" and binary numbers by a "%". Thus the following four instructions are all equivalent:

    1. LDA #100
    2. LDA #$64
    3. LDA #%1100100
    4. LDA #%01100100

As implied, leading zeros are ignored. The "#" stands for "number" or "data", and the effect of all four instructions is to load the accumulator with the decimal number 100.

A number not preceded by a "#" is interpreted as an address:

    LDA 1000
    LDA $3E8
    LDA %1111101000

are all ways to load the accumulator with the byte that resides in memory location $3E8.

Use the number format that is appropriate for clarity. For example, the data table:

    DA $1
    DA $A
    DA $64
    DA $3E8
    DA $2710

is much more mysterious than its decimal equivalent:

    DA 1
    DA 10
    DA 100
    DA 1000
    DA 10000

6.2 Source Code Format

A line of source code typically looks like:

    LABEL OPCODE OPERAND ;COMMENT

A line containing only a comment must begin with either a "*" or a ";". Comments starting with a ";" will be tabbed to the comment field, while comment lines beginning with a "*" will begin in column 1.

The assembler will accept an empty line in the source and will treat it as a SKP 1 instruction, except that the line number will be printed.

The number of spaces separating the fields is not important to the assembler, although the EDITOR expects a single space. Extra spaces will not harm the assembly, but the listing will look strange if more than one space separates the fields.

The maximum allowable LABEL length is 13 characters, but more than 8 will produce messy assembly listings.
A LABEL must begin with a character at least as large, in ASCII value, as the colon, and may not contain any character less, in ASCII value, than the digit zero.

The assembler examines only the first three characters of the OPCODE (with certain exceptions, such as the SWEET 16 opcode POPD). For example, you can use PAGE instead of PAG (because of the exception, however, the fourth letter must not be a D). The assembler listing will truncate the opcode to seven letters. In addition, the listing will not look good with an opcode longer than four characters unless there is no operand.

The maximum allowable combined OPERAND+COMMENT length is 64 characters. You will get an error message if this limit is exceeded. A comment line by itself is also limited to 64 characters, and the same error message applies.

6.3 Expressions

To make clear the syntax accepted and/or required by the assembler, we must define what is meant by an "expression". Expressions are built up from "primitive expressions" by use of arithmetic and logical operations. The primitive expressions are:

    1. A label.
    2. A decimal number.
    3. A hexadecimal number (preceded by a "$").
    4. A binary number (preceded by a "%").
    5. Any ASCII character, either preceded by or enclosed in double or single quotes.
    6. The character "*", which stands for the present address.

All number formats accept 16-bit data, and leading zeros are never required. In case 5, the value of the primitive expression is the value of the ASCII character. The high bit will be on if the double quote (") is used, and off if the single quote (') is used.

The assembler supports the four arithmetic operations: +, -, /, and *. It also supports three logical operations:

    !          Exclusive OR
    . (period) OR
    &          AND

Some examples of legal expressions are:

    LABEL1-LABEL2
    2*LABEL+$231
    1234+%10111
    K 0
    LABEL&$7F
    *-2
    LABEL.%10000000

Parentheses have another meaning and are not allowed in expressions.
All arithmetic and logical operations are done from left to right (2+3*5 would assemble as 25 and not 17). Parentheses are normally used to change the order of evaluation in an expression, but they are not available here. If the need arises to change the order, partial "sums" can be collected in dummy labels and finally combined to obtain the desired effect. Using the above example, where the answer was 25, and assuming the desired answer was 17:

    LABEL1 EQU 3*5
    LABEL2 EQU 2+LABEL1

6.4 Immediate Data

For those opcodes such as LDA, CMP, etc., which accept immediate data (numbers as opposed to the contents of addresses), the immediate mode is signaled by preceding the expression with a "#". An example is LDX #3, which would load the X register with the value 3 rather than the contents of address 3. In addition:

    #>expression   Produces the high byte of the expression.
    #<expression   Produces the low byte of the expression.
    #expression    Also produces the low byte of the expression
                   (the 6502 does not accept 2-byte data).
    #/expression   Produces the high byte of the expression
                   and is an optional syntax.

The recommended syntax is "<" and ">" for low and high byte respectively. Whatever syntax you use, be consistent. Use it in every case, so your reader won't think there is something special about the "one case" where you specify "<" for the low byte.

The ability of the assembler to evaluate expressions such as LAB1-LAB2-1 is very useful for the following type of code:

    COMPARE LDX #FOUND-DATA-1
    LOOP    CMP DATA,X
            BEQ FOUND
            DEX
            BPL LOOP
            JMP REJECT ;not found
    DATA    HEX E3BC3498
    FOUND   RTS

With this type of code, if you add or delete some of the "DATA", the X index for the comparing loop is automatically adjusted.

6.5 Addressing Modes (6502 opcodes)

The assembler accepts all of the 6502 opcodes with the standard mnemonics. It accepts BLT (Branch if Less Than) as an equivalent to BCC, and BGE (Branch if Greater than or Equal) as an equivalent to BCS.

There are 12 addressing modes on the 6502. The appropriate MERLIN syntax for each is:

        Mode                     Syntax            Example
    1.  Implied                  OPCODE            CLC
    2.  Accumulator              OPCODE            ROR
    3.  Immediate data           OPCODE #expr      ADC #$F8
                                                   CMP #'M'
                                                   LDX #>L1-L2-1
    4.  Zero Page Address        OPCODE expr       ROL 6
    5.  Zero Page Indexed X      OPCODE expr,X     LDA $E0,X
    6.  Zero Page Indexed Y      OPCODE expr,Y     STX LAB,Y
    7.  Absolute Address         OPCODE expr       BIT $300
    8.  Absolute Indexed X       OPCODE expr,X     STA $4000,X
    9.  Absolute Indexed Y       OPCODE expr,Y     SBC LABL-1,Y
    10. Indirect                 JMP (expr)        JMP ($3F2)
    11. Indirect Preindexed X    OPCODE (expr,X)   LDA (6,X)
    12. Indirect Postindexed Y   OPCODE (expr),Y   STA ($FE),Y

There is no difference in syntax for zero page and absolute modes. The assembler automatically uses zero page mode when appropriate. In the indexed indirect modes, only a zero page expression is allowed, and the assembler will give an error message if the expression does not evaluate to a zero page address.

When an instruction uses the accumulator mode, MERLIN does not require (or accept) an operand. Some assemblers perversely require you to put an "A" in the operand for this mode. The assembler will decide the legality of the addressing mode for any given opcode.

MERLIN provides the ability to force non-zero page addressing. To do this, add any character except "D" to the end of the opcode. As an example:

    LDA  $10 ;assembles as zero page (2 bytes)
    LDA: $10 ;assembles as non-zero page (3 bytes)

The use of a character that "prints" is encouraged. In addition, using the same character throughout will aid your readers. An appropriate comment is always in order.

6.6 Sweet 16 Opcodes

The assembler accepts all Sweet 16 opcodes with the standard mnemonics. The usual Sweet 16 registers do not have to be "equated", and the "R" is optional. TED II+ users will be glad to know that the SET opcode works as it should, with numbers or labels. For the SET opcode, either a space or a comma may be used between the register and the data part of the operand; that is, SET R3,LABEL is equivalent to SET R3 LABEL (the second is not as elegant as the first, however).
It should be noted that the NUL opcode is assembled as a one-byte opcode (the same as HEX 0D) and not as a two-byte skip, which is how ROM Sweet 16 would interpret it. This is intentional and is done for internal reasons.

6.7 Pseudo Opcodes - Directives

6.7.1 EQU (=)

The EQUals pseudo op has an optional syntax:

    LABEL EQU expression ;comment
    LABEL =   expression ;comment

The two forms above, assuming the expression is the same, will generate the same value for LABEL. Either syntax is used to define the value of a LABEL, usually an exterior address or a constant for which a meaningful name is desired. (Good programming practice dictates that all constants be given a meaningful name and comment. The meaning of "magic" numbers tends to fade when the program source is read at a later time.) In any case, it is recommended that the EQUs all be located at the beginning of the program.

The assembler will not permit an EQU to a zero page number after the label equated has been used, since bad code could result from such a situation. Also, see the pseudo op VAR.

6.7.2 ORG

ORG establishes the address at which the program is designed to run. It defaults to the present value of MERLIN HIMEM ($8000 by default). Usually, there will be one ORG, and it will be at the start of the program. If more than one ORG is used, the first establishes the BLOAD address. This can be used to create an object file that would load at one address even though it might be designed to run at another.

You cannot specify "ORG *-1", etc., to back up the object pointers as is possible with some assemblers. For this, you must use "DS -1".

6.7.3 OBJ

    OBJ expression

OBJect establishes the address at which the object code will be placed during assembly. It defaults to MERLIN HIMEM. There is rarely any need to use this pseudo op, and programmers are urged not to use it.

If OBJ is specified at some address above BASIC HIMEM or SYM, it will defeat generation of object code.
This may be used when sending a long listing to a printer or when using the "direct assembly to disk" (DSK) opcode.

6.7.4 PUT

    PUT filename

PUT filename[,Sx,Dy], where Sx and Dy are Slot and Drive parameters in standard DOS syntax, will read the named file, with a "T." prefixed unless the filename begins with a character less than ASCII "@" ("space" is ideal), and will "insert" it at the location of the PUT opcode.

    A. The "insert" referred to above is misleading. Actually, the code is placed (generated) just behind the previously assembled "Main" source. When the PUT file is exhausted, Main will continue.
    B. Text files are required by this facility in order to assure memory protection.
    C. A memory error will occur if the PUT file causes the assembly to go beyond HIMEM.
    D. PUT files are in memory one at a time, so a very large program can be assembled using this facility.

There are two restrictions on a PUT file:

    1. There cannot be macro definitions within a PUT file. They must be within the main source.
    2. A PUT file may NOT call another file with a PUT opcode. However, it is permitted to have the "main program" contain nothing but macro definitions and PUT opcodes.

Any variable (e.g. ]LABEL) may be used as a "local" variable. The usual local variables ]1 through ]8 may be set up for this purpose using the "VAR" opcode.

PUT provides a simple and straightforward way to incorporate often used subroutines, such as MSGOUT and/or PRDEC, in a program. One simply keeps a collection of proven, useful subroutines that are called in, as needed, by PUT. Since PUT accepts Slot and/or Drive parameters, these subroutines do not need to be on the same disk as the source for "main".

6.7.5 VAR

    VAR expr1,expr2,expr3,...,expr8

VAR is a convenient way to equate all or some of the variables ]1 through ]8 at the same time. VAR 3;$42;LABEL will set ]1 = 3, ]2 = $42, and ]3 = LABEL. VAR is designed to be used just before a PUT in order to pass parameters for use during the assembly.
In fact, if a PUT file uses any of the variables ]1 through ]8, except in >>> lines for calling macros, they must be declared prior to the PUT.

6.7.6 SAV

    SAV filename[,Sx,Dy]

where Sx and/or Dy are Slot and Drive parameters in standard DOS format.

SAV will save the current object code under the specified name. This acts exactly as does the MERLIN EXEC mode object saving command, except that it can be done several times during assembly. After a save, the MERLIN object area is "empty" and the object address is set to the last specification of OBJ or, if none is present, to MERLIN HIMEM by default.

The SAV command sets the address of the saved file to the "correct" value. For example, the first file will have the origin given by the initial ORG command, the second will start at the last address of the first plus 1, the third at the last address of the second plus 1, and so on. When BLOADed later, they will go to the correct location(s).

Together, the PUT and SAV opcodes make it possible to assemble extremely large files.

6.7.7 DSK

    DSK filename

DSK instructs the assembler to assemble the following code directly to disk. If DSK is already in effect, the old file will be closed and a new one begun. DSK is used primarily for extremely large files. For moderately sized programs, SAV is preferred since it is 30% faster and theoretically more reliable. If the CHK opcode is specified, it will be disabled when DSK is in effect.

6.7.8 END

This opcode is not needed by MERLIN. It is provided so MERLIN can assemble source code originally written for assemblers that do require an END statement. In any event, good programming practice dictates that it should be specified. (Don't you feel better when you see both the ORG and END opcodes surrounding your precious source?)

6.8 Formatting

6.8.1 LST ON/OFF

    LST ON or LST OFF

LST controls whether the assembly listing is to be sent to the Apple screen and/or other output device. You may, for example, use LST to send only a portion of the assembly listing to the printer.
Any number of LST instructions may be in the source. If the LST condition is off at the end of the assembly, then the symbol table will not be printed. Please note that a CTRL-D (^D) toggles the LST flag during the second (printing) pass of the assembly.

6.8.2 EXP

    EXP ON or EXP OFF (EXPand macro assembly)

EXP ON will cause both the macro call and the generated code to be printed during the second pass of the assembly. EXP OFF will print only the PMC pseudo ops. In either case, there is no effect on the generated code.

6.8.3 PAU

    PAU (PAUse)

PAU causes the second pass of the assembly to pause until any key is hit.

6.8.4 PAG

    PAG (PAGe)

This sends a form feed ($8C) to the printer. It has no effect at any time on the screen.

6.8.5 AST

    AST "expression"

AST sends "expression" number of asterisks to the listing.

6.8.6 SKP

    SKP "expression"

SKP sends "expression" carriage returns (spaces/skips) to the listing.

6.8.7 TR

    TR ON or TR OFF

TR ON limits object code printout to three bytes per line. This means that long HEX statements will print only the first three bytes (all bytes are present, just not printed). TR OFF causes all object bytes to be printed. The assembler TR command is NOT the same as the EDITOR TR command.

6.9 STRINGS

6.9.1 ASC

ASC puts a delimited ASCII string into the object code. The only restriction on the delimiter is that it cannot appear in the string itself.

Different delimiters have different effects. Any delimiter less than (in ASCII code) the single quote (') will produce a string with the high bit on (set). Otherwise, the high bit will be off. For example, the delimiters !"#$%& will produce a string in "negative ASCII" (high bit on), and the delimiters '()+? will produce a string in "positive ASCII" (high bit off).

Usually, the double quote (") and the single quote (') are the delimiters of choice, but other delimiters provide a means of inserting a string containing either of the quotation symbols.

6.9.2 DCI

    DCI "d-string" (Dextral Character Inverted)

DCI follows the same rules as the ASC pseudo op. The only difference is that the last character is generated with the opposite high bit from the others. This is used by string manipulation routines to tell the end of a string.

6.9.3 INV

    INV d-string

INV generates a string in inverse format. All delimiters have the same effect.

6.9.4 FLS

    FLS d-string

FLS generates a string in flashing format. All delimiters have the same effect.

6.9.5 REV

    REV d-string (REVerse)

REV generates a string in reverse. For example:

    REV "DISK VOLUME"

gives:

    EMULOV KSID

The delimiter rules are the same as for the ASC pseudo op.

6.10 Data and Allocation

6.10.1 DA

    DA "expression" (Define Address)

This stores the two-byte value of the operand, usually an address, in the object code, low byte first. DA $FDF0 will generate F0 FD. DA also accepts multiple data (e.g. DA 1,10,100).

6.10.2 DDB

    DDB expression (Define Double Byte)

DDB stores a two-byte operand with the high byte first. This is the complement instruction to DA. DDB accepts multiple data on the same opcode. See the DA example.

6.10.3 DFB

    DFB expression (DeFine Byte)

DFB generates the bytes specified by the expression. It accepts several bytes of data, separated by commas. The standard number format is used and arithmetic is done as usual.

The symbols "<" and ">" are used to specify the low and high bytes of a label. If the "<" or ">" symbol is omitted, the low byte is assumed and taken. Either of the two should appear as the first character of an expression or immediately following a "#". The instruction:

    DFB >LAB1-LAB2

will produce the high byte of the expression LAB1-LAB2.

The statement

    DFB $34,100,LAB1-LAB2,%1011,>LAB1-LAB2

is a properly formatted DFB statement which will generate the hex object code 34 64 DE 0B 09, assuming that LAB1 = $81A2 and LAB2 = $77C4.

6.10.4 HEX

    HEX operand(s)

HEX allows direct insertion of hexadecimal data (no expressions and/or labels allowed).
Unlike all other cases, the "$" qualifier is not required or accepted with this command. The operand must consist of one or more pairs of hex digits, which may be separated by commas or adjacent. An error message will be generated if the operand contains an odd number of digits, ends in a comma, or, as in all cases, contains more than 64 characters.

6.10.5 DS

    DS expression (Define Storage)

DS reserves space for string storage data. It does not generate code. For example, DS 10 will set aside 10 (decimal) bytes for storage. Because DS adjusts the object code pointer, the instruction "DS -1" can be used to back up the object and address pointers one byte.

6.10.6 KBD

    KBD (KeyBoarD)

KBD allows a label to be equated from the keyboard during assembly. Its syntax is:

    LABEL KBD

6.10.7 LUP

    LUP expression (Loop)
    --^ (end of loop)

An example of the syntax is:

    LUP 4
    ASL
    --^

This will assemble as:

    ASL
    ASL
    ASL
    ASL

and will show that way in the assembly listing, with repeated line numbers.

Perhaps the major use of LUP is for table building. As an example:

    ]A = 0
       LUP $FF   ;build 255 byte table
    ]A = ]A+1
       DFB ]A
       --^       ;end of LUP

The above will build a 255-byte table that contains 1, 2, 3, ..., $FF.

The maximum LUP value is $8000. The LUP opcode will be ignored if you try to use more than this.

6.10.8 CHK

    CHK expression (CHecKsum)

CHK places a checksum byte into the object code at the location of the CHK opcode (usually at the end of the program). It cannot be used when DSK is in effect.

6.10.9 ERR

    ERR expression (ERRor)

ERR will cause a forced error if the expression has a nonzero value. The error consists of printing the message "Break in line ???".

For example, ERR may be used to ensure that your program does not exceed address $95FF by adding the final line:

    ERR *-1/$9600

Another available syntax is:

    ERR ($300)-$4C

This will produce an error on the first pass, and abort the assembly, if location $300 does not contain the value $4C.
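
To see why ERR *-1/$9600 works, recall from section 6.3 that expressions are evaluated strictly from left to right, so the expression means (*-1)/$9600. The Python sketch below models that rule for illustration only; it is not MERLIN's evaluator. The name evaluate, the 16-bit wraparound, and the use of integer division are assumptions, and quoted-character primitives are left out for brevity.

```python
import re

# Tokens: $hex, %binary, decimal, label, or a single operator character.
TOKEN = re.compile(r"\$[0-9A-Fa-f]+|%[01]+|[0-9]+|[A-Za-z:][A-Za-z0-9:]*|[-+*/!.&]")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,   # assumption: integer division
    "!": lambda a, b: a ^ b,    # exclusive OR
    ".": lambda a, b: a | b,    # OR
    "&": lambda a, b: a & b,    # AND
}

def primitive(tok, here, labels):
    """Value of one primitive expression; '*' is the present address."""
    if tok == "*":
        return here
    if tok.startswith("$"):
        return int(tok[1:], 16)
    if tok.startswith("%"):
        return int(tok[1:], 2)
    if tok.isdigit():
        return int(tok)
    return labels[tok]

def evaluate(expr, here=0, labels=None):
    """Evaluate strictly left to right, with no operator precedence."""
    labels = labels or {}
    tokens = TOKEN.findall(expr)
    if not tokens or "".join(tokens) != expr or len(tokens) % 2 == 0:
        raise ValueError("bad operand")
    value = primitive(tokens[0], here, labels)
    # Tokens alternate operand, operator, operand, ...
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op not in OPS:
            raise ValueError("bad operand")
        # assumption: results wrap to 16 bits
        value = OPS[op](value, primitive(operand, here, labels)) & 0xFFFF
    return value
```

In this model, evaluate("2+3*5") gives 25, and evaluate("*-1/$9600", here=0x9600) gives 0 (no error) while here=0x9601 gives 1 (forced error), which matches the stated purpose of keeping the last program byte at or below $95FF.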

6.10.10 USR

    USR opcode

This is a user definable pseudo opcode. It does a JSR $B6DA. This location will contain an RTS after a boot, a BRUN MERLIN, or a BRUN BOOT ASM. To set up your routine, you should BRUN it from the EXEC command after CATALOG. This should just set up a JMP at $B6DA to your main routine and then RTS. The following flags and entry points may be used by your routine:

    USRADS  = $B6DA ;must have a JMP to your routine
    PUTBYTE = $E5F6 ;see below
    EVAL    = $E5F9 ;see below
    PASSNUM = $2    ;contains assembly pass number
    ERRCNT  = $1D   ;error count
    VALUE   = $55   ;value returned by EVAL
    OPNDLEN = $BB   ;contains the combined length of
                    ;the operand and comment
    NOTFOUN = $FD   ;see discussion of EVAL
    WORKSP  = $280  ;contains the operand and
                    ;comment in positive ASCII

Your routine will be called by the USR opcode with A=0, Y=0 and carry set. To direct the assembler to put a byte in the object code, perform a JSR PUTBYTE with the byte in A. PUTBYTE will preserve Y but will scramble A and X. It returns with the zero flag clear (so that a BNE always branches). On the first pass, PUTBYTE only adjusts the object and address pointers, so the contents of the registers are not important. You MUST call PUTBYTE the same number of times on each pass, or the pointers will not be kept correctly and the assembly and other parts of the program will be incorrect.

If your program needs to evaluate the operand, or part of it, you can do this by a JSR EVAL. The X register must point to the first character of the portion of the operand you wish to evaluate (set X=0 to evaluate the expression at the start of the operand). On return from EVAL, X will point to the character following the evaluated expression. The Y register will contain a 0, 1, or 2 according to whether this character is a right parenthesis, space, or comma. Any character not allowed in an expression will cause the assembly to abort with the message "BAD OPERAND."
If some label in the expression is not recognized, then location NOTFOUN will be nonzero. On the second pass you will get an "UNKNOWN LABEL" message and the rest of your routine will be ignored. On return from EVAL, the computed value of the expression will be in locations VALUE and VALUE+1, low byte first. On the first pass, this value will be incorrect if NOTFOUN is nonzero.

Appropriate locations for your routine are $300-$3CF and $8A0-$8FF. You must not write outside these ranges! For a longer routine, you may use high memory, just below $9853. If you are sure that the label table will not exceed $1000 bytes, you could use the EDITOR command "SYM" to protect your routine from being overwritten by the object code. SYM would have to be set at least one byte below your code.

You can use zero page locations $60-$6F, but should not alter any other locations. Also, you must not change anything from $226 to $27F, or anything from $2C4 to $2FF.

Upon return from your routine (RTS), the USR line will be printed (on the second pass).

To gain further understanding of the use of USR, read the source file SCRAMBLE.S or, for a more sophisticated example, the file FLOAT.S. SCRAMBLE.S uses the USR opcode to put an ASCII string into the object code in a scrambled format. FLOAT.S is a somewhat complicated routine that uses Applesoft to compute the packed (five byte) form of a specified floating point number and put it in the object code. FLOAT.S can be used only on an Apple ][+ or //e.

When you use the USR opcode in a source file, it is wise to include some sort of check (in source) that the required routine (yours) is in memory. If, for example, your routine contains the byte $31 at location $310, then:

    ERR ($310)-$31

will test that byte and abort assembly if it is not there.
Similarly, if you know that the required routine should assemble exactly two bytes of data, then you can check for it with the following code:

    LABEL USR OPERAND
          ERR *-LABEL-2

This will force an error on the second pass if USR does not produce exactly two object bytes.

It is possible to use USR for several different routines in the same source. For example, your routine could check the first operand expression for an index to the desired routine and act accordingly. Thus, "USR 1,whatever" would branch to the first routine, "USR 2,stuff" to the second, etc.

6.11 Conditionals

6.11.1 DO

    DO expression

DO, together with ELSE and FIN, are the conditional assembly pseudo ops. If the operand evaluates to zero, then the assembly will stop generating obj.