From: nparker@cie-2.uoregon.edu (Neil Parker)
Newsgroups: comp.sys.apple2,comp.sys.apple2.programmer
Subject: Re: Looking for 32-bit signed integer math routines in 6502 assembly
Followup-To: comp.sys.apple2.programmer
Date: 27 Aug 1997 21:49:26 GMT
Organization: University of Oregon Campus Information Exchange
Lines: 143
Message-ID: <5u27d6$2q4@smalltown.uoregon.edu>
References: <Pine.SOL.3.96.970824193831.22408C-100000@post.its.mcw.edu>
NNTP-Posting-Host: cie-2.uoregon.edu
NNTP-Posting-User: nparker

In article <Pine.SOL.3.96.970824193831.22408C-100000@post.its.mcw.edu> Ronald
Kneusel <rkneusel@post.its.mcw.edu> writes:
>
>Subject says it all.  Specifically, I'm looking for 32-bit _signed_ 
>multiplication and division:
>
>32-bit * 32-bit = 64-bit (or 32-bit ignoring overflow)
>
>32-bit / 32-bit = 32-bit (plus remainder)
>
>I've found 16-bit routines but was wondering if anyone had 32-bit routines 
>in 6502 assembly handy.

First, my apologies for posting all this machine language in comp.sys.apple2,
but that *is* where the original question is.  I've tried to direct followups
to comp.sys.apple2.programmer.

Attached below is a 32*32=64 multiplication routine.  I just banged it out
tonight, but it seems to give the correct answers for a few test problems.

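Alas, it's unsigned, but that's not much of a problem.  If a 32-bit result
is sufficient, just use MUL4X4 as-is...in two's complement, the low-order
32 bits of the unsigned product are also the correct low-order 32 bits of
the signed product.  If you need a 64-bit result, you'll have to do a little
pre- and post-processing: take the absolute values of A and B, multiply, and
negate the result if the signs differed.  (Note that a negative A is negated
in place, so save it first if you need it afterward.)
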

        LDA A4      ;Compute sign of result
        EOR B4
        PHP         ;Save on stack
        LDY #0
        BIT A4      ;A<0?
        BPL TRYB    ;If not, try B
        LDX #$FC    ;Else negate A
        SEC
L1      TYA
        SBC A4+1,X  ;(this only works if A1..A4 are on page 0)
        STA A4+1,X
        INX
        BNE L1
TRYB    BIT B4      ;B<0?
        BPL DOMUL   ;If not, multiply
        LDX #$FC    ;Else negate B
        SEC
L2      TYA
        SBC B4+1,X  ;(this only works if B1..B4 are on page 0)
        STA B4+1,X
        INX
        BNE L2
DOMUL   JSR MUL4X4  ;Do the (unsigned) multiplication
        PLP         ;Should result be <0?
        BPL DONE    ;If not, then done
        LDY #0      ;Else negate result
        LDX #$F8
        SEC
L3      TYA
        SBC R8+1,X  ;(this only works if R1..R8 are on page 0)
        STA R8+1,X
        INX
        BNE L3
DONE    RTS
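
For anyone who wants to sanity-check the wrapper on another machine, here is
a rough Python sketch of the same pre- and post-processing.  It is only a
model, not a translation: the names signed_mul_64 and to_signed are made up
for illustration, and Python's built-in multiply stands in for MUL4X4.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def signed_mul_64(a, b):
    """Multiply two 32-bit patterns as signed values; return the 64-bit pattern.

    Hypothetical helper mirroring the wrapper: record the result sign,
    make both operands non-negative, multiply unsigned, fix up the sign.
    """
    negative = ((a ^ b) >> 31) & 1      # LDA A4 / EOR B4 / PHP
    if a & 0x80000000:                  # BIT A4 / BPL TRYB
        a = (-a) & MASK32               # 0 - A, as the SBC loop does
    if b & 0x80000000:                  # BIT B4 / BPL DOMUL
        b = (-b) & MASK32
    product = a * b                     # stands in for JSR MUL4X4 (unsigned)
    if negative:                        # PLP / BPL DONE
        product = (-product) & MASK64   # 0 - R, eight bytes
    return product

def low32_matches(a, b):
    """Check that the unsigned product's low 32 bits equal the signed product's."""
    unsigned_low = ((a & MASK32) * (b & MASK32)) & MASK32
    signed_low = (to_signed(a & MASK32, 32) * to_signed(b & MASK32, 32)) & MASK32
    return unsigned_low == signed_low
```

For example, signed_mul_64(0xFFFFFFFE, 3) (that is, -2 times 3) gives
0xFFFFFFFFFFFFFFFA, the 64-bit pattern for -6.  The case A = B = $80000000
also works, since the magnitude 2^31 still fits in an unsigned 32-bit value.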

Here's the multiplication routine.  It wants the four-byte multiplicand
in locations A1 (low byte) through A4 (high byte), and the four-byte
multiplier in B1 through B4.  The result comes out in R1 through R8.
The four low-order bytes of R must be the same memory locations as the four
bytes of B--i.e. B1 = R1, B2 = R2, etc.  Calling MUL4X4 destroys B.

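For efficiency, all twelve memory locations should reside on page 0, but
this is not required (unlike the signed routine above, which does require
everything to be on page 0, in consecutive locations, because it depends
on zero-page indexed addressing wrapping around within the page).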
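
To illustrate the page-0 requirement of the signed routine, here is a small
Python sketch of how the effective address of an operand like A4+1,X comes
out in the two addressing modes.  The location $06 for A1 is purely a
made-up example, as are the function names.

```python
def zp_indexed(base, x):
    """Effective address of a zero-page,X operand: the sum wraps within page 0."""
    return (base + x) & 0xFF

def abs_indexed(base, x):
    """Effective address of an absolute,X operand: no wraparound at $FF."""
    return (base + x) & 0xFFFF

A1 = 0x06            # hypothetical location; A2..A4 follow at $07..$09
operand = A1 + 4     # the "A4+1" operand in the listing

# X runs from $FC up to $FF, then INX wraps it to 0 and the loop ends.
zp_addresses = [zp_indexed(operand, x) for x in range(0xFC, 0x100)]
abs_addresses = [abs_indexed(operand, x) for x in range(0xFC, 0x100)]
```

With zero-page addressing the loop touches $06, $07, $08, $09, which is
A1 through A4, low byte first.  If the assembler were to emit absolute
addressing instead, the same loop would touch $0106 through $0109.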

This is not the most compact possible routine...some loops have been
unrolled for speed, and some code has been added to optimize for zero bytes
in B.

MUL4X4  LDA #0      ;Init result to 0
        STA R5
        STA R6
        STA R7
        STA R8
        JSR MUL1    ;Multiply A by 1st byte of B
        JSR MUL1    ;Multiply A by 2nd byte of B
        JSR MUL1    ;Multiply A by 3rd byte of B
; Multiply A by 4th byte of B, by falling into...
MUL1    LDY B1      ;Get low byte of B
        LDA B2      ;Shift remaining bytes down, to make room for result
        STA B1
        LDA B3
        STA B2
        LDA B4
        STA B3
        TYA         ;Get low byte of B back
        BNE NON0    ;Is it 0?
        LDA R5      ;If so, shift result down by a whole byte, and return
        STA R4
        LDA R6
        STA R5
        LDA R7
        STA R6
        LDA R8
        STA R7
        STY R8      ;(note Y=0 here)
        RTS
NON0    LDX #8      ;Otherwise prepare to shift 8 bits
M1      LSR         ;Get low bit of multiplier
        TAY         ;Save remaining multiplier bits
        BCC M2      ;Low bit 0?
        CLC         ;If not, add multiplicand to result...
        LDA A1
        ADC R5
        STA R5
        LDA A2
        ADC R6
        STA R6
        LDA A3
        ADC R7
        STA R7
        LDA A4
        ADC R8
        STA R8
M2      ROR R8      ;...and shift everything down 1 bit
        ROR R7
        ROR R6
        ROR R5
        ROR R4
        TYA         ;Get multiplier bits back
        DEX         ;Done 8 bits yet?
        BNE M1      ;If not, go do more
        RTS         ;Else done
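
If you'd like to check the routine against known answers without an Apple
handy, here is a byte-level Python sketch of what MUL4X4 does.  It is a
model written for illustration (the names mul4x4 and mul32 are invented),
with one list standing in for R1..R8 and its first four entries doubling
as B1..B4, as in the listing.

```python
def mul4x4(a_bytes, b_bytes):
    """Model of MUL4X4.  a_bytes and b_bytes are 4-byte lists, low byte first.
    Returns the eight result bytes R1..R8, low byte first."""
    r = list(b_bytes) + [0, 0, 0, 0]    # R1..R4 overlay B1..B4; R5..R8 = 0
    for _ in range(4):                  # three JSR MUL1 plus the fall-through
        mult = r[0]                     # LDY B1
        r[0], r[1], r[2] = r[1], r[2], r[3]   # shift rest of B down a byte
        if mult == 0:
            r[3:8] = r[4:8] + [0]       # zero byte: shift result down 8 bits
            continue
        for _ in range(8):              # LDX #8
            carry = 0                   # BCC M2 is taken with carry clear
            if mult & 1:                # LSR / BCC M2
                for i in range(4):      # add A1..A4 into R5..R8
                    total = a_bytes[i] + r[4 + i] + carry
                    r[4 + i] = total & 0xFF
                    carry = total >> 8
            mult >>= 1                  # remaining multiplier bits (TAY/TYA)
            for i in range(7, 2, -1):   # ROR R8, R7, R6, R5, R4
                carry_out = r[i] & 1
                r[i] = (carry << 7) | (r[i] >> 1)
                carry = carry_out
    return r

def mul32(a, b):
    """Convenience wrapper: multiply two unsigned 32-bit integers via mul4x4."""
    r = mul4x4(list(a.to_bytes(4, "little")), list(b.to_bytes(4, "little")))
    return int.from_bytes(bytes(r), "little")
```

Comparing mul32(a, b) with a * b over a handful of values, including ones
with zero bytes in B, exercises both the shortcut path and the bit loop.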

The 32/32=32+remainder division routine will have to wait for another day--
I don't have one handy at the moment, and besides, this article is already
getting a bit too long.

               - Neil Parker

P.S.  Concerning the division routine:  Do you care which way the quotient
rounds when the result is negative?  Should it go to the more negative result,
or toward 0, or something else?
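
To make the two choices concrete, here is a short Python sketch of both
conventions.  The function names are made up for illustration, and this
says nothing about how a 6502 routine would actually be written.

```python
def div_floor(n, d):
    """Quotient rounds toward the more negative result; returns (q, r)."""
    q = n // d
    return q, n - q * d

def div_trunc(n, d):
    """Quotient rounds toward 0; returns (q, r)."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q, n - q * d
```

For -7 divided by 2, div_floor gives quotient -4 with remainder 1, while
div_trunc gives quotient -3 with remainder -1.  The two agree whenever the
division is exact or the operands have the same sign.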
-- 
Neil Parker                       | Unsolicited commercial e-mail to my
nparker@cie-2.uoregon.edu         | address is not welcome, and will be
nparker@cie.uoregon.edu           | discarded unread.
http://cie-2.uoregon.edu/~nparker |