Path: blue.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!math.ohio-state.edu!howland.reston.ans.net!torn!news.unb.ca!upei.ca!usenet From: george@grover.psyc.upei.ca (George Taylor) Newsgroups: comp.sys.apple2.programmer Subject: Re: ASM multiply/divide Date: 6 Jul 1994 18:59:53 GMT Organization: University of Prince Edward Island, Charlottetown, PEI Canada Lines: 28 Message-ID: <2veuv9$1u6@atlas.cs.upei.ca> References: Reply-To: yurik@io.org NNTP-Posting-Host: grover.psyc.upei.ca Sean Dockery writes (whole bunch of divide/multiply routines). Well, I just stumbled on this newsgroup by accident. Normally I read comp.sys.cbm , being a 64 programmer. But, 6502 is my area of expertise, and in 'our' world, we have much faster routines. I have a multiply which takes only 40 cycles. this requires 1.5k of tables, however. It works on a difference of squares algorithm. 4*a*b=(a+b)^2-(a-b)^2 You can derive the code from this formula... Hint: put a in a zp location, and b in y, so you can do lda (adr),y to read from 0-510 location table. As for an algorithmic divide, an 8 bit divide can be done at 14 cycles per bit at best. If you start using tables, the speed can be decreased by about 50%. I find it interesting to see how 'developed' coders are in a different 6502 world... Btw, the difference of squares method is so powerful, I hear it saves even a few cycles on an Amiga running with a 68000! (as compared to a HARDWARE divide). Algorithms are in fact being replaced by tables now in the design of some new math comprocessors. (see the lastest transactions of IEEE for the floating point article). Just out of curiousity, has anyone done solid vectors on the apple? On the 64, you can plot an 80x80 solid cube, rotating in 3d, in a time of about two frames per rotation ( in 4 colors). That's around 40,000 cycles. How fast is multiply on the 65816? Later, yurik/TEG Path: blue.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!math.ohio-state.edu!howland.reston.ans.net!spool.mu.edu!torn!news.unb.ca!upei.ca!usenet From: student@bert.psyc.upei.ca (Student Demonstations) Newsgroups: comp.sys.apple2.programmer Subject: Re: ASM multiply/divide Date: 7 Jul 1994 10:19:10 GMT Organization: University of Prince Edward Island, Charlottetown, PEI Canada Lines: 45 Message-ID: <2vgkqu$g64@atlas.cs.upei.ca> References: <2veuv9$1u6@atlas.cs.upei.ca> Reply-To: yurik@io.org NNTP-Posting-Host: bert.psyc.upei.ca In article <2veuv9$1u6@atlas.cs.upei.ca> george@grover.psyc.upei.ca (George Taylor) writes: > As for an algorithmic divide, an 8 bit divide can be done at 14 cycles per > bit at best. If you start using tables, the speed can be decreased by > about 50%. I now follow up to myself with the code :) 8 bit unsigned divide ldx #8 lda dividend div cmp divisor bcs s sbc divisor s rol quotient rol dex bne div timing in cycles: overhead 5 "0" bit 20 "1" bit 18 range 148 to 164 average 156 unrolled, with dividend in A register, and last iteration optimized: 105 average of 14 cycles per bit. dividend, quotient, and divisor are 8 bit values stored in zero page. The remainder is left in A. note: dividend/divisor must be < 2. So this is not a fully general routine. To handle the full range, you must do a prefix of the dividend to reduce it to the right range, then fix the quotient after. This is true of any shift and subtract divide. True divides are much slower. There are two faster routines I have written, but this is the fastest algorithmic version. A faster version uses decision tree optimization and requires over 256 bytes of code. The fastest version requires tables which I cannot calculate without a computer. I mean, I cannot even easily specify how to find them, since it is not a formula. -yurik ps if anyone has some interesting math algorithms or graphics ideas, please send me. Yes, I have already found the 14 cycle per point line draw (not including plot) and the all integer ellipse and circle :) Search my name in comp.sys.cbm for more interesting info.