Short:    Patch CopyMem/Quick for 68060(040) v1.5d
Author:   sintonen@iki.fi (Harry "Piru" Sintonen)
Uploader: sintonen@iki.fi (Harry "Piru" Sintonen)
Type:     util/boot
Requires: 68060 or 68040, Kickstart 2.04
Version:  1.5d

Description:

 This is a small patch which replaces the CopyMem and CopyMemQuick
 functions of exec.library. These functions are optimized for the 68060
 processor. They should also work with the 68040 processor; however,
 they might not be the fastest possible for the 68040.

 The patch tests for a 68040 or 68060 processor. If it can't find one,
 it doesn't install the patch and exits with a return code of 20
 (=fail). It also fails if it can't allocate the necessary memory. If
 the MorphOS PPC kernel is running, it won't install the patch and will
 exit with a return code of 5 (=warn).

 If the CPU is a 68040, CMQ060 will install a slightly improved version
 of the v1.4 routines. If the CPU is a 68060, routines with the new
 movem-loop are picked instead.

 Note that due to these movem copy loops v1.5 is slightly slower in
 chipmem copies than v1.4. However, fast->fast copies are sped up, so I
 don't consider this a problem, especially since most copies are
 fast->fast.

 On average (measured with "TestIt" from CopyMemQuicker V2.8) these
 routines are 29.4% faster than the Kickstart 3.1 ones. CMQ060 v1.5 is
 on average 2.5% faster than CMQ060 v1.4.

 The full source code is included. The source code was compiled with
 GenAm 3.14; it also compiles with PhxAss.

Installation:

 Just copy CMQ060 into c:
 and insert CMQ060 in your s:startup-sequence

Some notes about Move16:

 Move16 is a new instruction of the 68040 and 68060 processors. It
 moves 16 bytes at once and it uses burst accesses. Andreas Kleinert and
 Thomas Richter said there could be problems with the Move16 instruction
 on the Amiga, especially in chipram, caused by the DMA of the custom
 chips. So v1.5 of CMQ060 doesn't use Move16 from or into memory below
 $01000000 (Chipram, ZorroII-Fastram, I/O-Space, Kickstart,...).
 Move16 is only used when the source and destination addresses are both
 higher than $00ffffff (32-bit fastram).

 (If you didn't get any errors with V1.3 and want to get the most speed
 improvement, you could use CMQ060_Move16. This version also uses Move16
 below $01000000, but you might get problems. If you want to avoid all
 problems which Move16 could cause [the 68040 has some Move16 bugs], you
 should use Aminet:util/boot/CMQ030. This one never uses Move16 and is
 still faster than the other available patches.)

Some notes about the movem bug:

 Some CPU cards have a bug in the bus controller, and these cards fail
 to perform movem properly with odd addresses. CMQ060 v1.5 autodetects
 such cards and will use the move-loop instead of the movem-loop with
 them. If the move-loop is picked, the performance will drop slightly
 compared to the movem-loop. Fortunately such defective cards are rare.

 Special thanks to Harald Frank, who patiently explained the bug to me
 and gave me the idea of how to autodetect it.

Version 1.5 author:

 Harry "Piru" Sintonen

Original CMQ060 author:

 Dirk Busse
 Kropsburgstraße 8
 D-67141 Neuhofen
 Germany
 <100.141999@germanynet.de>

Speed comparison:

 There are some similar patches available on the Aminet:

  CopyMemQuicker V2.8 from 1994 -> Aminet:util/boot/COPMQR28.lha
  PCM V1.0 from 1996            -> Aminet:util/boot/PCM_1.0.lha

 MCP also patches these functions.

 Here are some test results. All results were measured on the same
 AMIGA 1200 with a phase5 Blizzard PPC with 060 @ 50MHz. The Blizzard
 PPC memory speed setting for M68K was set to the fastest possible.

 The most surprising result is that PCM V1.0 is on average *slower*
 than the original Kickstart 3.1 routines!
"TestIt" from CopyMemQuicker V2.8 orig COPMQR MCP PCM CMQ030 CMQ060 CMQ060 CMQ060 KS 3.1 V2.8 V1.33b1 V1.0 V1.1 V1.4 V1.5 Move16 CopyMem routines V1.5 565×64kB L->L 2.04 2.08 1.92 1.56 1.91 1.52 1.51 1.51 147×64kB L->L+1 0.94 0.68 0.57 0.68 0.56 0.57 0.56 0.56 413×64kB L->E 1.66 1.70 1.61 1.91 1.57 1.61 1.59 1.59 147×64kB L->E+1 0.94 0.68 0.57 0.68 0.56 0.57 0.56 0.56 147×64kB L+1->L 0.94 0.67 0.57 0.60 0.56 0.57 0.55 0.56 382×64kB L+1->L+1 1.62 1.39 1.29 1.05 1.30 1.03 1.02 1.02 147×64kB L+1->E 0.94 0.68 0.57 0.69 0.57 0.57 0.56 0.56 501×64kB L+1->E+1 1.91 1.89 1.95 2.34 1.96 1.96 1.93 1.93 501×64kB E->L 1.92 1.92 1.94 2.06 1.92 1.95 1.90 1.90 147×64kB E->L+1 0.94 0.67 0.57 0.68 0.57 0.57 0.55 0.55 382×64kB E->E 1.62 1.39 1.29 1.06 1.30 1.03 1.02 1.02 147×64kB E->E+1 0.94 0.68 0.57 0.68 0.57 0.57 0.56 0.56 147×64kB E+1->L 0.94 0.67 0.57 0.60 0.56 0.57 0.55 0.56 413×64kB E+1->L+1 1.71 1.70 1.60 1.93 1.61 1.60 1.56 1.56 147×64kB E+1->E 0.94 0.67 0.57 0.69 0.57 0.57 0.55 0.55 564×64kB E+1->E+1 2.10 2.06 1.91 1.56 1.92 1.52 1.50 1.50 33900×1kB L->L 0.43 0.42 0.37 1.49 0.36 0.36 0.36 0.36 9400×1kB L->L+1 0.58 0.33 0.20 0.24 0.20 0.19 0.19 0.19 24000×1kB E->E 0.68 0.30 0.26 1.01 0.27 0.26 0.26 0.26 196000×128B L->L 0.55 0.45 0.41 1.12 0.32 0.35 0.33 0.33 155000×128B E->E 0.75 0.40 0.34 1.10 0.34 0.30 0.30 0.31 588000×19B L->L 0.85 0.61 0.72 0.93 0.53 0.53 0.53 0.53 622000×18B L->L 0.86 0.51 0.71 0.89 0.51 0.50 0.50 0.51 663000×17B L->L 0.75 0.68 0.76 0.80 0.51 0.53 0.53 0.55 956000×16B L->L 0.82 0.71 1.04 1.05 0.59 0.72 0.55 0.55 1060000×8B L->L 0.85 0.72 0.89 1.03 0.62 0.53 0.55 0.55 1430000×4B L->L 0.80 0.63 0.94 1.12 0.71 0.45 0.45 0.48 2190000×1B L->L 0.74 0.61 1.40 0.88 0.44 0.66 0.66 0.70 CopyMemQuick 565×64kB L->L 2.04 2.06 1.91 1.56 1.91 1.52 1.51 1.51 33900×1kB L->L 0.43 0.43 0.37 1.27 0.36 0.36 0.35 0.35 196000×128B L->L 0.53 0.43 0.38 1.09 0.31 0.32 0.30 0.30 956000×16B L->L 0.73 0.63 0.94 1.06 0.42 0.58 0.42 0.42 1060000×8B L->L 0.53 0.57 0.80 0.63 0.44 0.42 
0.42 0.42 1430000×4B L->L 0.43 0.51 0.80 0.60 0.31 0.28 0.28 0.31 Total 35.63 30.70 31.48 36.84 27.31 25.80 25.16 25.31 History: 1.0 (12.Sep.1998) - First public version. 1.1 (15.Sep.1998) - V1.0 exits with a return code of 10 (=error), if it can't find a 68040 or 68060 or can't get the necessary memory. V1.1 exits, in this cases, with a return code of 20 (=fail). - Fixed a mistake in the readme. 1.1b (19.Sep.1998) (I didn't changed the Patch itself! It's the same as V1.1) - Added the Testresults of MCP V1.30 into the readme. - Added CMQ060beep and CMQ060beepCMQ (see above). 1.2 (29.Nov.1998) - Added the Testresults of MCP V1.32b12 into the readme. - Changed the source code. There was a problem with a wrong written program which expects the address of the last source byte +1 in A0 and the address of the last destination byte +1 in A1. This version of CMQ060 solves problems with such badly programs. It's now 100 Bytes longer, but the speed is the same. Big moves by the CopyMem function will be one or two cycles faster, but you didn't recognize it. 1.3 (5.Jan.1999) All changes made to this version doesn't effect the speed. They are only to avoid problems with future versions of AMIGA OS. - changed the version string to the "standard" format - changed BMI to BCS and BPL to BCC -> now CMQ030 could move blocks bigger than 2 GigaByte ;-) 1.4 (3.Apr.1999) - CMQ060 now doesn't use Move16 into/from memory below $01000000 - added CMQ060Move16 (It's the same as CMQ060 V1.3) - added the test results of CMQ030 (Does never use Move16) 1.5 (5.Sep.2000) - Totally rewrote the source code. - Bugfix: Fixed major bug from the patch init: If the memory was allocated near 64k boundary CMQ060 trashed innocent memory and crashed the system completely. Odds were 1/8192 for this to happen. - Speedup: Removed two pipeline stalls from big copies. - Speedup: Optimized non-move16 copy loop, now it uses movem.l instead of move.l. 
    Slightly slower in chipmem copies, but fast -> fast copies are
    sped up.
  - Speedup: Unrolled the bigcopy-loops to do 256 bytes per iteration.
  - Added MorphOS check; it makes no sense to slow down MorphOS with
    m68k patches.
  - Redid all speed tests, MCP tested with 1.33b1. Added V1.4 results
    for reference. Cleaned up this readme.

 1.5b (6.Sep.2000)
  - With the 68040 the move-loop is faster than the movem-loop. So now
    the move-loop is always picked for the 68040. Thanks to Chip for
    benchmark results.
  - Added autodetection of the movem bus controller bug. CMQ060 now
    automagically picks between movem- and move-loop on the 68060.
  - Fixed the Kickstart requirement; the 68040 wasn't officially
    supported before Kickstart 2.04.

 1.5c (7.Sep.2000)
  - Bugfix: The autodetection of the movem bus controller bug was
    itself buggy. Fixed.
  - Made the source compile with PhxAss.

 1.5d (11.Sep.2000)
  - Bugfix: The autodetection of the movem bus controller bug still had
    a potential problem. Fixed.