Mac OS X Numerics Benchmarks

11 Oct 2025

In the twenty-plus years since I wrote this, hardware has come a long way. On a 2021 MacBook Pro with an M1 Pro chip, a billion square roots can be computed and summed in a quarter of a second. Amazing!

AltiVec

27 Jun 2003

Now that the new G5 Macs have been announced, I have become more interested in AltiVec (the Velocity Engine), which is also found in the G4 chips. I wrote a simple program that sums square roots using vecLib.

My current daily machine is a Power Macintosh G4 (Mirrored Drive Doors) with twin 1 GHz CPUs, 1 GB of RAM, and 360 GB of drive space. This machine can sum 100,000,000 single-precision square roots in less than 5 seconds!
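As a rough worked comparison, here is the G4 throughput, using the 4.69-second GCC time from the program's header comment, next to the M1 Pro figure from the 2025 note (a billion square roots in about a quarter of a second). This ignores differences in compiler, precision, and loop structure, so treat the ratio as an order-of-magnitude figure only:

```latex
\frac{10^{8}\ \text{sqrts}}{4.69\ \text{s}} \approx 2.13\times10^{7}\ \text{sqrts/s},
\qquad
\frac{10^{9}\ \text{sqrts}}{0.25\ \text{s}} = 4\times10^{9}\ \text{sqrts/s},
\qquad
\frac{4\times10^{9}}{2.13\times10^{7}} \approx 188
```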

Here's the program, which will build on Mac OS X using GCC or under MPW with MRC:

/*
 *	altivec.c - AltiVec benchmark program.
 *	altivec is Copyright Daniel K. Allen, 2003. (This program, not the Velocity Engine.)
 *	All rights reserved.
 *
 *	26 Jun 2003 - Created by Dan Allen in MPW & Terminal simultaneously.
 *
 *	Dual 1 GHz G4 Power Mac running Mac OS X 10.2.6 times:
 *
 *		cc altivec.c -o altivec -framework vecLib -faltivec -O3 -mdynamic-no-pic
 *
 *			100,000,000 square roots in 4.69 seconds
 *	  		1,000,000 square roots in  .05 seconds
 *
 *		MRC altivec.c -opt speed,unroll -vector on 
 *
 *			100,000,000 square roots in 5.03 seconds
 *	  		1,000,000 square roots in  .05 seconds
 *
 *
 */


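#ifdef powerc				/* defined by MRC under MPW */
#include <vfp.h>			/* MPW vecLib header for vsqrtf (assumed name) */
#else
#include <vecLib/vecLib.h>	/* Mac OS X vecLib framework */
#endif
#include <stdio.h>			/* printf */
#include <stdlib.h>			/* atoi */
#include <time.h>			/* clock, CLOCKS_PER_SEC */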


typedef union {
	vector float v;
	float f[4];
} vf;


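int main(int argc,char *argv[])
{
	vf a,b;
	int i = 0,n;
	double sum = 0;
	clock_t t = clock();


	n = (argc == 2) ? atoi(argv[1]) : 1000000;
	while (i < n) {
		/* Load the next four integers into one vector, lane by lane. */
		a.f[0] = i++;
		a.f[1] = i++;
		a.f[2] = i++;
		a.f[3] = i++;
		/* One vecLib call computes all four square roots. */
		b.v = vsqrtf(a.v);
		/* Accumulate the four results in double precision. */
		sum += b.f[0];
		sum += b.f[1];
		sum += b.f[2];
		sum += b.f[3];
	}
	/* i is the actual count; it exceeds n by up to 3 if n is not a multiple of 4. */
	t = clock() - t;
	printf("Time: %.2f sec\n Sum: %d sqrts = %.8f\n",t/(float)CLOCKS_PER_SEC,i,sum);
	return 0;
}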


/*


MRC altivec.c -o altivec.o -opt speed,unroll -vector on 
PPCLink -o altivec altivec.o "{PPCLibraries}InterfaceLib" "{PPCLibraries}MathLib" "{PPCLibraries}StdCLib" "{PPCLibraries}StdCRuntime.o" "{PPCLibraries}PPCCRuntime.o" "{PPCLibraries}PPCToolLibs.o"  "{PPCLibraries}vecLib"
SetFile altivec -d . -m . -t MPST -c 'MPS '


*/

Floating Point

Jan 2002

A quiet improvement in Mac OS X 10.1.2 appears to be faster math library routines, which have greatly improved numerics benchmark scores. GCC 2.95.2 now appears to generate code competitive with MRC, but it turned out that the improved math libraries in 10.1.2, not the compiler, made the difference.

Once again this proves that benchmarking rarely measures an individual component but reflects an entire system: the CPU, memory, buses, disks, I/O, the operating system, the compiler, and the libraries all contribute to the final result.

All tests were performed on a 450 MHz Power Macintosh G4 Cube with 512 MB of RAM.
All, that is, except the Windows test, which ran on a Dell OptiPlex with a 450 MHz Pentium III and 320 MB of RAM.
Where noted, the machine was also streaming www.kpig.com at 128 kbit/s in iTunes.
All times are in seconds, so smaller numbers mean better performance.

Bench Scores

OS Version               iTunes   Compiler            Integer Addition  Sum of Sqrts  Simulation  Most Remote (Trig)
Mac OS 9.2.2             -        MRC -opt speed      1.17              0.23          0.07        0.30
Mac OS X 10.1            -        gcc -O3             1.09              1.15          0.11        0.49
Mac OS X 10.1.1          -        gcc -O3             1.08              1.29          0.11        0.49
Mac OS X 10.1.2          -        gcc -O3             1.09              0.25          0.07        0.27
Mac OS X 10.1.2 & 9.2.2  -        MRC -opt speed      1.15              0.23          0.07        0.30
Windows 2000 SP2         -        Visual C++ 6.0 -Ox  1.12              0.13          0.09        0.28
Mac OS X 10.1            playing  gcc -O3             1.11              1.35          0.12        0.51
Mac OS X 10.1.2          playing  gcc -O3             1.14              0.27          0.07        0.28
Mac OS X 10.1.2 & 9.2.2  playing  MRC -opt speed      1.33              0.27          0.10        0.35
Mac OS 9.2.2             playing  MRC -opt speed      1.25              0.25          0.08        0.33

Mac OS X is now faster than 9.x!
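Working the numbers from the Bench Scores table for the gcc -O3 rows on 10.1 and 10.1.2 (without iTunes) shows where the 10.1.2 gains came from:

```latex
\text{Sum of Sqrts: } \frac{1.15\ \text{s}}{0.25\ \text{s}} = 4.6\times,
\qquad
\text{Most Remote (Trig): } \frac{0.49\ \text{s}}{0.27\ \text{s}} \approx 1.8\times,
\qquad
\text{Simulation: } \frac{0.11\ \text{s}}{0.07\ \text{s}} \approx 1.6\times,
\qquad
\text{Integer Addition: } \frac{1.09\ \text{s}}{1.09\ \text{s}} = 1.0\times
```

Integer Addition, which presumably makes no math library calls, did not change at all, which is consistent with the libraries rather than the compiler accounting for the improvement.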

Jun 2003

GCC 3.3 is now out, and at some point I'll update these numbers for my faster machine and the new compiler.

The bench tool is a collection of small numeric loops that are common in scientific and engineering programming. It is written in C by Dan Allen, with contributions by Dr. Paul A. Finlayson of JPL. These benchmarks used to take a long time to run. On today's fast machines they finish so quickly that they will need to be scaled up, so that the differences are more apparent and not lost in the noise.


Created:  22 Dec 2001
Modified: 11 Oct 2025