4*4 Matrix Multiply

BSides

Member
Oct 22, 2001
43
0
0
What is the current best algorithm for the fewest muls and adds to multiply 2 4*4 matrices? (Hmm... We should make this a contest.)
 

Moohooya

Senior member
Oct 10, 1999
677
0
0
You will need 64 multiplications and 48 additions. Anything else and either your matrix is sparse or you have some application-specific knowledge of the numbers it contains.
 

Agent004

Senior member
Mar 22, 2001
492
0
0


<< What is the current best algorithm for the fewest muls and adds to multiply 2 4*4 matrices? (Hmm... We should make this a contest.) >>



By hand, of course. I agree that you will need 64 multiplications and 48 additions.
 

Xalista

Member
May 30, 2001
113
0
0
Assuming you are trying to multiply two random matrices (meaning you don't have any info on the values in the matrix):

The fastest way to multiply two 4*4 matrices is probably the naïve method of multiplying every row of the first matrix with every column of the second. For two 4*4 matrices this comes down to 64 muls and 48 adds.
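For completeness, here is roughly what that looks like in C (a minimal sketch, assuming row-major float[4][4] arrays):

```c
/* Naive 4x4 matrix multiply: C = A * B.
 * Per output cell: 4 muls and 3 adds, so 64 muls and 48 adds for all 16 cells. */
void mat4_mul(const float A[4][4], const float B[4][4], float C[4][4])
{
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = A[i][0] * B[0][j];
            for (int k = 1; k < 4; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    }
}
```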

The time complexity of this algorithm is O(n^3).
There is an algorithm with time complexity O(n^lg 7), which is known as Strassen's algorithm.
Because the constant hidden in the running time of Strassen's algorithm is larger than the constant factor in the O(n^3) algorithm, Strassen's algorithm is typically only used for large matrices.

It is possible to multiply two matrices in less time than O(n^lg 7). The current upper bound is approximately O(n^2.376), which is due to a method by Coppersmith and Winograd ("Matrix multiplication via arithmetic progressions", Journal of Symbolic Computation, 9(3):251-280, 1990).

Most of this information comes from: Introduction to Algorithms, 2nd ed. by Cormen, Leiserson, Rivest and Stein.
This book also has a very clear explanation of Strassen's algorithm.
 

BSides

Member
Oct 22, 2001
43
0
0
Seeing that I don't have that journal, do you have a fairly concise algorithm for this implementation? I know that many GPUs out there will do this multiplication in hardware, but I'm not sure if this is a very commonplace solution. I imagine this uses bracketing for the algebraic parts of the solution...

Also, are multiplies and adds the same speed on newer Intel processors? I'm working on another problem involving complex multiplies, i.e. (A+Bi)(C+Di). I've found a solution that trades a mul for a few extra adds...
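Something along these lines, if it helps (just a sketch of the classic 3-mul/5-add rearrangement with plain floats, not necessarily exactly what I ended up with):

```c
/* Complex multiply (A+Bi)*(C+Di) using 3 muls and 5 adds/subs
 * instead of the usual 4 muls and 2 adds. */
void cmul3(float A, float B, float C, float D, float *re, float *im)
{
    float k1 = C * (A + B);
    float k2 = A * (D - C);
    float k3 = B * (C + D);
    *re = k1 - k3;   /* = A*C - B*D */
    *im = k1 + k2;   /* = A*D + B*C */
}
```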
 

Xalista

Member
May 30, 2001
113
0
0
Sorry, I don't have an implementation, nor have I ever read that article myself.

But again, the fastest way is probably the standard matrix multiply.
I do know that a few years ago a lot of people were doing these things in asm and it was a big deal if you could save on a mul or div.

You could search for some of those implementations.
 

BSides

Member
Oct 22, 2001
43
0
0
Hmm. I found something interesting. There is a way to do a 2*2 matrix multiply in fewer muls and more adds... Going off the same principles of a 4*4 mul being 64 muls and 48 adds...

Q1 = (A11+A22)*(B11+B22)
Q2 = (A21+A22)*B11

so on and so forth, and then adding the new variables together to make the product's cell values. It takes 18 adds total, and 7 muls.
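The full set of products and the recombination look like this (a C sketch of the standard 2*2 Strassen formulas, plain scalar floats assumed):

```c
/* Strassen's 2x2 multiply: 7 muls and 18 adds, vs. 8 muls and 4 adds naively. */
void strassen2x2(const float A[2][2], const float B[2][2], float C[2][2])
{
    float Q1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1]);
    float Q2 = (A[1][0] + A[1][1]) *  B[0][0];
    float Q3 =  A[0][0]            * (B[0][1] - B[1][1]);
    float Q4 =  A[1][1]            * (B[1][0] - B[0][0]);
    float Q5 = (A[0][0] + A[0][1]) *  B[1][1];
    float Q6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1]);
    float Q7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1]);

    C[0][0] = Q1 + Q4 - Q5 + Q7;
    C[0][1] = Q3 + Q5;
    C[1][0] = Q2 + Q4;
    C[1][1] = Q1 - Q2 + Q3 + Q6;
}
```

Applied blockwise to a 4*4 (treating it as a 2*2 matrix of 2*2 blocks) that's 7 block multiplies instead of 8, and using the same trick inside each block gets it down to 49 scalar muls instead of 64, at the cost of a lot of extra adds.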

Is the multiply on Pentium-architecture processors slow enough to warrant a software solution like this?
 

Moohooya

Senior member
Oct 10, 1999
677
0
0
Hardware multiplications are pretty damn fast these days. Divisions can be slower (but I think they use tables to speed them up.)

I honestly don't know if a multiplication is much slower than an add, but the compiler might be able to reorder enough instructions to keep the pipeline flowing straight through.

Are you talking integer or floating point? Not that I'd know anything by your answer, but the ALU and FPU are two separate beasts, and I'm sure they take a different number of cycles for a mul and an add.