Advantages of block matrices over conventional matrices - computational complexity?


Mark R

Diamond Member
Oct 9, 1999
One software package I use a lot performs, as a key part of its work, a deconvolution. Given the high noise in the input data, it uses the generally accepted technique of linear algebraic deconvolution by using the truncated singular value decomposition of a Toeplitz matrix.

In practice, the measured convolution kernel needs considerable zero-padding - to at least 2x or even 3x the size of the measurements (20-30 points). However, like most commercial packages, it actually implements the zero-padding by making the convolution matrix A a block-circulant matrix.
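
For concreteness, here's roughly the kind of thing I mean, sketched in numpy (the kernel, noise level, and truncation cutoff are all made up, so this is only an illustration of the technique, not what the package actually does):

```python
import numpy as np

rng = np.random.default_rng(0)

n_meas = 30                                     # ~20-30 measured kernel points
h = np.exp(-np.arange(n_meas) / 5.0)            # made-up measured convolution kernel
N = 3 * n_meas                                  # zero-pad to 3x the measurement size
h_pad = np.concatenate([h, np.zeros(N - n_meas)])

# Convolution matrix A built from the padded kernel: column j is the kernel
# rolled down by j, which makes A circulant
A = np.stack([np.roll(h_pad, j) for j in range(N)], axis=1)

x_true = np.zeros(N)
x_true[10:20] = 1.0                             # toy "true" signal
d = A @ x_true + 0.05 * rng.standard_normal(N)  # noisy measured data

# Truncated SVD: drop singular values below a (problem-dependent) cutoff,
# then apply the resulting pseudo-inverse to the data
U, s, Vt = np.linalg.svd(A)
keep = s > 0.1 * s[0]                           # arbitrary cutoff for illustration
x_est = Vt[keep].T @ ((U[:, keep].T @ d) / s[keep])
```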

So I was wondering what the reason for that might be, given the apparently increased complexity of calculating the SVD of a block matrix compared with that of a simple circulant matrix.

I was wondering if this was a computational complexity issue. One data set may need 1,000,000 deconvolutions, and I could imagine that if each deconvolution required the SVD of a 60x60 matrix, this could become an issue.

Presumably there is a good reason for this - as the block matrix technique seems to be almost ubiquitous.
 
 

eLiu

Diamond Member
Jun 4, 2001

From your description, I don't quite see exactly what the matrix looks like... so I'll stay general w/this. Like, are we comparing forming the SVD of, say, a 180x180 circulant matrix to the SVD of a block circulant matrix (total size 180x180) of 60x60 blocks, where each block is not circulant?

As far as I know, the fast Fourier transform technique used to form the SVD of a circulant matrix can be generalized to the case of a block circulant matrix; see here:
http://portal.acm.org/ft_gateway.cf...GUIDE&dl=GUIDE&CFID=93930288&CFTOKEN=14577962

In particular, if you have M (unique) blocks of size m x n, the SVD of a block-circulant matrix takes M times the time of the SVD of an m x n matrix. So it's quite cheap if the blocks are related (sounds like they are for your problem...?).
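
To make that concrete, here's a rough numpy check of the idea using the standard block-diagonalization-by-DFT trick (the sizes and blocks are made up, and this is not necessarily the exact algorithm from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
M, m, n = 3, 4, 4
blocks = rng.standard_normal((M, m, n))   # the unique blocks A_0, ..., A_{M-1}

# Full block-circulant matrix: block (p, q) is A_{(p - q) mod M}
C = np.block([[blocks[(p - q) % M] for q in range(M)] for p in range(M)])

# A DFT across the block index block-diagonalizes C, so its singular values
# are just the pooled singular values of M small m-by-n matrices
hatA = np.fft.fft(blocks, axis=0)
s_small = np.concatenate([np.linalg.svd(hatA[k], compute_uv=False) for k in range(M)])
s_full = np.linalg.svd(C, compute_uv=False)

print(np.allclose(np.sort(s_small)[::-1], s_full))   # True
```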

This should be much faster than forming the SVD of a system that is 180x180 (as opposed to 3 blocks of 60x60), unless I'm misunderstanding your problem (I bet I am).

Additionally, if the end goal is to perform a (de)convolution, I'm not sure that the SVD is ever being formed explicitly. For circulant matrices, convolution can be performed with 3 DFTs and a pointwise product (applying a diagonal matrix to a vector). Deconvolution is equally simple b/c inversion here is trivial. The SVD lets you do this, but you don't really need to form the entire SVD factorization, since the singular vectors come from the 'Fourier matrix' F (where F*x is the DFT of x) and the singular values come from F*v, where v is the first row of the circulant matrix.
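
E.g. something like this (toy kernel; with noisy data you'd still truncate or regularize the small DFT values rather than divide by them directly):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 64
h = np.exp(-np.arange(N) / 6.0)   # illustrative kernel = first column of the circulant
x = rng.standard_normal(N)

# Convolution: 2 forward DFTs, a pointwise product, 1 inverse DFT
y = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real

# Deconvolution: "inverting" the circulant matrix is just pointwise division
# by the DFT of the kernel (those DFT values are the spectrum of the matrix)
x_rec = np.fft.ifft(np.fft.fft(y) / np.fft.fft(h)).real

print(np.allclose(x_rec, x))      # True
```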

I didn't look into it too deeply but it seems like the algorithm in that paper can be used for this?

btw if you can't access that paper, let me know and I can email it to you or place it in a public web space.
 

Mark R

Diamond Member
Oct 9, 1999
Thx for that.

I'd suspected it was a computational complexity thing, but block matrices are a bit outside my experience, so I didn't really know how they worked or what the impact on computation would be.

I don't think you are misunderstanding the problem. The problem is simply whether you calculate the SVD of a 90x90 sparse matrix, or of a 3x3 block matrix of related 30x30 blocks. On my reading of the abstract, it seems likely that the latter is much easier.

I've not been able to read the paper, but I don't think that really matters that much - it looks like a very technical report, for an arcane computer system, so I don't know how much of it would really be of direct relevance to me. Unless you think it is particularly interesting, you don't need to go to any effort to upload a copy anywhere.
 

eLiu

Diamond Member
Jun 4, 2001

Sure.

Yeah I wasn't sure if I was properly understanding the situation b/c I'm not familiar w/the application. (Like if the 90x90 matrix were itself circulant, then the block approach may not be faster.) But in general, block-structure is preferable. Most standard operations with matrices scale like M^2 or M^3 (where the matrix is MxM).

Say the matrix is divided into B blocks of size m x m. For example, say M=100, m=10, and we have an M^2 algorithm: the cost is M^2 for the full matrix, or B*m^2 for the blocked version. If the matrix is just a bunch of random numbers, then B = (M/m)^2 = 100, and whether we use 100 10x10 blocks or 1 100x100 block doesn't change the asymptotic timing. (But it does have implications for real-world performance, which is where things like the optimized BLAS get the bulk of their speedup.)

But if the matrix were sparse or the blocks were related (so that B<100), then it's easy to see how the block structure makes things cheaper. And again, never overlook the potential advantages of 'good' working set sizes (which can be obtained by manipulating block sizes) for improving cache performance.
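
Back-of-the-envelope numbers for that example (the M^2 cost model and the sizes are the ones above; the count of 10 'related' blocks is purely hypothetical):

```python
M_full, m_block = 100, 10

n_blocks_total  = (M_full // m_block) ** 2     # 100 blocks tile the full matrix
n_blocks_unique = 10                           # hypothetical: only 10 distinct blocks

cost_full    = M_full ** 2                     # 10,000 cost "units"
cost_blocked = n_blocks_total * m_block ** 2   # 10,000: same asymptotics for random data
cost_related = n_blocks_unique * m_block ** 2  # 1,000: related/repeated blocks win

print(cost_full, cost_blocked, cost_related)
```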

Posting things online is easy enough. I've pm'd it to you in the interest of not listing my public webspace for all to explore. The paper is quite short at 5 pages. The first bits describe the algorithm used & discuss runtimes, which could be interesting to you. It may not be exactly what happens in your software, but my guess is that it's close. There's very little that's special to the Cray-2 (which you can think of as somewhere between an SSE-enabled CPU and a GPU). Well, at least not anymore; at the time it was pretty damn special. (I dunno how concerned you are with high performance computing, but it's often amusing to read old vs new 'supercomputer' papers--so much has changed.) The rest just shows some performance results (but no comparison to other algorithms).
 