Need help optimizing some code ( lots of help )

Lord Banshee · Jan 31, 2008

OK, I have some code running on a C6713 DSP, but I am writing it in C/C++, so I am sure any good C programmer can give me some hints.

I have a section of code that loops a lot. Inside the loop it accesses a large block of memory, adds values from two different memory locations, and averages the results:

so i have something like this:

loop a 0 to (big#)
_ sum = 0
_ loop b 0 to (big#)
_ _ c = (b + a)
_ _ sum = sum + abs(mem1(b) + mem(c))
_ end loop
_ avg(a) = sum / big#
end loop

Is there a faster way of doing the above pseudo code?
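For reference, the pseudo code above could be written in plain C roughly as follows. This is only a sketch under assumptions that are not in the post: the function name `avg_abs_sum` is made up, the samples are taken to be `int`, the inner loop is taken to accumulate into `sum`, and `mem` is assumed to hold at least `2 * n - 1` elements so that `mem[a + b]` stays in bounds.

```c
#include <stdlib.h>

/* Hypothetical baseline: for each offset a, average |mem1[b] + mem[a + b]|
 * over b = 0 .. n-1. Assumes n > 0 and that mem has at least 2*n - 1
 * elements. */
void avg_abs_sum(const int *mem1, const int *mem, float *avg, int n)
{
    for (int a = 0; a < n; a++) {
        long sum = 0;                       /* reset for every output value */
        for (int b = 0; b < n; b++) {
            sum += abs(mem1[b] + mem[a + b]);
        }
        avg[a] = (float)sum / (float)n;
    }
}
```

Having a plain version like this is useful as a correctness reference when trying the faster variants suggested in the replies.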

My problem is that I have the same code running in Matlab, and it takes Matlab on my laptop 10 seconds to find an answer, but it takes my DSP board 10 minutes. The only thing I can think of is that Matlab's vector abilities have shown me how amazing they are, and I wish I could run Matlab on the DSP....

I was also thinking it is so slow because all these memory accesses go off-chip to external SDRAM, as the arrays are too large to fit in the DSP's cache.

Any help would be nice. I am not a very good programmer, so if there are things I am missing, or that you think I am missing, it would be nice to hear them.

Markbnj · Jan 31, 2008

I haven't done any DSP programming in years, but I'm sure that cache issue is huge. Perhaps rather than running a double loop you should implement a scheme to fetch the number of rows that will fit in the cache, and process the data in those rows, while possibly pre-fetching the next set of rows.
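One way to read this suggestion is as loop tiling: work on a chunk of `mem1` and the matching window of `mem` for several outputs at once, so each chunk is reused while it is still in fast memory. The sketch below shows only the tiling, not the prefetching or any DMA setup. The name `avg_abs_sum_tiled`, the `int` sample type, the `TILE_A` value, and the `tile_b` parameter are all assumptions for illustration, and the tile sizes would need tuning against the real cache size.

```c
#include <stdlib.h>

#define TILE_A 8   /* outputs handled per pass; placeholder, tune for the target */

/* Hypothetical tiled version of avg[a] = mean over b of |mem1[b] + mem[a + b]|.
 * Assumes n > 0, tile_b > 0, and that mem has at least 2*n - 1 elements. */
void avg_abs_sum_tiled(const int *mem1, const int *mem, float *avg,
                       int n, int tile_b)
{
    for (int a0 = 0; a0 < n; a0 += TILE_A) {
        int a_end = (a0 + TILE_A < n) ? a0 + TILE_A : n;
        long sums[TILE_A] = {0};            /* one running sum per output in the tile */

        for (int b0 = 0; b0 < n; b0 += tile_b) {
            int b_end = (b0 + tile_b < n) ? b0 + tile_b : n;

            /* This chunk of mem1 and a small window of mem are reused by
             * every output in the tile before moving to the next chunk. */
            for (int a = a0; a < a_end; a++) {
                long s = sums[a - a0];
                for (int b = b0; b < b_end; b++) {
                    s += abs(mem1[b] + mem[a + b]);
                }
                sums[a - a0] = s;
            }
        }

        for (int a = a0; a < a_end; a++) {
            avg[a] = (float)sums[a - a0] / (float)n;
        }
    }
}
```

Each output still adds its terms in the same order as the plain double loop, so the results should match; only the memory access pattern changes.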

Crusty · Jan 31, 2008

sudo code? Someone's been using linux too much lately! 🙂

EagleKeeper · Jan 31, 2008

Reverse your loop

It is usually faster to test against zero than against some other number.

int iLast = big#
while (iLast--)
{
... Logic
}

Also attempt to keep your two counters in a register rather than in memory.

register int iLast;
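Put together, the two hints might look like this in C. It is a sketch only: `sum_abs_countdown` and its parameters are made-up names, and `register` is just a hint that the compiler is free to ignore, so whether this saves any cycles has to be measured on the actual target.

```c
#include <stdlib.h>

/* Hypothetical example of a count-down loop with register hints.
 * Assumes count >= 0. */
long sum_abs_countdown(const int *data, int count)
{
    register int remaining = count;   /* hint only; the compiler may ignore it */
    register long sum = 0;

    /* The loop test is a comparison against zero; remaining is
     * decremented after each test. */
    while (remaining--) {
        sum += abs(*data++);
    }
    return sum;
}
```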

Lord Banshee · Jan 31, 2008

Markbnj,
That might work if I can give myself a dedicated region of memory in the cache to work with; I do not want to be worrying about whether I overwrote anything in the cache. It also looks like I would need to learn how to use DMA 🙁

Crusty,
lol, fixed "pseudo"

Common Courtesy,
while loop: Interesting, I will see how many clock cycles that actually saves me.
register: Wow, I did not know that existed. I guess that's what I get for not taking a DSP class.

Thanks for all the great ideas. I actually found something I missed when I started my post: I had two loops inside the main loop, and I was able to reduce that to a single loop, just like what I wrote in the pseudo code.

If anyone has any suggestions for DSP forums, books, or whatever, that would be great too. I am using the TMS320C6713.

Thanks again

Cerebus451 · Jan 31, 2008

Not sure if you are doing this already, but since your memory accesses are sequential inside the loop, you don't need to calculate the memory offset each time, just bump it up inside the loop. For example:
ptr avgptr
ptr cptr
ptr bptr
avgptr = &avg(0)
loop a 0 to (big#)
_ cptr = &mem(a)
_ bptr = &mem1(0)
_ sum = 0
_ loop b 0 to (big#)
_ _ sum = sum + abs(*bptr + *cptr)
_ _ bptr++
_ _ cptr++
_ end loop
_ *avgptr = sum / big#
_ avgptr++
end loop

You could further enhance that by storing the initial cptr and bptr locations in other variables and using those (you would increment the cptr start location in the outer loop).
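In C, the pointer version with the start location kept in its own variable could look roughly like this. The function name `avg_abs_sum_ptr`, the `int` sample type, and the `cstart` variable are assumptions for illustration, and `mem` is again assumed to hold at least `2 * n - 1` elements.

```c
#include <stdlib.h>

/* Hypothetical pointer-increment version of
 * avg[a] = mean over b of |mem1[b] + mem[a + b]|. Assumes n > 0. */
void avg_abs_sum_ptr(const int *mem1, const int *mem, float *avg, int n)
{
    const int *cstart = mem;     /* start of the mem window for the current output */
    float *avgptr = avg;

    for (int a = 0; a < n; a++) {
        const int *bptr = mem1;      /* mem1 is always read from its start */
        const int *cptr = cstart++;  /* the window start moves by one per output */
        long sum = 0;

        for (int b = 0; b < n; b++) {
            sum += abs(*bptr++ + *cptr++);
        }
        *avgptr++ = (float)sum / (float)n;
    }
}
```

An optimizing compiler may well produce the same machine code for the indexed and pointer forms, so it is worth checking the generated instructions before assuming a gain.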

Lord Banshee · Jan 31, 2008

Cerebus451,
In this specific loop I am not doing it like that (I know I should be). I have only just started the optimizing stage; the first round was seeing whether the code idea would work. Thanks for the reminder, as I do use similar pointer techniques in my other functions.

*Edit*
Cerebus451,
I actually just made this change on a sample code set I have running in VisualC++.Net 2008, and using pointer increments instead of individual variable increments made no difference in instruction count: both came to 14 instructions for my sample code (viewing the disassembly while debugging). You would think the pointer method would be faster, but I guess in the end the pointer is just another variable holding a memory location, so you are incrementing an integer no matter what you do. I would assume the DSP compiler will give very similar results (I will find out tomorrow, as the DSP board is in the lab at school).

Common Courtesy,
In the VisualC++.Net 2008 sample code that one change is saving about 4 instructions, so that's a start 🙂. I would like to cut my original cycle count 10 fold. I should be at around 2.5 times smaller now with some other things I've changed, and there is still much more I hope to achieve.

Thanks, and if anyone has any other ideas that would be amazing.

smack Down · Feb 4, 2008

I'm guessing that on the DSP the abs function is really slow. It looks like you need to improve the algorithm if you want a 10 fold increase in speed; playing around with instructions isn't going to get you that in most cases. Find out what is slow in the system (memory, I/O, or compute), then find ways to use whatever is slowest in the fastest way.

EagleKeeper · Feb 4, 2008

Activate profiling to determine where the bottlenecks are.
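If the toolchain's own profiler is not available or not set up yet, a rough first measurement can be taken by hand with the standard C `clock()` function. This is a minimal sketch: `seconds_for` and `example_work` are made-up names, and `clock()` resolution varies by platform, so a vendor profiler that reports cycle counts would normally be the better tool.

```c
#include <time.h>

/* Hypothetical stand-in for the code under test. */
void example_work(void)
{
    volatile long sink = 0;      /* volatile so the loop is not optimized away */
    for (long i = 0; i < 100000; i++) {
        sink += i;
    }
}

/* Returns the processor time, in seconds, spent inside work(). */
double seconds_for(void (*work)(void))
{
    clock_t start = clock();
    work();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing the memory-heavy inner loop and the arithmetic separately in this way gives a first indication of which one dominates.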

911paramedic · Feb 4, 2008

Originally posted by: Crusty
sudo code? Someone's been using linux too much lately! 🙂

Too funny!

man sudo

Definition
Actually means pseudo, but like all our other commands we left out letters to make it more confusing for normal people!!0001!0001
