Assembly code question

polarmystery · Mar 27, 2012

Can someone explain to me what the purpose of the first 6 and the last 5 instructions do? I don't quite understand. (I don't understand what the bge function and blt functions are for.) Thanks in advance.

eLiu · Mar 27, 2012

I'm the opposite of an expert on assembly, but aren't bge & blt the equivalent of
if(condition)
goto label;

What did I miss? :/

Cogman · Mar 27, 2012

The first 4 instructions move the intermediate value of A, B, C, and N into r2->r5. The fifth line zeros out r10 and the sixth line (which is somewhat useless) makes sure that r5 is greater than r10 (which currently contains zero.

Line 13 adds 1 to r10, the is the condition (r10 is a counting variable). Line 14 -> 16 add 4 to the registers. This is done because these registers contain memory addresses and memory addresses point to byte locations. Ints are generally 4 bytes long (so to move forward, you have to go forward 4 places in memory).

gorydetails · Mar 27, 2012

Addi rx,ry
: Add indirect

Rx <= Rx + mem[Ry]

Accumulate value in memory pointe by Ry.

The Cpu manual.

Addi rx,ry,imm

Rx <= Rx + mem[Ry + imm]

EagleKeeper · Mar 27, 2012

bge - branch if greater than equal to
blt - branch if less than

exdeath · Mar 29, 2012

polarmystery said:
Can someone explain to me what the purpose of the first 6 and the last 5 instructions do? I don't quite understand. (I don't understand what the bge function and blt functions are for.) Thanks in advance.

The first 6 initialize array indices, a loop target value, and a current loop iteration counter.

The last 5 increment the array indices and current loop counter, compare the loop counter to the target loop count, and then repeat either repeat another iteration or stop.

The middle looks look some kind of compare and store, almost looks like a bubble sort.

The first bge checks to make sure we are good for at least one loop iteration before even starting the loop. The second bge is a sort condition, if it's met it skips over the first sw to the second one, but if it's not met then the first sw is used, and the b skips over the unused one. The branching compares which value is greater or less and then changes control of the program to branch or fall through to store the desired balue back into the array. The final blt is the loop control, eg: a for loop.

This is actually kind of dirty and could be cleaned up or even changed to conditional moves and reduced to a single fixed sw that is always executed. It would eliminate the branches and be much cleaner to read.

It looks something like this (using array notation instead of pointers for clarity). In reality they are pointers because they offsetting from a base address of zero, so A for example is the address of the first element of an array.

Code:

int * array;  // initialized elsewhere

void myfunction (int A, int B, int C, int N)
{
  int i;

  if ( N < 0 )
     return;
  
  for ( i=0; i < N; ++i )
  { 
     if ( array[A] >= array[B] )
       array[C] = array[A];
     else
       array[C] = array[B];
     
     A++;
     B++;
     C++;
  }   
}

exdeath · Mar 29, 2012

Clarified after actually thinking about it, ignore some/most of my previous post (I'll leave it there to show analysis progress).

It's an array MAX function. It compares two arrays A and B, compares each element, and stores the bigger one in array C in the same position.

The way A, B, and C are incremented and kept separate give it away, it can't be the same array.

I picked random numbers to fill A, and B, it's assumed they are initialized outside the scope of your example snippet.

So the real high level code looks like this (again using array notation instead of pointer notation):

Code:

int A[10] = { 1, 4, 6, 3, 1, 6, 4, 3, 9, 5 }; 
int B[10] = { 7, 3, 9, 0, 2, 7, 8, 1, 3, 8 };  
int C[10];
int N = 10;
int i;

if ( N > 0 )
{
  for ( i=0; i < N; ++i )
  { 
     if ( A[i] >= B[i] )
       C[i] = A[i];
     else
       C[i] = B[i];
  }
}

After execution, C[10] = { 7, 4, 9, 3, 2, 7, 8, 3, 9, 8 }

In the assembly what you are seeing is in the last 5 instructions is copyofA += i*sizeof(int) to get the increment by 4 for A, B, and C.

A pointer version which is closer to the assembly code, and using a function call because the parameter passing facilitates automatic creation of temp copies of A, B, and C which we can increment (what you see in the first 3 lines of your assembly):

Code:

void main()
{
  int a[10] = { 1, 4, 6, 3, 1, 6, 4, 3, 9, 5 }; 
  int b[10] = { 7, 3, 9, 0, 2, 7, 8, 1, 3, 8 };  
  int c[10];
  
  maximize(a, b, c, 10);

}

inline void maximize( int * A, int * B, int * C, int N)
{
  int i;
 
  if ( N > 0 )
  {
    for ( i=0; i < N; ++i )
    { 
       if ( *A >= *B )
         *C = *A;
       else
         *C = *B;

       A++;
       B++;  
       C++;
    }
  }
}

Disclaimer: not tested or debugged

exdeath · Mar 29, 2012

Line 6 "bge" omitted because it's an automatically generated artifact of the assembly code incrementing the loop counter and checking the loop condition at the end of the loop rather than the start, so there needs to be something at the start to prevent from running the first iteration of the loop if it's not appropriate. Without line 6, if N is zero, it will have run one iteration of the loop before ending on line 17 and computed on possibly zero size arrays causing a possible fault or memory corruption.

Code:

void main()
{
  int a[10] = { 1, 4, 6, 3, 1, 6, 4, 3, 9, 5 }; 
  int b[10] = { 7, 3, 9, 0, 2, 7, 8, 1, 3, 8 };  
  int c[10];
  
  maximize(a, b, c, 10);

}

inline void maximize( int * A, int * B, int * C, int N)
{
  int i;
 
  for ( i=0; i < N; ++i )
  { 
     if ( *A >= *B )
       *C = *A;
     else
       *C = *B;
    
     A++;
     B++;  
     C++;
  }
}

exdeath · Mar 29, 2012

Just for fun, something like this would look nicer and run much better:

Code:

main:
	add   r10, r0, r0
	addi  r2,  r0, A
	addi  r3,  r0, B
	addi  r4,  r0, C
	addi  r5,  r0, N
loop:
	bge   r10, r5, end
	lw    r20, 0(r2)
	addi  r10, r10, 1
	lw    r21, 0(r3)
	addi  r2,  r2,  4
	cmp   r20, r21
	movge r22, r20
	cmp   r20, r21
	movlt r22, r21
	sw    r22, 0(r4) 
	addi  r3,  r3,  4
	b     loop
	addi  r4,  r4,  4        ; delay slot on unconditional branch ???
end:

Only one conditional branch, with proper delay slot usage of the branches and loads/stores, should be zero stalls.

Will read A[N] though 1 element passed the array when the loop ends but its a dummy read.

My cmp/movge/movlt probably aren't proper MIPS, but you get the idea.

gorydetails · Mar 29, 2012

i recommend to use pmaxsd (packed max of sign dword) or similar sse4, instead of such scalar code

exdeath · Mar 29, 2012

Yes SIMD is great, but you have to either prep the code to pad and align the scalars at the start and end, or put restrictions on the data set to be 128 aligned and a multiple of QWORDs (ideal), further beyond the scope of this exercise than I already went.

Search

Assembly code question

polarmystery

Diamond Member

eLiu

Diamond Member

Cogman

Lifer

gorydetails

Member

EagleKeeper

Discussion Club Moderator<br>Elite Member

exdeath

Lifer

exdeath

Lifer

exdeath

Lifer

exdeath

Lifer

gorydetails

Member

exdeath

Lifer

TRENDING THREADS