/*--------------------------------------------------------------------------- histo.asm: Histogram Subroutine ----------------------------------------------------------------------------- Description: An image histogram is an array summarizing the number of occurrences of each grey level in an image. If a pixel in a typical monochrome image can take on any of 256 distinct values, the histogram would need 256 bins. In each bin the number of ocurrences of this grey level is stored. The algorithm used here assumes the monochrome image is stored in a buffer in Data Memory, and the histogram will also be formed in Data Memory. The tasks to be performed on each pixel are: 1. Get next pixel from DM 2. Add its value to the base address of the histogram array 3. Use this value as a pointer to read the current number stored in the corresponding bin in the histogram array. 4. Increment that bin value 5. Write the bin value back. ----------------------------------------------------------------------------- Notes: The operations described above take five instructions and six cycles on the 21020: 1 r0=dm(i0,m0); 2 m8=r0; 3 [ wait state *] 4 r1=pm(m8,i8); 5 r1=r1+1; 6 pm(i8,m8)=r1; * = due to m8 load following by immediate use To reduce the cycle time, we try to interleave another pixel operation with these instructions: 1 r0=dm(i0,m0); 2 m8=r0, r2=dm(i0,m0); 3 [ wait state ] m9=r2; 4 r1=pm(m8,i8); [ wait state ] 5 r1=r1+1, r3=pm(m9,i8); 6 pm(m8,i8)=r1, r3=r3+1; 7 [r0=dm(i0,m0),] pm(m9,i8)=r3; <-- including next loop start But, unfortunately, instruction 2 and 7 are not legal 21020 instructions. Instruction 7 is the biggest problem - post- and pre-modify mixed read-writes cannot by combined in a multifunction instruction. Another tack is to use the ALU to calculate the indirect adress, avoiding pre-modify indirect addressing: 1 r0=dm(i0,m0); 2 r0=r0+r15; [assuming r15=b8=start of histogram] 3 i8=r0; 4 [ wait state ] 5 r1=pm(i8,m8); [m8 must = 0] 6 r1=r1+1; 7 pm(i8,m8)=r1; ALthough this is one cycle longer than the previous approach, we can this roll correctly: r0=dm(i0,m0); 1 r0=r0+r15, r2=dm(i0,m0); [ loop start ] 2 i8=r0, r2=r2+r15 3 [ wait state ] i9=r2; 4 r1=pm(i8,m8); [ wait state ] 5 r1=r1+1, r3=pm(i9,m8); 6 pm(i8,m8)=r1, r3=r3+1; 7 r0=dm(i0,m0), pm(i9,m8)=r3; [ loop end ] But there's still a problem - if neighboring pixels have the same value, this loop will not calculate the correct value. One sum is lost because the second pixel reads the same bin value as the first pixel, before the first has a chance to increment it. We could try delaying the read of the second bin value until the first one has been written: r0=dm(i0,m0); r0=r0+r15; i8=r0; 1 [ wait state ] r2=dm(i0,m0); 2 r1=pm(i8,m8); r2=r2+r15; 3 r1=r1+1; i9=r2; 4 pm(i8,m8)=r1; [ wait state ] 5 r0=dm(i0,m0); r3=pm(i9,m8); 6 r0=r0+r15; r3=r3+1; 7 i8=r0; pm(i9,m8)=r3; But instruction 6 has a conflict - we cannot perform two additions in one instruction. We now must either add an instruction to this loop, or maintain two separate histograms which will be combined at the end. For normal operations, where there is much more data than bins, this is the preferable method. FOr example, for a 512x512 pixel image with 256 grey levels (bins), 917,000 cycles are spent histogramming the data, and 512 cycles, or 0.05%, are spent on combining the bins. The final, rolled loop, using two seperate histograms is contained in the subroutine below. ----------------------------------------------------------------------------- Program Characteristics: Calling Registers: b0,i0 = data buffer start l0 = data buffer length m0 = 1 b1,i1 = final histogram bin buffer start l1 = final histogram bin buffer length b8,i8 = temporary bin buffer #1 start l8 = temporary bin buffer #1 length m8 = 1 b9,i9 = temporary bin buffer #2 start l9 = temporary bin buffer #2 length m15 = 0 Note: l1 = l8 = l9 = N = # of bins. N must be even, positive, and >= 4 Registers Used: r0 input data 1 r1 bin value 1 r2 input data 2 r3 bin value 2, tmp register r14 temp bin buffer #1 start r15 temp bin buffer #2 start Return Values: histogram output bins pointed to by i1 Computation Time: 3.5N + 2B + 12 cycles where N = number of data values B = number of bins ----------------------------------------------------------------------------- Author: Jim Donahue, Analog Devices DSP Division Revised: 27-MAR-91 ----------------------------------------------------------------------------*/ .SEGMENT /pm pm_code; .GLOBAL histo; histo: r3 = l1; { r3 = N = # of bins } r3 = lshift r3 by -1, r14=i8; { r3 = N/2, initialize r14 } r3 = r3 - 1, r15=i9; { r3 = N/2-1, initialize r15 } {------------------------------------------------- | Do the histogram into 2 temporary bins in PM +------------------------------------------------} r0=dm(i0,m0); lcntr = r3, do hloop until lce; r0=r0+r14, r2=dm(i0,m0); r2=r2+r15, i8=r0; i9=r2; r1=pm(i8,m15); r1=r1+1, r3=pm(i9,m15); r3=r3+1, pm(i8,m15)=r1; hloop: r0=dm(i0,m0), pm(i9,m15)=r3; r0=r0+r14, r2=dm(i0,m0); r2=r2+r15, i8=r0; i9=r2; r1=pm(i8,m15); r1=r1+1, r3=pm(i9,m15); r3=r3+1, pm(i8,m15)=r1; pm(i9,m15)=r3; {------------------------------------------------- | Now combine the bins back into DM +------------------------------------------------} i8=b8; i9=b9; r2=l1; r2=r2-1, r0=pm(i8,m8); r1=pm(i9,m8); lcntr=r2, do combine until lce; r2=r0+r1, r0=pm(i8,m8); combine: dm(i1,m0)=r2, r1=pm(i9,m8); rts (db); r2=r0+r1; dm(i1,m0)=r2; .ENDSEG;