An Assembly Language Programming Aid

(C) 2003 Hank Wallace

I know a guy who writes all his programs in assembly language. “C! Ack!”, he says, “I can do anything in assembly that you can do in C, and it’s faster”, and he proceeds to flap through his latest ten-thousand liner. Most programmers are at the opposite extreme, gritting their teeth at the thought of a large assembly language program. They would rather have a faithful compiler hassle out all the details while they worry about more interesting problems. I don’t blame them.

However, for the bold and fainthearted alike, there is a tool presented here for writing assembly language programs. It deals simply with the nuts and bolts of how we write assembler code; no AI, CASE, 4GL, fuzzy logic, SDI, or “hot new country,” just common sense. You won’t find it mentioned in the current UML literature frenzy. It is much more fundamental and practical than that, concerning personal programming discipline. After all, if you give a sloppy, undisciplined programmer great tools, he will still produce sloppy programs, only faster. What we shall examine is a way to apply the benefits of high level languages to assembly language programs, since most assembly programmers start with high level language programming anyway.

When was the last time you worked on someone else’s assembly program? Were you able to work comfortably with the code and attending documentation, or did it leave you on the verge of a slanderous rampage? The complaints I have voiced most often (and loudly in an empty lab) are these:

This program does not have enough high-level documentation — it’s all details.
This program has wasteful documentation of its simplest parts.
This program is packed with spaghetti code and interlocking loops.
There is not enough information here to even assemble the original source (makefiles, etc.), much less make changes.
The comments are written in a grammatically incorrect format, eliminating important words, such as verbs and articles, for brevity.
This programmer wasted his time on long-winded comments that are as extensive as the source code. Why shouldn’t I just skip the comments and decipher the actual code?
This flowchart or state chart is out of date with reference to the current source.
(Add your favorite gripe here.)

What is the cure for this disease? More expensive tools? A faster PC? Code generators? No, just simple programming discipline, possibly augmented with the Golden Rule: Program unto others as you would have them program unto you.

Let’s look at two characteristics of a high level language which are important in the construction of assembly language programs. The C language is a suitable candidate for our discussion since many use it. First, programs written in C have a fundamental structure. The basic element of the program is the statement. Statements are gathered into groups called blocks, delimited by curly braces. The execution of each block of statements is controlled by IF-THEN-ELSE, DO-WHILE, WHILE, FOR, and SWITCH constructs depending on the function of the program. Most procedural languages have all of these constructs. The primary point is that they allow the control and collection of statements into groups which themselves may be so grouped, permitting the orderly expression of hierarchical algorithms.

The second important characteristic of C and most high level languages is that the programmer usually does not have to know anything about the host computer’s assembly language to implement an algorithm. This means that the programs are somewhat machine independent. You guessed it — we are going to make assembly language programs more portable.

Keep It Simple

We have been discussing ways to more easily write assembly language programs. Our running commentary for this installment will be provided by Bob and Jane, two engineers leaning against the counter in front of the cappuccino machine in the break room. They were previously leaning against the foosball table, but it disappeared in the dot-com meltdown.

The technique involves writing an assembly program using two easy activities. It is first written in a structured, machine-independent high level language, and then hand compiled into assembly language for the target processor. The high level language text becomes the commentary for the assembly program. Your brain takes the place of the computer-based compiler.

[Bob – “Holy mother of mnemonics! Is he telling me to be a human compiler?”
Jane – “Perhaps it would improve your personality, cappuccino breath.”]

This discipline (oh! that word again) has a number of advantages. The main one is that the algorithm can be conceptually separated from its implementation on a particular microprocessor. In other words, since the algorithm is completed in a high level language version before assembly coding starts, the programmer designing the algorithm need not be too concerned about the details of the microprocessor, details which can be very distracting. He may concentrate on the details of the algorithm. This is the familiar divide-and-conquer approach with the division taking place vertically, if you will, between the operand and comment fields on the assembly listing.

[Bob – “But the program IS the algorithm.”
Jane – “I had to modify one of your program/algorithms last week. It took me 10 hours to do a simple change. I have not seen so much codependence since my last marriage. You were at a customer’s site hashing out some other bugs. Listen up, Bob.”]

Assembly                        High Level Language

L345:   cmp             ;       if (x == 2)
        jnz             ;
        add             ;         y=(x+2)*3;
        mul             ;
        mov             ;
        jmp             ;
L056:                   ;       else
        shl             ;         y<<=4;
        mov             ;

Assembly side: Algorithm coded by programmer Y. Algorithm details not distracting. Test using same routines as for HLL implementation.

High level language side: Algorithm designed by programmer X. Algorithm designed and tested before assembly coding starts. Test using HLL compiler.

Figure 1. Divide-and-conquer assembly language programming.
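The right-hand column of Figure 1 can be compiled and exercised on its own before any mnemonics are written. Here is a minimal sketch in C; the function name figure1_step and the unsigned types are assumptions added for illustration, since the figure shows only the bare statements.

```c
/* High level side of Figure 1, wrapped in a function so it can be
   compiled and tested before assembly coding starts.
   The name figure1_step and the unsigned types are illustrative
   assumptions, not part of the original figure. */
unsigned int figure1_step(unsigned int x, unsigned int y)
{
  if (x == 2)
    y = (x + 2) * 3;
  else
    y <<= 4;
  return y;
}
```

Calling figure1_step(2, 7) takes the IF branch and figure1_step(3, 1) takes the ELSE branch, so two calls are enough to touch both paths.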

Now sometimes the details of the implementation depend heavily on the processor hardware, as with DSP algorithms. However, much of a DSP program is still shoveling of data and housekeeping, and that code can benefit from this technique.

Another advantage is the improvement of program documentation due to the built in structure. Sometimes I have wished for a good bit of pseudocode to give me a bird’s eye view of how someone else’s assembly program works, rather than sighting down a hundred pages of the twin monoliths of code-tab-comments. This method provides that extra information naturally and displaces the flat comment-per-line documentation technique. In fact, the result is usually fewer but more meaningful program comments. Gone are pearls like, “add 5 to accumulator” on an add-immediate instruction, or, “is accumulator zero?” on a conditional branch instruction. Such prattle is hereafter reserved for high school projects.

[Bob – “My programs don’t need improved documentation! I know exactly what they are supposed to do. Most of the time. Anyway, every instruction is explained in the comment field.”
Jane – “If your programs don’t need improved documentation, then I don’t need an antidepressant. You spend 75% of your time restating what each instruction clearly does. Sheesh!”]

Many programmers use flowcharts faithfully when writing programs, maintaining them as the development of a program progresses. Though this is a good method, the technique presented here is better for complex programs, I believe, because the documentation resides with the assembly code and is in structured form. Large flowcharts usually have mazes of interconnections which somewhat preclude block structure and encourage spaghetti code programming. However, augmentation of this technique with a flowchart improves documentation as far as a general functional overview is concerned. If you are using a more advanced documentation or development system with hand coded assembly programs, this technique is still beneficial.

[Bob – “I prefer my coffee stained napkin documentation system. The lack of writing area leads to simpler designs.”
Jane – “Don’t forget your penchant for post-it notes. I found one on your source code CD the other day with some design details. Not to worry — they were out of date.”]

The next installment will detail a simple series of steps to implement this method.

[Bob – “If it’s so simple, why haven’t I heard of it before?”
Jane – “Because it involves discipline.”]

Show Me!

Let’s look now at details of how to write an assembly language program without all the spaghetti code and pain relievers necessary in the past. The technique is presented as a series of steps.

Step 1.

Fully define the algorithm or task that you wish the computer to perform. This sounds obvious, but many programmers start typing mnemonics before they have finished defining the problem and its proposed solution, leading to spaghetti code and Bugs From Hell for other programmers to battle. Flowcharts, state transition diagrams, data flow diagrams, data structure maps, etc., are the net results of this step.

Step 2.

Choose a programming language in which you will express your algorithm. I suggest that it have all the constructs cited above for C. It is best to choose the structured language with which you are most familiar, although pseudocode (structured English) can be used, if you are consistent.

Step 3.

Express your algorithm in the high level language or pseudocode. To make the task easier and the program more readable, don’t use any structures that you wouldn’t want to code in assembly. For example, I avoid multidimensional arrays and such beasts as structures of arrays of structures, etc. It is also best to express a complex mathematical operation as a series of simpler ones for clarity, with an eye toward using registers as temporary variables. Indentation of program blocks is a key aid to program readability. And comment your high level language code; it sounds ridiculous, but hard disk storage space on your PC is cheap and your time is not. Note that if you are one of those high level language programmers who uses more gotos than if statements, you are not going to get all the benefits of this method. Learn how to write a goto-less program first. You must avoid interlocking loops, jumping into and out of loops, jumping into subroutines to take advantage of a needed sequence of instructions, setting flags so you can jump into a section of code and know where to return, and other such nonsense.
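As a rough illustration of breaking a complex mathematical operation into simpler ones, the sketch below writes the same computation twice. The expression itself, the function names, and the temporaries t1 and t2 are all made up for this example; the point is only that each line of the stepwise form corresponds to one or two instructions on most processors.

```c
/* One-line form: compact, but awkward to hand compile.
   The expression is a hypothetical example. */
unsigned int scale_oneline(unsigned int a, unsigned int b, unsigned int c)
{
  return ((a + b) * 3 + (c >> 2)) & 0xFF;
}

/* Same computation as a series of simple steps. The temporaries
   t1 and t2 stand in for CPU registers. */
unsigned int scale_stepwise(unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int t1; /* temporary, candidate for a register */
  unsigned int t2; /* temporary, candidate for a register */

  t1 = a + b;      /* add       */
  t1 = t1 * 3;     /* multiply  */
  t2 = c >> 2;     /* shift     */
  t1 = t1 + t2;    /* add       */
  t1 = t1 & 0xFF;  /* mask      */
  return t1;
}
```

Compiling both forms and comparing their results on sample data is a cheap check that the stepwise version did not change the algorithm.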

Step 4.

Once you have checked over your high level language implementation and tested key algorithms by compiling and running them, you are ready to start assembly coding. Extract identifier names and their memory requirements from your algorithm and allocate space for variables in assembler. It is important that all major variables and data structures be defined and organized before you go to the next step.

Step 5.

Next, hand compile your high level language program into the assembly language for your target machine. It sounds complicated but is actually very easy. You don’t have to know how the algorithm works to compile the program. The algorithm is contained in the high level language version and need not be disturbed during the coding process. In fact, I have had good results assembly coding programs that another programmer pseudocoded.

Oh! I almost forgot Bob and Jane…

[Bob – “That’s stupid! Write the program twice? Define the problem before coding? Hand compile? What drug is this guy taking, anyway?”
Jane – “I don’t know, but it’s certainly not Java!”]

Now that we have some concrete steps to follow, let’s look at how we can generate an assembly program by hand using those steps.

Start the compilation process by converting your pseudocode to assembly comments, typically by adding a few tabs and a comment character to the start of each line. Then place labels on all branch destinations (see below for details). The form Lnnn is convenient, where nnn is a three digit number. Adjust the label style to suit your needs and tastes. Descriptive labels are used only for subroutine and variable names. They are not needed in general because you can follow the indentation of the pseudocode to find branch destinations very quickly. This also eliminates the confusion of cryptic labels, meaningless to any but the original programmer, and then only for about ten minutes.

Next, add the assembly code. Each of the basic constructs mentioned above has a simple rule for compilation. Since you are using only a few basic constructs, much of your code can be copied from place to place as you work, reduced to assembler macros, or coded by editor macros. The following examples use C as the high level language. The mnemonics are those of the x86 family.

IF-THEN-ELSE

                        

                cmp     x,2     ;if (x == 2)
                jnz     L001
                                ;  {
                                ;    /* statement block */
                                ;  }
                jmp     L002
        L001:                   ;else
                                ;  {
                                ;    /* statement block */
                                ;  }
        L002:

IF-THEN


                cmp     x,2     ;if (x == 2)
                jnz     L001
                                ;  {
                                ;    /* statement block */
                                ;  }
        L001:

Listing 1. The IF-THEN-ELSE and IF-THEN constructs.
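One way to convince yourself that the branch pattern in Listing 1 matches the structured form is to model it in C. In the sketch below, goto stands in for the jump instructions only; it is not a recommendation to write C this way. The function names and the placeholder statement blocks (y = 10 and y = 20) are assumptions for illustration.

```c
/* Structured form, as it would appear in the comment field. */
int if_else_structured(int x)
{
  int y;
  if (x == 2)
    y = 10;   /* stand-in for the first statement block */
  else
    y = 20;   /* stand-in for the second statement block */
  return y;
}

/* Branch-level form mirroring the labels of Listing 1. */
int if_else_branches(int x)
{
  int y;
  if (x != 2) goto L001;  /* cmp x,2 / jnz L001 */
  y = 10;                 /* first statement block */
  goto L002;              /* jmp L002 */
L001:
  y = 20;                 /* second statement block */
L002:
  return y;
}
```

Both functions should return the same value for any x, which is the property the hand compiler is relying on.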

WHILE

See Listing 2. The WHILE looping construct tests the continuation condition at the start of the loop. The loop can therefore execute zero or more times. A branch instruction at the end of the enclosed block of code always transfers control back to the top of the loop for testing of the continuation condition. The compound conditional tests of x and y in Listing 2 are coded separately for readability.


        L001:   cmp     x,2     ;while ((x == 2) &&
                jnz     L002
                cmp     y,3     ;      (y == 3))
                jnz     L002
                                ;  {
                                ;    /* statement block */
                jmp     L001    ;  }
        L002:

Listing 2. The WHILE construct.
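The same modeling trick works for the compound test in Listing 2. This is only a sketch: the pass counter and the rule that ends the loop after four passes are invented so that the example terminates, and goto again represents the jump instructions rather than good C style.

```c
/* Structured form of the WHILE loop. */
int while_structured(int x, int y)
{
  int passes = 0;
  while ((x == 2) && (y == 3))
    {
      passes++;
      if (passes == 4)
        y = 0;             /* hypothetical exit rule for the demo */
    }
  return passes;
}

/* Branch-level form: each half of the compound condition gets its
   own compare and jump, as in Listing 2. */
int while_branches(int x, int y)
{
  int passes = 0;
L001:
  if (x != 2) goto L002;      /* cmp x,2 / jnz L002 */
  if (y != 3) goto L002;      /* cmp y,3 / jnz L002 */
  passes++;
  if (passes != 4) goto L003;
  y = 0;
L003:
  goto L001;                  /* jmp L001 */
L002:
  return passes;
}
```

Because the test sits at the top, a call such as while_branches(1, 3) falls straight through with zero passes.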

DO-WHILE

See Listing 3. The DO-WHILE looping construct tests the continuation condition at the end of the loop. The loop executes at least one time. A branch instruction at the end of the enclosed block of code transfers control back to the top of the loop if the continuation condition is true.

Many micros implement special instructions that make the DO-WHILE more efficient, like the PIC’s decfsz, and the 8051’s djnz.


        L001:                   ;do
                                ;  {
                                ;    /* statement block */
                                ;  }
                cmp     x,2     ;while ((x == 2) &&
                jnz     L002
                cmp     y,3     ;      (y == 3));
                jz      L001
        L002:

Listing 3. The DO-WHILE construct.
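The decrement-and-branch instructions mentioned above fit one particular DO-WHILE shape: a countdown that tests the counter at the bottom of the loop. A minimal sketch in C follows; the function name and the 8-bit counter type are assumptions chosen to resemble a small micro.

```c
/* Adds n + (n-1) + ... + 1 using a bottom-tested countdown loop.
   The loop test is the operation a decrement-and-branch
   instruction performs in one step.
   Note: the body always runs at least once, so n = 0 wraps to 255
   after the first decrement and the loop runs 256 passes. */
unsigned int sum_countdown(unsigned char n)
{
  unsigned int total = 0;
  do
    {
      total += n;
    }
  while (--n != 0);   /* decrement, loop again if not zero */
  return total;
}
```

The wraparound for n = 0 is worth stating in the routine's header comment, since the bottom test cannot skip the first pass.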

FOR

See Listing 4. The FOR construct is a special case of the WHILE construct where the loop continuation condition is derived from the value of a variable which is modified upon each pass through the loop. You can implement the more complex FOR conditions that C allows by changing the continuation test and INC X instruction accordingly. Also, some processors have special decrement-jump instructions which can trim down the code size some, as noted above.


                mov     x,1     ;for (x=1; x!=10; x++)
        L001:   cmp     x,10
                jz      L002
                                ;  {
                                ;    /* statement block */
                inc     x       ;  }
                jmp     L001
        L002:

Listing 4. The FOR construct.
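To see why FOR can be treated as a special case of WHILE, here is the loop of Listing 4 written both ways in C. The summing body and the function names are assumptions added so the result can be checked.

```c
/* FOR form, matching the comment field of Listing 4. */
int sum_for(void)
{
  int x;
  int total = 0;
  for (x = 1; x != 10; x++)
    {
      total += x;
    }
  return total;
}

/* Equivalent WHILE form: initialize before the loop, test at the
   top, modify the variable at the bottom of the block. */
int sum_while(void)
{
  int x;
  int total = 0;
  x = 1;                /* mov x,1          */
  while (x != 10)       /* cmp x,10 / jz    */
    {
      total += x;
      x++;              /* inc x            */
    }
  return total;
}
```

The three parts of the for statement land in the same places as the mov, cmp, and inc instructions in the listing.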

SWITCH

See Listing 5. The SWITCH construct allows the selective execution of one of a number of statement blocks depending on the run-time value of a single variable, called the control variable. The switch statement consists of a sequence of comparisons of the control variable against constant values. Associated with each constant is a block of code to be executed when the control variable equals the constant. The comparisons occur in written sequence until a match is found, at which time the associated block of code is executed. (Some C compilers search the cases bottom-up or use jump tables, but I like top-down searching for ease of reading.) If no match is found, a DEFAULT block of code is executed. Placing the most-likely executed cases at the top of the construct will speed the average execution time. And by all means, use jump tables if you need the speed.


                mov     ax,x    ;switch (x)
                                ;  {
                cmp     ax,1    ;    case 1:
                jnz     L001
                                ;      /* statement block */
                jmp     L005    ;      break;
        L001:   cmp     ax,3    ;    case 3:
                jnz     L002
                                ;      /* statement block */
                jmp     L005    ;      break;
        L002:   cmp     ax,4    ;    case 4:
                jnz     L003
                                ;      /* statement block */
                jmp     L005    ;      break;
        L003:   cmp     ax,8    ;    case 8:
                jnz     L004
                                ;      /* statement block */
                jmp     L005    ;      break;
        L004:                   ;    default:
                                ;      /* statement block */
                                ;      break;
        L005:                   ;  }

Listing 5. The SWITCH construct.
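For comparison with the top-down compare chain, a jump table can be sketched in C as an array of function pointers indexed by the control variable. Everything here is hypothetical: the handler names, their return values, and the table size of nine entries (enough to cover case values 1, 3, 4, and 8 from Listing 5).

```c
/* Hypothetical handlers standing in for the statement blocks. */
static int case1(void)        { return 100; }
static int case3(void)        { return 300; }
static int case4(void)        { return 400; }
static int case8(void)        { return 800; }
static int case_default(void) { return -1;  }

/* Compare-chain form, searched in written order. */
int dispatch_compare(unsigned int x)
{
  switch (x)
    {
    case 1:  return case1();
    case 3:  return case3();
    case 4:  return case4();
    case 8:  return case8();
    default: return case_default();
    }
}

/* Jump-table form: one bounds check, then an indexed call.
   Unused slots point at the default handler. */
int dispatch_table(unsigned int x)
{
  static int (*const table[9])(void) = {
    case_default, case1, case_default, case3, case4,
    case_default, case_default, case_default, case8
  };
  if (x > 8)
    return case_default();
  return table[x]();
}
```

The table trades memory for constant dispatch time, so it tends to pay off when the case values are dense; with sparse values like these, most of the table is default entries.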

An Example

Let’s look at an example. Say, we need a routine to sort a buffer of 256 bytes in ascending order and sort time is not important. We are given the address of the buffer and permission to use any CPU registers. First, the buffer declaration looks like this:


buffer    ds   256

It is assumed to exist prior to the entry of our routine and contain 256 bytes of unsorted data.

Since the data design has been done, we proceed to the rest of the problem, the algorithm. A simple double loop byte swapping scheme will do nicely. Step 1, algorithm design, is complete.

I have chosen C for the high level language and the coded algorithm follows. Step 2, choosing a programming language, is complete.

See Listing 6. It took just a few minutes to scratch that out and type it in. I compiled this program with a C compiler and tested it with sample data to decrease debug time. Step 3, coding the algorithm, is complete.


void sort(void)
/* This function sorts a list of 256 
   bytes in ascending order. The 
   buffer[] is assumed to be filled 
   with data. A double loop swap sort 
   is used. Register values are 
   undefined at entry and exit. */ 
{
  int
    i, /* buffer index */
    j; /* buffer index */
  char
    temp; /* used in byte swapping */

  /* double loop compares each pair 
     of bytes */
  for (i=0; i<255; i++)
    {
      for (j=i+1; j<256; j++)
        {
          /* compare pairs of bytes */
          if (buffer[i]>buffer[j])
            {
              /* swap the bytes */
              temp=buffer[i]; 
              buffer[i]=buffer[j];
              buffer[j]=temp;
            }
        }
    }
} /* end of sort() */

Listing 6. High level coding of sorting algorithm.
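The article does not show the test that was run in Step 3, so the following is only a sketch of what such a check might look like. It assumes buffer is an array of unsigned char (which matches the unsigned comparison used later in the assembly version); the fill pattern and the helper names fill_buffer, buffer_is_sorted, and sort_selftest are inventions for this example.

```c
unsigned char buffer[256];   /* assumed unsigned bytes */

/* The algorithm of Listing 6. */
void sort(void)
{
  int i, j;
  unsigned char temp;

  for (i = 0; i < 255; i++)
    {
      for (j = i + 1; j < 256; j++)
        {
          if (buffer[i] > buffer[j])
            {
              temp = buffer[i];
              buffer[i] = buffer[j];
              buffer[j] = temp;
            }
        }
    }
}

/* Fill the buffer with a repeatable scrambled pattern. Because 37
   is odd, i*37+11 modulo 256 visits every byte value exactly once. */
void fill_buffer(void)
{
  int i;
  for (i = 0; i < 256; i++)
    buffer[i] = (unsigned char)(i * 37 + 11);
}

/* Returns 1 if the buffer is in ascending order, 0 otherwise. */
int buffer_is_sorted(void)
{
  int i;
  for (i = 0; i < 255; i++)
    if (buffer[i] > buffer[i + 1])
      return 0;
  return 1;
}

/* Returns 1 if sort() turns unsorted test data into sorted data. */
int sort_selftest(void)
{
  fill_buffer();
  if (buffer_is_sorted())
    return 0;              /* test data should start out unsorted */
  sort();
  return buffer_is_sorted();
}
```

The same fill and check routines can be reused against the assembly version later, which is the sense of "test using same routines" in Figure 1.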

Examining the code regarding variable space needs, I see three variables (i, j, temp), which I can easily fit into CPU registers. This is a convenient situation because no memory space is needed for these local variables. Step 4, data space allocation, is complete.

The hand compiled version of the routine is shown in Listing 7.


;void sort(void)
;/* This function sorts a list of 256 
;   bytes in ascending order. The 
;   buffer[] is assumed to be filled 
;   with data. A double loop swap sort 
;   is used. Register values are 
;   undefined at entry and exit. */ 
;{
;  int
;    i, /* buffer index, register SI */
;    j; /* buffer index, register DI */
;  char
;    temp; /* used in byte swapping, register AH */

        .model small
        .data
        extrn   buffer:byte
        .code
sort:   mov     ax,seg buffer
        mov     ds,ax
        mov     bx,offset buffer
                                        ; /* double loop compares each 
                                        ;    pair of bytes */
        mov     si,0                    ;  for (i=0; i<255; i++)
L001:   cmp     si,255
        jnb     L006
                                        ;    {
        mov     di,si                   ;      for (j=i+1; j<256; j++) 
        inc     di 
L005:   cmp     di,256 
        jnb     L002 
                                        ;        { 
                                        ;          /* compare pairs of bytes */ 
        mov     al,[si+bx]              ;          if (buffer[i]>buffer[j]) 
        cmp     al,[di+bx]
        jbe     L004
                                        ;            {
                                        ;              /* swap the bytes */
        mov     ah,al                   ;              temp=buffer[i]; 
        mov     al,[di+bx]              ;              buffer[i]=buffer[j];
        mov     [si+bx],al
        mov     [di+bx],ah              ;              buffer[j]=temp;
                                        ;            }
L004:
        inc     di                      ;        }
        jmp     L005
L002:
        inc     si                      ;    }
        jmp     L001
L006:
        ret                             ;} /* end of sort() */
        end

Listing 7. Assembly coded sorting algorithm.

One can see from this example that the indentation leads the eye quickly to branch sources and destinations, showing the basic structure of the program. Comments have been added indicating which registers are used for variable storage. This example program is ready to assemble and debug, completing Step 5.
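Before reaching for the assembler, the branch structure of Listing 7 can be desk-checked by modeling it in C, one statement per instruction. This is a sketch under assumptions: the variables si, di, al, and ah merely imitate the registers, buffer is taken to be unsigned bytes, the segment setup and bx base are omitted, and the helper sort_branches_check is invented for the test. The goto statements represent jumps and are not meant as C style advice.

```c
unsigned char buffer[256];   /* assumed unsigned bytes */

/* Instruction-by-instruction model of Listing 7. */
void sort_branches(void)
{
  unsigned int si;           /* i */
  unsigned int di;           /* j */
  unsigned char al, ah;      /* ah holds temp */

  si = 0;                              /* mov si,0               */
L001:
  if (si >= 255) goto L006;            /* cmp si,255 / jnb L006  */
  di = si;                             /* mov di,si              */
  di++;                                /* inc di                 */
L005:
  if (di >= 256) goto L002;            /* cmp di,256 / jnb L002  */
  al = buffer[si];                     /* mov al,[si+bx]         */
  if (al <= buffer[di]) goto L004;     /* cmp al,[di+bx] / jbe   */
  ah = al;                             /* mov ah,al              */
  al = buffer[di];                     /* mov al,[di+bx]         */
  buffer[si] = al;                     /* mov [si+bx],al         */
  buffer[di] = ah;                     /* mov [di+bx],ah         */
L004:
  di++;                                /* inc di                 */
  goto L005;                           /* jmp L005               */
L002:
  si++;                                /* inc si                 */
  goto L001;                           /* jmp L001               */
L006:
  return;                              /* ret                    */
}

/* Fill the buffer in descending order, sort it, and confirm that
   every byte landed in ascending order. Returns 1 on success. */
int sort_branches_check(void)
{
  int i;
  for (i = 0; i < 256; i++)
    buffer[i] = (unsigned char)(255 - i);
  sort_branches();
  for (i = 0; i < 256; i++)
    if (buffer[i] != i)
      return 0;
  return 1;
}
```

A model like this catches a swapped branch condition or a wrong label before any time is spent in the debugger.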

Writing an assembly language program with the method we have been discussing is not hard, but there are some technical details you must consider.

Procedures and functions are handled differently on each type of microprocessor, depending on the stack capability. Processors with a stack indexed addressing mode make stack based parameter passing easy and efficient. At the opposite end of the scale, control oriented micros with 4 or 8 level hardware stacks are not amenable to stack based parameter passing, so CPU registers or RAM must be used. Due to these variabilities, uniformity in parameter passing is not possible as it is with the other control structures, so I will not consider it further here except to stress the importance of consistency within a program. Comments at the start of each procedure or function should indicate values of input and output variables, how and where they are passed, and the operations performed on them.

One disadvantage of compiler-generated code is that it is usually not as efficient as that generated by an experienced programmer. The technique described here likewise introduces some code inefficiency. In deeply nested structures one may find branch instructions that jump to branch instructions. But this adds little to program size and preserves the basic block structure that is one of the goals of this technique. Most of the time you will have plenty of space in ROM or RAM for this minor overhead. However, memory restrictions may well dictate the elimination of every nonessential instruction in some situations. I have certainly had my back against the ROM wall and compromised in this area.

I have found program changes to be much simpler using this programming discipline. If a particular block must be rewritten, it is deleted, replaced with new pseudocode and assembly coded. The block structure guarantees that there are no execution paths other than through the top and bottom of each block, so any block can be removed and recoded without worry of affecting another part of the program that might have used a piece of the deleted block. In short, the block structure eliminates ‘spaghetti code’ perils and interlocking loops. The D-word (discipline) is very important here, for an ounce of spaghetti is worth a pound of curses.

Program porting is also aided with this programming technique. It is easy to use a word processor to strip the assembly code from a working program and recompile the remaining high level language text for a different processor, either by hand or with an actual compiler. Most of the data structures will remain unchanged during the translation, too, speeding the task.

Another byproduct is that nested IF statements produce multiple labels on one instruction. You might be tempted to remove all but one, but don’t because the multiple labels make it easy to insert new code (another ELSE clause, for example) at a later date without affecting other branch destinations. Remember, labels are cheaper than your time.
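The multiple-label situation can be pictured with a C model of a nested IF, where goto again stands in for the jump instructions. The function name, the conditions, and the placeholder arithmetic are assumptions for illustration only.

```c
/* Branch-level model of:
     if (a == 1)
       {
         r += 1;
         if (b == 2)
           {
             r += 10;
           }
       }
   Both IFs end at the same place, so two labels share one target. */
int nested_if_branches(int a, int b)
{
  int r = 0;
  if (a != 1) goto L002;   /* outer test, skips the whole block */
  r += 1;
  if (b != 2) goto L001;   /* inner test, skips the inner block */
  r += 10;
L001:                      /* end of inner if */
L002:                      /* end of outer if */
  return r;
}
```

If an ELSE clause is later added to the outer IF, its code goes between L001 and L002 (preceded by a jump around it), and the inner branch to L001 needs no change.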

You may also use assembler macros to ease the coding process, though the examples in this article do not.

Since there is no external authority imposing discipline on the programmer using this system (as a compiler would), each programmer will use it a little differently. This is okay. The main goal is to foster a consistency of style and structure, first to increase productivity, and second to allow following programmers to understand the subject programs.

In conclusion, the advantages of this method are the separation of the algorithm from the assembly language, improvement of documentation through the use of structured comments, and improved mental health of your successors resulting in fewer death threats. The first item saves time now, the second later, and the third may save your life. I have used this discipline to write numerous assembly language programs for many, many years on 4-, 8-, and 16-bit machines with considerable success and positive comments from other programmers. I adopted the technique after writing programs using other methods and have noticed a dramatic increase in productivity, ease of program development and debug, and consistency of documentation. May this discipline allow you similar profit.

Author Biography

Hank Wallace is the owner of Atlantic Quality Design, Inc., a consulting firm located in Fincastle, Virginia. He has experience in many areas of embedded software and hardware development, and system design. See www.aqdi.com for more information.
