Developing and Using Procedures in Assembly Language

Part of the document: Yury Magda, Visual C++ Optimization with Assembly Code, A-LIST Publishing (2004) (pages 87-101)

In programs, especially large ones, the same task often has to be performed repeatedly, which would mean writing the same group of commands many times. To avoid this repetition, a programmer writes the group once, according to certain rules. At the appropriate points of the program, he or she simply passes control to these commands, which return control after they finish. Such a group of commands, which implements a task and is designed so that it can be used any number of times at any place in the code, is called a subroutine or a procedure. In contrast to a subroutine, the rest of the program is usually called the main program. This chapter discusses the principles of building procedures in assembler. These procedures are saved in files with the ASM extension and assembled into individual object module files with the OBJ extension. To use a procedure from an object module, add the OBJ file to the C++ project and call the procedure in accordance with the calling conventions adopted in C++ .NET.

An assembly subroutine corresponds to a C++ function, and a call to the subroutine corresponds to a function call.

Implementing subroutines in assembly language is a more complicated process than declaring functions in C++ .NET. From now on, we will use the terms “subroutine”, “procedure”, and “function” synonymously.

In C++, many aspects of working with subroutines are hidden from the programmer, and their implementation is the compiler’s job. In assembler, the programmer has to do much of this work on his or her own. Although it is more difficult to write programs in assembler than in C++, assembler gives full control over the program code and makes it possible to optimize the application as a whole more thoroughly. The discussion that follows covers the development of procedures according to the conventions of the Microsoft MASM 6.14 assembler, though the main principles are valid for any other assembler on the Intel platform.

How should you declare a procedure in the assembler? Its declaration looks like this:

<procedure name> proc <parameter>

<procedure body>

<procedure name> endp

The body of the procedure (its commands) is preceded by the proc (procedure) directive and followed by the endp (end of procedure) directive. For example, a piece of code that declares the AsmSub procedure could look as follows:

AsmSub proc
  . . .
  ret
AsmSub endp

The procedure must be terminated with the ret command. One ASM file can contain several procedures. Here is an example of combining two procedures named AsmSub1 and AsmSub2 in one module:

. . .
.code

AsmSub1 proc
  . . .
  ret
AsmSub1 endp

AsmSub2 proc
  . . .
  ret
AsmSub2 endp

end

The proc directive is considered the entry point to the procedure. Note that there is no colon after the name in the proc directive. Nevertheless, this name is considered a label and points to the first command of the procedure.


The name of the procedure can be specified in a jump command. Then control will be passed to the first command of the procedure.

The proc directive has a parameter, which is either near or far. If the parameter is missing, near is assumed (this is why the near parameter is usually omitted). A procedure declared with the near parameter, or with no parameter, is called “near,” and one declared with the far parameter is called “far.” A near procedure can be called only from the code segment in which it was declared, while a far procedure can be called from any code segment (including the segment in which it was declared). This is the difference between near and far procedures.

For 32-bit applications considered in this book, all procedure calls are near.

It should be mentioned that the names and labels declared in an assembly procedure are not local. This is why they must be unique relative to the other names used in the program. In assembler, it is possible to declare a procedure inside another procedure. However, this does not provide any advantages, so programmers rarely use nested procedures.

Now, we will discuss how procedures are called, and how they return. When programming in C++, it is enough for you to specify the name and actual parameters of a procedure in order to execute it. The work of the procedure and return to the main program are hidden from you by the compiler. However, if you write a procedure in assembler, you will have to implement all interaction between the main program and the procedure by yourself. Here are instructions on how to do this.

Two problems arise: How can you start the procedure from the main program, and how can you return from the procedure to the main program? The first problem is easily solved. Simply execute a jump command to the first command of the procedure; in other words, specify the name of the procedure in a jump command. The other problem is more complicated. The procedure can be called from different places within the main program, so it must return to different places. The procedure itself does not know where to return, but the main program does. Therefore, when calling a procedure, the main program must pass it a so-called return address: the address of the command in the main program to which the procedure must return control after it completes.

Usually, it is the address of the command next to the call command. The main program tells this address to the procedure, and the procedure returns control to this address. Since different calls to a procedure tell it different return addresses, the procedure returns control to different places in the main program.

How can you pass a procedure the return address? This can be done in different ways. First, you can pass it via a register: the main program writes the return address to a register, and the procedure reads it and jumps to that address. Second, you can use the stack: before the main program calls a procedure, it pushes the return address on the stack, and the procedure pops it and uses it to jump back. Passing the return address via the stack is common practice, so we will use only this method.

Passing the return address via the stack and returning to this address can be implemented with commands that are already familiar to you. However, procedures are used in real programs so often that the processor’s command set includes special commands that simplify jumps between the main program and procedures. These are the call and ret commands, which you have already met. Their main variants are:

call <procedure name>

ret

The call command pushes the address of the next command on the stack and jumps to the first command of the specified procedure. The ret command pops the address from the top of the stack and jumps to this address.
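The behavior of call and ret can be mimicked with a small C++ toy model. This is a sketch, not real CPU code: the stack is a vector whose back() is the top, the instruction pointer is a plain integer, and all addresses are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the stack and the instruction pointer (not real CPU code).
struct Cpu {
    std::vector<std::uint32_t> stack;  // back() is the top of the stack
    std::uint32_t eip = 0;             // address of the next instruction

    void push(std::uint32_t value) { stack.push_back(value); }

    std::uint32_t pop() {
        assert(!stack.empty() && "pop from an empty stack");
        std::uint32_t value = stack.back();
        stack.pop_back();
        return value;
    }

    // call <target>: push the address of the next instruction, then jump.
    void call(std::uint32_t next_instruction, std::uint32_t target) {
        push(next_instruction);
        eip = target;
    }

    // ret: pop the address from the top of the stack and jump to it.
    void ret() { eip = pop(); }
};
```

For example, a hypothetical 5-byte call at address 0x100 to a procedure at 0x200 is modeled as `call(0x105, 0x200)`; the following `ret()` sets `eip` back to 0x105 and leaves the stack empty.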

Here is an example. Suppose you want to display an integer computed by the formula i1–i2–100, where i1 and i2 are integers. To compute this, write two functions in assembler and save them in an ASM file. The source code in this file is shown in Listing 3.1.

Listing 3.1: Functions for computing the formula i1–i2–100

. . .
asmsub proc
  mov EAX, i1
  sub EAX, i2
  call sub100
  ret
asmsub endp

sub100 proc
  sub EAX, 100
  ret
sub100 endp
. . .

The asmsub procedure begins by computing the difference i1–i2 and leaves the intermediate result in the EAX register. The call sub100 command pushes the address of the next command on the stack and passes control to the beginning of the sub100 procedure, i.e., to the sub EAX, 100 command. This procedure returns the final value (equal to i1–i2–100) in the EAX register. Its ret command then pops the address from the stack and jumps to it, so the asmsub procedure resumes execution from the command that follows the call sub100 command.
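When testing an assembly routine like Listing 3.1, it can help to have a plain C++ version to compare results against. The sketch below is a hypothetical C++ counterpart, not the author's code; the global variables i1 and i2 of the listing become parameters, and the returned int plays the role of EAX.

```cpp
#include <cassert>

// Hypothetical C++ counterpart of sub100: subtracts 100 from "EAX".
int sub100(int eax) {
    return eax - 100;      // sub EAX, 100
}

// Hypothetical C++ counterpart of asmsub: computes i1 - i2 - 100.
int asmsub(int i1, int i2) {
    int eax = i1;          // mov EAX, i1
    eax -= i2;             // sub EAX, i2
    return sub100(eax);    // call sub100; the result comes back as "EAX"
}
```

With these definitions, asmsub(250, 50) evaluates to 100, which is what the assembly version should leave in EAX for the same inputs.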

There are a few versions of the call command. We demonstrated the main variant where the name of a procedure is specified as a parameter of the command. However, you can use a register as a parameter. In this case, the address of the called procedure is put in the register. Modify the previous example as shown in Listing 3.2.

Listing 3.2: A call to a procedure using a register

. . .
asmsub proc
  mov EAX, i1
  sub EAX, i2
  push EBX
  lea EBX, sub100
  call EBX
  pop EBX
  ret
asmsub endp

sub100 proc
  sub EAX, 100
  ret
sub100 endp
. . .

The following four lines are most important for understanding the working principles of the procedures in this example:

push EBX
lea EBX, sub100
call EBX
pop EBX

The first command pushes the EBX register on the stack before it is modified. The processor has few registers, yet almost every command uses one register or another. Therefore, it is likely that the main program and the procedure need the same registers, which complicates their use. You could design your application so that the main program and the procedure use different registers, but this would be rather difficult because of the limited number of processor registers. This is why the code of a procedure makes no assumptions about which registers the main program uses and simply saves the values of the registers it modifies.

The EBX register is one of the most frequently used in the main application, so it is important to save it and return its unchanged value to the caller. This is done with the commands push EBX and pop EBX.


After the EBX register is saved, the address of the sub100 procedure is loaded into it. Finally, the call EBX command calls the procedure, i.e., pushes the return address on the stack and passes control to the procedure.
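The closest C++ analogue of call EBX is a call through a function pointer: the variable holds the procedure's address, just as EBX does in Listing 3.2. The sketch below is hypothetical and illustrative only; an optimizing compiler may well turn such a call back into a direct call when it can see the target.

```cpp
#include <cassert>

int sub100(int eax) { return eax - 100; }

// Hypothetical C++ analogue of Listing 3.2: the address of sub100 is put
// in a variable (the role EBX plays) and the call goes through it.
int asmsub(int i1, int i2) {
    int eax = i1 - i2;            // mov EAX, i1 / sub EAX, i2
    int (*ebx)(int) = &sub100;    // lea EBX, sub100
    return ebx(eax);              // call EBX (an indirect call)
}
```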

Now, we will consider another modified variant of the code fragment. To call a procedure, you can use the jmp command (Listing 3.3).

Listing 3.3: Using the jmp command to call a procedure

. . .
asmsub proc
  mov EAX, i1
  sub EAX, i2
  lea EDX, ex
  push EDX
  jmp sub100
ex:
  ret
asmsub endp

sub100 proc
  sub EAX, 100
  ret
sub100 endp
. . .

The jmp variant works because the calling code itself pushes, in advance, the address of the command that follows jmp (the ex label). The sub100 procedure subtracts 100 from the value in the EAX register and returns control to the calling procedure asmsub with the ret command. The ret command “does not know” whether the address at the top of the stack was pushed by a call command or by an ordinary push; it simply pops that address. When ret is executed, the address of the ex label is loaded into the instruction pointer (EIP). This address was pushed on the stack earlier with the commands:

lea EDX, ex
push EDX
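The point that ret cannot tell who pushed the return address can be shown with a C++ toy model (a sketch, not real CPU code). The addresses 0x110 for the ex label and 0x200 for sub100 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of Listing 3.3: the caller pushes the return address itself
// and then jumps; ret behaves exactly as it would after a call.
struct Cpu {
    std::vector<std::uint32_t> stack;  // back() is the top of the stack
    std::uint32_t eip = 0;

    void push(std::uint32_t v) { stack.push_back(v); }
    std::uint32_t pop() {
        assert(!stack.empty());
        std::uint32_t v = stack.back();
        stack.pop_back();
        return v;
    }
    void jmp(std::uint32_t target) { eip = target; }  // stack is untouched
    void ret() { eip = pop(); }                       // pops whatever is on top
};
```

The sequence `push(0x110); jmp(0x200); ret();` ends with `eip` equal to 0x110 and an empty stack, the same result a call from just before ex would produce.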

In these examples, the EAX register was used to pass parameters and return the result. This is one of the simplest variants. In general, however, passing parameters and returning the result cannot be solved so simply, so the topic deserves more detailed consideration.

There are different ways of passing actual parameters to a procedure. The simplest one, which was demonstrated in the previous examples, is to pass the parameters via registers: The main program writes the actual parameters to registers, and the procedure reads and uses them. You can deal with the result (if any) in the same manner: The procedure writes its result to a register, and the main program extracts it. Which registers can be used to pass parameters and return the result? You may choose them at will, although there are a few rules.

To pass parameters, the EAX, EBX, ECX, and EDX registers are used most frequently, and the EBP, ESI, and EDI registers less frequently. Usually, the EBP register is used together with the stack pointer register (ESP) to access parameters on the stack; we will address this topic later. The ESI and EDI registers are convenient as index registers for operations on arrays. However, nothing prevents you from using them at your discretion.

Here, we will consider an example. Suppose you need to find the maximum of two integers and compute its absolute value. Develop two procedures in the assembler (name them maxint and maxabs) that find the maximum and its absolute value.

The maxint procedure takes two integer parameters (i1 and i2) and can be declared as maxint (i1, i2).

The maxabs procedure takes an integer as a parameter. Let it be intval, and its declaration can be maxabs (intval).


We will assume that the first parameter i1 is passed via the EAX register, while i2 is passed via the EBX register.

The procedures will return the results in the EAX register. The result of the maxint procedure is an input parameter for the maxabs procedure. The source code of a fragment of the program that uses these procedures is shown in Listing 3.4.

Listing 3.4: Passing parameters via registers in assembly procedures

; the main program
. . .
mov EAX, i1
mov EBX, i2
call maxint
; The maximum is in the EAX register
mov intval, EAX
call maxabs
; The absolute value of the maximum
. . .

; The procedures are declared here

; maxint(i1, i2)
maxint proc
  cmp EAX, EBX
  jge ex
  mov EAX, EBX
ex:
  ret
maxint endp

; maxabs(intval)
maxabs proc
  mov EAX, intval
  cmp EAX, 0
  jge quit
  neg EAX
quit:
  ret
maxabs endp
. . .
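For checking the assembly results, here is a hypothetical C++ counterpart of the two procedures in Listing 3.4 (a reference sketch, not the author's code). As in the listing, jge makes the comparisons signed. One assumption is stated explicitly: the value passed to maxabs is not INT_MIN, because neg leaves 0x80000000 unchanged while negating INT_MIN in C++ is undefined behavior.

```cpp
#include <cassert>

// Hypothetical counterpart of maxint(i1 in EAX, i2 in EBX).
int maxint(int eax, int ebx) {
    if (eax >= ebx) return eax;   // cmp EAX, EBX / jge ex
    return ebx;                   // mov EAX, EBX
}

// Hypothetical counterpart of maxabs(intval).
// Assumes intval != INT_MIN (see the note above).
int maxabs(int intval) {
    int eax = intval;             // mov EAX, intval
    if (eax >= 0) return eax;     // cmp EAX, 0 / jge quit
    return -eax;                  // neg EAX
}
```

For example, maxabs(maxint(-12, -5)) yields 5: the maximum is -5, and its absolute value is 5.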

When passing parameters via registers, you must keep in mind one important point. The registers that the procedure uses may also be used in other parts of the program, so if the procedure destroys their contents, the program may malfunction or crash. Unless you are completely sure that their contents are not used by other procedures or by the program, save the contents of the registers before entering a procedure. The stack is normally used for this. Thus, if the main program in the previous example uses the EBX register, you should modify the source code as shown in Listing 3.5. The additional commands are push EBX and pop EBX.

Listing 3.5: Saving registers when working with procedures

; the main program
. . .
mov EAX, i1
push EBX         ; save the main program's EBX
mov EBX, i2
call maxint
pop EBX          ; restore the main program's EBX
; The maximum is in the EAX register
mov intval, EAX
call maxabs

; The procedures are declared here
. . .

; maxint(i1, i2)
maxint proc
  . . .
maxint endp

; maxabs(intval)
maxabs proc
  . . .
maxabs endp
. . .
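The caller-side saving in Listing 3.5 can be traced with a C++ toy register model (a sketch, not real CPU code; `Regs`, `stack_`, and `caller` are hypothetical names). The caller's own EBX value survives even though EBX is reloaded with i2 for the call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the registers used in Listing 3.5.
struct Regs { std::int32_t eax = 0, ebx = 0; };

std::vector<std::int32_t> stack_;  // back() is the top of the stack

// maxint(i1 in EAX, i2 in EBX): leaves the maximum in EAX.
void maxint(Regs& r) {
    if (!(r.eax >= r.ebx)) r.eax = r.ebx;  // cmp / jge / mov EAX, EBX
}

void caller(Regs& r, std::int32_t i1, std::int32_t i2) {
    r.eax = i1;                // mov EAX, i1
    stack_.push_back(r.ebx);   // push EBX
    r.ebx = i2;                // mov EBX, i2
    maxint(r);                 // call maxint
    r.ebx = stack_.back();     // pop EBX
    stack_.pop_back();
}
```

If EBX holds 42 before `caller(r, 3, 9)`, it holds 42 again afterwards, while EAX holds the maximum, 9.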

Another approach is often used. Both the main program and the procedure are allowed to use the same registers, but the procedure is obliged to save the values of the registers used by the main program. This is simple: the procedure must first push the values of the registers it needs on the stack, and then it can use them as it likes. Before it returns, it must restore the original values of the registers by popping them from the stack.

It is strongly recommended that you do this in every procedure, even if it is obvious that the main program and the procedure use different registers. The source code of the program might change later (in fact, this is very likely), and the main program might need the registers after the changes are made.
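The callee-saves discipline can be sketched in the same C++ toy style (not real CPU code). The procedure `triple` is hypothetical: it returns EAX * 3 in EAX, uses ECX as scratch, and saves and restores ECX itself, so the caller does not need to know which registers it touches.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Regs { std::int32_t eax = 0, ecx = 0; };

std::vector<std::int32_t> stack_;  // back() is the top of the stack

// Hypothetical procedure: EAX = EAX * 3, with ECX saved by the callee.
void triple(Regs& r) {
    stack_.push_back(r.ecx);   // push ECX   (save the caller's value)
    r.ecx = r.eax;             // mov ECX, EAX
    r.eax += r.ecx;            // add EAX, ECX
    r.eax += r.ecx;            // add EAX, ECX
    r.ecx = stack_.back();     // pop ECX    (restore before returning)
    stack_.pop_back();
}
```

EAX is deliberately not saved, because it carries the result.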

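A convenient way to save all the general-purpose registers at once is to use two special commands that push their values on the stack and pop them from it. Intel intends pusha and popa for the 16-bit registers and pushad and popad for the 32-bit registers, so 32-bit code normally uses pushad and popad. Keep in mind that popad also restores EAX, so a procedure that returns its result in EAX must preserve that result some other way before executing popad.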

Note that you do not have to save the register used by a procedure to return the result. Changing this register is the goal of the procedure.
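The register order used by pushad and popad, and why popad clashes with a result returned in EAX, can be seen in a C++ toy model (a sketch, not real CPU code). pushad pushes EAX, ECX, EDX, EBX, the original ESP, EBP, ESI, and EDI in that order; popad pops them in reverse and discards the saved ESP value. In this model ESP is not updated by the pushes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Register indices in pushad order.
enum Reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };
using Regs = std::array<std::uint32_t, 8>;

void pushad(std::vector<std::uint32_t>& stack, const Regs& r) {
    for (int i = EAX; i <= EDI; ++i) stack.push_back(r[i]);
}

void popad(std::vector<std::uint32_t>& stack, Regs& r) {
    for (int i = EDI; i >= EAX; --i) {
        assert(!stack.empty());
        std::uint32_t v = stack.back();
        stack.pop_back();
        if (i != ESP) r[i] = v;   // the saved ESP value is skipped
    }
}
```

If a procedure computes a result in EAX between pushad and popad, popad overwrites it with the caller's old EAX, which is why the result register needs separate handling.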

It is convenient to pass parameters via registers, and this method is used frequently. It is effective when there are few parameters. However, if you have many parameters, you might run short of registers for them. In this case, another method of passing parameters is used: via the stack. The main program pushes the actual parameters (their values or addresses) on the stack, and the procedure reads them from the stack.

Suppose a procedure (named myproc) has n parameters and is declared as myproc (x1,x2,…,xn). We will assume that the main program pushes the parameters on the stack in a certain order before calling the procedure.

There are only two practical ways to arrange the parameters on the stack. The first is to push them from left to right: the first parameter is pushed first, then the second, and so on. In 32-bit applications, each parameter occupies a double word, so the main program’s commands that implement a procedure call are as follows:

push x1
push x2
. . .
push xn
call myproc

According to the second variant, the parameters are pushed on the stack from right to left: the nth parameter is pushed first, then the (n–1)th parameter, etc. In this case, you should execute the following commands before you call the function:

push xn
. . .
push x1
call myproc
. . .
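The difference between the two orders is which parameter ends up nearest the top of the stack. The C++ toy model below (a sketch; the vector's back() stands for the top, i.e., the lowest address) shows that right-to-left pushing leaves x1 on top, directly below where call will place the return address. Right-to-left is also the order used by the Visual C++ __cdecl and __stdcall conventions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// push x1 ... push xn
std::vector<std::uint32_t> push_left_to_right(const std::vector<std::uint32_t>& args) {
    std::vector<std::uint32_t> stack;   // back() is the top of the stack
    for (std::uint32_t a : args) stack.push_back(a);
    return stack;
}

// push xn ... push x1
std::vector<std::uint32_t> push_right_to_left(const std::vector<std::uint32_t>& args) {
    std::vector<std::uint32_t> stack;   // back() is the top of the stack
    for (auto it = args.rbegin(); it != args.rend(); ++it) stack.push_back(*it);
    return stack;
}
```

With the arguments {10, 20, 30}, the top of the stack is 30 after left-to-right pushing and 10 after right-to-left pushing.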

How does a procedure access its parameters? A commonly used method relies on the EBP register. Copy the address of the top of the stack (the contents of the ESP register) into EBP, and then use expressions like [EBP+i] to access the parameters of the procedure. It is advisable to save the EBP register because it might be used in the main program. Therefore, first save the contents of this register and only then move the contents of ESP into it. After the commands push EBP and mov EBP, ESP, the double word at [EBP] is the saved EBP, [EBP+4] is the return address, and the parameters start at [EBP+8]; when they are pushed from right to left, x1 is at [EBP+8], x2 at [EBP+12], and so on.
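The frame layout can be verified with a C++ toy model that uses byte addresses (a sketch, not real CPU code). The starting ESP of 0x1000, the old EBP value 0xBEEF, and the return address 0x401000 are hypothetical. The function replays push i2 / push i1 / call maxint followed by the prologue push EBP / mov EBP, ESP.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy 32-bit stack: ESP moves down 4 bytes per push.
struct Machine {
    std::uint32_t esp = 0x1000, ebp = 0xBEEF;    // hypothetical start values
    std::map<std::uint32_t, std::uint32_t> mem;  // dword values by address

    void push(std::uint32_t v) { esp -= 4; mem[esp] = v; }
    std::uint32_t load(std::uint32_t addr) const { return mem.at(addr); }
};

void enter_maxint(Machine& m, std::uint32_t i1, std::uint32_t i2,
                  std::uint32_t return_address) {
    m.push(i2);              // push i2
    m.push(i1);              // push i1
    m.push(return_address);  // call maxint pushes the return address
    m.push(m.ebp);           // push EBP
    m.ebp = m.esp;           // mov EBP, ESP
}
```

After `enter_maxint(m, 7, 3, 0x401000)`, loading from EBP, EBP+4, EBP+8, and EBP+12 gives 0xBEEF, 0x401000, 7 (i1), and 3 (i2).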

We will illustrate this with an example. Modify the previous example so that the stack is used for passing the parameters. Assume the parameters are passed to the maxint procedure from right to left, i.e., the i2 variable is pushed on the stack first, and then the i1 variable. The fragments of the source code of the main program and the procedures where the parameters are passed via the stack are shown in Listing 3.6.

Listing 3.6: Passing parameters to a procedure via the stack

; Main program
. . .
push i2
push i1
call maxint
; The maximum is in the EAX register
mov intval, EAX
push intval
call maxabs
; The absolute value of the maximum
. . .

; The procedures are declared here

; maxint(i1, i2)
maxint proc
  push EBP
  mov EBP, ESP
  ; Loading the i1 parameter to the EAX register
  mov EAX, DWORD PTR [EBP+8]
  ; Saving the EBX register
  push EBX
  ; Loading the i2 parameter to the EBX register
  mov EBX, DWORD PTR [EBP+12]
  cmp EAX, EBX
  jge ex
  mov EAX, EBX
ex:
  ; Restoring the EBX register
  pop EBX
  pop EBP
  ret 8
maxint endp

; maxabs(intval)
maxabs proc
  push EBP
  mov EBP, ESP
  ; Loading a parameter to the EAX register
  mov EAX, DWORD PTR [EBP+8]   ; intval
  cmp EAX, 0
  jge quit
  neg EAX
quit:
  pop EBP
  ret 4
maxabs endp
. . .

When a procedure completes, it must perform certain actions, which we will now describe. Note that by the time the procedure exits, the stack should be in the same state as before the procedure call.

When the procedure completes (and any other saved registers, such as EBX in maxint, have been popped), the top of the stack contains the old value of the EBP register. Pop it and restore EBP with the pop EBP command. Now the top of the stack contains the return address. You might think that you can exit the procedure with the ret command, but this is not the case: you should also clear the stack of the parameters that you no longer need. This can be done either in the calling program or in the procedure. Of course, the main program can do this by executing the add ESP, n command (where n is the number of bytes to clear) after the call myproc command.

However, it is best to clear the stack in the procedure. There can be many calls to the procedure; therefore, you will have to write the add command in the main program many times. In the procedure, you will write this command only once. Here is a useful rule for program optimization: If an action can be done either in the main program or in the procedure, it is best to do it in the procedure. In this case, you will need fewer commands.

Thus, the procedure should clear the stack of the parameters and pass control to the return address. To make these two actions simpler to implement, an extended version of the ret command was introduced to the command set. It has an immediate operand that is treated as a 16-bit unsigned integer:

ret n

This command pops the return address from the stack first, then it clears n bytes in the stack, and finally it jumps to the return address.

A few additional notes: First, the ret command is actually the ret 0 command, i.e., it returns without clearing the stack. Second, the operand of this command tells how many bytes in the stack should be cleared. Finally, the operand should not take into account the return address because the ret command pops it before clearing the stack.

After the procedure returns control in such a manner, the stack will have the same state as before the call to the procedure, i.e., before the parameters were pushed on it. Thus, all traces of the call are covered up, and this is what you want.
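The effect of ret n on ESP can be checked with a C++ toy model (a sketch, not real CPU code). The starting ESP of 0x1000, the parameter values, and the return address 0x401000 are hypothetical; the operand is a 16-bit unsigned value, as in the real instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of ret n; plain ret behaves like ret 0.
struct Machine {
    std::uint32_t esp = 0x1000, eip = 0;
    std::map<std::uint32_t, std::uint32_t> mem;

    void push(std::uint32_t v) { esp -= 4; mem[esp] = v; }
    std::uint32_t pop() { std::uint32_t v = mem.at(esp); esp += 4; return v; }

    // ret n: pop the return address, discard n more bytes, then jump.
    void ret(std::uint16_t n = 0) {
        std::uint32_t target = pop();
        esp += n;
        eip = target;
    }
};
```

Pushing two double-word parameters and a return address and then executing `ret(8)` jumps to 0x401000 and leaves ESP exactly where it was before the first push. With plain `ret()` instead, ESP stays 8 bytes lower, and the caller has to execute add ESP, 8 to finish the cleanup.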

