Optimizing C++ Logical Structures with Assembly Language

Part of the document: Yury Magda, Visual C++ Optimization with Assembly Code, A-LIST Publishing (2004) (pages 101-120)


Overview

Programmers who write in high-level languages use inline assembly blocks and separately compiled assembly modules to reduce the size and improve the performance of their applications. In this chapter, we will look at how the most frequently used constructions of high-level programming languages can be implemented in assembly language.

Compiled code for constructions such as if…else and do…while often carries some redundancy. Therefore, by replacing them with their assembly analogs, you can increase the performance of your applications. Many C++ compilers, including Visual C++ .NET, also provide an inline assembler, but we will postpone a discussion of the C++ .NET inline assembler until Part 3.

Analysis of programs written in high-level programming languages reveals their weak points, especially the inefficient use of selection statements and loops. Processing large arrays and strings and performing mathematical calculations are where performance most often suffers. No high-level language compiler can completely remove redundancy and suboptimal code, no matter how hard it tries to optimize the size and performance of an executable module. This is also true of the Intel compiler, which, we believe, optimizes well at the level of processor instructions.

Loops and constructions such as if…else, while, do…while, and switch…case are the easiest to optimize. As a rule, optimization of selection statements and loops is based on comparison commands followed by conditional jumps that depend on the result of the comparison. In general, this can be presented as follows:

...
    cmp operand1, operand2
    Jcond label1
    <commands 1>
    jmp label2
label1:
    <commands 2>
label2:
...

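Here, operand1 and operand2 are registers, memory variables, or constants (cmp accepts at most one memory operand, and a constant can only be the second operand), and Jcond is a conditional jump command (such as je, jl, jge, or jz). If the condition holds, control passes to label1 and <commands 2> are executed; otherwise, <commands 1> are executed, and jmp label2 skips <commands 2>.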
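To make the control flow of this skeleton concrete, here is a minimal C++ sketch that models it with goto. It assumes jl as the Jcond command and uses a hypothetical function, compare_and_jump, whose return value only records which command block ran; it illustrates the jump layout and is not code from the book.

```cpp
#include <cassert>

// Models: cmp operand1, operand2 / jl label1 / <commands 1> / jmp label2
//         label1: <commands 2> / label2:
// Returns 1 if <commands 1> ran, 2 if <commands 2> ran.
int compare_and_jump(int operand1, int operand2) {
    int branch = 0;
    if (operand1 < operand2) goto label1;  // cmp + jl label1
    branch = 1;                            // <commands 1>: condition false
    goto label2;                           // jmp label2 skips <commands 2>
label1:
    branch = 2;                            // <commands 2>: condition true
label2:
    return branch;
}
```

For example, compare_and_jump(1, 2) takes the jump and returns 2, while compare_and_jump(2, 2) falls through to <commands 1> and returns 1.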

Any high-level language construction, no matter how complicated it might be, can be represented as a combination of comparison commands and conditional jumps. In this chapter, we will look at practical ways to implement high-level language constructions in assembly language. We will begin with the if statement.


The if Statement

A simple if statement executes a statement or a block of statements only if the specified condition is true. In general form, this statement looks like this:

if (condition) statements

In C++, it can be written as follows:

if (condition) {
    <statements>
}

A more complex variant of the statement, if…else, makes it possible to selectively execute one of two actions depending on the condition. Here is its syntax in C++:

if (condition) {
    <statements 1>
} else {
    <statements 2>
}

There are no such constructions in assembly language, but they can be implemented easily with certain sequences of commands.

As an example, consider an algorithm that checks two operands for equality and executes one branch or the other depending on the result.

In Visual C++ .NET, the expression will look like this:

if (operand1 == operand2) {
    ...
} else {
    ...
}

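Assembly language allows you to implement the if…else construction quite simply. In the fragments below, <commands 2> are executed when the operands are equal (the if branch), and <commands 1> when they are not (the else branch):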

    cmp operand1, operand2
    je YES
    <commands 1>
    jmp EXIT
YES:
    <commands 2>
EXIT:
    ...

Another variant is also possible:

    cmp operand1, operand2
    jne NO
    <commands 2>
EXIT:
    ...
NO:
    <commands 1>
    jmp EXIT
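Both layouts can be modeled in C++ with goto to check that they select the same branch. In this sketch (the function names if_else_je and if_else_jne are hypothetical), each function returns 2 when <commands 2> would run and 1 when <commands 1> would run; the second function keeps the NO block after the return, the way the second assembly variant places it after the code that follows EXIT.

```cpp
#include <cassert>

// First layout: cmp / je YES / <commands 1> / jmp EXIT / YES: <commands 2> / EXIT:
int if_else_je(int operand1, int operand2) {
    int branch = 0;
    if (operand1 == operand2) goto YES;  // cmp operand1, operand2 / je YES
    branch = 1;                          // <commands 1> (operands differ)
    goto EXIT;                           // jmp EXIT
YES:
    branch = 2;                          // <commands 2> (operands equal)
EXIT:
    return branch;
}

// Second layout: cmp / jne NO / <commands 2> / EXIT: ... / NO: <commands 1> / jmp EXIT
int if_else_jne(int operand1, int operand2) {
    int branch = 0;
    if (operand1 != operand2) goto NO;   // cmp operand1, operand2 / jne NO
    branch = 2;                          // <commands 2> (operands equal)
EXIT:
    return branch;                       // stands for the code after the construction
NO:
    branch = 1;                          // <commands 1>, placed out of line
    goto EXIT;                           // jmp EXIT
}
```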

Develop a code fragment that compares two integers, X and Y. If X is greater than Y, the X variable takes the value of Y; if X is less than or equal to Y, X remains unchanged.

In C++, this code fragment looks like this:

if (X > Y) X = Y;

Here is its implementation in the assembler:

    mov EAX, DWORD PTR Y
    cmp EAX, DWORD PTR X
    jge EXIT
    mov DWORD PTR X, EAX
EXIT:

Since the operands are 32-bit, the variables are accessed with the DWORD PTR qualifier, and the 32-bit EAX register is used. The cmp command cannot take two memory operands, so one of them (Y in this case) is first loaded into the EAX register. If Y is less than X, the value in EAX is stored in the X variable.
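The same sequence of steps can be traced in C++. This is a sketch, not the book's code: the hypothetical function assign_if_greater takes X and Y by value and returns the new value of X, and a local variable named eax stands in for the register.

```cpp
#include <cassert>

// Returns the value X has after: if (X > Y) X = Y;
int assign_if_greater(int X, int Y) {
    int eax = Y;                // mov EAX, DWORD PTR Y
    if (eax >= X) goto EXIT;    // cmp EAX, DWORD PTR X / jge EXIT
    X = eax;                    // mov DWORD PTR X, EAX
EXIT:
    return X;
}
```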

The following fragment of code computes the sum of two integers, X and Y, if both are within the range from 1 to 100.

In Visual C++, this fragment of code looks like this:

if ((X <= 100 && X >= 1) && (Y <= 100 && Y >= 1)) X = X + Y;

The code in the assembler is shown in Listing 4.1.

Listing 4.1: A fragment of an assembly program, with an analog of the if statement, that adds up two integers

...
    cmp DWORD PTR X, 1
    jge check_x100
    jmp EXIT
check_x100:
    cmp DWORD PTR X, 100
    jle check_y1
    jmp EXIT
check_y1:
    cmp DWORD PTR Y, 1
    jge check_y100
    jmp EXIT
check_y100:
    cmp DWORD PTR Y, 100
    jg EXIT
    mov EAX, DWORD PTR Y
    add DWORD PTR X, EAX
EXIT:
...

As you can see from the algorithm, four conditions must be checked before the sum of X and Y can be computed:


● X is greater than or equal to 1

● Y is greater than or equal to 1

● X is less than or equal to 100

● Y is less than or equal to 100

Only if all four conditions are true can you assign the X variable the value of the X + Y expression. To implement such a task, break the condition (X <= 100 && X >= 1) && (Y <= 100 && Y >= 1) into the following four simpler conditions:

X <= 100, X >= 1, Y <= 100, Y >= 1

The task is simpler now. Each of the four conditions can be checked easily with the cmp command. For example, checking the X <= 100 condition and the subsequent jump is done as follows:

    cmp DWORD PTR X, 100
    jle check_y1

The other checks can be done with similar combinations of commands.
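For reference, the chain of checks in Listing 4.1 can be transcribed label for label into C++ with goto. The function name add_if_in_range is hypothetical, and it returns the resulting value of X instead of updating a global variable.

```cpp
#include <cassert>

// Returns X + Y when both X and Y lie in [1, 100]; otherwise returns X unchanged.
int add_if_in_range(int X, int Y) {
    if (X >= 1) goto check_x100;    // cmp DWORD PTR X, 1 / jge check_x100
    goto EXIT;                      // jmp EXIT
check_x100:
    if (X <= 100) goto check_y1;    // cmp DWORD PTR X, 100 / jle check_y1
    goto EXIT;                      // jmp EXIT
check_y1:
    if (Y >= 1) goto check_y100;    // cmp DWORD PTR Y, 1 / jge check_y100
    goto EXIT;                      // jmp EXIT
check_y100:
    if (Y > 100) goto EXIT;         // cmp DWORD PTR Y, 100 / jg EXIT
    X = X + Y;                      // mov EAX, DWORD PTR Y / add DWORD PTR X, EAX
EXIT:
    return X;
}
```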

It is important to note that an assembly analog of a high-level language construction is not necessarily obvious, and this is illustrated in the following example.

Develop some code to compute the absolute value of the integer X. One possible implementation uses the if…else construction.

In Visual C++, it will look like this:

if (X >= 0) AbsX = X;
else AbsX = -X;

where AbsX is a variable that stores the absolute value of X. Here is an implementation of this construction in the assembler:

    cmp DWORD PTR X, 0
    jl NOT_X
    jmp EXIT
NOT_X:
    neg DWORD PTR X
EXIT:
    mov EAX, DWORD PTR X
    mov DWORD PTR AbsX, EAX

An assignment statement is executed in both the if and the else branches. These two assignments can be combined and placed at the end of the code fragment:

    mov EAX, DWORD PTR X
    mov DWORD PTR AbsX, EAX

The else branch is implemented in the assembler with the command:

NOT_X:

neg DWORD PTR X

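The result is stored in the AbsX variable. Note that, unlike the C++ version, this fragment negates X itself in memory, so the original value of X is lost when it is negative.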
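If X must keep its original value, one option (our assumption, not the book's listing) is to work on a copy in EAX: mov EAX, DWORD PTR X / cmp EAX, 0 / jge EXIT / neg EAX / EXIT: mov DWORD PTR AbsX, EAX. The C++ sketch below models that variant with a hypothetical abs_value function. It assumes X is not INT_MIN, because negating INT_MIN overflows in C++, while neg simply leaves 80000000h unchanged.

```cpp
#include <cassert>

// Absolute value computed on a register copy, so the source variable is untouched.
// Assumes X != INT_MIN (negating INT_MIN is undefined behavior in C++).
int abs_value(int X) {
    int eax = X;               // mov EAX, DWORD PTR X
    if (eax >= 0) goto EXIT;   // cmp EAX, 0 / jge EXIT
    eax = -eax;                // neg EAX
EXIT:
    return eax;                // mov DWORD PTR AbsX, EAX
}
```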


Now, we will solve a simple problem: find the maximum of two integers and assign it to a third variable. An appropriate C++ .NET console application consists of only a few lines of code, shown in Listing 4.2.

Listing 4.2: Computing the maximum of two numbers and displaying it

// IF_ELSE_SETCC.cpp : Defines the entry point
// for the console application

#include "stdafx.h"

int _tmain(int argc, _TCHAR* argv[]) {
    int i1, i2, ires;
    while (true) {
        printf("\n");
        printf("Enter first number (i1): ");
        scanf("%d", &i1);
        printf("Enter second number (i2): ");
        scanf("%d", &i2);
        if (i1 >= i2) ires = i1;
        else ires = i2;
        printf("Maximum = %d\n", ires);
    }
    return 0;
}

As you can see from the listing, a comparison is made with the following statements:

if (i1 >= i2) ires = i1;

else ires = i2;

Now, we will try to optimize the code fragment that contains the if…else statement. It will be convenient to use the Visual C++ .NET inline assembler for this purpose. An assembly language analog of a conditional statement will look like this:

_asm {
    mov EAX, i1
    mov EBX, i2
    cmp EAX, EBX
    jge set_ires
    xchg EAX, EBX
set_ires:
    mov ires, EAX
}

This fragment of code is simple and does not require additional explanation. You can create even more efficient code if you get rid of branches and jumps or at least minimize their number, because a mispredicted conditional jump forces the processor to discard the work already in its pipeline. The setcc (set on condition) commands, available since the 80386, and the cmov and fcmov conditional move commands, introduced with the Pentium Pro and supported by the Pentium II and later processors, allow you to implement branch logic without jumps. By combining these commands, you can noticeably improve the performance of your applications.

An assembly analog of the if…else statement that uses the setge and cmovl commands looks like this:

_asm {
    xor EBX, EBX
    mov EAX, i1
    mov EDX, i2
    cmp EAX, EDX
    setge BL
    cmp BL, 1
    cmovl EAX, EDX
    mov ires, EAX
}

First, the EBX register is zeroed because its low-order byte (BL) will be used as a “greater than or equal” indicator. The first number (i1) is loaded into the EAX register, and the second number (i2) into the EDX register. If the contents of EAX are greater than or equal to those of EDX, the setge BL command writes 1 to BL; otherwise, it writes 0. The cmp BL, 1 command then sets the flags so that cmovl copies EDX into EAX only when BL = 0. As a result, before the last command, the EAX register contains the maximum, which is then stored in the ires variable. As you can see from this fragment of code, there are no branches or jumps. Before you use the cmov command, make sure that your processor supports it. The check can be done with the cpuid command: function 1 reports cmov support in bit 15 of EDX.
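The effect of the setge/cmovl sequence can be modeled in portable C++. The first function, max_setge_cmovl, mirrors the commands one by one; whether a compiler actually emits cmov for its conditional assignment depends on the compiler and its options. The second function, max_mask, is a different but common branch-free technique, not taken from the book: it turns the 0/1 result of the comparison into an all-zeros or all-ones mask. Both function names are hypothetical.

```cpp
#include <cassert>

// Command-by-command model of the setge/cmovl fragment.
int max_setge_cmovl(int i1, int i2) {
    int eax = i1;                                // mov EAX, i1
    int edx = i2;                                // mov EDX, i2
    unsigned char bl = (eax >= edx) ? 1 : 0;     // cmp EAX, EDX / setge BL
    if (bl < 1) eax = edx;                       // cmp BL, 1 / cmovl EAX, EDX
    return eax;                                  // mov ires, EAX
}

// Branch-free variant: a 0/1 flag negated gives 0 or -1 (all bits set).
int max_mask(int i1, int i2) {
    int mask = -static_cast<int>(i1 >= i2);      // i1 >= i2 -> -1, else 0
    return (i1 & mask) | (i2 & ~mask);           // pick i1 or i2 without a jump
}
```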

The complete code of the console application is shown in Listing 4.3.

Listing 4.3: A modified variant of the program that includes the if…else statement

// IF_ELSE_SETCC.cpp : Defines the entry point
// for the console application

#include "stdafx.h"

int _tmain(int argc, _TCHAR* argv[]) {
    int i1, i2, ires;
    while (true) {
        printf("\n");
        printf("Enter first number (i1): ");
        scanf("%d", &i1);
        printf("Enter second number (i2): ");
        scanf("%d", &i2);
        _asm {
            xor EBX, EBX
            mov EAX, i1
            mov EDX, i2
            cmp EAX, EDX
            setge BL
            cmp BL, 1
            cmovl EAX, EDX
            mov ires, EAX
        }
        printf("Maximum=%d\n", ires);
    }
    return 0;
}


Fig. 4.1: Window of an application that uses optimized code for computing the maximum of two integers

The window of the application is shown in Fig. 4.1.


The while Loop

This loop is used when the number of iterations is not known beforehand. The while loop is a pretest loop, and its execution depends on the initial condition. Its general syntax can be presented as follows:

while (condition) <statements>

The loop exits if the condition is false. Since the condition is checked at the beginning of each iteration, it may happen that the body of the loop is not executed even once.

In C++, a while loop looks like this:

while (condition) {
    <statements>
}
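As a sketch of how a while loop can be lowered to compare-and-jump form, the hypothetical function below counts the decimal digits of n with the test at the top of the loop. Because the condition is checked first, the body never runs for n = 0.

```cpp
#include <cassert>

// while (n > 0) { n /= 10; ++count; } written with a test at the top.
int digits_while(unsigned n) {
    int count = 0;
TOP:
    if (!(n > 0)) goto DONE;   // cmp + Jcond: leave when the condition is false
    n /= 10;                   // <statements>
    ++count;
    goto TOP;                  // jmp back to the test
DONE:
    return count;
}
```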

The following fragment of code demonstrates the use of the while loop. Suppose you have an array of ten integers and want to find the number of elements that precede the first zero element (if there is one). The fragment should return the number of elements that precede the first zero element, or zero if no such element is found. Such code could be used to search for and extract null-terminated strings. This task is easily implemented with a for loop, but here we will use a while loop.

The following variables will be used:

● X1: an integer array

● IX1: the current array index

● SX1: the array size (the number of elements)

● Counter: the counter of elements

This fragment of code will be executed as follows:

● After the variables are initialized, the program checks at the beginning of each iteration of the while loop whether the current element of the X1 array is nonzero. If the element is equal to zero, the loop terminates immediately.

● If the condition is true, i.e., the array element is not equal to zero, the loop body is executed: Counter is incremented and, unless the last element of the array has been reached, so is the IX1 index. When the last element is reached, the loop exits (the if statement).

● After the loop, Counter holds the number of elements that precede the first zero element; if no zero element was found, Counter is reset to zero.

The Visual C++ code for this task is shown in Listing 4.4.

Listing 4.4: A fragment of code that uses a while loop

...
int X1[10] = {12, 90, -6, 30, 22, 10, 22, 89, 0, 47};
int Counter = 0;
int IX1 = 0;
int SX1 = sizeof (X1) / sizeof (X1[0]);
while (X1[IX1] != 0)
{
    Counter++;
    if (IX1 == SX1 - 1) break;   // last element reached: do not read past the array
    IX1++;
}
if (Counter == SX1) Counter = 0; // no zero element was found
...

At first glance, the implementation of this task in the assembler (Listing 4.5) would seem more complicated than in the previous examples.

Listing 4.5: An assembly-language implementation of the task that uses a while loop

.686
.model flat, stdcall
.data
X1      DD 2, -23, 5, 9, -1, 0, 9, 3
SX1     DD $-X1
Counter DD 0
.code
start:
    push EBX
    push ESI
    mov ECX, 0
    mov EBX, offset X1
    mov EDX, DWORD PTR SX1
    shr EDX, 2
    mov ESI, EDX
AGAIN:
    mov EAX, DWORD PTR [EBX]
    cmp EAX, 0
    je RUNOUT
    inc ECX
    dec EDX
    jz RUNOUT
    add EBX, 4
    jmp AGAIN
RUNOUT:
    cmp ECX, ESI
    jne SET_CNT
    xor ECX, ECX
SET_CNT:
    mov DWORD PTR Counter, ECX
    pop ESI
    pop EBX
    ...
end start

A few important notes should be made. The first one concerns the use of registers. When your assembly code works with programs and modules written in high-level languages, you should preserve the EBX, EBP, ESI, and EDI registers, for example, by pushing them on the stack and popping them before returning. The other general-purpose registers (EAX, ECX, and EDX) can be used freely.

The second note concerns arrays and strings. In 32-bit Windows programs, such data is accessed through 32-bit addresses held in variables or registers. To access the elements of the X1 array, you can load the address of its first element into the EBX register:

mov EBX, offset X1


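To work with an array, you should know the number of its elements. SX1 holds the size of the array in bytes, so it is loaded into the EDX register and divided by four (a right shift by two bits) to get the element count:

    mov EDX, DWORD PTR SX1
    shr EDX, 2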

The counter of non-zero elements is stored in the ECX register. Since each element of the array is four bytes long (a double word), the following command is used to access the next element:

add EBX, 4

This example includes two high-level constructions: a while loop and an if conditional statement. The condition of the while loop is checked with three commands:

    mov EAX, DWORD PTR [EBX]
    cmp EAX, 0
    je RUNOUT

and the if statement that stops the loop at the last element is implemented with the following commands:

    dec EDX
    jz RUNOUT

If no zero element is found (the counter equals the number of elements saved in ESI), zero is written to the counter according to the condition of the task:

    cmp ECX, ESI
    jne SET_CNT
    xor ECX, ECX
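To trace the register logic of Listing 4.5 without running assembly code, it can be transcribed into C++ with goto. In this sketch, the pointer ebx and the integers ecx, edx, and esi play the roles of the registers; the function name count_before_zero and the guard for an empty array are assumptions that are not part of the listing.

```cpp
#include <cassert>

// Counts the elements before the first zero; returns 0 if there is no zero.
int count_before_zero(const int* x1, int n) {
    const int* ebx = x1;         // mov EBX, offset X1
    int ecx = 0;                 // mov ECX, 0 (counter)
    int edx = n;                 // mov EDX, SX1 / shr EDX, 2 (elements left)
    int esi = edx;               // mov ESI, EDX (total element count)
    if (edx == 0) goto RUNOUT;   // hypothetical guard, not in the listing
AGAIN:
    if (*ebx == 0) goto RUNOUT;  // mov EAX, [EBX] / cmp EAX, 0 / je RUNOUT
    ++ecx;                       // inc ECX
    if (--edx == 0) goto RUNOUT; // dec EDX / jz RUNOUT
    ++ebx;                       // add EBX, 4 (next 4-byte element)
    goto AGAIN;                  // jmp AGAIN
RUNOUT:
    if (ecx == esi) ecx = 0;     // cmp ECX, ESI / jne SET_CNT / xor ECX, ECX
    return ecx;                  // mov DWORD PTR Counter, ECX
}
```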

We provide such a detailed analysis of the assembly version of the program so that you understand that there is no unique solution for the task of optimization of logical structures in high-level languages! In many cases, a building block of such an optimization is the following pair of assembly commands:

    cmp operand1, operand2
    Jcond label

In principle, the assembler allows you to implement any logical expressions and branches, no matter how complicated they are. The only limitation is your imagination and experience.

A while loop can also be implemented in the assembler with string primitive commands. These commands are widely used for processing arrays and strings in loops and often simplify the algorithm of a task. A variant of the while loop implemented with the scasd command, which compares EAX with the double word at the address in EDI and then advances EDI by four (when the direction flag is cleared with cld), is shown in Listing 4.6.

Listing 4.6: An implementation of a while loop with the scasd command

.686
.model flat, stdcall
.data
X1      DD 2, -23, 5, 9, -1, 0, 9, 3
SX1     DD $-X1
IX1     DD 1
Counter DD 0
.code
start:
    mov EDI, offset X1
    xor ECX, ECX
    mov EDX, DWORD PTR SX1
    shr EDX, 2
    cld
    xor EAX, EAX
next:
    scasd
    je ex
    inc ECX
    dec EDX
    jz ex
    jmp next
ex:
    mov EAX, DWORD PTR SX1   ; number of elements = size in bytes / 4
    shr EAX, 2
    cmp ECX, EAX
    jne write_cnt
    mov Counter, 0
    jmp quit
write_cnt:
    mov Counter, ECX
quit:
    ...
end start
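For comparison, the C++ standard library provides std::find, which performs the same kind of linear scan that scasd performs in Listing 4.6. The sketch below, with the hypothetical function count_before_zero_find, applies it to the task of this section. Like the listings, it returns zero both when no zero element exists and when the first element is zero.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Number of elements before the first zero in [first, last), or 0 if none.
int count_before_zero_find(const int* first, const int* last) {
    const int* zero = std::find(first, last, 0);  // linear scan, like scasd
    if (zero == last) return 0;                   // no zero element: report 0
    return static_cast<int>(zero - first);        // elements preceding the zero
}
```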


The do…while Loop

A do…while loop executes a block of any number of statements when the number of iterations is not known beforehand. The body of the loop is executed at least once, and the loop exits when its logical condition becomes false.

In C++, the do…while loop has the following form:

do {
    <statements>
} while (condition);
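Lowered to jumps, a do…while loop needs only one conditional jump, placed after the body. The hypothetical digit-counting function below tests its condition at the bottom; because the test comes last, the body runs once even for n = 0, so the function returns 1 in that case.

```cpp
#include <cassert>

// do { n /= 10; ++count; } while (n > 0); written with a test at the bottom.
int digits_do_while(unsigned n) {
    int count = 0;
BODY:
    n /= 10;                 // <statements>
    ++count;
    if (n > 0) goto BODY;    // cmp + Jcond: jump back while the condition is true
    return count;
}
```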

Write some code for computing the sum of the first four elements of an integer array. Let the number of its elements be seven. To implement this task, use the following variables:

● X1: an integer array of seven elements

● IX1: the index of the current element of the array

● sumX1: the current total value

A fragment of the C++ code shown in Listing 4.7 is quite simple.

Listing 4.7: A fragment of the C++ code that finds the sum of the first four elements of an array

...
int X1[7] = {2, -4, 5, 1, -1, 9, 3};
int IX1 = 0;
int sumX1 = 0;
do {
    sumX1 = sumX1 + X1[IX1];
    IX1++;
} while (IX1 <= 3);
...

An assembly variant of the do…while loop is shown in Listing 4.8.

Listing 4.8: The do…while loop implemented in the assembler

.686
.model flat
.data
X1    DD 2, -23, 5, 9, -1, 9, 3
SX1   DD $-X1
IX1   DD 1
CNT   EQU 3
SUMX1 DD 0
.code
start:
    push EBX
    mov EBX, offset X1
    mov EAX, 0
    mov EDX, DWORD PTR SX1
    shr EDX, 2
    cmp EDX, CNT + 1          ; the loop reads CNT + 1 = 4 elements
    jl EXIT
NEXT:
    add EAX, [EBX]
    cmp DWORD PTR IX1, CNT
    jg EXIT
    inc DWORD PTR IX1
    add EBX, 4
    jmp NEXT
EXIT:
    mov DWORD PTR SUMX1, EAX
    pop EBX
    ...
end start

First, all necessary variables are initialized. To access the elements of the array, its address is loaded into the EBX register:

    mov EBX, offset X1

The initial total, zero, is placed in the EAX register:

    mov EAX, 0

The condition of the do…while loop is checked after each element is added, with the following commands:

    cmp DWORD PTR IX1, CNT
    jg EXIT

where IX1 is the one-based number of the current element. Because IX1 starts at 1, the loop adds CNT + 1 = 4 elements before it exits.

Since each integer element of the array occupies a double word in memory, to access the next element you should increase the address by four, just like in the previous example:

    add EBX, 4

The result is stored in the SUMX1 variable for later use.
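The loop in Listing 4.8 can be transcribed into C++ to check how many elements it adds. In this sketch, the function name sum_first_four is hypothetical, ebx is a pointer standing in for the register, and the length check is written so that at least CNT + 1 elements are required, since the loop reads CNT + 1 = 4 elements.

```cpp
#include <cassert>

// Sum of the first CNT + 1 = 4 elements, with the test after the body.
int sum_first_four(const int* x1, int n) {
    const int CNT = 3;            // CNT EQU 3
    const int* ebx = x1;          // mov EBX, offset X1
    int eax = 0;                  // mov EAX, 0 (running total)
    int ix1 = 1;                  // IX1 DD 1 (one-based element number)
    if (n < CNT + 1) goto EXIT;   // not enough elements: leave the total at 0
NEXT:
    eax += *ebx;                  // add EAX, [EBX]
    if (ix1 > CNT) goto EXIT;     // cmp DWORD PTR IX1, CNT / jg EXIT
    ++ix1;                        // inc DWORD PTR IX1
    ++ebx;                        // add EBX, 4
    goto NEXT;                    // jmp NEXT
EXIT:
    return eax;                   // mov DWORD PTR SUMX1, EAX
}
```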


The for Loop

A for loop executes a statement or a group of statements a specified number of times.

In its general form, the loop looks like this:

for (initialization expression; condition; modifier expression) <statements>

When a program encounters the loop, it executes the initialization expression, which sets the loop counter to the initial value. Then the so-called terminating condition is evaluated. The loop executes while this expression is true.

Every time the loop body is executed, the modifier expression changes the loop counter. When the condition is false, the for loop terminates, and the statements that immediately follow it are executed.

In Visual C++, a for loop has a uniform syntax regardless of the direction of the modifier:

for (initialization expression; condition; modifier)

Most often, a for loop is used for iterative mathematical calculations with a constant increment and a known number of iterations, or it is used to search for elements in arrays or strings. Here, we will look at an example of using a for loop. Suppose you want to find the sum of the first seven elements of an array of 10 integers.

In C++ .NET, a fragment of the code could appear as shown in Listing 4.9.

Listing 4.9: A fragment of C++ code that uses a for loop

...
int i1[10] = {3, -5, 2, 7, -9, 1, -3, -7, -11, 15};
int isum = 0;
for (int cnt = 0; cnt < 7; cnt++) {
    isum = isum + i1[cnt];
}
...

In the assembler, it is convenient to implement a for loop with the loop command, which decrements ECX and jumps back to the start of the loop body while ECX is not zero. In this case, the loop counter is placed in the ECX register, and the current total is kept in the EAX register. The address of the i1 integer array is placed in the ESI register.

Before jumping to the next iteration, the address in ESI is increased by 4. After the loop terminates, the sum is stored in the isum variable.
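Listing 4.10 continues beyond this excerpt, so the following C++ sketch is based only on the description above and is not a reconstruction of the listing. It models the loop command, which decrements ECX and jumps back while ECX is not zero. The function name sum_with_loop is hypothetical, and the early return stands in for a guard such as jecxz, because loop entered with ECX = 0 would wrap around and run 2^32 times.

```cpp
#include <cassert>

// Sum of the first `count` elements, shaped like a loop-command loop.
// Assumes count >= 0.
int sum_with_loop(const int* i1, int count) {
    int eax = 0;                                  // running total (EAX)
    const int* esi = i1;                          // current element address (ESI)
    unsigned ecx = static_cast<unsigned>(count);  // loop counter (ECX)
    if (ecx == 0) return 0;                       // guard: avoid the 2^32 wrap-around
NEXT:
    eax += *esi;                                  // add the current element
    ++esi;                                        // add ESI, 4
    if (--ecx != 0) goto NEXT;                    // loop NEXT
    return eax;                                   // store the sum in isum
}
```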

The code fragment is shown in Listing 4.10.

Listing 4.10: An assembly variant of a program that finds the sum of the first seven elements of an array with the for loop

.686
.model flat
.data
i1   DD 3, -5, 2, 7, -9, 1, -3, -7, -11, 15
isum DD 0
.code
start:
    mov ECX, 7
