Guide to Assembly Code Optimization



		              Mike Schmit's Top Ten Rules

	(Author John Allen, e-mail to Quantasm)
	                     Pairing Pentium Instructions
	   1. Both instructions must be simple.
	   2. Shifts or rotates can only pair in the U pipe.
	      (SHL, SHR, SAL, SAR, ROL, ROR, RCL or RCR)
	   3. ADC and SBB can only pair in the U pipe.
	   4.  JMP, CALL and Jcc can only pair in the V pipe. (Jcc = jump on
	   condition code).
	   5.  Neither  instruction  can  contain BOTH a displacement and an
	   immediate operand. For example:
	mov     word ptr [bx+2], 3  ; 2 is a displacement, 3 is immediate
	mov     mem1, 4             ; mem1 is a displacement, 4 is immediate

	   6. Prefixed instructions can only pair in the U pipe. This
	   includes extended instructions that start with 0Fh, except for the
	   near conditional jumps (0Fh 80h-8Fh) that the 386 and later
	   processors provide with 16- or 32-bit displacements. Examples of
	   prefixed instructions:
	mov     ES:[bx], ax ; segment override prefix
	mov     eax, [si]  ; 32-bit operand in 16-bit code segment
	mov     ax, [esi]  ; 16-bit operand in 32-bit code segment

	   7. The U-pipe instruction must be only 1 byte long, or the pair
	   will not issue together until the code executes from the cache a
	   second time.
	   8. There can be no read-after-write or write-after-write register
	   dependencies  between  the  instructions except for special cases
	   for the flags register and the stack pointer (rules 9 and 10).
	mov     ebx, 2   ; writes to EBX
	add     ecx, ebx ; reads EBX and ECX, writes to ECX
	                 ; EBX is read after being written, no pairing
	mov     ebx, 1   ; writes to EBX
	mov     ebx, 2   ; writes to EBX
	                 ; write after write, no pairing

	   9.  The  flags register exception allows an ALU instruction to be
	   paired  with  a  Jcc  even  though the ALU instruction writes the
	   flags and Jcc reads the flags. For example:
	cmp     al, 0    ; CMP modifies the flags
	je      addr     ; JE reads the flags, but pairs
	dec     cx       ; DEC modifies the flags
	jnz     loop1    ; JNZ reads the flags, but pairs

	   10. The stack pointer exception allows two PUSHes or two POPs to
	   pair even though both read and write SP (or ESP). For example:
	push    eax      ; ESP is read and modified
	push    ebx      ; ESP is read and modified, but still pairs
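As a cross-check, rules 2 to 4, 6, 8 and 10 can be read as one test on an ordered pair of instructions. The Python below is a minimal sketch under stated assumptions, not a model of the real decoder: the names `Kind`, `Insn`, `insn` and `can_pair` are hypothetical, both instructions are assumed to be simple already (rule 1), rules 5 and 7 are not checked, partial registers such as AL are folded into their 32-bit register, and the flags register is left out of the read/write sets altogether, which is how this sketch approximates rule 9.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    """Coarse instruction classes (hypothetical) needed by rules 2-4 and 10."""
    SIMPLE = auto()   # mov, add, cmp, inc, lea, ... with no pipe restriction
    SHIFT = auto()    # shl, shr, sal, sar, rol, ror, rcl, rcr
    ADC_SBB = auto()
    BRANCH = auto()   # near jmp, call and Jcc
    PUSH = auto()
    POP = auto()


@dataclass(frozen=True)
class Insn:
    kind: Kind
    reads: frozenset = frozenset()   # registers read, e.g. {"ebx"}; flags omitted
    writes: frozenset = frozenset()  # registers written; flags omitted
    prefixed: bool = False           # prefix byte or 0Fh opcode other than near Jcc (rule 6)


def insn(kind, reads=(), writes=(), prefixed=False):
    """Convenience constructor; partial registers (al, ax) are named as eax."""
    return Insn(kind, frozenset(reads), frozenset(writes), prefixed)


def can_pair(u: Insn, v: Insn) -> bool:
    """True if u (issued first, U pipe) and v (V pipe) may pair.

    Assumes both are already simple (rule 1); rules 5 and 7 are not checked.
    """
    if u.kind is Kind.BRANCH:                 # rule 4: branches pair only in V
        return False
    if v.kind in (Kind.SHIFT, Kind.ADC_SBB):  # rules 2 and 3: U pipe only
        return False
    if v.prefixed:                            # rule 6: prefixed ones U pipe only
        return False
    raw = u.writes & v.reads                  # read after write
    waw = u.writes & v.writes                 # write after write
    if u.kind is v.kind and u.kind in (Kind.PUSH, Kind.POP):
        raw -= {"esp"}                        # rule 10: stack pointer exception
        waw -= {"esp"}
    return not raw and not waw                # rule 8: no other dependency allowed


# The rule 8 and rule 10 examples from the text:
mov_ebx = insn(Kind.SIMPLE, writes={"ebx"})                        # mov ebx, 2
add_ecx = insn(Kind.SIMPLE, reads={"ecx", "ebx"}, writes={"ecx"})  # add ecx, ebx
push_eax = insn(Kind.PUSH, reads={"eax", "esp"}, writes={"esp"})   # push eax
push_ebx = insn(Kind.PUSH, reads={"ebx", "esp"}, writes={"esp"})   # push ebx
print(can_pair(mov_ebx, add_ecx))    # False
print(can_pair(push_eax, push_ebx))  # True
```

Because the flags are simply not tracked, a CMP followed by a JE pairs here for the same reason rule 9 gives; a real checker that tracked flags would need an explicit exemption for Jcc in the V pipe.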

	  Simple Instructions (for Pentium pairing)
	   The  following  is  a list of simple instructions, as required by
	   rule #1 above.
	Instruction format 16-bit example     32-bit example
	MOV reg, reg       mov ax, bx         mov eax, edx
	MOV reg, mem       mov ax, [bx]       mov eax, [edx]
	MOV reg, imm       mov ax, 1          mov eax, 1
	MOV mem, reg       mov [bx], ax       mov [edx], eax
	MOV mem, imm       mov [bx], 1        mov [edx], 1
	alu reg, reg       add ax, bx         cmp eax, edx
	alu reg, mem       add ax, [bx]       cmp eax, [edx]
	alu reg, imm       add ax, 1          cmp eax, 1
	alu mem, reg       add [bx], ax       cmp [edx], eax
	alu mem, imm       add [bx], 1        cmp [edx], 1

	where alu = add, adc, and, or, xor, sub, sbb, cmp, test

	INC  reg           inc  ax            inc  eax
	INC  mem           inc  var1          inc  [eax]
	DEC  reg           dec  bx            dec  ebx
	DEC  mem           dec  [bx]          dec  var2
	PUSH reg           push ax            push eax
	POP  reg           pop  ax            pop  eax
	LEA  reg, mem      lea  ax, [si+2]    lea  eax, [eax+4*esi+8]
	JMP  near          jmp  label         jmp  label2
	CALL near          call proc          call proc2
	Jcc  near          jz   lbl           jnz  lbl2

	where Jcc = ja, jae, jb, jbe, jg, jge, jl, jle, je, jne, jc, js,
	            jnp, jo, jp, jnbe, jnb, jnae, jna, jnle, jnl, jnge,
	            jng, jz, jnz, jnc, jns, jpo, jno, jpe

	NOP                nop                nop
	shift reg, 1       shl  ax, 1         rcl  eax, 1
	shift mem, 1       shr  [bx], 1       rcr  [ebx], 1
	shift reg, imm     sal  ax, 2         rol  esi, 2
	shift mem, imm     sar  [bx], 15      ror  [esi], 31

	where shift = shl, shr, sal, sar, rcl, rcr, rol, ror

	     * rcl and rcr are not pairable with immediate counts other than 1
	     * memory-immediate (mem, imm) instructions are not pairable when
	       the memory operand has a displacement (see rule 5)
	     * instructions with segment registers are not pairable
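The table and the notes under it can likewise be encoded as a lookup of operand forms. This is a hypothetical sketch (`SIMPLE_FORMS`, `Operand` and `is_simple` are illustrative names, not an existing tool) that knows only the forms listed above: it treats a shift count of 1 as an ordinary immediate, represents any branch target as a single "near" operand, and does not model pipe restrictions or register dependencies.

```python
from dataclasses import dataclass

# Operand kinds used below: "reg", "mem", "imm" and "near" (branch target).
TWO_OP = {("reg", "reg"), ("reg", "mem"), ("reg", "imm"), ("mem", "reg"), ("mem", "imm")}
ALU = {"add", "adc", "and", "or", "xor", "sub", "sbb", "cmp", "test"}
SHIFTS = {"shl", "shr", "sal", "sar", "rcl", "rcr", "rol", "ror"}
JCC = {"ja", "jae", "jb", "jbe", "jg", "jge", "jl", "jle", "je", "jne", "jc", "js",
       "jnp", "jo", "jp", "jnbe", "jnb", "jnae", "jna", "jnle", "jnl", "jnge",
       "jng", "jz", "jnz", "jnc", "jns", "jpo", "jno", "jpe"}
SEGMENT_REGS = {"cs", "ds", "es", "fs", "gs", "ss"}

# Mnemonic -> set of allowed operand-kind tuples, copied from the table.
SIMPLE_FORMS = {
    "mov": TWO_OP, **{m: TWO_OP for m in ALU},
    "inc": {("reg",), ("mem",)}, "dec": {("reg",), ("mem",)},
    "push": {("reg",)}, "pop": {("reg",)}, "lea": {("reg", "mem")},
    "jmp": {("near",)}, "call": {("near",)}, **{j: {("near",)} for j in JCC},
    "nop": {()},
    **{s: {("reg", "imm"), ("mem", "imm")} for s in SHIFTS},
}


@dataclass(frozen=True)
class Operand:
    kind: str               # "reg", "mem", "imm" or "near"
    name: str = ""          # register name when kind == "reg"
    has_disp: bool = False  # True for [bx+2] or a named variable such as mem1
    value: int = 0          # immediate value when kind == "imm"


def is_simple(mnemonic: str, *ops: Operand) -> bool:
    """True if the instruction is in the simple list and passes the notes."""
    m = mnemonic.lower()
    if tuple(op.kind for op in ops) not in SIMPLE_FORMS.get(m, set()):
        return False  # form not listed in the table (rule 1)
    if any(op.kind == "reg" and op.name.lower() in SEGMENT_REGS for op in ops):
        return False  # instructions with segment registers never pair
    if any(op.kind == "imm" for op in ops) and any(
            op.kind == "mem" and op.has_disp for op in ops):
        return False  # displacement plus immediate (rule 5)
    if m in ("rcl", "rcr") and [op.value for op in ops if op.kind == "imm"] != [1]:
        return False  # rcl/rcr pair only with a count of 1
    return True


print(is_simple("mov", Operand("mem", has_disp=True), Operand("imm", value=3)))  # False
print(is_simple("mov", Operand("mem"), Operand("imm", value=1)))                 # True
print(is_simple("rcr", Operand("mem"), Operand("imm", value=2)))                 # False
```

A True result only means the instruction is a candidate for pairing; whether it actually pairs still depends on the pipe and dependency rules earlier in the text.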

