Can the same ASM code assemble to more than one possible binary (apart from offset values)?

  • I don't know x86 ASM very well, but I'm rather comfortable with SHARP-z80, and I know from experience that each instruction (mnemonic) has a corresponding byte/word value, so by looking at the hex dump of the assembled binary file I can "read back" the same code I wrote using mnemonics.

    In another SO question, somebody claimed that there are situations where ASM instructions are not translated to their corresponding binary values, but are instead rearranged by the assembler.

    I'm looking especially for cases where disassembling the binary would yield ASM code different from the original.

    In other words, are there any cases where assembly code does not map 1:1 to the assembled code?

    MikeKwan linked to another question where GCC would modify inline ASM code (in a C project). That's an interesting topic, but it doesn't answer this question: GCC is a compiler, it always tries to optimize code, and the translation of inline ASM is affected by the surrounding C code.

    Ira Baxter

    To the extent that the assembler's designers think it helpful, the assembler may substitute equivalent instructions that have other useful properties.

    First, there are machines with variable-length operand fields. If a value/offset fits into any of several variants, it is common for the assembler to substitute the shortest. (In such assemblers, it is also common to be able to force a particular size.) This is true of instructions that involve immediate operands and indexed addressing.
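    As a rough illustration of the "substitute the shortest" rule, here is a minimal Python sketch using x86 `add eax, imm` in 32-bit mode as the example. The function name `encode_add_eax_imm` and the `force_long` flag are made up for this sketch; they stand in for whatever size-override syntax a real assembler offers.

```python
import struct

def encode_add_eax_imm(value: int, force_long: bool = False) -> bytes:
    """Encode x86 (32-bit mode) `add eax, value`, preferring the shortest form."""
    if not force_long and -128 <= value <= 127:
        # 83 /0 ib: ADD r/m32, imm8 (sign-extended); ModRM byte C0 selects EAX
        return bytes([0x83, 0xC0]) + struct.pack("<b", value)
    # 05 id: ADD EAX, imm32
    return bytes([0x05]) + struct.pack("<i", value)

print(encode_add_eax_imm(1).hex(" "))                   # 83 c0 01
print(encode_add_eax_imm(1, force_long=True).hex(" "))  # 05 01 00 00 00
```

    Both byte sequences disassemble to the same `add eax, 1`, so the source and the disassembly still agree even though the binary is not unique.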

    Many machines have instructions with PC-relative offsets, commonly for JMPs, sometimes for load/store/arithmetic instructions. On encountering such an instruction during the first pass, an assembler can determine whether the addressed operand precedes the instruction or has not been seen yet. If it precedes, the assembler can choose a short relative form or a long relative form because it knows the offset. If it follows, the assembler doesn't know the offset, and generally chooses a long form of the instruction whose offset it fills in during pass 2. Similarly, there tend to be ways to force the assembler to choose the short form.
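    The first-pass decision described above could look roughly like this, using the x86 relative JMP (`EB rel8` versus `E9 rel32`) as the example. `encode_jmp` is a hypothetical helper, and passing `None` for an unknown forward target is just a convention of this sketch.

```python
import struct
from typing import Optional

def encode_jmp(here: int, target: Optional[int]) -> bytes:
    """Encode an x86 relative JMP located at address `here`.

    `target` is None for a forward reference whose address is not known yet.
    """
    if target is not None:
        rel8 = target - (here + 2)    # short form is 2 bytes: EB rel8
        if -128 <= rel8 <= 127:
            return b"\xEB" + struct.pack("<b", rel8)
        rel32 = target - (here + 5)   # long form is 5 bytes: E9 rel32
        return b"\xE9" + struct.pack("<i", rel32)
    # Forward reference: reserve the long form, displacement patched in pass 2
    return b"\xE9" + b"\x00" * 4

print(encode_jmp(0x100, 0x100).hex(" "))  # eb fe  (jump to itself)
print(len(encode_jmp(0x100, None)))       # 5
```

    Offsets are measured from the end of the instruction, which is why the instruction's own length enters the calculation.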

    Some machines don't have long relative jump instructions. In this case, the assembler will insert a short relative jmp backwards if the target precedes the jmp and is close by. If the target precedes but is far away, or the target is a forward reference, the assembler may insert a short relative jmp on the opposite branch condition, with its target just past the next instruction, followed by a long absolute jmp. (I've personally built assemblers like this.) This ensures that jmps can always get to their target.
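    Here is a minimal sketch of that expansion. The answer doesn't name a CPU, so this assumes the 6502 as an example of a machine with only 8-bit relative branches; `encode_branch` and the small opcode tables are illustrative, not taken from any particular assembler.

```python
import struct
from typing import Optional

# 6502 conditional branch opcodes (relative, 2 bytes) and their opposites
BRANCH_OPCODES = {"BNE": 0xD0, "BEQ": 0xF0, "BCC": 0x90, "BCS": 0xB0}
INVERSE = {"BNE": "BEQ", "BEQ": "BNE", "BCC": "BCS", "BCS": "BCC"}
JMP_ABS = 0x4C  # JMP absolute, 3 bytes

def encode_branch(mnemonic: str, here: int, target: Optional[int]) -> bytes:
    """Encode a conditional branch at `here`; `target` is None if not yet known."""
    if target is not None:
        rel = target - (here + 2)  # offset is relative to the next instruction
        if -128 <= rel <= 127:
            return bytes([BRANCH_OPCODES[mnemonic]]) + struct.pack("<b", rel)
    # Far or forward target: opposite condition hops over a 3-byte absolute JMP
    address = target if target is not None else 0  # placeholder until pass 2
    inverted = BRANCH_OPCODES[INVERSE[mnemonic]]
    return bytes([inverted, 0x03, JMP_ABS]) + struct.pack("<H", address)

print(encode_branch("BNE", 0x0810, 0x0800).hex(" "))  # d0 ee
print(encode_branch("BNE", 0x0900, 0x0800).hex(" "))  # f0 03 4c 00 08
```

    Disassembling the second result gives a `BEQ` followed by a `JMP`, which is valid assembly but not the single `BNE` the programmer wrote.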

    The good news about these tricks is that if you disassemble, you still get a valid assembly program.

    Now let's turn to the ones that will confuse your disassembler.

    A trick similar to the jump-relative one may be used for literal operands if the machine has short-relative addressing for load/store instructions and the programmer specifies loading a constant, or a value a long way away. In this case the assembler changes the instruction to refer to a literal or an address constant that follows an inserted short relative jmp around that constant. The disassembler thinks everything in the instruction stream is an instruction; in this case the literal value is not, and that would throw the disassembler off. At least there's an unconditional jmp around the literal to guide the disassembler.

    You may find screwier tricks in mature assemblers, where every stunt ever imagined is supported. One of my favorites on an 8-bit assembler was the "pseudo" instructions SKIP1 and SKIP2, which you can think of as extremely short relative branches. They were really just the opcode byte of the "CMP #8bits" and "CMP #16bits" instructions, and were used to jump around an 8-bit or 16-bit instruction respectively. So, a "one byte" relative jump rather than two. When you're squeezed for space, every byte counts :-{

          SKIP1
          INC    ; 8 bit instruction

    This was also handy when trying to implement a loop where some step shouldn't be performed on loop entry, but needs to be done on further loop iterations:

          SKIP2
    LOOP: SHLD  ; 16 bit instruction
          BNE LOOP

    The issue here is that if you disassemble the SKIP1 or SKIP2 instructions, you won't see the INC (or the corresponding 16 bit instruction).
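    To see why the hidden instruction disappears, here is a small sketch of a linear disassembler over a toy instruction set. The opcode values and the `TOY_ISA` table are made up for illustration and do not describe any real CPU; the SKIP1 byte is simply the opcode of the toy `CMP #imm8`.

```python
from typing import List

# Toy ISA (hypothetical opcodes): opcode -> (mnemonic, total length in bytes)
TOY_ISA = {
    0x01: ("NOP", 1),
    0x4C: ("INC", 1),
    0x81: ("CMP #imm8", 2),   # its opcode byte doubles as SKIP1
    0x8C: ("CMP #imm16", 3),  # its opcode byte doubles as SKIP2
}

def disassemble(code: bytes, start: int = 0) -> List[str]:
    """Decode instructions one after another, beginning at offset `start`."""
    lines = []
    pc = start
    while pc < len(code):
        name, length = TOY_ISA[code[pc]]
        operand = code[pc + 1 : pc + length]
        suffix = f"  (operand bytes: {operand.hex()})" if operand else ""
        lines.append(f"{pc:02X}: {name}{suffix}")
        pc += length
    return lines

code = bytes([0x81, 0x4C, 0x01])       # SKIP1, INC, NOP
print(disassemble(code))               # INC is swallowed as the CMP operand
print(disassemble(code, start=1))      # entering at offset 1 reveals the INC
```

    Both readings are legitimate: which one executes depends on whether control falls into the SKIP1 byte or jumps to the byte after it.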

    A trick used by assembly language programmers for passing parameters is to place them inline after the call, with the proviso that the called routine adjust the return address appropriately:

          CALL   foo
          DC     param1
          DC     param2

    Or

          CALL   printstring
          DC     "a variable length string",0

    There is no practical way that a disassembler can know that such a convention is being used or what that convention is, so the disassembler is bound to handle this wrong.
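    For illustration, here is a minimal simulation of what the called routine does under the zero-terminated string convention. The `printstring` function below is a hypothetical Python stand-in for the callee, and the memory image borrows the Z80 opcodes for CALL (CD) and RET (C9) purely as an example.

```python
from typing import Tuple

def printstring(memory: bytes, return_address: int) -> Tuple[str, int]:
    """Hypothetical callee: read the 0-terminated string stored right after
    the CALL, and return it with the adjusted return address."""
    end = memory.index(0, return_address)       # locate the terminating 0
    text = memory[return_address:end].decode("ascii")
    return text, end + 1                        # resume just past the terminator

# CALL 0x1000, then the inline string, then RET
memory = bytes([0xCD, 0x00, 0x10]) + b"hi\x00" + bytes([0xC9])

text, resume = printstring(memory, return_address=3)
print(text, resume)  # hi 6
```

    Only the callee knows where the data ends. A disassembler that keeps decoding at offset 3 would treat the string bytes as opcodes.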