EUG PD
1st April 1998
Categories: Description: Utility
Author: Mark Bellis
Published in EUG #37
With experience, 6502 code is simple to disassemble by hand, and you need to resort only occasionally to the Reference Manual for the table of opcodes.
However, disassembling a *DUMP listing is time-consuming, especially when you want to decode a whole 16K ROM.
In answer to this problem, I have written a disassembler in BASIC with an attempt to make it compatible with Elks by using Mode 6. (I'm not sure if it uses any BASIC IV statements that early machines don't understand.)
The program will disassemble a machine code file on disk, for which you can specify a pathname of up to 39 characters.
You must also give a filename for the output, again of up to 39 characters.
The program asks for the input file's load address, for two reasons:
- Some games relocate themselves once loaded, as they use filing system RAM (&E00 to &1900 or more in a BBC B). You may specify the start address as the relocation address, so that the JSRs match up.
- If you are hacking a ROM, it's likely that you will have copied it to &3000 before *SAVEing it, as the filing system would otherwise get in the way. This facility allows you to enter the ROM's original address.
The program then asks for the address to begin disassembly. To start &1000 bytes into the file, simply make the start address &1000 bigger than the load address. For the address inputs, the program weeds out non-hex characters. I don't know if Elks can use the INSTR instruction that does this though.
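The address handling described above can be sketched as follows. This is a minimal illustration in modern Python, not the BASIC of the program itself; the function names are hypothetical, and simply dropping every non-hex character is only one reading of "weeds out non-hex characters".

```python
HEX_DIGITS = "0123456789ABCDEF"


def parse_hex_address(text):
    """Keep only the hex digits of the typed text and return their value.

    Assumption: stray characters such as '&', spaces or 'G' are dropped
    silently; empty input is treated as zero.
    """
    digits = "".join(ch for ch in text.upper() if ch in HEX_DIGITS)
    return int(digits, 16) if digits else 0


def file_offset(load_address, start_address):
    """Offset into the file of the first byte to be disassembled."""
    return start_address - load_address


load = parse_hex_address("&8000")
start = parse_hex_address("&9000")
print(hex(file_offset(load, start)))
```

With a load address of &8000 and a start address of &9000, disassembly begins &1000 bytes into the file, as in the example given in the text.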
Lastly, the program asks for a P, E or W to indicate whether you are outputting to the Printer, an EXEC file or a Word-processor. Specifying W causes the program to leave out the line feeds when it sends lines to the file, so that you can load the resultant file directly into a word-processor and add your own comments to the code.
Specifying P leaves the line feeds in. To print the file, type:
CTRL-B *PRINT <filename> <RETURN> CTRL-C
Specifying E activates the following features:
- Top and tail code is added to the EXEC file to enable the resultant program to be run, provided the assembly address is suitable. The possible addition of a relocation feature would enable this address to be changed to be in user RAM, for reassembly.
The lines added are:
10 FOR pass%=0 TO 3 STEP 3 | 2-pass assembly is assumed, with listing on for error checking |
20 P%= &<address> | <address> is the disassembly start address |
30 [OPT pass% | |
xx ] | at the end of the file |
yy NEXT pass% | |
- The treatment of branch instructions changes:
Instead of printing just the offset from the current address, the destination address is output as P%-254+&xx for backward branches, and P%+2+&xx for forward branches.
The result of this is that the branch instructions will reassemble. Therefore relocatable code (with no JMPs or absolute data) will be able to be recreated exactly.
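The branch arithmetic can be checked with a short sketch. It is written in Python purely for illustration, with hypothetical function names; `pc` stands for the address of the branch opcode, which is what P% holds when the instruction is reassembled.

```python
def branch_target(pc, offset_byte):
    """Destination of a relative branch whose opcode is at address pc.

    The offset byte is signed and counts from the byte after the
    two-byte instruction, so backward branches (&80 to &FF) work out
    as pc + 2 + offset - 256, which is pc - 254 + offset.
    """
    if offset_byte >= 0x80:
        return pc - 254 + offset_byte
    return pc + 2 + offset_byte


def branch_operand_text(offset_byte):
    """Operand text in the form the article describes for EXEC output."""
    if offset_byte >= 0x80:
        return "P%-254+&{:02X}".format(offset_byte)
    return "P%+2+&{:02X}".format(offset_byte)


# BNE &F5 at &200F, as in the demo program, branches back to &2006.
print(hex(branch_target(0x200F, 0xF5)), branch_operand_text(0xF5))
```

Because the operand is expressed relative to P%, the same text reassembles correctly wherever the code is placed.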
If the relocation feature were added, it would still not be able to recalculate indirect addresses for pointers, which are often initialised by LDA#&xx : STAptr : LDA#&yy : STAptr+1, to point to &yyxx.
You will also have to recreate your own variable names.
The program now begins disassembly.
Four lines of *DUMP listing are displayed at a time, with the current byte at the top left. The ASCII characters keep up with the hex bytes, to help you to find strings in the file.
The memory location is in the left column. Location addresses are sent to the output file for each instruction decoded.
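The four-row display can be imitated with the sketch below. It is a Python illustration with hypothetical names, assuming eight bytes per row, "**" for positions past the end of the file (as in the demo listing in this article) and the *DUMP rule of printing "." for codes below 32 or above 126. A BBC shows &60 as a pound sign, whereas Python prints a backtick.

```python
def dump_row(address, data, width=8):
    """One display row: address, hex bytes, then the ASCII column."""
    cells = ["{:02X}".format(b) for b in data]
    cells += ["**"] * (width - len(data))      # past the end of the file
    text = "".join(chr(b) if 32 <= b <= 126 else "." for b in data)
    return "{:04X} {} {}".format(address, " ".join(cells), text)


def dump_window(address, data, rows=4, width=8):
    """Up to four rows, starting with the current byte at the top left."""
    lines = []
    for r in range(rows):
        chunk = data[r * width:(r + 1) * width]
        if not chunk:
            break
        lines.append(dump_row(address + r * width, chunk, width))
    return lines


tail = bytes([0x6C, 0x6C, 0x6F, 0x0A, 0x0D, 0x00])
print(dump_row(0x2018, tail))
```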
You are prompted to choose what to do with the current byte:
N.B. Do not press return after the O, B, W, D, S or E.
O | Opcode | The program will disassemble one instruction, according to its data file (OpcData). The Rockwell 65C02 BBR and BBS are not yet supported. I might add them on request. The program finds out from the data file which addressing mode the instruction uses, and decodes any extra bytes accordingly. The display is then updated by the number of bytes of the instruction, and the decoded instruction is displayed. |
B | Byte | The program decodes one byte as data, and prints EQUB &<byte>. |
W | Word | Decoded as EQUW &<2 bytes, highest first>. Therefore an address look-up table will allow addresses to be read normally in the printout. |
D | Double Word | Decoded as EQUD &<4 bytes in reverse order>. As above. |
S | String | The program prompts for the length of the string. Look for the next return character, 00 byte or 0A byte, and type in the number of characters, including spaces, in the string, pressing <RETURN>. See example below. The display updates by the length of the string, and EQUS "<string>" is sent to the file and displayed. |
E | End | End the program. The program closes the files and terminates. |
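The data options B, W, D and S amount to reading little-endian values from the file and printing them highest byte first. The sketch below shows this in Python; the function names are hypothetical and it is not taken from the program itself. Each function returns the directive text and the number of bytes consumed.

```python
def equb(data, pos):
    """One byte of data."""
    return "EQUB &{:02X}".format(data[pos]), 1


def equw(data, pos):
    """Two bytes, low byte first in memory, printed highest first."""
    value = data[pos] | (data[pos + 1] << 8)
    return "EQUW &{:04X}".format(value), 2


def equd(data, pos):
    """Four bytes, printed in reverse of their order in memory."""
    value = int.from_bytes(data[pos:pos + 4], "little")
    return "EQUD &{:08X}".format(value), 4


def equs(data, pos, length):
    """A string of the length typed in by the user."""
    text = "".join(chr(b) for b in data[pos:pos + length])
    return 'EQUS "{}"'.format(text), length


tail = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x0A, 0x0D, 0x00])
print(equs(tail, 0, 5)[0])
print(equw(tail, 5)[0])
print(equb(tail, 7)[0])
```

The bytes used here are the data at the end of the demo program that follows, so the three printed lines match the last three lines of its disassembly.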
Demo Program, DADEMO
Typing "*DADEMO" from the BASIC prompt will cause "Hello" to be printed.
Name of program to disassemble: | DADEMO |
Name of output file: | DEMODA |
Load address: | &2000 |
Start address: | &2000 |
Printer or WP: | P |
2000 08 48 8A 48 A2 00 BD 16 .H.H....
2008 20 20 EE FF E8 C9 00 D0   ......
2010 F5 68 AA 68 28 60 48 65 .h.h(£He
2018 6C 6C 6F 0A 0D 00 ** ** llo...
Key sequence for the above program:
OOOOOOOOOOOOOOOS5<RETURN>WB
(no E on the end, as the program terminates automatically when it sees the end of the input file)
The result:
2000 PHP
2001 PHA
2002 TXA
2003 PHA
2004 LDX #&00
2006 LDA &2016,X
2009 JSR &FFEE
200C INX
200D CMP #&00
200F BNE &F5
2011 PLA
2012 TAX
2013 PLA
2014 PLP
2015 RTS
2016 EQUS "Hello"
201B EQUW &0D0A
201D EQUB &00 "."
Note that the disassembler does not assign labels or variable names.
The "." after an EQUB is the ASCII code as printed by *DUMP ("." if <32 or >126), and is provided for password programs when one character at a time is tested. The character is also printed following an opcode using the Immediate addressing mode (i.e. data byte after the instruction), so that "LDA #&48" and "CMP #&65" demonstrate that the program under scrutiny loads "H" into the accumulator and compares the accumulator with "e" respectively.
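The opcode step behind the demo listing can be sketched as a table look-up. The Python below is an illustration only: the table holds just the thirteen opcodes the demo uses, and its layout is an assumption, not the format of the OpcData file. The ASCII character printed after immediate operands is left out.

```python
# opcode byte -> (mnemonic, addressing mode); demo opcodes only
OPCODES = {
    0x08: ("PHP", "implied"),   0x28: ("PLP", "implied"),
    0x48: ("PHA", "implied"),   0x68: ("PLA", "implied"),
    0x8A: ("TXA", "implied"),   0xAA: ("TAX", "implied"),
    0xE8: ("INX", "implied"),   0x60: ("RTS", "implied"),
    0xA2: ("LDX", "immediate"), 0xC9: ("CMP", "immediate"),
    0xD0: ("BNE", "relative"),  0x20: ("JSR", "absolute"),
    0xBD: ("LDA", "absolute,X"),
}


def decode(data, pos, base):
    """Decode one instruction; return (listing line, bytes used)."""
    mnemonic, mode = OPCODES[data[pos]]
    if mode == "implied":
        operand, size = "", 1
    elif mode == "immediate":
        operand, size = "#&{:02X}".format(data[pos + 1]), 2
    elif mode == "relative":
        operand, size = "&{:02X}".format(data[pos + 1]), 2
    else:
        # absolute and absolute,X: two-byte address, low byte first
        address = data[pos + 1] | (data[pos + 2] << 8)
        suffix = ",X" if mode == "absolute,X" else ""
        operand, size = "&{:04X}{}".format(address, suffix), 3
    return "{:04X} {} {}".format(base + pos, mnemonic, operand).rstrip(), size


def disassemble_run(data, base, count):
    """Equivalent of pressing O `count` times from the start of data."""
    lines, pos = [], 0
    for _ in range(count):
        line, size = decode(data, pos, base)
        lines.append(line)
        pos += size
    return lines


code = bytes([0x08, 0x48, 0x8A, 0x48, 0xA2, 0x00, 0xBD, 0x16, 0x20, 0x20, 0xEE,
              0xFF, 0xE8, 0xC9, 0x00, 0xD0, 0xF5, 0x68, 0xAA, 0x68, 0x28, 0x60])
for line in disassemble_run(code, 0x2000, 15):
    print(line)
```

Fifteen presses of O carry the demo from &2000 to the RTS at &2015, after which the remaining bytes are data and are decoded with S, W and B.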
Of course, using the disassembler requires some knowledge of 6502 machine code, in order that you can discern whether the current byte is an opcode or a piece of data, and so that you can achieve speed in disassembly by looking at the program and being able to type OOOOOOOOOO, knowing that ten opcode instructions are on the screen.
Considering that it used to take me about an hour to disassemble a 256-byte *DUMP listing, the time saving is enormous, and I can now do about four 256-byte pages in ten minutes. Theoretically, at that rate a whole 16K ROM (64 pages) could be done in about two and three-quarter hours, though it is better to split it into sections - it is best to end a section after an RTS or JMP instruction.
The output file can also grow very fast, at the rate of about 10 bytes per byte of machine code - decoding a ROM would take 160K of output files. I still intend to disassemble the MOS though, so that I can unravel its mysteries. I have decoded the part of the MOS which deals with OSWORDs 14 and 15 - the real-time clock code, and found the hardwired 1900-century code at &9881 in the Terminal ROM.
This disassembler has enabled hacking to go into warp drive! Try it and see for yourself!
Please feel free to request additional features, if they would be useful.
Mark Bellis, EUG #37