Wordwrapping In Assembler

 
Published in EUG #39

Say we have a block of text which was written in 80 column mode, but we want to show this in 40 columns instead. A crude way to do this would be simply to print it out unmodified, but this would result in some words being broken across the ends of lines.

It is therefore desirable to wordwrap the text. This may be done by considering some special cases:

Here are the 40 columns 0123456789012345678901234567890123456789
Here's a random string  ABCDE EFGHIJK LMNOPQ . RTSUVWXYZ,abcdef
                        ghij.

Fortunately this will look OK even if we simply printed it out. To wordwrap, you move a pointer to the fortieth character, then back up one character at a time until you find a space. Here, that happens on the very first character examined, as there's a space at the end of the line.

Here are the 40 columns 0123456789012345678901234567890123456789
Here's a random string  ABCDE EFGHIJK LMNOPQ . RTSUVWXYZ,abcdefg
                        hij.

Applying the above rule, the wordwrap will be made after the full stop, as that is where the first space is found when backing up. It would appear as:

Here are the 40 columns 0123456789012345678901234567890123456789
Here's a random string  ABCDE EFGHIJK LMNOPQ .
                        RTSUVWXYZ,abcdefghij.

which isn't very satisfactory either. If instead we look for any character with an ASCII value less than that of the digit '0' (ASCII 48), then the wordwrap will occur at the comma instead, as this has ASCII value 44. It'll look like:

Here are the 40 columns 0123456789012345678901234567890123456789
Here's a random string  ABCDE EFGHIJK LMNOPQ . RTSUVWXYZ,
                        abcdefghij.

Much better. But what happens in the extreme cases, such as a 42-letter word or a zero-length word? Well, when tracking backwards in search of a character less than ASCII 48, if you find that you've reached the beginning of the string, at offset 0, then simply print out a fixed 40 letters. This will leave you part way through the word, whereupon you can jump forward another forty letters and carry on as before. In this way, an 85-letter word would be split up into two fixed-length pieces of forty characters each, followed by a 'normal' string holding the remaining five.
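
The backwards search described above can be sketched in a few lines. This is only an illustration in Python rather than assembler, and the names "find_break", "WIDTH" and "limit" are made up for the example; none of them come from "U.WRAP". It assumes the text is a plain string, and that a limit of 33 is a fair stand-in for the "spaces only" rule.

```python
WIDTH = 40  # the target column count used throughout the article


def find_break(text, start=0, width=WIDTH, limit=48):
    """Return the index one past the last character to print on this line.

    Backs up from the width-th character looking for a character whose
    ASCII code is below `limit` (48 is the digit '0'; 33 roughly means
    "spaces only"). This is a hypothetical helper, not the U.WRAP code.
    """
    if len(text) - start <= width:
        return len(text)          # whatever is left fits on one line
    pos = start + width - 1       # the fortieth character of this line
    while pos > start:
        if ord(text[pos]) < limit:
            return pos + 1        # break just after the separator
        pos -= 1
    return start + width          # reached offset 0: print a fixed 40


sample = "ABCDE EFGHIJK LMNOPQ . RTSUVWXYZ,abcdefghij."
print(repr(sample[:find_break(sample, limit=33)]))  # spaces only
print(repr(sample[:find_break(sample)]))            # anything below '0'
```

With the second example string, the spaces-only rule breaks after the full stop and the below-'0' rule breaks after the comma, matching the two layouts shown earlier.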

By a zero-length word I mean a line with only a carriage return in it. These can be dealt with by performing a newline, then resuming the scan after the CR character.

The program "U.WRAP" will print out a wordwrapped text file to 40 columns, stopping when ASCII 0 is found.
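
For anyone who'd rather see the whole loop in one place before tackling the assembler, here is a rough sketch in Python. It is not a translation of "U.WRAP"; the function name "wrap" and its parameters are invented for illustration, and it assumes the text uses a single CR (ASCII 13) to end each line and ASCII 0 to end the text, as described above. It returns the lines in a list instead of printing them.

```python
WIDTH = 40
CR = "\r"    # ASCII 13, assumed to be the line-end marker in the text
NUL = "\0"   # ASCII 0, marks the end of the text


def wrap(text, width=WIDTH, limit=48):
    """Return the wordwrapped lines of `text` (illustrative sketch only)."""
    lines = []
    pos = 0
    while pos < len(text) and text[pos] != NUL:
        # Walk forward up to `width` characters, stopping early at CR or NUL.
        stop = pos
        while (stop < len(text) and stop - pos < width
               and text[stop] not in (CR, NUL)):
            stop += 1
        if stop < len(text) and text[stop] == CR:
            lines.append(text[pos:stop])  # may be empty: a CR-only line
            pos = stop + 1                # resume scanning after the CR
            continue
        if stop == len(text) or text[stop] == NUL:
            lines.append(text[pos:stop])  # final short line
            break
        # A full line with more to follow: back up to a separator.
        cut = stop - 1
        while cut > pos and ord(text[cut]) >= limit:
            cut -= 1
        # No separator before offset 0 means a fixed-width split.
        stop = cut + 1 if cut > pos else pos + width
        lines.append(text[pos:stop])
        pos = stop
    return lines


for line in wrap("ABCDE EFGHIJK LMNOPQ . RTSUVWXYZ,abcdefghij.\r\0"):
    print(line)
```

Note that a line broken at a space keeps that space on the end; whether you print or drop it is a matter of taste.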

That should give you all some food for thought if you're writing your own word processor, as a wordwrap routine is often very important. You'll probably want to do some range checking on the memory pointer held in "addrL" and "addrH", as this must, of course, point somewhere in RAM.

Another feature which could prove handy is the handling of the end-of-line markers used by other operating systems. DOS uses a CR and a linefeed, the Archimedes just uses the linefeed, and the BBC uses both!

To do this, print a newline whenever you find a character less than or equal to 13. You then look ahead one byte to see if that is also less than or equal to 13. If it is, and it IS different from the first (a linefeed following a CR, say), it is the second half of a two-byte line ending, so discard it. If it is the same as the first, leave it alone; this takes into account people who wanted two empty lines one after another in their text.
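
As a rough illustration of that look-ahead, here is a short Python sketch. The name "split_lines" is made up for the example, and it follows the rule literally, so any code from 1 to 13 counts as a line end (which includes TAB); it also assumes ASCII 0 still marks the end of the text.

```python
def split_lines(data):
    """Split bytes into lines, accepting CR, LF, CR+LF or LF+CR endings.

    A hypothetical helper following the rule in the text: any code from
    1 to 13 ends a line, and a *different* such code straight after it
    is thrown away as the second half of a two-byte ending.
    """
    lines = []
    current = bytearray()
    i = 0
    while i < len(data) and data[i] != 0:       # ASCII 0 ends the text
        code = data[i]
        if code <= 13:
            lines.append(current.decode("latin-1"))
            current = bytearray()
            # Look ahead one byte: discard it only if it is also a
            # line-end code AND differs from the one just seen.
            if (i + 1 < len(data) and 0 < data[i + 1] <= 13
                    and data[i + 1] != code):
                i += 1
        else:
            current.append(code)
        i += 1
    if current:
        lines.append(current.decode("latin-1"))  # unterminated last line
    return lines


print(split_lines(b"one\r\ntwo\n\rthree\r\rfour\n"))
```

Here the repeated CR after "three" is kept as a deliberate empty line, while the mixed pairs after "one" and "two" each count as a single line ending.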

And there it is!

Robert Sprowson, EUG #39
