A Fast Well-Behaved Pattern Flood Fill Alvin Albrecht Sinclair Basic comes with a number of graphics primitives that are easy to use but they only manage to scratch the surface when it comes to computer graphics. There are many interesting graphic tools that are practically begging to be implemented on the ts2068, the flood fill being one of them. But before we can tackle the subject of flood fills, a small review of recursion is in order. A Review of Recursion In plain terms a recursive subroutine is a subroutine that calls itself. A typical recursive algorithm breaks a complicated problem into simpler pieces, then applies itself to those pieces repeatedly until the original problem is broken into many tiny problems that can be trivially solved. I have written about recursion in a past issue of ZQA so I will not be rehashing that material here. Instead, as a refresher, let's investigate the one recursive algorithm that everyone sees in Computer Science 101 -- the computation of a factorial. N! (N factorial) is defined as (N)x(N-1)x(N-2)x…x1 with N being a positive integer. 5! = 5x4x3x2x1 = 120, for example. 0! is defined as 1. A recursive solution might look like this: int factorial(int n) { if (n <= 1) // if n<1 the answer is 1 return 1; return n*factorial(n-1); // else break into smaller problem } This is a C function that computes N! but don't let that put you off - C is a fairly easy language to understand. Sinclair Basic does not fully support recursion so a Basic version would require artifacts surrounding it, which would only obscure the point I want to make. Now that a decent C compiler is available for our T/S machines, I don't feel too guilty about sticking with the C. We make a call, asking to compute the factorial of N. If N <= 1 the answer is 1. Otherwise the problem is too difficult to solve so it is broken into a smaller one: N times the factorial of (N-1). That was easy, but it may be initially surprising to learn that this is a poor method for calculating factorials. To understand why, we'll need to pay closer attention to what is involved in a recursive call. The factorial function, as written above, needs to remember two things: the number N and where it was called from so that it can return there later. Compiling the above for a ts2068 using the z88dk C compiler would reserve two bytes to store N and two bytes for the return address, for a total of four bytes. The "return" value is passed in the z80's HL register pair which is free. So the initial call to compute 5! would require four bytes to be reserved on the stack. But that is not the end of the story. Factorial(5) would try to return "4*factorial(4)", causing another call to be made to factorial with N=4, requiring another four bytes. Factorial(4) would make a call to factorial(3), requiring a further four bytes, etc. In other words, to find the answer to factorial(5), the computer would make calls to factorial(5), factorial(4), factorial(3), factorial(2) and factorial(1). We say that factorial(5) has a recursion depth of 5 because a maximum of five instances of factorial will exist at any one time during its computation. With each instance needing four bytes to remember its value of N and return address, factorial(5) requires 5*4=20 bytes of memory to compute its result. Generalizing, we can say that the recursion depth of factorial(N) is N and 4*N bytes are required to compute the result. Even on our small 64K machines that doesn't seem to be a lot of memory and really it isn't. Things get worse if you try to compute something like 69! which would seem to require 276 bytes in the recursive solution. I say "seem to" because we would actually need to introduce a new large variable to hold the result in each recursive step -- 69! requires 41 bytes to hold its value precisely! Switching to a floating point representation would reduce that to four bytes (at the expense of precision), compared to the two bytes we've assumed (the result in each step is held in the HL register pair, a consequence of how the z88dk C compiler does things). With the way the subroutine is written now we couldn't correctly compute anything more than 8! and therefore memory usage is never really an issue. Other recursive algorithms may have a recursion depth in the thousands with dozens of bytes needed for each instance. Then the matter of memory is significant, and, indeed, you will see one such example shortly. Another problem with the factorial recursive solution is runtime. It takes time to set up calls and return from them -- these actions translate directly into pushes and pops on the z80's stack. Compare this recursive solution to the alternative iterative solution below: int factorial(int n) { int fact, i; fact = 1; // the answer starts at 1 for (i=2; i<=n; i++) // i=2, while i<=n execute the loop fact = fact*i; return fact; } We need two bytes for "fact", two bytes for "i", two bytes for "N" and two bytes for the return address = 8 bytes total no matter what "N" is. There are no calls to set up, just a for loop. This version will be faster and use up a small, fixed and predictable amount of memory. It is superior to the recursive solution in every way. So what is the conclusion of all this discussion? Recursion must be used with care. It can be a panacea to solve many very difficult problems, but you must be fully aware of how much memory will be required and the runtime necessary in comparison to an equivalent iterative solution. The Towers of Hanoi solution in a back issue of ZQA has a maximum recursion depth of 64 (for 64 disks) and the Knight's Tour has a recursion depth of 64 (the number of squares on a chess board), very manageable numbers. A Recursive Flood Fill Algorithm So what has all this got to do with flood filling? It turns out that the obvious approach to filling an arbitrary region involves a recursive solution. And the recursive solution is a bad one. In comp.sys.sinclair, Geoff Wearmouth shared a Basic subroutine from an early '80s type-in magazine that would fill an arbitrary area on screen bounded by a solid pixel boundary. Here it is: 5 REM AUTHOR UNKNOWN 10 CIRCLE 128,88,80 20 LET x=100 : LET y = 100 : REM START POINT 30 GO SUB 1000 : STOP 1000 PLOT x,y 1010 IF NOT POINT(x+1,y) THEN LET x=x+1 : GO SUB 1000 : LET x=x-1 1020 IF NOT POINT(x-1,y) THEN LET x=x-1 : GO SUB 1000 : LET x=x+1 1030 IF NOT POINT(x,y+1) THEN LET y=y+1 : GO SUB 1000 : LET y=y-1 1040 IF NOT POINT(x,y-1) THEN LET y=y-1 : GO SUB 1000 : LET y=y+1 1050 RETURN The main subroutine begins at line 1000 and the algorithm used is a recursive one, the evidence being the "GOSUB 1000" statements in the subroutine itself. The fill subroutine above plots the current point and then tries to move in all four directions away from the point. Before each move it checks to see if the point is already black, indicating a boundary. If not, it is considered a valid move and a fill is initiated from that point by another recursive call to line 1000 with the new pixel coordinate in (x,y). Earlier I mentioned that Sinclair Basic does not fully support recursion. The reason it doesn't can be seen in this fill program. Each run through the subroutine at 1000 expects to have its own private copy of (x,y). The value of (x,y) at 1010 must be the same value at lines 1020, 1030 and 1040 in order for the program to work. But there are one or more "GOSUB 1000" calls in the middle, which themselves require new values of (x,y) and which will themselves change (x,y). Sinclair Basic has only one copy of these variables which must be shared by each recursive call. A recursive C call would give each "GOSUB" a private copy of (x,y) on the stack independent of all other "GOSUBs". Not so in Sinclair Basic. So the problem is, after each "GOSUB 1000" in the fill subroutine, how do we make sure that our own (x,y) has not changed? In the above code, the solution is simple. We promise that when "GOSUB 1000" returns, the value of (x,y) is not changed from what it was when "GOSUB 1000" was executed. In line 1010, for example, x is increased by one before a call to "GOSUB 1000". Because of our promise (the jargon calls such a promise an "invariant") we know that when the GOSUB returns, x will be one larger than what it was at the beginning of line 1010. So to get x back to where it was, decrease by one and everything will be fine for the next line. Before the routine returns in line 1050 we know that (x,y) has not changed from its initial state in line 1000. That's the subroutine keeping its promise. This fill algorithm is called a flood fill because the fill "floods away" from the initial point in all directions. Other fill algorithms exist, but this one is both easy to understand and capable of filling any arbitrary region without restrictions. Earlier I hinted that a recursive solution to the flood fill problem is a bad one. I'll leave that thought here and come back to it later when we've looked at a couple of machine code implementations of the algorithm. For now, realize that each "GOSUB" requires the ts2068 to remember a return line number (two bytes) and then consider what the recursion depth might be for a 256x192 resolution blank screen (hint - you wouldn't be far off if you just multiplied 256 and 192 together!). If you typed in the Basic program and ran it, you'd realize that it is mighty slow. Any useful fill utility will need to be written in machine code. To do that, we will need to review the structure of the ts2068's display file. Display File Organization The ts2068's display file is where all the screen information is stored. The SCLD chip constructs the TV display by reading the information stored there. The display file is "memory-mapped" because the storage exists in the z80's memory space, from address 16384 to 22527. If you poke values into those addresses you will see the display change. In the ts2068's other display modes (dual screen, hi-colour, hi-res) more areas of memory are used to hold the display. In this article, we'll only concern ourselves with the default 256x192 mode. A pixel display occupying 16384 to 22527 reserves 6144 bytes to store all the screen information. The ts2068 has a resolution of 256*192 = 49152 pixels. How do we cram information about 49152 pixels into 6144 bytes? Well, each pixel can be represented by one bit - either one or zero, on or off. Cramming 8 pixels into a byte, we'd need 256*192/8 = 6144 bytes. Problem solved! A simple way to organize the display might have pixels 0..7 for the top line of the display stored at address 16384, pixels 8..15 at address 16385, … pixels 248-255 stored at address 16415. The next pixel line would follow with pixels 0..7 of line 1 at address 16416, and so forth for all 192 lines on the screen. This is indeed how the TV draws its display, left to right, top to bottom. But the display organization was chosen to optimize the printing of characters so it's not done in this simple manner. To see evidence of this, try this short program: 10 FOR z=16384 TO 22527 20 POKE z,255 30 NEXT z On the largest scale you will notice that the display is divided into three parts called blocks. First the top block is filled, then the second and finally the third. Each block is further divided into eight character lines. Each of these lines is divided into eight scan lines. The first scan line for all character lines in a block is filled, followed by the second scan line for all character lines, and so on to the final eighth scan line. Each scan line itself is composed of 32 horizontal bytes with each byte holding eight pixels. This organization sounds complicated but it really isn't that bad if some thought is applied to it. By paying attention to how the display is built up in increasing byte order, we can construct a screen address given block, character line, scan line and column as follows: FIGURE 1. Screen Address Organization in Binary 0 1 0 B B S S S L L L C C C C C Where: BB = screen block, 0..2 SSS = scan line, 0..7 LLL = character line, 0..7 CCCCC = horizontal byte / character, 0..31 While observing the Basic program in action, you'll notice that the horizontal column changes the fastest. There are 32 columns, requiring 5 bits to represent those. They increase the fastest so they appear in the bottom 5 bits of the 16-bit address. The next fastest thing that changes is the character line. There are 8 lines in each block, requiring 3 bits to represent them. These 3 bits appear next to the column bits. Next, in order of fastest changing, are the scan lines (8 of them requiring 3 bits) followed by the block (3 of them requiring 2 bits). The display starts at address 16384 (0x4000) so we add that to our 16-bit address. This is responsible for the lone '1' you see in figure 1. The character position row = 10, column = 12 is located in block 1 (the second block since it holds the second third of the display, rows 8..15), line 2 (the third character line in this block -- rows 8, 9, 10), scan lines 0 (top) through 7 (bottom) for the full character square, and column 12. This leads to a screen address that looks like: 0 1 0 0 1 S S S 0 1 0 0 1 1 0 0 With various values of SSS: SSS 0 1 2 3 4 5 6 7 Screen Address 484C 18508 494C 18764 4A4C 19020 4B4C 19276 4C4C 19532 4D4C 19788 4E4C 20044 4F4C 20300 To print a letter 'A' at (10,12), poke the appropriate values into memory at these addresses: POKE 18508,BIN 00000000 POKE 18764,BIN 00111100 POKE 19020,BIN 01000010 POKE 19276,BIN 01000010 POKE 19532,BIN 01111110 POKE 19788,BIN 01000010 POKE 20044,BIN 01000010 POKE 20300,BIN 00000000 At this point, you may realize why UDGs and printed characters are 8x8 pixels in size. There are 8 vertical scan lines in each character line and there are 8 pixels packed into a byte. But you may not realize why this particular display file organization speeds up character printing. If you back up and look at the screen addresses computed above, you'll notice that each scan line is separated by exactly 256 bytes. In assembly language, an address is held in a register pair, like HL. Adding 256 to an address to move to the next scan line is a simple matter of incrementing the most significant register, in this case H with the "INC H" instruction. That's all it takes! Moving horizontally to the right one character position involves adding one to the screen address (ie adding one to "CCCCC" in figure 1), which can be done just as quickly with "INC L". You can’t get any faster than that. In fact, this display file organization was patented by Sinclair's Richard Altwasser back in 1982 (visit http://wearmouth.demon.co.uk/ to see the patent). That's all fine and good but we still haven’t managed to easily map a pixel coordinate to a screen address. Here's how we do it: FIGURE 2. Mapping Pixel Coordinates to Screen Address Units X Y C C C C C T T T B B L L L S S S The thought process that led to figure 2 is similar to the previous one. The X coordinate is more or less obvious: there are 32 columns horizontally (5 bits) with each column containing 8 pixels (requiring 3 bits). The pixel position within a byte (0..7) changes fastest as we move horizontally so it appears as "TTT" in the least significant bits of X. For the Y coordinate, the fastest changing items as we move from the top of the screen to the bottom are the scan line, followed by the character line, followed by the block. Given an X coordinate in the range 0-255 and a Y coordinate in the range 0-192, convert them to binary as in figure 2 and reassemble the bits as in figure 1. For example, pixel coordinate (x,y) = (133,67) in binary is (1000 0101, 0100 0011) with CCCCC=10000, BB=01, LLL=000, SSS=011 according to figure 2. Moving the bits around to the form in figure 1 gives an address of "0100 1011 0001 0000" or 19216 in decimal. The bits "TTT" in the X coordinate do not appear in figure 1. They identify which bit within the screen byte corresponds to the individual pixel. "0" corresponds to the leftmost bit and "7" corresponds to the rightmost; in this case it's 5. To plot the pixel (133,67) we could simply "POKE 19216,BIN 00000100" where the single '1' in the BIN statement sits in bit 5 from the left. Keep in mind that the pixel coordinates I am using have the screen's origin located at the top left corner of the screen. This is different from TS2068 Basic which places the origin 16 pixels above the bottom left corner of the screen. If this procedure had to be done by hand for each pixel, it would get tedious quickly. Here's a short machine code routine that does it for us: ; Get Screen Address ; ; Returns the screen address and pixel mask corresponding ; to a given pixel coordinate. ; ; enter: a = h = y coord ; l = x coord ; exit : de = screen address, b = pixel mask ; uses : af, b, de, hl .SPGetScrnAddr and $07 ; A = 00000SSS or $40 ; A = 01000SSS ld d,a ; D = 01000SSS ld a,h ; A = Y coord = BBLLLSSS rra rra rra ; A = ???BBLLL and $18 ; A = 000BB000 or d ; A = 010BBSSS ld d,a ; D = 010BBSSS top 8 bits of address done ld a,l ; A = X coord = CCCCCTTT and $07 ; A = 00000TTT ld b,a ; B = 00000TTT = which pixel? ld a,$80 ; A = 10000000 jr z, norotate ; if B=0, A is the right pixel so skip .rotloop rra ; rotate the pixel right one place B times djnz rotloop .norotate ld b,a ; B = pixel mask srl l srl l srl l ; L = 000CCCCC ld a,h ; A = Y coord = BBLLLSSS rla rla ; A = LLLSSS?? and $e0 ; A = LLL00000 or l ; A = LLLCCCCC ld e,a ; E = LLLCCCCC ret ; DE = 010BBSS LLLCCCCC, the screen address! The subroutine is called with A=H=Y coordinate and L=X coordinate and we get the screen address in DE and the pixel mask in B on the way out. If we ORed B into (DE), we could plot the pixel. If we ANDed the complement of B into (DE), we could unplot the pixel and if we ANDed B with (DE) we could test whether the pixel was set. This subroutine is great for calculating a screen address corresponding to a pixel position from scratch, but you'll notice that it is rather lengthy and therefore slow, in a relative sense. Frequently you'll be plotting a pixel and then plotting many more nearby, possibly a single pixel away. For example, in the process of drawing a line, the initial point is plotted and then succeeding points above, below, to the left or right are plotted. We could handle the drawing of the line as plotting many individual pixel points, calling the above subroutine to compute the screen address for every pixel, but that would be much slower than working directly on the screen address to move up, down, left and right from a current pixel position. Let’s investigate further to substantiate that claim. Given a screen address in HL and a pixel mask in B, how would one move left one pixel? Here’s the necessary code: ; hl = screen address, B = pixel mask .left rlc b ret nc dec l ret The pixel mask is rotated left one bit. This will be a valid pixel position unless B was already at the leftmost pixel position in the screen byte (ie B=1000 0000). The "RLC B" instruction will set the carry flag in that case and leave B=0000 0001. We use the no carry flag to return early if the new mask is valid, otherwise we update the column position one character to the left by decreasing the "CCCCC" portion of the screen address. The value of B at this point is 0000 0001, correctly masking the rightmost pixel in the new screen byte to the left of the old one. These four instructions are clearly quicker than rerunning the screen address subroutine. Notice that this subroutine doesn't check if it runs off the edge of the screen. The right pixel movement is similar, substituting "rrc b" for "rlc b" and "inc l" for "dec l". To move up a pixel we need to decrement the Y coordinate, as pictured in figure 2. Given a screen address, this means first decreasing SSS followed by LLL (if necessary) and finally BB (if necessary). These bits are scattered about in the screen address pictured in figure 1 so a little care must be taken. The necessary code is shown here: ; hl = screen address .SPPixelUp ld a,h ; A=H=010BBSSS dec h ; decrease SSS and $07 ; if SSS was not originally 000 ret nz ; we're done ld a,$08 ; otherwise SSS=111 (correct) add a,h ; and we fix BB in H (one was subtracted) ld h,a ld a,l ; A=X coord=LLLCCCCC sub $20 ; decrease LLL ld l,a ret nc ; if no carry, LLL was not originally 000, okay ld a,h ; otherwise LLL=111 now, that's okay sub $08 ; but need to decrease screen block ld h,a ret This subroutine derives a lot of speed by minimizing the number of instructions executed in the most common cases. For example, 7 out of 8 times, only the first four instructions will be executed. 7 out 64 times, the first 11 instructions will execute and the rest of the time (1 out of 64) all the instructions will execute. This makes the subroutine much quicker than one would initially guess by looking at the size of the code. The PixelDown subroutine is similar but is not shown here. All these pixel movement routines are reprinted in full in the floodfill listings elsewhere in this article. That's enough information to have a first crack at a machine code version of the Basic flood fill routine. Machine Code Flood Fills Figure 3 contains a direct conversion of the Basic flood fill we saw earlier. No optimization has been done but it has been improved slightly to check for moving across screen boundaries. Type in the associated Basic listing to see it in action. We have managed to speed things up considerably by moving to machine code, but there are still a couple of improvements that can be made. First, we compute screen addresses for every single point plotted. Since we always move up, down, left or right from the current pixel we could speed things up by avoiding this computation as discussed above. The other optimization we can make is to plot 8 pixels at a time rather than one. Recall that each screen byte holds eight pixels. Why fiddle with it eight times to plot eight pixels in it when we could plot all 8 pixels at once with a single write of a whole byte? The secret to plotting multiple pixels at once is the bytefill subroutine. It operates directly on a screen address and pixel mask, exactly what we will have available now that we have decided not to compute the screen address for every single pixel plotted. ; hl = screen address ; b = incoming pixel mask .Bytefill ld a,b ; get pixel mask xor (hl) ; zero out incoming pixels that and b ; run into set pixels in display ret z ; if no pixels left, ret .bfloop ; carry flag never set here ld b,a ; b = incoming pixels rra ; expand incoming pixels ld c,a ; to the right and left ld a,b ; within byte add a,a or c or b ; a = incoming pixels wiggled ld c,a ; save in c xor (hl) ; zero out pixels that run into and c ; set pixels on display cp b ; have pixels changed from last loop? jr nz, bfloop ; keep going until incoming does not change or (hl) ld (hl),a ; fill byte on screen scf ; indicate that this was a viable step ret Bytefill is called with a screen address in HL and a pixel mask containing all the "incoming" pixels. The incoming pixels are those pixels from where the flood fill grows in the current screen byte. Previously the flood fill always grew from a single point, but not anymore. The origin of the incoming byte will be clear while perusing the second flood fill listing in figure 4. The Bytefill routine takes the incoming pixels and "wiggles" them to the left and right, trying to grow them into blank spaces within the screen byte. It does this until no more growth is possible within the screen byte. It then plots all those pixels and returns. Putting these ideas into action produces figure 4, a byte-at-a-time flood fill routine. Type in the Basic listing to see it in action. This routine is blazing fast; you will not see anything quicker. But, and this is a big but, there is a major flaw in the program that is shared with all the previous fills we have seen so far: the recursion depth is huge. Consider a flood fill from the bottom left corner of a blank screen. According to figure 4, the first thing that is done is the fill of the entire screen byte in the bottom left corner. Then a move to the right is made and its byte is filled in a recursive call to "fill". Followed by another right move and call to "fill", then another, until we hit the right edge of the screen. A right move is not possible from the right edge of the screen so a left move is tried from there. That is unsuccessful because it was just filled. An up movement from the right edge is tried, successfully. Now we are at the right edge, one pixel up from the bottom of the screen. The filler fills in the byte and tries a right movement. That's not possible because a screen boundary is hit so it successfully tries a left movement. If this is carried on you'll notice that, from the bottom left corner of a blank screen, the screen is filled alternately from the left to the right and then from the right to the left as the fill line moves one pixel higher for each scan line filled. You may have noticed that not a single return instruction is executed during the entire screen fill. That is a problem. Each call to fill puts at least 2 bytes on the stack to remember the return address. Since no returns are made for all screen bytes on the screen, there are 6144 calls made to fill without a single return. That's a recursion depth of 6144! Since at least 2 bytes are saved on the stack for each recursive call, at least 12288 bytes are needed to complete the fill. The situation can be much worse, however. In the worst case movement (up or down) 6 bytes are saved on a recursive call (two for BC, two for HL and two for the return address). We may need up to 36864 bytes to fill the screen! The truth is somewhere in the middle. Notice that by moving from a pixel fill to a byte fill we have reduced the depth of recursion by a factor of eight since entire screen bytes are considered rather than individual pixels. Still this amount of memory usage is unacceptable for most applications. How can you use a flood fill in your own programs if it's going to need most of the available memory to complete? This recursive fill is an example of a depth-first algorithm. It fills an area by going as deep as possible into the area (one call begets another call begets another, etc. without returning). The result is a memory requirement, computed as recursion depth times size of state information for each recursive step, that is proportional to the area to be filled. We can rescue the situation by considering another algorithmic approach, known as a breadth-first algorithm. Instead of going as deep as possible into an area, we try going wide first. This sounds like a lot of metaphysical talk of questionable value, but the terms "depth-first" and "breadth-first" are bonafide jargon that is used to describe the solution behaviour of many kinds of algorithms. A breadth-first approach to a flood fill would try to fill all points in the immediate area first. From a starting point, all pixels to the immediate left, right, top and bottom are filled. Then for all those adjacent pixels, their immediate neighbours are filled, etc. The savings come from a key observation: once the immediate neighbours of a pixel are filled, there is no need to remember (come back to) the current pixel. Its information can be forgotten. This was not possible in the previous depth-first approaches. As will be seen later, the breadth-first approach will have a memory requirement proportional to the circumference of the area being filled, a significant savings. The Breadth-First Approach To implement the breadth-first approach, we will need to introduce a queue to hold future pixel positions that need investigating. From a currently filled screen byte (imagine the first screen byte to get the ball rolling), each direction should be investigated for possible flood fill expansion. If a move in any direction is possible (ie no pixel boundary was met), the surrounding pixel should be filled and added to the end of the queue for later investigation. Once all directions have been looked into, the next screen byte to investigate is retrieved from the front of the queue. Its immediate neighbours are then investigated with potental expansion pixels added to the end of the queue as before. This loop is repeated until there are no more screen bytes to investigate, indicated by an empty queue. Figure 5 is the implementation. Run the Basic program to see it in action. We have lost a little speed, but you'll notice that the fill seems to progress in a saner manner. In the previous version the fill spread out all over the place. Now the fill expands along a diamond-shaped boundary that grows away from the starting pixel. A little thought will reveal that, at any moment, the screen bytes in the queue are those screen bytes around the edge of the expanding diamond-shaped fill boundary. As a result, the necessary queue size to fill an area is roughly proportional to the circumference of the area to be filled. The implementation in figure 5 allocates space on the stack for a queue whose size is determined by the caller. If at any moment the queue is found not to be large enough to complete the fill, the fill is aborted. This pleasant side effect of the breadth-first approach allows the caller to control how much memory is available for the fill. In previous versions there was no control, with the fill taking as much memory as needed. With the necessary queue size to fill an area proportional to the circumference of the area to be filled, I have found the queue size can be as small as 100 screen positions for a complete fill of a typical screen. At three bytes per screen position, that adds up to a total memory requirement of about 300 bytes, a vast improvement over the previous requirement of between 12288 and 36864 bytes! The Pattern Fill This flood fill would become much more interesting if we were able to apply a pattern while filling an area rather than being stuck with the same old black. That would be the next logical step to take and really it's a small one compared to the large steps we have taken so far. The first question to answer is how can we apply a pattern to the fill region? The procedure is fairly straightforward. Before writing the solid black fill byte to each screen position, logically AND it with a pattern byte. If the pattern is "10101010" and the fill byte is "00011111", the screen should be written with the logical AND of the two: "00001010". If the pattern were just a single byte, however, we could never make a filled area appear as anything more interesting than a collection of vertical stripes using this method. To add some more variation I decided on an 8x8 pixel pattern defined in the same way as a UDG graphic. To determine which byte of the pattern UDG to use, the scan line bits in the screen address of the fill byte is used as an index. This is ideal since the scan line bits iterate through 0-7 repeatedly from the top to the bottom of the screen. For example, if the pattern UDG is stored beginning at address "x" and the screen address is held in register HL, then the pattern byte to use is stored at address "x + H&0x7" where "&" represents a logical AND. This byte should be read and ANDed with the fill byte, followed by a write to the screen to get the desired effect. It is not quite as simple as that, however. With the breadth-first algorithm we have now, the flood fill expands from a diamond-shaped boundary centered on the start pixel. If the naïve approach is taken, as suggested above, and the screen is written with a patterned fill byte, the boundary surrounding the start pixel would have holes in it wherever the pattern byte held a '0' bit. What would prevent the flood fill from expanding through those holes back to the start pixel from where it came? Nothing! The result could be a flood fill that constantly grew into and out of itself, possibly never terminating! This problem occurs because of holes in the outermost boundary of the fill area. To avoid this problem, the outermost boundary needs to be kept black, with the pattern only applied to pixels in the interior of this boundary. The implementation in figure 6 maintains three regions within the queue, called the new block, the investigate block and the pattern block. Each block is delimited by a special sentinel to indicate block boundaries within the queue. The investigate block contains the outermost boundary screen bytes from where the flood fill is growing. This corresponds to the set of screen bytes that were in the queue in the previous black flood-fill. As before, the flood fill attempts to expand from each screen byte in the investigate block in all directions; a successful expansion is added to the new block in the queue. The pattern block contains all those screen bytes that were previously investigated, representing the former outermost boundary. Initially the new block and the pattern block are empty and the investigate block contains the single screen byte representing the starting point of the fill. Fill bytes in the new block are written as solid black to the screen. Once the investigate block has been completely investigated, the screen bytes in the pattern block (currently all black) have the pattern applied to them. Then the points in the investigate block become the pattern block and the new block becomes the investigate block. The loop repeats until the pattern block is empty. This algorithm maintains a solid black boundary two pixels thick around the fill region. The outermost boundary is the investigate block from where the fill grows and the innermost boundary is the pattern block, kept black to avoid an inward growth by the fill algorithm. The queue now needs to be large enough to hold three circumferences of the fill region, one circumference per block in the queue. Previously it was found that 100 queue positions were needed per circumference so the pattern fill will need about 300 queue positions to complete most fills on screen. That corresponds to about 900 bytes of memory, still an acceptable memory requirement for almost all applications! The assembly listing in figure 6 is rather large so I have not produced a Basic listing that you can type in and run to see the pattern fill in action. Instead I have supplied a C program that calls the pattern fill subroutine made available through the Sprite Pack library. The Sprite Pack library is a collection of various assembly language subroutines I have written over the years and made available as a C library for C programs compiled using z88dk. Another article in this issue of ZQA explains this further. You can see the demo in action on your ts2068 or an emulator by downloading the compiled program from my website at "http://justme895.tripod.com/zqa/fillprogs.zip". This zip file contains "pfill.tap" which can be loaded and run in an emulator or on the real machine as described in the other article. A screenshot of the demo in action can be seen elsewhere in this article. Figure 3. Pixel Coordinate Depth-First Flood Fill ; SPGetScrnAddr is not reprinted here to reduce the ; size of the article. It can be found in the article text. .test ld h,96 ; starting y coordinate ld l,128 ; starting x coordinate ; Flood Fill Version 1 ; H = Y coord 0..191, L = X coord 0..255 .flood1 push hl ; save (x,y) coordinate ld a,h ; GetScrnAddr requires A=H call SPGetScrnAddr ; compute screen address pop hl ; restore (x,y) in HL ld a,(de) ; byte on screen and b ; check if this pixel is set ret nz ; if so, hit boundary so ret ld a,(de) ; get screen byte or b ; set this pixel ld (de),a ; plot it on screen .right inc l ; move pixel coord right call nz, flood1 ; if no wrap 255->0 dec l ; restore x coord .left dec l ; move pixel coord left ld a,l inc a call nz, flood1 ; if no wrap 0->255 inc l ; restore x coord .up dec h ; move pixel coord up ld a,h inc a call nz, flood1 ; if no wrap 0->255 inc h ; restore y coord .down inc h ; move pixel coord down ld a,h cp 192 call c, flood1 ; if less than 192 dec h ; restore y coord ret 10 REM COPY MACHINE CODE INTO MEMORY 20 FOR n=32768 TO 32850: READ a: POKE n,a: NEXT n 30 REM DRAW SOME CIRCLES ON DISPLAY 40 FOR n=1 TO 10 50 LET x=INT (RND*256) 60 LET y=INT (RND*176) 70 LET r=INT (RND*40) 80 IF x-r<0 OR y-r<0 OR x+r>255 OR y+r>175 THEN GO TO 50 90 CIRCLE x,y,r: NEXT n: CIRCLE 127,88,87 95 REM THIS LAST CIRCLE IS NEEDED TO ENFORCE A MAXIMUM SIZE FILL REGION, OTHERWISE THERE MAY NOT BE ENOUGH MEMORY FOR THE FILL WHICH WOULD LEAD TO A CRASH 100 LET a=USR 32768: PAUSE 50: RUN 1000 DATA 38,96,46,128,229,124,205,44,128,225,26 1010 DATA 160,192,26,176,18,44,196,4,128,45,45 1020 DATA 125,60,196,4,128,44,37,124,60,196,4,128 1030 DATA 36,36,124,254,192,220,4,128,37,201 1035 REM SPGetScrnAddr 1040 DATA 230,7,246,64,87,124,31,31,31,230,24,178 1050 DATA 87,125,230,7,71,62,128,40,3,31,16,253 1060 DATA 71,203,61,203,61,203,61,124,23,23,230 1070 DATA 224,181,95,201 Figure 4. Byte-At-A-Time Depth-First Flood Fill ; SPGetScrnAddr, Bytefill and SPPixelDown are not reprinted here ; to reduce the size of the article. They can be found in the ; article text. .test ld h,96 ; start pixel at centre of screen ld l,128 ; byte at a time fill ; h = y coord, l = x coord .flood2 ld a,h call SPGetScrnAddr ; b = pixel mask ex de,hl ; hl = screen address .fill call Bytefill ; wiggle around incoming pixel mask ret nc ; if incoming pixels hit boundary, ret .up push hl ; save screen address call SPPixelUp ; move up one pixel jr c, offscreen1 push bc ; save pixel mask call fill ; try to fill from new screen position pop bc ; moving up, pixel mask remains same .offscreen1 pop hl .down ; a replay of up push hl call SPPixelDown jr c, offscreen2 push bc call fill pop bc .offscreen2 pop hl .right bit 0,b ; if first pixel in mask set, try right jr z, left inc l ; move right one byte ld a,l ; have we wrapped off screen? and $1f ; (if so, CCCCC=0 now) jr z, offscreen3 push bc ; save current pixel mask ld b,$80 ; new incoming mask = leftmost pixel set call fill ; fill from new screen position pop bc .offscreen3 dec l .left ; a replay of right bit 7,b ret z ld a,l and $1f ret z dec l ld b,$01 call fill inc l ret ; enter: HL = valid screen address ; exit : Carry = moved off screen ; HL = moves one pixel up ; used : AF, HL .SPPixelUp ld a,h dec h and $07 ret nz ld a,$08 add a,h ld h,a ld a,l sub $20 ld l,a ret nc ld a,h sub $08 ld h,a cp $40 ret 10 REM COPY MACHINE CODE INTO MEMORY 20 FOR n=32768 TO 32941: READ a: POKE n,a: NEXT n 30 REM DRAW SOME CIRCLES ON DISPLAY 40 FOR n=1 TO 10 50 LET x=INT (RND*256) 60 LET y=INT (RND*176) 70 LET r=INT (RND*40) 80 IF x-r<0 OR y-r<0 OR x+r>255 OR y+r>175 THEN GO TO 50 90 CIRCLE x,y,r: NEXT n: CIRCLE 127,88,87 100 LET a=USR 32768: PAUSE 50: RUN 1000 DATA 38,96,46,128,124,205,70,128,235,205,109,128 1010 DATA 208,229,205,131,128,56,5,197,205,9,128,193 1020 DATA 225,229,205,152,128,56,5,197,205,9,128,193 1030 DATA 225,203,64,40,14,44,125,230,31,40,7,197,6 1040 DATA 128,205,9,128,193,45,203,120,200,125,230,31 1050 DATA 200,45,6,1,205,9,128,44,201 1055 REM SPGetScrnAddr 1060 DATA 230,7,246,64,87,124,31,31,31,230,24,178 1070 DATA 87,125,230,7,71,62,128,40,3,31,16,253 1080 DATA 71,203,61,203,61,203,61,124,23,23,230 1090 DATA 224,181,95,201 1095 REM ByteFill 1100 DATA 120,174,160,200,71,31,79,120,135,177 1110 DATA 176,79,174,161,184,194,113,128,182,119 1120 DATA 55,201 1125 REM SPPixelUp 1130 DATA 124,37,230,7,192,62,8,132,103,125,214 1140 DATA 32,111,208,124,214,8,103,254,64,201 1145 REM SPPixelDown 1150 DATA 36,124,230,7,192,124,214,8,103,125,198 1160 DATA 32,111,208,124,198,8,103,254,88,63,201 Figure 5. Breadth-First Black Flood Fill ; GetScrnAddr, Bytefill, PixelUp and PixelDown have ; been omitted for brevity. You can find them ; elsewhere in this article. .test ld l,128 ; x coord ld h,96 ; y coord ld bc,100 ; queue size call ffill ret ; enter: h = y coord, l = x coord, bc = queue size ; used : ix, af, bc, de, hl ; exit : this version does not bail, but portions of the screen may not be ; filled if the queue size was too small ; stack: 3*bc+12 bytes, not including the call to ffill or interrupts .ffill ld a,h cp 192 ret nc ; if y coord out of bounds dec bc ; we will start with one struct in the queue push bc ; save max stack depth variable call getscrnaddr ; de = screen address, b = pixel byte ex de,hl ; hl = screen address call bytefill ; b = fill byte jr c, viable pop bc ret .viable ld ix,-1 add ix,sp ; ix = top of queue = initial investigate block push hl ; screen address and fill byte are push bc ; first struct in investigate block inc sp xor a push af ; mark end of investigate block dec sp ld c,(ix+1) ; reserve space on stack for queue ld b,(ix+2) ; bc = max stack depth - 1 inc bc ld l,c ld h,b add hl,bc ; space required = 3*BC (max depth) + 7 add hl,bc ; but have already taken 6 bytes and the ld c,l ; queue end marker is pushed below. ld b,h ; bc = # uninitialized bytes in queue ld h,a ld l,a ; hl = 0 sbc hl,bc ; hl = -bc add hl,sp ld (hl),a ; zero last byte in queue ld sp,hl ; move stack below queue ld a,$80 push af ; mark end of queue with $80 byte inc sp ld e,l ld d,h inc de dec bc ldir ; zero the uninitialized bytes in queue ; NOTE: Must move the stack before clearing the queue, otherwise an interrupt ; may overwrite portions of the cleared queue. push ix pop bc ; bc = top of queue ld hl,-6 add hl,bc ex de,hl ; de = new block ld l,c ld h,b ; hl = investigate block ; ix = top of queue, bottom of queue marked with $80 byte ; hl = investigate block, de = new block ; Variables indexed by ix, LSB first: ; ix + 03/04 return address ; ix + 01/02 max stack depth ; A picture of memory at this point: ; ;+-----------------------+ higher addresses ;| | | ;|- return address -| \|/ ;| | V ;+-----------------------+ lower addresses ;| | ;|- max stack depth -| ;| | ;+-----------------------+ ;| screen address MSB | <- ix = top of queue = hl = investigate block ;| screen address LSB | ;| fill byte | ;+-----------------------+ ;| end of block marker | ;| ? | ;| ? | ;+-----------------------+ ;| 0 | <- de = new block ;| 0 | ;| 0 | ;+-----------------------+ ;| | ;| ...... | size is a multiple of 3 bytes ;| rest of queue | ;| all zeroed | ;| ...... | ;| | ;+-----------------------+ ;| $80 | <- sp, special byte marks end of queue ;+-----------------------+ .pfloop ld a,(hl) cp $80 ; bit 15 of screen addr set if time to wrap jr c, inowrap push ix pop hl ; hl = ix = top of queue ld a,(hl) .inowrap cp $40 ; screen address < $4000 marks end of block jr c, endinv ; are we done yet? ld b,a dec hl ld c,(hl) ; bc = screen address dec hl ld a,(hl) ; a = fill byte dec hl inc (ix+1) ; increase available queue space by one jr nz, bcnowrap inc (ix+2) .bcnowrap push hl ; save spot in investigate block ld l,c ld h,b ; hl = screen address ld b,a ; b = fill byte .goup push hl ; save screen address call pixelup ; move screen address up one pixel jr c, updeadend ; if went off-screen push bc ; save fill byte call bytefill call c, addnew ; if up is not dead end, add this to new block pop bc ; restore fill byte .updeadend pop hl ; restore screen address .godown push hl ; save screen address call pixeldown ; move screen address down one pixel jr c, downdeadend ; if went off-screen push bc ; save fill byte call bytefill call c, addnew ; if down is not dead end, add this to new block pop bc ; restore fill byte .downdeadend pop hl ; restore screen address .goleft bit 7,b ; can only move left if leftmost bit of fill byte set jr z, goright push hl ; save screen address ld a,l dec l ; decrease column and 31 jr z, leftdeadend ; if went off-screen push bc ; save fill byte ld b,$01 ; set rightmost pixel for incoming byte call bytefill call c, addnew ; if left is not dead end, add this to new block pop bc ; restore fill byte .leftdeadend pop hl ; restore screen address .goright bit 0,b ; can only move right if rightmost bit of fill byte set jr z, nextinv inc l ; next column ld a,l and 31 jr z, nextinv ; if went off-screen ld b,$80 ; set leftmost pixel for incoming byte call bytefill call c, addnew ; if right is not dead end, add this to new block .nextinv pop hl ; hl = spot in investigate block jr pfloop .endinv dec hl dec hl dec hl ; investigate block now points at new block ld a,(de) ; check if new block is at end of queue cp $80 jr c, nowrapnew push ix pop de ; de = ix = top of queue .nowrapnew xor a ld (de),a ; store end marker for new block dec de dec de dec de ld a,(hl) ; done if the investigate block is empty cp $40 jr nc, pfloop .endpfill ld sp,ix inc sp inc sp inc sp ; return address at ix+3 ret ; add incoming fill byte and screen address to new block ; enter b = incoming byte, hl = screen address, de = new block .addnew push hl ; save screen address ld l,(ix+1) ld h,(ix+2) ; hl = max stack depth ld a,h or l jr nz, stillroom ; this version doesn't bail pop hl ; just don't add to new block ret .stillroom dec hl ; available queue space decreases by one struct ld (ix+1),l ld (ix+2),h pop hl ; hl = screen address ld a,(de) ; check if new block is at end of queue cp $80 jr c, annowrap push ix pop de ; de = ix = top of queue .annowrap ex de,hl ld (hl),d ; make struct, store screen address (2 bytes) dec hl ld (hl),e dec hl ld (hl),b ; store fill byte (1 byte) dec hl ex de,hl ret 10 REM COPY MACHINE CODE INTO MEMORY 20 FOR n=32768 TO 33121: READ a: POKE n,a: NEXT n 30 REM DRAW SOME CIRCLES ON DISPLAY 40 FOR n=1 TO 10 50 LET x=INT (RND*256) 60 LET y=INT (RND*176) 70 LET r=INT (RND*40) 80 IF x-r<0 OR y-r<0 OR x+r>255 OR y+r>175 THEN GO TO 50 90 CIRCLE x,y,r: NEXT n: CIRCLE 127,88,87 100 LET a=USR 32768: PAUSE 50: RUN 995 REM TEST 1000 DATA 46,128,38,96,1,100,0,205,11,128,201 1005 REM FFILL 1010 DATA 124,254,192,208,11,197,205,249,128,235 1020 DATA 205,75,129,56,2,193,201,221,33,255,255 1030 DATA 221,57,229,197,51,175,245,59,221,78,1 1040 DATA 221,70,2,3,105,96,9,9,77,68,103,111,237 1050 DATA 66,57,119,249,62,128,245,51,93,84,19,11 1060 DATA 237,176,221,229,193,33,250,255,9,235,105 1070 DATA 96,126,254,128,56,4,221,229,225,126,254,64 1080 DATA 56,91,71,43,78,43,126,43,221,52,1,32,3,221 1090 DATA 52,2,229,105,96,71,229,205,32,129,56,8,197 1100 DATA 205,75,129,220,211,128,193,225,229,205 1110 DATA 53,129,56,8,197,205,75,129,220,211,128,193 1120 DATA 225,203,120,40,18,229,125,45,230,31,40,10 1130 DATA 197,6,1,205,75,129,220,211,128,193,225,203 1140 DATA 64,40,14,44,125,230,31,40,8,6,128,205 1150 DATA 75,129,220,211,128,225,24,152,43,43,43,26 1160 DATA 254,128,56,3,221,229,209,175,18,27,27,27 1170 DATA 126,254,64,48,131,221,249,51,51,51,201 1175 REM ADDNEW 1180 DATA 229,221,110,1,221,102,2,124,181,32,2,225,201 1190 DATA 43,221,117,1,221,116,2,225,26,254,128,56,3 1200 DATA 221,229,209,235,114,43,115,43,112,43,235,201 1205 REM GETSCRNADDR 1210 DATA 230,7,246,64,87,124,31,31,31,230,24,178 1220 DATA 87,125,230,7,71,62,128,40,3,31,16,253 1230 DATA 71,203,61,203,61,203,61,124,23,23,230 1240 DATA 224,181,95,201 1245 REM PIXELUP 1250 DATA 124,37,230,7,192,62,8,132,103,125,214 1260 DATA 32,111,208,124,214,8,103,254,64,201 1265 REM PIXELDOWN 1270 DATA 36,124,230,7,192,124,214,8,103,125,198 1280 DATA 32,111,208,124,198,8,103,254,88,63,201 1285 REM BYTEFILL 1290 DATA 120,174,160,200,71,31,79,120,135,177 1300 DATA 176,79,174,161,184,71,194,80,129,182,119 1310 DATA 55,201 Figure 6. Breadth-First Pattern Flood Fill ; In the interest of brevity, the source for getscrnaddr, pixelup, ; pixeldown and bytefill is not reprinted here. ; Each entry in the queue is a 3-byte struct that grows down in memory: ; screen address (2-bytes, MSB first) ; fill byte (1-byte) ; Screen address with MSB<0x40 is used to indicate the end of a block. ; Screen address with MSB>=0x80 is used to mark the physical end of Q. ; ; The fill pattern is a typical 8x8 pixel character, stored in 8 bytes. ; enter: h = y coord, l = x coord, bc = queue size, de = address of fill pattern ; In hi-res mode, carry flag is most significant bit of x coord ; used : ix, af, bc, de, hl ; exit : no carry = success, carry = had to bail queue was too small ; stack: 3*bc+30 bytes, not including the call to PFILL or interrupts .SPPFill push de ; save (pattern pointer) variable dec bc ; we will start with one struct in the queue push bc ; save max stack depth variable ld a,h call SPGetScrnAddr ; de = screen address, b = pixel byte ex de,hl ; hl = screen address call bytefill ; b = fill byte jr c, viable pop bc pop de ret .viable ex de,hl ; de = screen address, b = fill byte ld hl,-7 add hl,sp push hl ; create pattern block pointer = top of queue push hl pop ix ; ix = top of queue dec hl dec hl dec hl push hl ; create investigate block pointer ld hl,-12 add hl,sp push hl ; create new block pointer xor a push af dec sp ; mark end of pattern block push de ; screen address and fill byte are push bc ; first struct in investigate block inc sp push af dec sp ; mark end of investigate block ld c,(ix+7) ld b,(ix+8) ; bc = max stack depth - 1 inc bc ld l,c ld h,b add hl,bc ; space required = 3*BC (max depth) + 10 add hl,bc ; but have already taken 9 bytes ld c,l ld b,h ; bc = # uninitialized bytes in queue ld hl,0 sbc hl,bc ; negate hl, additions above will not set carry add hl,sp ld (hl),0 ; zero last byte in queue ld sp,hl ; move stack below queue ld a,$80 push af ; mark end of queue with $80 byte inc sp ld e,l ld d,h inc de dec bc ldir ; zero the uninitialized bytes in queue ; NOTE: Must move the stack before clearing the queue, otherwise an interrupt could overwrite portions of the (just cleared) queue. ; ix = top of queue, bottom of queue marked with 0x80 byte ; Variables indexed by ix, LSB first: ; ix + 11/12 return address ; ix + 09/10 fill pattern pointer ; ix + 07/08 max stack depth ; ix + 05/06 pattern block pointer ; ix + 03/04 investigate block pointer ; ix + 01/02 new block pointer ; A picture of memory at this point: ; ;+-----------------------+ higher addresses ;| | | ;|- return address -| \|/ ;| | V ;+-----------------------+ lower addresses ;| fill | ;|- pattern pointer -| ;| | ;+-----------------------+ ;| | ;|- max stack depth -| ;| | ;+-----------------------+ ;| | ;|- pattern block -| ;| | ;+-----------------------+ ;| | ;|- investigate block -| ;| | ;+-----------------------+ ;| | ;|- new block -| ;| | ;+-----------------------+ ;| end of block marker | <- ix = pattern block = top of queue ;| ? | ;| ? | ;+-----------------------+ ;| screen address MSB | <- investigate block ;| screen address LSB | ;| fill byte | ;+-----------------------+ ;| end of block marker | ;| ? | ;| ? | ;+-----------------------+ ;| 0 | <- new block ;| 0 | ;| 0 | ;+-----------------------+ ;| | ;| ...... | size is a multiple of 3 bytes ;| rest of queue | ;| all zeroed | ;| ...... | ;| | ;+-----------------------+ ;| 0x80 | <- sp, special byte marks end of queue ;+-----------------------+ .pfloop ld l,(ix+3) ld h,(ix+4) ; hl = investigate block ld e,(ix+1) ld d,(ix+2) ; de = new block call investigate ld (ix+1),e ld (ix+2),d ; save new block ld (ix+3),l ld (ix+4),h ; save investigate block ld l,(ix+5) ld h,(ix+6) ; hl = pattern block ld c,(ix+7) ld b,(ix+8) ; bc = max stack depth (available space) call applypattern ld (ix+7),c ld (ix+8),b ; save stack depth ld (ix+5),l ld (ix+6),h ; save pattern block ld a,(hl) ; done if the investigate block was empty cp 0x40 jp nc, pfloop .endpfill ld de,11 ; return address is at ix+11 add ix,de ld sp,ix or a ; make sure carry is clear, indicating success ret ; IN/OUT: hl = investigate block, de = new block .investigate ld a,(hl) cp 0x80 ; bit 15 of screen addr set if time to wrap jp c, inowrap push ix pop hl ; hl = ix = top of queue ld a,(hl) .inowrap cp 0x40 ; screen address < 0x4000 marks end of block jp c, endinv ; are we done yet? ld b,a dec hl ld c,(hl) ; bc = screen address dec hl ld a,(hl) ; a = fill byte dec hl push hl ; save spot in investigate block ld l,c ld h,b ; hl = screen address ld b,a ; b = fill byte .goup push hl ; save screen address call SPPixelUp ; move screen address up one pixel jr c, updeadend ; if went off-screen push bc ; save fill byte call bytefill call c, addnew ; if up is not dead end, add this to new block pop bc ; restore fill byte .updeadend pop hl ; restore screen address .godown push hl ; save screen address call SPPixelDown ; move screen address down one pixel jr c, downdeadend push bc ; save fill byte call bytefill call c, addnew ; if down is not dead end, add this to new block pop bc ; restore fill byte .downdeadend pop hl ; restore screen address .goleft bit 7,b ; can only move left if leftmost bit of fill byte set jr z, goright ld a,l and 31 jr nz, okleft bit 5,h ; for hi-res mode: column = 1 if l=0 and bit 5 of h is set jr z, goright .okleft push hl ; save screen address call SPCharLeft push bc ; save fill byte ld b,0x01 ; set rightmost pixel for incoming byte call bytefill call c, addnew ; if left is not dead end, add this to new block pop bc ; restore fill byte pop hl ; restore screen address .goright bit 0,b ; can only move right if rightmost bit of fill byte set jr z, nextinv or a ; clear carry call SPCharRight jr c, nextinv ; went off screen ld a,l and 31 jr z, nextinv ; wrapped around line ld b,0x80 ; set leftmost pixel for incoming byte call bytefill call c, addnew ; if right is not dead end, add this to new block .nextinv pop hl ; hl = spot in investigate block jp investigate .endinv dec hl dec hl dec hl ; investigate block now points at new block ld a,(de) ; check if new block is at end of queue cp 0x80 jr c, nowrapnew defb 0xdd ld e,l defb 0xdd ld d,h ; de = ix = top of queue .nowrapnew xor a ld (de),a ; store end marker for new block dec de dec de dec de ret ; add incoming fill byte and screen address to new block ; enter b = incoming byte, hl = screen address, de = new block .addnew push hl ; save screen address ld l,(ix+7) ld h,(ix+8) ; hl = max stack depth ld a,h or l jr z, bail ; no space in queue so bail! dec hl ; available queue space decreases by one struct ld (ix+7),l ld (ix+8),h pop hl ; hl = screen address ld a,(de) ; check if new block is at end of queue cp 0x80 jr c, annowrap defb 0xdd ld e,l defb 0xdd ld d,h ; de = ix = top of queue .annowrap ex de,hl ld (hl),d ; make struct, store screen address (2 bytes) dec hl ld (hl),e dec hl ld (hl),b ; store fill byte (1 byte) dec hl ex de,hl ret ; if the queue filled up, we need to bail. Bailing means patterning any set pixels ; which may still be on the display. If we didn't bail ; there is no guarantee the fill would ever return. .bail pop hl ; hl = screen address, b = fill byte ld a,b xor (hl) ld (hl),a ; clear this byte on screen xor a ld (de),a ; mark end of new block ld l,(ix+5) ld h,(ix+6) ; hl = pattern block call applypattern ; for pattern block call applypattern ; for investigate block call applypattern ; for new block ld de,11 ; return address is at ix+11 add ix,de ld sp,ix scf ; indicate we had to bail ret ; hl = pattern block, bc = max stack depth (available space) .applypattern ld a,(hl) cp 0x80 ; bit 15 of screen addr set if time to wrap jp c, apnowrap push ix pop hl ; hl = ix = top of queue ld a,(hl) .apnowrap cp 0x40 ; screen address < 0x4000 marks end of block jr c, endapply ; are we done yet? and 0x07 ; use scan line 0..7 to index pattern add a,(ix+9) ld e,a ld a,0 adc a,(ix+10) ld d,a ; de points into fill pattern ld a,(de) ; a = pattern ld d,(hl) dec hl ld e,(hl) ; de = screen address dec hl and (hl) ; and pattern with fill byte sub (hl) ; or in complement of fill byte dec a ex de,hl and (hl) ; apply pattern to screen ld (hl),a ex de,hl dec hl inc bc ; increase available queue space jp applypattern .endapply dec hl dec hl dec hl ; pattern block now pts at investigate block ret /* Pattern Fill Demo Program */ /* Alvin Albrecht 01.2003 */ /* C Program for ts2068 or Spectrum */ /* Compile with the z88dk compiler */ /* http://z88dk.sourceforge.net/ */ #include #include #include /* first define some pattern UDGs */ extern uchar patterns[]; #asm ._patterns defb @11111111 defb @11111111 defb @11111111 defb @11111111 defb @11111111 defb @11111111 defb @11111111 defb @11111111 defb @10101010 defb @01010101 defb @10101010 defb @01010101 defb @10101010 defb @01010101 defb @10101010 defb @01010101 defb @00000000 defb @01111110 defb @01100110 defb @01100110 defb @01100110 defb @01100110 defb @01111110 defb @00000000 defb @10001000 defb @01000100 defb @00100010 defb @00010001 defb @10001000 defb @01000100 defb @00100010 defb @00010001 defb @00010001 defb @00100010 defb @01000100 defb @10001000 defb @00010001 defb @00100010 defb @01000100 defb @10001000 defb @10011001 defb @01100110 defb @01100110 defb @10011001 defb @10011001 defb @01100110 defb @01100110 defb @10011001 defb @00100010 defb @01010101 defb @10001000 defb @00000000 defb @00100010 defb @01010101 defb @10001000 defb @00000000 defb @11111111 defb @10000000 defb @10100010 defb @10010100 defb @10001000 defb @10010100 defb @10100010 defb @10000000 #endasm main() /* C programs start here */ { int x,y,r,n; while (1) { /* forever */ clg(); /* clear screen */ for (n=0; n!=10; n++) { /* for loop executed 10 times */ do { x = rand() % 256; /* pick centre coordinate of circle */ y = rand() % 192; r = rand() % 40; /* pick random radius */ } while (((x-r)<0) || ((y-r)<0) || ((x+r)>255) || ((y+r)>191)); circle(x,y,r,1); /* draw random circle */ } x = rand() % 256; y = rand() % 192; r = (rand() % 8)*8; /* 1 of 8 patterns defined above */ sp_PFill(x, y, patterns + r, 300); /* pattern fill @ x,y */ sp_WaitForKey(); } }