6181 lines
256 KiB
Plaintext
6181 lines
256 KiB
Plaintext
|
||
########
|
||
##################
|
||
###### ######
|
||
#####
|
||
##### #### #### ## ##### #### #### #### #### #### #####
|
||
##### ## ## #### ## ## ## ### ## #### ## ## ##
|
||
##### ######## ## ## ## ##### ## ## ## ## ##
|
||
##### ## ## ######## ## ## ## ### ## ## #### ## ##
|
||
##### #### #### #### #### ##### #### #### #### #### #### ######
|
||
##### ##
|
||
###### ###### Issue #16
|
||
################## April 26, 1998
|
||
########
|
||
|
||
...............................................................................
|
||
|
||
|
||
Knowledge is power. -- Nam et ipsa scientia potestas est.
|
||
|
||
Francis Bacon, Meditationes Sacrae
|
||
|
||
|
||
...............................................................................
|
||
|
||
|
||
BSOUT
|
||
|
||
The number 16 is an important number for Commodore folks.
|
||
Commodore's most famous computer is so named for having 2^16 bytes.
|
||
I'm sure many of us were greatly influenced by Commodore as 16-year
|
||
olds. And it was just a little over sixteen years ago that the
|
||
Commodore 64 first became available.
|
||
I want to convey to you what a remarkable fact this is.
|
||
After sixteen years the Commodore 64 is not only still being used,
|
||
but the entire community is still moving forwards. People are still
|
||
using the machine seriously, new and innovative hardware and software
|
||
is still being developed, and new and innovative uses for these old
|
||
machines are still being found. I do not believe any other computer
|
||
community can make this claim. How cool is that?
|
||
Thus does issue #16 boldly stride forwards into the deep
|
||
realms of the Commodore unknown, secure in the knowledge that the
|
||
Commodore community will blaze brightly well into the future, and
|
||
in eager anticipation of the undiscovered algorithms and schematics
|
||
which lie just around the corner.
|
||
|
||
And now a few words on the future of C=Hacking. As everyone
|
||
knows, C=Hacking has been pining for the fjords for a while now.
|
||
Now that it is once again displaying its beautiful plumage, I hope
|
||
to keep new issues appearing reasonably regularly. My current
|
||
thinking is that C=Hacking will appear on a "critical mass" basis:
|
||
once enough articles have arrived, a new issue will be released.
|
||
I will of course be meanwhile digging around for material and authors,
|
||
and we'll see if four issues per year is too optimistic (one caveat:
|
||
I will be trying to graduate soon, so you might have to wait a
|
||
little bit longer than normal for the next issue or two).
|
||
I also expect to slim the issues down somewhat. The focus
|
||
will now be technical -- as Jim mentioned in issue #15, the nontechnical
|
||
stuff will move to another mag. Instead of a behemoth magazine with
|
||
tons of articles, I am going to try using a critical mass of 2-3 main
|
||
articles plus some smaller articles. (This might also make it
|
||
possible to release issues more often).
|
||
The magazine now has four sections: Jiffies, The C=Hallenge,
|
||
Side Hacking, and the main articles. Jiffies are just short, quickie
|
||
and perhaps quirky programs, in the flavor of the old RUN "Magic" column,
|
||
or "Bits & Pieces" from The Transactor. The C=Hallenge presents a
|
||
challenge problem for readers to submit solutions to, to be published
|
||
in future issues. Side Hacking is for articles which are too small to
|
||
be main articles but nifty in their own right. Thus there is now room
|
||
for articles of all sizes, from monstrous to a mere screenful. With the
|
||
first two sections I am hoping to stimulate reader involvement, and
|
||
we'll see if it works or not. In this issue I have included at least one
|
||
of each type of article, to give a flavor of the different sections.
|
||
Otherwise, things ought to be more or less the same. I'd like
|
||
to thank Jim Brain for keeping C=Hacking going over the last few years,
|
||
and also for the use of the jbrain.com web site. I'd also like to say
|
||
that although this issue seems to be the "Steve and Pasi issue", there's
|
||
no particular reason to expect that to be the case for future issues.
|
||
And now... onwards!
|
||
.......
|
||
....
|
||
..
|
||
. C=H
|
||
|
||
::::::::::::::::::::::::::::::::::: Contents ::::::::::::::::::::::::::::::::::
|
||
|
||
BSOUT
|
||
o Voluminous ruminations from your unfettered editor.
|
||
|
||
|
||
Jiffies
|
||
o Quick and nifty.
|
||
|
||
|
||
The C=Hallenge
|
||
|
||
o All is now clear.
|
||
|
||
|
||
Side Hacking
|
||
|
||
o "PAL VIC20 goes NTSC", by Timo Raita <vic@iki.fi> and
|
||
Pasi 'Albert' Ojala <albert@cs.tut.fi>. How to turn your
|
||
PAL VIC20 into an NTSC VIC20.
|
||
|
||
o "Starfields", by S. Judd <sjudd@nwu.edu>. Making a simple
|
||
starfield.
|
||
|
||
o "Milestones of C64 Data Compression", by Pontus Berg
|
||
<bacchus@fairlight.org>. A short history of data compression
|
||
on the 64.
|
||
|
||
|
||
Main Articles
|
||
|
||
o "Compression Basics", by Pasi 'Albert' Ojala <albert@cs.tut.fi>.
|
||
Part one of a two-part article on data compression, giving an
|
||
introduction to and overview of data compression and related
|
||
issues.
|
||
|
||
o "3D for the Masses: Cool World and the 3D Library", by S. Judd
|
||
<sjudd@nwu.edu>. The mother of all 3d articles.
|
||
|
||
|
||
|
||
.................................. Stuff ....................................
|
||
|
||
Legalities
|
||
----------
|
||
|
||
Rather than write yet another flatulent paragraph that nobody will
|
||
ever read, I will now make my own little contribution to dismantling
|
||
the 30 years' worth of encrustation that has led to the current Byzantine
|
||
US legal system:
|
||
|
||
C=Hacking is a freely available magazine, involving concepts which
|
||
may blow up your computer if you're not careful. The individual authors
|
||
hold the copyrights on their articles.
|
||
|
||
Rather than procedure and language, I therefore depend on common sense and
|
||
common courtesy. If you have neither, then you probably shouldn't use
|
||
a 64, and you definitely shouldn't be reading C=Hacking. Please email
|
||
any questions or concerns to chacking-ed@mail.jbrain.com.
|
||
|
||
General Info
|
||
------------
|
||
|
||
-For information on subscribing to the C=Hacking mailing list, send email to
|
||
|
||
chacking-info@mail.jbrain.com
|
||
|
||
-For general information on anything else, send email to
|
||
|
||
chacking-info@mail.jbrain.com
|
||
|
||
-For information on chacking-info@mail.jbrain.com, send email to
|
||
|
||
chacking-info@mail.jbrain.com
|
||
|
||
To submit an article:
|
||
|
||
- Invest $5 in a copy of Strunk & White, "The Elements of Style".
|
||
- Read it.
|
||
- Send some email to chacking-ed@mail.jbrain.com
|
||
|
||
Note that I have a list of possible article topics. If you have a topic
|
||
you'd like to see an article on please email chacking@mail.jbrain.com
|
||
and I'll see what I can do!
|
||
.......
|
||
....
|
||
..
|
||
. C=H
|
||
|
||
................................... Jiffies ..................................
|
||
|
||
|
||
Well folks, this issue's Jiffy is pretty lonely. I've given it a few
|
||
friends, but I hope you will send in some nifty creations and tricks
|
||
of your own.
|
||
|
||
|
||
$01 From John Ianetta, 76703.4244@compuserve.com:
|
||
|
||
This C-64 BASIC type-in program is presented here without further comment.
|
||
|
||
10 printchr$(147)
|
||
20 poke55,138:poke56,228:clr
|
||
30 a$="ibm":print"a$ = ";a$
|
||
40 b$="macintosh"
|
||
50 print:print"b$ = ";b$
|
||
60 print:print"a$ + b$ = ";a$+b$
|
||
70 poke55,.:poke56,160:clr:list
|
||
|
||
|
||
$02 I am Commodore, here me ROR!
|
||
|
||
First trick: performing ROR on a possibly signed number. Don't blink!
|
||
|
||
CMP #$80
|
||
ROR
|
||
|
||
Second trick: performing a cyclical left shift on an 8-bit number.
|
||
|
||
CMP #$80
|
||
ROL
|
||
|
||
Oooohh! Aaaaahhh! Another method:
|
||
|
||
ASL
|
||
ADC #00
|
||
........
|
||
....
|
||
..
|
||
. C=H
|
||
|
||
............................... The C=Hallenge ...............................
|
||
|
||
|
||
The chacking challenge this time around is very simple: write a program
|
||
which clears the screen.
|
||
|
||
This could be a text screen or a graphics screen. Naturally ? CHR$(147)
|
||
is awfully boring, and the point is to come up with an interesting
|
||
algorithm -- either visually or code-wise -- for clearing the screen.
|
||
|
||
The purpose here is to get some reader involvement going, so submit your
|
||
solutions to
|
||
|
||
chacking-ed@mail.jbrain.com
|
||
|
||
and chances are awfully good that you'll see them in the next issue!
|
||
(Source code is a huge bonus). And, if you have a good C=Hacking
|
||
C=Hallenge problem, by all means please send it to the above address.
|
||
........
|
||
....
|
||
..
|
||
. C=H
|
||
|
||
................................ Side Hacking ................................
|
||
|
||
|
||
PAL VIC20 Goes NTSC
|
||
|
||
by Timo Raita <vic@iki.fi> http://www.iki.fi/vic/
|
||
Pasi 'Albert' Ojala <albert@cs.tut.fi> http://www.cs.tut.fi/~albert/
|
||
_________________________________________________________________
|
||
|
||
Introduction
|
||
|
||
Recently Marko M<>kel<65> organized an order from Jameco's C= chip
|
||
closeout sale, one of the items being a heap of 6560R2-101 chips. When
|
||
we had the chips, we of course wanted to get some of our PAL machines
|
||
running with these NTSC VIC-I chips. A couple of weeks earlier Richard
|
||
Atkinson wrote on the cbm-hackers mailing list that he had got a 6560
|
||
chip running on a PAL board. He used an oscillator module, because he
|
||
couldn't get the 14.31818 MHz crystal to oscillate in the PAL VIC's
|
||
clock generator circuit.
|
||
|
||
We checked the PAL and NTSC VIC20 schematic diagrams but couldn't
|
||
notice a significant difference in the clock generator circuits. There
|
||
seemed to be no reason why a PAL machine could not be converted into
|
||
NTSC machine fairly easily by changing the crystal and making a couple
|
||
of small changes. Adding a clock oscillator felt a somewhat desperate
|
||
quick'n'dirty solution.
|
||
|
||
Note that some old television sets require you to adjust their
|
||
vertical hold knob for them to show 60 Hz (NTSC) frame rate. Some
|
||
recent pseudo-intelligent televisions only display one frame rate (50
|
||
Hz in PAL-land). Multistandard television sets and most video monitors
|
||
do display 60 Hz picture correctly. There is a very small chance that
|
||
your display does not like the 60 Hz frame rate. Still, be careful if
|
||
you haven't tried 60 Hz before.
|
||
|
||
You should also note that PAL and NTSC machines use different KERNAL
|
||
ROM versions, 901486-06 is for NTSC and 901486-07 is for PAL. However,
|
||
the differences are small. In an NTSC machine with a PAL ROM the
|
||
screen is not centered, the 60 Hz timer is not accurate and the RS-232
|
||
timing values are wrong.
|
||
|
||
The Story
|
||
|
||
Timo:
|
||
|
||
At first I took a VIC20CR board (FAB NO. 251040-01) and just replaced
|
||
the videochip and the crystal, and as you might suspect, it didn't
|
||
work. I noticed that the two resistors in the clock generator circuit
|
||
(R5 and R6) had a value of 470 ohms, while the schematics (both NTSC
|
||
and PAL!) stated that they should be 330 ohms. I replaced those
|
||
resistors, and also noticed the single 56 pF capacitor on the bottom
|
||
side of the board. This capacitor was connected to the ends of the R5
|
||
resistor and was not shown in the schematics. As you might guess, the
|
||
capacitor prevents fast voltage changes, and thus makes it impossible
|
||
to increase the frequency.
|
||
|
||
Is this capacitor present also on NTSC-board? Someone with such a
|
||
board could check this out. I removed the capacitor, and now it works.
|
||
I didn't test the board between these two modifications, but Pasi
|
||
confirmed that you only need to remove the capacitor.
|
||
|
||
Pasi:
|
||
|
||
I first tried to convert my VIC20CR machine, because the clock circuit
|
||
in it seemed identical to the one a newer NTSC machine uses, except of
|
||
course the crystal frequency. Some of the pull-up resistors were
|
||
different, but I didn't think it made any difference. Pull-up
|
||
resistors vary a lot without any reason anyway. I replaced the video
|
||
chip and the crystal, but I could not get it to oscillate. I first
|
||
thought that the 7402 chip in the clock circuit just couldn't keep up
|
||
and replaced it with a 74LS02 chip. There was no difference. The PAL
|
||
crystal with a PAL VIC-I (6561) still worked.
|
||
|
||
I turned my eyes to my third VIC20, which is an older model with all
|
||
that heat-generating regulator stuff inside. It has almost the same
|
||
clock circuit as the old NTSC schematic shows. There are three
|
||
differences:
|
||
|
||
1. The crystal is 14.31818 MHz for NTSC, 8.867236 MHz for PAL.
|
||
2. Two NOR gates are used as NOT gates to drop one 74S04 from the
|
||
design.
|
||
3. In PAL the crystal frequency is divided by two by a 7474
|
||
D-flipflop.
|
||
|
||
I could either put in a 28.63636 MHz crystal or I could use the
|
||
14.31818 MHz crystal and bypass the 7474 clock divider. I didn't have
|
||
a 28 MHz crystal, so I soldered in the 14.31818 MHz crystal and bent
|
||
the pin 39 (phi1 in) up from the 6560 video chip so that it would not
|
||
be connected to the divided clock. I then soldered a wire connecting
|
||
this pin (6560 pin 39) and the 14.31818 MHz clock coming from the 7402
|
||
(UB9 pin 10). The machine started working.
|
||
|
||
I just hadn't any colors. My monitor (Philips CM8833) does not show
|
||
NTSC colors anyway, but a multistandard TV (all new Philips models at
|
||
least) shows colors as long as the VIC-I clock is close enough to the
|
||
NTSC color clock. The oscillator frequency can be fine-adjusted with
|
||
the trimmer capacitor C35. Just remember that using a metallic
|
||
unshielded screwdriver is a bad idea because it changes the
|
||
capacitance of the clock circuit (unless the trimmer is an insulated
|
||
model). Warming has also its effect on the circuit capacitances so let
|
||
the machine be on for a while before being satisfied with the
|
||
adjustment. With a small adjustment I had colors and was done with
|
||
that machine.
|
||
|
||
Then I heard from Timo that the CR model has a hidden capacitor on the
|
||
solder side of the motherboard, probably to filter out upper harmonic
|
||
frequencies (multiples of 4.433618 MHz). I decided to give the VIC20CR
|
||
modification another try. I removed the 56 pF capacitor, which was
|
||
connected in parallel with R5, and the machine still worked fine.
|
||
|
||
I then replaced the crystal with the 14.31818 MHz crystal and inserted
|
||
the 6560 video chip. The machine didn't work. I finally found out that
|
||
it was because I had replaced the original 7402 with 74LS02. When I
|
||
replaced it with a 74S02, the machine started working. I just could
|
||
not get the frequency right and thus no colors until I added a 22 pF
|
||
capacitor in parallel with the capacitor C50 and the trimmer capacitor
|
||
C48 to increase the adjustment range from the original 5..25 pF to
|
||
27..47 pF. The trimmer orientation didn't have any visible effect
|
||
anymore. I had colors and was satisfied.
|
||
|
||
To check all possibilities, I also replaced the 74S02 with 7402 that
|
||
was originally used in the circuit (not the same physical chip because
|
||
I had butchered it while soldering it out). I didn't need the parallel
|
||
capacitor anymore, although the trimmer adjustment now needed to be
|
||
correct or I lost colors.
|
||
|
||
As I really don't need two NTSC machines, I then converted this
|
||
machine back to PAL. I made the modifications backwards. I replaced
|
||
the crystal and video chip and then was stumped because the machine
|
||
didn't work. I scratched my head for a while but then remembered the
|
||
extra capacitor I had removed. And surely, the machine started working
|
||
when I put it back. Obviously, the machine had only worked without the
|
||
capacitor because it had 74LS02 at the time. 74S02 and 7402 won't work
|
||
without it. So, if you are doing an NTSC to PAL conversion, you need
|
||
to add this capacitor.
|
||
|
||
Summary
|
||
|
||
PAL VIC20 To NTSC
|
||
|
||
This is the older, palm-heating VIC20 model with the two-prong power
|
||
connector and the almost-cube power supply.
|
||
|
||
1. Replace the 8.867236 MHz crystal with a 14.31818 MHz crystal
|
||
2. If UB9 is 74LS02, replace it with a 7402 (or 74S02 if you only use
|
||
NTSC)
|
||
3. Bend pin 39 from the 6560 video chip so that it does not go to the
|
||
socket.
|
||
4. Add a jumper wire from UB9 pin 10 to the video chip pin 39.
|
||
5. Adjust the trimmer capacitor C35 so that your multistandard
|
||
television shows colors.
|
||
|
||
PAL VIC20CR To NTSC
|
||
|
||
1. Replace the 4.433618 MHz crystal with a 14.31818 MHz crystal
|
||
2. If your machine has a capacitor in parallel with R5, remove it
|
||
(the parallel capacitor). The values in our machines were 56 pF.
|
||
3. If UB9 is 74LS02, replace it with a 7402 (or 74S02 if you only use
|
||
NTSC)
|
||
4. If necessary, increase the clock adjustment range by adding a
|
||
capacitor of 11..22 pF in parallel to C50 (15 pF in the
|
||
schematics, 5 pF in my machine) and C48 (the trimmer capacitor
|
||
0..20 pF). With 7402 you can do with a smaller capacitor or with
|
||
no additional capacitor at all.
|
||
5. Adjust C48 so that your multistandard television shows colors.
|
||
|
||
Trouble-shooting
|
||
|
||
* There is no picture
|
||
+ You have not removed (or added for NTSC) the 56 pF capacitor
|
||
- Remove/Add the capacitor
|
||
+ Your clock crystal is broken
|
||
- Find a working one
|
||
+ Your machine has a 74LS02 instead of 7402 (or 74S02)
|
||
- Replace the 74LS02 with a 7402
|
||
+ You forgot to replace the PAL video chip with the NTSC chip
|
||
- Remove 6561 and insert 6560
|
||
+ For older model: You forgot the clock wire and/or to turn up
|
||
6560 pin 39
|
||
- Connect 6560 pin 39 to UB9 pin 10
|
||
* There is picture, but it is scrolling vertically
|
||
+ Your monitor or television does not properly sync to 60 Hz
|
||
frame rate
|
||
- Adjust the monitor/TV's vertical hold setting
|
||
* There is picture but no colors
|
||
+ Your monitor or TV does not understand the color encoding
|
||
- Give up the color or get a better television
|
||
+ The color clock frequency is slightly off
|
||
- Adjust the trimmer capacitor
|
||
+ The color clock frequency adjustment range is not enough
|
||
- Add a 11-22 pF capacitor in parallel with the trimmer
|
||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
|
||
|
||
|
||
Starfields, by S. Judd
|
||
|
||
|
||
If you've ever played Elite or Star Raiders you've experienced
|
||
a starfield. The idea is to use stars to give a sense of motion
|
||
while navigating through the universe. In this article we'll derive
|
||
a simple algorithm for making a starfield -- the algorithm I used
|
||
in Cool World.
|
||
As usual, a good way to start is with a little thinking --
|
||
in this case, we first need to figure out what the starfield problem
|
||
even is! Consider the motion of the stars. As we move forwards, the
|
||
stars should somehow move past us -- over us, to the side of us, etc.
|
||
(A star coming straight at us wouldn't appear to move at all). And
|
||
something that moves needs a starting place and an ending place.
|
||
Certainly they end at the screen boundaries, and since anything really
|
||
far away looks like it's right in front of us they must start at
|
||
(or near) the center of the screen.
|
||
So far so good: stars start near the center of the screen
|
||
and move outwards to the edges. What kind of path do they follow?
|
||
Well it can't be a curve, right? If it were a curve, then all
|
||
stars would have to curve in exactly the same way, since there is
|
||
a rotation symmetry (just tilting your head won't alter the shape
|
||
of the path of the stars). So it has to be a straight line. Since
|
||
stars start at the center of the screen and move outwards, we know
|
||
that, wherever a star might be on the screen right now, it started
|
||
in the center. So, to get the path of the star's motion, we simply
|
||
draw a line from the center of the screen through the star's current
|
||
location, and move the star along that line.
|
||
Drawing lines is easy enough. If the center of the screen
|
||
is located at (xc, yc) then the equation of a line from the center
|
||
to some point (x0, y0) is just
|
||
|
||
(x,y) = (xc, yc) + t*(x0-xc, y0-yc)
|
||
|
||
You should read a vector equation like the above as
|
||
|
||
for t=blah to wherever step 0.0001
|
||
x = xc + t*(x0-xc)
|
||
y = yc + t*(y0-yc)
|
||
plot(x,y)
|
||
next
|
||
|
||
and you can see that when t=0
|
||
|
||
(x,y) = (xc, yc) i.e. the center of the screen
|
||
|
||
and when t=1
|
||
|
||
(x,y) = (x0, y0) i.e. the current location.
|
||
|
||
Thus, when t goes from 0 to 1 we hit all points in-between, and
|
||
when t is a very small number (like 0.0001) we basically move from
|
||
(xc,yc) to the point on the line right next to (xc,yc).
|
||
Great, now we know the path the stars take. But _how_ do
|
||
they move along that path? We know from experience that things
|
||
which are far away appear to move slowly, but when we're up close
|
||
they move really fast (think of a jet plane, or the moon). And
|
||
we know that in this problem that stars which are far away are
|
||
near the center of the screen, and stars which are nearby are out
|
||
near the edges. So, stars near the center of the screen ought to
|
||
move slowly, and as they move outwards they ought to increase in
|
||
speed.
|
||
Well, that's easy enough to do. We already have an easy
|
||
measure of how far away we are from the center:
|
||
|
||
dx = x0-xc
|
||
dy = y0-yc
|
||
|
||
But this is the quantity we need to compute to draw the line. All
|
||
we have to do is move some fraction of (dx,dy) from our current
|
||
point (x0,y0):
|
||
|
||
x = x0 + dx/8 ;Note that we started at x0, not xc
|
||
y = y0 + dy/8
|
||
|
||
where I just divided by 8 for the heck of it. Dividing by a larger
|
||
number will result in slower motion, and dividing by a smaller number
|
||
will result in faster motion along the line. But the important thing
|
||
to notive in the above equations is that when a star is far away from
|
||
the center of the screen, dx and dy are large and the stars move faster.
|
||
Well alrighty then. Now we have an algorithm:
|
||
|
||
draw some stars on the screen
|
||
for each star, located at (x,y):
|
||
erase old star
|
||
compute dx = x-xc and dy = y-yc
|
||
x = x + dx*velocity
|
||
y = y + dy*velocity
|
||
plot (x,y)
|
||
keep on going!
|
||
|
||
where velocity is some number like 1/8 or 1/2 or 0.14159 or whatever.
|
||
This is enough to move us forwards (and backwards, if a negative value
|
||
of velocity is used). What about moving up and down, or sideways?
|
||
What about rotating?
|
||
First and foremost, remember that, as we move forwards, stars
|
||
always move in the same way: draw a straight line from the origin
|
||
through the star, and move along that path. And that's the case
|
||
no matter where the star is located. This means that to rotate or
|
||
move sideways, we simply move the position of the stars. Then
|
||
using the above algorithm they will just move outwards from the
|
||
center through their new position. In other words, rotations and
|
||
translations are pretty easy.
|
||
Sideways translations are easy: just change the x-coordinates
|
||
(for side to side translations) or y-coordinates (for up and down
|
||
translations). And rotations are done in the usual way:
|
||
|
||
x = x*cos(t) - y*sin(t)
|
||
y = x*sin(t) + y*cos(t)
|
||
|
||
where t is some rotation angle. Note that you can think of sideways
|
||
motion as moving the center of the screen (like from 160,100 to 180,94).
|
||
Finally, what happens when stars move off the screen? Why,
|
||
just plop a new star down at a random location, and propagate it along
|
||
just like all the others. If we're moving forwards, stars move off
|
||
the screen when they hit the edges. If we're moving backwards, they
|
||
are gone once they get near enough to the center of the screen.
|
||
Now it's time for a simple program which implements these
|
||
ideas. It is in BASIC, and uses BLARG (available in the fridge)
|
||
to do the graphics. Yep, a SuperCPU comes in really handy for
|
||
this one. Rotations are also left out, and I put little effort
|
||
into fixing up some of the limitations of the simple algorithm
|
||
(it helps to see them!). For a complete ML implementation see the
|
||
Cool World source code.
|
||
|
||
10 rem starfield
|
||
15 mode16:gron16
|
||
20 dim x(100),y(100):a=rnd(ti)
|
||
25 xc=160:yc=100:n=12:v=1/8
|
||
30 for i=1 to n:gosub200:next
|
||
40 :
|
||
50 for i=1 to n
|
||
55 color0:plot x(i),y(i):color 1
|
||
60 dx=x(i)-xc:dy=y(i)-yc
|
||
65 x1=x(i)+v*dx+x0:y1=y(i)+v*dy+y0
|
||
67 x2=abs(x1-x(i)):y2=abs(y1-y(i)):if (x2+y2<3*abs(v)) then gosub 200:goto 60
|
||
70 if (x1<0) or (y1<0) or (x1>319) or (y1>199) then gosub 200:dx=0:dy=0:goto65
|
||
75 plot x1,y1:x(i)=x1:y(i)=y1
|
||
80 next
|
||
90 get a$
|
||
100 if a$=";" then x0=x0-3:goto 40
|
||
110 if a$=":" then x0=x0+3:goto 40
|
||
120 if a$="@" then y0=y0-3:goto 40
|
||
130 if a$="/" then y0=y0+3:goto 40
|
||
133 if a$="a" then v=v+1/32
|
||
135 if a$="z" then v=v-1/32
|
||
140 if a$<>"q" then 40
|
||
150 groff:stop
|
||
200 x(i)=280*rnd(1)+20:y(i)=170*rnd(1)+15:return
|
||
|
||
Line 200 just plops a new star down at a random position between
|
||
x=20..300 and y=15..185. Lines 100-140 just let you "navigate"
|
||
and change the velocity. The main loop begins at line 50:
|
||
|
||
55 erase old star
|
||
60 compute dx and dy
|
||
65 advance the star outward from the center of the screen
|
||
67 check if the star is too close to the center of the screen (in case
|
||
moving backwards)
|
||
70 if star has moved off the screen, then make a new star
|
||
|
||
And that's the basic idea. Easy! It's also easy to make further
|
||
modifications -- perhaps some stars could move faster than others, some
|
||
stars could be larger that others, or different colors, and so on.
|
||
Starfields are fast, easy to implement, and make a nifty addition to
|
||
many types of programs.
|
||
|
||
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
|
||
|
||
|
||
Milestones of C64 Data Compression
|
||
|
||
|
||
Pontus Berg, bacchus@fairlight.org
|
||
|
||
One of the featured articles in this issue is on data compression.
|
||
A very natural question to ask is: what about data compression on the C-64?
|
||
The purpose of this article is therefore to gain some insight into the
|
||
history of data compression on the 64. This article doesn't cover
|
||
programs like ARC, which are used for storage/archiving, but instead
|
||
focuses on programs which are compressed in executable format, i.e.
|
||
decompress at runtime.
|
||
|
||
The earliest instance of compression comes with all computers:
|
||
the BASIC ROMs. As everyone knows, BASIC replaces keywords with
|
||
one-byte tokens. This not only takes less memory, but makes the
|
||
program run faster as well. Clearly, though, the BASIC interpreter
|
||
is a very special situation.
|
||
|
||
The general problem of 64 data compression is to *reversibly*
|
||
replace one chunk of data with a different chunk of data which is
|
||
shorter *on the average*. It won't always be shorter, simply because
|
||
there are many more big chunks of data than there are small ones.
|
||
The idea is to take advantage of any special structure in the data.
|
||
The data in this case is 64 machine language programs, including
|
||
graphics data, tables, etc.
|
||
|
||
The early years didn't feature any compression, but then
|
||
"packers" (RLE compression) were introduced. The first packer I used
|
||
myself was flash packer, but I can't tell if it was an early one. The
|
||
idea of RLE -- run length encoding -- is simply to replace repeated
|
||
bytes with a single byte and the number of times to repeat it. For
|
||
example, AAAAAA could be replaced by xA6, where x is a control character
|
||
to tell the decompressor that the next two bytes represent a run.
|
||
|
||
Then came the TimeCruncher, in 1986 or perhaps 1987. Basically,
|
||
one can divide the world into compression before and after the TimeCruncher
|
||
by Macham/Network. With TimeCruncher, Matcham introduced sequence
|
||
crunching -- this is THE milestone in the evolution. The idea of
|
||
sequencing is the same as LZ77: replace repeated byte sequences with a
|
||
*reference* to earlier sequences. As you would imagine, short sequences,
|
||
especially of two or three bytes, are very common, so methods of handling
|
||
those cases tend to pay large dividends. See Pasi's article for a
|
||
detailed example of LZ77 compression. It is worth noting that several
|
||
64 compression authors were not aware of LZ77 when designing their programs!
|
||
Naturally the 64 places certain limitations on a compression
|
||
algorithm. With typical 64 crunchers you define a scanning range,
|
||
which is the range in which the sequence scanner looks for references.
|
||
The sequence cruncher replaces parts of the codes with references to
|
||
equal sequences in the program. References are relative: number of
|
||
bytes away and length. Building this reference in a smart and efficient
|
||
way is the key to success. References that are far away require more
|
||
bits, so a "higher speed" (bigger search area) finds more sequences,
|
||
but the references are longer.
|
||
|
||
The next step of value was CruelCrunch, where Galleon (and
|
||
to some extent Syncro) took the concept to where it could be taken.
|
||
Following that, the next step was introducing crunchers which use
|
||
the REU, where Antitrack took DarkSqeezer2 and modified it into a
|
||
REU version. Actually this was already possible in the original
|
||
Darksqeezer 3, but that was not available to ATT. Alex was pissing
|
||
mad when it leaked out (rather soon) ;-).
|
||
The AB Cruncher, followed by ByteBoiler, was really the first
|
||
cruncher to always beat the old CruelCrunch. With the REU feature and
|
||
public availablility, this took the real power of crunchers to all users.
|
||
It should be noted that the OneWay line of crunchers (ByteBoiler and
|
||
AB Crunch, which doesn't even require an REU) actually first scan the
|
||
files and then optimize their algorithms to perform their best, rather
|
||
than letting the user select a speed and see the results after each.
|
||
|
||
Regarding compression times, a char/RLE packer typically takes
|
||
the time it takes to read a file twice, as they usually feature two
|
||
passes -- one for determining the bytes to use as controlbytes and
|
||
one to create the packed file. Sequence crunchers like TimeCruncher
|
||
typically took a few hours, and CruelCrunch as much as ten hours
|
||
(I always let it work over night so I can't tell for sure - it's not
|
||
something you clock while watching ;-). After the introduction of
|
||
REU-based sequence crunchers (which construct tables of the memory
|
||
contents and do a table lookup, rather than repeatedly scanning the data),
|
||
and their subsequent optimization, the crunching times went down first to
|
||
some 30 minutes and then to a few minutes. ByteBoiler only takes some two
|
||
minutes for a full 200 block program, as I recall.
|
||
|
||
The RLE packers and sequence crunchers are often combined. One
|
||
reason was the historical time saving argument - a file packed first
|
||
would be smaller upon entering the crunching phase which could hence
|
||
be completed much faster. A very sophistocated charpacker is however
|
||
a big waste as the result they produce - a block or so shorther at
|
||
best - is almost always eaten up by worse crunching. Some argue that
|
||
you could as well crucnh right away without first using the packer, but
|
||
then again you have the other argument - a charpacker can handle a file
|
||
of virtually any length (0029 to ffff is available) whereas normally
|
||
a cruncher is slightly more limited.
|
||
|
||
Almost any game or demofile (mixed contents of graphics, code,
|
||
musicdata and player, etc.) normally packs into some 50-60% of its original
|
||
size when using an RLE+sequence combination. The packer might compress
|
||
the file by some 30%, depending on the number of controlbytes and such,
|
||
and the cruncher can compress it an additional 10-20%.
|
||
|
||
Minimising the size was the key motivation for the development of
|
||
these programs -- to make more fit on the expensive disks and to make them
|
||
load faster from any CBM device (we all know the speed of them ;-).
|
||
For the crackers one could mention the levelcrunchers as well. This is a
|
||
way to pack data and have it depack transparently while loading the data,
|
||
as opposed to adding a depacker to be run upon execution. The very
|
||
same crunching algorithms are used, and the same programs often come in
|
||
a level- and filecrunching edition.
|
||
|
||
.......
|
||
....
|
||
..
|
||
. C=H
|
||
|
||
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
|
||
|
||
Featured Articles
|
||
|
||
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
|
||
|
||
|
||
Compression Basics
|
||
|
||
|
||
Pasi 'Albert' Ojala albert@cs.tut.fi
|
||
http://www.cs.tut.fi/%7Ealbert/
|
||
|
||
_____________________________________________________________________
|
||
|
||
Introduction
|
||
============
|
||
|
||
Because real-world files usually are quite redundant,
|
||
compression can often reduce the file sizes considerably. This in
|
||
turn reduces the needed storage size and transfer channel capacity.
|
||
Especially in systems where memory is at premium compression can make
|
||
the difference between impossible and implementable. Commodore 64
|
||
and its relatives are good examples of this kind of a system.
|
||
|
||
The most used 5.25-inch disk drive for Commodore 64 only holds
|
||
170 kB of data, which is only about 2.5 times the total random access
|
||
memory of the machine. With compression, many more programs can fit
|
||
on a disk. This is especially true for programs containing flashy
|
||
graphics or sampled sound. Compression also reduces the loading
|
||
times from the notoriously slow 1541 drive, whether you use the
|
||
original slow serial bus routines or some kind of a disk turbo loader
|
||
routine.
|
||
|
||
Dozens of compression programs are available for Commodore 64.
|
||
I leave the work to chronicle the history of the C64 compression
|
||
programs to others and concentrate on a general overview of the
|
||
different compression algorithms. Later we'll take a closer look on
|
||
how the compression algorithms actually work and in the next article
|
||
I will introduce my own creation: pucrunch.
|
||
|
||
Pucrunch is a compression program written in ANSI-C which
|
||
generates files that automatically decompress and execute themselves
|
||
when run on a C64 (or C128 in C64-mode, VIC20, or C16/+4). It is a
|
||
cross-compressor, if you will, allowing you to do the real work on
|
||
any machine, like a cross-assembler.
|
||
|
||
Our target environment (Commodore 64 and VIC20) restricts us
|
||
somewhat when designing the 'ideal' compression system. We would
|
||
like it to be able to decompress as big a program as possible.
|
||
Therefore the decompression code must be located in low memory, be as
|
||
short as possible, and must use very small amounts of extra memory.
|
||
|
||
Another requirement is that the decompression should be
|
||
relatively fast, which means that the arithmetic used should be
|
||
mostly 8- or 9-bit which is much faster than e.g. 16-bit arithmetic.
|
||
Processor- and memory-intensive algorithms are pretty much out of the
|
||
question. A part of the decompressor efficiency depends on the
|
||
format of the compressed data. Byte-aligned codes can be accessed
|
||
very quickly; non-byte-aligned codes are much slower to handle, but
|
||
provide better compression.
|
||
|
||
This is not meant to be the end-all document for data
|
||
compression. My intention is to only scratch the surface and give
|
||
you a crude overview. Also, I'm mainly talking about lossless
|
||
compression here, although some lossy compression ideas are briefly
|
||
mentioned. A lot of compression talk is available in the world wide
|
||
web, although it may not be possible to understand everything on the
|
||
first reading. To build the knowledge, you have to read many
|
||
documents and understand _something_ from each one so that when you
|
||
return to a document, you can understand more than the previous time.
|
||
It's a lot like watching Babylon 5. :-)
|
||
|
||
Some words of warning: I try to give something interesting to
|
||
read to both advanced and not so advanced readers. It is perfectly
|
||
all right for you to skip all uninteresting details. I start with a
|
||
Huffman and LZ77 example so you can get the basic idea before
|
||
flooding you with equations, complications, and trivia.
|
||
|
||
_____________________________________________________________________
|
||
|
||
Huffman and LZ77 Example
|
||
========================
|
||
|
||
Let's say I had some simple language like "Chippish" containing
|
||
only the letters _CDISV_. How would a string like
|
||
|
||
_SIDVICIIISIDIDVI_
|
||
|
||
compress using a) Huffman encoding, and b) LZ77? How do compression
|
||
concepts such as information entropy enter into this?
|
||
|
||
A direct binary code would map the different symbols to
|
||
consequtive bit patterns, such as:
|
||
|
||
Symbol Code
|
||
'C' 000
|
||
'D' 001
|
||
'I' 010
|
||
'S' 011
|
||
'V' 100
|
||
|
||
Because there are five symbols, we need 3 bits to represent all
|
||
of the possibilities, but we also don't use all the possibilities.
|
||
Only 5 values are used out of the maximum 8 that can be represented
|
||
in 3 bits. With this code the original message takes 48 bits:
|
||
|
||
SIDVICIIISIDIDVI ==
|
||
011 010 001 100 010 000 010 010 010 011 010 001 010 001 100 010
|
||
|
||
For Huffman and for entropy calculation (entropy is explained in
|
||
the next chapter) we first need to calculate the symbol frequencies
|
||
from the message. The probability for each symbol is the frequency
|
||
of appearance divided by the message length. When we reduce the
|
||
number of bits needed to represent the probable symbols (their code
|
||
lengths) we can also reduce the average code length and thus the
|
||
number of bits we need to send.
|
||
|
||
'C' 1/16 0.0625
|
||
'D' 3/16 0.1875
|
||
'I' 8/16 0.5
|
||
'S' 2/16 0.125
|
||
'V' 2/16 0.125
|
||
|
||
The entropy gives the lower limit for a statistical compression
|
||
method's average codelength. Using the equation from the next
|
||
section, we can calculate it as 1.953. This means that however
|
||
cleverly you select a code to represent the symbols, in average you
|
||
need at least 1.953 bits per symbol. In this case you can't do
|
||
better than 32 bits, since there are a total of 16 symbols.
|
||
|
||
Next we create the Huffman tree. We first rank the symbols in
|
||
decreasing probability order and then combine two lowest-probability
|
||
symbols into a single composite symbol (C1, C2, ..). The probability
|
||
of this new symbol is therefore the sum of the two original
|
||
probabilities. The process is then repeated until a single composite
|
||
symbol remains:
|
||
|
||
Step 1 Step 2 Step 3 Step 4
|
||
'I' 0.5 'I' 0.5 'I' 0.5 C3 0.5\C4
|
||
'D' 0.1875 C1 0.1875 C2 0.3125\C3 'I' 0.5/
|
||
'S' 0.125 'D' 0.1875\C2 C1 0.1875/
|
||
'V' 0.125 \C1 'S' 0.125 /
|
||
'C' 0.0625/
|
||
|
||
Note that the composite symbols are inserted as high as
|
||
possible, to get the shortest maximum code length (compare C1 and 'D'
|
||
at Step 2).
|
||
|
||
At each step two lowest-probability nodes are combined until we
|
||
have only one symbol left. Without knowing it we have already
|
||
created a Huffman tree. Start at the final symbol (C4 in this case),
|
||
break up the composite symbol assigning 0 to the first symbol and 1
|
||
to the second one. The following tree just discards the
|
||
probabilities as we don't need them anymore.
|
||
|
||
C4
|
||
0 / \ 1
|
||
/ 'I'
|
||
C3
|
||
0 / \ 1
|
||
/ \
|
||
C2 C1
|
||
0 / \1 0/ \ 1
|
||
'D''S' 'V''C'
|
||
|
||
Symbol Code Code Length
|
||
'C' 011 3
|
||
'D' 000 3
|
||
'I' 1 1
|
||
'S' 001 3
|
||
'V' 010 3
|
||
|
||
When we follow the tree from to top to the symbol we want to
|
||
encode and remember each decision (which branch to follow), we get
|
||
the code: {'C', 'D', 'I', 'S', 'V'} = {011, 000, 1, 001, 010}. For
|
||
example when we see the symbol 'C' in the input, we output 011. If
|
||
we see 'I' in the input, we output a single 1. The code for 'I' is
|
||
very short because it occurs very often in the input.
|
||
|
||
Now we have the code lengths and can calculate the average code
|
||
length: 0.0625*3+0.1875*3+0.5*1+0.125*3+0.125*3 = 2. We did not
|
||
quite reach the lower limit that entropy gave us. Well, actually it
|
||
is not so surprising because we know that Huffman code is optimal
|
||
only if all the probabilities are negative powers of two.
|
||
|
||
Encoded, the message becomes:
|
||
|
||
SIDVICIIISIDIDVI ==
|
||
001 1 000 010 1 011 1 1 1 001 1 000 1 000 010 1
|
||
|
||
The spaces are only to make the reading easier. So, the
|
||
compressed output takes 32 bits and we need at least 10 bits to
|
||
transfer the Huffman tree by sending the code lengths (more on this
|
||
later). The message originally took 48 bits, now it takes at least
|
||
42 bits.
|
||
|
||
Huffman coding is an example of a "variable length code" with a
|
||
"defined word" input. Inputs of fixed size -- a single, three-bit
|
||
letter above -- are replaced by a variable number of bits. At the
|
||
other end of the scale are routines which break the _input_ up into
|
||
variably sized chunks, and replace those chunks with an often
|
||
fixed-length _output_. The most popular schemes of this type are
|
||
Lempel-Ziv, or LZ, codes.
|
||
|
||
Of these, LZ77 is probably the most straightforward. It tries
|
||
to replace recurring patterns in the data with a short code. The
|
||
code tells the decompressor how many symbols to copy and from where
|
||
in the output to copy them. To compress the data, LZ77 maintains a
|
||
history buffer, which contains the data that has been processed, and
|
||
tries to match the next part of the message to it. If there is no
|
||
match, the next symbol is output as-is. Otherwise an (offset,length)
|
||
-pair is output.
|
||
|
||
Output History Lookahead
|
||
SIDVICIIISIDIDVI
|
||
S S IDVICIIISIDIDVI
|
||
I SI DVICIIISIDIDVI
|
||
D SID VICIIISIDIDVI
|
||
V SIDV ICIIISIDIDVI
|
||
I SIDVI CIIISIDIDVI
|
||
C SIDVIC IIISIDIDVI
|
||
I SIDVICI IISIDIDVI
|
||
I SIDVICII ISIDIDVI
|
||
I SIDVICIII SIDIDVI
|
||
--- --- match length: 3
|
||
|----9---| match offset: 9
|
||
(9, 3) SIDVICIIISID IDVI
|
||
-- -- match length: 2
|
||
|2| match offset: 2
|
||
(2, 2) SIDVICIIISIDID VI
|
||
-- -- match length: 2
|
||
|----11----| match offset: 11
|
||
(11, 2) SIDVICIIISIDIDVI
|
||
|
||
At each stage the string in the lookahead buffer is searched
|
||
from the history buffer. The longest match is used and the distance
|
||
between the match and the current position is output, with the match
|
||
length. The processed data is then moved to the history buffer.
|
||
Note that the history buffer contains data that has already been
|
||
output. In the decompression side it corresponds to the data that
|
||
has already been decompressed. The message becomes:
|
||
|
||
S I D V I C I I I (9,3) (2,2) (11,2)
|
||
|
||
The following describes what the decompressor does with this data.
|
||
|
||
History Input
|
||
S
|
||
S I
|
||
SI D
|
||
SID V
|
||
SIDV I
|
||
SIDVI C
|
||
SIDVIC I
|
||
SIDVICI I
|
||
SIDVICII I
|
||
SIDVICIII (9,3) -> SID
|
||
|----9---|
|
||
SIDVICIIISID (2,2) -> ID
|
||
|2|
|
||
SIDVICIIISIDID (11,2) -> VI
|
||
|----11----|
|
||
SIDVICIIISIDIDVI
|
||
|
||
In the decompressor the history buffer contains the data that
|
||
has already been decompressed. If we get a literal symbol code, it
|
||
is added as-is. If we get an (offset,length) pair, the offset tells
|
||
us from where to copy and the length tells us how many symbols to
|
||
copy to the current output position. For example (9,3) tells us to
|
||
go back 9 locations and copy 3 symbols to the current output
|
||
position. The great thing is that we don't need to transfer or
|
||
maintain any other data structure than the data itself.
|
||
|
||
Compare this to the BASIC interpreter, where all tokens have the
|
||
high bit set and all normal characters don't (PETSCII codes 0-127).
|
||
So when the LIST routine sees a normal character it just prints it
|
||
as-is, but when it hits a special character (PETSCII >= 128) it looks
|
||
up the corresponding keyword in a table. LZ77 is similar, but an
|
||
LZ77 LIST would look up the keyword in the data already LISTed to the
|
||
screen! LZ78 uses a separate table which is expanded as the data is
|
||
processed.
|
||
|
||
The number of bits needed to encode the message (>52 bits) is
|
||
somewhat bigger than the Huffman code used (42 bits). This is mainly
|
||
because the message is too short for LZ77. It takes quite a long
|
||
time to build up a good enough dictionary (the history buffer).
|
||
|
||
_____________________________________________________________________
|
||
|
||
Introduction to Information Theory
|
||
==================================
|
||
|
||
Symbol Sources
|
||
--------------
|
||
Information theory traditionally deals with symbol sources
|
||
that have certain properties. One important property is that they
|
||
give out symbols that belong to a finite, predefined alphabet A.
|
||
An alphabet can consist of for example all upper-case characters
|
||
(A = {'A','B','C',..'Z',..}), all byte values (A = {0,1,..255}) or
|
||
both binary digits (A = {0,1}).
|
||
|
||
As we are dealing with file compression, the symbol source is a
|
||
file and the symbols (characters) are byte values from 0 to 255. A
|
||
string or a phrase is a concatenation of symbols, for example 011101,
|
||
"AAACB". Quite intuitive, right?
|
||
|
||
When reading symbols from a symbol source, there is some
|
||
probability for each of the symbols to appear. For totally random
|
||
sources each symbol is equally likely, but random sources are also
|
||
incompressible, and we are not interested in them here. Equal
|
||
probabilities or not, probabilities give us a means of defining the
|
||
concept of symbol self-information, i.e. the amount of information a
|
||
symbol carries.
|
||
|
||
Simply, the more probable an event is, the less bits of
|
||
information it contains. If we denote the probability of a symbol
|
||
A[i] occurring as p(A[i]), the expression -log2(p(A[i])) (base-2
|
||
logarithm) gives the amount of information in bits that the source
|
||
symbol A[i] carries. You can calculate base-2 logarithms using
|
||
base-10 or natural logarithms if you remember that log2(n) = log(n)/log(2).
|
||
|
||
A real-world example is a comparison between the statements:
|
||
1. it is raining
|
||
2. the moon of earth has exploded.
|
||
|
||
The first case happens every once in a while (assuming we are
|
||
not living in a desert area). Its probability may change around the
|
||
world, but may be something like 0.3 during bleak autumn days. You
|
||
would not be very surprised to hear that it is raining outside. It
|
||
is not so for the second case. The second case would be big news, as
|
||
it has never before happened, as far as we know. Although it seems
|
||
very unlikely we could decide a very small probability for it, like
|
||
1E-30. The equation gives the self-information for the first case as
|
||
1.74 bits, and 99.7 bits for the second case.
|
||
|
||
|
||
Message Entropy
|
||
---------------
|
||
So, the more probable a symbol is, the less information it
|
||
carries. What about the whole message, i.e. the symbols read from
|
||
the input stream?
|
||
|
||
What is the information contents a specific message carries?
|
||
This brings us to another concept: the entropy of a source. The
|
||
measure of entropy gives us the amount of information in a message
|
||
and is calculated like this: H = sum{ -p(A[i])*log2(p(A[i])) }. For
|
||
completeness we note that 0*log2(0) gives the result 0 although
|
||
log2(0) is not defined in itself. In essence, we multiply the
|
||
information a symbol carries by the probability of the symbol and
|
||
then sum all multiplication results for all symbols together.
|
||
|
||
The entropy of a message is a convenient measure of information,
|
||
because it sets the lower limit for the average codeword length for a
|
||
block-variable code, for example Huffman code. You can not get
|
||
better compression with a statistical compression method which only
|
||
considers single-symbol probabilities. The average codeword length
|
||
is calculated in an analogous way to the entropy. Average code
|
||
length is L = sum{-l(i)*log2(p(A[i])) }, where l(i) is the codeword
|
||
length for the ith symbol in the alphabet. The difference between L
|
||
and H gives an indication about the efficiency of a code. Smaller
|
||
difference means more efficient code.
|
||
|
||
It is no coincidence that the entropy and average code length
|
||
are calculated using very similar equations. If the symbol
|
||
probabilities are not equal, we can get a shorter overall message,
|
||
i.e. shorter _average_ codeword length (i.e. compression), if we
|
||
assign shorter codes for symbols that are more likely to occur. Note
|
||
that entropy is only the lower limit for statistical compression
|
||
systems. Other methods may perform better, although not for all
|
||
files.
|
||
|
||
|
||
Codes
|
||
-----
|
||
A code is any mapping from an input alphabet to an output
|
||
alphabet. A code can be e.g. {a, b, c} = {0, 1, 00}, but this code
|
||
is obviously not uniquely decodable. If the decoder gets a code
|
||
message of two zeros, there is no way it can know whether the
|
||
original message had two a's or a c.
|
||
|
||
A code is _instantaneous_ if each codeword (a code symbol as
|
||
opposed to source symbol) in a message can be decoded as soon as it
|
||
is received. The binary code {a, b} = {0, 01} is uniquely decodable,
|
||
but it isn't instantaneous. You need to peek into the future to see
|
||
if the next bit is 1. If it is, b is decoded, if not, a is decoded.
|
||
The binary code {a, b, c} = {0, 10, 11} on the other hand is an
|
||
instantaneous code.
|
||
|
||
A code is a _prefix code_ if and only if no codeword is a prefix
|
||
of another codeword. A code is instantaneous if and only if it is a
|
||
prefix code, so a prefix code is always a uniquely decodable
|
||
instantaneous code. We only deal with prefix codes from now on. It
|
||
can be proven that all uniquely decodable codes can be changed into
|
||
prefix codes of equal code lengths.
|
||
|
||
_____________________________________________________________________
|
||
|
||
'Classic' Code Classification
|
||
|
||
Compression algorithms can be crudely divided into four groups:
|
||
1. Block-to-block codes
|
||
2. Block-to-variable codes
|
||
3. Variable-to-block codes
|
||
4. Variable-to-variable codes
|
||
|
||
|
||
Block-to-block codes
|
||
--------------------
|
||
These codes take a specific number of bits at a time from the
|
||
input and emit a specific number of bits as a result. If all of the
|
||
symbols in the input alphabet (in the case of bytes, all values from
|
||
0 to 255) are used, the output alphabet must be the same size as the
|
||
input alphabet, i.e. uses the same number of bits. Otherwise it
|
||
could not represent all arbitrary messages.
|
||
|
||
Obviously, this kind of code does not give any compression, but
|
||
it allows a transformation to be performed on the data, which may
|
||
make the data more easily compressible, or which separates the
|
||
'essential' information for lossy compression. For example the
|
||
discrete cosine transform (DCT) belongs to this group. It doesn't
|
||
really compress anything, as it takes in a matrix of values and
|
||
produces a matrix of equal size as output, but the resulting values
|
||
hold the information in a more compact form.
|
||
|
||
In lossless audio compression the transform could be something
|
||
along the lines of delta encoding, i.e. the difference between
|
||
successive samples (there is usually high correlation between
|
||
successive samples in audio data), or something more advanced like
|
||
Nth order prediction. Only the prediction error is transmitted. In
|
||
lossy compression the prediction error may be transmitted in reduced
|
||
precision. The reproduction in the decompression won't then be
|
||
exact, but the number of bits needed to transmit the prediction error
|
||
may be much smaller.
|
||
|
||
One block-to-block code relevant to Commodore 64, VIC 20 and
|
||
their relatives is nybble packing that is performed by some C64
|
||
compression programs. As nybbles by definition only occupy 4 bits of
|
||
a byte, we can fit two nybbles into each byte without throwing any
|
||
data away, thus getting 50% compression from the original which used
|
||
a whole byte for every nybble. Although this compression ratio may
|
||
seem very good, in reality very little is gained globally. First,
|
||
only very small parts of actual files contain nybble-width data.
|
||
Secondly, better methods exist that also take advantage of the
|
||
patterns in the data.
|
||
|
||
|
||
Block-to-variable codes
|
||
-----------------------
|
||
Block-to-variable codes use a variable number of output bits for
|
||
each input symbol. All statistical data compression systems, such as
|
||
symbol ranking, Huffman coding, Shannon-Fano coding, and arithmetic
|
||
coding belong to this group (these are explained in more detail
|
||
later). The idea is to assign shorter codes for symbols that occur
|
||
often, and longer codes for symbols that occur rarely. This provides
|
||
a reduction in the average code length, and thus compression.
|
||
|
||
There are three types of statistical codes: fixed, static, and
|
||
adaptive. Static codes need two passes over the input message.
|
||
During the first pass they gather statistics of the message so that
|
||
they know the probabilities of the source symbols. During the second
|
||
pass they perform the actual encoding. Adaptive codes do not need
|
||
the first pass. They update the statistics while encoding the data.
|
||
The same updating of statistics is done in the decoder so that they
|
||
keep in sync, making the code uniquely decodable. Fixed codes are
|
||
'static' static codes. They use a preset statistical model, and the
|
||
statistics of the actual message has no effect on the encoding. You
|
||
just have to hope (or make certain) that the message statistics are
|
||
close to the one the code assumes.
|
||
|
||
However, 0-order statistical compression (and entropy) don't
|
||
take advantage of inter-symbol relations. They assume symbols are
|
||
disconnected variables, but in reality there is considerable relation
|
||
between successive symbols. If I would drop every third character
|
||
from this text, you would probably be able to decipher it quite well.
|
||
First order statistical compression uses the previous character to
|
||
predict the next one. Second order compression uses two previous
|
||
characters, and so on. The more characters are used to predict the
|
||
next character the better estimate of the probability distribution
|
||
for the next character. But more is not only better, there are also
|
||
prices to pay.
|
||
|
||
The first drawback is the amount of memory needed to store the
|
||
probability tables. The frequencies for each character encountered
|
||
must be accounted for. And you need one table for each 'previous
|
||
character' value. If we are using an adaptive code, the second
|
||
drawback is the time needed to update the tables and then update the
|
||
encoding accordingly. In the case of Huffman encoding the Huffman
|
||
tree needs to be recreated. And the encoding and decoding itself
|
||
certainly takes time also.
|
||
|
||
We can keep the memory usage and processing demands tolerable by
|
||
using a 0-order static Huffman code. Still, the Huffman tree takes
|
||
up precious memory and decoding Huffman code on a 1-MHz 8-bit
|
||
processor is slow and does not offer very good compression either.
|
||
Still, statistical compression can still offer savings as a part of a
|
||
hybrid compression system.
|
||
|
||
For example:
|
||
'A' 1/2 0
|
||
'B' 1/4 10
|
||
'C' 1/8 110
|
||
'D' 1/8 111
|
||
|
||
"BACADBAABAADABCA" total: 32 bits
|
||
10 0 110 0 111 10 0 0 10 0 0 111 0 10 110 0 total: 28 bits
|
||
|
||
This is an example of a simple statistical compression. The
|
||
original symbols each take two bits to represent (4 possibilities),
|
||
thus the whole string takes 32 bits. The variable-length code
|
||
assigns the shortest code to the most probable symbol (A) and it
|
||
takes 28 bits to represent the same string. The spaces between
|
||
symbols are only there for clarity. The decoder still knows where
|
||
each symbol ends because the code is a prefix code.
|
||
|
||
On the other hand, I am simplifying things a bit here, because
|
||
I'm omitting one vital piece of information: the length of the
|
||
message. The file system normally stores the information about the
|
||
end of file by storing the length of the file. The decoder also
|
||
needs this information. We have two basic methods: reserve one
|
||
symbol to represent the end of file condition or send the length of
|
||
the original file. Both have their virtues.
|
||
|
||
The best compressors available today take into account
|
||
intersymbol probabilities. Dynamic Markov Coding (DMC) starts with a
|
||
zero-order Markov model and gradually extends this initial model as
|
||
compression progresses. Prediction by Partial Matching (PPM),
|
||
although it really is a variable-to-block code, looks for a match of
|
||
the text to be compressed in an order-n context and if there is no
|
||
match drops back to an order n-1 context until it reaches order 0.
|
||
|
||
|
||
Variable-to-block codes
|
||
-----------------------
|
||
The previous compression methods handled a specific number of
|
||
bits at a time. A group of bits were read from the input stream and
|
||
some bits were written to the output. Variable-to-block codes behave
|
||
just the opposite. They use a fixed-length output code to represent
|
||
a variable-length part of the input. Variable-to-block codes are
|
||
also called free-parse methods, because there is no pre-defined way
|
||
to divide the input message into encodable parts (i.e. strings that
|
||
will be replaced by shorter codes). Substitutional compressors
|
||
belong to this group.
|
||
|
||
Substitutional compressors work by trying to replace strings in
|
||
the input data with shorter codes. Lempel-Ziv methods (named after
|
||
the inventors) contain two main groups: LZ77 and LZ78.
|
||
|
||
Lempel-Ziv 1977
|
||
+++++++++++++++
|
||
In 1977 Ziv and Lempel proposed a lossless compression method
|
||
which replaces phrases in the data stream by a reference to a
|
||
previous occurrance of the phrase. As long as it takes fewer bits to
|
||
represent the reference and the phrase length than the phrase itself,
|
||
we get compression. Kind-of like the way BASIC substitutes tokens
|
||
for keywords.
|
||
|
||
LZ77-type compressors use a history buffer, which contains a
|
||
fixed amount of symbols output/seen so far. The compressor reads
|
||
symbols from the input to a lookahead buffer and tries to find as
|
||
long as possible match from the history buffer. The length of the
|
||
string match and the location in the buffer (offset from the current
|
||
position) is written to the output. If there is no suitable match,
|
||
the next input symbol is sent as a literal symbol.
|
||
|
||
Of course there must be a way to identify literal bytes and
|
||
compressed data in the output. There are lot of different ways to
|
||
accomplish this, but a single bit to select between a literal and
|
||
compressed data is the easiest.
|
||
|
||
The basic scheme is a variable-to-block code. A variable-length
|
||
piece of the message is represented by a constant amount of bits:
|
||
the match length and the match offset. Because the data in the
|
||
history buffer is known to both the compressor and decompressor, it
|
||
can be used in the compression. The decompressor simply copies part
|
||
of the already decompressed data or a literal byte to the current
|
||
output position.
|
||
|
||
Variants of LZ77 apply additional compression to the output of
|
||
the compressor, which include a simple variable-length code (LZB),
|
||
dynamic Huffman coding (LZH), and Shannon-Fano coding (ZIP 1.x)), all
|
||
of which result in a certain degree of improvement over the basic
|
||
scheme. This is because the output values from the first stage are
|
||
not evenly distributed, i.e. their probabilities are not equal and
|
||
statistical compression can do its part.
|
||
|
||
|
||
Lempel-Ziv 1978
|
||
+++++++++++++++
|
||
One large problem with the LZ77 method is that it does not use
|
||
the coding space efficiently, i.e. there are length and offset
|
||
values that never get used. If the history buffer contains multiple
|
||
copies of a string, only the latest occurrance is needed, but they
|
||
all take space in the offset value space. Each duplicate string
|
||
wastes one offset value.
|
||
|
||
To get higher efficiency, we have to create a real dictionary.
|
||
Strings are added to the codebook only once. There are no duplicates
|
||
that waste bits just because they exist. Also, each entry in the
|
||
codebook will have a specific length, thus only an index to the
|
||
codebook is needed to specify a string (phrase). In LZ77 the length
|
||
and offset values were handled more or less as disconnected variables
|
||
although there is correlation. Because they are now handled as one
|
||
entity, we can expect to do a little better in that regard also.
|
||
|
||
LZ78-type compressors use this kind of a dictionary. The next
|
||
part of the message (the lookahead buffer contents) is searched from
|
||
the dictionary and the maximum-length match is returned. The output
|
||
code is an index to the dictionary. If there is no suitable entry in
|
||
the dictionary, the next input symbol is sent as a literal symbol.
|
||
The dictionary is updated after each symbol is encoded, so that it is
|
||
possible to build an identical dictionary in the decompression code
|
||
without sending additional data.
|
||
|
||
Essentially, strings that we have seen in the data are added to
|
||
the dictionary. To be able to constantly adapt to the message
|
||
statistics, the dictionary must be trimmed down by discarding the
|
||
oldest entries. This also prevents the dictionary from becoming
|
||
full, which would decrease the compression ratio. This is handled
|
||
automatically in LZ77 by its use of a history buffer (a sliding
|
||
window). For LZ78 it must be implemented separately. Because the
|
||
decompression code updates its dictionary in sychronization with the
|
||
compressor the code remains uniquely decodable.
|
||
|
||
|
||
Run-Length Encoding
|
||
+++++++++++++++++++
|
||
Run length encoding also belongs to this group. If there are
|
||
consecutive equal valued symbols in the input, the compressor outputs
|
||
how many of them there are, and their value. Again, we must be able
|
||
to identify literal bytes and compressed data. One of the RLE
|
||
compressors I have seen outputs two equal symbols to indentify a run
|
||
of symbols. The next byte(s) then tell how many more of these to
|
||
output. If the value is 0, there are only two consecutive equal
|
||
symbols in the original stream. Depending on how many bits are used
|
||
to represent the value, this is the only case when the output is
|
||
expanded.
|
||
|
||
Run-length encoding has been used since day one in C64
|
||
compression programs because it is very fast and very simple. Part
|
||
of this is because it deals with byte-aligned data and is essentially
|
||
just copying bytes from one place to another. The drawback is that
|
||
RLE can only compress identical bytes into a shorter representation.
|
||
On the C64, only graphics and music data contain large runs of
|
||
identical bytes. Program code rarely contains more than a couple of
|
||
successive identical bytes. We need something better.
|
||
|
||
That "something better" seems to be LZ77, which has been used in
|
||
C64 compression programs for a long time. LZ77 can take advantage of
|
||
repeating code/graphic/music data fragments and thus achieves better
|
||
compression. The drawback is that practical LZ77 implementations
|
||
tend to became variable-to-variable codes (more on that later) and
|
||
need to handle data bit by bit, which is quite a lot slower than
|
||
handling bytes.
|
||
|
||
LZ78 is not practical for C64, because the decompressor needs to
|
||
create and update the dictionary. A big enough dictionary would take
|
||
too much memory and updating the dictionary would need its share of
|
||
processor cycles.
|
||
|
||
|
||
Variable-to-variable codes
|
||
--------------------------
|
||
The compression algorithms in this category are mostly hybrids
|
||
or concatenations of the previously described compressors. For
|
||
example a variable-to-block code such as LZ77 followed by a
|
||
statistical compressor like Huffman encoding falls into this category
|
||
and is used in Zip, LHa, Gzip and many more. They use fixed, static,
|
||
and adaptive statistical compression, depending on the program and
|
||
the compression level selected.
|
||
|
||
Randomly concatenating algorithms rarely produces good results,
|
||
so you have to know what you are doing and what kind of files you are
|
||
compressing. Whenever a novice asks the usual question: 'What
|
||
compression program should I use?', they get the appropriate
|
||
response: 'What kind of data you are compressing?'
|
||
|
||
Borrowed from Tom Lane's article in comp.compression:
|
||
It's hardly ever worthwhile to take the compressed output of one
|
||
compression method and shove it through another compression method.
|
||
Especially not if the second method is a general-purpose compressor
|
||
that doesn't have specific knowledge of the first compression step.
|
||
Compression is effective in direct proportion to the extent that it
|
||
eliminates obvious patterns in the data. So if the first compression
|
||
step is any good, it will leave little traction for the second step.
|
||
Combining multiple compression methods is only helpful when the
|
||
methods are specifically chosen to be complementary.
|
||
|
||
A small sidetrack I want to take:
|
||
This also brings us conveniently to another truth in lossless
|
||
compression. There isn't a single compressor which would be able to
|
||
losslessly compress all possible files (you can see the
|
||
comp.compression FAQ for information about the counting proof). It
|
||
is our luck that we are not interested in compressing all files. We
|
||
are only interested in compressing a very small subset of all files.
|
||
The more accurately we can describe the files we would encounter, the
|
||
better. This is called modelling, and it is what all compression
|
||
programs do and must do to be successful.
|
||
|
||
Audio and graphics compression algorithm may assume a continuous
|
||
signal, and a text compressor may assume that there are repeated
|
||
strings in the data. If the data does not match the assumptions (the
|
||
model), the algorithm usually expands the data instead of compressing
|
||
it.
|
||
|
||
_____________________________________________________________________
|
||
|
||
Representing Integers
|
||
|
||
Many compression algorithms use integer values for something or
|
||
another. Pucrunch is no exception as it needs to represent RLE
|
||
repeat counts and LZ77 string match lengths and offsets. Any
|
||
algorithm that needs to represent integer values can benefit very
|
||
much if we manage to reduce the number of bits needed to do that.
|
||
This is why efficient coding of these integers is very important.
|
||
What encoding method to select depends on the distribution and the
|
||
range of the values.
|
||
|
||
|
||
Fixed, Linear
|
||
-------------
|
||
If the values are evenly distributed throughout the whole range,
|
||
a direct binary representation is the optimal choice. The number of
|
||
bits needed of course depends on the range. If the range is not a
|
||
power of two, some tweaking can be done to the code to get nearer the
|
||
theoretical optimum log2(_range_) bits per value.
|
||
|
||
Value Binary Adjusted 1&2
|
||
---------------------------
|
||
0 000 00 000 H = 2.585
|
||
1 001 01 001 L = 2.666
|
||
2 010 100 010 (for flat distribution)
|
||
3 011 101 011
|
||
4 100 110 10
|
||
5 101 111 11
|
||
|
||
The previous table shows two different versions of how the
|
||
adjustment could be done for a code that has to represent 6 different
|
||
values with the minimum average number of bits. As can be seen, they
|
||
are still both prefix codes, i.e. it's possible to (easily) decode
|
||
them.
|
||
|
||
If there is no definite upper limit to the integer value, direct
|
||
binary code can't be used and one of the following codes must be
|
||
selected.
|
||
|
||
|
||
Elias Gamma Code
|
||
----------------
|
||
The Elias gamma code assumes that smaller integer values are
|
||
more probable. In fact it assumes (or benefits from) a
|
||
proportionally decreasing distribution. Values that use n bits
|
||
should be twice as probable as values that use n+1 bits.
|
||
|
||
In this code the number of zero-bits before the first one-bit (a
|
||
unary code) defines how many more bits to get. The code may be
|
||
considered a special fixed Huffman tree. You can generate a Huffman
|
||
tree from the assumed value distribution and you'll get a very
|
||
similar code. The code is also directly decodable without any tables
|
||
or difficult operations, because once the first one-bit is found, the
|
||
length of the code word is instantly known. The bits following the
|
||
zero bits (if any) are directly the encoded value.
|
||
|
||
Gamma Code Integer Bits
|
||
--------------------------
|
||
1 1 1
|
||
01x 2-3 3
|
||
001xx 4-7 5
|
||
0001xxx 8-15 7
|
||
00001xxxx 16-31 9
|
||
000001xxxxx 32-63 11
|
||
0000001xxxxxx 64-127 13
|
||
...
|
||
|
||
|
||
Elias Delta Code
|
||
----------------
|
||
The Elias Delta Code is an extension of the gamma code. This
|
||
code assumes a little more 'traditional' value distribution. The
|
||
first part of the code is a gamma code, which tells how many more
|
||
bits to get (one less than the gamma code value).
|
||
|
||
Delta Code Integer Bits
|
||
--------------------------
|
||
1 1 1
|
||
010x 2-3 4
|
||
011xx 4-7 5
|
||
00100xxx 8-15 8
|
||
00101xxxx 16-31 9
|
||
00110xxxxx 32-63 10
|
||
00111xxxxxx 64-127 11
|
||
...
|
||
|
||
The delta code is better than gamma code for big values, as it
|
||
is asymptotically optimal (the expected codeword length approaches
|
||
constant times entropy when entropy approaches infinity), which the
|
||
gamma code is not. What this means is that the extra bits needed to
|
||
indicate where the code ends become smaller and smaller proportion of
|
||
the total bits as we encode bigger and bigger numbers. The gamma
|
||
code is better for greatly skewed value distributions (a lot of small
|
||
values).
|
||
|
||
|
||
Fibonacci Code
|
||
--------------
|
||
The fibonacci code is another variable length code where smaller
|
||
integers get shorter codes. The code ends with two one-bits, and the
|
||
value is the sum of the corresponding Fibonacci values for the bits
|
||
that are set (except the last one-bit, which ends the code).
|
||
|
||
1 2 3 5 8 13 21 34 55 89
|
||
----------------------------
|
||
1 (1) = 1
|
||
0 1 (1) = 2
|
||
0 0 1 (1) = 3
|
||
1 0 1 (1) = 4
|
||
0 0 0 1 (1) = 5
|
||
1 0 0 1 (1) = 6
|
||
0 1 0 1 (1) = 7
|
||
0 0 0 0 1 (1) = 8
|
||
: : : : : : :
|
||
1 0 1 0 1 (1) = 12
|
||
0 0 0 0 0 1 (1) = 13
|
||
: : : : : : : :
|
||
0 1 0 1 0 1 (1) = 20
|
||
0 0 0 0 0 0 1 (1) = 21
|
||
: : : : : : : : :
|
||
1 0 0 1 0 0 1 (1) = 27
|
||
|
||
Note that because the code does not have two successive one-bits
|
||
until the end mark, the code density may seem quite poor compared to
|
||
the other codes, and it is, if most of the values are small (1-3).
|
||
On the other hand, it also makes the code very robust by localizing
|
||
and containing possible errors. Although, if the Fibonacci code is
|
||
used as a part of a larger system, this robustness may not help much,
|
||
because we lose the synchronization in the upper level anyway. Most
|
||
adaptive methods can't recover from any errors, whether they are
|
||
detected or not. Even in LZ77 the errors can be inherited infinitely
|
||
far into the future.
|
||
|
||
|
||
Comparison between delta, gamma and Fibonacci code lengths
|
||
----------------------------------------------------------
|
||
|
||
Gamma Delta Fibonacci
|
||
1 1 1 2.0
|
||
2-3 3 4 3.5
|
||
4-7 5 5 4.8
|
||
8-15 7 8 6.4
|
||
16-31 9 9 7.9
|
||
32-63 11 10 9.2
|
||
64-127 13 11 10.6
|
||
|
||
The comparison shows that if even half of the values are in the
|
||
range 1..7 (and other values relatively near this range), the Elias
|
||
gamma code wins by a handsome margin.
|
||
|
||
|
||
Golomb and Rice Codes
|
||
---------------------
|
||
Golomb (and Rice) codes are prefix codes that are suboptimal
|
||
(compared to Huffman), but very easy to implement. Golomb codes are
|
||
distinguished from each other by a single parameter m. This makes it
|
||
very easy to adjust the code dynamically to adapt to changes in the
|
||
values to encode.
|
||
|
||
Golomb m=1 m=2 m=3 m=4 m=5 m=6
|
||
Rice k=0 k=1 k=2
|
||
---------------------------------------------------
|
||
n = 0 0 00 00 000 000 000
|
||
1 10 01 010 001 001 001
|
||
2 110 100 011 010 010 0100
|
||
3 1110 101 100 011 0110 0101
|
||
4 11110 1100 1010 1000 0111 0110
|
||
5 111110 1101 1011 1001 1000 0111
|
||
6 1111110 11100 1100 1010 1001 1000
|
||
7 : 11101 11010 1011 1010 1001
|
||
8 : 111100 11011 11000 10110 10100
|
||
|
||
To encode an integer n (starting from 0 this time, not from 1 as
|
||
for Elias codes and Fibonacci code) using the Golomb code with
|
||
parameter m, we first compute floor( n/m ) and output this using a
|
||
unary code. Then we compute the remainder n mod m and output that
|
||
value using a binary code which is adjusted so that we sometimes use
|
||
floor( log2(m) ) bits and sometimes ceil( log2(m) ) bits.
|
||
|
||
Rice coding is the same as Golomb coding except that only a
|
||
subset of parameters can be used, namely the powers of 2. In other
|
||
words, a Rice code with the parameter k is equal to Golomb code with
|
||
parameter m = 2^k. Because of this the Rice codes are much more
|
||
efficient to implement on a computer. Division becomes a shift
|
||
operation and modulo becomes a bit mask operation.
|
||
|
||
|
||
Hybrid/Mixed Codes
|
||
------------------
|
||
Sometimes it may be advantageous to use a code that combines two
|
||
or more of these codes. In a way the Elias codes are already hybrid
|
||
codes. The gamma code has a fixed huffman tree (a unary code) and a
|
||
binary code part, the delta code has a gamma code and a binary code
|
||
part. The same applies to Golomb and Rice codes because they consist
|
||
of a unary code part and a linear code (adjusted) part.
|
||
|
||
|
||
So now we have several alternatives to choose from. We simply
|
||
have to do a little real-life research to determine how the values we
|
||
want to encode are distributed so that we can select the optimum code
|
||
to represent them.
|
||
|
||
Of course we still have to keep in mind that we intend to decode
|
||
the thing with a 1-MHz 8-bit processor. As always, compromises loom
|
||
on the horizon. Pucrunch uses Elias Gamma Code, because it is the
|
||
best alternative for that task and is very close to static Huffman
|
||
code. The best part is that the Gamma Code is much simpler to decode
|
||
and doesn't need additional memory.
|
||
|
||
_____________________________________________________________________
|
||
|
||
Closer Look
|
||
|
||
Because the decompression routines are usually much easier to
|
||
understand than the corresponding compression routines, I will
|
||
primarily describe only them here. This also ensures that there
|
||
really _is_ a decompressor for a compression algorithm. Many are
|
||
those people who have developed a great new compression algorithm
|
||
that outperforms all existing versions, only to later discover that
|
||
their algorithm doesn't save enough information to be able to recover
|
||
the original file from the compressed data.
|
||
|
||
Also, the added bonus is that once we have a decompressor, we
|
||
can improve the compressor without changing the file format. At
|
||
least until we have some statistics to develop a better system. Many
|
||
lossy video and audio compression systems only document and
|
||
standardize the decompressor and the file or stream format, making it
|
||
possible to improve the encoding part of the process when faster and
|
||
better hardware (or algorithms) become available.
|
||
|
||
|
||
RLE
|
||
---
|
||
void DecompressRLE() {
|
||
int oldChar = -1;
|
||
int newChar;
|
||
|
||
while(1) {
|
||
newChar = GetByte();
|
||
if(newChar == EOF)
|
||
return;
|
||
PutByte(newChar);
|
||
if(newChar == oldChar) {
|
||
int len = GetLength();
|
||
|
||
while(len > 0) {
|
||
PutByte(newChar);
|
||
len = len - 1;
|
||
}
|
||
}
|
||
oldChar = newChar;
|
||
}
|
||
}
|
||
|
||
This RLE algorithm uses two successive equal characters to mark
|
||
a run of bytes. I have in purpose left open the question of how the
|
||
length is encoded (1 or 2 bytes or variable-length code). The
|
||
decompressor also allows chaining/extension of RLE runs, for example
|
||
'a', 'a', 255, 'a', 255 would output 513 'a'-characters.
|
||
In this case the compression algorithm is almost as simple.
|
||
|
||
void CompressRLE() {
|
||
int oldChar = -1;
|
||
int newChar;
|
||
|
||
while(1) {
|
||
newChar = GetByte();
|
||
if(newChar==oldChar) {
|
||
int length = 0;
|
||
|
||
if(newChar == EOF)
|
||
return;
|
||
PutByte(newChar); /* RLE indicator */
|
||
|
||
/* Get all equal characters */
|
||
while((newChar = GetByte()) == oldChar) {
|
||
length++;
|
||
}
|
||
PutLength(length);
|
||
}
|
||
if(newChar == EOF)
|
||
return;
|
||
PutByte(newChar);
|
||
oldChar = newChar;
|
||
}
|
||
}
|
||
|
||
If there are two equal bytes, the compression algorithm reads
|
||
more bytes until it gets a different byte. If there was only two
|
||
equal bytes, the length value will be zero and the compression
|
||
algorithm expands the data. A C64-related example would be the
|
||
compression of the BASIC ROM with this RLE algorithm. Or actually
|
||
expansion, as the new file size is 8200 bytes instead of the original
|
||
8192 bytes. Those equal byte runs that the algorithm needs just
|
||
aren't there. For comparison, pucrunch manages to compress the BASIC
|
||
ROM into 7288 bytes, the decompression code included. Even Huffman
|
||
coding manages to compress it into 7684 bytes.
|
||
|
||
"BAAAAAADBBABBBBBAAADABCD" total: 24*8=192 bits
|
||
"BAA",4,"DBB",0,"ABB",3,"AA",1,"DABCD" total: 16*8+4*8=160 bits
|
||
|
||
This is an example of how the presented RLE encoder would work
|
||
on a string. The total length calculations assume that we are
|
||
handling 8-bit data, although only values from 'A' to 'D' are present
|
||
in the string. After seeing two equal characters the decoder gets a
|
||
repeat count and then adds that many more of them. Notice that the
|
||
repeat count is zero if there are only two equal characters.
|
||
|
||
|
||
Huffman Code
|
||
------------
|
||
|
||
int GetHuffman() {
|
||
int index = 0;
|
||
|
||
while(1) {
|
||
if(GetBit() == 1) {
|
||
index = LeftNode(index);
|
||
} else {
|
||
index = RightNode(index);
|
||
}
|
||
if(LeafNode(index)) {
|
||
return LeafValue(index);
|
||
}
|
||
}
|
||
}
|
||
|
||
My pseudo code of the Huffman decode function is a very
|
||
simplified one, so I should probably describe how the Huffman code
|
||
and the corresponding binary tree is constructed first.
|
||
|
||
First we need the statistics for all the symbols occurring in
|
||
the message, i.e. the file we are compressing. Then we rank them in
|
||
decreasing probability order. Then we combine the smallest two
|
||
probabilities and assign 0 and 1 to the binary tree branches, i.e.
|
||
the original symbols. We do this until there is only one composite
|
||
symbol left.
|
||
|
||
Depending on where we insert the composite symbols we get
|
||
different Huffman trees. The average code length is equal in both
|
||
cases (and so is the compression ratio), but the length of the
|
||
longest code changes. The implementation of the decoder is usually
|
||
more efficient if we keep the longest code as short as possible.
|
||
This is achieved by inserting the composite symbols (new nodes)
|
||
before all symbols/nodes that have equal probability.
|
||
|
||
"BAAAAAADBBABBBBBAAADABCD"
|
||
A (11) B (9) D (3) C (1)
|
||
|
||
Step 1 Step 2 Step 3
|
||
'A' 0.458 'A' 0.458 C2 0.542 0\ C3
|
||
'B' 0.375 'B' 0.375 0\ C2 'A' 0.458 1/
|
||
'D' 0.125 0\ C1 C1 0.167 1/
|
||
'C' 0.042 1/
|
||
|
||
C3
|
||
0 / \ 1
|
||
/ 'A'
|
||
C2
|
||
0 / \ 1
|
||
'B' \
|
||
C1
|
||
0 / \ 1
|
||
'D' 'C'
|
||
|
||
So, in each step we combine two lowest-probability nodes or
|
||
leaves into a new node. When we are done, we have a Huffman tree
|
||
containing all the original symbols. The Huffman codes for the
|
||
symbols can now be gotten by starting at the root of the tree and
|
||
collecting the 0/1-bits on the way to the desired leaf (symbol). We
|
||
get:
|
||
|
||
'A' = 1 'B' = 00 'C' = 011 'D' = 010
|
||
|
||
These codes (or the binary tree) are used when encoding the
|
||
file, but the decoder also needs this information. Sending the
|
||
binary tree or the codes would take a lot of bytes, thus taking away
|
||
all or most of the compression. The amount of data needed to
|
||
transfer the tree can be greatly reduced by sending just the symbols
|
||
and their code lengths. If the tree is traversed in a canonical
|
||
(predefined) order, this is all that is needed to recreate the tree
|
||
and the Huffman codes. By doing a 0-branch-first traverse we get:
|
||
|
||
Symbol Code Code Length
|
||
'B' 00 2
|
||
'D' 010 3
|
||
'C' 011 3
|
||
'A' 1 1
|
||
|
||
So we can just send 'B', 2, 'D', 3, 'C', 3, 'A', 1 and the
|
||
decoder has enough information (when it also knows how we went
|
||
through the tree) to recreate the Huffman codes and the tree.
|
||
Actually you can even drop the symbol values if you handle things a
|
||
bit differently (see the Deflate specification in RFC1951), but my
|
||
arrangement makes the algorithm much simpler and doesn't need to
|
||
transfer data for symbols that are not present in the message.
|
||
|
||
Basically we start with a code value of all zeros and the
|
||
appropriate length for the first symbol. For other symbols we first
|
||
add the code value with 1 and then shift the value left or right to
|
||
get it to be the right size. In the example we first assign 00 to
|
||
'B', then add one to get 01, shift left to get a 3-bit codeword for
|
||
'D' making it 010 like it should. For 'C' add 1, you get 011, no
|
||
shift because the codewords is the right size already. And for 'A'
|
||
add one and get 100, shift 2 places to right and get 1.
|
||
|
||
The Deflate algorithm in essence attaches a counting sort
|
||
algorithm to this algorithm, feeding in the symbols in increasing
|
||
code length order. Oh, don't worry if you don't understand what the
|
||
counting sort has to do with this. I just wanted to give you some
|
||
idea about it if you some day read the deflate specification or the
|
||
gzip source code.
|
||
|
||
Actually, the decoder doesn't necessarily need to know the
|
||
Huffman codes at all, as long as it has created the proper internal
|
||
representation of the Huffman tree. I developed a special table
|
||
format which I used in the C64 Huffman decode function and may
|
||
present it in a separate article someday. The decoding works by just
|
||
going through the tree by following the instructions given by the
|
||
input bits as shown in the example Huffman decode code. Each bit in
|
||
the input makes us go to either the 0-branch or the 1-branch. If the
|
||
branch is a leaf node, we have decoded a symbol and just output it,
|
||
return to the root node and repeat the procedure.
|
||
|
||
A technique related to Huffman coding is Shannon-Fano coding.
|
||
It works by first dividing the symbols into two equal-probability
|
||
groups (or as close to as possible). These groups are then further
|
||
divided until there is only one symbol in each group left. The
|
||
algorithm used to create the Huffman codes is bottom-up, while the
|
||
Shannon-Fano codes are created top-down. Huffman encoding always
|
||
generates optimal codes (in the entropy sense), Shannon-Fano
|
||
sometimes uses a few more bits.
|
||
|
||
There are also ways of modifying the statistical compression
|
||
methods so that we get nearer to the entropy. In the case of 'A'
|
||
having the probability 0.75 and 'B' 0.25 we can decide to group
|
||
several symbols together, producing a variable-to-variable code.
|
||
|
||
"AA" 0.5625 0
|
||
"B" 0.25 10
|
||
"AB" 0.1875 11
|
||
|
||
If we separately transmit the length of the file, we get the
|
||
above probabilities. If a file has only one 'A', it can be encoded
|
||
as length=1 and either "AA" or "AB". The entropy of the source is
|
||
H = 0.8113, and the average code length (per source symbol) is
|
||
approximately L = 0.8518, which is much better than L = 1.0, which we
|
||
would get if we used a code {'A','B'} = {0,1}. Unfortunately this
|
||
method also expands the number of symbols we have to handle, because
|
||
each possible source symbol combination is handled as a separate
|
||
symbol.
|
||
|
||
|
||
Arithmetic Coding
|
||
-----------------
|
||
Huffman and Shannon-Fano codes are only optimal if the
|
||
probabilities of the symbols are negative powers of two. This is
|
||
because all prefix codes work in the bit level. Decisions between
|
||
tree branches always take one bit, whether the probabilities for the
|
||
branches are 0.5/0.5 or 0.9/0.1. In the latter case it would
|
||
theoretically take only 0.15 bits (-log2(0.9)) to select the first
|
||
branch and 3.32 bits (-log2(0.1)) to select the second branch, making
|
||
the average code length 0.467 bits (0.9*0.15 + 0.1*3.32). The
|
||
Huffman code still needs one bit for each decision.
|
||
|
||
Arithmetic coding does not have this restriction. It works by
|
||
representing the file by an interval of real numbers between 0 and 1.
|
||
When the file size increases, the interval needed to represent it
|
||
becomes smaller, and the number of bits needed to specify that
|
||
interval increases. Successive symbols in the message reduce this
|
||
interval in accordance with the probability of that symbol. The more
|
||
likely symbols reduce the range by less, and thus add fewer bits to
|
||
the message.
|
||
|
||
1 Codewords
|
||
+-----------+-----------+-----------+
|
||
| |8/9 YY | Detail |<- 31/32 .11111
|
||
| +-----------+-----------+<- 15/16 .1111
|
||
| Y | | too small |<- 14/16 .1110
|
||
|2/3 | YX | for text |<- 6/8 .110
|
||
+-----------+-----------+-----------+
|
||
| | |16/27 XYY |<- 10/16 .1010
|
||
| | +-----------+
|
||
| | XY | |
|
||
| | | XYX |<- 4/8 .100
|
||
| |4/9 | |
|
||
| +-----------+-----------+
|
||
| | | |
|
||
| X | | XXY |<- 3/8 .011
|
||
| | |8/27 |
|
||
| | +-----------+
|
||
| | XX | |
|
||
| | | |<- 1/4 .01
|
||
| | | XXX |
|
||
| | | |
|
||
|0 | | |
|
||
+-----------+-----------+-----------+
|
||
|
||
As an example of arithmetic coding, lets consider the example of
|
||
two symbols X and Y, of probabilities 2/3 and 1/3. To encode a
|
||
message, we examine the first symbol: If it is a X, we choose the
|
||
lower partition; if it is a Y, we choose the upper partition.
|
||
Continuing in this manner for three symbols, we get the codewords
|
||
shown to the right of the diagram above. They can be found by simply
|
||
taking an appropriate location in the interval for that particular
|
||
set of symbols and turning it into a binary fraction. In practice,
|
||
it is also necessary to add a special end-of-data symbol, which is
|
||
not represented in this simple example.
|
||
|
||
This explanation may not be enough to help you understand
|
||
arithmetic coding. There are a lot of good articles about arithmetic
|
||
compression in the net, for example by Mark Nelson.
|
||
|
||
Arithmetic coding is not practical for C64 for many reasons.
|
||
The biggest reason being speed, especially for adaptive arithmetic
|
||
coding. The close second reason is of course memory.
|
||
|
||
|
||
Symbol Ranking
|
||
--------------
|
||
Symbol ranking is comparable to Huffman coding with a fixed
|
||
Huffman tree. The compression ratio is not very impressive (reaches
|
||
Huffman code only is some cases), but the decoding algorithm is very
|
||
simple, does not need as much memory as Huffman and is also faster.
|
||
|
||
int GetByte() {
|
||
int index = GetUnaryCode();
|
||
|
||
return mappingTable[index];
|
||
}
|
||
|
||
The main idea is to have a table containing the symbols in
|
||
descending probability order (rank order). The message is then
|
||
represented by the table indices. The index values are in turn
|
||
represented by a variable-length integer representation (these are
|
||
studied in the next article). Because more probable symbols (smaller
|
||
indices) take less bits than less probable symbols, in average we
|
||
save bits. Note that we have to send the rank order, i.e. the
|
||
symbol table too.
|
||
|
||
"BAAAAAADBBABBBBBAAADABCD" total: 24*8=192 bits
|
||
Rank Order: A (11) B (9) D (3) C (1) 4*8=32 bits
|
||
Unary Code: 0 10 110 1110
|
||
"100000001101010010101010100001100101110110" 42 bits
|
||
total: 74 bits
|
||
|
||
The statistics rank the symbols in the order ABDC (most probable
|
||
first), which takes approximately 32 bits to transmit (we assume that
|
||
any 8-bit value is possible). The indices are represented as a code
|
||
{0, 1, 2, 3} = {0, 10, 110, 1110}. This is a simple unary code where
|
||
the number of 1-bits before the first 0-bit directly give the integer
|
||
value. The first 0-bit also ends a symbol. When this code and the
|
||
rank order table are combined in the decoder, we get the reverse code
|
||
{0, 10, 110, 1110} = {A, B, D, C}. Note that in this case the code
|
||
is very similar to the Huffman code we created in a previous example.
|
||
|
||
|
||
LZ78
|
||
----
|
||
LZ78-based schemes work by entering phrases into a dictionary
|
||
and then, when a repeat occurrence of that particular phrase is
|
||
found, outputting the dictionary index instead of the phrase. For
|
||
example, LZW (Lempel-Ziv-Welch) uses a dictionary with 4096 entries.
|
||
In the beginning the entries 0-255 refer to individual bytes, and the
|
||
rest 256-4095 refer to longer strings. Each time a new code is
|
||
generated it means a new string has been selected from the input
|
||
stream. New strings that are added to the dictionary are created by
|
||
appending the current character K to the end of an existing string w.
|
||
The algorithm for LZW compression is as follows:
|
||
|
||
set w = NIL
|
||
loop
|
||
read a character K
|
||
if wK exists in the dictionary
|
||
w = wK
|
||
else
|
||
output the code for w
|
||
add wK to the string table
|
||
w = K
|
||
endloop
|
||
|
||
Input string: /WED/WE/WEE/WEB
|
||
|
||
Input Output New code and string
|
||
/W / 256 = /W
|
||
E W 257 = WE
|
||
D E 258 = ED
|
||
/ D 259 = D/
|
||
WE 256 260 = /WE
|
||
/ E 261 = E/
|
||
WEE 260 262 = /WEE
|
||
/W 261 263 = E/W
|
||
EB 257 264 = WEB
|
||
B
|
||
|
||
A sample run of LZW over a (highly redundant) input string can
|
||
be seen in the diagram above. The strings are built up
|
||
character-by-character starting with a code value of 256. LZW
|
||
decompression takes the stream of codes and uses it to exactly
|
||
recreate the original input data. Just like the compression
|
||
algorithm, the decompressor adds a new string to the dictionary each
|
||
time it reads in a new code. All it needs to do in addition is to
|
||
translate each incoming code into a string and send it to the output.
|
||
A sample run of the LZW decompressor is shown in below.
|
||
|
||
Input code: /WEDEB
|
||
|
||
Input Output New code and string
|
||
/ /
|
||
W W 256 = /W
|
||
E E 257 = WE
|
||
D D 258 = ED
|
||
256 /W 259 = D/
|
||
E E 260 = /WE
|
||
260 /WE 261 = E/
|
||
261 E/ 262 = /WEE
|
||
257 WE 263 = E/W
|
||
B B 264 = WEB
|
||
|
||
The most remarkable feature of this type of compression is that
|
||
the entire dictionary has been transmitted to the decoder without
|
||
actually explicitly transmitting the dictionary. The decoder builds
|
||
the dictionary as part of the decoding process.
|
||
|
||
See also the article "LZW Compression" by Bill Lucier in
|
||
C=Hacking issue 6 and "LZW Data Compression" by Mark Nelson mentioned
|
||
in the references section.
|
||
|
||
_____________________________________________________________________
|
||
|
||
Conclusions
|
||
===========
|
||
|
||
That's more than enough for one article. What did we get out of
|
||
it ? Statistical compression works with uneven symbol probabilities
|
||
to reduce the average code length. Substitutional compressors
|
||
replace strings with shorter representations. All popular
|
||
compression algorithms use LZ77 or LZ78 followed by some sort of
|
||
statistical compression. And you can't just mix and match different
|
||
algorithms and expect good results.
|
||
|
||
There are no shortcuts in understanding data compression. Some
|
||
things you only understand when trying out them yourself. However, I
|
||
hope that this article has given you at least a vague grasp of how
|
||
different compression methods really work.
|
||
|
||
I would like to send special thanks to Stephen Judd for his
|
||
comments. Without him this article would've been much more
|
||
unreadable than it is now. On the other hand, that's what the
|
||
magazine editor is for :-)
|
||
|
||
The second part of the story is a detailed talk about pucrunch.
|
||
I also go through the corresponding C64 decompression code in detail.
|
||
If you are impatient and can't wait for the next issue, you can take
|
||
a peek into http://www.cs.tut.fi/%7Ealbert/Dev/pucrunch/ for a preview.
|
||
|
||
_____________________________________________________________________
|
||
|
||
References
|
||
==========
|
||
|
||
* The comp.compression FAQ
|
||
http://www.cis.ohio-state.edu/hypertext/faq/usenet/
|
||
compression-faq/top.html
|
||
* A Data Compression Review
|
||
http://www.ics.uci.edu/%7Edan/pubs/DataCompression.html
|
||
* Data Compression Class
|
||
http://www.cs.unt.edu/home/srt/5330/
|
||
* Mark Nelson's Homepage
|
||
http://web2.airmail.net/markn/
|
||
+ LZW Data Compression
|
||
http://web2.airmail.net/markn/articles/lzw/lzw.htm
|
||
+ Arithmetic Coding Article
|
||
http://web2.airmail.net/markn/articles/arith/part1.htm
|
||
+ Data Compression with the Burrows-Wheeler Transform
|
||
http://web2.airmail.net/markn/articles/bwt/bwt.htm
|
||
* Charles Bloom's Page
|
||
http://wwwvms.utexas.edu/%7Ecbloom/
|
||
* The Redundancy of the Ziv-Lempel Algorithm for Memoryless Sources
|
||
http://ei0.ei.ele.tue.nl/%7Etjalling/zivlem/zivlem.html
|
||
* Ross Williams' Compression Pages
|
||
http://www.ross.net/home/
|
||
* DEFLATE Compressed Data Format Specification version 1.3
|
||
http://www.funet.fi/pub/doc/rfc/rfc1951.txt
|
||
* Markus F.X.J. Oberhumer's Compression Links
|
||
http://wildsau.idv.uni-linz.ac.at/mfx/compress.html
|
||
* The Lossless Compression (Squeeze) Page
|
||
http://www.cs.sfu.ca/CC/365/li/squeeze/
|
||
|
||
........
|
||
....
|
||
..
|
||
. C=H
|
||
|
||
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
|
||
|
||
|
||
3D Graphics for the Masses: lib3d and Cool World
|
||
-------------------------- by
|
||
Stephen L. Judd
|
||
sjudd@nwu.edu
|
||
|
||
Well folks, it's time once again for some 3D graphics. This
|
||
will, I think, be more or less the final word on the subject, at least
|
||
from me! I'm pooped, and I think these routines pretty much push
|
||
these algorithms as far as they're going to go. And there are so
|
||
many other interesting things to move on to!
|
||
|
||
These routines, not to mention this article, are the codification
|
||
of all those years of algorithms and derivations and everything else.
|
||
Those past efforts created working prototypes; this one is the
|
||
production model. Nate Dannenberg suggested that this be done, and
|
||
who am I to disobey? Besides, after a break of a few years it's going
|
||
to be fun to rederive all the equations and algorithms from scratch,
|
||
and do a "for-real" implementation, improving the good ideas and
|
||
fixing the bad.
|
||
|
||
Right? Right.
|
||
|
||
So that's what I did, and that's what this article does. The
|
||
first section of the article summarizes the basics of 3D graphics:
|
||
projections and rotations. Since in my experience many people don't
|
||
remember their high school math, I've also covered the basic mathematical
|
||
tools needed, like trigonometry and linear algebra, and other related
|
||
issues. The next section covers constructing a 3D world: representing
|
||
objects, navigating through the world, things like that. The third
|
||
section covers the main library routines and their implementation,
|
||
including lots of code disassembly. The fourth section covers Cool World,
|
||
a program which is a demonstration of the 3D library. The final section
|
||
just summarizes the 3d library routines, calling conventions, parameters
|
||
used, things like that. Since this article is enormous, I have left
|
||
out the complete source code; the truly motivated can visit
|
||
|
||
http://stratus.esam.nwu.edu/~judd/fridge/
|
||
|
||
to check out the full source for both Cool World and the 3d library. The
|
||
binaries are also present there.
|
||
|
||
By this time you may have asked, "What exactly _is_ this
|
||
3d library?" Glad you asked. It is a set of routines that are
|
||
important for doing 3D graphics: rotations, projections, drawing
|
||
polygons, that sort of thing. The routines are very flexible, compact,
|
||
and extremely fast. They can draw to any VIC bank, and still leave
|
||
enough room for eight sprite definitions. What the routines are
|
||
not is a "3D world construction kit". A program has to be written
|
||
which utilizes the routines.
|
||
Which leads to Cool World. CW is, simply, a program which
|
||
demonstrates using the 3D library. It is very large. It has some
|
||
seventeen objects in it -- the initial tetrahedron is the 'center';
|
||
straight ahead of it is squaresville; to the left is a bunch of stars;
|
||
straight down is the Cobra Mk III; straight back is a line of
|
||
tetrahedrons. It also has some other fun stuff like the starfield
|
||
and a tune. It doesn't have Kim Basinger, unfortunately.
|
||
Oh yeah: the _whole point_ of the 3d library is for you to
|
||
use it in your own programs! You can sell those programs if you like,
|
||
too. Doesn't bother me; in fact I would be quite flattered to see
|
||
it used in some cool and popular program. Why re-invent the wheel
|
||
when the routines are already written? Use the 3d library. Live
|
||
the 3d library. Do something interesting with it. If nobody uses
|
||
it, it might as well have never been written, right? Right. You
|
||
know you want to use it. So use it!
|
||
But before you use it, you better understand it. So let's
|
||
start at the very beginning (a very good place to start).
|
||
|
||
---------
|
||
Section 1: 3D Basics and review
|
||
---------
|
||
|
||
This is just going to be a quick summary. For more detail
|
||
you ought to go back to some of the earlier articles.
|
||
|
||
First we need to decide on a coordinate system. A convenient
|
||
system for dealing with the computer is to have the x-axis point to
|
||
the right and the y-axis to point down the screen -- the usual
|
||
coordinate system with x=0 y=0 being in the upper-left corner of
|
||
the screen. The z-axis can then point into the monitor (i.e. z=20
|
||
is behind the monitor screen somewhere, and z=-20 is somewhere behind
|
||
where you're sitting); this keeps the coordinate system right-handed.
|
||
|
||
The next thing is to figure out an equation for projecting
|
||
from three dimensions into two dimensions. In other words: how to
|
||
paint a picture.
|
||
One of the great breakthroughs for art at the beginning of
|
||
the Renaissance was the understanding of perspecitve. The first
|
||
thing to notice is that far-away objects look smaller. The next
|
||
thing to observe is that straight lines seem to meet in the middle --
|
||
like looking down a long road or sidewalk or railroad tracks. That's
|
||
perspective: far-off objects get smaller _towards the center of vision_.
|
||
Well that's an easy equation: just take the coordinate, and
|
||
divide by the distance away from us. If we're looking down the z-axis,
|
||
|
||
x' = x/z y' = y/z
|
||
|
||
gives us a pair of projected points. Objects which are far away have
|
||
a large value for z, which means x' and y' will be proportionally
|
||
smaller. For really large z, they go to zero -- they go to the center
|
||
of our vision.
|
||
Now, how can we make objects appear larger? One way is to
|
||
stand closer to them -- that changes the perspective. Another way
|
||
is to magnify them, with a lens. This is how a telescope works, and
|
||
also how the old Mark-I eyeball works as well. Whereas perspective
|
||
makes far-off points behave differently than near ones, magnification
|
||
just expands all points equally. And that's exactly how to put it in
|
||
the equation:
|
||
|
||
x' = d * x/z y' = d * y/z
|
||
|
||
where d is some constant magnification factor (a number, like 12).
|
||
And the above are the projection equations.
|
||
A very important skill is the ability to 'read' an equation.
|
||
The above equation takes a point, divides by the distance away from us,
|
||
and then multiplies by the magnification constant. It says that
|
||
far-off objects will be small, and that all objects are magnified
|
||
equally. It also implies that the z-coordinates better all be positive
|
||
or all negative. If an object has some positive z-coordinates and
|
||
some negative, that means that some of its points are in front of us
|
||
and some are behind us. In other words, that we are standing inside
|
||
the object!
|
||
A proper derivation is as follows: consider a pinhole camera.
|
||
A light ray bounces off some object at a point (x,y,z) and passes
|
||
through the pinhole, located at the origin (0,0,0). We then place
|
||
a photographic plate parallel to the x-y plane, at z'=d, and figure
|
||
out where the light ray hits the plate. The equation of the light
|
||
ray is the line
|
||
|
||
(x', y', z') = t*(x,y,z)
|
||
|
||
and it hits the plate when z'=d, which happens when
|
||
|
||
tz = d
|
||
|
||
so when t = d/z. Thus the coordinates on the plate (the film) are
|
||
|
||
x' = d/z * x y' = d/z * y
|
||
|
||
which gives the earlier equations. Moving the film up and down will
|
||
magnify the image, corresponding to changing the value of d. If d
|
||
is negative, the image will be inverted. Positive d corresponds to
|
||
having the film between the pinhole lens and the object, but
|
||
mathematically it gives the same answer without inverting the object
|
||
(since x' and y' won't have a minus sign in front of them).
|
||
|
||
Representing 3D objects
|
||
-----------------------
|
||
|
||
The important thing to recognize from the above is that
|
||
straight lines in 3D will still be straight lines in 2D -- the
|
||
projection doesn't turn straight lines into curves or anything
|
||
like that.
|
||
So, to project a straight line from 3D into 2D all that
|
||
is needed are the _endpoints_ of the line. Project those two
|
||
endpoints from 3D into 2D, and then all that is needed is to draw
|
||
a line between the two _projected_ points.
|
||
Most of the objects we will be dealing with will be made
|
||
up of a series of flat plates, because these are easier to do
|
||
calculations with (ever noticed the main difference between the
|
||
stealth fighter and the stealth bomber?). In other words: polygons.
|
||
These polygons are just flat objects with several straight
|
||
sides. All we need are the vertices of each polygon (the endpoints
|
||
of each of the lines); from those we can reconstruct all the
|
||
in-between points.
|
||
A very simple example is a cube. It has six faces. The
|
||
eight corners of a cube might be located at (+/-1, +/-1, +/-1).
|
||
Project those eight points and connect the projected points in the
|
||
right way, and Viola! Violin! 3D graphics.
|
||
|
||
The next thing we will want to do is rotate an object. For
|
||
this we need a little trig, and it's very helpful to know a little
|
||
about linear algebra and vectors. What's that? You don't remember
|
||
this stuff? For shame! But we can fix that up in no time.
|
||
|
||
Trig review in two paragraphs.
|
||
-----------
|
||
|
||
Take a look at a triangle sometime. No matter how large or
|
||
small you draw a particular triangle, it still looks the same
|
||
because all of the sides and angles are in proportion to one another.
|
||
Now fix one of the angles at 90 degrees, to form a right triangle.
|
||
The minute you choose one of the other angles, the opposite angle
|
||
is determined (since the angles have to add up to 180 degrees).
|
||
And when you know all the angles you know the _ratios_ of the sides
|
||
to each other -- you don't know the actual length of the sides,
|
||
but you know how the triangle looks and hence its proportions.
|
||
And that is pretty much all of trig. Draw a right triangle,
|
||
and pick an angle (call that angle theta). The hypoteneuse is the
|
||
longest side of a triangle, and hence the side opposite the 90 degree
|
||
angle. The 'adjacent' side is the side which touches the angle
|
||
theta; the 'opposite' side is the side opposite theta (imagine that!).
|
||
We _define_ sine, cosine, and tangent as
|
||
|
||
cos(theta) = adjacent/hypoteneuse
|
||
sin(theta) = opposite/hypoteneuse
|
||
tan(theta) = sin(theta)/cos(theta) = opposite/adjacent
|
||
|
||
and that's all of trig, right there. Everything else comes from
|
||
those three definitions. If you can't remember them, just remember
|
||
the famous Indian Chief SOHCAHTOA, who fought so bravely for the
|
||
cause of his people and mathematicians everywhere (SOH: sin-opposite-hyp;
|
||
CAH: cos-adjacent-hyp; TOA: tan-opposite-adjacent).
|
||
|
||
Polar coordinates
|
||
-----------------
|
||
|
||
Now we've got some tools to do some useful calculations with,
|
||
once we remember the Pythagorean theorem,
|
||
|
||
a^2 + b^2 = c^2,
|
||
|
||
where c = the hypoteneuse and a,b are the sides of a right triangle.
|
||
Before doing rotations it is helpful to understand polar coordinates.
|
||
Draw a set of normal coordinate axis on a piece of paper, and
|
||
draw a point somewhere (at 3,2 say). Label this point x,y and
|
||
draw a line from the origin (0,0) to that point, and call its
|
||
length "r". Label the angle between that line and the positive
|
||
x-axis as "theta", and draw a line straight down from x,y to the
|
||
x-axis. Whaddaya know? It's a right triangle. The length of one
|
||
side is x, the length of the other side is y, and the length of the
|
||
hypoteneuse is r. And the trig definitions tell us that
|
||
|
||
cos(theta) = x/r
|
||
sin(theta) = y/r
|
||
tan(theta) = y/x.
|
||
|
||
So now we can define sin, cos, and tan for all angles theta just by
|
||
drawing a little picture.
|
||
Two coordinates, x and y, are used to locate the point.
|
||
Another two coordinates in the picture are "r" and "theta", and
|
||
they are just as good for locating where a point is. In fact,
|
||
the above equations give the x and y coordinates of a point
|
||
(r,theta):
|
||
|
||
x = r * cos(theta) y = r * sin(theta)
|
||
|
||
They also give us the r and theta coordinates of a point (x,y):
|
||
|
||
tan(theta) = y/x
|
||
r^2 = x^2 + y^2
|
||
|
||
You may also have noticed that the trig formula
|
||
|
||
cos(theta)^2 + sin(theta)^2 = 1
|
||
|
||
follows from the above trig definitions, since x^2 + y^2 = r^2.
|
||
In the Cartesian (or rectangular) system there are two
|
||
coordinates, x and y. The equation x=constant is a vertical line
|
||
and the equation y=constant is a horizontal line, so this is a
|
||
system of straight lines which cross each other at right angles.
|
||
In polar coordinates the equation theta=constant is a radial line,
|
||
and r=constant gives a circle (constant radius, you know). So
|
||
polar coordinates consists of circles which intersect radial lines
|
||
(note that they also intersect at right angles). And the equations
|
||
above tell how to convert between the two systems.
|
||
There are many other coordinate systems, btw. There are
|
||
elliptical coordinates and parabolic coordinates and other strange
|
||
systems. Why bother? For one thing, if you're solving an equation
|
||
on a round plate it's awfully convenient to use polar coordinates.
|
||
For another, we can now do some rotations.
|
||
|
||
(And for another, you can draw cool graphs like r=cos(3*theta)).
|
||
|
||
|
||
Rotations
|
||
---------
|
||
|
||
Start with a point (r,theta) and rotate it to a new point
|
||
(r,theta+s) i.e. rotate by an amount s. The original coordinate was
|
||
|
||
x = r*cos(theta) y = r*sin(theta)
|
||
|
||
and the new coordinate is
|
||
|
||
x' = r*cos(theta+s) y = r*sin(theta+s)
|
||
|
||
so by using the trig identities for cos(a+b) and sin(a+b) we
|
||
arrive at the simple expressions
|
||
|
||
x' = x*cos(s) - y*sin(s)
|
||
y' = x*sin(s) + y*cos(s)
|
||
|
||
for the rotated coordinates x' and y' in terms of the original
|
||
coordinates x and y. The above is a two-dimensional rotation.
|
||
Three-dimensional rotation is done with a series of two-dimensional
|
||
rotations. The above rotates x and y, but keeps z fixed (I don't
|
||
see a z anywhere in there, at least). In other words, it rotates
|
||
about the z-axis. A rotation about the x-axis looks like
|
||
|
||
x' = x*cos(s) - z*sin(s)
|
||
z' = x*sin(s) + z*cos(s)
|
||
|
||
and a rotation about the y-axis looks like
|
||
|
||
x' = x*cos(s) + z*sin(s)
|
||
z' = -x*sin(s) + z*cos(s)
|
||
|
||
(the opposite minus sign just comes from the orientation of the y-axis).
|
||
|
||
Rotations in three dimensions do NOT commute. Just by trying
|
||
with a book it is very easy to see that a rotation of 45 degrees about
|
||
the x-axis and then 45-degrees about the y-axis gives a very different
|
||
result from first rotating about the y-axis and then the x-axis.
|
||
|
||
To really make rotations powerful we need a little linear
|
||
algebra. But first...
|
||
|
||
|
||
Radians ("God's units")
|
||
-------
|
||
Just for completeness... what is an angle, really? It's not
|
||
a length, it's not a direction... what is it? And why are there
|
||
360 "angles" in a circle?
|
||
The answer to the second is "because the ancient Babylonians
|
||
used base 60 for all their calculations." In other words, degrees
|
||
are totally arbitrary. There could just as easily be 1000 degrees
|
||
in a circle. So, what is an angle?
|
||
Think about a circle. It has two lengths associated with it:
|
||
the length "across" (the diameter D) and the length "around" (the
|
||
circumference C). And, like triangles, no matter how large or small
|
||
the circle is drawn, those lengths stay in proportion. That is,
|
||
|
||
C/D = constant
|
||
|
||
That constant is, of course, pi = 3.1415926... This then gives
|
||
the equation for the circumference of a circle as C = pi*D = 2*pi*r
|
||
where r=radius. But what is an angle?
|
||
Draw two radii in the circle. We _define_ the angle between
|
||
those two radii as the length of the subtended circumference divided
|
||
by the radius:
|
||
|
||
angle = length / radius.
|
||
|
||
So again, no matter what the _size_ of the circle is the _angle_ stays
|
||
the same. The length is just a fraction of the circumference, which
|
||
is 2*pi*radius; this means that the angle is just a fraction of 2*pi.
|
||
These are radians. There are 2*pi of them in a circle.
|
||
A quarter of a circle (90 degrees) is just 1/4 (2*pi) = pi/2 radians.
|
||
An eighth is pi/4 radians. And so on. These are of course very
|
||
natural units for an angle. The above definition has angle equal
|
||
to a length divided by a length; radians have no dimension (whereas
|
||
degrees are a dimension, just like feet or pounds or seconds are).
|
||
Degrees are useful for talking and writing, but radians
|
||
are what is needed for calculations.
|
||
|
||
|
||
Vectors and Linear Algebra
|
||
--------------------------
|
||
|
||
A vector (in three dimensions) is simply a line drawn from the
|
||
origin (0,0,0) to some point (x,y,z). Since they always emanate from
|
||
the origin, it is correct to refer to "the vector (x,y,z)". The important
|
||
thing is that it has both length _and_ direction. A physical example
|
||
is the difference between velocity and speed. Speed is just a
|
||
quantity: for example, "120 Miles per hour". Velocity, on the other
|
||
hand, has _direction_ -- you might be going straight up, or straight
|
||
down, or turning around a curve, etc. and the _length_ of the velocity
|
||
vector gives the speed: 120 MPH.
|
||
But all we need to worry about here is the geometric meaning,
|
||
and the difference between the _point_ (Px,Py,Pz) and the _vector_
|
||
(Px, Py, Pz). The point P is just a point, but geometrically the
|
||
vector P is a line extending *from* the origin *to* the point P --
|
||
it has a length, and a direction: it points in the direction of
|
||
(Px, Py, Pz). We will be using vectors for rotations, and to figure
|
||
out what direction something points, and all sorts of other stuff.
|
||
|
||
The dimension of a vector is the number of elements in that
|
||
vector. Let P be a vector. A 2D vector might be P=(Px,Py). A three
|
||
dimensional vector example is P=(Px,Py,Pz). P=(p1,p2,p3,p4,p5,p6)
|
||
would be a six-dimensional vector. So an n-dimensional vector is just
|
||
a list of n independent quantities. We'll just be dealing with
|
||
two and three dimensional vectors here, though, so when you see
|
||
a sentence like "The vector v1" just think of (v1x, v1y, v1z).
|
||
The length of a vector is again given by Pythagorus:
|
||
|
||
r^2 = Px^2 + Py^2 + Pz^2 + ...
|
||
|
||
i.e. the sum of the squares of all the elements. This is a good
|
||
calculation to avoid in algorithms, since it is expensive, but
|
||
it is useful to know.
|
||
The simplest thing one can do with a vector is change its
|
||
length, by multiplying by a constant:
|
||
|
||
c*(Px,Py,Pz) = (c*Px, c*Py, c*Pz).
|
||
|
||
Multiplying by a constant multiplies all elements in the vector by
|
||
that constant. Just like with triangles all lengths increase
|
||
proportionally, so it is a vector which points in the same direction
|
||
but has a different length.
|
||
Two vector operations that are very useful are the "dot product"
|
||
and the "cross product". I'll write the dot product of two vectors
|
||
R and P as either R.P or <R,P>, and it is defined as
|
||
|
||
R . P = |R| |P| cos(theta)
|
||
|
||
that is, the length of R times the length of P times the cosine of the
|
||
angle between the two vectors. From this it is easy to show that
|
||
the dot product may also be written as
|
||
|
||
R . P = Rx*Px + Ry*Py + ...
|
||
|
||
that is, multiply the individual components together and add them up.
|
||
Note that the dot product gives a _number_ (a scalar), NOT a vector.
|
||
Note also that the dot product between two vectors separated by an
|
||
angle greater than pi/2 will be negative, from the first equation,
|
||
and that the dot product of two perpendicular vectors is zero.
|
||
The dot product is sometimes referred to as the inner (or scalar)
|
||
product.
|
||
The cross-product (sometimes called the vector product or
|
||
skew product) is denoted by RxP and is given by
|
||
|
||
R x P = (Ry*Pz-Rz*Py, Rz*Px-Rx*Pz, Rx*Py-Ry*Px)
|
||
|
||
As you can see, the result is a _vector_. In fact, this vector
|
||
is perpendicular to both R and P -- it is perpendicular to the plane
|
||
that R and P lie in. Its length is given by
|
||
|
||
length = |R| |P| sin(theta)
|
||
|
||
Note that P x R = - R x P; the direction of the resulting vector is
|
||
usually determined by the "right hand rule". All that is important
|
||
here is to remember that the cross-product generates perpendicular
|
||
vectors. We won't be using any cross products in this article.
|
||
|
||
There are lots of other things we can do to vectors. One
|
||
of the most important is to multiply by a _matrix_. A matrix is
|
||
like a bunch of vectors grouped together, so it has rows and columns.
|
||
An example of a 2x2 matrix is
|
||
|
||
[a b]
|
||
[c d]
|
||
|
||
an example of a 3x3 matrix is
|
||
|
||
[a b c]
|
||
[d e f]
|
||
[g h i]
|
||
|
||
and so on. The number of rows doesn't have to equal the number of
|
||
columns. In fact, an n-dimensional vector is just an n x 1 (read
|
||
"n by 1") matrix: n rows, but one column.
|
||
We add matrices together by adding the individual elements:
|
||
|
||
[a1 a2 a3] [b1 b2 b3] [a1+b1 a2+b2 a3+b3]
|
||
[a4 a5 a6] + [b4 b5 b6] = [a4+b4 a5+b5 a6+b6]
|
||
[a7 a8 a9] [b7 b8 b9] [a7+b7 a8+b8 a9+b9]
|
||
|
||
We can multiply by a constant, which just multiplies all elements
|
||
by that constant (just like the vector).
|
||
Matrices can also be multiplied. The usual rule is "row times
|
||
column". That is, given two matrices A and B, you take rows of
|
||
A and dot them with columns of B:
|
||
|
||
[A1 A2] [B1 B2] = [A1*B1+A2*B3 A1*B2+A2*B4]
|
||
[A3 A4] [B3 B4] [A3*B1+A4*B3 A3*B2+A4*B4]
|
||
|
||
In the above, the (1,1) element is the first row of A (A1 A2) times
|
||
the first column of B (B1 B3) to get A1*B1+A2*B3. And so on. (With
|
||
a little practice this becomes very easy). We will be multiplying
|
||
rotation matrices together, and multiplying matrices times vectors.
|
||
Although "row times column" is the usual way that this is
|
||
taught, it can also be looked at as "columns times elements". The
|
||
easiest example is to multiply a matrix A times a vector x. The first
|
||
method gives:
|
||
|
||
[a1 a2 a3] [ <row1,x> ] [a1*x1 + a2*x2 + a3*x3]
|
||
let A = [a4 a5 a6] then Ax = [ <row2,x> ] = [a4*x1 + a5*x2 + a6*x3]
|
||
[a7 a8 a9] [ <row3,x> ] [a7*x1 + a8*x2 + a9*x3]
|
||
|
||
The right-hand part may be written as
|
||
|
||
[a1] [a2] [a3]
|
||
x1*[a4] + x2*[a5] + x3*[a6]
|
||
[a7] [a8] [a9]
|
||
|
||
or, in other words,
|
||
|
||
Ax = x1*column1 + x2*column2 + x3*column3
|
||
|
||
that is, the components of x times the columns of A, added together.
|
||
This is a very useful thing to be aware of, as we shall see.
|
||
Note that normal multiplication commutes: 3*2 = 2*3. In matrix
|
||
multiplication, this is NOT true. Multiplication in general does
|
||
NOT commute, and AB is usually different from BA.
|
||
|
||
We can also divide by matrices, but it isn't called division.
|
||
It's called inversion. Let's say you have an equation like
|
||
|
||
5*x = b.
|
||
|
||
To solve for x you would just multiply both sides by 1/5 i.e. by
|
||
the "inverse" of 5. To solve a matrix equation like
|
||
|
||
Ax = b
|
||
|
||
we just multiply both sides by the inverse of A -- call it A'.
|
||
And in just the same way that 1/a * a is one, a matrix times its
|
||
inverse is the _identity matrix_ which is the matrix with "1" down
|
||
the diagonal:
|
||
|
||
[1 0 0]
|
||
I = [0 1 0]
|
||
[0 0 1]
|
||
|
||
It is called the identity because IA = A; multiplying by the identity
|
||
matrix is just like multiplying by one.
|
||
Inverting a matrix is in general a very expensive operation,
|
||
and we don't need to go into it here. We will be doing some special
|
||
inversions later on though, so keep in mind that an inversion
|
||
un-does a matrix multiplication.
|
||
|
||
Transformations -- more than meets the eye!
|
||
---------------
|
||
|
||
Now we have an _extremely_ powerful tool at our disposal.
|
||
What happens when you multiply a matrix times a vector? You get
|
||
a new vector, of the same dimension as the old one. That is, it
|
||
takes the old vector and _transforms_ it to a new one. Take the
|
||
two-dimensional case:
|
||
|
||
[a b] [x] = [a*x + b*y]
|
||
[c d] [y] [c*x + d*y]
|
||
|
||
Look familiar? Well if a=cos(s), b=-sin(s), c=cos(s), and d=sin(s)
|
||
we get the earlier rotation equations, which we can now rewrite as
|
||
|
||
P' = RP
|
||
|
||
where R is the rotation matrix
|
||
|
||
R = [cos(s) -sin(s)]
|
||
[sin(s) cos(s)].
|
||
|
||
The way to think about this is that R _operates_ on a vector P.
|
||
When we apply R to P, it rotates P to a new vector P'.
|
||
So far we haven't gained much. But in three dimensions,
|
||
the rotation matrices look like
|
||
|
||
[cos(s) -sin(s) 0] [cos(s) 0 sin(s)]
|
||
Rz = [sin(s) cos(s) 0] Ry = [ 0 1 0 ] etc.
|
||
[ 0 0 1] [-sin(s) 0 cos(s)]
|
||
|
||
Try multiplying Rz times a vector (x,y,z) to see that it rotates
|
||
x and y while leaving z unchanged -- it rotates about the z-axis.
|
||
Now let's say that we're navigating through some 3D world
|
||
and have turned, and looked up, and turned a few more times, etc.
|
||
That is, we apply a bunch of rotations, all out of order:
|
||
|
||
Rx Ry Ry Rz Rx Rz Ry Ry Rx P = P'
|
||
|
||
All of the rotation matrices on the left hand side can be multiplied
|
||
together into a _single_ matrix R
|
||
|
||
R = Rx Ry Ry Rz Rx Rz Ry Ry Rx
|
||
|
||
which is a _new_ transformation, which is the transformation you
|
||
would get by applying all the little rotations in that specific order.
|
||
This new matrix can now be applied to any number of vectors. And this
|
||
is extremely useful and important.
|
||
Another incredibly useful thing to realize is that the
|
||
inverse of a rotation matrix is just its transpose (reflect through
|
||
the diagonal, i.e. element i,j swaps with element j,i). It's
|
||
very easy to see for the individual rotation matrices -- the inverse
|
||
of rotating by an amount s is to rotate by an amount -s, which flips
|
||
the minus sign on the sin(s) terms above. And if you take the transpose
|
||
of the accumulated matrix R above, remembering that (AB)^T = B^T A^T,
|
||
you'll see that R^T just applies all of the inverse rotations in
|
||
the opposite order -- it undoes each small rotation one at a time
|
||
(multiply R^T times R to see that you end up with the identity matrix).
|
||
The important point, though, is that to invert any series of
|
||
rotations, no matter how complicated, all we have to do is take a
|
||
transpose.
|
||
|
||
|
||
Hidden Faces (orientation)
|
||
------------
|
||
|
||
Finally there is the issue of hidden faces, and the related
|
||
issue of polygon clipping. In previous articles a number of different
|
||
methods have been tried. For hidden faces, the issue is to determine
|
||
whether a polygon faces towards us (in which case it is visible) or
|
||
away from us (in which case it isn't).
|
||
One way is to compute a normal vector (for example, to rotate
|
||
a normal vector along with the rest of the face). A light ray which
|
||
bounces off of any point on this face will be visible -- will go
|
||
through the origin -- only if the face faces towards us. The normal
|
||
vector tells us which direction face is pointing in. If we draw
|
||
a line from our eye (the origin) to a point on the face, the normal
|
||
vector will make an angle of 90 degrees with the face when the face
|
||
is edge-on. So by computing the angle between the normal vector and
|
||
a vector from the origin to a point on the face we can test for
|
||
visibility by checking if it is greater than or less than 90 degrees.
|
||
We have a way of computing the angle between two vectors:
|
||
the dot-product. Any point on the face will give a vector from
|
||
the origin to the face, so choose some vertex V on the polygon,
|
||
then dot it with the normal vector N:
|
||
|
||
<V,N> = Vx*Nx + Vy*Ny + Vz*Nz = |V| |N| cos(theta)
|
||
|
||
Since cos(theta) is positive for theta<90 and negative for theta>90
|
||
all that needs to be done is to compute Vx*Nx + ... and check whether
|
||
it is positive or negative. If you know additional information about
|
||
either V or N (as earlier articles did) then this calculation can
|
||
be simplified by quite a lot (as earlier articles did!). And if
|
||
you want to be really cheap, you can just use the z-coordinate of the
|
||
normal vector and skip the dot-product altogether (this only works
|
||
for objects which are reasonably far away, though).
|
||
Another method involves the cross-product -- take the
|
||
cross-product of two vectors in the _projected_ polygon and see
|
||
whether it points into or out of the monitor. Again, only the
|
||
direction is of interest, so that usually means that only the
|
||
z-component of the vector needs to be computed. On the downside,
|
||
I seem to recall that in practice this method was cumbersome and
|
||
tended to fail for certain polygons, making them never visible (because
|
||
of doing integer calculations and losing remainders).
|
||
The final method is a simple ordering argument: list the
|
||
points of the polygon in a particular order, say clockwise. If,
|
||
after rotating, that same point list goes around the polygon in a
|
||
counter-clockwise order then the polygon is turned around the other
|
||
way. This is the method that the 3d library uses. It is more or
|
||
less a freebie -- it falls out automatically from setting up the
|
||
polygon draw, so it takes no extra computation time for visible
|
||
polygons. It is best-suited to solid polygon routines, though.
|
||
|
||
Polygon clipping refers to determining when one polygon
|
||
overlaps another. For convex objects (all internal angles are
|
||
less than 180 degrees) this isn't an issue, but for concave
|
||
objects it's a big deal. There are two convex objects in Cool World:
|
||
the pointy stars and the Cobra Mk III. The pointy stars are clipped
|
||
correctly, so that tines on the stars are drawn behind each other
|
||
properly. The Cobra doesn't have any clipping, so you can see
|
||
glitches when it draws one polyon on top of another when it should
|
||
be behind it.
|
||
A general clipping algorithm is pretty tough and time-consuming.
|
||
It is almost always best to split a concave object up into a group
|
||
of smaller convex objects -- a little work on your part can save
|
||
huge amounts of time on the computer's part.
|
||
|
||
And THAT, I think, covers the fundamentals of 3d graphics.
|
||
Now it's on to constructing a 3D world, and implementing all that
|
||
we've deduced and computed as an actual computer program.
|
||
|
||
---------
|
||
Section 2: Constructing a 3D world
|
||
---------
|
||
|
||
Now that we have the tools for making 3D objects, we need to
|
||
put them all together to make a 3D world. This world might have
|
||
many different objects in it, all independent and movable. They
|
||
might see each other, run into each other, and so on. And they
|
||
of course will have to be displayed on the computer.
|
||
As we shall see, we can do all this in a very elegant and
|
||
simple manner.
|
||
|
||
Representing objects
|
||
--------------------
|
||
|
||
The object problem boils down to a few object attributes:
|
||
each object has a _position_ in the world, and each object has an
|
||
_orientation_. Each object might also have a _velocity_, but velocity
|
||
is very easy to understand and deal with once the position and
|
||
orientation problem has been solved.
|
||
The position tells where an object is -- at least, where the
|
||
center of the object is. The orientation vector tells in what direction
|
||
the object is pointing. Velocity tells what direction the object is
|
||
moving. Different programs will handle velocity in different ways.
|
||
Although we are supposed to be tolerant of different positions and
|
||
orientations, in these programs they are all handled the same way
|
||
and are what will be focused on.
|
||
If you can't visualize this, just think of some kind of
|
||
spaceship or fighter plane. It is located somewhere, and it points
|
||
in a certain direction. The orientation vector is a little
|
||
line with an arrow at the end -- it points in a certain direction
|
||
and is anchored at some position. As the object turns, so does
|
||
the orientation vector. And as it moves, so does the position.
|
||
Once positions and orientations are known we can calculate
|
||
where everything is relative to everything else. "Relative" is an
|
||
important word here. If we want to know how far we are from some
|
||
object, our actual locations don't matter; just the relative locations.
|
||
And if we are walking around and suddenly turn, then everything better
|
||
turn around us and not some other point. Rotations are always about
|
||
the origin, which means we are the origin -- the center of the
|
||
universe. Recall also that the way projections work is by assuming
|
||
we are looking straight down the z-axis, so our orientation better
|
||
be straight.
|
||
So to find out what the world looks like from some object
|
||
we need to do two things. First, all other objects need to be
|
||
translated to put the specific object at the origin -- shift the
|
||
coordinate system to put the object at (0,0,0). Second, the orientation
|
||
vector of the object needs to be on the z-axis for projections to
|
||
work right -- rotate the coordinate system and all of the objects
|
||
around us to make the orientation vector be in the right direction.
|
||
|
||
Every time an object moves, its position changes. And every
|
||
time it turns or tilts it changes its orientation. Let's say that
|
||
after a while we are located at position P and have an orientation
|
||
vector V, and want to look at an object located at position Q.
|
||
Translating to to the origin is easy enough: just subtract P from all
|
||
coordinates. So the original object will be located at P-P = 0,
|
||
and the other object will be located at Q-P. What about rotating?
|
||
The orientation vector comes about by a series of rotations
|
||
applied to the initial orientation. As was explained earlier, all of
|
||
those rotations can be summed up as a single rotation matrix R,
|
||
so if the intial vector was, say, z=(0,0,1) then
|
||
|
||
V = Rz
|
||
|
||
Given a position vector, what we need to do is un-rotate that position
|
||
vector back onto the z-axis. Which means we just multiply by
|
||
the _inverse_ of R, R'. For example, if we turn to the left, then
|
||
the world turns to the right. If we turn left and then look up,
|
||
the way to un-do that is to look back down and turn right. That's
|
||
an inverse rotation, and we know from the earlier sections that
|
||
rotation matrices are very easy to invert.
|
||
So, after translating and rotating, we are located at the
|
||
origin and looking straight down the z-axis, whereas the other
|
||
object Q is located at the rotated translated point
|
||
|
||
Q2 = R' (Q-P)
|
||
|
||
i.e. Q-P translates the object, and R' rotates it relative to us.
|
||
|
||
Readers who are on the ball may have noticed that so far we
|
||
have only deduced what happens to the _center_ of the object. Quite
|
||
right. We need to figure out what happens to the points of an
|
||
object.
|
||
First it should be clear that we want to define the points
|
||
of an object relative to the center of the object. It is first
|
||
necessary so that the object can rotate on its own. And it has
|
||
all kinds of computational advantages, as we shall see in a later
|
||
section; among them are that the numbers stay small (and the
|
||
rotations fast), they are always rotated as a whole (so they don't
|
||
lose accuracy or get out of proportion), and the points of objects
|
||
which aren't visible don't have to be rotated at all.
|
||
So we need to amend the previous statement: we need to
|
||
figure out what happens to the points of an object _only when that
|
||
object is visible_.
|
||
|
||
Visibility
|
||
----------
|
||
|
||
When is an object visible? When it is within your field
|
||
of vision of course. It has to be in front of you, and not too
|
||
far out to the sides.
|
||
So, after translating and rotating an object center,
|
||
|
||
Q2 = R' (Q-P),
|
||
|
||
we need to check if Q2 is in front of us (if the z-coordinate is
|
||
positive, and that it isn't too far away), and that it isn't too
|
||
far out to either side. The field of vision might be a cone; the
|
||
easiest one is a little pyramid made up of planes at 45 degrees
|
||
to one another. That is, that you can see 45 degrees to either
|
||
side or up and down. This is the easiest because the equation of
|
||
a line (a plane actually) at 45 degrees is either
|
||
|
||
x=z or x=-z, y=z or y=-z
|
||
|
||
So it is very easy to check the x- and y-coordinates of the new center
|
||
to see if -z < x < z and -z < y < z.
|
||
If the object is visible, then the rest of the object needs
|
||
to be computed and displayed.
|
||
|
||
Computing displayed objects
|
||
---------------------------
|
||
|
||
Now let's say this object, which was located at Q, is
|
||
visible. It has a bunch of points that define the object, relative
|
||
to the center -- call those points X1 X2 X3 etc. where X1=(x1,y1,z1)
|
||
etc. -- and it has an orientation W. Again, the orientation
|
||
vector is computed by performing some series of rotations M on
|
||
a vector which lies along the z-axis like (0,0,1).
|
||
First the points need to be rotated along with the
|
||
orientation vector. This isn't an inverse rotation like before --
|
||
the points rotate in the same way as the orientation vector.
|
||
So apply M to all the points; consider a single point X1:
|
||
|
||
rotated point = M X1
|
||
|
||
Once the points have been rotated we have to find their actual
|
||
location. Since they are defined relative to the center of the
|
||
object, this means just adding them to the center of the object:
|
||
|
||
rotated point + Q = M X1 + Q
|
||
|
||
This is the _actual_ location of the points in the world. Now as
|
||
before we need to translate and rotate them to get their relative
|
||
location:
|
||
|
||
X1' = R' (M X1 + Q - P)
|
||
|
||
As before, subtract P and apply the inverse rotation R'. The above
|
||
equation is the _entire_ equation of the 3D world!
|
||
We can rewrite this equation as
|
||
|
||
X1' = R' M X1 + R'(Q-P)
|
||
|
||
and recognize that R'(Q-P) was calculated earlier, to see if the
|
||
object was visible.
|
||
The above equation can be read as "Rotate the rotated object
|
||
points backwards to the way we are facing, and add to the rotated
|
||
and translated center." That is, if we turn one way, the center
|
||
rotates in the opposite direction, and the entire object rotates
|
||
in the opposite direction. Physically, if you turn your head
|
||
sideways you still see the monitor facing you. If instead of
|
||
turning your head you were to move the monitor, you would also
|
||
have to rotate the monitor to keep it facing you (if you didn't
|
||
rotate the monitor, and only changed its position, you would
|
||
be able to see the sides, etc.). That rotation is in the opposite
|
||
direction that your head would rotate (head turns to the right,
|
||
monitor turns to the left).
|
||
|
||
Displaying objects
|
||
------------------
|
||
|
||
Now that everyone is rotated and translated and otherwise
|
||
happy, they need to be projected and displayed. The projection is
|
||
easy -- as before, divide by the z-coordinate and multiply by
|
||
a magnification factor. Then connect the points in the right
|
||
way to get the polygons and faces that make up the objects, using
|
||
the appropriate method to remove faces that aren't visible, and
|
||
draw to the screen.
|
||
One thing remains, though: depth-sorting the objects, to
|
||
make sure that close-up objects overlap far-away objects. So,
|
||
_before projecting_, all of the z-coordinates need to be looked
|
||
at and the objects ordered so that they are drawn in the right
|
||
way.
|
||
This is really a form of polygon clipping. You can do
|
||
the same thing with various complex objects to get an easy way
|
||
to clip.
|
||
|
||
Summary
|
||
-------
|
||
|
||
All objects have a position and an orientation, and might
|
||
have things like velocity. The position of the object is where
|
||
the object is located. The points that make up any object are defined
|
||
relative to this center; the center is thus the center of rotation
|
||
of the object in addition to being its location. The orientation
|
||
determines what direction the object is pointing in, and the velocity
|
||
determines what direction it is moving in.
|
||
Navigating through the world amounts to changing the position
|
||
and orientation (and velocity) of the object. To figure out how the
|
||
world looks from any single object with position P, P=(Px,Py,Pz), and
|
||
orientation matrix R (where R=cumulative rotation matrix), a very simple
|
||
and elegant equation is used. First the equation
|
||
|
||
R'(Q-P)
|
||
|
||
where R' = inverse of R, determines where the center of an object Q
|
||
is located (by translating and then undoing the orientation of P), and
|
||
tells whether the object at Q is visible or not. If it is, then the
|
||
equation
|
||
|
||
R' (M X + Q-P)
|
||
|
||
gives the new location of any point X on the object, where M is
|
||
the orientation matrix for Q, and X is some point defined relative
|
||
to Q. After depth-sorting the individual objects, these points
|
||
are projected and drawn to the screen.
|
||
Doing things in this way gives a method that is very
|
||
fast and efficient, and always retains accuracy -- everything
|
||
is calculated relative to everything else. This is extremely
|
||
powerful. It makes the programming of objects easy, it makes no
|
||
roundoff errors from e.g. rotating rotated points, and no cycles
|
||
are wasted calculating rotations of non-visible objects. Some PC
|
||
algorithms you might come across do all sorts of God-awful things to
|
||
overcome these problems, like rotate a single point and then use
|
||
cross-products to preserve all the angles, and all sorts of other
|
||
horrid mathematical butchery. That way leads to the Dark Side.
|
||
|
||
By the way, there are some subtleties (for example, computing
|
||
orientation vectors) in this calculation -- see the discussion of
|
||
Cool World in Section 4 for more detail on implementation issues.
|
||
|
||
---------
|
||
Section 3: Implementing the 3D Library
|
||
---------
|
||
|
||
Now that we've dined on a tasty and nutritous meal of some
|
||
theory we need to figure out how to implement everything on the 64,
|
||
and to do so efficiently! The idea of the 3D library is to collect
|
||
all of the important and difficult routines into a single place.
|
||
Before doing so it is probably a good idea to figure out just what
|
||
the important routines are.
|
||
|
||
Obviously rotations and projections are important. In fact,
|
||
there just isn't much else to do -- a few additions here and there,
|
||
maybe a few manipulations, but everything else is really straightforward.
|
||
Of course, once that data is rotated and projected it would probably
|
||
be much more enjoyable for the viewer to display it on the screen.
|
||
Drawing lines is pretty easy, but a good solid-polygon routine is
|
||
pretty tough so it makes a good addition to the library.
|
||
|
||
So three things, and the first is rotations. There are two
|
||
kinds of rotations. When we turn left or right we rotate the world,
|
||
which means rotating the positions of objects. We also need to
|
||
rotate the object itself, which means rotating the individual
|
||
points that define the object.
|
||
|
||
Calculating rotation matrices
|
||
-----------------------------
|
||
|
||
In both cases, we need a rotation matrix that we can use to
|
||
transform coordinates. Previous programs just calculated a rotation
|
||
matrix like
|
||
|
||
R = Ry Rz Rx so that for some point P: RP = Ry Rz Rx P
|
||
|
||
that is, given three angles sx, sy, and sz, calculate the rotation
|
||
matrix you get by first rotating around the x-axis by an amount sx
|
||
(Rx times P), then the z-axis by an amount sz (Rz times Rx P),
|
||
and finally the y-axis by the amount sy (Ry times (Rz times Rx P)).
|
||
Well, that kind of routine just doesn't cut it anymore. Why?
|
||
Well, what happens when you turn left, turn left, look up, and roll?
|
||
You get a matrix that looks like
|
||
|
||
R = Rz Rx Ry Ry (think Ry=turn left, Rx=look up, Rz=roll)
|
||
|
||
What three angles sx sy and sz would give us that exact rotation
|
||
matrix? -We don't know-! As has been said many times, **matrix
|
||
multiplications do not commute**, so rotations do not commute, so
|
||
we can't just add up the rotation amounts or something. We have
|
||
to keep a running rotation matrix, that just accumulates rotations
|
||
each time we turn left or roll or whatever. In terms of a routine,
|
||
that means we need to be able to compute
|
||
|
||
Rx R and Ry R and Rz R
|
||
|
||
where R is the accumulated rotation matrix, and Rx etc. are the
|
||
little rotation matrices about the x/y/z axis that correspond to us
|
||
turning left and right such. Although this seems like a lot of
|
||
work, as we shall see it is actually quite a bit faster than
|
||
calculating an entire rotation matrix like the old routine.
|
||
Still, it is sometimes handy to be able to calculate a rotation
|
||
matrix using the old method, like when rotating by large amounts, or if
|
||
we just need an object to spin around in some arbitrary way, or if all
|
||
we need is a direction and an elevation (think of a gun turret or a
|
||
telescope). So an improved version of the old routine is also
|
||
included.
|
||
|
||
Rotating points
|
||
---------------
|
||
|
||
That ought to take care of calculating the rotation matrix.
|
||
Now we need to apply that matrix to some points. The first type of
|
||
points are the object centers (i.e. positions in the world).
|
||
Polygonamy used a single byte to represent positions, and that was
|
||
really too restrictive. A full 16-bit number is needed, and it
|
||
should be signed. So we need to be able to rotate 16-bit signed numbers.
|
||
The second type of points are the object points, which are
|
||
defined relative to the object centers. Since they are relative to
|
||
the centers these points can be 8-bits, but they should definitely
|
||
be signed integers. There are many more object points than there
|
||
are objects, so these rotations need to be really fast.
|
||
In both cases, it clearly makes much more sense to pass in
|
||
a whole list of points to be rotated, instead of having to call
|
||
the routine for each individual point.
|
||
Remember that the object points are only rotated if the
|
||
object is actually visible. And, in point of fact, once they've
|
||
been rotated they will need to be projected. So it makes sense
|
||
to tack the projection routine onto the object-rotation routine
|
||
(as opposed to the world-rotation routine, which rotates the object
|
||
centers). To summarize, then, in addition to the matrix calculation
|
||
routines we need two rotation routines: one for rotating 16-bit signed
|
||
object centers, and one for rotating (and possibly projecting) the
|
||
actual object points.
|
||
|
||
Drawing Polygons
|
||
----------------
|
||
|
||
Once everything has been rotated and projected, it will need
|
||
to be drawn. So a good solid polygon routine makes a great addition
|
||
to the library. The algorithm will be built upon the polygonamy
|
||
idea: start at the bottom of the polygon, and draw the left and
|
||
right sides of the polygon _simultaneously_, filling in-between.
|
||
That is, move up in the y-direction, calculate the left-endpoint,
|
||
calculate the right-endpoint, and fill.
|
||
To calculate the endpoints we just calculate the slopes of
|
||
the lines, and draw. So all that is needed is to pass in a list
|
||
of points, and perhaps tell the routine where to draw those points.
|
||
Recall also that this gives a very easy way of doing hidden faces:
|
||
if the points are given in counter-clockwise order for the normal
|
||
polygon, then they will be in clockwise order if the polygon is
|
||
facing away from us. So basically the polygon isn't visible if
|
||
we can't draw it, and the hidden-face calculation is more or less
|
||
a freebie. (That is, the hidden-face calculation takes no extra
|
||
time if the face is visible; it will use up some time though if
|
||
the face is not visible.)
|
||
The polygonamy routine had two large disadvantages: it was
|
||
huge, because of the way the fill-routines and tables were set up, and
|
||
it was inflexible in the sense that it could only draw to a specific
|
||
bitmap. It also couldn't handle things like negative coordinates, or
|
||
drawing polygons which were partially off-screen. The code was also
|
||
a bit convoluted, and in need of re-thinking (imagine a big prototype
|
||
machine with exposed wires sticking out and large cables snaking around,
|
||
all held together with duct tape and bailing wire, and you'll get an
|
||
idea of what the polygonamy code looks like). Of course, since we
|
||
already know the ideas and equations, it's not too tough to just
|
||
rework everything from scratch.
|
||
|
||
Basic Routines
|
||
--------------
|
||
|
||
Clearly, before writing all of the library routines we are
|
||
going to have to figure out how to do some very general operations.
|
||
We are going to need some 8- and 16-bit signed multiply routines.
|
||
The projection routine is going to involve a divide of a 16-bit
|
||
z-coordinate, and the polygon routine is going to need a division
|
||
routine for calculating line slopes. Then we have to figure out
|
||
how to represent numbers and on and on. Hokey smokes! This is
|
||
all new stuff, too.
|
||
|
||
But, first and foremost, we need a multiply. We will of course
|
||
be using the fast multiply, which uses the fact that
|
||
|
||
a*b = f(a+b) - f(a-b), f(x) = x^2/4.
|
||
|
||
Understanding this routine will be crucial to understanding what will
|
||
follow. The general routine to multiply .Y and .A looks like
|
||
|
||
STA ZP1 ;ZP1 points to f(x) low byte
|
||
STA ZP2 ;high byte
|
||
EOR #$FF
|
||
ADC #01
|
||
STA ZP3 ;table of f(x-256) -- see below
|
||
STA ZP4
|
||
LDA (ZP1),Y ;f(Y+A), low byte
|
||
SEC
|
||
SBC (ZP3),Y ;f(Y-A), low byte
|
||
TAX
|
||
LDA (ZP2),Y ;f(Y+A), high byte
|
||
SBC (ZP4),Y ;f(Y-A), high byte
|
||
;result is now in .X .A = low, high
|
||
|
||
By using the indexed addressing mode, we let the CPU add A and Y
|
||
together. If Y and A can both range from 0..255, then Y+A can
|
||
range from 0..510. So we need a 510-byte table of f(x). What
|
||
about Y-A? The complementing of A means that what we are really
|
||
calculating on the computer is Y + 256-A, or 256+Y-A. So we need
|
||
another table of f(x-256) to correctly get f(Y-A).
|
||
Consider Y=1, A=2. -A = $FE in 2's complement, so added together
|
||
we get $FF, or -1. Next consider Y=2, A=1. In this case, Y-A gives
|
||
2+$FF = $101, or 257. Computationally, if Y-A is 0..255 it is a negative
|
||
number, and subtracting 256 gives the correct negative number. If
|
||
Y-A is 256..510, then it really represents 0..255, so again subtracting
|
||
256 gives the correct number. If you think about it, you'll see
|
||
that the upper 256 bytes of this table is the same as the lower
|
||
256 bytes of the first table: f(x-256) for x=256..511 is the same
|
||
as f(x) for x=0..255. This means that the first table can be
|
||
piggy-backed on top of the second table in memory. For example,
|
||
if the table were at $C000
|
||
|
||
$C000 f(x-256)
|
||
$C100 f(x) (512 bytes)
|
||
|
||
would do the trick. WITH ONE EXCEPTION. And a rather silly one
|
||
at that. That exception is A=0. Multiplying by zero should give
|
||
a zero. But the complementing of A gives -256, which on the computer
|
||
is... zero! That is, think of A=0 and Y=0. A-Y=0, and when we
|
||
look it up in the table of f(x-256) we will get f(-256) instead
|
||
of f(0), and get the wrong answer. So we have to check for A=0.
|
||
And oh yes: don't forget to round the tables correctly!
|
||
|
||
Extremely Cool Stuff
|
||
--------------------
|
||
|
||
As usual, a little bit of thinking and a little bit of
|
||
planning ahead will lead to enormous benefits.
|
||
|
||
After staring at the fast multiplication code for a bit,
|
||
we can make a very useful observation:
|
||
|
||
Once the zero page locations are set up, they stay set up.
|
||
|
||
This makes all the difference in the world! Let's say we've just
|
||
calculated a*b. If we now want to calculate c*b, _we don't have to
|
||
set up b again_.
|
||
If we have a succession of calculations, x*a1, x*a2, x*a3,
|
||
etc., we only have to set up the ZP location once (with x and -x)
|
||
and keep reusing it. This makes the fast multiply become very fast!
|
||
With a lot of multiplications it reduces to just 24 cycles or so for
|
||
a full 16-bit result, as all of that zero-page overhead disappears.
|
||
Now think about rotations. We have to multiply a matrix
|
||
times a vector (i.e. the three coordinates). It was pointed out
|
||
in the discussion of linear algebra that we can write the
|
||
multiplication as
|
||
|
||
[x1]
|
||
R [x2] = x1*column1 of R + x2*column2 of R + x3*column3 of R
|
||
[x3]
|
||
|
||
And there is the payoff: we get three multiplies for every one
|
||
set up of zero page. That is, we set x1 in the zero page variables,
|
||
and then multiply by the three elements of the first column.
|
||
Then do the same for x2 and x3. So we still do nine multiplications,
|
||
but we only have to set up the zero-page locations three times.
|
||
Just changing the way in which we do these multiplications has
|
||
instantly given us a great big time savings!
|
||
|
||
Now we need to figure out how to do signed multiplies. The
|
||
fast multiply runs into problems adjusting to signed numbers. Consider
|
||
the simple example of x=-100 and y=1. We get that
|
||
|
||
f(x+y) = f(256-100 + 1) = f(157).
|
||
|
||
But this is just what we'd get if we multiplied x=100 and y=57
|
||
together. You don't know whether 157 is -99 or if it really
|
||
is 157. So we can't just modify the tables... unless... well, what
|
||
about if we are able to remove this ambiguity from the tables?
|
||
We can do just that if we restrict the range of allowed numbers.
|
||
For example, if we restrict to
|
||
|
||
x=-96..95 = 160..255,0..95 (-96 is 160 in 2's complement)
|
||
y=-64..64 = 192..255,0..64 (-64 is 192, -63 is 193, etc.)
|
||
|
||
then
|
||
|
||
if x+y = 0..159 the actual number is 0..159
|
||
160..255 -96..-1
|
||
256..351 0..95
|
||
352..510 -160..-2
|
||
|
||
That is, the _only_ way you can get the number 159 from x+y is when
|
||
x=95 and y=64. The four ranges above are derived by considering
|
||
the four cases x>0 and y>0, x<0 and y>0, x<0 and y<0, x>0 and y<0.
|
||
All values of x+y and x-y, from 0 to 510, correspond to _unique_ values
|
||
of x and y, if we restrict x to the range -96..95 and restrict y to the
|
||
range -64..64. With those restrictions, we can construct a *single*
|
||
512-byte fast-multiply table to do a signed fast-multiply.
|
||
How is this useful?
|
||
|
||
One word: object rotations. If the rotation matrix values
|
||
range from -64..64, and the object points range from -96..95, then
|
||
we can do an _extremely_ fast multiply to perform the object rotations,
|
||
as we shall see in the discussion of the rotation routines. Since
|
||
there are quite a lot of object points to rotate, this makes an
|
||
enormous contribution to the overall speed of the routine.
|
||
|
||
Although the above helps out greatly for the point rotations,
|
||
we still need a general signed multiply routine for the centers and
|
||
the projections and such. So, let's figure it out!
|
||
Say we multiply two numbers x and y together, and x is negative.
|
||
If we plug it in to a multiplication routine (_any_ multiplication
|
||
routine), we will really be calculating
|
||
|
||
(2^N + x)*y = 2^N*y + x*y
|
||
|
||
assuming that x is represented using 2's complement (N would be 8 or 16
|
||
or whatever). There are two observations:
|
||
|
||
- If the result is _less_ than 2^N, we are done -- 2^N*y is all
|
||
in the higher bytes which we don't care about.
|
||
|
||
- Otherwise, subtract 2^N*y from the result, i.e. subtract
|
||
y from the high bytes.
|
||
|
||
Now let's say that both x and y are negative. Then on the computer
|
||
the number we will get is
|
||
|
||
(2^N + x)*(2^N + y) = 2^(2N) + 2^N*x + 2^N*y - x*y
|
||
|
||
Now it is too large by a factor of 2^2N, 2^N*x and 2^N*y. BUT
|
||
the basic observations haven't changed a bit! We still need to
|
||
_subtract_ x and y from the high bytes. And the 2^2N is totally
|
||
irrelevant -- we can't get numbers that large by multiplying numbers
|
||
together which are no larger than 2^N.
|
||
This leads to the following algorithm for doing signed
|
||
multiplications:
|
||
|
||
multiply x and y as normal with some routine
|
||
if x<0 then subtract y from the high bytes of the result
|
||
if y<0 then subtract x from the high bytes
|
||
|
||
And that's all there is to it! Note that x and y are "backwards",
|
||
i.e. subtract y, and not x, when x<0. Some examples:
|
||
|
||
x=-1, y=16 Computer: x=$FF y=$10 (N=8)
|
||
x*y = $0FF0
|
||
Result is less than 256, so ignore high byte
|
||
Answer = $F0 = -16
|
||
OR: subtract y from high byte,
|
||
Answer = $FFF0 = -16
|
||
|
||
x=2112 y=-365 Computer: x=$0840 y=$FE93 (N=16)
|
||
x*y = $08343CC0
|
||
y<0 so subtract x from high bytes (x*2^16),
|
||
Answer = $F43CC0 = -770880
|
||
|
||
x=-31 y=-41 Computer: x=$E1 y=$D7
|
||
x*y = $BCF7
|
||
x<0 so subtract $D700 -> $E5F7
|
||
y<0 so subtract $E100 -> $04F7 = 1271 = correct!
|
||
|
||
So, in summary, signed multiplies can be done with the same fast
|
||
multiply routine along with some _very simple_ post-processing.
|
||
And if we know something about the result ahead of time (like if
|
||
it's less than 256 or whatever) then it takes _no_ additional
|
||
processing!
|
||
How cool is that?
|
||
|
||
Projections
|
||
-----------
|
||
|
||
After rotating, and adding object points to the rotated centers,
|
||
the points need to be projected. Polygonamy had a really crappy
|
||
projection algorithm, which just didn't give very good results.
|
||
I don't remember how it worked, but I do remember that it sacrificed
|
||
accuracy for speed, and as a result polygons on the screen looked
|
||
like they were made of rubber as their points shifted around.
|
||
So we need to re-solve this problem.
|
||
Recall that projection amounts to dividing by the z-coordinate
|
||
and multiplying by a magnification factor. Since the object points will
|
||
all be 16-bits (after adding to the 16-bit centers), this means dividing
|
||
a 16-bit signed number by another 16-bit number, and then doing a
|
||
multiplication. Bleah.
|
||
The first thing to do is to get rid of the divide. The
|
||
projected coordinates are given by
|
||
|
||
x' = d*x/z = x * d/z, y' = d*y/z = y * d/z
|
||
|
||
where d is the magnification factor and z is the z-coordinate of
|
||
the object. The obvious way to avoid the divide is to convert it
|
||
into a multiply by somehow calculating d/z. The problem is that
|
||
z is a 16-bit number; if it were 8 bits we could just do a table
|
||
lookup. If only it were 8 bits... Hmmmmm...
|
||
Physically the z-coordinate represents how far away the
|
||
object is. For projections, we need
|
||
|
||
- accuracy for small values of z (when objects are close to
|
||
us and large)
|
||
|
||
- speed for large values of z
|
||
|
||
If z is large then accuracy isn't such a big deal, since the objects
|
||
are far away and indistinct. So if z is larger than 8 bits, we can
|
||
just shift it right until it is eight bits and ignore the tiny loss
|
||
of accuracy. Moreover, note that the equations compute x/z and y/z;
|
||
that is, a 16-bit number on top of another. Therefore if we shift
|
||
_both_ x and z right we don't change the value at all! So now we
|
||
have a nifty algorithm:
|
||
|
||
if z>255 then shift z, x, and y right until z is eight bits
|
||
compute d/z
|
||
multiply by x, multiply by y
|
||
|
||
There are still two more steps in this algorithm, the first of which
|
||
is to compute d/z. This number can be less than one and as large
|
||
as d, and will almost always have a remainder. So we are going to
|
||
need not only the integer part but the fractional part as well.
|
||
The second step is to multiply this number times x.
|
||
Let d/z = N + R/256, so N=integer part and R=fractional part.
|
||
For example, 3/2 = 1 + 128/256 = 1.5. Also let x = 256*xh + xl.
|
||
Then the multiplication is
|
||
|
||
x * d/z = (256*xh + xl) * (N + R/256)
|
||
= 256*xh*N + xh*R + xl*N + xl*R/256
|
||
|
||
There are four terms in this expression. We know ahead of time that
|
||
the result is going to be less than 16-bits (remember that the screen
|
||
is only 320x200; if we are projecting stuff and getting coordinates
|
||
like 40,000 then there is a serious problem! And nothing in the
|
||
library is set up to handle 24-bit coordinates anyways).
|
||
The first term, xh*N, must therefore be an eight-bit number,
|
||
since if it were 16-bits multiplying by 256 would give a 24-bit number,
|
||
which we've just said won't happen. So we only really care about
|
||
the lower eight bits of xh*N. The next two terms, xh*R and xl*N,
|
||
will be 16-bit numbers. The last term, xl*R. will also be 16-bits,
|
||
but since we divide by 256 we only care about the high 8-bits of
|
||
the result (we don't need any fractional parts for anything).
|
||
And that, friends, takes care of projection: accurate when
|
||
it needs to be, and still very fast.
|
||
|
||
Cumulative Rotations
|
||
--------------------
|
||
|
||
There is still this problem with the accumulating rotation
|
||
matrix. The problem to be solved is: we want to accumulate some
|
||
general rotation operator
|
||
|
||
[A B C]
|
||
M = [D E F]
|
||
[G H I]
|
||
|
||
by applying a rotation matrix like
|
||
|
||
[ 1 0 0 ]
|
||
Rx = [ 0 cos(t) sin(t)]
|
||
[ 0 -sin(t) cos(t)]
|
||
|
||
to it. So, just do it:
|
||
|
||
[ A B C ]
|
||
Rx M = [ D*cos(t)+G*sin(t) E*cos(t)+H*sin(t) F*cos(t)+I*sin(t) ]
|
||
[-D*sin(t)+G*cos(t) -E*sin(t)+H*cos(t) -F*sin(t)+I*cos(t) ]
|
||
|
||
You might want to do the matrix multiplication yourself, to double-check.
|
||
Notice that it only affects rows 2 and 3. Also notice that only one
|
||
column is affected at a time: that is, the first column of Rx M contains
|
||
only D and G (the first column of M); the second column only E and H; etc.
|
||
Similar expressions result when multiplying by Ry or Rz -- they just
|
||
affect different rows. The point is that we don't need a whole bunch
|
||
of routines -- we just need one routine, and to tell it which rows to
|
||
operate on.
|
||
In general, the full multiplication is a pretty hairy problem. But
|
||
if the angle t is some fixed amount then it is quite simple. For example,
|
||
if t is some small angle like 3 degrees then we can turn left and right
|
||
in 3 degree increments, and the quantities cos(t) and sin(t) are just
|
||
constants. Notice also that rotating in the opposite direction
|
||
corresponds to letting t go to -t (i.e. rotating by -angle). Since
|
||
sine is an odd function, this just flips the sign of the sines
|
||
(i.e. sin(-t) = -sin(t)).
|
||
If sin(t) and cos(t) are constants, then all that is needed
|
||
is a table of g(x) = x*sin(t) and x*cos(t); the above calculation then
|
||
reduces to just a few table lookups and additions/subtractions.
|
||
There's just one caveat: if t is a small number (like three degrees)
|
||
then cos(t) is very close to 1 and sin(t) is very close to 0.
|
||
That is to say, the fractional parts of x*sin(t) and x*cos(t) are very
|
||
important!
|
||
So we will need two tables, one containing the integer part
|
||
of x*cos(t) and the other containing the fractional. This in turn
|
||
means that we need to keep track of the fractions in the accumulation
|
||
matrix. The accumulation matrix will therefore have eighteen elements;
|
||
nine integer parts and nine fractional parts. (The fractional part
|
||
is just a decimal number times 256).
|
||
The routines to rotate and project only use the integer parts;
|
||
the only routines that really need the fractional parts are the
|
||
accumulation routines.
|
||
How important is the fractional part? Very important. There
|
||
is a class of transformations called area-preserving transformations,
|
||
of which rotations are a member. That is, if you rotate an object,
|
||
the area of that object does not change. The condition for a matrix
|
||
to be area-preserving is that its determinant is equal to one; if
|
||
the cumulative rotation matrix starts to get inaccurate then its
|
||
determinant will gradually drift away from one, and it will no
|
||
longer preserve areas. This in turn means that it will start to
|
||
distort an object when it is applied to the object. So, accuracy
|
||
is very important where rotation matrices are concerned!
|
||
|
||
There is one other issue that needs to be taken care of.
|
||
Rotation matrices have this little feature that the largest
|
||
value that any element of the matrix can ever take is: one!
|
||
And usually the elements are all less than one. To retain
|
||
accuracy we need to multiply the entire matrix by a constant;
|
||
a natural value is 64, so that instead of ranging from -1..1
|
||
the rotation matrix elements will range from -64..64. And as was
|
||
pointed out earlier, we can do some very fast signed arithmetic
|
||
if one of the numbers is between -64 and 64.
|
||
The downside is that we have to remember to divide out
|
||
that factor of 64 once the calculation is done -- it's just a
|
||
temporary place-holder. For object point rotations we can
|
||
incorporate this into the multiplication table, but for the 16-bit
|
||
center rotations we will have to divide it out manually.
|
||
|
||
Polygon Routine Calculations
|
||
----------------------------
|
||
|
||
Finally, there's the polygon routine. Starting at the lowest
|
||
point on the polygon, we need to draw the left and right sides
|
||
simultaneously. Specifically we want to draw the lines to the
|
||
left and right, and we only care about the x-coordinates of the
|
||
endpoints at each value of y:
|
||
|
||
start at the bottom
|
||
Decrement y (move upwards one step)
|
||
Compute x-coord of left line
|
||
Compute y-coord of right line
|
||
(Fill in-between)
|
||
|
||
Drawing a line is easy. The equation of a line is
|
||
|
||
dy = m*dx, m=slope, dy=change in y, dx=change in x
|
||
|
||
The question posed by the above algorithm is "If I take a single
|
||
step in y, how far do I step in x?" Mathematically: if dy=1,
|
||
what is dx? Obviously,
|
||
|
||
dx = 1/m
|
||
|
||
i.e. the inverse slope. If the endpoints of the line are at
|
||
(X1,Y1) and (X2,Y2), then the inverse slope is just
|
||
|
||
1/m = DX/DY, where DX=X2-X1 and DY=Y2-Y1.
|
||
|
||
Once this is known for the left and right sides, updating the x-coordinates
|
||
is a simple matter of just calculating x+dx (dx=DX/DY). Note that
|
||
if DY=0, the line segment is a straight horizontal line. Therefore
|
||
we can just skip to the next point in the polygon, and let the
|
||
fill routine -- which is very good at drawing straight lines -- take
|
||
care of it.
|
||
|
||
Now we can start drawing the left and right sides. As we
|
||
all know, lines on a computer are really a series of horizontal
|
||
line segments:
|
||
|
||
***** **
|
||
***** * (lines with positive and
|
||
**** ** negative slope)
|
||
***** *
|
||
|
||
Note that the polygon routine doesn't calculate all points on the
|
||
line, the way a normal line routine does. It only calculates the
|
||
left or right endpoints of each of those horizontal line segments.
|
||
For example, if dx=5 then the routine takes big steps of 5 units each;
|
||
the fill routine takes care of the rest of the line segment.
|
||
There are two issues that need to be dealt with in the polygon
|
||
routine. The first trick in drawing lines is to split one of those
|
||
line segments in half, for the first and last points. That is, consider
|
||
a line with DY=1 and DX=10. From the above equations, dx = 10, and if
|
||
this line was just drawn naively it would look like
|
||
|
||
*
|
||
**********
|
||
|
||
i.e. a single horizontal line segment of length 10, followed by a dot
|
||
at the last point. What we really want is to divide that one line
|
||
segment in half, to get
|
||
|
||
*****
|
||
*****
|
||
|
||
for a nice looking line. In the polygon routine, this means that the
|
||
first step should be of size dx/2. All subsequent steps will be
|
||
of size dx, except for the last point.
|
||
The other issue to be dealt with again deals with the endpoints.
|
||
How to deal with them depends on two things: the slope of the line,
|
||
and whether it is the left or right side of the polygon. Consider
|
||
the below line, with endpoints marked with F and L (for first and last)
|
||
|
||
*L**
|
||
****
|
||
F*
|
||
|
||
and dx=4. You can see that we have overshot the last point. In point
|
||
of fact, we may also have overshot the first point. Let's say the above
|
||
was the left side of a polygon. Then we would want to start filling
|
||
at the LEFT edge of each line segment -- we want that * to the left
|
||
of L to be drawn, but want to start filling at F. On the other hand,
|
||
if this were the right side of the polygon then we would want to fill
|
||
to the right edge of each line segment. That means filling to point L,
|
||
again grabbing the * to the left of L, and filling to the * to the
|
||
right of F.
|
||
The correct way to handle this is actually very simple: place
|
||
the x-coordinate update (x=x+dx) in just the right spot, either before
|
||
or after the plot/fill routine. In the above example, the left line
|
||
should do the update after the plot/fill:
|
||
|
||
fill in-between and plot endpoints
|
||
x=x+dx
|
||
|
||
The line will then always use the "previous" endpoint: it will start
|
||
with F, and then with the last point of the previous line segment.
|
||
The right line should do the updating before the plotting: it will then
|
||
use the rightmost point of each segment, and when the end of the line
|
||
is reached the next line will be computed, resetting the starting point.
|
||
|
||
Polygonamy put some wacky restrictions on DX and DY that just
|
||
won't cut it here. The lib3d routine can handle negative coordinates,
|
||
and polygons which are partially off-screen. The only restriction
|
||
it puts on the numbers is that DX is 9-bits and DY is 8-bits; the
|
||
coordinates X2 and Y2 etc. can be a full 16-bits, but the sides
|
||
of the polygons can't exceed DX=9-bits and DY=8-bits.
|
||
In principle this allows you to draw polygons which are
|
||
as large as the entire screen. In practice it was perhaps not a
|
||
great decision: if these polygons are being rotated, then a
|
||
polygon with DX=320 will have DY=320 after a 90-degree rotation.
|
||
So in a sense polygons _sides_ are actually limited to be no
|
||
larger than 256 units.
|
||
The important word here though is "side". A polygon
|
||
can be much larger than 256x256 if it has more than one side!
|
||
For example, a stopsign has eight sides, but each side is smaller
|
||
than the distance from one end of the polygon to the other.
|
||
Also, sides can always be split in half. This restriction is
|
||
just one of those design decisions...
|
||
Anyways, DX=9-bits and DY=8-bits. To calculate this we need
|
||
a divide routine. Polygonamy used a routine involving logs and some
|
||
fudged tables and other complexities. Well most of that was pretty
|
||
silly, because the divide is such a tiny portion of the whole polygon
|
||
routine -- saving a few tens of cycles is awfully small potatoes in a
|
||
routine that's using many thousands of cycles.
|
||
So the new strategy is to do the general routine: calculate
|
||
log(DX) and log(DY), subtract, and take EXP() of the result to get
|
||
an initial guess. Then use a fast multiply to see how accurate
|
||
the guess was, and if it is too large or too small then a few subtractions
|
||
or additions can fix things up. The remainder is also computed as
|
||
a decimal number (remainder/DY times 256); the remainder is crucial,
|
||
as accuracy is important -- if the lines are not drawn accurately,
|
||
the polygon sides won't join together, and the polygon will look very
|
||
weird.
|
||
Why not do a divide like in the projection routine? Well,
|
||
first of all, I wrote the polygon renderer before deriving that
|
||
routine. :) Second, that routine combines a multiply _and_ a
|
||
divide into a single multiply; there is no multiply here, and if
|
||
you count up the cycles I think you'll find that the logarithmic
|
||
routine is faster, and with the remainder calculation it is simply
|
||
better-suited for drawing polygons.
|
||
Although the remaining parts of the algorithm require a lot
|
||
of blood and sweat to get the cycle count down, they are relatively
|
||
straightforward. I leave their explanation, along with a few other
|
||
subtleties, for the detailed code disassemblies, below.
|
||
|
||
Detailed Code Disassembly
|
||
-------------------------
|
||
|
||
After all of those preliminary calculations and ideas and
|
||
concepts we are finally in a position to write some decent code.
|
||
The full lib3d source code is included in this issue, but it is worth
|
||
examining the routines in more detail. Five routines will be
|
||
discussed below:
|
||
|
||
CALCMAT - Calculate a rotation matrix
|
||
ACCROTX - Accumulate the rotation matrix by a fixed amount
|
||
GLOBROT - Global rotate (rotate centers)
|
||
ROTPROJ - Local rotation (object points), and project if necessary
|
||
POLYFILL- Polygon routine
|
||
|
||
Only the major portions of the routines will be highlighted. And the
|
||
first routine is...
|
||
|
||
CALCMAT
|
||
-------
|
||
|
||
*
|
||
* Calculate the local matrix
|
||
*
|
||
* Pass in: A,X,Y = angles around z,x,y axis
|
||
*
|
||
* Strategy: M = Ry Rx Rz where Rz=roll, Rx=pitch, Ry=yaw
|
||
*
|
||
|
||
CALCMAT calculates a rotation matrix using the old method: pass in the
|
||
rotation amounts about the x y and z axis and calculate a matrix.
|
||
There is a fairly complete explanation of this routine in the source
|
||
code, but it has a very important feature: the order of the rotations.
|
||
|
||
As the above comment indicates, it first takes care of roll (tilting
|
||
your head sideways), then pitch (looking up and down), and finally
|
||
yaw (looking left or right). This routine can't be used for a general
|
||
3D world, but by doing rotations in this order it _can_ be used for
|
||
certain kinds of motion, like motion in a plane (driving a car around),
|
||
or an angle/elevation motion (like a gun turret).
|
||
|
||
Note also the major difference from the accumulation routines: they
|
||
rotate by a fixed amount. Since this routine just uses the angles
|
||
as parameters, the program that uses this routine can of course update
|
||
the angles by any amount it likes. That is: the accumulation routines
|
||
can only turn left or right in 3 degree increments; this routine can
|
||
instantly point in any direction.
|
||
|
||
Next up are the accumulation routines. These routines apply a fixed
|
||
rotation to the rotation matrix; i.e. they rotate the world by a
|
||
fixed amount about an axis. The carry flag is used to indicate
|
||
whether rotations are positive or negative (clockwise or counter-
|
||
clockwise if you like).
|
||
|
||
As was pointed out earlier, only one routine is needed to do the
|
||
actual rotation calculation. All it needs to know is which rows
|
||
to operate on, so that is the job of ACCROTX ACCROTY and ACCROTZ.
|
||
|
||
ACCROTX
|
||
-------
|
||
|
||
*
|
||
* The next three procedures rotate the accumulation
|
||
* matrix (multiply by Rx, Ry, or Rz).
|
||
*
|
||
* Carry clear means rotate by positive amount, clear
|
||
* means rotate by negative amount.
|
||
*
|
||
ACCROTX ENT
|
||
|
||
The x-rotation just operates on rows 2 and 3; these routines just loop
|
||
through the row elements, passing them to the main routine ROTXY.
|
||
The new, rotated elements are then stored in the rotation matrix,
|
||
and on exit the rotation matrix has accumulated a rotation about
|
||
the x-axis.
|
||
|
||
|
||
ROR ROTFLAG
|
||
LDX #2
|
||
:LOOP STX COUNT
|
||
LDA D21REM,X ;rows 2 and 3
|
||
STA AUXP
|
||
LDA G31REM,X
|
||
STA AUXP+1
|
||
|
||
LDY G31,X ;.Y = row 3
|
||
LDA D21,X ;.X = row 2
|
||
TAX
|
||
JSR ROTXY
|
||
LDX COUNT
|
||
LDA TEMPX
|
||
STA D21REM,X
|
||
LDA TEMPX+1
|
||
STA D21,X
|
||
LDA TEMPY
|
||
STA G31REM,X
|
||
LDA TEMPY+1
|
||
STA G31,X
|
||
DEX
|
||
BPL :LOOP
|
||
RTS
|
||
|
||
*
|
||
* Rotate .X,AUXP .Y,AUXP+1 -> TEMPX,+1 TEMPY,+1
|
||
*
|
||
* If flag is set for negative rotations, swap X and Y
|
||
* and swap destinations (TEMPX TEMPY)
|
||
*
|
||
ROTXY
|
||
|
||
This is the main accumulation routine. It simply calculates
|
||
x*cos(delta) + y*sin(delta), and y*cos(delta) - x*sin(delta).
|
||
Since delta is so small, cos(delta) is very nearly 1 and sin(delta)
|
||
is very nearly zero: an integer and a remainder part of x and y
|
||
are necessary to properly calculate x*cos(delta) etc.
|
||
|
||
Let x = xint + xrem (integer and remainder). Then
|
||
|
||
x*cos(delta) = xint*cos(delta) + xrem*cos(delta).
|
||
|
||
Since cos(delta) is nearly equal to 1, xint*cos(delta) is equal to
|
||
xint plus a small correction; that is, xint*cos(delta) gives an
|
||
integer part and a remainder part. What about xrem*cos(delta)? It
|
||
also gives a small correction to xrem. But xrem is already small!
|
||
Which means the correction to xrem is really really small, i.e outside
|
||
the range of our numbers. So we can just discard the correction.
|
||
This can generate a little bit of numerical error, but it is miniscule,
|
||
and tends to get lost in the noise.
|
||
|
||
The point of all this is that x*cos(delta) + y*sin(delta) is calculated
|
||
as
|
||
xrem + xint*cos(delta) + yint*sin(delta)
|
||
|
||
where xrem*cos(delta) is taken to be xrem, and yrem*sin(delta) is taken
|
||
to be zero. This introduces a very small error into the computation,
|
||
but the errors tend to get lost in the wash.
|
||
|
||
:CONT LDA CDELREM,X
|
||
CLC
|
||
ADC SDELREM,Y
|
||
STA TEMPX
|
||
LDA CDEL,X ;x*cos(delta)
|
||
ADC SDEL,Y ;+y*sin(delta)
|
||
STA TEMPX+1
|
||
LDA TEMPX
|
||
CLC
|
||
ADC AUXP ;+xrem
|
||
STA TEMPX
|
||
BCC :NEXT
|
||
INC TEMPX+1
|
||
|
||
:NEXT a similar piece of code to calculate y*cos - x*sin
|
||
|
||
And that is the entire routine -- quite zippy, and at 14 bits per
|
||
number it is remarkably accurate.
|
||
|
||
GLOBROT
|
||
-------
|
||
|
||
Next up is GLOBROT. The centers are 16-bit signed numbers, and
|
||
the rotation matrix of course consists of 8-bit signed numbers.
|
||
|
||
*
|
||
* Perform a global rotation, i.e. rotate the centers
|
||
* (16-bit signed value) by the rotation matrix.
|
||
*
|
||
* The multiplication multiplies to get a 24-bit result
|
||
* and then divides the result by 64 (mult by 4). To
|
||
* perform the signed multiplication:
|
||
* - multiply C*y as normal
|
||
* - if y<0 then subtract 256*C
|
||
* - if C<0 then subtract 2^16*y
|
||
*
|
||
* Parameters: .Y = number of points to rotate
|
||
*
|
||
|
||
Recall that the object centers are 16-bit signed coordinates; therefore
|
||
a full 16-bit signed multiply is needed. Two macros are used for
|
||
the multiplies:
|
||
|
||
MULTAY MAC ;Multiply A*Y, store in var1, A
|
||
|
||
which does the usual fast multiply, and
|
||
|
||
QMULTAY MAC ;Assumes pointers already set up
|
||
LDA (MULTLO1),Y
|
||
SEC
|
||
SBC (MULTLO2),Y
|
||
STA ]1
|
||
LDA (MULTHI1),Y
|
||
SBC (MULTHI2),Y
|
||
<<<
|
||
|
||
which does a "quick multiply", that is, it is the fast multiply
|
||
without any of the zero-page setup muckety-muck. As was pointed out
|
||
in the theory section, once the zero page variables are set up,
|
||
they stay set up! ROTPROJ can take better advantage of this,
|
||
but GLOBROT can also benefit. Remember also that if A=0 this
|
||
routine will fail; the routines below must check for A=0 ahead of
|
||
time.
|
||
|
||
Since the matrix elements are all "6-bit offset", that is, since
|
||
they are the actual value times 64, the final result after the
|
||
rotations needs to be divided by 64. The easiest way to divide the
|
||
24-bit result by 64 is to shift *left* twice, and keep the upper bits.
|
||
In the macro below, "C" is the "C"enter, and Mij is some element in
|
||
the rotation matrix.
|
||
|
||
* Fix sign and divide by 64
|
||
|
||
DIVFIX MAC ;If Mij<0 subtract 256*C
|
||
LDY MULTLO1 ;If C<0 subtract 2^16*Mij
|
||
BPL POSM
|
||
STA TEMP3
|
||
LDA TEMP2
|
||
SEC
|
||
SBC ]1 ;Center
|
||
STA TEMP2
|
||
LDA TEMP3
|
||
SBC ]1+1 ;high byte
|
||
POSM LDY ]1+1
|
||
BPL DIV
|
||
SEC
|
||
SBC MULTLO1 ;Subtract Mij
|
||
|
||
DIV ASL TEMP1 ;Divide by 64
|
||
ROL TEMP2
|
||
ROL
|
||
ASL TEMP1
|
||
ROL TEMP2
|
||
ROL
|
||
LDY TEMP2
|
||
<<< ;.A, .Y = final result
|
||
|
||
|
||
* Main routine
|
||
|
||
GLOBROT ENT
|
||
|
||
Recall that there are two ways to look at multiplying a matrix times
|
||
a vector. The usual way is "row times column"... and that is exactly
|
||
how this routine works. Why not do the "column times vector element"
|
||
method, which we said offers this great computational advantage?
|
||
|
||
Doing things that way means calculating
|
||
|
||
[A11] [B12] [C13]
|
||
[D21]*CX + [E22]*CY + [F23]*CZ
|
||
[G31] [H32] [I33]
|
||
|
||
where the A11 etc. are the columns of the rotation matrix. The
|
||
argument was that CX could be set up in zero page once, and then
|
||
multiplications could be done three at a time. Why not just do that?
|
||
|
||
Well, the most basic answer is that I had already written this
|
||
routine before I noticed the multiplication trick. So I could have
|
||
rewritten it using the above method, or modify what was already in
|
||
place. Since CX CY and CZ are all 16-bit signed numbers, I chose
|
||
to do the multiplication savings using A11 etc. as the zero
|
||
page variable (A11*CX = A11*CXLO + 256*A11*CXHI), so the multiplications
|
||
are done two at a time. I think I also decided that the total cycle
|
||
savings in rewriting the routine would have been miniscule compared
|
||
with the total routine time, and that the whole global rotation time
|
||
was small compared with the local rotations and especially the
|
||
polygon plotting.
|
||
|
||
In sum, doing a total rewrite didn't seem to be worth the effort.
|
||
|
||
Nevertheless, this routine can certainly be made faster.
|
||
Another item for improvement is to do the division by 64 only
|
||
at the very end of the calculation (the routines below multiply,
|
||
then divide by 64, then add; it would be better to multiply, add,
|
||
then after all additions divide by 64). Who knows what I was
|
||
thinking at the time?!
|
||
|
||
Oh well, that's why God invented version numbers :).
|
||
|
||
Here is the main routine loop:
|
||
|
||
GLOOP DEY
|
||
STY COUNT
|
||
|
||
[copy object centers to zero page for fast access]
|
||
|
||
This part multiplies row 1 of the rotation matrix times the position
|
||
vector, and stores the new result in CX, the x-coordinate of the object
|
||
center.
|
||
|
||
LDA A11 ;Row1
|
||
LDX B12
|
||
LDY C13
|
||
JSR MULTROW ;Returns result in .X .A = lo hi
|
||
LDY COUNT
|
||
STA (CXHI),Y
|
||
TXA
|
||
STA (CXLO),Y
|
||
|
||
[Do the same thing for row 2 and the y-coordinate]
|
||
|
||
[Do the same thing for row 3 and the z-coordinate]
|
||
|
||
The routine then loops around back to GLOOP, until all points have
|
||
been rotated. The important work of this routine is done in MULTROW:
|
||
|
||
*
|
||
* Multiply a row by CX CY CZ
|
||
* (A Procedure to save at least a LITTLE memory...)
|
||
*
|
||
M1 EQU CXSGN ;CXSGN etc. no longer used.
|
||
M2 EQU CYSGN
|
||
M3 EQU CZSGN
|
||
|
||
MULTROW
|
||
STA M1
|
||
STX M2
|
||
STY M3
|
||
|
||
This next part checks to make sure A is not zero; as was pointed out
|
||
in the discussion of the fast multiply, A=0 will fail (ask me how I
|
||
know).
|
||
|
||
TAY
|
||
BEQ :SKIP1
|
||
|
||
It is all pretty straightforward from here: compute M1*CX, adjust for
|
||
signed values, and divide by 64 (this is the divide by 64 that it
|
||
would have been better to leave for later). Since zero-page is
|
||
already set up, the second multiplication is done with a quick multiply.
|
||
|
||
LDY CX+1
|
||
>>> MULTAY,TEMP2
|
||
STA TEMP3
|
||
LDY CX
|
||
>>> QMULTAY,TEMP1
|
||
CLC
|
||
ADC TEMP2
|
||
STA TEMP2
|
||
LDA TEMP3
|
||
ADC #00
|
||
>>> DIVFIX,CX ;Adjust result and /64
|
||
:SKIP1 STY TM1
|
||
STA TM1+1
|
||
|
||
The next two parts just calculate M2*CY and M3*CZ, adding to TM1 and
|
||
TM1+1 as they go, and the routine exits.
|
||
|
||
ROTPROJ
|
||
-------
|
||
|
||
Now we move on to perhaps THE core routine. In general, *a whole lot*
|
||
of object points need to be rotated and projected, so this routine
|
||
needs to blaze. Remember that there are two rotations in the
|
||
"world" equation: local rotations to get an object pointing in the
|
||
right direction, and another rotation to figure out what it looks
|
||
like relative to us. This routine can therefore just rotate (for
|
||
the first kind of rotation), or rotate and project (for the second
|
||
kind). The idea is to pass in a list of points, tell it how many
|
||
points to rotate (and project), and tell it where to store them all!
|
||
|
||
*
|
||
* ROTPROJ -- Perform local rotation and project points.
|
||
*
|
||
* Setup needs:
|
||
* Rotation matrix
|
||
* Pointer to math table (MATMULT = $AF-$B0)
|
||
* Pointers to point list (P0X-P0Z = $69-$6E)
|
||
* Pointers to final destinations (PLISTYLO ... = $BD...)
|
||
* (Same as used by POLYFILL)
|
||
* Pointers to lists of centers (CXLO CXHI... = $A3-$AE)
|
||
* .Y = Number of points to rotate (0..N-1)
|
||
* .X = Object center index (index to center of object)
|
||
*
|
||
* New addition:
|
||
* C set means rotate and project, but C clear means just
|
||
* rotate. If C=0 then need pointers to rotation
|
||
* destinations ROTPX,ROTPY,ROTPZ=$59-$5E.
|
||
*
|
||
|
||
Recall that the rotation matrix is offset by 64 -- instead of ranging
|
||
from -1..1, all elements range from -64..64. As in the case of
|
||
GLOBROT above, the final result needs to be divided by 64. Also
|
||
recall the result from the discussion of signed multiplication:
|
||
if one set of numbers is between -64..64, and the other between
|
||
-96..95, then all combinations a+b (and a-b) generate a unique
|
||
8-bit number. In other words, only one table is needed.
|
||
|
||
Next, think about rotations for a moment: rotations do not change
|
||
lengths (if you rotate, say, a pencil, it doesn't magically get
|
||
longer). This has two important implications. The first is
|
||
that we know a priori that the result of this rotation will be an
|
||
*8-bit* result, if the starting length was an 8-bit number.
|
||
Thus we need only one table, and just two table lookups -- one
|
||
for f(a+b) and the other for f(a-b). Since the upper 8-bits
|
||
are irrelevant, the divide by 64 can be incorporated directly
|
||
into the table of f(x).
|
||
|
||
The second implication is that the _length_ of the point-vectors
|
||
cannot be more than 128. This is because if we ignore the upper
|
||
8-bits, we don't capture the sign of the result. Consider: each
|
||
x, y, and z coordinate of a point can be between -96..95. A point
|
||
like (80,80,80) has length sqrt(3*80^2) = 138.56. So if some rotation
|
||
were to rotate that vector onto the x-axis it would have a new
|
||
coordinate of (138,0,0). Although this is an 8-bit number, it is
|
||
not an eight-bit signed number -- the rotation could just as easily
|
||
put the coordinate at (-138,0,0), which requires extra bits. So some
|
||
care must be taken to make sure that object points are not more than
|
||
128 units away from the object center.
|
||
|
||
As was pointed out earlier, we can save a lot of time with the
|
||
matrix multiply by doing the multiplication as column-times-element,
|
||
instead of the usual row-times-column: column 1 of the matrix
|
||
times Px plus column 2 times Py plus column 3 times Pz, where
|
||
(Px,Py,Pz) is the vector being rotated. Here is the entire
|
||
signed multiplication routine (with divide by 64):
|
||
|
||
SMULT MAC ;Signed multiplication
|
||
STA MATMULT
|
||
EOR #$FF
|
||
CLC
|
||
ADC #01
|
||
STA AUXP
|
||
LDA (MATMULT),Y
|
||
SEC
|
||
SBC (AUXP),Y
|
||
<<<
|
||
|
||
And this is the Quick Multiply, used after the zero-page variables
|
||
have been set up:
|
||
|
||
QMULT MAC ;Multiplication that assumes
|
||
LDA (MATMULT),Y ;pointers are already initialized
|
||
SEC
|
||
SBC (AUXP),Y
|
||
<<<
|
||
|
||
Pretty cool, eh? A 12-cycle signed multiply and divide. :)
|
||
|
||
On to the routine:
|
||
|
||
ROTLOOP is the main loop. It first rotates the points. If the
|
||
carry was set then it adds the points to the object center and
|
||
projects them.
|
||
|
||
ROTLOOP LDY COUNT
|
||
BEQ UPRTS
|
||
DEY
|
||
STY COUNT
|
||
|
||
LDA (P0X),Y
|
||
BNE :C1
|
||
STA TEMPX
|
||
STA TEMPY
|
||
STA TEMPZ
|
||
BEQ :C2
|
||
:C1 LDY A11 ;Column 1
|
||
>>> SMULT
|
||
STA TEMPX
|
||
LDY D21
|
||
>>> QMULT
|
||
STA TEMPY
|
||
LDY G31
|
||
>>> QMULT
|
||
STA TEMPZ
|
||
|
||
The above multiplies the x-coordinate (P0X) times the first column of
|
||
the rotation matrix. Note the check for P0X=0, which would otherwise
|
||
cause the multiplication to fail. Note that there is one SMULT,
|
||
which sets up the zero-page pointers, followed by two QMULTS.
|
||
The result in TEMPX, TEMPY, TEMPZ will be added to subsequent
|
||
column multiplies. And that's the whole routine!
|
||
|
||
After the third column has been multiplied, we have the rotated
|
||
point. If projections are taking place, then the point needs to
|
||
be added to the center and projected. The centers are 16-bit
|
||
signed coordinates, but so far the points are just 8-bit signed
|
||
coordinates. To make them into 16-bit signed coordinates we
|
||
need to add a high byte of either 00, for positive numbers, or
|
||
$FF, for negative numbers (e.g. $FFFE = -2). In the code below,
|
||
.Y is used to hold the high byte.
|
||
|
||
ADDC LDY #00
|
||
CLC
|
||
ADC TEMPZ
|
||
BPL :ROTCHK
|
||
DEY ;Sign
|
||
|
||
:ROTCHK LDX ROTFLAG
|
||
BMI :POS1
|
||
LDY COUNT ;If just rotating, then just
|
||
STA (ROTP0Z),Y ;store!
|
||
LDA TEMPY
|
||
STA (ROTP0Y),Y
|
||
LDA TEMPX
|
||
STA (ROTP0X),Y
|
||
JMP ROTLOOP
|
||
|
||
:POS1 CLC ;Add in centers
|
||
ADC CZ
|
||
STA TEMPZ
|
||
TYA
|
||
ADC CZ+1
|
||
STA TEMPZ+1 ;Assume this is positive!
|
||
|
||
Remember that only objects which are in front of us -- which have
|
||
positive z-coordinates -- are visible. The projection won't
|
||
work right if any z-coordinates are negative. Thus the above
|
||
piece of code doesn't care about the sign of the z-coordinate.
|
||
|
||
Two similar pieces of code add the x/y coordinate to the x/y center
|
||
coordinate. The signs of the result are stored in CXSGN and CYSGN,
|
||
for use below.
|
||
|
||
Recall how projection works: accuracy for small values of Z, but
|
||
speed for high values of Z. If the z-coordinate is larger than
|
||
8-bits, we just shift all the coordinates right until z is 8-bits.
|
||
Since the X and Y coordinates might be negative, we need the
|
||
sign bits CXSGN and CYSGN; they contain either 00 or $FF, so
|
||
that the rotations will come out correctly (negative numbers will
|
||
stay negative!).
|
||
|
||
LDA TEMPZ+1
|
||
BEQ PROJ
|
||
:BLAH LSR ;Shift everything until
|
||
ROR TEMPZ ;Z=8-bits
|
||
LSR CXSGN
|
||
ROR TEMPX+1 ;Projection thus loses accuracy
|
||
ROR TEMPX ;for far-away objects.
|
||
LSR CYSGN
|
||
ROR TEMPY+1 ;(Big whoop)
|
||
ROR TEMPY
|
||
TAX
|
||
BNE :BLAH
|
||
|
||
Finally comes the projection itself. It just follows the equations
|
||
set out earlier, in Section 3. PROJTAB is the table of d/z.
|
||
It turns out that there are a lot of zeros and ones in the
|
||
table, so those two possibilities are treated as special cases.
|
||
The PROJTAB value is used in the zero-page multiplication pointers,
|
||
to again make multiplication faster. By calculating both the
|
||
x- and y-coordinates simultaneously, the pointers may be reused
|
||
even further.
|
||
|
||
After multiplying, the coordinates are added to the screen offsets
|
||
(XOFFSET and YOFFSET), so that 0,0 is at the center of the screen
|
||
(i.e. XOFFSET=160 and YOFFSET=100). These values are changeable,
|
||
so the screen center may be relocated almost anywhere. The final
|
||
coordinates are 16-bit signed values.
|
||
|
||
Finally, there is the other core routine: POLYFILL, the polygon
|
||
renderer.
|
||
|
||
POLYFILL
|
||
--------
|
||
|
||
POLYFILL works much like the old Polygonamy routine. Each object
|
||
consists of a list of points. Each face of the object is some
|
||
subset of those points. POLYFILL polygons are defined by a set
|
||
of indices into the object point list -- better to copy an 8-bit
|
||
index than a 16-bit point. The index list must go counter-clockwise
|
||
around the polygon, so the hidden-face routine will work correctly.
|
||
To draw the polygon it simply starts at the bottom, calculating
|
||
the left and right edges and filling in-between.
|
||
|
||
POLYFILL also greatly improves upon the old routine. It can draw
|
||
to bitmaps in any bank. It is very compact. It deals with
|
||
16-bit signed coordinates, and can draw polygons which are partially
|
||
(or even wholly) off-screen. It also correctly draws polygons which
|
||
overlap (in a very sneaky way).
|
||
|
||
Keep in mind that this routine needs to be just balls-busting fast.
|
||
A typical object might have anywhere from 3-10 visible polygons, and
|
||
a typical polygon might be 100-200 lines high. Just saving a few
|
||
tens of cycles in the main loop can translate to tens of thousands
|
||
of cycles saved overall. By the same token, there's no need to
|
||
knock ourselves out on routines which aren't part of the main loop;
|
||
saving 100 cycles overall is chump change. Finally, since this is
|
||
a library, there are some memory constraints, and some of the routines
|
||
are subsequently less efficient than they would otherwise be. Again,
|
||
though, those few added cycles are all lost in the noise when compared
|
||
with the whole routine.
|
||
|
||
There is a whole a lot of code, so it's time to just dive in. First,
|
||
|
||
*
|
||
* Some tables
|
||
*
|
||
|
||
XCOLUMN ;xcolumn(x)=x/8
|
||
]BLAH = 0
|
||
LUP 32
|
||
DFB ]BLAH,]BLAH,]BLAH,]BLAH
|
||
DFB ]BLAH,]BLAH,]BLAH,]BLAH
|
||
]BLAH = ]BLAH+1
|
||
--^
|
||
|
||
* Below are EOR #$FF for merging into the bitmap, instead
|
||
* of just ORAing into the bitmap.
|
||
|
||
LBITP ;Left bit patterns
|
||
LUP 32
|
||
* DFB $FF,$7F,$3F,$1F,$0F,$07,$03,$01
|
||
DFB 00,$80,$C0,$E0,$F0,$F8,$FC,$FE
|
||
--^
|
||
RBITP ;right bit patterns
|
||
LUP 32
|
||
* DFB $80,$C0,$E0,$F0,$F8,$FC,$FE,$FF
|
||
DFB $7F,$3F,$1F,$0F,$07,$03,$01,$00
|
||
--^
|
||
|
||
I will put off discussion of the above tables until they are
|
||
actually used. It is just helpful to keep them in mind, for later.
|
||
|
||
Next up is the fill routine. The polygonamy fill routine looked like
|
||
|
||
STA BITMAP,Y
|
||
STA BITMAP+8,Y
|
||
STA BITMAP+16,Y
|
||
...
|
||
|
||
There were a total of 25 of these -- one for each row. Then there
|
||
were two bitmaps, so 50 total. Each bitmap-blaster was page-aligned.
|
||
|
||
Fills take place between two columns on the screen. By entering
|
||
the above routine at the appropriate STA xxxx,Y, the routine
|
||
could begin filling from any column. By plopping an RTS onto
|
||
the appropriate STA the routine could exit on any column. After
|
||
filling between the left and right columns, the RTS was restored
|
||
to an STA xxxx,Y.
|
||
|
||
Although the filling is very fast, the routine is very large, locked
|
||
to a specific bitmap, and a bit cumbersome in terms of overhead. So
|
||
a new routine is needed, and here it is:
|
||
|
||
*
|
||
* The fill routine. It just blasts into the bitmap,
|
||
* with self-modifying code determining the entry and
|
||
* exit points.
|
||
*
|
||
FILLMAIN
|
||
]BLAH = 0 ;This assembles to
|
||
LUP 32 ;LDY #00 STA (BPOINT),Y
|
||
LDY #]BLAH ;LDY #08 STA (BPOINT),Y
|
||
STA (BPOINT),Y ;...
|
||
]BLAH = ]BLAH+8 ;LDY #248 STA (BPOINT),Y
|
||
--^
|
||
|
||
INC BPOINT+1 ;Advance pointer to column 32
|
||
COL32 LDY #00
|
||
STA (BPOINT),Y
|
||
LDY #08 ;33
|
||
STA (BPOINT),Y
|
||
LDY #16
|
||
STA (BPOINT),Y
|
||
LDY #24
|
||
STA (BPOINT),Y
|
||
LDY #32
|
||
STA (BPOINT),Y
|
||
LDY #40 ;37
|
||
STA (BPOINT),Y
|
||
LDY #48
|
||
STA (BPOINT),Y
|
||
LDY #56 ;Column 39
|
||
STA (BPOINT),Y
|
||
FILLEND RTS ;64+256=320
|
||
FILL
|
||
JMP FILLMAIN ;166 bytes
|
||
|
||
As before, self-modifying code is used to determine the entry and
|
||
exit points. As you can see, compared with the polygonamy routine
|
||
this method takes 3 extra cycles per column filled. But now there
|
||
is just one routine, and it can draw to any bitmap. As an added
|
||
bonus, if we do the RTS just right, then (BPOINT),Y will point to
|
||
the ending-column (also note the INC BPOINT+1 in the fill code, for
|
||
when column 32 is crossed).
|
||
|
||
There are two parts to doing the filling. The above fill routine
|
||
fills 8 bits at a time. But the left and right endpoints need to
|
||
be handled specially. The old routine had to recalculate those
|
||
endpoints; this method gets it automatically, and so saves some
|
||
extra cycles in overhead.
|
||
|
||
In the very worst case, the old routine takes 200 cycles to fill
|
||
40 columns; the new routine takes 325 cycles. The main routine
|
||
below takes around 224 cycles per loop iteration. Ignoring the
|
||
savings in overhead (a few tens of cycles), this means that in
|
||
the absolute worst case the old method will be around 20% faster.
|
||
For typical polygons, they become comparable (5% or so). With all
|
||
the advantages the new routine offers, this is quite acceptable!
|
||
|
||
A few extra tables follow:
|
||
|
||
LOBROW DFB 00,64,128,192,00,64,128,192
|
||
(and so on)...
|
||
|
||
HIBROW
|
||
DFB 00,01,02,03,05,06,07,08
|
||
etc.
|
||
|
||
COLTAB ;coltab(x)=x*8
|
||
DFB 0,8,16,24,32,40,48,56,64,72,80
|
||
etc.
|
||
|
||
LOBROW and HIBROW are used to calculate the address of a row in
|
||
the bitmap, by adding to the bitmap base address. COLTAB then gives
|
||
the offest for a given column. Thus the address of any row,column
|
||
is quickly calculated with these tables.
|
||
|
||
Finally, the next two tables are used to calculate the entry and exit
|
||
points into the fill routine.
|
||
|
||
*
|
||
* Table of entry point offsets into fill routine.
|
||
* The idea is to enter on an LDY
|
||
*
|
||
FILLENT
|
||
DFB 0,4,8,12,16,20,24,28,32,36,40 ;0-10
|
||
DFB 44,48,52,56,60,64,68,72,76,80 ;11-20
|
||
DFB 84,88,92,96,100,104,108,112 ;21-28
|
||
DFB 116,120,124,128 ;29-32
|
||
DFB 134 ;Skip over INC BPOINT+1
|
||
DFB 138,142,146,150,154,158 ;34-39
|
||
DFB 162 ;Remember that we use FILLENT+1
|
||
*
|
||
* Table of RTS points in fill routine.
|
||
* The idea is to rts after the NEXT LDY #xx so as to
|
||
* get the correct pointer to the rightmost column.
|
||
*
|
||
* Thus, these entries are just the above +2
|
||
*
|
||
FILLRTS
|
||
DFB 2,6,10,14,18,22,26,30,34,38,42 ;0-10
|
||
DFB 46,50,54,58,62,66,70,74,78,82 ;11-20
|
||
DFB 86,90,94,98,102,106,110,114,118,122,126
|
||
DFB 132 ;Skip over INC
|
||
DFB 136,140,144,148,152,156,160 ;33-39
|
||
|
||
|
||
All that's left now is the main code routines. Before drawing the
|
||
polygon there is some setup stuff that needs to be done. The
|
||
lowest point needs to be found, and the hidden face check needs to
|
||
be done. The hidden face check is simply: if we can't draw this
|
||
polygon, then it is hidden!
|
||
|
||
As was said before, the list of point indices moves around the
|
||
polygon counter-clockwise when the polygon is visible. When the
|
||
polygon gets turned around, these points will go around the
|
||
polygon clockwise. What does this mean computationally? It
|
||
means that the right side of the polygon will be to the LEFT of
|
||
the left side, on the screen. A simple example:
|
||
____
|
||
\ /
|
||
L \/ R
|
||
|
||
When the above polygon faces the other way (into the monitor), R will
|
||
be to the left of L. The polygon routine always computes R as the
|
||
"right side" and L as the "left side" of the polygon; if, after taking
|
||
a few steps, the right side is to the left of the left side, then we
|
||
know that the polygon is turned around. The hidden face calculation
|
||
just computes the slopes of the left and right lines -- which we need
|
||
to do to draw the polygon anyways -- and takes a single step along
|
||
each line. All that is needed is to compare the left and right points
|
||
(or the slopes), and either draw the polygon or punt.
|
||
|
||
There are two sides to be dealt with: the left and right. Each of
|
||
those sides can have either a positive or negative slope, therefore
|
||
a total of four routines are needed. A second set of four routines
|
||
is used for parts of the polygon that have y-coordinates larger than
|
||
200 or less than zero, i.e. off of the visible screen; these routines
|
||
calculate the left and right sides as normal, but skip the plotting
|
||
calculations.
|
||
|
||
All four routines are similar. The differences in slope change
|
||
some operations from addition to subtraction, and as was explained
|
||
earlier affects whether the x=x+dx update comes before or after
|
||
the plotting. It also affects the very last point plotted, similar
|
||
to the way the x=x+dx update is affected. The truly motivated can
|
||
figure out these differences, so I'll just go through one routine
|
||
in detail:
|
||
|
||
* Left positive, right negative
|
||
*
|
||
* End: left and right advance normally
|
||
|
||
DYL3 >>> DYLINEL,UL3
|
||
LEFTPM >>> LINELP,LEFTMM
|
||
JMP LCOLPM
|
||
|
||
LOOP3 DEC YLEFT
|
||
BEQ DYL3
|
||
UL3 >>> PUPDATE,LEFTXLO;LEFTXHI;LEFTREM;LDY;LDXINT;LDXREM
|
||
LCOLPM >>> LCOLUMN
|
||
|
||
DEC YRIGHT
|
||
BNE UR3
|
||
>>> DYLINER,UR3
|
||
RIGHTPM >>> LINERM,RIGHTPP
|
||
JMP RCOLPM
|
||
UR3 >>> MUPDATE,RIGHTXLO;RIGHTXHI;RIGHTREM;RDY;RDXINT;RDXREM
|
||
RCOLPM >>> RCOLUMN
|
||
>>> UPYRS,LOOP3
|
||
|
||
That's the whole routine, for a polygon which ends in a little
|
||
peak, like /\ i.e. left line positive slope and right line negative.
|
||
Let's read the main loop:
|
||
|
||
LOOP3 Decrease the number of remaining points to draw in the left line;
|
||
if the line is finished, then calculate the next line.
|
||
PUPDATE: The "P" is for "Plus": compute x=x+dx for the left side.
|
||
LCOLUMN: Calculate the left endpoint on the screen, and set
|
||
up the screen pointer and the fill entry point.
|
||
Decrease the number of remaining points in the right line;
|
||
if finished, calculate the next line segment (note BNE).
|
||
MUPDATE: "M"inus update, since slope is negative: calculate x=x-dx
|
||
RCOLUMN: Calculate the right endpoint and fill column,
|
||
do the fill, and plot the left/right endpoints.
|
||
UPYRS: Update the bitmap pointer, and go back to LOOP3
|
||
|
||
So it's just what's been said all along: calculate the left and right
|
||
endpoints, and fill. When a line is all done, the next side of
|
||
the polygon is computed. DYL3 performs the computations for the
|
||
left line segment. DYLINEL computes the value of DY. Note that
|
||
DY always has the same sign, since the polygon is always drawn from
|
||
the bottom-up. LINELP computes DX and DX/DY, the inverse slope.
|
||
If DX is negative it will jump to LEFTMM. Since we are in the
|
||
"plus-minus" routine, i.e. left side = plus slope, right side = minus
|
||
slope, then it needs to jump to the "minus-minus" routine if DX
|
||
has the wrong sign. LEFTMM stands for "left side, minus-minus",
|
||
in just the same way that LEFTPM in the above code means "left side,
|
||
plus-minus". The right line segment is similar.
|
||
Note what happens next: the routine JMPs to LCOLPM. It skips
|
||
over the PUPDATE at UL3. Similarly, the right line calculation JMPs
|
||
over UR3 into RCOLPM. What this does is delay the calculation of
|
||
x=x+dx or x=x-dx until *after* the plotting is done, to fill between
|
||
the correct left and right endpoints. See the discussion of polygon
|
||
fills in Section 3 for a more detailed explanation of what is going
|
||
on here.
|
||
Now let's go through the actual macros which make up
|
||
the above routine. First, the routines to calculate the slope of
|
||
the left line segment:
|
||
|
||
*
|
||
* Part 1: Compute dy
|
||
*
|
||
* Since the very last point must be plotted in a special
|
||
* way (else not be plotted at all), ]1=address of routine
|
||
* to handle last point.
|
||
*
|
||
|
||
DYLINEL MAC ;Line with left points
|
||
BEGIN LDX LINDEX
|
||
L1 DEC REMPTS ;Remaining lines to plot
|
||
BPL L2 ;Zero is still plotted
|
||
INC YLEFT
|
||
LDA REMPTS
|
||
CMP #$FF ;Last point
|
||
BCS ]1
|
||
EXIT RTS
|
||
|
||
The above code first checks to see if any points are remaining.
|
||
If this is the last point, we still need to fill in the very last
|
||
points. If the right-hand-side routine has already dealt with
|
||
the last endpoint, REMPTS will equal $FF and the routine will really
|
||
exit; in this case, the last part has already been plotted.
|
||
|
||
L3 LDX NUMPTS
|
||
L2 LDY PQ,X
|
||
STY TEMP1
|
||
LDA (PLISTYLO),Y ;Remember, y -decreases-
|
||
DEX
|
||
BMI L3 ;Loop around if needed
|
||
LDY PQ,X
|
||
STY TEMP2
|
||
SEC
|
||
SBC (PLISTYLO),Y
|
||
BEQ L1 ;If DY=0 then skip to next point
|
||
|
||
Next, it reads points out of the point queue (PQ) -- the list of point
|
||
indices going around the polygon. X contains LINDEX, the index
|
||
of the current point on the left-hand side. We then decrease this
|
||
index to get the next coordinate on the left-hand side; this index
|
||
is increased for right-hand side coordinates. That is, left lines
|
||
go clockwise through the point list, and right lines go counter-
|
||
clockwise. If DY=0, then we can just skip to the next point in
|
||
the point list -- the fill routine is very good at drawing horizontal
|
||
lines.
|
||
Sharp-eyed readers may have noticed that only the low-bytes
|
||
of the y-coordinates are used. We know ahead of time that DY is
|
||
limited to an eight-bit number, and since we are always moving up
|
||
the polygon we know that DY always has the same sign. In principle,
|
||
then, we don't need the high bytes at all. In practice, though,
|
||
the points can get screwed up (like if the polygon is very very
|
||
tiny, and one of the y-coordinates gets rounded down). The JSR FLOWCHK
|
||
below checks the high byte.
|
||
|
||
STA LDY
|
||
STA YLEFT
|
||
STA DIVY
|
||
STX LINDEX
|
||
|
||
JSR FLOWCHK ;Just in case, check for dy < 0
|
||
BMI EXIT ;(sometimes points get screwed up)
|
||
<<<
|
||
|
||
The code for FLOWCHK is simply
|
||
|
||
FLOWCHK
|
||
LDY TEMP1
|
||
LDA (PLISTYHI),Y
|
||
LDY TEMP2
|
||
SBC (PLISTYHI),Y
|
||
RTS
|
||
|
||
Why put such a tiny thing in a subroutine? Because I was out of
|
||
memory. It was very annoying. Remember the rule though: a few
|
||
occasional wasted cycles are a drop in the bucket. And, of course,
|
||
this is why God invented version numbers. (It can be made more
|
||
efficient, anyways -- as you might have guessed, this was kludged
|
||
in at the last moment, long after the routine had been written).
|
||
|
||
Okee dokee, the next part of computing the line slope is to compute DX.
|
||
For this routine, the left line has positive slope.
|
||
|
||
*
|
||
* Part 2: Compute dx. If dx has negative the expected
|
||
* sign then jump to the complementary routine.
|
||
*
|
||
* dx>0 means forwards point is to the right of the
|
||
* current point, and vice-versa.
|
||
*
|
||
|
||
LINELP MAC ;Left line, dx>0
|
||
LDY TEMP2
|
||
LDA (PLISTXLO),Y ;Next point
|
||
LDY TEMP1
|
||
SEC ;Carry can be clear if jumped to
|
||
SBC (PLISTXLO),Y ;Current point
|
||
STA DIVXLO
|
||
LDY TEMP2
|
||
LDA (PLISTXHI),Y
|
||
LDY TEMP1
|
||
SBC (PLISTXHI),Y
|
||
BPL CONT1 ;If dx<0 then jump out
|
||
JMP ]1 ;Entry address
|
||
|
||
CONT1 STA DIVXHI
|
||
LDA (PLISTXLO),Y ;Current point
|
||
STA LEFTXLO
|
||
LDA (PLISTXHI),Y
|
||
STA LEFTXHI
|
||
|
||
JSR DIVXY ;Returns int,rem = .X,.A
|
||
JSR LINELP2
|
||
DONE2 <<< ;Now .X=low byte, .A=high byte
|
||
;of current point
|
||
|
||
Pretty short. It first computes DX, and jumps to the complementary
|
||
routine if DX has the wrong sign. Recall that DX can be 9-bits,
|
||
so both the low and high bytes of PLISTX are used. The current
|
||
line coordinate is stored in LEFTXLO/LEFTXHI, and DIVXY thencomputes
|
||
DX/DY, both the integer part and remainder part in fixed 16-bit format.
|
||
That is, the number returned is xxxxxxxx.xxxxxxxx i.e. 8 bit integer,
|
||
8 bit remainder (256*rem/DY).
|
||
A few brief words should be said about DIVXY. It uses a
|
||
table of logs to compute an estimate for DX/DY. It then multiplies
|
||
that estimate by DY, and fixes up the estimate if it is too large
|
||
or too small. At this point, it has the integer part N and the
|
||
remainder R, i.e. DX/DY = N + R/DY. To compute R/DY it just uses
|
||
the log tables again but with a different exponential table,
|
||
since LOG(R) - LOG(DY) will always be *negative*. It then uses
|
||
this estimate as the actual remainder -- it doesn't try to correct
|
||
with a multiplication. By my calculations this can indeed cause a
|
||
tiny little error, but only for very very special cases (extremely
|
||
large lines/polygons), and it is something that is difficult to even
|
||
notice. I am always happy to trade tiny errors for extra cycles.
|
||
After DIVXY is all done the subroutine LINELP2 is used --
|
||
again, it was moved out to conserve memory. Recall that the first
|
||
step needs to be of size dx/2. LINELP2 and siblings perform this
|
||
computation:
|
||
|
||
LINELP2
|
||
STX LDXINT
|
||
|
||
CMP #$FF ;if dy=1
|
||
BNE :NOT1 ;then A=dx/2
|
||
LDA #00
|
||
ROR ;Get carry bit from dx/2
|
||
STA LDXREM
|
||
LDA #$80
|
||
STA LEFTREM
|
||
LDX LEFTXLO
|
||
LDA LEFTXHI
|
||
BCC :RTS ;And start from current point
|
||
|
||
DIVXY sets .A to #$FF if DY=1 -- although you'd expect that $FF is
|
||
a perfectly valid remainder, it turns out that it doesn't happen
|
||
because of the way DIVXY computes remainders. If DY=1, then DX/DY is
|
||
exactly DX. More importantly, this is the only case in which DX/DY
|
||
might be 9-bits; any other DY will return an 8-bit value of DX/DY.
|
||
For this reason it is handled specially.
|
||
|
||
:NOT1 STA LDXREM
|
||
LDA #$80
|
||
STA LEFTREM ;Initialize remainder to 1/2
|
||
TXA ;x=x-dxdy/2
|
||
LSR
|
||
STA TEMP2
|
||
LDA LEFTXLO
|
||
TAX ;.X = current point
|
||
SEC
|
||
SBC TEMP2
|
||
STA LEFTXLO
|
||
LDA LEFTXHI
|
||
BCS :DONE
|
||
DEC LEFTXHI
|
||
TAY ;Set/clear Z flag
|
||
:DONE CLC
|
||
:RTS RTS
|
||
|
||
The above probably looks a little strange. The slope is positive,
|
||
so we want to calculate x=x+dx/2, but the above calculates x=x-dx/2.
|
||
The reason is that the main loop will add dx to it, so that the
|
||
next iteration will take it out to +dx/2. Why do it on the next
|
||
iteration? For the same reason the x=x+dx update is delayed until
|
||
after the plotting: for left lines with positive slope, we always
|
||
need to start at the left endpoint of each horizontal line segment;
|
||
in this case, that left endpoint is exactly the starting point.
|
||
Notice that before computing x-dx/2 the above routine puts
|
||
LEFTXLO/LEFTXI -- the starting point -- in .X/.A. The plot/fill
|
||
calculations are done using the point in .X/.A and so the plot
|
||
will be done from the starting point.
|
||
|
||
The above are the routines to calculate the slope of the line.
|
||
They then jump over the next part: the part of the main loop
|
||
which calculates x=x+dx.
|
||
|
||
*
|
||
* Next, the parts which update the X coordinates
|
||
* for positive and negative dx (in real space, of
|
||
* course -- the stored value of dx is always positive)
|
||
*
|
||
* Parameters passed in are:
|
||
* ]1 = lo byte x coord
|
||
* ]2 = high byte x coord
|
||
* ]3 = remainder
|
||
* ]4 = dy
|
||
* ]5 = dx/dy, integer
|
||
* ]6 = dx/dy, remainder
|
||
*
|
||
|
||
PUPDATE MAC ;dx>0
|
||
LDA ]3
|
||
CLC
|
||
ADC ]6
|
||
STA ]3
|
||
LDA ]1 ;x=x+dx/dy
|
||
ADC ]5
|
||
TAX
|
||
STA ]1
|
||
BCC CONT
|
||
INC ]2 ;High byte, x
|
||
CLC
|
||
CONT LDA ]2
|
||
<<< ;Carry is CLEAR on exit
|
||
;.X = lo byte xcoord
|
||
;.A = hi byte
|
||
|
||
As you can see, it is a very simple 16-bit addition. As it says,
|
||
it leaves carry clear and .X/.A = lo/hi for the column calculation:
|
||
|
||
*
|
||
* Compute the columns and plot the endpoints
|
||
*
|
||
* .X contains the low byte of the x-coord
|
||
* .A was JUST loaded with the high byte
|
||
*
|
||
* Carry is assumed to be CLEAR
|
||
*
|
||
|
||
LCOLUMN MAC
|
||
* LDA LEFTXHI
|
||
BEQ CONT
|
||
CMP #2
|
||
BCC CONT1
|
||
ASL
|
||
BCS NEG
|
||
|
||
POS LDA #$FF ;Column flag
|
||
STA LCOL
|
||
BNE DONE
|
||
|
||
NEG LDA #00 ;If negative, column=0
|
||
STA LCOL
|
||
STA LBITS ;(will be EOR #$FF'ed)
|
||
LDA #4 ;Start filling at column 1
|
||
STA FILL+1
|
||
BNE DONE
|
||
|
||
The routine first checks if the coordinate lies in one of the
|
||
40 columns on the screen, by checking the high byte. If the high byte
|
||
is larger than one then the coordinate is way off the right side of the
|
||
screen. If the left coordinate is past the right edge of the screen then
|
||
there is nothing on-screen to fill, so LCOL is set to $FF as a flag. If
|
||
the coordinate is negative then LCOL, the left column number, is set to 0,
|
||
which will flag the plotting routine later on. LBITS will be explained
|
||
later, and FILL is the entry point for the fill routine, consisting
|
||
of a JMP xxxx instruction. Setting FILL+1 to 4 starts the fill routine
|
||
at column one. Why not start at column zero, and let the fill routine
|
||
take care of it? Because that requires a flag, and hence a check of
|
||
that flag in the plotting routine. That flag check means a few extra
|
||
cycles on each loop iteration, and actually screws up the plotting
|
||
logic. For the relatively rare event of negative columns, this is
|
||
totally not worth it.
|
||
There is still another branch above -- if the high byte is
|
||
equal to one. If it is, then we need to check if the x-coordinate
|
||
is less than 320. If it is, then the bitmap pointer and column
|
||
entry point need to be set up correctly.
|
||
|
||
CONT1 CPX #64 ;Check X<320
|
||
BCS POS
|
||
LDA #32 ;Add 32 columns!
|
||
INC BPOINT+1 ;and advance pointer
|
||
CONT ADC XCOLUMN,X
|
||
STA LCOL
|
||
TAY
|
||
|
||
LDA LBITP,X
|
||
STA LBITS ;For later use in RCOLUMN
|
||
|
||
LDA FILLENT+1,Y ;Get fill entry address
|
||
STA FILL+1
|
||
DONE <<<
|
||
|
||
By using the ADC XCOLUMN,X instead of the simple LDA XCOLUMN,X we can
|
||
just add the extra 32 columns in very easily. Remember that the
|
||
routine is entered with C clear, and A=0 if X<256. This gives the
|
||
column number, 0-39 (the XCOLUMN table is way up above). A table
|
||
lookup is cheaper than dividing by eight. LBITP is a table which
|
||
gives the bit patterns for use in plotting the endpoints -- the part
|
||
of the line not drawn by the fill routine. Plotting this endpoint
|
||
is put off until later, because, among other things, the left and
|
||
right endpoints might share a column (at the tips of the polygon, or
|
||
for a very skinny polygon); plotting now would hose other things
|
||
on the screen. Finally, FILLENT is used to set the correct entry
|
||
point into the fill routine, FILL.
|
||
|
||
The right side of the polygon uses a similar set of routines
|
||
to calculate slopes and do the line updating. The column calculation
|
||
is similar at first, but does the actual plotting and filling:
|
||
|
||
*
|
||
* Note that RCOLUMN does the actual filling
|
||
*
|
||
* On entry, .X = lo byte x-coord, and .A was JUST loaded
|
||
* with the high byte.
|
||
*
|
||
* Carry must be CLEAR.
|
||
*
|
||
|
||
RCOLUMN MAC
|
||
* LDA RIGHTXHI
|
||
BEQ CONT
|
||
CMP #2
|
||
BCC CONT1
|
||
ASL
|
||
BCS JDONE ;Skip plot+fill if x<0
|
||
|
||
POS LDX LCOL
|
||
BMI JDONE ;Skip if lcol>40!
|
||
LDY YCOUNT ;Plot the left edge
|
||
LDA (FILLPAT),Y
|
||
STA TEMP1
|
||
LDY COLTAB,X ;Get column index
|
||
EOR (BPOINT),Y ;Merge into the bitmap
|
||
AND LBITS ;bits EOR #$FF
|
||
EOR TEMP1 ;voila!
|
||
STA (BPOINT),Y
|
||
LDA TEMP1
|
||
JSR FILL ;Just fill to last column
|
||
JDONE JMP DONE
|
||
|
||
If the right column is negative, then the polygon is totally off the
|
||
left side of the screen. If the right coordinate is off the right side
|
||
of the screen, and the left column is still on the screen, then we
|
||
just fill all the way to the edge. The left endpoints are then merged
|
||
into the bitmap -- merging will be discussed, below.
|
||
|
||
UNDER LDY LCOL ;It is possible that, due to
|
||
BMI DONE ;rounding, etc. the left column
|
||
LDX #00 ;got ahead of the right column
|
||
;x=0 will force load of $FF
|
||
SAME LDA RBITP,X ;If zero, they share column
|
||
EOR LBITS
|
||
STA LBITS
|
||
LDY YCOUNT
|
||
LDA (FILLPAT),Y
|
||
STA TEMP1
|
||
LDX LCOL
|
||
LDY COLTAB,X
|
||
EOR (BPOINT),Y ;Merge
|
||
AND LBITS
|
||
EOR TEMP1
|
||
STA (BPOINT),Y ;and plot
|
||
JMP DONE
|
||
|
||
CONT1 CPX #64 ;Check for X>319
|
||
BCS POS
|
||
LDA #32 ;Add 32 columns!
|
||
CONT ADC XCOLUMN,X
|
||
CMP LCOL ;Make sure we want to fill
|
||
BCC UNDER
|
||
BEQ SAME
|
||
|
||
After the right column is computed it is compared to the left column.
|
||
If they share a column (or if things got screwed up and the right column
|
||
is to the left of the left column), then the left and right sides
|
||
need to be combined together and merged into the bitmap. Again,
|
||
merging is described below. Otherwise, it progresses normally:
|
||
|
||
LDY RBITP,X
|
||
STY TEMP1
|
||
LDX LCOL
|
||
STA LCOL ;Right column
|
||
|
||
LDY YCOUNT ;Plot the left edge
|
||
LDA (FILLPAT),Y
|
||
STA TEMP2
|
||
LDY COLTAB,X ;Get column index
|
||
EOR (BPOINT),Y ;Merge into bitmap
|
||
AND LBITS
|
||
EOR TEMP2
|
||
STA (BPOINT),Y
|
||
|
||
it uses LCOL as temporary storage for the right column, sets the fill
|
||
pattern, and then plots the left edge. Recall that LCOLUMN set up
|
||
the bitmap pointer, BPOINT, if the x-coord was larger than 255.
|
||
COLTAB gives the offset of a column within the bitmap, i.e. column*8.
|
||
The pattern is loaded and the left endpoint is then merged into the
|
||
bitmap.
|
||
What, exactly, is a 'merge'? Consider a line with LEFTX=5,
|
||
i.e. that starts in the middle of a column. The left line calculation
|
||
assumes that the left edge will be 00000111, and the fill takes care
|
||
of everything to the right of that column. We can't just STA into
|
||
the bitmap, because that would be bad news for overlapping or adjacent
|
||
polygons. We also don't want to ORA -- remember that this is a pattern
|
||
fill, and if that pattern has holes in it then this polygon will
|
||
combine with whatever is behind it, and can again lead to poor results.
|
||
What we really want to do is to have the pattern stored to the screen,
|
||
without hosing nearby parts of the screen. A very crude and cumbersome
|
||
way of doing this would be to do something like
|
||
|
||
LDA BITP ;00000111
|
||
EOR #$FF
|
||
AND (BITMAP) ;mask off the screen
|
||
STA blah
|
||
LDA BITP
|
||
AND pattern
|
||
ORA blah
|
||
STA (BITMAP)
|
||
|
||
That might be OK for some lame-o PC coder, but we're a little smarter
|
||
than that I think. Surely this can be made more efficient? Of course
|
||
it can, and stop calling me Shirley. It just requires a little bit
|
||
of thinking, and the result is pretty nifty.
|
||
First we need to specify what the problem is. There are
|
||
three elements: the BITP endpoint bits, the FILLPAT fill pattern bits,
|
||
and the BPOINT bitmap screen bits. What we need is a function which
|
||
does
|
||
|
||
bitmask pattern bitmap output
|
||
------- ------- ------ ------
|
||
0 y x x
|
||
1 y x y
|
||
|
||
In the above, 'y' represents a bit in the fill pattern and 'x' represents
|
||
a bit on the screen. They might have values 0 or 1. If the bitmask
|
||
(i.e. 00000111) has a 0 in it, then the bitmap bit should pass through.
|
||
If the bitmask has a 1, then the *pattern* bit should pass through.
|
||
A function which does this is
|
||
|
||
(pattern EOR bitmap) AND (NOT bitmask) EOR pattern
|
||
|
||
This first tells us that the bitmask should be 11111000 instead of
|
||
00000111. This method only involves one bitmap access, and doesn't
|
||
involve any temporary variables. In short, it merges one bit pattern
|
||
into another, using a mask to tell which bits to change. So let's
|
||
look at that bitmap merge code again:
|
||
|
||
LDY YCOUNT
|
||
LDA (FILLPAT),Y
|
||
STA TEMP2
|
||
LDY COLTAB,X ;Get column index
|
||
EOR (BPOINT),Y ;Merge into bitmap
|
||
AND LBITS
|
||
EOR TEMP2
|
||
STA (BPOINT),Y
|
||
|
||
And there it is. Pattern EOR bitmap AND not bitmask EOR pattern.
|
||
Note that because the pattern is loaded first, .Y can be reused as
|
||
an index into the bitmap, and X can stay right where it is (to be
|
||
again used shortly). With these considerations, you can see that
|
||
trying to mask off the bitmap would be very cumbersome indeed!
|
||
Note that if the left and right endpoints share a column,
|
||
their two bitmasks need to be combined before performing the merge.
|
||
This just means EORing the masks together -- remember that they
|
||
are inverted, so AND won't work, and EOR is used instead of ORA
|
||
in case bits overlap (try a few examples to get a feel for this).
|
||
Okay, let's remember where we are: the columns have all been
|
||
computed, the fill entry points are computed, and the left endpoint
|
||
has been plotted. We still have to do the fill and plot the right
|
||
endpoint. LCOLUMN set the fill entry point, but RCOLUMN needs to
|
||
set the exit point, by sticking an RTS at the right spot.
|
||
|
||
LDY LCOL ;right column
|
||
LDX FILLRTS,Y ;fill terminator
|
||
LDA #$60 ;RTS
|
||
STA FILLMAIN,X
|
||
LDA TEMP2 ;Fill pattern
|
||
JSR FILL
|
||
EOR (BPOINT),Y
|
||
AND TEMP1 ;Right bits
|
||
EOR TEMP2 ;Merge
|
||
STA (BPOINT),Y
|
||
LDA #$91 ;STA (),Y
|
||
STA FILLMAIN,X ;Fill leaves .X unchanged
|
||
;And leaves .Y with correct offset
|
||
DONE <<<
|
||
|
||
The fill routine doesn't change X, leaves (BPOINT),Y pointing to the
|
||
last column, and leaves the fill pattern in A. The right side of the
|
||
line can then be immediately plotted after filling, and the STA (),Y
|
||
instruction can be immediately restored in the fill routine.
|
||
And that's the whole routine!
|
||
After plotting, the loop just takes a step up the screen and
|
||
loops around. Since BPOINT, the bitmap pointer, may have been changed
|
||
(by LCOLUMN, or by the fill routine), the high byte needs to be restored.
|
||
|
||
*
|
||
* Finally, we need to decrease the Y coordinate and
|
||
* update the pointer if necessary.
|
||
*
|
||
* Also, since BPOINT may have been altered, we need
|
||
* to restore the high byte.
|
||
*
|
||
* ]1 = loop address
|
||
*
|
||
UPYRS MAC
|
||
LDA ROWHIGH
|
||
STA BPOINT+1
|
||
DEC BPOINT
|
||
DEC YCOUNT
|
||
BMI FIXIT
|
||
JMP ]1
|
||
|
||
FIXIT LDA #7
|
||
STA YCOUNT
|
||
LDY YROW
|
||
BEQ EXIT ;If y<0 then exit
|
||
DEY
|
||
STY YROW
|
||
ORA LOBROW,Y
|
||
STA BPOINT
|
||
LDA HIBROW,Y
|
||
CLC
|
||
ADC BITMAP
|
||
STA BPOINT+1
|
||
STA ROWHIGH
|
||
JMP ]1
|
||
EXIT RTS
|
||
<<<
|
||
|
||
Of course, the 64 bitmap is made up of 'rows of eight', i.e. after
|
||
every eight rows we have to reset the bitmap pointer. If we move
|
||
past the top of the screen (i.e. Y<0) then there is no more polygon
|
||
left to draw. Otherwise the new bitmap location is computed, by
|
||
adding the offset to BITMAP, the variable which lets the routine
|
||
draw to any bitmap (BITMAP contains the location in memory of the
|
||
bitmap). Note also that the high byte is stored to ROWHIGH --
|
||
ROWHIGH is what the routine immediately above uses to restore the
|
||
high byte of BPOINT!
|
||
And that, friends, is the essence of the polygon routine,
|
||
and concludes the disassembly of the 3d library routines.
|
||
|
||
|
||
---------
|
||
Section 4: Cool World
|
||
---------
|
||
|
||
Now that we have a nice set of routines for doing 3D graphics,
|
||
it's time to write a program which actually uses those routines to
|
||
construct a 3D world. That program is Cool World. Not only does it
|
||
demonstrate the 3d library, but it makes sure that all of the library
|
||
routines actually work!
|
||
Although the algorithms for constructing a world were discussed
|
||
way back in Section 2, it's probably a good idea to very quickly
|
||
summarize those algorithms, just as a reminder. After that, this
|
||
section will go through the major portions of the Cool World code,
|
||
and explain how (and why) things work.
|
||
|
||
In the 3d world, each object has certain properties. It has
|
||
a position in the world. It also has an orientation; it points in
|
||
some direction. It might also have some velocity, and other stuff,
|
||
but all we'll worry about in Cool World is position and orientation.
|
||
Each object is defined by a series of points which are defined
|
||
relative to the object center (the object's position). The orientation
|
||
tells how to rotate those points about the center -- to get the object
|
||
rotating in the right direction, we just apply a rotation matrix to
|
||
the object.
|
||
In Cool World, we will be the only thing moving around in
|
||
the world -- all the other objects are fixed in position, although
|
||
some of them are rotating. To figure out how objects look from our
|
||
perspective, we first figure out if the object is visible by using
|
||
the equation R'(Q-P). P is our position, and is subtracted from
|
||
the object position Q to get its relative position. R' then rotates
|
||
the world around us, to get it pointing in the direction of our
|
||
orientation vector. If the object is visible, then we compute the
|
||
full object by using the equation
|
||
|
||
R' (M X + Q-P).
|
||
|
||
X represents the points of an object. M is the rotation matrix to
|
||
get the object pointing in the right direction. The rotated points
|
||
are then added to the relative center (Q-P) and rotated according to
|
||
our orientation. The points are then projected. The job of
|
||
Cool World and the 3D library is to implement that tiny little
|
||
equation up there.
|
||
Once all the visible objects are rotated and projected,
|
||
they need to be depth-sorted to figure out what order to draw them
|
||
in. Each object is then drawn, in order, and around and around it
|
||
goes. With this in mind:
|
||
|
||
*
|
||
* Cool World (Polygonamy II: Monogonamy)
|
||
*
|
||
* Just yer basic realtime 3d virtual world program for
|
||
* the C64. This program is intended to demonstrate the
|
||
* 3d library.
|
||
*
|
||
* SLJ 8/15/97
|
||
*
|
||
|
||
The code starts with a bunch of boring setup stuff, but soon gets
|
||
to the
|
||
|
||
* Main loop
|
||
|
||
MAINLOOP
|
||
|
||
which resets a few things, swaps the buffers, and clears the new draw
|
||
buffer. The screen clearing is done in a very brain-dead way; as
|
||
you may recall, polygonamy only cleared the parts of the screen that
|
||
needed clearing. I decided this time around that this was more trouble
|
||
than it was worth, especially with the starfields hanging around.
|
||
Still, a smart screen clear might keep track of the highest and
|
||
lowest points drawn, and only clear that part of the screen.
|
||
Once the preliminaries are over with, the main part gets
|
||
going:
|
||
|
||
:C1 JSR READKEY
|
||
JSR UPD8POS ;Update position if necessary
|
||
|
||
JSR UPD8ROT ;Update second rotation matrix
|
||
|
||
JSR PLOTSTAR
|
||
|
||
JSR SORT ;Construct visible object lst
|
||
|
||
First the keyboard is scanned. If a movement key is pressed, then the
|
||
stars are updated; if we turn, then our orientation vector is
|
||
updated via calls to ACCROTX, ACCROTY, and ACCROTZ.
|
||
The orientation vector is very easy to keep track of.
|
||
We enter the world pointing straight down the z-axis, so initially
|
||
the orientation vector is V=(0,0,1). Now it gets a little trickier.
|
||
Let's say we turn to the left. We can think of this two ways:
|
||
our orientation vector rotates left, or the world rotates around
|
||
us to the right. The problem here is that these aren't the
|
||
same thing!
|
||
"What?!" you may say. The problem here is not the single
|
||
rotation, but the rotations that come after it. Imagine a
|
||
little x-y-z coordinate axis coming out of your shirt, with
|
||
the z-axis pointing straight ahead of you. When you turn in
|
||
some direction, that coordinate system needs to turn with you,
|
||
because rotations are always about those axis. In other words,
|
||
the world needs to rotate around those axis; the *inverse* of
|
||
those rotations gives the orientation vector.
|
||
What would happen if we instead rotated the orientation vector?
|
||
Imagine a pencil coming out of your chest -- that's the orientation
|
||
vector. Now turn it to the left, with you still looking at the
|
||
monitor. Once it's out at 45 degrees or so, let's say we started
|
||
spinning around the z-axis. Well that z-axis is still facing
|
||
towards the monitor, and the pen rotates about THAT axis, drawing
|
||
a circle around the monitor as it goes around (or, if you like,
|
||
making a cone-shape with the tip of the cone at your chest).
|
||
If, on the other hand, YOU turn left, and then start spinning,
|
||
you can see that something very different happens -- the monitor
|
||
now circles around the pencil.
|
||
The point is that we need to rotate the world around us.
|
||
Specifically, by using ACCROTX and friends we maintain a rotation
|
||
matrix which will rotate the world around us. But we still need
|
||
an orientation vector, to tell us what direction to move in when
|
||
we move forwards or backwards. That rotation matrix tells us
|
||
how to rotate the whole world so that it "points in our direction";
|
||
therefore, if we un-do all of those rotations, we can figure out
|
||
what direction we were originally pointing in. In other words,
|
||
the inverse rotation matrix, when applied to the original orientation
|
||
vector (0,0,1), gives the current orientation vector.
|
||
Recall that inverting a rotation matrix is really easy:
|
||
just take the transpose. And what happens when apply this new
|
||
rotation matrix to V=(0,0,1)? With a simple calculation, you can
|
||
see that M V just gives the third row of M, for any matrix M.
|
||
So, let's say that we've calculated the accumulated rotation
|
||
matrix R'. To invert it just take the transpose, giving R.
|
||
The orientation vector is thus RV: the third row of R. But
|
||
the third row of R is just the third *column* of R', since R'
|
||
is the transpose of R. Therefore, the orientation vector -- the
|
||
direction we are currently pointing in -- is simply the third
|
||
column of the accumulated rotation matrix. We don't have to
|
||
do any extra work at all to compute that orientation vector; it
|
||
just falls right out of the calculation we've already done!
|
||
So, to summarize: Keys are read in. Any rotations
|
||
will cause an update of the rotation matrix. Our orientation
|
||
vector is simply the third column of that matrix. (Note that
|
||
the same logic holds for other objects, if they happen to move).
|
||
If we move forwards or backwards, then the stars are updated and
|
||
a flag is set for UPD8POS.
|
||
|
||
JSR UPD8POS
|
||
-----------
|
||
Truth in advertising: UPD8POS updates positions. Specifically,
|
||
our position, since we are the only ones moving. The world equation
|
||
is R' (MX + Q-P), which we can write as R'MX + R'(Q-P). Q is the
|
||
position of an object; changing our position P just changes Q-P. In
|
||
Cool World there's no need to keep track of our position in the world;
|
||
we might as well just keep track of the relative positions of objects.
|
||
Instead of updating a position P when we move, Cool World just
|
||
changes the locations Q of all the objects.
|
||
Since objects never move, the quantity R'(Q-P) won't change as
|
||
long as WE don't move. UPD8POS calculates that quantity, and does so
|
||
only if we move or rotate -- if an object were to move, this quantity
|
||
would have to be recalculated for that object. Note that the calculation
|
||
is done using GLOBROT, the lib3d routine to rotate the 16-bit centers.
|
||
Later on, those rotated 16-bit centers will be added to R'MX, the rotated
|
||
object points, if an object is visible.
|
||
Cool World lets you travel at different speeds by pressing
|
||
the keys 0-4. Different speeds are easy to implement. Recall that
|
||
the rotation matrix is multiplied by 64. Grabbing the third column
|
||
of that matrix gives a vector of length 64 (rotations don't change
|
||
lengths, so multiplying a rotation matrix times (0,0,1) must
|
||
always return a vector of the same length). By changing the length
|
||
of that orientation vector -- for example, by multiplying or dividing
|
||
by two -- we can change the amount which we subtract from the object
|
||
positions (or add to our position, if you like), and hence the speed
|
||
at which we move through the world.
|
||
|
||
JSR UPD8ROT
|
||
-----------
|
||
Since some of the objects need to rotate, a second set of
|
||
rotation matrices is maintained. As far as the lib3d routines are
|
||
concerned there is only one matrix, in zero page. Therefore the
|
||
global rotation matrix needs to be stored in memory before the
|
||
new matrices are calculated.
|
||
The new rotation matrices are calculated using CALCMAT,
|
||
the routine which only needs the x, y, and z angles. In fact,
|
||
only one matrix is calculated using this routine. The other
|
||
matrices are just permutations of that one rotation matrix; for
|
||
example, the transpose (the inverse, remember) is a new rotation
|
||
matrix. We can also get new matrices by, say, a cyclic permutation
|
||
of the rows -- this is the same as permuting the coordinates
|
||
(i.e. x->y y->z z->x), which is the same as rotating the whole
|
||
coordinate system. As long as the determinant is one, the new
|
||
matrix will preserve areas.
|
||
Once the new matrix is calculated, it is stored in memory
|
||
for later use by the local rotation routines (ROTPROJ). To get
|
||
a different matrix, all we have to do is copy the saved matrix
|
||
to zero page in a different way. That is, we don't have to store
|
||
ALL of the new rotation matrices in memory; we can do any row swapping
|
||
and such on the fly, when copying from storage into zero page.
|
||
Note that by using CALCMAT we can use large angle increments,
|
||
to make the objects appear to spin faster. ACCROTX and friends can
|
||
only accumulate by a fixed angle amount.
|
||
|
||
JSR PLOTSTAR
|
||
------------
|
||
PLOTSTAR is another imaginatively named subroutine which,
|
||
get this, plots stars. The starfield routine is discussed elsewhere
|
||
in this issue.
|
||
|
||
JSR SORT
|
||
--------
|
||
Finally, this routine -- you won't believe this -- sorts the
|
||
objects according to depth. The sort is just a simple insertion
|
||
sort -- the object z-coordinates are traversed, and each object
|
||
is inserted into an object display list. The list is searched
|
||
from the top down, and elements are bumped up the list until the
|
||
correct spot for the new object is found. The object list is thus
|
||
always sorted. This kind of sort is super-easy to implement in
|
||
assembly, and "easy" is a magic word in my book.
|
||
SORT also does the checking for visibility. As always, the
|
||
easiest and stupidest approach is taken: if the object lies in
|
||
a 45-degree wedge in front of us, then it is visible. This wedge
|
||
is easiest because the boundaries are so simple: the lines z=x, z=-x,
|
||
z=y, and z=-y. If the object center lies within those boundaries --
|
||
if -x < z < x and -y < z < y -- then the object is considered to
|
||
lie in our field of vision. Since we don't want to view objects
|
||
which are a million miles away, a distance restriction is put on
|
||
the z-coordinate as well: if z>8192 then the object is too far away.
|
||
We also don't want to look at objects which are too close -- in
|
||
particular, we don't want some object points in front of us and
|
||
some behind us -- so we need a minimum distance restriction as well.
|
||
Since the largest objects have points 95 units away from the center,
|
||
checking for z>100 is pretty reasonable.
|
||
By the way -- that "distance" check only checks the z-coordinate.
|
||
The actual distance^2 is x^2 + y^2 + z^2. This leads to the "peripheral
|
||
vision" problem you may have noticed -- that an object may become
|
||
visible towards the edge of the screen, but when you turn towards it
|
||
it disappears.
|
||
|
||
So, once the centers have been rotated, the extra rotation
|
||
matrix calculated, and the object list constructed, all that is
|
||
left to do is to plot any visible objects:
|
||
|
||
:LDRAW LDY NUMVIS ;Draw objects from object list
|
||
BEQ :DONE
|
||
LDA OBLIST-1,Y ;Get object number
|
||
ASL
|
||
TAX
|
||
LDA VISTAB,X
|
||
STA :JSR+1
|
||
LDA VISTAB+1,X
|
||
STA :JSR+2
|
||
:JSR JSR DRAWOB1
|
||
DEC NUMVIS
|
||
BNE :LDRAW
|
||
:DONE
|
||
JMP MAINLOOP
|
||
|
||
VISTAB DA DRAWOB1
|
||
DA DRAWOB2
|
||
...
|
||
|
||
The object number from the object list is used as an index into
|
||
VISTAB, the table of objects, and each routine is called in order.
|
||
It turns out that in Cool World there are two kinds of objects:
|
||
those that rotate, and those that don't. What's the difference?
|
||
Recall the half of the world equation that we haven't yet
|
||
calculated: R'MX. Objects that don't rotate have no rotation
|
||
matrix M. And for objects that do rotate, I just skipped applying
|
||
the global rotation R' (these objects are just spinning more or
|
||
less randomly, so why bother). This becomes very apparent when
|
||
viewing the Cobra Mk III -- no matter what angle you look at it
|
||
from, it looks the same! A full implementation would of course
|
||
have to apply both R' and M. Anyways, with this in mind, let's
|
||
take a closer look at two representative objects: the center
|
||
tetrahedron, which doesn't rotate, and one of the stars, which
|
||
does.
|
||
|
||
*
|
||
* Object 1: A simple tetrahedron.
|
||
*
|
||
* (1,1,1) (1,-1,-1) (-1,1,-1) (-1,-1,1)
|
||
|
||
TETX DFB 65,65,-65,-65
|
||
TETY DFB 65,-65,65,-65
|
||
TETZ DFB 65,-65,-65,65
|
||
|
||
These are the points that define the object, relative to its center.
|
||
As you can see, they have length 65*sqrt(3) = 112. This is within
|
||
the limitation discussed earlier, that all vectors must have length
|
||
less than 128. Other objects -- for example, the cubes -- have coordinates
|
||
like (55,55,55), giving length 55*sqrt(3) = 95.
|
||
|
||
* Point list
|
||
|
||
FACE1 DFB 0,1,2,0
|
||
FACE2 DFB 3,2,1,3
|
||
FACE3 DFB 3,0,2,3
|
||
FACE4 DFB 3,1,0,3
|
||
|
||
A tetrahedron has four faces, and the above will tell POLYFILL how
|
||
to connect the dots. They go around the faces in counter-clockwise
|
||
order. Note that the first and last point are the same; this is
|
||
necessary because of the way POLYFILL works. When drawing from
|
||
the point list, POLYFILL only considers points to the left or
|
||
to the right. When it gets to one end of the list, it jumps
|
||
to the other end of the list, and so always has a point to the
|
||
left or to the right of the current spot in the point list.
|
||
(Bottom line: make sure the first point in the list is also the
|
||
last point).
|
||
|
||
TETCX DA 0 ;Center at center of screen
|
||
TETCY DA 00
|
||
TETCZ DA INITOFF ;and back a little ways.
|
||
|
||
|
||
OB1POS DFB 00 ;Position in the point list
|
||
OB1CEN DFB 00 ;Position of center
|
||
|
||
TETCX etc. are where the object starts out in the world. OB1POS is
|
||
a leftover that isn't used. OB1CEN is the important variable here.
|
||
The whole 3d library is structured around lists. All of the object
|
||
centers are stored in a single list, and are first put there by an
|
||
initialization routine. This list of object centers is then used
|
||
for everything -- for rotations, for visibility checks, etc.
|
||
OB1CEN tells where this object's center is located in the object
|
||
list.
|
||
This next part is the initialization routine, called when
|
||
the program is first run:
|
||
|
||
*
|
||
* ]1, ]2, ]3 = object cx,cy,cz
|
||
*
|
||
INITOB MAC
|
||
LDA ]1
|
||
STA C0X,X
|
||
LDA ]1+1
|
||
STA C0X+256,X
|
||
LDA ]2
|
||
STA C0Y,X
|
||
LDA ]2+1
|
||
STA C0Y+256,X
|
||
LDA ]3
|
||
STA C0Z,X
|
||
LDA ]3+1
|
||
STA C0Z+256,X
|
||
<<<
|
||
|
||
INITOB1 ;Install center into center list
|
||
LDX NUMOBJS
|
||
STX OB1CEN
|
||
>>> INITOB,TETCX;TETCY;TETCZ
|
||
INC NUMOBJS
|
||
RTS
|
||
|
||
As you can see, all it does is copy the initial starting point,
|
||
TETCX, TETCY, TETCZ, into the object list, and transfer
|
||
NUMOBJS, the number of objects in the center list, into OB1CEN.
|
||
It then increments the number of objects. In this way objects
|
||
can be inserted into the list in any order, and a more or less
|
||
arbitrary number of objects may be inserted. (Of course,
|
||
REMOVING an object from the list can be a little trickier).
|
||
Now we move on to the main routine:
|
||
|
||
*
|
||
* Rotate and draw the first object.
|
||
*
|
||
* PLISTXLO etc. already point to correct offset.
|
||
|
||
* ]1,]2,]3 = object X,Y,Z point lists
|
||
SETPOINT MAC
|
||
LDA #<]1 ;Set point pointers
|
||
STA P0X
|
||
LDA #>]1
|
||
STA P0X+1
|
||
LDA #<]2
|
||
STA P0Y
|
||
LDA #>]2
|
||
STA P0Y+1
|
||
LDA #<]3
|
||
STA P0Z
|
||
LDA #>]3
|
||
STA P0Z+1
|
||
<<<
|
||
|
||
SETPOINT simply sets up the point list pointers P0X P0Y and P0Z
|
||
for the lib3d routines. It is a macro because all objects need
|
||
to tell the routines where their actual x, y, and z point coordinates
|
||
are located, so this little chunk of code is used quite a lot.
|
||
The routine called by the main loop now begins:
|
||
|
||
DRAWOB1
|
||
LDX OB1CEN ;Center index
|
||
ROTTET ;Alternative entry point
|
||
|
||
>>> SETPOINT,TETX;TETY;TETZ
|
||
|
||
LDY #4 ;Four points to rotate
|
||
SEC ;rotate and project
|
||
JSR ROTPROJ
|
||
|
||
The second entry point, ROTTET, is used by other objects which
|
||
are tetrahedrons but not object #1 (in Cool World there is a line
|
||
of tetrahedrons, behind our starting position). This of course
|
||
means there won't be a lot of duplicated code.
|
||
Next the points need to be rotated and projected. SETPOINT
|
||
sets up the list pointers for ROTPROJ, .Y is set to the number
|
||
of object points in those lists, and C is set to tell ROTPROJ to
|
||
rotate and project. If we were just rotating (C=clear, i.e. to
|
||
calculate MX), we would have to set up some extra pointers to tell
|
||
ROTPROJ where to store the rotated points; those rotated points can
|
||
then be rotated and projected, as above. Note that the rotation
|
||
matrix is assumed to already be set up in zero page. Once the points
|
||
have been rotated and projected, all that remains is to...
|
||
|
||
* Draw the object
|
||
|
||
SETFILL MAC
|
||
;]1 = number of points
|
||
;]2 = Face point list
|
||
;]3 = pattern
|
||
LDY #]1
|
||
L1 LDA ]2,Y
|
||
STA PQ,Y
|
||
DEY
|
||
BPL L1
|
||
LDA #<]3
|
||
STA FILLPAT
|
||
LDA #>]3
|
||
STA FILLPAT+1
|
||
LDX #]1
|
||
<<<
|
||
|
||
Again we have another chunk of code which is used over and over and
|
||
over. SETFILL just sets stuff up for POLYFILL, the polygon drawing
|
||
routine. It first copies points into the point queue; specifically,
|
||
it copies from lists like FACE1, above, into PQ, the point queue
|
||
used by POLYFILL. It then sets FILLPAT, the fill-pattern pointer, to
|
||
the 8 bytes of data describing the fill pattern. Finally it loads
|
||
.X with the number of points in the point queue, for use by
|
||
POLYFILL. All that remains is to call POLYFILL -- all the other
|
||
pointers and such are set up, and POLYFILL does the hidden face
|
||
check for us.
|
||
|
||
DRAWTET ;Yet another entry point
|
||
|
||
>>> SETFILL,3;FACE1;DITHER1
|
||
JSR POLYFILL
|
||
|
||
>>> SETFILL,3;FACE2;ZIGZAG
|
||
JSR POLYFILL
|
||
|
||
>>> SETFILL,3;FACE3;CROSSSM
|
||
JSR POLYFILL
|
||
|
||
>>> SETFILL,3;FACE4;SOLID
|
||
JMP POLYFILL
|
||
|
||
And that's the whole routine! DITHER1, ZIGZAG etc. are just little
|
||
tables of data that I deleted out, containing fill patterns. For
|
||
example, SOLID looks like
|
||
|
||
SOLID HEX FFFFFFFFFFFFFFFF
|
||
|
||
i.e. eight bytes of solid color.
|
||
The next object, a star, is more involved. Not only is
|
||
it rotating, but it is also a concave object, and requires clipping.
|
||
That is, parts of this object can cover up other parts of the
|
||
object. We will therefore have to draw the object in just the
|
||
right way, so that parts which are behind other parts will be
|
||
drawn first. Compare with the way in which we depth-sorted the
|
||
entire object list, to make sure far-off objects are drawn before
|
||
nearby objects. When applied to polygons, it is called polygon
|
||
clipping.
|
||
Incidentally, I *didn't* do any clipping on the Cobra Mk III.
|
||
Sometimes you will see what appear to be glitches as the ship rotates.
|
||
This is actually the lack of clipping -- polygons which should be
|
||
behind other polygons are being drawn in front of those polygons.
|
||
There are two types of stars in Cool World, but each is
|
||
constructed in a similar way. The larger stars are done by
|
||
starting with a cube, and then adding an extra point above the
|
||
center of each face. That is, imagine that you grab the center
|
||
of each face and pull outwards -- those little extrusions are
|
||
form the tines of the star. The smaller stars are similarly
|
||
constructed, starting from a tetrahedron. The large stars therefore
|
||
have six tines, and the smaller stars have four.
|
||
Clipping these objects is really pretty easy, because
|
||
each of those tines is a convex object. All we have to do is
|
||
depth-sort the *tips* of each of the tines, and then draw the
|
||
tines in the appropiate order.
|
||
Recall that in the discussion of objects, in Section 2,
|
||
it was pointed out that any concave object may be split up into
|
||
a group of convex objects. That is basically what we are doing
|
||
here.
|
||
|
||
*
|
||
* All-Stars: A bunch of the cool star things.
|
||
*
|
||
|
||
STOFF EQU 530 ;Offset unit
|
||
|
||
STCX EQU -8000 ;Center
|
||
STCY EQU 1200
|
||
STCZ EQU -400+INITOFF
|
||
|
||
STAR1CX DA STCX
|
||
STAR1CY DA STCY
|
||
STAR1CZ DA STCZ
|
||
|
||
OB7CEN DFB 00
|
||
|
||
STAR1X DFB 25,-25,25,-25,50,50,-50,-50
|
||
STAR1Y DFB -25,-25,25,25,-50,50,50,-50
|
||
STAR1Z DFB 25,-25,-25,25,-50,50,-50,50
|
||
|
||
S1T1F1 DFB 4,2,0,4 ;Star 1, Tine 1, face 1
|
||
S1T1F2 DFB 4,1,2,4
|
||
S1T1F3 DFB 4,0,1,4
|
||
|
||
S1T2F1 DFB 5,0,2,5
|
||
S1T2F2 DFB 5,3,0,5
|
||
S1T2F3 DFB 5,2,3,5
|
||
|
||
S1T3F1 DFB 6,2,1,6
|
||
S1T3F2 DFB 6,3,2,6
|
||
S1T3F3 DFB 6,1,3,6
|
||
|
||
S1T4F1 DFB 7,1,0,7
|
||
S1T4F2 DFB 7,0,3,7
|
||
S1T4F3 DFB 7,3,1,7
|
||
|
||
INITOB7
|
||
LDX NUMOBJS
|
||
STX OB7CEN
|
||
>>> INITOB,STAR1CX;STAR1CY;STAR1CZ
|
||
INC NUMOBJS
|
||
RTS
|
||
|
||
As before, the above stuff defines the position, points, and
|
||
faces of the object, and INITOB7 inserts it into the center list.
|
||
The main routine then proceeds similarly:
|
||
|
||
DRAWOB7
|
||
LDX OB7CEN ;Center index
|
||
|
||
ROTSTAR1 JSR RFETCH ;Use secondary rotation matrix
|
||
|
||
SETSTAR1 >>> SETPOINT,STAR1X;STAR1Y;STAR1Z
|
||
|
||
LDY #8 ;Eight points to rotate
|
||
SEC ;rotate and project
|
||
JSR ROTPROJ
|
||
|
||
Note the JSR RFETCH above. RFETCH retrieves one of the rotation
|
||
matrices from memory, and copies it into zero page for use by ROTPROJ.
|
||
Again, different rotation matrices may be constructed by performing
|
||
this copy in slightly different ways. The tines must then be
|
||
sorted, (and the accumulation matrix is restored to zero page),
|
||
and once sorted the tines are drawn in order. The code below
|
||
just goes through the sorted tine list, and calls :TINE1 through
|
||
:TINE4 based on the value in the list. The sorting routine will
|
||
be discussed shortly.
|
||
|
||
JSR ST1SORT ;Sort the tines
|
||
JSR IFETCH ;Restore accumulation matrix
|
||
|
||
* Draw the object. In order to handle overlaps correctly,
|
||
* the tines must first be depth-sorted, above.
|
||
DRAWST1
|
||
LDX #03
|
||
ST1LOOP STX TEMP
|
||
LDY T1LIST,X ;Sorted tine list
|
||
BEQ :TINE1
|
||
DEY
|
||
BNE :C1
|
||
JMP :TINE2
|
||
:C1 DEY
|
||
BNE :TINE4
|
||
JMP :TINE3
|
||
:TINE4
|
||
>>> SETFILL,3;S1T4F1;SOLID
|
||
JSR POLYFILL
|
||
>>> SETFILL,3;S1T4F2;DITHER1
|
||
JSR POLYFILL
|
||
>>> SETFILL,3;S1T4F3;DITHER2
|
||
JSR POLYFILL
|
||
:NEXT LDX TEMP
|
||
DEX
|
||
BPL ST1LOOP
|
||
RTS
|
||
|
||
:TINE1 ...draw tine 1 in a similar way
|
||
|
||
:TINE2 >>> SETFILL,3;S1T2F1;ZIGS
|
||
...
|
||
:TINE3 >>> SETFILL,3;S1T3F1;BRICK
|
||
...
|
||
|
||
|
||
T1LIST DS 6 ;Tine sort list
|
||
T1Z DS 6
|
||
|
||
Now we get to the sorting routine. It depth-sorts the z-coordinates,
|
||
so it actually needs the z-coordinates. Too bad we rotated and
|
||
projected, and no longer have the rotated z-coordinates! So we
|
||
have to actually calculate those guys.
|
||
Now, if you remember how rotations work, you know that
|
||
the z-coordinate is given by the third row of the rotation matrix
|
||
times the point being rotated -- in this case, those points are the
|
||
tips of the individual tines. Well, we know those points are
|
||
at (1,-1,-1), (1,1,1), (-1,1,-1), and (-1,-1,1). Actually, they
|
||
are in the same *direction* as those points/vectors, but have
|
||
a different length (i.e. STAR1X etc. defines a point like 50,-50,-50).
|
||
The lengths don't matter though -- whether a star is big or
|
||
small, the tines are still ordered in the same way. And order is
|
||
all we care about here -- which tines are behind which.
|
||
So, calculating the third row times points like (1,-1,-1)
|
||
is really easy. If that row has elements (M6 M7 M8), then the
|
||
row times the vector just gives M6 - M7 - M8. So all it takes is
|
||
an LDA and two SBC's to calculate the effective z-coordinate.
|
||
Just for convenience I added 128 to the result, to make everything
|
||
positive.
|
||
After calculating these z-coordinates, they are inserted into
|
||
a little list. That list is then sorted. Yeah, an insertion sort
|
||
would have been best here, I think. Instead, I used an ugly,
|
||
unrolled bubble-like sort.
|
||
By the way, the cubic-stars are even easier to sort, since
|
||
their tines have coordinates like (1,0,0), (0,1,0) etc. Calculating
|
||
those z-coordinates requires an LDA, and nothing else!
|
||
|
||
*
|
||
* Sort the tines. All that matters is the z-coord,
|
||
* thus we simply dot the third row of the matrix
|
||
* with the tine endpoint, add 128 for convenience,
|
||
* and figure out where stuff goes in the list.
|
||
*
|
||
ST1SORT ;Sort the tines
|
||
LDX #00
|
||
LDA MATRIX+6 ;Tine 1: 1,-1,-1
|
||
SEC
|
||
SBC MATRIX+7
|
||
SEC
|
||
SBC MATRIX+8
|
||
EOR #$80 ;Add 128
|
||
STA T1Z ;z-coord
|
||
STX T1LIST
|
||
|
||
LDA MATRIX+6 ;Tine 2: 1,1,1
|
||
CLC
|
||
ADC MATRIX+7
|
||
CLC
|
||
ADC MATRIX+8
|
||
EOR #$80
|
||
STA T1Z+1
|
||
INX
|
||
STX T1LIST+1
|
||
|
||
LDA MATRIX+7 ;Tine 3: -1,1,-1
|
||
SEC
|
||
SBC MATRIX+6
|
||
SEC
|
||
SBC MATRIX+8
|
||
EOR #$80
|
||
STA T1Z+2
|
||
INX
|
||
STX T1LIST+2
|
||
|
||
LDA MATRIX+8 ;Tine 4: -1,-1,1
|
||
SEC
|
||
SBC MATRIX+7
|
||
SEC
|
||
SBC MATRIX+6
|
||
EOR #$80
|
||
STA T1Z+3
|
||
INX
|
||
STX T1LIST+3
|
||
|
||
CMP T1Z+2 ;Now bubble-sort the list
|
||
BCS :C1 ;largest values on top
|
||
DEX ;So find the largest value!
|
||
|
||
...urolled code removed for sensitive viewers.
|
||
|
||
And that's all it takes!
|
||
And that, friends, covers the main details of Cool World.
|
||
|
||
Can it possibly be true that this article is finally coming
|
||
to an end?
|
||
|
||
|
||
---------
|
||
Section 5: 3d library routines and memory map
|
||
---------
|
||
|
||
There are seven routines:
|
||
|
||
$8A00 CALCMAT - Calculate a rotation matrix
|
||
$8A03 ACCROTX - Add a rotation to rotation matrix around x-axis
|
||
$8A06 ACCROTY - ... y-axis
|
||
$8A09 ACCROTZ - ... z-axis
|
||
$8A0C GLOBROT - 16-bit rotation for centers
|
||
$8A0F ROTPROJ - Rotate/Rotate and project objects
|
||
$8A12 POLYFILL - Draw a pattern-filled polygon
|
||
|
||
Following a discussion of the routines there is a memory map and discussion
|
||
of the various locations.
|
||
|
||
Library Routines
|
||
----------------
|
||
|
||
$8A00 CALCMAT Calculate a rotation matrix
|
||
|
||
Stuff to set up: Nothing.
|
||
|
||
On entry: .X .Y .A = theta_x theta_y theta_z
|
||
On exit: Rotation matrix is contained in $B1-$B9
|
||
Cycle count: 390 cycles, worst case.
|
||
|
||
|
||
$8A03 ACCROTX This will "accumulate" a rotation matrix by one rotation
|
||
around the x-axis by the angle delta=2*pi/128. Because
|
||
this is such a small angle the fractional parts must also
|
||
be remembered, in $4A-$52. These routines are necessary
|
||
to do full 3d rotations.
|
||
|
||
On entry: carry clear/set to indicate positive/negative rotations.
|
||
On exit: Updated matrix in $B1-$B9, $4A-$52
|
||
Cycle count: Somewhere around 150, I think.
|
||
|
||
$8A06 ACCROTY Similar around y-axis
|
||
|
||
$8A09 ACCROTZ Similar around z-axis
|
||
|
||
|
||
$8A0C GLOBROT Perform a global rotation of 16-bit signed coordinates
|
||
(Rotate centers)
|
||
|
||
Stuff to set up:
|
||
MULTLO1 MULTLO2 MULTHI1 MULTHI2 = $F7-$FE
|
||
Multiplication tables
|
||
C0XLO, C0XHI, C0YLO, C0YHI, C0ZLO, C0ZHI = $63-$6E
|
||
Pointers to points to be rotated. Note also that $63-$6E
|
||
has a habit of getting hosed by the other routines.
|
||
CXLO, CXHI, CYLO, CYHI, CZLO, CZHI = $A3-$AE
|
||
Pointers to where rotated points will be stored.
|
||
|
||
On entry: .Y = number of points to rotate (0..Y-1).
|
||
On exit: Rotated points are stored in CXLO CXHI etc.
|
||
|
||
|
||
$8A0F ROTPROJ Rotate and project object points (vertices). Points must
|
||
be in range -96..95, and must be with 128 units of the
|
||
object center. Upon rotation, they are added to
|
||
the object center, projected if C is set, and stored
|
||
in the point list.
|
||
|
||
The _length_ of a vertex vector (sqrt(x^2 + y^2 + z^2)) must
|
||
be less than 128.
|
||
|
||
Stuff to set up:
|
||
Rotation matrix = $B1-$B9 (Call CALCMAT)
|
||
ROTMATH = $B0
|
||
P0X P0Y P0Z = $69-$6E
|
||
Pointers to points to be rotated. As with GLOBROT, these
|
||
pointers will get clobbered by other routines. Points are
|
||
8-bit signed numbers in range -96..95, and must have
|
||
length less than 128.
|
||
PLISTXLO PLISTXHI ... = $BD-$C4
|
||
Where to store the rotated+projected points.
|
||
CXLO CXHI ... = $A3-$AE
|
||
List of object centers. Center will be added to rotated point
|
||
before projection.
|
||
XOFFSET, YOFFSET = $53, $54
|
||
Location of origin on the screen. Added to projected points
|
||
before storing in PLIST. 160,100 gives center of screen.
|
||
|
||
On entry: .X = Object center index (center is at CXLO,X CXHI,X etc.)
|
||
.Y = Number of points to rotate (0..Y-1).
|
||
.C = set for rotate and project, clear for just rotate
|
||
On exit: Rotated, possibly projected points in PLIST.
|
||
|
||
|
||
$8A12 POLYFILL Draw pattern-filled polygon into bitmap.
|
||
|
||
Stuff to set up:
|
||
MULTLO1, MULTLO2, MULTHI1, MULTHI2 = $F7-$FE
|
||
BITMAP = $FF
|
||
FILLPAT = $BB-$BC
|
||
PLISTXLO, PLISTXHI, PLISTYLO, PLISTYHI = $BD-$C4
|
||
Point queue = $0200. Entries are _indices_ into the PLISTs,
|
||
must be ordered _counter clockwise_, and must _close_ on
|
||
itself (example: 4 1 2 4).
|
||
|
||
On entry: .X = Number of points in point queue (LDX #3 in above
|
||
example)
|
||
On exit: Gorgeous looking polygon in bitmap at BITMAP.
|
||
|
||
|
||
Memory map
|
||
----------
|
||
|
||
Zero page:
|
||
|
||
$4A-$52 ACCREM Fractional part of accumulating matrix
|
||
|
||
$53 XOFFSET Location of origin on screen (e.g. 160,100)
|
||
$54 YOFFSET
|
||
|
||
$55-$72 Intermediate variables. These locations are regularly hosed
|
||
by the routines. Feel free to use them, but keep in mind that
|
||
you will lose them!
|
||
|
||
$A3-$AE C0XLO, C0XHI, C0YLO, ...
|
||
Pointers to rotated object centers, i.e. where the objects are
|
||
relative to you. Centers are 16-bit signed (2's complement).
|
||
|
||
$AF-$B0 ROTMATH
|
||
Pointer to the multiplication table at $C200-$C3FF.
|
||
|
||
$B1-$B9 Rotation matrix.
|
||
|
||
$BB-$BC FILLPAT
|
||
Pointer to fill pattern (eight bytes, 8x8 character)
|
||
|
||
$BD-$C0 PLISTXLO, PLISTXHI
|
||
$C1-$C4 PLISTYLO, PLISTYHI
|
||
Pointers to rotated object points, e.g. used by polygon renderer.
|
||
Numbers are 16-bit 2's complement.
|
||
|
||
$F7-$FE MULTLO1, MULTLO2, MULTHI1, MULTHI2
|
||
Pointers to math tables at $C400-$C9FF
|
||
MULTLO1 -> $C500
|
||
MULTLO2 -> $C400
|
||
MULTHI1 -> $C800
|
||
MULTHI2 -> $C700
|
||
(Only the high bytes of the pointers are actually relevant).
|
||
|
||
$FF BITMAP
|
||
High byte of bitmap base address.
|
||
|
||
|
||
Library:
|
||
|
||
$0200-$027F Point queue.
|
||
|
||
$8600-$9FFF Routines, tables, etc.
|
||
$8A00 Jump table into routines
|
||
|
||
|
||
Tables:
|
||
|
||
$8400-$85FF ROTMATH, pointed to by $AF-$B0.
|
||
|
||
$C000-$C2FF MULTLO, pointed to by $F7-$F8 and $F9-$FA
|
||
$C300-$C5FF MULTHI, pointed to by $FB-$FC and $FD-$FE
|
||
$C600-$C6FF CDEL, table of f(x)=x*cos(delta)
|
||
$C700-$C7FF CDELREM remainder
|
||
$C800-$C8FF SDEL x*sin(delta)
|
||
$C900-$C9FF SDELREM
|
||
$CA00-$CAFF PROJTAB, projection table, f(x)=d/x, integer part
|
||
$CB00-$CBFF PROJREM, projection table, 256*remainder
|
||
$CC00-$CCFF LOG, logarithm table
|
||
$CD00-$CDFF EXP, exponential table
|
||
$CE00-$CEFF NEGEXP, exponential table
|
||
$CF00-$CFFF TRIG, table of 32*sin(2*pi*x/128) -- 160 bytes.
|
||
|
||
ROTMATH is the Special 8-bit signed multiplication table for ROTPROJ.
|
||
MULTLO and MULTHI are the usual fast-multiply tables. CDEL and SDEL
|
||
are used by ACCROTX, to accumulate matrices. PROJTAB is used
|
||
by ROTPROJ, to project the 16-bit coordinates. LOG EXP and NEGEXP
|
||
are used by the POLYFILL divide routine, to calculate line slopes.
|
||
Finally, TRIG is used by CALCMAT to calculate rotation matrices.
|
||
|
||
The tables from $C600-$CFFF are fixed and must not be moved. The
|
||
tables ROTMATH, MULTLO, and MULTHI on the other hand are only
|
||
accessed indirectly, and thus may be relocated. Thus there is
|
||
room for a color map and eight sprite definitions in any VIC bank.
|
||
|
||
|
||
----------
|
||
Section 6: Conclusions
|
||
----------
|
||
|
||
Wow. You really made it this far. I am totally impressed.
|
||
What stamina.
|
||
|
||
Well I hope you have been able to gain something from this
|
||
article. It is my sincere hope that this will stimulate the
|
||
development of some nice 3d programs. After a long, multi-year
|
||
break from 3d graphics I really enjoyed figuring out all this
|
||
stuff again, and finally doing the job right. I also enjoyed
|
||
re-re-figuring it out, to write this article, since over half a
|
||
year has passed since I wrote the 3d library! I hope you have
|
||
enjoyed it too, and can use this knowledge (or improve on it!)
|
||
to do some really nifty stuff.
|
||
|
||
SLJ 4/3/98
|
||
|
||
---------
|
||
Section 7: Binaries
|
||
---------
|
||
|
||
There are three files below. The first is the 3d library.
|
||
The second is the tables from $C000-$CFFF. The third is the ROTMATH
|
||
table ($8400-$85FF above).
|
||
|
||
::::::::: lib3d.o :::::::::
|
||
|
||
begin 644 lib3d.o
|
||
M`(8```````````$!`0$!`0$!`@("`@("`@(#`P,#`P,#`P0$!`0$!`0$!04%
|
||
M!04%!04&!@8&!@8&!@<'!P<'!P<'"`@("`@("`@)"0D)"0D)"0H*"@H*"@H*
|
||
M"PL+"PL+"PL,#`P,#`P,#`T-#0T-#0T-#@X.#@X.#@X/#P\/#P\/#Q`0$!`0
|
||
M$!`0$1$1$1$1$1$2$A(2$A(2$A,3$Q,3$Q,3%!04%!04%!05%145%145%186
|
||
M%A86%A86%Q<7%Q<7%Q<8&!@8&!@8&!D9&1D9&1D9&AH:&AH:&AH;&QL;&QL;
|
||
M&QP<'!P<'!P<'1T='1T='1T>'AX>'AX>'A\?'Q\?'Q\?`(#`X/#X_/X`@,#@
|
||
M\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`
|
||
M@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X
|
||
M_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`
|
||
MX/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^
|
||
M`(#`X/#X_/X`@,#@\/C\_@"`P.#P^/S^`(#`X/#X_/X`@,#@\/C\_@"`P.#P
|
||
M^/S^`(#`X/#X_/X`@,#@\/C\_G\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_
|
||
M'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!
|
||
M`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/
|
||
M!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_
|
||
M/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#
|
||
M`0!_/Q\/!P,!`'\_'P\'`P$`?S\?#P<#`0!_/Q\/!P,!`'\_'P\'`P$`?S\?
|
||
M#P<#`0"@`)%MH`B1;:`0D6V@&)%MH""1;:`HD6V@,)%MH#B1;:!`D6V@2)%M
|
||
MH%"1;:!8D6V@8)%MH&B1;:!PD6V@>)%MH("1;:"(D6V@D)%MH)B1;:"@D6V@
|
||
MJ)%MH+"1;:"XD6V@P)%MH,B1;:#0D6V@V)%MH."1;:#HD6V@\)%MH/B1;>9N
|
||
MH`"1;:`(D6V@$)%MH!B1;:`@D6V@*)%MH#"1;:`XD6U@3`")`$"`P`!`@,``
|
||
M0(#``$"`P`!`@,``0(#````!`@,%!@<("@L,#0\0$1(4%187&1H;'!X`"!`8
|
||
M("@P.$!(4%A@:'!X@(B0F*"HL+C`R-#8X.CP^``($!@@*#`X3#*:3"J;3%:;
|
||
M3(*;3`V<3*N=3&:*``0(#!`4&!P@)"@L,#0X/$!$2$Q05%A<8&1H;'!T>'R`
|
||
MAHJ.DI::GJ("!@H.$A8:'B(F*BXR-CH^0D9*3E)66EYB9FIN<G9Z?H2(C)"4
|
||
MF)R@AEB&6:D`A56%5H5:O``"L<'%5;'#,`[E5I`*AEJQP855L<.%5LK0Y*9:
|
||
MT`%@AEN\``*QO85DA6>QOX5EA6BE5496:D96:D96:H5KJJ55*0>%:AVFB85M
|
||
MO;^)&&7_A6Z%;Z9:QED0"^9>I5G)_["]8*98O``"A%6QP<HP]+P``H16./'!
|
||
M\-V%7(5>A7*&6B#WF##<I%6QO85DI%8X\;V%<*15L;^%9:16\;\0`TSKBX5Q
|
||
M(*Z9(`"9(("+I%:QO:15./&]A7"D5K&_I%7QOQ`#3(*,A7&QO85GL;^%:""N
|
||
MF2!9F856I60XY6>%<*5EY6B%<:5P..5@L`/&<3CE8H5PI7'I`#`%!7#P`6"D
|
||
M:\`9L`6E5DQ?CJ@83)V5\"+)`I`6"K`&J?^%;-`FJ0"%;(57J02-I(G0&>!`
|
||
ML.FI(.9N?0"&A6RHO0"'A5>Y%HJ-I(FF6\99$`FE6<G^L"=@H@#D6+#ZO``"
|
||
MA%6QP>@XO``"A%;QP?#=A5V%7X5RAEL@]Y@PVF!H:&"D5K&]I%4X\;V%<*16
|
||
ML;^D5?&_$`-,^8J%<;&]A62QOX5E(*Z9("F9(("+I%:QO:15./&]A7"D5K&_
|
||
MI%7QOQ`#3.J,A7&QO85GL;^%:""NF2!9F856I60XY6>%<*5EY6B%<:5P&&5@
|
||
MD`+F<3CE8H5PI7'I`#`0!7#0"Z5A..5CI6#E8I`!8*1KP!FP!:563$60I588
|
||
M3+*6I%6QO85GI%8X\;V%<*15L;^%:*16\;\0`TSYBH5Q(*Z9((&9A5:E9#CE
|
||
M9X5PI67E:(5QI7`XY6"P`L9Q&&5BA7"E<6D`,!`%<-`+I6,XY6&E8N5@D`%@
|
||
MI&O`&;`%I59,^Y.E5AA,SYBD5;&]A6>D5CCQO85PI%6QOX5HI%;QOQ`#3.J,
|
||
MA7$@KID@@9F%5J5D..5GA7"E9>5HA7&E<!AE8)`#YG$896*E<6D`,`%@I&O`
|
||
M&;`%I59,&Y*E5AA,PI>F6L99$`OF7J59R?^P2V"F6+P``H15L<'*,/2\``*$
|
||
M5CCQP?#=A5R%7H5RAEH@]Y@PW*15L;V%9*16./&]A7"D5;&_A66D5O&_$`-,
|
||
M2H^%<2"NF2``F4RTC6#&7O"BI68XY6&%9J5DY6"%9*JP`L9EI648\"+)`I`6
|
||
M"K`&J?^%;-`FJ0"%;(57J02-I(G0&>!`L.FI(.9N?0"&A6RHO0"'A5>Y%HJ-
|
||
MI(G&7]!=IEO&61`)I5G)_K"A8*(`Y%BP^KP``H15L<'H.+P``H16\<'PW85=
|
||
MA5^%<H9;(/>8,-JD5K&]I%4X\;V%<*16L;^D5?&_$`-,OY.%<;&]A6>QOX5H
|
||
M(*Z9(%F93%^.I6D896.%::5G96*JA6>0`^9H&*5H\$W)`I!#"K`:IFPP%J1J
|
||
ML;N%5;S8B5%M)5=%59%MI54@HXE,[8ZD;#!DH@"]`(A%5X57I&JQNX55IFR\
|
||
MV(E1;2571561;4SMCN!`L+RI('T`AL5LD-#PU+P`B(15IFR%;*1JL;N%5KS8
|
||
MB5%M)5=%5I%MI&R^/HJI8)T`B:56(*.)46TE5456D6VID9T`B:5OA6[&;<9J
|
||
M,`-,FXVI!X5JI&OP%8B$:QFFB85MN;^)&&7_A6Z%;TR;C6"F6L99$`OF7J59
|
||
MR?^P4V"F6+P``H15L<'*,/2\``*$5CCQP?#=A5R%7H5RAEH@]Y@PW*16L;VD
|
||
M53CQO85PI%:QOZ15\;\0`TQSC85QL;V%9+&_A64@KID@*9E,CH_&7O"?I688
|
||
M96&%9J5D96"JA620`^9E&*5E\"+)`I`6"K`&J?^%;-`FJ0"%;(57J02-I(G0
|
||
M&>!`L.FI(.9N?0"&A6RHO0"'A5>Y%HJ-I(G&7]!IIEO&61`)I5G)_K!18*(`
|
||
MY%BP^KP``H15L<'H.+P``H16\<'PW85=A5^%<H9;(/>8,-JD5K&]I%4X\;V%
|
||
M<*16L;^D5?&_$`-,WY&%<;&]A6>QOX5H(*Z9(%F93$60O``"L;VJL;\83$60
|
||
MI6D896.%::5G96*JA6>0`^9H&*5H\$W)`I!#"K`:IFPP%J1JL;N%5;S8B5%M
|
||
M)5=%59%MI54@HXE,TY"D;#!DH@"]`(A%5X57I&JQNX55IFR\V(E1;2571561
|
||
M;4S3D.!`L+RI('T`AL5LD-#PU+P`B(15IFR%;*1JL;N%5KS8B5%M)5=%5I%M
|
||
MI&R^/HJI8)T`B:56(*.)46TE5456D6VID9T`B:5OA6[&;<9J,`-,=8^I!X5J
|
||
MI&OP%8B$:QFFB85MN;^)&&7_A6Z%;TQUCV"F6L99$`OF7J59R?^P4V"F6+P`
|
||
M`H15L<'*,/2\``*$5CCQP?#=A5R%7H5RAEH@]Y@PW*16L;VD53CQO85PI%:Q
|
||
MOZ15\;\0`TP4DX5QL;V%9+&_A64@KID@*9E,=)'&7O"?I68896&%9J5D96"J
|
||
MA620`^9E&*5E\"+)`I`6"K`&J?^%;-`FJ0"%;(57J02-I(G0&>!`L.FI(.9N
|
||
M?0"&A6RHO0"'A5>Y%HJ-I(G&7]!9IEO&61`)I5G)_K!-8*(`Y%BP^KP``H15
|
||
ML<'H.+P``H16\<'PW85=A5^%<H9;(/>8,-JD5;&]A6>D5CCQO85PI%6QOX5H
|
||
MI%;QOQ`#3/F/A7$@KID@@9E,&Y*E:3CE8X5II6?E8H5GJK`"QFBE:!CP3<D"
|
||
MD$,*L!JF;#`6I&JQNX55O-B)46TE5T55D6VE52"CB4RIDJ1L,&2B`+T`B$57
|
||
MA5>D:K&[A56F;+S8B5%M)5=%59%M3*F2X$"PO*D@?0"&Q6R0T/#4O`"(A%6F
|
||
M;(5LI&JQNX56O-B)46TE5T56D6VD;+X^BJE@G0")I58@HXE1;25515:1;:F1
|
||
MG0")I6^%;L9MQFHP`TQ;D:D'A6JD:_`5B(1K&::)A6VYOXD89?^%;H5O3%N1
|
||
M8.9?O``"L;VJL;\83%23IEK&61`+YEZE6<G_L.1@IEB\``*$5;'!RC#TO``"
|
||
MA%8X\<'PW85<A5Z%<H9:(/>8,-RD5;&]A62D5CCQO85PI%6QOX5EI%;QOQ`#
|
||
M3#"1A7$@KID@`)E,5)/&7O"CI68XY6&%9J5DY6"%9*JP`L9EI648\"+)`I`6
|
||
M"K`&J?^%;-`FJ0"%;(57J02-I(G0&>!`L.FI(.9N?0"&A6RHO0"'A5>Y%HJ-
|
||
MI(G&7]!9IEO&61`)I5G)_K!-8*(`Y%BP^KP``H15L<'H.+P``H16\<'PW85=
|
||
MA5^%<H9;(/>8,-JD5;&]A6>D5CCQO85PI%6QOX5HI%;QOQ`#3!^.A7$@KID@
|
||
M@9E,^Y.E:3CE8X5II6?E8H5GJK`"QFBE:!CP3<D"D$,*L!JF;#`6I&JQNX55
|
||
MO-B)46TE5T55D6VE52"CB4R)E*1L,&2B`+T`B$57A5>D:K&[A56F;+S8B5%M
|
||
M)5=%59%M3(F4X$"PO*D@?0"&Q6R0T/#4O`"(A%6F;(5LI&JQNX56O-B)46TE
|
||
M5T56D6VD;+X^BJE@G0")I58@HXE1;25515:1;:F1G0")I6^%;L9MQFHP`TP[
|
||
MDZD'A6JD:_`5B(1K&::)A6VYOXD89?^%;H5O3#N38,9>T%JF6L99$`OF7J59
|
||
MR?^P2V"F6+P``H15L<'*,/2\``*$5CCQP?#=A5R%7H5RAEH@]Y@PW*15L;V%
|
||
M9*16./&]A7"D5;&_A66D5O&_$`-,^Y6%<2"NF2``F4PGE6"E9CCE885FI63E
|
||
M8(5DJK`"QF6E91C&7]!=IEO&61`)I5G)_K#:8*(`Y%BP^KP``H15L<'H.+P`
|
||
M`H16\<'PW85=A5^%<H9;(/>8,-JD5K&]I%4X\;V%<*16L;^D5?&_$`-,DYB%
|
||
M<;&]A6>QOX5H(*Z9(%F93)V5I6D896.%::5G96*JA6>0`^9H&*5HQFHP`TRT
|
||
ME*D'A6K&:Z1K&::)A6VYOXD89?^%;H5OP!CP`TRTE$R;C<9>T%VF6L99$`OF
|
||
M7J59R?^P3V"F6+P``H15L<'*,/2\``*$5CCQP?#=A5R%7H5RAEH@]Y@PW*16
|
||
ML;VD53CQO85PI%:QOZ15\;\0`TSJE(5QL;V%9+&_A64@KID@*9E,.Y:E9AAE
|
||
M885FI61E8*J%9)`#YF48I67&7]!>IEO&61`)I5G)_K!18*(`Y%BP^KP``H15
|
||
ML<'H.+P``H16\<'PW85=A5^%<H9;(/>8,-JD5K&]I%4X\;V%<*16L;^D5?&_
|
||
M$`-,AI>%<;&]A6>QOX5H(*Z9(%F93+*68*5I&&5CA6FE9V5BJH5GD`/F:!BE
|
||
M:,9J,`-,Q96I!X5JQFND:QFFB85MN;^)&&7_A6Z%;\`8\`-,Q95,=8_&7M!=
|
||
MIEK&61`+YEZE6<G_L+!@IEB\``*$5;'!RC#TO``"A%8X\<'PW85<A5Z%<H9:
|
||
M(/>8,-RD5K&]I%4X\;V%<*16L;^D5?&_$`-,(9B%<;&]A62QOX5E(*Z9("F9
|
||
M3%"7I68896&%9J5D96"JA620`^9E&*5EQE_06:9;QED0":59R?ZP36"B`.18
|
||
ML/J\``*$5;'!Z#B\``*$5O'!\-V%785?A7*&6R#WF##:I%6QO85GI%8X\;V%
|
||
M<*15L;^%:*16\;\0`TQQEH5Q(*Z9((&93,*7I6DXY6.%::5GY6*%9ZJP`L9H
|
||
MI6@8QFHP`TS:EJD'A6K&:Z1K&::)A6VYOXD89?^%;H5OP!CP`TS:EDQ;D6#&
|
||
M7M!9IEK&61`+YEZE6<G_L.U@IEB\``*$5;'!RC#TO``"A%8X\<'PW85<A5Z%
|
||
M<H9:(/>8,-RD5;&]A62D5CCQO85PI%6QOX5EI%;QOQ`#3!"7A7$@KID@`)E,
|
||
M79BE9CCE885FI63E8(5DJK`"QF6E91C&7]!9IEO&61`)I5G)_K!-8*(`Y%BP
|
||
M^KP``H15L<'H.+P``H16\<'PW85=A5^%<H9;(/>8,-JD5;&]A6>D5CCQO85P
|
||
MI%6QOX5HI%;QOQ`#3%V5A7$@KID@@9E,SYBE:3CE8X5II6?E8H5GJK`"QFBE
|
||
M:!C&:C`#3.N7J0>%:L9KI&L9IHF%;;F_B1AE_X5NA6_`&/`#3.N73#N3I%6Q
|
||
MPZ16\<-@AF#)_]`)AE:I`&J%89`&A6&*2H56J8"%9J5D..56A62JI67I`(5E
|
||
M&&"&8,G_T`^I`&J%8:F`A6:F9*5ED!J%8:F`A6:*2H56I62J..56A62E9;`#
|
||
MQF6H&&"&8LG_T`RI`&J%8ZF`A6F*D`F%8ZF`A6F*2AAE9X5GJJ5H:0"%:!A@
|
||
MAF+)_]`/J0!JA6.I@(5IIF>E:)`7A6.I@(5IBJ9G2AAE9X5GI6B0!.9HJ!A@
|
||
MI7%*I7!JJJ5R2O!GJ+T`S#CY`,R05*J]`,VJA?>%^TG_:0"%^87]I'*Q]SCQ
|
||
M^855L?OQ_856I7#E5855I7'E5J55D![%<I`'Z.5RQ7*P^895JO`+O0#,./D`
|
||
MS*J]`,ZF56#*97*0^TS]F:(`I7"D<DS]F:5Q2J5P:JJI_V!+2$%!04XAA62&
|
||
M8H1C&&5B*7^JI6(XY60I?ZB](,\8>2#/A;6Y`,\X_0#/A;2*&&5C*7^JF#CE
|
||
M8RE_J+T`SQAY`,_)@&J%L;D@SSC](,_)@&J%LJ5B&&5C*7^JI6(XY6,I?ZBY
|
||
M`,\X_0#/A;.](,\8>2#/A;F*..5D*7^JF!AE9"E_J+T`SQAY`,_)@&J%81AE
|
||
ML86XI6$XY;&%L;T@SSCY(,_)@&J%81AELH6WI;(XY6&%LJ5C&&5D*7^JI6,X
|
||
MY60I?ZB](,\8>2#/..6QA;&Y(,\X_2#/..6XA;B]`,\X^0#/&&6RA;*]`,\8
|
||
M>0#/&&6WA;>E8BE_JKT`SPJ%MF!F:*("AE6U385@M5"%8;2WM;2J(*Z;IE6E
|
||
M8I5-I6.5M*5DE5"E996WRA#98&9HH@*&5;50A6"U2H5AM+&UMZH@KINF5:5B
|
||
ME5"E8Y6WI6252J5EE;'*$-E@9FBB`H95M4J%8+5-A6&TM+6QJB"NFZ95I6*5
|
||
M2J5CE;&E9)5-I665M,H0V6"E:!`0AF*$9*9@I&&&881@IF2D8KT`QQAY`,F%
|
||
M8KT`QGD`R(5CI6(896"%8I`"YF.Y`,<X_0#)A62Y`,;]`,B%9:5D&&5AA620
|
||
M`N9EIF@0#J9BA6*&9*5CIF6%989C8(B$5;%CA5>Q9858L6>%6K%IA5NQ:X5=
|
||
ML6V%7J6QIK*DLR!WG*15D:6*D:.EM*:UI+8@=YRD59&IBI&GI;>FN*2Y('><
|
||
MI%61K8J1JYCP`TP-G&!#4D%#2TE.1R!43T%35"P@1U)/34U)5"&%689<A%^H
|
||
M\%:D6(7WA?M)_QAI`87YA?VQ]SCQ^85AL?OQ_85BI%>Q]SCQ^85@L?OQ_1AE
|
||
M885AI6)I`*3W$`V%8J5A..57A6&E8N58I%@0`SCE]P9@)F$J!F`F82JD881O
|
||
MA7"E7/!BI%N%]X7[2?\8:0&%^87]L?<X\?F%8;'[\?V%8J1:L?<X\?F%8+'[
|
||
M\?T896&%8:5B:0"D]Q`-A6*E83CE6H5AI6+E6Z1;$`,XY?<&8"9A*@9@)F$J
|
||
MI&&JF!AE;X5OBF5PA7"E7_!BI%Z%]X7[2?\8:0&%^87]L?<X\?F%8;'[\?V%
|
||
M8J1=L?<X\?F%8+'[\?T896&%8:5B:0"D]Q`-A6*E83CE785AI6+E7J1>$`,X
|
||
MY?<&8"9A*@9@)F$JI&&%8I@896^JI6)E<&"F;Z5P8*6PA6&&5H159F@0&J16
|
||
ML:.%5[&EA5BQIX5:L:F%6[&KA5VQK85>I%7PU8B$5;%IT`B%8H5DA6;P)J2Q
|
||
MA:])_QAI`85@L:\X\6"%8J2TL:\X\6"%9*2WL:\X\6"%9J15L6OP+Z2RA:])
|
||
M_QAI`85@L:\X\6`896*%8J2UL:\X\6`8962%9*2XL:\X\6`896:%9J15L6WP
|
||
M**2SA:])_QAI`85@L:\X\6`896*%8J2VL:\X\6`8962%9*2YL:\X\6"@`!AE
|
||
M9A`!B*9H,`^D59%=I6216Z5BD5E,T9T895V%9IAE7H5GH@"@`*5D$`&(&&5:
|
||
MA62895N%91`!RH9<H@"@`*5B$`&(&&57A6*895B%8Q`!RH99I6?P$DIF9D99
|
||
M9F-F8D9<9F5F9*K0[J9FO0#*T"2D4X1OI%2$<:1EA7"%<DQ4GZ1BA&^D8X1P
|
||
MI&2$<:1EA'),/I_)`?#II&*%]X7[2?\8:0&%^87]L?<X\?F%;['[\?VD8QAQ
|
||
M]SCQ^85PI&2Q]SCQ^85QL?OQ_:1E&''W./'YA7*E4QAE;X5OD`/F<!BE5&5Q
|
||
MA7&0`N9RO0#+\'B%]X7[2?\8:0&%^87]L?<X\?FJL?OQ_<"`D`+E]ZB*97&%
|
||
M<9AE<H5RI&2Q]SCQ^;'[\?T897&D59'!I7)I`)'#I&.Q]SCQ^:JQ^_']P("0
|
||
M`N7WJ(IE;X5OF&5PA7"D8K'W./'YL?OQ_1AE;Z15D;VE<&D`D;],T9VD5:5O
|
||
MD;VE<)&_I7&1P:5RD<-,T9U)($A!5$4@0E)/0T-/3$DN("!!3D0@6454+"!)
|
||
G3B!!($-%4E1!24X@4T5.4T4L($D@04T@0E)/0T-/3$DN("U424-+
|
||
`
|
||
end
|
||
|
||
::::::::: tables.c000 :::::::::
|
||
|
||
begin 644 tables.c000
|
||
M`,``@`&"!(8)C!"4&9XDJC&X0,A1VF3N>020'*DVQ%+A<`"0(;)$UFG\D"2Y
|
||
M3N1Z$:A`V'$*I#[9=!"L2>:$(L%@`*!!XH0FR6P0M%G^I$KQF$#HD3KDCCGD
|
||
MD#SIED3RH5``L&$2Q'8IW)!$^:YD&M&(0/BQ:B3>F500S(E&!,*!0`#`@4($
|
||
MQHE,$-297B3JL7A`"-&:9"[YQ)!<*?;$DF$P`-"A<D06Z;R09#D.Y+J1:$`8
|
||
M\<JD?EDT$.S)IH1B02``X,&BA&9)+!#TV;ZDBG%80"@1^N3.N:20?&E61#(A
|
||
M$`#PX=+$MJF<D(1Y;F1:44A`.#$J)!X9%!`,"08$`@$````!`@0&"0P0%!D>
|
||
M)"HQ.$!(45ID;GF$D)RIML32X?``$"$R1%9I?)"DN<[D^A$H0%AQBJ2^V?00
|
||
M+$EFA*+!X``@06*$ILGL$#19?J3*\1A`:)&ZY`XY9)"\Z19$<J'0`#!ADL3V
|
||
M*5R0Q/DN9)K1"$!XL>HD7IG4$$R)Q@1"@<``0('"!$:)S!!4F=XD:K'X0(C1
|
||
M&F2N^420W"EVQ!)AL`!0H?)$END\D.0YCN0ZD>A`F/%*I/Y9M!!LR2:$XD&@
|
||
M`&#!(H3F2:P0=-D^I`IQV$"H$7KD3KDDD/QIUD2R(9``<.%2Q#:I')`$>>YD
|
||
MVE'(0+@QJB2>&900C`F&!((!@`"``8($A@F,$)09GB2J,;A`R%':9.YY!)`<
|
||
MJ3;$4N%P`)`ALD36:?R0)+E.Y'H1J$#8<0JD/MET$*Q)YH0BP6``H$'BA";)
|
||
M;!"T6?ZD2O&80.B1.N2..>20/.F61/*A4`"P81+$=BG<D$3YKF0:T8A`^+%J
|
||
M)-Z95!#,B48$PH%``,"!0@3&B4P0U)E>).JQ>$`(T9ID+OG$D%PI]L2283``
|
||
MT*%R1!;IO)!D.0[DNI%H0!CQRJ1^6300[,FFA&)!(`#@P:*$9DDL$/39OJ2*
|
||
M<5A`*!'ZY,ZYI)!\:59$,B$0`/#ATL2VJ9R0A'EN9%I12$`X,2HD'AD4$`P)
|
||
M!@0"`0!`/S\^/CT]/#P[.SHZ.3DX.#<W-C8U-34T-#,S,C(Q,3$P,"\O+BXM
|
||
M+2TL+"LK*RHJ*2DI*"@G)R<F)B4E)20D)",C(B(B(2$A("`?'Q\>'AX='1T<
|
||
M'!P;&QL:&AH9&1D9&!@8%Q<7%A86%145%104%!,3$Q,2$A(2$1$1$1`0$!`/
|
||
M#P\/#@X.#@T-#0T,#`P,#`L+"PL*"@H*"@D)"0D)"0@("`@(!P<'!P<'!@8&
|
||
M!@8&!04%!04%!00$!`0$!`0$`P,#`P,#`P,"`@("`@("`@("`0$!`0$!`0$!
|
||
M`0$!`0$`````````````````````````````````````````````````````
|
||
M```````````````````````````````!`0$!`0$!`0$!`0$!`0("`@("`@("
|
||
M`@(#`P,#`P,#`P0$!`0$!`0$!04%!04%!08&!@8&!@<'!P<'!P@("`@("0D)
|
||
M"0D)"@H*"@H+"PL+#`P,#`P-#0T-#@X.#@\/#P\0$!`0$1$1$1(2$A(3$Q,3
|
||
M%!04%145%186%A<7%Q@8&!D9&1D:&AH;&QL<'!P='1T>'AX?'Q\@("$A(2(B
|
||
M(B,C)"0D)24E)B8G)R<H*"DI*2HJ*RLK+"PM+2TN+B\O,#`Q,3$R,C,S-#0U
|
||
M-34V-C<W.#@Y.3HZ.SL\/#T]/CX_/T!`04%"0D-#1$1%149&1T=(2$E)2DI+
|
||
M3$Q-34Y.3T]045%24E-35%155E975UA965I:6UQ<75U>7U]@8&%B8F-D9&5E
|
||
M9F=G:&EI:FIK;&QM;FYO<'!Q<G)S='1U=G9W>'EY>GM[?'U]?G]_@(&"@H.$
|
||
MA(6&AX>(B8J*BXR-C8Z/D)"1DI.3E)66EI>8F9F:FYR=G9Z?H*"AHJ.DI*6F
|
||
MIZBIJ:JKK*VMKJ^PL;*RL[2UMK>WN+FZN[R]O;Z_P,'"P\3$Q<;'R,G*R\O,
|
||
MS<[/T-'2T]34U=;7V-G:V]S=WM_@X>'BX^3EYN?HZ>KK[.WN[_#Q\O/T]?;W
|
||
M^/GZ^_S]_O\```$"`P0%!@<("0H+#`T.#Q`1$A,4%187&!D:&QP='A\@(2(C
|
||
M)"4F)R@I*BLL+2XO,#$R,S0U-C<X.3H[/#T^/T!!0D-$149'2$E*2TQ-3D]0
|
||
M45)35%565UA96EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G-T=79W>'EZ>WQ]
|
||
M?H"!@H.$A8:'B(F*BXR-CH^0D9*3E)66EYB9FIN<G9Z?H*&BHZ2EIJ>HJ:JK
|
||
MK*VNK["QLK.TM;:WN+FZN[R]OK_`P<+#Q,7&Q\C)RLO,S<[/T-'2T]35UM?8
|
||
MV=K;W-W>W^#AXN/DY>;GZ.GJZ^SM[N_P\?+S]/7V]_CY^OO\_?[_`/____[^
|
||
M_OW]_?S\_/O[^_OZ^OKY^?GX^/CW]_?W]O;V]?7U]/3T\_/S\_+R\O'Q\?#P
|
||
M\._O[^_N[N[M[>WL[.SKZ^OKZNKJZ>GIZ.CHY^?GY^;FYN7EY>3DY./CX^/B
|
||
MXN+AX>'@X.#?W]_?WM[>W=W=W-S<V]O;VMK:VMG9V=@G)R8F)B4E)24D)"0C
|
||
M(R,B(B(A(2$@("`@'Q\?'AX>'1T='!P<'!L;&QH:&AD9&1@8&!@7%Q<6%A85
|
||
M%144%!04$Q,3$A(2$1$1$!`0$`\/#PX.#@T-#0P,#`P+"PL*"@H)"0D("`@(
|
||
M!P<'!@8&!04%!`0$!`,#`P("`@$!`0``````````````````````````````
|
||
M``$!`0$!`0$!`0$!`0$!`0$!`0$!`@("`@("`@("`@("`@("`@("`@("`P,#
|
||
M`P,#`P,#`P,#`P,#`P,#`P,$!`0$!`0$!`0$!`0$!`0$!`0$!`4%!04%!04%
|
||
M!04%!04%!04%!04%!08&!@8&^?GY^?GY^OKZ^OKZ^OKZ^OKZ^OKZ^OKZ^OKZ
|
||
M^_O[^_O[^_O[^_O[^_O[^_O[^_O\_/S\_/S\_/S\_/S\_/S\_/S\_/W]_?W]
|
||
M_?W]_?W]_?W]_?W]_?W]_?[^_O[^_O[^_O[^_O[^_O[^_O[^____________
|
||
M______________\`#!DE,CY+5V1Q?8J6HZ^\R-7B[OL'%"`M.D937VQXA9&>
|
||
MJ[?$T-WI]@,/'"@U04Y:9W2`C9FFLK_+V.7Q_@H7(S`]259B;WN(E*&NNL?3
|
||
MX.SY!A(?*SA$45UJ=X.0G*FUPL_;Z/0!#1HF,T!,665R?HN7I+&]RM;C[_P)
|
||
M%2(N.[C$T=WJ]@,0'"DU0DY;:'2!C9JFL[_,V>7R_@L7)#`]2E9C;WR(E:*N
|
||
MN\?4X.WY!A,?+#A%45YK=X20G:FVPL_<Z/4!#AHG-$!-669R?XN8I;&^RM?C
|
||
M\/P)%B(O.TA486YZAY.@K+G%TM_K^`01'2HW0U!<:76"CINHM,'-VN;S`-)I
|
||
M1C0J(QX:%Q43$1`/#@T,"PL*"@D)"`@(!P<'!P8&!@8&!04%!04%!00$!`0$
|
||
M!`0$!`0#`P,#`P,#`P,#`P,#`P,#`P,"`@("`@("`@("`@("`@("`@("`@("
|
||
M`@("`@("`@("`@("`@$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!
|
||
M`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!
|
||
M`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0``````````````````````
|
||
M`````````````````````````````````````(````"`````0%4`%X`G```@
|
||
M6JL-@`",(<!F%,>`/@#&D%TM`-6MAV)`'P#BQJN1>&!),QX*]N31P*^?CX!Q
|
||
M8U5(.R\B%PL`]>O@ULW#NK&IH)B0B(!X<6IC7%5/2$(\-C`J)1\:%`\*!0#[
|
||
M]O+MZ>3@W-C3S\O(Q,"\N;6RKJNGI*&>FI>4D8Z+B(:#@'U[>'5S<&YK:69D
|
||
M8E]=6UE65%)03DQ*2$9$0D`^/#HY-S4S,3`N+"LI)R8D(R$?'AP;&1@7%102
|
||
M$1`.#0L*"0@&!00"`0#__OS[^OGX]_7T\_+Q\._N[>SKZNGHY^;EY./BX>#?
|
||
MWMW<V]O:V=C7UM74U-,``!\R/TI265]E:6YR=GE\?X*%AXF,CI"2E)67F9J<
|
||
MGI^@HJ.DIJ>HJ:JLK:ZOL+&RL[2TM;:WN+FZNKN\O;V^O\#`P<+"P\3$Q<;&
|
||
MQ\?(R<G*RLO+S,S-SL[/S]#0T='2TM+3T]34U=76UM?7U]C8V=G9VMK;V]O<
|
||
MW-W=W=[>WM_?W^#@X>'AXN+BX^/CY.3DY>7EY>;FYN?GY^CHZ.CIZ>GJZNKJ
|
||
MZ^OK[.SL[.WM[>WN[N[N[^_O[_#P\/#Q\?'Q\O+R\O/S\_/T]/3T]/7U]?7V
|
||
M]O;V]O?W]_?W^/CX^/GY^?GY^OKZ^OK[^_O[^_S\_/S\_/W]_?W]_O[^_O[_
|
||
M`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$!`0$"`@("`@("`@("`@("
|
||
M`@("`@("`P,#`P,#`P,#`P,#`P0$!`0$!`0$!`0$!04%!04%!04&!@8&!@8&
|
||
M!P<'!P<'"`@("`@("0D)"0H*"@H*"PL+"PP,#`P-#0T.#@X/#P\0$!`1$1$2
|
||
M$A,3%!04%146%A<7&!@9&AH;&QP='1X>'R`A(2(C)"0E)B<H*2DJ*RPM+B\P
|
||
M,3,T-38W.#H[/#X_0$)#149(24M-3E!25%9765M=7V)D9FAJ;6]R='=Y?'^"
|
||
MA(>*C9"4EYJ>H:6HK+"TN+S`Q,C-T=;;W^3I[O3Y_OX!`0$!`0$!`0$!`0$!
|
||
M`0$!`0$!`@("`@("`@("`@("`@("`@("`@("`@(#`P,#`P,#`P,#`P,#`P,#
|
||
M!`0$!`0$!`0$!`0$!04%!04%!04%!@8&!@8&!@<'!P<'!P<("`@("`@)"0D)
|
||
M"0H*"@H*"PL+"PP,#`P-#0T.#@X/#P\0$!`1$1$2$A(3$Q04%145%A87%Q@8
|
||
M&1H:&QL<'1T>'A\@("$B(R,D)28G*"@I*BLL+2XO,#$R,S0V-S@Y.CP]/D!!
|
||
M0T1&1TE*3$U/45-55EA:7%Y@8F5G:6MN<'-U>'I]@(.%B(N.DI68FY^BIJFM
|
||
ML;6YO<'%RL[3U]SAYNOP]?H``@,%!@@)"PP.#Q`2$Q05%Q@9&AL;'!T>'A\?
|
||
M'R`@("`@("`?'Q\>'AT<&QL:&1@7%103$A`/#@P+"0@&!0,"`/[]^_KX]_7T
|
||
M\O'P[NWLZ^GHY^;EY>3CXN+AX>'@X.#@X.#@X>'AXN+CY.7EYN?HZ>OL[>[P
|
||
J\?+T]??X^OO]_@`"`P4&"`D+#`X/$!(3%!47&!D:&QL<'1X>'Q\?("`@
|
||
`
|
||
end
|
||
|
||
::::::::: table.rotmath :::::::::
|
||
|
||
begin 644 table.rotmath
|
||
M`(0````````````````!`0$!`0$!`0("`@("`@,#`P,$!`0$!04%!08&!@<'
|
||
M!P@("`D)"0H*"PL+#`P-#0X.#P\0$!$1$A(3$Q04%146%Q<8&!D:&AL<'!T>
|
||
M'A\@("$B(R,D)28F)R@I*2HK+"TN+B\P,3(S-#4U-C<X.3H[/#T^/T!!0D-$
|
||
M149'2$E*2TU.3U!14E-45E=865I;75Y?8&)C)",C(B$@(!\>'AT<'!L:&AD8
|
||
M&!<7%A45%!03$Q(2$1$0$`\/#@X-#0P,"PL+"@H)"0D("`@'!P<&!@8%!04%
|
||
M!`0$!`,#`P,"`@("`@(!`0$!`0$!`0``````````````````````````````
|
||
M`0$!`0$!`0$"`@("`@(#`P,#!`0$!`4%!04&!@8'!P<("`@)"0D*"@L+"PP,
|
||
M#0T.#@\/$!`1$1(2$Q,4%!45%A<7&!@9&AH;'!P='AX?("`A(B,C9&-B8%]>
|
||
M75M:65A75E134E%03TY-2TI)2$=&141#0D%`/SX]/#LZ.3@W-C4U-#,R,3`O
|
||
M+BXM+"LJ*2DH)R8F)20C(R(A("`?'AX='!P;&AH9&!@7%Q85%104$Q,2$A$1
|
||
M$!`/#PX.#0T,#`L+"PH*"0D)"`@(!P<'!@8&!04%!00$!`0#`P,#`@("`@("
|
||
3`0$!`0$!`0$```````````````@(
|
||
`
|
||
end
|
||
|
||
........
|
||
....
|
||
..
|
||
. -fin-
|
||
|
||
|
||
See you next issue!
|