431 lines
20 KiB
Plaintext
431 lines
20 KiB
Plaintext
USE A BUFFERING FIFO QUEUE TO OUTPUT YOUR GFX
|
||
|
||
--------------------------------------------------------------------------
|
||
This little text was originally placed in the Imphobia MAG 12. I've added
|
||
some little things and I want it to be available for every programmers and
|
||
not only sceners. I hope you'll like it and I encourage you to D/L the
|
||
Imphobia MAGs (great MAG, Darky !!!) available on ftp.cdrom.com /pub/demos
|
||
and ftp.arosnet.se to read other amazing programming tricks and more...
|
||
--------------------------------------------------------------------------
|
||
|
||
|
||
Maybe this article has no interest for you, but i think it's important to
|
||
be clear about how to display frames on your screen in a clean manner.
|
||
|
||
There are many demos done by respectable groups, full of nice 3D algorithms,
|
||
nice distos, ... which suffer of a bad display of the frames (ugly horizontal
|
||
cuts in the frames). The problem is similar with many games (except DOOM which
|
||
is perfect in this domain ;-) ).
|
||
|
||
In the good old time of Mode-X, there weren't such problems because everyone
|
||
was using the multiple pages offered by this scheme of adressing, in order to
|
||
get a perfect double (or triple) buffered display. I know that many of you
|
||
will argue that ModeX is slow, that now we have PCI/VLB adapters wich run damn
|
||
fast in Mode 13h, and so why to support Mode-X again ...
|
||
|
||
The reply is simple: i don't say that we must support Mode-X... no ! We have
|
||
a better tool now, called UNIVBE 5.1, which offer many extended modes with
|
||
multiple pages (unlike Mode 13h).
|
||
|
||
Others will argue: pffff... i don't want to get my 3D engine idle loosing time
|
||
to synchronise with the VGA display and to support double-buffering.
|
||
|
||
The reply is : It is possible to get your engine 100% efficient and to get a
|
||
perfect synchronised display, without "ugly cuts" ,... Even without using
|
||
UniVBE, if you still want to use your favourite Mode 13h and you have at least
|
||
a VLB/PCI adapter, there is a way to get a single-buffered display without
|
||
cuts (no, Wizard, it's not with a HBL handler ;-)), and without idle graphics.
|
||
|
||
|
||
=============
|
||
1: THE BASICS
|
||
=============
|
||
|
||
I just put here some lines grabbed from my Open GL user guide, and some
|
||
personnal comments:
|
||
|
||
In a movie, motion's achieved by taking a sequence of pictures (24 per sec),
|
||
and then projecting them at 24/s on the screen. In Computer Graphics, screens
|
||
typically refresh (redraw the picture) approximately 60 to 76/s, and some even
|
||
run at about 120 refreshes/sec. The usual video modes used in demos and games
|
||
have 70,60, and eventually 50 Hz refresh.
|
||
|
||
The 'key' idea that makes motion picture projection work is that when it is
|
||
displayed, EACH FRAME IS COMPLETED. This is NOT the case if you fill the video
|
||
page during its display (single-buffering), because you alter the screen
|
||
while the video processor decodes it !!!. So the video processor decodes a mix
|
||
of your old and your new frame, and you get those ugly cuts. You can say that
|
||
a "rep movsd" or "rep stosd" is faster than the decoding of video memory on
|
||
your PCI/VLB adapter in Mode 13h, and it is impossible to "see" the update of
|
||
the screen. Right, if you work with a small video mode (like 64 Kb video mem
|
||
update) and if you are SYNCHRONISED with the screen DURING the modification.
|
||
|
||
Now, suppose that you want to display your million-frame movie with a
|
||
program like this (we suppose the screen refreshes at 70Hz, ie. Mode 13h):
|
||
|
||
init_gfxmode();
|
||
for (i = 0; i < 1000000; i++)
|
||
{
|
||
clear_screen();
|
||
draw_frame(i);
|
||
SYNC / wait_until_a_70th_of_a_second_is_over();
|
||
i = i + number of frames to skip to get a constant speed on every machine;
|
||
}
|
||
|
||
If you add the time taken by your system to clear the screen and to draw a
|
||
typical frame, this program gives more and more disturbing results depending
|
||
on how close to 1/70 second it takes to clear and draw. Suppose the drawing
|
||
takes nearly a full 1/70 second. Items drawn first are visible for the full
|
||
1/70 second and present a solid image on the screen; items drawn toward the
|
||
end are instantly cleared as the programs starts on the next frame, so they
|
||
present at best a ghostlike image, since for most of the 1/70 second your eye
|
||
is viewing the cleared background instead of the items that were enough
|
||
unlucky to be drawn last. The problem is that this program doesn't display
|
||
completely drawn frames; instead, you watch the drawing as it happens.
|
||
|
||
********************* 0 solid scanlines
|
||
********************* .
|
||
********************* .
|
||
--------------------- . ghost scanlines !!!
|
||
--------------------- 199
|
||
|
||
There are many solutions to this problem:
|
||
|
||
A) WORK IN A BUFFER IN CENTRAL MEMORY, AND THEN COPY THIS BUFFER TO THE SCREEN
|
||
(REP MOVSD)
|
||
|
||
The code becomes:
|
||
|
||
init_gfxmode();
|
||
for (i = 0; i < 1000000; i++)
|
||
{
|
||
copy my buff to video;
|
||
clear my buff
|
||
draw_frame(i) in my buff;
|
||
SYNC / wait_until_a_70th_of_a_second_is_over();
|
||
i = i + number of frames to skip to get a constant speed on every machine;
|
||
}
|
||
|
||
This is better ... But if you work in a Hi-res GFX mode (ex. 640x400x16M =>
|
||
768k or 1M), may be the copy will take more than 1/70, and you will see a
|
||
piece of the previous frame at the bottom of the screen (instead of the ghost
|
||
frame described before). This is not esthetic. Moreover you will need to work
|
||
in central mem and then copy to video mem, this is far to be optimal when we
|
||
think that the recent Video cards have a flat linear display in which we can
|
||
work directly.
|
||
|
||
********************* 0 portion of frame i
|
||
********************* .
|
||
********************* . frontier
|
||
--------------------- . portion of frame i-1
|
||
--------------------- 199
|
||
|
||
The frontier is +- constant thanks to the SYNC (assuming the calculation of
|
||
a frame take a constant time... this can be true for plasmas, but not for 3D).
|
||
|
||
But concretely, this is worse because the SYNC line is often removed because
|
||
it takes time we could use for the calculation of effects.
|
||
|
||
In this case the code becomes:
|
||
|
||
init_gfxmode();
|
||
for (i = 0; i < 1000000; i++)
|
||
{
|
||
copy my buff to video;
|
||
clear my buff
|
||
draw_frame(i) in my buff;
|
||
i = i + number of frames to skip to get a constant speed on every machine;
|
||
}
|
||
|
||
This means that there is no more synchronisation with the retrace of the
|
||
screen. In this case the frontier between the old (at the bottom) and the new
|
||
frame will move on the screen, and this is REALLY ugly and visible.
|
||
|
||
i ****************** i+1 ******************* i+2 ******************
|
||
****************** ******************* ******************
|
||
****************** i ------------------- ******************
|
||
i-1 ------------------ ------------------- ******************
|
||
------------------ ------------------- i+1 ------------------
|
||
|
||
AND EVEN : (!)
|
||
i+2 ------------------ If you look carefully many 3D phong, mapped,
|
||
------------------ bumped demos, you will often see such things.
|
||
i+3 ****************** (Look them on a 486 DX2-66, Fast Pentiums can
|
||
****************** false the results on demos designed for 486).
|
||
******************
|
||
|
||
|
||
So, this is NOT the right thing to do if you want a quality animation.
|
||
|
||
B) USE DOUBLE BUFFERING
|
||
...
|
||
|
||
|
||
===================
|
||
2: DOUBLE BUFFERING
|
||
===================
|
||
|
||
Double buffering is a radical way to remove the problems described before.
|
||
The idea is to have 2 video pages, one is displayed while the other is being
|
||
drawn. When the drawing of a frame is complete, the two buffers are swapped.
|
||
So the one that was being viewed is now used for drawing, and vice versa. It's
|
||
like a movie projector with only two frames in a loop; while one is being
|
||
projected on the screen, an artist is desperately erasing and redrawing the
|
||
frame that is not visible. As long as the artist is quick enough, the viewer
|
||
notices no difference between this setup and one where all frames are already
|
||
drawn and the projector is simply displaying them one after the other. With
|
||
double-bufering, every frame is shown only when the drawing is complete ; the
|
||
viewer never sees a partially drawn frame.
|
||
|
||
This is a sample of code:
|
||
|
||
init_gfxmode();
|
||
j = 0; k = 1;
|
||
SYNC
|
||
SetVisualPage(j)
|
||
for (i = 0; i < 1000000; i++) {
|
||
clear page(k)
|
||
draw_frame(i) in page(k);
|
||
SYNC / wait_until_a_70th_of_a_second_is_over();
|
||
k <=> j
|
||
SetVisualPage(j) // must wait the vertical retrace to set new values to
|
||
// the video registers
|
||
i = i + number of frames to skip to get a constant speed on every machine;
|
||
}
|
||
|
||
The benefits are:
|
||
|
||
- You can work directly in video mem and use the possibility of FLAT linear
|
||
adressing.
|
||
- It is impossible to have interferences between the new and the old frames.
|
||
- Because you are working directly in video memory, you can even use the
|
||
BitBLT accelerator of your card to "clear page(k)" or to set a nice
|
||
background, or to draw lines, sprites, ... (There are very few cards which
|
||
have a BitBLT able to work in central memory, even if you just want to
|
||
specify a source in central mem... so in the single buffering scheme, the
|
||
copy buff to screen must be done by hand :-( ). With the new UniVBE 5.2 and
|
||
the VBE/AI, BitBLT will be a reality !!! Think about that !!! I'll speak
|
||
about BitBLT in a next article, both customs routines and VBE/AI support...
|
||
|
||
The prob:
|
||
|
||
- You MUST be synchronised with the screen !!! So your graphics engine is idle
|
||
until the vertical retrace is done, and that is time lost for calculation.
|
||
|
||
With the SYNC line, you wait until the current screen refresh period is over
|
||
so that the previous buffer is completely displayed. Assuming that your system
|
||
refreshes the display 70 times per second, this means that the fastest frame
|
||
rate you can achieve is 70 frames per second, and if all your frames can be
|
||
cleared and drawn in under 1/70 second, your animation will run smoothly at
|
||
that rate.
|
||
|
||
What often happens on such a system is that the frame is too complicated to
|
||
draw in 1/70 second, so each frame is displayed more than once. If, for
|
||
example, it takes 1/45 second to draw a frame, you get 35 frames per second,
|
||
and the graphics are idle for 1/35-1/45=1/157 second per frame. Altough 1/157
|
||
second of wasted time might not sound bad, it's wasted each 1/35 second, so
|
||
actually more than 1/5 of the time is wasted.
|
||
|
||
That means that if you're writing an application and gradually adding
|
||
features, at first each feature you add has no effect on the overall
|
||
performance - you still get 70 frames per second. Then, all of a sudden, you
|
||
add one new feature, and your performance is cut in half because the system
|
||
can't quite draw the whole thing in 1/70 of a second. A similar thing happens
|
||
when the drawing time per frame is more than 1/35 second - the performance
|
||
drops to 35 to 23 frames per second, and so on (70/1, 70/2, 70/3, 70/4, 70/5,
|
||
...).
|
||
|
||
|
||
====================================
|
||
3: N-BUFFERING / THE BUFFERING QUEUE
|
||
====================================
|
||
|
||
How to get cuts-free animation without idle graphics ?
|
||
|
||
The idea is to think in a different manner the couple CPU/Video. We can see
|
||
this as the classical problem of producer/consummer: here the CPU produces
|
||
frames and the Video consummes them in parallel.
|
||
|
||
The CPU produces the frames as fast as it cans, and the Video consumes the
|
||
frames at its own independant rate (ex. 70 frames/s). The frames produced by
|
||
the CPU are placed in a FIFO Queue which feeds the Video.
|
||
|
||
|
||
FIFO Queue (N entries max)
|
||
---------------------------------
|
||
CPU -> * * * * -> Video
|
||
---------------------------------
|
||
|
||
|
||
If the FIFO queue is full (the N entries are filled), then the CPU enters in
|
||
a idle loop until there is some place free to put the new frame it has
|
||
calculated.
|
||
|
||
If the FIFO queue is empty, the Video will keep the old frame displayed, and
|
||
look in the FIFO at the next refresh.
|
||
|
||
|
||
For N=2, we have the double-buffering described before.
|
||
N=3, we have triple-buffering which is often satisfactory, because it
|
||
breaks yet the rigid synchronism we had with double-buffering,
|
||
without using many buffers (3). ID Software have used triple-
|
||
buffering in their game DOOM, which work in Mode-X (which gives
|
||
3 pages 320x240x256 or 4 pages 320x200x256).
|
||
N=4, ...
|
||
.
|
||
.
|
||
The more the buffers, the more the CPU can anticipates frames and avoid to
|
||
enter in a idle loop.
|
||
|
||
Concretely, we can bufferize the start-adresses of the video pages we are
|
||
working on. In this case, we have a code like that:
|
||
|
||
|
||
init_gfxmode();
|
||
install_interrupt_handler();
|
||
|
||
// CPU (Producer) // Interrupt handler (Consummer)
|
||
(Handler called at each
|
||
j = 1; Vertical retrace)
|
||
InQ(0);
|
||
|
||
for (i = 0; i < 1000000; i++)
|
||
{ if (EmptyQ() == false)
|
||
clear page(j); // use BitBLT {
|
||
draw_frame(i) in page(j); new_start = OutQ();
|
||
while (FullQ() == true) {}; // idle loop SetVisualPage(new_start);
|
||
InQ(j); }
|
||
j = (j + 1) MOD N; iret
|
||
i = i + number of frames to
|
||
skip to get a constant
|
||
speed on every machine;
|
||
|
||
}
|
||
|
||
Yep, that's quite cool, uh ???
|
||
|
||
Note: to do an interrupt handler synchronized with a refresh of 70Hz, you just
|
||
have to reprogram the PC timer to a clock a bit faster like 75 Hz, wait
|
||
for the VR bit in 3DAh (resynchronisation) and restart the timer...(this
|
||
is called a semi-active wait). There are many VR-Handler available on
|
||
FTP sites or BBSes (look for example at the Starport BBS intro source
|
||
code, ... ).
|
||
|
||
Well, this code work fine if you have a multipage display ... This is not a
|
||
problem for SVGA modes: if we consider a 1M board, which is the actual
|
||
standard, we have 8 pages in 320x200x65K, 16 pages 320x200x256, 4 pages
|
||
640x400x256, 3 pages 640x480x256, ... (at least if you use UniVBE).
|
||
|
||
But 320x200x256 16 pages doesn't work on all cards, and so the good old Mode
|
||
13h has still a reason to exist. No problem, remember what i told before in
|
||
"The basics", don't use (physical) synchronised single-buffering with an Hi-
|
||
Resolution mode... Ok, but Mode 13h is a small mode which can be updated very
|
||
fast on PCI/VLB cards (64k to fill). The idea is to use a Logical N-buffering
|
||
combined with a synchronised Physical single buffering.
|
||
|
||
In a synchronised Physical single buffering, we work in buffers in central
|
||
memory, and then copy them into the video memory. So, we can imagine to
|
||
bufferize the addresses of those buffers (we place the addresses in the FIFO),
|
||
and then to have an interrupt handler (synchronised with the VR) which get the
|
||
address of the new buffer to display (= to copy) and invoke a copy routine for
|
||
this buffer.
|
||
|
||
Warning !!! This invoquation can not be a simple call, because if the copy
|
||
routine takes too much time, this can result in a total misfunctionning of the
|
||
interrupt handler (Remember that such periodic interrupt handler is a critical
|
||
code which has some real time constraints and which cannot miss an event !!!).
|
||
In order to avoid problems, the copy rout must be interruptible by the
|
||
handler. This is obtained if we invoke the copy rout using a context-switch:
|
||
we pop the stack layers until we reach the return address of the code
|
||
interrupted by the handler, and we insert the address of the copy-rout, and
|
||
then we re-push the layers, and when we'll do an "iret" at the end of the
|
||
handler, we'll jump to the copy-rout (which is interruptible by the handler,
|
||
because it's seen as a normal user application), at the end of the copy rout,
|
||
we do an iret to restore the code interrupted previously by the handler.
|
||
|
||
|
||
STACK
|
||
|
||
| Var 1 | | Var 1 |
|
||
| Var 2 | insert Adr 2 | Var 2 |
|
||
| Adr 1 | ---> | Adr 2 |
|
||
| ..... | | Adr 1 |
|
||
|
||
You don't have to forget that just before the return address, there is also
|
||
the status flag, and you have to consider it when you push/pop if you don't
|
||
want to obtain awesome crashes. Just refer you to your 8x86/80x86 manual to
|
||
see how the instructions iret, iretd, ret, ... work. In particular, when you
|
||
push the address (which is in the form segment:offset/selector:offset) of the
|
||
Copy_rout, you must push a dummy flag, because it will be invoked by an
|
||
iret/iretd (you just have to do a pushf/pushfd).
|
||
|
||
|
||
The code becomes:
|
||
|
||
init_gfxmode(13h);
|
||
install_interrupt_handler();
|
||
|
||
// CPU (Producer) // Interrupt handler (Consummer)
|
||
(Handler called at each
|
||
j = 1; Vertical retrace)
|
||
InQ(0);
|
||
|
||
for (i = 0; i < 1000000; i++)
|
||
{ if (EmptyQ() == false)
|
||
clear buffer(j); // use CPU {
|
||
draw_frame(i) in buffer(j); Adr_Buffer = OutQ();
|
||
while (FullQ() == true) {}; // idle loop Pop all local variables;
|
||
InQ(j); pushfd;
|
||
j = (j + 1) MOD N; Push Adr of Copy_Rout;
|
||
i = i + number of frames to RePush all local variables;
|
||
skip to get a constant }
|
||
speed on every machine; iretd; // this handler MUST
|
||
// be short !!!!
|
||
}
|
||
Adr_Buffer: Integer;
|
||
Copy_Rout:
|
||
(assume ds/es -> 0)
|
||
mov esi, Adr_Buffer
|
||
mov ecx,16000
|
||
mov edi,0a0000h
|
||
rep movsd
|
||
iretd
|
||
|
||
|
||
This is the idea... With this scheme of work, we get 100% efficient code and
|
||
100% synchronisation with the display. Moreover, there are many interesting
|
||
properties of the buffering queue, but i let you imagine that ;-). Good luck
|
||
with your implementation.
|
||
|
||
|
||
___________________________________
|
||
|
||
|
||
|
||
Greets to all my friends, all TFL-TDV members, all kewl guyz of the scene i
|
||
got a nice chat with, and all guyz who will greet me (us) in the future ;-)
|
||
|
||
I specially thanx Karma and Bismarck/TFL-TDV for inspirating me, and Karma for
|
||
playing with the bugs during the implementation of a N-Bufferized Mode 13h for
|
||
his Descent-like part in Hurtless ;-)
|
||
|
||
(C) 1996 Type One / TFL-TDV
|
||
|
||
Contact me at the following addresses:
|
||
|
||
llardin@is1.ulb.ac.be Laurent Lardinois
|
||
(until october 1996, after 271 chauss<73>e de Saint Job
|
||
jusk ask jcardin@is1.ulb.ac.be 1180 Bruxelles, Belgium
|
||
my new email)
|
||
|
||
<EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>
|
||
The N-Buffering (up to 8 buffers used !) feature was implemented in the demo
|
||
"HURTLESS" we presented at Wired 95. It featured 320x200/640x200 Hi-Color,
|
||
320x200x256 chained multipages, BitBLT, FLAT LINEAR, and Video RAM booster
|
||
support, WITH or WITHOUT UniVBE. Have a look if you want to see the thing
|
||
working (however the demo might be unstable because of the intensive use of
|
||
Mikmod 2.03 virtual timers... but maybe we'll do a special release with a
|
||
new version of Mikmod. The SB support is really random). It is available on
|
||
ftp.cdrom.com, ftp.arosnet.se, and hagar.arts.kuleuven.ac.be .
|
||
<EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>
|