VQuake

From Vogons Wiki

VQuake is a version of Quake written specifically for the Rendition Verite V1000 accelerator. It is based on the Quake software renderer and works quite differently from GLQuake, because the V1000 has its own strengths and weaknesses.

Details

Architecture

A Usenet posting by former Rendition programmer Stefan Podell describes in detail how VQuake works and why it is not optimal for the V2x00.

Posted by Stefan Podell on October 30, 1997 at 04:34:56:

Okay, with my reputation tarnished by VQuake and VHexen2
not going any faster on the V2x00, I'm here to explain 
why this is the case.

First, the VQuake/VHexen2 engine can be roughly broken 
down into four parts (actually, most games are like 
this):

* Software setup of geometry commands
* Software creation of texture maps
* Transfer of textures and commands to the renderer
* Rendering

The first two parts are totally CPU dependent, the 
third is bus dependent, and the fourth, in my case,
is Verite dependent.

Now, some history on the design of VQuake.

When we first started working with Id on accelerated
Quake, Id's design was much like their design for Quake
2: that there would be a "driver" part of the code,
somewhat separated from the "main" game. (In Quake's
case, though, since it is a DOS game, this would be
accomplished with different executables rather than the
more elegant DLL model employed by Quake 2.) 

They had two 3D engine paths through the game. One was 
a traditional triangle/polygon based engine, which is
what they predicted all 3D accelerators would use, and
one was a fairly elaborate "span sorting" scheme which
the software renderer used.

So we set out accelerating the game using the polygon
interface. It looked great with filtering, but it
wasn't terribly fast. Even after doing polygon sorting
on the world polygons and turning off the Z buffer for
those, the performance was not great. (You must 
remember that the V1000 wasn't really designed to do
Z-buffering as its primary rendering style.) So Walt
Donovan, then with Rendition, and Michael Abrash, then
with Id, talked about using the software engine's span 
interface. 

For those who are interested, there have
been a few articles published by Michael on the guts
of the engine. I think Dr Dobb's Journal had the best,
most detailed one. To the best of my understanding,
the engine sorts all world surfaces (floors, walls,
ceilings, and sky) on each scanline of the screen, 
and keeps track of the edges of each span. When it
goes to draw a particular surface (polygon), it is
then guaranteed that every pixel it is drawing is the
only world pixel that will be drawn at those 
coordinates. (This is hard to explain...) Suffice to
say that when the world surfaces are drawn, exactly
the minimum possible number of pixels is drawn (i.e.,
a depth complexity of 1). Meanwhile, the Z buffer
has been filled with the correct Z values, but Z
comparison is not necessary, since we know the depth
complexity is one. Next, when we start drawing objects
(monsters, weapons, etc), we turn on the Z comparison
so that the objects are properly hidden by the world.
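
Wiki illustration (not part of the original post): a minimal sketch of what "depth complexity of 1" means on a single scanline. It is not Quake's actual edge-sorting algorithm. It assumes a hypothetical span format with one constant depth per span and resolves visibility per pixel, only to show that the output runs never overlap. The function name visible_spans and the sample data are made up for this example.

```python
def visible_spans(spans, width):
    """Resolve one scanline to non-overlapping runs, nearest surface wins.

    spans: list of (x_start, x_end, depth, surface); x_end is exclusive and
    a smaller depth is nearer. Hypothetical format, not Quake's own.
    """
    nearest = [None] * width  # per pixel: (depth, surface) of the nearest span
    for x_start, x_end, depth, surface in spans:
        for x in range(max(0, x_start), min(width, x_end)):
            if nearest[x] is None or depth < nearest[x][0]:
                nearest[x] = (depth, surface)

    runs = []  # merged (x_start, x_end, surface) runs, left to right
    for x, hit in enumerate(nearest):
        if hit is None:
            continue  # no surface covers this pixel
        surface = hit[1]
        if runs and runs[-1][2] == surface and runs[-1][1] == x:
            runs[-1] = (runs[-1][0], x + 1, surface)  # extend the current run
        else:
            runs.append((x, x + 1, surface))
    return runs


# Made-up scanline, 10 pixels wide: a wall, a pillar in front of it, sky behind.
scanline = [(0, 8, 10.0, "wall"), (2, 5, 4.0, "pillar"), (6, 10, 20.0, "sky")]
runs = visible_spans(scanline, width=10)
submitted = sum(end - start for start, end, _, _ in scanline)
drawn = sum(end - start for start, end, _ in runs)
print(runs)
print(f"depth complexity before: {submitted / 10:.1f}, after: {drawn / 10:.1f}")
```

In this made-up case the three input spans cover 15 pixels on a 10-pixel line, while the resolved runs cover each pixel exactly once, so the renderer needs no Z comparison for them.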

The reasoning behind this was that the Pentium was
pretty slow at drawing pixels, but fast at floating
point operations. By doing more of what the Pentium
was good at, Quake was able to do less of what the
Pentium was bad at. Overall, a performance win, with
the side benefit of allowing much more interesting 
scenes.

((((whew))))

Walt and Michael decided that since the Verite 1000
wasn't terribly good at Z-buffered pixels, that if
we let the Pentium take care of this span sorting, we
could reduce the number of pixels the Verite would 
draw. Furthermore, we'd be able to turn off the Z
compare function on the Verite. So now, rather
than having a depth complexity of around 1.5 
(about 450000 pixels at 640x480) pixels that draw
at the peak rate of about 10MP/s on the V1000, we
had a depth complexity of 1 (300000 pixels) that 
draw at a peak rate of about 17MP/s.
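
Wiki illustration (not part of the original post): the figures above can be turned into a rough per-frame fill time for the world pixels. The sketch assumes the chip sustains its quoted peak rate for the whole frame, which real hardware will not, so treat the results as lower bounds. The function name fill_time_ms is only illustrative.

```python
def fill_time_ms(pixels, megapixels_per_second):
    """Time to draw `pixels` at a given peak fill rate, in milliseconds."""
    return pixels / (megapixels_per_second * 1_000_000) * 1000.0


# Figures quoted in the post for 640x480 on the V1000.
z_buffered = fill_time_ms(450_000, 10)   # depth complexity ~1.5, Z compare on
span_sorted = fill_time_ms(300_000, 17)  # depth complexity 1, Z compare off

print(f"polygon path: {z_buffered:.1f} ms per frame")
print(f"span path:    {span_sorted:.1f} ms per frame")
print(f"ratio:        {z_buffered / span_sorted:.2f}x")
```

Under these assumptions the span path spends well under half the fill time of the polygon path on world surfaces.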

As with the software renderer, we could then turn on
the Z compare and draw all the more interesting 
objects. Yes, these pixels would be as slow as always,
but there are generally far fewer of them.

So we set out writing new microcode to support Quake's
span data format. When we finally got this working,
sure enough, the performance was way better than the
original polygon-style engine.

Back to my four-part description of Quake's engine, we
traded more CPU work in stage one for less Verite work
in stage four, which ended up getting us a big win.

Note that when you increase the vertical resolution in
the game, the engine must sort more scanlines. And if
you increase the resolution in either direction, the
renderer must draw more pixels.

Also note that when Quake was written, P133's were 
pretty top-notch, and the software frame rates were
low enough that the span sorting was adequate to
keep up. With the Verite rendering much faster than
the Pentium could, suddenly the span sorting was the
bottleneck. It wasn't until we got to P200's that 
the Verite was busy most of the time.

So whether you're on a V1000 or V2x00, the CPU has
a lot of work to do.

Okay, that's part one.

Part two is texture maps. (This will be much shorter.)
The way the software renderer in Quake works is to take
a small texture tile (like a couple of bricks) and
duplicate it into a larger texture map for the world
surface while it is applying the light map (dynamic or
static). It then caches that texture map and draws 
with it. When it needs to draw that surface again, it
checks to see if the lights have changed (like when 
you fire a weapon). If they have, it must regenerate
the texture map and recache it.
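
Wiki illustration (not part of the original post): a minimal sketch of the build, cache, and regenerate cycle described above. It is not Quake's code. It simplifies the light map to a single brightness value per surface, and the names build_lit_surface and SurfaceCache are hypothetical.

```python
def build_lit_surface(tile, repeat_x, repeat_y, light_level):
    """Tile a small texture across a surface and scale it by one light level.

    Simplification: a real light map varies across the surface; a single
    scalar is used here to keep the example short.
    """
    rows = []
    for _ in range(repeat_y):
        for tile_row in tile:
            rows.append([int(texel * light_level) for texel in tile_row * repeat_x])
    return rows


class SurfaceCache:
    """Keeps one lit texture per surface and rebuilds it when the light changes."""

    def __init__(self):
        self._entries = {}  # surface_id -> (light_level, texels)
        self.rebuilds = 0   # how many times a surface had to be (re)generated

    def get(self, surface_id, tile, repeat_x, repeat_y, light_level):
        entry = self._entries.get(surface_id)
        if entry is not None and entry[0] == light_level:
            return entry[1]  # lights unchanged: reuse the cached texture
        texels = build_lit_surface(tile, repeat_x, repeat_y, light_level)
        self._entries[surface_id] = (light_level, texels)
        self.rebuilds += 1
        return texels


bricks = [[100, 200], [200, 100]]  # made-up 2x2 tile
cache = SurfaceCache()
first = cache.get("wall_7", bricks, 2, 1, 0.5)  # built
cache.get("wall_7", bricks, 2, 1, 0.5)          # cache hit
flash = cache.get("wall_7", bricks, 2, 1, 1.0)  # light changed: rebuilt
print(cache.rebuilds)
```

Every rebuild is CPU work, which is why scenes with many changing lights cost more than a static view.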

The VQuake engine does the same thing, with the extra
step of having to download the texture map to video 
memory.

Quake also mipmaps these surfaces. The mipmap level
is chosen based on the size of the polygon (in pixels)
relative to the size of the texture map.

In VQuake, the texture cache is kept in video memory
along with the display buffers and Z buffer. The
quick equation for how much memory the display and Z
buffers take is Width * Height * 6 (3 buffers, each
16-bits deep). The rest is for texture maps (minus
about 128K for microcode).
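
Wiki illustration (not part of the original post): the quick equation above, worked through for the three resolutions used in the benchmarks later in the post. The 4 MB board size is an assumption made for this example, and the function name texture_budget is hypothetical.

```python
def texture_budget(width, height, board_bytes, microcode_bytes=128 * 1024):
    """Bytes left for the texture cache after the buffers and microcode."""
    buffer_bytes = width * height * 6  # 3 buffers, 2 bytes (16 bits) per pixel
    return board_bytes - buffer_bytes - microcode_bytes


BOARD_BYTES = 4 * 1024 * 1024  # assumption: a board with 4 MB of video memory

for width, height in [(320, 200), (512, 384), (640, 480)]:
    left = texture_budget(width, height, BOARD_BYTES)
    print(f"{width}x{height}: {left // 1024} KB left for textures")
```

On such a board, going from 320x200 to 640x480 removes well over a megabyte from the texture cache.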

So when you increase the resolution, two things 
happen that increase the demands on the CPU for
texture map generation. First, you have less texture
memory, so textures will fall out of the cache more
often, requiring regeneration. Second, higher 
resolution mipmaps will be chosen, further straining
the texture cache.

The assembly code for generating the textures is
darn near as good as it can be. I certainly can't
think of any instructions to remove, and Michael
Abrash, who wrote it, is a genius at this stuff.

We considered doing two pass lighting on the Verite,
but after some experiments decided the CPU could do
it faster.

So again, no matter the Verite chip, the CPU will be
very busy. (The texture mapping, by the way, is the
primary reason that timedemo works so much better as
a real benchmark than timerefresh. In the demo
sequences, there's lots of combat going on, which
pushes the system much harder.)

Alright. That's part two.

Part three is the bus. As you know, the currently
available Verites use the PCI bus, and are able to
use DMA asynchronously, which does not use the 
CPU. The bus activity will steal some cycles from
the CPU, but not an appreciable amount.

Simple enough.

Finally, the renderer. The V2x00 chips are *much*
faster at drawing than the V1000. The fastest the
V1000 could go was 25MP/s. The V2100 goes 40MP/s 
and the V2200 goes 50MP/s. Adding features (Z,
alpha, fog, etc.) would slow the V1000 down a lot,
while having minimal impact on the V2x00 chips.

So I understand why people were expecting VQuake
and VHexen2 to go much faster on the V2x00. But
the fact of the matter is that a faster renderer
doesn't necessarily buy you much given the
architecture of this engine. And because of that,
we're working on a V2x00-specific version of the
engine, to take advantage of the extra pixel 
power, while lightening the load on the CPU.

I must admit that I was beginning to wonder if
my beliefs about the engine's behavior were 
really true. So I just did a weird hack on
VHexen2 to test something:

I put in a check at the time drawing commands
and texture maps are sent to the Verite to see
if the game was in "timedemo mode". If it was,
I just threw away the commands and continued.
Then, when timedemo was over, drawing would
kick back in and I would see the results. The
purpose of this was to simulate an *infinitely
fast* renderer and bus (something we'd all
like to have :-) (This is also known as a 
"speed of light" test.)

==============================================
Here are the results of running timedemo demo1
on my current VHexen2 build (beta 3 candidate).
I ran all the tests at 320x200, 512x384, and
640x480, with antialiasing set to 0 and to 7
at all resolutions.

The first three tests are with rendering turned
on, in other words, the numbers everyone has
been responding to so far. The last set of
numbers is my "infinitely fast" renderer.

(fps are antialias = 0/antialias = 7)

The first test was on my P166MMX (64MB RAM) 
with my V1000 reference board, which runs at 
the same speed as an Intergraph 3D 100.
(These seem slower to me than what I 
remember, but I can't find where I wrote 
down my old numbers, so I just re-ran
these.)
=====================
320x200 (51.0 / 43.1)
512x384 (26.7 / 24.2)
640x480 (16.9 / 15.5)
=====================

Next, the same PC with a V2200. The first
thing you'll notice is that it's a little
slower at low resolutions. I think this is
because the span microcode has a little extra
overhead on the 2200. It also doesn't
currently interleave buffers. I'm looking 
into fixing those things.
=====================
320x200 (48.1 / 41.9)
512x384 (32.9 / 28.5)
640x480 (22.0 / 20.1)
=====================

Next, a V2200 in a P2-300 (this computer has
a pretty sucky hard drive in it and 32MB RAM,
so there was more swapping going on than
on my 166MMX)
=====================
320x200 (71.0 / 63.4)
512x384 (43.3 / 36.1)
640x480 (27.9 / 23.5)
=====================

And finally, my P166MMX with my "infinitely
fast" renderer/bus test
=====================
320x200 (56.0 / 48.2)
512x384 (44.1 / 37.8)
640x480 (32.4 / 29.7)
=====================

So you can see that at low resolutions, a 
faster CPU gets you much more by way of 
performance than a faster renderer. And as
resolution increases, infinitely fast 
rendering is a good thing :-)
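
Wiki illustration (not part of the original post): the conclusion above can be checked directly against the antialias = 0 numbers in the tables. The sketch compares two upgrades from the P166MMX + V2200 baseline: the faster CPU (P2-300) and the "infinitely fast" renderer. The dictionary keys and the function name speedup are labels invented for this example.

```python
# Frame rates (antialias = 0) copied from the tables in the post.
RESULTS = {
    "P166MMX + V1000": {"320x200": 51.0, "512x384": 26.7, "640x480": 16.9},
    "P166MMX + V2200": {"320x200": 48.1, "512x384": 32.9, "640x480": 22.0},
    "P2-300 + V2200": {"320x200": 71.0, "512x384": 43.3, "640x480": 27.9},
    "P166MMX + no rendering": {"320x200": 56.0, "512x384": 44.1, "640x480": 32.4},
}


def speedup(config, baseline, resolution):
    """Frame-rate ratio of `config` over `baseline` at one resolution."""
    return RESULTS[config][resolution] / RESULTS[baseline][resolution]


BASELINE = "P166MMX + V2200"
for resolution in ["320x200", "512x384", "640x480"]:
    cpu = speedup("P2-300 + V2200", BASELINE, resolution)
    renderer = speedup("P166MMX + no rendering", BASELINE, resolution)
    print(f"{resolution}: faster CPU {cpu:.2f}x, infinite renderer {renderer:.2f}x")
```

At 320x200 the faster CPU gives the larger gain, while at 640x480 the infinitely fast renderer does, which matches the post's summary.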

Again, all this adds up to us working on 
a V2x00 version of this engine. We'll tell
you more as we know more about its progress.

This is not to say that there's no room for
improvement in my current version of the 
game. So if I figure out any cool way to make
this go faster on a V2x00, I'll definitely
put it in. There is one thing Quake 2
does that I want to try in VHexen 2. I'll 
let you know.

Regards,
Stefan (maybe I should use a .plan file) Podell

_________________________________________________

Readme

This VQuake release is a small upgrade.  Everything in whatever previous
readme you have is probably still true.  This release simply adds one new
setting.

r_surfacelookup (default 1, enabled)

Disabling r_surfacelookup will make the CPU perform the 8 to 16 bit
conversion step for texture maps instead of the Verite.  On faster CPUs
this can increase performance.  Results also depend on what resolution
you choose.  In other words, your mileage may vary.


*********

NiNe's notes:

Installation:

1) Put "VQuake.exe" and "VQ95.bat" into your \Quake directory.
2) If you have a V1000, put "spd3d.uc" into your \Quake directory.
3) If you have a V2x00, put "spd3d.uc2" into your \Quake directory.
4) Run "VQuake.exe" or "VQ95.bat".

	    Welcome to the Verite port of Quake!

--- Quick Start

	Copy vquake.exe, spd3d.uc, and vq95.bat to your quake 
directory (the one that contains quake.exe.) If you already
have vquake.exe or spd3d.uc, make sure the new version
overwrites the old one.
	Type "vquake" to run the accelerated game. Use vq95.bat 
when running Windows95 to allow TCP/IP networking.

	NOTE: you *MUST* upgrade your software Quake from 1.01
to 1.06 before running vquake 1.07. You can get the patch file
from id software at http://www.idsoftware.com/dlquake.html.

-- beta 5
	fixed bug where aa particles disappeared after a while

-- beta 4
	folded in id's 2/6 release, which fixes at least the
	grappling hook bug (bad entity number)

	Also, if you get a page fault while running vquake, try
	replacing cwsdpmi.exe with the new version in cwsdpmi3.exe.
	Rename the old one to some different name, and then rename
	the new one to cwsdpmi.exe. This has worked for at least
	one person.

-- beta 3
	Fixed bugs:
	alias models get completely wrong textures sometimes (but now
		requires 20MB minimum to run under windows.)

	some alias model mipmaps show a blue fringe.

	timerefresh under water is amusing.

	weird scanlines generated by some models.

	hardly any particles get drawn.

	if no soundblaster in dos, will hang.

	antialiased particles are in (see r_antialias below.) still not
	quite right, though.

	crashes with some hipnotic levels.

-- beta 2
	Fixed demo-mode rendering errors.
	Known bugs:
		occasional blue stripes in mipmapped textures
		alias models with incorrect textures; apparently
			a texture caching bug as save/load makes it work
		new particles not showing up
		env var BLASTER not being there in dos causes hang

-- Quake 1.07 beta 1
	Ported to id 1.07 source tree as of 1/22/97. Supports
	Rogue and Hipnotic level packs.
	Known bugs:
	Demos show rendering errors not visible during game play.
	If no soundblaster in DOS, hangs.


--- Known Problems

	Please send me email (walt@rendition.com) if you have
problems with this release that do not occur with the
version of vquake distributed with your 3D accelerator board.
Please include a system description, and how to duplicate the
problem if at all possible. Thanks.
	
	You must disable UNIVBE if you have it installed. There is no 
reason to run it as the BIOS is VESA VBE 2.0 compliant.

	If Microsoft Networking for DOS is running, the game
may hang when it exits.

	If the game hangs or does not appear to run, reboot,
run vquake with the "-nodma" flag and try again. If it works,
then you probably have an older PCI BIOS and motherboard that
does not work with the Verite chip's busmastering DMA.

	Under Win95, you cannot suspend the game and continue
later. Attempts to do so will crash Win95. For the same reason,
you should turn off any screen savers.

	If a window pops up on the screen (e.g., from a Win95 
background process like a virus scanner), Win95 will crash.

	We have observed improved network performance when we
turn off the screen saver on the Quake server machine.

--- Cvars
	Note: changing these values will affect your performance
	and visual quality.

	d_mipscale (default 1.0)
		increasing it reduces texture detail for 
		more distant objects.

	d_mipcap (default 0)
		increasing it (to 1, 2, or 3) reduces texture
		detail for all objects.

	d_bilerp (default 1)
		setting to 0 turns off texture bilinear filtering.

	d_dither (default 1)
		setting to 0 turns off dithering.

	d_wamp (default 7.0)
		controls amplitude of water warping.

	d_wfreq (default 12.0)
		controls frequency of water warping.

	r_antialias (default 0)
		Use "r_antialias 7" to antialias just about everything,
		including particles.
-----

Related links