Octrees and Coherent Hierarchical Culling

Edit** For code that I started working on which might help some out, go here.

Last week I started creating the code for an Octree and I was going to use the example from the GPU Gems 2 Chapter 6 for my engine. So, I started the code for the octree, and finished about 90% of it. Then, I started reading and going over code from the GPU gems and I realized that the concept covered in the book was a good one, but the example was only good for sample code. I will first explain the concept in the book and its strengths. Then, I will explain why its bad for a real game engine and what I am doing differently to built off the chapters idea. For another take on scene graphs, read the following article.

 

Starting at the beginning.

In the books example, first the scene is partitioned by use of a KD tree or an Octree. All of the scenes geometry is inserted into this tree. This is step one and it is simple.

Step two is traversing the tree starting at the root node stepping down through the children checking to see if the node is within the scenes view frustum. So far, simple concept, right? Once a node is found to be in the view frustum, the video card is queued to see how many pixels of the object you are attempting to draw is drawn. How do you queue the video card?

A query is typically executed as shown in the following code:

D3D10_QUERY_DESC queryDesc;
... // Fill out queryDesc structure
ID3D10Query * pQuery;
pDevice->CreateQuery(&queryDesc, &pQuery);
pQuery->Begin();
... // draw your object you want to test here with everything turned off
// no color buffer writes.. just depth testing, no writes
pQuery->End();
UINT64 queryData; // This data type is different depending on the query type
// this is how you would test with the Stop and Wait method as mentioned in the chapter,
//which is discouraged because you wait for the results from the video card
while( S_OK != pQuery->GetData(&queryData, sizeof(UINT64), 0) ){}
// querydata will contain the number of pixels that passed the depth test

You draw the geometry of course, but you unbind your render targets, and have only the depth buffer bound with writes to it disabled, but testing turned on. So, it is almost like an early z test except you queue the video card to see how many of these pixels passed the depth test.

The speed of the algorithm comes from the fact that you continue checking and drawing the rest of the scene and check in every once and while to see if the results are available from queuing the video card. Also, if the results are not available, and the results from the previous frame shows that the object was visible, just draw the geometry. Temporal Coherence is just a fancy term meaning if an object was previously in the scene, odds are it will be in the next frame. Everything so far sounds like this is the way to go, right?

Here is what I found:  the number of state changes will be high; there is no ability to instance anything (high number of draw calls); and it does not take advantage of temporal coherence like it should. Let me elaborate on each of the three problems that I found.

First, the method forces the programmer to draw each piece of geometry as it is found to be visible, which will force many state changes because of the different types of geometry. The reason is because the queues that are issued to the video card return the number of pixels that pass the depth test. So, how can the video card return the number of pixels that passed the depth test unless it is currently being filled via regular draw calls to visible geometry.

Second, each queue to the video card is an additional draw call. The article does say that if the number of objects in the scene is high, the speedups might be hard to achieve because of the draw call increase. But, the situation is worse because objects cannot be instanced –they must be drawn as they are found to be visible, otherwise, the depth buffer will not contain the correct information which could lead to many false positives ( more overdraw).

Third, the book does say that the programmer could get more of a performance boost from assuming that objects which were visible 3 or 4 frames ago are still visible. However, I do not think the book goes far enough in this respect.

There are three factors that affect temporal coherence: camera orientation/position;  the objects position; and time (I realize this may seem like a Duh, but read on. The authors missed this, you might as well). If the camera and the object being viewed does not move, there should only be one test needed until some delta time (this could possibly be seconds of time)  is met where the object should be tested again. The only reason the test should be reissued after some delta time is in case another object moves in front of it and blocks it from view.

Also, borrowing from the whole claim behind the GPU Gems chapter is that objects in the scene do not move much frame to frame, we can make assumptions that will be correct most of the time.

Here are my changes:

Use the three factors listed above

1) if  the camera orientation/position has not changed much ( use a delta that makes sense for your engine) use the last scenes information on what to drawn. Skip all the expensive view frustum checks. Note: only new objects that enter the view frustum need to queue the video card for visibility. Until the orientation/position changes enough, the objects can be assumed in their last state. Camera orientation should have almost no impact on whether objects need to be re queued UNLESS, they are new objects entering the view frustum. Think about it, turn your head left to right. Nothing new is seen except for objects that ENTER your view frustum. So, the previous objects should be assumed in their last state. Position should be the heavy influence, but it is scene dependent.

2) if an objects position in the scene has not changed much ( use a delta that makes sense for your engine) use the last scenes information on what to drawn. Skip the occlusion test until the objects movement delta changes enough. Let the object themselves determine if the test needs to be done instead of a different class or function. Most objects in a scene remain static, even characters. Not everyone is running or moving all the time.

3) Each object should have a time stamp of the last time a real occlusion queue was issued in case another object moved in front and obscured it from view. Remember, this delta time can be VERY high –I would feel comfortable with a 5 second delta time. Think about it, how often does one object move in front of another object and totally obscure it from view? Even if you are drawing this object and it is obscured, this will happen in extremely rare cases.

 

The whole thinking behind the above three reasons is about only doing an action if something changes. So, if nothing changes in the scene, why does the work need to be done over and over? If something moves in the scene, it will either block something else, or be blocked itself. The above three cases will take care of the visibility of this object with very little work.

Questions.. Comments?

 

 

Leave a Comment


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>