Sunday, April 22, 2012

Independent Study: Week 7

Quick update to start the week off: I think it's as good a time as any to show that I have, in fact, brought this back to interactive frame rates with the help of a spatial hashing system:

It's a shame that, aesthetically, the video isn't very pleasing. It really comes down to limitations of raw photon mapping, as well as the hardware I have to work with. This scene has just short of 400,000 photons bouncing around it, but even that is not enough to uniformly light the objects within it. I would need to incorporate a final gathering phase of some sort if I wanted to correct the issue, and that would have a performance impact as well. I've got one or two ideas I'd like to explore, but short of optimizing heavily and then spending the freed-up cycles on final gathering, I sadly don't see the quality improving much.

More to come throughout the week.

Sunday, April 15, 2012

Independent Study: Week 6

This week's going to be all about optimization.

I've been trying to find the best balance between effort put in and performance gained as far as acceleration structures go. I thought about a kd-tree for a while, but the problem there is the difficulty I've had finding a simple-to-implement sorting algorithm. I'm using a parallel selection sort based on Eric Bainville's work since it was extremely basic. It's not fast, but I don't really care about sort speed for the photon map itself because it's all done beforehand. The problem is that kd-trees require continuous refinement and re-sorting as you construct them: every time you go down a level you swap the axis you're splitting on, and that requires a partial re-sort. For the time being, I've passed on kd-trees because of these issues. Instead, I've implemented a spatial grid that works by partitioning 3D space uniformly. It appears to be working so far and I've got a pretty decent speed-up, but it's also slightly broken at the moment:

It's not actually broken, though; it's behaving exactly as I wanted, it just isn't complete. Right now I'm only looking for photons within the grid cell where the ray-tracing step intersected the object. This is partially correct, but I really need to be gathering photons from neighboring cells in all 3 dimensions. The distance out to which I should be checking other cells is equal to the radius of the photons I'm emitting. That's next!

Update: 4/15/2012

Tried implementing neighbor search with my uniform grid, and now I see why people don't roll their own algorithms! Performance was actually worse than without any "acceleration" structure at all, and I couldn't easily control the span I was searching over. One of the biggest problems I'm looking at right now is varying the size of the cells based on the radius of the photons within the scene. That isn't difficult by itself, but adjusting the resolution of the grid at run-time IS difficult to wrap my head around right now. I'll need to think on this for a while. If I can enforce that grid cells have the same dimensions as the radius of the photons, the nearest-neighbor search becomes a problem involving at most 26 neighboring grid cells. Ultimately, the kd-tree is still looking like a better solution, even if I have no idea how to do it yet.
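For reference, if the cell size is locked to the photon radius, the gather step only ever needs the 3x3x3 block of cells around the hit point (the home cell plus at most 26 neighbors). A minimal plain-C sketch of that enumeration (the names and the grid resolution are hypothetical, not from my actual code):

```c
#include <assert.h>

#define GRID_RES 32 /* hypothetical cells per axis */

/* Map a normalized coordinate in [0,1) to a cell coordinate on one axis. */
static int cell_coord(float p) {
    int c = (int)(p * GRID_RES);
    if (c < 0) c = 0;
    if (c >= GRID_RES) c = GRID_RES - 1;
    return c;
}

/* Collect the flattened indices of the home cell and its neighbors,
 * clamping at the grid boundary. Returns the number of cells written:
 * 27 in the interior, fewer at edges and corners. */
static int gather_cells(float x, float y, float z, int out[27]) {
    int cx = cell_coord(x), cy = cell_coord(y), cz = cell_coord(z);
    int n = 0;
    for (int dz = -1; dz <= 1; dz++)
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int nx = cx + dx, ny = cy + dy, nz = cz + dz;
                if (nx < 0 || nx >= GRID_RES) continue;
                if (ny < 0 || ny >= GRID_RES) continue;
                if (nz < 0 || nz >= GRID_RES) continue;
                out[n++] = (nz * GRID_RES + ny) * GRID_RES + nx;
            }
    return n;
}
```

The clamping is what keeps the boundary cases (8 cells at a corner, 12 along an edge) from reading out of bounds.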

Also, I've come to realize that my approach thus far is strikingly similar to that of Martin Fleisz.

Update 4/16/2012

Well, I followed my hunch and looked into how Martin Fleisz approached acceleration structures. I'm not quite done with my own spin on his algorithm, but I'm getting close and thought I should update to say that this will hopefully be the one I settle on. Initial performance seems promising... if it's real. It's hard to tell whether my queries are returning all the photons they should be. In the meantime, here's an awesome (broken) screenshot of the current state of the app:

Update: 4/16/2012

Well I fixed that bug, and after a night of hacking away at one problem after another I've come to this point:

I really don't know how I feel about this. It's hard to see, but the final gather is still broken and cuts apart some photons, the map no longer looks quite accurate as far as the inter-reflections on the larger sphere from the wall, and the grid structure itself is obviously producing SERIOUS aliasing.

I'm really worried about the aliasing with this approach. I think the only way I can get rid of it is to cast multiple rays per pixel and then average the results. Done right, that would cut my performance to a quarter of what it is now, effectively negating the entire advantage of this approach...
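For the record, the multi-ray idea is just sub-pixel supersampling. A rough sketch of the 2x2 case (the shade callback here is a hypothetical stand-in for the full trace-and-gather done per ray; four calls per pixel is exactly where the ~4x cost comes from):

```c
#include <assert.h>
#include <math.h>

/* Average the shade function over a 2x2 sub-pixel grid. */
static float supersample2x2(float px, float py,
                            float (*shade)(float, float)) {
    static const float off[4][2] = {
        {0.25f, 0.25f}, {0.75f, 0.25f}, {0.25f, 0.75f}, {0.75f, 0.75f}
    };
    float sum = 0.0f;
    for (int i = 0; i < 4; i++)
        sum += shade(px + off[i][0], py + off[i][1]);
    return sum * 0.25f;
}

/* Toy shade for demonstration: brightness equals the x coordinate. */
static float toy_shade(float x, float y) { (void)y; return x; }
```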

I guess I'll see what I think of it tomorrow...

Update: 4/17/2012

Here's an example of what I think is causing the aliasing. It's a close-up of photons being cut off at grid boundaries where they shouldn't be:

Update: 4/18/2012

Fixed the error on the left and right walls, and introduced a new one I can't figure out on literally every other flat surface and portions of the spheres. You can see it in this shot, which also shows some really AWESOME errors I caused on purpose while playing with floating-point error:

Update: 4/19/2012

Still broken =/

Update: 4/21/2012

I FINALLY fixed that bug. It turned out to be a typo in my manual magnitude calculation for a certain vector. I hate myself for it...

I'm experimenting at the moment with larger batches of photons, and more uniform distributions. I'd expect that to be the highlight of next week's content and progress. Looking forward even more, I hope to play with texture memory a bit for Week 8!

Update: 4/21/2012

It's amazing what a better random function can do :)
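For context, a common GPU-friendly pattern is a small integer hash seeded per photon, something in the spirit of the Wang hash below. This is a sketch of the general idea, not necessarily the exact function I'm using:

```c
#include <assert.h>
#include <stdint.h>

/* Wang hash: decorrelates consecutive integer seeds (e.g. photon IDs),
 * so neighboring work-items don't get neighboring random values. */
static uint32_t wang_hash(uint32_t seed) {
    seed = (seed ^ 61u) ^ (seed >> 16);
    seed *= 9u;
    seed = seed ^ (seed >> 4);
    seed *= 0x27d4eb2du;
    seed = seed ^ (seed >> 15);
    return seed;
}

/* Map the hash to a float in [0, 1). */
static float rand01(uint32_t seed) {
    return (float)(wang_hash(seed) & 0x00ffffffu) / (float)0x01000000u;
}
```

The point is that a cheap but well-mixed per-photon seed gives a far more uniform emission distribution than reusing a handful of correlated values.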

Sunday, April 8, 2012

Independent Study: Week 5

Almost half-way! I'm making steady progress daily. Today I'm tackling photon mapping (hopefully updating this by the end of the day with an image!), but I thought I should record a major decision that I made today.

Defining The Photon Map

To define the photon map in the C++ header, I have to give static sizes to all of its dimensions. I've decided to forego acceleration structures for the time being in favor of a more simplistic multi-dimensional array sorted by object type and various indices. The problem with that is that I have to define the limits of these dimensions statically. Fast-forward past a ton of unimportant details, and the end result is that having two structures for Spheres and Planes is kind of annoying in this process. Not to mention unnecessary. Look at these two structure definitions:


typedef struct _SpherePrimitive {
    cl_float4 center;
    cl_float radius;
    cl_uint materialIndex;
} SpherePrimitive;


typedef struct _PlanePrimitive {
    cl_float4 normal;
    cl_float originOffset;
    cl_uint materialIndex;
} PlanePrimitive;

I don't see a single reason these two technically have to be separate, which led me to this hybrid:


typedef struct _ObjectPrimitive {
    cl_float4 positionOrNormal;
    cl_float radiusOrOffset;
    cl_uint materialIndex;
} ObjectPrimitive;

I haven't integrated this yet, but when I do I'll update with the results. I predict that nothing much will change performance-wise but defining my maps and various future structures will be that much easier.

Update: 4/8/2012
I switched everything over to generic objects and saw a ~5% decrease in performance overall. I don't understand exactly why, but for the time being it's worth it for how much cleaner everything is.

I've also got most of the photon mapping done. Unfortunately the part I don't have completed is the storage format itself. I'm trying to find a middle-ground between ease-of-implementation and effectiveness for my initial implementation so that I'm not coding for hours on end just to find out that I've made a fundamental mistake and have to start over. I tried a 2-dimensional array based on object index, but without a lot of work it's just not going to pan out (initial tests not only didn't work, but took about 30 seconds per frame!!!). I think I might implement a 1-dimensional array instead and use a radix sort to keep everything in order by object index. Something like this:

struct Photon photons[SCENE_MAX_PHOTONS]; // holds all photons
uint* objectPhotonIndices; // holds the start index into "photons" for each object

This approach will also make progressively adding photons to the map much easier than a 2D array would have, as long as the total count remains below SCENE_MAX_PHOTONS.
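As a sketch of that layout, a single counting pass over object indices is enough to build the start-index table (a full radix sort generalizes this digit-by-digit; the struct fields and sizes below are illustrative only):

```c
#include <assert.h>
#include <string.h>

#define NUM_OBJECTS 3

typedef struct { unsigned objectIndex; float power; } Photon;

/* Counting sort by objectIndex: fills out[] with photons grouped by
 * object, and starts[] so that object i's photons occupy the range
 * [starts[i], starts[i+1]) in out[]. Stable, single pass over input. */
static void sort_photons(const Photon *in, int n,
                         Photon *out, unsigned starts[NUM_OBJECTS + 1]) {
    unsigned count[NUM_OBJECTS + 1];
    memset(count, 0, sizeof(count));
    for (int i = 0; i < n; i++)             /* histogram */
        count[in[i].objectIndex + 1]++;
    for (int i = 0; i < NUM_OBJECTS; i++)   /* prefix sum -> start indices */
        count[i + 1] += count[i];
    memcpy(starts, count, sizeof(count));
    for (int i = 0; i < n; i++)             /* scatter into place */
        out[count[in[i].objectIndex]++] = in[i];
}
```

With the table in hand, a lookup during shading is just a contiguous scan from starts[i] to starts[i+1], which is what makes the 1D layout attractive.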

Update: 4/9/2012
Well, decent progress this afternoon. I made the switch I mentioned earlier and it's looking promising!

The cool thing about this is that the two spheres in the middle are white.

I'm casting 10,000 photons into the scene (at the moment) uniformly from the point light located between the spheres and the blue wall. I don't have a sorting algorithm in place, so I'm arbitrarily using only the 500th through 550th photons from the list, which is why this looks relatively bad :).

Still, with only 50 photons from the list you can see some indirect light on the (100% red) ceiling, and same with the aqua floor. The spheres show the most though by far. I plan to implement a sorting algorithm for the photons next so that I can see what all 10K look like in real-time!

I'm a little worried about the performance. Two days ago I was getting 200 FPS at 1280x720 resolution. After implementing the photon mapping for 50 photons, that has dropped to ~25 FPS at 640x480. I hope to come back and find something to optimize later on!

Note: As far as optimization goes, I don't know how successful I'll be. One of the most unfortunate things about my video card is that -- being a laptop card -- it doesn't have much dedicated memory. Unlike the smaller structures, my photon map was far too large to pass to my OpenCL kernels in __constant memory, so I had to use the larger (and MUCH slower) __global address space instead. I don't know what I can do about this...

I'll be updating soon when I get a sorting algorithm implemented. I've got my eye on a particular radix implementation that made its debut in the Nvidia SDK a few months back.

Update: 4/9/2012

Debug Screenshot!

Update: 4/10/2012

How it looks when it's functioning correctly (so slow!!!)

Update: 4/12/2012

I modified the scene to be similar to the traditional Cornell Box. While playing with some debug settings and only rendering the indirect lighting I discovered how artistic these images can look. Here's one that looks almost hand-painted:

Update: 4/14/2012

I implemented really basic sorting and now I've doubled my speed!

Update: 4/14/2012

Added a random function for the photon emission, changed storage to allow all bounces of photons to be stored, and finally was able to replace all the hard-coded hacks with REAL, physically-based lighting. This is all true inverse-square-law attenuation, both from light to scene and from scene to camera:
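The attenuation itself is the easy part; the change was applying it consistently on both legs of the transport. A sketch of the arithmetic as described above (illustrative, not my actual shader code):

```c
#include <assert.h>
#include <math.h>

/* Inverse-square falloff: intensity received at distance d from a
 * point source of the given power. */
static float attenuate(float power, float d) {
    return power / (d * d);
}

/* Both legs applied in sequence: light -> surface, then the surface's
 * received intensity attenuated again over the surface -> camera leg. */
static float two_leg_attenuation(float power, float dLight, float dCam) {
    return attenuate(attenuate(power, dLight), dCam);
}
```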

Wednesday, April 4, 2012

Independent Study: Week 4

Slow progress this week because of early midterms and projects in other courses ramping up. However, I'm still well on track with the tentative schedule I put into the original proposal, which is a good thing. I plan to make a lot of progress on the material / object systems in the next few days in preparation for the photon mapping, which I hope to start implementing this weekend. For now, here's a video showing that I successfully integrated the quaternion camera I developed for a previous project into this one with moderate ease:

I think what I would like to do for photon mapping is get a very basic implementation done initially that uses something like a 1-dimensional (or perhaps 2-dimensional) sorted array for storing the photon map. That way I can get an idea of what kind of performance I have base-line, and then work on optimization. Here's the original projected schedule that I included in my proposal:

Week 1: Final decisions on architectures and hardware to be used are made.
Weeks 2-3: Basic OpenCL / CUDA Ray-Caster implemented (trivial, no bouncing)
Weeks 4-6: OpenCL / CUDA photon-map computation is implemented
Weeks 7-8: Benchmarking and Profiling, bottlenecks are identified and addressed
Weeks 9-10: Finishing touches. Final project completed.
Week 11: Formal Writeup / Final Paper completed.

It's currently week 4, so as I said I'm still well on track. I've left myself an additional 2 weeks to finish this part, but more important are the 2 weeks I've allocated for optimization after that. At the latest, that is when I hope to implement an acceleration structure such as a kd-tree so that I can get some high frame rates. One of the cool things about this approach is that while performance will be decreased by the photon mapping itself, it's ultimately limited by the ray-tracer's resolution during final gathering. That means my "real-time" photon mapper may not be real-time at 1280x720, but dropping down to 640x480 or lower could still yield decent frame rates. That's sort of cheating, however, and I'd rather just get an awesome implementation that runs well :).

Also, caustics are (temporarily) off the table on this project until I get a better idea of the performance boundaries I'm dealing with. I'd love to implement the technique if I have time and I can keep the speed up.

Update: 4/7/2012

...for loops...

Tried adding multiple lights to my scene. A single light illuminated with Phong shading gets me around 200 FPS at 1280x720 resolution. I up that to two hard-coded lights: 160 FPS... not bad at all. I keep the same two lights, but write the code with a for loop so that I can handle an arbitrary number of lights: 100 FPS! The loop iteration alone is significantly reducing my frame rates. I'm not really shocked, but it still sucks. I tried manually unrolling the loop in code and the speed actually went down, so I'm guessing OpenCL already takes care of that to some extent during compilation. Oh well; seeing as everything about this project is supposed to be about indirect lighting, I shouldn't technically need more than one light in the scene anyway. I've resorted to a preprocessor flag, USE_MULTIPLE_LIGHTS, that when defined makes the code use a for loop; otherwise it just uses the first light in the scene. I might come back to this later on and see if I can find a faster way, but I just don't see how at the moment...
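The flag-controlled shading path looks roughly like this (a toy sketch with hypothetical names; the real kernel does full Phong shading inside the loop):

```c
#include <assert.h>

#define USE_MULTIPLE_LIGHTS /* comment out for the single-light fast path */
#define NUM_LIGHTS 2

/* Sum each light's contribution; intensity[] stands in for the per-light
 * Phong terms computed in the real shader. */
static float shade_lights(const float intensity[NUM_LIGHTS]) {
#ifdef USE_MULTIPLE_LIGHTS
    float sum = 0.0f;
    for (int i = 0; i < NUM_LIGHTS; i++)
        sum += intensity[i];
    return sum;
#else
    return intensity[0]; /* no loop overhead: only the first light */
#endif
}
```

Since the flag is resolved at compile time, the single-light build pays nothing for the flexibility.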