I've been spending the last couple weeks working on performance optimizations.
The area I've been looking at, mainly, is where I store what cubes are visible in a chunk. I'm currently using a Dictionary<Vector3,bool> with the Vector3 being a position of the cube and the bool as just a throw-away variable. It's the smallest possible type as it only takes up 1 byte. The reason I'm using a Dictionary is I want to be able to do a lookup without having to search through the whole array. Dictionaries are indexed and have an O(1) key search, but they also use a good amount of overhead. I decided to test different types of dictionaries and see how much space they take. I created a trio of nested for loops to insert 32,768 objects in the Dictionary (the size of a full chunk). By changing the types, I was able to see the amount of bytes each Dictionary took. Here are my results:
18,568 dictionary<string,bool>
15,444 dictionary<vector3,bool>
15,424 dictionary<vector3,int/int16>
13,620 dictionary<int,bool>
13,612 dictionary<int16,bool>
13,608 dictionary<int,int>
13,588 dictionary<int16,int16>
I knew the <string,bool> would be high, but I tested it anyway for good measure. My current solution is 2nd highest (<vector3,bool>) and the lowest solution simply isn't feasible (<int16,int16>). What is feasible, though, is the <int,int> or <int,bool> solutions. The only problem is that it'd have to be a 1-way conversion.
When I instantiate a chunk, I go through every possible cube position x,y,z and check if a cube is supposed to be there. If it is, I put it into an Dictionary<Vector3,bool> called allCubesVisible. After that's all done, I go to the next step which is detect which cube faces should be displayed. I use a foreach(Vector3 location in allCubesVisible.Keys) to go through the dictionary and build the faces. If a cube is visible, I add it to a master Dictionary<Vector3,bool> called allCubes. allCubes only has cubes with visible faces. From here on out, allCubesVisible becomes a lookup table and nothing else (it's cheaper to check if a key exists in a dictionary than it is to run the Perlin Noise algorithm each time).
So my thinking is that once I'm done figuring out the initial set of visible cubes, I can convert my allCubesVisible dictionary from <vector3,bool> to <int,int>. Since I won't need to go through the allCubesVisible through a foreach anymore, I can convert it without looking back. I found a formula online that I can use:
[edit: that formula doesn't work. It generates duplicates. Time to figure out a new one.]
I don't think there's a way to go back to a Vector3 position from that, but it'd still give me a unique index for each position that I can use as a lookup.
So anyway, I'm toying around with a lot of ideas. One of the concepts I toyed with is destroying the allCubesVisible Dictionary and just running the Perlin Noise function check each time. This resulted in a drastically lower memory profile, but chunks went from taking 100-200ms to be built to 4-500ms. It let me build a lot more cubes though and I got several nice screenshots from it.
Wish I had a skybox so the in-game screenshot didn't look so bad.. maybe I'll take a break from optimization and dip my feet into .02a to add a skybox.
The area I've been looking at, mainly, is where I store what cubes are visible in a chunk. I'm currently using a Dictionary<Vector3,bool> with the Vector3 being a position of the cube and the bool as just a throw-away variable. It's the smallest possible type as it only takes up 1 byte. The reason I'm using a Dictionary is I want to be able to do a lookup without having to search through the whole array. Dictionaries are indexed and have an O(1) key search, but they also use a good amount of overhead. I decided to test different types of dictionaries and see how much space they take. I created a trio of nested for loops to insert 32,768 objects in the Dictionary (the size of a full chunk). By changing the types, I was able to see the amount of bytes each Dictionary took. Here are my results:
18,568 dictionary<string,bool>
15,444 dictionary<vector3,bool>
15,424 dictionary<vector3,int/int16>
13,620 dictionary<int,bool>
13,612 dictionary<int16,bool>
13,608 dictionary<int,int>
13,588 dictionary<int16,int16>
I knew the <string,bool> would be high, but I tested it anyway for good measure. My current solution is 2nd highest (<vector3,bool>) and the lowest solution simply isn't feasible (<int16,int16>). What is feasible, though, is the <int,int> or <int,bool> solutions. The only problem is that it'd have to be a 1-way conversion.
When I instantiate a chunk, I go through every possible cube position x,y,z and check if a cube is supposed to be there. If it is, I put it into an Dictionary<Vector3,bool> called allCubesVisible. After that's all done, I go to the next step which is detect which cube faces should be displayed. I use a foreach(Vector3 location in allCubesVisible.Keys) to go through the dictionary and build the faces. If a cube is visible, I add it to a master Dictionary<Vector3,bool> called allCubes. allCubes only has cubes with visible faces. From here on out, allCubesVisible becomes a lookup table and nothing else (it's cheaper to check if a key exists in a dictionary than it is to run the Perlin Noise algorithm each time).
So my thinking is that once I'm done figuring out the initial set of visible cubes, I can convert my allCubesVisible dictionary from <vector3,bool> to <int,int>. Since I won't need to go through the allCubesVisible through a foreach anymore, I can convert it without looking back. I found a formula online that I can use:
index = x + (y * chunkSize) + (z * chunkSize * chunkSize);
[edit: that formula doesn't work. It generates duplicates. Time to figure out a new one.]
I don't think there's a way to go back to a Vector3 position from that, but it'd still give me a unique index for each position that I can use as a lookup.
So anyway, I'm toying around with a lot of ideas. One of the concepts I toyed with is destroying the allCubesVisible Dictionary and just running the Perlin Noise function check each time. This resulted in a drastically lower memory profile, but chunks went from taking 100-200ms to be built to 4-500ms. It let me build a lot more cubes though and I got several nice screenshots from it.
Neat chain of islands off the coast. |
Mountain range out in the distance. |
Wish I had a skybox so the in-game screenshot didn't look so bad.. maybe I'll take a break from optimization and dip my feet into .02a to add a skybox.