Use instancing to send block positions in bulk

Description

Looking at the flame graph again, we're now spending most of our time in the draw_cube method, which calls set_uniform_vec3 to set the position uniform and then calls gl::DrawArrays. Neither of these commands is doing much on its own, but we're calling them a lot. With a 400x400x1 world, we're rendering 160,000 cubes per frame. At 60 frames per second, that's almost 10 million cubes per second.
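
As a quick sanity check of the numbers above, the arithmetic can be written out in a few lines of Rust. The function names are made up for illustration and are not part of the iridium codebase, and the sketch assumes every cell of the world holds exactly one block.

```rust
/// Number of cubes drawn each frame for a world of the given dimensions,
/// assuming every cell holds exactly one block.
fn cubes_per_frame(width: u64, height: u64, depth: u64) -> u64 {
    width * height * depth
}

/// Number of cubes drawn per second at a fixed frame rate.
fn cubes_per_second(cubes_per_frame: u64, frames_per_second: u64) -> u64 {
    cubes_per_frame * frames_per_second
}
```

For the 400x400x1 world, cubes_per_frame(400, 400, 1) is 160,000, and at 60 FPS that comes to 9,600,000 cubes per second, each of which cost one set_uniform_vec3 call and one gl::DrawArrays call.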

I don't think modern GPUs would struggle to render 160,000 cubes per frame. However, there is overhead in sending OpenGL lots of small chunks of work (as opposed to batching it together). This StackOverflow post and the referenced presentation suggest that the CPU cannot send data to the GPU fast enough to keep it busy. One solution is to send larger batches of information at a time.

For our world, we're rendering the same cube many times, just in different positions. We can leverage instancing in OpenGL to specify how many cubes we want to draw and supply the positions all together via a buffer. Learn OpenGL has an excellent tutorial for how to do this.
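
To illustrate what "supplying the positions all together" means on the CPU side, here is a minimal sketch that packs positions into one contiguous array of floats. The Vec3 struct is a stand-in for the crate's own type, and pack_positions and instance_count are hypothetical helper names used only for this example.

```rust
/// Stand-in for the crate's Vec3 type (an assumption for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3(f32, f32, f32);

/// Flattens positions into one contiguous [x, y, z, x, y, z, ...] array,
/// the shape a per-instance vec3 attribute buffer expects.
fn pack_positions(positions: impl Iterator<Item = Vec3>) -> Vec<f32> {
    positions.flat_map(|p| [p.0, p.1, p.2]).collect()
}

/// Each instance consumes three floats, so the instance count is len / 3.
fn instance_count(packed: &[f32]) -> usize {
    packed.len() / 3
}
```

The packed array is what gets handed to OpenGL in a single upload, and the instance count is what tells a single instanced draw call how many cubes to render.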

This change involves sending all the block positions to the renderer, which then uploads them to a vertex buffer on the GPU via glBufferData. In the shader, rather than receiving the position as a uniform, it now comes in as an input variable. To link the buffer data to the input variable in the shader, we use glVertexAttribPointer to specify the type and layout of the data (3 floats are represented as a vec3 in the shader). A call to glVertexAttribDivisor with a divisor of 1 then tells OpenGL to advance to the next position once per instance rather than once per vertex.
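
The layout described above can be modelled without touching OpenGL at all. The sketch below computes the upload size and per-instance byte offsets for tightly packed vec3 positions, and models the rule OpenGL uses to pick which array element a shader invocation reads (ignoring base-instance offsets). All function names are hypothetical and exist only for this illustration.

```rust
/// Floats per instance position (a vec3 in the shader).
const COMPONENTS: usize = 3;

/// Total bytes to upload for `num_instances` tightly packed vec3 positions.
fn buffer_size_bytes(num_instances: usize) -> usize {
    std::mem::size_of::<f32>() * COMPONENTS * num_instances
}

/// Byte offset of instance `i` when positions are tightly packed.
/// A stride of 0 passed to glVertexAttribPointer means "tightly packed",
/// which for three floats is the same as an explicit stride of 12 bytes.
fn instance_offset_bytes(i: usize) -> usize {
    i * std::mem::size_of::<f32>() * COMPONENTS
}

/// Models which array element a vertex shader invocation reads.
/// A divisor of 0 advances the attribute once per vertex; a divisor of
/// N > 0 advances it once every N instances.
fn fetch_index(vertex_id: u32, instance_id: u32, divisor: u32) -> u32 {
    if divisor == 0 {
        vertex_id
    } else {
        instance_id / divisor
    }
}
```

With a divisor of 1, all 36 vertices of instance 7 read element 7 of the position buffer, so one buffer entry positions one whole cube. For 160,000 cubes the upload is 1,920,000 bytes per frame.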

When I run this in release mode (cargo run --release), I'm back to a consistent 60 FPS on my machine. Looking at the updated flame graph, we're now spending almost all our time collecting the positions into a Rust vector each frame. We'll tackle this issue next.

Another interesting note is that the iridium:gdrv0 thread seems to have disappeared. The work in that thread likely corresponded to the overhead of the many gl::DrawArrays calls that we replaced.

Commands

git clone git@github.com:atsheehan/iridium
cd iridium
git checkout e12277802cdc64cf2dea4d89941ada090ddf5d8a
cargo run --release

Code Changes

Modified shaders/cube.vert

@@ -1,12 +1,13 @@
 #version 150
 
+in vec3 position;
+
 const float NEAR = 0.1;
 const float FAR = 10000.0;
 
 uniform vec3 camera_position;
 uniform float camera_heading;
 uniform float camera_pitch;
-uniform vec3 position;
 uniform float aspect_ratio = 1.0;
 
 out vec2 vertex_tex_coord;

Modified src/main.rs

@@ -117,9 +117,7 @@
 renderer.set_camera(world.camera());
 renderer.clear();
 
-for position in world.block_positions() {
-    renderer.draw_cube(&position);
-}
+renderer.draw_cubes(world.block_positions());
 
 renderer.present();
 fps_counter.finish_frame(current_instant);

Modified src/math.rs

@@ -28,6 +28,18 @@
 
         Self(new_x, new_y, new_z)
     }
+
+    pub(crate) fn x(&self) -> f32 {
+        self.0
+    }
+
+    pub(crate) fn y(&self) -> f32 {
+        self.1
+    }
+
+    pub(crate) fn z(&self) -> f32 {
+        self.2
+    }
 }
 
 impl Add<Vec3> for Vec3 {

Modified src/render.rs

@@ -94,6 +94,15 @@
             cube_vertex_array_id
         };
 
+        unsafe {
+            let mut position_array_id = 0;
+            gl::GenBuffers(1, &mut position_array_id);
+            gl::BindBuffer(gl::ARRAY_BUFFER, position_array_id);
+            gl::EnableVertexAttribArray(0);
+            gl::VertexAttribPointer(0, 3, gl::FLOAT, gl::FALSE, 0, std::ptr::null());
+            gl::VertexAttribDivisor(0, 1);
+        }
+
         let cube_texture_id = unsafe {
             let mut cube_texture_id = 0;
             gl::GenTextures(1, &mut cube_texture_id);
@@ -158,14 +167,26 @@
         }
     }
 
-    pub(crate) fn draw_cube(&mut self, position: &Vec3) {
-        self.cube_program.set_uniform_vec3("position", position);
+    pub(crate) fn draw_cubes(&mut self, positions: impl Iterator<Item = Vec3>) {
+        let position_buffer: Vec<f32> = positions
+            .flat_map(|position| [position.x(), position.y(), position.z()])
+            .collect();
+
+        let num_instances = position_buffer.len() / 3;
 
         unsafe {
             gl::UseProgram(self.cube_program.gl_id());
             gl::BindVertexArray(self.cube_vertex_array_id);
             gl::BindTexture(gl::TEXTURE_2D, self.cube_texture_id);
-            gl::DrawArrays(gl::TRIANGLES, 0, 36);
+
+            gl::BufferData(
+                gl::ARRAY_BUFFER,
+                (std::mem::size_of::<f32>() * 3 * num_instances) as isize,
+                position_buffer.as_ptr() as *const c_void,
+                gl::STATIC_DRAW,
+            );
+
+            gl::DrawArraysInstanced(gl::TRIANGLES, 0, 36, num_instances as GLint);
         }
     }
 