In my last post I talked about the world editor tool I've been working on for a while, and how I integrated a C++ rendering system into a C# UI. With a robust user input system now in place, I can move around throughout the editor. The next step is to let the user move objects around, edit their properties, and so on. In the tech demo I built to prove the concept of this editor, I could select an object from a list and edit its properties, but most game editors worth their salt let the user simply click on an item in the viewport to edit its properties. To achieve this, I needed two things: 1) a way to cast a ray from the camera, through the mouse position, into the world, and 2) a way to test whether that ray hits an object in the world. Neither concept is terribly complex, but having never implemented them, it was a fun challenge!
To start work on this feature, I decided to focus on the ray casting from the camera first. I already knew how to do ray-triangle intersection and figured the collision detection would be straightforward, so beginning with the task I was less confident in seemed like the right place to start. What I needed to do was take the mouse position in screen space and unproject it back into world space. In theory this should just be the reverse of projecting vertices from world space into screen space. Typically you transform a vertex from model space to screen space using the combined model, view, and projection matrices (we'll call this the MVP matrix from now on), so in theory I should have been able to multiply the screen space mouse coordinates by the inverse of that same MVP matrix. It's not quite that simple though, since the mouse coordinates have their origin at the top left corner of the screen, while Vulkan NDC coordinates have their origin at the center of the screen. This just means we need an extra step of transforming the mouse coordinates before multiplying by the inverse MVP matrix. The real problem with this method is that the process typically works with a vec3 or vec4, and we only have a vec2 for the mouse position. We need to make up a Z value for the mouse coordinate so that the mouse lands in the correct place in clip space. I initially assumed that a Z value of 0.0 would put the mouse on the near plane of the camera frustum, but that wasn't the case. Looking again at the Vulkan clip space suggested the near plane should actually be -1, but that didn't yield the right results either. The frustrating thing was that the computation produced correct results when the camera was at the origin of the world, but the error grew the further the camera moved from the origin.
After some trial and error, I landed on a Z value of +1 producing correct results for all camera positions. That should have put the mouse on the far clipping plane, which is not the result I expected. I believe the reason it works this way is that my view space transformation flips the Z axis so the camera looks down the +Z direction. This also flips clip space, placing the +1 plane near the camera and the -1 plane far from it.
Now with the screen space to world space conversion working, it was simple to cast a ray from the camera position through the world space mouse position. The next step was to implement a basic collision test to determine whether the ray strikes any of the models in the scene. I did this in two steps: first I implemented a triangle class that tests for ray-triangle intersection, then I extended the Mesh class to contain a list of triangles and added a method to test a ray against all triangles in the mesh. With that code in place, a brute force collision test against every object in the scene was simple to write. It isn't efficient though, and the next steps will be to implement a spatial partitioning scheme as well as some sort of bounding volume on the meshes, so that a coarse-grained collision pass can limit the computation each frame. With the collision implemented, it was easy to add a simple object highlight on hover to demonstrate the functionality. Once I improve the efficiency of the collision detection, I can start working on actual transformation gizmos and this editor will start to feel like a real tool!
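The brute force pass can be sketched roughly like this. Note this is a minimal, self-contained sketch, not the editor's actual code: the Vec3 struct, the Mesh template parameter, and its TestRay method are hypothetical stand-ins for my real types. The key detail is keeping the closest hit, so objects in front occlude objects behind them:

```cpp
#include <cfloat>
#include <vector>

struct Vec3 { float x, y, z; };

// Brute force picking pass: test the ray against every mesh in the scene
// and return the one with the nearest intersection. Mesh is any type with
// a TestRay(origin, direction, out_t) method reporting the hit distance.
template <typename Mesh>
const Mesh* PickClosest(const std::vector<Mesh>& meshes,
                        const Vec3& ray_origin, const Vec3& ray_direction)
{
    const Mesh* closest = nullptr;
    float closest_t = FLT_MAX;
    for (const Mesh& mesh : meshes)
    {
        float t;
        // Keep only the nearest hit so front objects occlude those behind
        if (mesh.TestRay(ray_origin, ray_direction, &t) && t < closest_t)
        {
            closest_t = t;
            closest = &mesh;
        }
    }
    return closest;
}
```

A bounding-volume or spatial-partition pass would slot in right before the TestRay call, rejecting most meshes cheaply before any per-triangle work happens.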
Check out this video demo showing off the new feature, and don't forget to check out the code snippets below!
Here's a peek at how the mouse position is transformed from screen space to world space:
glm::mat4 projection_matrix = glm::perspective(glm::radians(60.f), (float)g_width / (float)g_height, 0.1f, 10000.f);
glm::mat4 view_matrix = m_currentCamera->GetViewMatrix();
// Cast mouse position from screen space to world space
POINT mouse_position = InputState::GetMousePosition();
// Convert integer points to floating point
float mouse_x = mouse_position.x;
float mouse_y = mouse_position.y;
// Convert from screen space to Vulkan NDC space
mouse_x = (2.f * (mouse_x / (float)g_width)) - 1.f;
mouse_y = (2.f * (mouse_y / (float)g_height)) - 1.f;
// Z must be at 1 for this unproject to work since we flip the Z axis
// in the camera view matrix
glm::vec4 screen_space_mouse = glm::vec4(mouse_x, mouse_y, 1.f, 1.f);
// Unproject the screen_space_mouse position back into world space
glm::mat4 pv = projection_matrix * view_matrix;
screen_space_mouse = glm::inverse(pv) * screen_space_mouse;
// Perspective divide by W to complete the unprojection
screen_space_mouse /= screen_space_mouse.w;
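For completeness, here's roughly how the unprojected point becomes the actual pick ray. This is a minimal sketch using a plain Vec3 struct so it stands alone; in the editor this is just glm::normalize applied to the difference between the unprojected mouse position and the camera position:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// The pick ray starts at the camera and passes through the unprojected
// mouse position, so its direction is just the normalized difference.
Vec3 MakePickRayDirection(const Vec3& camera_pos, const Vec3& world_mouse)
{
    Vec3 d = { world_mouse.x - camera_pos.x,
               world_mouse.y - camera_pos.y,
               world_mouse.z - camera_pos.z };
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    d.x /= len;
    d.y /= len;
    d.z /= len;
    return d;
}
```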
Here's the code to check if a ray intersects a triangle:
bool c_Triangle::DoesRayIntersect(glm::vec3 ray_origin, glm::vec3 ray_direction, glm::mat4 transform, glm::vec3* out_hit_point)
{
    // Transform the points from model space into world space
    // Lots of potential for optimization here
    glm::mat4 normal_transform = glm::mat4(glm::inverseTranspose(glm::mat3(transform)));
    glm::vec3 verts[3];
    memcpy(verts, m_vertices, sizeof(glm::vec3) * 3);
    for (int i = 0; i < 3; ++i)
    {
        verts[i] = transform * glm::vec4(verts[i], 1.f);
    }
    // W = 0 here: the normal is a direction, not a position
    glm::vec3 normal = normal_transform * glm::vec4(m_surface_unit_normal, 0.f);
    // Distance from a point to a plane is D = (P - Q) dot SurfaceNormal,
    // where P is the point and Q is any point in the plane.
    // In this case Q can be any of the three vertices, and we can substitute
    // the equation of the ray (origin + t * direction) for P. We then solve
    // this equation for D = 0.
    float ray_dir_dot_norm = glm::dot(ray_direction, normal);
    if (fabs(ray_dir_dot_norm) < 0.0001f)
    {
        // ray_dir_dot_norm is close enough to 0 that the ray is parallel
        // to the triangle's plane, no hit
        return false;
    }
    float point_dot_norm = glm::dot(verts[0], normal);
    float ray_origin_dot_norm = glm::dot(ray_origin, normal);
    float t = (point_dot_norm - ray_origin_dot_norm) / ray_dir_dot_norm;
    if (t < 0.0001f)
    {
        // If t is negative the hit is behind the ray origin
        return false;
    }
    // Need to make sure the hit lies within the triangle
    glm::vec3 hit_point = ray_origin + t * ray_direction;
    glm::vec3 R1 = glm::cross(hit_point - verts[0], verts[1] - verts[0]);
    glm::vec3 R2 = glm::cross(hit_point - verts[1], verts[2] - verts[1]);
    glm::vec3 R3 = glm::cross(hit_point - verts[2], verts[0] - verts[2]);
    float D1 = glm::dot(normal, R1);
    float D2 = glm::dot(normal, R2);
    float D3 = glm::dot(normal, R3);
    if (D1 >= 0.f && D2 >= 0.f && D3 >= 0.f)
    {
        if (out_hit_point != nullptr)
        {
            *out_hit_point = hit_point;
        }
        return true;
    }
    return false;
}
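If you want to play with the intersection test outside the engine, here's a self-contained sketch of the same plane-then-inside-test approach using a minimal Vec3 instead of GLM. One assumption to flag: it uses the common cross(edge, point - vertex) convention for the inside test, so the vertices must be wound counter-clockwise when viewed from the side the normal points toward:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b)  { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Intersect the ray with the triangle's plane, then check the hit point
// lies on the inner side of all three edges.
bool RayHitsTriangle(Vec3 origin, Vec3 direction,
                     const Vec3 verts[3], Vec3 normal, Vec3* out_hit)
{
    float denom = Dot(direction, normal);
    if (std::fabs(denom) < 1e-4f)
        return false; // Ray is parallel to the plane
    float t = (Dot(verts[0], normal) - Dot(origin, normal)) / denom;
    if (t < 1e-4f)
        return false; // Plane is behind the ray origin
    Vec3 hit = { origin.x + t * direction.x,
                 origin.y + t * direction.y,
                 origin.z + t * direction.z };
    for (int i = 0; i < 3; ++i)
    {
        Vec3 edge = Sub(verts[(i + 1) % 3], verts[i]);
        Vec3 to_hit = Sub(hit, verts[i]);
        if (Dot(normal, Cross(edge, to_hit)) < 0.f)
            return false; // Hit point is outside this edge
    }
    if (out_hit != nullptr)
        *out_hit = hit;
    return true;
}
```

For example, a ray fired straight down at a unit triangle in the Z = 0 plane reports a hit at the expected point, while a ray offset outside the triangle, or pointing away from it, reports a miss.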