
Take a look at the Mesh Drawing Pipeline in UE 4.22

Preface


I have written several articles (including a chapter in my book) explaining the rendering procedure of Unreal Engine 4. However, things changed in UE 4.22: Epic refactored the rendering pipeline and gave the result a new name, the Mesh Drawing Pipeline. I really recommend reading the original document.
I want to take some time to talk about the new pipeline. The official document is written for professional developers and marks out almost all the essential points. I read every word carefully to grasp the idea behind it, and then decided to write this blog post. I want to cover the architecture in detail rather than just the parts that changed. I'm not a native English speaker, so if I have written any confusing sentences, please tell me :D

The mini-map of rendering

I want to briefly walk through how rendering works in UE4 so that the rest of this post is easier to follow. There are really two sides to rendering: the `how` and the `why`. The how is about the steps that gather data from game objects and build a beautiful image to display. The why is about the formulas and math behind these steps, mostly in the shaders. I will only talk about the first. For the second, you can find many documents and slides about physically based rendering in Unreal Engine 4.

In Unreal Engine, the main loop ticks the whole game world repeatedly until you quit. Rendering, however, is triggered by the update of the viewport, and updating the visual content of the game viewport is not part of the world tick. That's why there are at least two threads: the game thread and the render thread.

When a frame update is requested, the render thread starts gathering rendering information from the visible game objects inside the viewport area. Think of it this way: there are two worlds in the game. One is the logical world, which contains all actors and components. They exist wherever you are. The other is the rendering world, which includes triangles, materials, and effects. These are the 'visual representation' of the visible game objects, but they are not game objects themselves.
"Form is emptiness." (sutra)
After this gathering step, the two worlds are split. The render thread can do its own processing without damaging the logical world, and the game thread can update the transform matrix of any object without worrying about disturbing the render thread's data.
The render thread then culls invisible objects. After that, it draws the objects in a series of steps, each called a rendering pass. For example, the deferred renderer first renders the objects into the G-Buffer, and then calculates lighting only for visible objects based on the G-Buffer information. When these passes have finished, the render target contains the final output.
Inside a pass, the renderer draws only the game objects needed by that pass. For example, the BasePass does not draw transparent objects. For each object it needs to draw, the renderer looks up the corresponding shaders in the material's shader map. That's why a material produces a set of shaders instead of just one vertex shader and pixel shader pair. The renderer also takes the uniform parameters and the rendering pipeline state into account. Finally, a DrawCall is sent to the GPU.
All communication with the GPU goes through RHI commands. The RHI is a thin abstraction layer over specific rendering APIs such as DirectX 11 or OpenGL. The renderer fills in an RHI command list and asks the RHI thread to talk to the GPU.
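To make the "record now, execute later" idea concrete, here is a minimal sketch in plain C++. It is not engine code: `CommandList`, `SetShaderCmd`, `DrawIndexedCmd`, and `LoggingBackend` are names I made up, and the "graphics API" is just a list of strings so the example stays self-contained.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical command types; real RHI commands are far richer.
struct SetShaderCmd { int ShaderId; };
struct DrawIndexedCmd { int IndexCount; };
using Command = std::variant<SetShaderCmd, DrawIndexedCmd>;

// The renderer only records commands; it never calls the graphics API itself.
struct CommandList {
    std::vector<Command> Commands;

    void SetShader(int ShaderId) { Commands.push_back(SetShaderCmd{ShaderId}); }
    void DrawIndexed(int IndexCount) {
        assert(IndexCount > 0);
        Commands.push_back(DrawIndexedCmd{IndexCount});
    }
};

// A backend translates recorded commands into API-specific calls.
// Here the "API" is a log of strings (an assumption made for this sketch).
struct LoggingBackend {
    std::vector<std::string> Log;

    void operator()(const SetShaderCmd& C) {
        Log.push_back("SetShader " + std::to_string(C.ShaderId));
    }
    void operator()(const DrawIndexedCmd& C) {
        Log.push_back("DrawIndexed " + std::to_string(C.IndexCount));
    }
};

// Replays the recorded list on a backend, as another thread could do later.
template <typename Backend>
void Execute(const CommandList& List, Backend& B) {
    for (const Command& C : List.Commands) {
        std::visit(B, C);
    }
}
```

The point of the design is that the code filling the list does not need to know which backend will replay it, so the same recorded list could be handed to a DirectX-style or an OpenGL-style backend.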

This is just a mini-map of the rendering architecture, written to set up the next parts. For the details, please check the official rendering document.

The mesh drawing pipeline

Let's start with this huge map of the new mesh drawing pipeline. It is almost a copy of the image from the official document. The four circled numbers correspond to the chapters in this section. The whole map answers one question:
How does a UPrimitiveComponent (such as UStaticMeshComponent) turn into a DrawCall?
We are familiar with components in Unreal Engine: if we add a StaticMeshComponent to an Actor, we see a mesh displayed on the screen. We also know that to draw something on the GPU we need to set the parameters, the pipeline states, the vertex buffer, the index buffer, and the shaders, and then send a DrawCall. This is almost the same in DirectX and OpenGL. So what happens in between?

0. From game thread to render thread

As I said in the first section, the render thread works on a representation of the logical game object (for example, a UStaticMeshComponent). This representation is called a scene proxy, and it contains the data needed for rendering. The render thread deals only with scene proxies, never with game objects. As you can see, the class name has the prefix 'F', which means it is not a UObject.
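The relationship between a game object and its scene proxy can be sketched like this. It is a simplified illustration under my own assumptions, not the engine's implementation: `GameObject`, `RenderProxy`, and `CreateProxy` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a logical game object owned by the game thread.
struct GameObject {
    std::string MeshName;
    float X, Y, Z;  // position, updated by gameplay code
};

// Hypothetical stand-in for the render-side representation.
// It stores copies of the data, not pointers back into the game object.
struct RenderProxy {
    std::string MeshName;
    float X, Y, Z;
};

// Takes the snapshot. After this call the two worlds are independent:
// changing the game object does not touch the proxy.
RenderProxy CreateProxy(const GameObject& Object) {
    assert(!Object.MeshName.empty() && "nothing to render without a mesh");
    return RenderProxy{Object.MeshName, Object.X, Object.Y, Object.Z};
}
```

For example, if gameplay code moves the `GameObject` after `CreateProxy` has run, the `RenderProxy` still holds the old position until a new update is sent across, which is exactly why the two threads do not step on each other.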

1. From scene proxy to mesh batch

A mesh batch is the bridge between a scene proxy and the mesh passes. As the document says:
FMeshBatch decouples the FPrimitiveSceneProxy implementation (user code) from mesh passes (private renderer module). It contains everything the pass needs to figure out final shader bindings and render state, so the proxy never knows what passes it will be rendered in.
As you can see, an FMeshBatch has an array of FMeshBatchElement, a vertex factory, which can be thought of as a vertex buffer, and a material render proxy that represents the material.
Sometimes a mesh batch needs more than one element. Some platforms do not support instancing, so the engine instead splits the instances into many batch elements to draw. In this situation, the FMeshBatchElement array stores the additional elements. Usually, though, only the first element is used.
FMaterialRenderProxy is the top of a three-level structure that leads to the real shaders. First, it contains a set of FMaterial and can select a different material for a given RHI feature level. Then, each FMaterial has an FMaterialShaderMap, which contains many FShaders.
In Unreal Engine, each material is compiled into a set of shaders instead of just one. That is because the engine tries to provide the most optimized shader for each pass, each platform, and various other conditions. For example, a static switch node in the material is compiled into two shader versions instead of an 'if-else' statement. The FShader structure holds the pixel shader, the vertex shader, and so on.
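The three-level lookup can be sketched with ordinary maps. Everything here is a hypothetical simplification: the enum values, the `(pass, static switch)` key, and `FindShader` are my own inventions to show the shape of the structure, not the engine's actual types or permutation scheme.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// All names below are hypothetical simplifications.
enum class FeatureLevel { Mobile, Desktop };
enum class PassType { Depth, Base };

struct Shader { std::string Name; };

// Level 3: one compiled shader per (pass, static-switch value) permutation.
using ShaderKey = std::pair<PassType, bool>;
struct ShaderMap {
    std::map<ShaderKey, Shader> Shaders;
};

// Level 2: a material compiled for one feature level owns one shader map.
struct Material { ShaderMap Map; };

// Level 1: the render proxy chooses a material by feature level.
struct MaterialRenderProxy {
    std::map<FeatureLevel, Material> Materials;

    const Shader& FindShader(FeatureLevel Level, PassType Pass, bool bSwitchOn) const {
        auto MaterialIt = Materials.find(Level);
        assert(MaterialIt != Materials.end());
        const auto& Shaders = MaterialIt->second.Map.Shaders;
        auto ShaderIt = Shaders.find(ShaderKey{Pass, bSwitchOn});
        assert(ShaderIt != Shaders.end());
        return ShaderIt->second;
    }
};
```

Note how the static switch becomes part of the lookup key: the branch is resolved when choosing the shader, so neither compiled version has to contain an 'if-else'.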

2. From one mesh batch to mesh draw commands

As I said, one render process consists of more than one pass, so it is reasonable to split an FMeshBatch into smaller structures. That structure is FMeshDrawCommand.
FMeshDrawCommand is an interface between FMeshBatch and the RHI. It’s a fully stateless draw description that stores everything that the RHI needs to know about a mesh draw: Which shaders to use. Their resource bindings. Draw call parameters.
So, as the document says, a mesh batch contains everything needed to draw a UPrimitiveComponent, while a mesh draw command contains everything needed to draw it in one pass. A pass processor is used to split the mesh batch into mesh draw commands.
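Here is a rough sketch of what a pass processor does. The types are heavily reduced and the shader naming scheme is invented for illustration; the only idea I want to show is that one batch fans out into one command per relevant pass, with the shaders already resolved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily reduced versions of the structures in the text.
enum class PassType { Depth, Base, Translucency };

struct MeshBatch {
    std::string MaterialName;
    bool bTranslucent;
    int VertexBufferId;
    int IndexCount;
};

// Everything one pass needs to draw the mesh; shaders are already chosen.
struct MeshDrawCommand {
    PassType Pass;
    std::string VertexShader;
    std::string PixelShader;
    int VertexBufferId;
    int IndexCount;
};

// One processor per pass decides whether the batch is relevant and,
// if so, emits a command with pass-specific shaders.
struct PassProcessor {
    PassType Pass;

    void AddMeshBatch(const MeshBatch& Batch, std::vector<MeshDrawCommand>& Out) const {
        assert(Batch.IndexCount > 0);
        const bool bTranslucentPass = (Pass == PassType::Translucency);
        if (Batch.bTranslucent != bTranslucentPass) {
            return;  // this batch is not drawn in this pass
        }
        // Invented naming scheme, only to show that shaders differ per pass.
        const std::string Suffix = (Pass == PassType::Depth) ? "_Depth" : "_Color";
        Out.push_back({Pass,
                       Batch.MaterialName + "_VS" + Suffix,
                       Batch.MaterialName + "_PS" + Suffix,
                       Batch.VertexBufferId,
                       Batch.IndexCount});
    }
};
```

In this sketch an opaque batch produces two commands (depth and base) and none for the translucency pass, which mirrors the earlier remark that the BasePass does not draw transparent objects.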

All of this happens in InitViews(). Beforehand, the renderer performs CPU culling and stores the visibility information in the pass-related visibility map. It then calls SetupMeshPass after GatherDynamicMeshElements.
Here we can see the process. Notice that the command contains only one pointer to a specific shader.

3. From a mesh draw command to RHI

This is done in the RenderPass code. Now it's time to turn an FMeshDrawCommand object into a DrawCall. That means translating it into a set of state-setting API calls, such as setting the shader pointers, the parameters, and the flags. Finally, in step (4), it calls DrawIndexedPrimitive and sends the draw to the GPU.
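A minimal sketch of this submission step could look like the following. The types and the string log are hypothetical, and skipping state that is already bound is my own assumption about how such a submitter would typically avoid redundant calls, not something taken from the text above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reduced draw command and state cache; not engine types.
struct MeshDrawCommand {
    int ShaderId;
    int VertexBufferId;
    int IndexCount;
};

struct StateCache {
    int ShaderId = -1;
    int VertexBufferId = -1;
};

// Turns one command into state-setting calls followed by the draw itself.
// State that is already bound is skipped (an assumption of this sketch).
// Calls are logged as strings so no real graphics API is needed.
void SubmitDraw(const MeshDrawCommand& Cmd, StateCache& Cache,
                std::vector<std::string>& Calls) {
    assert(Cmd.IndexCount > 0);
    if (Cache.ShaderId != Cmd.ShaderId) {
        Calls.push_back("SetShader " + std::to_string(Cmd.ShaderId));
        Cache.ShaderId = Cmd.ShaderId;
    }
    if (Cache.VertexBufferId != Cmd.VertexBufferId) {
        Calls.push_back("SetVertexBuffer " + std::to_string(Cmd.VertexBufferId));
        Cache.VertexBufferId = Cmd.VertexBufferId;
    }
    Calls.push_back("DrawIndexedPrimitive " + std::to_string(Cmd.IndexCount));
}
```

Because the command is a plain description with no hidden state, submitting it is just a linear walk over its fields.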

Why faster?

Epic's documents give three reasons:
  1. Cache
  2. Parallel
  3. Merge

Cache

This is the image from the official documents. It shows the three-level caching system. Particles, which need to update their mesh batches every frame, go through the upper path. A UStaticMesh goes through the lower path, which means its mesh batches and all the draw commands built from them can be cached.
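The difference between the two paths can be sketched as follows. This is a simplification with made-up names (`Scene`, `CommandBuilder`), and it only models two paths rather than the full three-level system: static meshes build their commands once, dynamic ones rebuild every frame.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct MeshDrawCommand { int MeshId; int ShaderId; };

// Hypothetical builder that counts how often the expensive work runs.
struct CommandBuilder {
    int BuildCount = 0;

    MeshDrawCommand Build(int MeshId) {
        assert(MeshId >= 0);
        ++BuildCount;  // stands in for shader lookup, binding setup, and so on
        return MeshDrawCommand{MeshId, MeshId % 4};
    }
};

struct Scene {
    CommandBuilder Builder;
    std::unordered_map<int, MeshDrawCommand> CachedStatic;
    std::vector<int> DynamicMeshIds;

    // Static path: build once when the mesh is added, reuse every frame.
    void AddStaticMesh(int MeshId) { CachedStatic.emplace(MeshId, Builder.Build(MeshId)); }

    // Dynamic path (for example particles): nothing is cached.
    void AddDynamicMesh(int MeshId) { DynamicMeshIds.push_back(MeshId); }

    std::vector<MeshDrawCommand> GatherFrameCommands() {
        std::vector<MeshDrawCommand> Out;
        for (const auto& Entry : CachedStatic) {
            Out.push_back(Entry.second);  // cached, no rebuild
        }
        for (int MeshId : DynamicMeshIds) {
            Out.push_back(Builder.Build(MeshId));  // rebuilt each frame
        }
        return Out;
    }
};
```

With two static meshes and one dynamic mesh, the builder runs twice at setup and then only once per frame, however many frames are rendered.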

Parallel

A lot of parallelism is used in this process. Please check the source code.

Merge

I think this is the most exciting part. It has proved to be a great way to significantly reduce the number of draw calls without any extra work from developers. The basic idea is to merge draw calls that share the same shader bindings into a single instanced draw.

Nowadays almost all GPUs in PCs and consoles support instanced drawing.
Instancing on the GPU can be much faster than sending multiple draw calls from the CPU. There is a plug-in on the Marketplace that automatically merges identical static meshes into an instanced static mesh, but merging at the rendering level has more advantages. For example, if we want to change some part of the scene after using the plug-in to merge it, that will be challenging.
Currently, UE 4.22 merges only the draw calls that meet specific conditions. Please check the table in the official documents.

However, Epic takes a really interesting step forward: merged primitives keep the same parameter interface. With plug-in merging, you need to change your material expressions if they use per-object parameters. For example, you can no longer use ActorWorldPosition, because everything has been merged into one actor. By storing all these primitive-level parameters in a structure called GPUScene, the engine keeps the same interface for material authors.
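The merge idea, together with the per-primitive lookup, can be sketched like this. The types are hypothetical and the merge key (shader plus vertex buffer) is a simplification I chose for the example; the real conditions are the ones in Epic's table.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical reduced types; the real merge conditions are in Epic's table.
struct MeshDrawCommand {
    int ShaderId;
    int VertexBufferId;
    int PrimitiveId;  // index into the per-primitive data below
};

struct InstancedDraw {
    int ShaderId;
    int VertexBufferId;
    std::vector<int> PrimitiveIds;  // one entry per instance
};

// Stand-in for the idea behind GPUScene: per-primitive parameters live in
// one array, and each instance finds its own values by primitive id.
struct PrimitiveData { float WorldX, WorldY, WorldZ; };

// Groups commands with identical bindings into one instanced draw each.
std::vector<InstancedDraw> MergeDraws(const std::vector<MeshDrawCommand>& Commands) {
    std::map<std::pair<int, int>, InstancedDraw> Groups;
    for (const MeshDrawCommand& Cmd : Commands) {
        assert(Cmd.PrimitiveId >= 0);
        InstancedDraw& Draw = Groups[std::make_pair(Cmd.ShaderId, Cmd.VertexBufferId)];
        Draw.ShaderId = Cmd.ShaderId;
        Draw.VertexBufferId = Cmd.VertexBufferId;
        Draw.PrimitiveIds.push_back(Cmd.PrimitiveId);
    }
    std::vector<InstancedDraw> Out;
    for (auto& Entry : Groups) {
        Out.push_back(std::move(Entry.second));
    }
    return Out;
}
```

Each instance keeps its own primitive id after merging, so a per-object value such as a world position can still be looked up per instance. That is the property that lets material authors keep using the same parameters.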

Memory Efficient 

Special thanks to Mr. Wang and Mr. Chai from Epic Games for their excellent comments. This part is based on their great advice.

The previous architecture was object-oriented, which means the rendering data was scattered across different places in memory. In the new architecture, the pass processor gathers all mesh draw commands of the current pass into tightly packed memory (TChunkedArray<FMeshDrawCommand>).
This means the renderer doesn't need to jump between different positions in memory during iteration, which significantly improves cache efficiency.
Cache prefetching is a technique used by computer processors to boost execution performance by fetching instructions or data from their original storage in slower memory to a faster local memory before it is actually needed 
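To give a feeling for the memory layout, here is a simplified chunked array written from scratch. It only sketches the general idea of storing elements in fixed-size chunks; it is not the engine's TChunkedArray, and the class and its methods are my own assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// A simplified chunked array: elements are stored in fixed-size chunks, so
// neighbours inside a chunk sit next to each other in memory, and growing
// the container never moves elements that already exist.
template <typename T, std::size_t ChunkSize>
class ChunkedArray {
public:
    T& Add(const T& Value) {
        if (Count % ChunkSize == 0) {
            Chunks.push_back(std::make_unique<std::array<T, ChunkSize>>());
        }
        T& Slot = (*Chunks[Count / ChunkSize])[Count % ChunkSize];
        Slot = Value;
        ++Count;
        return Slot;
    }

    const T& operator[](std::size_t Index) const {
        assert(Index < Count);
        return (*Chunks[Index / ChunkSize])[Index % ChunkSize];
    }

    std::size_t Num() const { return Count; }

private:
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> Chunks;
    std::size_t Count = 0;
};

struct MeshDrawCommand { int ShaderId; int IndexCount; };
```

Iterating such a container touches memory in long sequential runs, one chunk after another, instead of following a separate pointer for every object.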

Summary

This was just a brief look at Unreal's new Mesh Drawing Pipeline. The ideas behind Epic Games' rendering architecture are always worth learning. I will keep writing as I dig into the source code, and I am always happy to discuss with readers. Please do leave comments.
