MadSharp: Unsafe – Telegram
MadSharp: Unsafe
233 subscribers
8 photos
1 video
16 links
The channel is all about unobvious C#/Unity hacks and optimizations.
Blog: meetemq.com
Starting in 3 minutes!
Interesting discovery about Unity when using Vulkan.
See, Vulkan has two entry points for resolving functions: vkGetInstanceProcAddr and vkGetDeviceProcAddr.
vkGetInstanceProcAddr returns functions for a given VkInstance.
vkGetDeviceProcAddr, on the other hand, returns functions for a given VkDevice OR a device child, e.g. VkQueue or VkCommandBuffer.
According to the docs, vkGetDeviceProcAddr is preferred when resolving device/device-child functions, because the returned address incurs less overhead (probably because it doesn't need to resolve the child from the VkInstance).

So, Unity requests vkGetDeviceProcAddr from Vulkan (or from a native plugin if hooked with InterceptVulkanInitialization), but never actually uses it.
That means all of the device/device-child functions carry that extra overhead, e.g. the vkCmd* functions, the vkQueue* functions, etc. And those functions are called with insane frequency: draw calls, uploading constants, binding buffer ranges and so on.

The good thing is that we can reroute vkGetInstanceProcAddr to return the same pointers vkGetDeviceProcAddr would, via a native plugin. This can give a potential performance increase when the event count is high.

How much performance? Well, you never know before you try. I think for 10K calls, say, on Android it could be measurable, like 0.5ms or something, but that's just my speculation.
Understanding GPU Virtual Addressing and Sparse Images/Buffers

Since the days of DirectX 11, and possibly even earlier, it's been possible to allocate memory on a GPU without actually using physical memory right away. But what does this mean?

Imagine you can create a buffer with a size of 64GB, even if your GPU only has 4GB of actual VRAM. How is this possible?

This works similarly to how virtual addresses work on a CPU. When you ask the operating system for memory, it doesn't immediately use real physical memory (RAM). Instead, it gives you a virtual address. The actual physical memory is only used when you start using that memory.

When you create a sparse buffer on a GPU, it only allocates a mapping table that looks something like this:
Page 0 = Address0
Page 1 = Address1
...


If you try to read this memory before it is backed by real memory, it will return zero because the memory doesn't actually exist yet.

Next, you allocate real, physical memory. This memory is usually aligned in pages (typically 64KB on modern GPUs). For example, let's say we allocate 2 pages, which equals 128KB. Then, we can bind these pages to the virtual address.

You can tell the GPU: "Bind my BufferAddress + 1GB (16384 pages) to the start of my allocated data." The mapping table then updates like this:

Page 16383 = NULL [previous value]
Page 16384 = AllocatedData + 0
Page 16385 = AllocatedData + 65536 Bytes
Page 16386 = NULL [previous value]


After binding the real memory to the virtual address, you can read or write to it in your shaders, compute passes, etc. Essentially, your 64GB buffer only takes up the size of the mapping table plus the 128KB of allocated real memory.
Implemented bindless for Unity. Compute and fragment shader support for now. No weird trickery with compiling shaders manually or anything else, just normal Unity shaders and a native plugin.
Unity LockBufferForWrite: When You Should Prefer It and Why Your Choice Matters

I wrote a short overview post on when to use LockBufferForWrite and when not to, and how it differs from a regular GraphicsBuffer.

https://meetemq.com/2025/01/26/lockbufferforwrite-vs-other-buffer-types/
Compiling Shaders on Device with Fully Dynamic Shader

The idea is that in Unity you can load a shader asset via an asset bundle. In this asset, there is either compiled bytecode (for Metal or Vulkan, as well as DX11 DXBC / DX12 DXIL) or text.
The binary format of the asset bundle is known — there are plenty of open-source rippers on GitHub.
The binary format of the shader is also known.

This leaves only compiling the shader. The simplest case is when you have an Android device, your code is in GLSL, and you only need OpenGL ES. In that case, simply write the text into the shader asset.

[First, you will need to add the available shader variants to the Shader asset, since they are always stored; this has to be done in any case.]

It's more complicated when you have GLSL and Vulkan:
You then need to build a GLSL-to-SPIR-V compiler (glslang) for Android. It's written in C++, so there shouldn't be any issues.
If you prefer HLSL, feel free to port DXC to Android. That shouldn't be too hard either.

The resulting output is also written into the shader inside the asset bundle.

Then load the asset bundle into Unity.

??????

PROFIT!
A long time ago I made a proof of concept for bindless textures in Unity.
Now it's open sourced and available for public use!

Bindless resources are the core of any GPU Driven Rendering pipeline (along with MDI).
An MDI plugin may follow if there are requests.

https://github.com/Meetem/DX12BindlessUnity
Should I make an HPC# compiler? E.g. C# (MSIL) -> native, with autovectorization and such (similar to Unity's Burst, but without Unity and more flexible)
High Performance C# Compiler WIP
For now it supports just a small subset of instructions, control flow, method calls, and conversions.

But it's growing pretty fast and LLVM produces great code!