Unity DX12 modification list
2026-02-10 10:58:22
duadua
---

# Summary

## Core changes

1. Standalone DXC compilation. DX11 and DX12 shader blobs are loaded separately at runtime. 【82f97665】【36add27】
2. Took over GPU resource management. 【3182b8e6】
   - 2.1. Integrated D3D12MA, with a launch argument to switch it on or off.
   - 2.2. Took over default/upload heaps and the placed/committed resource decision.
   - 2.3. Took over the small-texture decision.
   - 2.4. Took over scratch/staging buffers. Reviewed texture/buffer sync/async upload.
3. Wrapped ID3D12Resource. 【ad51dee0】
   - 3.1. Removed the synchronization lock on the related global hash set.
   - 3.2. Resource residency: integrated Microsoft's official residency data structures.
   - 3.3. Resource access marking.
4. New resource barrier system. 【ad51dee0】
   - 4.1. Implemented per command list, removing the global usage lock.
   - 4.2. Resource barriers can be batched and flushed together.
5. Graphics Double Buffer wrapper. 【bee0f1e7】
   - 5.1. Lock-free double-buffered resource structure, a building block for the later move to a three-thread model.
   - 5.2. Fixed the skinned mesh renderer buffer synchronization issue caused by graphics jobs.
6. Readback buffer pool. 【ad51dee0】
   - 6.1. The readback buffer pool now actually reuses buffers, avoiding per-frame resource creation.

## Other changes

1. Launch argument to switch between DX11 and DX12.
2. DX12 shader feature macro.
3. Printing/intercepting DX12 validation info.
4. Routine crash fixes.
5. Research into principles and best practices.
6. Analyzed the graphics jobs implementation and fixed graphics jobs synchronization issues.
7. Reviewed the constant buffer (cb) flow.
8. Reviewed and optimized CPU/GPU fence management.
9. Reviewed thread synchronization points.
10. Async compute: reviewed, supported, and verified.
11. Wave intrinsics: reviewed, supported, and verified.
12. Added a launch argument that makes the GPU wait on the previous frame's GPU fence.

---

# Details

## Core changes

- 1 Standalone DXC compilation. DX11 and DX12 shader blobs are loaded separately at runtime. 【82f97665】【36add27】
> - [Unity Shader engine pipeline (Feishu doc)](https://funplus.feishu.cn/wiki/YWcJw4yzui9e8EkqNnacrn4nncB).
> - Fixed validation layer error logs.
> - Added an interception policy for validation layer logs.
> - Two sets of variants, one for DX12 and one for DX11; DX12 compiles with DXC by default.
> - At runtime, the blob matching the platform is loaded.
> - `GpuProgramsD3D12.cpp`.

- 2 Took over GPU resource management. 【3182b8e6】
> - [D3D12 Memory Allocator (GitHub)](https://github.com/GPUOpen-LibrariesAndSDKs/D3D12MemoryAllocator)
> - Launch argument: pass `-d3d12maDisable` to use the native allocator. D3D12MA is enabled by default.
> - Integrated D3D12MA, with a launch argument to switch it on or off.
> - Took over default/upload heaps and the placed/committed resource decision.
> - Took over the small-texture decision.
> - Took over scratch/staging buffers. Reviewed texture/buffer sync/async upload.
> - The original allocator and D3D12MA can be switched at runtime. `-d3d12maDisable`
> - A resource's debug name shows whether it is managed by the new allocator.
> - Took over heap allocation (upload heap, default heap, and so on).
> - Took over resource allocation, including small-texture allocation.
> - GPU allocations can be dumped to a JSON file, also callable from C#: `GpuMemStartBuildStatsString`, `StopBuildStatsString`.
> - Updated the copy queue strategy.
> - Updated the buffer fence and new-buffer creation strategy.
> - Core files are listed below.

```text
D3D12MA integration
D3D12HeapAllocatorMA.h       - D3D12MA allocator interface
D3D12HeapAllocatorMA.cpp     - D3D12MA implementation
D3D12HeapAllocatorAdapter.h  - allocator adapter
DMA/D3D12MemAlloc.h          - D3D12MA library header

Native allocator
D3D12HeapAllocator.h         - native heap allocator
D3D12HeapAllocator.cpp       - native allocator implementation
```

```cpp
// D3D12HeapAllocatorMA.h/cpp - D3D12MA-backed allocator
class D3D12HeapAllocatorMA
{
    D3D12MA::Allocator* m_Allocator;

public:
    // Create a resource (placed vs. committed is decided automatically)
    HRESULT CreateResource(
        const D3D12_HEAP_PROPERTIES* pHeapProperties,
        D3D12_HEAP_FLAGS HeapFlags,
        const D3D12_RESOURCE_DESC* pDesc,
        D3D12_RESOURCE_STATES InitialResourceState,
        const D3D12_CLEAR_VALUE* pOptimizedClearValue,
        D3D12Resource** ppOutResource,
        const char* name,
        size_t* tagCounter,
        D3D12MA::ALLOCATION_FLAGS flags = D3D12MA::ALLOCATION_FLAG_NONE)
    {
        // Allocation description: heap type, extra heap flags, allocation flags
        D3D12MA::ALLOCATION_DESC allocDesc = {};
        allocDesc.HeapType = pHeapProperties->Type;
        allocDesc.ExtraHeapFlags = HeapFlags;
        allocDesc.Flags = flags;

        // 1. Call D3D12MA::Allocator::CreateResource
        D3D12MA::Allocation* allocation = nullptr;
        ID3D12Resource* resource = nullptr;
        HRESULT hr = m_Allocator->CreateResource(
            &allocDesc,
            pDesc,
            InitialResourceState,
            pOptimizedClearValue,
            &allocation,
            IID_PPV_ARGS(&resource));
        if (FAILED(hr))
            return hr;

        // 2. D3D12MA decides automatically:
        //    - resources that are large relative to the block size -> committed resource
        //    - smaller resources -> placed resource inside a shared heap block
        //    - small-texture alignment is supported (4KB/64KB)

        // 3. Wrap the result in a D3D12Resource
        *ppOutResource = new D3D12Resource(resource, allocation,
                                           InitialResourceState, pHeapProperties->Type);

        // 4. Set the debug name
        if (name)
            SetDebugName(resource, name);

        return hr;
    }

    // Create a heap (custom pools are supported)
    HRESULT CreateHeapAdaptive(
        const D3D12_HEAP_DESC* pHeapDesc,
        HeapAllocationDMA** ppAllocation,
        D3D12Heap** ppHeap)
    {
        // D3D12MA manages heap sub-allocation automatically
        return m_Allocator->CreateHeap(pHeapDesc, ppAllocation, ppHeap);
    }
};
```

- 3 Wrapped ID3D12Resource. 【ad51dee0】
> - [D3DX12Residency.h (Microsoft)](https://github.com/microsoft/DirectX-Headers/blob/main/include/directx/d3dx12_residency.h)
> - Removed the synchronization lock on the related global hash set.
> - Resource residency: integrated Microsoft's official residency data structures.
> - Resource access marking.

```text
Main files
D3D12Resource.h                      - resource wrapper header
D3D12Resource.cpp                    - resource wrapper implementation
D3D12Heap.h                          - heap wrapper

Residency integration
Residency/D3D12ResidencyAdapter.h    - residency adapter
Residency/d3dx12Residency.h          - Microsoft's official residency implementation
```

```cpp
// D3D12Resource.h/cpp - wrapper around ID3D12Resource
struct D3D12Resource : NonCopyable
{
    // ===============================================================
    // Constructors (committed and placed resources)

    // Committed resource or placed resource
    D3D12Resource(ID3D12Resource* resource, ResourceAllocation* allocation,
                  D3D12_RESOURCE_STATES initialState, D3D12_HEAP_TYPE heapType,
                  bool isBackBuffer = false, bool isExternal = false);

    // Placed resource created from a heap
    D3D12Resource(ID3D12Resource* resource, ResourceAllocation* allocation,
                  D3D12_RESOURCE_STATES initialState, D3D12Heap* heap);

    ~D3D12Resource();

    // ===============================================================
    // Residency interface
    D3D12ResidencyObjectArray GetResidencyObjects()
    {
        if (IsPlaced() && m_PlacedHeap->IsResidencyTracked())
        {
            // Residency of a placed resource is managed by its heap
            return m_PlacedHeap->GetResidencyObjects();
        }
        else if (m_ResidencyObject)
        {
            // A committed resource manages its own residency
            return D3D12ResidencyObjectArray(&m_ResidencyObject, 1);
        }
        return {};
    }

    void TrackResidency(UINT64 size)
    {
        // Create the residency object and register it with the residency manager
        m_ResidencyObject = D3DX12Residency::CreateManagedObject(
            GetDevice()->GetResidencyManager(), m_Resource, size);
    }

    // ===============================================================
    // Resource state (no global lock)
    const D3D12ResourceState& GetState() const { return m_State; }

private:
    // Writable only through the friend class below
    void SetState(D3D12_RESOURCE_STATES state, bool uav)
    {
        m_State.state = state;
        m_State.uav = uav;
    }

    // ===============================================================
    // Core members
    ID3D12Resource* m_Resource;               // raw D3D12 resource pointer
    ResourceAllocation* m_Allocation;         // memory allocation info
    D3D12Heap* m_PlacedHeap;                  // owning heap, if this is a placed resource
    D3D12ResidencyObject* m_ResidencyObject;  // residency tracking object
    D3D12ResourceState m_State;               // current resource state
    D3D12_HEAP_TYPE m_HeapType;

#if ENABLE_D3D12_OBJECT_DEBUGNAME
    core::string m_DebugName;                 // debug name
#endif

    friend class GfxTaskExecutorD3D12;        // the only class allowed to modify the state
};

// D3D12Heap.h/cpp - wrapper around ID3D12Heap
struct D3D12Heap : NonCopyable
{
    D3D12Heap(ID3D12Heap* heap, HeapAllocation* allocation, bool residencyTracked);

    // Residency objects (used by placed resources)
    D3D12ResidencyObjectArray GetResidencyObjects();

private:
    ID3D12Heap* m_Heap;
    HeapAllocation* m_Allocation;
    D3D12ResidencyObject* m_ResidencyObject;
    bool m_ResidencyTracked;
};

// GfxTaskExecutorD3D12.cpp - resource access marking
void GfxTaskExecutorD3D12::Execute(UINT64 fence, GfxDeviceD3D12SubmissionData* data, int count)
{
    // 1. Collect the resources accessed this frame
    TD3D12ResourceSet accessedResources;
    for (auto& cmdList : data->m_CommandLists)
    {
        // Gather resources from each command list
        cmdList->CollectAccessedResources(accessedResources);
    }

    // 2. Mark resource access (no global lock; each resource tracks itself)
    for (auto* resource : accessedResources)
    {
        resource->MarkAccessed(fence);

        // Add to the residency manager
        auto residencyObjects = resource->GetResidencyObjects();
        m_ResidencyManager->AddResourcesToSet(m_CurrentResidencySet, residencyObjects);
    }

    // 3. Make sure the resources are resident
    m_ResidencyManager->MakeResident(m_CurrentResidencySet);
}
```

- 4 New resource barrier system. 【ad51dee0】
> - [dx12 resource barrier](https://funplus.feishu.cn/wiki/CCfmwnKU4iA4MHkzRRIcxRZznZe)
> - [unity dx12 resource barrier (new version)](https://funplus.feishu.cn/wiki/E6THwhKzKijzqhkRQ0Fcy5dPn9c)
> - Implemented per command list, removing the global usage lock.
> - Resource barriers can be batched and flushed together.

```text
Core files
D3D12ResourceBarrier.h            - barrier system header
D3D12ResourceBarrier.cpp          - barrier system implementation
D3D12ResourceBarrierDispatcher.h  - barrier dispatcher
D3D12ResourceBarrierState.h       - barrier state definitions

Command list integration
D3D12CommandList.h                - command list header
D3D12CommandList.cpp              - command list implementation
```

```cpp
// D3D12ResourceBarrier.h/cpp - core of the barrier system
class D3D12ResourceBarrierSystem
{
    // Resource state tracking (no global lock)
    D3D12ResourceStateMap m_ResourceStateMap;   // resource -> index
    D3D12ResourceStateArray m_ResourceState;    // index -> state pair

    // Barrier cache (flushed in batches)
    D3D12ResourceBarrierDispatcher m_ResourceBarrierCache;

public:
    // Add a UAV barrier
    void ResourceBarrierUAV(D3D12Resource* resource)
    {
        auto it = m_ResourceStateMap.find(resource);
        if (it != m_ResourceStateMap.end())
        {
            // Mark the UAV state
            m_ResourceState[it->second].incoming.uav = 1;
        }
        else
        {
            // New resource: record its state
            D3D12ResourceState state;
            state.state = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
            state.uav = 0;
            addResourceState(resource, state);
        }
    }

    // Add a transition barrier
    void ResourceBarrierTransition(D3D12Resource* resource,
                                   D3D12_RESOURCE_STATES desiredState,
                                   UINT subResource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES)
    {
        auto it = m_ResourceStateMap.find(resource);
        if (it != m_ResourceStateMap.end())
        {
            auto& incoming = m_ResourceState[it->second].incoming;
            auto& outgoing = m_ResourceState[it->second].outgoing;

            // Handle a pending UAV barrier
            if (incoming.uav)
            {
                incoming.uav = 0;
                m_ResourceBarrierCache.AddUAV(resource);
            }

            // Current state: outgoing if already transitioned, otherwise incoming
            const bool transitioned = outgoing.value != D3D12_RESOURCE_STATE_INVALID;
            auto currentState = transitioned ? outgoing.state : incoming.state;

            // Add a transition barrier only if one is needed
            if (currentState != desiredState)
            {
                // Optimization: read-to-read transitions can be merged
                if (IsReadState(currentState) && IsReadState(desiredState))
                {
                    // Merge the read states
                    incoming.state |= desiredState;
                }
                else
                {
                    // Add the transition barrier
                    m_ResourceBarrierCache.Add(resource, currentState, desiredState, subResource);
                    outgoing.state = desiredState;
                }
            }
        }
        else
        {
            // New resource: set the state directly
            D3D12ResourceState state;
            state.state = desiredState;
            state.uav = 0;
            addResourceState(resource, state);
        }
    }

    // Flush all cached barriers to the command list in one call
    void FlushResourceBarriers(D3D12CommandList* cmdList)
    {
        if (m_ResourceBarrierCache.GetCount() > 0)
        {
            ID3D12GraphicsCommandList* d3dCmdList = cmdList->GetCmdListAndMakeActive();
            d3dCmdList->ResourceBarrier(m_ResourceBarrierCache.GetCount(),
                                        m_ResourceBarrierCache.GetData());
            m_ResourceBarrierCache.Reset();
            cmdList->SetHasChanges(true);
        }
    }

    // Collect states into the pending list (carries state across command lists)
    void CollectStatePendingList(D3D12ResourceBarrierDispatcher& graphicsBarriers)
    {
        for (auto& state : m_ResourceState)
        {
            auto* resource = state.resource;
            const auto& resUsage = resource->GetState();

            // Hand the state over to the dispatcher
            graphicsBarriers.AddTransition(resource, resUsage.state, state.outgoing.state);
        }
    }

    void Reset()
    {
        m_ResourceStateMap.clear();
        m_ResourceState.clear();
    }
};

// D3D12CommandList.h/cpp - barrier usage inside the command list
class D3D12CommandList
{
    D3D12ResourceBarrierSystem m_BarrierSystem;

public:
    void SetRenderTarget(const RenderTargetDesc& desc)
    {
        // Transition barriers are added automatically
        for (int i = 0; i < desc.m_RenderTargetCount; i++)
        {
            m_BarrierSystem.ResourceBarrierTransition(
                desc.m_RenderTargets[i],
                D3D12_RESOURCE_STATE_RENDER_TARGET);
        }

        if (desc.m_DepthStencil)
        {
            m_BarrierSystem.ResourceBarrierTransition(
                desc.m_DepthStencil,
                D3D12_RESOURCE_STATE_DEPTH_WRITE);
        }
    }

    void ResourceBarrier(UINT numBarriers, const D3D12_RESOURCE_BARRIER* barriers)
    {
        for (UINT i = 0; i < numBarriers; i++)
        {
            auto& barrier = barriers[i];
            if (barrier.Type == D3D12_RESOURCE_BARRIER_TYPE_TRANSITION)
            {
                D3D12Resource* resource = barrier.Transition.pResource;
                m_BarrierSystem.ResourceBarrierTransition(
                    resource,
                    barrier.Transition.StateAfter,
                    barrier.Transition.Subresource);
            }
        }
        // Flush is deferred and handled in one place in Execute()
    }

    void Execute()
    {
        // Flush all barriers
        m_BarrierSystem.FlushResourceBarriers(this);

        // Close the command list and submit it to the GPU.
        // m_CommandQueue: assumed member holding the ID3D12CommandQueue to submit to.
        ID3D12GraphicsCommandList* cmdList = GetCmdListAndMakeActive();
        cmdList->Close();
        ID3D12CommandList* lists[] = { cmdList };
        m_CommandQueue->ExecuteCommandLists(1, lists);
    }
};
```

- 5 Graphics Double Buffer wrapper. 【bee0f1e7】
> - Lock-free double-buffered resource structure, a building block for the later move to a three-thread model.
> - Fixed the skinned mesh renderer buffer synchronization issue caused by graphics jobs.
> - Two swap-management modes are provided: the global one swaps explicitly at a fixed point in the frame, the other swaps manually on read/write.
> - `GraphicsDoubleBuffer.h`.

```cpp
template<typename T, typename IndexManagerType = GlobalDoubleBufferIndexManager>
struct GraphicsDoubleBuffer
{
    // constructor and destructor
    GraphicsDoubleBuffer() { }
    ~GraphicsDoubleBuffer() = default;

    // get current read buffer (for other threads) - returns shared_ptr for safe usage
    std::shared_ptr<T> GetReadBuffer()
    {
        Swap();
        return m_Buffers[GetReadIndex()];
    }

    // unsafe get current read buffer (for other threads) - returns raw pointer, use with caution
    T* GetReadBufferUnsafe()
    {
        Swap();
        return m_Buffers[GetReadIndex()].get();
    }

    // initialize write buffer only
    template<typename... Args>
    void WriteBuffer(Args&&... args)
    {
        int writeIndex = GetWriteIndex();
        MarkDirty();
        WriteBuffer<T, Args...>(writeIndex, std::forward<Args>(args)...);
    }

private:
    // helper to initialize a single buffer using SFINAE to detect Reset method
    // (has_reset_method is a detection trait defined elsewhere)
    template<typename U = T, typename... Args>
    typename std::enable_if<has_reset_method<U, void, Args...>::value, void>::type
    WriteBuffer(int index, Args&&... args)
    {
        if (!m_Buffers[index])
        {
            m_Buffers[index] = std::make_shared<T>(std::forward<Args>(args)...);
        }
        else
        {
            m_Buffers[index]->Reset(std::forward<Args>(args)...);
        }
    }

    template<typename U = T, typename... Args>
    typename std::enable_if<!has_reset_method<U, void, Args...>::value, void>::type
    WriteBuffer(int index, Args&&... args)
    {
        m_Buffers[index] = std::make_shared<T>(std::forward<Args>(args)...);
    }

    // actual buffers
    std::shared_ptr<T> m_Buffers[2];
    IndexManagerType m_IndexManagerType;

    int GetReadIndex() { return m_IndexManagerType.GetReadIndex(); }
    int GetWriteIndex() { return m_IndexManagerType.GetWriteIndex(); }
    void MarkDirty() { m_IndexManagerType.MarkDirty(); }
    void Swap() { m_IndexManagerType.Swap(); }
};
```

- 6 Readback buffer pool. 【ad51dee0】
> - The readback buffer pool now actually reuses buffers, avoiding per-frame resource creation.

```text
Main files
AsyncReadbackBufferPool.h    - pool header
AsyncReadbackBufferPool.cpp  - pool implementation
```

```cpp
// AsyncReadbackBufferPool.h/cpp - readback buffer pool
class AsyncReadbackBufferPool
{
public:
    static AsyncReadbackBufferPool& Instance();

    // Acquire a buffer from the pool
    D3D12Resource* Acquire(UINT64 size)
    {
        // 1. Find the matching size bucket
        auto& bucket = m_ObjectPool[size];

        // 2. Reuse a free buffer if there is one. Buffers enter a bucket only in
        //    Update(), after their fence has completed, so they are safe to hand out.
        if (!bucket.empty())
        {
            D3D12Resource* buffer = bucket.back().buffer;
            bucket.pop_back();
            return buffer;
        }

        // 3. Nothing reusable: create a new buffer
        D3D12Resource* newBuffer = CreateReadbackBuffer(size);
        m_TotalPoolSize += size;
        return newBuffer;
    }

    // Return a buffer to the pool
    void Release(D3D12Resource* buffer, UINT64 size, D3D12Fence* fence, UINT64 fenceValue)
    {
        // Record the fence; the buffer becomes reusable once the GPU has passed it
        m_PendingList.push_back({buffer, size, fence, fenceValue});
    }

    // Per-frame update
    void Update()
    {
        // 1. Move buffers whose fence has completed from the pending list to their bucket
        for (auto it = m_PendingList.begin(); it != m_PendingList.end(); )
        {
            if (it->fence->IsCompleted(it->fenceValue))
            {
                // GPU is done with the buffer: it can be reused
                m_ObjectPool[it->size].push_back({it->buffer, m_CurrentFrame});
                it = m_PendingList.erase(it);
            }
            else
            {
                ++it;
            }
        }

        // 2. LRU eviction when the pool grows too large
        if (m_TotalPoolSize > m_MaxPoolSize)
        {
            EvictOldestBuffers(m_TotalPoolSize - m_MaxPoolSize);
        }

        // 3. Frame counter
        m_CurrentFrame++;
    }

private:
    struct PoolEntry
    {
        D3D12Resource* buffer;
        UINT64 lastUsedFrame;   // frame in which the buffer was last returned
    };

    struct PendingEntry
    {
        D3D12Resource* buffer;
        UINT64 size;
        D3D12Fence* fence;
        UINT64 fenceValue;
    };

    std::unordered_map<UINT64, dynamic_array<PoolEntry>> m_ObjectPool;
    dynamic_array<PendingEntry> m_PendingList;

    UINT64 m_TotalPoolSize;
    UINT64 m_MaxPoolSize;       // maximum pool size
    UINT64 m_CurrentFrame;
};

// Usage example
// (cmdList, sourceResource and requiredSize come from the surrounding readback request)
void GfxTaskExecutorD3D12::DoAsyncReadback()
{
    auto& pool = AsyncReadbackBufferPool::Instance();

    // 1. Acquire a buffer (avoids creating one every frame)
    D3D12Resource* readbackBuffer = pool.Acquire(requiredSize);

    // 2. Record the copy
    cmdList->CopyResource(readbackBuffer, sourceResource);

    // 3. Set up the fence
    D3D12Fence* fence = GetCopyFence();
    UINT64 fenceValue = fence->IncrementCounter();
    cmdList->SetEventOnCompletion(fenceValue, m_ReadbackEvent);

    // 4. Return the buffer to the pool (for reuse once the fence completes)
    pool.Release(readbackBuffer, requiredSize, fence, fenceValue);
}
```

## Other changes

- Launch argument to switch between DX11 and DX12. `-forceD3D11`
- DX12 shader feature macro. `SHADER_API_D3D12`
- Printing/intercepting DX12 validation info.
- Routine crash fixes.
- Research into principles and best practices.
- Analyzed the graphics jobs implementation and fixed graphics jobs synchronization issues.
- Reviewed thread synchronization points. `GfxDevice::EndAsyncJobFrame()`.
- Reviewed the constant buffer (cb) flow. [dx12 resource upload mechanism in unity](https://funplus.feishu.cn/wiki/L3P0wD8lSiHctpkhEUfcHFe9n7e)
- Async compute: reviewed, supported, and verified. [unity dx12 async compute queue](https://funplus.feishu.cn/wiki/WumxwJ9c1iNYFlkVoiTcrR8PnHd)
- Wave intrinsics: reviewed, supported, and verified. [dx12 wave intrinsics](https://funplus.feishu.cn/wiki/Y9UTwyQZyiiG4okciKpcqN9JnAd)
- Reviewed and optimized CPU/GPU fence management.
> - Added a launch argument that makes the GPU wait on the previous frame's GPU fence.
> - It controls whether the current frame's GPU work waits for the previous frame's GPU work.
> - `-dx12-graphics-gpu-wait-gpu-last-frame 1`.
> - Mainly used to check whether a bug is caused by cross-frame synchronization of GPU work.

---

# todo

- DX12 issues 30 more set render target calls than DX11.
- Validation errors need ongoing maintenance.
- A three-thread mode already exists, but its structure needs to be redesigned.
- Shared resources need synchronization or a lock-free design.
- The ThreadedStreamBuffer used for synchronization needs to be replaced.
- On DX12, the main thread's wait for GPU present takes 7% longer on average than on DX11. The constant buffer update region is the suspect.
- DX12 constant buffer updates need optimization.
> - Non-zero offsets are not supported. [yuechen]
> - Too many calls per frame. memcpy + copyregion

---

# Documents

- [D3D12 Memory Allocator (GitHub)](https://github.com/GPUOpen-LibrariesAndSDKs/D3D12MemoryAllocator)
- [D3DX12Residency.h (Microsoft)](https://github.com/microsoft/DirectX-Headers/blob/main/include/directx/d3dx12_residency.h)
- [DX12](https://funplus.feishu.cn/wiki/CwNKwmbcsi7jQLkkvGDcA6Danxc)
- [dx12 resource memory](https://funplus.feishu.cn/wiki/DVx9wN31Ni3pkEkmznYcMmiYnPc)
- [dx12 resource manager](https://funplus.feishu.cn/wiki/ZGK1w9hlziuEyrkgHXtcIEa3nNg)
- [dx12 resource manager pool](https://funplus.feishu.cn/wiki/ZyAgwvh3ji5riekHsWxcCvIQnod)
- [dx12 resource barrier](https://funplus.feishu.cn/wiki/CCfmwnKU4iA4MHkzRRIcxRZznZe)
- [unity dx12 resource barrier (new version)](https://funplus.feishu.cn/wiki/E6THwhKzKijzqhkRQ0Fcy5dPn9c)
- [dx12 resource upload mechanism in unity](https://funplus.feishu.cn/wiki/L3P0wD8lSiHctpkhEUfcHFe9n7e)
- [dx12 descriptor heap](https://funplus.feishu.cn/wiki/CNhpwyukmiQsFVk1356coqitntb)
- [dx12 descriptor binding](https://funplus.feishu.cn/wiki/Tx8lwMbiciepYwkBMEQczdYhn7d)
- [dx12 descriptor bindless](https://funplus.feishu.cn/wiki/O15UwZYpeiuVxukzKmOcQqmSnqc)
- [dx12 wave intrinsics](https://funplus.feishu.cn/wiki/Y9UTwyQZyiiG4okciKpcqN9JnAd)
- [unity dx12 async compute queue](https://funplus.feishu.cn/wiki/WumxwJ9c1iNYFlkVoiTcrR8PnHd)
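
The `GraphicsDoubleBuffer` template in item 5 delegates all index handling to an index manager through four calls: `GetReadIndex`, `GetWriteIndex`, `MarkDirty`, and `Swap`. The post does not show that manager, so the following is only a minimal sketch of one possible shape. The class name `SketchDoubleBufferIndexManager` and the dirty-flag logic are assumptions for illustration, not the project's `GlobalDoubleBufferIndexManager`.

```cpp
#include <atomic>

// Hypothetical index manager. The method names mirror what GraphicsDoubleBuffer
// calls, but the logic is an assumed illustration, not the project's code.
class SketchDoubleBufferIndexManager
{
public:
    // Slot the reader should look at
    int GetReadIndex() const { return m_ReadIndex.load(std::memory_order_acquire); }

    // Slot the writer should fill (always the other one)
    int GetWriteIndex() const { return 1 - m_ReadIndex.load(std::memory_order_acquire); }

    // Writer signals that the write slot holds new data
    void MarkDirty() { m_Dirty.store(true, std::memory_order_release); }

    // Flip read/write slots, but only if new data was published since the last swap.
    // Returns true when a flip actually happened.
    bool Swap()
    {
        if (!m_Dirty.exchange(false, std::memory_order_acq_rel))
            return false;
        m_ReadIndex.fetch_xor(1, std::memory_order_acq_rel);
        return true;
    }

private:
    std::atomic<int> m_ReadIndex{0};
    std::atomic<bool> m_Dirty{false};
};
```

This sketch assumes one writer and one reader, and that the writer does not touch the write slot again until the reader has swapped. It only covers the index bookkeeping; buffer lifetime is left to the `std::shared_ptr` handling in `GraphicsDoubleBuffer`.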