RWMC #218
Conversation
```cpp
Accumulator accumulator;
accumulator.initialize(accumulatorInitData);
//scalar_type meanLumaSq = 0.0;
```
Take the accumulator by reference from the outside, otherwise static polymorphism gets harder with stateful accumulators.
Consider this scenario: I already have an accumulator, and I just want to add a few samples.
```diff
  // Li
- measure_type getMeasure(uint32_t numSamples, uint32_t depth, NBL_CONST_REF_ARG(scene_type) scene)
+ output_storage_type getMeasure(uint32_t numSamples, uint32_t depth, NBL_CONST_REF_ARG(scene_type) scene, NBL_REF_ARG(typename Accumulator::initialization_data) accumulatorInitData)
```
One suggestion to refactor: take sampleIndex (the i in the loop) and do the loop outside the path tracer (initialize the accumulator outside as well).
Also, I know it's not your code, but better to rename depth to maxDepth.
```diff
- return Li;
+ return accumulator.accumulation;
```
Take the accumulator by reference and make the function void.
31_HLSLPathTracer/main.cpp
```cpp
};

std::array<ICPUDescriptorSetLayout::SBinding, 1> descriptorSet0Bindings = {};
std::array<ICPUDescriptorSetLayout::SBinding, 1> descriptorSet1Bindings = {};
```
Don't add a new set; reuse the binding for the output image for the cascades, as I wrote about layers == CascadeCount in other comments.
31_HLSLPathTracer/main.cpp
```cpp
params.shader.entryPoint = "main";
params.shader.entries = nullptr;
params.shader.requireFullSubgroups = true;
params.shader.requiredSubgroupSize = static_cast<IGPUShader::SSpecInfo::SUBGROUP_SIZE>(5);
```
This is bad and breaks on GPUs other than NV and RDNA. I know it's @keptsecret's "fault", but let's fix it: make the subgroup size the log2 of the minimum the device reports (for best divergence handling).
nice code deduplication btw :P
> this is bad and breaks on other GPUs than NV and RDNA, I know its @keptsecret's "fault" but lets fix, make the subgroup size the log2 of the minimum the device reports (for best divergence handling)

How do I query this?
> how do i query this?

It's in the physical device limits; copy it from the FFT example:
Nabla-Examples-and-Tests/28_FFTBloom/main.cpp
Line 732 in e5d5ae2
```cpp
params[i].shader.requiredSubgroupSize = static_cast<IPipelineBase::SUBGROUP_SIZE>(hlsl::findMSB(deviceLimits.maxSubgroupSize));
```
But make it the minimum the device supports instead.
31_HLSLPathTracer/main.cpp
```cpp
	imgViewInfo.viewType = IGPUImageView::ET_2D;
}
else
{
	imgViewInfo.subresourceRange.layerCount = CascadeSize;
	imgViewInfo.viewType = IGPUImageView::ET_2D_ARRAY;
}
```
Simplify the code: always make it a 2D array (you can just have 1 layer, no problem).
31_HLSLPathTracer/main.cpp
```cpp
cmdbuf->reset(IGPUCommandBuffer::RESET_FLAGS::NONE);
// disregard surface/swapchain transformation for now
const auto viewProjectionMatrix = m_camera.getConcatenatedMatrix();
viewProjectionMatrix.getInverseTransform(pc.invMVP);
pc.sampleCount = spp;
pc.depth = depth;

// safe to proceed
// upload buffer data
cmdbuf->beginDebugMarker("ComputeShaderPathtracer IMGUI Frame");
cmdbuf->begin(IGPUCommandBuffer::USAGE::ONE_TIME_SUBMIT_BIT);

// TRANSITION m_outImgView to GENERAL (because of descriptorSets0 -> ComputeShader Writes into the image)
{
	const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = {
		{
			.barrier = {
				.dep = {
					.srcStageMask = PIPELINE_STAGE_FLAGS::ALL_TRANSFER_BITS,
					.srcAccessMask = ACCESS_FLAGS::TRANSFER_WRITE_BIT,
					.dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
					.dstAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS
				}
			},
			.image = m_outImgView->getCreationParameters().image.get(),
			.subresourceRange = {
				.aspectMask = IImage::EAF_COLOR_BIT,
				.baseMipLevel = 0u,
				.levelCount = 1u,
				.baseArrayLayer = 0u,
				.layerCount = 1u
			},
			.oldLayout = IImage::LAYOUT::UNDEFINED,
			.newLayout = IImage::LAYOUT::GENERAL
		}
	};
	cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers });
}

// cube envmap handle
{
	IGPUComputePipeline* pipeline;
	if (usePersistentWorkGroups)
		pipeline = renderMode == E_RENDER_MODE::ERM_HLSL ? m_PTHLSLPersistentWGPipelines[PTPipeline].get() : m_PTGLSLPersistentWGPipelines[PTPipeline].get();
	else
		pipeline = renderMode == E_RENDER_MODE::ERM_HLSL ? m_PTHLSLPipelines[PTPipeline].get() : m_PTGLSLPipelines[PTPipeline].get();
	cmdbuf->bindComputePipeline(pipeline);
	cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get());
	cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 2u, 1u, &m_descriptorSet2.get());
	cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(RenderPushConstants), &pc);

	// TODO: shouldn't it be computed only at initialization stage and on window resize?
	const uint32_t dispatchSize = usePersistentWorkGroups ?
		m_physicalDevice->getLimits().computeOptimalPersistentWorkgroupDispatchSize(WindowDimensions.x * WindowDimensions.y, DefaultWorkGroupSize) :
		1 + (WindowDimensions.x * WindowDimensions.y - 1) / DefaultWorkGroupSize;

	cmdbuf->dispatch(dispatchSize, 1u, 1u);
}

// TRANSITION m_outImgView to READ (because of descriptorSets0 -> ComputeShader Writes into the image)
{
	const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = {
		{
			.barrier = {
				.dep = {
					.srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
					.srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS,
					.dstStageMask = PIPELINE_STAGE_FLAGS::FRAGMENT_SHADER_BIT,
					.dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS
				}
			},
			.image = m_outImgView->getCreationParameters().image.get(),
			.subresourceRange = {
				.aspectMask = IImage::EAF_COLOR_BIT,
				.baseMipLevel = 0u,
				.levelCount = 1u,
				.baseArrayLayer = 0u,
				.layerCount = 1u
			},
			.oldLayout = IImage::LAYOUT::GENERAL,
			.newLayout = IImage::LAYOUT::READ_ONLY_OPTIMAL
		}
	};
	cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers });
}

}

void beginCommandBufferAndDispatchPathracerPipelineUseRWMC(IGPUCommandBuffer* cmdbuf)
{
	if (renderMode != E_RENDER_MODE::ERM_HLSL)
	{
		m_logger->log("Only HLSL render mode is supported.", ILogger::ELL_ERROR);
		std::exit(-1);
	}

	cmdbuf->reset(IGPUCommandBuffer::RESET_FLAGS::NONE);
	// disregard surface/swapchain transformation for now
	const auto viewProjectionMatrix = m_camera.getConcatenatedMatrix();
	viewProjectionMatrix.getInverseTransform(rwmcPushConstants.invMVP);

	rwmcPushConstants.start = rwmcCascadeStart;
	rwmcPushConstants.depth = depth;
	rwmcPushConstants.sampleCount = resolvePushConstants.sampleCount = spp;
	rwmcPushConstants.base = resolvePushConstants.base = rwmcCascadeBase;
	resolvePushConstants.minReliableLuma = rwmcMinReliableLuma;
	rwmcPushConstants.kappa = resolvePushConstants.kappa = rwmcKappa;

	// safe to proceed
	// upload buffer data
	cmdbuf->beginDebugMarker("ComputeShaderPathtracer IMGUI Frame");
	cmdbuf->begin(IGPUCommandBuffer::USAGE::ONE_TIME_SUBMIT_BIT);

	// TRANSITION m_outImgView to GENERAL (because of descriptorSets0 -> ComputeShader Writes into the image)
	{
		const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = {
			{
				.barrier = {
					.dep = {
						.srcStageMask = PIPELINE_STAGE_FLAGS::ALL_TRANSFER_BITS,
						.srcAccessMask = ACCESS_FLAGS::TRANSFER_WRITE_BIT,
						.dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
						.dstAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS
					}
				},
				.image = m_outImgView->getCreationParameters().image.get(),
				.subresourceRange = {
					.aspectMask = IImage::EAF_COLOR_BIT,
					.baseMipLevel = 0u,
					.levelCount = 1u,
					.baseArrayLayer = 0u,
					.layerCount = 1u
				},
				.oldLayout = IImage::LAYOUT::UNDEFINED,
				.newLayout = IImage::LAYOUT::GENERAL
			}
		};
		cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers });
	}

	// transit m_cascadeView layout to GENERAL, block until previous shader is done with reading from cascade
	{
		const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> cascadeBarrier[] = {
			{
				.barrier = {
					.dep = {
						.srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
						.srcAccessMask = ACCESS_FLAGS::NONE,
						.dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
						.dstAccessMask = ACCESS_FLAGS::NONE
					}
				},
				.image = m_cascadeView->getCreationParameters().image.get(),
				.subresourceRange = {
					.aspectMask = IImage::EAF_COLOR_BIT,
					.baseMipLevel = 0u,
					.levelCount = 1u,
					.baseArrayLayer = 0u,
					.layerCount = CascadeSize
				},
				.oldLayout = IImage::LAYOUT::UNDEFINED,
				.newLayout = IImage::LAYOUT::GENERAL
			}
		};
		cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = cascadeBarrier });
	}

	// TODO: shouldn't it be computed only at initialization stage and on window resize?
	const uint32_t dispatchSize = usePersistentWorkGroups ?
		m_physicalDevice->getLimits().computeOptimalPersistentWorkgroupDispatchSize(WindowDimensions.x * WindowDimensions.y, DefaultWorkGroupSize) :
		1 + (WindowDimensions.x * WindowDimensions.y - 1) / DefaultWorkGroupSize;

	{
		IGPUComputePipeline* pipeline = usePersistentWorkGroups ? m_PTHLSLPersistentWGPipelinesRWMC[PTPipeline].get() : m_PTHLSLPipelinesRWMC[PTPipeline].get();

		cmdbuf->bindComputePipeline(pipeline);
		cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get());
		cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 1u, 1u, &m_descriptorSet1.get());
		cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 2u, 1u, &m_descriptorSet2.get());
		cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(RenderRWMCPushConstants), &rwmcPushConstants);

		cmdbuf->dispatch(dispatchSize, 1u, 1u);
	}

	// m_cascadeView synchronization - wait for previous compute shader to write into the cascade
	// TODO: create this and every other barrier once outside of the loop?
	{
		const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> cascadeBarrier[] = {
			{
				.barrier = {
					.dep = {
						.srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
						.srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS,
						.dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
						.dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS
					}
				},
				.image = m_cascadeView->getCreationParameters().image.get(),
				.subresourceRange = {
					.aspectMask = IImage::EAF_COLOR_BIT,
					.baseMipLevel = 0u,
					.levelCount = 1u,
					.baseArrayLayer = 0u,
					.layerCount = CascadeSize
				}
			}
		};
		cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = cascadeBarrier });
	}

	// reweighting
	{
		IGPUComputePipeline* pipeline = usePersistentWorkGroups ? m_resolvePersistentWGPipeline.get() : m_resolvePipeline.get();

		cmdbuf->bindComputePipeline(pipeline);
		cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get());
		cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 1u, 1u, &m_descriptorSet1.get());
		cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(ResolvePushConstants), &resolvePushConstants);

		cmdbuf->dispatch(dispatchSize, 1u, 1u);
	}

	// TRANSITION m_outImgView to READ (because of descriptorSets0 -> ComputeShader Writes into the image)
	{
		const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = {
			{
				.barrier = {
					.dep = {
						.srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
						.srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS,
						.dstStageMask = PIPELINE_STAGE_FLAGS::FRAGMENT_SHADER_BIT,
						.dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS
					}
				},
				.image = m_outImgView->getCreationParameters().image.get(),
				.subresourceRange = {
					.aspectMask = IImage::EAF_COLOR_BIT,
					.baseMipLevel = 0u,
					.levelCount = 1u,
					.baseArrayLayer = 0u,
					.layerCount = 1u
				},
				.oldLayout = IImage::LAYOUT::GENERAL,
				.newLayout = IImage::LAYOUT::READ_ONLY_OPTIMAL
			}
		};
		cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers });
	}
```
Deduplicate this code a bit. I think everything up to the resolve shader can stay as is, and the only difference between RWMC and no RWMC should be the push constants you set on the main PT shader.