
Conversation

@Przemog1 (Contributor) opened this pull request.

No description provided.

Comment on lines 296 to 298
Accumulator accumulator;
accumulator.initialize(accumulatorInitData);
//scalar_type meanLumaSq = 0.0;


Take the accumulator by reference from the outside, otherwise static polymorphism gets harder with stateful accumulators.

Consider this scenario: I already have an accumulator, and I just want to add a few samples.
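A minimal sketch of that scenario (the addSamples API and the pathtracer/scene objects are hypothetical, standing in for whatever the final interface looks like): the accumulator outlives any single call and just receives more samples.

// `accumulator` is owned by the caller; the path tracer only appends samples
// to it, it never creates or resets it.
Accumulator accumulator;
accumulator.initialize(accumulatorInitData);
pathtracer.addSamples(accumulator, /*numSamples*/ 16u, scene); // first batch
pathtracer.addSamples(accumulator, /*numSamples*/ 4u, scene);  // a few more later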


// Li
measure_type getMeasure(uint32_t numSamples, uint32_t depth, NBL_CONST_REF_ARG(scene_type) scene)
output_storage_type getMeasure(uint32_t numSamples, uint32_t depth, NBL_CONST_REF_ARG(scene_type) scene, NBL_REF_ARG(typename Accumulator::initialization_data) accumulatorInitData)


One suggestion to refactor: take sampleIndex (the i in the loop) and do the loop outside the path tracer (initialize the accumulator outside as well); see the sketch below.

Also, I know it's not your code, but better to rename depth to maxDepth.
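A sketch of the hoisted loop under both suggestions (hypothetical single-sample getMeasure, with depth renamed to maxDepth): the caller owns the accumulator and the sample loop, the path tracer only processes one sample per call.

Accumulator accumulator;
accumulator.initialize(accumulatorInitData);
for (uint32_t sampleIndex = 0u; sampleIndex < numSamples; sampleIndex++)
	pathtracer.getMeasure(sampleIndex, maxDepth, scene, accumulator); // adds one sample into `accumulator`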

Comment on lines 302 to 326
return Li;
return accumulator.accumulation;


Take the accumulator by reference and make the function void.
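Sketched with the NBL_REF_ARG macro from the diff above (single-sample signature assumed, per the earlier comment):

// the accumulator is mutated in place, so there is nothing to return
void getMeasure(uint32_t sampleIndex, uint32_t maxDepth, NBL_CONST_REF_ARG(scene_type) scene, NBL_REF_ARG(Accumulator) accumulator);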

};

std::array<ICPUDescriptorSetLayout::SBinding, 1> descriptorSet0Bindings = {};
std::array<ICPUDescriptorSetLayout::SBinding, 1> descriptorSet1Bindings = {};


Don't add a new set, reuse the binding for the output image for the cascades, as I wrote about layers == CascadeCount in other comments.

params.shader.entryPoint = "main";
params.shader.entries = nullptr;
params.shader.requireFullSubgroups = true;
params.shader.requiredSubgroupSize = static_cast<IGPUShader::SSpecInfo::SUBGROUP_SIZE>(5);


This is bad and breaks on GPUs other than NV and RDNA. I know it's @keptsecret's "fault", but let's fix it: make the subgroup size the log2 of the minimum the device reports (for best divergence handling).


nice code deduplication btw :P

@Przemog1 (Contributor, Author)

> this is bad and breaks on GPUs other than NV and RDNA, I know it's @keptsecret's "fault" but let's fix, make the subgroup size the log2 of the minimum the device reports (for best divergence handling)

how do I query this?

> how do I query this?

It's in the physical device limits, copy it from the FFT example:

params[i].shader.requiredSubgroupSize = static_cast<IPipelineBase::SUBGROUP_SIZE>(hlsl::findMSB(deviceLimits.maxSubgroupSize));

but make it the minimum the device supports instead.
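A minimal sketch of that version, assuming the limits struct exposes a minSubgroupSize alongside the maxSubgroupSize used above (untested):

// requiredSubgroupSize takes a log2 value, which findMSB yields for the
// power-of-two subgroup sizes; use the smallest size the device reports
const auto& deviceLimits = m_physicalDevice->getLimits();
params[i].shader.requiredSubgroupSize = static_cast<IPipelineBase::SUBGROUP_SIZE>(hlsl::findMSB(deviceLimits.minSubgroupSize));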

Comment on lines 799 to 805
	imgViewInfo.viewType = IGPUImageView::ET_2D;
}
else
{
	imgViewInfo.subresourceRange.layerCount = CascadeSize;
	imgViewInfo.viewType = IGPUImageView::ET_2D_ARRAY;
}


Simplify the code: always make it a 2D array (you can just have 1 layer, no problem); see the sketch below.
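A sketch of the simplification, reusing imgViewInfo from the quoted snippet (the useCascades flag is hypothetical, standing in for whatever condition the original if/else tested):

// one code path: a 2D array view works for both cases, the non-cascade
// image simply has a single layer
imgViewInfo.viewType = IGPUImageView::ET_2D_ARRAY;
imgViewInfo.subresourceRange.layerCount = useCascades ? CascadeSize : 1u;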

Comment on lines 1409 to 1657
cmdbuf->reset(IGPUCommandBuffer::RESET_FLAGS::NONE);
// disregard surface/swapchain transformation for now
const auto viewProjectionMatrix = m_camera.getConcatenatedMatrix();
viewProjectionMatrix.getInverseTransform(pc.invMVP);
pc.sampleCount = spp;
pc.depth = depth;

// safe to proceed
// upload buffer data
cmdbuf->beginDebugMarker("ComputeShaderPathtracer IMGUI Frame");
cmdbuf->begin(IGPUCommandBuffer::USAGE::ONE_TIME_SUBMIT_BIT);

// TRANSITION m_outImgView to GENERAL (because of descriptorSets0 -> ComputeShader Writes into the image)
{
const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = {
{
.barrier = {
.dep = {
.srcStageMask = PIPELINE_STAGE_FLAGS::ALL_TRANSFER_BITS,
.srcAccessMask = ACCESS_FLAGS::TRANSFER_WRITE_BIT,
.dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
.dstAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS
}
},
.image = m_outImgView->getCreationParameters().image.get(),
.subresourceRange = {
.aspectMask = IImage::EAF_COLOR_BIT,
.baseMipLevel = 0u,
.levelCount = 1u,
.baseArrayLayer = 0u,
.layerCount = 1u
},
.oldLayout = IImage::LAYOUT::UNDEFINED,
.newLayout = IImage::LAYOUT::GENERAL
}
};
cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers });
}

// cube envmap handle
{
IGPUComputePipeline* pipeline;
if (usePersistentWorkGroups)
pipeline = renderMode == E_RENDER_MODE::ERM_HLSL ? m_PTHLSLPersistentWGPipelines[PTPipeline].get() : m_PTGLSLPersistentWGPipelines[PTPipeline].get();
else
pipeline = renderMode == E_RENDER_MODE::ERM_HLSL ? m_PTHLSLPipelines[PTPipeline].get() : m_PTGLSLPipelines[PTPipeline].get();
cmdbuf->bindComputePipeline(pipeline);
cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get());
cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 2u, 1u, &m_descriptorSet2.get());
cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(RenderPushConstants), &pc);

// TODO: shouldn't it be computed only at initialization stage and on window resize?
const uint32_t dispatchSize = usePersistentWorkGroups ?
m_physicalDevice->getLimits().computeOptimalPersistentWorkgroupDispatchSize(WindowDimensions.x * WindowDimensions.y, DefaultWorkGroupSize) :
1 + (WindowDimensions.x * WindowDimensions.y - 1) / DefaultWorkGroupSize;

cmdbuf->dispatch(dispatchSize, 1u, 1u);
}

// TRANSITION m_outImgView to READ (the fragment shader samples the image next)
{
const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = {
{
.barrier = {
.dep = {
.srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
.srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS,
.dstStageMask = PIPELINE_STAGE_FLAGS::FRAGMENT_SHADER_BIT,
.dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS
}
},
.image = m_outImgView->getCreationParameters().image.get(),
.subresourceRange = {
.aspectMask = IImage::EAF_COLOR_BIT,
.baseMipLevel = 0u,
.levelCount = 1u,
.baseArrayLayer = 0u,
.layerCount = 1u
},
.oldLayout = IImage::LAYOUT::GENERAL,
.newLayout = IImage::LAYOUT::READ_ONLY_OPTIMAL
}
};
cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers });
}


}

void beginCommandBufferAndDispatchPathracerPipelineUseRWMC(IGPUCommandBuffer* cmdbuf)
{
if (renderMode != E_RENDER_MODE::ERM_HLSL)
{
m_logger->log("Only HLSL render mode is supported.", ILogger::ELL_ERROR);
std::exit(-1);
}

cmdbuf->reset(IGPUCommandBuffer::RESET_FLAGS::NONE);
// disregard surface/swapchain transformation for now
const auto viewProjectionMatrix = m_camera.getConcatenatedMatrix();
viewProjectionMatrix.getInverseTransform(rwmcPushConstants.invMVP);

rwmcPushConstants.start = rwmcCascadeStart;
rwmcPushConstants.depth = depth;
rwmcPushConstants.sampleCount = resolvePushConstants.sampleCount = spp;
rwmcPushConstants.base = resolvePushConstants.base = rwmcCascadeBase;
resolvePushConstants.minReliableLuma = rwmcMinReliableLuma;
rwmcPushConstants.kappa = resolvePushConstants.kappa = rwmcKappa;

// safe to proceed
// upload buffer data
cmdbuf->beginDebugMarker("ComputeShaderPathtracer IMGUI Frame");
cmdbuf->begin(IGPUCommandBuffer::USAGE::ONE_TIME_SUBMIT_BIT);

// TRANSITION m_outImgView to GENERAL (because of descriptorSets0 -> ComputeShader Writes into the image)
{
const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = {
{
.barrier = {
.dep = {
.srcStageMask = PIPELINE_STAGE_FLAGS::ALL_TRANSFER_BITS,
.srcAccessMask = ACCESS_FLAGS::TRANSFER_WRITE_BIT,
.dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
.dstAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS
}
},
.image = m_outImgView->getCreationParameters().image.get(),
.subresourceRange = {
.aspectMask = IImage::EAF_COLOR_BIT,
.baseMipLevel = 0u,
.levelCount = 1u,
.baseArrayLayer = 0u,
.layerCount = 1u
},
.oldLayout = IImage::LAYOUT::UNDEFINED,
.newLayout = IImage::LAYOUT::GENERAL
}
};
cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers });
}

// transition m_cascadeView layout to GENERAL, block until the previous shader is done reading from the cascade
{
const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> cascadeBarrier[] = {
{
.barrier = {
.dep = {
.srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
.srcAccessMask = ACCESS_FLAGS::NONE,
.dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
.dstAccessMask = ACCESS_FLAGS::NONE
}
},
.image = m_cascadeView->getCreationParameters().image.get(),
.subresourceRange = {
.aspectMask = IImage::EAF_COLOR_BIT,
.baseMipLevel = 0u,
.levelCount = 1u,
.baseArrayLayer = 0u,
.layerCount = CascadeSize
},
.oldLayout = IImage::LAYOUT::UNDEFINED,
.newLayout = IImage::LAYOUT::GENERAL
}
};
cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = cascadeBarrier });
}

// TODO: shouldn't it be computed only at initialization stage and on window resize?
const uint32_t dispatchSize = usePersistentWorkGroups ?
m_physicalDevice->getLimits().computeOptimalPersistentWorkgroupDispatchSize(WindowDimensions.x * WindowDimensions.y, DefaultWorkGroupSize) :
1 + (WindowDimensions.x * WindowDimensions.y - 1) / DefaultWorkGroupSize;

{
IGPUComputePipeline* pipeline = usePersistentWorkGroups ? m_PTHLSLPersistentWGPipelinesRWMC[PTPipeline].get() : m_PTHLSLPipelinesRWMC[PTPipeline].get();

cmdbuf->bindComputePipeline(pipeline);
cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get());
cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 1u, 1u, &m_descriptorSet1.get());
cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 2u, 1u, &m_descriptorSet2.get());
cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(RenderRWMCPushConstants), &rwmcPushConstants);

cmdbuf->dispatch(dispatchSize, 1u, 1u);
}

// m_cascadeView synchronization - wait for the previous compute dispatch to finish writing the cascade
// TODO: create this and every other barrier once outside of the loop?
{
const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> cascadeBarrier[] = {
{
.barrier = {
.dep = {
.srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
.srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS,
.dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
.dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS
}
},
.image = m_cascadeView->getCreationParameters().image.get(),
.subresourceRange = {
.aspectMask = IImage::EAF_COLOR_BIT,
.baseMipLevel = 0u,
.levelCount = 1u,
.baseArrayLayer = 0u,
.layerCount = CascadeSize
}
}
};
cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = cascadeBarrier });
}

// reweighting
{
IGPUComputePipeline* pipeline = usePersistentWorkGroups ? m_resolvePersistentWGPipeline.get() : m_resolvePipeline.get();

cmdbuf->bindComputePipeline(pipeline);
cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get());
cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 1u, 1u, &m_descriptorSet1.get());
cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(ResolvePushConstants), &resolvePushConstants);

cmdbuf->dispatch(dispatchSize, 1u, 1u);
}

// TRANSITION m_outImgView to READ (the fragment shader samples the image next)
{
const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = {
{
.barrier = {
.dep = {
.srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
.srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS,
.dstStageMask = PIPELINE_STAGE_FLAGS::FRAGMENT_SHADER_BIT,
.dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS
}
},
.image = m_outImgView->getCreationParameters().image.get(),
.subresourceRange = {
.aspectMask = IImage::EAF_COLOR_BIT,
.baseMipLevel = 0u,
.levelCount = 1u,
.baseArrayLayer = 0u,
.layerCount = 1u
},
.oldLayout = IImage::LAYOUT::GENERAL,
.newLayout = IImage::LAYOUT::READ_ONLY_OPTIMAL
}
};
cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers });
}


Deduplicate this code a bit. I think everything up to the resolve shader can stay as is, and the only diff between RWMC and no RWMC should be the push constants you set on the main PT shader.
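A sketch of one way to fold the two functions together (the helper names recordCommonSetup, pickPathtracerPipeline, bindCommonDescriptorSets, recordCascadeBarrierAndResolve, and recordOutputToReadBarrier are hypothetical; everything else appears in the quoted code):

void dispatchPathtracer(IGPUCommandBuffer* cmdbuf, const bool useRWMC)
{
	// reset/begin, push-constant setup, m_outImgView UNDEFINED->GENERAL barrier,
	// and (for RWMC) the cascade barrier, identical to the quoted code
	recordCommonSetup(cmdbuf, useRWMC);

	IGPUComputePipeline* pipeline = pickPathtracerPipeline(useRWMC, usePersistentWorkGroups);
	cmdbuf->bindComputePipeline(pipeline);
	bindCommonDescriptorSets(cmdbuf, pipeline->getLayout(), useRWMC);
	// the only real divergence: which push-constant struct goes to the PT shader
	if (useRWMC)
		cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(RenderRWMCPushConstants), &rwmcPushConstants);
	else
		cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(RenderPushConstants), &pc);
	cmdbuf->dispatch(dispatchSize, 1u, 1u);

	if (useRWMC)
		recordCascadeBarrierAndResolve(cmdbuf); // reweighting pass, as in the quoted code

	recordOutputToReadBarrier(cmdbuf); // GENERAL -> READ_ONLY_OPTIMAL for the fragment shader
}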

@keptsecret merged commit d4e5754 into hlsl_path_tracer on Nov 14, 2025 (1 check failed).