Image and Video Denoiser on CUDA
Image and video denoisers are used in many applications. We have developed CUDA-accelerated denoiser kernels that run on existing NVIDIA CUDA hardware. They remove both luma and chroma noise and deliver high performance for image and video processing.
CUDA Denoiser Library Features
- Input format: 8/10/12/14/16-bit per channel input data array from CPU or GPU memory
- Output format: 24/48-bit output data array in CPU or GPU memory
- Denoising with 16/32-bit accuracy
- Denoising algorithms:
  - Wavelet denoiser (RAW and RGB): CDF 5/3 and CDF 9/7 with hard, soft, or garrote thresholding
  - Bilateral denoiser
  - NLM denoiser
- Compatibility with FastVCR software for machine vision cameras
- Timing and performance measurements
- Compatibility with Windows-10/11, Linux Ubuntu and L4T (Jetson)
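The three thresholding modes named in the wavelet denoiser entry above can be illustrated on a single wavelet coefficient. This is a minimal sketch using the textbook definitions of hard, soft, and non-negative garrote thresholding. The function names and the sample values are hypothetical and are not part of the library's API, and the CUDA kernels may implement these rules differently.

```python
import math


def hard_threshold(x, t):
    """Keep the coefficient unchanged if its magnitude exceeds t, else zero it."""
    return x if abs(x) > t else 0.0


def soft_threshold(x, t):
    """Shrink the magnitude by t and zero anything at or below the threshold."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def garrote_threshold(x, t):
    """Non-negative garrote: zero small coefficients, shrink large ones by t^2 / x."""
    return x - t * t / x if abs(x) > t else 0.0


if __name__ == "__main__":
    threshold = 80.0  # illustrative value only
    for coeff in (-200.0, -50.0, 50.0, 100.0, 400.0):
        print(
            coeff,
            hard_threshold(coeff, threshold),
            soft_threshold(coeff, threshold),
            garrote_threshold(coeff, threshold),
        )
```

Hard thresholding leaves surviving coefficients untouched, soft thresholding shrinks every surviving coefficient by the threshold, and garrote sits between the two: it behaves like soft thresholding near the threshold and approaches hard thresholding for large coefficients.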
Benchmarks for fast image and video denoiser on CUDA
Image resolution: 4112×2176 (8.9 MPix), 16-bit per channel, RGB
Test description: all data in GPU memory, timing includes GPU computations only
Wavelet transform: CDF 9/7
Number of DWT resolution levels: up to 7
DWT thresholds for YCbCr: 80;150;150
NLM denoiser parameters: windows 3×3 and 3×3, strength 500
Bilateral denoiser parameters: 3×3, sigmaColor 5, sigmaSpace 500
Software: OS Windows-10, CUDA-12.6
Hardware: NVIDIA GeForce RTX 4090
- RAW DWT denoiser - 1.8 ms (4.9 GPix/s)
- DWT denoiser (YCbCr, 4:4:4) - 3.05 ms (2.9 GPix/s)
- NLM denoiser (RGB) - 0.19 ms (40 GPix/s)
- NLM denoiser (YCbCr, 4:2:0) - 0.20 ms (40 GPix/s)
- NLM denoiser (YCbCr, 4:4:4) - 0.37 ms (21 GPix/s)
- Bilateral denoiser (RGB) - 0.13 ms (61 GPix/s)
These results are comparable to the processing time of our best MG debayer algorithm, which takes around 0.6 ms (13 GPix/s) for the same image on the same GPU.
We designed this software as part of our GPU Image & Video Processing SDK. Our customers can now use the GPU-accelerated denoisers in their applications as part of a general image processing pipeline.
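For readers who want to relate the timings to the throughput figures, throughput is the pixel count divided by the processing time. The sketch below applies this to the two DWT entries. The helper name is hypothetical, and it assumes the published figures are rounded and may be derived from unrounded timings, so small differences from the listed values are to be expected.

```python
def throughput_gpix_per_s(width, height, time_ms):
    """Return throughput in gigapixels per second for one frame processed in time_ms."""
    pixels = width * height
    seconds = time_ms * 1e-3
    return pixels / seconds / 1e9


if __name__ == "__main__":
    WIDTH, HEIGHT = 4112, 2176  # benchmark image resolution
    timings_ms = [
        ("RAW DWT denoiser", 1.8),
        ("DWT denoiser (YCbCr, 4:4:4)", 3.05),
    ]
    for name, time_ms in timings_ms:
        rate = throughput_gpix_per_s(WIDTH, HEIGHT, time_ms)
        print(f"{name}: {time_ms} ms, {rate:.2f} GPix/s")
```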
Testing
To test our CUDA denoisers, please download the FastVCR software.
CUDA-based denoising roadmap
- Acceleration of NLM and Bilateral denoisers - done
- Temporal denoiser on CUDA - in progress