Application of 3D Gaussian Splatting for Cinematic Anatomy on Consumer Class Devices (2024)


Teaser. Left: Path-traced image of a 36 GB HiP-CT data set, rendered in 95 seconds on a high-end GPU. Right: Gaussian splat representation of the same data set, requiring 69 MB and rendered at 60 frames per second. The pixel resolution is 2048×2048.

S. Niedermayr1\orcid0009-0008-3370-0149, C. Neuhauser1\orcid0000-0002-0290-1991, K. Petkov2\orcid0009-0008-1914-4625, K. Engel2\orcid0009-0001-1423-898X, R. Westermann1\orcid0000-0002-3394-0731
1Technical University of Munich, 2Siemens Healthineers

Abstract

Interactive photorealistic rendering of 3D anatomy is used in medical education to explain the structure of the human body. It is currently restricted to frontal teaching scenarios, where even with a powerful GPU and high-speed access to a large storage device hosting the data set, interactive demonstrations can hardly be achieved. We present the use of novel view synthesis via compressed 3D Gaussian Splatting (3DGS) to overcome this restriction and to enable students to perform cinematic anatomy even on lightweight and mobile devices. Our proposed pipeline first finds a set of camera poses that captures all potentially seen structures in the data. High-quality images are then generated with path tracing and converted into a compact 3DGS representation, consuming < 70 MB even for data sets of multiple GBs. This allows for real-time photorealistic novel view synthesis that recovers structures up to the voxel resolution and is almost indistinguishable from the path-traced images.

CCS Concepts: Computing methodologies → Computer graphics; Rendering

1 Introduction

Cinematic Anatomy (CA) is an immersive anatomy learning application, designed to improve anatomy education through the use of photorealistic 3D rendering via path tracing [CEGM16]. Instead of real 3D anatomy models, it utilizes volume data provided by medical scanning devices. The application is used in the field of anatomy education, e.g., in the JKU medSPACE [JKU], a lecture space for teaching anatomy. It is used for teaching the diverse and complex individual human anatomy, anatomical variations, and pathology, to enhance learners' competency with immersive photorealistic 3D visualization of data from real patients. Several studies have shown the benefits of using photorealistic volumetric rendering of clinical volume data for teaching and understanding anatomy [GES∗18, BKE∗19, SWS∗22].

In addition to stereoscopic projection modes for frontal education, students need to run CA on their mobile devices for personalized learning experiences. Besides significant performance losses on such devices, the portability of the created content is often limited by the data size, especially when employing data from high-resolution imaging modalities like photon-counting CT, 7 Tesla MRI, and phase-contrast CT [WTW∗21]. Thus, CA is mostly used in frontal teaching scenarios, where the demonstrator uses a powerful GPU and has high-speed access to a large storage device where the data set is stored. As the teaser figure demonstrates, even then it is difficult to render the data at interactive rates.

We demonstrate that differentiable 3DGS [KKLD23], which reconstructs a 3D Gaussian scene representation from images of this scene, can address the limitations of CA. The Gaussian representation can be rendered at high speed from arbitrary views, avoiding time-consuming path tracing. In combination with compressed 3DGS [NSW24], the memory consumption of the Gaussian representation is significantly reduced, and with GPU rasterization, 3D Gaussian splatting runs efficiently even on mobile devices. The teaser figure demonstrates these properties with a high-resolution CT scan.

Contribution. To use compressed 3DGS for CA, we propose a processing pipeline including the following adaptations:

  • We extend the view selection method proposed by Kopanas and Drettakis [KD23] to volume rendering, to automatically find a set of cameras that captures all potentially seen structures under the current transfer function setting.

  • We extend 3DGS with differentiable alpha channel rendering to create background-free reconstructions and drastically improve the reconstruction of translucent materials.

We analyze the quality, performance, and memory requirements with several high-resolution data sets. Training images are rendered with a publicly available CA tool. The results demonstrate that the memory requirement is significantly below the initial data size. Since the renderable representation is so small, students can quickly download it over low-bandwidth channels and render it on their mobile devices. Rendering performance is about two orders of magnitude faster than optimized path tracing, with almost no perceptible loss of image quality.

Limitations. The use of 3DGS for CA comes with the following limitations: Firstly, lighting conditions are baked into the 3D Gaussian representation and cannot be changed during rendering. Secondly, due to the use of preset transfer functions and clip planes, the approach is less effective in supporting interactive volume exploration. Overcoming these limitations is difficult, and we discuss possible improvement strategies at the end of our work.


2 Related Work

3DGS [KKLD23] builds upon elliptical weighted average (EWA) volume splatting [ZPVBG01] to efficiently compute the projections of 3D Gaussian kernels onto the 2D image plane. In addition, the number and parameters of the Gaussian kernels that are used to model the scene are optimized with differentiable rendering. Mip-Splatting [YCH∗23] modifies 3DGS by integrating anti-aliasing with a 3D smoothing filter and a 2D Mip filter. It achieves improved quality of novel views at scales the Gaussian representation has not been optimized for. A number of approaches have concurrently proposed to convert the 3D Gaussian representations generated by 3DGS into a more compact form [NSW24, LRS∗24]. For typical scenes, the memory requirement of 3DGS is then below 50 MB without any noticeable differences in the reconstructed images.

3DGS for novel view synthesis overcomes in particular the difficulties of voxel-based approaches [MST∗20, FKYT∗22] to deal with sparsity. Even though adaptive hash grids [MESK22], tensor decomposition [CXG∗22] or variants using dedicated compression schemes [LSW∗23, RLN∗23] can effectively reduce the required memory, they use volume ray-casting and, thus, require high-end GPUs to achieve reasonable rendering performance. The same limitation holds for differentiable volume rendering [WW22], which, similar in spirit to 3DGS, optimizes optical properties on a dense voxel grid using image-based loss functions.

3DGS optimizes a 3D scene representation from photorealistic images of that scene, which in our case are generated with volumetric path tracing [PJH23, NSJ14, NGHJ18, NDSRJ20]. A number of approaches have previously attempted to improve the performance of path tracing via image denoising [HMES20, JLM∗23, IGMM22], photon mapping [YYS∗23], illumination caching [vLMB23], and adaptive temporal sampling [MHK∗19]. Despite achieving remarkable performance gains at high quality, all these approaches require a rendering system with enormous memory resources to host high-resolution data sets, as well as huge computational power to perform ray tracing with such data. It is fair to say that high-quality path tracing on consumer-class hardware is impossible today.

In principle, the memory requirements of CA can be addressed with Scene Representation Networks (SRNs) [MON∗19, CZ19, PFS∗19], i.e., fully-connected neural networks that learn to encode a surface model as an implicit 3D function. Lu et al. [LJLB21] demonstrate the use of SRNs for volume data compression, by overfitting a network to a volume data set. This approach, however, comes at the expense of subsequent network evaluations during rendering, which makes even GPU-friendly ray-marching implementations [WHW22] significantly slower than 3DGS.

A challenging problem in novel view synthesis is to find a set of camera poses that is as small as possible for generating the training and test images. Note that this problem is different from viewpoint optimization in visualization, where visualization parameters are optimized to find a single best viewpoint, e.g., by using entropy-based [JS06, VMN08, TLB∗09, CJ10, WHW22] or similarity-based [TWC∗16, YLLY19] loss functions that guide an optimizer. For SRNs, Kopanas and Drettakis [KD23] have introduced an algorithm to automatically optimize the placement of cameras so that improved coverage of a scene is achieved. We adapt this algorithm to work with volumetric data sets.

3 Cinematic Anatomy Pipeline

The different stages of the proposed CA pipeline are shown in Fig. 1. After loading a data set, one or multiple so-called presets are selected by the user. A preset includes the transfer function setting as well as material classifications and fixed clip planes that are used to reveal certain anatomical structures.

For each preset, multiple views capturing all potentially seen structures in the data are computed (cf. Section 3.1). In this way, we recover structures in the final object representation which are not seen when generating images with camera positions on a surrounding sphere. These views are handed over to a physically-based renderer, i.e., a volumetric path tracer, which renders one image for every view using the corresponding preset (cf. Section 3.2).

Once the images for a selected preset have been rendered, 3DGS generates a set of 3D Gaussian splats with shape and appearance attributes so that their rendering matches the given images. Once the parameters of the Gaussians are computed via differentiable rendering, they are compressed using sensitivity-aware vector quantization and entropy encoding (cf. Section 3.3). The final compressed 3DGS representation is rendered with WebGPU using GPU sorting and rasterization of projected 2D splats, with a pixel shader that evaluates and blends the 2D projections in image space. We embed Mip-Splatting [YCH∗23] to account for different levels of detail and enable smooth transitions when the focal length is increased.

3.1 View Selection

Novel view synthesis requires that all visible scene parts are covered in the training images. Kopanas and Drettakis [KD23] propose an automatic camera placement algorithm for SRNs which maximizes observation frequency and angular uniformity. The observation frequency lies between 0 (no camera observes a point) and 1 (all cameras observe a point). The angular uniformity considers the total variation distance between a uniform distribution and a 2D histogram, in spherical coordinates, of the directions of the cameras observing a point. The algorithm iterates over batches of 1000 randomly sampled camera poses. In each iteration, cameras lying inside of or too close to occupied space are rejected. From the remaining camera poses, the one resulting in the highest improvement of the reconstruction is selected, measured by observation frequency and angular uniformity.
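As an illustration, the two per-point criteria can be sketched as follows. This is a hypothetical sketch, not the authors' implementation; the function names and the histogram resolution are our own choices:

```python
import numpy as np

def observation_frequency(point_visibility):
    """Fraction of cameras that observe the point: 0 = unseen, 1 = seen by all."""
    return np.mean(point_visibility)

def angular_uniformity(directions, bins=8):
    """1 minus the total variation distance between the 2D histogram of
    observation directions (in spherical coordinates) and a uniform histogram."""
    theta = np.arccos(np.clip(directions[:, 2], -1.0, 1.0))  # polar angle
    phi = np.arctan2(directions[:, 1], directions[:, 0])     # azimuth
    hist, _, _ = np.histogram2d(theta, phi, bins=bins,
                                range=[[0.0, np.pi], [-np.pi, np.pi]])
    p = hist.flatten() / hist.sum()
    u = np.full_like(p, 1.0 / p.size)
    tvd = 0.5 * np.abs(p - u).sum()  # total variation distance
    return 1.0 - tvd
```

A candidate camera is then scored by how much it improves these two quantities, aggregated over all (not yet sufficiently seen) points.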

We propose two modifications of this algorithm for 3DGS of volumetric data sets. Firstly, we observe that due to the high dimensionality of the search space, a huge number of cameras needs to be evaluated and optimal camera poses might be missed. To avoid this, we use Bayesian Optimal Sampling (BOS) [Moc89, Gar23] to adaptively place cameras in regions that are more promising to yield an improved maximum (called exploitation) or in previously rather unexplored regions (called exploration). In this way, the chance of missing optimal camera poses is significantly reduced, and fewer training images need to be generated. Secondly, instead of using a binary visibility indicating whether a point lies inside or outside the camera frustum, we use a continuous visibility that also considers (partial) occlusion. At each voxel, GPU ray marching is performed to compute the maximum transmittance, which is then used as a visibility indicator. For data sets not fitting into GPU memory, we perform this step with a lower-resolution copy.
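The continuous visibility can be illustrated with a short ray-marching sketch; the extinction lookup `volume_sigma` and the fixed step size are illustrative assumptions, not the paper's GPU implementation:

```python
import numpy as np

def transmittance_to_camera(volume_sigma, point, camera, step=1.0):
    """Continuous visibility of a point: transmittance accumulated by ray
    marching from the point towards the camera (1 = fully visible,
    0 = fully occluded). volume_sigma maps a 3D position to extinction."""
    point = np.asarray(point, dtype=float)
    camera = np.asarray(camera, dtype=float)
    direction = camera - point
    dist = np.linalg.norm(direction)
    direction /= dist
    optical_depth = 0.0
    t = step
    while t < dist:
        optical_depth += volume_sigma(point + t * direction) * step
        t += step
    # Beer-Lambert law: T = exp(-integral of extinction along the ray)
    return np.exp(-optical_depth)
```

A binary frustum test would return 0 or 1; this value varies smoothly with partial occlusion, which is what makes it useful as an optimization signal.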

BOS applies a probabilistic (usually Gaussian) surrogate model and an acquisition function. The former expresses the Bayesian belief about the output of the objective function derived from prior evaluations, and the latter is used for selecting the next set of parameters for evaluating the objective function. For the acquisition function, the upper confidence bound [BCdF10] is used with parameter κ = 10, which controls the trade-off between exploitation and exploration. This value was empirically determined to work well with all test data sets, but it could also be subjected to hyperparameter optimization, either regarding the energy term by Kopanas and Drettakis [KD23] or the reconstruction quality with respect to images in a training set. We make use of the publicly available software library Limbo [CCAM18] to perform the optimization process.
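The surrogate-plus-acquisition loop can be sketched with a minimal Gaussian-process surrogate. This is our own simplified stand-in for what Limbo provides; the RBF kernel and its hyperparameters are illustrative:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def ucb_next_pose(X, y, candidates, kappa=10.0, noise=1e-6):
    """Select the next camera pose by the upper confidence bound
    mu(x) + kappa * sigma(x) under a GP surrogate fitted to the
    previously evaluated poses X with objective values y."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(candidates, X)
    K_inv = np.linalg.inv(K)
    mu = Ks @ K_inv @ y                                   # posterior mean
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, K_inv, Ks)   # posterior variance
    score = mu + kappa * np.sqrt(np.maximum(var, 0.0))
    return candidates[np.argmax(score)]
```

With a large κ such as 10, the variance term dominates for poses far from all previous evaluations, so the optimizer explores unvisited regions before refining the current best pose.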

In Fig. 2, we showcase the ability of automatic view selection using BOS to improve the capturing of internal structures of a concave 2D test data set. When randomly sampling cameras on the hemisphere around an object, structures in the interior are missed. When sampling only a small set of cameras on the hemisphere, BOS selects few additional camera poses which capture insufficiently seen structures on the outside and not yet captured structures on the inside. We demonstrate the resulting improvements in the reconstruction quality with 3DGS in Section 4.3. We further compare the convergence rate and performance of BOS and random sampling [KD23] in the supplementary material.


3.2 Image Generation

We render volume data with Monte Carlo volume path tracing from multiple views to generate a set of training images. Delta tracking [WMHL65] is used to determine a scattering event (the Henyey-Greenstein phase function [HG41] is applied to determine the next ray direction), an absorption event (the path is terminated and the emissive color is regarded as the path contribution), or a null collision (the path is followed unchanged). A surface intersection is assumed when the density gradient magnitude exceeds a user-specified iso-value. Global illumination is then simulated by generating a reflection event for the surface, with the new ray direction sampled proportional to the probability density function of a chosen reflectance distribution function. This process is repeated until the ray leaves the volume domain or an absorption event happens.
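The collision sampling can be sketched as follows. This is a simplified version that parameterizes extinction along the ray distance only; the classification of a real collision into scattering, absorption, or surface reflection is omitted:

```python
import math
import random

def delta_tracking(sigma_t, sigma_max, t_max, rng=random.random):
    """Sample the next interaction along a ray with delta tracking
    (Woodcock tracking): draw exponential free-flight distances using
    the majorant extinction sigma_max; accept a tentative collision as
    real with probability sigma_t(t) / sigma_max, otherwise treat it
    as a null collision and continue marching."""
    t = 0.0
    while True:
        t -= math.log(1.0 - rng()) / sigma_max   # exponential free-flight step
        if t >= t_max:
            return ('escaped', t)                # ray left the volume
        if rng() < sigma_t(t) / sigma_max:
            return ('collision', t)              # real event at distance t
        # null collision: majorant overestimated the density here, keep going
```

At a real collision, the renderer then decides between scattering (Henyey-Greenstein direction sampling), absorption, or a surface reflection event as described above.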

High dynamic range light maps are employed to look up lighting information from the environment. Next event estimation is used to importance-sample rays towards the light source and potentially reduce the variance of the rendered image. All Monte Carlo samples are accumulated and averaged in a floating-point accumulation buffer. A tone-mapping pass maps the accumulated result into the final lower dynamic range output buffer. For fast image generation, we apply performance optimization methods such as empty-space skipping (based on the transfer function preset) and memory-coherent scattering. The latter optimization ensures that rays of neighboring pixels are scattered in the same direction, thus ensuring optimized cache utilization.

3.3 Compressed Differentiable 3D Gaussian Splatting

Differentiable 3DGS describes an object by a set of 3D Gaussians

G(x) = α exp(−½ xᵀ Σ⁻¹ x).   (1)

Each Gaussian is centered at x ∈ ℝ³, and the covariance matrix Σ ∈ ℝ^{3×3} describes its orientation and shape. A Gaussian has an opacity α ∈ [0, 1] and a view-dependent color that is represented by a set of spherical harmonics (SH) coefficients.

The 2D projection of a 3D Gaussian is a 2D Gaussian with a covariance that is derived from the view transformation matrix and the Jacobian of the affine approximation of the projective transformation. The scene is rendered by projecting all Gaussians into the image plane in sorted order and blending their contributions.
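Following EWA splatting, the screen-space covariance is Σ' = J W Σ Wᵀ Jᵀ, where W is the rotational part of the view transform and J the Jacobian of the local affine approximation of the perspective projection. A minimal sketch, assuming a pinhole camera with equal focal lengths and illustrative parameter names:

```python
import numpy as np

def project_covariance(cov3d, view_rot, mean_view, focal):
    """2D covariance of a projected 3D Gaussian: cov2d = J W cov3d W^T J^T.
    view_rot is the 3x3 rotational part of the view matrix, mean_view the
    Gaussian center in view space (camera looking along +z)."""
    x, y, z = mean_view
    # Jacobian of the perspective projection, linearized at the center
    J = np.array([[focal / z, 0.0, -focal * x / z**2],
                  [0.0, focal / z, -focal * y / z**2]])
    T = J @ view_rot
    return T @ cov3d @ T.T
```

The resulting 2×2 covariance defines the elliptical footprint that the rasterizer evaluates and alpha-blends per pixel.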

While Zwicker et al. [ZPVBG01] model a 3D scalar field via a set of 3D Gaussians so that the field can be reconstructed sufficiently well, Kerbl et al. [KKLD23] optimize the position, shape, opacity, and SH coefficients of each 3D Gaussian so that their rendering matches a set of initial images of the object. The optimization is performed via differentiable rendering, by taking into account the changes in pixel color due to changes of the 3D Gaussian parameters. The optimization process removes some of the initially selected 3D Gaussians (if they do not contribute), adaptively splits Gaussians, and modifies their shapes and appearance attributes to minimize an image-based loss function.

To further reduce the memory consumption of 3DGS, we utilize the compression proposed by Niedermayr et al. [NSW24]. It encodes SH coefficients and Gaussian shape parameters into compact codebooks via sensitivity-aware vector quantization, and then fine-tunes the parameters on the training images. Quantization-aware training [RORF16] is used to represent the scene parameters with fewer bits. We call this strategy High-Rate compression (HR-compression). We also provide an option that uses only quantization-aware training to reduce all scene parameters but the Gaussians' positions to an 8-bit representation during optimization. We subsequently call this strategy High-Quality compression (HQ-compression). Since CA requires the entire data set in focus, we can omit the scaling factor that is usually stored per Gaussian to represent scenes with objects in focus and a surrounding background.
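As an illustration of the 8-bit option, a round-trip "fake quantization" of the kind used in quantization-aware training can be sketched as follows. This is our simplification; the actual pipeline uses sensitivity-aware codebooks and fine-tuning on the training images:

```python
import numpy as np

def fake_quantize(params, num_bits=8):
    """Map parameters to num_bits-integer codes and back. During
    quantization-aware training, the dequantized values are used in the
    forward pass while gradients bypass the rounding (straight-through
    estimator, not shown here)."""
    lo, hi = float(params.min()), float(params.max())
    if hi == lo:
        return params.copy()
    scale = (hi - lo) / (2**num_bits - 1)
    codes = np.round((params - lo) / scale)   # stored as small integers
    return codes * scale + lo                 # dequantized parameters
```

Training against the quantized values lets the optimizer compensate for the rounding error, so the final 8-bit parameters lose little quality.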

Alpha Channel Reconstruction. In contrast to classical novel view synthesis, where only RGB colors are reconstructed, in volume rendering applications the per-pixel accumulated opacity (i.e., alpha) also needs to be reconstructed in order to blend correctly over the background. Therefore, we extend 3DGS to allow for differentiable rendering of images with an alpha channel. We use a combination of per-pixel L1 and SSIM loss to accurately reconstruct the alpha channel of the volume-rendered training images. As shown in Section 4.4, the reconstruction quality can be improved significantly in this way.
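The combined alpha objective can be sketched as follows; for brevity this uses a global (non-windowed) SSIM and an illustrative weighting, unlike a full windowed SSIM implementation:

```python
import numpy as np

def alpha_loss(pred, gt, lam=0.2, c1=0.01**2, c2=0.03**2):
    """Loss on the rendered alpha channel: per-pixel L1 plus a simplified,
    globally computed SSIM term, combined as (1-lam)*L1 + lam*(1-SSIM)."""
    l1 = np.abs(pred - gt).mean()
    mp, mg = pred.mean(), gt.mean()            # means
    vp, vg = pred.var(), gt.var()              # variances
    cov = ((pred - mp) * (gt - mg)).mean()     # covariance
    ssim = ((2 * mp * mg + c1) * (2 * cov + c2)) / \
           ((mp**2 + mg**2 + c1) * (vp + vg + c2))
    return (1 - lam) * l1 + lam * (1 - ssim)
```

The L1 term drives per-pixel accuracy while the SSIM term penalizes structural deviations such as background bleeding into silhouettes.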

4 Results and evaluation

We analyze the performance, memory consumption, and quality of the proposed pipeline for CA with a variety of high-resolution medical data sets showing different anatomical structures. Our 3DGS implementation is a modification of the code provided by Kerbl et al. [KKLD23]. For compression and rendering, we use the settings described by Niedermayr et al. [NSW24].

4.1 Data Sets

The hierarchical phase-contrast tomography (HiP-CT) data was acquired at the European Synchrotron Radiation Facility (ESRF) in the context of the Human Organ Atlas project[WTW∗21].

Kidney is a HiP-CT scan from beamline 5 of the complete left kidney from body donor LADAF-2020-27, downsampled to 50.16 µm resolution (1510 × 1706 × 1415 voxels) and quantized to 8-bit precision.

Brain is a HiP-CT scan from beamline 18 of the complete brain of body donor LADAF-2021-17, downsampled for rendering to 46.84 µm resolution (3224 × 3224 × 3585 voxels) and quantized to 8-bit precision. While the kidney data set is publicly available, the brain data has not been published yet.

Body is a human CT angiography scan at resolution 317 × 317 × 835 from the collection by Wasserthal [Was23], image id s0287. The data set contains some semi-transparent material showing significant differences under directional lighting. We render it under complex lighting conditions to challenge 3DGS's reconstruction capabilities.


We show all data sets in Fig. 3, and provide an interactive online demonstration at https://anonymous-demo-user.github.io/cinematic-3dgs/. For each data set, between one and three presets have been used, including segmentations, transfer functions, and lighting conditions. 3DGS optimization has been performed on training images of resolution 2048 × 2048.

4.2 Preprocessing

With a GPU providing sufficient RAM, the initial images of all data sets can be generated with the publicly available CA package, using the built-in animation system to generate the views. We have used a research version providing batch rendering support, running on an NVIDIA A100 GPU for Brain and an NVIDIA RTX A5000 for Kidney and Body.

           Path Tracing                3DGS (HR-Compression)
           Size      Time     Views    Size    Time     Gaussians
Brain      36.4 GB   158 Min  99       69 MB   106 Min  4.8 M
Kidney     3.6 GB    6 Min    101      33 MB   53 Min   2.3 M
Body       0.2 GB    23 Min   99       7 MB    50 Min   0.9 M

Table 1 compares, in the Size columns, the size of each data set in GB to the size of the final Gaussian representation in MB when compressed using HR-compression. Column Views shows the number of training images used for differentiable Gaussian splatting optimization. The Time columns show the times to render the initial images via path tracing and the computation times for generating the compressed Gaussian representations. Note that about 90% of the latter time is required by the optimization to generate the 3D Gaussian representation, and only about 10% is consumed by the compression. Column Gaussians gives the number of 3D Gaussians in the final representation. As can be seen in the Size columns, the compressed Gaussian representation is so small that it can be downloaded over low-bandwidth channels and rendered on mobile devices equipped with mid- or even low-end GPUs.

4.3 View Selection

Automatic view selection is demonstrated with Body, which exhibits many structures that are not visible from cameras placed on an ellipsoid around the volume. As a baseline, we reconstruct the volume with images from 256 randomly placed cameras on the ellipsoid. For comparison, we reduce this number to 128 and generate 128 additional cameras with the proposed view selection algorithm (see Fig. 4). As can be seen, overall improved reconstruction quality of parts not seen with random camera selection is achieved.


4.4 Quality Evaluation

Fig. 5 compares test images that have not been seen during 3DGS optimization to images rendered with HQ- and HR-compressed 3DGS. Close-up views reveal only subtle color shifts between path-traced images and images generated via HR-compressed 3DGS. HQ-compression increases the memory requirement by a factor of three, yet differences in image quality are further reduced and become so small that they are hardly noticeable by eye.

Notably, significant losses in reconstruction quality are introduced when differentiable 3DGS optimizes only for RGB color (see Fig. 6 for an example). Extending 3DGS so that opacity is also considered in the optimization process greatly improves the reconstruction quality and removes unwanted artifacts caused by the background.

Table 2 shows the average SSIM and PSNR between the test images and the novel views rendered with HR-compressed 3DGS, averaged over all presets. For PSNR and SSIM, only pixels which are not empty (alpha > 0) in the rendered and ground-truth image are considered. PSNR (Alpha) measures the PSNR for the alpha channel between rendered images and ground truths.

Scene     SSIM   PSNR    PSNR (Alpha)
Brain     0.72   23.23   34.09
Kidney    0.84   25.80   30.20
Body      0.87   26.90   29.57
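The masked PSNR described above can be sketched as follows (an illustrative formulation, assuming images normalized to [0, 1]):

```python
import numpy as np

def masked_psnr(pred, gt, pred_alpha, gt_alpha, max_val=1.0):
    """PSNR restricted to pixels that are non-empty (alpha > 0) in both
    the rendered and the ground-truth image, so that empty background
    pixels do not inflate the score."""
    mask = (pred_alpha > 0) & (gt_alpha > 0)
    mse = ((pred[mask] - gt[mask]) ** 2).mean()
    return 10.0 * np.log10(max_val**2 / mse)
```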

We further shed light on the capabilities of 3DGS to reconstruct semi-transparent regions in a data set. Body is used with a preset in which certain tissue types in the data set become semi-transparent, see Fig. 7. While overall the novel view matches the test images fairly well, the close-up views show that some fine details are not reconstructed accurately; in particular, the semi-transparent structures are blurred out. This effect increases with increasing depth complexity, since it becomes more and more difficult for 3DGS to represent all possible color and opacity distributions accurately.


When using a preset with strong directional lighting from the environment map, one observes high-frequency illumination variations, especially in the volumetric regions. This makes it more difficult for 3DGS to accurately recover the tissue structures. Interestingly, Fig. 8 demonstrates that the reconstruction works very well and does not show any severe reconstruction artifacts. At the same time, the semi-transparent regions are again blurred out to a certain extent. We believe that 3DGS has particular problems with settings where view rays accumulate matter over a long distance through semi-transparent yet heterogeneous regions. In such situations, a subtle change of the camera pose can lead to strong changes of the per-pixel accumulated colors and opacities. Thus, 3DGS needs to optimize a significantly increased number of parameters, requiring far more Gaussians to accurately represent the data.


4.5 Rendering Performance

The WebGPU implementation by Niedermayr et al. [NSW24] is used for performance testing. It enables rendering of compressed 3DGS up to 4× faster than the renderer by Kerbl et al. [KKLD23] and runs in a modern browser.

Table 3 shows that the rendering rate even on an integrated GPU is higher than 10 frames per second for the biggest data set, Brain. On current mid- to high-end GPUs, 60 frames per second can be achieved for all data sets. This makes the CA pipeline especially appealing for applications where stereoscopic rendering is required. While low memory consumption facilitates efficient rendering on mobile devices, for instance, in mobile AR applications, high rendering performance is required to render two images (one for the left and one for the right eye) at sufficient frame rates. In a supplementary video we demonstrate rendering performance of roughly 5 to 20 frames per second on a mobile device with a Qualcomm Adreno 740 GPU.

                            Brain   Body   Kidney
NVIDIA RTX 4070 Ti Super    65      226    170
NVIDIA RTX A5000            68      341    199
AMD Ryzen 9 7900X iGPU      12      42     16

5 Discussion and Outlook

Our experiments show that compressed 3DGS enables interactive CA with extremely large data sets by restricting rendering to static presets. We believe that this limitation is acceptable for educational use, since usually not more than a few presets are selected. Since the memory requirement of compressed 3DGS is so low, a separate Gaussian representation can be computed for each preset.

In all experiments we have simulated static lighting conditions with an environment map that does not change relative to the object. Thus, the objects are seen under the same lighting condition in every view, resulting in rather smooth illumination when changing the camera pose. This, however, changes when a headlight is used and a point's illumination varies with the camera pose (see Fig. 9). Notably, while in this situation most regions can be resolved very well by 3DGS, in some other regions the novel views show reconstruction artifacts. The strong variation of the reflected light under an illumination that changes in every image cannot be captured well by 3DGS. One approach we see to address this limitation is re-lighting: by generating training images with optical material properties instead of illumination, it might be possible to better recover highly varying lighting conditions at runtime.


A useful component for volume exploration is an interactive clip plane, which causes previously unseen object points to become visible. One possibility to include clip planes is to restrict the plane movement to discrete steps and compute a separate Gaussian representation for each step. While this will significantly increase the memory requirements, we are confident that a fairly compact representation can be obtained by exploiting spatial coherence and progressively encoding the 3D Gaussians that appear and disappear between subsequent steps.

6 Conclusion

We have demonstrated the use of differentiable 3DGS for novel view synthesis from path-traced images of high-resolution medical data sets. We have shown that the 3D Gaussian representation can be compressed, at hardly perceivable loss in image quality, to a size that enables download and storage even on mobile devices. The Gaussian representation needs to be re-generated for every selected preset, yet even for many presets the overall memory is still significantly below the memory required by the data set. Computationally expensive path tracing can be avoided at rendering time, enabling fast display on mid- and even low-end devices.

We have also pointed out current limitations of 3DGS for CA. The most important ones we see are the current absence of support for clip planes and the quality degradation when a headlight is used. We have sketched future research directions to address these limitations, and we are confident that improvements can be achieved. There is also a pressing need to handle time-varying data sets, since more and more scanning technologies can accurately measure blood flow and deforming tissue. Tailoring 3DGS for interactively visualizing such dynamic processes is another important goal.

Finally, we want to mention that besides CA we see in-situ visualization as another promising application of 3DGS. For data sets which are simulated on a supercomputer and are so large that they cannot be streamed out, images of the data set can be generated directly on the supercomputer and then streamed out to a system where novel view synthesis is performed. By using advanced implementations of 3DGS optimization, this might even become possible at rates enabling an explorative visual analysis.

References

  • [BCdF10] Brochu E., Cora V. M., de Freitas N.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. CoRR (2010). arXiv:1012.2599.
  • [BKE19] Binder J., Krautz C., Engel K., Grützmann R., Fellner F. A., Burger P. H., Scholz M.: Leveraging medical imaging for medical education — a cinematic rendering-featured lecture. Annals of Anatomy - Anatomischer Anzeiger (2019). doi:10.1016/j.aanat.2018.12.004.
  • [CCAM18] Cully A., Chatzilygeroudis K., Allocati F., Mouret J.-B.: Limbo: A flexible high-performance library for Gaussian processes modeling and data-efficient optimization. The Journal of Open Source Software 3, 26 (2018), 545. doi:10.21105/joss.00545.
  • [CEGM16] Comaniciu D., Engel K., Georgescu B., Mansi T.: Shaping the future through innovations: From medical imaging to precision medicine. Medical Image Analysis (2016). doi:10.1016/j.media.2016.06.016.
  • [CJ10] Chen M., Jänicke H.: An information-theoretic framework for visualization. TVCG (2010).
  • [CXG22] Chen A., Xu Z., Geiger A., Yu J., Su H.: TensoRF: Tensorial radiance fields. In ECCV (2022).
  • [CZ19] Chen Z., Zhang H.: Learning implicit fields for generative shape modeling. In CVPR (2019).
  • [FKYT22] Fridovich-Keil S., Yu A., Tancik M., Chen Q., Recht B., Kanazawa A.: Plenoxels: Radiance fields without neural networks. In CVPR (June 2022).
  • [Gar23] Garnett R.: Bayesian Optimization. Cambridge University Press, 2023.
  • [GES18] Glemser P. A., Engel K., Simons D., Steffens J., Schlemmer H.-P., Orakcioglu B.: A new approach for photorealistic visualization of rendered computed tomography images. World Neurosurgery (2018). doi:10.1016/j.wneu.2018.02.174.
  • [HG41] Henyey L. G., Greenstein J. L.: Diffuse radiation in the galaxy. Astrophysical Journal (Jan. 1941). doi:10.1086/144246.
  • [HMES20] Hofmann N., Martschinke J., Engel K., Stamminger M.: Neural denoising for path tracing of medical volumetric data. ACM TOG (Aug. 2020). doi:10.1145/3406181.
  • [IGMM22] Iglesias-Guitian J. A., Mane P., Moon B.: Real-time denoising of volumetric path tracing for direct volume rendering. TVCG (2022). doi:10.1109/TVCG.2020.3037680.
  • [JKU] JKU: JKU medSPACE. https://ars.electronica.art/futurelab/en/projects-jku-medspace/. Accessed: 2024-05-12.
  • [JLM23] Jabbireddy S., Li S., Meng X., Terrill J. E., Varshney A.: Accelerated volume rendering with volume guided neural denoising. In EuroVis 2023 - Short Papers (2023), Hoellt T., Aigner W., Wang B. (Eds.), The Eurographics Association. doi:10.2312/evs.20231042.
  • [JS06] Ji G., Shen H.-W.: Dynamic view selection for time-varying volumes. TVCG (2006).
  • [KD23] Kopanas G., Drettakis G.: Improving NeRF quality by progressive camera placement for free-viewpoint navigation. In VMV (2023), The Eurographics Association. doi:10.2312/vmv.20231222.
  • [KKLD23] Kerbl B., Kopanas G., Leimkühler T., Drettakis G.: 3D Gaussian splatting for real-time radiance field rendering. ACM TOG (July 2023).
  • [LJLB21] Lu Y., Jiang K., Levine J. A., Berger M.: Compressive neural representations of volumetric scalar fields. CGF (2021).
  • [LRS24] Lee J. C., Rho D., Sun X., Ko J. H., Park E.: Compact 3D Gaussian representation for radiance field. In CVPR (2024).
  • [LSW23] Li L., Shen Z., Wang Z., Shen L., Bo L.: Compressing volumetric radiance fields to 1 MB. In CVPR (June 2023).
  • [MESK22] Müller T., Evans A., Schied C., Keller A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM TOG (July 2022). doi:10.1145/3528223.3530127.
  • [MHK19] Martschinke J., Hartnagel S., Keinert B., Engel K., Stamminger M.: Adaptive temporal sampling for volumetric path tracing of medical data. CGF (2019). doi:10.1111/cgf.13771.
  • [Moc89] Mockus J.: Bayesian Approach to Global Optimization: Theory and Applications. Springer Netherlands, 1989. doi:10.1007/978-94-009-0909-0.
  • [MON19] Mescheder L., Oechsle M., Niemeyer M., Nowozin S., Geiger A.: Occupancy networks: Learning 3D reconstruction in function space. In CVPR (2019).
  • [MST20] Mildenhall B., Srinivasan P. P., Tancik M., Barron J. T., Ramamoorthi R., Ng R.: NeRF: Representing scenes as neural radiance fields for view synthesis. In Computer Vision – ECCV 2020 (2020). doi:10.1007/978-3-030-58452-8_24.
  • [NDSRJ20] Nimier-David M., Speierer S., Ruiz B., Jakob W.: Radiative backpropagation: An adjoint method for lightning-fast differentiable rendering. Transactions on Graphics (Proceedings of SIGGRAPH) (2020). doi:10.1145/3386569.3392406.
  • [NGHJ18] Novák J., Georgiev I., Hanika J., Jarosz W.: Monte Carlo methods for volumetric light transport simulation. In Computer Graphics Forum (2018), vol. 37, Wiley Online Library, pp. 551–576.
  • [NSJ14] Novák J., Selle A., Jarosz W.: Residual ratio tracking for estimating attenuation in participating media. ACM Trans. Graph. 33, 6 (2014), 179–1.
  • [NSW24] Niedermayr S., Stumpfegger J., Westermann R.: Compressed 3D Gaussian splatting for accelerated novel view synthesis. In CVPR (2024).
  • [PFS19] Park J. J., Florence P., Straub J., Newcombe R., Lovegrove S.: DeepSDF: Learning continuous signed distance functions for shape representation. In CVPR (2019).
  • [PJH23] Pharr M., Jakob W., Humphreys G.: Physically Based Rendering: From Theory to Implementation. MIT Press, 2023.
  • [RLN23] Rho D., Lee B., Nam S., Lee J. C., Ko J. H., Park E.: Masked wavelet representation for compact neural radiance fields. In CVPR (2023).
  • [RORF16] Rastegari M., Ordonez V., Redmon J., Farhadi A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In ECCV (2016).
  • [SWS22] Steffen T., Winklhofer S., Starz F., Wiedemeier D., Ahmadli U., Stadlinger B.: Three-dimensional perception of cinematic rendering versus conventional volume rendering using CT and CBCT data of the facial skeleton. Annals of Anatomy - Anatomischer Anzeiger (2022). doi:10.1016/j.aanat.2022.151905.
  • [TLB09] Tao Y., Lin H., Bao H., Dong F., Clapworthy G.: Structure-aware viewpoint selection for volume visualization. In 2009 IEEE Pacific Visualization Symposium (2009), IEEE.
  • [TWC16] Tao Y., Wang Q., Chen W., Wu Y., Lin H.: Similarity voting based viewpoint selection for volumes. In CGF (2016), Wiley Online Library.
  • [vLMB23] Šmajdek U., Lesar Ž., Marolt M., Bohak C.: Combined volume and surface rendering with global illumination caching. The Visual Computer (June 2023). doi:10.1007/s00371-023-02932-9.
  • [VMN08] Vázquez P.-P., Monclús E., Navazo I.: Representative views and paths for volume models. In International Symposium on Smart Graphics (2008), Springer.
  • [Was23] Wasserthal J.: Dataset with segmentations of 117 important anatomical structures in 1228 CT images, Oct. 2023. doi:10.5281/zenodo.10047292.
  • [WHW22] Weiss S., Hermüller P., Westermann R.: Fast neural representations for direct volume rendering. CGF (2022). doi:10.1111/cgf.14578.
  • [WMHL65] Woodcock E. R., Murphy T., Hemmings P. J., Longworth T. C.: Techniques used in the GEM code for Monte Carlo neutronics calculations in reactors and other systems of complex geometry. Applications of Computing Methods to Reactor Problems (1965).
  • [WTW21] Walsh C., Tafforeau P., Wagner W., Jafree D., Bellier A., Werlein C., Kühnel M., Boller E., Walker-Samuel S., Robertus J., Long D., Jacob J., Marussi S., Brown E., Holroyd N., Jonigk D., Ackermann M., Lee P.: Imaging intact human organs with local resolution of cellular structures using hierarchical phase-contrast tomography. Nature Methods (Nov. 2021). doi:10.1038/s41592-021-01317-x.
  • [WW22] Weiss S., Westermann R.: Differentiable direct volume rendering. TVCG (2022). doi:10.1109/TVCG.2021.3114769.
  • [YCH23] Yu Z., Chen A., Huang B., Sattler T., Geiger A.: Mip-Splatting: Alias-free 3D Gaussian splatting. CoRR (2023). arXiv:2311.16493.
  • [YLLY19] Yang C., Li Y., Liu C., Yuan X.: Deep learning-based viewpoint recommendation in volume visualization. Journal of Visualization (2019).
  • [YYS23] Yuan Y., Yang J., Sun Q., Huang Y., Ma S.: Cinematic volume rendering algorithm based on multiple lights photon mapping. Multimedia Tools and Applications (May 2023). doi:10.1007/s11042-023-15075-9.
  • [ZPVBG01] Zwicker M., Pfister H., van Baar J., Gross M.: EWA volume splatting. In VIS (2001), IEEE. doi:10.1109/VISUAL.2001.964490.

Supplementary Material

A 3DGS Optimizations

A.1 Volume Guided Initialization

When using 3DGS, an initial set of 3D Gaussian kernels is first selected. These Gaussians are then removed, split, or re-positioned, and the shape and appearance of the Gaussian kernels is optimized. Kerbl et al. [KKLD23] obtain the initial positions of the 3D Gaussians from the given images via structure from motion, or via random initialization, where Gaussians are randomly positioned in the scene. For volume rendering, we randomly place Gaussians within the volume bounding box and set their initial color to grey. All other parameters are initialized as proposed by Kerbl et al. [KKLD23].

Since in Cinematic Anatomy the 3D object and presets are known, an interesting question is whether the optimization process can be accelerated by initially placing Gaussians at locations where they will end up anyway. To this end, we initially position one Gaussian at every non-empty voxel in a low-resolution version of the volume, and set the Gaussians' initial colors and opacities via the transfer function. Regions that are under-sampled by this initial placement will nevertheless be represented by Gaussians due to adaptive splitting and relocation during optimization.
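The initialization scheme above can be sketched as follows. The `transfer_function` callable and the strided subsampling are simplifying assumptions for illustration, not the exact implementation used in our experiments:

```python
import numpy as np

def volume_guided_init(volume, transfer_function, downsample=4, eps=1e-3):
    """Sketch of volume-guided 3DGS initialization: place one Gaussian at
    every non-empty voxel of a low-resolution version of the volume, with
    initial color and opacity taken from the transfer function.

    volume:            3D array of normalized densities in [0, 1]
    transfer_function: maps a density to an (r, g, b, opacity) tuple
    Returns positions (N, 3) in voxel coordinates, colors (N, 3),
    and opacities (N,).
    """
    # Low-resolution version via strided subsampling (a box filter would
    # be a natural alternative).
    low = volume[::downsample, ::downsample, ::downsample]
    # Classify the low-resolution voxels through the transfer function.
    rgba = np.array([transfer_function(d) for d in low.ravel()])
    rgba = rgba.reshape(*low.shape, 4)
    # A voxel is "non-empty" if the transfer function assigns it
    # non-negligible opacity.
    mask = rgba[..., 3] > eps
    idx = np.argwhere(mask).astype(np.float32)
    positions = idx * downsample          # back to full-resolution coordinates
    colors = rgba[mask][:, :3]
    opacities = rgba[mask][:, 3]
    return positions, colors, opacities
```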

In Fig. 10 we compare the effectiveness of the different initialization schemes based on optimization convergence for one of our test data sets. An initialization with the Gaussians' positions and colors from a previous reconstruction is used as gold standard. As can be seen, while all initialization techniques reach the same level of fidelity, volume-guided initialization does so in fewer iteration steps. However, it is fair to say that in all of our experiments the performance improvements were not significant overall, so we decided to use random initialization in all tests.

[Fig. 10: Optimization convergence for the different initialization schemes.]
Table 4:

| Scene  | Preset | Resolution | Duration | Size   | Points | Train Images | Test Images | SSIM | PSNR  | PSNR (Alpha) |
|--------|--------|------------|----------|--------|--------|--------------|-------------|------|-------|--------------|
| Brain  | 1      | 2k         | 108 Min  | 170 MB | 5.1 M  | 87           | 12          | 0.64 | 20.73 | 32.76        |
| Brain  | 2      | 2k         | 88 Min   | 156 MB | 5.0 M  | 87           | 12          | 0.81 | 25.78 | 35.86        |
| Kidney | 1      | 2k         | 50 Min   | 73 MB  | 2.7 M  | 91           | 13          | 0.81 | 23.93 | 27.70        |
| Kidney | 2      | 2k         | 49 Min   | 60 MB  | 2.2 M  | 87           | 12          | 0.87 | 27.89 | 32.85        |
| Body   | 1      | 2k         | 58 Min   | 30 MB  | 1.0 M  | 87           | 12          | 0.89 | 28.93 | 31.54        |
| Body   | 2      | 2k         | 43 Min   | 28 MB  | 1.0 M  | 87           | 12          | 0.87 | 25.38 | 29.94        |
| Body   | 3      | 2k         | 43 Min   | 30 MB  | 1.0 M  | 87           | 12          | 0.88 | 27.65 | 30.40        |
| Body   | 4      | 1k         | 20 Min   | 55 MB  | 1.9 M  | 224          | 32          | 0.86 | 28.68 | 25.41        |

Table 5:

| Scene  | Preset | Resolution | Size (MB) | Gaussians | SSIM | PSNR  | PSNR (Alpha) |
|--------|--------|------------|-----------|-----------|------|-------|--------------|
| Brain  | 1      | 2k         | 69        | 4.8 M     | 0.63 | 20.67 | 32.48        |
| Brain  | 2      | 2k         | 69        | 4.8 M     | 0.81 | 25.79 | 35.70        |
| Kidney | 1      | 2k         | 37        | 2.6 M     | 0.81 | 23.91 | 27.81        |
| Kidney | 2      | 2k         | 30        | 2.1 M     | 0.87 | 27.70 | 32.58        |
| Body   | 1      | 2k         | 7         | 1.0 M     | 0.88 | 28.51 | 30.60        |
| Body   | 2      | 2k         | 7         | 0.9 M     | 0.86 | 25.03 | 29.08        |
| Body   | 3      | 2k         | 7         | 1.0 M     | 0.87 | 27.17 | 29.03        |
| Body   | 4      | 1k         | 22        | 1.5 M     | 0.81 | 26.13 | 24.46        |

A.2 Mip Splatting

Scenes rendered with 3DGS can show severe artifacts when novel camera perspectives diverge from those the 3D Gaussian representation was optimized for. Yu et al. [YCH∗23] name two reasons for this behavior: First, the 3D Gaussian representation exhibits frequencies that are too high to be faithfully reconstructed at the sampling rate used. Second, during splat-based rendering, a 2D dilation filter is applied that causes artifacts when zooming out and the 2D splats become too small.

Yu et al. mitigate the problem by introducing a 3D smoothing (i.e., low-pass) filter, which constrains the size of the 3D Gaussians based on the maximal sampling frequency induced by the input views. In addition, a 2D Mip filter is applied in image space to avoid under-sampling. We observe that this extension to 3DGS significantly improves the fidelity of the reconstructed volumes at varying zoom levels.
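A minimal sketch of the 3D smoothing filter, following the formulation of Yu et al. [YCH∗23], may look as follows. The constant `s` and the opacity compensation are taken from their paper; treat this as an illustration under those assumptions, not their reference implementation:

```python
import numpy as np

def smooth_3d_gaussian(cov, opacity, max_sampling_rate, s=0.2):
    """Apply an isotropic 3D low-pass filter to one 3D Gaussian, with its
    width tied to the maximal sampling rate among the training views that
    observe the Gaussian.

    cov:               (3, 3) covariance matrix of the Gaussian
    opacity:           scalar opacity of the Gaussian
    max_sampling_rate: highest sampling rate induced by the input views
    """
    # Convolving two Gaussians adds their covariances; the filter kernel
    # is an isotropic Gaussian with variance s / rate^2.
    filt = (s / max_sampling_rate ** 2) * np.eye(3)
    cov_smoothed = cov + filt
    # Scale the opacity so the filtered Gaussian does not gain energy.
    scale = np.sqrt(np.linalg.det(cov) / np.linalg.det(cov_smoothed))
    return cov_smoothed, opacity * scale
```

Intuitively, small Gaussians (those below the Nyquist limit of the densest view) are inflated toward the filter size, while large Gaussians are barely affected.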

B Performance Statistics and Further Results

In Table 4 and Table 5, we provide detailed statistics using HR- and HQ-compression for all presets used. Additional qualitative results are shown in Fig. 13.

C View Selection

In Fig. 11, we compare the convergence rate of BOS-based view selection to the approach of Kopanas and Drettakis [KD23], using preset 4 of Body. The final energy term adapted from Kopanas and Drettakis [KD23] is plotted after selecting a total of 32 cameras using a variable number of tested candidate views. Fig. 12 shows the time the two strategies take to select the 32 camera poses. BOS is able to find better maxima than pure random sampling of the candidate camera poses as described by Kopanas and Drettakis, albeit at a slightly higher computation time for the same number of generated candidate poses. The tests were run on a system with an AMD Ryzen 9 3900X 12-core (24-thread) CPU and an NVIDIA GeForce RTX 3090 GPU and averaged over 8 different random seeds. The BOS algorithm runs on the CPU, while the computation of the transmittance, as described in the main manuscript, is performed on the GPU using ray marching on a downscaled version of the volume.
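The transmittance computation on the downscaled volume can be sketched as follows. This is a nearest-neighbor CPU version for brevity (coordinates in voxel units); the actual computation runs on the GPU:

```python
import numpy as np

def transmittance(volume, origin, direction, step=1.0, n_steps=256):
    """Estimate the transmittance along one ray by ray marching a
    (downscaled) extinction volume and accumulating optical depth.
    Nearest-neighbor sampling replaces trilinear interpolation here
    purely for brevity.
    """
    direction = direction / np.linalg.norm(direction)
    tau = 0.0  # accumulated optical depth
    for i in range(n_steps):
        p = origin + (i + 0.5) * step * direction
        ix, iy, iz = np.round(p).astype(int)
        if not (0 <= ix < volume.shape[0] and
                0 <= iy < volume.shape[1] and
                0 <= iz < volume.shape[2]):
            break  # ray left the volume
        tau += volume[ix, iy, iz] * step
    return np.exp(-tau)  # Beer-Lambert transmittance
```

Evaluating this per candidate view yields the visibility information that drives the energy term used for view selection.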

[Fig. 11: Convergence of BOS-based view selection vs. the approach of Kopanas and Drettakis. Fig. 12: View selection timings. Fig. 13: Additional qualitative results.]
