Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation

Han Qi¹  Tao Cai²  Xiyue Han³

¹ Beijing Institute of Technology

arXiv:2411.07579v1 [cs.CV] 12 Nov 2024

[Figure 1: renderings of the Mip-NeRF360 bicycle scene. Panels: 3DGS (PSNR 24.40, SSIM 0.702, LPIPS 0.320) vs. ours (PSNR 24.48, SSIM 0.705, LPIPS 0.316); Mip-Splatting (PSNR 24.46, SSIM 0.702, LPIPS 0.322) vs. Mip-Splatting+ours (PSNR 24.52, SSIM 0.705, LPIPS 0.317); ground truth.]

Figure 1. Our method achieves a comprehensive improvement in rendering quality compared to 3D Gaussian Splatting (3DGS) [24] while also accelerating rendering speed. We propose an ellipsoid-based projection method to replace the Jacobian of the affine approximation of the projection transformation in 3DGS. Our ellipsoid-based projection method can be applied to any 3DGS-based work to enhance rendering quality. This figure shows the rendering results of applying our method to 3DGS and Mip-Splatting [52] in the scene bicycle of the Mip-NeRF360 dataset [3], in which the rendering quality is enhanced with much less blur and fewer artifacts.

Abstract

Recently, 3D Gaussian Splatting has dominated novel-view synthesis with its real-time rendering speed and state-of-the-art rendering quality. However, during the rendering process, the use of the Jacobian of the affine approximation of the projection transformation leads to inevitable errors, resulting in blurriness, artifacts, and a lack of scene consistency in the final rendered images. To address this issue, we introduce an ellipsoid-based projection method to calculate the projection of a Gaussian ellipsoid, the primitive of 3D Gaussian Splatting, onto the image plane. As our proposed ellipsoid-based projection method cannot handle Gaussian ellipsoids with camera origins inside them or parts lying below the z = 0 plane in camera space, we designed a pre-filtering strategy. Experiments over multiple widely adopted benchmark datasets show that our ellipsoid-based projection method enhances the rendering quality of 3D Gaussian Splatting and its extensions.

1. Introduction

Novel View Synthesis (NVS) plays a crucial role in computer vision and computer graphics, with numerous applications including robotics, virtual reality, and 3D gaming. One of the most influential works in this field is the Neural Radiance Field (NeRF) [36], proposed by Mildenhall et al. in 2020. NeRF utilizes a Multilayer Perceptron (MLP) to store the geometric and appearance information of a scene and employs differentiable volume rendering [9, 22, 29, 35]. Although NeRF and its extensions can render high-quality images, their training time is excessively long, and their rendering speed is far from meeting the standard of real-time rendering (≥ 30 fps). Recently, 3D Gaussian Splatting (3DGS) [24] has made a significant impact in NVS due to its real-time rendering speed, high-quality rendering results, and competitive training times. Unlike NeRF, 3DGS represents scenes explicitly using a set of Gaussian ellipsoids. By projecting each Gaussian ellipsoid onto the image plane and using α-blending for rendering, the properties of each Gaussian ellipsoid (position, pose, scale, transparency, and color) are optimized based on a multi-view photometric loss.

Although 3D Gaussian Splatting has shown impressive results, the local affine approximation of the projection transformation at the center of each Gaussian ellipsoid inevitably introduces errors during rendering, negatively affecting rendering quality and scene consistency. We observed blurriness and artifacts in distant objects in the scene, which we attribute to larger errors in the approximated projection transformation as the distance from the Gaussian ellipsoid center increases, particularly since distant Gaussian ellipsoids are generally large. Additionally, while the details rendered in the training set are of high quality, the results on the test set show a decline, likely due to the lack of scene consistency caused by the approximated projection transformation.

To solve this problem, we propose an ellipsoid-based projection method. Our core idea is to calculate the ellipse equation projected onto the image plane, building on the work of David Eberly [10], given the equation of the ellipsoid and the image plane. We first derive the Gaussian ellipsoid equation from the covariance matrix of the 3D Gaussian function. Then we find the equation of the cone formed by lines that pass through the camera's origin and are tangent to the ellipsoid surface. Finally, we determine the intersection of this cone with the image plane, which gives us the projected ellipse equation.

During our experiments, we discovered two types of Gaussian ellipsoids that cause the training process to diverge and negatively impact the system. The first type has the camera's origin inside the ellipsoid, so no line through the camera's origin can be tangent to it. The second type consists of Gaussian ellipsoids that have a portion below the z = 0 plane in camera space; the projection of these ellipsoids results in a hyperbola or parabola [10] rather than an ellipse. To avoid negatively impacting the system, we designed filtering algorithms specifically for these two types of Gaussian ellipsoids. Extensive experiments demonstrated that our method not only improves rendering quality compared to 3DGS but also further accelerates rendering speed.

In summary, we make the following contributions:
• We proposed an ellipsoid-based projection method, addressing the negative impact on rendering quality and scene consistency caused by approximating the projection transformation using the Jacobian of its affine approximation in 3DGS.
• We designed a pre-filtering strategy for Gaussian ellipsoids that cannot be projected, applied before the rendering process, enhancing the system's robustness and contributing to faster rendering speed.
• Experiments conducted on challenging benchmark datasets demonstrated that our method surpasses 3DGS in both rendering quality and rendering speed.
• We only modified the code related to the projection part of 3DGS, making our ellipsoid-based projection method easily transferable to other 3DGS-based works to further improve their rendering quality.

2. Related Work

2.1. Novel View Synthesis

The goal of Novel View Synthesis (NVS) is to generate images from new perspectives different from those of the captured images. There has been notable progress in NVS, especially since the introduction of Neural Radiance Fields (NeRF) [36]. NeRF uses a Multi-Layer Perceptron (MLP) to represent geometry and view-dependent appearance, optimized through volume rendering techniques [9, 22, 29, 35] to achieve high-quality rendering results. However, during the rendering process, the geometric and appearance information for each point along the ray must be obtained through a complex MLP network, resulting in slow rendering speeds. Subsequent work has utilized distillation and baking techniques [20, 41, 42, 49, 51] to improve NeRF's rendering speed, usually sacrificing rendering quality. Furthermore, feature-grid representations [6, 14, 26, 38, 45] have increased the training speed of NeRF. In addition, some extended works have further improved rendering quality [2-4], achieving state-of-the-art performance.

Recently, 3D Gaussian Splatting [24] has emerged as a method for representing intricate scenes using 3D Gaussian ellipsoids. This approach has shown remarkable results in NVS, enabling efficient optimization and rendering of high-quality images in real time. The method has rapidly been extended to various domains [7, 18, 23, 28, 40, 46, 48, 54]. Most related work enhances the rendering quality of 3D Gaussian Splatting through anti-aliasing [30, 52, 53], combining it with other techniques like NeRF [12, 31, 34], or proposing new Gaussian augmentation and reduction strategies. Others improve rendering quality through regularization [53] or by modifying Gaussian properties [21, 47]. Recently, some works [8, 33, 37, 55] have adopted ray tracing to replace the rasterization-based rendering of 3DGS to improve rendering quality, but this has also led to a significant decrease in rendering speed. In this work, we propose a new projection method to optimize the rendering process and enhance rendering quality.

2.2. Primitive-based Differentiable Rendering

Primitive-based rendering techniques, which project geometric primitives onto the image plane, have been widely studied for their efficiency [16, 17, 39, 43, 56, 57]. Differentiable point-based rendering methods provide significant flexibility in representing complex structures, making them ideal for novel view synthesis. Notably, NPBG [1] rasterizes features from point clouds onto the image plane and then employs a convolutional neural network for RGB image prediction.

DSS [50] optimizes oriented point clouds derived from multi-view images under specific lighting conditions. Pulsar [27] presents a tile-based acceleration structure to enhance the efficiency of rasterization. More recently, 3DGS [24] projects anisotropic Gaussian ellipsoids onto the image plane and uses α-blending to render the resulting ellipses based on depth ordering, further optimizing the Gaussian ellipsoids. During the projection process, 3DGS introduces errors by approximating the projection transformation using the Jacobian of its affine approximation. To fundamentally eliminate the negative impact of these errors, we propose an ellipsoid-based projection method to replace the Jacobian of the affine approximation of the projection transformation in 3DGS.

2.3. Perspective Projection of an Ellipsoid

In 1999, David Eberly provided a comprehensive process for calculating the projection of an ellipsoid onto a given plane [10]. This algorithm has primarily been used for the localization of 3D objects [11, 15, 44] and for designing shadow models [5, 32]. In this work, we build upon David Eberly's findings to propose a projection method for projecting 3D Gaussian ellipsoids onto the image plane. The representation of the ellipsoid is obtained from the inverse of the covariance matrix of the 3D Gaussian function. After obtaining the ellipse equation, we further convert the result from camera space to image space. The projection algorithm [10] proposed by David Eberly has certain preconditions. Therefore, we designed a pre-filtering strategy to eliminate Gaussian ellipsoids that do not meet these preconditions before the rendering process.

3. Preliminaries

In this section, we first introduce the scene representation method of 3DGS [24], along with its rendering and optimization process, in Sec. 3.1. Subsequently, in Sec. 3.2, we provide some algebraic details of the method proposed by David Eberly in 1999 [10] for projecting ellipsoids onto the image plane.

3.1. 3D Gaussian Splatting

3D Gaussian Splatting [24] represents three-dimensional scenes using Gaussian ellipsoids as scene primitives. The geometric features of each Gaussian ellipsoid are represented by a center p ∈ R^{3×1} and a covariance matrix Σ ∈ R^{3×3}: p controls the geometric position of the Gaussian ellipsoid, while Σ controls its shape. The covariance matrix Σ is composed of a rotation matrix R and a scaling matrix S, which control the orientation and scale of the Gaussian ellipsoid, respectively.

Σ = R S S^T R^T    (1)

The appearance components of each Gaussian ellipsoid are represented by an opacity α ∈ [0, 1] and spherical harmonics (SH): α controls the transparency of the Gaussian ellipsoid, and the SH coefficients control the color of the ellipsoid as seen from different viewpoints.

3DGS initializes the Gaussian ellipsoids from a sparse point cloud generated through Structure from Motion (SfM), with the centers of the ellipsoids placed at the points. For rendering, the covariance matrix Σ of a Gaussian ellipsoid is transformed into camera space using the viewing transformation W. The covariance matrix is then projected onto the image plane via the projection transformation, which is approximated by the corresponding local affine approximation, the Jacobian matrix J.

Σ' = J W Σ W^T J^T    (2)

Removing the third row and third column of Σ' yields the 2D covariance matrix Σ^{2D}. Projecting the center point of the Gaussian ellipsoid onto the image plane using the viewing transformation W and the projection transformation P gives the center point of the ellipse, p^{2D}.

p^{2D} = P W p    (3)
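To make this baseline concrete, the following is a minimal NumPy sketch of Eqs. (1)-(3), assuming a simple pinhole model with focal lengths f_x, f_y and a viewing transformation given as a rotation-translation pair (R_w, t_w); the function and variable names are ours, not those of the 3DGS codebase.

```python
import numpy as np

def covariance_from_rotation_scale(R, s):
    # Eq. (1): Sigma = R S S^T R^T, with S = diag(s).
    S = np.diag(s)
    return R @ S @ S.T @ R.T

def project_affine(Sigma, p, R_w, t_w, fx, fy):
    # Transform the center into camera space: p_c = W p.
    p_c = R_w @ p + t_w
    x, y, z = p_c
    # Jacobian of the pinhole map (x, y, z) -> (fx*x/z, fy*y/z),
    # evaluated at the Gaussian center. Keeping only two rows is
    # equivalent to dropping the third row and column of Sigma' in Eq. (2).
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    Sigma_2d = J @ (R_w @ Sigma @ R_w.T) @ J.T       # Eq. (2)
    p_2d = np.array([fx * x / z, fy * y / z])        # Eq. (3), before the pixel offset
    return Sigma_2d, p_2d
```

Note that J depends on the center p_c only, which is exactly why the approximation degrades away from the center, as discussed in Sec. 4.1.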
The color c ∈ R^{3×1} of a Gaussian ellipsoid at a given viewpoint is calculated from its spherical harmonics and the position of the ellipsoid relative to the camera's origin. Finally, α-blending is used to render each pixel according to the K Gaussian ellipsoids sorted from near to far.

C(x) = ∑_{k=1}^{K} c_k α_k G_k^{2D}(x) ∏_{j=1}^{k−1} (1 − α_j G_j^{2D}(x))    (4)

G^{2D}(x) is the probability density function of a 2D Gaussian distribution, which reflects the distance between the pixel x and the 2D Gaussian ellipse center p^{2D}.

G_k^{2D}(x) = exp(−(1/2)(x − p_k^{2D})^T (Σ_k^{2D} + sI)^{−1} (x − p_k^{2D}))    (5)

Adding sI to Σ^{2D} prevents the projected 2D Gaussian ellipse from being smaller than one pixel.
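The blending in Eqs. (4)-(5) can be sketched directly; the snippet below assumes the per-pixel list of Gaussians is already sorted from near to far, and the dilation constant s = 0.3 is a placeholder default of ours, not a value stated in this paper.

```python
import numpy as np

def gaussian_2d(x, p_2d, Sigma_2d, s=0.3):
    # Eq. (5): un-normalized 2D Gaussian weight at pixel x,
    # dilated by s*I so the splat stays at least about one pixel wide.
    d = x - p_2d
    M = np.linalg.inv(Sigma_2d + s * np.eye(2))
    return np.exp(-0.5 * d @ M @ d)

def blend_pixel(x, gaussians):
    # Eq. (4): front-to-back alpha blending over depth-sorted Gaussians,
    # each entry carrying its color c, opacity alpha, and 2D moments.
    color = np.zeros(3)
    transmittance = 1.0
    for c, alpha, p_2d, Sigma_2d in gaussians:
        w = alpha * gaussian_2d(x, p_2d, Sigma_2d)
        color += transmittance * w * c
        transmittance *= 1.0 - w
    return color
```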
Since the rendering process is fully differentiable, after calculating the photometric loss between each rendered image and its reference image, the parameters of each Gaussian ellipsoid can be optimized using gradient descent, thereby enhancing rendering quality. During optimization, based on the gradient of the 2D Gaussian ellipse's center p^{2D} in the image plane and the scaling matrix S of each Gaussian ellipsoid, 3D Gaussian Splatting adaptively adjusts the number of Gaussian ellipsoids through clone, split, and prune strategies.
3.2. Projection Algorithm for an Ellipsoid

The standard equation of an ellipsoid is (x − c)^T A (x − c) = 1, where c is the center of the ellipsoid and A is a positive definite matrix. Given the camera origin e and the image plane n · x = d, where n is a unit-length vector and d is a constant, the task is to calculate the projection of the ellipsoid onto the image plane.

The first step is to compute the cone that tightly bounds the ellipsoid. Consider a ray l(t) = e + t d, where t > 0 and d is a unit-length direction vector. The intersection of the ray and the ellipsoid is defined by

(l(t) − c)^T A (l(t) − c) = 1,    (6)

or equivalently by

d^T A d t² + 2 δ^T A d t + δ^T A δ − 1 = 0,    (7)

where δ = e − c. This is a quadratic equation in t. When the equation has a single real-valued root, the ray is tangent to the ellipsoid. The condition for the quadratic to have a single real-valued root is

(δ^T A d)² − (d^T A d)(δ^T A δ − 1) = 0.    (8)

It can further be transformed into

d^T (A^T δ δ^T A − (δ^T A δ − 1) A) d = 0.    (9)

The direction vector d of the ray can also be expressed as d = (x − e)/|x − e|. Substituting this into the equation yields

(x − e)^T (A^T δ δ^T A − (δ^T A δ − 1) A)(x − e) = 0.    (10)

This equation describes a cone with its vertex at e. Next, by solving the combined equations of the cone and the image plane, the projection of the ellipsoid onto the plane can be obtained.
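For reference, the tangent-cone matrix of Eqs. (8)-(10) can be assembled in a few lines. This is a minimal sketch under the notation above (A symmetric positive definite, e the eye point), not code from the paper:

```python
import numpy as np

def tangent_cone_matrix(A, c, e):
    # Eqs. (8)-(10): the rays through e tangent to the ellipsoid
    # (x - c)^T A (x - c) = 1 form the cone (x - e)^T B (x - e) = 0, with
    # B = A delta delta^T A - (delta^T A delta - 1) A and delta = e - c.
    delta = e - c
    Ad = A @ delta                 # A is symmetric, so A^T delta = A delta
    return np.outer(Ad, Ad) - (delta @ Ad - 1.0) * A
```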
4. Proposed Method

Due to the errors introduced by approximating the projection transformation with the Jacobian of its affine approximation in 3D Gaussian Splatting [24], we propose an ellipsoid-based projection method to eliminate these errors and enhance rendering quality and scene consistency. In Sec. 4.1, we explain why errors are inevitably introduced in the projection process of 3DGS, followed by the presentation of our ellipsoid-based projection method and its theoretical derivation. Next, in Sec. 4.2, we identify two types of Gaussian ellipsoids that cannot be rendered using our proposed projection method and present a pre-filtering strategy to filter them out.

Figure 2. Ellipsoid-based Projection Method. We first derive the ellipsoid equation from the covariance matrix of the 3D Gaussian function. Then, using the method in Sec. 3.2, we obtain the equation of a cone with its vertex at the camera origin and tangent to the ellipsoid. Finally, we calculate the intersection between the cone and the image plane, which gives the equation of the projection of the ellipsoid.

4.1. Ellipsoid-based Projection Method

Gaussian functions are closed under linear transformations but not under nonlinear transformations, meaning the result of a nonlinear transformation can no longer be represented as an ellipsoid (or ellipse). Since the projection transformation is nonlinear, 3DGS [24] adopts a local affine approximation of the projection transformation P at the center of each Gaussian ellipsoid to obtain the Jacobian matrix J. Using the Jacobian of the affine approximation of the projection transformation inevitably introduces errors at positions other than the center of the Gaussian ellipsoid, with the errors increasing the farther one moves from the center.

Based on the method introduced by David Eberly [10], we designed an ellipsoid-based projection method for 3DGS. To simplify the computation, we project the Gaussian ellipsoid in the camera space C{x, y, z}. First, we need to derive the equation of the Gaussian ellipsoid from the 3D covariance matrix Σ. In 3D Gaussian Splatting [24], the authors obtain the corresponding Gaussian ellipse equation by setting the exponent of the 2D Gaussian function equal to 3² (satisfying the 3σ principle). Similarly, for the 3D Gaussian function, we also set its exponent equal to 3²:

(x − p_c)^T Σ_c^{−1} (x − p_c) = 9,    (11)

where p_c is the center point of the Gaussian ellipsoid in camera space and Σ_c is the 3D covariance matrix Σ transformed into camera space.
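Concretely, Eq. (11) ties the generic ellipsoid of Sec. 3.2 to the Gaussian's moments: the level-9 set is the ellipsoid (x − p_c)^T A (x − p_c) = 1 with A = Σ_c^{−1}/9. A two-line sketch, assuming Σ_c and p_c are already expressed in camera space:

```python
import numpy as np

def ellipsoid_from_gaussian(Sigma_c, p_c):
    # Eq. (11): the 3-sigma level set (x - p_c)^T Sigma_c^{-1} (x - p_c) = 9
    # is the ellipsoid (x - p_c)^T A (x - p_c) = 1 with A = Sigma_c^{-1} / 9.
    A = np.linalg.inv(Sigma_c) / 9.0
    return A, p_c
```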

In camera space, the camera origin is c = [0, 0, 0]^T. According to Eq. (6), Eq. (8), and Eq. (10), the equation of the cone can be obtained:

x^T (Σ_c^{−T} p_c p_c^T Σ_c^{−1} − (p_c^T Σ_c^{−1} p_c − 9) Σ_c^{−1}) x = 0    (12)

Setting the third element of x = [x, y, z]^T to 1 gives the intersection between the cone and the z = 1 plane, and the equation of this intersection is an ellipse equation that depends only on x and y.

For a point x^{2d} = [x, y]^T on the z = 1 plane, scaling and translation are required to convert it into a point x^{img} = [x^{img}, y^{img}]^T on the image plane I{x^{img}, y^{img}}. The corresponding relationship is

x^{img} = f_x x + w/2,
y^{img} = f_y y + h/2,    (13)

in which f_x and f_y are the camera intrinsic parameters, and w and h are the width and height of the image, respectively.

We substitute x^{img} and y^{img} into the ellipse equation to obtain the equation of the ellipse on the image plane:

(x^{img} − p^{img})^T Σ_{img}^{−1} (x^{img} − p^{img}) = 9,    (14)

where Σ_{img}^{−1} ∈ R^{2×2} is the inverse of the second-order covariance matrix corresponding to the ellipse and p^{img} ∈ R^{2×1} is the center point of the ellipse.

We find that the center point of the ellipse calculated by our projection transformation differs slightly from the result obtained by directly projecting the center point of the Gaussian ellipsoid onto the image plane. This indicates that the center of the ellipse is no longer the projection of the center of the Gaussian ellipsoid, which also explains why, although the projected result remains an ellipse, it is not possible to find an affine transformation that directly converts the 3D Gaussian covariance matrix into a 2D one.
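Putting Eqs. (11)-(14) together, the projection reduces to forming the cone matrix, reading off the conic on the z = 1 plane, and rescaling to pixel coordinates. The following is a minimal NumPy sketch under those equations, for a Gaussian that has already passed the pre-filter of Sec. 4.2; it is our illustration, not the paper's CUDA implementation:

```python
import numpy as np

def project_gaussian_ellipsoid(Sigma_c, p_c, fx, fy, w, h):
    # Eq. (11): ellipsoid matrix of the 3-sigma level set.
    A = np.linalg.inv(Sigma_c) / 9.0
    # Eq. (12): cone with vertex at the camera origin (delta = -p_c).
    Ap = A @ p_c
    B = np.outer(Ap, Ap) - (p_c @ Ap - 1.0) * A
    # Intersect with z = 1 by substituting x = (x, y, 1):
    # u^T Q2 u + 2 b^T u + c0 = 0, an ellipse for pre-filtered Gaussians.
    Q2, b, c0 = B[:2, :2], B[:2, 2], B[2, 2]
    center = np.linalg.solve(Q2, -b)              # ellipse center on z = 1
    k = center @ Q2 @ center - c0                 # (u - center)^T Q2 (u - center) = k
    # Eq. (13): scale and translate from the z = 1 plane to pixels.
    F = np.diag([fx, fy])
    p_img = F @ center + np.array([w / 2.0, h / 2.0])
    # Eq. (14): inverse covariance of the image-plane ellipse (level 9).
    F_inv = np.diag([1.0 / fx, 1.0 / fy])
    Sigma_img_inv = 9.0 * F_inv @ (Q2 / k) @ F_inv
    return p_img, Sigma_img_inv
```

Here k is the constant such that (u − center)^T Q2 (u − center) = k on the z = 1 plane; dividing by k and scaling by 9 expresses the result in the level-9 form of Eq. (14).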
4.2. Pre-filtering Strategy

For our proposed ellipsoid-based projection method, there are two types of Gaussian ellipsoids that cannot be rendered [10] and need to be filtered out beforehand; otherwise, they will cause the training process to diverge and negatively impact the system. The first type is a Gaussian ellipsoid with the camera origin inside it, for which no line passing through the camera origin is tangent to it, shown as the blue Gaussian ellipsoid in Fig. 3. The second type is a Gaussian ellipsoid that has a portion below the z = 0 plane in camera space; using our projection method, the projection of this type of Gaussian ellipsoid onto the image plane results in a hyperbola, not an ellipse, shown as the orange Gaussian ellipsoid in Fig. 3.

Figure 3. Pre-filtering Strategy. There are two types of Gaussian ellipsoids that need to be filtered out in advance; otherwise, the system may fail to converge. The first type is Gaussian ellipsoids that contain the camera origin within them, represented by the blue ellipsoid in the figure. The second type consists of Gaussian ellipsoids with portions located below the z = 0 plane in camera space. The projection of these ellipsoids results in parabolas or hyperbolas, as shown by the orange ellipsoid in the figure.

Substituting the camera origin c = [0, 0, 0]^T into the left side of Eq. (11), if the result is less than or equal to 9,

(c − p_c)^T Σ_c^{−1} (c − p_c) ≤ 9,    (15)

the camera origin lies inside or on the surface of the ellipsoid. Such ellipsoids belong to the first type of Gaussian ellipsoids.

The equation of the ellipsoid is E(x) = 0, where E(x) = (x − p_c)^T Σ_c^{−1} (x − p_c) − 9. For the second type of Gaussian ellipsoid, we first need to find the lowest point of the ellipsoid surface along the z-axis, where the gradient of the ellipsoid surface is perpendicular to the plane z = 0, satisfying ∇E(x) = k n, in which n = [0, 0, 1]^T. By combining the ellipsoid equation and the gradient conditions,

E(x) = 0,  ∂E(x)/∂x = 0,  ∂E(x)/∂y = 0,    (16)

we can determine the coordinates [x, y, z]^T of that point. If z ≤ 0, the ellipsoid needs to be filtered: when z = 0, the projection result is a parabola, and when z < 0, it is a hyperbola.

Dataset            Mip-NeRF360 (7 scenes)               Tanks&Temples (2 scenes)             Deep Blending (2 scenes)
Method|Metric      PSNR↑  SSIM↑  LPIPS↓  FPS   Mem      PSNR↑  SSIM↑  LPIPS↓  FPS   Mem     PSNR↑  SSIM↑  LPIPS↓  FPS   Mem
Plenoxels          23.62  0.670  0.443   6.79  2.1GB    21.08  0.719  0.379   13.0  2.3GB   23.06  0.795  0.510   11.2  2.7GB
INGP-Base          26.43  0.725  0.339   11.7  13MB     21.72  0.723  0.330   17.1  13MB    23.62  0.797  0.423   3.26  13MB
INGP-Big           26.75  0.751  0.302   9.43  48MB     21.92  0.745  0.305   14.4  48MB    24.96  0.817  0.390   2.79  48MB
M-NeRF360          29.09  0.842  0.210   0.06  8.6MB    22.22  0.759  0.257   0.14  8.6MB   29.40  0.901  0.245   0.09  8.6MB
3DGS               28.78  0.857  0.210   114   711MB    23.64  0.848  0.177   168   442MB   29.60  0.904  0.244   130   682MB
Ours               28.82  0.858  0.208   120   733MB    23.57  0.849  0.176   269   448MB   29.66  0.904  0.242   143   706MB

Table 1. We evaluated our method on the Mip-NeRF360, Tanks&Temples, and Deep Blending datasets by comparing it with previous approaches. The results for Plenoxels [13], Instant-NGP [38], and Mip-NeRF360 [3] are taken directly from 3DGS [24]. Note that the Mip-NeRF360 results in the table are calculated over 7 scenes, excluding the scenes flowers and treehill. For a fair comparison with previous methods, the Tanks&Temples results here are calculated over the scenes truck and train; the full results for all 7 scenes are shown in Tab. 2.

Dataset            Tanks&Temples (7 scenes)
Method|Metric      PSNR↑   SSIM↑  LPIPS↓  FPS   Mem
3DGS               24.423  0.845  0.183   200   366MB
Ours               24.424  0.846  0.181   351   374MB

Table 2. Comparison between our method and 3DGS across all 7 scenes of the Tanks&Temples dataset.

5. Experiments

We first introduce the datasets used in our experiments and the implementation details in Sec. 5.1. In Sec. 5.2, we evaluate our method on three datasets and compare it with 3DGS [24] and other state-of-the-art methods. Subsequently, in Sec. 5.3, we apply the ellipsoid-based projection method to Mip-Splatting [52] and compare it with the original method. Finally, in Sec. 5.4, we analyze the limitations of our method and explore directions for future improvement.

5.1. Datasets and Implementation

Datasets. For training and testing, we perform experiments on images from a total of 16 real-world scenes. Specifically, we evaluate our ellipsoid-based projection method on 7 scenes from the Mip-NeRF360 dataset [3], 7 scenes from the Tanks&Temples dataset [25], and 2 scenes from the Deep Blending dataset [19]. The selected scenes showcase diverse styles, including bounded indoor environments and unbounded outdoor ones. To divide the datasets into training and testing sets, we follow the method used in 3DGS [24], assigning every 8th photo to the test set. For the images in the Mip-NeRF360 and Tanks&Temples datasets, we use 1/2 of the original resolution for training and rendering.

Implementation. Our method is built on the open-source code of 3DGS [24]. Following the 3DGS framework, we set the number of training iterations to 30k for all scenes, using the same loss function and densification strategy as 3DGS, with all hyperparameters remaining consistent. We only modified the CUDA kernels in the projection parts of the forward and backward passes, replacing the Jacobian of the affine approximation of the projection transformation in 3DGS with our ellipsoid-based projection method. In the comparison experiments with Mip-Splatting [52], we integrated our ellipsoid-based projection method into Mip-Splatting in the same way. All our experiments were conducted on a single NVIDIA RTX 3090 GPU.

5.2. Comparisons with 3DGS

Similar to 3DGS [24], we use PSNR, SSIM, and LPIPS as metrics to evaluate rendering quality. Additionally, we assess the rendering speed of our method in Frames Per Second (FPS), calculated from the average rendering time over all images in each scene.

The rendering quality results are shown in Tab. 1 and Tab. 2, with the metrics for each dataset calculated by averaging across all of its scenes. For all scenes in the Mip-NeRF360 dataset [3], our method outperforms 3DGS in PSNR, SSIM, and LPIPS. In some scenes of the Tanks&Temples dataset [25], our results show a slight gap in PSNR compared to 3DGS. In the two scenes from the Deep Blending dataset [19], our method scores slightly lower than 3DGS in SSIM. Across all 16 scenes, however, our method achieves better LPIPS scores than 3DGS, indicating that our rendered results align more closely with human perception. Fig. 4 shows a visual comparison between our method and 3DGS for several test views. Compared to 3DGS, our results have reduced blur (as shown in counter, kitchen, and room in Fig. 4) and fewer artifacts (as shown in Barn and Caterpillar in Fig. 4), with better scene consistency (as shown in garden and Courthouse in Fig. 4) and details (as shown in playroom, Meetingroom, and truck in Fig. 4). Furthermore, our method significantly outperforms 3DGS in rendering speed, likely because we pre-filter Gaussian ellipsoids before rendering. In addition to the Gaussian ellipsoids described in Sec. 4.2 that prevent the training process from converging, we also filter out Gaussian ellipsoids outside the cone of vision that would not be rendered. Additionally, our method significantly outperforms Plenoxels [13] and Instant-NGP [38] in rendering quality. Except for PSNR, which is lower than the best-performing Mip-NeRF360 [3] on its own dataset, our method surpasses Mip-NeRF360 across all other metrics.

[Figure 4: qualitative comparison of 3DGS, Ours, Mip-Splatting, Mip-Splatting+Ours, and Ground Truth on the scenes counter, garden, kitchen, room, playroom, truck, Meetingroom, Courthouse, Barn, and Caterpillar.]

Figure 4. Rendering results of applying our ellipsoid-based projection method to 3DGS and Mip-Splatting, resulting in less blur, fewer artifacts, and better scene consistency.

Dataset              Mip-NeRF360 (7 scenes)     Tanks&Temples (7 scenes)    Deep Blending (2 scenes)
Method|Metric        PSNR↑  SSIM↑  LPIPS↓       PSNR↑  SSIM↑  LPIPS↓        PSNR↑  SSIM↑  LPIPS↓
Mip-Splatting        28.87  0.858  0.210        24.60  0.847  0.185         29.87  0.911  0.192
Mip-Splatting+Ours   28.94  0.859  0.208        24.55  0.847  0.183         29.90  0.911  0.191

Table 3. Evaluation of the effectiveness of our method on Mip-Splatting across all 16 scenes of the three datasets.

5.3. Comparisons with Mip-Splatting

To further demonstrate the effectiveness of our algorithm, we applied our ellipsoid-based projection method to Mip-Splatting [52] and compared it with the original Mip-Splatting method using the PSNR, SSIM, and LPIPS metrics. We utilized Mip-Splatting's core methods, the 3D filter and the 2D filter, but retained the Gaussian densification strategy of 3DGS rather than adopting the newly proposed strategy from Mip-Splatting. In our experiments, we observed that the Gaussian densification strategy in Mip-Splatting led to a significant increase in the number of Gaussians for the original method, but had no noticeable effect when used with our projection transformation. Therefore, we conclude that this strategy is specific to Mip-Splatting. We set the 2D-filter parameter to 0.25, making the kernel size a 3 × 3 pixel region on the image. As shown in Tab. 3, applying the ellipsoid-based projection method to Mip-Splatting improved rendering quality. Similar to the results in Sec. 5.2, our method achieved overall metric improvements on the Mip-NeRF360 dataset, showed a slight decrease in PSNR on the Tanks&Temples dataset, and a slight decrease in SSIM on the Deep Blending dataset. Visual results in Fig. 4 show that, similar to 3DGS [24] in Sec. 5.2, Mip-Splatting with the ellipsoid-based projection method also reduces blurriness and artifacts, while enhancing scene consistency and details.

5.4. Limitations

Our method does not surpass 3DGS [24] in every metric across all scenes, indicating there is still room for improvement. In Sec. 5.3, we observed that Mip-Splatting's [52] Gaussian densification strategy had limited impact on our method, suggesting that we may not be using the most suitable densification strategy. If a more appropriate strategy could be identified, rendering quality might be further enhanced. In the experiments in Sec. 5.2 and Sec. 5.3, we used fixed values for the filter parameters. However, we found that our method is more sensitive to filter parameters than 3DGS, whether for the screen-space dilation filter in 3DGS or the 2D mip filter in Mip-Splatting. Additionally, the optimal filter parameters vary across scenes, making parameter selection crucial. Since filter parameters are continuous values, it is difficult to identify optimal values through repeated experiments. One potential solution is to include the filter parameters as optimizable variables and optimize them during training.

6. Conclusion

We propose an ellipsoid-based projection method that avoids the errors introduced by the affine approximation of the projection transformation used in 3DGS. Additionally, we introduce a pre-filtering strategy to remove Gaussian ellipsoids that negatively impact the system or lie outside the cone of vision, enhancing system robustness. Comparative experiments with 3DGS demonstrate that our method improves rendering quality in both complex indoor and outdoor scenes, while also further accelerating rendering speed. By integrating our method with Mip-Splatting, rendering quality was further improved, demonstrating the versatility of our method and its ease of adaptation to any work based on 3DGS.

References

[1] Kara-Ali Aliev, Artem Sevastopolsky, Maria Kolos, Dmitry Ulyanov, and Victor Lempitsky. Neural point-based graphics. In ECCV, pages 696–712, 2020.
[2] Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In ICCV, pages 5855–5864, 2021.
[3] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In CVPR, pages 5470–5479, 2022.
[4] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Zip-NeRF: Anti-aliased grid-based neural radiance fields. In ICCV, pages 19697–19705, 2023.
[5] Luis Bolanos, Shih-Yang Su, and Helge Rhodin. Gaussian shadow casting for neural characters. In CVPR, pages 20997–21006, 2024.
[6] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. In ECCV, pages 333–350, 2022.

[7] Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. GaussianEditor: Swift and controllable 3D editing with Gaussian splatting. In CVPR, pages 21476–21485, 2024.
[8] Jorge Condor, Sebastien Speierer, Lukas Bode, Aljaz Bozic, Simon Green, Piotr Didyk, and Adrian Jarabo. Don't splat your Gaussians: Volumetric ray-traced primitives for modeling and rendering scattering and emissive media, 2024.
[9] Robert A. Drebin, Loren Carpenter, and Pat Hanrahan. Volume rendering. SIGGRAPH Computer Graphics, 22(4):65–74, 1988.
[10] David Eberly. Perspective projection of an ellipsoid. Geometric Tools, LLC, pages 1–4, 1999.
[11] David Eberly. Reconstructing an ellipsoid from its perspective projection onto a plane. GeometricTools.com, https://www.geometrictools.com/Documentation/ReconstructEllipsoid.pdf (accessed May 2020), 2007.
[12] Linus Franke, Darius Rückert, Laura Fink, and Marc Stamminger. TRIPS: Trilinear point splatting for real-time radiance field rendering. Computer Graphics Forum, 43(2):e15012, 2024.
[13] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, pages 5501–5510, 2022.
[14] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, pages 5501–5510, 2022.
[15] Vincent Gaudillière, Gilles Simon, and Marie-Odile Berger. Perspective-1-ellipsoid: Formulation, analysis and solutions of the camera pose estimation problem from one ellipse-ellipsoid correspondence. International Journal of Computer Vision, 131(9):2446–2470, 2023.
[16] Markus Gross and Hanspeter Pfister. Point-Based Graphics. Elsevier, 2011.
[17] Jeffrey P. Grossman and William J. Dally. Point sample rendering. In Rendering Techniques '98: Proceedings of the Eurographics Workshop, pages 181–192, 1998.
[18] Antoine Guédon and Vincent Lepetit. SuGaR: Surface-aligned Gaussian splatting for efficient 3D mesh reconstruction and high-quality mesh rendering. In CVPR, pages 5354–5363, 2024.
[19] Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics, 37(6):1–15, 2018.
[20] Peter Hedman, Pratul P. Srinivasan, Ben Mildenhall, Jonathan T. Barron, and Paul Debevec. Baking neural radiance fields for real-time view synthesis. In ICCV, pages 5875–5884, 2021.
[21] Zhentao Huang and Minglun Gong. Textured-GS: Gaussian splatting with spatially defined color and opacity, 2024.
[22] James T. Kajiya and Brian P. Von Herzen. Ray tracing volume densities. SIGGRAPH Computer Graphics, 18(3):165–174, 1984.
[23] Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. SplaTAM: Splat, track & map 3D Gaussians for dense RGB-D SLAM. In CVPR, pages 21357–21366, 2024.
[24] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
[25] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4):1–13, 2017.
[26] Jonas Kulhanek and Torsten Sattler. Tetra-NeRF: Representing neural radiance fields using tetrahedra. In ICCV, pages 18458–18469, 2023.
[27] Christoph Lassner and Michael Zollhofer. Pulsar: Efficient sphere-based neural rendering. In CVPR, pages 1440–1449, 2021.
[28] Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3D Gaussian representation for radiance field. In CVPR, pages 21719–21728, 2024.
[29] Marc Levoy. Efficient ray tracing of volume data. ACM Transactions on Graphics, 9(3):245–261, 1990.
[30] Jiameng Li, Yue Shi, Jiezhang Cao, Bingbing Ni, Wenjun Zhang, Kai Zhang, and Luc Van Gool. Mipmap-GS: Let Gaussians deform with scale-specific mipmap for anti-aliasing rendering, 2024.
[31] Jiaze Li, Zhengyu Wen, Luo Zhang, Jiangbei Hu, Fei Hou, Zhebin Zhang, and Ying He. GS-Octree: Octree-based 3D Gaussian splatting for robust object-level 3D reconstruction under strong lighting, 2024.
[32] Zhen Li, Marek Ziebart, Santosh Bhattarai, and David Harrison. A shadow function model based on perspective projection and atmospheric effect for satellites in eclipse. Advances in Space Research, 63(3):1347–1359, 2019.
[33] Alexander Mai, Peter Hedman, George Kopanas, Dor Verbin, David Futschik, Qiangeng Xu, Falko Kuester, Jonathan T. Barron, and Yinda Zhang. EVER: Exact volumetric ellipsoid rendering for real-time view synthesis, 2024.
[34] Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, and Przemysław Spurek. Gaussian splatting with NeRF-based color and opacity, 2024.

[35] N. Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995.
[36] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[37] Nicolas Moenne-Loccoz, Ashkan Mirzaei, Or Perel, Riccardo de Lutio, Janick Martinez Esturo, Gavriel State, Sanja Fidler, Nicholas Sharp, and Zan Gojcic. 3D Gaussian ray tracing: Fast tracing of particle scenes, 2024.
[38] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics, 41(4), 2022.
[39] Hanspeter Pfister, Matthias Zwicker, Jeroen Van Baar, and Markus Gross. Surfels: Surface elements as rendering primitives. In SIGGRAPH, pages 335–342, 2000.
[40] Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. GaussianAvatars: Photorealistic head avatars with rigged 3D Gaussians. In CVPR, pages 20299–20309, 2024.
[41] Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs. In ICCV, pages 14335–14345, 2021.
[42] Christian Reiser, Rick Szeliski, Dor Verbin, Pratul Srinivasan, Ben Mildenhall, Andreas Geiger, Jon Barron, and Peter Hedman. MERF: Memory-efficient radiance fields for real-time view synthesis in unbounded scenes. ACM Transactions on Graphics, 42(4), 2023.
[43] Miguel Sainz and Renato Pajarola. Point-based rendering techniques. Computers & Graphics, 28(6):869–879, 2004.
[44] Srinath Sridhar, Helge Rhodin, Hans-Peter Seidel, Antti Oulasvirta, and Christian Theobalt. Real-time hand tracking using a sum of anisotropic Gaussians model. In 3DV, pages 319–326, 2014.
[45] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In CVPR, pages 5459–5469, 2022.
[46] Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. DreamGaussian: Generative Gaussian splatting for efficient 3D content creation, 2024.
[47] Zhe Jun Tang and Tat-Jen Cham. 3iGS: Factorised tensorial illumination for 3D Gaussian splatting, 2024.
[48] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4D Gaussian splatting for real-time dynamic scene rendering. In CVPR, pages 20310–20320, 2024.
[49] Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P. Srinivasan, Richard Szeliski, Jonathan T. Barron, and Ben Mildenhall. BakedSDF: Meshing neural SDFs for real-time view synthesis. In SIGGRAPH Conference Proceedings, 2023.
[50] Wang Yifan, Felice Serena, Shihao Wu, Cengiz Öztireli, and Olga Sorkine-Hornung. Differentiable surface splatting for point-based geometry processing. ACM Transactions on Graphics, 38(6):1–14, 2019.
[51] Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. PlenOctrees for real-time rendering of neural radiance fields. In ICCV, pages 5752–5761, 2021.
[52] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-Splatting: Alias-free 3D Gaussian splatting. In CVPR, pages 19447–19456, 2024.
[53] Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, and Eric Xing. FreGS: 3D Gaussian splatting with progressive frequency regularization. In CVPR, pages 21424–21433, 2024.
[54] Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. DrivingGaussian: Composite Gaussian splatting for surrounding dynamic autonomous driving scenes. In CVPR, pages 21634–21643, 2024.
[55] Yang Zhou, Songyin Wu, and Ling-Qi Yan. Unified Gaussian primitives for scene representation and rendering, 2024.
[56] Matthias Zwicker, Hanspeter Pfister, Jeroen Van Baar, and Markus Gross. EWA volume splatting. In Proceedings Visualization (VIS '01), pages 29–538, 2001.
[57] Matthias Zwicker, Hanspeter Pfister, Jeroen Van Baar, and Markus Gross. Surface splatting. In SIGGRAPH, pages 371–378, 2001.