
PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai,
Nan Wang, Haomin Liu, Hujun Bao, Guofeng Zhang†

Fig. 1: PGSR representation. We present a Planar-based Gaussian Splatting Reconstruction representation for efficient and high-fidelity surface reconstruction
from multi-view RGB images without any geometric prior (depth or normal from pre-trained model). The courthouse reconstructed by our method demonstrates
that PGSR can recover geometric details, such as textual details on the building. From left to right: input SfM points, planar-based Gaussian ellipsoid, rendered
view, textured mesh, surface, and normal.

Abstract—Recently, 3D Gaussian Splatting (3DGS) has attracted widespread attention due to its high-quality rendering and ultra-fast training and rendering speed. However, due to the unstructured and irregular nature of Gaussian point clouds, it is difficult to guarantee geometric reconstruction accuracy and multi-view consistency simply by relying on image reconstruction loss. Although many studies on surface reconstruction based on 3DGS have emerged recently, the quality of their meshes is generally unsatisfactory. To address this problem, we propose a fast planar-based Gaussian splatting reconstruction representation (PGSR) to achieve high-fidelity surface reconstruction while ensuring high-quality rendering. Specifically, we first introduce an unbiased depth rendering method, which directly renders the distance from the camera origin to the Gaussian plane and the corresponding normal map based on the Gaussian distribution of the point cloud, and divides the two to obtain the unbiased depth. We then introduce single-view geometric, multi-view photometric, and geometric regularization to preserve global geometric accuracy. We also propose a camera exposure compensation model to cope with scenes with large illumination variations. Experiments on indoor and outdoor scenes show that our method achieves fast training and rendering while maintaining high-fidelity rendering and geometric reconstruction, outperforming 3DGS-based and NeRF-based methods. Our code will be made publicly available, and more information can be found on our project page (https://zju3dv.github.io/pgsr/).

Index Terms—Planar-Based Gaussian Splatting, Surface Reconstruction, Neural Rendering, Neural Radiance Fields.

H. Bao, G. Zhang, W. Ye and H. Li are with the State Key Lab of CAD&CG, Zhejiang University. E-mails: {baohujun, zhangguofeng}@zju.edu.cn. D. Chen and W. Xie are with the State Key Lab of CAD&CG, Zhejiang University and SenseTime Research. D. Chen is also affiliated with Tetras.AI. E-mails: [email protected], [email protected]. Y. Wang is with Shanghai AI Laboratory. S. Zhai, N. Wang and H. Liu are with SenseTime Research. † Corresponding author.

Fig. 2: Unbiased depth rendering. (a) Illustration of the rendered depth: We take a single Gaussian, flatten it into a plane, and fit it onto the surface as an
example. Our rendered depth is the intersection point of rays and surfaces, matching the actual surface. In contrast, the depth from previous methods [11],
[24] corresponds to a curved surface and may deviate from the actual surface. (b) We use true depth to supervise two different depth rendering methods.
After optimization, we map the positions of all Gaussian points. Gaussians of our method fit well onto the actual surface, while the previous method results
in noise and poor adherence to the surface.

I. INTRODUCTION

Novel view synthesis and geometry reconstruction are challenging and crucial tasks in computer vision, widely used in AR/VR [13], [65], [71], 3D content generation [10], [18], [48], [53], [63], and autonomous driving. To achieve a realistic and immersive experience in AR/VR, novel view synthesis needs to be sufficiently convincing, and 3D reconstruction [32], [36], [62], [64], [66] needs to be finely detailed. Recently, neural radiance fields [22], [41], [42], [61] have been widely used to tackle this task, achieving high-fidelity novel view synthesis [2], [3], [44] and 3D geometry reconstruction [33], [56]. However, due to their computationally intensive volume rendering, neural radiance fields often require training times of several hours to even hundreds of hours, and real-time rendering speeds are difficult to achieve. Recently, 3D Gaussian Splatting (3DGS) [27] has made groundbreaking advancements in this field. By optimizing the positions, rotations, scales, and appearances of explicit 3D Gaussians and combining them with alpha-blended rendering, 3DGS has achieved training times on the order of minutes and rendering speeds in the millisecond range.

Although 3DGS achieves high-fidelity novel view rendering and fast training and rendering speeds, as discussed in previous work [19], [24], the Gaussians often do not conform well to actual surfaces, resulting in poor geometric accuracy. Fig. 3 also illustrates this observation. Extracting accurate meshes from millions of discrete Gaussian points is an extremely challenging task. The fundamental reason for this lies in the disorderly and irregular nature of Gaussians, which makes them unable to accurately model the surfaces of real scenes. Moreover, optimizing solely based on image reconstruction loss can easily lead to local optima, ultimately resulting in Gaussians failing to conform to actual surfaces and exhibiting poor geometric accuracy. In many practical tasks, geometric reconstruction accuracy is a crucial metric. Therefore, to address these issues, we propose a novel framework based on 3DGS that achieves high-fidelity geometric reconstruction while maintaining the high rendering quality, fast training, and fast rendering characteristic of 3DGS.

In this paper, we propose a novel unbiased depth rendering method based on 3DGS, facilitating the integration of various geometric constraints to achieve precise geometric estimation. Previous methods [24] render depth by blending the accumulations of each Gaussian at its z-position in the camera frame, resulting in two main issues, as shown in Fig. 2: the depth corresponds to a curved surface and may deviate from the actual surface. To address these issues, we compress 3D Gaussians into flat planes and blend their accumulations to obtain normal and camera-to-plane distance maps. These maps are then transformed into depth maps. This method involves blending Gaussian plane accumulations to determine a pixel's plane parameters. The intersection of the ray and the plane defines the depth, depending on the Gaussian's position and rotation. By dividing the distance map by the normal map, we cancel out the ray accumulation weights, ensuring that the depth estimation is unbiased and falls on the estimated plane. In the experiment shown in Fig. 2, we used true depth to guide two depth rendering methods. After optimization, we mapped the positions of all Gaussian points. Results show that our method produces Gaussians that closely align with the actual surface, while the previous method generates noisy Gaussians that fail to adhere precisely to the surface.

After rendering the plane parameters for each pixel, we apply single-view and multi-view regularization to optimize these parameters. Empirically, adjacent pixels often belong to the same plane. Using this local plane assumption, we compute a normal map from neighboring pixel depth estimations and ensure consistency between this normal map and the rendered normal map. At geometric edges, the local plane assumption fails, so we detect these edges using image edges and reduce the weight in these areas, achieving smooth geometry and consistent depth and normals. However, due to the discrete and unordered nature of Gaussians, geometry may be inconsistent across multiple views. To address this, we apply multi-view regularization to ensure global geometric consistency. Similar to the Eikonal loss [56], we incorporate a multi-view geometric consistency loss to ensure smooth and consistent geometric reconstruction, even in areas with noise, blur, or weak textures. We use two photometric coefficients to compensate for overall changes in image brightness, further improving reconstruction quality.

Finally, we validate the rendering and reconstruction quality on the MipNeRF360, DTU [23], and Tanks and Temples (TnT) [28] datasets. Experimental results demonstrate that, while maintaining the original Gaussian rendering quality and rendering speed, our method achieves state-of-the-art reconstruction accuracy. Moreover, our training only requires one hour on a single GPU, while the state-of-the-art method based on NeRF [33] requires eight GPUs for over two days. In summary, our method makes the following contributions:

• We propose a novel unbiased depth rendering method. Based on this rendering method, we can render reliable plane parameters for each pixel, facilitating the incorporation of various geometric constraints.
• We introduce single-view and multi-view regularizations to optimize the plane parameters of each pixel, achieving high-precision global geometric consistency.
• The exposure compensation model simply and effectively enhances reconstruction accuracy.
• Our method, while maintaining the high rendering accuracy and speed of the original 3DGS, achieves state-of-the-art reconstruction accuracy, and our training time is nearly 100 times faster compared to state-of-the-art reconstruction methods based on NeRF [33].

Fig. 3: Rendered Depth. The original depth in 3DGS exhibits significant noise, while our depth is smoother and more accurate.

II. RELATED WORK

Surface reconstruction is a cornerstone field in computer graphics and computer vision, aimed at generating intricate and accurate surface representations from sparse or noisy input data. Obtaining high-fidelity 3D models from real-world environments is pivotal for enabling immersive experiences in augmented reality (AR) and virtual reality (VR). This paper focuses exclusively on surface reconstruction under given poses, which can be readily computed using SLAM [5], [7], [8] or SfM [43], [51], [57] methods.

A. Traditional Surface Reconstruction

Traditional methods adhere to the universal multi-view stereo pipeline and can be roughly categorized based on the intermediate representation they rely on, such as point cloud [16], [30], volume [29], or depth map [4], [17], [52]. The commonly used approach separates the overall MVS problem into several parts: dense point clouds are first extracted from multi-view images through block-based matching [1], followed by the construction of surface structures either through triangulation [6] or implicit surface fitting [25], [26]. Despite being well-established and extensively utilized in academia and industry, these traditional methods are susceptible to artifacts stemming from erroneous matching or noise introduced during the pipeline. In response, several approaches aim to enhance reconstruction completeness and accuracy by integrating deep neural networks into the matching process [50], [54].

B. Neural Surface Reconstruction

Numerous pioneering efforts have leveraged pure deep neural networks to predict surface models directly from single or multiple image conditions using point clouds [14], [34], voxels [12], [58], triangular meshes [32], [55], or implicit fields [40], [47] in an end-to-end manner. However, these methods often incur significant computational overhead during network inference and demand extensively labeled training 3D models, hindering their real-time and real-world applicability.

With the rapid advancement in neural surface reconstruction, a meticulously designed scene recovery method named NeRF [41] emerged. NeRF-based methods take 5D ray information as input and predict density and color sampled in continuous space, yielding notably more realistic rendering results. However, this representation falls short in capturing high-fidelity surfaces.

Consequently, several approaches have transformed NeRF-based network architectures into surface reconstruction frameworks by incorporating intermediate representations such as occupancy [46] or signed distance fields [56], [60]. Despite the potent surface reconstruction capabilities exhibited by NeRF-based frameworks, the stacked multi-layer-perceptron (MLP) layers impose constraints on inference time and representation ability. To address this challenge, various follow-up studies aim to reduce the dependency on MLP layers by decomposing scene information into separable structures, such as points [59] and voxels [31], [33], [35].

C. Gaussian Splatting based Surface Reconstruction

SuGaR [19] proposed a method to extract a mesh from 3DGS. They introduced regularization terms to encourage the Gaussians to fit the scene surface. By sampling 3D point clouds from the Gaussians using the density field, they utilized Poisson reconstruction to extract a mesh from these sampled point clouds. While encouraging the Gaussians to fit the surface enhances geometric reconstruction accuracy, irregular 3D Gaussian shapes make modeling smooth geometric surfaces challenging. Moreover, due to the discreteness and disorder of the Gaussians, relying solely on image reconstruction loss can lead to overfitting, resulting in incomplete geometric information and surface mismatch. 2DGS [21] achieves view-consistent geometry by collapsing the 3D volume into a set of 2D oriented planar Gaussian disks. GOF [69] establishes a Gaussian opacity field, enabling geometry extraction by directly identifying its level-set. However, these 3DGS-based methods still produce biased depth, and multi-view geometric consistency is not guaranteed. To address these issues, we flatten the Gaussian into a planar shape, which is more

suitable for modeling actual surfaces and facilitates rendering parameters such as normals and distances from the plane to the origin. Based on these plane parameters, we propose unbiased depth estimation, allowing us to extract geometric parameters from the Gaussians. We then introduce geometric regularization terms from single and multiple views to optimize these geometric parameters, achieving globally consistent, high-precision geometric reconstruction.

Fig. 4: PGSR Overview. We compress Gaussians into flat planes and render distance and normal maps, which are then transformed into unbiased depth maps. Single-view and multi-view geometric regularization ensure high precision in global geometry. Exposure compensation RGB loss enhances reconstruction accuracy.

III. PRELIMINARY OF 3D GAUSSIAN SPLATTING

3DGS [27] explicitly represents 3D scenes with a set of 3D Gaussians {G_i}. Each Gaussian is defined by a Gaussian function:

G_i(x | µ_i, Σ_i) = exp(−½ (x − µ_i)^⊤ Σ_i^{−1} (x − µ_i)),

where µ_i ∈ R^3 and Σ_i ∈ R^{3×3} are the center of a point p_i ∈ P and the corresponding 3D covariance matrix, respectively. The covariance matrix Σ_i is factorized into a scaling matrix S_i ∈ R^{3×3} and a rotation matrix R_i ∈ R^{3×3}:

Σ_i = R_i S_i S_i^⊤ R_i^⊤.

3DGS allows fast α-blending for rendering. Given a transformation matrix W and an intrinsic matrix K, µ_i and Σ_i can be transformed into the camera coordinate frame corresponding to W and then projected to 2D coordinates:

µ_i' = K W [µ_i, 1]^⊤,   Σ_i' = J W Σ_i W^⊤ J^⊤,

where J is the Jacobian of the affine approximation of the projective transformation. The rendered color C ∈ R^3 of a pixel u can be obtained by α-blending:

C = Σ_{i∈N} T_i α_i c_i,   T_i = Π_{j=1}^{i−1} (1 − α_j),

where α_i is calculated by evaluating G_i(u | µ_i', Σ_i') multiplied with a learnable opacity corresponding to G_i, and the view-dependent color c_i ∈ R^3 is represented by the spherical harmonics (SH) of the Gaussian G_i. T_i is the cumulative transmittance, and N is the number of Gaussians that the ray passes through.

The center µ_i of a Gaussian G_i can be projected into the camera coordinate system as:

[x_i, y_i, z_i, 1]^⊤ = W [µ_i, 1]^⊤.

Previous methods [11], [24] render depth under the current viewpoint as:

D = Σ_{i∈N} T_i α_i z_i.

IV. METHOD

Given multi-view RGB images of static scenes, our goal is to achieve efficient and high-fidelity scene geometry reconstruction and rendering quality. Compared to 3DGS, we achieve global consistency in geometry reconstruction while maintaining similar rendering quality. Initially, we improve the modeling of scene geometry attributes by compressing 3D Gaussians into a 2D flat plane representation, which is used to generate plane distance and normal maps that are subsequently converted into unbiased depth maps. We then introduce single-view geometric, multi-view photometric, and geometric consistency losses to ensure global geometric consistency. Additionally, an exposure compensation model further improves reconstruction accuracy.

A. Planar-based Gaussian Splatting Representation

In this section, we discuss how to transform 3D Gaussians into a 2D flat plane representation. Based on this plane representation, we introduce an unbiased depth rendering method, which renders plane-to-camera distance and normal maps that can then be converted into depth maps. With geometric depth, distance, and normal maps available, it becomes easier to introduce the single-view and multi-view regularization described in the following sections.
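To make the α-blending above concrete, here is a minimal per-ray PyTorch sketch of the front-to-back compositing used for the color C and reused later for the normal and distance maps (Eqs. (2)–(3)). It is an illustrative toy under our own naming, not the tile-based CUDA rasterizer of 3DGS.

```python
import torch

def alpha_composite(values: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Front-to-back alpha blending along one ray.

    values: (N, C) per-Gaussian attributes sorted front to back
            (color c_i, camera-frame normal, or plane distance d_i).
    alphas: (N,) opacities alpha_i of the projected 2D Gaussians.
    Returns sum_i T_i * alpha_i * value_i with T_i = prod_{j<i} (1 - alpha_j).
    """
    ones = torch.ones(1, dtype=alphas.dtype, device=alphas.device)
    # T_i: transmittance accumulated before Gaussian i reaches the ray.
    transmittance = torch.cumprod(torch.cat([ones, 1.0 - alphas[:-1]]), dim=0)
    weights = transmittance * alphas
    return (weights.unsqueeze(-1) * values).sum(dim=0)

# Example: blend three Gaussians' view-dependent colors along a ray.
colors = torch.tensor([[0.9, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.7]])
alphas = torch.tensor([0.4, 0.5, 0.9])
pixel_color = alpha_composite(colors, alphas)
```

Because the same weights T_i α_i are reused for the normal and distance maps, the per-pixel plane parameters of the next section come from exactly the same compositing pass as the color.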

Fig. 5: The rendering and mesh reconstruction results in various indoor and outdoor scenes that we have achieved. PGSR achieves high-precision
geometric reconstruction from a series of RGB images without requiring any prior knowledge.

Due to the difficulty of modeling real-world scene geometry attributes such as depth and normals using 3D Gaussian shapes, it is necessary to flatten the 3D Gaussians into 2D flat Gaussians in order to accurately represent the geometric surface of the actual scene. Achieving precise geometry reconstruction and high-quality rendering requires the 2D flat Gaussians to accurately conform to the scene surface. Since the 2D flat Gaussians approximate a local plane, we can conveniently render the depth and normals of the scene.

Flattening 3D Gaussians: The covariance matrix Σ_i = R_i S_i S_i^⊤ R_i^⊤ of a 3D Gaussian expresses the ellipsoidal shape. Here, R_i represents the orthonormal basis of the ellipsoid's three axes, and the scale factor S_i defines the size along each direction. By compressing the scale factor along specific axes, the Gaussian ellipsoid can be flattened into planes aligned with those axes. We compress the Gaussian ellipsoid along the direction of the minimum scale factor, effectively flattening the ellipsoid into the plane closest to its original shape. Following the method of [9], we directly minimize the minimum scale factor of S_i = diag(s_1, s_2, s_3) for each Gaussian:

L_s = ‖ min(s_1, s_2, s_3) ‖_1.   (1)

Fig. 6: Unbiased Depth.

Unbiased Depth Rendering: The direction of the minimum scale factor corresponds to the normal n_i of the Gaussian. Because the normal direction is ambiguous (the shortest axis admits two opposite directions), we resolve this issue by using the viewing direction to determine the normal direction. This implies that the angle between the viewing direction and the normal direction should be greater than 90 degrees. The final normal map under the current viewpoint is obtained through α-blending:

N = Σ_{i∈N} R_c^⊤ n_i α_i Π_{j=1}^{i−1} (1 − α_j),   (2)

where R_c is the rotation from the camera to the global world frame. The distance from the plane to the camera center can be expressed as d_i = (R_c^⊤(µ_i − T_c)) R_c^⊤ n_i^⊤, where T_c is the camera center in the world frame and µ_i is the center of the Gaussian G_i. The final distance map under the current viewpoint is also obtained through α-blending:

D = Σ_{i∈N} d_i α_i Π_{j=1}^{i−1} (1 − α_j).   (3)

Referencing Fig. 6, after obtaining the distance and normal of the plane through rendering, we can determine the corresponding depth map by intersecting rays with the plane:

D(p) = D / (N(p) K^{−1} p̃),   (4)

where p = [u, v]^⊤ is the 2D position on the image plane, p̃ denotes the homogeneous coordinate of p, and K is the camera intrinsic matrix.

As shown in Fig. 2, our method of rendering depth has two major advantages compared to other depth rendering techniques. First, our depth shapes are consistent with the flattened Gaussian shapes, which can truly reflect actual surfaces. Previous methods typically render the depth map directly by α-blending the depth z of the Gaussians. Their depth is curved, inconsistent with the flat Gaussian shape, causing geometric conflicts. In contrast, we render the normal and distance maps of the plane first and then convert them into the depth map. Our depth lies on the Gaussian flat plane. When the 3D Gaussian flat planes fit the actual surface, the rendered depth can ensure complete consistency with the actual surface. Second, since the accumulation weight for each ray may be less than 1, previous rendering methods are affected by the weight accumulation, potentially resulting in depths that are closer to the camera and overall underestimated. In contrast, our depth is obtained by dividing the rendered plane distance by the normal term, effectively eliminating the influence of the weight accumulation coefficients.
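The following NumPy sketch illustrates Eqs. (1)–(4) under simplifying assumptions: the per-Gaussian normal is taken as the rotation column belonging to the smallest scale, the distance is the signed plane-to-camera distance, and the depth is recovered by the ray–plane intersection of Eq. (4). All function and variable names are illustrative placeholders, not the released PGSR implementation.

```python
import numpy as np

def plane_params(R_i, s_i, mu_i, cam_center, view_dir):
    """Per-Gaussian plane parameters used by Eqs. (1)-(3).

    R_i: (3, 3) rotation (columns = ellipsoid axes), s_i: (3,) scales,
    mu_i: (3,) Gaussian center, cam_center: (3,) camera center (world),
    view_dir: (3,) unit ray direction from the camera towards the Gaussian.
    """
    k = int(np.argmin(s_i))
    flatten_loss = np.abs(s_i[k])                 # per-Gaussian term of L_s, Eq. (1)
    normal = R_i[:, k]                            # axis of the smallest scale (world frame)
    if np.dot(normal, view_dir) > 0:              # disambiguate: normal must face the camera
        normal = -normal
    # Signed distance; with a camera-facing normal it is negative, and the sign
    # cancels against N(p) . K^{-1} p~ when Eq. (4) is evaluated.
    distance = np.dot(normal, mu_i - cam_center)
    return normal, distance, flatten_loss

def unbiased_depth(distance_map, normal_map, K):
    """Eq. (4): depth from the alpha-blended plane distance and normal maps.

    distance_map: (H, W); normal_map: (H, W, 3) in the camera frame; K: (3, 3).
    """
    H, W = distance_map.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T               # K^{-1} p~ for every pixel
    denom = np.sum(normal_map * rays, axis=-1)    # N(p) . K^{-1} p~
    denom[np.abs(denom) < 1e-8] = 1e-8            # guard near-grazing rays
    return distance_map / denom                    # signs of numerator/denominator cancel
```

Because the numerator and denominator are blended with the same weights T_i α_i, dividing them cancels the accumulated weight. This is what makes the recovered depth fall exactly on the blended plane instead of being pulled towards the camera when the opacity along the ray does not sum to one.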

B. Geometric Regularization

1) Single-View Regularization: The original 3DGS, relying solely on image reconstruction loss, can easily fall into locally overfitted optimization, leading to Gaussian shapes inconsistent with the actual surface. Based on this, we introduce geometric constraints to ensure that the 3D Gaussians fit the actual surface as closely as possible.

Local Plane Assumption: Encouraged by previous methods [24], [37], [49], we adopt the assumption of local planarity to constrain the local consistency of depth and normals, meaning that a pixel and its neighboring pixels can be considered as an approximate plane. After rendering the depth map, we sample four neighboring points using a fixed template. With these known depths, we compute the plane's normal. This process is repeated for the entire image, generating normals from the rendered depth map. We then minimize the difference between this normal map and the rendered normal map, ensuring geometric consistency between local depth and normals.

Image Edge-Aware Single-View Loss: Neighboring pixels may not necessarily adhere to the local planarity assumption, especially in edge regions. To address this issue, we use image edges to approximate geometric edges. For a pixel p, we sample four points from the neighboring pixels (up, down, left, and right). We project the four sampled depth points into 3D points {P_j | j = 0, ..., 3} in the camera coordinate system, and then calculate the normal of the local plane at the pixel p as:

N_d(p) = (P_1 − P_0) × (P_3 − P_2) / |(P_1 − P_0) × (P_3 − P_2)|.   (5)

Finally, the single-view normal loss is:

L_svgeo = (1 / W) Σ_{p∈W} ∇I ‖ N_d(p) − N(p) ‖_1,   (6)

where ∇I is the image gradient normalized to the range of 0 to 1, N(p) is from Equation (2), and W is the set of image pixels.

Fig. 7: Qualitative comparison on the DTU dataset. PGSR produces smooth and detailed surfaces.

2) Multi-View Regularization: Single-view geometry regularization can maintain consistency between depth and normal geometry, providing fairly accurate initial geometric information. However, due to the irregular discretization of Gaussian point cloud optimization, we found that the geometric structure across multiple views is not entirely consistent. Therefore, it is necessary to introduce multi-view geometry regularization to ensure global consistency of the geometric structure.

Multi-View Geometric Consistency: The image loss often suffers from influences such as image noise, blur, and weak textures. In these cases, the geometric solution obtained from photometric consistency is unreliable. Due to the discrete nature of Gaussian properties, we cannot establish a spatially dense or semi-dense SDF field as in NeRF-based SDF methods, and we are unable to use spatial smoothness constraints, such as the Eikonal loss [56], to avoid the influence of unreliable solutions. To mitigate the impact of unreliable geometric solutions and ensure multi-view geometric consistency, we introduce this consistency prior constraint, which helps convergence to the correct solution and enhances geometric smoothness.

We render the normals N and the plane distances D to the camera for both the reference frame and the neighboring frame. As shown in Fig. 9, for a specific pixel p_r in the reference frame, the corresponding normal is n_r and the distance is d_r. The pixel p_r in the reference frame can be mapped to a pixel p_n in the neighboring frame through the homography matrix H_rn:

p̃_n = H_rn p̃_r,   (7)

H_rn = K_n (R_rn − T_rn n_r^⊤ / d_r) K_r^{−1},   (8)

where p̃ is the homogeneous coordinate of p, and R_rn and T_rn are the relative transformation from the reference frame to the neighboring frame. Similarly, for the pixel p_n in the neighboring frame, we can obtain the normal n_n and the distance d_n to compute the homography matrix H_nr. The pixel p_r undergoes forward and backward projections between the reference frame and the neighboring frame through H_rn and H_nr. Minimizing the forward and backward projection error constitutes the multi-view geometric consistency regularization:

L_mvgeom = (1 / V) Σ_{p_r∈V} φ(p_r),   (9)

where φ(p_r) = ‖ p_r − H_nr H_rn p_r ‖ is the forward and backward projection error of p_r. When φ(p_r) exceeds a certain threshold, the pixel can be considered occluded or subject to a significant geometric error. To prevent errors caused by occlusion, these pixels are not included in the multi-view regularization term. If some pixels are mistakenly identified as occluded due to geometric errors, this does not affect our final convergence, because the single-view regularization term and the use of sparse 3D Gaussians to represent dense scenes gradually propagate high-precision geometry, eventually leading all Gaussians to converge to the correct positions. V is the set of all pixels in the image excluding those with high forward and backward projection error.

Fig. 8: Qualitative comparison on the Tanks and Temples dataset. We visualize surface quality using a normal map generated from the reconstructed mesh. PGSR outperforms other baseline approaches in capturing scene details, whereas baseline methods exhibit missing or noisy surfaces.

Multi-View Photometric Consistency: Drawing inspiration from multi-view stereo (MVS) methods [4], [15], [51], we employ photometric multi-view consistency constraints based on plane patches. We map an 11×11 pixel patch P_r centered at p_r to the neighboring frame patch P_n using the homography matrix H_rn. Focusing on geometric details, we convert color images into grayscale. Multi-view photometric regularization requires that P_r and P_n be as consistent as possible. We use the normalized cross correlation (NCC) [68] of patches in the reference frame and the neighboring frame to measure the photometric consistency:

L_mvrgb = (1 / V) Σ_{p_r∈V} (1 − NCC(I_r(p_r), I_n(H_rn p_r))),   (10)

where V is the set of all pixels in the image, excluding those with high forward and backward projection errors.
on plane patches. We map a 11x11 pixel patch Pr centered at where V is the set of all pixels in the image, excluding
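The multi-view terms (Eqs. (7)–(10)) can be prototyped directly from the rendered plane parameters. Below is a hedged NumPy sketch of the plane-induced homography, the forward–backward projection error, and a patch NCC score; the pose convention x_n = R_rn x_r + T_rn and all names are assumptions made for this example only.

```python
import numpy as np

def plane_homography(K_r, K_n, R_rn, T_rn, n_r, d_r):
    """H_rn of Eq. (8): maps reference pixels onto the neighboring image."""
    return K_n @ (R_rn - np.outer(T_rn, n_r) / d_r) @ np.linalg.inv(K_r)

def fb_projection_error(p_r, K_r, K_n, R_rn, T_rn, plane_r, plane_n):
    """phi(p_r) of Eq. (9): forward-backward reprojection distance in pixels.

    plane_r = (n_r, d_r): normal/distance rendered at p_r in the reference view.
    plane_n = (n_n, d_n): normal/distance rendered at the projected pixel p_n.
    """
    H_rn = plane_homography(K_r, K_n, R_rn, T_rn, *plane_r)
    p_n = H_rn @ np.array([p_r[0], p_r[1], 1.0])
    p_n /= p_n[2]
    R_nr, T_nr = R_rn.T, -R_rn.T @ T_rn            # inverse relative pose
    H_nr = plane_homography(K_n, K_r, R_nr, T_nr, *plane_n)
    p_back = H_nr @ p_n
    p_back /= p_back[2]
    return float(np.linalg.norm(p_back[:2] - np.asarray(p_r, dtype=float)))

def ncc(patch_r, patch_n, eps=1e-8):
    """Normalized cross correlation of two grayscale patches, as used in Eq. (10)."""
    a = patch_r - patch_r.mean()
    b = patch_n - patch_n.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))
```

Pixels whose forward–backward error exceeds a threshold are simply dropped from V before averaging φ and 1 − NCC, which is how occlusions are handled in the regularization described above.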

TABLE I: Quantitative results of rendering quality for novel view synthesis on the Mip-NeRF360 dataset. "Red", "Orange" and "Yellow" denote the best, second-best, and third-best results. PGSR achieves results close to 3DGS and outperforms the similar reconstruction method SuGaR.

Method | Indoor scenes (PSNR↑ / SSIM↑ / LPIPS↓) | Outdoor scenes (PSNR↑ / SSIM↑ / LPIPS↓) | Average on all scenes (PSNR↑ / SSIM↑ / LPIPS↓)
NeRF-based:
NeRF [41] | 26.84 / 0.790 / 0.370 | 21.46 / 0.458 / 0.515 | 24.15 / 0.624 / 0.443
Deep Blending [20] | 26.40 / 0.844 / 0.261 | 21.54 / 0.524 / 0.364 | 23.97 / 0.684 / 0.313
INGP [44] | 29.15 / 0.880 / 0.216 | 22.90 / 0.566 / 0.371 | 26.03 / 0.723 / 0.294
M-NeRF360 [2] | 31.72 / 0.917 / 0.180 | 24.47 / 0.691 / 0.283 | 28.10 / 0.804 / 0.232
NeuS [56] | 25.10 / 0.789 / 0.319 | 21.93 / 0.629 / 0.600 | 23.74 / 0.720 / 0.439
GS-based:
3DGS [27] | 30.99 / 0.926 / 0.199 | 24.24 / 0.705 / 0.283 | 27.24 / 0.803 / 0.246
SuGaR [19] | 29.44 / 0.911 / 0.216 | 22.76 / 0.631 / 0.349 | 26.10 / 0.771 / 0.283
2DGS [21] | 30.39 / 0.923 / 0.183 | 24.33 / 0.709 / 0.284 | 27.03 / 0.804 / 0.239
GOF [69] | 30.80 / 0.928 / 0.167 | 24.76 / 0.742 / 0.225 | 27.78 / 0.835 / 0.196
PGSR | 30.41 / 0.930 / 0.161 | 24.45 / 0.730 / 0.224 | 27.43 / 0.830 / 0.193

Fig. 9: Multi-view photometric and geometric loss.

3) Geometric Regularization Loss: Finally, the geometric regularization loss includes the single-view geometric, multi-view geometric, and multi-view photometric consistency constraints:

L_geo = λ_2 L_svgeo + λ_3 L_mvrgb + λ_4 L_mvgeom.   (11)

C. Exposure Compensation Image Loss

Due to changes in external lighting conditions, cameras may have different exposure times at different shooting moments, leading to overall brightness variations between images. The original 3DGS does not consider brightness changes, which can result in floating artifacts in practical scenes. To model the overall brightness variations at different times, we assign two exposure coefficients, a and b, to each image. Images with exposure compensation can then be obtained by a simple computation with the exposure coefficients:

I_i^a = exp(a_i) I_i^r + b_i,   (12)

where I_i^r is the rendered image and I_i^a is the exposure-adjusted image. We employ the following image loss:

L_rgb = (1 − λ) L_1(Ĩ − I_i) + λ L_SSIM(I_i^r − I_i),   (13)

Ĩ = I_i^a if L_SSIM(I_i^r − I_i) < 0.5, otherwise Ĩ = I_i^r,   (14)

where I_i is the ground truth image. The L1 loss ensures that the exposure-adjusted image is consistent with the ground truth image, while the SSIM loss requires the rendered image to have a similar structure to the ground truth image. To enhance the robustness of the exposure coefficient estimation, we need to ensure that the rendered image and the ground truth image have sufficient structural similarity before performing the estimation. After training, I_i^r is required to be globally consistent and maintain structural similarity with the ground truth image, while I_i^a can adjust the brightness of images to match the ground truth image perfectly.
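A minimal PyTorch sketch of the exposure-compensated image loss (Eqs. (12)–(14)) is given below; ssim() is assumed to be provided by an external package, and in practice the per-image coefficients a_i and b_i would be registered as learnable parameters of the scene.

```python
import torch

def exposure_compensated_loss(rendered, gt, a, b, ssim, lam=0.2):
    """Eqs. (12)-(14): L_rgb with exposure-compensated L1 and SSIM terms.

    rendered, gt: (3, H, W) images in [0, 1]; a, b: per-image scalar tensors;
    ssim: callable returning the SSIM similarity of two image batches in [0, 1].
    """
    adjusted = torch.exp(a) * rendered + b           # Eq. (12): I^a = exp(a) I^r + b
    ssim_loss = 1.0 - ssim(rendered.unsqueeze(0), gt.unsqueeze(0))   # L_SSIM(I^r, I)
    # Eq. (14): only trust the exposure model once the structure roughly matches.
    target = adjusted if ssim_loss < 0.5 else rendered
    l1 = torch.abs(target - gt).mean()
    return (1.0 - lam) * l1 + lam * ssim_loss        # Eq. (13)
```

Note that when the SSIM loss is still above 0.5, the L1 branch falls back to the raw rendering, so gradients do not flow into a and b from poorly fitted views, which matches the robustness argument above.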
D. Training

In summary, our final training loss L consists of the image reconstruction loss L_rgb, the flattening 3D Gaussian loss L_s, and the geometric loss L_geo:

L = L_rgb + λ_1 L_s + L_geo.   (15)

We set λ_1 = 100. For the image reconstruction loss, we set λ = 0.2. For the geometric loss, we set λ_2 = 0.01, λ_3 = 0.2, and λ_4 = 0.05.

V. EXPERIMENTS

Datasets: To validate the effectiveness of our method, we conducted experiments on various real-world datasets, including objects as well as indoor and outdoor environments. We chose the widely used Mip-NeRF360 dataset [2] for evaluating novel view synthesis performance. The large and complex scenes of the TnT dataset [28] and 15 object-centric scenes of the DTU dataset [23] were selected to assess reconstruction quality.

Evaluation Criteria: We chose three widely used image evaluation metrics to validate novel view synthesis: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and learned perceptual image patch similarity (LPIPS) [70]. For assessing surface quality, we employed the F1 score and chamfer distance.

Implementation Details: Our training strategy and hyperparameters are generally consistent with 3DGS [27]. The training iterations for all scenes are set to 30,000. We adopt the densification strategy of AbsGS [67]. The learning rate for the exposure coefficients is 0.001. We begin by rendering the depth for each training view, followed by utilizing the TSDF Fusion algorithm [45] to generate the corresponding TSDF field. Subsequently, we extract the mesh [38] from the TSDF field. We only utilize the exposure compensation on the Tanks and Temples dataset. All experiments in this paper are conducted on an Nvidia RTX 4090 GPU.
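Putting the pieces together, the total objective of Eq. (15) with the weights listed in Sec. IV-D can be assembled as in the following PyTorch-style sketch; the individual loss values are placeholders for the terms discussed in Secs. IV-A to IV-C, and averaging L_s over Gaussians is our own choice for the example.

```python
import torch

def total_loss(l_rgb, scales, l_svgeo, l_mvrgb, l_mvgeom,
               lam1=100.0, lam2=0.01, lam3=0.2, lam4=0.05):
    """Eq. (15): L = L_rgb + lam1 * L_s + L_geo, with L_geo from Eq. (11).

    l_rgb:  exposure-compensated image loss (Eq. (13)).
    scales: (N, 3) per-Gaussian scale factors; L_s penalizes the smallest one (Eq. (1)).
    l_svgeo, l_mvrgb, l_mvgeom: single-view, multi-view photometric, and
    multi-view geometric consistency losses (Eqs. (6), (10), (9)).
    """
    l_s = scales.min(dim=1).values.abs().mean()                  # flattening loss L_s
    l_geo = lam2 * l_svgeo + lam3 * l_mvrgb + lam4 * l_mvgeom    # Eq. (11)
    return l_rgb + lam1 * l_s + l_geo
```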

TABLE II: Quantitative results of chamfer distance (mm)↓ on the DTU dataset [23]. PGSR achieves the highest reconstruction accuracy and is over 100 times faster than the SDF methods based on NeRF.

Method | 24 | 37 | 40 | 55 | 63 | 65 | 69 | 83 | 97 | 105 | 106 | 110 | 114 | 118 | 122 | Mean | Time
VolSDF [60] | 1.14 | 1.26 | 0.81 | 0.49 | 1.25 | 0.70 | 0.72 | 1.29 | 1.18 | 0.70 | 0.66 | 1.08 | 0.42 | 0.61 | 0.55 | 0.86 | >12h
NeuS [56] | 1.00 | 1.37 | 0.93 | 0.43 | 1.10 | 0.65 | 0.57 | 1.48 | 1.09 | 0.83 | 0.52 | 1.20 | 0.35 | 0.49 | 0.54 | 0.84 | >12h
Neuralangelo [33] | 0.37 | 0.72 | 0.35 | 0.35 | 0.87 | 0.54 | 0.53 | 1.29 | 0.97 | 0.73 | 0.47 | 0.74 | 0.32 | 0.41 | 0.43 | 0.61 | >128h
SuGaR [19] | 1.47 | 1.33 | 1.13 | 0.61 | 2.25 | 1.71 | 1.15 | 1.63 | 1.62 | 1.07 | 0.79 | 2.45 | 0.98 | 0.88 | 0.79 | 1.33 | 1h
2DGS [21] | 0.48 | 0.91 | 0.39 | 0.39 | 1.01 | 0.83 | 0.81 | 1.36 | 1.27 | 0.76 | 0.70 | 1.40 | 0.40 | 0.76 | 0.52 | 0.80 | 0.32h
GOF [69] | 0.50 | 0.82 | 0.37 | 0.37 | 1.12 | 0.74 | 0.73 | 1.18 | 1.29 | 0.68 | 0.77 | 0.90 | 0.42 | 0.66 | 0.49 | 0.74 | 2h
PGSR(DS) | 0.34 | 0.58 | 0.29 | 0.29 | 0.78 | 0.58 | 0.54 | 1.01 | 0.73 | 0.51 | 0.49 | 0.69 | 0.31 | 0.37 | 0.38 | 0.53 | 0.6h
PGSR | 0.31 | 0.52 | 0.27 | 0.27 | 0.76 | 0.54 | 0.49 | 0.98 | 0.69 | 0.49 | 0.46 | 0.56 | 0.28 | 0.35 | 0.36 | 0.49 | 1.0h

TABLE III: Quantitative results of F1 score↑ for reconstruction on the Tanks and Temples dataset. PGSR achieves reconstruction accuracy similar to Neuralangelo, but our training speed is over a hundred times faster.

Scene | NeuS | Geo-NeuS | Neuralangelo | SuGaR | 2DGS | GOF | PGSR
Barn | 0.29 | 0.33 | 0.70 | 0.14 | 0.36 | 0.51 | 0.66
Caterpillar | 0.29 | 0.26 | 0.36 | 0.16 | 0.23 | 0.41 | 0.41
Courthouse | 0.17 | 0.12 | 0.28 | 0.08 | 0.13 | 0.28 | 0.21
Ignatius | 0.83 | 0.72 | 0.89 | 0.33 | 0.44 | 0.68 | 0.80
Meetingroom | 0.24 | 0.20 | 0.32 | 0.15 | 0.16 | 0.28 | 0.29
Truck | 0.45 | 0.45 | 0.48 | 0.26 | 0.26 | 0.58 | 0.60
Mean | 0.38 | 0.35 | 0.50 | 0.19 | 0.30 | 0.46 | 0.50
Time | >24h | >24h | >128h | 2h | 34.2m | 2h | 1.2h

Fig. 10: Qualitative comparison of our unbiased depth method with the previous depth method [11], [24], depicted as normal maps. Our overall geometric structure appears smoother and more precise.

A. Real-time Rendering

For the validation of rendering quality, we follow the 3DGS protocol and conduct the validation on the Mip-NeRF360 dataset [2]. We compare with current state-of-the-art methods for pure novel view synthesis as well as reconstruction methods similar to ours, including NeRF [41], Deep Blending [20], INGP [44], Mip-NeRF360 [2], NeuS [56], 3DGS [27], SuGaR [19], 2DGS [21], and GOF [69]. As shown in Table I and Fig. 5, compared to the current state-of-the-art methods, our approach not only provides excellent surface reconstruction quality but also achieves outstanding novel view synthesis results.

B. Reconstruction

We compared our method, PGSR, with current state-of-the-art neural surface reconstruction methods including NeuS [56], Geo-NeuS [15], and Neuralangelo [33]. We also compared it with recently emerged reconstruction methods based on 3DGS, such as SuGaR [19], 2DGS [21], and GOF [69]. All results are summarized in Fig. 5, Fig. 7, Fig. 8, Table II, and Table III.

The DTU dataset: Our method achieves the highest reconstruction accuracy with a relatively fast training speed. PGSR(DS) denotes downsampling to half the original image size for training. Our method significantly outperforms other 3DGS-based reconstruction methods. As shown in Fig. 7, our surfaces are smoother and contain more details.

The TnT dataset: The F1 score of PGSR is similar to that of Neuralangelo and better than those of other current reconstruction methods. Our training time is over 100 times faster than Neuralangelo. Moreover, compared to Neuralangelo, we can reconstruct more surface details.

C. Ablations

TABLE IV: Ablation study on the Meetingroom scene of the TnT dataset.

Model setting | F1-Score↑ | PSNR↑
w/o Single-view | 0.26 | 27.46
w/o Multi-view | 0.15 | 28.14
w/o Our unbiased depth | 0.20 | 26.80
Full model | 0.29 | 27.30

Our Unbiased Depth: From Fig. 10, it can be observed that our overall geometric structure appears smoother and more precise, especially in flat regions. Table IV also demonstrates that our depth rendering method achieves higher reconstruction and rendering accuracy.

Single-View and Multi-View Regularization: The single-view regularization term can provide good initial geometric accuracy without relying on multi-view information. When single-view regularization is removed, the reconstruction accuracy decreases. Multi-view regularization effectively constrains the consistency of geometry between multiple views, improving overall reconstruction accuracy. From Table IV, it is evident that multi-view regularization is crucial for reconstruction accuracy.

Exposure Compensation: We validated the exposure compensation on the Ignatius sequence of the TnT dataset. As shown in Table V, exposure compensation enhances reconstruction and rendering quality.

TABLE V: Ablation study on exposure compensation.

Model setting | F1-Score↑ | PSNR↑
w/o exposure modeling | 0.76 | 21.71
w/ exposure modeling | 0.80 | 25.77

D. Virtual Reality Application

As shown in Fig. 11, we used our method to separately reconstruct the original materials. We then extracted the excavator and Ignatius using masks and placed them in the garden scene. By rendering the scene and objects separately and using our rendered depth to determine occlusion relationships, we achieved immersive, high-fidelity virtual reality effects with high-precision depth estimation.
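The object insertion described above can be reproduced with a simple per-pixel depth test once both the scene and the object have been rendered to RGB and unbiased depth from the same camera. The sketch below is an illustrative compositing step under assumed array names, not the authors' VR pipeline.

```python
import numpy as np

def composite_by_depth(scene_rgb, scene_depth, obj_rgb, obj_depth, obj_mask):
    """Occlusion-aware compositing of a rendered object into a rendered scene.

    scene_rgb, obj_rgb: (H, W, 3); scene_depth, obj_depth: (H, W);
    obj_mask: (H, W) boolean mask of pixels covered by the inserted object.
    """
    # The object wins a pixel only where it is present and closer to the camera.
    obj_in_front = obj_mask & (obj_depth < scene_depth)
    out = scene_rgb.copy()
    out[obj_in_front] = obj_rgb[obj_in_front]
    return out
```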

Fig. 11: Virtual Reality Application. (a) Original materials, including the garden scene, the excavator, and Ignatius. (b) A virtual reality effect showcase synthesized from these original materials.

VI. LIMITATIONS AND FUTURE WORK

Although our PGSR efficiently and faithfully performs geometric reconstruction, it also faces several challenges. Firstly, we cannot perform geometric reconstruction in regions with missing or limited viewpoints, leading to incomplete or less accurate geometry. Exploring methods to improve reconstruction quality under insufficient constraints using priors is another avenue for further investigation. Secondly, our method does not consider scenarios involving reflective surfaces or mirrors, so reconstruction in these environments will pose challenges. Integrating with existing 3DGS work that accounts for reflective surfaces would enhance reconstruction accuracy in such scenarios. Finally, we found that there are some floating points in the scene, which affect the rendering and reconstruction quality. Integrating more advanced 3DGS baselines [39] would help further enhance overall quality.

VII. CONCLUSION

In this paper, we propose a novel unbiased depth rendering method based on 3DGS. With this method, we render the plane geometry parameters for each pixel, including normal, distance, and depth maps. We then incorporate single-view and multi-view geometric regularization and an exposure compensation model to achieve precise global consistency in geometry. We validate our rendering and reconstruction quality on the MipNeRF360, DTU, and TnT datasets. The experimental results indicate that our method achieves the highest geometric reconstruction accuracy and rendering quality compared to the current state-of-the-art methods.

REFERENCES

[1] Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. Patchmatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph., 28(3):24, 2009.
[2] Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
[3] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19697–19705, 2023.
[4] Neill DF Campbell, George Vogiatzis, Carlos Hernández, and Roberto Cipolla. Using multiple hypotheses to improve depth-maps for multi-view stereo. In Computer Vision–ECCV 2008, pages 766–779. Springer, 2008.
[5] Carlos Campos, Richard Elvira, Juan J Gómez Rodríguez, José MM Montiel, and Juan D Tardós. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Transactions on Robotics, 37(6):1874–1890, 2021.
[6] Frédéric Cazals and Joachim Giesen. Delaunay triangulation based surface reconstruction. In Effective Computational Geometry for Curves and Surfaces, pages 231–276. Springer, 2006.
[7] Danpeng Chen, Nan Wang, Runsen Xu, Weijian Xie, Hujun Bao, and Guofeng Zhang. Rnin-vio: Robust neural inertial navigation aided visual-inertial odometry in challenging scenes. In 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 275–283. IEEE, 2021.
[8] Danpeng Chen, Shuai Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Hujun Bao, and Guofeng Zhang. Vip-slam: An efficient tightly-coupled rgb-d visual inertial planar slam. In 2022 International Conference on Robotics and Automation (ICRA), pages 5615–5621. IEEE, 2022.
[9] Hanlin Chen, Chen Li, and Gim Hee Lee. Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance. arXiv preprint arXiv:2312.00846, 2023.
[10] Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, and Chi Zhang. Artist-created mesh generation with autoregressive transformers. arXiv preprint, 2024.
[11] Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, and Xuejin Chen. Gaussianpro: 3d gaussian splatting with progressive propagation. arXiv preprint arXiv:2402.14650, 2024.
[12] Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In European Conference on Computer Vision, volume 9912, pages 628–644, 2016.
[13] Nianchen Deng, Zhenyi He, Jiannan Ye, Budmonde Duinkharjav, Praneeth Chakravarthula, Xubo Yang, and Qi Sun. Fov-nerf: Foveated neural radiance fields for virtual reality. IEEE Transactions on Visualization and Computer Graphics, 28(11):3854–3864, 2022.
[14] Haoqiang Fan, Hao Su, and Leonidas J. Guibas. A point set generation network for 3D object reconstruction from a single image. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2463–2471, 2017.
[15] Qiancheng Fu, Qingshan Xu, Yew Soon Ong, and Wenbing Tao. Geo-neus: Geometry-consistent neural implicit surfaces learning for multi-view reconstruction. Advances in Neural Information Processing Systems, 35:3403–3416, 2022.
[16] Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362–1376, 2010.
[17] Silvano Galliani, Katrin Lasinger, and Konrad Schindler. Massively parallel multiview stereopsis by surface normal diffusion. In Proceedings of the IEEE International Conference on Computer Vision, pages 873–881, 2015.
[18] Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, and Hongsheng Li. Lumina-t2x: Transforming text into any modality, resolution, and duration via flow-based large diffusion transformers. arXiv preprint arXiv:2405.05945, 2024.
[19] Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. arXiv preprint arXiv:2311.12775, 2023.
[20] Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG), 37(6):1–15, 2018.

[21] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. arXiv preprint arXiv:2403.17888, 2024.
[22] Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, and Wanli Ouyang. Nerf-det++: Incorporating semantic cues and perspective-aware depth supervision for indoor multi-view 3d detection. arXiv preprint arXiv:2402.14464, 2024.
[23] Rasmus Jensen, Anders Dahl, George Vogiatzis, Engin Tola, and Henrik Aanæs. Large scale multi-view stereopsis evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 406–413, 2014.
[24] Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, and Yuexin Ma. Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces. arXiv preprint arXiv:2311.17977, 2023.
[25] Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. Poisson surface reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing, volume 7, 2006.
[26] Michael Kazhdan and Hugues Hoppe. Screened poisson surface reconstruction. ACM Transactions on Graphics (ToG), 32(3):1–13, 2013.
[27] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):1–14, 2023.
[28] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG), 36(4):1–13, 2017.
[29] Kiriakos N Kutulakos and Steven M Seitz. A theory of shape by space carving. International Journal of Computer Vision, 38:199–218, 2000.
[30] Maxime Lhuillier and Long Quan. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):418–433, 2005.
[31] Hai Li, Xingrui Yang, Hongjia Zhai, Yuqian Liu, Hujun Bao, and Guofeng Zhang. Vox-surf: Voxel-based implicit surface representation. IEEE Transactions on Visualization and Computer Graphics, 2022.
[32] Hai Li, Weicai Ye, Guofeng Zhang, Sanyuan Zhang, and Hujun Bao. Saliency guided subdivision for single-view mesh reconstruction. In 2020 International Conference on 3D Vision (3DV), pages 1098–1107. IEEE, 2020.
[33] Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8456–8465, 2023.
[34] Chen-Hsuan Lin, Chen Kong, and Simon Lucey. Learning efficient point cloud generation for dense 3D object reconstruction. In Conference on Artificial Intelligence, pages 7114–7121, 2018.
[35] Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural sparse voxel fields. In Advances in Neural Information Processing Systems, pages 15651–15663, 2020.
[36] Xiangyu Liu, Weicai Ye, Chaoran Tian, Zhaopeng Cui, Hujun Bao, and Guofeng Zhang. Coxgraph: Multi-robot collaborative, globally consistent, online dense reconstruction system. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8722–8728. IEEE, 2021.
[37] Xiaoxiao Long, Yuhang Zheng, Yupeng Zheng, Beiwen Tian, Cheng Lin, Lingjie Liu, Hao Zhao, Guyue Zhou, and Wenping Wang. Adaptive surface normal constraint for geometric estimation from monocular images. arXiv preprint arXiv:2402.05869, 2024.
[38] William E Lorensen and Harvey E Cline. Marching cubes: A high resolution 3d surface construction algorithm. In Seminal Graphics: Pioneering Efforts That Shaped the Field, pages 347–353. 1998.
[39] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. arXiv preprint arXiv:2312.00109, 2023.
[40] Lars M. Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3D reconstruction in function space. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4460–4470, 2019.
[41] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[42] Yuhang Ming, Weicai Ye, and Andrew Calway. idf-slam: End-to-end rgb-d slam with neural implicit mapping and deep feature tracking. arXiv preprint arXiv:2209.07919, 2022.
[43] Pierre Moulon, Pascal Monasse, and Renaud Marlet. Adaptive structure from motion with a contrario model estimation. In Proceedings of the Asian Computer Vision Conference (ACCV 2012), pages 257–270. Springer Berlin Heidelberg, 2012.
[44] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG), 41(4):1–15, 2022.
[45] Richard A Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. Kinectfusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE International Symposium on Mixed and Augmented Reality, pages 127–136. IEEE, 2011.
[46] Michael Niemeyer, Lars M. Mescheder, Michael Oechsle, and Andreas Geiger. Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3501–3512, 2020.
[47] Jeong Joon Park, Peter Florence, Julian Straub, Richard A. Newcombe, and Steven Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 165–174, 2019.
[48] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
[49] Xiaojuan Qi, Renjie Liao, Zhengzhe Liu, Raquel Urtasun, and Jiaya Jia. Geonet: Geometric neural network for joint depth and surface normal estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 283–291, 2018.
[50] Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From coarse to fine: Robust hierarchical localization at large scale. In CVPR, 2019.
[51] Johannes L Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.
[52] Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.
[53] Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation. arXiv preprint arXiv:2309.16653, 2023.
[54] Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Pablo Speciale, and Marc Pollefeys. Patchmatchnet: Learned multi-view patchmatch stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14194–14203, 2021.
[55] Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3D mesh models from single RGB images. In European Conference on Computer Vision, volume 11215, pages 55–71, 2018.
[56] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021.
[57] Changchang Wu. Towards linear-time incremental structure from motion. In 2013 International Conference on 3D Vision (3DV), pages 127–134. IEEE, 2013.
[58] Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, and Shengping Zhang. Pix2Vox: Context-aware 3D reconstruction from single and multi-view images. In IEEE/CVF International Conference on Computer Vision, pages 2690–2698, 2019.
[59] Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5438–5448, 2022.
[60] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. In Advances in Neural Information Processing Systems, pages 4805–4815, 2021.
[61] Weicai Ye, Shuo Chen, Chong Bao, Hujun Bao, Marc Pollefeys, Zhaopeng Cui, and Guofeng Zhang. IntrinsicNeRF: Learning intrinsic neural radiance fields for editable novel view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[62] Weicai Ye, Xinyu Chen, Ruohao Zhan, Di Huang, Xiaoshui Huang, Haoyi Zhu, Hujun Bao, Wanli Ouyang, Tong He, and Guofeng Zhang. Dynamic-aware tracking any point for structure from motion in the wild. arXiv preprint, 2024.
[63] Weicai Ye, Chenhao Ji, Zheng Chen, Junyao Gao, Xiaoshui Huang, Song-Hai Zhang, Wanli Ouyang, Tong He, Cairong Zhao, and Guofeng Zhang. DiffPano: Scalable and consistent text to panorama generation with spherical epipolar-aware diffusion. arXiv preprint, 2024.
[64] Weicai Ye, Xinyue Lan, Shuo Chen, Yuhang Ming, Xingyuan Yu, Hujun Bao, Zhaopeng Cui, and Guofeng Zhang. Pvo: Panoptic visual odometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9579–9589, June 2023.
[65] Weicai Ye, Hai Li, Tianxiang Zhang, Xiaowei Zhou, Hujun Bao, and Guofeng Zhang. SuperPlane: 3D plane detection and description from a single image. In 2021 IEEE Virtual Reality and 3D User Interfaces (VR), pages 207–215. IEEE, 2021.
[66] Weicai Ye, Xingyuan Yu, Xinyue Lan, Yuhang Ming, Jinyu Li, Hujun Bao, Zhaopeng Cui, and Guofeng Zhang. Deflowslam: Self-supervised scene motion decomposition for dynamic dense slam. arXiv preprint arXiv:2207.08794, 2022.
[67] Zongxin Ye, Wenyu Li, Sidun Liu, Peng Qiao, and Yong Dou. Absgs: Recovering fine details for 3d gaussian splatting. arXiv preprint arXiv:2404.10484, 2024.
[68] Jae-Chern Yoo and Tae Hee Han. Fast normalized cross-correlation. Circuits, Systems and Signal Processing, 28:819–843, 2009.
[69] Zehao Yu, Torsten Sattler, and Andreas Geiger. Gaussian opacity fields: Efficient and compact surface reconstruction in unbounded scenes. arXiv preprint arXiv:2404.10772, 2024.
[70] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
[71] Tianxiang Zhang, Chong Bao, Hongjia Zhai, Jiazhen Xia, Weicai Ye, and Guofeng Zhang. Arcargo: Multi-device integrated cargo loading management system with augmented reality. In 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), pages 341–348. IEEE, 2021.
Fig. 12: Qualitative comparisons in surface reconstruction between PGSR, 2DGS, and GOF on the DTU dataset (scan24, scan37, scan40, scan55, scan63, and scan65).
Fig. 13: Qualitative comparisons in surface reconstruction between PGSR, 2DGS, and GOF on the DTU dataset (scan69, scan83, scan97, scan105, scan106, and scan110).
Fig. 14: Qualitative comparisons in surface reconstruction between PGSR, 2DGS, and GOF on the DTU dataset (scan114, scan118, and scan122).
Fig. 15: Qualitative comparisons in surface reconstruction between PGSR, 2DGS, and GOF (rows: input, PGSR, 2DGS, GOF).
Fig. 16: PGSR achieves high-precision geometric reconstruction in various indoor and outdoor scenes from a series of RGB images without requiring any prior knowledge. (a) Rendered RGB; (b) Mesh; (c) Mesh Normal.
