
Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li1,2∗, Yufei Wang1, Heliang Zheng2,‡, Yihao Luo2,3,†, Bihan Wen1,†
1 Department of EEE, Nanyang Technological University, Singapore; 2 Math Magic; 3 Imperial-X, Imperial College London, UK
[email protected], [email protected], [email protected], [email protected], [email protected]

arXiv:2505.14521v3 [cs.CV] 12 Jun 2025

https://lizhihao6.github.io/Sparc3D

Figure 1: Sparc3D Reconstruction Results. Leveraging our sparse deformable marching cubes
(Sparcubes) representation and sparse convolutional VAE (Sparconv-VAE), our method achieves
state-of-the-art reconstruction quality on challenging 3D inputs. It robustly handles open surfaces
(automatically closed into watertight meshes), recovers hidden interior structures, and faithfully
reconstructs highly complex geometries (see zoom-in views, top to bottom). All outputs are fully
watertight and 3D-printable, demonstrating the potential of our framework for high-resolution 3D
mesh generation. Best viewed with zoom-in.

Abstract

High-fidelity 3D object synthesis remains significantly more challenging than 2D image generation due to the unstructured nature of mesh data and the cubic complexity of dense volumetric grids. Existing two-stage pipelines—compressing meshes with a VAE (using either 2D or 3D supervision), followed by latent diffusion sampling—often suffer from severe detail loss caused by inefficient representations and modality mismatches introduced by the VAE. We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation, Sparcubes, with a novel encoder, Sparconv-VAE. Sparcubes converts raw meshes into high-resolution (1024³) surfaces with arbitrary topology by scattering signed distance and deformation fields onto a sparse cube grid, allowing differentiable optimization. Sparconv-VAE is the first modality-consistent variational autoencoder built entirely upon sparse convolutional networks, enabling efficient and near-lossless 3D reconstruction suitable for high-resolution generative modeling through latent diffusion. Sparc3D achieves state-of-the-art reconstruction fidelity on challenging inputs, including open surfaces, disconnected components, and intricate geometry. It preserves fine-grained shape details, reduces training and inference cost, and integrates naturally with latent diffusion models for scalable, high-resolution 3D generation.

∗ This work was conducted during Zhihao Li's research internship at Math Magic.
† Corresponding authors; ‡ project lead.

Preprint. Under review.

1 Introduction

Recent breakthroughs in 3D object generation [2, 15, 23, 28, 29, 32, 39] have enabled applications in virtual domains such as AR/VR [3, 14, 16, 19] and robotics simulation [34, 35], as well as in physical contexts like 3D printing [29]. Despite this progress, synthesizing high-fidelity 3D assets remains far more challenging than generating 2D imagery or text, owing to the inherently unstructured nature of 3D data and the cubic scaling of dense volumetric representations.
Drawing on the success of text-to-image diffusion models [24], many 3D generation pipelines [2, 15, 28, 32, 39] employ a two-stage process: a variational autoencoder (VAE) followed by latent diffusion. Most current VAEs employ either 3D supervision [2, 15, 28, 39]—typically via a global vector set representation—or 2D supervision [32]—commonly via a sparse-voxel representation—but both suffer from limited resolution and a modality mismatch between inputs and outputs.
3D-supervised VAEs [2, 15, 28, 39] require watertight meshes for sampling supervision signals, yet most raw meshes are not watertight and must be remeshed. The common pipeline [2, 15, 39], shown in Fig. 2, first samples an Unsigned Distance Function (UDF) on a voxel grid, then approximates a Signed Distance Function (SDF) by subtracting two voxel sizes—halving the effective resolution and introducing errors that propagate through both VAE reconstruction and diffusion generation. After applying Marching Cubes [18] or Dual Marching Cubes [25], the mesh becomes double-layered, necessitating an additional step to retain only the largest connected component. This step inadvertently discards smaller yet crucial features, misaligning the conditioning image rendered from the raw mesh with the reconstructed 3D shape during diffusion training.
Most recently, TRELLIS [32] demonstrated the possibility of training a 3D VAE using only 2D
supervision. While it avoids degradation from mesh conversion, it still depends on dense volumetric
grids for 2D projections, which limit resolution. Moreover, lacking any 3D topological constraints,
the generated object’s interior geometry may be incorrect and its surfaces can remain open—a critical
flaw for applications such as 3D printing.
All of these VAEs also contend with a fundamental modality gap: 3D-supervised methods [2, 15, 28, 39] ingest surface points and normals but decode SDF values, while TRELLIS [32] encodes voxelized DINOv2 [22] features into an SDF. Bridging this gap requires heavy attention mechanisms, which increase model complexity and risk amplifying underlying inconsistencies.
In this work, we introduce Sparcubes (Sparse Deformable Marching Cubes), a fast, near-lossless
pipeline for converting raw meshes into watertight surfaces. Our method begins by identifying a
sparse set of activated voxels from the input mesh and performing a flood-fill to assign coarse signed
labels. We then optimize grid-vertex deformations via gradient descent and refine them using a
view-dependent 2D rendering loss. Sparcubes converts a raw mesh into a watertight 1024³ grid in
under 30 seconds—achieving a threefold speedup over prior watertight conversion methods [2, 15]
without sacrificing fine details or small components.
Building on Sparcubes, we introduce Sparconv-VAE, a modality-consistent VAE composed of a
sparse encoder and a self-pruning decoder. By eliminating the modality gap, Sparconv-VAE can
employ a lightweight architecture without relying on heavy global attention mechanisms. Experi-
mental results show that our VAE achieves state-of-the-art reconstruction performance and minimal
training cost. Furthermore, it seamlessly integrates with existing latent diffusion pipelines such as
TRELLIS [32], further enhancing the resolution of the generated 3D objects.

Figure 2 panels, left to right: raw mesh → UDF → SDF → double-layer mesh → reconstructed mesh (sample UDF, convert UDF to SDF, apply Marching Cubes, keep the largest watertight component; error regions and the missing component are highlighted).

Figure 2: Problems of the previous SDF extraction pipeline. The widely used SDF extraction workflow [2, 15, 39] suffers from two critical failures: resolution degradation (shown as error) and missing geometry (circled on the right). Converting UDF to SDF by subtracting two voxel sizes effectively halves the spatial resolution. Moreover, the SDF extraction yields a double-layer mesh, from which only the largest connected component is retained, inadvertently discarding smaller but important components. Together, these two deficiencies substantially limit the upper-bound performance of downstream VAEs and generation models. Best viewed with zoom-in.

Our main contributions can be summarized as:

• We propose Sparcubes, a fast, near-lossless remeshing algorithm that converts raw meshes
into watertight surfaces at 1024³ resolution in approximately 30 s, achieving a 3× speedup
over prior methods without sacrificing any components.
• We introduce Sparconv-VAE, a modality-consistent variational autoencoder that employs a
sparse convolutional encoder and a self-pruning decoder. By eliminating the input–output
modality gap, our architecture achieves high computational efficiency and near-lossless
reconstruction, without global attention.
• Experimental results show that our Sparc3D framework, comprising Sparcubes and
Sparconv-VAE, achieves state-of-the-art reconstruction fidelity, reduces training cost, and
seamlessly integrates with current latent diffusion frameworks to enhance the resolution of
generated 3D objects.

2 Related Work

2.1 3D Shape Representation and Generation

Mesh and point cloud. Triangle meshes and point clouds are the most common representations of
3D data. Triangle meshes, composed of vertices and triangular faces, offer precise modeling of surface
detail and arbitrary topology. However, their irregular graph structure complicates learning, requiring
neural networks to handle non-uniform neighborhoods, varying vertex counts, and the absence of a
canonical ordering. To address this, recent methods [4, 5, 11, 27, 30] adopt autoregressive models to
jointly generate geometry and connectivity, though these suffer from limited context length and slow
sampling. In contrast, point clouds represent surfaces as unordered sets of 3D points, making them
easy to sample from distributions [20, 31, 33], but difficult to convert directly into watertight surfaces
due to the lack of explicit connectivity [9, 36].
Isosurface. An isosurface is a continuous surface that represents a mesh boundary via a signed distance field (SDF). Most methods [13, 18, 25, 26] subdivide space into voxels, polygonize each cell, then stitch the cells into a mesh. Marching Cubes uses a fixed lookup table but can suffer from topological ambiguities [18]. Dual Marching Cubes (DMC) fixes this by placing vertices on edges where the isosurface crosses and linking them via Dual Contouring, yielding watertight meshes [13, 25]. Both rely on a uniform cube size, limiting detail; FlexiCubes [26] deforms the grid and applies isosurface slope weights to adapt voxel sizes to local geometry, improving accuracy.
Many 3D generation methods adopt SDF-based supervision [2, 15, 23, 28, 32, 38, 39]. Techniques
relying solely on 2D supervision [32] may implicitly learn an SDF, but often produce open surfaces
or incorrect interiors due to the lack of volumetric constraints. In contrast, fully 3D-supervised
approaches [2, 15, 23, 28, 38, 39] extract explicit SDFs from meshes with arbitrary topology, mak-
ing accurate and adaptive SDF extraction a critical but challenging component for high-fidelity
reconstruction.

2.2 3D Shape VAEs

VecSet-based VAEs. VecSet-based methods represent 3D shapes as sets of global latent vectors
constructed from local surface features. 3DShape2VecSet [37] embeds sampled points and normals
into a VecSet using a transformer and supervises decoding via surrounding SDF values. CLAY [38]
scales the architecture to larger datasets and model sizes, while TripoSG [17] enhances expressiveness
through mixture-of-experts (MoE) modules. Dora [2] and Hunyuan2 [39] improve sampling by
prioritizing high-curvature regions. Despite these advances, all approaches face a modality mismatch:
local point features are compressed into global latent vectors and decoded back to local fields, forcing
the VAE to perform both feature abstraction and modality conversion, which increases reliance on
attention and model complexity.

Sparse voxel–based VAEs. In contrast, sparse voxel-based VAEs preserve spatial structure by
converting meshes into sparse voxel grids with feature vectors. XCube [23] replaces the global VecSet
in 3DShape2VecSet [37] with voxel-aligned SDF and normal features, improving detail preservation.
TRELLIS [32] enriches this representation with aggregated DINOv2 [22] features, enabling joint
modeling of 3D geometry and texture. TripoSF [12] further scales the framework to high-resolution
reconstructions (see Supplementary Material for details). Nonetheless, these methods still face the
challenge of modality conversion—mapping point normals or DINOv2 descriptors to continuous
SDF fields remains a key bottleneck.

3 Method

3.1 Preliminaries

Distance fields. A distance field is a scalar function Φ : R³ → R that measures the distance to a
surface. The unsigned distance function (UDF) encodes only magnitude, while the signed distance
function (SDF) adds sign to distinguish interior and exterior:
UDF(x, M) = min_{y ∈ M} ∥x − y∥₂,   SDF(x, M) = sign(x, M) · UDF(x, M),   (1)

where sign(x, M) ∈ {−1, +1} indicates inside/outside status. For non-watertight or non-manifold
meshes, computing sign is non-trivial [1].
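To make the sign convention concrete, below is a minimal sketch of Eq. (1) in Python. It assumes the surface is approximated by a dense point sample; the names `surface_pts` and `inside_fn` are illustrative, and the inside/outside test is deliberately passed in as a callable because, as noted above, computing it robustly for raw meshes is the hard part.

```python
# Minimal sketch of Eq. (1), assuming the mesh surface is approximated by a
# point sample `surface_pts` of shape (N, 3); names are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def udf(query, surface_pts):
    """Unsigned distance from each query point to the sampled surface."""
    tree = cKDTree(surface_pts)
    dist, _ = tree.query(query)          # nearest-neighbour distance, shape (M,)
    return dist

def sdf(query, surface_pts, inside_fn):
    """Signed distance: negative inside, positive outside."""
    sign = np.where(inside_fn(query), -1.0, 1.0)   # inside_fn supplies the hard part
    return sign * udf(query, surface_pts)
```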
Marching cubes and sparse variants. The marching cubes algorithm [18] extracts an isosurface
mesh from a volumetric field Φ by interpolating surface positions across a voxel grid:
S = {x ∈ R³ | Φ(x) = 0}.   (2)
Sparse variants [10] operate on narrow bands |Φ(x)| < ϵ to reduce memory usage. We further
introduce a sparse cube grid (V, C), where V is a set of sampled vertices and C contains 8-vertex
cubes. Deformable and weighted dual variants, exemplified by FlexiCubes [26], extend this process
by modeling the surface as a deformed version of the sparse cube grid. Specifically, each grid node ni
in the initial grid is displaced to a new position ni + ∆ni , forming a refined grid (N + ∆N, C, Φ, W )
that better conforms to the implicit surface, where the displacements ∆ni and per-node weights
wi ∈ W are learnable during optimization.
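For illustration, a minimal container for such a sparse deformable cube grid might look as follows; the field names are assumptions made for exposition, not the data structure of our implementation.

```python
# Illustrative container for a sparse deformable cube grid (N + ΔN, C, Φ, W).
from dataclasses import dataclass
import numpy as np

@dataclass
class SparseCubeGrid:
    vertices: np.ndarray   # (V, 3) grid-node positions N
    cubes: np.ndarray      # (C, 8) indices into `vertices`, one row per active cube
    phi: np.ndarray        # (V,) signed distance value at each node
    deform: np.ndarray     # (V, 3) learnable displacement ΔN per node
    weights: np.ndarray    # (V,) per-node weights W used by weighted dual variants

    def deformed_vertices(self) -> np.ndarray:
        # Refined grid N + ΔN that better conforms to the implicit surface.
        return self.vertices + self.deform
```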

3.2 Sparcubes

Our method, Sparcubes (Sparse Deformable Marching Cubes), reconstructs watertight and geometri-
cally accurate surfaces from arbitrary input meshes through sparse volumetric sampling, coarse-to-fine
sign estimation, and deformable refinement. Unlike dense voxel methods, Sparcubes represents
geometry using a sparse set of voxel cubes, where each cube vertex carries a signed distance value.
This representation enables efficient computation, memory scalability, and supports downstream
surface extraction or direct use in learning-based pipelines.
As shown in Fig. 3, the core pipeline consists of the following steps:
Figure 3: Illustration of our Sparcubes reconstruction pipeline for converting a raw mesh into a watertight mesh. Panels, left to right: Step 1, active voxel and UDF; Step 2, flood fill and SDF; Step 3, optimize deformation; Step 4, rendering refinement.

Step 1: Active voxel extraction and UDF computation. We begin by identifying a sparse set of active voxels within a narrow band around the input surface. These are voxels whose corner vertices lie within a threshold distance ϵ from the mesh M. For each corner vertex x ∈ R³, we compute the
unsigned distance to the surface:
UDF(x) = min_{y ∈ M} ∥x − y∥₂.   (3)
This yields a sparse volumetric grid Φ, with distance values concentrated near the surface geometry,
suitable for efficient storage and processing.
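A minimal sketch of this step is given below, assuming the input surface is provided as a dense point sample inside the unit cube; the function and parameter names (`active_voxels_and_udf`, `res`, `eps`) are illustrative rather than taken from our CUDA implementation.

```python
# Sketch of Step 1: collect narrow-band voxels and evaluate the UDF on their corners.
import numpy as np
from scipy.spatial import cKDTree

def active_voxels_and_udf(surface_pts, res=1024, eps=2.0):
    voxel = 1.0 / res
    tree = cKDTree(surface_pts)

    # Voxels touched by at least one surface point form the (coarse) narrow band.
    # A fuller version would also dilate this set to neighbouring voxels within eps.
    occupied = np.unique((surface_pts // voxel).astype(np.int64), axis=0)

    # Eight corner offsets of a voxel, in units of the grid spacing.
    offsets = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])
    corners = (occupied[:, None, :] + offsets[None, :, :]) * voxel      # (C, 8, 3)

    dist, _ = tree.query(corners.reshape(-1, 3))                        # UDF at corners
    dist = dist.reshape(-1, 8)

    # Keep voxels with at least one corner within eps voxels of the surface.
    keep = (dist < eps * voxel).any(axis=1)
    return occupied[keep], dist[keep]
```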
Step 2: Flood fill for coarse sign labeling. To convert the unsigned field into a signed distance
function (SDF), we apply a volumetric flood fill algorithm [21] starting from known exterior regions
(e.g., corners of the bounding box). This produces a binary occupancy label T(x) ∈ {0, 1}, indicating whether point x is inside or outside the shape. We then construct the coarse signed distance field as:
SDF(x) = (1 − 2T(x)) · UDF(x),   (4)
which gives a consistent sign assignment under simple labeling and forms the basis for further
refinement.
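The sketch below illustrates this step on a dense occupancy grid for readability (our implementation operates on the sparse band); it flood-fills exterior space from a bounding-box corner and then applies Eq. (4).

```python
# Sketch of Step 2: BFS flood fill from a known-exterior corner, then coarse SDF.
import numpy as np
from collections import deque

def flood_fill_sign(surface_mask):
    """surface_mask: (R, R, R) bool, True where a voxel touches the surface.
    Returns T with 1 for interior (or surface) voxels and 0 for exterior ones."""
    R = surface_mask.shape[0]
    exterior = np.zeros_like(surface_mask, dtype=bool)
    q = deque([(0, 0, 0)])                  # bounding-box corner is assumed outside
    exterior[0, 0, 0] = True
    while q:
        x, y, z = q.popleft()
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            nx, ny, nz = x + dx, y + dy, z + dz
            if 0 <= nx < R and 0 <= ny < R and 0 <= nz < R \
                    and not exterior[nx, ny, nz] and not surface_mask[nx, ny, nz]:
                exterior[nx, ny, nz] = True
                q.append((nx, ny, nz))
    return (~exterior).astype(np.float32)

# Coarse SDF per Eq. (4):  sdf = (1 - 2 * T) * udf
```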
Step 3: Gradient-based deformation optimization. Instead of explicitly refining a globally
accurate SDF, we directly optimize the geometry of the sparse cube structure to better conform to
the underlying surface. Given an initial volumetric representation (V, C, Φv )—where V denotes the
set of sparse cube corner vertices, C is the set of active cubes, and Φ is the signed distance field
defined at each vertex—we perform a geometric deformation to obtain (V + ∆V, C, Φv ). This results
in a geometry-aware sparse SDF volume that more accurately approximates the zero level set of
the implicit surface. Notably, for points where Φ(x) > 0, the SDF values are often only coarse
approximations, particularly in regions far from the observed surface or near topological ambiguities.
These regions may exhibit significant errors due to poor connectivity, occlusions, or non-watertight
input geometry. As such, rather than refining Φ globally, we optimize the vertex positions ∆V
to implicitly correct the spatial alignment of the zero level set. To improve the accuracy of sign
estimation and geometric alignment, we displace each vertex slightly along the unsigned distance
field gradient:
x′ = x − η · ∇UDF(x), δ(x) ≈ δ(x′ ). (5)
This heuristic captures local curvature and topological cues that are not easily recovered through
purely topological methods such as flood fill. It also allows us to estimate sign information in regions
with ambiguous connectivity, such as thin shells or open surfaces. The final data structure is a sparse
cube grid with SDF values on each corner (V, C, Φv , ∆V ), denoted as Sparcubes.
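As a sketch of Eq. (5), the displacement can be approximated by estimating ∇UDF with central finite differences over nearest-neighbor queries; the step sizes η and h below are illustrative choices.

```python
# Sketch of the vertex displacement in Eq. (5): x' = x - eta * grad(UDF)(x).
import numpy as np
from scipy.spatial import cKDTree

def displace_vertices(verts, surface_pts, eta=5e-4, h=1e-3):
    tree = cKDTree(surface_pts)

    def udf(q):
        return tree.query(q)[0]

    grad = np.zeros_like(verts)
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = h
        grad[:, axis] = (udf(verts + step) - udf(verts - step)) / (2.0 * h)

    # Move each vertex against the UDF gradient, i.e. toward the surface, so that
    # signs estimated at x' better reflect the underlying geometry.
    return verts - eta * grad
```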

Step 4: Rendering-based refinement. Sparcubes supports differentiable mesh extraction, enabling further end-to-end refinement with perceptual signals. When multi-view images, silhouettes, or depth maps are available, we optionally introduce a differentiable rendering loss to further enhance visual realism and geometric alignment. Given a reconstructed mesh M_r extracted from the deformed Sparcubes, we compute a multi-term rendering loss:
L_render = ∥R^D(M_r) − I^D_obs∥²₂ + ∥R^N(M_r) − I^N_obs∥²₂,   (6)
where R^D(M_r) denotes the rendered depth image of the mesh under known camera parameters, and R^N(M_r) is the corresponding rendered normal map. The terms I^D_obs and I^N_obs are the observed depth image and the ground-truth normal map derived from the input or canonical mesh. Leveraging our voxel-based data structure, we can easily identify visible voxels and render exclusively within those regions, greatly reducing computational cost.
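A minimal sketch of the loss in Eq. (6) is shown below; `render_depth` and `render_normal` stand in for a differentiable rasterizer and are assumptions here, not part of our released code.

```python
# Sketch of Eq. (6): averaged depth and normal rendering losses over several views.
import torch

def rendering_loss(mesh, cameras, depth_obs, normal_obs, render_depth, render_normal):
    losses = []
    for cam, d_gt, n_gt in zip(cameras, depth_obs, normal_obs):
        d_pred = render_depth(mesh, cam)     # rendered depth under known camera parameters
        n_pred = render_normal(mesh, cam)    # rendered normal map for the same view
        losses.append(((d_pred - d_gt) ** 2).mean() + ((n_pred - n_gt) ** 2).mean())
    return torch.stack(losses).mean()
```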

3.3 Sparconv-VAE

Building on our Sparcubes representation, we develop Sparconv-VAE, a sparse convolution-based variational autoencoder without costly global attention, which directly compresses the Sparcubes parameters {ϕ ∈ Φv, δ ∈ ∆V} into a sparse latent feature z and decodes it back to the same format without any modality conversion.
Architecture and Loss Function. Our encoder is a cascade of sparse residual convolutional blocks that progressively downsample the input features. At the coarsest resolution, a lightweight local attention module aggregates neighborhood information. The decoder mirrors this process, interleaving sparse residual convolutions with self-pruning upsample blocks to restore the original resolution and predict the Sparcubes parameters {ϕ̂, δ̂}. Each self-pruning block first predicts the subdivided voxel occupancy mask o, supervised by Locc = BCE(ô, o), then applies a learned upsampling to refine the voxel features. Because ϕ is sign-sensitive (inside vs. outside), we split its prediction into a sign branch and a magnitude branch. The sign branch predicts sign(ϕ̂) under Lϕsign = BCE(sign(ϕ̂), sign(ϕ)), while the magnitude branch regresses ϕ̂ with Lϕmag = ∥ϕ̂ − ϕ∥₂. The deformation vectors are optimized via Lδ = ∥δ̂ − δ∥₂. Finally, we regularize the latent distribution using the VAE's Kullback–Leibler divergence LKL = KL(q(z|δ, ϕ) ∥ p(z)), yielding a single cohesive training objective that jointly minimizes the occupancy, sign, magnitude, deformation, and KL divergence losses:
L = λocc Locc + λsign Lϕsign + λmag Lϕmag + λδ Lδ + λKL LKL.   (7)

A detailed description of the module design and the choice of λ are in the Supplementary Material.
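For concreteness, the composite objective of Eq. (7) can be sketched in PyTorch as below; the λ values are placeholders (the actual choices are deferred to the Supplementary Material), and the exact split between the sign and magnitude branches is an illustrative assumption.

```python
# Sketch of the Sparconv-VAE training objective in Eq. (7); weights are placeholders.
import torch
import torch.nn.functional as F

def sparconv_vae_loss(occ_logits, occ_gt,
                      sign_logits, mag_pred, phi_gt,
                      delta_pred, delta_gt,
                      mu, logvar, lam=None):
    lam = lam or {"occ": 1.0, "sign": 1.0, "mag": 1.0, "delta": 1.0, "kl": 1e-3}

    l_occ = F.binary_cross_entropy_with_logits(occ_logits, occ_gt)        # L_occ
    l_sign = F.binary_cross_entropy_with_logits(sign_logits,
                                                (phi_gt < 0).float())     # sign branch
    l_mag = F.mse_loss(mag_pred, phi_gt.abs())                            # magnitude branch
    l_delta = F.mse_loss(delta_pred, delta_gt)                            # deformation vectors
    l_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())       # KL(q(z|δ,φ) || N(0, I))

    return (lam["occ"] * l_occ + lam["sign"] * l_sign + lam["mag"] * l_mag
            + lam["delta"] * l_delta + lam["kl"] * l_kl)
```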
Hole filling. Although the predicted occupancy may be imperfect and introduce small holes, our inherently watertight Sparcubes representation allows for straightforward hole detection and filling. We first identify boundary half-edges. For each face f = {v0, v1, v2}, we emit the directed edges (v0 → v1), (v1 → v2), and (v2 → v0). By sorting each pair of vertices into undirected edges and counting occurrences, edges whose undirected counterpart appears only once are marked as boundary edges. We build an outgoing-edge map keyed by source vertex, and then recover closed boundary loops by walking each edge until returning to its start. To triangulate each boundary loop C = {v_i}_{i=1}^n, we follow a classic ear-filling pipeline: compute a geometric score at every vertex, fill the "best" ear, and repeat until all small open boundaries vanish. Specifically, the score A_i at each candidate vertex is the ear angle
A_i = atan2(∥d_{i−1→i} × d_{i→i+1}∥₂, −d_{i−1→i} · d_{i→i+1}),   (8)
where d_{i→j} denotes the edge vector from v_i to v_j. In each iteration, we select the vertex with the smallest A_i (i.e., the sharpest convex ear), form the triangle (v_{i−1}, v_i, v_{i+1}), and update the boundary. Merging all new triangles with the original face set closes every small hole.
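A minimal sketch of this procedure, assuming triangle faces are given as an (F, 3) index array: it extracts boundary loops from half-edge counts and closes each loop by repeatedly clipping the ear with the smallest angle from Eq. (8). Closing all holes amounts to calling `fill_loop` on every loop returned by `boundary_loops` and appending the new faces.

```python
# Sketch of hole filling: boundary-loop extraction plus ear clipping per Eq. (8).
import numpy as np
from collections import Counter
from math import atan2

def boundary_loops(faces):
    """faces: (F, 3) int array. Returns a list of vertex-index loops."""
    edges = [(f[i], f[(i + 1) % 3]) for f in faces for i in range(3)]
    count = Counter(tuple(sorted(e)) for e in edges)
    nxt = {a: b for a, b in edges if count[tuple(sorted((a, b)))] == 1}
    loops, visited = [], set()
    for start in list(nxt):
        if start in visited:
            continue
        loop, v = [], start
        while v not in visited:       # walk outgoing boundary edges until closed
            visited.add(v)
            loop.append(v)
            v = nxt[v]
        loops.append(loop)
    return loops

def fill_loop(verts, loop):
    """Triangulate one boundary loop by ear filling; returns new (T, 3) faces."""
    loop = list(loop)
    new_faces = []
    while len(loop) > 3:
        angles = []
        for i in range(len(loop)):
            d0 = verts[loop[i]] - verts[loop[i - 1]]                 # d_{i-1 -> i}
            d1 = verts[loop[(i + 1) % len(loop)]] - verts[loop[i]]   # d_{i -> i+1}
            angles.append(atan2(np.linalg.norm(np.cross(d0, d1)), -np.dot(d0, d1)))
        i = int(np.argmin(angles))                                   # sharpest convex ear
        new_faces.append((loop[i - 1], loop[i], loop[(i + 1) % len(loop)]))
        loop.pop(i)
    new_faces.append(tuple(loop))
    return np.array(new_faces)
```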

4 Experiments
4.1 Experiment Settings

Implementation details. We implement all Sparcubes operations as custom CUDA kernels. Following TRELLIS [32], we train both the Sparconv-VAE and its latent flow model on 500 K high-quality assets from Objaverse [8] and Objaverse-XL [7]. The VAE is trained on 32 A100 GPUs (batch size 32) with AdamW (initial LR 1 × 10⁻⁴) for two days. We then fine-tune the TRELLIS latent flow model on our VAE latents using 64 A100 GPUs (batch size 64) for ten days. At inference, we sample with a classifier-free guidance scale of 3.5 over 25 steps, matching the TRELLIS settings.
Dataset. Following Dora [2], we curated a VAE test set by selecting the most challenging examples
from the ABO [6] and Objaverse [8] datasets—specifically those exhibiting occluded components,
intricate geometric details, and open surfaces. To avoid any overlap with training data used in prior
work, we additionally assembled a “Wild” dataset with multiple components from online sources that
is disjoint from both ABO and Objaverse. For generation, we also benchmarked our method against
TRELLIS [32] using the wild dataset.
Compared methods. We compare our Sparconv-VAE with previous state-of-the-art VAEs, including TRELLIS [32], Craftsman [15], Dora [2], and XCube [23]. Because our diffusion architecture and

Table 1: Quantitative comparison of watertight remeshing across the ABO [6], Objaverse [8], and In-the-Wild datasets. Chamfer Distance (CD, ×10⁴), Absolute Normal Consistency (ANC, ×10²) and F1 score (F1, ×10²) are reported.

Method             ABO [6]                 Objaverse [8]           Wild
                   CD↓    ANC↑   F1↑       CD↓    ANC↑   F1↑       CD↓    ANC↑   F1↑
Dora-wt-512 [2]    1.16   76.94  83.18     4.25   75.77  61.35     67.2   78.51  64.99
Dora-wt-1024 [2]   1.07   76.94  84.56     4.35   75.04  63.84     63.7   78.77  65.90
Ours-wt-512        1.01   77.75  85.21     3.09   75.35  64.81     0.47   88.58  96.95
Ours-wt-1024       1.00   77.66  85.39     3.01   74.98  65.65     0.46   88.55  97.06

Figure 4: Qualitative comparison of watertight remeshing pipelines (columns, left to right: raw mesh, Ours-wt-512, Dora-wt-512, Ours-wt-1024, Dora-wt-1024). We evaluate our Sparcubes remeshing pipeline against the widely used previous pipeline [2, 15, 39], i.e., Dora-wt [2], at voxel resolutions of 512 and 1024. Compared with the previous method, our Sparcubes preserves crucial components (e.g., the car wheel) and recovers finer geometric details (e.g., the shelving frame). Our wt-512 result even outperforms the wt-1024 result remeshed by Dora-wt [2]. Best viewed with zoom-in.

model size match those of TRELLIS [32], we evaluate our generation results against it to ensure a
fair comparison.

4.2 Comparison Results

Watertight remeshing results. We evaluate our watertight remeshing (which serves as the VAE ground truth) on the ABO [6], Objaverse [8], and Wild datasets, using Chamfer distance (CD), Absolute
Normal Consistency (ANC), and F1 score (F1) as metrics. As Table 1 shows, our Sparcubes
consistently outperforms prior pipelines [2, 15, 39] (reported under “Dora-wt” [2] for brevity) across
all datasets and metrics. Remarkably, our wt-512 remeshed outputs even exceed the quality of the
wt-1024 results produced by previous methods. Fig. 4 presents qualitative comparisons: our approach
faithfully preserves critical components (e.g., the car wheel) and recovers fine geometric details (e.g.,
shelving frames).
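For reference, the point-based metrics can be sketched as follows, assuming both meshes are sampled into point clouds; the threshold τ and the squared-distance Chamfer form are illustrative choices rather than the exact evaluation protocol (ANC additionally requires surface normals and is omitted here).

```python
# Sketch of Chamfer distance and F1 between two sampled point clouds.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_and_f1(pred_pts, gt_pts, tau=0.01):
    d_pred = cKDTree(gt_pts).query(pred_pts)[0]   # pred -> gt nearest distances
    d_gt = cKDTree(pred_pts).query(gt_pts)[0]     # gt -> pred nearest distances
    chamfer = (d_pred ** 2).mean() + (d_gt ** 2).mean()
    precision = (d_pred < tau).mean()
    recall = (d_gt < tau).mean()
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return chamfer, f1
```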
VAE reconstruction results. We further assess our Sparconv-VAE reconstruction against TRELLIS [32], Craftsman [15], Dora [2], and XCube [23] in Table 2. Across the majority of datasets and metrics, our Sparconv-VAE outperforms these prior methods. Qualitative results in Fig. 5 illustrate that our VAE faithfully reconstructs complex shapes with fine details (all columns), converts open surfaces into double-layered watertight meshes (columns 1, 4, and 6), and reveals hidden internal structures (column 6).
Generation results. We also validate the effectiveness of our Sparconv-VAE for generation by fine-tuning the TRELLIS [32] pretrained model. Under the same diffusion architecture and model size (see Fig. 6), our approach synthesizes watertight 3D shapes with exceptional fidelity and rich detail—capturing, for example, the sharp ridges of pavilion eaves, the subtle facial features of human figures, and the intricate structural elements of robots.

Table 2: Quantitative comparison of VAE reconstruction across the ABO [6], Objaverse [8], and In-the-Wild datasets. Chamfer Distance (CD, ×10⁴), Absolute Normal Consistency (ANC, ×10²) and F1 score (F1, ×10²) are reported.

Method            ABO [6]                 Objaverse [8]           Wild
                  CD↓    ANC↑   F1↑       CD↓    ANC↑   F1↑       CD↓    ANC↑   F1↑
TRELLIS [32]      1.32   75.48  80.59     4.29   74.34  59.27     0.70   85.60  94.04
Craftsman [15]    1.51   77.46  77.47     2.53   77.37  55.28     0.89   87.81  92.28
Dora [2]          1.45   77.21  78.54     4.85   77.19  54.37     68.2   78.79  62.07
XCube [23]        1.42   65.45  77.57     3.67   61.81  51.65     2.02   62.21  73.74
Ours-512          1.01   78.09  85.33     3.09   75.59  64.92     0.47   88.74  96.97
Ours-1024         1.00   77.69  85.41     3.00   75.10  65.75     0.46   88.70  97.12

Figure 5: Qualitative comparison of VAE reconstructions (rows: GT, Ours-512, Ours-1024, TRELLIS, CraftsMan, Dora, XCube, with surface-error visualizations). Our Sparconv-VAE demonstrates superior performance in reconstructing complex geometries, converting open surfaces into double-layered watertight meshes, and revealing hidden internal structures. Best viewed with zoom-in.

Figure 6: Qualitative comparison of single-image-to-3D generation (columns, left to right: condition image, Ours front, Ours back, TRELLIS front, TRELLIS back). Under the same architecture and model size [32], the generator trained with our Sparconv-VAE yields more detailed reconstructions than TRELLIS [32]. Best viewed with zoom-in.

4.3 Ablation Studies

Conversion cost. Compared to existing remeshing methods [2, 15, 39], our Sparcubes achieves substantial speedups: at 512-voxel resolution, conversion takes only around 15 s—half the time required by [2, 15, 39]—and at 1024-voxel resolution, it completes in around 30 s versus around 90 s for [2, 15, 39]. Moreover, by eliminating modality conversion in our VAE design, we avoid the additional SDF resampling step, which in earlier pipelines adds roughly 20 s at 512 resolution and about 70 s at 1024 resolution [2, 15, 39]. Detailed performance comparisons can be found in the Supplementary Material.
Training cost. Thanks to our modality-consistent design, Sparconv-VAE converges in less than two days—about four times faster than previous methods: the sparse voxel–based TRELLIS [32] and VecSet-based approaches [2, 15] each require roughly seven days to train.
VAE with 2D rendering supervision. We also investigate the effect of incorporating 2D rendering
losses into our VAE by using the mask, depth, and normal rendering objectives. We find that adding
2D rendering supervision yields negligible improvement for our Sparconv-VAE. This observation
concurs with Dora [2], where extra 2D rendering losses were likewise deemed unnecessary for 3D-supervised VAEs. We attribute this to the fact that sufficiently dense 2D renderings encode
essentially the same information as the underlying 3D geometry—each view being a projection of the
same 3D shape.

5 Conclusion

We introduce Sparc3D, a unified framework that tackles two longstanding bottlenecks in 3D gen-
eration pipelines: topology-preserving remeshing and modality-consistent latent encoding. At
its heart, Sparcubes transforms raw, non-watertight meshes into fully watertight surfaces at high
resolution—retaining fine details and small components. Building on this, Sparconv-VAE, a sparse-
convolutional variational autoencoder with a self-pruning decoder, directly compresses and recon-
structs our sparse representation without resorting to heavyweight attention, achieving state-of-the-art
reconstruction fidelity and faster convergence. When coupled with latent diffusion (e.g., TRELLIS),
Sparc3D elevates generation resolution for downstream 3D asset synthesis. Together, these contributions establish a robust, scalable foundation for high-fidelity 3D generation in both virtual (AR/VR,
robotics simulation) and physical (3D printing) domains.
Limitations. While our Sparcubes remeshing algorithm excels at preserving fine geometry and
exterior components, it shares several drawbacks common to prior methods. First, it does not retain
any original texture information. Second, when applied to fully closed meshes with internal structures,
hidden elements will be discarded during the remeshing process.

References
[1] M. Atzmon and Y. Lipman. Sal: Sign agnostic learning of shapes from raw data. In Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition, pages 2565–2574, 2020. 4

[2] R. Chen, J. Zhang, Y. Liang, G. Luo, W. Li, J. Liu, X. Li, X. Long, J. Feng, and P. Tan. Dora: Sampling
and benchmarking for 3d shape variational auto-encoders. arXiv preprint arXiv:2412.17808, 2024. 2, 3, 4,
6, 7, 8, 9

[3] T. Chen, C. Ding, S. Zhang, C. Yu, Y. Zang, Z. Li, S. Peng, and L. Sun. Rapid 3d model generation
with intuitive 3d input. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 12554–12564, 2024. 2

[4] Y. Chen, T. He, D. Huang, W. Ye, S. Chen, J. Tang, X. Chen, Z. Cai, L. Yang, G. Yu, et al. Meshanything:
Artist-created mesh generation with autoregressive transformers. arXiv preprint arXiv:2406.10163, 2024.
3

[5] Y. Chen, Y. Wang, Y. Luo, Z. Wang, Z. Chen, J. Zhu, C. Zhang, and G. Lin. Meshanything v2: Artist-created
mesh generation with adjacent mesh tokenization. arXiv preprint arXiv:2408.02555, 2024. 3

[6] J. Collins, S. Goel, K. Deng, A. Luthra, L. Xu, E. Gundogdu, X. Zhang, T. F. Y. Vicente, T. Dideriksen,
H. Arora, et al. Abo: Dataset and benchmarks for real-world 3d object understanding. In Proceedings of
the IEEE/CVF conference on computer vision and pattern recognition, pages 21126–21136, 2022. 6, 7, 8

[7] M. Deitke, R. Liu, M. Wallingford, H. Ngo, O. Michel, A. Kusupati, A. Fan, C. Laforte, V. Voleti, S. Y.
Gadre, et al. Objaverse-xl: A universe of 10m+ 3d objects. Advances in Neural Information Processing
Systems, 36:35799–35813, 2023. 6

[8] M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. VanderBilt, L. Schmidt, K. Ehsani, A. Kemb-
havi, and A. Farhadi. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition, pages 13142–13153, 2023. 6, 7, 8

[9] E. Gómez-Déniz. Another generalization of the geometric distribution. Test, 19:399–415, 2010. 3

[10] R. Hanocka, A. Hertz, N. Fish, R. Giryes, S. Fleishman, and D. Cohen-Or. Point2mesh: A self-prior for
deformable meshes. In ACM Transactions on Graphics (TOG), volume 39, 2020. 4

[11] Z. Hao, D. W. Romero, T.-Y. Lin, and M.-Y. Liu. Meshtron: High-fidelity, artist-like 3d mesh generation at
scale. arXiv preprint arXiv:2412.09548, 2024. 3

[12] X. He, Z.-X. Zou, C.-H. Chen, Y.-C. Guo, D. Liang, C. Yuan, W. Ouyang, Y.-P. Cao, and Y. Li. Sparseflex:
High-resolution and arbitrary-topology 3d shape modeling. arXiv preprint arXiv:2503.21732, 2025. 4

[13] T. Ju, F. Losasso, S. Schaefer, and J. Warren. Dual contouring of hermite data. In Proceedings of the 29th
annual conference on Computer graphics and interactive techniques, pages 339–346, 2002. 3

[14] W. Li. Synthesizing 3d vr sketch using generative adversarial neural network. In Proceedings of the 2023
7th International Conference on Big Data and Internet of Things, pages 122–128, 2023. 2

[15] W. Li, J. Liu, R. Chen, Y. Liang, X. Chen, P. Tan, and X. Long. Craftsman: High-fidelity mesh generation
with 3d native generation and interactive geometry refiner. arXiv preprint arXiv:2405.14979, 2024. 2, 3, 6,
7, 8, 9

[16] X. Li, Q. Zhang, D. Kang, W. Cheng, Y. Gao, J. Zhang, Z. Liang, J. Liao, Y.-P. Cao, and Y. Shan. Advances
in 3d generation: A survey. arXiv preprint arXiv:2401.17807, 2024. 2

[17] Y. Li, Z.-X. Zou, Z. Liu, D. Wang, Y. Liang, Z. Yu, X. Liu, Y.-C. Guo, D. Liang, W. Ouyang, et al. Triposg:
High-fidelity 3d shape synthesis using large-scale rectified flow models. arXiv preprint arXiv:2502.06608,
2025. 4

[18] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. In
Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH
’87, page 163–169, New York, NY, USA, 1987. Association for Computing Machinery. 2, 3, 4

[19] L. Luo, P. N. Chowdhury, T. Xiang, Y.-Z. Song, and Y. Gryaditskaya. 3d vr sketch guided 3d shape
prototyping and exploration. In Proceedings of the IEEE/CVF International Conference on Computer
Vision, pages 9267–9276, 2023. 2

[20] S. Luo and W. Hu. Diffusion probabilistic models for 3d point cloud generation. In Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition, pages 2837–2845, 2021. 3

[21] S. Mauch. Efficient algorithms for solving static Hamilton-Jacobi equations. PhD thesis, California
Institute of Technology, 2000. 5

[22] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023. 2, 4

[23] X. Ren, J. Huang, X. Zeng, K. Museth, S. Fidler, and F. Williams. Xcube: Large-scale 3d generative
modeling using sparse voxel hierarchies. In Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition, pages 4209–4219, 2024. 2, 3, 4, 6, 7, 8

[24] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with
latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pages 10684–10695, 2022. 2

[25] S. Schaefer and J. Warren. Dual marching cubes: Primal contouring of dual grids. In 12th Pacific
Conference on Computer Graphics and Applications, 2004. PG 2004. Proceedings., pages 70–76. IEEE,
2004. 2, 3

[26] T. Shen, J. Munkberg, J. Hasselgren, K. Yin, Z. Wang, W. Chen, Z. Gojcic, S. Fidler, N. Sharp, and J. Gao.
Flexible isosurface extraction for gradient-based mesh optimization. ACM Transactions on Graphics
(TOG), 42(4):1–16, 2023. 3, 4

[27] Y. Siddiqui, A. Alliegro, A. Artemov, T. Tommasi, D. Sirigatti, V. Rosov, A. Dai, and M. Nießner. Meshgpt:
Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition, pages 19615–19625, 2024. 3

[28] D. Tochilkin, D. Pankratz, Z. Liu, Z. Huang, A. Letts, Y. Li, D. Liang, C. Laforte, V. Jampani, and Y.-P.
Cao. Triposr: Fast 3d object reconstruction from a single image. arXiv preprint arXiv:2403.02151, 2024.
2, 3

[29] T. Wang, B. Zhang, T. Zhang, S. Gu, J. Bao, T. Baltrusaitis, J. Shen, D. Chen, F. Wen, Q. Chen, et al. Rodin:
A generative model for sculpting 3d digital avatars using diffusion. In Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition, pages 4563–4573, 2023. 2

[30] Z. Wang, J. Lorraine, Y. Wang, H. Su, J. Zhu, S. Fidler, and X. Zeng. Llama-mesh: Unifying 3d mesh
generation with language models. arXiv preprint arXiv:2411.09595, 2024. 3

[31] L. Wu, D. Wang, C. Gong, X. Liu, Y. Xiong, R. Ranjan, R. Krishnamoorthi, V. Chandra, and Q. Liu. Fast
point cloud generation with straight flows. In Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition, pages 9445–9454, 2023. 3

[32] J. Xiang, Z. Lv, S. Xu, Y. Deng, R. Wang, B. Zhang, D. Chen, X. Tong, and J. Yang. Structured 3d latents
for scalable and versatile 3d generation. arXiv preprint arXiv:2412.01506, 2024. 2, 3, 4, 6, 7, 8, 9

[33] G. Yang, X. Huang, Z. Hao, M.-Y. Liu, S. Belongie, and B. Hariharan. Pointflow: 3d point cloud generation
with continuous normalizing flows. In Proceedings of the IEEE/CVF international conference on computer
vision, pages 4541–4550, 2019. 3

[34] Y. Yang, B. Jia, P. Zhi, and S. Huang. Physcene: Physically interactable 3d scene synthesis for embodied
ai. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages
16262–16272, 2024. 2

[35] Y. Yang, F.-Y. Sun, L. Weihs, E. VanderBilt, A. Herrasti, W. Han, J. Wu, N. Haber, R. Krishna, L. Liu, et al.
Holodeck: Language guided generation of 3d embodied ai environments. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 16227–16237, 2024. 2

[36] L. Yushi, S. Zhou, Z. Lyu, F. Hong, S. Yang, B. Dai, X. Pan, and C. C. Loy. Gaussiananything: Interactive
point cloud flow matching for 3d generation. In The Thirteenth International Conference on Learning
Representations, 2025. 3

[37] B. Zhang, J. Tang, M. Niessner, and P. Wonka. 3dshape2vecset: A 3d shape representation for neural fields
and generative diffusion models. ACM Transactions On Graphics (TOG), 42(4):1–16, 2023. 4

[38] L. Zhang, Z. Wang, Q. Zhang, Q. Qiu, A. Pang, H. Jiang, W. Yang, L. Xu, and J. Yu. Clay: A controllable
large-scale generative model for creating high-quality 3d assets. ACM Transactions on Graphics (TOG),
43(4):1–20, 2024. 3, 4

[39] Z. Zhao, Z. Lai, Q. Lin, Y. Zhao, H. Liu, S. Yang, Y. Feng, M. Yang, S. Zhang, X. Yang, et al. Hun-
yuan3d 2.0: Scaling diffusion models for high resolution textured 3d assets generation. arXiv preprint
arXiv:2501.12202, 2025. 2, 3, 4, 7, 9

