Paper	Model	Input	Parameters / Point count	GPU
DiT-3D	Diffusion Transformers	Voxelized PC
PointFlow	AE flow-based	PointCloud	1.61M
FlowGAN	GAN flow-based	Single Image	N = 2500	A40 45GB
BuilDiff	Diffusion models	Single Image	1024 to 4096	A40 45GB
CCD-3DR	CDPM	Single Image	8192	3090Ti 24GB
SG-GAN	SG-GAN	Single Image
HaP	Diffusion+SMPL+DepthEstimation	Single Image	10000	4x3090Ti

Review

Explainability of Vision Transformers: A Comprehensive Review and New Perspectives (a review of Transformer use in vision)

Generative approach (Img2PC)

Network Framework

GAN (generative adversarial networks): Wasserstein GAN (WGAN) solves the fundamental problem | 莫烦Python
VAE (variational auto-encoders)
Auto-regressive models
Normalizing flows (flow-based models), PointFlow
- Equivalent to a chain of several generators, each of which is invertible
- Flow-based Generative Model - YouTube
- Flow-based Generative Model notes - Zhihu (zhihu.com)
- Normalizing Flow summary - Zhihu (zhihu.com)
DiT (Diffusion Transformers), DiT-3D

Flow-based:
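
To make the "several invertible generators" idea concrete, here is a minimal sketch of a discrete normalizing flow in one dimension. It is only an illustration: the `AffineFlow` class and the helper names are hypothetical, the base distribution is assumed to be a standard normal, and PointFlow itself uses continuous normalizing flows rather than stacked affine layers.

```python
import math


class AffineFlow:
    """One invertible layer (hypothetical name): x = scale * z + shift."""

    def __init__(self, scale, shift):
        if scale == 0:
            raise ValueError("scale must be non-zero for the layer to be invertible")
        self.scale = scale
        self.shift = shift

    def forward(self, z):
        # Generator direction: latent -> data.
        return self.scale * z + self.shift

    def inverse(self, x):
        # Exact inverse: data -> latent.
        return (x - self.shift) / self.scale

    def log_abs_det(self):
        # log |d forward / dz|, needed for the change-of-variables formula.
        return math.log(abs(self.scale))


def sample_to_data(layers, z):
    """Push a latent sample through every layer in order."""
    for layer in layers:
        z = layer.forward(z)
    return z


def data_to_latent(layers, x):
    """Undo the layers in reverse order."""
    for layer in reversed(layers):
        x = layer.inverse(x)
    return x


def log_prob(layers, x):
    """log p(x) = log N(z; 0, 1) - sum of log|det| over all layers."""
    z = data_to_latent(layers, x)
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
    return log_base - sum(layer.log_abs_det() for layer in layers)


layers = [AffineFlow(2.0, 1.0), AffineFlow(0.5, -3.0)]
x = sample_to_data(layers, 0.7)
z_back = data_to_latent(layers, x)
```

Because every layer is invertible, the same stack both generates samples and evaluates exact likelihoods, which is what separates flows from GANs and VAEs in the list above.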

Loss

Chamfer distance (CD) between point clouds ↓ (lower is better)
$\begin{aligned}\mathcal{L}_{CD}&=\sum_{y'\in Y'}\min_{y\in Y}\|y'-y\|_2^2+\sum_{y\in Y}\min_{y'\in Y'}\|y-y'\|_2^2\end{aligned}$
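
The Chamfer distance above can be sketched directly from its definition. This is a plain-Python illustration with hypothetical function names and a made-up pair of tiny point sets; real training code would typically use a batched GPU implementation instead.

```python
def squared_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(pred, target):
    """Sum, in both directions, of squared distances to the nearest neighbour."""
    if not pred or not target:
        raise ValueError("both point sets must be non-empty")
    # First term: each predicted point to its closest target point.
    pred_to_target = sum(min(squared_dist(p, t) for t in target) for p in pred)
    # Second term: each target point to its closest predicted point.
    target_to_pred = sum(min(squared_dist(t, p) for p in pred) for t in target)
    return pred_to_target + target_to_pred


# Toy example (values chosen only for illustration).
Y_pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
Y_true = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (3.0, 0.0, 0.0)]
cd = chamfer_distance(Y_pred, Y_true)
```

Note that the two sets do not need the same number of points, since each point is only matched to its nearest neighbour in the other set.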

Earth Mover's distance (EMD) ↓ (lower is better)
$\mathcal{L}_{EMD}=\min_{\phi:Y\rightarrow Y'}\sum_{x\in Y}\|x-\phi(x)\|_{2}$, where $\phi$ is a bijection from $Y$ to $Y'$.
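
For intuition, the EMD definition can be evaluated literally by trying every bijection. The sketch below does exactly that; the function name is hypothetical, and the brute-force search is factorial in the number of points, so it is only usable on toy inputs.

```python
import itertools
import math


def emd_bruteforce(Y, Y_prime):
    """Minimum total Euclidean distance over all bijections Y -> Y_prime."""
    if len(Y) != len(Y_prime):
        raise ValueError("a bijection requires equally sized point sets")
    best = math.inf
    # Each permutation assigns point i of Y to point perm[i] of Y_prime.
    for perm in itertools.permutations(range(len(Y_prime))):
        cost = sum(math.dist(Y[i], Y_prime[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best


# Toy example (values chosen only for illustration).
Y = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
Y_prime = [(1.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
emd = emd_bruteforce(Y, Y_prime)
```

Unlike the Chamfer distance, this requires both sets to have the same size. For realistic point counts, the same optimum can be found with an assignment solver such as `scipy.optimize.linear_sum_assignment`, or with an approximate matching.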

Diffusion Models

DMV3D

DMV3D: Denoising Multi-View Diffusion Using 3D Large Reconstruction Model (justimyhxu.github.io)

Diffusion Model + Triplane NeRF + Multi-view Image input

DiffuStereo

DiffuStereo Project Page (liuyebin.com)

Multi-view input
DoubleField coarse mesh estimation + diffusion to generate high-quality disparity flow and depth + ICP registration (point cloud fusion)

Human as Points (HaP)

yztang4/HaP (github.com)
Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images (arxiv.org)
Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images.pdf (readpaper.com)

Depth estimation and SMPL estimation yield two sparse point clouds, which are fed into a diffusion model for refined generation

HumanNorm

BuilDiff

Release expected 2023.11
BuilDiff paper reading notes
weiyao1996/BuilDiff: BuilDiff: 3D Building Shape Generation using Single-Image Conditional Point Cloud Diffusion Models (github.com)
BuilDiff: 3D Building Shape Generation using Single-Image Conditional Point Cloud Diffusion Models (readpaper.com)

RODIN

RODIN Diffusion (microsoft.com)
Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion (readpaper.com)

Large Microsoft dataset + diffusion + NeRF tri-plane

Single-Image 3D Human Digitization with Shape-Guided Diffusion

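Uses a high-capacity 2D diffusion model pretrained for general image synthesis as an appearance prior for clothed humans
Inpaints missing regions through shape-guided diffusion conditioned on silhouette and surface normals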

CCD-3DR

CCD-3DR: Consistent Conditioning in Diffusion for Single-Image 3D Reconstruction (readpaper.com)

$PC^2$

PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction (lukemelas.github.io)
$PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction (readpaper.com)

Camera pose???
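
On the camera-pose question: conditioning by projection presumably needs a mapping from 3D points to image pixels, and that mapping depends on the camera pose. The sketch below shows a generic pinhole projection, not the PC2 code; the function name, the single focal length, and the world-to-camera convention `X_cam = R X_world + t` are all assumptions made for illustration.

```python
def project_points(points, rotation, translation, focal, cx, cy):
    """Project world-space points to pixel coordinates with a pinhole camera.

    rotation is a 3x3 nested list and translation a length-3 sequence; together
    they form the (assumed) world-to-camera pose. Points at or behind the image
    plane (z <= 0 in camera space) map to None.
    """
    pixels = []
    for p in points:
        # World -> camera coordinates: X_cam = R @ X_world + t.
        xc, yc, zc = (
            sum(rotation[r][k] * p[k] for k in range(3)) + translation[r]
            for r in range(3)
        )
        if zc <= 0:
            pixels.append(None)
            continue
        # Perspective divide, then shift by the principal point.
        pixels.append((focal * xc / zc + cx, focal * yc / zc + cy))
    return pixels


# Toy example: identity rotation, camera 2 units away along z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, -3.0)]
uv = project_points(pts, identity, (0.0, 0.0, 2.0), focal=100.0, cx=64.0, cy=64.0)
```

With pixel coordinates like these, each point can look up the image feature it lands on, so an error in the pose shifts every lookup. That is why the pose is worth checking when reading the paper.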

DiT-3D

DiT-3D paper reading notes
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation (readpaper.com)

Make-It-3D

Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior

DreamGaussian

Gaussian Splatting + Diffusion

Wonder3D

Wonder3D: Single Image to 3D using Cross-Domain Diffusion (xxlong.site)

Diffusion for multi-view-consistent image generation + geometry fusion (novel geometric-aware optimization scheme)

CLIP

openai/CLIP: CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image (github.com)

Contrastive language-image pre-training model

GenNeRF

Generative Neural Fields by Mixtures of Neural Implicit Functions (arxiv.org)

LDM3D-VR

LDM3D-VR: Latent Diffusion Model for 3D VR (arxiv.org)
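Video demo: T.LY URL Shortener

Generates image and depth map data from a given text prompt. The authors also developed an application, DepthFusion, which uses the generated RGB images and depth maps with TouchDesigner to create an immersive, interactive 360° viewing experience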

Control3D

Control3D: Towards Controllable Text-to-3D Generation

3D generation conditioned on sketch + text

Etc

Colored PC: 3D Colored Shape Reconstruction from a Single RGB Image through Diffusion (readpaper.com)

GAN

3DHumanGAN

3DHumanGAN: 3D-Aware Human Image Generation with 3D Pose Mapping

Multi-view-consistent human image generation

SE-MD

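SE-MD: A Single-encoder multiple-decoder deep network for point cloud generation from 2D images (readpaper.com)
Single encoder -> multiple decoders
Each decoder generates certain fixed viewpoints; all viewpoints are then fused to produce a dense point cloud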

NFs (Normalizing Flows)

PointFlow

stevenygd/PointFlow: PointFlow : 3D Point Cloud Generation with Continuous Normalizing Flows (github.com)
PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows (readpaper.com)

Other

Automatic Reverse Engineering

Automatic Reverse Engineering: Creating computer-aided design (CAD) models from multi-view images (readpaper.com)

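Generates CAD command sequences from multi-view images
Limitations:

The CAD sequence length is still limited to 60 commands, so only relatively simple objects are supported
The representation is limited to planar and cylindrical surfaces, whereas many real-world objects may call for more flexible triangle-mesh or spline representations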

SimIPU

zhyever/SimIPU: [AAAI 2021] Official Implementation of “SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations” (github.com)

LiDAR point clouds + images

3DRIMR

3DRIMR: 3D Reconstruction and Imaging via mmWave Radar based on Deep Learning (readpaper.com)
mmWave radar + GAN

TransHuman

TransHuman paper reading notes
TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering (pansanity666.github.io)

Implicit Function (NeRF)

Nvdiffrec

Mesh optimization is harder and slower than MLP optimization (from NeRF WeChat)

There is also nvdiffrecmc, which may give better results: Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising
A later follow-up, NeuManifold: Neural Watertight Manifold Reconstruction with Efficient and High-Quality Rendering Support, should be better than nvdiffrec

NVlabs/nvdiffrec: Official code for the CVPR 2022 (oral) paper “Extracting Triangular 3D Models, Materials, and Lighting From Images”. (github.com)