Model

pi-GAN main structure

1 pi-GAN overview

[Figure: pi-GAN overview]

The mapping network is a simple ReLU MLP that takes the noise vector z as input and outputs frequencies \(\gamma_i\) and phase shifts \(\beta_i\), which modulate each layer of the SIREN.
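Concretely, each FiLMed-SIREN layer applies a sine activation whose frequency and phase are set by the mapping network output:

\[ \phi_i(\mathbf{x}) = \sin\big(\gamma_i \odot (\mathbf{W}_i \mathbf{x} + \mathbf{b}_i) + \beta_i\big) \]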

[Figure]

2 Main functions in the pi-GAN code

[Figure: main functions in the pi-GAN code]

3 Progressive-growing discriminator overview

[Figure: progressive-growing discriminator]

[Figure]

Main networks in pi-GAN

1 siren:

  1. FiLMed-SIREN backbone (8 hidden layers)

(network): ModuleList(
  (0): FiLMLayer((layer): Linear(in_features=3, out_features=256, bias=True))
  (1): FiLMLayer((layer): Linear(in_features=256, out_features=256, bias=True))
  (2): FiLMLayer((layer): Linear(in_features=256, out_features=256, bias=True))
  (3): FiLMLayer((layer): Linear(in_features=256, out_features=256, bias=True))
  (4): FiLMLayer((layer): Linear(in_features=256, out_features=256, bias=True))
  (5): FiLMLayer((layer): Linear(in_features=256, out_features=256, bias=True))
  (6): FiLMLayer((layer): Linear(in_features=256, out_features=256, bias=True))
  (7): FiLMLayer((layer): Linear(in_features=256, out_features=256, bias=True))
)
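For reference, a minimal sketch of what each FiLMLayer computes, consistent with the printed module (a single Linear whose pre-activation is scaled and shifted by the mapping-network output before the sine); the broadcasting details are assumptions:

import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Linear layer followed by a sine activation modulated by (freq, phase_shift)."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.layer = nn.Linear(input_dim, hidden_dim)

    def forward(self, x, freq, phase_shift):
        # x: [batch, num_points, input_dim]; freq, phase_shift: [batch, hidden_dim]
        x = self.layer(x)
        freq = freq.unsqueeze(1).expand_as(x)            # broadcast over sample points
        phase_shift = phase_shift.unsqueeze(1).expand_as(x)
        return torch.sin(freq * x + phase_shift)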

  2. Density and color output layers

(final_layer): density output layer
  Linear(in_features=256, out_features=1, bias=True)

(color_layer_sine): color feature transform, conditioned on the ray direction d
  FiLMLayer((layer): Linear(in_features=259, out_features=256, bias=True))

(color_layer_linear): color output layer
  Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Sigmoid()
  )
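As a hedged sketch of how these heads are wired inside the SIREN forward pass: the concatenation order, and the names color_freq / color_phase (the slice of the mapping output reserved for the color sine layer), are assumptions; 259 = 256 features + 3 ray-direction components.

# x: [batch, num_points, 256] FiLMed-SIREN features; ray_directions: [batch, num_points, 3]
sigma = self.final_layer(x)                                    # density head: 256 -> 1
color_in = torch.cat([ray_directions, x], dim=-1)              # 3 + 256 = 259
h = self.color_layer_sine(color_in, color_freq, color_phase)   # FiLM sine layer: 259 -> 256
rgb = self.color_layer_linear(h)                               # 256 -> 3, then Sigmoid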

  3. Mapping network

(mapping_network): CustomMappingNetwork(
  (network): Sequential(
    (0): Linear(in_features=256, out_features=256, bias=True)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): LeakyReLU(negative_slope=0.2, inplace=True)
    (4): Linear(in_features=256, out_features=256, bias=True)
    (5): LeakyReLU(negative_slope=0.2, inplace=True)
    (6): Linear(in_features=256, out_features=4608, bias=True)
  )
)
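The 4608-dimensional output covers 9 FiLM layers (the 8 backbone layers plus the color sine layer), each needing 256 frequencies and 256 phase shifts: 9 × (256 + 256) = 4608. A minimal sketch of how such an output can be split (the half/half convention is an assumption):

out = mapping_network(z)                        # [batch, 4608]
frequencies  = out[..., : out.shape[-1] // 2]   # [batch, 2304] -> 9 x 256 frequencies
phase_shifts = out[..., out.shape[-1] // 2 :]   # [batch, 2304] -> 9 x 256 phase shifts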

2 Novel View Synthesis Details

  1. Freeze the parameters of the implicit representation and search for suitable \(\gamma_i\) and \(\beta_i\) for each MLP layer, so that the resulting radiance field renders the best match to the target image.
  2. Initialize \(\gamma\) and \(\beta\) to their average over 10,000 random noise-vector inputs, then run gradient descent to minimize the MSE image-reconstruction loss (see the sketch below).
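A minimal sketch of this inversion loop, assuming the pi-GAN-style convention in which the mapping network returns (frequencies, phase_shifts); the optimizer, learning rate, step count, and the siren / render / camera_pose / target_image names are hypothetical:

import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Average gamma/beta over many random noise vectors (10,000, as in the notes above).
with torch.no_grad():
    zs = torch.randn(10000, 256, device=device)
    freqs, phases = siren.mapping_network(zs)
    freq_init, phase_init = freqs.mean(0, keepdim=True), phases.mean(0, keepdim=True)

# 2) Freeze the SIREN weights; optimize only the per-layer frequencies/phase shifts.
freq  = freq_init.clone().requires_grad_(True)
phase = phase_init.clone().requires_grad_(True)
optimizer = torch.optim.Adam([freq, phase], lr=1e-2)    # hyperparameters are assumptions

for _ in range(700):                                    # step count is an assumption
    optimizer.zero_grad()
    rendered = render(siren, freq, phase, camera_pose)  # hypothetical rendering helper
    loss = F.mse_loss(rendered, target_image)           # MSE image-reconstruction loss
    loss.backward()
    optimizer.step()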

EG3D overview

[Figure: EG3D overview]

[Figure]

FENeRF main structure

1 Overview

[Figure: FENeRF overview]

[Figure]

2 FENeRF generator architecture

[Figure: FENeRF generator architecture]

FPN (Feature Pyramid Network)

Published at CVPR 2017 for object detection; its main contribution is fusing high-level, semantically strong features with low-level, high-resolution features through lateral connections (see the sketch below).

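An illustrative sketch of a single FPN lateral connection (channel counts and module names are arbitrary, not from any specific codebase):

import torch
import torch.nn as nn
import torch.nn.functional as F

class LateralFusion(nn.Module):
    """Fuse a high-level (low-resolution, semantically strong) map from the top-down
    pathway with a lower-level (high-resolution) map, as in FPN."""
    def __init__(self, low_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(low_channels, out_channels, kernel_size=1)   # 1x1 lateral conv
        self.smooth = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, top_down, low_level):
        # Upsample the coarser top-down feature to the finer resolution, add, then smooth.
        top_down = F.interpolate(top_down, size=low_level.shape[-2:], mode="nearest")
        fused = top_down + self.lateral(low_level)
        return self.smooth(fused)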
Modifications

  1. siren.py [68-71]: increase the number of hidden layers in the mapping network from 3 to 4.
  2. siren.py [364-443]: add a 3D feature vector to the color output layer (a hedged sketch follows this list).
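A purely hypothetical sketch of what change 2 could look like: the variable feat3d, its dimensionality, and its position in the concatenation are assumptions, not taken from the actual siren.py edit.

# Hypothetical: extend the color branch input with an extra 3-d feature vector `feat3d`.
color_in = torch.cat([ray_directions, feat3d, x], dim=-1)     # 3 + 3 + 256 = 262 (assumed)
h = self.color_layer_sine(color_in, color_freq, color_phase)  # Linear would become 262 -> 256
rgb = self.color_layer_linear(h)                              # 256 -> 3, Sigmoid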

[Figure]

Debugging...

Code Architecture

1 discriminators.py

  1. ProgressiveDiscriminator() -> ResidualCoordConvBlock(inplanes, planes, downsample) -> CoordConv(inplanes, planes)

ProgressiveDiscriminator: progressive-growing discriminator

ResidualCoordConvBlock: convolution block used at each progressive-growing stage

CoordConv: convolution that appends coordinate information to its input (see the sketch below)
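A minimal sketch of the CoordConv pattern (the exact coordinate normalization used by the original AddCoords is an assumption); it explains the "+2" on every Conv2d in_channels count in the dump below, e.g. 18 = 16 + 2:

import torch
import torch.nn as nn

class AddCoords(nn.Module):
    """Append normalized x/y coordinate channels to the input feature map."""
    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return torch.cat([x, xs, ys], dim=1)   # channel count grows by 2

class CoordConv(nn.Module):
    """Conv2d preceded by AddCoords, matching the printed (addcoords) + (conv) pattern."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.addcoords = AddCoords()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x):
        return self.conv(self.addcoords(x))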

ResidualCoordConvBlock network layers

0:  
(network): Sequential(
(0): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(18, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(34, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
)
1:
(network): Sequential(
(0): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(130, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(258, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
)
2:
(network): Sequential(
(0): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(66, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(130, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
)
3:
(network): Sequential(
(0): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(130, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(258, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
)
4:
(network): Sequential(
(0): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(258, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(402, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
)
5:
(network): Sequential(
(0): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(402, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(402, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
)
6:
(network): Sequential(
(0): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(402, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(402, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
)
7:
(network): Sequential(
(0): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(402, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): CoordConv(
(addcoords): AddCoords()
(conv): Conv2d(402, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): LeakyReLU(negative_slope=0.2, inplace=True)
)

AdapterBlock network structure

(0): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 16, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)
(1): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)
(2): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)
(3): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 128, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)
(4): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 256, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)
(5): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 400, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)
(6): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 400, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)
(7): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 400, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)
(8): AdapterBlock(
(model): Sequential(
(0): Conv2d(3, 400, kernel_size=(1, 1), stride=(1, 1))
(1): LeakyReLU(negative_slope=0.2)
)
)

Final layer of the discriminator (after from_RGB()): (final_layer): Conv2d(400, 259, kernel_size=(2, 2), stride=(1, 1))
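A hedged sketch of how a progressive-growing discriminator of this kind typically consumes an image: the from_RGB adapter for the current resolution stage feeds the matching ResidualCoordConvBlock, with an alpha fade-in blending against the previous stage. Stage indexing, the blend position, and the interpolation mode are assumptions.

import torch.nn.functional as F

def progressive_forward(image, layers, from_RGB, final_layer, start, alpha):
    # `layers` are the ResidualCoordConvBlocks, `from_RGB` the AdapterBlocks printed above.
    x = from_RGB[start](image)
    for i, block in enumerate(layers[start:]):
        if i == 1 and alpha < 1:
            # Fade in the newest stage: blend with the old path on a downsampled image.
            skip = from_RGB[start + 1](F.interpolate(image, scale_factor=0.5, mode="nearest"))
            x = alpha * x + (1 - alpha) * skip
        x = block(x)
    return final_layer(x)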

2 siren.py

  1. TALLSIREN: forward -> forward_with_frequencies_phase_shifts

forward inputs:

position x: input sample positions, [img_size * img_size * num_steps]

z: noise input to the mapping network, [256]

ray_directions: ray directions
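A hedged sketch of how forward relates to forward_with_frequencies_phase_shifts, following the call chain listed above (the keyword passthrough is an assumption):

def forward(self, input, z, ray_directions, **kwargs):
    # Map the noise vector to per-layer frequencies and phase shifts, then delegate.
    frequencies, phase_shifts = self.mapping_network(z)
    return self.forward_with_frequencies_phase_shifts(
        input, frequencies, phase_shifts, ray_directions, **kwargs)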

