# OpenPCDet

Overall Framework

  • The OpenPCDet project page gives a diagram of the overall model framework, which is very clear:
  • It also shows how some typical models are composed:

Dataset

PointPillar Example

Model Initialization

  • In OpenPCDet, PointPillar is a detector, which appears to have a corresponding backbone (the actual network definition)
  • The base class of detectors is Detector3DTemplate; the model is assembled mainly by its build_networks method (a paraphrased sketch of it is given after the module list below). The modules it builds are:
    • VFE (short for Voxel Feature Encoder):
      • I even found a blog post dedicated to the VFE module. It appears to handle the encoding from point cloud to pillars;
      • Defined in ./pcdet/models/backbones_3d/vfe/pillar_vfe.py
      • Some of the VFE parameter settings:
      • The VFE's functionality is mainly implemented by PFNLayer, defined in the same file;
        • PointPillar's PFNLayer is just a Linear(10, 64) followed by a BatchNorm1d:
        • Apart from defining the PFNLayer, the __init__ function only computes a few offsets (derived from the voxel size and point-cloud range, used when decorating the points); a simplified sketch of the PFNLayer computation follows.
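        • A minimal sketch of what a PFNLayer computes (my own simplified version, not the exact OpenPCDet code; shapes and parameter names are illustrative):
                import torch
                import torch.nn as nn

                class PFNLayerSketch(nn.Module):
                    """Linear + BatchNorm1d + ReLU, then a max over the points of each pillar."""
                    def __init__(self, in_channels=10, out_channels=64):
                        super().__init__()
                        self.linear = nn.Linear(in_channels, out_channels, bias=False)
                        self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)

                    def forward(self, x):
                        # x: (num_pillars, max_points_per_pillar, 10) decorated point features
                        x = self.linear(x)
                        # BatchNorm1d expects the channel dim in position 1, hence the permutes
                        x = self.norm(x.permute(0, 2, 1)).permute(0, 2, 1)
                        x = torch.relu(x)
                        # one 64-d feature per pillar: max over its points
                        return torch.max(x, dim=1, keepdim=True)[0]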
    • backbone_3d
      • This one is empty (of course…)
    • map_to_bev_module:
      • Defined in ./pcdet/models/backbones_2d/map_to_bev/pointpillar_scatter.py; concretely, PointPillarScatter
      • This appears to be the module that produces the pseudo image;
      • The grid size (432, 496, 1) corresponds to self.nx, self.ny, self.nz respectively (a simplified sketch of the scatter step follows).
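      • A simplified sketch of the scatter step (shapes are assumptions; the real PointPillarScatter also handles the degenerate z dimension and other batching details):
              import torch

              def scatter_to_bev(pillar_features, coords, batch_size, nx=432, ny=496, num_features=64):
                  # pillar_features: (num_pillars, 64); coords: (num_pillars, 4) as (batch_idx, z, y, x)
                  batch_canvas = []
                  for b in range(batch_size):
                      canvas = torch.zeros(num_features, nx * ny, dtype=pillar_features.dtype)
                      mask = coords[:, 0] == b
                      this_coords = coords[mask]
                      # flatten each pillar's (y, x) grid position into a column index of the canvas
                      indices = (this_coords[:, 2] * nx + this_coords[:, 3]).long()
                      canvas[:, indices] = pillar_features[mask].t()
                      batch_canvas.append(canvas.view(num_features, ny, nx))
                  return torch.stack(batch_canvas)  # (batch, 64, 496, 432) pseudo image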
    • pfe:
      • None
    • backbone_2d
      • Defined in ./pcdet/models/backbones_2d/base_bev_backbone.py; concretely, BaseBEVBackbone
      • This is a proper network definition; the concrete parameters are as follows:
      • It is split into 3 stages (encoder and decoder are symmetric, 6 blocks in total); each stage has 1 strided layer + num_layer[idx] further layers (more concretely, ZeroPad-Conv-BN-ReLU followed by num_layer[idx] x (Conv2d-BN-ReLU));
      • in case I don't remember:
      • The deblocks are comparatively thinner, each containing only a single deconv (plus BN and ReLU);
      • The full printout of self.blocks & self.deblocks (is it really necessary to paste this?):

      • The forward function:
              def forward(self, data_dict):
                  """
                  Args:
                      data_dict:
                          spatial_features: (B, C, H, W) BEV pseudo image from the scatter module
                  Returns:
                      data_dict: with 'spatial_features_2d' (the fused multi-scale BEV features) added
                  """
                  spatial_features = data_dict['spatial_features']
                  ups = []
                  ret_dict = {}
                  x = spatial_features
                  for i in range(len(self.blocks)):
                      x = self.blocks[i](x)
        
                      # key each intermediate feature map by its downsampling factor relative to the input
                      stride = int(spatial_features.shape[2] / x.shape[2])
                      ret_dict['spatial_features_%dx' % stride] = x
                      if len(self.deblocks) > 0:
                          ups.append(self.deblocks[i](x))
                      else:
                          ups.append(x)
        
                  # fuse: concatenate the upsampled multi-scale features along the channel dimension
                  if len(ups) > 1:
                      x = torch.cat(ups, dim=1)
                  elif len(ups) == 1:
                      x = ups[0]
        
                  if len(self.deblocks) > len(self.blocks):
                      x = self.deblocks[-1](x)
        
                  data_dict['spatial_features_2d'] = x
        
                  return data_dict
        
    • dense_head:
      • Defined in ./pcdet/models/dense_heads/anchor_head_single.py; concretely, AnchorHeadSingle
        • AnchorHeadSingle's base class is AnchorHeadTemplate
          • It takes a great many parameters:
          • predict_box_when_training=False: this flag appears to control whether the head also decodes predicted boxes during training (two-stage detectors need that so the RoI head has proposals to consume); for the single-stage PointPillar it stays False
          • There is a self.box_coder attribute, an instance of ResidualCoder from ./pcdet/utils/box_coder_utils.py; it appears to be the box encoder/decoder that turns boxes into residuals relative to the anchors (and decodes predictions back)
          • The anchor_target_cfg includes:
          • Anchors are also generated via the self.generate_anchors method (so anchors really are pre-computed up front over the BEV grid, rather than produced on the fly)
          • The anchor_generator_cfg includes:
            • This method additionally wraps the instantiation of an AnchorGenerator (from ./pcdet/models/dense_heads/target_assigner/anchor_generator.py), which makes the whole thing rather convoluted
            • That instance appears to be the one doing the actual work (i.e. generating the anchors);
            • Its generate_anchors method contains fairly involved tensor manipulation; the gist is that space is cut into a grid and an anchor is assigned to the center of each grid cell; each anchor has 7 dimensions in total: center (3) + size (3) + rotation (1), and (in the default config) two rotation angles are generated per location; a rough sketch follows.
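            • A rough sketch of the idea (the numeric values are the usual KITTI car settings, quoted from memory -- treat them as assumptions; the real AnchorGenerator is considerably more general):
                    import torch

                    def grid_anchors(grid_size=(216, 248), anchor_size=(3.9, 1.6, 1.56), z_center=-1.78,
                                     rotations=(0.0, 1.57), pc_range=(0, -39.68, -3, 69.12, 39.68, 1)):
                        nx, ny = grid_size
                        xs = torch.linspace(pc_range[0], pc_range[3], nx)
                        ys = torch.linspace(pc_range[1], pc_range[4], ny)
                        yy, xx = torch.meshgrid(ys, xs, indexing='ij')          # anchor centers on the BEV grid
                        centers = torch.stack([xx, yy, torch.full_like(xx, z_center)], dim=-1)  # (ny, nx, 3)
                        sizes = torch.tensor(anchor_size).expand(ny, nx, 3)     # (ny, nx, 3)
                        per_rotation = []
                        for rot in rotations:                                   # two rotations per location
                            rot_t = torch.full((ny, nx, 1), rot)
                            per_rotation.append(torch.cat([centers, sizes, rot_t], dim=-1))  # (ny, nx, 7)
                        return torch.stack(per_rotation, dim=2)                 # (ny, nx, 2, 7) anchors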
          • Further on there is a self.target_assigner, an instance of AxisAlignedTargetAssigner (so many moving parts -- there is even a dedicated assigner??); its initialization records the matching thresholds for each class;
          • self.build_losses
            • There are three kinds of losses, but I haven't looked closely at what each one does or how it is implemented
            • The loss cfg is as follows:
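            • For reference, the three losses end up combined as a weighted sum, roughly like this (the weights are the usual KITTI pointpillar config values, quoted from memory -- treat them as assumptions):
                    import torch

                    def total_loss_sketch(cls_loss: torch.Tensor, loc_loss: torch.Tensor, dir_loss: torch.Tensor,
                                          cls_weight: float = 1.0, loc_weight: float = 2.0, dir_weight: float = 0.2):
                        # cls_loss: SigmoidFocalClassificationLoss on the per-anchor class logits
                        # loc_loss: WeightedSmoothL1Loss on the encoded box residuals
                        # dir_loss: WeightedCrossEntropyLoss on the direction bins
                        return cls_weight * cls_loss + loc_weight * loc_loss + dir_weight * dir_loss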
        • In the AnchorHeadSingle class, the model has three output heads:
          • self.conv_cls: as the name suggests;
            • But why are there 6*3 output channels?? (It is num_anchors_per_location x num_class: 3 anchor classes x 2 rotations = 6 anchors per BEV location, each scored against 3 classes, hence the 18 channels in the printout below;)
          • self.conv_box: as the name suggests;
          • self.conv_dir_cls: dir -> direction; apparently direction classification gets its own dedicated output head (see the channel-count sketch below);
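          • The channel counts of the three heads follow directly from the anchor layout; a sketch (in_channels=384 comes from concatenating the three 128-channel deblock outputs):
                  import torch.nn as nn

                  num_anchors_per_location = 3 * 2           # 3 anchor classes (Car/Pedestrian/Cyclist) x 2 rotations
                  num_class, code_size, num_dir_bins = 3, 7, 2
                  in_channels = 384                          # 3 x 128 upsampled BEV feature maps, concatenated

                  conv_cls = nn.Conv2d(in_channels, num_anchors_per_location * num_class, kernel_size=1)        # -> 18
                  conv_box = nn.Conv2d(in_channels, num_anchors_per_location * code_size, kernel_size=1)        # -> 42
                  conv_dir_cls = nn.Conv2d(in_channels, num_anchors_per_location * num_dir_bins, kernel_size=1) # -> 12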
    • point_head:
      • None
    • roi_head:
      • None
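  • Back to the build_networks mentioned above: roughly, Detector3DTemplate keeps the module names in a fixed topology and builds them one by one via getattr dispatch. The sketch below is a paraphrase from memory, not the verbatim source:
          import torch.nn as nn

          class Detector3DTemplateSketch(nn.Module):
              # the names mirror OpenPCDet's Detector3DTemplate; the actual build_* methods are omitted here
              module_topology = [
                  'vfe', 'backbone_3d', 'map_to_bev_module', 'pfe',
                  'backbone_2d', 'dense_head', 'point_head', 'roi_head'
              ]

              def build_networks(self):
                  model_info_dict = {'module_list': []}     # the real dict carries more bookkeeping keys
                  for module_name in self.module_topology:
                      # each build_xxx returns (module_or_None, updated model_info_dict)
                      module, model_info_dict = getattr(self, 'build_%s' % module_name)(
                          model_info_dict=model_info_dict
                      )
                      self.add_module(module_name, module)  # None entries print as "None" in the model dump
                  return model_info_dict['module_list']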
  • Here is the complete model printout:
    PointPillar(
    (vfe): PillarVFE(
      (pfn_layers): ModuleList(
        (0): PFNLayer(
          (linear): Linear(in_features=10, out_features=64, bias=False)
          (norm): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (backbone_3d): None
    (map_to_bev_module): PointPillarScatter()
    (pfe): None
    (backbone_2d): BaseBEVBackbone(
      (blocks): ModuleList(
        (0): Sequential(
          (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
          (1): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), bias=False)
          (2): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (3): ReLU()
          (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (5): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (6): ReLU()
          (7): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (8): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (9): ReLU()
          (10): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (11): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (12): ReLU()
        )
        (1): Sequential(
          (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
          (1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), bias=False)
          (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (3): ReLU()
          (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (6): ReLU()
          (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (9): ReLU()
          (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (12): ReLU()
          (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (15): ReLU()
          (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (18): ReLU()
        )
        (2): Sequential(
          (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
          (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
          (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (3): ReLU()
          (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (6): ReLU()
          (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (9): ReLU()
          (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (12): ReLU()
          (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (15): ReLU()
          (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (18): ReLU()
        )
      )
      (deblocks): ModuleList(
        (0): Sequential(
          (0): ConvTranspose2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU()
        )
        (1): Sequential(
          (0): ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
          (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU()
        )
        (2): Sequential(
          (0): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(4, 4), bias=False)
          (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU()
        )
      )
    )
    (dense_head): AnchorHeadSingle(
      (cls_loss_func): SigmoidFocalClassificationLoss()
      (reg_loss_func): WeightedSmoothL1Loss()
      (dir_loss_func): WeightedCrossEntropyLoss()
      (conv_cls): Conv2d(384, 18, kernel_size=(1, 1), stride=(1, 1))
      (conv_box): Conv2d(384, 42, kernel_size=(1, 1), stride=(1, 1))
      (conv_dir_cls): Conv2d(384, 12, kernel_size=(1, 1), stride=(1, 1))
    )
    (point_head): None
    (roi_head): None
    )