Cylindrical Convolutional Networks for
Joint Object Detection and Viewpoint Estimation

Sunghun Joung¹

Seungryong Kim²,³

Hanjae Kim¹

Minsu Kim¹

Ig-Jae Kim⁴

Junghyun Cho⁴

Kwanghoon Sohn¹

Yonsei University¹

EPFL²

Korea University³

KIST⁴

[CVPR'20 paper]

[CVPR'20 slide]

[Code]

Illustration of cylindrical convolutional networks (CCNs): given a single image of objects, we apply view-specific convolutional kernels to extract the shape characteristics of an object from different viewpoints.
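The idea of a view-specific kernel can be illustrated with a small sketch. It is not the released implementation: it assumes the cylindrical kernel is stored as a 2D grid whose columns are azimuth bins, and that the kernel for one viewpoint is a window of columns centred on that bin, wrapping around the cylinder. The function name `view_specific_kernel` and the toy sizes are hypothetical.

```python
def view_specific_kernel(cyl_kernel, view_index, width):
    """Return a window of `width` columns centred on `view_index`.

    ASSUMPTION: `cyl_kernel` is a list of rows, each row holding one weight
    per azimuth bin, and the window wraps around because the azimuth axis
    is circular.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a centre column")
    num_views = len(cyl_kernel[0])
    half = width // 2
    # Modulo indexing closes the cylinder: the bin before 0 is the last bin.
    cols = [(view_index + offset) % num_views for offset in range(-half, half + 1)]
    return [[row[c] for c in cols] for row in cyl_kernel]


# Toy cylindrical kernel: 2 rows, 8 azimuth bins, each entry equal to its bin index.
cyl = [[float(c) for c in range(8)] for _ in range(2)]
front = view_specific_kernel(cyl, 0, 3)  # uses columns 7, 0, 1 (wraps around)
side = view_specific_kernel(cyl, 2, 3)   # uses columns 1, 2, 3
```

Neighbouring viewpoints share most of their columns in this sketch, so the weights for adjacent views overlap rather than being learned independently.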

Existing techniques to encode spatial invariance within deep convolutional neural networks only model 2D transformation fields. This does not account for the fact that objects in a 2D image are projections of 3D ones, and thus these techniques have limited ability to handle severe object viewpoint changes. To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in 3D space. CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint. With the view-specific feature, we simultaneously determine the object category and viewpoint using the proposed sinusoidal soft-argmax module. Our experiments demonstrate the effectiveness of the cylindrical convolutional networks on joint object detection and viewpoint estimation.
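As a rough illustration of why a sinusoidal form helps when regressing an angle from per-viewpoint scores, the sketch below assumes the module takes a softmax over the viewpoint bins, averages the sine and cosine of the bin angles under those weights, and recovers the angle with `atan2`. The function name, the uniform bin layout, and the `temperature` parameter are assumptions made for this example, not details taken from the page.

```python
import math


def sinusoidal_soft_argmax(scores, temperature=1.0):
    """Estimate a continuous angle (radians, in [0, 2*pi)) from per-bin scores.

    ASSUMPTION: bin i sits at angle 2*pi*i/N and the weights are a softmax
    of the scores; this is a sketch, not the authors' exact formulation.
    """
    n = len(scores)
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    angles = [2.0 * math.pi * i / n for i in range(n)]
    # Average on the unit circle instead of averaging bin indices directly.
    sin_mean = sum(p * math.sin(a) for p, a in zip(probs, angles))
    cos_mean = sum(p * math.cos(a) for p, a in zip(probs, angles))
    return math.atan2(sin_mean, cos_mean) % (2.0 * math.pi)


# Two equally strong bins on either side of the wrap-around point (bins 7 and 0 of 8).
wrapped = sinusoidal_soft_argmax([5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0])
print(math.degrees(wrapped))  # close to 337.5, halfway between 315 and 360
```

A plain soft-argmax over bin indices would place this example near the middle of the range, on the opposite side of the circle, because it ignores that the last bin and the first bin are neighbours.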

Paper

Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, Kwanghoon Sohn

Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation

CVPR, 2020

[pdf] [bibtex]

Video

Acknowledgements

This research was supported by the R&D program for Advanced Integrated-intelligence for Identification (AIID) through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2018M3E3A10572).
This webpage template was borrowed from the project pages of colorful folks and hmr.