PhD Thesis Defense
Zoom link: https://caltech.zoom.us/j/3984708391
The world around us is full of structured entities. Scenes can be described as collections of objects arranged in space, objects can be decomposed into parts, and even small molecules are composed of atoms. Because humans naturally organize many concepts into smaller components, structural representations have become a powerful tool across many applications. Computer vision uses part-based representations for classical object detection and categorization tasks, and computational neuroscientists use structural representations to obtain interpretable, low-dimensional encodings for behavior analysis. Furthermore, structural encodings of molecules allow machine learning models to be applied to optimizing experimental reaction conditions in organic chemistry.
To perform the high-level tasks described above, the structural components must first be detected accurately. In this dissertation, we first propose methods to improve pose estimation, the task of localizing the semantic parts of a target instance in a 2D image. Because pose estimation typically requires a large number of human annotations to succeed, we next design a model that automatically discovers structural information from visual inputs without supervision. Finally, we demonstrate the efficacy of structural representations by applying them to scientific problems such as behavior analysis and organic chemistry.