Photorealistic and Semantic 3D Scene Representation for Visual SLAM
Semester/Masters Project
This project aims to develop a semantically-aware, photorealistic SLAM system capable of constructing dense 3D maps that bridge the gap between geometric reconstruction and scene understanding to enhance autonomous robotic perception.
[Figure: Semantic segmentation in an indoor environment]
Background
Simultaneous Localization and Mapping (SLAM) is a foundation of robotic autonomy: it requires estimating a robot’s pose while concurrently building a map of its surrounding environment. While traditional SLAM frameworks focus on geometric primitives, they often lack the “semantic awareness” needed for complex decision-making. Recent advances in dense 3D reconstruction and deep-learning-based segmentation offer a path toward photorealistic maps that are also semantically labeled. However, accurately fusing high-dimensional visual features with spatial geometry remains a significant challenge, especially when aiming for both real-time performance and high-fidelity representations.
Description
How does a robot distinguish a navigable hallway from a glass wall, or a pedestrian from a static statue? This project moves beyond simple point clouds to develop semantically rich, photorealistic 3D maps. You will integrate state-of-the-art segmentation models with dense SLAM pipelines to create environments where every pixel has both a coordinate and a meaning. The work involves exploring deep-learning architectures, spatial data fusion, and large-scale scene representation. This project is ideal for those interested in the intersection of Computer Vision and Robotics, offering the chance to build intelligent systems that truly understand the world they inhabit.
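To make "every pixel has both a coordinate and a meaning" concrete, the following is a minimal sketch, not the project's method. It assumes a pinhole camera with intrinsics `K`, a metric depth image in which zero marks missing measurements, a per-pixel class-label image from any segmentation model, and a camera-to-world pose `T_world_cam` supplied by the SLAM front end. The function name and all parameter names are hypothetical.

```python
import numpy as np


def backproject_labeled(depth, labels, K, T_world_cam):
    """Lift each valid pixel to a 3D world point that keeps its class label.

    depth:        (H, W) float array, metric depth; 0 means no measurement (assumption)
    labels:       (H, W) int array, per-pixel class IDs from a segmentation model
    K:            (3, 3) pinhole intrinsics matrix
    T_world_cam:  (4, 4) homogeneous camera-to-world transform
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0

    # Invert the pinhole projection for pixels with a depth measurement.
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]

    # Homogeneous camera-frame points (4 x N), moved into the world frame.
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)
    pts_world = (T_world_cam @ pts_cam)[:3].T

    return pts_world, labels[valid]
```

The result is a labeled point set for a single frame. A full system would additionally fuse labels across many frames and replace the raw points with a dense representation such as a radiance field.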
Work Packages
- The work will focus on state-of-the-art radiance field models and semantic segmentation methods, and on evaluating the resulting system across diverse environments.
Requirements
- Students taking this project need programming experience in Python and C++.