Speakers are listed in alphabetical order.
University of North Carolina at Chapel Hill
Towards Bringing 3D World Models to Life Leveraging Crowd Sourced Data
Crowd-sourced imagery (images and video) is the richest data source available for 3D reconstruction of the world. The tremendous amount of imagery provided by photo-sharing websites such as Flickr and video-sharing sites such as YouTube not only covers the world's appearance, but also reflects the temporal evolution of the world and its dynamic parts. It has long been a goal of computer vision to obtain virtual 3D models from such rich imagery. Major current research challenges are the scale of the data, e.g. the Yahoo 100-million-image dataset, which is still only a fraction of what is needed to reconstruct the world, and the lack of data for reconstructing the dynamic elements of the world, e.g. the people in the scene. In particular, we currently face significant challenges in processing such imagery within a reasonable time frame given limited compute resources. The talk discusses our work on highly efficient large-scale image registration for the reconstruction of static 3D models from world-scale photo collections and our work on large-scale image-based search to address scalability. In addition, the talk will discuss our approaches towards enabling the reconstruction of scene dynamics, to achieve the goal of bringing the 3D models to life, and towards determining the absolute scale of the resulting models. Examples include reconstructing people from crowd-sourced imagery.
Jan-Michael Frahm is a full professor at the University of North Carolina at Chapel Hill, where he heads the 3D computer vision group. He received his Dr.-Ing. in computer vision in 2005 from the Christian-Albrechts University of Kiel, Germany. His diploma in Computer Science is from the University of Lübeck. His research interests span a variety of topics at the intersection of computer vision, computer graphics, and robotics. He has over 100 peer-reviewed publications and is Editor-in-Chief of the Elsevier journal Image and Vision Computing.
The long march of 3D reconstruction: from tabletop to outer space
After briefly recalling the milestones in the history of 3D reconstruction and describing the reconstruction pipeline implemented in 3DF Zephyr, I will survey some extreme 3D reconstruction cases reported by customers, ranging from intra-oral scans to outer space, including deep-water wrecks and destroyed artefacts.
Andrea Fusiello is Associate Professor at the University of Udine, Italy (since 2012), where he teaches Fundamentals of Computer Science (undergraduate) and Computer Vision (graduate). He was Associate Professor at the University of Verona, Italy (2005-2011), visiting professor at the University of Bourgogne (2008, 2017), and EPSRC research fellow at Heriot-Watt University, UK (1999). He received his M.S. in Computer Science from the University of Udine, Italy, and his PhD in Computer Engineering from the University of Trieste, Italy, in 1999. In 2011 he founded a start-up company, 3Dflow srl, in the area of computer vision and photogrammetry, and he has been involved in industrial research projects with companies since his PhD. His current research interests include computer vision, image analysis, 3-D model acquisition, and image-based rendering.
Computer Vision, Visual Learning, and 3D Reconstruction
Professor Quan leads a computer vision team that uses photographs and deep visual learning technologies to produce complete 3D reconstructions of all types of locations and objects. In this talk, he reviews the developments in computer vision and visual learning over the years, then turns to recent exciting work in deep visual learning and breakthroughs in 3D reconstruction. He showcases the approach using case studies of large-scale 3D reconstructions, from hundreds of square kilometers of high-rise metropolitan areas and undeveloped rural areas captured by drones to small-scale everyday objects captured with smartphones. He also demonstrates the online cloud platform and portal www.altizure.com with its crowd-sourced Altizure Earth, developed and funded by the HKUST team, rivaling the popular Google Earth!
Long Quan received his Ph.D. in Computer Science from INRIA, France, in 1989. Before joining the Department of Computer Science at the Hong Kong University of Science and Technology (HKUST) in 2001 to found his computer vision group, he had been one of the founding members of the INRIA Grenoble Computer Vision Group since its establishment in 1990.
He supervised award-winning graduate work, including the PhD thesis of Peter Sturm, which won le prix de thèse Gilles Kahn for the best French PhD thesis in computer science in 1998; the Piero Zamperoni Best Student Paper Award won by Maxime Lhuillier in 2000; the first of the six highlights of SIGGRAPH 2007; and the Best Student Poster Paper of CVPR 2008. His many graduate students are now computer vision leaders worldwide, at INRIA and CNRS in France, Lund University in Sweden, NUS in Singapore, Beijing University and DJI in China, SFU in Canada, and Microsoft, Google, and Princeton in the USA.
He has served on all the major computer vision journals: as an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), a Regional Editor of the Image and Vision Computing Journal (IVC), an editorial board member of the International Journal of Computer Vision (IJCV), an editorial board member of the Electronic Letters on Computer Vision and Image Analysis (ELCVIA), an Associate Editor of Machine Vision and Applications (MVA), and an editorial member of Foundations and Trends in Computer Graphics and Vision.
He has contributed to all the major computer vision conferences: the IEEE International Conference on Computer Vision (ICCV), the European Conference on Computer Vision (ECCV), the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), and the IAPR International Conference on Pattern Recognition (ICPR). He served as a Program Chair of ICPR 2006 Computer Vision and Image Analysis, a Program Chair of ICPR 2012 Computer and Robot Vision, and a General Chair of ICCV 2011 in Barcelona. He is the founding director of the HKUST Center for Visual Computing and Image Science, and an IEEE Fellow (Computer Society).
Most recently, with his HKUST graduates, he founded altizure.com, the world's first portal for generating 3D from drone and smartphone photos!
School of Computing
The University of Utah
Depth, Semantics, and Localization for Autonomous Driving Applications
The success of an autonomous driving system (mobile robot, self-driving car) hinges on the accuracy and speed of the inference algorithms used in understanding and recognizing the 3D world. A unifying theme of my research is the modeling of scene understanding problems through the lens of probabilistic graphical models and the development of new inference algorithms. In the first part of the talk, I will present novel algorithms for depth estimation from a single image. In addition to the standard orthogonality, linear perspective, and parallelism constraints, we investigate a few novel constraints based on the physical realizability of the scene structure. We treat the line segments in the image as part of a graph, similar to the straws-and-connectors game, where the goal is to back-project the line segments into 3D space while ensuring that some of these 3D line segments connect with each other (i.e., truly intersect in 3D space) to form the 3D structure. We use a mixed integer linear program (MILP) to satisfy these novel constraints during the 3D reconstruction.
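To make the straws-and-connectors idea concrete, here is a toy brute-force sketch, not the talk's actual MILP: every name, depth value, and score below is hypothetical. Each segment endpoint lies on a known back-projection ray, its depth is chosen from a small discrete set (standing in for the continuous MILP variables), and endpoints that share a 2D junction realize a true 3D intersection, and earn a bonus, only when their depths agree.

```python
from itertools import product

# Discretized candidate depths along each endpoint's back-projection ray.
DEPTHS = [1.0, 2.0, 3.0]

# Hypothetical instance: two segments; s1's endpoint b and s2's endpoint a
# coincide at a 2D junction, so they may "truly intersect" in 3D.
endpoints = ["s1_a", "s1_b", "s2_a", "s2_b"]
shared_junctions = [("s1_b", "s2_a")]

# Per-endpoint depth suggested by image evidence (illustrative numbers).
evidence = {"s1_a": 1.0, "s1_b": 2.0, "s2_a": 2.0, "s2_b": 3.0}
CONNECT_BONUS = 1.0  # reward for realizing a 3D intersection


def objective(assign):
    """Bonus for connected junctions minus deviation from image evidence."""
    cost = sum(abs(assign[e] - evidence[e]) for e in endpoints)
    bonus = sum(CONNECT_BONUS
                for u, v in shared_junctions if assign[u] == assign[v])
    return bonus - cost


# Exhaustive search over all depth assignments (a real MILP solver would
# handle this combinatorial choice via integer variables and constraints).
best = max((dict(zip(endpoints, d)) for d in product(DEPTHS, repeat=4)),
           key=objective)
print(best["s1_b"], best["s2_a"])  # the two segments meet at depth 2.0
```

The binary "do these endpoints connect" decision is what makes the real formulation an integer program; the toy simply enumerates it.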
I will conclude by showing a few novel algorithms for stereo-based depth sensing, semantic boundary detection and segmentation, and the use of semantics for localization.
Srikumar Ramalingam has been an associate professor in the School of Computing at the University of Utah since 2017. Before that, he worked as a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL) from 2008. He received a Marie Curie VisionTrain scholarship from the European Union to pursue his doctoral studies in computer science and applied mathematics at INRIA Rhône-Alpes (France) under the guidance of Dr. Peter Sturm. His Ph.D. thesis on generic imaging models received the INPG best thesis prize and the AFRIF Thesis Prize (honorable mention) from the French Association for Pattern Recognition. His other notable awards include an R&D 100 award and an RSS best paper runner-up. His research interests are in computer vision, machine learning, robotics, and autonomous driving.
Director, Oculus Research Pittsburgh
Associate Professor, Robotics Institute, Carnegie Mellon University
Social Perception with Machine Vision
Yaser Sheikh is an Associate Professor at the Robotics Institute, Carnegie Mellon University. He also directs the Facebook Reality Lab in Pittsburgh, which is focused on achieving photorealistic social interactions in AR and VR. His research is broadly focused on machine perception and rendering of social behavior, spanning sub-disciplines in computer vision, computer graphics, and machine learning. He has won Popular Science's "Best of What's New" Award, the Honda Initiation Award (2010), the best student paper award at CVPR (2018), best paper awards at WACV (2012), SAP (2012), SCA (2010), and ICCV THEMIS (2009), and the best demo award at ECCV (2016), and placed first in the MSCOCO Keypoint Challenge (2016); he received the Hillman Fellowship for Excellence in Computer Science Research (2004). Yaser has served as a senior committee member at leading conferences in computer vision, computer graphics, and robotics, including SIGGRAPH (2013, 2014), CVPR (2014, 2015, 2018), ICRA (2014, 2016), and ICCP (2011), and served as an Associate Editor of CVIU. His research is sponsored by various government research offices, including NSF and DARPA, and several industrial partners, including the Intel Corporation, the Walt Disney Company, Nissan, Honda, Toyota, and the Samsung Group. His research has been featured by various media outlets, including The New York Times, The Verge, Popular Science, BBC, MSNBC, New Scientist, slashdot, and WIRED.
Visiting Assistant Professor
Department of Computer Science
Born in the wild: Self-supervised 3D Face Model Learning
A broad range of applications in visual effects, computer animation, autonomous driving, and man-machine interaction depend heavily on robust and fast algorithms to obtain high-quality reconstructions of our physical world in terms of geometry, motion, reflectance, and illumination. In particular, the increasing popularity of virtual, augmented, and mixed reality devices brings a rising demand for real-time and low-latency solutions. The extraction of spatio-temporally coherent representations from visual data is a highly challenging and ill-posed problem, since the image formation process has convolved multiple physical dimensions into flat color measurements.
This talk covers data-parallel optimization and state-of-the-art machine learning techniques for tackling the underlying 3D and 4D reconstruction problems, based on novel mathematical models and fast algorithms. In the context of face reconstruction, many approaches rely on strong priors, such as parametric face models learned from limited scan data. However, such prior models restrict generalization to the true diversity in facial geometry and skin reflectance. The particular focus of this talk is on alleviating this problem through self-supervised learning of a parametric face model from a collection of unlabeled in-the-wild images. The approach can be trained end-to-end, without dense annotations, by combining a convolutional encoder with a differentiable, expert-designed renderer and a self-supervised training loss.
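The core idea, fitting model parameters by differentiating a rendering loss against the input image itself, can be illustrated with a deliberately tiny analysis-by-synthesis sketch. This is not the talk's actual method: the "face model" below is just a random linear basis, the "renderer" is the linear map itself, and all dimensions and step sizes are made up. No labels are used; supervision comes entirely from the photometric error.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pix, n_params = 64, 5
basis = rng.standard_normal((n_pix, n_params))  # stand-in for a face basis
true_params = rng.standard_normal(n_params)
image = basis @ true_params                     # "in-the-wild" observation


def render(params):
    """Toy differentiable renderer: maps parameters to pixel values."""
    return basis @ params


# Self-supervised fitting: gradient descent on the photometric loss
# ||render(p) - image||^2, with gradients flowing through the renderer.
params = np.zeros(n_params)  # stands in for a convolutional encoder's output
lr = 0.005
for _ in range(500):
    residual = render(params) - image  # self-supervised photometric error
    grad = basis.T @ residual          # backprop through the linear renderer
    params -= lr * grad

print(np.allclose(params, true_params, atol=1e-3))
```

In the real system, the renderer models geometry, reflectance, and illumination, and a neural encoder amortizes this per-image optimization; the toy only shows why differentiability of the renderer makes annotation-free training possible.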
Michael Zollhöfer is a Visiting Assistant Professor at Stanford University. His stay at Stanford is funded by a postdoctoral fellowship of the Max Planck Center for Visual Computing and Communication (MPC-VCC), which he received for his work in the fields of computer vision, computer graphics, and machine learning. Before joining Stanford University, Michael was a Postdoctoral Researcher at the Max Planck Institute for Informatics working with Christian Theobalt. He received his PhD from the University of Erlangen-Nuremberg for his work on real-time reconstruction of static and dynamic scenes. During his PhD, he was an intern at Microsoft Research Cambridge working with Shahram Izadi on data-parallel optimization for real-time template-based surface reconstruction. The primary goal of his research is to teach computers to reconstruct and analyze our world at frame rate based on visual input. To this end, he develops key technology to invert the image formation models of computer graphics based on data-parallel optimization and state-of-the-art deep learning techniques. The reconstructed intrinsic scene properties, such as geometry, motion, reflectance, and illumination are the foundation for a broad range of applications not only in virtual and augmented reality, visual effects, computer animation, autonomous driving, and man-machine interaction, but also in other fields such as medicine and biomechanics.