What is 3D Vision?
Here, we treat the subject of 3D Vision by separating it into two main components: 3D measurement and 3D recognition.
3D measurement refers to the measurement of the dimensions of a target object, such as its length and width, and the estimation of its shape by using its point cloud representation.
On the other hand, 3D recognition refers to searching for a target object within a specified space, and estimating its 3D position and orientation.
The principle of imaging through a camera
The image captured by a single camera provides only two-dimensional information. After light passes through the camera lens, an image is formed. Tracing this image formation process backwards, each pixel can be associated with a unique ray of light that passes through the pixel's location on the image plane and the center of the lens. However, it is not possible to identify the point along this ray from which the light originated.
3D space and 2D images
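The depth ambiguity described above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera without lens distortion; the function names and the intrinsic values (`FX`, `FY`, `CX`, `CY`) are hypothetical and chosen only for illustration.

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Unit direction of the viewing ray through pixel (u, v).

    Assumption: ideal pinhole camera without lens distortion, camera
    center at the origin, optical axis along +Z.
    """
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)

def project(point, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates onto the image plane."""
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)

# Hypothetical intrinsics (focal lengths and principal point, in pixels).
FX, FY, CX, CY = 800.0, 800.0, 320.0, 240.0

ray = pixel_to_ray(400.0, 300.0, FX, FY, CX, CY)

# Two points at very different distances along the same ray...
near = tuple(0.5 * d for d in ray)
far = tuple(50.0 * d for d in ray)

# ...land on the same pixel, so a single image cannot tell them apart.
print(project(near, FX, FY, CX, CY))
print(project(far, FX, FY, CX, CY))
```

Both projections come out at pixel (400, 300) up to floating-point rounding, which is why an additional measurement principle is needed to recover depth.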
The purpose of 3D measurement is to find the location along the ray where the light that formed the image originated. There are three basic principles that are used for 3D measurement:
- Time of flight
- Depth from focus
- Triangulation
3D measurement using the time-of-flight principle has the advantage that it can be used outdoors, but its accuracy is comparatively low. 3D measurement using depth from focus requires a mechanism to adjust the focus, so only a relatively narrow depth range can be measured in practice. In contrast, 3D measurement using the triangulation principle works over a wide range of depths, from a few millimeters to several kilometers. Here, we focus on 3D measurement based on triangulation.
Using a single camera, it is not possible to measure depth in 3D space. However, with two cameras, depth can be recovered by calculating the intersection point of two rays of light. The baseline connecting the two camera centers and the two rays that emanate from the camera centers and intersect at the 3D point to be measured form a triangle. Thus, this method of 3D measurement is called triangulation. Strictly speaking, triangulation is not limited to two cameras; it applies equally when more cameras are used.
3D measurement using triangulation requires two main problems to be solved. The first problem is camera calibration. Camera calibration refers to the determination of the intrinsic parameters (focal length, principal point, lens distortion) and the extrinsic parameters (position and orientation of the two cameras).
The second problem is stereo correspondence, or matching. It involves finding the pixels in the images of the two cameras that correspond to the same 3D point. Only once correct stereo correspondences are available can triangulation be performed by intersecting the two rays of the matching pixels.
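The ray-intersection step can be sketched as follows. With real data the two rays rarely intersect exactly because of noise, so this sketch uses the midpoint of their closest approach, which is one common choice and not necessarily the method used by any particular sensor. It assumes that calibration and matching have already produced the camera centers and ray directions; all names and values are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint of the closest approach of rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centers; d1, d2: ray directions (need not be unit length).
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # zero when the rays are parallel
    if denom <= eps * a * c:
        raise ValueError("rays are (nearly) parallel; depth is undefined")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))

# Hypothetical setup: two cameras 0.1 m apart observing the same 3D point.
cam1, cam2 = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)
target = (0.05, 0.02, 1.0)
ray1, ray2 = sub(target, cam1), sub(target, cam2)

print(triangulate_midpoint(cam1, ray1, cam2, ray2))
```

With noise-free rays, as here, the result matches the target point up to rounding. The parallel-ray check reflects the geometry of the triangle: without an angle between the rays there is no intersection to compute.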
Features on the target object can be used for matching. However, such features are generally not very dense, so the result is a sparse 3D point set. A more advanced technique is to create dense features artificially, by using a projector to project Gray code or phase shift patterns onto the target object.
3D measurement with a camera and a projector
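As a rough illustration of how Gray code patterns label projector columns, the sketch below uses the standard binary-reflected Gray code. The text does not specify the exact encoding, so the helper names and the number of patterns are assumptions made for illustration.

```python
def binary_to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = g
    shift = g >> 1
    while shift:
        n ^= shift
        shift >>= 1
    return n

def column_bits(column, num_patterns):
    """On/off values a projector column shows over the pattern sequence,
    most significant bit first (one bit per projected pattern)."""
    g = binary_to_gray(column)
    return [(g >> k) & 1 for k in reversed(range(num_patterns))]

def decode_column(bits):
    """Recover the projector column from the bits observed at one pixel."""
    g = 0
    for bit in bits:
        g = (g << 1) | bit
    return gray_to_binary(g)

# Hypothetical: 10 patterns distinguish 2**10 = 1024 projector columns.
NUM_PATTERNS = 10
observed = column_bits(613, NUM_PATTERNS)
print(decode_column(observed))  # prints 613
```

Adjacent columns differ in exactly one bit of their Gray code, so a thresholding error at a stripe boundary shifts the decoded column by at most one.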
By shifting a sine wave signal, a phase value can be associated with each pixel on an image. The phase value is then used to establish stereo correspondences between the images of different cameras.
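The phase computation can be sketched with the standard N-step phase-shift formula. This assumes that each pixel observes intensities of the form I_k = A + B cos(φ + 2πk/N) for N ≥ 3 equally spaced shifts; the function names and sample values are hypothetical.

```python
import math

def wrapped_phase(intensities):
    """Phase at one pixel from N equally shifted sinusoidal patterns.

    Assumes I_k = A + B * cos(phase + 2*pi*k/N) with N >= 3.
    Returns the wrapped phase in (-pi, pi].
    """
    n = len(intensities)
    if n < 3:
        raise ValueError("at least three phase shifts are required")
    s = sum(i * math.sin(2.0 * math.pi * k / n) for k, i in enumerate(intensities))
    c = sum(i * math.cos(2.0 * math.pi * k / n) for k, i in enumerate(intensities))
    return math.atan2(-s, c)

def simulate_pixel(phase, offset, amplitude, n):
    """Intensities one pixel would record under n shifted patterns."""
    return [offset + amplitude * math.cos(phase + 2.0 * math.pi * k / n)
            for k in range(n)]

# Hypothetical pixel: ambient offset 0.5, modulation 0.4, true phase 1.0 rad.
samples = simulate_pixel(1.0, 0.5, 0.4, 4)
print(wrapped_phase(samples))
```

The recovered value is independent of the offset A and the modulation B, which makes the method tolerant of surface brightness. The phase is only known modulo 2π, so in practice an additional cue such as a Gray code sequence is typically used to resolve which period a pixel belongs to.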
3D recognition searches for the pose of a known 3D object. The six unknown parameters are the 3D position (X, Y, Z coordinates) and the 3D orientation (α, β, γ). For comparison, 2D recognition has three unknown parameters: the 2D position (X, Y) and the 2D orientation (angle α).
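To make the six pose parameters concrete, the sketch below applies a pose to a model point. The text does not state which angle convention is used, so this sketch assumes that α, β, γ are rotations about the X, Y and Z axes, composed as R = Rz(γ) Ry(β) Rx(α); the function names are hypothetical.

```python
import math

def rotation_matrix(alpha, beta, gamma):
    """R = Rz(gamma) * Ry(beta) * Rx(alpha), angles in radians.

    Assumed convention; other Euler angle orders are equally valid.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]

def apply_pose(pose, point):
    """Map a point from model coordinates to scene coordinates.

    pose = (x, y, z, alpha, beta, gamma): three translations, three angles.
    """
    x, y, z, alpha, beta, gamma = pose
    r = rotation_matrix(alpha, beta, gamma)
    rotated = [sum(r[i][j] * point[j] for j in range(3)) for i in range(3)]
    return (rotated[0] + x, rotated[1] + y, rotated[2] + z)

# Hypothetical pose: rotate 90 degrees about Z, then translate by (1, 2, 3).
pose = (1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2.0)
print(apply_pose(pose, (1.0, 0.0, 0.0)))
```

The model point (1, 0, 0) is rotated onto the Y axis and then translated, giving approximately (1, 3, 3). Recognition is the inverse task: finding the six numbers in `pose` that best explain the observed data.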
In the same way as 2D recognition, 3D recognition requires evaluating the similarity between the 3D shape placed at a specified candidate pose and the input data. This evaluation can be performed using contour information or point cloud information.
Recognition of 3D objects using contour matching
Recognition of 3D objects using point cloud matching
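A point-cloud similarity evaluation of this kind can be sketched as scoring each candidate pose by the mean distance from the posed model points to their nearest scene points. This is a generic score chosen for illustration, not a description of how any particular product evaluates similarity. To stay short, the sketch restricts the pose to a translation plus a rotation about Z and uses a brute-force nearest-neighbor search; the names and data are hypothetical.

```python
import math

def apply_pose(points, yaw, translation):
    """Rotate points about the Z axis by yaw, then translate.

    Simplification: a full 3D pose has three rotation angles.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]

def mean_nearest_distance(model_points, scene_points):
    """Average distance from each model point to its closest scene point.

    Lower is better; brute force, so only suitable for small point sets.
    """
    total = 0.0
    for p in model_points:
        total += min(math.dist(p, q) for q in scene_points)
    return total / len(model_points)

# Hypothetical registered model (asymmetric, so the pose is unambiguous).
MODEL = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.05, 0.0), (0.0, 0.0, 0.02)]

# Simulated input: the model observed at an unknown "true" pose.
scene = apply_pose(MODEL, 0.3, (0.5, -0.2, 1.0))

# Candidate poses (yaw, translation) to evaluate.
candidates = [
    (0.0, (0.5, -0.2, 1.0)),
    (0.3, (0.5, -0.2, 1.0)),
    (0.3, (0.6, -0.2, 1.0)),
]
scores = [mean_nearest_distance(apply_pose(MODEL, yaw, t), scene)
          for yaw, t in candidates]
best = candidates[scores.index(min(scores))]
print(best)
```

The candidate that matches the simulated pose receives the lowest score. A practical system would search the pose space far more efficiently and would use a spatial index rather than a brute-force nearest-neighbor loop.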
The 3D Vision Sensor (TVS) from Kyoto Robotics provides the "eyes" of industrial robots.
The TVS 3.0 series is equipped with a projector and four cameras. It performs 3D object recognition by comparing the input images with registered 3D models using both contour and point cloud data. Based on the recognition results, TVS can then optimally control the motion of a robot to perform the required task.
The principle of recognition in TVS
In the past, industrial robots were only able to repeat registered motions. Robots equipped with TVS, however, can make decisions autonomously and then take appropriate actions. TVS recognizes randomly placed objects in a container, and the robot carries out bin picking. This is made possible by the fast and accurate 3D pose recognition of TVS and by planning an optimized, collision-free route for the robot to pick each object.
Bin picking with 3D Robot Vision