Sony Patent | Information Processing Apparatus And Target Object Recognition Method

Patent: Information Processing Apparatus And Target Object Recognition Method

Publication Number: 20200050833

Publication Date: 20200213

Applicants: Sony

Abstract

A captured image acquisition section 50 acquires, from an imaging apparatus 12, data of a polarized image obtained by capturing a target object and stores the data in an image data storage section 52. A region extraction section 60 of a target object recognition section 54 extracts the region of the polarized image that contains the figure of the target object. A normal line distribution acquisition section 62 acquires, for the extracted region, the distribution of normal line vectors on the surface of the target object. A model adjustment section 66 then specifies the state of the target object by adjusting, in a virtual three-dimensional space, a three-dimensional model of the target object stored in a model data storage section 64 so that the model conforms to the distribution of normal line vectors acquired from the polarized image.

TECHNICAL FIELD

[0001] The present invention relates to an information processing apparatus and a target object recognition method for recognizing a state of a target object utilizing a captured image.

BACKGROUND ART

[0002] A game is known in which a video camera captures part of the user's body, such as the head, and a predetermined region of the image, such as the eyes, the mouth, or a hand, is extracted and replaced with a different image to form a display image (for example, refer to PTL 1). A user interface system is also known that accepts movement of the mouth or a hand captured by a video camera as an operation instruction for an application. Technologies that capture images of the real world and either display a virtual world that reacts to real-world movement or use the images in other information processing are employed in a wide range of fields and at every scale, from small portable terminals to leisure facilities.

CITATION LIST

Patent Literature

[0003] [PTL 1]

[0004] Published European Patent Application No. 0999518

SUMMARY

Technical Problem

[0005] In image analysis that acquires the position or posture of a target object from a captured image, processing accuracy tends to become unstable depending on the appearance and position of the target object, the imaging environment, and the like. For example, in general techniques that use feature points to extract the figure of a target object from a captured image or to perform matching, accuracy deteriorates if the target object inherently has too few feature points, or if it is far from the camera and therefore small in apparent size. The more robust the processing accuracy is required to be, the finer the processing must be made spatially or temporally, or the more complex the algorithm must become, and either way the processing load increases.

[0006] The present invention has been made in view of the problem described above, and it is an object of the present invention to provide a technology capable of acquiring the state of a target object efficiently and accurately using a captured image.

Solution to Problem

[0007] A mode of the present invention relates to an information processing apparatus. This information processing apparatus includes a normal line distribution acquisition section configured to acquire a distribution of normal line vectors of a target object surface from a polarized image obtained by capturing a target object, a target object recognition section configured to specify a state of the target object by collating a shape of the target object registered in advance with the distribution of the normal line vectors, and an output data generation section configured to perform processing based on the specified state of the target object to generate output data and output the output data.

[0008] Another mode of the present invention relates to a target object recognition method. This target object recognition method includes a step of acquiring data of a polarized image obtained by capturing a target object from an imaging apparatus, a step of acquiring a distribution of normal line vectors of a target object surface from the polarized image, a step of specifying a state of the target object by collating a shape of the target object registered in advance with the distribution of the normal line vectors, and a step of performing processing based on the specified state of the target object to generate output data and output the output data to an external apparatus.

[0009] It is to be noted that any combination of the constituent features described above, and any conversion of the expressions of the present invention among a method, an apparatus, and the like, are also effective as modes of the present invention.

Advantageous Effect of Invention

[0010] According to the present invention, a state of a target object can be acquired efficiently and accurately using a captured image.

BRIEF DESCRIPTION OF DRAWINGS

[0011] FIG. 1 is a view depicting an example of a configuration of an information processing system according to an embodiment 1.

[0012] FIG. 2 is a view depicting an example of a structure of an imaging device provided in an imaging apparatus in the embodiment 1.

[0013] FIG. 3 is a view depicting a configuration of an internal circuit of the information processing apparatus in the embodiment 1.

[0014] FIG. 4 is a view depicting a configuration of functional blocks of the information processing apparatus in the embodiment 1.

[0015] FIG. 5 is a view illustrating the flow by which a target object recognition section acquires a distribution of normal line vectors of a target object in the embodiment 1.

[0016] FIG. 6 is a view depicting an example of a configuration of basic data of a target object stored in a model data storage section in the embodiment 1.

[0017] FIG. 7 is a view illustrating a process of a model adjustment section adjusting a state of a three-dimensional model of a target object in the embodiment 1.

[0018] FIG. 8 is a view depicting an example in which a difference occurs in reliability of normal line vectors obtained from a polarized image in the embodiment 1.

[0019] FIG. 9 is a flow chart depicting a processing procedure of acquiring a state of a target object using a polarized image and generating and outputting output data by the information processing apparatus of the embodiment 1.

[0020] FIG. 10 is a view exemplifying distributions of normal line vectors obtained in the embodiment 1.

[0021] FIG. 11 is a view exemplifying distributions of normal line vectors obtained in the embodiment 1.

[0022] FIG. 12 is a view exemplifying distributions of normal line vectors obtained in the embodiment 1.

[0023] FIG. 13 is a view exemplifying distributions of normal line vectors obtained in the embodiment 1.

[0024] FIG. 14 is a view depicting a configuration of functional blocks of an information processing apparatus in an embodiment 2.

[0025] FIG. 15 is a view depicting a configuration of functional blocks of an information processing apparatus in an embodiment 3.

[0026] FIG. 16 is a view illustrating a principle of segmenting an image plane into regions on the basis of a distribution of normal line vectors in the embodiment 3.

[0027] FIG. 17 is a view illustrating another principle of segmenting an image plane into regions on the basis of a distribution of normal line vectors in the embodiment 3.

[0028] FIG. 18 is a view illustrating an example of a particular technique of segmenting an image plane into regions from a distribution of normal line vectors by a region decision section in the embodiment 3.

[0029] FIG. 19 is a view depicting an example of a region identified on the basis of a distribution of normal line vectors in the embodiment 3.

[0030] FIG. 20 is a view illustrating an example of a criterion that can be used for region segmentation in the embodiment 3.

[0031] FIG. 21 is a view illustrating another example of a technique of segmenting an image plane into regions from a distribution of normal line vectors by the region decision section in the embodiment 3.

[0032] FIG. 22 is a flow chart depicting a processing procedure of segmenting an image plane into regions using a polarized image and generating and outputting output data by the information processing apparatus of the embodiment 3.

DESCRIPTION OF EMBODIMENTS

Embodiment 1

[0033] FIG. 1 depicts an example of a configuration of an information processing system in the present embodiment. This information processing system includes an imaging apparatus 12 that captures images of a target object 8 at a predetermined frame rate, an information processing apparatus 10a that acquires data of the captured images and performs predetermined information processing, and a display apparatus 16 that outputs the result of the information processing. The information processing system may further include an inputting apparatus that accepts user operations for the information processing apparatus 10a. The information processing apparatus 10a may also be able to communicate with an external apparatus such as a server by connecting to a network such as the Internet.

[0034] The information processing apparatus 10a, the imaging apparatus 12, and the display apparatus 16 may be connected to one another by cables or wirelessly, for example by a wireless LAN (Local Area Network). Further, two or more of the information processing apparatus 10a, the imaging apparatus 12, and the display apparatus 16 may be combined into an integrated apparatus. For example, the information processing system may be implemented by a camera, a portable terminal, or the like that incorporates them. Alternatively, the display apparatus 16 may be a head-mounted display that is worn on the user's head and displays an image in front of the user's eyes, and the imaging apparatus 12 may be mounted on the head-mounted display so as to capture an image corresponding to the user's line of sight. In any case, the external shapes of the information processing apparatus 10a, the imaging apparatus 12, and the display apparatus 16 are not limited to those depicted.

[0035] In such a system, the information processing apparatus 10a successively acquires data of images captured by the imaging apparatus 12 at a predetermined frame rate and analyzes the data to specify the position or posture of the target object 8 in real space. If the target object 8 has a variable shape, as an elastic body does, the information processing apparatus 10a also specifies the shape of the target object 8. The information processing apparatus 10a then carries out information processing in accordance with the specified result to generate data of a display image or sound, and outputs the data to the display apparatus 16. The content of the information processing performed in accordance with the state of the target object 8 is not particularly restricted, and the target object 8 may vary accordingly.

[0036] For example, the target object 8 may be a game controller that the user grasps and moves to perform game operations. In this case, an image representing the game world can change in response to the movement of the controller, or an image in which the controller is replaced with a virtual object can be displayed over a captured image of the user. Alternatively, a head-mounted display can be made to display an image representing a virtual object that interacts with the user's hand, within a field of view corresponding to the line of sight of the user wearing the head-mounted display.

[0037] Since the information processing that uses the state of the target object 8 can take many forms, the following description focuses on a technique for efficiently specifying, in detail, the position, posture, or shape of the target object 8 from a captured image. The position, posture, and shape of a target object are hereinafter referred to collectively as the “state of the target object”; this does not mean that all of them are always specified, and at least one of them may be specified as needed. For this purpose, the imaging apparatus 12 in the present embodiment captures at least a polarized image of the imaged space. The information processing apparatus 10a then acquires normal line information of the target object 8 from the polarized image and uses it to specify the state of the target object 8 in detail.

[0038] FIG. 2 depicts an example of the structure of an imaging device provided in the imaging apparatus 12. It is to be noted that FIG. 2 schematically depicts the functional structure of a cross section of the device, and detailed structures such as interlayer insulating films and wiring lines are omitted. The imaging device 110 includes a microlens layer 112, a wire grid type polarizer layer 114, a color filter layer 116, and a light detection layer 118. The wire grid type polarizer layer 114 includes polarizers each formed from a plurality of linear conductor members arrayed in stripes at intervals smaller than the wavelength of the incident light. When light condensed by the microlens layer 112 enters the wire grid type polarizer layer 114, the polarized component parallel to the lines of the polarizer is reflected, and only the perpendicular polarized component passes through.

[0039] A polarized image is acquired when the light detection layer 118 detects the polarized component that has passed through the wire grid type polarizer layer 114. The light detection layer 118 has the semiconductor device structure of a general CCD (Charge Coupled Device) image sensor, CMOS (Complementary Metal Oxide Semiconductor) image sensor, or the like. The wire grid type polarizer layer 114 includes an array of polarizers whose main axis angles differ from one another in units of charge readout in the light detection layer 118, that is, in units of one pixel or of a larger unit. The right side of FIG. 2 exemplifies a polarizer array 120 as seen when the wire grid type polarizer layer 114 is viewed from above.

[0040] The hatched lines in FIG. 2 represent the conductors (wires) forming the polarizers. Each broken-line rectangle represents the region of a polarizer with one main axis angle; the broken lines themselves are not actually formed. In the example depicted, polarizers with four different main axis angles are disposed in four regions 122a, 122b, 122c, and 122d arranged in two rows and two columns. In FIG. 2, diagonally opposite polarizers have orthogonal main axis angles, and neighboring polarizers differ by 45 degrees. In other words, four polarizers whose main axis angles are spaced at intervals of 45 degrees are provided.

[0041] Each polarizer transmits the polarized component orthogonal to the direction of its wires. Consequently, the underlying light detection layer 118 obtains polarization information for four directions at 45-degree intervals in the regions corresponding to the four regions 122a, 122b, 122c, and 122d. By further arraying many such four-angle polarizer arrays vertically and horizontally and connecting a peripheral circuit that controls the charge readout timing, an image sensor can be implemented that acquires four kinds of polarization information simultaneously as two-dimensional data.
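As a minimal sketch of how the four kinds of polarization information can be separated from one raw frame, the Python below assumes a monochrome sensor whose 2x2 units follow a hypothetical layout consistent with FIG. 2 (0 degrees top-left, 45 top-right, 90 bottom-right, 135 bottom-left). The actual assignment on a given sensor is not specified in the text, and the names `MOSAIC_LAYOUT` and `split_polarization_mosaic` are illustrative only.

```python
from typing import Dict, List

Image = List[List[float]]

# Hypothetical layout of one 2x2 polarizer unit: (row, col) -> main axis angle in degrees.
# Diagonal cells are orthogonal (0/90 and 45/135) and neighbours differ by 45 degrees,
# as in FIG. 2; the real assignment on a particular sensor may differ.
MOSAIC_LAYOUT = {(0, 0): 0, (0, 1): 45, (1, 1): 90, (1, 0): 135}


def split_polarization_mosaic(raw: Image) -> Dict[int, Image]:
    """Split a raw mosaic frame into one sub-image per polarizer angle.

    Each sub-image has half the height and half the width of the raw frame.
    """
    if len(raw) % 2 or len(raw[0]) % 2:
        raise ValueError("mosaic height and width must be even")
    channels: Dict[int, Image] = {}
    for (dy, dx), angle in MOSAIC_LAYOUT.items():
        # Take every second row and column, starting at this cell's offset in the 2x2 unit.
        channels[angle] = [row[dx::2] for row in raw[dy::2]]
    return channels


raw_frame = [
    [10, 20, 11, 21],
    [40, 30, 41, 31],
]
print(split_polarization_mosaic(raw_frame))
# -> {0: [[10, 11]], 45: [[20, 21]], 90: [[30, 31]], 135: [[40, 41]]}
```

Slicing keeps the sketch dependency-free; each resulting sub-image is sparse relative to the full image plane, which is why interpolation is needed to recover full-resolution polarized images.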

[0042] In the imaging device 110 depicted in FIG. 2, the color filter layer 116 is provided between the wire grid type polarizer layer 114 and the light detection layer 118. The color filter layer 116 includes an array of filters that each pass red, green, or blue light, for example, in correspondence with individual pixels. Consequently, polarization information is obtained for each color according to the combination of the main axis angle of a polarizer in the wire grid type polarizer layer 114 and the color of the filter beneath it in the color filter layer 116. Since polarization information of the same direction and the same color is obtained only at discrete positions on the image plane, interpolating it appropriately yields a polarized image for each direction and each color.
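The interpolation mentioned above can be sketched as follows for the monochrome case (no color filter layer, which paragraph [0043] permits). The sketch assumes a 2x2 polarizer mosaic, with the position of the desired angle inside each unit passed as `offset`, and uses a simple 3x3 neighborhood average; practical pipelines typically use bilinear or edge-aware interpolation, and the function name is hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def interpolate_angle_channel(raw: Image, offset: Tuple[int, int]) -> Image:
    """Full-resolution estimate of one polarizer angle from a 2x2 mosaic.

    `offset` is the (row, col) position of that angle inside each 2x2 unit.
    Each output pixel is the mean of the same-angle samples in its 3x3
    neighbourhood (clipped at the borders). At sample positions the original
    value is kept, because no other same-angle sample lies within one pixel.
    """
    h, w = len(raw), len(raw[0])
    dy, dx = offset
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - 1), min(h, y + 2)):
                for xx in range(max(0, x - 1), min(w, x + 2)):
                    if yy % 2 == dy and xx % 2 == dx:
                        total += raw[yy][xx]
                        count += 1
            # Any two consecutive rows and columns contain every parity,
            # so count >= 1 whenever the image is at least 2x2.
            out[y][x] = total / count
    return out
```

For a color sensor, the same idea would be applied per combination of polarizer angle and filter color, with the sampling period set by the combined layout.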

[0043] Further, by performing arithmetic operations on polarized images of the same color, a non-polarized color image can also be reproduced. An image acquisition technology using a wire grid type polarizer is also disclosed in, for example, JP 2012-80065 A. However, the device structure of the imaging apparatus 12 in the present embodiment is not limited to the depicted one. For example, since the present embodiment basically uses a polarized luminance image to specify the state of a target object, the color filter layer 116 can be omitted if a color image is not required for another purpose. Further, the polarizer is not limited to the wire grid type; any polarizer in practical use, such as a linear dichroic polarizer, may be applied. Alternatively, a structure in which a polarizing plate with a changeable main axis angle is disposed in front of a general camera may be applied.
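For ideal polarizers, one such arithmetic operation is simple. If a polarizer at angle t transmits I(t) = (S0 + S1 cos 2t + S2 sin 2t) / 2, where S0 is the total intensity, then each orthogonal pair of samples sums to S0. The sketch below assumes ideal transmittance (real polarizers would need calibration gains) and would be applied per pixel and per color channel; the function name is hypothetical.

```python
def unpolarized_intensity(i0: float, i45: float, i90: float, i135: float) -> float:
    """Total (non-polarized) intensity S0 from four ideal linear-polarizer samples."""
    # I(t) = (S0 + S1*cos(2t) + S2*sin(2t)) / 2, so I0 + I90 = I45 + I135 = S0.
    # Averaging the two orthogonal pairs uses all four samples and reduces noise.
    return ((i0 + i90) + (i45 + i135)) / 2.0


print(unpolarized_intensity(0.5, 0.5, 0.5, 0.5))  # unpolarized light: -> 1.0
```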

[0044] FIG. 3 depicts the internal circuit configuration of the information processing apparatus 10a. The information processing apparatus 10a includes a CPU (Central Processing Unit) 23, a GPU (Graphics Processing Unit) 24, and a main memory 26, which are connected to one another by a bus 30. An input/output interface 28 is also connected to the bus 30. Connected to the input/output interface 28 are a communication section 32 including a peripheral equipment interface such as USB or IEEE1394 or a network interface to a wired or wireless LAN, a storage section 34 such as a hard disk drive or a nonvolatile memory, an outputting section 36 that outputs data to the display apparatus 16, an inputting section 38 that receives data from the imaging apparatus 12 or from an inputting apparatus (not depicted), and a recording medium driving section 40 that drives a removable recording medium such as a magnetic disk, an optical disc, or a semiconductor memory.

[0045] The CPU 23 controls the entire information processing apparatus 10a by executing an operating system stored in the storage section 34. The CPU 23 also executes various programs that are read out from a removable recording medium and loaded into the main memory 26 or downloaded through the communication section 32. The GPU 24 has the functions of a geometry engine and a rendering processor, performs drawing processes in accordance with drawing instructions from the CPU 23, and stores data of a display image in a frame buffer (not depicted). The GPU 24 then converts the display image stored in the frame buffer into a video signal and outputs it to the outputting section 36. The main memory 26 includes a RAM (Random Access Memory) and stores programs and data necessary for processing.

[0046] FIG. 4 depicts the configuration of functional blocks of the information processing apparatus 10a of the present embodiment. In hardware, the functional blocks depicted in FIG. 4 and in FIGS. 14 and 15 described later can be implemented by components such as the CPU, the GPU, the various memories, and the data bus depicted in FIG. 3; in software, they are implemented by programs that are loaded from a recording medium or the like into memory and provide various functions such as data input, data retention, arithmetic operation, image processing, and communication. Accordingly, those skilled in the art will understand that these functional blocks can be implemented in various forms, by hardware only, software only, or a combination of both, and are not limited to any one of these.

[0047] The information processing apparatus 10a includes a captured image acquisition section 50 that acquires data of a captured image from the imaging apparatus 12, an image data storage section 52 that stores the acquired data of the image, a target object recognition section 54 that specifies a state of a target object utilizing the captured image, and an output data generation section 56 that generates data to be output on the basis of the state of the target object. The captured image acquisition section 50 is implemented by the inputting section 38, the CPU 23, and the like of FIG. 3 and acquires data of a polarized image at a predetermined rate from the imaging apparatus 12.

[0048] Although the captured image acquisition section 50 acquires at least a polarized image in order to acquire the state of the target object as described above, it may also acquire non-polarized (natural light) image data, depending on the content of the information processing carried out by the information processing apparatus 10a or on the image to be displayed. The image data storage section 52 is implemented by the main memory 26 and successively stores the data of captured images acquired by the captured image acquisition section 50. At this time, the captured image acquisition section 50 also generates and stores, as needed, image data required by processing at later stages, for example a luminance image generated from a color image.
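The text does not say how the luminance image is generated. A common choice, assumed here, is a weighted sum of the color channels using the ITU-R BT.601 luma weights; the constant and function names are hypothetical.

```python
from typing import List, Tuple

# ITU-R BT.601 luma weights (assumed; the text does not specify a conversion).
BT601_WEIGHTS = (0.299, 0.587, 0.114)


def luminance_image(rgb: List[List[Tuple[float, float, float]]]) -> List[List[float]]:
    """Convert an image of (R, G, B) pixels into a single-channel luminance image."""
    wr, wg, wb = BT601_WEIGHTS
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in rgb]
```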

[0049] The target object recognition section 54 is implemented by the CPU 23, the GPU 24, and the like, and specifies the state of a target object using the image data stored in the image data storage section 52. In the present embodiment, analysis focuses on a predetermined target object, which enhances processing efficiency and increases the reliability of the information obtained from the polarized image. Specifically, a three-dimensional model of the target object is registered in advance, and the state of the three-dimensional model is adjusted to correspond to the distribution of normal line vectors of the target object successively obtained from the polarized image.

[0050] Specifically, the target object recognition section 54 includes a region extraction section 60, a normal line distribution acquisition section 62, a model data storage section 64, and a model adjustment section 66. The region extraction section 60 extracts the region of a captured image that contains the figure of the target object. If the shape or a feature amount of the target object is known from the registered three-dimensional model, the region can be extracted by general template matching. If a color image is acquired as a captured image, color information can also be used.
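A brute-force form of the template matching mentioned above is sketched below: a template (assumed here to come from the registered model, for example as a rendering) is slid over the image, and the window with the smallest sum of squared differences is taken as the region. This is illustrative only, with a hypothetical function name; practical systems typically use normalized cross-correlation, image pyramids, or a library routine such as OpenCV's `cv2.matchTemplate`.

```python
from typing import List, Tuple

Image = List[List[float]]


def match_template_ssd(image: Image, template: Image) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the window with the smallest sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template is larger than the image")
    best_score, best_xy = float("inf"), (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            score = sum(
                (image[y + j][x + i] - template[j][i]) ** 2
                for j in range(th)
                for i in range(tw)
            )
            if score < best_score:
                best_score, best_xy = score, (x, y)
    return best_xy[0], best_xy[1], tw, th
```

Because only a comparatively rough region is needed at this stage (paragraph [0053]), the search could also run on a downscaled image.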

[0051] Alternatively, an object located within a predetermined range of the imaged three-dimensional space may be identified from a depth image, and the region containing its figure may be extracted. A depth image represents, as the pixel values of a captured image, the distance of each subject from the imaging plane. A depth image can be obtained, for example, by providing the imaging apparatus 12 with a stereo camera that captures the imaged space from left and right viewpoints separated by a known distance, and calculating the distance to the object represented by each figure by triangulation from the parallax between corresponding points in the captured stereo images.
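The triangulation and range-based extraction described above can be sketched as follows, assuming a rectified stereo pair with focal length f in pixels and baseline B in meters, so that depth is Z = f * B / d for a disparity of d pixels. The range limits and function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Triangulated depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        return math.inf  # no parallax: point at infinity, or an invalid match
    return focal_px * baseline_m / disparity_px


def region_in_depth_range(depth: List[List[float]], near: float,
                          far: float) -> Optional[Tuple[int, int, int, int]]:
    """Inclusive bounding box (x0, y0, x1, y1) of pixels whose depth lies in [near, far]."""
    xs, ys = [], []
    for y, row in enumerate(depth):
        for x, z in enumerate(row):
            if near <= z <= far:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Example: f = 700 px, B = 6 cm, d = 35 px gives a depth of about 1.2 m.
print(depth_from_disparity(35.0, 700.0, 0.06))
```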

[0052] Alternatively, the imaging apparatus 12 may be equipped with a mechanism that irradiates the imaged space with reference light such as infrared light and detects its reflection, so that distances are found by a TOF (Time of Flight) method. In either case, general technologies can be applied to generate the depth image, and the configuration of the imaging apparatus 12 may be chosen according to the technology used. For example, when a stereo camera is adopted, a general stereo camera that captures natural-light images may be provided separately from a polarization camera having the imaging device structure depicted in FIG. 2, or one or both cameras of the stereo pair may be polarization cameras. When a depth image is used, the region of the figure of the target object can be extracted comparatively accurately even if the apparent shape of the target object changes considerably with its orientation.
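For the TOF method, distance follows from the time the reference light takes to travel to the object and back. The sketch below shows both the pulsed form, d = c * t / 2, and the continuous-wave (phase-shift) form, d = c * dphi / (4 * pi * f_mod). Which variant the imaging apparatus 12 would use is not specified, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_distance_pulsed(round_trip_s: float) -> float:
    """Distance from the measured round-trip time of a light pulse."""
    # The light covers the camera-object distance twice, so halve the path length.
    return SPEED_OF_LIGHT * round_trip_s / 2.0


def tof_distance_phase(phase_shift_rad: float, modulation_hz: float) -> float:
    """Distance from the phase shift of amplitude-modulated light.

    Unambiguous only up to c / (2 * modulation_hz); farther objects wrap around.
    """
    return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * modulation_hz)
```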

[0053] Further, an extraction technique suited to the target object and already in practical use may be adopted as appropriate, such as specifying the region of a person's head by a general face detection process. Furthermore, a reference figure may be extracted first and the figure of the actual target object then extracted from its positional relationship to the reference figure; for example, the region of the hand may be extracted from its range of movement relative to the head. In any case, since the present embodiment obtains more detailed information by fitting a three-dimensional model to the distribution of normal line vectors acquired from the polarized image, the region extraction here may be performed at a comparatively coarse resolution. Basic data of the target object necessary for region extraction is stored in the model data storage section 64.

[0054] The normal line distribution acquisition section 62 acquires the distribution of normal line vectors on the target object surface on the basis of the polarization information of the region extracted by the region extraction section 60. It is generally known that, because the way polarized light intensity varies with orientation depends on the reflection angle of light from the subject, a normal to the subject surface can be determined from how the luminance of the polarized image changes as the angle of the polarizer changes. However, the ratio between the specular reflection component and the diffuse reflection component in the reflected light differs depending on the color or material of the subject surface, and the relationship between the polarization characteristics and the normal to the subject varies with this ratio.
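The generally known relationship above can be made concrete with four polarizer angles. Under the common sinusoidal model I(t) = (S0 + S1 cos 2t + S2 sin 2t) / 2, equivalently I(t) = (Imax + Imin)/2 + (Imax - Imin)/2 * cos(2(t - phi)), the samples at 0, 45, 90, and 135 degrees give S1 = I0 - I90 and S2 = I45 - I135, from which the phase angle phi and the degree of polarization rho follow. This is a minimal sketch with hypothetical names; how phi and rho map to a normal depends on the reflection model, which paragraph [0055] addresses.

```python
import math
from typing import Tuple


def fit_polarization(i0: float, i45: float, i90: float,
                     i135: float) -> Tuple[float, float, float]:
    """Return (phase angle phi in [0, pi), degree of polarization rho, mean intensity)."""
    s0 = ((i0 + i90) + (i45 + i135)) / 2.0  # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    if s0 <= 0.0:
        return 0.0, 0.0, 0.0  # no light: the fit is undefined
    # rho = (Imax - Imin) / (Imax + Imin) = sqrt(S1^2 + S2^2) / S0
    rho = math.hypot(s1, s2) / s0
    # Angle at which the sinusoid peaks; only defined modulo pi.
    phi = (0.5 * math.atan2(s2, s1)) % math.pi
    return phi, rho, s0 / 2.0
```

In practice this fit is evaluated per pixel of the extracted region, using the interpolated images for each angle.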

[0055] Various techniques have therefore been investigated that separate observed polarized light into a specular reflection component and a diffuse reflection component and evaluate each, but such techniques have many problems in terms of processing accuracy and load. The present embodiment instead restricts the target object when finding the distribution of normal line vectors, based on the finding that, if the color or material of the target object is restricted, the polarization characteristics of light reflected from its surface do not vary greatly, so that normals can be obtained stably. For example, either a specular reflection model or a diffuse reflection model is selected on the basis of the color or material of the target object.
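A minimal sketch of selecting one reflection model by material is shown below. It uses the standard Fresnel-based degree-of-polarization expressions for a dielectric with refractive index n (1.5 assumed here), inverts them by bisection to obtain the zenith angle, and takes the normal azimuth as phi for diffuse reflection and phi + 90 degrees for specular reflection. The material table, its entries, and the function names are hypothetical, and the specular branch assumes the zenith lies below the Brewster angle, where that expression is monotonic.

```python
import math
from typing import Callable, Tuple

# Hypothetical registered material data: which reflection model dominates per material.
REFLECTION_MODEL_BY_MATERIAL = {"matte_resin": "diffuse", "glossy_coating": "specular"}


def diffuse_dop(zenith: float, n: float) -> float:
    """Degree of polarization of diffuse reflection from a dielectric (Fresnel model)."""
    s2 = math.sin(zenith) ** 2
    numerator = (n - 1.0 / n) ** 2 * s2
    denominator = (2.0 + 2.0 * n * n - (n + 1.0 / n) ** 2 * s2
                   + 4.0 * math.cos(zenith) * math.sqrt(n * n - s2))
    return numerator / denominator


def specular_dop(zenith: float, n: float) -> float:
    """Degree of polarization of specular reflection from a dielectric (Fresnel model)."""
    s2 = math.sin(zenith) ** 2
    numerator = 2.0 * s2 * math.cos(zenith) * math.sqrt(n * n - s2)
    denominator = n * n - s2 - n * n * s2 + 2.0 * s2 * s2
    return numerator / denominator


def _invert_increasing(f: Callable[[float], float], target: float,
                       lo: float, hi: float, iterations: int = 60) -> float:
    """Bisection for f(x) = target, assuming f increases on [lo, hi]; clamps to the ends."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_from_polarization(phi: float, rho: float, material: str,
                             n: float = 1.5) -> Tuple[float, float, float]:
    """Unit normal in a viewer-centred frame whose z axis points toward the camera.

    The azimuth is only known modulo pi; see the ambiguity discussion in [0056].
    """
    if REFLECTION_MODEL_BY_MATERIAL[material] == "diffuse":
        # Diffuse DoP rises from 0 (facing the camera) to (n^2 - 1)/(n^2 + 1) at grazing view.
        zenith = _invert_increasing(lambda t: diffuse_dop(t, n), rho, 0.0, math.pi / 2)
        azimuth = phi % math.pi
    else:
        # Specular DoP reaches 1 at the Brewster angle atan(n) and then falls again.
        zenith = _invert_increasing(lambda t: specular_dop(t, n), rho, 0.0, math.atan(n))
        azimuth = (phi + math.pi / 2) % math.pi
    s = math.sin(zenith)
    return (s * math.cos(azimuth), s * math.sin(azimuth), math.cos(zenith))
```

Restricting the target object to a known material is what makes a single model and a single refractive index a reasonable assumption here.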

[0056] Further, the distribution of normal line vectors can be acquired accurately even without precisely separating the reflection components, by using the region information extracted by the region extraction section 60 or by recursively using the result of the state adjustment of the three-dimensional model by the model adjustment section 66. Basic data such as the color and material of the target object are stored in advance in the model data storage section 64.
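Polarization alone determines the normal azimuth only modulo 180 degrees. One hypothetical way to use the model adjustment result recursively, sketched below, is to predict each pixel's azimuth from the three-dimensional model at its most recently adjusted state and keep whichever of the two candidates lies closer. The text does not specify this mechanism, and the names are illustrative.

```python
import math


def _angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def disambiguate_azimuth(azimuth_mod_pi: float, predicted_azimuth: float) -> float:
    """Pick azimuth or azimuth + pi, whichever is closer to the model-predicted azimuth."""
    base = azimuth_mod_pi % math.pi
    candidates = (base, base + math.pi)
    return min(candidates, key=lambda c: _angular_distance(c, predicted_azimuth))
```

The corrected normals can then be fed back into the next model adjustment, so that estimation and adjustment refine each other over iterations or frames.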

[0057] The model adjustment section 66 places a three-dimensional model representing the target object in a virtual three-dimensional space in which a camera coordinate system corresponding to the image plane is set, and determines the state of the three-dimensional model so that it matches the captured image. Specifically, the model adjustment section 66 adjusts the position or posture of the three-dimensional model by translating or rotating it so that it conforms to the distribution of normal line vectors of the target object acquired by the normal line distribution acquisition section 62. If the target object can be deformed by force applied by the user, for example, the model adjustment section 66 also adjusts the shape of the three-dimensional model.
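The text does not name a fitting algorithm. As one possibility for the rotational part only, the sketch below swaps in Horn's closed-form unit-quaternion method: given model normals and corresponding observed normals, it finds the rotation R that maximizes the sum of dot products between the observed normals and the rotated model normals. It assumes the correspondences are already known (for example, from projecting the model at its previous state), does not estimate translation (normals carry no position information), and uses power iteration only to stay dependency-free; a library eigen-solver such as `numpy.linalg.eigh` would normally be used instead. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def fit_rotation_to_normals(model_normals: Sequence[Vec3],
                            observed_normals: Sequence[Vec3],
                            iterations: int = 500) -> Mat3:
    """Rotation R maximising sum_i observed_i . (R model_i), via Horn's quaternion method.

    At least two non-parallel normals are needed for a unique rotation.
    """
    if not model_normals:
        raise ValueError("need at least one correspondence")
    # Correlation matrix S[r][c] = sum_i model_i[r] * observed_i[c].
    s = [[0.0] * 3 for _ in range(3)]
    for a, b in zip(model_normals, observed_normals):
        for r in range(3):
            for c in range(3):
                s[r][c] += a[r] * b[c]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Symmetric 4x4 matrix whose top eigenvector is the optimal unit quaternion (w, x, y, z).
    n = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift by more than the spectral radius so every eigenvalue becomes positive;
    # power iteration then converges to the eigenvector of the largest eigenvalue.
    shift = max(sum(abs(v) for v in row) for row in n) + 1.0
    q = [0.5, 0.5, 0.5, 0.5]
    for _ in range(iterations):
        q = [sum(n[r][c] * q[c] for c in range(4)) + shift * q[r] for r in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    w, x, y, z = q
    # Convert the unit quaternion to a rotation matrix.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
```

With noisy normals the same formula gives the least-squares rotation for the given correspondences; position, and shape for deformable objects, would need additional cues such as the region shape discussed in [0058].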

[0058] Since the distribution of normal line vectors is obtained over the surface of the target object as it appears in the captured image, performing the adjustment in consideration of not only the individual normal line vectors but also the shape of the region that their set represents allows the relationship between the imaging apparatus and the target object to be reproduced more accurately in the virtual space. Data of the three-dimensional model of the target object are stored in the model data storage section 64. General computer graphics techniques can be applied to the geometric calculations involving the object and the camera in the virtual space.
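One hypothetical way to take the shape of the region into account is to compare the silhouette of the three-dimensional model, rendered at a candidate state, with the image region over which normal line vectors were obtained, for example by intersection over union. Such a score could then be combined with normal agreement in the adjustment cost; the weighting and the function name are assumptions.

```python
from typing import Sequence


def silhouette_iou(rendered: Sequence[Sequence[bool]],
                   observed: Sequence[Sequence[bool]]) -> float:
    """Intersection over union of two equally sized binary masks (1.0 if both are empty)."""
    intersection = union = 0
    for row_r, row_o in zip(rendered, observed):
        for r, o in zip(row_r, row_o):
            if r and o:
                intersection += 1
            if r or o:
                union += 1
    return 1.0 if union == 0 else intersection / union
```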

[0059] The output data generation section 56 is implemented by the CPU 23, the GPU 24, the outputting section 36, and the like, and carries out predetermined information processing on the basis of the state of the target object specified by the target object recognition section 54 to generate data to be output, such as a display image and sound. As described above, the content of the information processing carried out here is not particularly restricted. For example, to draw a virtual object so that it touches the target object in a captured image, the output data generation section 56 reads out the captured image data from the image data storage section 52 and draws the virtual object so that it matches the state of the target object specified by the target object recognition section 54. The output data generation section 56 transmits the output data, such as the display image generated in this manner, to the display apparatus 16.
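Drawing a virtual object consistent with the specified state requires projecting model points into the captured image. A minimal pinhole-camera sketch is shown below, assuming no lens distortion, a camera frame whose z axis points into the scene, and hypothetical intrinsics fx, fy, cx, cy; the function name is illustrative.

```python
from typing import Optional, Sequence, Tuple


def project_point(point: Sequence[float], rotation: Sequence[Sequence[float]],
                  translation: Sequence[float], fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a model-space point to pixel coordinates; None if it is behind the camera."""
    # Model -> camera coordinates using the specified state: p_cam = R p + t.
    x, y, z = (sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
               for r in range(3))
    if z <= 0.0:
        return None
    # Pinhole projection without lens distortion.
    return fx * x / z + cx, fy * y / z + cy
```

For example, with an identity rotation, translation (0, 0, 2), fx = fy = 500, and (cx, cy) = (320, 240), the model point (0.1, -0.05, 0) lands at pixel (345, 227.5).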
