Amazon Patent | Realistic Rendering For Virtual Reality Applications
Publication Number: 10217286
Publication Date: 20190226
Applicants: Amazon Technologies, Inc.
Motion sickness resulting from use of a virtual reality headset can be mitigated by displaying a virtual nose in the field of view of the user. Nose data can be obtained by a user selecting various options, or determined dynamically using various image analysis algorithms. Image data captured of the user’s face can be analyzed to determine aspects such as the size, shape, color, texture, and reflectivity of the user’s nose. A three-dimensional nose model is generated, which is treated as an object in the virtual world and can have lighting, shadows, and textures applied accordingly. The pupillary distance can be determined from the image data and used to determine the point of view from which to render each nose portion. Changes in lighting or expression can cause the appearance of the nose to change, as well as the level of detail of the rendering.
Virtual reality devices, such as headsets or goggles, are rapidly developing to the point where these devices should soon be widely available for various consumer applications. For example, virtual reality headsets that display images of a virtual world have been demonstrated at various events and application developers are preparing for their upcoming release. One issue that persists, however, is the problem of motion sickness. The human brain processes information in a certain way that, when the perceived reality is distorted or presented in an unexpected way, can lead to issues with motion sickness, headaches, and other such problems.
BRIEF DESCRIPTION OF THE DRAWINGS
Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:
FIGS. 1A and 1B illustrate example left and right images that can be presented to a user wearing a virtual reality device in accordance with various embodiments.
FIG. 2 illustrates an example virtual reality device that can be utilized in accordance with various embodiments.
FIGS. 3A, 3B, and 3C illustrate example nose shapes, sizes, and colors that can be determined and/or rendered in accordance with various embodiments.
FIGS. 4A and 4B illustrate example facial feature points that can be detected and used for nose rendering in accordance with various embodiments.
FIG. 5 illustrates an example process for determining nose model information for a user that can be utilized in accordance with various embodiments.
FIG. 6 illustrates an example process for using nose model information for a user to render a portion of a nose in a view of content for a virtual reality device that can be utilized in accordance with various embodiments.
FIGS. 7A and 7B illustrate an example analysis of facial features of a user in accordance with various embodiments.
FIGS. 8A, 8B, and 8C illustrate an example of capturing eye movement of a user as input in accordance with various embodiments.
FIGS. 9A, 9B, and 9C illustrate an approach to determining retina location from a pair of images that can be used in accordance with various embodiments.
FIG. 10 illustrates components of an example computing device that can be utilized to implement aspects of the various embodiments.
FIG. 11 illustrates an example environment in which aspects of the various embodiments can be performed.
Systems and methods in accordance with various embodiments of the present disclosure overcome one or more of the above-referenced and other deficiencies in conventional approaches to rendering virtual reality content in an electronic environment. In particular, various embodiments provide for the generation of a nose model for a user that can be used to render an appropriate nose portion in content displayed by a virtual reality device, such as a virtual reality headset. In some embodiments, a user can have the ability to “design” an appropriate nose for the user by selecting various options, such as an appropriate shape and color. In other embodiments the appropriate nose data can be determined dynamically by analyzing image data, including a representation of the face of the user, to determine the relative locations of various facial features or feature points. From these feature points the size, shape, and location of the user’s nose can be determined. The size and shape data can be used to generate a virtual model (i.e., a wire frame model or mesh) of the user’s nose. The location of the nose in the image data enables appearance data to be determined for the nose, where the appearance data can include data for aspects such as the base color, variations in color, texture, and reflectivity of the nose, which can be used when applying texture, lighting, and/or shadowing to the nose model.
When a virtual reality (VR) device is to display VR content to the user, the content to be rendered can be obtained, as well as the relevant nose data including the mesh and texture data. Since views are rendered for each eye, a point of view can be determined for each eye display and the appropriate portion of the nose rendered from that point of view. The nose can be treated as an object in the virtual environment and can have lighting, shading, and other effects applied just like any other object in the virtual world. As the user moves his or her head, or changes gaze direction, the changes in view can be rendered accordingly. The level of detail (e.g., resolution and texture) applied to the nose can depend at least in part upon factors such as lighting and gaze direction. If the user changes expression, the nose can be re-rendered to have a slightly different shape that represents the current user expression. The presence of a substantially accurate nose portion visible in the field(s) of view of the virtual reality device can help to mitigate motion sickness resulting from using the device. Using the image data analyzed for nose shape and size, for example, the pupillary distance (i.e., physical separation between points such as the centers of the user’s pupils) of the user can be determined. In some embodiments the pupil positions can be determined as two of the feature points generated from the feature detection process. The pupillary distance for a user enables virtual content to be rendered from the correct points of view (i.e., with the correct amount of disparity), which can further help to mitigate motion sickness in at least some embodiments.
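As a minimal sketch of the pupillary distance step, the Python below takes the two pupil feature points (in pixel coordinates) and converts their separation to millimetres, then splits it into per-eye camera offsets. The conversion factor `mm_per_pixel` is an assumption: the text does not say how physical scale is recovered, and a pixel distance alone has none (it could come, for example, from a reference object of known size or from calibrated stereo data). The function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def pupillary_distance_mm(left_pupil: Point, right_pupil: Point,
                          mm_per_pixel: float) -> float:
    """Distance between two pupil feature points, scaled to millimetres.

    mm_per_pixel is an assumed calibration factor; pixel distances alone
    carry no physical scale.
    """
    pixels = math.hypot(right_pupil[0] - left_pupil[0],
                        right_pupil[1] - left_pupil[1])
    return pixels * mm_per_pixel

def eye_camera_offsets_mm(pd_mm: float) -> Tuple[float, float]:
    """Lateral offsets of the left and right virtual cameras from the head centre."""
    half = pd_mm / 2.0
    return (-half, half)
```

With pupils detected at (400, 300) and (720, 300) and a hypothetical scale of 0.2 mm per pixel, the estimate is 64 mm, so each virtual eye camera would sit 32 mm from the head centre.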
Various other functions and advantages are described and suggested below as may be provided in accordance with the various embodiments.
FIG. 1A illustrates an example pair of images 102, 104 that can be displayed to a user wearing a virtual reality headset, or other such device, in accordance with various embodiments. As known for such devices, a different view is typically rendered and/or displayed for each of the user’s eyes, with each view being slightly different due to the difference in locations of the user’s eyes. The pupillary distance (or other measure of physical separation or relative locations of the user’s eyes) between the user’s eyes causes objects to be viewed from slightly different angles, with the difference in angle depending at least in part upon the distance from the object to the user’s eyes. This varying angle with distance, which results in a different apparent lateral separation between representations of objects in the left and right images, is typically referred to as disparity. Objects at “infinity” will have representations appear at approximately the same pixel location in each image because the difference in angle will be approximately zero degrees. As objects get closer, the angular difference between the locations of the object with respect to each eye will increase, such that the difference in pixel location(s) of the object representation will increase between the two images. For example, an object close to the user’s eyes will appear to be significantly further to the left (e.g., 100 pixels further to the left of center) in the right eye image than the representation of that object in the left eye image. Thus, in FIG. 1A the portion of the road closer to the user has a greater difference in pixel location than the portions of the road that are further away, based on the difference in disparity with distance.
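The relationship between distance and disparity can be made concrete with the standard pinhole stereo model, which the text does not name and which is used here only as an illustration: for two parallel views separated by a baseline B (here, the pupillary distance), with a focal length f expressed in pixels, a point at depth Z appears with horizontal disparity d = f * B / Z. Disparity therefore falls toward zero as Z grows, matching the behavior described for objects at “infinity”. The parameter values in the example are hypothetical.

```python
def disparity_px(depth_m: float, baseline_m: float, focal_px: float) -> float:
    """Horizontal disparity in pixels for a point at depth_m, using d = f * B / Z.

    Assumes an ideal pinhole model with parallel optical axes for the two eyes.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m
```

With a 64 mm baseline and an 800-pixel focal length, an object 0.5 m away shifts by about 102 pixels between the two images, while one 100 m away shifts by about half a pixel.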
When rendering a virtual reality scene, this relationship between viewing angle and distance can be used to render the same scene from two different points of view. In the driving example, there will be a virtual model (i.e., a wire frame or mesh model) of the road and surrounding environment, and the relative “position” of each eye in the scene can be used as a point of reference from which to render the scene for each of the left eye image and the right eye image, which will result in the appropriate amount of disparity between views. For each image, the view of the virtual three-dimensional (3D) model can be determined, then the appropriate textures and lighting applied (as known for such purposes) to render the scene from the two different viewpoints. The left and right eye images can be displayed concurrently, using two separate display screens or portions of a single display screen, or in alternating sequence, with a virtual shutter or other such element allowing the left and right eyes to view the content alternately, each only while its corresponding image is displayed. Such a process provides the 3D appearance of the environment. A virtual reality device typically also includes some type of motion and/or orientation detection sensor, such as an accelerometer, gyroscope, electronic compass, inertial sensor, magnetometer, and the like, which can provide data as to movement of the virtual reality device, generally resulting from movement of the user’s head. The point of view and direction can thus be updated with movement of the device such that the views presented to the user correspond to the expected view based on the user movement. The views can be re-rendered and displayed at an appropriate rate, such as thirty or sixty frames per second, such that the user can feel as if the user is actually in the virtual environment based on the combination of the appropriate rendering with changes in user head orientation, etc.
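A minimal sketch of deriving the two render viewpoints from a single tracked head pose follows. It assumes a right-handed coordinate system with y up and the view direction along -z at zero yaw, and it handles yaw only; a real headset would use the full orientation reported by its sensors. Each eye is offset by half the pupillary distance along the head’s right vector, which is what produces the disparity between the left and right renders. The function name is illustrative.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def eye_positions(head: Vec3, yaw_rad: float, pd_m: float) -> Tuple[Vec3, Vec3]:
    """Return (left_eye, right_eye) camera positions for one rendered frame.

    Assumed convention: right-handed axes, y up, looking down -z at yaw 0,
    positive yaw turning the head to the left (counter-clockwise seen from above).
    """
    # Head's right vector: (1, 0, 0) rotated about the y axis by yaw.
    right = (math.cos(yaw_rad), 0.0, -math.sin(yaw_rad))
    half = pd_m / 2.0
    left_eye = (head[0] - half * right[0], head[1], head[2] - half * right[2])
    right_eye = (head[0] + half * right[0], head[1], head[2] + half * right[2])
    return left_eye, right_eye
```

Each returned position would then serve as the viewpoint when rendering the scene, including the nose, for the corresponding eye image.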
As mentioned, however, slight variations or deviations from expected reality can present problems such as motion sickness. These deviations can include, for example, dropped frames, delays in point of view updates, improper lighting or shading, and other such aspects. One approach that has been demonstrated to potentially mitigate motion sickness is the introduction of a virtual nose displayed in the virtual reality views. The human brain is used to detecting the user’s nose in information captured by each eye, then filtering out that portion of the information such that the human typically will not notice the nose portion in the field of view. The lack of such a nose can signal to the brain that there is something wrong with the reality being presented, which can lead to motion sickness. The discussion of virtual noses included in virtual reality to this point has not presented sufficient information as to how to determine the correct appearance of the nose in the virtual reality environment. Further, the discussion has not touched on how the virtual environment interacts with virtual noses, or how the nose appearance should change under various environmental conditions.
Approaches in accordance with various embodiments attempt to address these and other deficiencies with existing virtual reality devices and applications by determining an appropriate nose appearance for a user, and rendering that nose in an expected way for the user. Further, the rendering can be updated to appear to correctly interact with the environment, including not only changes in brightness, color, and appearance, but also changes in shape with expression, changes in focus or resolution, and the like. Approaches can also use a feedback control loop such that, if a user is detected to be looking at the rendering of the nose in the virtual reality device, the rendering of the nose can be changed, since such attention suggests that the current rendering is not optimal or expected for the current conditions.
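The feedback loop could be sketched as follows. The text only states that the nose rendering can change when the user is detected looking at it; the dwell threshold, the screen-space bounding box test, and the choice to respond by lowering a `contrast` parameter are all hypothetical placeholders for whatever adjustment an implementation actually makes.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ScreenBox:
    """Screen-space bounding box of the rendered nose portion (pixels)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

@dataclass
class NoseGazeFeedback:
    dwell_threshold: int = 30   # hypothetical: about 0.5 s at 60 frames per second
    contrast_step: float = 0.1  # hypothetical size of each adjustment
    contrast: float = 1.0       # hypothetical rendering parameter being adjusted
    dwell: int = 0              # consecutive frames with gaze on the nose

    def update(self, gaze_xy: Tuple[float, float], nose_box: ScreenBox) -> bool:
        """Call once per frame; returns True when the nose rendering was adjusted."""
        if not nose_box.contains(*gaze_xy):
            self.dwell = 0
            return False
        self.dwell += 1
        if self.dwell < self.dwell_threshold:
            return False
        # The user keeps noticing the nose, suggesting the rendering looks wrong
        # for current conditions: apply an adjustment and restart the count.
        self.contrast = max(0.0, self.contrast - self.contrast_step)
        self.dwell = 0
        return True
```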
For example, in the example left and right images 122, 124 rendered in FIG. 1B, there is a left rendering 126 and a right rendering 128 of portions of the user’s nose. As would be expected, the left image 122 includes a rendering 126 showing the left side of the user’s nose in the lower right hand corner of the image, illuminated due to the location of the virtual sun being to the left of the nose in the images. Correspondingly, the right image 124 includes a rendering 128 of the right side of the user’s nose that is somewhat shadowed, due to the location of the virtual sun. These renderings can be determined to be appropriate for the user based on information known about the user’s nose, the user’s current expression, and/or other such information. As the virtual vehicle goes along the virtual road, and as the user looks around in the virtual world, the relative location of the sun with respect to the nose will change, such that the lighting and/or shading should change appropriately. Further, aspects such as the resolution and/or focus of the renderings may change based upon aspects such as the lighting, viewing direction of the user, etc. Various embodiments attempt to determine some or all of these variations and update the rendering of the nose in a way that is expected, or at least relatively normally perceived, by the user.
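The lit and shadowed sides in FIG. 1B follow from ordinary diffuse lighting once the nose is treated as a scene object. The sketch below uses a simple Lambertian term with a constant ambient floor, which is an assumed lighting model rather than one the text specifies; the normals and sun direction are illustrative values, with x pointing to the user’s right.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)

def diffuse_intensity(normal: Vec3, to_light: Vec3, ambient: float = 0.2) -> float:
    """Lambertian shading: ambient floor plus a cosine term clamped at zero.

    `to_light` points from the surface toward the light source.
    """
    n = _normalize(normal)
    l = _normalize(to_light)
    cos_theta = sum(a * b for a, b in zip(n, l))
    return ambient + (1.0 - ambient) * max(0.0, cos_theta)

# Illustrative values: virtual sun up and to the left of the nose.
SUN = (-1.0, 1.0, 0.0)
LEFT_SIDE_NORMAL = (-1.0, 0.0, 0.0)   # faces the user's left
RIGHT_SIDE_NORMAL = (1.0, 0.0, 0.0)   # faces the user's right

left_lit = diffuse_intensity(LEFT_SIDE_NORMAL, SUN)    # about 0.77
right_lit = diffuse_intensity(RIGHT_SIDE_NORMAL, SUN)  # ambient only, 0.2
```

As the sun’s direction relative to the head changes, recomputing these terms each frame shifts which side of the nose is lit.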
FIG. 2 illustrates an example virtual reality device 200 that can be utilized in accordance with various embodiments. Various other types of devices, such as smart glasses, goggles, and other virtual reality displays and devices can be used as well within the scope of the various embodiments. In this example, the device includes a housing 202 made of a material such as plastic, with a polymer lip 214 or other such portion intended to contact the user’s face in order to provide comfort for the user as well as a relatively light-tight seal that prevents extraneous light from reaching the user’s eyes while the device is worn. The example device also includes a strap 216 or other such mechanism for securing the device to the user’s head, particularly while the user’s head 204 is in motion. The example device includes a left eye display screen 206 and a right eye display screen 208, although as mentioned in some embodiments these can be portions of a single display screen or arrays of multiple screens, or holographic displays, among other such options. In some embodiments a single display element will be used with respective convex lenses for each eye and one or more separation elements that limit the field of view of each eye to a designated portion of the display. The device will typically include display circuitry 218, which may include memory, one or more processors and/or graphics processors, display drivers, and other such components known or used for generating a display of content. There can be a single set of circuitry for both displays 206, 208, or at least some of the components can be duplicated for each display such that those components only provide for display of content on one screen or the other. The display screens can be any appropriate type of display, such as an AMOLED or LED display with sufficient refresh rate for virtual reality applications.
The device includes one or more motion and/or orientation sensors 210, as may include at least one accelerometer, magnetometer, gyroscope, electronic compass, inertial sensor, and/or other such sensor for providing data about rotation, translation, and/or other movement of the device. The motion and/or orientation data can be used to determine the appropriate point of view (POV) from which to render the current scene of content. The example device also includes at least one communication component 212, such as a wired or wireless component for transmitting data over a protocol such as Bluetooth, Wi-Fi, 4G, and the like. The communication component can enable the device 200 to communicate with a computing device for purposes such as obtaining content for rendering, obtaining additional input, and the like. The example device can include other components as well, such as battery or power components, speakers or headsets, microphones, etc.
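One common way to turn the orientation sensor data into a stable point of view is a complementary filter, shown here for yaw only. This is an assumed technique, not one the text specifies: the gyroscope rate is integrated for responsiveness, and a small fraction of the magnetometer (electronic compass) heading is blended in to cancel gyroscope drift. The blend factor `alpha` is a hypothetical tuning value.

```python
import math

def wrap_angle(a: float) -> float:
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi

def fuse_yaw(prev_yaw: float, gyro_rate: float, mag_heading: float,
             dt: float, alpha: float = 0.98) -> float:
    """Complementary filter for yaw (radians).

    prev_yaw:    previous fused estimate
    gyro_rate:   yaw rate from the gyroscope (rad/s)
    mag_heading: absolute heading from the magnetometer/compass (rad)
    alpha:       weight on the integrated gyroscope estimate (hypothetical)
    """
    predicted = prev_yaw + gyro_rate * dt
    # Correct toward the compass heading along the shortest angular path,
    # so a jump across +/-pi does not swing the estimate the long way round.
    correction = wrap_angle(mag_heading - predicted)
    return wrap_angle(predicted + (1.0 - alpha) * correction)
```

The fused yaw would feed the point of view used for the per-eye renders; a higher `alpha` trusts the gyroscope more and reacts faster, at the cost of slower drift correction.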
The example device 200 can also include one or more cameras 220, 222 or other image capture devices for capturing image data, including data for light reflected in the ambient or infrared spectrums, for example. One or more cameras can be included on an exterior of the device to help with motion tracking and determining environmental conditions. For example, locations of light sources, intensity of surrounding ambient light, objects or persons nearby, or any of various other objects or conditions can be determined that can be incorporated into the virtual reality scene, such as to make the lighting environmentally appropriate or to include things located around the user, among other such options. As mentioned, tracking the motion of objects represented in the captured image data can help with motion tracking as well, as rotation and translation data of surrounding objects can give an indication of the movement of the device itself.
Further, the inclusion of one or more cameras 220, 222 on the inside of the device can help to determine information such as the expression or gaze direction of the user. In this example, the device can include at least one IR emitter 224, such as an IR LED, that is capable of emitting IR radiation inside the device that can be reflected by the user. IR can be selected because it is not visible to the user, and thus will not be a distraction, and also does not pose any health risks to the user. The IR emitter 224 can emit radiation that can be reflected by the user’s face and detected by one or more IR detectors or other image capture elements 220, 222. In some embodiments the captured image data can be analyzed to determine the expression of the user, as may be determinable by variations in the relative locations of facial features of the user represented in the captured image data. In some embodiments, the location of the user’s pupils can be determined (as discussed elsewhere herein), which can enable a determination of the gaze direction of the user. The gaze direction of the user can, in some embodiments, affect how objects near to, or away from, the center of the user’s field of view are rendered.
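As an illustration of gaze-dependent rendering, the sketch below picks a level of detail from the angle between the gaze direction (estimated from the pupil positions) and the direction to an object such as the nose. The two angle thresholds and the resulting three detail levels are hypothetical; the text says only that gaze direction can affect how objects near to, or away from, the center of the field of view are rendered.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

def angle_between_deg(a: Vec3, b: Vec3) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (na * nb)))
    return math.degrees(math.acos(cos_angle))

def level_of_detail(gaze_dir: Vec3, object_dir: Vec3,
                    thresholds_deg: Sequence[float] = (5.0, 20.0)) -> int:
    """0 = full detail near the gaze point; higher numbers = coarser rendering."""
    eccentricity = angle_between_deg(gaze_dir, object_dir)
    for level, limit in enumerate(thresholds_deg):
        if eccentricity <= limit:
            return level
    return len(thresholds_deg)
```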
As mentioned, the nose to be rendered in the virtual reality environment can be selected, generated, or otherwise determined so as to be appropriate for the particular user. FIGS. 3A, 3B, and 3C illustrate example nose renderings 300 that can be utilized in accordance with various embodiments. As illustrated, the noses appropriate for different users can have many different characteristics, including different overall sizes and shapes, as well as different sizes and shapes of specific portions such as the bridge, tip, and nostrils. The size, shape, and location of the nostril bump can be important as to how the nose is perceived in various circumstances. Other factors can vary by user as well, such as reflectivity (based on oil levels of the skin), texture, etc. In some embodiments a user can be presented with various options and tasked with selecting a virtual nose that most closely matches the user’s nose. This can include, for example, being presented with a set of nose shapes such as illustrated in FIGS. 3A-3C, whereby the user can select one of the noses as being most appropriate. In some embodiments the user may have the ability to modify the shape of the selected nose, such as to adjust a size or shape of a portion of the nose, which can affect the underlying nose model used for rendering. The user can also have the option of selecting a color for the nose, such as by selecting one of a set of colors or using a slider bar to select an appropriate color from a palette, among other such options. The nose shape and size data can be stored as nose model data and the color stored as texture data (or other such data) for the user for use in rendering virtual content with a device such as that described with respect to FIG. 2.
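For the slider option, a minimal sketch maps a slider position in [0, 1] onto a palette by linear interpolation between adjacent color stops. The palette values and function name are hypothetical, and the result would be stored as the texture (base color) data described above.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette running from light to dark skin tones.
PALETTE: Sequence[RGB] = [(255, 224, 196), (198, 134, 66), (90, 56, 37)]

def color_from_slider(t: float, palette: Sequence[RGB] = PALETTE) -> RGB:
    if not palette:
        raise ValueError("palette must not be empty")
    if len(palette) == 1:
        return palette[0]
    t = max(0.0, min(1.0, t))            # clamp the slider position
    pos = t * (len(palette) - 1)         # position along the color stops
    i = min(int(pos), len(palette) - 2)  # index of the lower stop
    frac = pos - i
    a, b = palette[i], palette[i + 1]
    return (round(a[0] + (b[0] - a[0]) * frac),
            round(a[1] + (b[1] - a[1]) * frac),
            round(a[2] + (b[2] - a[2]) * frac))
```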
In other embodiments, the nose data for the user can be determined using image data captured for the user. For example, a user can capture an image of the user’s face, often referred to as a “selfie,” that includes a view of at least a significant portion of the user’s face. A video stream, burst of images, or set of video frames can also be captured in accordance with various embodiments. Stereoscopic image data or image data captured from different viewpoints can give more accurate data as to shape, or at least data that is accurate in three dimensions. A facial recognition algorithm or process can be used to analyze the image (or image data) to determine the relative locations of various facial features of the user in the image. An example of one such set of feature points is displayed in the situation 400 of FIG. 4A. In this example, specific facial feature points 402 are determined. The relative spacing of these points 402 helps to determine information such as facial structure, expressions, and the like. As illustrated, a set of these feature points will also correspond to the user’s nose. Using these points, the general size (at least with respect to the user’s head or face) and shape of the user’s nose can be determined. These points and/or the determined size and shape can be used to generate the nose model (i.e., a 3D wireframe model for object rendering) for the user. In some embodiments the symmetry of the nose can be taken into account so that data need only be stored for one side of the user’s nose, as may include data for the outermost right, left, bottom, and top of one side of the nose, along with a center point or other such data. In other embodiments data for a full model can be stored to account for any asymmetries in the user’s nose.
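When symmetry is exploited, only one half of the nose mesh needs to be stored. The sketch below rebuilds the full vertex list by reflecting the stored half across a vertical midline, skipping vertices that lie on the midline so they are not duplicated. It handles vertex positions only (the face list would also need mirroring, with reversed winding order), and the midline position and tolerance are assumptions.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def mirror_half_nose(half: Sequence[Vec3], midline_x: float = 0.0,
                     eps: float = 1e-9) -> List[Vec3]:
    """Rebuild a full, symmetric vertex list from one stored half.

    Vertices within `eps` of the midline are shared by both halves and are
    kept once; every other vertex gets a reflected copy at 2*midline_x - x.
    """
    full = list(half)
    for x, y, z in half:
        if abs(x - midline_x) > eps:
            full.append((2.0 * midline_x - x, y, z))
    return full
```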
Since the points identify the location of the nose in the image, the portion of the image corresponding to the nose can also be analyzed to determine the appearance of the nose, as may include factors such as color, skin texture, and reflectivity. This data can be stored as texture data that can be mapped onto the nose model during the rendering process. In this way, the size, shape, and appearance of the user’s nose can be determined with minimal effort on the part of the user. The image can be captured using the VR device, a separate computing device, or a digital camera, or an existing image can be uploaded or otherwise provided for analysis.
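The appearance step could be approximated as below: given the RGB pixels inside the nose region located by the feature points, compute a mean base color and, as a crude stand-in for reflectivity, the fraction of pixels bright enough to look like specular highlights. The highlight threshold and the use of Rec. 709 luma weights for the brightness test are assumptions, not details from the text.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]

@dataclass
class NoseAppearance:
    base_color: RGB            # mean RGB over the nose region
    highlight_fraction: float  # crude reflectivity proxy in [0, 1]

def _luma(p: RGB) -> float:
    # Rec. 709 luma weights (an assumed choice of brightness measure).
    return 0.2126 * p[0] + 0.7152 * p[1] + 0.0722 * p[2]

def estimate_appearance(nose_pixels: Sequence[RGB],
                        highlight_luma: float = 230.0) -> NoseAppearance:
    n = len(nose_pixels)
    if n == 0:
        raise ValueError("nose region contains no pixels")
    mean = (sum(p[0] for p in nose_pixels) / n,
            sum(p[1] for p in nose_pixels) / n,
            sum(p[2] for p in nose_pixels) / n)
    highlights = sum(1 for p in nose_pixels if _luma(p) >= highlight_luma)
    return NoseAppearance(base_color=mean, highlight_fraction=highlights / n)
```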
As mentioned, the shape of the user’s nose can vary with expression. For example, the situation 420 of FIG. 4B illustrates the relative location of feature points 422 determined for a user from a second image (or video frame) where the user is smiling or laughing. It can be seen that, with respect to FIG. 4A, the relative positions of many of the facial feature points have changed. It can also be seen that the relative positions of the points corresponding to the user’s nose have changed as well. Thus, it can be desirable in at least some embodiments to capture multiple images, video frames, or a stream of video data wherein the user exhibits multiple expressions. The user can be directed to make certain expressions at certain times, which can then be captured and analyzed with the relative positions associated with the specified expression. In other embodiments the user can be instructed to make various expressions, and the facial analysis algorithm can determine the appropriate expression based upon trained classifiers or other such approaches. A camera on the inside of the virtual reality device (or other such location) can then monitor the relative positions of visible facial features to attempt to determine the user’s current expression, which can then cause the nose rendered for the VR environment to use a nose model relevant for that expression. In some embodiments a library of nose models can be maintained for a user, while in other embodiments a single model can be stored that includes information for various facial expressions, among other such options. For example, in some embodiments a basic nose model can be determined and then the model can be modified by expression using a set of standard expression modifiers or animations applicable to most users. In many instances the changes in nose shape may be subtle such that a general animation or modeling approach may be sufficient for at least some implementations.
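The option of a basic nose model modified by standard expression modifiers resembles linear blend shapes, which is the technique sketched here as an assumption: each expression stores a per-vertex offset from the base nose mesh, and the current expression is rendered by adding those offsets scaled by weights (for example, weights derived from the inward-facing camera’s expression estimate). The expression names and offsets are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def apply_expression(base: Sequence[Vec3],
                     modifiers: Dict[str, Sequence[Vec3]],
                     weights: Dict[str, float]) -> List[Vec3]:
    """Add weighted per-vertex offsets for each active expression to the base mesh."""
    result = [list(v) for v in base]
    for name, weight in weights.items():
        offsets = modifiers[name]
        if len(offsets) != len(base):
            raise ValueError(f"modifier {name!r} does not match the base mesh")
        for vertex, delta in zip(result, offsets):
            for axis in range(3):
                vertex[axis] += weight * delta[axis]
    return [tuple(v) for v in result]
```

A weight of 0 leaves the base nose unchanged and a weight of 1 applies the full stored expression, so intermediate weights give the subtle shape changes the text describes.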
FIG. 5 illustrates an overview of an example process 500 for determining nose data for a user that can be used in accordance with various embodiments. It should be understood that there can be fewer, additional, or alternative steps performed in similar or alternative orders, or in parallel, within the scope of the various embodiments unless otherwise stated. In this example, image data including a representation of a user’s face is obtained 502. As mentioned, this can include a user providing an image showing the user’