Varjo Foveated Display Part 2 – Region Sizes

Introduction

As discussed in Part 1, the basic concept of a foveated display should in theory provide high angular resolution with a wide FOV. No single display technology available today for near-to-eye displays delivers both. Microdisplays (LCOS, DLP, and OLED) support high angular resolution but not wide FOV, while larger flat panel displays (OLED and LCD) support wide FOV but with low angular resolution.

The image above left includes crops from the picture on Varjo’s web site called “VR Scene Detail” (the whole annotated image is toward the end of this article). Varjo included both the foveated and un-foveated image from the center of the display. The top rectangle in red is taken from the top edge of the picture, where we can just see the transition starting from the foveated image to what Varjo calls the “context” or lower resolution image. Blending is used to avoid an abrupt transition that the eye might notice.

The topic of foveation gathered additional interest with Apple’s acquisition of the eye tracking technology company SMI, which provided the eye tracking technology for Nvidia’s foveated rendering HMD study (see below). It is not clear at this time why Apple bought SMI; it could be for foveated rendering (f-rendering) and/or foveated display (f-display).

Static Visual Acuity

The common human visual acuity charts (right) give some feel for why foveation (f-rendering and/or f-display) works. But these graphs are for static images of high contrast black and white line pairs. While we commonly talk about a person normally seeing down to 1 arcminute per pixel (300 dpi at about 10 inches) as being good, people can detect down to about 1/2 arcminute, and down to about 1/4 of an arcminute for a long, single, high contrast line. The point here is to understand that these graphs are a one-dimensional slice of a multi-dimensional issue.
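To put the dpi figure in context, here is a minimal Python sketch of the underlying geometry. It only assumes a flat surface viewed head-on; the function names are illustrative and not taken from any study. It suggests that 1 arcminute works out to a bit over 340 dpi at exactly 10 inches, so the usual "300 dpi" rule of thumb corresponds to a slightly longer viewing distance.

```python
import math

ARCMIN_RAD = math.radians(1 / 60)  # one arcminute expressed in radians

def dpi_for_pixel_angle(arcmin_per_pixel: float, distance_in: float) -> float:
    """Dots per inch at which one dot subtends the given angle at this distance."""
    pixel_size_in = distance_in * math.tan(arcmin_per_pixel * ARCMIN_RAD)
    return 1.0 / pixel_size_in

def distance_for_dpi(dpi: float, arcmin_per_pixel: float) -> float:
    """Viewing distance (inches) at which one dot of the given dpi subtends the angle."""
    return (1.0 / dpi) / math.tan(arcmin_per_pixel * ARCMIN_RAD)

# The three acuity levels mentioned in the text, all at 10 inches.
for arcmin in (1.0, 0.5, 0.25):
    print(f"{arcmin} arcmin at 10 in -> {dpi_for_pixel_angle(arcmin, 10.0):.0f} dpi")

print(f"300 dpi is 1 arcmin/dot at {distance_for_dpi(300, 1.0):.1f} in")
```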

For reference, Varjo’s high resolution display has slightly less than 1 arcminute/pixel and the context display in their prototype has about 4.7 arcminutes/pixel. More importantly, their high resolution display covers about 20 degrees horizontally and 15 degrees vertically, and this is within the range where people could see errors if they are high in contrast, based on the visual acuity graphs.
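The conversion between pixels per degree and arcminutes per pixel is simple enough to sketch. The 70 pixels/degree input is Varjo's stated minimum for the foveated display and 4.7 arcminutes/pixel is the context display figure; the helper names are just for illustration.

```python
def arcmin_per_pixel(pixels_per_degree: float) -> float:
    """Angular size of one pixel in arcminutes (60 arcminutes per degree)."""
    return 60.0 / pixels_per_degree

def pixels_per_degree(arcmin_per_px: float) -> float:
    """Inverse conversion: pixels that fit into one degree."""
    return 60.0 / arcmin_per_px

foveated_arcmin = arcmin_per_pixel(70.0)   # Varjo's "at least 70 pixels/degree"
context_arcmin = 4.7                       # context display figure from the text
linear_ratio = context_arcmin / foveated_arcmin

print(f"foveated: {foveated_arcmin:.2f} arcmin/pixel")
print(f"context:  {pixels_per_degree(context_arcmin):.1f} pixels/degree")
print(f"linear resolution ratio: {linear_ratio:.1f}x")
```

On these numbers the step in linear resolution across the boundary is roughly 5.5 to 1, which is the jump the blending has to hide.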

Varjo will be blending to reduce the contrast difference and thus make the transition less noticeable. But on the negative side, with any movement of the eyes, the image on the foveated display will change and the visual system tends to amplify any movement/change.

Foveated Rendering Studies

F-rendering varies the detail/resolution/quality/processing based on where the eyes are looking. This is seen as key not only to reducing the computing requirement but also to saving power. F-rendering has been shown to work in many human studies, including those done as part of Microsoft’s 2012 and Nvidia’s 2016 papers. F-rendering becomes ever more important as resolution increases.

F-rendering uses a single high resolution display and changes the level of rendering detail. It then blends between the various detail levels to avoid abrupt changes that the eye would detect. As the Microsoft and Nvidia papers point out, the eye is particularly sensitive to changes/movement.

In the case of the often cited Microsoft 2012 paper, they used 3 levels of detail with two “blend masks” between them, as illustrated in their paper (see right). This gave them a very gradual and wide transition, but 3 resolution levels with wide bands of transition are “luxuries” that Varjo can’t have. Varjo only has two possible levels of detail and, as will be shown, they can only afford a narrow transition/blend region. The Microsoft 2012 study used only a 1920×1080 monitor with a lower resolution central region than Varjo (about half the resolution) and then 3 blending regions that are so broad that they would be totally impractical for an f-display.

Nvidia’s 2016 study (which cites Microsoft 2012) simplified to two levels of detail, fovea and periphery, with sampling factors of 1 and 4 and a simpler linear blend between the two detail levels. Unfortunately, most of Nvidia’s study was done with a very low angular resolution Oculus headset display with about 4.7 arcminutes/pixel and a little over 1,000 by 1,000 pixels per eye, the same display Varjo uses for the low resolution part of their image. Most of the graphs and discussion in the paper were with respect to this low angular resolution headset.
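A linear blend between two detail levels can be sketched as a weight that depends on eccentricity (angular distance from the gaze point). This is only an illustration of the general idea, not Nvidia's implementation; the function names and the 10 degree radius and 5 degree blend width in the usage loop are hypothetical values chosen for the example.

```python
def fovea_weight(ecc_deg: float, fovea_radius_deg: float, blend_width_deg: float) -> float:
    """Weight of the high-detail layer: 1 inside the fovea radius,
    falling linearly to 0 across the blend band."""
    if blend_width_deg <= 0:
        raise ValueError("blend width must be positive")
    t = (ecc_deg - fovea_radius_deg) / blend_width_deg
    return 1.0 - min(1.0, max(0.0, t))

def blend(high: float, low: float, ecc_deg: float,
          fovea_radius_deg: float, blend_width_deg: float) -> float:
    """Mix one pixel value from the high- and low-detail layers."""
    w = fovea_weight(ecc_deg, fovea_radius_deg, blend_width_deg)
    return w * high + (1.0 - w) * low

# Hypothetical example: 10 degree fovea radius, 5 degree wide blend band.
for ecc in (0.0, 10.0, 12.5, 15.0, 20.0):
    print(f"{ecc:5.1f} deg -> high-detail weight {fovea_weight(ecc, 10.0, 5.0):.2f}")
```

The narrower the blend band, the steeper this ramp, which is why the width of the transition region matters so much for an f-display.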

Nvidia 2016 also did some study of a 27″ (diagonal) 2560×1440 monitor with the user 81 cm away, resulting in an angular resolution of about 1 arcminute and a horizontal FOV of 40 degrees, which would be more applicable to Varjo’s case. Unfortunately, as the paper states about their user study, “We only evaluate the HMD setup, since the primary goal of our desktop study in Section 3.2 was to confirm our hypothesis for a higher density display.” The only clue they give for the higher resolution system is that, “We set the central foveal radius for this setup to 7.5°.” There was no discussion I could find of how they set the size of the blend region, so it is only a data point.
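The quoted figures for the desktop setup can be checked from the monitor geometry. This is a rough sketch that assumes a flat 16:9 panel viewed on-axis with square pixels; `monitor_geometry` is a name made up for this example.

```python
import math

def monitor_geometry(diagonal_in: float, h_px: int, v_px: int, distance_cm: float):
    """Return (horizontal FOV in degrees, arcminutes per pixel at screen center)."""
    aspect = h_px / v_px
    # Width from the diagonal, assuming square pixels.
    width_cm = diagonal_in * 2.54 * aspect / math.hypot(aspect, 1.0)
    h_fov_deg = 2.0 * math.degrees(math.atan((width_cm / 2.0) / distance_cm))
    pixel_pitch_cm = width_cm / h_px
    center_arcmin = math.degrees(math.atan(pixel_pitch_cm / distance_cm)) * 60.0
    return h_fov_deg, center_arcmin

h_fov, arcmin = monitor_geometry(27.0, 2560, 1440, 81.0)
print(f"horizontal FOV: {h_fov:.1f} degrees")
print(f"center resolution: {arcmin:.2f} arcmin/pixel")
```

Both results land close to the roughly 40 degrees and 1 arcminute given above.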

Comment/Request: I looked around for a study that would be more applicable to Varjo’s case. I was expecting to find a foveated rendering study using say a 4K (3840×2160) television which would support 1 arcminute for 64 by 36 degrees but I did not find it. If you know of such a study let me know.

Foveated Rendering is Much Easier Than Foveated Display

Even if we had an f-rendering study of an ~1-arcminute peak resolution system, it would still only give us some insight into the f-display issues. F-rendering, while conceptually similar and likely to be required to support an f-display, is significantly simpler.

With f-rendering, everything beyond the detection of the eye movement is mathematical. The high resolution region, the lower resolution region(s), and the blend region(s) can be of arbitrary size to reduce detection, and can even be dynamic based on content. The alignment between resolutions is perfectly registered. The color and contrast between resolutions are identical. The rendering of the high resolution area does not have to be scaled/re-sampled to match the background.

Things are much tougher for f-display as there are two physically different displays and the high resolution display has to be optically aligned/moved based on the movement of the eye. The alignment of the display resolution(s) is limited by the optics’ ability to move the apparent location of the high resolution part of the image. There is likely to be some vibration/movement even when aligned. The potential size of the high resolution display, as well as the size of the transition region, is limited by the size/cost of the microdisplay used. There can be only a single transition. The brightness, color, and contrast will be different between the two physically different displays (even if both are, say, OLED, the brightness and colors will not be exactly the same). Additionally, the high resolution display’s image will have to be remapped after any optical distortion to match the context/peripheral image; this will both reduce the effective resolution and introduce movement into the highest resolvable (by the eye) part of the FOV as the foveated display tracks the eye on what otherwise should be, say, a stationary image.

When asked, Varjo has said that they have more capable systems in the lab than the fixed f-display prototype they are showing. But they stopped short of saying whether they have a full-up running system, and they have provided no results of any human studies.

The bottom line here is that there are many more potential issues with f-display that could prove to be very hard, if not practically impossible, to solve. A major problem is getting the high resolution image to optically move and stop without the eye noticing it. It is impossible to fully understand how well it will work without a full-blown working system and a study with humans across a wide variety of content and user conditions, including the user moving their head and the reaction of the display and optics.

Varjo’s Current Demo

Varjo is currently demoing a proof of concept system with the foveated/high-resolution image fixed and not tracking the center of vision. The diagram below shows the 100 by 100 degree FOV of the current Varjo demonstration system. For the moment at least, let’s assume their next step will be to have a version of this where the center/foveated image moves.

Shown in the figure above is roughly the size of the foveated display region (green rectangle), which covers about 27.4 by 15.4 degrees. The dashed red rectangle shows the area covered by the pictures provided by Varjo, which does not even fully cover the foveated area (in the pictures they just show the start of the transition/blending from high to low resolution).
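The size of the green rectangle follows from the pixel count and the angular density. The sketch below assumes a 1920×1080 microdisplay (an assumption for illustration, not a confirmed specification) at Varjo's stated 70 pixels/degree, and it treats the angular density as uniform across the image, which real optics will only approximate.

```python
def fov_degrees(h_px: int, v_px: int, pixels_per_degree: float):
    """Angular size covered by a display with uniform pixels per degree."""
    return h_px / pixels_per_degree, v_px / pixels_per_degree

# Assumed 1920x1080 microdisplay at 70 pixels/degree.
h_deg, v_deg = fov_degrees(1920, 1080, 70.0)

# Share of the 100 by 100 degree demo FOV covered by the foveated region.
coverage = (h_deg * v_deg) / (100.0 * 100.0)

print(f"foveated region: {h_deg:.1f} by {v_deg:.1f} degrees")
print(f"share of the 100x100 degree FOV: {coverage:.1%}")
```

Under these assumptions the foveated region covers only a few percent of the total field, which is why its placement relative to the gaze matters so much.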

Also shown is a dashed blue circle with the 7.5 degree “central foveal radius” (15 degree diameter) of the Nvidia 2016 high angular resolution system. It is interesting that it is pretty close to the angle covered vertically by the Varjo display.

Will It Be Better Than A Non-Foveated Display (Assuming Very Good Eye Tracking)?

Varjo’s foveated display should appear to the human eye as having much higher resolution than a non-foveated display with the same resolution as Varjo’s context/periphery display. It is certainly going to work well when totally stationary (such as Varjo’s demo system).

My major concern comes (and something that can’t be tested without a full blown system) when everything moves. The evidence above suggests that there may be visible moving noise at the boundaries of the foveated and context image.

Some of the factors that could affect the results:

  1. Size of the foveated/central image. Making this bigger would move the transition further out. This could be done optically or with a bigger device. Doing it optically could be expensive/difficult and using a larger device could be very expensive.
  2. The size of the transition/blur between the high and low resolution regions. It might be worth losing some of the higher resolution to get a smoother transition. From what I can tell, Varjo has a small transition/blend region compared to the f-rendering systems.
  3. The accuracy of the tracking and placement of the foveated image. In particular how accurately they can optically move the image. I wonder how well this will work in practice and will it have problems with head movement causing vibration.
  4. How fast they can move the foveated image and have it be totally still while displaying.

A Few Comments About Re-sampling of the Foveated Image

One should also note that the moving foveated image will by necessity have to be mapped onto the stationary low resolution image. Assuming the rendering pipeline first generates an image in rectangular coordinates and then re-samples it to adjust for the placement and optical distortion of the foveated image, the net effective resolution will be about half that of the “native” display due to the re-sampling.
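The "about half" loss can be illustrated with a one-dimensional toy case. This sketch assumes simple linear interpolation and a worst-case half-pixel misalignment; a real pipeline would use a two-dimensional distortion map and possibly a different filter, so treat it as an intuition aid rather than a model of Varjo's system. It compares the finest line pattern the native grid can show with a pattern at half that resolution.

```python
import math

def shift_linear(samples, shift):
    """Resample a periodic 1-D signal at positions i + shift by linear interpolation."""
    n = len(samples)
    out = []
    for i in range(n):
        pos = i + shift
        i0 = math.floor(pos)
        frac = pos - i0
        a = samples[i0 % n]          # left neighbor (wraps around)
        b = samples[(i0 + 1) % n]    # right neighbor (wraps around)
        out.append((1.0 - frac) * a + frac * b)
    return out

def contrast(samples):
    """Michelson contrast: (max - min) / (max + min)."""
    hi, lo = max(samples), min(samples)
    return 0.0 if hi + lo == 0 else (hi - lo) / (hi + lo)

one_px_lines = [0.0, 1.0] * 8             # native-resolution line pairs
two_px_lines = [0.0, 0.0, 1.0, 1.0] * 4   # half-resolution line pairs

for name, pattern in (("1-pixel lines", one_px_lines), ("2-pixel lines", two_px_lines)):
    shifted = shift_linear(pattern, 0.5)   # half-pixel misalignment
    print(f"{name}: contrast {contrast(pattern):.2f} -> {contrast(shifted):.2f}")
```

In this toy case the native-resolution pattern is averaged away entirely at a half-pixel offset, while the half-resolution pattern keeps its full peak-to-peak contrast, in line with the rule of thumb above.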

In theory, this re-sampling loss could be avoided/reduced by computing the high resolution image with the foveated image already remapped, but with “conventional” pipelines this would add a lot of complexity. In the long run, though, this type of display would likely be used in combination with foveated rendering, where this may not add too much more to the pipeline (just something to deal with the distortion).

Annotated Varjo Image

First, I want to compliment Varjo for putting actual through-the-optics high resolution images on their website (note, click on their “Full size JPG version”). By Varjo’s own admission, these pictures were taken crudely with a consumer camera, so the image quality is worse than you would see looking into the optics directly. In particular, there are chromatic aberrations clearly visible in the full size image that are likely caused by the camera and how it was used, and are not necessarily a problem with Varjo’s optics. If you click on the image below, it will bring up the full size image (over 4,000 by 4,000 pixels and about 4.5 megabytes) in a new tab.

The green rectangle corresponds to the size of the foveated image shown by the green rectangle in the prior diagram of the whole 100 by 100 degree FOV.

You should be able to clearly see the transition/blending starting at the top and bottom of the foveated image (see also right). The end of the blending is cut off in the picture.

The angles given in the figure were calculated based on the known pixel size of the Oculus CV1 display (its pixels are clearly visible in the non-foveated picture). For the “foveated display” (green rectangle) I used Varjo’s statement that it was at least 70 pixels/degree (but I suspect not much more than that either).

Next Time On Foveated Displays (Part 3)

Next time on this topic, I plan on discussing how f-displays may or may not compete in the future with higher resolution single displays.

6 comments

  1. Mitchell Charity says:

    I’d buy the headset now, as-is. Just the HMD, with three HDMI cables, and no code.

    Doing software development in VR requires the angular resolution to read text. Artifacts which don’t compromise that, are strictly secondary.

    I already run the Vive on a custom stack. With no lens correction, so no sampling (modulo the Vive’s PenTile pixels). Using mixed resolution, so I can run it on integrated graphics. With cheap lenses, text needs to be read in the center, so the surround hardly matters. And the high-res region isn’t just clearly visible, with zero blending, but given vergence, doesn’t even fully overlap between eyes. And for the purpose, that’s all fine.

    So I’d get the HMD. Attach a Vive tracking puck. And cameras (wide and narrow fov) for monocular pass-through AR. Put a full screen three.js window on each of the three displays. And I’d be live in few days. And a show-stopping bottleneck would be unstuck.

    If the high-res region was 30 degrees high as well, I’d be ecstatic. Even with burns from the demo’s heat sink. But eye tracking will likely be Windows only, and I use linux, so there’d be little point in waiting for it.

    As you observe in 31 Jan “CES 2017 AR, What Problem Are They Trying To Solve?”, different markets have very different needs. Varjo has already achieved minimum viable product for mine. If I had a choice, given stationary, I’d not wait for motion. Given motion, I’d not wait for seamless motion.

    • KarlG says:

      Interesting comments. On the one hand, without the eye tracking and movement of the foveated display, it becomes pretty much a very expensive 1080p HMD with pixels that are perhaps a bit too small for computer work, with a nice optical surround effect (a bit like the Philips’ Ambilight on steroids – https://en.wikipedia.org/wiki/Ambilight).

      But then again, there is a lot to be said for the general concept of a stationary high resolution center image on a low resolution background. In normal use, people don’t move their eyes that much to see detail, but rather turn their head in which case head tracking could keep what you are trying to look at in high resolution.

      If, say, they could get the center pixels to more like 1.5 arcminutes per pixel, then they would cover about 50 degrees of the center vision, which is about as far as the central vision “roams” when a person is focused on an activity. 1.5 arcminutes/pixel is probably good enough for computer work (and about where a lot of people will adjust to in front of a typical computer monitor). With the current display sizes and their pixel pitches, it would likely require some optical/location changes, because the “natural”/easy-optical place for the OLED microdisplay would interfere with the view of the larger OLED unless the larger OLED is moved further away, making the headset a bit bulkier (or some more complex optics are put in front of the OLED microdisplay). A bigger microdisplay would get very expensive quickly, so it would be better to avoid this, but it might be a longer term option.

      I’m much more concerned about the accuracy and secondary effects of trying to track the eye and move the high resolution part of the image. You are moving the part of the image where the eye has the best resolution. Right off the top you lose resolution by having to re-sample the foveated area to match the peripheral/stationary image. It could very well be that moving the central image adds a lot of complication/cost for very little gain. I would think that, given the display trends, it might be better in the long run to put the money and effort into the size/resolution of a stationary central display; certainly I would experiment with this before moving the display.

  2. Sebastian says:

    Wouldn’t the overlapping screen image greatly increase the brightness in the foveated region? I would think this would make the foveated region very obvious and distracting. Or do they decrease the brightness in that zone on the original screen where the overlap is occurring?

    • KarlG says:

      They do have to try and balance the brightness, and the larger display will be dark in the center while the foveated display will be dark on the outside. They will have a transition/blend region where they dim one display while they bring up the other. By gradually blending between the two displays, you hope to hide any slight differences in brightness, color, and alignment between the two display devices.

  3. John S says:

    Hey Karl, Lenovo is up to much more than a Meta cell phone knock off.

    Lenovo daystAR

    https://www.engadget.com/2017/07/20/lenovo-ar-headset-ai-smart-speaker-concepts/#/

    • KarlG says:

      Thanks, “frankenberry” also alerted me to that design. I am going to take a look at it and see what I can say. Below is a copy of my earlier response to frankenberry:

      “This looks like another “birdbath” design, but unlike the Disney-Lenovo one that uses a phone, this one appears to be using microdisplays and has “only” a 40 degree FOV. It is currently axiomatic that if the design has a wide (>60 degree) FOV they are using a larger flat panel technology, and if it has less than a 60 degree FOV it is using a microdisplay. At 40 degrees, I’m guessing it is using around a 720p device per eye, most likely LCOS if they want to keep it a “consumer cost” device.”

      This looks in many ways like a lower cost version of the ODG R8/R9.
