Microvision (MVIS) Replaces CEO – A Soothsayer’s Retrospective

Ding-Dong

Microvision’s board finally got around to replacing their CEO today (officially, he is going to be “spending more time with his family”). Mind you, they appeared to have been plenty happy with Alexander Tokman after a decade of routinely losing over $12 million/year (some years over $40M). Microvision’s board was still giving him $792,892/year in compensation in 2016, which was a step down from the $950,561 he received in 2015, with a total of $3,754,437 over the 5 years from 2012 to 2016 (and then there will be 2017 and whatever golden parachute they give him). That is pretty good pay for someone who drove the stock from about $48/share when he took over on July 7th, 2005, to about $1.50 today. What’s more, executive compensation roughly doubled from 2012 to 2016 in spite of continuing losses.

As I have written before, Microvision for over 24 years appears to have been a company in the business of selling stock rather than product. I guess it does take a certain kind of talent to keep selling stock while the company continually loses money. Laser Beam Scanning has been the ultimate con-technology for the semi-technically literate. On the surface it may look like a good idea, until you really understand it. Do the words attributed to P.T. Barnum come to mind?

Soothsayer

This blog, which primarily discusses display technology, started talking about Microvision back in 2011, and Microvision quickly responded with an SEC 8-K filing calling me a “False Soothsayer.” This led to my writing my 7-part Soothsayer Series about Microvision. Microvision had painted a very misleading (to be generous) picture of the state of the green laser market. I called them out on it, they had the audacity to call me a “False Soothsayer,” and it was then proven that I was telling the truth.

The difference between this blog and tech sites that just repeat company marketing spiels is that I try to analyze technology as an engineer and, where possible, measure things objectively. In the case of Microvision, the more I measured and understood the technology, the worse it looked. Microvision “fibbed” (to put it mildly) about power, resolution, cost, size, eye-safety, and just about everything that could be measured.

I have explained on this blog how fundamentally flawed laser beam scanning is as a display technology. You can search this blog to find the details (or hire me to help explain it). I tried to point out that even when the green laser cost came down, Laser Beam Scanning (LBS) was still fundamentally flawed in how it works and will NEVER be a good display technology for a large market (there may be a few very small niche uses).

Through the years, I tested and published images and data proving that Microvision was making false claims about resolution and power consumption. But no matter, there is no “marketing police,” and Microvision was able to keep selling stock to people who wanted to believe.

Pivoting More Than A Ballerina

In addition to misleading people about the (false) virtues of Laser Beam Scanning, they kept “pivoting” both in terms of market and business model. When Mr. Tokman took over, he pivoted Microvision from Head Mounted Displays (HMDs) to Pico Projectors.

Every time Microvision failed with a product concept, business model, or market, they would announce a new “pivot.” Thus keeping Microvision a 24-year-old “start-up” with a “new” future.

Any rational business person could figure out that building a laser scanning pico projector would lose money, so Microvision funded development and paid companies to make lasers and engines for them. When still nobody would build a projector with a Microvision-subsidized engine, Microvision built and sold the final product, the ShowWX and ShowWX+. This resulted in Microvision losing over $45M in 2011 and $27M in 2012. It was a colossally bad business move, but making money was apparently never the point: Microvision was able to sell more stock based on making a product, and when the losses were found out, the stockholders got an 8 to 1 reverse split.

Microvision pivoted from making ShowWX projectors and selling the engines at a loss to saying they would be just an I.P. company with Sony making the engines. But when the Sony deal was not working out, they got back into the engine-making business. All the while, through all these different “business models,” they steadily kept losing about $1M/month and sometimes more. But most importantly, with each pivot in business model and market thrust they could sell more stock.

Microvision also continues to pivot in the area of markets. First (pre-Tokman) they were focused on head mounted displays, then pico projectors, and then when Google Glass was announced, they were back pushing head mounted displays. They claimed to be good for gesture recognition when Microsoft’s Kinect was a hot product. More recently it is LIDAR for self-driving cars (funny, there are a lot of LIDAR companies already around that didn’t need Microvision). All the while, they keep the pie plates spinning in pico projectors, HUDs, and HMDs. They have a 24-year record of failing in one market and business strategy after another.

So What Is Microvision Up To Now?

If things were going as well as Microvision wanted you to believe, they wouldn’t have Mr. Tokman “spending more time with his family.” The new CEO, Perry Mulligan, has a background as a VP of Operations for telecom companies and no background in displays other than sitting on Microvision’s Board for 10 years.

My best guess is that they are trying to pretty up the company for some type of acquisition or perhaps a new “pivot” with a big money raise. Most likely they will be pushing more into LIDAR as it is newer, less well understood, and a hot topic today.

I could also see them splitting off and selling their patent portfolio to a Non-Practicing Entity (NPE, or more commonly known as a “Patent Troll”).

Crass Commercial Message

Among other things, these days I perform Technical Due Diligence in evaluating companies. Before your company spends $10M, $50M, $100M, or (in the case of Magic Leap) $500M, you might want to have my experienced eye evaluate the technology.

I also help companies working on new display technologies. I have a very broad perspective, particularly in the areas of microdisplays, HMDs, automotive HUDs, and novel/new display technologies.

You can connect with me on LinkedIn.

 

Speaking at The Display Summit Oct 4-5th

I just wanted to let my readers know that I am going to be speaking at the Display Summit on Oct 5th in Sterling, Virginia (near Washington Dulles Airport), with a 20-minute presentation titled “AR and VR Display Technologies for Wide FOV and High Angular Resolution.” Later on Oct. 5th, I will be participating in a panel discussion on AR/VR.

Info on the overall conference is available here.

The agenda is given here.

I hope to see some of you there,

Karl

Collimation, Etendue, Nits (Background for Understanding Brightness)

Introduction

I’m getting ready to write a much-requested set of articles on the pros and cons of various types of microdisplays (LCOS, DLP, and OLED in particular, with some discussion of other display types). I feel that as a prerequisite, I should give some key information on the character of light as it pertains to what people generally refer to as “brightness.” For some of my readers, this discussion will be very elementary/crude/imprecise, but it is important to have at least a rudimentary understanding of nits, collimation, and etendue to understand some of the key characteristics of the various types of displays.

Light Measures – Lumens versus Nits

The figure on the left from an Autodesk Workshop Page illustrates some key light measurements. Lumens are a measure of the total light emitted. Candelas (Cd) are a measure of the light emitted into a solid angle (a direction). Lux measures the light per square meter that hits a surface. Nits (Cd/m2) measure the light emitted per unit area into a given solid angle/direction. The key point for a near eye display is that we only care about the light in the direction that makes it to the eye’s pupil.

We could get more nits by cranking up the light source’s brightness, but that would mean wasting a lot of light. More efficiently, we could use optics to steer a higher percentage of the total light (lumens) toward the eye. In this example, we could add lenses and reflectors to aim the light at the surface, and we could make the surface more reflective and more directional (known as the “gain” of a screen). Very simply put, lumens measure the total light output from a light source, while nits measure the light in a specific direction.
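
To put some rough numbers on this, here is a minimal sketch (in Python, with made-up numbers) using the standard relation for an ideal Lambertian flat panel that the luminance (nits) equals the total flux divided by pi times the emitting area:

import math

# Hypothetical flat panel treated as an ideal Lambertian emitter (all numbers assumed).
flux_lumens = 10.0                 # total light output of a phone-class display
width_m, height_m = 0.11, 0.062    # ~5-inch display, assumed dimensions

area_m2 = width_m * height_m
nits = flux_lumens / (math.pi * area_m2)   # L = flux / (pi * area) for a Lambertian surface
print(f"~{flux_lumens:.0f} lumens over {area_m2*1e4:.0f} cm^2 -> ~{nits:.0f} nits")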

Etendue

The casual observer might think, just put a lens in front of, or a mirror behind and around, the light source (like a car’s headlight) and concentrate the light. And yes, this will help, but only within limits. The absolute limit is set by a law of physics that can’t be violated, the conservation of “etendue.”

There are more detailed definitions, but one of the simplest (and for our purposes practical) statements is given in a presentation by Gaggione on collimating LED light: “the beam diameter multiplied by the beam angle is a constant value” [for an ideal element]. In simpler terms, if we put in an optical element that concentrates/focuses the light, the angles of the light rays will increase. This has profound implications for collimating light. Another good, but a bit more technical, presentation on etendue and collimation is given by LPI.
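
As a rough numerical sketch of the “diameter times angle is constant” rule (a 2-D simplification, with assumed numbers), an ideal collimator can only trade a bigger exit aperture for smaller ray angles:

import math

# 2-D simplification: diameter * sin(half-angle) is conserved through an ideal optic.
src_diameter_mm = 1.0        # assumed LED emitting width
src_half_angle_deg = 60.0    # assumed emission half-angle

invariant = src_diameter_mm * math.sin(math.radians(src_half_angle_deg))

for exit_diameter_mm in (5.0, 10.0, 20.0):   # candidate collimator exit apertures
    exit_half_angle = math.degrees(math.asin(invariant / exit_diameter_mm))
    print(f"{exit_diameter_mm:4.0f} mm exit aperture -> ~{exit_half_angle:.1f} deg half-angle")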

Another law of physics is that etendue can only be increased. This means that once the light is generated, the light rays can only become more random. Every optical element will hurt/increase etendue. Etendue is analogous to the second law of thermodynamics, which states that entropy can only increase.

Lambertian Emitters (Typical of LEDs/OLEDs)

LEDs and OLEDs used in displays tend to be “Lambertian emitters,” where the emitted intensity is proportional to the cosine of the angle from perpendicular to the surface. The figure on the right shows this for a single emitting point on the surface. A real LED/OLED will not be a single point but an area, so one can imagine a large set of these emitting points spread out two-dimensionally.
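
For reference, here is the cosine fall-off in a couple of lines of Python (Lambert’s cosine law, nothing specific to any particular LED/OLED):

import math

# Lambert's cosine law: emitted intensity falls off as cos(angle from the surface normal).
for angle_deg in (0, 30, 45, 60, 75, 90):
    rel = math.cos(math.radians(angle_deg))
    print(f"{angle_deg:2d} deg off-axis -> {rel*100:5.1f}% of the on-axis intensity")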

Square Law and Concentrating Light

It is very important to note that the diagram above shows only a side view. The light rays are spreading as a sphere, and nits are a measure of light per unit area on the surface of a sphere. If the linear spread is reduced by a factor of X, the nits will increase by X-squared.

Since for a near eye display the only light that “counts” is that which makes it into a person’s eye, there is a big potential gain in brightness that comes not from making the light source brighter but from reducing the angles of the light rays in the form of collimation.
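
Here is the square law in a trivial sketch (assumed starting luminance, ideal optics with no losses):

# If optics cut the linear spread of the light by a factor x, the same light
# lands on 1/x^2 of the area, so the nits go up by x^2 (ideal case, no losses).
base_nits = 500.0   # assumed starting luminance
for x in (1, 2, 4, 8):
    print(f"Reduce linear spread {x}x -> ~{base_nits * x * x:,.0f} nits")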

Collimating Light

Collimation is the process of getting light rays to be as parallel to each other as possible (within the laws of etendue). Collimation is required for projecting light (as with a projector), for making very high luminance (nits) near eye displays, and for getting light to work properly with a waveguide (waveguides require highly collimated light to work at all).

Shown below is the classic issue with collimating light. A light source is shown with its center point “2” and the two extreme points “1” and “3” at the left and right edges of a Lambertian emitter. There is a lens (in blue) trying to collimate the light, located at a distance equal to the focal length of the lens. Also shown is a reflector in dashed blue that is often used to capture and redirect the outermost rays that would bypass the lens.

The “B” figure shows what happens when 3 light rays (1a, 2a, and 3a) from the 3 points enter the lens at roughly the same place (indicated by the green circle). The lens can only perfectly collimate the center ray 2a, which becomes 2a’ (dashed line) and exits, along with all the other rays from point 2, perfectly parallel/collimated. Rays 1a and 3a have their angles reduced (consistent with the law of etendue, since the output area is larger than the source area) to become 1a’ and 3a’, but they are not perfectly parallel to ray 2a’ or to each other.

If the size of the light source were larger such that 1 and 3 are farther apart, the angles of rays 3a’ and 1a’ would be more severe and less collimated. Or if the light source were smaller, then the light would be more highly collimated. This illustrates how the emitting area can be traded for angular diversity by the laws of etendue.

Illuminating a Microdisplay (DLP or LCOS) Versus Self Emitting Display (OLED)

Very simply put, what we get conceptually by collimating a small light source (such as a set of small RGB LEDs) is a bundle of individually highly collimated light sources to illuminate each pixel of a reflective microdisplay like DLP or LCOS. The DLP or LCOS pixel mirrors then simply reflect light with the same characteristics, with some losses and scattering due to imperfections in the mirrors.

The big advantage in terms of intensity/nits for reflective microdisplays is that they separate the illumination process from the light modulation. They can take very bright and small LEDs and then highly collimate the light to further increase the nits. It is possible to get many tens of thousands of nits illuminating a reflective microdisplay.

An OLED microdisplay is self-emitting, and the light is Lambertian, which as shown above is somewhat diffuse. Typically an OLED microdisplay can emit only about 200 to at most 400 nits for long periods of time (some lab prototypes have claimed up to 5,000 nits, but this is unlikely for long periods of time). Going brighter for long periods of time will cause the OLED materials to degenerate/burn up.

With an OLED you are somewhat stuck with the type of light, Lambertian, as well as the amount of light. The optics have to preserve the image quality of the individual pixels. If you want to, say, collimate the Lambertian light, it would have to be done on the individual pixels with miniature optics directly on top of each pixel (say, a microlens-like array) to have a small spot size (pixel) to collimate. I have heard several people theorize this might be possible, but I have not seen it done.

Next Time Optical Flow

Next time I plan to build on these concepts to lay out the “optical flow” for a see-through (AR) microdisplay headset. I will also discuss some of the issues/requirements.

 

Mira Prism and Dreamworld AR – (What Disney Should Have Done?)

That Was Fast – Two “Bug-Eye” Headsets

A few days ago I published a story on the Disney-Lenovo optics and wondered why they didn’t use much simpler “bug-eye” combiner optics similar to the Meta-2 (below right), which currently sells in a development kit version for $949. It turns out that the very same day, Mira announced their Prism Headset, which is a totally passive headset with a mount for a phone and bug-eye combiners, with a “presale price” of $99 (proposed retail $150). Furthermore, in looking into what Mira was doing, I discovered that back on May 9th, 2017, DreamWorld announced their “DreamGlass” headset using bug-eye combiners that also includes tracking electronics and is supposed to cost “under $350” (see the Appendix for a note on a lawsuit between DreamWorld and Meta).

The way both of these work (Mira’s is shown on the left) is that the cell phone produces two small images, one for each eye, that reflect off the two curved semi-mirror combiners that are joined together. The combiners reflect part of the phone’s light and move the focus of the image out in space (because otherwise a human could not focus so close).

Real or Not?: Yes Mira, Not Yet Dreamworld

Mira has definitely built production-quality headsets, as there are multiple reports of people trying them on and independent pictures of the headset, which looks to be close to, if not, a finished product.

DreamWorld had not demonstrated a fully functional prototype, at least as of their May 9th announcement, per Upload’s article. What may appear to be “pictures” of the headset are 3-D renderings. Quoting Upload:

“Dreamworld’s inaugural AR headset is being called the Dreamworld Glass. UploadVR recently had the chance to try it out at the company’s offices but we were not allowed to take photos, nor did representatives provide us with photographs of the unit for this story.

The Glass we demoed came in two form factors. The first was a smaller, lighter model that was used primarily to show off the headset’s large field of view and basic head tracking. The second was significantly larger and was outfitted with “over the counter” depth sensors and cameras to achieve basic positional tracking. “

The bottom line here is that Mira’s appears nearly ready to ship, whereas DreamWorld still has a lot of work left to do and at this point is more of a concept than a product.

DreamWorld’s “Shot Directly From DreamWorld’s AR Glass” videos were shot through a combiner, but it may or may not be through their production combiner configured with the phone in the same place as in the production design.

I believe the views shown in the Mira videos are real, but they are, of course, shooting separately the people in the videos wearing the headset and what the image looks like through the headset. I will get into one significant problem I found with Mira’s videos/design later (see the “Mira Prism’s Mechanical Interference” section below).

DreamWorld Versus Mira Optical Comparison

While both DreamWorld and Mira have similar optical designs, on closer inspection it is clear that there is a very different angle between the cell phone display and the combiners (see left). DreamWorld has the cell phone display nearly perpendicular to the combiner, whereas Mira has the cell phone display nearly parallel to it. This difference in angle means that there will be more inherent optical distortion in the DreamWorld design, whereas the Mira design has the phone more in the way of the person’s vision, particularly if they wear glasses (once again, see the “Mira Prism’s Mechanical Interference” section below).

See-Through Trade-offs of AR

Almost all see-through designs waste most of the display’s light in combining the image with the real-world light. Most designs lose 80% to 95% (sometimes more) of the display’s light. This in turn means you want to start with a display 20 to as much as 100 times (for outdoor use) the brightness of a cell phone. So even an “efficient” optical design has serious brightness problems when starting with a cell phone display (sorry, this is just a fact). There are some tricks to avoid these losses, but not if you are starting with the light from a cell phone’s display (broad spectrum and very diffuse).
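
To give a feel for the numbers, below is a rough brightness-budget sketch; every value in it is an assumption for illustration (display nits, optical efficiency, combiner transmission, and ambient light all vary widely by design and environment):

# Rough see-through AR brightness budget (every number here is an assumption).
display_nits = 600.0         # phone-class peak luminance
optics_efficiency = 0.15     # ~85% of the display light lost in the combiner path
combiner_transmission = 0.7  # fraction of real-world light reaching the eye
dominance = 2.0              # image should be ~2x the seen background to look solid

print(f"Virtual image from a phone: ~{display_nits * optics_efficiency:.0f} nits")
for label, ambient_nits in (("indoor room", 250.0), ("bright outdoors", 5000.0)):
    background = ambient_nits * combiner_transmission
    needed = dominance * background / optics_efficiency
    print(f"{label}: display needs ~{needed:,.0f} nits (~{needed / display_nits:.0f}x a phone)")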

One thing I was very critical of last time with the Disney-Lenovo headset was that it appeared to be blocking about 75 to 80% of the ambient/real-world light, which is equivalent to dark sunglasses. I don’t think any reasonable person would find blocking this much light acceptable for something claiming to be a “see-through” display.

From several pictures I have of Mira’s prototype, I very roughly calculated that they are about 70% transparent (light to medium dark sunglasses), which means they in turn are throwing away 70+% of the cell phone’s light. One of the images from Mira’s videos is shown below. I have outlined with a dashed line the approximate active FOV (the picture cuts it off on the bottom), which Mira claims covers about 60 degrees, and you can see the edge of the combiner lens (indicated by the arrows).

What is important to notice is that the images are somewhat faded and do not “dominate”/block out the real world. This appears true of all the through-the-optics images in Mira’s videos. The room, while not dark, is also not overly brightly lit. This is going to be a problem for any AR device using a cell phone as its display. With AR optics you are going to throw away a lot of the display’s light to support seeing through to the real world, and you have to compete with the light that is in the real world. You could turn the room lights out and/or look at black walls and tables, but then what is the point of being “see-through”?

I also captured a through-the-optics image from DreamWorld’s DreamGlass video (below). The first thing that jumps out at me is how dark the room looks and that they have a very dark table. So while the images may look more “solid” than in the Mira video, most of this is due to the lighting of the room.

Because the DreamWorld background is darker, we can also see some of the optical issues with the design. In particular you should notice the “glow” around the various large objects (indicated by red arrows). There is also a bit of a double image of the word “home” (indicated by the green arrow). I don’t have an equivalent dark scene from Mira so I can’t tell if they have similar issues.

Mira Prism’s Resolution

Mira (only) supports the iPhone 6/6s/7 size display and not the larger “Plus” iPhones which won’t fit. This gives them 1334 by 750 pixels to start with. The horizontal resolution first has to be split in half and then about 20% of the center is used to separate the two images and center the left and right views with respect to the person’s eye (this roughly 20% gap can be seen in Mira’s Video). This nets about (1334/2) X 80% = ~534 pixels horizontally. Vertically they may have slightly higher resolution of about 600 pixels.

Mira claims a FOV of “60 degrees,” and generally when a company does not specify whether it is horizontal, vertical, or diagonal, they mean diagonal because it is the bigger number. This would suggest that the horizontal FOV is about 40 degrees and the vertical is about 45 degrees. This nets out to a rather chunky 4.5 arcminutes/pixel (about the same as the Oculus Rift CV1 but with a narrower FOV). The “screen door effect” of seeing the boundaries between pixels is evident in Mira’s videos and should be noticeable when wearing the headset.
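
For those that want to check the arithmetic, here it is as a small sketch (the FOV split is an assumption based on treating the claimed 60 degrees as the diagonal):

import math

# Per-eye pixel estimate for an iPhone 6/6s/7 panel split left/right (from the article).
h_pixels, v_pixels = 1334, 750
pixels_per_eye_h = (h_pixels / 2) * 0.80   # ~20% of the width lost to the center gap
pixels_per_eye_v = 600                     # rough estimate

# Assume the claimed 60 degrees is diagonal and the FOV follows the pixel aspect ratio.
diag_fov_deg = 60.0
aspect = pixels_per_eye_h / pixels_per_eye_v
v_fov = diag_fov_deg / math.sqrt(1 + aspect ** 2)
h_fov = v_fov * aspect

arcmin_per_pixel = (h_fov * 60) / pixels_per_eye_h
print(f"~{pixels_per_eye_h:.0f} x {pixels_per_eye_v} pixels per eye")
print(f"~{h_fov:.0f} x {v_fov:.0f} degree FOV -> ~{arcmin_per_pixel:.1f} arcminutes/pixel")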

I’m not sure that supporting a bigger iPhone, as in the Plus-size models, would help. This design requires that the left and right images be centered over the eyes, which limits where the pixels in the display can be located. Additionally, a larger phone would cause more mechanical interference issues (such as with glasses, covered in the next section).

Mira Prism’s Mechanical Interference

A big problem with a simple bug-eye combiner design is the location of the display device. For the best image quality you want the phone right in front of the eye and as parallel as possible to the combiners. You can’t see through the phone so they have to move it above the eye and tilt it from parallel. The more they move the phone up and tilt it, the more it will distort the image.

If you look at the upper right (“A”) still frame from Mira’s video below, you will see that the phone is just slightly above the eyes. The bottom of the phone holder is touching the top of the person’s glasses (large arrow in frame A). The video suggests (see frames “B” and “C”) that the person is looking down at something in their hand. But as indicated by the red sight line I have drawn in frames A and B, the person would have to be looking largely below the combiner, and thus the image would at best be cut off (and not look like the image in frame C).

In fact, for the person with glasses in the video to see the whole image they would have to be looking up as indicated by the blue sight lines in frames A and B above. The still frame “D” shows how a person would look through the headset when not wearing glasses.

I can’t say whether this would be a problem for all types of glasses and head shapes, but it is certainly a problem that is demonstrated in Mira’s own video.

Mira’s design may be a bit too simple. I don’t see any adjustments other than the head band size. I don’t see any way to work around, say, running into a person’s glasses as happens above.

Cost To Build Mira’s Prism

Mira’s design is very simple. The combiner technology is well known and can be sourced readily. Theoretically, Mira’s Prism should cost about the same to make as a number of so-called “HUD” displays that use a cell phone as the display device and a (single) curved combiner, which sell for between $20 and $50 (example on right). BTW, these “HUDs” are useless in daylight, as a cell phone is just not bright enough. Mira needs a bit more complex combiner, hopefully of better quality than some of the so-called “HUDs,” so $99 is not totally out of line, and they should still be able to make them at a profit for $99.

Conclusions On Simple Bug-Eye Combiner Optics With A Phone

First, let me say I have discussed Mira’s Prism more than DreamWorld’s DreamGlass above because there is frankly more solid information on the Prism. DreamGlass seems to be more of a concept without tangible information.

The Mira headset is about as simple and inexpensive as one could make an AR see-through headset, assuming you can use a person’s smartphone. It does the minimum: enabling a person to focus on a phone that is so close and combining that image with the real world. Compared to, say, the Disney-Lenovo birdbath, it is going to make both the display and the real world more than 2X brighter. As Mira’s videos demonstrate, the images are still going to be ghostly and not very solid unless the room and/or background is pretty dark.

Simplicity has its downsides. The resolution is low and the image is going to be a bit distorted (which can be corrected somewhat by software at the expense of some resolution). The current design appears to have mechanical interference problems with wearing glasses. It’s not clear if the design can be adapted to accommodate glasses, as doing so would seem to move the whole optical design around and might necessitate a bigger headset and combiners. Fundamentally, a phone is not bright enough to support a good see-through display in even moderately lit environments.

I don’t mean to be overly critical of Mira’s Prism, as I think it is an interesting low-cost entry product, sort of the “Google Cardboard” of AR (it certainly makes more sense than the Disney-Lenovo headset that was just announced). I would think a lot of people will want to play around with the Mira Prism and find uses for it at the $99 price point. I would expect to see others copying its basic design. Still, the Mira Prism demonstrates many of the issues with making a low-cost see-through design.

DreamWorld’s DreamGlass on the surface makes much less sense to me. It should have all the optical limitations of the much less expensive Mira Prism. It is adding a lot of cost on top of a very limited display foundation using a smartphone’s display.

Appendix

Some History of Bug-Eye Optics

It should be noted that what I refer to as bug-eye combiner optics is an old concept. Per the picture on the left, taken from a 2005 Link/L3 paper, the concept goes back to at least 1988 using two CRTs as the displays. This paper includes a very interesting chart plotting the history of Link/L3 headsets (see below). Link’s legacy goes all the way back to airplane training simulators (famously used in World War II).

A major point of L3/Link’s later designs is that they used corrective optics between the display and the combiner to correct for the distortion caused by the off-axis relationship between the display and the combiner.

Meta and DreamWorld Lawsuit

The basic concept of dual large combiners in a headset is obviously an old idea (see above), but apparently Meta thinks that DreamWorld may have borrowed a bit too much without asking from the Meta-2. As reported in TechCrunch, “The lawsuit alleges that Zhong [Meta’s former Senior Optical Engineer] “shamelessly leveraged” his time at the company to “misappropriate confidential and trade secret information relating to Meta’s technologies”.

Addendum

Holokit AR

Aryzon AR

There are at least two other contenders for the title of “Google Cardboard of AR,” namely the Aryzon and Holokit, which both separate the job of the combiner from the focusing. Both put a Fresnel lens in between the phone and a flat semitransparent combiner. These designs are one step simpler/cheaper than Mira’s design (and use cardboard for the structure) but are more bulky with the phone hanging out. An advantage of these designs is that everything is “on-axis,” which means lower distortion, but they have chromatic aberration (color separation) issues with the inexpensive Fresnel lenses that Mira’s mirror design won’t have. There may also be some Fresnel lens artifact issues with these designs.

Disney-Lenovo AR Headset – (Part 1 Optics)

Disney Announced Joint AR Development At D23

Disney, at their D23 Fan Convention in Anaheim on July 15th, 2017, announced an Augmented Reality (AR) headset jointly developed with Lenovo. Below is a cropped and brightness-enhanced still frame captured from Disney’s “teaser” video.

Disney/Lenovo also released a video from an interview at the D23 convention which gave further details. As the interview showed (see right), the device is based on using a person’s cell phone as the display (similar to Google Cardboard and Samsung’s Gear VR).

Birdbath Optics

Based on analyzing the two videos plus some knowledge of optical systems, it is possible to figure out what they are doing in terms of the optical system. Below is a diagram of what I see them doing in terms of optics (you may want to open this in a separate window to view the figure during the discussion below).

All the visual evidence indicates that Disney/Lenovo is using a classical “birdbath” optical design (discussed in an article on March 03, 2017). The name “birdbath” comes from the use of a spherical semi-mirror with a beam splitter directing light into the mirror. Birdbath optics are used because they are relatively inexpensive, lightweight, support a wide field of view (FOV), and are “on-axis” for minimal distortion and focusing issues.

The key element of the birdbath is the curved mirror, which is (usually) the only “power” (focus changing) element. The beauty of mirror optics is that they have essentially zero chromatic aberrations, whereas it is difficult/expensive to reduce chromatic aberrations with lens optics.

The big drawbacks of birdbath optics are that they block a lot of light both from the display device and the real world, and that they cause double images from unwanted reflections of “waste” light. Both of these negative effects can be seen in the videos.

There would be no practical way (that I know of) to support a see-through display with a cell-phone-sized display using refractive (lens) optics such as are used with Google Cardboard or the Oculus Rift. The only practical ways I know of to support an AR/see-through display using a cell-phone-size display all use curved combiner/mirrors.

Major Components

Beam Splitter – The design uses a roughly 50/50 semi-mirror beam splitter which has a coating (typically aluminum alloy although it is often called “silver”) that lets about 50 percent of the light through while acting like a mirror for 50% of the light. Polarizing beam splitters would be problematic with using most phones and are much more expensive. You should note that the beam splitter is arranged to kick the image from the phone toward the curved combiner and away from the person’s eyes; thus light from the display is reflected and then has a transmissive pass.

Combiner – The combiner, a spherical semi-mirror, is the key to the optics and does multiple things. The combiner appears to also be about 50-50 transmissive/mirror. The curved mirror’s first job is to allow the user to focus on the phone’s display, which otherwise would be too close to the person’s eyes for comfortable focusing. The other job of the combiner is to combine the light/image of the “real world” with the display light; it does this with the semi-mirror allowing light from the image to reflect while light from the real world is transmitted toward the eye. The curved mirror only has a significant optical power (focus) effect on the reflected display light and causes very little distortion of the real world.

Clear Protective Shield

As best I can tell from the two videos, the shield is pretty much clear and serves no function other than to protect the rest of the optics.

Light Baffles Between Display Images

One thing seen in the picture at the top is a set of stepped-back light baffles to keep light cross-talk down between the eyes.

Light Loss (Follow the Red Path)

A huge downside of the birdbath design is the light loss, as illustrated in the diagram by the red arrow path, where the thicknesses of the arrows are roughly to scale with the relative amount of light. To keep things simple, I have assumed no other losses (there are typically 2% to 4% per surface).

Starting with 100% of the light leaving the phone display, about 50% of it goes through the beam splitter and is lost, while the other 50% is reflected to the combiner. The combiner is also about 50% mirrored (a rough assumption), and thus 25% (0.5 X 0.5) of the display’s light has its focus changed and is reflected back toward the beam splitter. About 25% of the light also goes through the combiner and causes the image you can see in the picture on the left. The beam splitter in turn allows 50% of the 25%, or only about 12.5%, of the light to pass toward the eye. Allowing for some practical losses, less than 10% of the light from the phone makes it to the eye.
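
The whole display-light path boils down to a few multiplications (same rough 50/50 assumptions as above; real coatings and surface losses would only make it worse). The real-world path computed here is also the basis for the “about 80% blocked” discussion below:

# Birdbath light budget (idealized 50/50 optics, no surface losses).
bs_reflect, bs_transmit = 0.50, 0.50   # beam splitter
cm_reflect, cm_transmit = 0.50, 0.50   # spherical combiner semi-mirror (assumed)

display_to_eye = bs_reflect * cm_reflect * bs_transmit   # reflect, reflect, then transmit
world_to_eye = cm_transmit * bs_transmit                 # transmit, transmit

print(f"Display light reaching the eye: ~{display_to_eye * 100:.1f}%")   # ~12.5%
print(f"Real-world light reaching the eye: ~{world_to_eye * 100:.0f}%")  # ~25%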

Double Images and Contrast Loss (Follow the Green Dash Path)

Another major problem with birdbath optics is that the lost light will bounce around and cause double images and losses in contrast. If you follow the green path, like the red path, about 50% of the light will be reflected and 50% will pass through the beamsplitter (not shown on the green path). Unfortunately, a small percentage of the light that is supposed to pass through will be reflected by the glass/plastic-to-air interface as it tries to exit the beamsplitter, as indicated by the green and red dashed lines (part of the red dashed line is obscured). This dashed path will end up causing a faint/ghost image that is offset by the thickness of the beamsplitter, which is tilted at 45 degrees. Depending on coatings, this ghost image could be from 1% to 5% of the brightness of the original image.

The image on the left is a crop from a still frame from the video Disney showed at the D23 conference, with red arrows I added pointing to double/ghost images (click here for the uncropped image). The demo Disney gave was on a light background, and these double images would be even more noticeable on a dark background. The same type of vertically offset double image could be seen in the Osterhout Design Group (ODG) R8 and R9 headsets, which also use a birdbath optical path (see figure on the right).

A general problem with the birdbath design is that there is a lot of light “rattling around” in the optical wedge formed by the display surface (in this case the phone), the beamsplitter, and the combiner mirror. Note in the diagram that about 12.5% of the light returning from the combiner mirror is reflected off the beam splitter and heads back toward the phone. This light is eventually going to hit the front glass of the phone, and while much of it will be absorbed by the phone, some of it is going to reflect back, hit the beam splitter, and eventually make it to the eye.

About 80% of the Real World Light Is Blocked

In several frames of the D23 interview video it was possible to see through the optics and make measurements of the relative brightness looking through versus around the optics. This measurement is only rough, and it helped to take it from several different images. The result was about a 4.5 to 5X difference in brightness looking through the optics.

Looking back at the blue/center line in the optical diagram, about 50% of the light is blocked by the partial-mirror combiner and then 50% of that light is blocked by the beam splitter for a net of 25%. With other practical losses, including the shield, this comes close to the roughly 80% (4/5ths) of the light being blocked.

Is A Cell Phone Bright Enough?

The ANSI/SMPTE 196M spec for movies recommends about 55 nits in a dark room. A cell phone typically has from 500 to 800 peak nits (see Displaymate’s Shootouts for objective measurements), but after about a 90% optical loss the image would be down to between about 50 and 80 nits, which is possibly just enough if the background/room is dark and could be acceptably bright in a moderately dark room. But if the room lights are on, this will be at best marginal, even after allowing for the headset blocking about 75 to 80% of the room light with the combiner and the beam splitter.
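
Putting rough numbers on it (peak nits vary by phone model, and the ~90% loss is the estimate from the light-loss section above):

# Virtual image brightness starting from a phone display (rough numbers from above).
optical_efficiency = 0.10                 # ~90% loss through the birdbath path
for phone_nits in (500, 800):
    print(f"{phone_nits} nit phone -> ~{phone_nits * optical_efficiency:.0f} nit image "
          "(SMPTE dark-room movie target is ~55 nits)")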

With AR you are not just looking at a blank wall. To make something look “solid”/non-transparent, the display image needs to “dominate” by being at least 2X brighter than anything behind it. It becomes even more questionable that there is enough brightness unless there is not a lot of ambient light (or everything in the background is dark colored or the room lights are very dim).

Note, an LCOS or DLP based see-through AR system can start with about 10 to 30 times or more the brightness (nits) of a cell phone. They do this so they can work in a variety of light conditions after all the other light losses in the system.

Alternative Optical Solution – Meta-2 “Type”

Using a large display like a cell phone rather than a microdisplay severely limits the optical choices for a see-through display. Refractive (lens) optics, for example, would be huge and expensive, and Fresnel optics come with their own optical issues.

Meta-2 “Bug-Eye” Combiners

The most obvious alternative to the birdbath would be to go with dual large combiners such as the Meta-2 approach (see left). When I first saw the Disney-Lenovo design, I even thought it might be using the Meta-2 approach (disproven on closer inspection). With the Meta-2, the beam splitter is eliminated and two much larger curved semi-mirror combiners (giving a “bug-eye” look) have a direct path to the display. Still, the bug-eyed combiner is not that much larger than the shield on the Disney-Lenovo system. Immediately, you should notice how the user’s eyes are visible, which shows how much more light is getting through.

Because there is no beamsplitter, the Meta-2 design is much more optically efficient. Rough measurements from pictures suggest the Meta-2’s combiners pass about 60% and thus reflect about 40%. This means that with the same display, it would make the display appear 3 to 4 times brighter while allowing about 2.5X the real-world light through compared with the Disney-Lenovo birdbath design.
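
Comparing the two approaches with the rough reflect/transmit estimates used in this article (the 60/40 split is my estimate from pictures, not a Meta specification):

# Meta-2-style single combiner vs. the birdbath estimate (all numbers are rough estimates).
meta_reflect, meta_transmit = 0.40, 0.60   # estimated from pictures
birdbath_display_to_eye = 0.125            # from the earlier birdbath light budget
birdbath_world_to_eye = 0.25

print(f"Display brightness advantage: ~{meta_reflect / birdbath_display_to_eye:.1f}x")
print(f"Real-world brightness advantage: ~{meta_transmit / birdbath_world_to_eye:.1f}x")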

I have not tested a Meta-2, nor have I read any serious technical evaluation (just the usual “ooh-wow” articles), and I have some concerns with the Meta design. The Meta-2 is “off-axis” in that the display is not perfectly perpendicular to the combiner. One of the virtues of the birdbath is that it results in a straightforward on-axis design. With the off-axis design, I wonder how well the focus distance is controlled across the FOV.

Also, the Meta-2 combiners are so far from the eyes that a person’s two eyes would have optical cross-talk (there is nothing to keep one eye from seeing what the other eye is seeing, such as the baffles in the Disney-Lenovo design). I don’t know how this would affect things in stereo use, but I would be concerned.

In terms of simple image quality, I would think it would favor the single bug-eye style combiner. There are no secondary reflections caused by a beamsplitter, and both the display and the real world would be significantly brighter. In terms of cost, I see pros and cons relative to each design and overall not a huge difference, assuming both designs started with a cell phone display. In terms of weight, I don’t see much of a difference either.

Conclusions

To begin with, I would not expect even good image quality out of a phone-as-a-display AR headset. Even totally purpose-built AR displays have their problems. Making a device “see-through” generally makes everything more difficult/expensive.

The optical design has to be compromised right from the start to support both LCD and OLED phones that could have different sizes. Making matters worse is the birdbath design with its huge light losses. Add to this the inherent reflections in the birdbath design and I don’t have high hopes for the image quality.

It seems to me a very heavy “lift” even for the Disney and Star Wars brands. We don’t have any details as to the image tracking and room tracking, but I would expect that, like the optics, it will be done on the cheap. I have no inside knowledge, but it almost looks to me like the solution was designed around supporting the Jedi light saber shown in the teaser video (right). They need the see-through aspect so the user can see the light saber. But making the headset see-through is a long way to go to support the saber.

BTW, I’m a big Disney fan from way back (I have been to the Disney parks around the world multiple times, attended D23 conventions, eaten at Club 33, was a member of the “Advisory Council” in 1999-2000, own over 100 books on Disney, and own one of the largest 1960s-era Disneyland Schuco monorail collections in the world). I have an understanding and appreciation of Disney fandom, so this is not a knock on Disney in general.

Varjo Foveated Display Part 2 – Region Sizes

Introduction

As discussed in Part 1, the basic concept of a foveated display in theory should work to provide high angular resolution with a wide FOV. There is no single display technology for near-to-eye displays today that does both. Microdisplays (LCOS, DLP, and OLED) support high angular resolution but not wide FOV, and larger flat panel displays (OLED and LCD) support wide FOV but with low angular resolution.

The image above left includes crops from the picture on Varjo’s web site called “VR Scene Detail” (toward the end of this article is the whole annotated image). Varjo included both the foveated and un-foveated image from the center of the display. The top rectangle in red is taken from the top edge of the picture, where we can just see the transition starting from the foveated image to what Varjo calls the “context” or lower resolution image. Blending is used to avoid an abrupt transition that the eye might notice.

The topic of foveation gathered additional interest with Apple’s acquisition of the eye tracking technology company SMI, which provided the eye tracking technology for Nvidia’s foveated rendering HMD study (see below). It is not clear at this time why Apple bought SMI; it could be for foveated rendering (f-rendering) and/or foveated display (f-display).

Static Visual Acuity

The common human visual acuity charts (right) give some feel for why foveation (f-rendering and/or f-display) works. But these graphs are for static images of high contrast black and white line pairs. We commonly talk about a person normally seeing down to 1 arcminute per pixel (300 dpi at about 10 inches) as being good, but people can detect down to about 1/2 arcminute, and with a long single high contrast line, down to about 1/4th of an arcminute. The point here is to understand that these graphs are a one-dimensional slice of a multi-dimensional issue.

For reference, Varjo’s high resolution display has slightly less than 1 arcminute/pixel, and the context display in their prototype has about 4.7 arcminutes/pixel. More importantly, their high resolution display covers about 20 degrees horizontally and 15 degrees vertically, and this is within the range where people could see errors if they are high in contrast, based on the visual acuity graphs.
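
For reference, here is how those numbers relate (using only the figures quoted in this article; Varjo has not published detailed specifications):

# Angular resolutions quoted in this article, expressed both ways.
foveated_pixels_per_degree = 70     # Varjo's claim for the foveated display
context_arcmin_per_pixel = 4.7      # Oculus-class context display

print(f"Foveated: ~{60 / foveated_pixels_per_degree:.2f} arcminutes/pixel")
print(f"Context:  ~{context_arcmin_per_pixel} arcminutes/pixel "
      f"(~{60 / context_arcmin_per_pixel:.1f} pixels/degree)")
print(f"Linear resolution ratio: ~{context_arcmin_per_pixel / (60 / foveated_pixels_per_degree):.1f}x")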

Varjo will be blending to reduce the contrast difference and thus make the transition less noticeable. But on the negative side, with any movement of the eyes, the image on the foveated display will change and the visual system tends to amplify any movement/change.

Foveated Rendering Studies

F-rendering varies the detail/resolution/quality/processing based on where the eyes are looking. This is seen as key in not only reducing the computing requirement but also saving power consumption. F-rendering has been proven to work in many human studies, including those done as part of Microsoft’s 2012 and Nvidia’s 2016 papers. F-rendering becomes ever more important as resolution increases.

F-rendering uses a single high resolution display and changes the level of rendering detail. It then uses blending between the various detail levels to avoid abrupt changes that the eye would detect. As the Microsoft and Nvidia papers point out, the eye is particularly sensitive to changes/movement.

In the case of the often-cited Microsoft 2012 paper, they used 3 levels of detail with two “blend masks” between them, as illustrated in their paper (see right). This gave them a very gradual and wide transition, but 3 resolution levels with wide bands of transition are “luxuries” that Varjo can’t have. Varjo only has two possible levels of detail and, as will be shown, they can only afford a narrow transition/blend region. The Microsoft 2012 study also used only a 1920×1080 monitor with a lower resolution central region than Varjo (about half the resolution) and blending regions that are so broad that they would be totally impractical for an f-display.

Nvidia’s 2016 study (which cites Microsoft 2012) simplified to two levels of detail, fovea and periphery, with sampling factors of 1 and 4 and a simpler linear blending between the two detail levels. Unfortunately, most of Nvidia’s study was done with a very low angular resolution Oculus headset display of about 4.7 arcminutes/pixel with a little over 1,000 by 1,000 pixels per eye, the same display Varjo uses for the low resolution part of their image. Most of the graphs and discussion in the paper are with respect to this low angular resolution headset.

Nvidia 2016 also did some study of a 27″ (diagonal) 2560×1440 monitor with the user 81cm away, resulting in an angular resolution of about 1 arcminute and a horizontal FOV of 40 degrees, which would be more applicable to Varjo’s case. Unfortunately, as the paper states of their user study, “We only evaluate the HMD setup, since the primary goal of our desktop study in Section 3.2 was to confirm our hypothesis for a higher density display.” The only clue they give for the higher resolution system is that, “We set the central foveal radius for this setup to 7.5°.” There was no discussion I could find of how they set the size of the blend region, so it is only a data point.
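
For illustration only, the kind of simple two-level linear blend Nvidia 2016 describes can be sketched as a per-pixel weight that ramps from the fovea image to the periphery image over a narrow band (this is a generic sketch with assumed radius and band width, not Nvidia’s or Varjo’s actual code):

def blend_weight(angle_deg, fovea_radius_deg=7.5, blend_width_deg=2.5):
    """Weight of the high-detail (fovea) image at a given angle from the gaze point.

    1.0 inside the fovea, 0.0 in the periphery, with a linear ramp in between.
    The radius and band width here are illustrative assumptions only.
    """
    if angle_deg <= fovea_radius_deg:
        return 1.0
    if angle_deg >= fovea_radius_deg + blend_width_deg:
        return 0.0
    return 1.0 - (angle_deg - fovea_radius_deg) / blend_width_deg

def blended_sample(fovea_sample, periphery_sample, angle_deg):
    w = blend_weight(angle_deg)
    return w * fovea_sample + (1.0 - w) * periphery_sample

for a in (0, 7, 8, 9, 10, 12):
    print(f"{a:2d} deg from gaze -> fovea weight {blend_weight(a):.2f}")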

Comment/Request: I looked around for a study that would be more applicable to Varjo’s case. I was expecting to find a foveated rendering study using say a 4K (3840×2160) television which would support 1 arcminute for 64 by 36 degrees but I did not find it. If you know of such a study let me know.

Foveated Rendering is Much Easier Than Foveated Display

Even if we had an f-rendering study of a ~1-arcminute peak resolution system, it would still only give us some insight into the f-display issues. F-rendering, while conceptually similar and likely required to support an f-display, is significantly simpler.

With f-rendering, everything is mathematical beyond the detection of the eye movement. The sizes of the high resolution region, the lower resolution region(s), and the blend region(s) can be arbitrary to reduce detection and can even be dynamic based on content. The alignment between resolutions is perfectly registered. The color and contrast between resolutions are identical. The rendering of the high resolution area does not have to be scaled/re-sampled to match the background.

Things are much tougher for an f-display, as there are two physically different displays and the high resolution display has to be optically aligned/moved based on the movement of the eye. The alignment of the display resolutions is limited by the optics’ ability to move the apparent location of the high resolution part of the image. There is likely to be some vibration/movement even when aligned. The potential size of the high resolution region as well as the size of the transition region is limited by the size/cost of the microdisplay used. There can be only a single transition. The brightness, color, and contrast will be different between the two physically different displays (even if both are, say, OLED, the brightness and colors will not be exactly the same). Additionally, the high resolution display’s image will have to be remapped after any optical distortion to match the context/peripheral image; this will both reduce the effective resolution and introduce movement into the highest resolvable (by the eye) part of the FOV as the foveated display tracks the eye on what otherwise should be, say, a stationary image.

When asked, Varjo has said that they have more capable systems in the lab than the fixed f-display prototype they are showing. But they stopped short of saying whether they have a full-up running system, and they have provided no results of any human studies.

The bottom line here is that there are many more potential issues with an f-display that could prove to be very hard, if not practically impossible, to solve. A major problem is getting the high resolution image to optically move and stop without the eye noticing it. It is impossible to fully understand how well it will work without a full-blown working system and a study with humans and a wide variety of content and user conditions, including the user moving their head and the reaction of the display and optics.

Varjo’s Current Demo

Varjo is currently demoing a proof-of-concept system with the foveated/high-resolution image fixed and not tracking the center of vision. The diagram below shows the 100 by 100 degree FOV of the current Varjo demonstration system. For the moment at least, let’s assume their next step will be to have a version of this where the center/foveated image moves.

Shown in the figure above is roughly the size of the foveated display region (green rectangle), which covers about 27.4 by 15.4 degrees. The dashed red rectangle shows the area covered by the pictures provided by Varjo, which does not even fully cover the foveated area (in the pictures they just show the start of the transition/blending from high to low resolution).

Also shown is a dashed blue circle with the 7.5 degree “central foveal radius” (15 degree diameter) of the Nvidia 2016 high angular resolution system. It is interesting that it is pretty close to the angle covered vertically by the Varjo display.

Will It Be Better Than A Non-Foveated Display (Assuming Very Good Eye Tracking)?

Varjo’s foveated display should appear to the human eye as having much higher resolution than a non-foveated display with the same resolution as Varjo’s context/periphery display. It is certainly going to work well when totally stationary (as in Varjo’s demo system).

My major concern comes (and something that can’t be tested without a full blown system) when everything moves. The evidence above suggests that there may be visible moving noise at the boundaries of the foveated and context image.

Some of the factors that could affect the results:

  1. Size of the foveated/central image. Making this bigger would move the transition further out. This could be done optically or with a bigger device. Doing it optically could be expensive/difficult and using a larger device could be very expensive.
  2. The size of the transition/blur between the high and low resolution regions. It might be worth losing some of the higher resolution to get a smoother transition. From what I can tell, Varjo has a small transition/blend region compared to the f-rendering systems.
  3. The accuracy of the tracking and placement of the foveated image. In particular how accurately they can optically move the image. I wonder how well this will work in practice and will it have problems with head movement causing vibration.
  4. How fast they can move the foveated image and have it be totally still while displaying.

A Few Comments About Re-sampling of the Foveated Image

One should also note that the moving foveated image will by necessity have to be mapped onto the stationary low resolution image. Assuming the rendering pipeline first generates a rectangular-coordinate image and then re-samples it to adjust for the placement and optical distortion of the foveated image, the net effective resolution will be about half that of the “native” display due to the re-sampling.

In theory, this re-sampling loss could be avoided/reduced by computing the high resolution image with the foveated image already remapped, but with “conventional” pipelines this would add a lot of complexity. This type of display would likely, in the long run, be used in combination with foveated rendering, where this may not add too much more to the pipeline (just something to deal with the distortion).
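
As a tiny, generic illustration of why re-sampling costs resolution (this is not Varjo’s pipeline): shifting the finest pattern a display can show by a half pixel with linear interpolation wipes the detail out completely, and typical sub-pixel offsets land somewhere in between, which is where the “roughly half” rule of thumb comes from:

# The finest detail a display can show: alternating full-on/full-off pixels.
src = [0.0, 1.0] * 8

def shift_linear(pixels, offset):
    """Re-sample the row at a sub-pixel offset using linear interpolation."""
    return [(1 - offset) * pixels[i] + offset * pixels[i + 1]
            for i in range(len(pixels) - 1)]

for offset in (0.0, 0.25, 0.5):
    row = shift_linear(src, offset)
    contrast = max(row) - min(row)   # how much pixel-level detail survives
    print(f"offset {offset:.2f} px -> remaining contrast {contrast:.2f}")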

Annotated Varjo Image

First, I want to compliment Varjo for putting actual through-the-optics high resolution images on their website (note, click on their “Full size JPG version”). By Varjo’s own admission, these pictures were taken crudely with a consumer camera, so the image quality is worse than you would see looking into the optics directly. In particular, there are chromatic aberrations clearly visible in the full size image that are likely caused by the camera and how it was used and are not necessarily a problem with Varjo’s optics. If you click on the image below, it will bring up the full size image (over 4,000 by 4,000 pixels and about 4.5 megabytes) in a new tab.

If you look at the green rectangle, it corresponds to the size of the foveated image shown by the green rectangle in the prior diagram of the whole 100 by 100 degree FOV.

You should be able to clearly see the transition/blending starting at the top and bottom of the foveated image (see also right). The end of the blending is cut off in the picture.

The angles given in the figure were calculated based on the known pixel size of the Oculus CV1 display (its pixels are clearly visible in the non-foveated picture). For the “foveated display” (green rectangle) I used Varjo’s statement that it is at least 70 pixels/degree (but I suspect not much more than that either).

Next Time On Foveated Displays (Part 3)

Next time on this topic, I plan on discussing how f-displays may or may not compete in the future with higher resolution single displays.

Texas Instruments 99/4A and TMS9918 History

A little break from displays today to go back into my deep dark history. For my first 20 years in the industry, I was an I.C. designer and led the architecture of a number of CPUs and graphics devices.

I got a “shout out” of sorts in an IEEE article on the 99/4 computer by Wally Rhines, CEO of Mentor, about my work on the TMS9918 graphics unit, which was my first design (started in 1977). Contrary to what the article states, I was NOT the only designer; back then it took 7 “whole engineers” (quite a few less than today) to design a graphics chip, and I was the youngest person on the program. I think the 9918 took less than 1 year from raw concept to chip. Wally gave things from his perspective as a high-level manager, and he may be off in some details.

The 9918 coined the word “Sprites” and was used in the TI 99/4A, ColecoVision, and the MSX computer in Japan. It was the first consumer chip to directly interface to DRAMs (I came up with the drive scheme). Pete Macourek and I figured out how to make the sprites work, and then I did all the sprite logic and control design.

A “Z80-like” register file compatible superset clone of the 9918 was used in both the Nintendo (Nintendo was a software developer for Coleco) and Sega Game systems among others.

After working on the TMS9918, I led the architecture and early logic design of the TMS9995 (which resulted in my spending 6 months in Bedford, England), which is also mentioned in Wally’s article. If the TI Home Computer had not been cancelled, I would have had a major part in the design of both the CPU and the graphics chip on the 99/8 and 99/2.

Back in 1992, in the days of BBS bulletin boards, I was interviewed about the home computer. This was only about 10 years after the events, so they were fresher in my mind. At the time of the 1992 interview, I was working on the first fully programmable media processor (and alluded to it in the interview), which integrated 4 DSP CPUs and a RISC processor on a single device (called the TMS320C80 or MVP). Another “little thing” that came out of that program was the Synchronous DRAM. You see, I had designed the DRAM interface on the 9918 and the TMS340 graphics processor family, had worked on the Video DRAM (predecessor of today’s graphics DRAMs), and was tired of screwing with the analog interface of DRAMs; so in a nutshell, I worked with TI’s memory group to define the first SDRAM (one of the patents can be found here). The 320C80 was the first processor to directly interface with SDRAM because it was co-designed with it.

For anyone interested, I wrote some more about my TI Home Computer and 9918 history back in the early days of this blog in 2011.

Varjo Foveated Display (Part 1)

Introduction

The startup Varjo recently announced their Foveated Display (FD) technology and did a large number of interviews with the technical press about it. I’m going to break this article into multiple parts; as currently planned, the first part will discuss the concept and the need for it, and part 2 will discuss how well I think it will work.

How It Is Supposed to Work

Varjo’s basic concept is relatively simple (see figure at left – click on it to pop it out). Varjo optically combines an OLED microdisplay with small pixels to give high angular resolution over a small area (what they call the “foveated display“) with a larger OLED display that gives low angular resolution over a large area (what they call the “context display“). By eye tracking (not done in the current prototype), the foveated display is optically moved to the center of the person’s vision by tilting the beam splitter. Varjo says they have thought of, and are patenting, other ways of optically combining and moving the foveated image besides a beam splitter.

The beam splitter is likely just a partially silvered mirror. It could be 50/50 or some other ratio to match the brightness of the large OLED and the microdisplay OLED. This type of combining is very old and well understood. They will likely blend/fade-in the image in the rectangular border where the two display images meet.

The figure above is based on a sketch by Urho Konttori, CEO of Varjo, in a video interview with Robert Scoble, combined with pictures of the prototype in Ubergismo (see below), plus answers to some questions I posed to Varjo. It is roughly drawn to scale based on the available information. The only thing I am not sure about is the “microdisplay lens,” which was shown but not described in the Scoble interview. This lens (or lenses) may or may not be necessary depending on the distance of the microdisplay from the beam combiner, and could be used to help make the microdisplay pixels appear smaller or larger. If the optical path through the beam combiner to the large OLED (in the prototype, from an Oculus headset) equals the path to the microdisplay via reflection off the combiner, then the microdisplay lens would not be necessary. Based on my scale drawing and looking at the prototype photographs, it would be close to not needing the lens.

Varjo is likely using either an eMagin OLED microdisplay with a 9.3 micron pixel pitch or a Sony OLED microdisplay with an 8.7 micron pixel pitch. The Oculus headset OLED has a ~55.7 micron pixel pitch. It does not look from the configuration like the microdisplay image will be magnified or shrunk significantly relative to the larger OLED. Making this assumption, the microdisplay pixels are about 55.7/9 = ~6.2 times smaller linearly, which means effectively ~38 times the pixels per unit area compared with the large OLED alone.
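For those who want to check the arithmetic, here is a minimal Python sketch of the comparison above, using the pixel pitches quoted in this article (the ~9µm figure splits the difference between the eMagin and Sony pitches, since the exact microdisplay is not confirmed):

```python
# Rough linear and area pixel-density comparison between the Oculus CV1
# panel and a typical OLED microdisplay (pitches as quoted in this article).
oculus_pitch_um = 55.7   # Oculus CV1 OLED pixel pitch
micro_pitch_um  = 9.0    # ~eMagin 9.3um / Sony 8.7um, split the difference

linear_ratio = oculus_pitch_um / micro_pitch_um   # ~6.2x smaller linearly
area_ratio   = linear_ratio ** 2                  # ~38x more pixels per unit area

print(f"Linear ratio: {linear_ratio:.1f}x, area ratio: {area_ratio:.0f}x")
```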

The good thing about this configuration is that it is very simple and straightforward, and it is a classically simple way to combine two images, at least the way it looks. But the devil is often in the details, particularly in what the prototype is not doing.

Current Varjo Prototype Does Not Track the Eye

The Varjo “prototype” (picture at left is from Ubergismo) is more of a concept demonstrator in that it does not demonstrate moving the high resolution image with eye tracking. The current unit is based on a modified Oculus headset (obvious from the picture; see the red oval I added). They are using the two larger Oculus OLED displays for the context (wide FOV) image and have added an OLED microdisplay per eye for the foveated display. In this prototype, a static beam splitter combines the two images, so the location of the high resolution part of the image is fixed and requires that the user look straight ahead to get the foveated effect. While eye tracking is well understood, it is not clear how successfully they can make the high resolution inset image track the eye and whether a human will notice the boundary (I will save the rest of this discussion for part 2).

Foveated Displays Raison D’être

Near eye display resolution is improving at a very slow rate and is unlikely to dramatically improve. People applying “Moore’s Law” to display devices are simply either dishonest or don’t understand the problems. Microdisplays (on I.C.s) are already being limited by the physics of diffraction as their pixels (or color sub-pixels) get within 5 times the wavelength of visible light. Making microdisplays bigger to support more pixels drives the cost up dramatically, and this is not rapidly improving; thus high resolution microdisplays are and will remain very expensive.

Direct view display technologies, while they have become very good at making large high resolution displays, cannot be made small enough for lightweight head-mounted displays with high angular resolution. As I discussed in the Gap in Pixel Sizes article (and for reference, I have included the chart from that article), which I published before I heard of Varjo, microdisplays enable high angular resolution but small FOV, while adapted direct view displays support low angular resolution with a wide FOV. I was already planning on explaining why foveated displays are the only way in the foreseeable future to support high angular resolution with a wide FOV, so from my perspective, Varjo’s announcement was timely.

Foveated Displays In Theory Should Work

It is well known that the human eye’s resolution falls off considerably from the high resolution fovea/center vision to the peripheral vision (see the typical graph at right). I should caution that this is for a still image and that the human visual system is not this simple; in particular, it has a sensitivity to motion that this graph can’t capture.

It has been well proven by many research groups that if you can track the eye and provide variable resolution, the eye cannot tell the difference from a uniformly high resolution display (a search for “foveated” will turn up many references and videos). The primary use today is foveated rendering to greatly reduce the computational requirements of a VR environment.

Varjo is trying to exploit the same foveated effect to give effectively very high resolution from two (per eye) much lower resolution displays. In theory it could work, but will it in practice?  In fact, the idea of a “foveated display” is not new. Magic Leap discussed it in their patents with a fiber scanning display. Personally, the idea seems to come up a lot in “casual discussions” on the limits of display resolution. The key question becomes: Is Varjo’s approach going to be practical and will it work well?

Obvious Issues With Varjo’s Foveated Display

The main lens (nearest the eye) is designed to bring the large OLED into focus, like most of today’s VR headsets. The first obvious issue is that the foveated display requires the lens to resolve pixels that are more than 6 times smaller than what a typical VR headset lens is designed for. Typical VR headset lenses are, well . . ., cheap crap with horrible image quality. To some degree, they are deliberately blurry/bad to try and hide the screen door effect of the highly magnified large display. But the Varjo headset would need vastly better, much more expensive, and likely larger and heavier optics for the foveated display; for example, instead of using a simple cheap plastic lens, they may need multiple elements (multiple lenses), perhaps made of glass.

The next issue is the tilting combiner and the way it moves the image. For simple up/down movement, the foveated display’s image will follow a simple up/down path, but if the 45 degree mirror tilts side to side, the center of the image will follow an elliptical path and the image will rotate, making it more difficult to align with the context image.
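To illustrate why the two tilt directions behave so differently, here is a small geometric sketch using an idealized 45-degree planar combiner and a single chief ray; the coordinate layout (eye along +z, microdisplay firing straight down) is my own assumption, not Varjo’s actual geometry. It shows that an up/down tilt just moves the image center vertically (by twice the tilt angle) with no rotation, while a side-to-side tilt both swings the center horizontally and rolls the image by the tilt angle, which is what makes it harder to register against the context image:

```python
import numpy as np

def rot(axis, ang):
    """Rotation matrix about a unit axis by ang (radians), Rodrigues' formula."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(ang) * K + (1 - np.cos(ang)) * (K @ K)

def reflect(v, n):
    """Reflect a direction vector v in a plane with unit normal n."""
    return v - 2 * np.dot(v, n) * n

# Idealized geometry: eye looks along +z, context display at +z,
# microdisplay above the combiner emitting straight down (-y).
n0     = np.array([0.0, 1.0, -1.0]) / np.sqrt(2)  # combiner normal, nominal 45 deg
chief  = np.array([0.0, -1.0, 0.0])               # chief ray from microdisplay center
disp_h = np.array([1.0, 0.0, 0.0])                # microdisplay horizontal axis

for name, axis in [("up/down tilt (about x)", [1, 0, 0]),
                   ("side-to-side tilt (about y)", [0, 1, 0])]:
    n = rot(axis, np.radians(2.0)) @ n0   # tilt the combiner by 2 degrees
    d = reflect(chief, n)                 # where the image center now goes
    h = reflect(disp_h, n)                # where the image's horizontal axis goes
    # Screen axes as seen by the eye looking back along -d:
    w = -d / np.linalg.norm(d)
    scr_h = np.cross([0, 1, 0], w); scr_h = scr_h / np.linalg.norm(scr_h)
    scr_v = np.cross(w, scr_h)
    roll = np.degrees(np.arctan2(np.dot(h, scr_v), np.dot(h, scr_h)))
    print(f"{name}: image center direction {np.round(d, 3)}, image roll {roll:+.2f} deg")
```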

I would also be very concerned about the focus of the image as the mirror tilts through its range, as the path length from the microdisplay to the main optics changes both at the center (which might be fixable by complex movement of the beam splitter) and at the corners (which may be much more difficult to solve).

Then there is the general issue of whether the user will be able to detect the blend point between the foveated and context displays. They have to map the rotated foveated image to match the context display, which will lose (per Nyquist re-sampling) about 1/2 the resolution of the foveated image. While they will likely try to cross-fade between the foveated and context displays, I am concerned (to be addressed in more detail in part 2) that the boundary will be visible/human-detectable, particularly when things move (the eye is very sensitive to movement).
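As an illustration only (Varjo has not described their blending), a cross-fade amounts to a weight map that ramps the foveated image down to zero over a band of a degree or two; the region sizes below are made-up numbers purely for the sketch:

```python
def blend_weight(x, y, inner, outer):
    """Weight for the foveated image: 1 inside `inner`, 0 outside `outer`,
    with a linear ramp in between (x, y, and rectangles in degrees off-center)."""
    dx = max(abs(x) - inner[0], 0.0)          # distance past the inner rectangle
    dy = max(abs(y) - inner[1], 0.0)
    ramp = max(outer[0] - inner[0], outer[1] - inner[1])
    t = min(max(dx, dy) / ramp, 1.0)
    return 1.0 - t

# Example: a ~20x15 degree foveated region with a 2-degree blend band (illustrative)
inner = (10.0, 7.5)   # half-width/half-height where the foveated weight is 1.0
outer = (12.0, 9.5)   # beyond this the context display takes over fully
for ang in (0, 9, 10, 11, 12, 13):
    w = blend_weight(ang, 0, inner, outer)
    print(f"{ang:>2} deg off-center: foveated weight {w:.2f}, context weight {1 - w:.2f}")
```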

What About Vergence/Accommodation (VAC)?

The optical configuration of Varjo’s Foveated Display is somewhat similar to that of Oculus’s VAC display. Both leverage a beam splitter, but then how would you do VAC with a Foveated Display?

In my opinion, solving resolution with a wide field of view is a more important/fundamentally necessary problem to solve than VAC at the moment. It is not that VAC is not a real issue, but if you don’t have resolution with a wide FOV, then solving VAC doesn’t really matter.

At the same time, this points out how far away headsets that “solve all the world’s problems” are from production. If you are expecting high resolution with a wide field of view that also addresses VAC, you may be in for a wait of many decades.

Does Varjo Have a Practical Foveated Display Solution?

So the problem with display resolution/FOV growth is real, and in theory a foveated display could address this issue. But has Varjo solved it? At this point, I am not convinced, and I will try and work through some numbers and more detailed reasoning in part 2.

VAC By Oculus and Microsoft . . . Everywhere and Nowhere

Technically Interesting New Papers At Siggraph 2017

Both Oculus (Facebook) and Microsoft are presenting interesting technical research papers at Siggraph 2017 (July 30th to August 3rd) that deal with Vergence/Accommodation (VAC). Both have web pages (Oculus link and Microsoft link) with links to relatively easy to follow videos and the papers. But readers should take heed of the words on the Microsoft page (which I think are applicable to both): “Note that this Microsoft Research publication is not necessarily indicative of any Microsoft product roadmap, but relates to basic research around holographic displays.” I can’t hope to get into all the technical details here, but both papers have a lot of well explained information with figures, and for those who are interested, you can still learn a lot from them even if you have to skip over some of the heavy duty math. One other interesting thing is that both Oculus and Microsoft used phase-controlled LCOS microdisplays at the heart of their technologies.

Briefly, VAC is the problem with stereoscopic 3-D where the apparent focus of objects does not agree with where they seem to appear with binocular vision. This problem can cause visual discomfort and headaches. This year I have been talking a lot about VAC, thanks first to Magic Leap (ML article) and more recently Avegant (Avegant VAC article) making big deals about it and both raising a lot of money (Magic Leap over $1B) as a result. But lest you think Magic Leap and Avegant are the only ones, there have been dozens of research groups working on VAC over the last decade. Included in that number is Nvidia with a light field approach presented in a 2013 paper, also at Siggraph (the 2013 Nvidia paper has links embedded at the bottom of the abstract to more information and a video).

The Oculus paper has a wealth of background/educational information about VAC and figures that help explain the concepts. In many ways it is a great tutorial. They also have a very lengthy set of references that, among other things, confirm how many different groups have worked on VAC, and even that is only a partial list. I also recommend papers and videos on VAC by Gordon Wetzstein of Stanford. There is so much activity that I put “Everywhere” in the title.

I particularly liked Oculus’s Fig. 2, which is copied at the top of this article (they have several other very good figures as well as their video). They show the major classes of VAC solutions, from a) do nothing, b) change focus (perhaps based on eye tracking), to c) multifocal, which is what I think Magic Leap and Avegant are doing, to d) & e) Oculus’s “focal surface(s),” to f) light fields (ex. Nvidia’s 2013 paper). But light fields are in a way a shortcut compared to real/true holograms, which is what Microsoft’s 2017 paper is addressing (not shown in the table above but discussed in Oculus’s paper and video).

I put the “real” in front of the word “hologram” because, confusingly, Microsoft, for what appears to be marketing purposes, has chosen to call stereoscopic merged reality objects “holograms,” which scientifically they are not. Thanks to Microsoft’s marketing clout and others choosing “if you can’t beat them, join them” in using the term, we now have the problem of what to call real/true holograms as discussed in Microsoft’s 2017 Siggraph paper.

High Level Conceptually:
  • Light Fields are a way to realize many of the effects of holograms, such as VAC and being able to see around objects. But light fields have piece-wise discontinuities. They can only reduce the discontinuities by massively trading off resolution; thus they need massive amounts of processing and native display resolution for a given visual resolution. Most of the processing and display resolution never makes it to the eye: based on where the eye is looking and focused, all but a small part of the generated image information is never seen. The redundancy with light fields tends to grow with a square law (X and Y).
  • Focus planes in effect try to cut down the light field square-law redundancy problem by having the image redundancy grow linearly. They need multiple planes and then rely on your eye to do the blending between planes. Still, the individual planes are “flat,” and with a large continuous surface there would be discontinuities at the points where it has to change planes (imagine a road going off into the distance).
  • Oculus Surfaces are in essence an improvement on focus planes, where the surfaces try to conform more to the depth in the image and reduce the discontinuities. One could then argue whether it would be better to have more simple focus planes or fewer focus surfaces.
  • Holograms have at least an “n-cube” problem, as they conceptually capture/display the image in X, Y, and Z. As the resolution increases, the complexity grows extremely fast. Light fields have sometimes been described as “Quantized Holograms” as they put a finite limit on the computational and image content growth (see the rough scaling sketch after this list).
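To put very rough numbers on those growth rates, here is a back-of-the-envelope sketch; the model and all of the constants are mine, purely to illustrate the scaling described above, and are not figures from either paper:

```python
# Back-of-the-envelope sample counts for the approaches above (my own
# simplified model, purely to illustrate the growth rates being discussed).
N = 2000        # pixels per side actually resolved by the eye
A = 8           # angular views per axis for a light field (assumption)
P = 6           # number of focus planes/surfaces (assumption)

single_image = N * N             # baseline: one conventional image
light_field  = N * N * A * A     # extra samples grow with A^2 ("square law")
focus_planes = N * N * P         # redundancy grows linearly with plane count
hologram     = N * N * N         # conceptually an X-Y-Z volume ("n-cube")

for name, samples in [("single image", single_image), ("light field", light_field),
                      ("focus planes", focus_planes), ("hologram", hologram)]:
    print(f"{name:>13}: {samples / single_image:>6.0f}x the baseline samples")
```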
Oculus’s Focus Surface Approach

In a nutshell, Oculus is using an eMagin OLED to generate the image and a Jasper Display phase-shift LCOS device to generate a “focus surface.” The focus changes continuously/gradually, and not on a per-pixel basis, which is why they call it a “surface.”  The figure on the right (taken from their video) shows the basic concept of a “focus surface” and how the surface roughly tracks the image depth. The paper (and video) go on to discuss having more than one surface and how the distance approximation “error” would compare with multi-focus planes (such as Magic Leap and Avegant).

While the hardware diagram above would suggest something that would fit in a headset, it is still at the optical breadboard stage. Even using microdisplays, it is a lot to put on a person’s head. Not to mention the cost of having in effect two displays (the LCOS one controlling the focus surface) plus all the additional optics. Below is a picture of the optical breadboard.

Microsoft (True This Time) Holograms

While Oculus’s hardware looks like something that could fit in a headset someday, Microsoft’s is much more of a research concept, although they did show compact AR prototype “glasses” (shown at right) that had a small subset of the capability of the larger optical breadboard.

Microsoft’s optical breadboard setup could support either wide FOV or multi-focal (VAC) but not both at the same time (see picture below). Like other real time hologram approaches (and as used by Oculus in their focal surface approach), Microsoft uses a phase LCOS device. The Microsoft paper goes into some of the interesting things that can be done with holograms, including correcting for aberrations in the optics and/or a person’s vision.

In many ways, holograms are the ultimate end game in display technology, where comparatively everything else in VAC is a hack/shortcut/simplification to avoid the massive computations and hardware complexities/difficulties of implementing real time holograms.

Resolution/Image Quality – Not So Much

The image quality in the Oculus Surface paper is by their admission very low both in terms of resolution and contrast. As they freely admit, it is a research prototype and not meant to be a product.

Some of these limitations are the nature of making a one-off experiment, as the article points out, but some of the issues may be more fundamental physics. One thing that concerns me in the Oculus design (and pointed out in the article) is that they have to pass all three colors through the same LC material, and the LC’s behavior varies with wavelength. These problems would become more significant as resolution increases. I will give the Oculus paper props both for its level of information and for its candor about many of the issues; it really is a very well done paper if you are interested in this subject.

It is harder to get at the resolution and image quality aspects of the Microsoft hologram paper, as they show only small images from different configurations. They can, sort of, move the problems around with holograms; they can tune them, and even the physical configuration, for image quality, pupil size, or depth accommodation, but not all at the same time. Digital/real-time holograms can do some rather amazing things, as the Microsoft paper demonstrates, but they are still inordinately expensive both to compute and to display, and the image quality is inferior to more conventional methods. Solving for image quality (resolution/contrast), pupil/eyebox size, and VAC/image depth simultaneously makes the problems/cost tend to take off exponentially.

Don’t Expect to See These In Stores for Decades, If Ever

One has to realize that these are research projects going for some kind of bragging rights in showing technical prowess, which both Oculus and Microsoft do impressively in their own ways. Note the Nvidia light field paper was presented back at Siggraph 2013, and supporting decent resolution with light fields is still a very far off dream. If their companies thought these concepts were even remotely practical and only a few years away, they would have kept them deep dark secrets. These are likely seen by their companies as so far out in the future that there is no threat in letting their competition see what they are doing.

The Oculus surface approach is conceptually better on a per-plane basis than the “focus planes” VAC approaches, but then you have to ask whether more, simpler planes are better overall and/or less expensive. At a practical level, I think the Oculus surface would be more expensive, and I would expect the image quality to be considerably worse. At best, the Oculus surface would be a stop-gap improvement.

Real time high resolution holograms that will compete on image quality would seem to be even further out in time. This is why there are so many companies/researchers looking at short cuts to VAC with things like focus planes.

VAC in Context – Nowhere

VAC has been a known issue for a long time with companies and researchers working in head mounted displays. Magic Leap’s $1+B funding and their talk about VAC made it a cause célèbre in AR/VR and appears to have caused a number of projects to come out from behind closed doors (for V.C. funding or just bragging rights).

Yes, VAC is a real issue/problem particularly/only when 3-D stereoscopic objects appear to be closer than about 2 meters (6 feet) away. It causes not only perceptual problems, but can cause headaches and make people sick. Thus you have companies and researchers looking for solutions.

The problem, IMO, is that VAC would be, say, about 20th (to pick a number) on my list of serious problems facing AR/VR. Much higher on the list are basic image quality, ergonomics (weight distribution), power, and computing problems. Every VAC solution comes at some expense in terms of image quality (resolution/contrast/chromatic-aberrations/etc.).

Fundamentally, if your eye can pick what it focuses on, then there has to be a lot of redundant information presented to the eye that it will discard (not notice) as it focuses on what it does see. This translates into image information that must be displayed (but not seen), processing computations that are thrown away, and electrical power consumed for image content that is not used.

I’m Conflicted

So I am conflicted. As a technologist, I find the work in VAC and beyond (Holograms address much more than VAC) fascinating. Both the Oculus and Microsoft articles are interesting and can be largely understood by someone without a PhD in the subject.

But in the end, I am much more interested in technology that can reach a sizable market, and on that score I don’t understand all the fuss about VAC.  I guess we will have to wait and see if Magic Leap changes the world or is another Segway, or worse, Theranos; you might be able to tell which way I am leaning based on what I understand.

Today, the image quality of headsets is pretty poor when compared to, say, direct view TVs and monitors, the angular resolution (particularly of VR) is poor, the ergonomics are for the most part abysmal, and if you go wireless, the batteries are both too heavy and have too short a life. Anything done to address VAC makes these more basic problems not just a little worse, but much worse.

Near Eye Displays (NEDs): Gaps In Pixel Sizes

I get a lot of questions to the effect of “what is the best technology for a near eye display (NED)?” There really is no “best,” as every technology has its strengths and weaknesses. I plan to write a few articles on this subject, as it is way too big for a single article.

Update 2017-06-09: I added the Sony Z5 Premium 4K cell-phone-size LCD to the table. Its “pixel” is about 71% the linear dimension of the Samsung S8’s, or about half the area, but still much larger than any of the microdisplay pixels. But one thing I should add is that most cell phone makers are “cheating” on what they call a pixel. The Sony Z5 Premium’s “pixel” really only has 2/3rds of an R, G, and B per pixel it counts. It also has them in a strange 4-pixel zigzag that causes beat frequency artifacts when displaying full resolution 4K content (GSMARENA’s close-up pictures of the Z5 Premium fail to show the full resolution in both directions). Note that Samsung similarly goes with RGBG-type patterns that only have 2/3rds of the full pixels in the way they count resolution. These “tricks” in counting are OK when viewed with the naked eye at beyond 300 “pixels” per inch, but become more problematical/dubious when used with optics to support VR.

Today I want to start with the issue of pixel size, as shown in the table at the top (you may want to pop the table out into a separate window as you follow this article). To give some context, I have also included a few major direct view categories of displays. I have grouped the technologies into the colored bands in the table. I have given the pixel pitch (distance between pixel centers) as well as the pixel area (the square of the pixel pitch, assuming square pixels). For comparison, I have given the pitch and area relative to a 4.27-micron (µm) pixel pitch, which is about the smallest being made in large volume. There are also columns showing how big the pixel would be in arcminutes when viewed from 25cm (250mm, or ~9.84 inches), which is the commonly accepted near focus point, and finally a column showing how much the pixel would have to be magnified to equal 1 arcminute at 25cm, which gives some idea of the optics required.

In the table, I tried to use the smallest available pixel in a given technology that is being produced, with the exception of “micro-iLED,” for which I could not get solid information (thus the “?”). In the case of LCOS, the smallest field sequential color (FSC) pixel I know of is the 4.27µm one by my old company Syndiant, used in their new 1080p device. For the OLED, I used the eMagin 9.3µm pixel, and for the DLP, their 5.4 micron pico pixel. I used the LCOS/smallest pixel as the baseline to give some relative comparisons.

One thing that jumps out of the table are the fairly large gaps in pixel sizes between the microdisplays and the other technologies. For example, you can fit over 100 4.27µm LCOS pixels in the area of a single Samsung S8 OLED pixel, or 170 LCOS pixels in the area of the pixel used in the Oculus CV1. Or to be more extreme, you can fit over 5,500 LCOS pixels in one pixel of a 55-inch TV.
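The table columns are simple to reproduce. The sketch below recomputes the angular size at 25cm, the magnification needed to reach 1 arcminute, and the area relative to the 4.27µm baseline for a few of the pixel pitches quoted in this article (the ~44.6µm Samsung S8 pitch is my approximation):

```python
import math

# Recompute the table columns: pixel pitch -> angular size at 25cm,
# magnification needed to appear as 1 arcminute, and area relative to the
# 4.27um LCOS baseline. Pitches are the ones quoted in this article.
VIEW_MM = 250.0      # standard near-vision distance (25cm)
BASE_UM = 4.27       # smallest volume-production LCOS pixel (baseline)

pixels = {"LCOS (Syndiant)": 4.27, "DLP pico": 5.4, "OLED microdisplay (eMagin)": 9.3,
          "Samsung S8 OLED (approx.)": 44.6, "Oculus CV1 OLED": 55.7}

for name, pitch_um in pixels.items():
    arcmin = math.degrees(math.atan((pitch_um * 1e-3) / VIEW_MM)) * 60
    mag_for_1am = 1.0 / arcmin               # magnification to appear as 1 arcminute
    rel_area = (pitch_um / BASE_UM) ** 2     # how many baseline pixels fit in its area
    print(f"{name:<27} {pitch_um:5.2f}um  {arcmin:4.2f}'  {mag_for_1am:5.1f}x  {rel_area:6.0f}x area")
```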

Big Gap In Near Eye Displays (NEDs)

The main point of comparison for today is between the microdisplay pixels, which range from about 4.27µm to about 9.6µm in pitch, and the direct view OLED and LCD displays in the 40µm to 60µm range that have been adapted with optics to be used in VR headsets (NEDs). Roughly, we are looking at one order of magnitude in pixel pitch and two orders of magnitude in area. Perhaps the most direct comparison is the microdisplay OLED pixel at 9.3 microns versus the Samsung S8, a 4.8X linear and a 23X area difference.

So why is there this huge gap? It comes down to making the active matrix array circuitry to drive the technology. Microdisplays are made on semiconductor integrated circuits, while direct view displays are made on glass and plastic substrates using comparatively huge and not very good transistors. The table below is based on one in an article from 2006 by Mingxia Gu while at Kent State University (it is a little out of date, but it lists the various transistor types used in display devices).

The difference in transistors largely explains the gap, with microdisplays using transistors made in I.C. fabs, whereas direct view displays fabricate their larger and less conductive transistors on top of glass or plastic substrates at much lower temperatures.

Microdisplays

Within the world of I.C.s, microdisplays use very old/large transistors, often made on nearly obsolete semiconductor processes. This is both an effort to keep the cost down and due to the fact that most display technologies need higher voltages than would be supported by smaller transistor sizes.

There are both display physics and optical diffraction reasons that limit making microdisplay pixels much smaller than 4µm. Additionally, as the pixel size gets below about 6 microns, the optical cost of enlarging the pixel to be seen by the human eye starts to escalate, so headset optics makers want 6+ micron pixels, which are much more expensive to make. To a first order, microdisplay costs in volume are a function of the area of the display, so smaller pixels mean less expensive devices for the same resolution.

The problem for microdisplays is that, even using old I.C. fabs, the cost per square millimeter is extremely high compared to TFT on glass/plastic, and yields drop as the size of the device grows, so doubling the pixel pitch could result in an 8X or more increase in cost. While it sounds good to be using old/depreciated I.C. fabs, it may also mean they don’t have the best/newest/highest yielding equipment or, worse yet, the facilities get closed down as obsolete.
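As a toy illustration of why cost grows faster than area, here is a sketch using a simple exponential (Poisson) defect-yield model; the defect density and die sizes are illustrative numbers, not actual fab data, and the 8X figure above is my rough estimate rather than an output of this model:

```python
import math

# Toy model of why microdisplay cost grows faster than die area: cost per good
# die ~ area / yield, with a simple Poisson yield model y = exp(-D * A).
DEFECTS_PER_CM2 = 0.5   # purely illustrative defect density

def relative_cost(width_mm, height_mm):
    area_cm2 = (width_mm * height_mm) / 100.0
    die_yield = math.exp(-DEFECTS_PER_CM2 * area_cm2)
    return area_cm2 / die_yield   # wafer area consumed per good die

small = relative_cost(8, 4.5)     # e.g., roughly a 1080p panel at a ~4.3um pitch
large = relative_cost(16, 9)      # same resolution at double the pixel pitch
print(f"Doubling the pitch: {large / small:.1f}x the cost in this toy model")
```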

The net result is that microdisplays are nowhere near cost competitive with “re-purposed” cell phone technology for VR if you don’t care about size and weight. But they are the only way to do small lightweight headsets and really the only way to do AR/see-through displays (save the huge Meta 2 bug-eye bubble).

I hope to pick up this subject more in some future articles (as each display type could be a long article in and of itself). But for now, I want to get on to the VR systems with larger flat panels.

Direct View Displays Adapted for VR

Direct view VR headsets (ex. Oculus, HTC Vive, and Google Cardboard) have leveraged direct view display technologies developed for cell phones. They then put simple optics in front of the display so that people can focus the image when the display is so near the eye.

The accepted standard for human “near vision” is 25cm/250mm/9.84 inches. This is about as close as a person can focus and is used for comparing effective magnification. With simple (single/few lens) optics you are not so much making the image bigger per se, but rather moving the display closer to the eye and then using the optics to enable the eye to focus. A typical headset uses a roughly 40mm focal length lens and then puts the display at the focal length or less (e.g. 40mm or less) from the lens.  Putting the display at the focal length of the lens makes the image focus at infinity/far away.

Without getting into all the math (which can be found on the web), the result is that a 40mm focal length nets an angular magnification (relative to viewing at 25cm) of about 6X. So for example, looking back at the table at the top, the Oculus pixel (similar in size to the HTC Vive’s), which would be about 0.77 arcminutes at 25cm, ends up appearing to cover about 4.7 arcminutes (which are VERY large/chunky pixels) with about a 95 degree FOV (it depends on how close the eye gets to the lens; for a great explanation of this subject and other optical issues with the Oculus CV1 and HTC Vive, see this Doc-Ok.org article).
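The approximate math looks like the following sketch (simple-magnifier approximation; the real FOV also depends on eye relief and lens distortion, as the Doc-Ok.org article explains):

```python
import math

# Approximate angular resolution of a simple-magnifier VR headset.
# Angular magnification ~ 250mm / focal_length when the display sits at the
# lens's focal length (image focused at infinity).
NEAR_MM  = 250.0
focal_mm = 40.0     # typical VR headset lens (approximate)
pitch_um = 55.7     # Oculus CV1 OLED pixel pitch
pixels   = 1200     # CV1 pixels per eye along the larger axis

arcmin_at_25cm   = math.degrees(math.atan(pitch_um * 1e-3 / NEAR_MM)) * 60
magnification    = NEAR_MM / focal_mm
arcmin_per_pixel = arcmin_at_25cm * magnification

# Very rough FOV estimate (pixels x angular pixel size); the real number also
# depends on eye relief and lens distortion.
fov_deg = pixels * arcmin_per_pixel / 60

print(f"~{magnification:.1f}X magnification, ~{arcmin_per_pixel:.1f} arcmin/pixel, ~{fov_deg:.0f} degree FOV")
```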

Improving VR Resolution  – Series of Roadblocks

For reference, 1 arcminute per pixel is considered near the limit of human vision, and most “good resolution” devices try to be under 2 arcminutes per pixel, preferably under 1.5. So let’s say we want to keep the ~95 degree FOV but improve the angular resolution by 3X linearly to about 1.5 arcminutes; we have several (bad) options:

  1. Get someone to make a pixel that is 3X smaller linearly, or 9X smaller in area. But nobody makes a pixel this size that can support about 3,000 pixels on a side. A microdisplay (I.C. based) would cost a fortune (like over $10,000/eye, if it could be made at all), and nobody makes transistors that are cheap, compatible with displays, and small enough. But let’s for a second assume someone figures out a cost effective display; then you have the problem that you need optics that can support this resolution, and not the cheap low resolution optics with terrible chroma aberrations, god rays, and astigmatism that you can get away with at 4.7 arcminute pixels.
  2. Use, say, the Samsung S8 pixel size (or a little smaller) and make two 3K by 3K displays, one for each eye (see the sketch after this list). Each display will be about 134mm, or about 5.26 inches, on a side, and the width of the two displays plus the gap between them will end up at about 12 inches. So think in terms of strapping a large iPad Pro in front of your face, only now it has to be about 100mm (~4 inches) in front of the optics (or about 2.5X as far away as on current headsets). Hopefully you are starting to get the picture: this thing is going to be huge and unwieldy, and you will probably need shoulder bracing in addition to head straps. Not to mention that the displays will cost a small fortune, along with the optics to go with them.
  3. Some combination of 1 and 2 above.
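Here is the quick arithmetic behind option 2; the ~44.6µm pitch and the gap between the panels are approximations:

```python
# Rough size of option 2 above: two 3K x 3K panels at a Samsung-S8-class
# pixel pitch (the ~44.6um pitch and the 35mm gap are approximations).
pitch_um = 44.6
pixels   = 3000
panel_mm = pixels * pitch_um / 1000.0        # ~134mm per side
two_panels_plus_gap_mm = 2 * panel_mm + 35   # ~12 inches across

print(f"One panel: {panel_mm:.0f}mm (~{panel_mm / 25.4:.1f} in) per side; "
      f"two panels span ~{two_panels_plus_gap_mm / 25.4:.0f} inches")
```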
The Future Does Not Follow a Straight Path

I’m trying to outline above the top level issues (there are many more). Even if/when you solve the display cost/resolution problem, lurking behind that is a massive optical problem to sustain that resolution. These are the problems “straight line futurists” just don’t get; they assume everything will just keep improving at the same rate it has in the past, not realizing they are starting to bump up against some very non-linear problems.

When I hear about “Moore’s Law” being applied to displays, I just roll my eyes and say that they obviously don’t understand Moore’s Law and the issues behind it (and why it kept slowing down over time). Back in November 2016, Oculus Chief Scientist Michael Abrash made some “bold predictions” that by 2021 we would have 4K (by 4K) per eye and a 140 degree FOV at 2 arcminutes per pixel. He upped my example above by 1.33X more pixels (linearly) and upped the FOV by almost 1.5X, which introduces some serious optical challenges.

At times like this I like to point out the Super Sonic Transport, or SST, of the 1960s. The SST seemed inevitable for passenger travel; after all, in less than 50 years passenger aircraft went from nothing to the jet age. Yet today, over 50 years later, passenger aircraft still fly at about the same speed. Oh, and by the way, in the 1960s they were predicting that we would be vacationing on the moon by now and having regular flights to Mars (heck, we made it to the moon in less than 10 years). We certainly could have 4K by 4K displays per eye and a 140 degree FOV by 2021 in a head mounted display (it could be done today if you don’t care how big it is), but expect it to be more like the cost of flying supersonic and not a consumer product.

It is easy to play armchair futurist and assume “things will just happen because I want them to happen.” The vastly harder part is to figure out how it can happen. I lived through I.C. development from the late 1970s through the mid 1990s, so I “get” learning curves and rates of progress.

One More Thing – Micro-iLED

I included in the table at the top Micro Inorganic LEDs, also known as just Micro-LEDs (I’m using iLED to make it clear these are not OLEDs). They are getting a lot of attention lately, particularly after Apple bought LuxVue and Oculus bought InfiniLED. These essentially use very small “normal/conventional” LEDs that are mounted (essentially printed) on a substrate. The fundamental issue is that red requires a very different crystal from blue and green (and even those have different levels of impurities). So they have to make individual LEDs and then combine them (or maybe someday grow the dissimilar crystals on a common substrate).

The allure is that iLEDs have some optical properties that are superior to OLEDs. They have a tighter color spectrum, are more power efficient, can be driven much brighter, have fewer issues with burn-in, and in some cases have less diffuse (better collimated) light.

These Micro-iLEDs are being used in two ways: to make very large displays by companies such as Sony, Samsung, and NanoLumens, or supposedly very small displays (LuxVue and InfiniLED). I understand how the big display approach works; there is lots of room for the LEDs, and these displays are very expensive per pixel.

With the small display approach, they seem to have the double issue of being able to cut very small LEDs and then effectively “print” the LEDs on a TFT substrate, similar to, say, OLEDs. What I don’t understand is how these are supposed to be smaller than, say, OLEDs, which would seem to be at least as easy to make on similar TFT or transistor substrates. They don’t seem to “fit” in near eye, but maybe there is something I am missing at this point in time.