Introduction
Recently, I have been writing a series of articles about accessibility and Apple, describing the cognitive dissonance I feel when I’m in a position in which I must praise the Cupertino technology giant. I wrote the first article, “Apple and the Accessible Internet,” before I realized this would become a series, so it reads like a stand-alone piece. Then, after the release of iOS 8 and the Yosemite version of OS X, and with a bit of encouragement from some readers, I launched a series investigating broader issues regarding Apple and accessibility. You can read the first article, “My Long History Fighting Apple,” about my activism on intellectual and information freedoms, and the second item, “Where’s The Competition?,” in which I revisit a common theme for this blog, namely the current and historic lack of competition in accessibility and how this phenomenon hurts blind users. These are not great examples of my writing skills; I stand by the opinions presented, but please forgive the mediocre writing and the repetitiveness of the material, as I’ve been highly distracted while working with my new guide dog.
I also write for another blog called Skeptability, a pan-disability site that discusses the intersection of disability with feminism, social justice, skepticism, humanism, atheism and related subjects. My general rule separates my articles between the two sites: pieces that are more technical, more laden with jargon and requiring more historical knowledge about the access technology field get published here, while my Skeptability writing is aimed at a broader audience. My Skeptability articles also tend to be less dark than this blog and, if you’re so inclined, you can read about my experience at guide dog school in an article there called “My Time At Guide Dog School.”
I have been a blind user of Macintosh for a pretty long time now. I first wrote about this experience on my old BlindConfidential blog in an article called “Eating An Elephant, Part 2: Apple Rising,” where I prefaced the piece with a discussion of Apple’s deplorable history regarding intellectual property law but went on to talk about how good Macintosh accessibility had become at that point. Back then, I ran an experiment in which I didn’t reboot my Macintosh or restart VoiceOver until I absolutely had to. My record for testing the reliability of a Macintosh back then was more than 40 days without needing to restart the laptop or the screen reader. Today, a bunch of years later, I rarely go a single day without rebooting my MacBook Air or restarting VoiceOver. Plain and simple, I cannot be as productive with my Macintosh as I once was, and I will soon be returning to Windows as my full-time system, using Macintosh only for my audio work.
This article explores the sloppy accessibility of the Yosemite operating system release, as well as problems with OS X accessibility that have been with us for years. As far as I can tell, Apple has been made aware of all of these issues; reports of them have reached Apple repeatedly over a lot of years but have been ignored by Apple’s engineers. In fact, Apple seems to treat Macintosh accessibility as an orphaned stepchild of the much more comprehensive iOS version of the same.
I’d like to thank my friend and fellow accessibility expert Bryan Smart for the conversations we’ve had in preparation for this piece. Readers should visit his blog where they can listen to his work investigating some of the issues described herein. Bryan is a really smart and very insightful individual on issues regarding accessibility and you, my loyal readers, should check out his stuff too.
The Sloppy Yosemite Release
As I mentioned in the second article in this series, it appears as if Apple had hired an accessibility quality assurance specialist out of the notoriously sloppy Google testing department. Yosemite does contain some accessibility improvements, most notably in the browser and iWork and in the addition of MathML support in a number of apps. These are all very solid steps forward but, very sadly, they are overshadowed by the newly introduced accessibility problems along with long-standing issues that have yet to be remedied. I didn’t do a lot of testing to prepare for this piece and will be writing from personal experience rather than reporting results from a formal testing procedure. The guys at AppleVis wrote a terrific and much more thorough article called “Features and Bugs in OS X 10.10 Yosemite,” which you should read if you’re looking for a detailed report.
AppleMail
I tend to keep my email app running at all times on all of the different OS I use. Email is, for me, an essential tool for business, recreation, personal and professional correspondence and nearly every other activity in which I participate. Years ago, when I wrote “Apple Rising,” AppleMail was both entirely compliant with the Apple accessibility API as well as being very usable for a VoiceOver user.
Over the years, AppleMail has seen its accessibility deteriorate. In the Yosemite version, using “Classic Mode” for the display, when a user opens an email that is part of a thread, they will hear “Embedded
Finder
When Apple released the Mavericks version of OS X in 2013, they introduced some nasty accessibility bugs in Finder, one of the most essential bits of software for all Macintosh users. Specifically, when one tried to navigate through the sidebar to move to a certain folder, focus was lost and, instead of reading the items in the part of the interface a VoiceOver user thinks they are interacting with, it read file names and, indeed, moved focus from the sidebar to the table of files. For a VoiceOver user, this is a usability nightmare and, while I think Apple fixed this in a later version of Mavericks, it appears to have been broken again in Yosemite.
This problem leads me to question Apple’s quality assurance and software engineering methods. If a bug existed and was fixed in an earlier version of the operating system, the fix should have long ago been integrated into the main trunk of the source tree but, apparently, Apple has chosen to ignore accessibility fixes present in Mavericks in the Yosemite release. This also points to what must be a fact: Apple either does not test its accessibility features and VoiceOver or chooses to ignore bugs reported either by its internal testing teams or by the army of blind people out there willing to spend their personal time reporting accessibility problems to Apple. I know which bugs I personally reported during the Yosemite beta cycle and, much to my chagrin, very few of the many I reported were fixed in the final release.
Other Problems
While my notions about AppleMail and Finder are accurate and things you can test for yourself, they do not even approach a complete look at Yosemite accessibility. As I suggest above, please do read the AppleVis article for far more detail. Suffice it to say that OS X has had problems for a number of releases and, with each new version, the accessibility deteriorates further.
Yosemite And The Internet
After publishing “Apple And The Accessible Internet,” I received an email from the people who work at the email address, accessibility@apple.com. The author of the email asked me to install the Yosemite beta, to test the improved Internet support and report my findings to them. I typically politely refuse to run pre-release software without being compensated for my time but, in this case, I made an exception and elected to work as a volunteer testing this OS release.
I was pleased when I went to my first web site using Safari, VoiceOver and Yosemite. The first thing I did, with QuickNav turned off, was to start navigating around using the cursor keys, in a manner similar to how I interact with Firefox using NVDA or Internet Explorer with JAWS. I also enjoyed the relatively new feature that allows a VO user to navigate a web site with single-key commands, similar to QuickKeys in JAWS and comparable features in all Windows screen readers.
When, however, I tried to actually use the new Yosemite version of VoiceOver in Safari, I found a number of problems.
An Interface Out Of Sync With Itself
If you are running OS X Yosemite (10.10), you can try this on this very page. First, make sure QuickNav is turned on, then hit “h” a few times to get to a heading somewhere on the site; it doesn’t matter where. Next, turn QuickNav off (pressing the left and right arrows together toggles it) and start navigating with the cursor keys in the new simulated virtual cursor mode. You will discover that the two navigation modes are out of sync with each other. A user would expect that hitting the down arrow after navigating by heading would read the first line after the heading text; in Yosemite, you will find that cursor navigation, assuming you hadn’t used it earlier, starts from the top of the page no matter where QuickNav had left you. This turns the new cursor navigation feature into a demo of things to come in the future, as it is not actually usable in its current state. A lot of VoiceOver for OS X has seemed more like a demo than production code for a long time.
Split Lines
Due to Apple’s philosophical obsession with ensuring that VoiceOver only represents information as it appears on the screen (more on this later in this article), cursor navigation on an Internet site reads the information exactly as it is rendered visually in Safari. This means that, when using cursor navigation or having it turned on during a “read all,” the user will hear words that Safari has hyphenated for line wrapping read with the hyphens included. If the user has sounds turned on for misspelled words, the hyphenation will create misspelled words by its nature, and the user will also experience the latency problem caused by having sounds inserted sequentially into the audio stream. NVDA does not exhibit this problem and, if I remember correctly, neither does JAWS.
Faithful representation of on screen information is very nice in some cases but, in this one and a number of others, it inserts a layer of inefficiency into the user experience.
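To make the point concrete, here is a minimal sketch, in Swift, of what rejoining visually hyphenated words before speech might look like. It is purely illustrative: the function name and the example lines are my own inventions, and I am not claiming any screen reader works this way internally.

```swift
import Foundation

// Minimal, illustrative sketch: rejoin words that the browser hyphenated
// purely for visual line wrapping before handing text to the synthesizer.
// (A real implementation would need to distinguish layout hyphens from
// genuine hyphenated words such as "screen-reader".)
func rejoinVisualHyphenation(_ displayedLines: [String]) -> String {
    var result = ""
    for line in displayedLines {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        if trimmed.hasSuffix("-") {
            // Drop the layout hyphen and join directly with the next line,
            // so "acces-" + "sibility" is spoken as "accessibility".
            result += String(trimmed.dropLast())
        } else {
            result += trimmed + " "
        }
    }
    // Also strip soft hyphens (U+00AD) that some pages insert for wrapping.
    return result
        .replacingOccurrences(of: "\u{00AD}", with: "")
        .trimmingCharacters(in: .whitespaces)
}

print(rejoinVisualHyphenation(["Yosemite improves acces-", "sibility in Safari."]))
// Prints: Yosemite improves accessibility in Safari.
```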
Copy And Paste
I spend a lot of my time writing and, like most authors these days, I use the Internet as source material for my work. It is therefore essential that I be able to copy information from web sites and paste it into my text editor for integration either into one of these blog articles or, far more important, into the documents I prepare for my clients. With VoiceOver and Safari, copy and paste is a never ending adventure.
On this site, one can select text using cursor-key navigation along with the SHIFT key, as one would expect, but on many sites I tried, the same selection, copy and paste do not work at all. VoiceOver does provide a keystroke for selecting text on web pages but it, too, works very inconsistently. When I’ve reported problems with selecting text on web sites to Apple, they have responded with vague answers along the lines of “something about that site prevents us from selecting text.” I’d accept this as an answer based in web accessibility standards and guidelines if the people at Apple would tell me which piece of WCAG 2.0 or standard HTML was violated, but they never include that information in their responses. Meanwhile, NVDA handles the same pages perfectly in Firefox and, in my opinion, if one screen reader can do something properly, they all can.
In general, the Yosemite version of VoiceOver and Safari provide a nicer experience on the web than did Mavericks but, as it also contains a whole lot of the problems that were reported by users of earlier versions of OS X, it remains far behind JAWS and NVDA in its actual usability.
Latency and Sounds in VoiceOver
A really long time ago, TV Raman (now at Google accessibility) added the notion of an “earcon” to his Emacspeak software. More than ten years ago, JAWS became the first screen reader to include this idea with the advent of its Speech and Sounds Manager. An earcon is a sound a user hears in lieu of speech, augmenting the audio stream so the user spends less time listening to speech and more time actually getting their work done. Going back to the early versions of VoiceOver on OS X, Apple included the concept of an earcon to deliver information but implemented it in the worst way possible.
While I worked at HJ, Ted Henter personally taught me to count syllables in any text that JAWS would speak to its users. Ted demonstrated that every syllable or pause spoken to a user takes up a unit of that user’s time. We invented the Speech and Sounds Manager to help users reduce the number of syllables they need to hear in order to get the same amount of semantic information in less time. As a quick example, one can set JAWS to play a tone instead of saying “link” when it finds one. The important feature of the JAWS implementation, however, is that the sound plays simultaneously with the text being spoken.
As you can hear if you listen to Bryan Smart’s recordings on this matter, the VoiceOver developers made a rather bizarre decision when they implemented the sound feature on OS X. Specifically, instead of playing the sound simultaneously with the spoken text, VoiceOver adds its sounds sequentially to the audio stream. Thus, instead of saving time, each sound played by VO adds to the time the user needs to spend hearing the same amount of information. According to Bryan’s work, this delay is never less than 200 milliseconds and can run as long as half a second. One fifth of a second doesn’t sound like much, but such interruptions cause a cognitive hiccup that could easily be avoided by playing the sounds at the same time as the text is spoken. Apple’s sound system adds time, reducing efficiency while also breaking up the text in a manner that disrupts one’s attention.
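The contrast is easy to see in code. The sketch below, which assumes an earcon sound file is available under the hypothetical name “link,” shows the two designs side by side using the standard NSSound and NSSpeechSynthesizer classes; it illustrates the idea and makes no claim about how VoiceOver is actually built.

```swift
import AppKit

let synthesizer = NSSpeechSynthesizer()
let earcon = NSSound(named: "link")  // hypothetical earcon sound name

// Sequential playback, as described above: the earcon occupies its own slice
// of the audio stream, so every announcement gets 200-500 ms longer.
func announceSequentially(_ text: String) {
    if let sound = earcon {
        _ = sound.play()
        Thread.sleep(forTimeInterval: sound.duration)  // dead air for the user
    }
    _ = synthesizer.startSpeaking(text)
}

// Simultaneous playback, the Speech and Sounds Manager approach: both start
// at once, so the earcon costs the listener no additional time.
func announceSimultaneously(_ text: String) {
    _ = earcon?.play()                    // NSSound.play() returns immediately
    _ = synthesizer.startSpeaking(text)   // speech is asynchronous as well
}
```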
This problem and its related efficiency issues have been reported to Apple many times over the years; people have discussed it in blog articles and podcasts, yet Apple continues to refuse to remedy this major problem with its interface.
The latency issues aren’t always associated with the sounds being played. If one uses any text augmentations, including having VO change the pitch for links and misspelled words, they are accompanied by a delay of no less than 100 milliseconds, making these features interesting but not entirely usable.
Complex Apps And Efficiency
Apple must be commended for the excellent work it has done regarding accessibility in software like Xcode and Garageband. As far as I can tell, a VoiceOver user now has access to all of the features in both of these very complex user interfaces. For me, an occasional podcaster, having Garageband available for recording and mixing has been a lot of fun. I also enjoy using Garageband to create “virtual bands” to jam along with using loops and related features. At the beginning, the VoiceOver interface in Garageband worked well for me, but I was a novice then and, as I grew more proficient with the program, I found many tasks were tremendously cumbersome.
As I’m only passingly familiar with Xcode (I don’t write software for Apple devices), the examples I’ll use in this section will come from Garageband but apply to almost every Apple branded Macintosh application of any complexity, including iWork apps like Pages and Numbers.
Faithful Representation Of On Screen Information
When I worked on JAWS, a frequent complaint we received at FS from the field came from sighted people or from actual JAWS users who needed to work closely with sighted colleagues. The problem arose because JAWS spoke information in a manner different from how it appears on the screen. Sighted trainers became frustrated when the speech didn’t match the visual display, and I can remember trying to ask my sighted wife for help at times and both of us getting frustrated by the difference between speech and screen information. The people who designed VoiceOver chose to take a radically different approach and ensure that on-screen information is accurately represented in what the user hears.
The JAWS philosophy comes from Ted Henter’s insistence on not only providing an accessible solution but also making sure that the solution is as efficient to use as possible. I’m sad to say that, as far as I can tell, no screen readers other than JAWS and NVDA even attempt to maximize efficiency anymore. The problem with the JAWS approach, however, is that it comes with a steep learning curve: to use complex applications efficiently with JAWS, users must spend a fair amount of time learning keystrokes specific to the application they need to use and, in most cases, will need to live with some aspects of the application remaining inaccessible. The Apple approach solves the discoverability problem; a novice can poke around the Garageband interface and find everything in a fairly intuitive manner. At the same time, the Apple approach provides little in terms of efficiency for intermediate to advanced users.
Using Garageband, I often find myself spending more time navigating from control to control than I do actually working on my recordings.
A Lack Of Native Keystrokes
In general, Windows programs tend to have more accelerator keys for interacting with features than do those on Macintosh, and it would be useful for Macintosh apps to have the same. While I can perform every task and use every feature in Garageband, many require me to issue a pile of keystrokes, both to navigate from place to place and to use what amounts to an on-screen simulation of mouse actions. Indeed, my experience is nearly identical to what a sighted user enjoys, but without the efficiencies provided by having vision. Where a sighted user can move quickly with a mouse or trackpad, a blind user needs to step through every item in between and often perform actions with the keyboard that could be made profoundly easier if a single keystroke were available.
The Interaction Model
In an attempt to make navigation more efficient, the VoiceOver developers invented a user interface system that groups interface items together so that the user can either jump past a group’s contents or, if they so choose, interact with the group and access the information therein. Unfortunately, the grouping seems to be done algorithmically, and this facility doesn’t work terribly well.
Using the Macintosh version of iTunes as an example, a user can observe some areas made more efficient by the interaction model while also finding areas where they need to step through a bunch of controls that are not grouped together in any useful manner. This is true of many other applications as well; the interaction model demos well but is implemented in such a random manner throughout the Apple-branded apps on OS X as to be of marginal use at best.
The interaction model also imposes a hierarchy on the interface. In a complex app like Garageband or Xcode, a VoiceOver user needs to climb up and down a tree of embedded groups with which they must interact separately. Moving from a spot buried deep in one set of nested groups to a spot buried in a different group requires a ton of keystrokes just for the navigation, which could be obviated either with native accelerator keystrokes or with keystrokes added specifically for VoiceOver users.
It appears as if these groups and the interaction model were presented as an idea, included in VoiceOver and then ignored as the software matured. I do not believe that this interface model is inherently incompatible with efficiency; I just think that it has only been partially implemented and needs much more work moving forward.
A Lack Of A Real Scripting Language
AppleScript is available but has so many restrictions that it is nearly useless as a scripting system for VoiceOver. First and foremost, it is very difficult to share AppleScripts with other users, as doing so requires copying the files individually and adding keystrokes separately on each system. It is also impossible to assign a non-global keystroke to an AppleScript, so application-specific scripts are impossible as well. AppleScript cannot fire on UI events so, continuing with the Garageband examples, one cannot have a sound play only when an on-screen audio meter hits a certain level or when some other interesting UI event happens. After many years of criticizing JAWS for having a scripting language while falling further and further behind in the functionality wars, GW Micro finally added a real scripting facility to Window-Eyes; it’s now time that Apple did the same for VoiceOver.
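For readers who haven’t played with it, here is a rough sketch of the sort of thing that is possible today: a script that simply makes VoiceOver speak a phrase, wrapped in NSAppleScript so it could be launched from other code. I’m assuming VoiceOver’s AppleScript dictionary exposes an “output” command and that scripting support has been enabled in VoiceOver Utility; the point is that even this toy can only be tied to a global hotkey, not to anything application-specific or event-driven.

```swift
import Foundation

// Rough sketch: run a tiny VoiceOver AppleScript from Swift. Assumes
// VoiceOver's scripting support is turned on in VoiceOver Utility and that
// its dictionary provides an "output" (speak this text) command.
let source = """
tell application "VoiceOver"
    output "Build finished"
end tell
"""

var errorInfo: NSDictionary?
if let script = NSAppleScript(source: source) {
    _ = script.executeAndReturnError(&errorInfo)
}
if let errorInfo = errorInfo {
    // Typically means scripting is disabled or the command isn't available.
    print("AppleScript error: \(errorInfo)")
}
```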
Bryan Smart works for DancingDots, a company that makes CakeTalking, an impressive set of JAWS scripts that, among other things, provides access to the popular Sonar audio editing software on Windows. Why would people pay a lot of money for JAWS, a lot of money for the DancingDots scripts and a lot of money for Sonar when they can get Garageband, VoiceOver and a laptop all for the price of a Macintosh? Because they need to use Sonar efficiently, and Garageband, while an excellent choice for a novice, cannot be used efficiently by a VoiceOver user. Complex applications seem to need a scripting language to accommodate users as they grow increasingly proficient.
Syllables, Syllables, Syllables
As I wrote above, Ted Henter taught JAWS developers to count syllables whenever we added text to be spoken by JAWS. After running Yosemite for a few days, I changed my verbosity setting from “High” (the default) to “Medium” but still find that VoiceOver takes too much time to express some very simple ideas.
In AppleMail, for instance, VoiceOver reads “reply was sent” instead of simply “replied,” which would save two syllables plus the time spent on the whitespace separating the words. When I use CMD+TAB to leave my text editor for another app and then again to return, VoiceOver says, “space with applications TextEdit, Mail, Safari…” and lists all of the apps I have running, even if I hit CONTROL to tell VoiceOver to stop speaking. In TextEdit, where I’m writing this piece, if I type a quotation mark, instead of saying “quote” or some other single-syllable term, VoiceOver says “left double quotation mark,” enough syllables to fill a mouthful or more.
I could go on. It seems that VoiceOver speech is overly verbose in far too many places to list. Whether a key label or a text augmentation, it is essential that the user hear as few syllables as possible in order to maximize efficiency.
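If I were designing a fix, I would start with something as simple as a user-editable substitution table. The sketch below is purely hypothetical, built from my own example phrases rather than any real VoiceOver setting, but it shows how cheaply the syllable count could be cut.

```swift
import Foundation

// Hypothetical substitution table: the keys are verbose announcements of the
// kind discussed above and the values are terse replacements a user might
// prefer. None of this is a real VoiceOver facility; it only illustrates how
// much listening time terse phrasing saves.
let terseAnnouncements: [String: String] = [
    "reply was sent": "replied",                 // 4 syllables -> 2
    "left double quotation mark": "quote",       // 7 syllables -> 1
    "right double quotation mark": "end quote",  // 7 syllables -> 2
]

func tersify(_ announcement: String) -> String {
    return terseAnnouncements[announcement.lowercased()] ?? announcement
}

print(tersify("Left double quotation mark"))  // prints "quote"
```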
Forcing A Keyboard Into A Mouse’s Job
Most blind people access general purpose computers using a keyboard and this is how I use my Macintosh, only rarely using the TrackPad. As I mention above, the VoiceOver UI is designed to mimic as closely as possible the on screen information. Quite sadly, a keyboard is not an efficient mouse or trackpad replacement.
The notion of drag and drop makes sense visually: the user “grabs” an object with the mouse or trackpad, drags it across the screen to its destination and then drops it by releasing the button on the pointing device. Using a keyboard to navigate object by object until one arrives at the target destination is a hunt-and-peck process at best and far too cumbersome to use at worst. But there are Apple-branded apps, including Garageband, that allow the user to interact with some features only through drag and drop, inserting a profound level of inefficiency into a VoiceOver user’s experience. Why not also allow cut, copy and paste as an alternative to drag and drop? Doing so would provide a UI metaphor that makes sense to a person driving a Macintosh with only a keyboard.
In Garageband, there are custom controls for moving the play head, selecting blocks of audio information, inserting and deleting blocks and so on. For a VoiceOver user to do these things, they must jump through weird UI hoops to force a keyboard to act like a mouse. Plain and simple, this could be corrected either by adding native keystrokes to Garageband or by allowing VoiceOver to be customized as extensively as one can customize JAWS or NVDA. In its current state, a blind person can use these features (along with similar ones in other Apple apps) but only with a great deal of superfluous keyboarding.
In short, though, using a keyboard to faithfully mimic what sighted users would do with the mouse is a poor idea in practice.
Conclusions
- Apple, largely due to its iOS offerings, remains the leader in out-of-the-box accessibility. It is also true that accessibility on OS X has deteriorated from release to release and has major problems delivering information and permitting interaction in an efficient manner.
- Both iOS/8 and Yosemite contain a lot of “stupid” bugs, defects that should have been discovered by automated testing and remedied with about a minute of effort typing some text into a dialogue box.
- Making VoiceOver on Macintosh into an efficient system will require changing some of its deeply held philosophical positions and I doubt this will ever actually happen.
mehgcap says
I mostly agree with your article, but I did want to point out a few things, not necessarily in any order.
1. Navigation of complex UIs is indeed frustrating. However, using a trackpad to explore and locate items, similar to what one might do on an iOS device, can make this faster. VoiceOver’s hotspot feature may also be useful – you can mark an item you want to return to and quickly move to it. Full disclosure: I don’t currently use this feature, mostly because I’ve gotten so used to the more cumbersome way of moving around that I never thought much about it. I was recently reminded that hotspots exist and can make my life easier, and I plan to start using them more.
2. The speaking of “space with applications Finder, Safari…” is a consequence of using a full-screen application, then switching to or away from it. As far as I can tell, the Mac puts any full-screen app in its own space, then puts all non-full-screen apps into a single space. I know this doesn’t explain why control fails to stop speech, or why switching seems to take so much longer for full-screen apps, but I wanted to explain that this extra verbosity doesn’t happen all the time, only with full-screen apps, and is VoiceOver describing the space you are moving to or from.
3. The sidebar in Finder works as expected if you vo-up or vo-down in it. You can expand or collapse items, and find folders or tags, just as expected. It seems that up and down arrows are captured by the folder/file view, so you can up and down arrow through your files even as you change locations with vo-up or vo-down. While this approach prevents easily collapsing a group (such as Favorites or Tags), it does seem to offer a different type of efficiency, letting you browse the location you select, or select a different location, without having to jump from sidebar to list and back again. Personally, I see this as a feature.
4. I don’t use GarageBand, but have you tried vo-comma and vo-period for drag and drop, respectively? These often fail in Xcode, so they may not work in GarageBand either, but I thought I’d mention it. Copy and paste are, at least in Xcode, sometimes a replacement for drag/drop, as you suggested. It might be worth trying these in GarageBand as well, just to see if they do anything.
5. As you move around a webpage with navigation commands (not arrows), if you then want to use the arrow keys, move VoiceOver left and then right, or right and then left, just to get the focus back in sync. Obviously, this step shouldn’t be necessary at all, but it is possible to sync things up, and the problem only seems to happen with headings. For instance, move by link, and the arrows track you properly. Again, I’m not excusing Apple here.
As I said, I agree with most of what you said in your post. Apple does need to make a lot of changes, and something needs to happen internally to get better testing in place. VoiceOver’s core feature set is complete enough that Apple can stop adding things to it for the next OS X update. Instead, concentrate on the bugs and inefficiencies; get those ironed out, and you’ll have a far better screen reader than would result from adding a couple new features.