Making Reading More Immersive

He ain't got no distractions, can't hear those buzzers and bells.

Prepared by Alison Chaiken and offered under

SuperHappyDevHouse lightning talk slides
Motivation
Inspiration: Audio Files for Language Study
Why Not Orca?
An Alternative Approach Based on totem, pdftotext and text2wave
Installation of sayPDF totem-based solution
Running sayPDF totem-based solution
Miscellaneous Notes
Features to be Added
An Alternative Approach from Kyle Rankin
Resources
Disclaimer

Photo credit: Marco Grob for the New York Times

Video of a presentation by mind-gamer Joshua Foer: "Step Outside Your Comfort Zone and Study Yourself Failing." H/T Jay Liew for the link.

Motivation

If you're like me, you read a lot of documentation and other detailed technical material. Sometimes the documentation is poorly written or confusing and I have trouble concentrating. The hardest material to concentrate on is patent applications. I can barely bring myself to read more than a half page of claims at a time even when I have a deadline.

As a side note, I've been thinking a lot about assistive technologies due to my professional work on "in-vehicle infotainment" systems for drivers. A moment's consideration shows that a driver is essentially a blind computer user, so a lot of assistive technologies may be adaptable for in-car use.

Inspiration: Audio Files for Language Study

Recently I've been studying Finnish at night with the assistance of audio and video files intended to improve pronunciation. The videos did not include written text to compare with the audio, and I found them hard to understand and useless. The audio files featured native speakers reading sentences that corresponded to print in a textbook. Reading the book while listening to the audio and repeating after the speakers not only improved my pronunciation, it also immensely helped my retention of vocabulary. I could remember words that I'd heard on the recording much better than those I had not.

It occurred to me that listening to audio derived from documentation while reading the documentation might help me focus and retain material better, and lately I've been trying it out. Somehow the brain hack of listening to a stream that I must intentionally pause immensely reduces my tendency toward distraction. Put on headphones, open your PDF file, crank up the volume and give it a try!

Why Not Orca?

Gnome comes with the Orca assistive technology package that will read just about any window out loud. Orca does work with PDFs, but really poorly in my experience. For every slide, the audio includes the metadata, and I had to keep moving the mouse around to get the page content to play. There may well be a way to configure Orca to speak PDFs better, but I haven't found one. Please email me if you know.

Orca also tends to cause the Gnome desktop to hang on my fully patched Fedora 14 system as of today (March 3, 2011). My solution, while clunky and minimally featureful, may SEGV on your system for all I know, but it will not hang your desktop and necessitate a reboot! If you use Orca, you may want to study up on the Magic Sysrq keys. ALT-SYSRQ-R-E-I-S-U-B is your friend. In Fedora, set "kernel.sysrq = 1" in /etc/sysctl.conf.
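
A quick way to see whether the magic SysRq keys are currently enabled is to read the value the kernel exposes under /proc. The following is a minimal sketch for a Linux system; the function name sysrq_status is made up for illustration, and the optional file argument exists only so the function can be tried against a scratch file.

```shell
# Report the current kernel.sysrq setting.
# sysrq_status is a hypothetical helper name, not part of any package.
sysrq_status() {
    local file="${1:-/proc/sys/kernel/sysrq}"
    local value
    value="$(cat "$file" 2>/dev/null)" || { echo "unknown"; return 1; }
    case "$value" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "partly enabled (bitmask $value)" ;;
    esac
}
```

Calling sysrq_status with no arguments reads the live setting. To turn the keys on until the next reboot without editing /etc/sysctl.conf, "sudo sysctl -w kernel.sysrq=1" should have the same effect.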

An Alternative Approach Based on totem, pdftotext and text2wave

Inspired by a talk at SCALE 9x by Dallas Legan and by a subsequent conversation with my buddy Sarah Newman, I've hacked up a solution that works for me. The idea is to use the very basic totem media player that comes with Gnome to play the audio stream so that it can be rewound and skipped ahead. As far as I can make out, Orca doesn't allow this kind of fine-grained control.

The gist of the implementation is to pipe output from pdftotext to text2wave to totem. While the gstreamer player gst-launch that underlies totem will play from stdin, I've had trouble getting totem to do so reproducibly. For shorter PDF files, totem streams from stdin, but for longer files it complains that it can't determine the type.

Works:

cat /tmp/dbus.wav | gst-launch-0.10 playbin uri=fd://0

Not so much:

[alison@bonnet UPnP]$ cat /tmp/dbus.wav | totem file://fd://0
** Message: Error: Could not open resource for reading.
gstgiosrc.c(324): gst_gio_src_get_stream (): /GstPlayBin2:play/GstURIDecodeBin:uridecodebin0/GstGioSrc:source:
Could not open location file://fd://0 for reading: Operation not supported
[alison@bonnet UPnP]$ cat /tmp/dbus.wav | totem uri:fd://0
** Message: Error: No URI handler implemented for "uri".
[alison@bonnet UPnP]$ cat /tmp/dbus.wav | totem uri=fd://0
gstfilesrc.c(1034): gst_file_src_start (): /GstPlayBin2:play/GstURIDecodeBin:uridecodebin0/GstFileSrc:source:
No such file "/home/alison/UPnP/uri=fd:/0"
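
Since totem balks at stdin for longer files, one way around the problem is to write the audio to a temporary file first and hand totem the file name. What follows is a minimal sketch of that idea, not the actual sayPDF scripts; the function names pdf_to_wav and say_pdf are invented for illustration. It assumes pdftotext and festival's text2wave are installed, and that text2wave reads from stdin when no input file is named.

```shell
# Hypothetical sketch, not the actual sayPDF scripts.
# Render pages FIRST..LAST of a PDF to a WAV file and print the WAV path.
pdf_to_wav() {
    local first="$1" last="$2" pdf="$3"
    local dir wav
    dir="$(mktemp -d)" || return 1
    wav="$dir/speech.wav"
    # pdftotext: -f and -l give the first and last page; "-" sends text to stdout.
    # text2wave: takes the text on stdin here; -o names the output file.
    pdftotext -f "$first" -l "$last" "$pdf" - | text2wave -o "$wav" || return 1
    echo "$wav"
}

# Play a page range with totem from a real file rather than from stdin.
say_pdf() {
    local wav
    wav="$(pdf_to_wav "$@")" || return 1
    totem "$wav"
}
```

With these functions loaded, "say_pdf 2 5 /tmp/dbus.pdf" would speak pages 2 through 5. The temporary directory is left in place so the WAV can be replayed or copied elsewhere.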

Installation of sayPDF totem-based solution

Get the files.
tar xjf sayPDF.tar.bz2
If you have already installed festival and its voices, you're all set. Otherwise on Fedora 14 or similar, try "./CommandLineAudioInstallFedora" to get the required packages. Note that Fedora by default installs the voices in the wrong directory.
Move the festivalrc file to .festivalrc in your home directory. You can change the playback speed or try different voices by editing ~/.festivalrc.
If you like, move the bash and perl executables to a bin directory on your $PATH.
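
For readers curious what a festivalrc that sets speed and voice might contain, here is a minimal sketch that writes one from the shell. It is not the festivalrc shipped in the tarball, whose contents are not shown here. The helper name write_festivalrc is hypothetical; the two Scheme settings it emits, voice_default and Duration_Stretch, are standard Festival settings. Some voices may reset the stretch when they load, so treat the result as a starting point.

```shell
# Write a small Festival init file. write_festivalrc is a hypothetical helper.
#   $1  duration stretch (values above 1.0 slow speech down)
#   $2  voice name without the "voice_" prefix, for example kal_diphone
#   $3  target file, defaults to ~/.festivalrc (an existing file is overwritten)
write_festivalrc() {
    local stretch="${1:-1.0}" voice="${2:-kal_diphone}" target="${3:-$HOME/.festivalrc}"
    cat > "$target" <<EOF
; Default voice for new Festival sessions.
(set! voice_default 'voice_${voice})
; Stretch segment durations; 1.0 is the voice's normal rate.
(Parameter.set 'Duration_Stretch ${stretch})
EOF
}
```

For example, "write_festivalrc 1.67 kal_diphone /tmp/festivalrc.test" writes a scratch file that makes speech last about 1.67 times as long, which is one reading of "60% of the default speed". Try a scratch path first, because the default target replaces any existing ~/.festivalrc.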

Running sayPDF totem-based solution

Play file dbus.pdf from pages 2 through 5 using totem:

sayPDF_totem -f 2 -l 5 -i /tmp/dbus.pdf

Play the same file using gstreamer directly:

sayPDF_gst-launch -f 2 -l 5 -i /tmp/dbus.pdf

Note that text2wave is a CPU-intensive program. If you want to listen to a PDF on (for example) a netbook, you may want to ssh into a beefier machine and process the file there. Running the program with the -v flag will tell you what the WAV filename is so that you can cp it out of /tmp.
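
A variation on the remote-processing idea is to extract the text locally and send only the text over ssh, so that the WAV lands directly on the local machine. This differs from running sayPDF on the remote host and copying the file out of /tmp, and it is only a sketch: remote_pdf_to_wav is a hypothetical name, and it assumes key-based ssh access to the remote host, text2wave installed there, and that text2wave writes to stdout when no -o option is given.

```shell
# Hypothetical sketch: synthesize on a faster machine, keep the WAV locally.
#   $1 remote host   $2 first page   $3 last page   $4 PDF file   $5 output WAV
remote_pdf_to_wav() {
    local host="$1" first="$2" last="$3" pdf="$4" wav="$5"
    # Text extraction is cheap, so it runs locally; only text crosses the network.
    # The remote text2wave reads stdin and its output is relayed back by ssh.
    pdftotext -f "$first" -l "$last" "$pdf" - | ssh "$host" text2wave > "$wav"
}
```

Something like "remote_pdf_to_wav beefy 2 5 /tmp/dbus.pdf /tmp/dbus.wav" followed by "totem /tmp/dbus.wav" would then play the result, where "beefy" stands in for whatever host name you use.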

Miscellaneous Notes

espeak, a component of speech-dispatcher, offers the ability to change the reading speed at invocation-time in a script, but its voices are inferior to festival's. espeak has superior support for foreign languages and a cool phoneme-output mode, but is not particularly easy to use or stable.
Consider setting an audio play-pause keyboard shortcut so that you don't have to change back and forth to a totem window from your PDF reader if you want to linger over a particular page. In Gnome 2.x, go to System > Preferences > Keyboard Shortcuts and set "Play (or play/pause)" to a convenient binding. I chose WindowsKey-space.
A fascinating aspect of using the program is that I can see that I need to listen to documentation of different complexity at different speeds. I've been using 60% of the default speed, while most Festival users apparently speed it up. Perhaps long-time listeners are simply more skilled and I will get faster over time. I'm not sure how my listening speed will compare to my normal reading speed, but I don't really care given that the goal is retention, not rapid completion of a meaningless exercise.
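
To illustrate the point about espeak's invocation-time speed control, here is a minimal sketch that turns a percentage into a words-per-minute value for espeak's -s option. The helper name espeak_at_percent is hypothetical, and the base rate of 175 words per minute is an assumption; the default reported by "espeak --help" may differ between versions.

```shell
# Hypothetical helper: read a text file with espeak at a percentage of a base rate.
#   $1  text file to read
#   $2  percent of the base rate (default 100)
#   $3  base rate in words per minute (175 is an assumed default)
espeak_at_percent() {
    local file="$1" percent="${2:-100}" base="${3:-175}"
    local wpm
    wpm=$(( base * percent / 100 ))
    # -s sets the speed in words per minute; -f names the text file to speak.
    espeak -s "$wpm" -f "$file"
}
```

With the assumed base rate, "espeak_at_percent notes.txt 60" amounts to running "espeak -s 105 -f notes.txt".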

Features to be Added

A highlighting cursor that moves through the document view as the audio progresses. Maybe this could be a xournal hack? (H/T andreas_ on Freenode)
Changing speed of audio stream on the fly (Gnome already offers many methods of changing volume)
Changing voices on a per-document basis
Hyperlink-following method for HTML documents (not likely to be coded by me!)

An Alternative Approach from Kyle Rankin

The ever-inspiring Kyle Rankin offers an alternative approach in his book Linux Multimedia Hacks. Kyle generously allowed me to include his perl script speak.pl in the tarball above. For reasons I don't understand, on my Fedora system I must invoke it as "perl speak.pl <filename>" rather than having an executable called "speak" that I invoke as "speak <filename>". Kyle's solution only works with text files, which it displays as it speaks them. His script also adds end-of-sentence and end-of-paragraph pauses better than festival does by default.

Resources

Major sources for the implementation: Dallas Legan's encyclopedic notes and Randy Solomonson's helpful blog.
Gstreamer playing from stdin
Another useful aggregation of assistive technology info, suggested by Niels Mayer.
I've had several useful discussions on the general topic of assistive technologies with Jonathan Nadeau and Spencer Hunley.

Disclaimer

Why would you take the advice of some random person you don't know?

alchaiken@gmail.com (Alison Chaiken)