Richard Sproat

Computational Models of Orthography and Phonology

The voice of Alexa, Cortana, Google Assistant, or Siri on your phone needs to know how to pronounce words. How does it do this? In this presentation I will give a brief overview of the grapheme-to-phoneme problem as it has been approached in speech technology, in particular in text-to-speech conversion. Approaches have ranged from traditional rule-based methods (still perfectly reasonable for some writing systems) to various types of machine learning, most recently neural approaches. While modern systems probably still fall short of human performance, particularly on morphologically complex words, there are some areas, such as proper-name pronunciation, where automated systems can actually do better than humans. If nothing else, automated systems may have a better model of world knowledge, which can affect how a word is pronounced. An Assistant-like application will probably know that if you are in New York, Houston (Street) should be pronounced /ˈhaʊstən/ (not /ˈhjuːstən/), and will thus avoid an error that many tourists make.
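As a rough illustration of why rule-based methods can suffice for a fairly shallow orthography, here is a minimal sketch of an ordered, context-sensitive rewrite-rule converter. It covers only a hypothetical toy subset of Spanish spelling and assumes Latin American seseo (c before e/i and z both give /s/). It ignores stress, the word-initial trill, and the letters x, y, w, and accented vowels. The names `RULES`, `PASSTHROUGH`, and `g2p` are illustrative and do not come from any particular system.

```python
# Hypothetical toy rule table: (grapheme, allowed following letters or None, phoneme).
# Rules are tried in order and the first match wins, so multi-letter and
# context-specific rules must precede the more general ones (e.g. "ch" before "c").
RULES = [
    ("ch", None, "tʃ"),
    ("ll", None, "ʝ"),
    ("rr", None, "r"),
    ("qu", "ei", "k"),   # queso -> keso
    ("gu", "ei", "g"),   # guerra -> gera
    ("c", "ei", "s"),    # cena -> sena (seseo assumed)
    ("g", "ei", "x"),    # gente -> xente
    ("c", None, "k"),
    ("z", None, "s"),
    ("j", None, "x"),
    ("ñ", None, "ɲ"),
    ("v", None, "b"),
    ("h", None, ""),     # h is silent
]

# Letters that this toy subset maps to themselves when no rule applies.
PASSTHROUGH = set("aeioubdfgklmnprst")


def g2p(word: str) -> str:
    """Convert a word in the toy subset to a broad phonemic string."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        for graph, ctx, phone in RULES:
            if not word.startswith(graph, i):
                continue
            j = i + len(graph)
            # A right-context rule fires only if the next letter is in ctx.
            if ctx is not None and (j >= len(word) or word[j] not in ctx):
                continue
            out.append(phone)
            i = j
            break
        else:
            # No rule matched: copy the letter through, or reject it.
            ch = word[i]
            if ch not in PASSTHROUGH:
                raise ValueError(f"no rule for {ch!r} in {word!r}")
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, `g2p("queso")` returns `"keso"` and `g2p("chico")` returns `"tʃiko"`. Ordering the rules this way keeps the table small. Even so, real orthographies quickly need many more rules, exception lists, and morphological information, which is part of the motivation for the machine-learning approaches mentioned above.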