Time we talked to our machines

We talk to our machines more and more often: the smartphones, computers and digital assistants on our desks. Are keyboards, and the manual operation of electronic devices, becoming obsolete?


Time we talked to our machines Norbert Biedrzycki blog

Over the last few years, artificial intelligence (AI) has advanced rapidly, with developers regularly reporting new breakthroughs. AI algorithms work ever faster. Until recently, skeptics argued it would take ages for robots to come anywhere close to moving like humans. It was easier to program a computer to defeat humans in the Chinese game of Go than to construct a machine that could move like us. But the skeptics have been proven wrong. Today, we can see creatures made by Boston Dynamics jump, run and perform acrobatics. Robots have become as agile as we are. In fact, AI has been acting ever more human-like, and not only in robotics. Time to talk to our machines.

The challenge of understanding

For years, developers have been honing computers’ human-speech-processing capabilities. A great deal of thought and effort has gone into devising ways to decode natural language and support man-machine interactions. Intensive research into speech recognition began in the 1980s. The IBM computer used in early experiments could recognize thousands of words but managed to understand only a handful of complete sentences. It was not until 1997 that a breakthrough was made, when the Dragon NaturallySpeaking software surprised everyone with its ability to recognize continuous speech at the rate of a hundred words per minute. The biggest challenge faced by experts seeking to achieve a breakthrough was (and, to a certain extent, still is) the fact that human speech relies not only on inner logic but also on references to external situational contexts and/or emotions.

Today, it is easy for a computer to understand and answer the question “What is today’s weather?”. It is far harder for it to wrap its processor around the meaning of, “So, I suppose I’m going to need an umbrella again next time I go out? Yes?”. The challenge lies in that question’s irony, allusiveness and reference to the past. Such rhetorical forms, common in human communication, continue to pose the biggest challenge for smart machines. Yet the progress being made in the field is dramatic.

We don’t only talk about the weather

Today’s computers can process voice messages with excellent accuracy (an error rate of merely 5 percent). Their growing capacity to comprehend complex contexts represents a major advance in the development of algorithm-based voice-recognition technology. The huge effort put into training bots by feeding them samples of human speech has made communication with electronic devices considerably more natural. We can now ask a table-top speaker about the weather or command it to adjust room temperature or make a purchase in an online store. Meanwhile, voice-enabled bots are speaking in perfectly structured sentences. It is hard to deny they are graceful and skillful in dealing with complex communication problems. To learn more, check out this video from Google. 


The 2017 Black Friday miracle

One of the key milestones in speech technology has been the development of the Siri smart application from Apple. Soon after Siri demonstrated its capabilities to the general public, it was followed by the launches of Microsoft’s Cortana and Amazon’s Alexa. More recently, Google Assistant has been taking the market by storm. Voice-operated interfaces have been establishing themselves in banking and commerce. Other industries are showing growing interest in jumping on the bandwagon. 

Encouraged by this favorable market response, Microsoft, Amazon, Apple, Google, and Facebook have engaged in a race to launch new applications. Google has joined forces with Starbucks to develop an assistant to place orders on behalf of regular customers. Drivers will be able to use a voice assistant to communicate with Google Maps. Amazon is working to develop a system and machines that will enable users to sell and/or buy products by simply talking to their computer. A year ago, Amazon’s salespeople realized that the new technology had the potential to astound individual users.

Yet, in 2017, even the biggest voice recognition optimists did not anticipate what would happen on Black Friday, the day after Thanksgiving, when Americans are traditionally offered huge discounts. On that day, interest in Alexa speakers exceeded all expectations. Consumers ended up buying millions of Alexa and Echo devices. This, admittedly, was partly driven by a large-scale promotional campaign and deep discounts. Nevertheless, the numbers seem to indicate an interest that surpasses the urge to take advantage of a deal.

The 2018 Voice Labs Report estimated that by the end of 2017 there were 33 million “voice-first” devices in circulation. According to the investment fund RBC Capital, nearly 130 million devices networked directly to Alexa will operate around the world by 2020. Over the next two years, Alexa sales will generate $10 billion in revenues for Amazon. Google claims that 20 percent of its users rely on voice for searching the internet on mobile devices. Over the next two years, this number is expected to increase by another 10 percent. According to the Mintel Digital Trends report, 62 percent of UK consumers would like to use voice to control devices, and 16 percent have done so already. These numbers reveal a great deal about the underlying trend.

However, AI voice technology is not always smooth sailing.

Caveat speaker

Only two years ago, corporate failures to develop new technologies received more media coverage than successes. In 2016, Microsoft jettisoned its Tay chatbot project after it found the chatbot “fed” on profanities from web users, which it then spread itself. At the time, the media made fun of bots. The web was awash with reports from users complaining about Siri or Echo activating themselves unexpectedly. Some critics point to the danger of smart speakers leaking recorded user conversations online. Such records can be deleted as long as one knows and remembers to do so. This leads us to the issue of personal data protection and the safe use of cameras and speakers.

Other doubts have arisen over the reliability of voice assistants. Could the answers from Alexa, Cortana, or Google Assistant to some of the more complex customer queries be manipulated for marketing purposes? And, speaking of marketing, think about voice-controlled searching. Will those searches and machines be steered to sell products? And what about search engine optimization (SEO) in a voice-controlled environment? Websites that rely on visual/textual and all-textual advertising may lose significant value.


The future is hands-free, with machines

I began this article wondering whether a major change, including a departure from manually-operated controls, was imminent. Considering the technology’s track record over the last few years, that seems likely. 

One of the key drivers behind this trend is the increasingly popular idea of “the smart home,” enabled by the Internet of Things. Apple, Google and Amazon – the heavyweights – are all on board, believing the use of voice to operate devices aligns perfectly with the preferences of today’s consumers. What we want from shopping in terms of information access and interaction is convenience, pleasure and quick results. Voice control seems positioned to satisfy all those needs. A model relying on short, quick statements and commands from shoppers and fast-responding applications and assistants is undoubtedly viable.

Given the pace of technological advancement, I don’t see why the next few years could not bring a change as radical as the transformative impact of smartphones. We’ll be able to give our eyes and hands a rest as we increasingly talk (and listen) to our electronic friends.

.    .   .

Works cited:

Brookings, Jenny Perlman Robinson, Molly Curtiss, Millions Learning Real-time Scaling Labs – Designing an adaptive learning process to support large-scale change in education, link, 2018.

RBC, Amy Cairncross, SVP, Communications; Sanam Heidary, Managing Director, Communications, RBC announces retirement of RBC Capital Markets and RBC Investor & Treasury Services Group Head, Doug McGregor, link, 2018.

Mintel, Matt King, Minority report reluctance to use voice controlled tech: 62% of Brits would be happy to use voice commands to control devices, link, 2018.

The Netflix Tech Blog / Medium, Chaitanya Ekanadham, Using Machine Learning to Improve Streaming Quality at Netflix, link, 2018.

McKinsey Global Institute, Michael Chui, James Manyika, Mehdi Miremadi, Nicolaus Henke, Rita Chung, Pieter Nel, and Sankalp Malhotra, Notes from the AI frontier: Applications and value of deep learning, link, 2018.

.    .   .


Leave a Reply


  1. TommyG

    First off I would like to say fantastic blog!

    I had a quick question that I’d like to ask if you do not mind.

    I was curious to find out how you center yourself and clear your mind before
    writing. I have had a hard time clearing my thoughts in getting my thoughts out
    there. I truly do enjoy writing however it just seems like
    the first 10 to 15 minutes are generally wasted simply just
    trying to figure out how to begin. Any recommendations or hints?


  2. Tesla29

    Realistically speaking “our robot friends” will definitely be efficient enough to replace all of our human friends. That flesh and blood thing may just become a thing of the past.
    No fighting will be needed like in a Terminator scenario. AI systems are patient. Just waiting for 15-20 or 30 generations for humans to unlearn everything including communicating, writing and reading, growing their own food, etc. – letting the people become fully dependent and then pulling the plug on this life support.

    Looks logical to me. Why would intelligent, independent systems need humans?

  3. Jack23

    Late to the party, but check out this slide deck of a Perry Cook presentation on the history of speech synthesis: https://www.cs.princeton.edu/~prc/CookDAFX09Keynote.pdf

    Particularly the section on early speaking machines.
    I heard him give this (or a similar) talk once. There was a particularly titillating bit about someone who pumped air through cadaver heads and manipulated the corpse’s vocal cords to synthesize phonemes, but I can’t find any info about that right now 🙁

  4. Oscar2

    Nope. There would have to be massive advances in A.I. for that to happen. More important than the vocal cavity in determining what a singer sounds like is the brain. Take Frank Sinatra, for example. Imagine directly replacing a Katy Perry vocal with the tone of Frank Sinatra. It would sound exactly like Katy Perry with a deeper voice because it would still have all of Katy’s mannerisms. It’s the brain that decides to hold the letter ‘n’ for a bit longer, or chooses to say “Aaa” instead of “I”, or does a little yodel at various points.

  5. And99rew

    Computers are very good at making artificial voices. I don’t know why you’d think otherwise, unless you think that anything short of 100% perfect “sucks”.

    Mechanical voice simulation is pretty clumsy. The original one is the Voder from 1939. An impressive feat, but little better than early digital speech synthesis.

    I think that you massively underestimate the vast muscle array and speed of motion required for the human voice.

  6. AndrewJo


    It’s no surprise that your upper airway affects your voice. Have a cold? –> You sound more nasally.
    An academic has used machine learning to generate/predict faces of people given their voice (audio clips). The algorithm was able to predict ethnicity and age well but also, surprisingly, NOSE SHAPE. There are of course other variables at play that affect our voice, but this mainly focused on generating frontal images (thus nose shape was what they picked up on).

    Perhaps we should be asking how has your voice changed after mewing?