by Kyle Wiggers
Adam Cheyer, a Brandeis University and UCLA alum with degrees in computer science and AI, knows a thing or two about digital assistants. He previously led the Cognitive Assistant that Learns and Organizes (CALO) project at SRI International’s Artificial Intelligence Center, which sought to integrate cutting-edge machine learning techniques into a platform-agnostic cognitive assistant. Cheyer was on the founding team of Siri, the startup behind the eponymous AI assistant technology that Apple acquired in 2010 for $200 million, and he cofounded Viv Labs, which emerged from stealth in 2016 after spending four years developing an assistant platform designed to handle complex queries.
Samsung acquired Viv in October 2016 for roughly $215 million, and soon after tasked Cheyer and colleagues with building their startup’s technology into the company’s Bixby assistant, which rolled out in March 2017 alongside the Samsung Galaxy S8 and S8+. The fruit of their labor, Bixby 2.0, made its debut in October 2017 at Samsung’s Bixby Developer Conference, and it formally launched on the Galaxy Note9 in August 2018.
Today, Bixby is available in over 200 countries and on over 500 million devices, including Samsung’s Family Hub 2.0 refrigerators, its latest-gen Smart TV lineup, and smartphone and tablet series including the Galaxy S, Galaxy Note, and mid-range Galaxy C, J, and A. (Sometime this year, the first-ever smart speaker with Bixby built in — the Galaxy Home — will join the club.) On the features front, Bixby has learned to recognize thousands of commands and speak German, French, Italian, U.K. English, and Spanish. And thanks to a newly released developer toolkit — Bixby Developer Studio — it supports more third-party apps and services than ever before.
But the Bixby team faces formidable challenges, perhaps chief among them boosting adoption. Market researcher Ovum estimates that 6% of Americans used Bixby as of November 2018, compared with 24% and 20% who used Alexa and Google Assistant, respectively. For insight into Bixby’s development and a glimpse at what the future might hold, VentureBeat spoke with Cheyer ahead of a Bixby Developer Session in Brooklyn today.
Here’s a lightly edited transcript of our discussion.
VentureBeat: I’d love to learn more about Bixby Marketplace, Bixby’s upcoming app store. What can customers expect? Will they have to search for Bixby apps and add them manually, or will they be able to launch apps with trigger words and phrases?
Adam Cheyer: Bixby Marketplace will eventually be available as part of the Galaxy Store, the app store [on Galaxy phones, Samsung Gear wearables, and feature phones]. Samsung is committed to having a single place where you can buy apps, watch faces, and other items. You’ll be able to find Capsules there in addition, and Capsules contributed by developers in Samsung’s Premier Development Program will have key placement. But I think the coolest way to interact with the Marketplace will be through Bixby itself.
There are a number of approaches you can take already. One is Bixby Home, [the left-most dashboard] on Galaxy phones’ home screens. On [smartphones], you just tap on the Bixby button and swipe to see featured Capsules and other Capsules in all sorts of categories.
You can also discover Capsules automatically through natural language. For instance, if you say something like “Get me a ride to San Francisco,” Bixby will respond, “Well, you don’t have any rideshare providers enabled right now, but here are several providers in the Marketplace.” You’ll then be prompted to try [the different options] and decide whether you like one brand, another brand, or both. If you enable more than one, Bixby will ask which you’d like to use by default.
Also, as you suggested, you can invoke Capsules with a name or phrase. For instance, you can say “Uber, get me a ride to San Francisco.”
VentureBeat: Right. So eventually, will developers be able to charge for voice experiences — either for Capsules themselves or Capsule functionality? I’m envisioning something akin to Amazon’s In-Skill Purchases, which supports one-time purchases and subscriptions.
Cheyer: Absolutely. The first version of the Bixby Marketplace will not feature what I call “premium Capsules,” which means paid apps or subscription apps. But we’re working hard on that, and we’ll have some announcements around that soon. We know that the content providers of the world need to make a living, and we will absolutely support that.
Transactional Capsules can charge money — we have providers like Ticketmaster and 1-800-Flowers who are accepting purchases today, and we’ve worked really hard to lower purchase friction for our commerce partners. If you’ve saved your card on file anywhere within the Samsung ecosystem, Bixby will know about it — you just say “Send some flowers to my mom,” and the 1-800-Flowers Capsule will say “Great — do you want to pay with your usual card?”
Additionally, we support OAuth for partners like Uber, which have cards on file within user accounts. You’re able to attach Bixby and give it account access privileges so that you can make purchases in these partners’ payment flows.
VentureBeat: You added new languages to Bixby recently — they joined English, Korean, and Mandarin Chinese. What are a few of the localization barriers the team’s facing as they bring Bixby to new territories?
Cheyer: We’re working hard to launch at least five new languages a year, and we may up that in the future.
We believe that offering the right tools and building an ecosystem that scales will enable the world’s developers to create fantastic content for end-users. This is especially important when it comes to globalization because it means that we don’t have to localize every single service. Instead, we provide a platform that has the same capabilities in each language.
VentureBeat: So on the subject of developer tools, has the Bixby team investigated neural voices like those adopted by Amazon and Google? I’m referring to voices generated by deep neural networks that sound much more human-like than the previous generation of synthetic voices.
Cheyer: I’m not going to announce anything that’s not yet in production, but I will say that Samsung has significant capabilities not only on the text-to-speech side of things but on the speech recognition side, as well. There are significant advances being made in AI, and neural network voices are certainly one of them. There’s also a lot of work ongoing in automatic speech recognition (ASR): we’re transitioning from hidden Markov model approaches to pure end-to-end neural networks, and we’re seeing ASR models move from the cloud to edge devices like phones.
We’re definitely aware of all of this, and you can rest assured that we’re working hard on these areas.
VentureBeat: You briefly mentioned privacy. As you’re probably aware, there’s some concern about how recorded commands from voice assistants are being stored and used. Bixby already offers a way to delete recordings, but would the team consider introducing new commands or in-app settings that’d make it even easier to delete this data?
Cheyer: Sure — we’re open to all of those things. Privacy is an important and multifaceted issue. For me personally, it’s not just the fact that my voice was used to tune a particular speech recognition model somewhere. I’m much more concerned about what an assistant’s doing on a semantic level — what it knows about me and why it’s showing me certain information.
But different users are going to worry about different things. You have to offer a variety of ways to let users control the data that companies have, and how they use that data.
One thing that’s important to note is that we’ve made control over what Bixby learns a fundamental platform capability. I’ll give you an example: With Bixby, developers can opt to use machine learning to process requests from users. If I ask Bixby about the weather in Boston, it might not be obvious, but which “Boston” I’m referring to is actually a preference. Most people are going to choose Boston, Massachusetts, but people who live in Texas might choose Boston, Texas. It’s kind of annoying to have to repeatedly specify which Boston you want, which is why Bixby is built to learn preferences about things like restaurants, products, and ridesharing globally and locally.
We surface these learnings to users in the Understandings page. They’ll see that Bixby guessed that they meant Boston, Massachusetts last time they asked about the weather. If they didn’t, they’re able to update it or make it clear that they don’t want Bixby to know this information about them. They always have total visibility of what is known about them and how it’s being used at a very granular level.
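To make the "Boston" example concrete, here is a minimal Python sketch of preference learning that combines a global default with a per-user override, and lets the user inspect or delete what was learned. It is an illustration only, not Bixby's implementation or API: the `PreferenceStore` class and all of its method names are hypothetical, and the simple counting logic is an assumption standing in for whatever machine learning the platform actually uses.

```python
from collections import Counter, defaultdict


class PreferenceStore:
    """Toy model of global and per-user preference learning (hypothetical)."""

    def __init__(self):
        # Global tallies: how often each candidate was chosen for a term.
        self.global_counts = defaultdict(Counter)
        # Per-user learned choices: user -> term -> candidate.
        self.user_choices = defaultdict(dict)

    def record_choice(self, user, term, candidate):
        """Remember an explicit choice, both for this user and globally."""
        self.global_counts[term][candidate] += 1
        self.user_choices[user][term] = candidate

    def resolve(self, user, term):
        """Prefer the user's own learned choice, else the global favorite."""
        if term in self.user_choices[user]:
            return self.user_choices[user][term]
        counts = self.global_counts[term]
        return counts.most_common(1)[0][0] if counts else None

    def understandings(self, user):
        """Return a copy of everything learned about this user."""
        return dict(self.user_choices[user])

    def forget(self, user, term):
        """Drop a learned preference at the user's request."""
        self.user_choices[user].pop(term, None)


store = PreferenceStore()
store.record_choice("alice", "Boston", "Boston, Massachusetts")
store.record_choice("bob", "Boston", "Boston, Massachusetts")
store.record_choice("carol", "Boston", "Boston, Texas")

local_guess = store.resolve("carol", "Boston")    # carol's own choice
global_guess = store.resolve("dave", "Boston")    # no history: global favorite
visible = store.understandings("carol")           # what carol can review
store.forget("carol", "Boston")                   # carol opts out
after_forget = store.resolve("carol", "Boston")   # falls back to global
```

In this sketch a user with no history gets the most common answer overall, a user with history gets their own answer, and deleting a learned preference falls back to the global default, which mirrors the review-and-correct flow described for the Understandings page.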