While watching television last week, I did the impossible: I controlled my Apple TV with an Amazon Echo.
Normally, Apple’s streaming box and Amazon’s smart speaker aren’t supposed to talk to each other. But with a new device called Caavo as the middleman, I was able to launch videos from Netflix and control video playback using only my voice. Caavo also routed all my other living room boxes—from TiVo to Roku to Xbox One—into one input, and provided a universal remote and interface to control them all.
As a consumer product, Caavo is flawed in several ways. It doesn’t integrate with every streaming service and pay TV provider, it doesn’t support HDR video, and it introduces unbearable input lag to game consoles. At $400, it’s also an awfully pricey way to solve TV’s current problems.
It does, however, underscore just how vexing those problems can be as tech giants like Apple, Amazon, and Google invade our living rooms. These companies are unlikely to work together on integrating their respective products, which means more broken experiences for consumers, and more opportunities for neutral parties like Caavo to pick up the pieces.
“We realize that none of the big players have been solving the problem, because they have a horse in the race,” says Ashish Aggarwal, Caavo’s cofounder and CTO. “They are trying to ride that horse, and it’s making consumers’ lives difficult.”
Other companies have gone down the path of TV unification before with little success. Google TV tried to combine cable and streaming into one interface in 2010, but the product was flawed and the concept didn’t get much buy-in from cable providers or streaming services. Google eventually killed the platform in favor of Android TV and Chromecast. Microsoft had a similar idea with its Xbox One console, which has an HDMI pass-through for cable boxes, but the all-in-one entertainment angle never resonated with consumers, and Microsoft has since focused on building out the Xbox’s core gaming features instead.
Caavo works a bit differently from those efforts. Instead of a single HDMI input, it has eight of them. And instead of offering its own platform for streaming apps, Caavo depends entirely on what those inputs provide. When you select something to watch through Caavo’s watchlist or search menu, it switches to your device of choice, then navigates through that device to launch the appropriate video. This happens using a mix of software hooks and computer-vision trickery; in some cases you can actually see Caavo scrolling through menus and selecting things on your behalf after you’ve chosen a video.
Because Caavo can “see” what’s happening on each input, it feels more comprehensive than past attempts at interface unification. It can surface DVR queues from cable boxes, and it can search across streaming services even when they’re on different devices. (This is what allows Alexa to manipulate an Apple TV or Roku.) In a future update, the headphone jack on Caavo’s remote control will allow private listening through any input.
Routing all your HDMI inputs through Caavo has some inherent downsides. It introduces about 60 milliseconds of input lag, which is fine for TV controls but ruinous for gaming, and the system can take a while to track down and launch your selected video. If you’d rather use the built-in search functions on a device like Roku or Apple TV, which are faster and more capable, you’ll still need to reach for that device’s remote control instead of the one Caavo provides—which defeats the whole purpose of unification.
But Caavo’s biggest shortcomings are the connections it hasn’t made yet. Currently, the device can search only a handful of streaming services, including Netflix, Amazon, iTunes, HBO, and Hulu. There are many more it doesn’t work with, including PlayStation Vue, MLB TV, Crackle, and the vast majority of “TV everywhere” apps, from Fox Now to HGTV. And right now, only Comcast Xfinity, DirecTV, and Dish subscribers can surface their DVR watchlists through Caavo. You can still use Caavo’s remote to launch other streaming apps and control other cable boxes, but at that point it’s not much different from using a universal remote such as the Logitech Harmony.
Uncutting The Cord
Even if Caavo adds support for more streaming services, it’s solving some problems that won’t always exist.
The notion of merging cable and streaming boxes, for instance, will become quaint as more people give up cable TV and use a single streaming box for all their video needs. And for those who do stick with cable, modern cable boxes like Comcast’s Xfinity X1 are starting to embrace streaming video anyway, with support for apps like Netflix. All of this means that an increasing number of people will consolidate their devices even without a $400 HDMI pass-through box.
The bigger challenge to tackle is what happens when users do settle on a single streaming device. At that point, they’ll be at the whim of one company (probably a large one like Amazon or Apple) whose integrations with other products and services are dictated as much by business goals as by the needs of consumers.
Voice search is a great example. Today, you can use Amazon Echo voice commands to control Amazon’s own Fire TV, or Google Home voice commands to control Google’s Chromecast. Apple’s HomePod will probably act as a voice remote for Apple TV devices in the future. But what happens if you prefer Chromecast for streaming and HomePod as a smart speaker, or Apple TV for streaming and Amazon Echo for voice control? None of these companies have an incentive to make their devices work together. They’d rather lock you into their own device ecosystems.
Even Roku, long seen as a neutral party in the streaming TV wars, is now building its own voice control ecosystem. Last month, the company announced a platform for smart speakers and soundbars that can control Roku TVs and streaming boxes. Perhaps that explains why Roku hasn’t allowed its devices to be controlled via Alexa, even though the necessary developer tools already exist.
Meanwhile, streaming video will continue to be fragmented, in some cases device by device. Apple is reportedly investing $1 billion in original video programming, which means you might someday need an Apple TV just to watch certain shows. And until Google and Amazon figure out how to settle their differences, you won’t be able to watch Amazon Prime Video on Chromecast, or YouTube TV on Fire TV devices.
Caavo acknowledges that its current hardware is overkill for these kinds of problems. Most people don’t need a $400 piece of hardware with gorgeous wood paneling and eight HDMI inputs just to make sure Amazon Echo and Apple TV can talk to each other. But would you buy a device with two HDMI inputs if it meant not having to think about what’s on each streaming box? Would you consider a single HDMI pass-through if it made your voice control headaches go away? Could this kind of solution just be built directly into a smart TV?
Caavo doesn’t have the answers to these questions. For now, the company is planning to test the market with its current product (it intends to sell only about 5,000 units), then evolve based on where it believes the industry is headed. With $32.5 million in venture capital and a concept device that is intriguing even if not altogether satisfying, Caavo is betting that it has plenty of time to figure out an intervention plan.
“I think the problem goes away only when there is a single large company providing all the content to you,” Aggarwal says. “Which I don’t think happens anytime soon.”