Microsoft’s Marc Whitten on ESPN Coming to Xbox, and Using Kinect for Interactive TV

With Microsoft’s new Kinect camera, new kinds of interactivity will be coming to living rooms. Marc Whitten, the General Manager of Xbox Live, talks about speech recognition, natural control, and how Kinect will change the living room.

Marc Whitten

Kevin Ohannessian: What is your favorite thing at E3 this year?


Marc Whitten: I like our stuff pretty well, and that’s not just a PR answer. I have been working with Xbox for 10 years, and the stuff we’re doing around ESPN and Kinect, and finally getting to a place we can bring out more of the interactive experiences for everyone. Interface and controller has been such a big problem in this space, it’s pretty cool. There are a lot of great games out there as well. I’m pretty excited about what’s happening in the living room–all of the screens being connected, really showing the power of something like Xbox Live and Kinect.

What hurdles did you have to get over for the ESPN deal?

They are great partners, and from meeting number one the excitement was through the roof on both sides. We got our teams together, and we started saying, “We can do this, and we can do this. This is what we look like. Here are the types of things ESPN has been wanting to do for a while, but it’s a little tough in the living room without the interactivity. Here’s what we’ve been thinking, here’s what we know about what happens in games and how that would translate into non-game stuff.” That part was never a hurdle. I think getting through the logistics of when and how, and how we can get our schedules to align and those parts, that’s always work.


What about the more interactive elements of ESPN, the polling and the following of teams?

It’s a really great mix of what the Xbox is great at and what ESPN is great at. ESPN is a data machine–they have all of the sports stuff going on, on top of that they are collecting the stats, meta-data so you know what is going on, and they use that in a variety of ways. On the Xbox, we have the power to do interactive stuff, the community that connects people. One of the things we saw last year when we launched our Sky service in the U.K., that party mode inside of sports session was really huge. And we knew that if we’re able to marry all of that stats and data and stuff that’s going on in other games, that we’d come out with something pretty special. A lot of it goes back to how do you use what’s socially happening on Live to make better decisions about what you might want to do to be entertained. With ESPN on Xbox Live, it’s easier to find the right games, because you can see where the community’s interest is, what’s the hot game happening right now.

We have been covering the evolution of natural interfaces, and there is always a comparison to Minority Report. How much did science fiction influence Kinect?


Hey, we’re Microsoft. We like science fiction. I remember when we were first getting voice recognition up and running, there’s a scene from Star Trek IV, when they go back in time and Scotty picks up the mouse, “Computer, computer.” I made them use that as the clip for the first time the command “Xbox pause” worked. When you look at where the imagination goes when the technology gets out of the way, that’s where the magic happens. There’s that great quote, “Any sufficiently advanced technology is indistinguishable from magic.” I use this electricity analogy. If I told you how electricity got to your house, it’s really complicated. There’s a nuclear reactor, there’s thousands of miles of cables, there’s government regulations, and whole industries–you don’t know about any of that stuff. You just flip the switch and it works. That huge technology investment just makes it easier for people to interact.

At Sony’s event, they announced PlayStation Plus, a paid subscription.

I am not surprised other people in the world do similar things. I didn’t have a very deep reaction, frankly. We’ll see what they do with it. I’m excited about what we’re doing. It’s about how do we deliver this amazing value inside of Xbox Live, that’s about all entertainment becoming social and interactive, and you being able to find content you want, with people you care about, wherever you are. That’s my constant push with the team. I think we are going to focus on a lot of different business models, subscriptions and transactions, both of those are really great for us.


I am surprised you guys didn’t touch on gaming with 3-D televisions.

There are a lot of great games with 3-D support. I think 3-D TVs are going to get into households and it will be something we support. I am more focused on how to make the world social in the living room, how do you make it easy to engage. That’s what our news here is about. While Party mode has been really great for us in things like Sky and Netflix, it’s interesting for us because putting a headset on removes you from the people sitting next to you in the living room, removes you from this most social place in the house. A lot of what I think is special about what we are talking about and what we’re trying to do is make the technology invisible and pull people together. I am excited to see people experimenting with 3-D games, I think they’ll be good, I think people will get the TVs, and it is something we will support. But our focus is on how we make it natural for people to interact.

Xbox Kinect

Can you tell me more about Kinect’s gestural interface and voice interface?


You’ll see something similar to what we have today, where we have a games area, and the different things you can do inside of Kinect. What’s important to us is making sure that it is really simple for you to know all the different experiences. The part that’s really cool about voice, is that it’s one of those only-at-Microsoft things. We’ve been working on speech recognition for 10 years, for boring stuff like transcribing voicemails. What really has to happen to make that natural, first we have to teach it how a human throat makes noises, then we have to teach it what grammar looks like in a region or a dialect, and finally you have to teach it words you are interested in–it has to pick all of that up and remove the sound around you. In Zune, if you are watching a movie in 5.1 surround, Kinect will remove all of the noise from the video. You don’t have to shout over it. It will take all of it out, and will know that there are sounds coming from you. It will allow you to say in a very conversational voice, “Xbox pause.”

You start by recreating the simplest of controls, making them even simpler, but when you really think about what has happened in the living room, all of these things become more and more interactive, that is when you are going to need a more natural user interface and getting people to experience them, or people are going to get stuck. If you look in your living room, you probably have 5 remotes, 250 buttons, just to change the channel.

Think about every television screen being connected, the Internet coming to all of those things, all of the experiences connecting you to friends and to entertainment. The difference between a game and a non-game is disappearing. American Idol is a game–it has a controller, it has an experience. It’s somewhat constrained by the current technology, but once you can get that, and know how your friends are voting, not just the world, once you can interact with it in a deeper way, you will start seeing the world really change. The language is already there for video games. When people describe Halo, they will always describe it as the community of Halo players, but when they describe Lost, they will always refer to the audience of Lost. It’s that one thing–control has been a barrier to allowing people to connect together and have these really interesting experiences.


So you will be interacting with the TV more?

The magic of Kinect is we allow people to get what they want, connect with the friends that they want, and if they choose to, interact and have a natural experience. Interactive TV is something that people have been talking about forever. It’s always, “Here’s this red button on the remote, you can go through these menus, go through this stuff.” All that has to go away. If you really are going to nail this next level, it’s going to be something like Kinect, something like Xbox Live, that allows you to stitch together that experience and make it natural.

Imagine the feedback loop for Oprah. She’s sitting there having a conversation. When she says something good, the audience sitting there laughs. She knows it and feels it. She knows that was a good moment. But you laugh in your living room too. And suddenly being able to use that as input to make the experience better is interesting. The sports experience goes the same way. If you were to watch rabid sports fans, they yell at the TV a lot, they yell, “Bad call!” No one wants to pick up a remote and go through, and go find a button, “I wish to now say that’s a bad call.” But you can start to imagine it as a more natural experience from the ground up. I think you can really get people to expect more out of the entertainment that they have.


For more with Marc, read the E3 posts on Art in Gaming and Financial Threats to Games.

Stay tuned for more interviews from E3.

About the author

His work has also been published by Kill Screen, Tom's Guide, Tech Times, MTV Geek, GameSpot, Gamasutra, Laptop Mag, Co.Create, and Co.Labs. Focusing on the creativity and business of gaming, he is always up for a good interview or an intriguing feature.