It’s easiest to get genuine UX feedback when the user is in the heat of the moment, but actually finding the right user at the right moment, and interrupting them in a way that won’t freak them out, requires considerable know-how.
Facebook is constantly rolling out new features and interfaces, and it has been using remote, time-aware testing for a little less than a year. Bolt, Facebook’s Design Research Manager, told us how Facebook conducts and uses its remote testing research, and how to isolate factors like language and culture that can screw up results.
What is remote user testing in a nutshell?
In a traditional lab environment, you pull people out of the physical context in which they use a product. That’s a well-known, accepted practice; everybody knows that’s how things are done and have been done for a long time. But the timing of when somebody wants to use Facebook, or any product, is a rich area. It’s best for us to observe interaction in the moment, when people care about what they’re doing, and reaching them while things are actually happening is the way to do that.
What, then, do you think time-aware testing accomplishes that traditional UX testing can’t?
It’s like a whole new window into the criteria by which people make clicks or taps or gestures. If you have a meeting at 2:00 where you need to contact somebody, create something, or get something done, the pressure you’re under and the things on your mind are much more real than if you’re sitting in a lab. Even if you’re doing a remote interview or ethnography, nothing about your day or your life at that moment is affecting how you use a tool. It’s just a little more make-believe. We don’t like make-believe. Make-believe doesn’t get us a great product.
How has remote UX research changed in the past few years?
Okay, here’s the thing. Over the past three years, everything has veered mobile. A few years ago we were still pretty focused on desktop behavior, and that’s totally played out now. Obviously we still care deeply about our desktop products, but we, like everybody else in the world, are much more focused on mobile. So we have to adapt our research techniques to the mobile environment.
So how is Facebook handling remote user testing for mobile devices?
One thing we do is what’s called “reverse laptop hugging.” Basically, we find people who have a webcam in their laptop and ask them to turn the laptop around and hug it. Then they can use their mobile phone while we observe it through the webcam on their laptop. Otherwise, it’s very hard to observe international, remote mobile usage, because mobile phones don’t have built-in screen sharing. We do fancier stuff too, but because laptop hugging gets us in touch with a huge percentage of the world that’s online, we do a fair amount of it. We also have three very official labs with WolfVision cameras, which are much more detailed and are designed to capture more intricate interactions with mobile devices. So that’s one of the ways we do [lab] research on mobile; laptop hugging is more of a quick-and-dirty approach. We also use an OS X app called Reflector that uses AirPlay to mirror iOS screens. And then we simply talk to people. That still counts: just having conversations, either in person or over the phone, with no fancy screen sharing involved.
How do you translate the results into actionable information for designers?
Outside of qualitative design research, we have a huge design team and a quantitative mobile analysis team that studies behavioral trends in mobile data; in terms of numbers, that’s probably a larger effort than my team. My side of things is more focused on the human element, the behavioral side. And it’s incredibly blended. It’s common for studies to have three or four redundant data-gathering methods, some qualitative and some quantitative. We have a pretty badass analytics team that gives us trends and insights and shows us things happening on the mobile builds that we wouldn’t get out of qualitative testing. Then, a lot of the time, we’ll investigate those with qualitative methods. My favorite part is that we don’t present reports. We try never to deliver any reports, if possible. Reports can’t attend meetings, and they can’t argue in favor of their findings. They die in the wastebasket immediately. So we’ll bring up some data in a session, brainstorm on a whiteboard, absorb some of the human patterns of the people using this stuff, and then incorporate that into our next build. That’s the goal.
How do you collate anecdotal feedback, which can be all over the place in qualitative research, into one clear set of recommendations?
Yeah, that’s a key challenge. There are two ways. The first is that we try to ignore opinions and focus on behavior, even in the qualitative sessions, because opinions are essentially worthless. They don’t repeat, so you need a huge sample size to find trends in them. Cognitive behavior, on the other hand, repeats across a very small sample, so we can reliably extract behavioral trends from a fairly small qualitative sample. Chances are that if you, as a certain type of user, navigate a new system in a particular way, a lot of other people will navigate it the same way. You might not like it, or you might not like the way it looks, but your functional interactions with it will be fairly dependable. There’s a fascinating distinction between what people say they like and their ability to actually perform tasks, and it’s one of the most interesting things about what we do. Separating the signal from the noise is something of a creative endeavor, because what people say they want, or say they want to do, with technology is notoriously garbage. In 1973, AT&T did a market research study and found that there was not, and never would be, a significant market for mobile phones in the United States.
But can you really get data that precise from just a few time-aware tests?
First of all, the real key is mixing methods. We almost always try to mirror behavioral tasks and questions across big data, qualitative observation, experimental data, and surveys. Second, we’re not talking about science; we’re talking about interface and product design. So behavioral trends from a sample as small as 10 people might simply be used to inspire our teams rather than treated as absolute scientific truth.
How could someone interested in time-aware testing implement it themselves? Is this really the kind of thing that developers can do on their own?
Absolutely. I love it when anybody outside of design research conducts their own tests. I think it’s fantastic. It’s just talking to people and observing their behavior: anybody can do it. Now, in practice, what I find is that a lot of people don’t necessarily want to do that. I get that it doesn’t sound appealing, and it’s probably the reason usertesting.com is so popular. I know a lot of marketers, designers, engineers, and PMs out in the world use it because it cuts out the need to speak directly with random people.
We’ve talked a lot about remote user testing, but is there still a place for traditional lab-based UX testing?
Absolutely. We blend it all here at Facebook, and I did the same when we worked with other large organizations like Sony and Wikipedia. In general, on the research side of things, mixing methods is usually a good way to be rigorous. So mixing traditional, old-school stuff with a little bit of live intercept or a little bit of remote research is awesome. It usually gives you some nice comparative data. The basic gist is that with lab research, there’s something magical about having people in the room, next to or close to your stakeholders. Seeing real human beings face-to-face is kind of nice, you know? You get a lot of cues, a lot of visual input, and a lot of empathy, even if you’re on the other side of a one-way mirror. The advantages on the remote side are more about the accuracy of the interactions. When people are more comfortable, as they are remotely, they’re more likely to talk trash. When you bring them into your lab, walking past your company logo and everything, they tend to want to be nice. Most people don’t want to come in and say, “Actually, I hate everything about your company.” But remotely, they do.
Where do you see the future of remote and time-aware testing?
Mobile screen sharing and camera sharing are in their absolute infancy right now. Technologies like WebRTC and getUserMedia will progress to make it easier for people to quickly share their mobile screen and camera from anywhere in the world, but there are massive security and privacy concerns at the mobile OS level that will need to be figured out.
[Image by jonathanpeterss on Flickr]