Circa’s Object-Oriented Approach To Building The News

News startup Circa is taking a programmatic approach to mobile news. FastCo.Labs talks to founding editor David Cohn about abandoning the article format and organizing stories into atomic units like events, statistics, quotes, and images which can be resurfaced, reused, and refactored.


Since newspapers first started to print, the basic unit of news has been the article, an unstructured blob of data. A news article contains standard elements like a headline, quotes, facts and photos. It has a clear beginning and an end. To report a new fact, a journalist writes an entirely new article. Online news is not subject to the constraints of the paper page and the daily print run, but it still follows those basic tenets. Mobile news startup Circa has taken the radical step of ditching the linear article altogether. Circa’s founding editor David Cohn littered his presentation to journalists at the recent GEN News Summit with software development terms like forking and refactoring. It was hard to know whether he was describing a software or a news startup. Cohn says that’s exactly the point. Circa is making object-oriented news.


You said in your presentation that “All code is political.” How are Circa’s values reflected in the product you have created?


Circa is half a technology company and half a media company. I head up the media half, but the two are intimately connected. At the very heart of it, Matt, Ben and Arsenio, the three co-founders, pictured a GitHub for news. The idea of something evolving over time was incredibly important. Our CMS is organized around breaking news. We are not doing narratives. That’s not to say that there is no value in narratives. It’s just not where we are putting our value.

The way that technology organizes information is not benign. The way the technology is organized changes the nature of the news itself. We are not going to organize the news in articles. In most news organizations that’s the unit and at best you can organize by topic. So we said ‘What happens if you make the unit, this atomic unit of news, the fact – a stat, a quote, an image?’ A quote can exist in multiple stories. This is what Jeff Jarvis once referred to as object-oriented journalism. This fact is an object and it can exist in several stories.
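To make “a fact is an object and it can exist in several stories” concrete, here is a minimal Python sketch. It is not Circa’s CMS: the class names `Atom`, `Story`, and `AtomType`, and the function `stories_containing`, are hypothetical. The point it illustrates is that stories hold references to shared atoms rather than copies of their text, so an atom changed in one place changes everywhere it appears.

```python
from dataclasses import dataclass, field
from enum import Enum


class AtomType(Enum):
    # Hypothetical entry types mirroring the ones Cohn lists.
    # There is deliberately no OPINION or ANALYSIS member.
    FACT = "fact"
    STAT = "stat"
    QUOTE = "quote"
    EVENT = "event"
    IMAGE = "image"


@dataclass
class Atom:
    atom_id: str
    kind: AtomType
    body: str
    citation: str  # every atom carries its own source


@dataclass
class Story:
    title: str
    atoms: list = field(default_factory=list)

    def add(self, atom: Atom) -> None:
        # Stories hold references to atoms, not copies of their text.
        self.atoms.append(atom)


def stories_containing(atom: Atom, stories: list) -> list:
    """Titles of every story that references this exact atom object."""
    return [s.title for s in stories if any(a is atom for a in s.atoms)]


# One quote object reused by two stories (all names and text are placeholders).
quote = Atom("q1", AtomType.QUOTE, "Example quote text", "Example citation")
bill = Story("Bill in parliament")
protests = Story("Protests over the bill")
bill.add(quote)
protests.add(quote)
print(stories_containing(quote, [bill, protests]))

# Editing the shared object once updates both stories.
quote.body = "Corrected quote text"
print(protests.atoms[0].body)
```

Using an enum for the entry type is one way to encode the editorial constraint Cohn describes: an “opinion” atom simply cannot be created.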

We want to respect users who are voracious as well as readers who are brand new, by rearranging things for them. The first feature to do this is the follow. We keep track of where you were in a story the last time you visited, and when there is a new fact in that story we push it to your phone.
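The follow feature can be sketched as a per-reader bookmark into an append-only list of points. This is a hypothetical illustration, not Circa’s implementation: `FollowedStory`, `follow`, `unread`, `mark_read`, and `publish` are invented names, and a real system would persist the bookmarks and call a push service rather than return a list.

```python
from dataclasses import dataclass, field


@dataclass
class FollowedStory:
    """Hypothetical model of a followed story as an append-only list of points."""
    points: list = field(default_factory=list)
    # follower id -> number of points that follower has already seen
    last_seen: dict = field(default_factory=dict)

    def follow(self, user: str) -> None:
        # Following marks everything currently in the story as read.
        self.last_seen[user] = len(self.points)

    def unread(self, user: str) -> list:
        # Only the points added since the user's last visit.
        return self.points[self.last_seen.get(user, 0):]

    def mark_read(self, user: str) -> None:
        self.last_seen[user] = len(self.points)

    def publish(self, point: str) -> list:
        """Append a new point and return the followers who should get a push."""
        self.points.append(point)
        return sorted(self.last_seen)


story = FollowedStory(points=["point 1", "point 2"])
story.follow("ana")
print(story.publish("point 3"))  # ['ana']
print(story.unread("ana"))       # ['point 3']
story.mark_read("ana")
print(story.unread("ana"))       # []
```

Because a returning reader only sees what is new, each visit gets shorter as the story grows, which matches the falling session times Cohn describes for Boston later in the interview.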


The news business creates a pile of new content every day and then, effectively, throws it away and starts again. How does structuring news help you to reuse content?


I call it news amnesia. If you do articles, you have to do this since you need something new today. I think it’s as frustrating for journalists as it is for the readers. What we do is say “Here’s the latest fact and the story it belongs in”. Maybe it belongs in two stories. We will then point these stories to each other. Nelson Mandela has been in hospital maybe five times in the last year. We have a story tracking this and people are following it and eventually when he dies we will update it. People do that already with pre-written obits but that’s the exception. For a lot of other stories like legislation – say gay marriage in France – we were tracking it until it finally went through and then all we had to do was say “passed”. All the background, we already have that.

We value organizing content for the purpose of resurfacing it. This is a hat tip to Adrian Holovaty. He is one of the creators of Django. He worked for a Kansas paper when he did that and then he created Chicago Crime Maps. Information in an article is a blob and you can’t use it later. He said “All this crime data. You are just putting it in an article, and then what?” Anybody who does data journalism owes him a lot.

It’s easier to do data journalism around hard numbers. It’s harder around social issues which are much more fungible but all stories still have facts, quotes, etc. It’s just not organized. When journalists write an article, tags are an afterthought. In our CMS you are tagging it as you go. It’s part of the process. The other thing which is interesting there is constraints. An article is a big box. There are no constraints. On Circa you have a choice. You can add a fact, stat, quote, event or image. There is no opinion entry. There is no analysis entry. Those constraints actually help our editorial process. They guide us. We might be able to go further later on and create entities.


How would you like to refactor that structured news?


One I always geek out about is this quotes thing. It would be great if you could tie entities, like experts, to all their quotes, so you could see “This is the same person they always quote,” which either means A) they are lazy or B) he’s really good. Then you would be able to see across the breadth of stories that he is quoted in and also see what sources we have used. In Circa every fact, quote and event has a citation. Now we could say “This guy has been quoted in seven publications or he has only been quoted in one publication seven times.” Being quoted in seven different publications is more valuable than being quoted seven times in one publication.
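Because every quote carries a citation, the distinction Cohn draws is a simple aggregation. The sketch below assumes quotes are available as hypothetical `(speaker, publication)` pairs; `source_breadth` is an invented helper, not part of any Circa tool. It counts total quotes and distinct publications per speaker, then ranks breadth ahead of raw volume.

```python
from collections import defaultdict


def source_breadth(citations):
    """citations: iterable of (speaker, publication) pairs, one per quote.

    Returns {speaker: (times_quoted, distinct_publications)}.
    """
    counts = defaultdict(int)
    outlets = defaultdict(set)
    for speaker, publication in citations:
        counts[speaker] += 1
        outlets[speaker].add(publication)
    return {s: (counts[s], len(outlets[s])) for s in counts}


# Placeholder data: one expert quoted once in each of seven outlets,
# another quoted seven times in the same outlet.
citations = [("Expert A", f"Outlet {i}") for i in range(7)]
citations += [("Expert B", "Outlet 0")] * 7

breadth = source_breadth(citations)
print(breadth)  # {'Expert A': (7, 7), 'Expert B': (7, 1)}

# Rank by number of distinct publications first, total quotes second.
ranked = sorted(breadth, key=lambda s: breadth[s][::-1], reverse=True)
print(ranked)  # ['Expert A', 'Expert B']
```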

Another thing which would be cool to do down the line is corrections. At best you either write a new article or you go to your old article and write “correction” somewhere. Because we can keep track of exactly what you have read, what fact or what quote, when we get a fact wrong we can let everyone who read that story know.
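Targeted corrections follow from the same idea: if reads are logged per atom rather than per article, a correction can be routed to exactly the people who saw the wrong version. A minimal sketch, assuming a hypothetical in-memory `ReadLog`; a production system would store this in a database and hand the messages to a notification service.

```python
from collections import defaultdict


class ReadLog:
    """Hypothetical log of which user has seen which atom (fact, quote, ...)."""

    def __init__(self):
        self._readers = defaultdict(set)  # atom id -> set of user ids

    def record_read(self, user, atom_id):
        self._readers[atom_id].add(user)

    def correct(self, atom_id, corrected_text):
        """Return (user, message) pairs for everyone who saw the wrong version."""
        message = f"Correction to {atom_id}: {corrected_text}"
        # .get avoids creating an empty entry for atoms nobody has read.
        return [(user, message) for user in sorted(self._readers.get(atom_id, ()))]


log = ReadLog()
log.record_read("ana", "fact-17")
log.record_read("ben", "fact-17")
log.record_read("ben", "quote-3")
for user, message in log.correct("fact-17", "The vote took place on Tuesday, not Monday."):
    print(user, "->", message)
```

Only `ana` and `ben` are notified about `fact-17`; a reader who never opened that atom gets nothing, which is the advantage over appending “correction” to an old article.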


Were there other “news atoms” that Circa considered?

Another one we will probably bring in is video. When I say video I don’t necessarily mean a talking head analyzing news for you but video of what just happened, a primary document, video that shows the fact. In fact, one of the things I have always said is that all of the atomic units are facts. There’s a statistic, a numerical fact, an event which is a fact in time and place, an image and a quote, the fact that someone said something. We don’t quote talking heads. We quote someone who is in a position of power. Quoting Barack Obama’s State of the Union goals is relevant because he is the president and it’s a fact that he said it. We don’t quote the guy on Fox News. We might think about bringing in social media, which is like a quote. One thing we haven’t done yet, but there is a lot of scope for, is different forms of visual presentation of statistics. Right now it’s just text with a number in it.


You said that a Circa story is like a Facebook newsfeed. You can get to know someone better in little frequent updates, than in a single face-to-face meeting every few months.


There is an element of truth in that. It is important for someone to gorge on 4000 words, and there is value in it, but someone who follows Syria on Circa touches it pretty much every day. Someone following that story, I would argue, has a greater awareness of what is happening in Syria than someone who reads 4000 words and then never touches it again. There’s an easy counter-argument, but I don’t think I am insane for proposing that.

What have been the biggest stories for Circa so far?


The biggest moments for us have been Boston and, before Boston, the meteor in Russia and the Newtown shooting. Hurricane Sandy was big. The Boston one is a great example. It started off like a regular Monday. We put up one point (Circa’s term for a news atom). A good proportion of people who read the story followed it. Every time we did a push we would see an instant spike. The session time decreased with every update. That makes sense since they had already read everything else. In aggregate they were spending more time in the app but each session was shorter and shorter.


Why We’re Tracking This Story

Chicken Little says journalism is dying. Well, there are certainly many struggling journals, especially among those that came of age before the Internet. Ask a publisher where the big bucks went, and they’re likely to mumble something about Craigslist, the loss of print advertising dollars, and an inability to sustain a newsroom with digital advertising.

On the flip side, Matthew Yglesias, writing for Slate Magazine, says to ignore the doomsayers: The news-reading public has never had more and better information at their fingertips. Thanks to social networks and ubiquitous mobile Internet, anyone can report from anywhere in the world at any time. As a result, more news is available on more subjects than at any previous time in history.

If you ask me, this is the real cause of publishers’ struggles. Many of the functions they used to perform are simply no longer valuable in a world where everybody, and increasingly nobody in the form of automated sensor networks, can report basic information.


Worse still, news organizations by and large missed the boat on Internet technology and are only now starting to catch up, just as an increasing number of new technology companies set their sights on replacing even more functions that journalists used to perform exclusively.

So what value can news organizations provide in order to survive? Fortunately, there’s no shortage of opinion from academics, technologists, and journalists themselves. This is my attempt to track those ongoing conversations and add my thoughts as both a technologist and a journalist.

Previous Updates

June 26, 2013

While drones have been in the news lately, they probably won’t be making the news themselves anytime soon, according to the discussions at the GEN News Drone Journalism Boot Camp in Paris this month.

The idea feels promising at first. A drone flying over Taksim Square recently recorded astonishing video of the police clearing protesters, and was shot down by a police bullet for its trouble. Drones have been pitched as a way to tell stories which would otherwise be inaccessible, unsafe, or too costly to cover. Drones can do things which traditional aircraft cannot. They are smaller, lighter, and cheaper. Five hundred thousand toy Parrot drones, which cost a couple hundred dollars, are already in the skies worldwide.

However, the obstacles to drone journalism are myriad. Drones, or UAVs (Unmanned Aerial Vehicles) as the BBC prefers to call them, are currently classified as aircraft in many countries, especially if they are above a certain weight or used commercially. They are mostly illegal in the U.S. until new FAA regulations come into force in 2015. Piloting takes hours of practice. In most jurisdictions the pilot is obliged to keep the drone in his line of sight at all times, which limits the use cases. It’s illegal in countries like the U.K. to fly below a certain height over crowds. The battery life is short. Drone journalism is also more labor-intensive than it seems. You often need a pilot, cameraman, and journalist to get a good quality story. Raphael Labbe, Innovation Director at the French newspaper L’Express, described reporting with drones as “10 percent journalism, 30 percent video and 60 percent piloting.”

“A lot of people literally think that you can sit at your desk and fly one of these things,” says Scott Pham, who runs the Missouri Drone Journalism Program. “Not only is that illegal but it’s a very bad idea since you want to be out there and be aware of what you are doing. They think you can put it into the sky and wait for something to happen and that’s not going to happen because of the short battery life.”

The BBC experimented with drones as part of its London Olympics coverage and even collaborated with the University of Southampton on a 3-D-printed drone. But while the Corporation captured breathtaking video of its own Broadcasting House using a drone, Guy Pelham, a Live editor at BBC News, and Nick Pinks from BBC R&D contended that a helicopter is often still the best option for broadcasters who need aerial images. Although the BBC’s helicopter contract costs several hundred thousand dollars a year, the Corporation can get a helicopter to 80% of the U.K. within three hours and send back live, broadcast-quality video with an audio link, something which is currently not possible even with larger and more expensive drones.

What about newsrooms which don’t have the budget of the BBC? Scott Pham has been flying low-cost drones constructed from kits as part of the Missouri Drone Journalism program. “I’m really looking forward to seeing what kinds of small operations can deal with this. The kind of rigs you will see the BBC flying are the kinds of things I never would. Because their standards are so high in national media, this may become the kind of thing which is primarily done by local media.” Pham sees the biggest potential in what he calls “big, flat stories,” which involve a lot of land. “Those are stories that are difficult to photograph, difficult to visualize. It’s also a lot safer to fly over than something populated.” In fact, small news operations and citizen journalism may be where drones can make the biggest contribution. The Taksim Square video was shot by a hobbyist.

Data protection was a big issue at the boot camp. Marc Corcoran, a foreign correspondent with the Australian Broadcasting Corporation, recalled how he had his own eureka moment on drones while embedded with Hezbollah in South Beirut in 2006. Israeli drones constantly flew overhead taking surveillance footage and calling in air strikes. He thought “Why don’t we use drones in journalism?” He pointed to the Black Hornet Nano drone which was developed for the British army and is currently being used in Afghanistan. This kind of nano-drone, once it becomes affordable, could be used in investigative journalism, but could also invade people’s privacy in the worst possible way. Someday we may need “drone control” as well as gun control.

June 24, 2013

The first news of the death of Sopranos actor James Gandolfini may have reached you via Twitter, but you probably went to the New York Times or, dare we say, Fast Company to read about it. The Reuters Institute for the Study of Journalism’s latest Digital News Report, just announced at the GEN News Summit, reveals that while most people now discover news via social media, they don’t trust social media itself as a news source and want their news verified by traditional news brands like broadcasters and newspapers.

Many of the results from the report–the doubling of the consumption of news on tablets in the last 10 months or the fact that 33% of readers track news on two devices–seem rather obvious. A more surprising finding was that the death of the traditional news brand has been greatly exaggerated. Reuters surveyed 11,000 people of all ages in 8 countries, including over 2,000 in the U.S.

In all countries we asked if people agreed that they preferred to get news from sites they know and trust. The figures were universally high, with 90% supporting the proposition in Brazil, 82% in the US, and 77% in the UK. Broadcaster websites in the UK were trusted by 79%, with newspapers showing over 60%. In comparison, Facebook (8%) and Twitter (9%) were widely mistrusted – although heavy social media users were significantly more likely to trust what they found.

To verify news originating on a source you don’t trust, you can use the Co.Labs guide to doing your own fact checking.

This picture does vary across countries and age groups. In the U.K. and Denmark, traditional news brands continue to attract 80% or more of the online audience, while in Japan and the U.S. pure online players like the Huffington Post and aggregators like Yahoo have attracted a much bigger market share (56% used an aggregator in the last week as opposed to 32% in the U.K.) than in other countries. 25–34-year-olds in the U.K. show greater trust in news from social media than other groups.

Heavier consumers of news, who are more likely to use multiple devices, are even more likely to trust a traditional news brand, meaning that smartphone and tablet news users show more trust in traditional brands than other users and more often discover news directly on a brand’s site.

We may want our news verified by the stamp of a traditional brand, but we are finding it via social networks.

Social networks (predominantly Facebook) represent the most widely used way of finding news online for urban Brazilian internet users, and a widely used gateway in the US (30%), Italy (38%), and Spain (45%). For the under 35s – looking at the aggregated data across all of our countries – this is now the most important way in which people find news stories.

Americans are more likely than people in many other countries to discover news via social media, especially if they consume news via tablets and smartphones. 47% of smartphone and tablet users in the U.S. got their news via social media versus 27% in the U.K., and Americans are also much more enthusiastic users of social newsreaders (22%) and news alerts (44%) than denizens of other countries.

How Editors’ Lab Amsterdam Sees The Future Of News

Editors’ Lab Amsterdam looks like any other hack day, with attendees skewing towards the young, male, and sartorially challenged, but the majority of participants are not actually developers but journalists or designers. “Traditional hackathons are designed for coders,” says organizer Antoine Laurent. “We want to involve journalists and designers since the main objective is to introduce them to a more collaborative way of working. So we can’t do only coding since then the journalists will just sit there being bored.”

Editors’ Lab, which is organized by the non-governmental organization Global Editors Network, kicked off in Argentina in 2012 and has since visited newsrooms all over the world from India Today to the New York Times. On this occasion it’s hosted by the Dutch public broadcaster NOS. Teams must consist of at least three members: a journalist, a designer, and a developer. Some smaller newsrooms have to hire a developer for the event. At the Dutch Editors’ Lab, there were teams from the major Dutch newspapers, broadcasters, and small online-only newsrooms.

“The real difference we see between the teams is product management,” explains Laurent. “Everywhere we have been, you have good coders, but are they used to developing and managing news applications? Journalists also have to learn how to write down the requirements and specifications and how to talk to developers. In the U.S. (The New York Times hosted the last Editors’ Lab), the room was full of teams who are already working in this setup in their newsrooms.”

Each lab has a theme, which in Amsterdam was “new journalistic tools for reaching young readers.” The teams’ concepts ranged from the winner, a site from national newspaper De Volkskrant to help readers aged 7 to 12 discover news, to a clever tool that replaces the audio in a video with something funnier in order to entice young readers to watch videos on politics. “We focus on tools for journalists which a newsroom can implement to regularly and easily produce innovative content,” says Laurent. He highlights the winning project from Editors’ Lab Paris, a CMS plug-in that uses facial recognition to let journalists upload a picture, crawl the newspaper’s picture database for related content, and tag people in the photo.

One project which caught my eye was from Internet-only newsroom Follow the Money, which specializes in complex financial investigations. “Last year before the presidential election in the U.S. Follow the Money did an article on Mitt Romney’s tax affairs that’s still the most read article on (national newspaper) De Volkskrant,” says Follow the Money collaborator Richard Jong.

Follow the Money wanted to address shorter attention spans and varying levels of knowledge of readers without dumbing down the content. The solution was Story Browser, a presentation layer on top of the original text. “Everything we see on the Internet is a pretty direct translation of straight, old-style newspaper articles,” explains Jong. “We decided to cut it into pieces and let clever algorithms decide what the order of the article should be. It’s a cloud of rich media chunks where the items which are hopefully most important for you are displayed proportionally larger. The complete article isn’t mapped into the diagram but only the most important fragments. You can read it in a chronological order, read in the order of what your friends think is most important, or what the editors say is most important. If you are logged into the website and we can see that you have never read an article on Bitcoin, we can display the explanation of Bitcoin larger.”

Jong built the new layout by scraping existing content and extracting titles, images, and the most important comments. “Often the comments on financial investigations are way more important than the article itself since they are from financial experts.” This was converted into a JSON file, imported into the JavaScript visualization library D3, and rendered with a force-directed graph layout.
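The conversion step can be sketched as a function that turns scraped chunks into the nodes-and-links JSON shape that D3 force-layout examples typically consume (in D3 v4 and later, links that refer to node ids are resolved with `d3.forceLink().id(d => d.id)`). To keep one language across these examples it is written in Python. The source does not describe Story Browser’s actual schema, so the chunk keys, the 0-to-1 `importance` score, the `min_importance` cutoff, and the radius formula are all assumptions.

```python
import json


def to_force_graph(chunks, min_importance=0.0):
    """Convert scraped story chunks into a nodes/links JSON document.

    chunks: list of dicts with hypothetical keys 'id', 'kind', 'text' and
    'importance' (a 0..1 score). Chunks below min_importance are left out,
    so only the most important fragments reach the diagram.
    """
    kept = [c for c in chunks if c["importance"] >= min_importance]
    nodes = [
        {
            "id": c["id"],
            "kind": c["kind"],  # e.g. title, image, comment
            "label": c["text"][:80],
            # More important chunks are drawn proportionally larger.
            "radius": round(10 + 40 * c["importance"], 1),
        }
        for c in kept
    ]
    # Link consecutive kept chunks so the original reading order survives.
    links = [{"source": a["id"], "target": b["id"]} for a, b in zip(kept, kept[1:])]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


chunks = [
    {"id": "title", "kind": "title", "text": "Story headline", "importance": 0.9},
    {"id": "c1", "kind": "comment", "text": "Expert comment", "importance": 0.5},
    {"id": "img", "kind": "image", "text": "Photo caption", "importance": 0.1},
]
print(to_force_graph(chunks, min_importance=0.3))
```

Per-reader ordering, such as enlarging a Bitcoin explainer for someone who has never read about Bitcoin, would amount to adjusting `importance` per user before this step.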

Many of the more ambitious features are not yet implemented, but Story Browser could be a very promising approach to giving new readers an overview of a complex story or set of related stories. “It’s a very ambitious concept and this is one way of doing that. It’s not the perfect solution,” says Jong.

What Journalism Can Learn From Open Companies (And Vice Versa)

Gittip founder and open company devotee Chad Whitacre did the unthinkable for a startup: He turned down an interview with TechCrunch. Here’s why he did it, and what that means for journalists and open companies everywhere.

Update: After reading the article below, Gittip founder Chad Whitacre invited me to have an open conversation with him about openness, software and journalism. The result was a 45-minute talk that I’m posting here without comment. For context, scroll down. Otherwise, here’s the video:

Yesterday, Gittip founder Chad Whitacre declined to be interviewed by TechCrunch unless he could record and publish the full, unedited conversation online. His reasoning? Gittip is an open company (in fact, it inspired payments startup Balanced, which we recently profiled), and he tries to do as many things out in the open as possible. TechCrunch declined, and Whitacre wrote about the experience on Medium. His main point is that he thinks making all interviews open provides more value for everyone than keeping them closed just so one publication can claim a scoop:

To me, that looks like it exposes journalism as a zero-sum game, and I don’t play zero-sum games, if I can help it. In my worldview, having multiple journalists conducting interviews and having multiple journalists writing stories based on those interviews is an overall win for readers and for humanity.

He’s right. As I’ve written before, in a world where anyone can break news and spread it around the world in seconds simply by tweeting, being first to a story isn’t very valuable. Publications like TechCrunch try to artificially create value by demanding startups give them exclusive stories, but this tactic will become increasingly less effective as companies and individuals find ways to reach large audiences without these publications.

Companies have good reason to go around places like TechCrunch. After posting his thoughts on Medium, two journalists took him up on his offer for an open interview: Brian Jackson of, and Mathew Ingram of PaidContent. In the interview with Ingram, which I live-tweeted for reasons I’ll discuss in a moment, Whitacre explained why he was comfortable offering TechCrunch something they would likely turn down:

“TechCrunch is a machine. How many stories of new startups are they stamping out every day? What value is it to me, building my company, to be just one of another stream of stories that floats by on TechCrunch?”

In this way, an open interview functions as protection from gatekeepers like TechCrunch that provide artificial value. This view is understandable given TechCrunch’s recent reputation for being unstable and full of conflicts of interest. But by trying to protect himself from places like TechCrunch, Whitacre also limits the coverage he’ll see from journalists trying to provide their audiences with more value than just a scoop.

In the interview with Ingram, Whitacre equates journalists synthesizing raw information into understandable stories with engineers creating easier-to-use abstractions from more complicated systems. This is a great metaphor, except that interview material isn’t analogous to a preexisting underlying system. Professional journalists use interviews to extract value that wasn’t there before. Otherwise, why not just interview yourself and post for all to see? (Ed. note: that’s called a press release.)

That’s why journalists rightfully feel a sense of ownership over their interviews. It’s one thing if all you’re doing is grabbing a few extra quotes to footnote a press release. It’s another if you’re trying to tell a complex, in-depth story like the piece Co.Labs editor Chris Dannen wrote about iPad app Paper.

Chris provides value in researching and writing a long piece like the Paper story by tying together many disparate threads of thought into one coherent narrative. He doesn’t just do this when he writes his article; he also does it by asking the right questions in interviews. If we had live-broadcast these interviews as they happened, other publications could have collected and posted all of the best bits with their own framing.

It would be analogous to releasing only the back-end for a web app and having other developers gain traction with their own front-ends before you have a chance to launch yours. You might have planned to release a really elegant and simple version, but now you’re starting in a hole, competing with everyone else’s features. That’s not a recipe for startup or journalism success.

It may not sound like a bad thing for multiple publications to put their own spin on an article, or companies to fork software for their own use. In the interview with Jackson, Whitacre calls this “open-source journalism,” and cites it as one of the reasons he prefers open interviews. In fact, this happens all the time in the form of re-blogging and aggregating. The difference is that re-blogging the interview before Chris had a chance to write his article wouldn’t be aggregation at all. Instead, it would force Chris to compete with himself to add value to the story.

In fact, this very problem came up during Whitacre’s interview with Jackson:

Jackson: “Actually I was just thinking that other writers could watch this and write the story for me.”
Whitacre: “So did you even see that somebody already tweeted that?”
Jackson: “Yeah I did, I was talking to that guy. I’m kidding. I will write this.”
Whitacre: “It’s up to you, right?”
Jackson: “It would be funny, though, if I was too lazy to write it myself I just do the interview and then I’m like ‘Oh, well, this guy wrote it up for me.’”
Whitacre: “It’s open-source journalism, it’s fascinating.”
Jackson: “Why not? Saves me the time. I’ll go write a different story, right?”

Jackson did write his own story, but it raises the question: why? Here I am, writing a piece of it for him. Did he blog his own version just so I would have somewhere to link to?

If you believe that professional journalists create value by interviewing, there’s one other problem with open interviews. If you watch a few of the open calls on Whitacre’s YouTube channel, you’ll notice that they’re fairly performative. Whitacre in particular seems to have mastered an informal but polished style that suits him well.

I mentioned above that I live-tweeted the interview as it happened. One of the reasons I did this was to see how both parties behaved, knowing for certain that they were being watched (by a journalist, no less). During the interview, Ingram admitted that he was “conscious of my questions because I don’t want to look like an idiot when someone watches our interview,” to which Whitacre responded that he would get used to it.

There are a lot of tech personalities who are good at manipulating the press via performance (a certain former Apple CEO comes to mind, for instance). One of the nice things about a private interview is that people often stop performing when they know that they can take statements off the record if they need to. Similarly, as a journalist you can try ways of getting answers that don’t necessarily come off as smart or polite without worrying about what people will think of you.

If you don’t think that’s a problem, scroll up and re-read the exchange between Whitacre and Jackson I posted. I wonder if Jackson would like to have that back? Maybe he’ll tell me in an open interview of his own.

Publishing first means breaking news, and maybe battering your own reputation. Amid the chaos of events like the Boston bombing, media outlets like the New York Post and the crowdsourced “FindBostonBombers” campaign on Reddit identified the wrong suspects and pushed those reports to the public.

During emergencies like Hurricane Sandy or the 2011 London riots, false rumors and fake photos abounded on social networks like Twitter. When even professional journalists get it wrong, how can you tell fact from fiction and ensure that you are not sharing false information yourself? Do your own digital fact checking.

Claire Wardle is the Director of News Services at Storyful, a startup founded by veteran Irish journalist Mark Little, which verifies social media content like YouTube videos for news organizations. “Lots of people were sharing stuff around Hurricane Sandy which was fake. It would have taken them two seconds to do a reverse image search,” she says. “After the recent helicopter crash in London, the Guardian had an image up for 2 hours purporting to be of the crash which wasn’t verified.” Storyful’s verification process combines tech tools and old-fashioned journalistic skills as described in a recent blog post on how the company verified a video taken during the Boston marathon bombings.

Reverse image search tools like Google Search by Image and TinEye return where an image appears online and therefore help you to track its original source and history. For example, during the London riots in 2011, there was a rumor circulating on Twitter that tigers had been released from London Zoo, including a photo of a big cat on a city street. The photo turned out to be of a big cat which escaped from a circus in Italy in 2008.

Also during the London riots Twitter users circulated a photoshopped image of the London Eye on fire. FourMatch is a paid image forensics tool ($20 buys you a demo key to analyze up to 30 images within 30 days) which can detect whether or not an image has been tampered with. TinEye also sorts results by how much an image has been modified.

When looking at video, Storyful tries to match the location where the video was purportedly filmed with the real location using tools like Google Maps and Street View. Wikimapia, Wikipedia for maps, is useful for identifying districts or buildings which appear on commercial map services like Google Maps. Maplandia has some similar features. Geofedia’s location-based social media search can help to determine if an image or video was actually sent from a given location.

Finally, Storyful checks the social media profiles from which images and video originated and links them to people. Use DomainTools to check domain ownership and Spokeo People Search or Whitepages to find information about people in the U.S. in particular.

What if you want to make it easier for images or video you capture yourself to be verified? Wardle’s first piece of advice is to geotag it. Only a tiny proportion of social media content is currently geotagged. “Witness (a non-profit which highlights human rights abuses) has two pieces of technology: Informacam and Obscuracam. Obscuracam hides all metadata like location; Informacam is specifically for activists who want to give verification information.” The metadata Informacam captures includes information like the user’s current GPS coordinates, altitude, compass bearing, light meter readings, the signatures of neighboring devices, cell towers, and Wi-Fi networks.

“The behavior of the people on the ground is changing,” adds Storyful founder Mark Little. “The activist will point the camera at a minaret before going back to the focus of their story. The general public will start to realize that they should geotag their tweet if they want it to be seen. The motivated crowd will become much more literate in helping us help them.”

If you realize you have spread false information on Twitter, Retwact, a tool built by one enterprising developer, lets you issue a correction or apology. Followers can view a side-by-side comparison of the older, incorrect tweet and the newer, corrected one.
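Retwact's internals aren't described here, but the side-by-side view it offers can be approximated with a word-level diff. This sketch uses Python's standard `difflib.SequenceMatcher`; the `tweet_correction` name and the kept/removed/added labels are hypothetical.

```python
import difflib
from typing import List, Tuple


def tweet_correction(original: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (label, words) runs describing how a tweet was corrected.

    Labels are 'kept', 'removed' (only in the original) or 'added'
    (only in the correction).
    """
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    changes: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            changes.append(("kept", " ".join(a[i1:i2])))
            continue
        # 'replace', 'delete' and 'insert' all reduce to removed and/or added words.
        if i2 > i1:
            changes.append(("removed", " ".join(a[i1:i2])))
        if j2 > j1:
            changes.append(("added", " ".join(b[j1:j2])))
    return changes
```

For example, correcting “Cell service cut in Boston” to “Cell service not cut in Boston” yields a single added word between two kept runs.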

This update was contributed by Ciara Byrne.

Today Reuters launched a preview of its next-generation web platform. Called Reuters Next, it’s a massive improvement over its legacy site for many reasons. It uses cutting-edge technology, which Reuters Tech Editor Paul Smalera bragged about at length on Twitter two days ago.

The site also adopts a design and presentation concept that’s becoming popular online called “river of news.” The idea is that because users often come into a single article on your website rather than your homepage, every article page should be filled with content, like a homepage. The Atlantic’s business-focused sister site, Quartz, is a pioneer of this concept.

Reuters is taking the concept a step further by treating visits to an article as a signal that a reader is interested in a particular topic. When you click on an article about Facebook, the site will search its database using a new deep tagging system and surround the story with links to other articles about Facebook. In an interview with Nieman Lab’s Justin Ellis, Reuters Head of Product Alex Leo said their goal is to help readers fully understand the subject of the article:
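Reuters hasn't published how its deep tagging system picks related stories. As an illustrative sketch only, assuming each article carries a set of topic tags, one simple approach is to rank other articles by Jaccard similarity (shared tags divided by all tags the two articles carry) with the article being read. The function name, article IDs, and tags are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def related_articles(current: str, tags: Dict[str, Set[str]], limit: int = 3) -> List[str]:
    """Rank other articles by Jaccard overlap with the current article's tags."""
    mine = tags[current]
    scored: List[Tuple[float, str]] = []
    for article_id, theirs in tags.items():
        if article_id == current:
            continue
        union = mine | theirs
        score = len(mine & theirs) / len(union) if union else 0.0
        if score > 0:  # skip articles with no tags in common
            scored.append((score, article_id))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:limit]]
```

Reading a Facebook IPO story, for instance, would surface other stories tagged with both “facebook” and “ipo” ahead of stories that share only one tag.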

“We wanted to create an experience for users that would give them the right amount of breadth and knowledge that they need from Reuters.”

This sentiment hits close to home for us, because one of the goals of our Co.Labs Tracking stories (you’re reading one right now) is to give you a choice: If you’re already familiar with the story, you can stop reading after this update. If you’re new to it and want more information, all you have to do is scroll down. We don’t assume either.

Reuters’ new site is a great first step, but there’s one problem: Even as the website tries to be helpful by giving you context if you want it, every article is still a traditional 800-word news article with a structure that assumes you’re new to the issue. If you want to get all of the new information out of it, you have to read through the entire article, background paragraphs and all.

At the end of the day, content is still the main reason people visit a news website. Adding context as an extra service for the reader is great, but the 100-year-old article format makes it far less valuable than it could be. We’d like Reuters to continue experimenting, not just with its website but with its content.

Publishers will be frightened to hear that Facebook is the new home of grammar, and any Facebook developer will tell you so. Spend some time building a Facebook app (or perhaps now, a Parse app) and you’re presented with the Facebook App Dashboard, which is essentially a dynamic Mad Libs generator with a few different inputs.

But yesterday Facebook announced that its semantic efforts are much more ambitious than perhaps anyone previously thought, forcing us to consider the idea of machine-written and machine-interpreted language as a part of our normal, everyday conversations, many of which happen (these days) on Facebook.

If you’re not familiar with the kind of “language” Facebook speaks today, the Open Graph overview shows that it has a kindergartner’s command of English:

The actor: This is the person who published the story.
The app: This is the app that publishes the story on the actor’s behalf. Every story is generated by an app and every story includes the app used to create it.
The action: This is the activity the actor performs, in our case ‘finished reading’.
The object: This is the object the actor interacts with, ‘The Name of the Wind’, a book.

Developers can create custom subjects, objects and verbs, too:

Objects are publicly accessible web pages hosted on the Internet, and almost any web page can be an object. Objects are public information. If there is no common action available that meets your needs, you can create your own custom action type. For example, if you’re building an app to track rock climbing you may want to make an action ‘climb’ where the object is a mountain.
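The actor/app/action/object structure described above is essentially a sentence template. The hypothetical `Story` class below is not Facebook's API; it only shows how those four slots, including a custom action like “climbed”, fill in a Mad Libs-style sentence. The app names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    """Hypothetical model of an Open Graph-style story (not Facebook's API)."""

    actor: str   # the person who published the story
    app: str     # the app that published it on the actor's behalf
    action: str  # past-tense verb phrase, built-in or custom
    obj: str     # the object acted on (in Open Graph, a public web page)

    def render(self) -> str:
        """Fill the four slots into a fixed sentence template."""
        return f"{self.actor} {self.action} {self.obj} with {self.app}."
```

A custom action slots in exactly like a built-in one, which is why the output can read as grammatical English without any real language understanding.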

If Facebook learned to be a little more linguistically flexible, there’s no reason it couldn’t write status updates for you, publish blog posts about trips you’ve taken, or compile other summary/analysis writing based on passive feedback from other apps that are connected to Facebook. This is Facebook becoming your personal narrator and scribe.

It’s all explained in a 2,700-word technical post Facebook published yesterday about the lexical analysis built into its new Graph Search engine. It’s no surprise that, as Facebook teaches itself to speak in intelligible statements, it might also learn to read what people are writing. But this sort of lexical analysis is much more powerful than Facebook is letting on, because it allows Facebook to associate nodes on its network with a potentially unlimited number of sobriquets, making the computer better at understanding what you’re talking about with little or no context.

Semantic understanding without context matters most in search, especially mobile search, so this area of Facebook research should be no surprise given that Facebook Home is probably a nascent attempt at grabbing smartphone OS market share. But semantic technology doesn’t have to be restricted to tasks like search; in fact, this sort of lexical analysis makes it easier for the page to “listen in” on the topic being discussed on the page, serve targeted ads, or spawn calls to action (the way Facebook reminds you to hit up old friends). From Facebook’s post:

Our team realized very early on that to be useful, the grammar should also allow users to express their intent in many various ways. For example, a user can search for photos of his friends by typing:
“photos of my friends”
“friend photos”
“photos with my friends”
“pictures of my friends”
“photos of facebook friends”

Facebook has also taught the system to compensate for subject-verb agreement errors and other common grammatical mistakes. But where things get even more portentous for publishers is when it comes to Graph Search’s focus on synonyms, dialects, and slang:

The challenge for the team was to make sure that any reasonable user input produces plausible suggestions using Graph Search… The team gathered long lists of synonyms that we felt could be used interchangeably. Using synonyms, one can search for “besties from my hood” and get the same results as if he had searched for “my friends from my hometown.”
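Facebook's grammar works on parse trees rather than raw strings, but the core idea of synonym lists can be sketched as token-level normalization: map each word to a canonical form before matching. The `SYNONYMS` table and `normalize` function are hypothetical and far smaller than anything Graph Search would use.

```python
from typing import Dict

# Hypothetical synonym table: slang or variant word -> canonical word.
SYNONYMS: Dict[str, str] = {
    "besties": "friends",
    "buddies": "friends",
    "hood": "hometown",
    "pictures": "photos",
    "pics": "photos",
}


def normalize(query: str) -> str:
    """Lowercase the query and replace each known synonym with its canonical word."""
    tokens = query.lower().split()
    return " ".join(SYNONYMS.get(tok, tok) for tok in tokens)
```

After normalization, “besties from my hood” becomes “friends from my hometown”, and “pictures of my friends” and “photos of my friends” collapse to the same string.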

If you think that the way people talk on the web will be impossible for machines to ever comprehend, think again. Facebook says it is trying to make Graph Search useful even for incredibly vague, poorly worded queries, using some very clever parsing:

Our grammar only covers a small subspace of what a user can potentially search for. There are queries that cannot be precisely answered by our system at this time but can be approximated by certain forms generated from the grammar. For example, “all my friends photos” -> My friends’ photos… In order for our grammar to focus on the most important parts of what a user types, the team built a list of words that can be optionalized in certain contexts: “all” can be ignored when it appears before a head noun, as in “all photos”, but shouldn’t be ignored in other contexts such as “friends of all” (which could be autocompleted to “friends of Allen” and thus shouldn’t be optionalized).
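The optionalization rule quoted above can be sketched as a filter that drops “all” only when the next word is a known head noun or the possessive “my”. The tiny word lists are hypothetical stand-ins for the much larger lexicon Facebook describes, and this token-level check is a simplification of a grammar-based one.

```python
from typing import List, Set

# Hypothetical lexicon: words after which a preceding "all" adds nothing.
HEAD_NOUNS: Set[str] = {"photos", "friends", "pages", "people"}
POSSESSIVES: Set[str] = {"my"}


def optionalize_all(query: str) -> str:
    """Drop 'all' when it directly precedes a head noun or possessive."""
    tokens = query.split()
    kept: List[str] = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "all" and (nxt in HEAD_NOUNS or nxt in POSSESSIVES):
            continue  # optional word: safe to ignore when parsing
        kept.append(tok)  # includes a trailing "all", as in "friends of all"
    return " ".join(kept)
```

Because “all” at the end of “friends of all” has no following noun, it survives, leaving room for an autocompletion like “friends of Allen”.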

Facebook is building on some pre-existing technologies here, to help handle some of these unpredictable queries using abstract associations:

Our team used WordNet to extract related word forms to let users search for people with similar interests in very simple queries “surfers in los angeles” or “quilters nearby” … In Graph Search, we use a variation of the N-shortest path algorithm, an extension of Dijkstra’s algorithm, to solve the problem of finding the top K best parse trees. Our biggest challenge was to find several heuristics that allow us to speed up the computation of the top K grammar suggestions, thereby providing a real-time experience to our users.
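Facebook doesn't spell out its N-shortest-path variant or its heuristics. A minimal sketch of the general idea is a Dijkstra-style search that may settle each node up to K times instead of once; with non-negative costs in an acyclic graph, this yields the K cheapest paths in order. The graph here is a hypothetical toy lattice of parse states, where a lower cost stands for a more plausible parse.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical weighted graph: node -> [(next_node, non-negative cost), ...]
Graph = Dict[str, List[Tuple[str, float]]]


def k_shortest_paths(graph: Graph, source: str, target: str, k: int) -> List[Tuple[float, List[str]]]:
    """Return up to k cheapest source->target paths, cheapest first."""
    heap: List[Tuple[float, List[str]]] = [(0.0, [source])]
    settled: Dict[str, int] = {}  # how many times each node has been popped
    results: List[Tuple[float, List[str]]] = []
    while heap and len(results) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        settled[node] = settled.get(node, 0) + 1
        if node == target:
            results.append((cost, path))
            continue
        if settled[node] > k:
            continue  # this node already contributed to k cheaper paths
        for nxt, edge_cost in graph.get(node, []):
            heapq.heappush(heap, (cost + edge_cost, path + [nxt]))
    return results
```

Plain Dijkstra corresponds to k = 1; raising k trades extra work for a ranked list of alternatives, which is where speed-up heuristics like the ones Facebook mentions would come in.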

In fact, Facebook has even taught its system to understand and ignore oxymorons:

A naïve, context-free grammar would allow the production of a wide range of sentences, some of which can be syntactically correct but not semantically meaningful. For example, Non-friends who are my friends; Females who live in San Francisco and are males; Both sentences would return empty sets of results because they each carry contradictory semantics. It is therefore critical for our parser to be able to understand semantics with opposite meanings in order to return plausible suggestions to users.
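One simple way to catch contradictions like these is to flag any attribute that is required to take two different values. This sketch assumes a parsed query has already been reduced to (attribute, value) constraints and that every attribute is single-valued. Facebook's actual semantic checks are not described at this level of detail, and the attribute names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Constraint = Tuple[str, Any]  # (attribute, required value)


def is_contradictory(constraints: List[Constraint]) -> bool:
    """True if any single-valued attribute is required to take two values."""
    seen: Dict[str, Any] = {}
    for attribute, value in constraints:
        if attribute in seen and seen[attribute] != value:
            return True  # e.g. gender == female and gender == male
        seen[attribute] = value
    return False
```

A parser could use a check like this to prune such suggestions before they ever reach the user, rather than running a search that is guaranteed to return nothing.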

Today, anyone can report new information and watch it spread through their social network, sometimes reaching millions, without the intervention of a professional journalist. In the days before Twitter, disseminating breaking news took time. To reach a meaningful audience, information had to pass through a series of reporters and editors en route to a television screen or page in a newspaper.

This example of technology replacing a classic journalistic function is a direct threat to the livelihood of many news organizations that once made their names by being first to every story. To adjust, outlets like CNN and the Associated Press have joined the Twitter rat race and changed editorial processes in other media to move faster. But in a world where everyone knows about breaking news almost instantaneously, the rush to be first can cause more problems for news organizations than it solves.

Take, for example, the Boston Marathon bombings. After the explosions, information spread like wildfire. Some of it was false, as is always the case in a breaking news situation, but instead of acting with restraint, many news organizations rushed to report falsehoods and speculation as fact. CNN reported that suspects had been arrested on Wednesday night, the New York Post clung to a rumor that 12 people had died in the explosions, and everybody and their mother reported that the government had cut off cell phone service in Boston to prevent further remote detonations.

Ironically, we’re talking about this issue because, despite the democratization of news, journalists still hold real power. Twitter accounts like @AP and @CNNBRK have millions of followers who trust them to report accurately. For evidence, look no further than the hacker who broke into the AP’s Twitter account and sent a fake tweet that caused automated stock traders to send the Dow Jones Industrial Average tumbling 143 points, if only momentarily.

Circa editor-in-chief Dave Cohn called it an inevitable product of newsrooms trying to be first:

We have the unstoppable force of news organizations that want to be first and want as much attention as possible, especially in times of breaking news. On the other hand we also have the immovable object of technology platforms like Twitter that will inevitably be where news breaks and where people flock to get information, especially when there is breaking news.

I like this description because it accurately summarizes the conflict between two converging fields, but I wonder whether news organizations need to be unstoppable. In the wake of both events, several future-of-news pundits called for journalists to stop trying to be first and instead find other ways to compete with social networks.

Here’s CUNY Professor Jeff Jarvis on an alternative model:

The key skill of journalism today is saying what we *don’t* know, issuing caveats and also inviting the public to tell us what they know

If I ran a news organization, I would start a regular feature called “Here’s what you should know about what you’re hearing elsewhere.”

Similarly, Mike Ananny wrote about silence and timing for Nieman Lab:

The Internet makes it possible for people other than traditional journalists to express themselves, quickly, to potentially large audiences. But the ideal press should be about more than this. It should be about demonstrating robust answers to two inseparable questions: Why do you need to know something now? And why do you need to say something now?

Both of these authors make the same basic argument: There are ways to add value to news consumers without sacrificing accuracy by always trying to be first. This idea is compelling, but for news organizations worrying about disappearing bottom lines, Dan Gillmor’s argument in the Guardian ought to resonate more:

Information providers forfeit some trust every time they make mistakes. That eventually, one hopes, affects the bottom line, or in a social context, the confidence of one’s friends and peers.

In an environment where trust is just about all they have left, perhaps publishers would be wise to rethink whether speed or accuracy is more valuable to the consumer.

The decline in value isn’t limited to consumers. Newspaper and magazine publishers used to serve a vital function for advertisers as well. Before the Internet, print publications were one of the few places advertisers, especially local ones, could reach large, wealthy audiences. This gave publishers total control over pricing, making print advertising an extremely profitable business.

Today, no single website has a monopoly on any audience. If advertisers want to reach New York Times readers, they can do so on a variety of similar websites just as easily as they can on the Grey Lady’s site. Not only that, but advertisers now know more about these audiences than ever before. They can tell exactly how many users see ads and whether they click or engage with them, and they can even target individual users based on past interactions. As a result, control over pricing has been flipped on its head. If your prices are too high, an advertiser can go elsewhere and reach the exact same people.

How are publishing companies coping? They’re rethinking the value they offer to advertisers.

One approach is to sell so-called “native advertising,” which publications like The Atlantic and BuzzFeed are pursuing with a decent amount of success. The thinking goes something like this:

“The real value we offer isn’t the audience on its own, it’s our ability to produce content that our audience wants to read. We can sell that expertise to advertisers.”

When this tactic works, the result is content that consumers want to read, share, and interact with at much higher rates than traditional banner ads. This means organizations can charge more and, because the content is tailored specifically to the site it runs on, publishers regain some of the audience exclusivity they could claim in print.

When it backfires, however, the result can be disastrous for the organization’s credibility. Such was the case when The Atlantic published an advertisement celebrating the Church of Scientology that felt completely at odds with the publication’s core value of intellectual integrity.

Thanks to the Internet, most of the wealth of human knowledge is instantly and freely available. That’s tough luck for businesses that used to make money by charging for access to information, but it’s not stopping some publishers from taking a gamble that their stuff is still worth paying for.

High-profile publishers like the New York Times have put up paywalls that require readers to pay for unlimited access to their websites, with some success. The Times announced that while ad revenue decreased by 11.2 percent in the first quarter of 2013, subscription revenues increased 6.5 percent.

The tactic isn’t a panacea, however. USA Today publisher Larry Kramer infamously told the world his newspaper’s content “wasn’t unique enough” to charge readers for it. He’s probably right. It shouldn’t come as a surprise, but research by University of Missouri Journalism Dean Esther Thorson has found a direct link between newsroom investment and paywall success:

Input into the newsroom in dollars had far and away the greatest impact on all sources of revenues — both advertising and circulation.

This leads me to the most obvious takeaway: If you want readers to value your content enough to pay for it, you should probably value it at least that much, too.

Stay Tuned For More Updates

[Image: Flickr user Kaffeeringe]


About the author

Lapsed software developer, tech journalist, wannabe data scientist. Ciara has a B.Sc. in Computer Science and an M.Sc. in Artificial Intelligence.

