Journalists. Listen up. The robots aren’t coming for your jobs. Not now anyway, and probably not in the future if you’re any good.
This is the message I’m taking to SXSW this year, where I’ll be on a panel with journalists from the Washington Post, the New York Times, and Norwegian news agency NTB. I’ll be there to hammer home the point that robot writers are less of a threat and more of a tool.
Natural Language Generation (NLG) is the term for creating human-sounding narrative out of data using algorithms and machine learning. It’s what we do here at Automated Insights and I’ve been doing it for over six years. In that time, Automated Insights has written billions of articles, ranging from sports recaps to quarterly earnings news reports to hyperlocal real estate market reports.
For the record, I can count the number of journalists we’ve put out of work on one hand. Actually, I’m kidding. I don’t need a hand. It’s zero.
In fact, NLG is already being used by those journalists who crunch data as part of their standard reporting method. As the science evolves, NLG and high-quality automated content are going to enhance journalism and media in three key areas: Reach, Depth, and Speed.
So let’s begin by defining what these writing robots actually do.
The video below is a representation of our automated content engine creating two articles. The first is a fantasy football matchup recap we produce for one of our customers. The second is an example of a Quarterly Earnings Report article we produce for the Associated Press.
In real life, we’re much faster (and much more complex) than this representation. We create articles using algorithms — word-by-word, sentence-by-sentence, and paragraph-by-paragraph — and we deliver them at a rate of up to 2,000 articles per second.
Automated Insights started automating content from raw data back in 2010, when we built the first commercially available NLG engine. We spent the next few months standing up over 800 websites, one for every college and pro basketball, football, and baseball team in the US, and published content to each of these sites automatically, up to five times a day.
The Tipping Point: Fantasy Content
With the success and popularity of those websites, we were able to sell fantasy-related content to the two largest providers in the space. Three times a week, we provide tens of millions of fantasy managers a human-sounding recap or preview, written just as it might appear in a traditional sports section.
The fantasy football articles did a few things for us. First, they spotlighted how our NLG engine could change purpose, language choice, and tone from article to article.
More crucial to our customers, the fantasy football providers, our plain-English articles demystified the number-crunching and data-nerdery of fantasy football for the more mainstream fantasy manager. This allowed for broader participation, and thus broader engagement, from a new, more casual fantasy football fan.
But most importantly, our fantasy content allowed our customers to create tens of millions of pieces of informative, valuable content where that content heretofore had been impossible to create.
Soon, Automated Insights expanded into dozens of verticals, including finance, marketing, healthcare, business intelligence — basically any industry that has a lot of data and the need to communicate that data efficiently and insightfully. It turns out that’s almost every industry. In fact, you’ve probably read quite a bit of our content without even realizing it.
How We Automate Content
This is the basic recipe for how automated content works, although to be totally transparent, we do indeed have patents and secrets around how we produce the highest-quality content at such great speed. It all starts with data.
In a sports context, this is the data produced from a game or match. For finance, it could be the price of a company stock.
Whenever new data hits the system, our NLG engine analyzes this new data and creates another narrative. To make these stories more robust, we almost always integrate historical performance data into the analysis. For additional context, we usually apply third-party data — this can be geolocal data, weather data, even social media data, as long as it relates to the original data in an informative manner.
We automate the data science for each set of data, and discover the most relevant and important facts related to the purpose of the article. That purpose, the lede or topic of the article, can be one of several dozen or even a hundred different scenarios, determined by what the data tells us. For instance, one game recap might be about a thrilling finish while another might be about an outstanding individual performance.
We then prioritize those insights and present the most important facts first, in the same way a human would write about them. We use a custom lexicon and tone for each type of article we create. We also edit as we write, which allows us to make changes based on what we’ve already written or to refer back to a fact we’ve already stated.
Finally, we deliver the article via web, mobile, email, any format the customer chooses, and we can use our own API or a custom-built API to automate the delivery to the customer’s specifications, even integrating with publishing software the customer already uses.
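To make the recipe above concrete, here is a minimal Python sketch of the middle steps: deriving insights from new and historical data, letting the data pick the lede, and writing the most important fact first. It is an illustration only, not our production engine. The field names (home_score, season_avg), the scoring weights, the thresholds, and the LEDES phrases are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    kind: str     # the scenario this fact supports
    score: float  # higher means more newsworthy (hypothetical weights)
    text: str     # the fact, phrased as a clause


def find_insights(game: dict) -> list[Insight]:
    """Derive candidate insights from new game data plus historical context."""
    insights = []
    margin = abs(game["home_score"] - game["away_score"])
    if margin <= 3:
        # Assumed rule: a margin of 3 or fewer counts as a thrilling finish.
        insights.append(Insight("thrilling_finish", 0.9,
                                f"the final margin was just {margin}"))
    top = max(game["players"], key=lambda p: p["points"])
    if top["points"] >= 1.5 * top["season_avg"]:
        # Assumed rule: 50% above a player's own season average is outstanding.
        insights.append(Insight("star_performance", 0.8,
                                f"{top['name']} scored {top['points']} points"))
    insights.append(Insight("final_score", 0.1,
                            f"the final score was {game['home_score']}-{game['away_score']}"))
    return insights


# Hypothetical lexicon: one opening phrase per scenario.
LEDES = {
    "thrilling_finish": "It came down to the wire:",
    "star_performance": "One player stole the show:",
    "final_score": "In the books:",
}


def write_recap(game: dict) -> str:
    """Rank insights, let the top one choose the lede, then state the rest in order."""
    ranked = sorted(find_insights(game), key=lambda i: i.score, reverse=True)
    sentences = [f"{LEDES[ranked[0].kind]} {ranked[0].text}."]
    for insight in ranked[1:]:
        sentences.append(f"{insight.text[0].upper()}{insight.text[1:]}.")
    return " ".join(sentences)
```

Feeding the same function a close game and a blowout produces two different ledes from the same code, which is the point: the data, not the template, decides what the story is about.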
Why Automate Content?
Automation is good at some things, not so great at others. Automation has trouble with quote context, sentiment, investigation, reasoning, and so on. Machines can attempt these tasks, but the application of such logic must be very targeted for it to have much value.
So that raises the question: What’s the purpose of automated content?
To reiterate, it’s not replacing journalists. Automated content provides the most benefit in situations where humans can’t write or won’t write. As data accumulation grows at an increasingly rapid pace, automation can handle the analysis of larger and larger sets of data much more efficiently than humans.
Turning back to sports, the traditional way of describing a game or match with data was the boxscore, which provides a few dozen data points for analysis. Over the last 10 years, play-by-play data has become standard, upping the data volume to a couple hundred data points per game. This isn’t overload for a human analyst or journalist, but it can be quite time-consuming to review.
Today, with live camera and sensor data being captured at the professional level in almost all sports, it’s common to capture hundreds of thousands or even more than a million data points per game. This is far more data than a team of analysts and journalists can digest, especially in near-real time or real time.
Automated content can do this analysis relatively quickly, even in real time, producing insights and then presenting those insights to the journalist in a consumable and usable format.
The Future: Reach, Depth, and Speed
So when we talk about the future of journalism as it relates to automated content, there are three key areas where automation will allow journalism to expand.
Let’s first talk about reach. Automation allows journalists to cover more events, regardless of the size of the population that the event impacts or the location of the event itself. This is often referred to as the “long tail,” a tail which has been reduced over the last few decades as newsroom budgets shrink.
We recently partnered with the Associated Press to cover every Minor League Baseball game in the US, regardless of the location of the game or the size of the fan base. The AP can now cover hundreds of games every night without the expense of sending reporters to those games.
By the way, in every case, the automated coverage did not replace reporters that were already covering teams, but it did provide new coverage for the teams that weren’t being covered at all. This coverage used to be cost-prohibitive. It isn’t anymore.
It’s the same story for stocks and quarterly earnings reports. Before partnering with Automated Insights, the AP was able to cover about 400 companies every quarter. Today, they can cover over 5,000. In a recent study conducted by Stanford University and the University of Washington, it was concluded that this additional coverage has actually boosted trading volume and liquidity over time.
You may have heard this stat bandied about elsewhere: According to Åse Dragland at the Scandinavian independent research firm SINTEF, 90% of the world’s data has been created over just the last two years. Applying automation to this explosion of data allows for a new level of depth, including advanced analytics, deeper historical comparisons, and real-time analysis of smaller components of a story that might be missed by the human eye and brain.
The final facet of the future of automated journalism is speed. Our platform can churn out unique, human-sounding, personalized articles with visualizations and flexible formatting at the rate of over 2,000 per second. As we continue to hone this speed, we’re able to produce real time insights on larger and larger sets of data, opening up a new world for broadcast and second screen opportunities, such as what’s represented in the video below.
This kind of reporting goes beyond play-by-play, because we can put the ever-changing data into context along several axes at once. In sports, this could be lineup assessment, individual matchup comparisons, even player fatigue. We can compare a play to a similar play, a player to another player, or a player to the same player at a different time.
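As a rough illustration of those comparisons, the Python sketch below puts a single live number into context against a baseline, which could be the same player's season average or another player's line tonight. The function name, the sentence template, and the sample players are hypothetical and chosen only for the example.

```python
def describe_vs_baseline(player: str, stat: str, value: float,
                         baseline: float, baseline_label: str) -> str:
    """Phrase a stat relative to a baseline (own history or another player)."""
    if baseline == 0:
        # Avoid dividing by zero; fall back to a plain statement.
        return f"{player} has {value} {stat}; {baseline_label} is 0."
    change = (value - baseline) / baseline * 100
    if change == 0:
        return f"{player} has {value} {stat}, level with {baseline_label} ({baseline})."
    direction = "above" if change > 0 else "below"
    return (f"{player} has {value} {stat}, {abs(change):.0f}% {direction} "
            f"{baseline_label} ({baseline}).")


# Same player at a different time, then player against player.
print(describe_vs_baseline("Ava Cole", "points", 30, 20, "her season average"))
print(describe_vs_baseline("Ava Cole", "points", 30, 24, "Sam Reed tonight"))
```

The same function serves both comparisons because only the baseline changes; a real system would choose which baseline is most informative for the moment.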
Personalization: The Final Frontier
As information gathering gets more robust and more complex, the ability to aggregate, curate, and deliver specific, actionable information to an individual in a timely manner is going to be critical. All three of the aforementioned components — reach, depth, and speed — can be complemented by personalization.
In other words, in the future, the news will be your news.
For example, instead of having to search to get updates on stocks in an individual’s portfolio, that portfolio content can be curated and delivered automatically. This would happen not just daily or weekly, but whenever a threshold is triggered — say a stock in the portfolio rises or falls to a certain level.
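A threshold trigger like this is simple to sketch. The Python below is a minimal illustration under stated assumptions: the ticker symbols, the (floor, ceiling) pairs, and the alert wording are made up, and a real system would pull prices from a market data feed rather than a dictionary.

```python
def portfolio_alerts(prices: dict[str, float],
                     thresholds: dict[str, tuple[float, float]]) -> list[str]:
    """Return one plain-English alert per holding that crossed its floor or ceiling."""
    alerts = []
    for symbol, (floor, ceiling) in thresholds.items():
        price = prices.get(symbol)
        if price is None:
            continue  # no fresh quote for this holding, so nothing to report
        if price <= floor:
            alerts.append(f"{symbol} has fallen to {price:.2f}, "
                          f"at or below your floor of {floor:.2f}.")
        elif price >= ceiling:
            alerts.append(f"{symbol} has risen to {price:.2f}, "
                          f"at or above your ceiling of {ceiling:.2f}.")
    return alerts


# Hypothetical portfolio: only holdings that cross a threshold produce news.
prices = {"AAA": 9.5, "BBB": 50.0, "CCC": 120.0}
thresholds = {"AAA": (10.0, 20.0), "BBB": (40.0, 60.0), "CCC": (80.0, 110.0)}
for line in portfolio_alerts(prices, thresholds):
    print(line)
```

Holdings that stay inside their range generate nothing, so the reader only hears about the portfolio when something actionable happens.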
Think of this definition of news as super-hyper-local, with locality not only defined by physical proximity, but personal interests, the information that’s the most valuable and actionable to the individual. That means crime news at the neighborhood level, financial news that hits the wallet, and sports news for your local team that wasn’t available otherwise.
The ultimate in personalization, our direct-to-consumer NLG product, Wordsmith, is a software-as-a-service model that brings the power of automated content to everyone. It’s as simple as using a content management system or spreadsheet, but almost as powerful as the software we program to deliver thousands of earnings reports for the AP to its millions of customers.
The result is that not only can we cover any event with automation, but YOU can cover any event with automation, and deliver that coverage in any format.
The video below is one of my favorites to show. It features two of our employees playing ping-pong in our break room. The ball is tracked with a sensor, and the data is sent to Wordsmith, which then generates real-time commentary on the match. That commentary is then ported to Amazon’s Alexa, which vocalizes the commentary and color. The whole thing is streamed via live video, providing a professionally-announced live ping-pong match to the world, played by two of our employees in our break room.
They also get a professionally-written recap at the end.
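For a feel of how a stream of sensor events can become commentary, here is a small Python sketch. It does not use the Wordsmith or Alexa APIs; it is a standalone illustration, and the event fields (winner, rally length), the rally thresholds, the phrasing, and the player names are all assumptions.

```python
def call_point(score: dict[str, int], winner: str, rally_length: int) -> str:
    """Update a two-player score and return one line of commentary for the point."""
    score[winner] += 1
    if rally_length >= 10:
        opener = f"What a rally! {rally_length} shots before {winner} takes the point."
    elif rally_length <= 2:
        opener = f"Quick point for {winner}."
    else:
        opener = f"{winner} wins the point."
    # Order the two players by current score to phrase the running tally.
    leader, trailer = sorted(score, key=score.get, reverse=True)
    if score[leader] == score[trailer]:
        tally = f"We're tied at {score[leader]}."
    else:
        tally = f"{leader} leads {score[leader]}-{score[trailer]}."
    return f"{opener} {tally}"


# Hypothetical match: each tuple is (point winner, shots in the rally).
score = {"Ana": 0, "Ben": 0}
for winner, rally in [("Ana", 12), ("Ben", 2), ("Ana", 5)]:
    print(call_point(score, winner, rally))
```

Each returned line is plain text, so it could be handed to any text-to-speech service for the announcing step.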
In January, Automated Insights hosted a two-day Wordsmith and Alexa Hackathon, bringing in over a dozen companies from across the country, including Amazon (of course), AP, Tibco, STATS, and others, to build unique and exciting new ways to power automated content and news creation and delivery with the Wordsmith NLG platform and Alexa’s voice input and voice delivery.
The results were astonishing, and you’ll see these types of use-cases for automated content in production soon.
In fact, it’s our belief that in the not-too-distant future every newsroom should be using Wordsmith as the single most important tool for leveraging automation to increase its reach, depth, and speed. It’s my goal to make that happen easily, painlessly, and with a high return on the investment in time and resources.