How can artificial intelligence enhance the news experience? Can computers do parts of the editorial workflow better than humans? Is it possible to build automated systems that recommend the most relevant content to individual readers?
These are among the questions the machine learning team tries to answer – in close cooperation with other product & tech teams in Schibsted News Media.
“Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on models and inference instead.”
Data scientists and software engineers
7 people. 3 are software engineers and 4 data scientists. 5 are based in Stockholm and 2 in Oslo. This is the machine learning team in Schibsted News Media.
The team started out as the Natural Language Processing team. At the time, its focus was to write models that “understand” natural language. Examples included deciding whether two articles were about the same topic, or whether a messaging conversation was about greeting or bargaining.
When Schibsted split into the Marketplaces and Media divisions in 2017, the team changed its name to the Machine Learning team.
– Machine learning and artificial intelligence are among the megatrends in the technology industry today, and they require a very specific skill set. The field is developing fast, although it also has its limits. In Schibsted we need top talents and experts who both understand this new technology and can help us understand how to apply it to our business, says Sigurd Seteklev, product lead for Data & Insight.
Video: How we use machine learning to automate front pages
Curious about how the machine learning team works? In this video data scientist Fredrik Jørgensen explains how the team develops algorithms to help automate news front pages.
The machine learning team is one of many teams that contribute to Curate, Schibsted's project to develop personal, algorithm-driven front pages for our readers.
Using mathematics to understand journalism
Several of the machine learning team's projects have automated processes that were previously done manually, for instance in the editorial area.
But before computers can work with words, the words need to be translated into numbers. The data scientists typically apply advanced mathematical models to help the computer understand journalism.
One example is the video recommendation project: the team was asked to develop algorithms that could recommend what users should watch next after they finish viewing a video on one of Schibsted's news sites.
Traditionally, journalists have picked the recommended videos manually. But this is both time-consuming and hard to do well.
In cooperation with the Stream team (responsible for Schibsted's video platform), the machine learning team built algorithms that let the computer automatically suggest the videos most likely to be relevant.
Solution: Similarity search
But how do you find the most relevant video among thousands of alternatives? What exactly makes one video similar to another?
– The first step was to define similarity in some way. After that, we needed to implement a search for that similarity. And the last step was to make sure that the data are updated continuously, says software engineer Samuel Rivas in the machine learning team.
And then mathematics is put to work. Words are translated to vectors, which are essentially long lists of numbers that encode how a word relates to other words. Once the computer has learned to map words to vectors, it can judge how “similar” two words are by measuring the distance between their vectors.
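To make the idea of distance between word vectors concrete, here is a minimal sketch. It assumes cosine similarity as the measure. The article does not say which measure the team uses, and the three-dimensional vectors below are invented for illustration; real word vectors are learned from text and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction, so treat it as dissimilar.
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, NOT taken from any real model.
word_vectors = {
    "football": [0.9, 0.1, 0.0],
    "soccer": [0.8, 0.2, 0.1],
    "election": [0.0, 0.9, 0.4],
}

sports_pair = cosine_similarity(word_vectors["football"], word_vectors["soccer"])
mixed_pair = cosine_similarity(word_vectors["football"], word_vectors["election"])
print(f"football vs soccer:   {sports_pair:.3f}")
print(f"football vs election: {mixed_pair:.3f}")
```

With these toy values, "football" scores much closer to "soccer" than to "election", which is the kind of relationship a trained model is expected to capture.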
– To construct the model we have used all the articles published in Aftonbladet and VG, says Samuel. The model is then applied to the metadata that accompanies each video, such as the title, description and category.
Samuel points out that it is important to use relevant text when constructing the model. – By using articles from our own news media, the computer learns the relationship or meaning of words in this particular context. If we had instead used ad texts from Finn or Blocket, for instance, the model would be worse at understanding news videos but better at finding similar ads.
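The similarity search over video metadata could look roughly like the sketch below. Several things here are assumptions and not the team's documented method: the metadata text is turned into a single vector by averaging the vectors of its known words, the search is a simple linear scan, and the names `embed_text` and `most_similar`, the toy word vectors and the video titles are all hypothetical.

```python
import math


def embed_text(text, word_vectors, dim):
    """Average the vectors of all known words in the text (unknown words are skipped)."""
    total = [0.0] * dim
    count = 0
    for token in text.lower().split():
        vec = word_vectors.get(token)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        return total  # No known words: return the zero vector.
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(video_id, video_vectors, top_k=2):
    """Rank all other videos by similarity to the given one (brute-force scan)."""
    query = video_vectors[video_id]
    scored = [
        (cosine(query, vec), other)
        for other, vec in video_vectors.items()
        if other != video_id
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_k]]


# Hypothetical two-dimensional word vectors and video titles.
word_vectors = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1], "football": [0.8, 0.2],
    "vote": [0.0, 1.0], "election": [0.1, 0.9], "parliament": [0.2, 0.8],
}
video_titles = {
    "v1": "Late goal decides football match",
    "v2": "Football match highlights",
    "v3": "Election vote count in parliament",
}
video_vectors = {
    vid: embed_text(title, word_vectors, dim=2)
    for vid, title in video_titles.items()
}

print(most_similar("v1", video_vectors))
```

A linear scan is fine for a handful of videos. With thousands of candidates and continuously updated data, a dedicated nearest-neighbour index would likely be a better fit, but that choice is outside what the article describes.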
Measuring the results
Over the last few months, much of the team's resources have been spent on developing algorithms for automated news front pages.
This work is part of Schibsted's Curate project, with many different teams involved. Aftenposten recently launched its partially automated and personalized front page for all readers. The algorithms used to sort many of the news stories have been developed by the data scientists and software engineers in the machine learning team.
– Basically, we have been trying to develop algorithms that replicate the workflow of the front page editors. This frees up time for the editors for more creative tasks, says data scientist Fredrik Jørgensen.
The machine learning team has been involved with several other projects. They include:
- Mapping products to categories for the price comparison site Prisjakt
- Suggesting tags for editorial articles
- Predicting the intent in messaging conversations
- Mapping candidates to job ads
- Clustering articles by topics
– We are a very mathematics-heavy team, says software engineer Samuel Rivas.
He explains that they have developed a cross-disciplinary team. – We either look for software engineers with a strong interest in data science or for data scientists with knowledge of software engineering.
Product lead Sigurd Seteklev in the Data & Insight department is impressed with the results of the team so far:
– The team has already demonstrated that there are many opportunities to improve our products and processes by using machine learning. We are extremely happy that we have been able to recruit the amazing talent we have in the team, and we really look forward to understanding even more how machine learning can change the media industry and how Schibsted can take a leading role in these changes, he says.