'We are just starting to see the full spectrum of data science applications in agriculture'

Machine Learning and AI in Life Science and Healthcare

22 Sept

With climate change and a growing world population to contend with, the agriculture industry is under pressure to find greener solutions. We spoke with Robert Berendes, Executive Partner at Flagship Pioneering and chairman of Indigo Agriculture, to find out how data science and artificial intelligence could drive innovation along the entire agricultural value chain.

idalab: Data science is often described as a driver of innovation in agriculture. In your opinion, what are the big, innovative applications?

Robert Berendes: I think we are just starting to see the full spectrum of data science applications in agriculture. Intelligent data processing to generate new insights is still very much virgin territory in the agriculture sector. But whenever it is applied, it creates amazing new insights.

What specific areas are you thinking about?

Robert Berendes: ‘Data science could take agricultural research processes that need six to eight years and reduce them to months or even weeks’

If you want to innovate in the agricultural business, you have to run trials in the field or lab for years. Only by aggregating this data can you get real insights. Going beyond classical statistics, it may one day be possible to simulate the entire physiology of plants based on their genome.

Some companies are very well positioned in terms of data to make accurate predictions about the behaviour of plants in the field, with significant impact on the performance of crops.

For instance, when I put a new drought-resistant plant on the market, there are some crucial questions that need to be asked. Under what moisture conditions does the plant grow well? How much drought can the plant withstand? What is the performance degradation when there is too much rain?

The new plant should ultimately be better than its predecessor under all conditions. Achieving this is very difficult and requires a huge number of trials in the field. If you have the corresponding data science capacities and can simulate in silico, it saves an incredible amount of time.

I think that the possibilities of data science in agriculture are far from exhausted and will, in the medium term, lead to a massive acceleration in the development of new and better plants. A process that otherwise takes maybe six to eight years could be reduced to months or even weeks.

You’ve mentioned the simulation of the entire physiology of a plant, based on the genome: a bottom-up approach, based on our current understanding of the physical-biological reality. The other paradigm, top-down, would start with the data to discover significant correlations. What is the dominant paradigm today and what is to come?

Both can currently be seen in action, with the top-down approach clearly dominating so far. The weaknesses of the top-down approach, however, lie in the various hurdles in the agricultural sector, which are not so easy to overcome.

Currently, a plant is primarily considered as a complete unit without substructure. The effects and reactions are observed, but not attributed to processes inside the plant.

Usually these reactions are then correlated with data from satellites on soil, precipitation, etc to make predictions. In my opinion, however, we only get a small gain in knowledge here. Predictions are much improved when the companies also have the exact data for the specific fields.

But farmers are unfortunately extremely leery of sharing their data. That puts companies like John Deere, which sell the machines that collect such data to farmers, in a crucial position.

A real breakthrough will also only come when we merge the top-down and bottom-up approaches. With the bottom-up approach, you simulate the plants and therefore you don't need such a high level of detailed data for the externalities. That's still enough to generate phenomenal insights. Here, we are just at the beginning of a steep development curve.

In a plant-based simulation like this, what are currently the hardest things to simulate and why?

This is, of course, a hot topic at the moment. Typically, in plants, we always have the genetic information and can map the complete physiological processes. However, there is still some confusion between genotype and phenotype; gene expression, for example, cannot be completely mapped.

Moving away from R&D, what is the status of data science along the entire value chain?

The value chain from input provider to farmer will change significantly in the coming years. Currently, there is a huge apparatus stretched between these two parties.

There are resellers who earn a lot of money by supplying the farmer with information, product selection and logistics. They take a large margin for this service – but, in my view, that distribution step doesn’t add enough in its current form.

By taking a data-science approach to rethink this value chain and make the data transparently available to the farmer, distribution could be reduced to the core logistics area and the margins could be shifted towards the input provider and farmer.

For farmers in particular, it really is an untenable state of affairs at the moment; they are being put through the mangle by the big boys up- and downstream and are the lowest paid link in the chain.

Another aspect I'd like to mention is on-farm tooling. Here, farmers have accumulated experience over generations regarding the handling of certain plant species. At the same time, there are a great many "agricultural consultants" who want a piece of the pie, but aren't that much cleverer than the farmer. If data science can be used to dramatically improve farmers' decision-making, this effect should not be underestimated. Farmers want to see hard facts, and data science can provide robust information.

The element of data availability is very interesting. How do the big players behave here? Are there efforts to establish systems that enable access to this data?

The big players are, of course, extremely interested in using data from the field, and it’s actually the machine manufacturers that have the best access to that data at present, because their systems can automatically record it.

At the same time, manufacturers are actually also selling sensor technology to collect data about the plants. There are efforts to systematise this data collection, but the data mostly remains with the machine manufacturers – we only see the occasional partnership. John Deere, for example, has just partnered with Corteva so that its data can be used in the seed business, and we can expect to see more of this type of partnership in future.

Machine manufacturers have the best access to data from the field, because their systems can automatically record it, but they have made few partnerships with seed manufacturers to share that data.

What do seed producers expect from the data?

Ultimately, the logic is this: you need a range of technologies to be sustainably successful. Certain plants react differently to certain pesticides or fertilisers, and pests build up resistance over time. If you only work on fungicide innovation, for example, eventually you will no longer be able to control certain diseases. That means integrated technology has a real innovation advantage because it delays the development of resistance. If you extend the life of a blockbuster product in this way, you are talking about billions of dollars in additional value creation.

The pace of innovation in this area is also not exponential. With a single tool, you therefore reach the limits incredibly quickly. This is why data science is so relevant for core strategies in the modern agricultural world: There are highly complex applications and products that need to be correlated, which is why digital agriculture is increasingly being seen as a crucial tool to achieve success in this area.

When we talk about innovation in agriculture, it often seems to be very capital-intensive in an R&D-heavy environment. How do you see that? Can small companies hold their own?

There is currently an incredible amount of money – billions in venture capital – available in the field of agriculture. But to get your hands on it you need to have talented people with good ideas.

You definitely need less money than in pharma. Although the regulatory process is similarly lengthy and hostile to innovation, the process is much simpler and cheaper, particularly on the non-pesticide side.

If you are in the field of digital agriculture, you can already start operating with single-digit millions of dollars. A lot is possible without having extremely high investments.

The topic of data ownership seems to be highly relevant for future developments. You mentioned that machine manufacturers have a good position here. How will the data business develop in this area?

My hypothesis is that the data input space will commoditise quickly. The number of data sources is growing rapidly, even if we look only at new sensor technologies or the possibilities of data acquisition with drones.

However, this is not directly relevant to sales, as it is completely unclear how much data will be needed in the future. My guess is that the price of data will continue to fall. I see the biggest leverage in generating insight, in building the algorithms that can lead to real value creation.

Certainly, there's still the question of what's happening in the data delivery space. There is a lot of momentum here as well, but there are more people who can build apps than people who can build robust algorithms.

For you as a company founder and incubator, what are the criteria you use to invest? Is there anything agriculture-specific?

We want to know whether the newly created company has a bottom-up approach. So, has it decoded the interaction between the plant and the externalities in a special way?

The key question here is: does understanding or modelling at the molecular level provide insights that the farmer cannot map using their empirical experience?

Also, where does the data come from that is used to generate real insights? If you're relying on data that can be cut off by the provider at any time, you have an essential problem with your business model. And what model does the company use to ensure it gets its fair share of the profits from the insights?

In particular, the point about the business model is interesting. What frameworks are you considering?

If you have a classic software model, the farmer might pay a dollar an acre, but carries all the risk of whether the great insight that you sold them will materialise. That's why they're paying so little – and not the $30 per acre that would be appropriate, given that the farmer gains $100 per acre more in revenue because of you.

The dominant model in the agricultural industry has always been that the farmer pays at the beginning of the season, against a return-on-investment logic that is shown to them in advance (or, to put it another way: "With my product you make X more yield, I want X/Y of that").

So the supplier takes his money at the beginning of the season and the farmer is left out in the cold – quite often, in agriculture, literally.

The farmer also has to ensure that the additional yield that was promised is generated. However, this model has massive weaknesses for data science products. Imagine that you are no longer selling a physical product via this model, but a software product. The farmer will just end up scratching their head.
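The per-acre figures mentioned above can be put side by side in a short sketch. It only uses the three numbers quoted in the interview ($1, $30 and $100 per acre). The farm size and the function names are hypothetical, chosen purely for illustration, and this is not a model of any real pricing scheme.

```python
# Figures quoted in the interview, in US dollars per acre.
SOFTWARE_FEE_PER_ACRE = 1.0      # typical fee under a classic software model
APPROPRIATE_FEE_PER_ACRE = 30.0  # fee described as appropriate in the interview
REVENUE_GAIN_PER_ACRE = 100.0    # extra revenue the farmer gains per acre


def provider_share(fee_per_acre: float, gain_per_acre: float) -> float:
    """Fraction of the farmer's extra revenue that goes to the provider."""
    return fee_per_acre / gain_per_acre


def season_totals(acres: float, fee_per_acre: float, gain_per_acre: float) -> dict:
    """Total provider fee and farmer net gain for one season."""
    fee = acres * fee_per_acre
    gain = acres * gain_per_acre
    return {"provider_fee": fee, "farmer_net_gain": gain - fee}


if __name__ == "__main__":
    acres = 1_000  # hypothetical farm size, not taken from the interview

    for label, fee in [
        ("classic software fee", SOFTWARE_FEE_PER_ACRE),
        ("fee described as appropriate", APPROPRIATE_FEE_PER_ACRE),
    ]:
        share = provider_share(fee, REVENUE_GAIN_PER_ACRE)
        totals = season_totals(acres, fee, REVENUE_GAIN_PER_ACRE)
        print(f"{label}: provider share {share:.0%}, totals {totals}")
```

The sketch assumes the promised gain actually materialises. In the classic software model described above, the farmer carries that risk, which is the reason given for the low fee.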

Apart from your portfolio, what are some other exciting companies we'll be hearing from in the coming year?

I'll name two companies that I think are exciting, also because they have an interesting approach in terms of the points I just mentioned.

1. Farmers Business Network

This company is set up like a cooperative, trying to create a community feeling by involving farmers in the growth. They may well have developed an interesting business model and could also make advances in the digital agriculture space because they can engage, motivate, and mobilise customers in a very different way.

2. AgBiome

AgBiome also has an interesting business model. They work for various partner companies (including Monsanto, Syngenta, etc), but outsource the projects to separate company units. So the different projects are separated and the company is able to use one technology platform to serve different partners, who are also strong competitors of one another. AgBiome is also very innovative with regard to biopesticides.

Robert, thank you for this fascinating conversation.

My pleasure.

Robert Berendes is an executive partner at Flagship Pioneering, focusing on innovations that address sustainability in the agriculture and nutrition sectors. He is chairman of Indigo Agriculture, CiBO Technologies and Invaio Sciences, and also serves on the board of Inari.