Defined deliverables: the key to successful AI proofs-of-concept

Vague setups condemn the majority of AI proofs-of-concept to being inconclusive. Does it actually work? Is it truly worth it? You’ll only find out if your PoC team has a list of clear deliverables.

Whatever your project – be it a kitchen redesign or a large IT system rollout – you wouldn’t launch without knowing exactly what you wanted to get out of it. Not just a beautiful kitchen, for example, but one that consumes less energy, or has enough storage and worktop space to cater for a family of five. It’s these specific deliverables that define the shape of the project, and against which its success will be measured.

The goal of an AI PoC seems deceptively clear: to find out whether an AI system is both technically feasible and justifies the effort. But against which criteria should this be judged? What does the PoC team need to bring back to management so that they can make that call? Are we talking about simply a bit of code, a short report or some data? 

This uncertainty is hardly surprising, given the nascent nature of AI as a whole. AI PoCs are a new type of project – a mix of engineering, business analysis, product design and algorithm development, and sometimes even fundamental research. Where your kitchen refurb or IT project rollout tends to be a matter of organising known components and steps into the correct order, an AI project almost always contains an element of fundamental uncertainty. 

But AI PoC teams tend to get too excited about the algorithm element, paying little attention to other critical testing areas.

The only effective strategy we’ve found to counter this is to specify deliverables precisely; your PoC team need to know exactly what they’re expected to come back with once they re-emerge from the rabbit hole. As Dr Mikio Braun, a pioneering AI architect, once said: “The key to running AI PoCs successfully is an almost paranoid focus on risk management.”

Talk of whether an AI project “works”, is “worth it” or will generate ROI is too ambiguous. And it all depends on where you’re standing. The distinct viewpoints that make up an AI PoC project – the business, engineering, design and algorithm elements – necessitate a more rigid framework. 

Below, we’ll examine the three main (and equally important) categories of deliverables for an AI PoC – system prototype, data and business analysis – and highlight a few best practices to follow when signing off an AI PoC.

System prototype

The prototype is an experimental attempt to implement the system and find out whether it is technically feasible and what it depends on to function.

This prototype can be further divided into four areas: data extraction and preparation; the calibration pipeline; the live component; and the user interface (whether that is a UI/UX frontend or a REST API).

The critical part of the system can be both anywhere and everywhere. Maybe data preparation is particularly challenging, because biological entities need to be extracted from scientific publications. Or it could be the user interface that presents significant hurdles if, for example, it needs to enable the user to specify prior beliefs or gut feelings in an intuitive way. Imagine, for instance, an application helping a medicinal chemist to judge pockets of a protein for druggability; given the complexity of the task, being able to incorporate expert feedback and experience is key.

Best practices

  1. Deliver real code, but don’t get hung up on production-grade engineering
    All too often, the result of a PoC implementation is a bunch of notebooks. While this is a great format for keeping a “lab journal”, the notebook mindset isn’t suited to prototyping a real pipeline that runs automatically end-to-end – and discovering the hidden challenges to that process. At the same time, don’t get hung up on beautiful engineering. Expect your code to be thrown away. Avoid big software engineering frameworks; they’re superb for projects that are built to last, but have no place in prototype testing.
  2. Test all interfaces
    Whether your system does what’s required only becomes apparent when it talks to the customer, metaphorically speaking. If you’re dealing with an end-user-facing application, then prototyping a UI/UX is critically important; you can’t answer the key question, “Does it work?”, without it. If it’s a back-end component you’re working on, wrap it in an API and integrate it experimentally with the system that’s calling it.
  3. Capture your journey
    Unlike in a software implementation project, where feasibility is (almost) a given, the road to success in AI PoCs is paved with failures and dead-ends. Document them. Include anything you’ve learned about what could work if things were different. AI projects are often ahead of their time; they end up being resurrected years later – and flourishing – when things have changed. Take face recognition. It started out as a purely academic exercise, unfit for practical applications; now it’s embedded in the security systems of every iPhone. Your learning will not only give you a massive head start; it will also help you gauge when the time is ripe. For logging these methodological aspects, notebooks excel.
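To make point 2 concrete for a back-end component: assuming a Python prototype, the standard library alone is enough to wrap it in a throwaway REST API for experimental integration. The `predict` function and the JSON shapes here are invented stand-ins for whatever the real prototype computes – a sketch, not a production service.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def predict(features):
    """Stand-in for the real prototype model; returns a dummy average score."""
    return {"score": sum(features) / max(len(features), 1)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read a JSON body such as {"features": [0.2, 0.6]} ...
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # ... and answer with the model output as JSON.
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the throwaway server quiet

# Port 0 lets the OS pick a free port – fine for a prototype integration test.
server = ThreadingHTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
```

The calling system (or a quick `curl`) can then POST feature vectors and inspect the JSON response; swapping the dummy `predict` for the real model is the only change needed to integrate experimentally.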


Data

In AI systems, data isn’t merely food for the system – something that passes through (I/O) – but a building block. Whereas you could, in principle, develop accounting software without any accounting data, AI systems cannot be developed this way. This is particularly true for prediction-type applications, which are, in a way, more data than software. But use cases that are built on search, optimisation or simulation also depend on data.

As such, any data that has been put together, cleaned and integrated during a PoC is valuable – and may even be seen as the main achievement.

Best practices

  1. Capture data, soft and hard
    We all know it: putting together the dataset you need for your use case can take months. Months spent writing emails, engaging with stakeholders, clearing legal hurdles etc. It might be soft data, but the experience of obtaining it is certainly hard – and this makes it valuable. Capture this information, document it, spare others the pain you went through. And when it comes to the dataset itself, make sure it gets mothballed in a way that presents minimal hurdles – whether for teams developing a similar project, or for revisiting the PoC later. Classic schema-rich formats work best: if possible, go for a text-based SQL dump (or “CREATE script”). This might feel like microfiche, but it’s robust.
  2. Remember that the data landscape is ever shifting
    AI systems tend to be built on data, and those foundations are constantly shifting. This works both ways. Ask: what if this data source suddenly becomes inaccessible? But also ask: what if we could get access to that dataset? How do we need to change our own processes to get at that data?
  3. Talk about data
    The feeling among algorithm-loving data scientists seems to be that data integration and transformation are too mundane to be troubling senior decision makers with. However, the opposite is often true. Presented in a concise and engaging way, the challenge of pulling different data sources together, and enriching them for meaning, can prove quite alluring. Unlike the intricacies of machine-learning algorithms – which might impress, at best, but almost never engage top brass – data integration has a lot of concrete strategic angles to it. The most interesting touchpoints of disparate datasets coincide with the organisational or functional borders that senior management are skilled at navigating.
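The mothballing in point 1 can be surprisingly lightweight. As a sketch – assuming the PoC dataset fits a relational shape, with table and column names invented purely for illustration – Python’s built-in sqlite3 module will write exactly the kind of text-based, schema-rich “CREATE script” described above:

```python
import sqlite3

# Toy stand-in for a PoC dataset (table and column names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catches (id INTEGER PRIMARY KEY, species TEXT, weight_kg REAL)"
)
conn.executemany(
    "INSERT INTO catches (species, weight_kg) VALUES (?, ?)",
    [("salmon", 3.1), ("cod", 1.8), ("salmon", 2.7)],
)
conn.commit()

# iterdump() yields the schema and data as plain SQL statements – the
# text-based "CREATE script" that mothballs the dataset in a robust form.
with open("poc_dataset.sql", "w") as f:
    for statement in conn.iterdump():
        f.write(statement + "\n")
```

A future team can restore it with nothing more exotic than `sqlite3 revived.db < poc_dataset.sql` – no bespoke tooling, no binary format that has rotted in the meantime.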

Business analysis

The business analysis ties it all together to answer two key questions. First: does it work? And second: is it worth it? Without two resounding positives, no project can be given the green light. This makes business analysis possibly the most important element of the deliverables triumvirate – but, tragically, the one that is most often neglected. 

Why? Because it is common for PoC teams to consist predominantly – if not exclusively – of data scientists or other engineers, whose main focus is technical feasibility.

To correct this imbalance, you’ll want to be especially clear about the business analysis deliverables. Be literal and specify precisely what you need. This can be as easy as providing a spreadsheet template, with cells to be filled. You can’t expect people who haven’t done an MBA to know what it means to calculate an ROI or do a cashflow analysis any more than you would expect CFAs to drop the accounting for a moment to write you a REST API.

Best practices

  1. Banish statistical performance indicators, right from the start
    AUC, Precision/Recall, F1 scores, accuracy … these are your enemies – unnecessary communication hurdles that separate your thinking from the application. In a fishing scenario, for example, where it’s all about salmon vs. other fish, use “percentage of salmon in the catch” instead of precision. Whatever you report, do it using the terminology and context of the application. If this sounds like a simple translation exercise to you, you’re not giving it the attention it warrants – because the language you use matters. Statistical terminology abstracts away many of the crucial details that you best notice (and adapt your model to) during the PoC, rather than when you’re presenting to senior management. The world is orders of magnitude more complex than academic benchmark datasets suggest.
  2. Yes, you can quantify the impact
    There is no way to wriggle out of this. If you can’t quantify the problem, you haven’t truly understood it. Even if the impact of your system is so indirect that it’s hard to estimate, you can evaluate the magnitude of the issue; is your company losing $2m, $10m or $100m every year because of it? Next, let’s say your AI system only provides insights, that a human must then act on – and which may only produce the outcome you’re trying to achieve after a long sequence of events. To quantify impact in this setup you can start by choosing a proxy goal that is closer; then calculate how your algorithm moves the needle on both the long-term and the proxy goal, based on certain scenarios and assumptions (minimal, expected, best case etc). Don’t dwell on your assumptions. Instead, spend time ensuring you are clear and transparent about how you derive the estimated impact from those inputs. This may strike your PoC team as a ludicrously vague approach, especially if they’re used to software engineering projects, where you can’t really work with vague information. But strategic decision making is different. Uncertainty, vagueness and big bets are what it’s designed to deal with. Anything you can provide – an order of magnitude, or the key factors of influence – might be of immense value.
  3. Don’t forget operations
    “What do we need to run this thing?” That’s the killer question many PoC teams have blanked on. They’re too excited about the chance of building it to think about running it. But impact is only realised through operation – and the running costs need to be weighed against the potential benefits. Exact numbers for this are tough to pin down, so opt for a more general view. What skills and type of team are necessary to run the operations? How much will the infrastructure cost? What about external data licences? You’ll also need to factor in the cost of re-engineering or system maintenance at regular intervals, and predict how the costs will scale along key dimensions such as data volume or user numbers.
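The scenario-based estimate in point 2 can literally be a few lines of arithmetic. Every number below is invented for illustration – the annual loss, the proxy-to-outcome link, the scenario values are all assumptions; the point is that the derivation from stated assumptions to estimated impact is explicit and transparent:

```python
# Hypothetical inputs: the annual loss attributed to the problem, and how far
# the system moves a proxy goal (e.g. "share of cases flagged early").
ANNUAL_LOSS = 10_000_000  # order-of-magnitude estimate: ~$10m lost per year

# Assumed link between proxy and long-term outcome: each percentage point of
# proxy improvement recovers this share of the annual loss.
LOSS_RECOVERED_PER_PROXY_POINT = 0.02

SCENARIOS = {
    "minimal": 5,    # proxy improvement in percentage points
    "expected": 15,
    "best": 30,
}

def estimated_annual_impact(proxy_points: float) -> float:
    """Derive the dollar impact from the stated assumptions – nothing hidden."""
    return ANNUAL_LOSS * LOSS_RECOVERED_PER_PROXY_POINT * proxy_points

for name, points in SCENARIOS.items():
    print(f"{name:>8}: ${estimated_annual_impact(points):,.0f} / year")
```

For the invented numbers above this yields roughly $1m, $3m and $6m a year – and because every assumption is a named constant, the inevitable debate happens over the inputs, not over an opaque final figure.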

Take all these things into account when you spell out what you want from your PoC team, and none of their time will be wasted – even if that dream application you were all so keen to implement has to wait until the world is ready for it.


Paul von Bünau

Managing Director
