Interview
Building the evaluation framework for a patient-facing chatbot is surprisingly difficult on a conceptual level
At idalab 2026 · Studying Mathematics and CS @ Uni Hamburg
Could you introduce yourself and tell us a bit about your background?
My background is in mathematics and computer science. I’ve been interested in healthcare applications for a while — less the clinical side, more what happens when you try to actually deploy technology in those environments.
I first explored this in grade 12, when I did a two-week internship at idalab. I had met Julian (Associate Principal, idalab) at the Deutsche SchülerAkademie Grovesmühle in 2022, where I attended his healthcare course.
What motivated you to pursue an internship at idalab?
I was motivated by two things. First, I already knew from my previous internship that working at idalab is both interesting and fun. Second, the project itself was compelling, especially the chance to work directly with a client.
What kind of project did you work on?
My project was with an international medtech company, one of the largest in the world, which idalab has been supporting for a couple of years. As more of their patients manage their care at home, they developed a chatbot to support them and already had a prototype in place.
My part was to design and implement the evaluation framework, which is surprisingly difficult on a conceptual level for a patient-facing chatbot. When is an answer good, safe, and useful? When is it better not to answer at all? Are partially correct responses acceptable? And in a complex RAG pipeline, how do we reliably trace the root cause of an error? These were some of the questions we had to answer.
How was the onboarding?
Pretty smooth. There was a clear structure from the first day: meeting my mentor, understanding the project background, and getting up to speed on the work. And whenever something wasn’t clicking, I could just ask the team. When I ran into a couple of tricky Git errors, for example, a colleague sat down with me over lunch and helped me sort them out.
What did day-to-day look like?
It shifted quite a bit. Early on, it was mostly conceptual work: figuring out what we wanted to measure and how. Then we started building metrics and checks along the ingestion pipeline, creating a test set, and fixing things that kept breaking. For example, table structures were often incomplete, so at some point I built an automatic correction using an LLM to fix the structure when we detected missing columns or rows. Toward the end, I built a dashboard so the client could monitor chatbot performance and spot anomalies. And mixed into all of that: preparing for client calls, refining our approach, and discussing open questions with the client’s tech team.
How was it working directly with the client team?
That was one of the most exciting and challenging parts. We had weekly update calls with all stakeholders, which we prepared together and which I co-presented. This felt like a lot at first, but by the final call it felt normal. We also had more technical discussions with the chatbot’s engineering team, which I handled on my own from week three onward.
What tools and technologies did you use?
The main ones: Python, Git of course, and Streamlit for the dashboard. The pipeline work touched on some RAG components, like parsing documents, creating a vector database and retrieving the relevant information given a question. Nothing exotic, but the combination and the requirements of a healthcare application were new to me.
What skills turned out to matter most?
Structuring your thinking before you touch the code. It seems obvious, but even deciding which metrics were helpful was not straightforward, let alone the task of building an “anomaly detector”. And then the communication side, which I underestimated. Knowing what you want to build is one thing; aligning with the client team is another.
What’s next for you?
I will continue my studies in mathematics and computer science, and then probably pursue a degree that combines quantitative skills with medical applications.

