DATAWorks Speakers and Abstracts

Julie Obenauer-Motley
Sr AI Technical Advisor, Johns Hopkins University Applied Physics Laboratory
“AI Assurance for Operational Use of Generative AI”
Abstract:
Generative AI (GenAI) tools are rapidly moving from experimentation to everyday use across operational and enterprise environments. Yet adoption does not happen simply because a tool is powerful. Empowering people to use GenAI effectively requires justified confidence that the system will perform reliably under real-world conditions. This talk explores practical lessons learned from deploying GenAI across DoW applications, with a focus on building and communicating assurance for non-technical stakeholders.
At the heart of successful GenAI adoption is a clear understanding of operational realities. In real-world environments, users face time pressure, incomplete information, competing priorities, and evolving mission needs. AI systems must be resilient to these realities and perform as intended. We begin by examining how we identify the conditions that are most likely to impact operational outcomes: What tasks are users trying to accomplish? What types of errors would have meaningful consequences? How does the GenAI empower and enable the user?
The second focus of the talk is translating operational realities into meaningful test and evaluation approaches. GenAI generates open-ended and varied outputs and may behave differently across contexts. We give examples of evaluations used to test performance in real-world GenAI implementations, including assessing where GenAI may provide benefit in a workflow, designing experiments and identifying metrics to assess those benefits, and evaluating performance as a function of mission needs. Importantly, we also discuss how to evaluate and structure assurance for user interactions with GenAI workflows. Our experience highlights the importance of iterative testing, structured experimentation, and scenario-based evaluation that reflects operational pressures.
Finally, we address how to communicate assurance in ways that enable informed decision making. Technical reports and performance statistics alone rarely build user confidence. Instead, assurance evidence must be translated into clear, decision-relevant insights. We share approaches for presenting findings in accessible formats that help leaders and users understand where the system performs well, where caution is warranted, and how to apply appropriate safeguards.
This talk offers practical insights and examples for those seeking to leverage GenAI across their workflows. Attendees will leave with a clearer understanding of how to align AI assurance with real-world needs, design meaningful evaluation strategies, and communicate evidence in ways that empower decision makers.