DATAWorks Speakers and Abstracts

Dr. Jane Pinelis
Chief Scientist for Special Operations, Johns Hopkins University Applied Physics Laboratory
“AI Assurance for Operational Use of Generative AI”
Speaker Bio:
Dr. Jane Pinelis serves as Chief Scientist for Special Operations at the Johns Hopkins University Applied Physics Laboratory, where she leads the development, assurance, and operational integration of advanced AI capabilities for mission-critical defense applications. Previously, she was the inaugural Chief of AI Assurance at the Department of Defense’s Chief Digital and Artificial Intelligence Office and Joint Artificial Intelligence Center, where she directed test and evaluation and responsible AI efforts across the Department. She has also held leadership roles supporting Project Maven, the Office of the Director of Operational Test and Evaluation, and the Marine Corps Operational Test and Evaluation Activity. Dr. Pinelis has built her national security career at the intersection of AI assurance, operational testing, and defense innovation. Dr. Pinelis holds a PhD in Statistics from the University of Michigan, Ann Arbor.
Abstract:
Co-Presentation with Dr. Julie Obenauer-Motley.
Generative AI (GenAI) tools are rapidly moving from experimentation to everyday use across operational and enterprise environments. Yet adoption does not happen simply because a tool is powerful. Empowering people to use GenAI effectively requires justified confidence that the system will perform reliably under real-world conditions. This talk explores practical lessons learned from deploying GenAI across DoW applications, with a focus on building and communicating assurance for non-technical stakeholders.
At the heart of successful GenAI adoption is a clear understanding of operational realities. In real-world environments, users face time pressure, incomplete information, competing priorities, and evolving mission needs. AI systems must be resilient to these realities and perform as intended. We begin by examining how we identify the conditions that are most likely to impact operational outcomes: What tasks are users trying to accomplish? What types of errors would have meaningful consequences? How does GenAI empower and enable the user?
The second focus of the talk is translating operational realities into meaningful test and evaluation approaches. GenAI generates open-ended and varied outputs and may behave differently across contexts. We give examples of performance evaluations drawn from real-world GenAI implementations. These include assessing where GenAI may provide benefit in a workflow, designing experiments and identifying metrics to assess those benefits, and evaluating performance as a function of mission needs. Importantly, we also discuss how to evaluate and structure assurance for user interactions with GenAI workflows. Our experience highlights the importance of iterative testing, structured experimentation, and scenario-based evaluation that reflects operational pressures.
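As one illustration of what scenario-based evaluation can look like in practice, the minimal Python sketch below runs a system under test across operationally motivated conditions and reports a pass rate per condition. This sketch is not material from the presentation; the generate placeholder, the scenarios, and the checks are hypothetical stand-ins.

    # Minimal sketch of scenario-based GenAI evaluation (illustrative only):
    # run a system under test over operationally motivated scenarios, score
    # each output with a task-specific check, and report results per condition.
    from collections import defaultdict

    def generate(prompt: str) -> str:
        """Placeholder for the GenAI system under test (hypothetical)."""
        return "ACME Corp is mentioned in the 0800 situation report."

    # Each scenario pairs an operational condition with a pass/fail check.
    scenarios = [
        {"condition": "time_pressure",
         "prompt": "Summarize the report in one sentence.",
         "check": lambda out: len(out.split()) <= 25},
        {"condition": "incomplete_info",
         "prompt": "List entities named in the report.",
         "check": lambda out: "ACME" in out},
    ]

    results = defaultdict(list)
    for s in scenarios:
        output = generate(s["prompt"])
        results[s["condition"]].append(s["check"](output))

    # Report a pass rate per operational condition: the kind of
    # decision-relevant slice the talk argues for.
    for condition, passes in results.items():
        rate = sum(passes) / len(passes)
        print(f"{condition}: {rate:.0%} pass rate over {len(passes)} scenario(s)")

In a real assessment, the simple checks would be replaced by task-specific scoring tied to mission needs, and the conditions would reflect the operational pressures described above.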
Finally, we address how to communicate assurance in ways that enable informed decision making. Technical reports and performance statistics alone rarely build user confidence. Instead, assurance evidence must be translated into clear, decision-relevant insights. We share approaches for presenting findings in accessible formats that help leaders and users understand where the system performs well, where caution is warranted, and how to apply appropriate safeguards.
This talk offers practical insights and examples for those seeking to leverage GenAI across their workflows. Attendees will leave with a clearer understanding of how to align AI assurance with real-world needs, design meaningful evaluation strategies, and communicate evidence in ways that empower decision makers.