DATAWorks Speakers and Abstracts


Adam Miller

Research Staff Member, IDA
“Evaluating Human-System Interaction Across the Acquisition Lifecycle”

Speaker Bio: 

 

Dr. Miller joined IDA in 2022 as a Research Staff Member in the Operational Evaluation Division. His work focuses on the test and evaluation of Army air defense and AI-enabled systems, and he serves as the human-systems integration (HSI) subject matter expert for Land Warfare programs. Dr. Miller holds a PhD in behavioral neuroscience from Cornell University and a BA in psychology from Providence College. Prior to joining IDA, he was a neuroscientist at the Hospital for Sick Children (SickKids), where his research focused on characterizing the neuronal code underlying memory.

Abstract: 

 

Continuous evaluation has been proposed as a way to improve acquisition outcomes by making better use of early test data. In this approach, a system is evaluated throughout development rather than only after large scheduled test events, with the goal of increasing visibility and speed. In practice, however, continuous evaluation can be difficult to implement because data collected on unfinished systems is often difficult to apply to an evaluation of the completed system. Human-System Interaction (HSI) provides a useful domain for exploring this approach because many HSI questions can be answered on incomplete systems as components mature. To realize this opportunity, we introduce a framework for tracking the piecemeal evidence generated as individual system components are developed. The framework organizes data into an Operator-Component-Task matrix and documents evidence maturity using Human Readiness Levels. By updating the matrix as development progresses, evaluators can more clearly communicate what is known and unknown about HSI performance and then focus future testing on remaining risks.
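
The Operator-Component-Task matrix described above can be sketched as a simple lookup keyed by operator, component, and task, with each cell holding a Human Readiness Level; the operators, components, tasks, and HRL values below are hypothetical illustrations, not drawn from any program.

```python
# Illustrative sketch of an Operator-Component-Task evidence matrix.
# All operators, components, tasks, and HRL values here are invented.

def update_cell(matrix, operator, component, task, hrl):
    """Record the Human Readiness Level (1-9) of the evidence for one cell."""
    key = (operator, component, task)
    # Keep the highest maturity level observed so far for this cell.
    matrix[key] = max(hrl, matrix.get(key, 0))

def remaining_risks(matrix, threshold=6):
    """Cells whose evidence maturity is still below the chosen HRL threshold."""
    return sorted(key for key, hrl in matrix.items() if hrl < threshold)

matrix = {}
update_cell(matrix, "gunner", "targeting display", "engage track", 4)
update_cell(matrix, "gunner", "targeting display", "engage track", 7)  # newer evidence
update_cell(matrix, "operator", "planning console", "build mission plan", 3)

# Only the planning-console cell remains below the HRL threshold.
print(remaining_risks(matrix))
```

Updating such a structure as components mature makes the known/unknown split explicit, which is the communication role the framework assigns to the matrix.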


Alan Brown

Professor, Virginia Tech
“Integrating Mission Engineering with Systems Engineering to Enable Navy Ship Design”

Speaker Bio: 

 

Dr. Alan J. Brown, CAPT USN (ret), is NAVSEA Professor Emeritus of Ship Design in the Kevin T. Crofton Department of Aerospace and Ocean Engineering at Virginia Tech. He was NAVSEA Professor of Ship Design from 1998-2025.  From 2010-2015 he was Co-Director for Curriculum of the Naval Engineering Education Center for the US Navy. He created and coordinated the Naval Engineering Minor and Naval Engineering Graduate Certificate programs at Virginia Tech. His research is in naval ship design, marine systems, survivability, and underwater explosion effects. He has served as Northeast Regional Vice President of SNAME, as a member of the ASNE Council, as Chairman of the ASNE/SNAME Joint Education Committee, as chairman of the SNAME Ad Hoc Panel on Structural Design and Response in Collision and Grounding, and as Chairman of the New England Section of SNAME. He was the SNAME/ASNE Faculty Advisor at Virginia Tech for more than 25 years. He was Professor of Naval Architecture and directed the Naval Construction and Engineering Program at MIT from 1993 to 1997.  Dr. Brown was the 2025 recipient of the Academy of Aerospace and Ocean Engineering Excellence Distinguished Faculty Award, the 2021 recipient of the ASNE Harold E. Saunders Award (Lifetime Achievement), the 2015 recipient of the SNAME William H. Webb Medal for outstanding contributions to education in naval architecture, marine or ocean engineering, and the 2007 recipient of the ASNE Solberg Award (Research).  At Virginia Tech, he has earned multiple awards for teaching and service. As a US Navy Engineering Duty Officer from 1971 to 1998, he served in ships, fleet staffs, shipyards, NAVSEA, OPNAV and at MIT.  He received a PhD in Marine Engineering Systems in 1986, an MS in Ocean Engineering in 1973, an MS in Shipping and Shipbuilding Management in 1973 and a BS in Naval Architecture and Marine Engineering in 1971, all from MIT.  He was a professional mechanical engineer registered in the State of California. 
He is a member of ASNE and a fellow of SNAME.

Abstract: 

 

This presentation examines how the U.S. Navy Mission Engineering and Integration (MEI), integrated with model-based systems engineering (MBSE) and system-of-systems methods, can be applied to modern naval ship design to ensure alignment with operational objectives and evolving threats. Building on the Navy’s ME-to-SE strategy, it connects foundational MEI principles with contemporary ship design methodologies practiced within Virginia Tech’s Aerospace and Ocean Engineering Department, where multiple analytical tools are integrated to optimize performance, survivability, cost, and mission effectiveness. The presentation demonstrates that maturing MEI artifacts within a model-based digital environment strengthens traceability from mission need to architecture, requirements, and design decisions, extending a coherent digital thread within a Digital Engineering Ecosystem from mission analysis through design synthesis.


Alexandra Mangel

Member of Technical Staff, The Aerospace Corporation
“Combinatorial and Adaptive Stress Testing for Autonomous Spacecraft Control Systems”

Speaker Bio: 

 

Alexandra Mangel is an Artificial Intelligence/Machine Learning Engineer in the Vehicle Autonomy & System Trust department at the Aerospace Corporation. Her work pertains to the incorporation of autonomy for spacecraft control applications and developing verification & validation processes to enable trustworthiness in AI/ML algorithms. She has a B.S. in Aerospace Engineering from the Ohio State University and an M.S. in Aerospace Engineering from the University of Maryland.

Abstract: 

 

This paper addresses the growing challenge of evaluating increasingly complex autonomous control systems, particularly those developed using machine learning techniques such as reinforcement learning, by introducing novel testing methods that overcome the limitations of traditional analytical and Monte Carlo simulation techniques. Specifically, we propose a dual approach that leverages combinatorial (t-way) testing to efficiently explore critical parameter interactions and Adaptive Stress Testing (AST) to dynamically identify edge-case failure modes, enhancing test efficiency and failure-case identification in simulation environments. We apply this combined test-case generation approach to a runtime-assured autonomous control system for spacecraft attitude, in which a neural network controller trained via reinforcement learning is bounded by a control barrier function-based assurance module, and we compare the results of these testing methods against conventional Monte Carlo approaches. The contributions include the development of a combined t-way testing and AST test-case generation approach, its application to a runtime-assured neural network control system, and an evaluation of the results against a traditional Monte Carlo simulation.
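
The combinatorial half of the approach can be illustrated with a minimal greedy 2-way (pairwise) covering-suite generator; the simulation parameters and levels below are invented for illustration, and the AST half of the method is not shown.

```python
from itertools import combinations, product

def pairwise_cases(factors):
    """Greedy construction of a 2-way (pairwise) covering test suite."""
    names = sorted(factors)
    def pairs(case):
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}
    # Every (factor-level, factor-level) pair that must appear in some case.
    uncovered = set()
    for a, b in combinations(names, 2):
        uncovered |= {((a, x), (b, y)) for x in factors[a] for y in factors[b]}
    all_cases = [dict(zip(names, vals)) for vals in product(*(factors[n] for n in names))]
    suite = []
    while uncovered:
        # Repeatedly add the candidate covering the most still-uncovered pairs.
        best = max(all_cases, key=lambda c: len(pairs(c) & uncovered))
        suite.append(best)
        uncovered -= pairs(best)
    return suite

# Hypothetical simulation parameters for an attitude-control test campaign.
factors = {
    "thruster_fault": ["none", "stuck"],
    "sensor_noise": ["low", "high"],
    "initial_rate": ["slow", "fast"],
}
suite = pairwise_cases(factors)
print(len(suite))  # fewer cases than the 8-run full factorial
```

The payoff grows with the number of factors: pairwise suites scale roughly with the product of the two largest level counts rather than with the full Cartesian product.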


Allison Holston

Mathematical Statistician, ATEC
“Test Design for the Integrated Head Protection System”

Speaker Bio: 

 

Allison Holston is the Lead Statistician for the Army Evaluation Center at Aberdeen Proving Ground, Maryland. She has an M.S. in Statistics from the University of Georgia and a B.S. in Mathematics & Statistics from Virginia Tech.

Abstract: 

 

This presentation details a live fire test event for the Integrated Head Protection System, a key component of the Soldier Protection System. The Integrated Product Team (IPT) coordinated the test design to assess Full Up System Level performance using a Design of Experiments approach.

The evaluation focuses on maximum and median Abbreviated Injury Scale scores per helmet, with injury severity modeled using the Operations Requirement-based Casualty Assessment model on fragment impacts. Key factors examined include threat type, standoff distance, system generation, and helmet orientation.

Developed in JMP, the test design guided the IPT in determining the minimum sample sizes needed for robust testing. It carefully balanced statistical rigor, practical test setup constraints, and anticipated threat scenarios. This presentation will explore the critical questions addressed by the team, the test configurations, and valuable lessons learned throughout the process.
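
As a rough illustration of enumerating candidate configurations over the factors named above, a full-factorial design can be generated as follows; the factor levels are hypothetical stand-ins, and the actual JMP design balanced sample size against test constraints rather than running the full factorial.

```python
from itertools import product

# Hypothetical factor levels for illustration; actual levels are set by the IPT.
factors = {
    "threat": ["fragment A", "fragment B"],
    "standoff_m": [1, 3],
    "generation": ["Gen I", "Gen II"],
    "orientation": ["front", "side", "rear"],
}

# A full-factorial design enumerates every combination of factor levels;
# a fractional or optimal design would subset these runs to save test articles.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(len(runs))  # 2 * 2 * 2 * 3 = 24 runs
```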


Amy Vennos

Graduate Student, Virginia Tech
“Multiclass Recalibration of Probability Predictions via the Linear Log Odds Function”

Speaker Bio: 

 

Amy Vennos is a Ph.D. student in Statistics at Virginia Tech, advised by Christopher T. Franck, and a Graduate Research Assistant funded by the Virginia Tech National Security Institute. She received an M.S. in Statistics and an M.S. in Mathematics from Virginia Tech. Her research focuses on assessing the quality of probability predictions generated from machine learning models. She has prior experience as a Lecturer at Salisbury University.

Abstract: 

 

Machine-generated probability predictions are essential in modern classification tasks such as image classification. A model is well calibrated when its predicted probabilities correspond to observed event frequencies. Despite the need for multicategory recalibration methods, existing methods are limited in that they (i) compare calibration between two or more models rather than directly assessing the calibration of a single model, (ii) require under-the-hood model access, e.g., accessing logit-scale predictions within the layers of a neural network, and (iii) provide output that is difficult for human analysts to understand. To overcome (i)-(iii), we propose Multicategory Linear Log Odds (MCLLO) recalibration, which (i) includes a likelihood ratio hypothesis test to assess calibration, (ii) does not require under-the-hood access to models and is thus applicable to a wide range of classification problems, and (iii) can be easily interpreted. We demonstrate the effectiveness of the MCLLO method through simulations and three real-world case studies involving image classification via convolutional neural network, obesity analysis via random forest, and ecology via regression modeling. We compare MCLLO to four comparator recalibration techniques using both our hypothesis test and the existing calibration metric Expected Calibration Error, showing that our method works well alone and in concert with other methods.
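
A two-class sketch of linear-log-odds recalibration conveys the core idea; the multiclass extension, the exact likelihood-ratio test, and the data below are simplifications or inventions, not the paper's implementation.

```python
import math

# Two-class sketch of linear-log-odds (LLO) recalibration. The predictions
# and outcomes below are made up for illustration.
def llo(p, delta, gamma):
    """Shift a prediction linearly on the log-odds scale:
    logit(p') = gamma * logit(p) + log(delta)."""
    return delta * p**gamma / (delta * p**gamma + (1 - p)**gamma)

def log_lik(ps, ys, delta, gamma):
    return sum(math.log(llo(p, delta, gamma)) if y else math.log(1 - llo(p, delta, gamma))
               for p, y in zip(ps, ys))

# Overconfident predictions: events occur less often than the model claims.
ps = [0.9, 0.9, 0.8, 0.8, 0.7, 0.6]
ys = [1, 0, 1, 0, 1, 0]

# Crude grid-search MLE over (delta, gamma); a real fit would use an optimizer.
grid = [x / 10 for x in range(1, 31)]
delta_hat, gamma_hat = max(((d, g) for d in grid for g in grid),
                           key=lambda dg: log_lik(ps, ys, *dg))

# Likelihood-ratio statistic against the identity map (delta=1, gamma=1),
# i.e. the hypothesis that the model is already calibrated.
lr = 2 * (log_lik(ps, ys, delta_hat, gamma_hat) - log_lik(ps, ys, 1, 1))
print(lr >= 0)  # True
```

Because the recalibration map has only two interpretable parameters per transform, the fitted (delta, gamma) themselves describe how the model is miscalibrated.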


The Honorable Amy E. Henninger, PhD

Director, Operational Test & Evaluation (DOT&E)

Speaker Bio: 

 

Coming Soon


Andrea Kirkpatrick

Research Staff Member, IDA
“Full Spectrum Survivability and Lethality Framework”

Speaker Bio: 

 

Co-presenters Jordan Adams, Lindsey Butler, Andrea Kirkpatrick, and James Rhoads are all senior IDA Research Staff Members who support the Office of the Secretary of War on test and evaluation issues. This team has spent more than a year working together to develop a methodology for assessing programs’ full spectrum survivability and lethality.

Abstract: 

 

In the FY2022 National Defense Authorization Act (NDAA), Congress requested that DOT&E expand survivability and lethality test and evaluation (T&E) to include non-kinetic threats and target environments, and that it continue this expanded evaluation throughout a program’s lifecycle. This new effort is called full spectrum survivability and lethality (FSSL). By more consistently including T&E of multiple simultaneous or near-simultaneous threat engagements of different threat classes, FSSL can reshape T&E of the survivability and lethality of U.S. platforms and weapons across the program life cycle. To accomplish this goal, IDA recommends that FSSL evaluations incorporate data from operational, developmental, live fire, and integrated tests, as well as large-force exercises and real-world events. IDA has drafted an evaluation framework that defines FSSL and its objectives, explains and presents examples of its methodology, and lays out a phased approach for implementation.


Andrew Cooper

Graduate Student, Virginia Tech
“Robust Wrapped Gaussian Process Inference for Noisy Angular Data”

Speaker Bio: 

 

Andrew Cooper is a Ph.D. candidate in Virginia Tech’s Department of Statistics. He received his bachelor’s and master’s degrees in Statistical Science from Duke University. His research areas include computer experiments and surrogate modeling, as well as Bayesian methodology.

Abstract: 

 

Angular data are commonly encountered in settings with a directional or orientational component. Regressing an angular response on real-valued features requires either intrinsically capturing the circular manifold the data lie on, or using an appropriate extrinsic transformation. A popular example of the latter is the technique of distributional wrapping, in which functions are “wrapped” around the unit circle via a modulo-2π transformation. This approach enables flexible, non-linear models like Gaussian processes (GPs) to properly account for circular structure. While straightforward in theory, the need to infer the latent unwrapped distribution along with its wrapping behavior makes inference difficult in noisy response settings, as misspecification of one can severely hinder estimation of the other. We propose a novel Bayesian approach to wrapped GP (WGP) inference that more robustly estimates the latent unwrapped space in the presence of noise compared to existing implementations. Our work is motivated by the problem of localizing radio frequency identification (RFID) tags used for tracking nuclear materials. We showcase our model’s ability to capture the relationship between frequency and phase angle in order to accurately range assets in laboratory environments.
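
The wrapping transformation, and the pitfall it creates for naive estimation, can be shown in a few lines; the synthetic latent responses below are illustrative.

```python
import math

# Distributional wrapping: a latent real-valued response y maps to an angle
# theta = y mod 2*pi. The latent values below are synthetic.
def wrap(y):
    return y % (2 * math.pi)

def circular_mean(angles):
    """Averaging wrapped angles naively fails near the 0/2*pi boundary;
    the circular mean averages on the unit circle instead."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2 * math.pi)

# Latent responses near 2*pi wrap to angles straddling zero.
latent = [6.2, 6.3, 6.4]  # close to 2*pi = 6.283...
angles = [wrap(y) for y in latent]

naive = sum(angles) / len(angles)  # badly off: mixes ~6.2 with ~0.02 and ~0.12
circ = circular_mean(angles)       # close to wrap(6.3), the latent mean wrapped
```

The same boundary effect, compounded by noise, is why jointly inferring the latent unwrapped surface and the wrapping behavior is delicate for WGP models.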


Andrew Newman

Group Supervisor and Principal Engineer, Johns Hopkins Applied Physics Laboratory
“Reconnaissance Blind Chess: An Experimentation Platform for Autonomous Decision Making”

Speaker Bio: 

 

Dr. Andrew J. Newman is a member of the Principal Professional Staff and Supervisor of the Intelligence, Surveillance, Reconnaissance, and Targeting Technology Group in the Force Projection Sector of the Johns Hopkins Applied Physics Laboratory. He has over 35 years of experience with industry, consulting, and academia in data fusion, automatic control, optimization, physical-mathematical modeling, systems analysis, and software development, including 25 years of experience with DoD advanced research and development programs in analysis, planning, control, and data fusion for C4ISR systems. Since joining APL in 2003, he has led or been a major contributor on a wide variety of projects in ISR systems, sensor and data fusion, target tracking, and dynamic ISR resource management applied to missions in the land, maritime, air, and space domains. Dr. Newman currently serves as Chair of the program committee for the Military Sensing Symposia (MSS), National Security Sensor and Data Fusion, and has served on that committee since 2009. Dr. Newman was recognized as an MSS Fellow in 2025 for technical and leadership contributions to the military sensing community and won the Joseph Mignogna Data Fusion Award in 2023 for leadership in the national security data fusion community. He won the JHU/APL Achievement Award in the Outstanding Mission Accomplishment for an Emerging Challenge category in 2016 for “CLUTCHSHOT” and the JHU/APL Hart Prizes in 2006 and 2012 for excellence in IRAD in the development category for, respectively, “Tactically Responsive ISR Management” and “Organic Persistent ISR.” In 2015, Dr. Newman invented Reconnaissance Blind Chess, a game to serve as a new challenge problem in autonomous decision making under uncertainty. Previously, he worked for ALPHATECH, Inc. of Arlington, VA, where he developed optimization and control algorithms for ISR resource management in support of several DARPA and AFRL research programs. Dr. Newman has published and presented at numerous technical conferences and co-authored journal articles in the IEEE Transactions on Aerospace and Electronic Systems and the Johns Hopkins APL Technical Digest. He received a Bachelor of Science degree in Systems Engineering from the University of Pennsylvania (1987), a Master of Science degree in Electrical Engineering from the University of Virginia (1992), and a Doctor of Philosophy degree in Electrical and Computer Engineering from the University of Maryland (1999).

Abstract: 

 

This presentation re-introduces the game of reconnaissance blind chess (RBC), invented in 2015 at the Johns Hopkins Applied Physics Laboratory (APL), as a paradigm and test bed for understanding and experimenting with autonomous decision making under uncertainty, fusing incomplete data, and managing sensors to maintain situational awareness informing tactical and strategic decision making. RBC is a variant of traditional chess where players cannot directly observe their opponent’s pieces or the entire board, but rather must use a reconnaissance sensor to momentarily and privately observe a chosen 3×3 region of the board once per move. In RBC, opponents share no explicit common knowledge and players must reason tactically and strategically over imperfect information (their own and accounting for their opponent’s). These characteristics appear in real-world scenarios and challenge state-of-the-art algorithms. APL developed and maintains a website and mobile app for researchers and other users to access RBC for research, education, and recreational purposes. The research community including APL has developed some machine intelligence approaches to playing the game, some of which have been used and demonstrated at the machine RBC tournaments at the Neural Information Processing Systems (NeurIPS) conferences in 2019, 2021, and 2022.
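
The reconnaissance action, privately observing a chosen 3x3 window of the 8x8 board, can be sketched as follows; the board encoding is invented for illustration and is not APL's implementation.

```python
# Sketch of the RBC reconnaissance action: a player privately observes one
# 3x3 window of the 8x8 board once per move. Encoding invented for illustration:
# '.' is an empty square, letters are pieces (upper = white, lower = black).

def sense(board, row, col):
    """Return the 3x3 window centered at (row, col), clipped to the board edge."""
    return [[board[r][c] for c in range(max(0, col - 1), min(8, col + 2))]
            for r in range(max(0, row - 1), min(8, row + 2))]

board = [list("rnbqkbnr"), list("pppppppp")] + [list("........") for _ in range(4)] \
        + [list("PPPPPPPP"), list("RNBQKBNR")]

window = sense(board, 1, 4)  # peek at the squares around black's king-side pawns
```

Everything outside the returned window stays hidden, which is what forces players to fuse sparse sensor returns into a belief over opponent positions.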


Anna Flowers

Ph.D. Candidate, Virginia Tech
“Gaussian Process Assisted Meta-learning for Image Classification and Object Detection”

Speaker Bio: 

 

Anna Flowers is a fifth-year Ph.D. candidate in Statistics at Virginia Tech. She received a B.S. in Mathematical Statistics in 2021 from Wake Forest University and an M.S. in Statistics in 2023 from Virginia Tech. She is jointly advised by Bobby Gramacy and Chris Franck and her research focuses on Gaussian process regression, multi-stage modeling, and active learning.

Abstract: 

 

Collecting operationally realistic data to inform machine learning models can be costly. Before collecting new data, it is helpful to understand where a model is deficient. For example, object detectors trained on images of rare objects may not be good at identification in poorly represented conditions. We offer a way of informing subsequent data acquisition to maximize model performance by leveraging the toolkit of computer experiments and metadata describing the circumstances under which the training data was collected (e.g., season, time of day, location). We do this by evaluating the learner as the training data is varied according to its metadata. A Gaussian process (GP) surrogate fit to that response surface can inform new data acquisitions. This meta-learning approach offers improvements to learner performance as compared to data with randomly selected metadata, which we illustrate on both classic learning examples, and on a motivating application involving the collection of aerial images in search of airplanes.


Anthony Garland

Applied Machine Intelligence Staff, Sandia National Labs
“ATLAS Chat: Leveraging Generative AI for National Security Missions”

Speaker Bio: 

 

Dr. Anthony Garland is an AI Research Leader at Sandia National Laboratories, where he develops agentic AI systems for complex engineering challenges in manufacturing and scientific discovery. With over six years of machine learning research expertise, he translates cutting-edge AI into production solutions. Dr. Garland holds a PhD and a Master’s degree in Mechanical Engineering from Clemson University.

Abstract: 

 

Atlas UI 3 is an open-source, full-stack large language model (LLM) chat interface developed by Sandia National Laboratories to support U.S. Government customers and national security missions. The platform integrates the Model Context Protocol (MCP) for extensible tool execution, implements multi-level compliance controls for data segregation, and supports multiple LLM providers through a unified abstraction layer. This paper presents the architectural design, security features, and unique capabilities of Atlas UI 3, with particular emphasis on the considerations that motivated developing a purpose-built solution rather than adopting existing commercial or open-source alternatives. Key differentiating features include a fully pluggable architecture where organizations bring their own LLM providers, RAG systems, tools, and storage; group-based authorization controlling MCP tool discovery and execution; administrator-configurable tool approval requirements; a compliance-level enforcement system; interactive tool elicitation; and multiple agent reasoning strategies. The platform’s design prioritizes software supply chain security, licensing clarity, configurability for rapid deployment across diverse environments, and adaptability to high-performance computing (HPC) environments.
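
The "bring your own LLM provider" idea can be sketched as a structural interface behind which providers are swappable; Atlas UI 3's actual abstraction layer is not described in detail in this abstract, so the names and signatures below are hypothetical.

```python
from typing import Protocol

# Hypothetical sketch of a pluggable provider abstraction; these names are
# invented and do not reflect Atlas UI 3's real interfaces.
class ChatProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class EchoProvider:
    """Stand-in provider for testing; a real one would call a model API."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def answer(provider: ChatProvider, prompt: str) -> str:
    # The chat front end depends only on the Protocol, so organizations can
    # swap in their own provider without touching the rest of the stack.
    return provider.complete(prompt)

print(answer(EchoProvider(), "hello"))  # echo: hello
```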


Anthony Sgambellone

STAT COE
“Missing data: Motivations, Methods and Examples”

Speaker Bio: 

 

Dr. Anthony Sgambellone is a Scientific Test and Analysis Techniques (STAT) Expert at the STAT Center of Excellence (COE). He has been part of the STAT COE since 2020, where he provides support and instruction in efficient and effective test design and analysis across the DOD. Before joining the STAT COE, Dr. Sgambellone developed machine-learning models in the financial industry and on an Agile software development team in support of customers at Wright-Patterson Air Force Base. Anthony holds a BS in Biology from Case Western Reserve University, an MS in Statistics from the University of Akron, and a PhD in Statistics from the Ohio State University.

Abstract: 

 

Many datasets include datapoints that were not collected, referred to as missing data. If missing data are not handled properly, they may lead to biased conclusions. If the probability of a value being missing does not depend on any predictor variable, the data are called missing completely at random (MCAR), and the missing data can be ignored without biasing results. If the probability of being missing depends on some fully observed predictor variable, the data are called missing at random (MAR), and it may be possible to model the probability of missingness given the fully observed variables. If the probability of being missing cannot be predicted from any fully observed variable, the data are called not missing at random (NMAR). NMAR data can lead to biased conclusions, and the size and direction of that bias cannot be determined from the observed data. For example, the results of a satisfaction survey will be biased if either very satisfied or very unsatisfied people are more likely to complete it, but one cannot know who is not answering. Examples will be shown based on publicly available border crossing data. If the data can be grouped based on fully observed variables, the groups can be weighted to correct the bias. If a Bayesian model can be fit to the data, the values of missing data can be estimated for given values of the fully observed variables. Examples of these methods will be shown.
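
The weighting method mentioned at the end can be illustrated with a toy MAR example, where response probability depends only on a fully observed group variable; all numbers below are fabricated and are not the border crossing data.

```python
import random

random.seed(1)
# Toy MAR illustration: response probability depends only on a fully observed
# group variable, so inverse-probability weighting can undo the selection bias.
# Group A is mostly satisfied and responds often; group B less so on both counts.
population = [("A", 1 if random.random() < 0.8 else 0) for _ in range(1000)] \
           + [("B", 1 if random.random() < 0.3 else 0) for _ in range(1000)]
p_respond = {"A": 0.9, "B": 0.4}

respondents = [(g, y) for g, y in population if random.random() < p_respond[g]]

true_mean = sum(y for _, y in population) / len(population)
naive = sum(y for _, y in respondents) / len(respondents)  # biased toward group A
# Weight each respondent by 1 / P(respond | group) to undo the selection bias.
weighted = (sum(y / p_respond[g] for g, y in respondents)
            / sum(1 / p_respond[g] for g, _ in respondents))
```

The naive respondent mean overstates satisfaction because the happier group responds more often; the weighted estimate lands close to the true population mean.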


April Nellis

Mathematician, Johns Hopkins Applied Physics Laboratory
“Gaussian Process Regression for Additive Manufacturing Material Exploration”

Speaker Bio: 

 

April Nellis is a mathematician at the Johns Hopkins Applied Physics Lab. Her research interests include machine learning for scientific applications, probabilistic models, and optimization under uncertainty. Before coming to APL, she earned her Ph.D. in applied mathematics at the University of Michigan.

Abstract: 

 

Material exploration via multiple rounds of experiments is typically a time-consuming and expensive process. In addition, it can be difficult to methodically evaluate whether the full processing space for a material has been explored to the desired extent. We have accelerated material discovery by using experimental data to build and iteratively update a Gaussian Process (GP)-based map of the processing space. The benefit of GP regression, which provides both a mean and a variance estimate for the process map, is twofold. First, the variance of the GP provides a quantitative measure of the current overall uncertainty across the predictive map. Additionally, we are able to input both mean and variance of the model into a Bayesian optimization algorithm which efficiently selects new experiments by prioritizing both discovery of desired material properties and reduction of uncertainty across the space. This approach was implemented for Laser Powder Bed Fusion (L-PBF) additive manufacturing of Inconel alloys. Over six rounds of experimental builds, we developed a detailed map of material porosity across the selected material processing space. The resulting map has both low uncertainty and high granularity, and is particularly detailed in areas which produce dense material.
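
The selection step of the loop can be sketched with a lower-confidence-bound rule over the GP's mean and variance; the process settings, predictions, and the specific acquisition rule below are illustrative stand-ins for the actual Bayesian optimization criterion.

```python
# Sketch of the acquisition step: given the GP's predicted porosity mean and
# standard deviation at candidate process settings, pick the next build.
# All numbers and the LCB rule itself are illustrative, not from the study.
candidates = {
    # (laser_power_W, scan_speed_mm_s): (predicted_porosity_mean, predicted_std)
    (200, 800):  (0.020, 0.002),   # well explored, decent porosity
    (250, 900):  (0.015, 0.010),   # promising mean, still uncertain
    (300, 1200): (0.040, 0.030),   # poor mean but largely unexplored
}

def lcb(mu, sigma, kappa=2.0):
    """Lower confidence bound: small values mean 'plausibly very low porosity'."""
    return mu - kappa * sigma

# With kappa = 2.0 the large uncertainty at (300, 1200) dominates, so the
# rule explores that region rather than exploiting the best current mean.
next_build = min(candidates, key=lambda s: lcb(*candidates[s]))
```

Tuning kappa trades off the two goals named in the abstract: discovery of desired material properties versus reduction of uncertainty across the space.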


Braxton VanGundy

Software Engineer, NASA Langley Research Center
“A Sensitivity Analysis Pipeline for Neural Network-Based Raman Spectra Mineral Classifiers”

Speaker Bio: 

 

Braxton VanGundy is a Software Engineer in the Flight Software Systems branch at NASA Langley Research Center. He holds a bachelor’s degree in Computer Engineering with a minor in Computer Science from Old Dominion University (ODU), in addition to a master’s degree in Engineering Management from ODU. He has been at NASA Langley since 2017, working on various projects related to data science, machine learning, and flight software.

Abstract: 

 

Scientists at NASA Langley Research Center are developing an automated detection algorithm for the identification of minerals and other substances in lunar regolith using a machine learning approach [1]. This work is part of a project to mature the Standoff Ultra-Compact Raman (SUCR) sensor for capturing Raman spectra of lunar regolith while deployed to the lunar South Pole region, which is funded through the NASA Development and Advancement of Lunar Instrumentation (DALI) program. Previously, a human user had to manually view each spectrum from the SUCR sensor and identify features that indicate the presence of a specific mineral or substance, such as those that are valuable for in-situ resource utilization (ISRU). Minerals and substances of value include, but are not limited to, ilmenite for oxygen production [2], plagioclase for construction materials [3], and water typically found in the form of ice. While it may take a human a significant amount of time to identify the constituents of a sample based on its Raman spectra, a convolutional neural network (CNN) can accomplish the same identification task in a few seconds and is not subject to mistakes caused by human fatigue after classifying many samples in succession. Such an advancement would greatly increase the likelihood of discovering rich ISRU material deposits on the lunar surface.
In addition to traditional cross validation techniques, such as the 80/20 train/test split, the team needed a way to gauge the model’s performance across all supported mineral classes and assess when specific variations or artifacts in the data may lead to the incorrect classification of lunar regolith. To address this need, a sensitivity analysis pipeline was developed to quickly analyze how different alterations to the input data affect model performance across all supported mineral classes. The pipeline simulates three different factors, each bounded to user-specified ranges: integration time, spectral shift, and a random sinusoidal time-varying scale factor. Integration time represents the exposure time for the sensor and is specified as an integer in units of seconds; shorter integration times result in a noisier spectrum. Spectral shift, also represented as an integer value and typically caused by temperature, pressure, and calibration variations, shifts the entire spectrum to the left or right on the wavenumber axis. A random sinusoidal time-varying scale factor, denoted as a value between 0.0 and 1.0, shifts the relative intensities of the spectral peaks throughout the signal following a sine wave pattern. This relative intensity variation can result from factors such as differences in sample crystal orientation.
With this sensitivity analysis pipeline, it is possible to test thousands of combinations of these factors across a wide range of user-specified values for every mineral class. The pipeline generates a point cloud visualizing the results of the sensitivity analysis allowing the user to quickly see model performance across each of the classes and observe what factors most significantly impact classification accuracy. This poster will present this sensitivity analysis pipeline which our team has used to quickly validate the performance of different models and quantify each model’s operational bounds where misclassifications begin to occur.
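
Two of the three simulated factors, spectral shift and the sinusoidal scale factor, can be sketched directly; the toy spectrum and parameter values below are invented, and integration-time noise is not shown.

```python
import math

# Toy sketch of two of the pipeline's perturbations applied to a Raman
# spectrum; the spectrum, shift, depth, and period below are all invented.
def spectral_shift(spectrum, shift):
    """Shift the whole spectrum along the wavenumber axis, zero-padding the ends."""
    n = len(spectrum)
    return [spectrum[i - shift] if 0 <= i - shift < n else 0.0 for i in range(n)]

def sinusoidal_scale(spectrum, depth, period):
    """Modulate relative peak intensities with a sine-wave scale factor (depth in [0, 1])."""
    return [v * (1 - depth / 2 + depth / 2 * math.sin(2 * math.pi * i / period))
            for i, v in enumerate(spectrum)]

spectrum = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
shifted = spectral_shift(spectrum, 2)      # the main peak moves two channels right
modulated = sinusoidal_scale(spectrum, 0.4, 8)
```

Sweeping parameters like these over user-specified ranges for every mineral class is what produces the point cloud of model performance described above.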
References
[1] Sanders, Bryson R., et al. “A Machine Learning Algorithm for Identifying Lunar Minerals in Mixed Samples With Raman Spectroscopy” Abstract submitted to the 57th Lunar and Planetary Science Conference (2026)
[2] Taylor, Lawrence A. “Resources for a lunar base: Rocks, minerals, and soil of the Moon.” NASA. Johnson Space Center, The Second Conference on Lunar Bases and Space Activities of the 21st Century, Volume 2. 1992.
[3] Azami, Mohammad, et al. “A comprehensive review of lunar-based manufacturing and construction.” Progress in Aerospace Sciences 150 (2024): 101045.


Caleb King

Research Statistician Developer, JMP Statistical Discovery LLC
“Change Often Takes Time: Determining a “Soft Changepoint” in Lifetime Mixture Distribution”

Speaker Bio: 

 

Caleb King is a Senior Research Statistician Developer in the DOE & Reliability group at JMP. He received his PhD in Statistics from Virginia Tech in 2015, after which he spent three years as a statistician at Sandia National Laboratories before transitioning to JMP. His areas of expertise include design of experiments, accelerated testing, and reliability analysis.

Abstract: 

 

It is quite common for data to come from populations that actually consist of two or more subpopulations. This is especially true for reliability, where the lifetime distribution of a system or component may actually consist of multiple distributions depending on specific failure modes. If the failure modes are known, then we can use competing risks methodology to estimate the overall reliability. However, if the modes are unknown, then a typical approach might be to use a mixture distribution, where the likelihood of each observation is a weighted combination of several distribution models. These weights may be constant or a function of other covariates. Another approach is to consider a changepoint where the distribution model makes a sudden change from one model to another. In this talk, we propose an approach that falls between these two extremes. Instead of a hard changepoint, we propose using a probit or logistic model that allows the mixture proportion to vary over the range of the variable, with the point at which the mixture is evenly split serving as a “soft changepoint”. We will show methods for estimating the parameters of this soft changepoint model and investigate its performance using simulated data. We will also illustrate this new approach using data from an industrial application.
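
The soft changepoint idea can be sketched with a logistic mixing weight between two lifetime densities; the component distributions and parameter values below are illustrative, not from the talk's application.

```python
import math

# Sketch of a "soft changepoint": the mixture weight follows a logistic curve
# in t, and the point where the two components are evenly mixed is the
# changepoint. Component densities and parameters are invented for illustration.
def mix_weight(t, c, s):
    """Probability that an observation at t comes from component 2;
    c is the soft changepoint and s controls how gradual the transition is."""
    return 1 / (1 + math.exp(-(t - c) / s))

def mixture_pdf(x, t, c=5.0, s=1.0):
    """Weighted combination of two exponential lifetime densities."""
    w = mix_weight(t, c, s)
    f1 = 1.0 * math.exp(-1.0 * x)   # early failure mode (rate 1.0)
    f2 = 0.2 * math.exp(-0.2 * x)   # wear-out mode (rate 0.2)
    return (1 - w) * f1 + w * f2

# At t = c the mixture is evenly split: this is the "soft changepoint".
print(mix_weight(5.0, 5.0, 1.0))  # 0.5
```

As s shrinks toward zero the logistic weight approaches a step function, recovering the hard-changepoint model as a limiting case.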


Chad Brisbois

Research Staff Member, IDA
“Scoping OT&E from a Spectrum Warfare Perspective”

Speaker Bio: 

 

Dr. Brisbois has provided support to DOT&E for the test and evaluation of spectrum warfare systems since joining IDA in 2021. He also supports simulation efforts related to naval T&E in a high performance computing environment. Prior to his time at IDA, Dr. Brisbois defended his PhD in physics from Michigan Technological University, where his dissertation focused on understanding very high-energy gamma-ray emissions from fast-spinning neutron star environments. As a postdoc at the University of Maryland College Park, Dr. Brisbois specialized in cosmic-ray simulation as well as gamma-ray astrophysics.

Abstract: 

 

Spectrum warfare, driven by electromagnetic spectrum operations (EMSO), is a decisive factor in modern combat. Yet, conclusions drawn from test data can be misleading if the electromagnetic operating environment (EMOE) is inadequately represented or EMSO‑dependent missions are not fully evaluated. This talk presents a practical framework for scoping Operational Test & Evaluation (OT&E) from an EMSO perspective. The framework rests on four pillars: EMOE definition, mission mapping, data‑relevance review, and integrated test design. It combines live, lab, and modeling‑and‑simulation efforts from early development through operational test while respecting resource constraints. The result is a refined T&E strategy that delivers more realistic tests, higher‑quality data, and a more efficient use of resources.


Chinomso Oji

AI Evaluator, ATEC/AEC/OEC
“TIRANICS – FDSC Scoring”

Speaker Bio: 

 

Chinomso Oji is an AI and data science professional with over five years of experience working at the intersection of machine learning, cybersecurity, analytics, and system evaluation with the Army Test and Evaluation Command (ATEC). He specializes in building reliable, scalable AI solutions and developing frameworks that ensure complex models are accurate, explainable, and aligned with real-world needs. His work spans applied artificial intelligence, evaluation of AI-enabled systems, predictive analytics, and responsible AI practices, with a strong emphasis on turning advanced technology into practical, high-impact outcomes. Chinomso holds a master’s degree in Artificial Intelligence from Johns Hopkins University and is passionate about the thoughtful application of AI in critical domains.

Abstract: 

 

The TIRANICS application addresses a persistent reliability-engineering bottleneck: the time-consuming, inconsistent, and difficult-to-audit process of interpreting Test Incident Reports (TIRs) against a program’s Failure Definition and Scoring Criteria (FDSC). Manual scoring is labor-intensive, highly dependent on individual evaluator judgment, and often produces outputs that are difficult to trace back to the authoritative source text, creating downstream risk for metrics integrity and readiness decisions.

The TIRANICS platform was developed to solve this. The first iteration used traditional Machine Learning (ML) and Natural Language Processing (NLP) to deliver two primary features. First, it enabled an evaluator to search for historically similar TIRs, using the results to guide and standardize scoring. Second, it could automatically classify a new TIR by referencing the classifications of these historically similar incidents, providing a consistent baseline for evaluation.

Building on this foundation, the current web application operationalizes a more advanced and auditable Retrieval-Augmented Generation (RAG) workflow. This system enhances scoring accuracy by combining hybrid search (keyword and vector) with structured, rubric-driven reasoning. It provides two primary interfaces: (1) an interactive chat for targeted FDSC inquiry with citation-linked evidence, and (2) a batch TIR processing workspace that iteratively classifies incident narratives according to the FDSC rubric. This workspace presents results in editable markdown and preserves intermediate outputs in a thinking view for quality assurance. By grounding all classifications in retrieved FDSC passages, the application reduces scoring variability, improves throughput, and strengthens defensibility through page-level traceability.
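The abstract does not specify how the keyword and vector rankings are fused; reciprocal rank fusion (RRF) is one common, minimal way to combine two ranked lists, sketched here with hypothetical FDSC-passage IDs:

```python
def reciprocal_rank_fusion(keyword_ranked, vector_ranked, k=60):
    """Fuse two ranked lists of document IDs into one hybrid ranking.

    k=60 is the conventional damping constant from the RRF literature;
    a document scores higher the nearer the top it sits in either list.
    """
    scores = {}
    for ranked in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical FDSC passages ranked by a keyword search and by a vector search
fused = reciprocal_rank_fusion(["p3", "p1", "p7"], ["p3", "p7", "p2"])
```

A passage retrieved near the top of both lists ("p3" here) dominates the fused ranking, which is the behavior a hybrid retriever wants before handing passages to the rubric-driven reasoning step.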


Christian Smart

Cost Analyst, KBR
“Wake Up and Smell the Coffee: Developing Realistic Risk Estimates”

Speaker Bio: 

 

Christian B. Smart, Ph.D., CCEA, is an internationally recognized expert in cost estimating, project risk analysis, and the use of advanced statistical and machine learning techniques for complex aerospace and defense programs. With more than 25 years of experience supporting NASA, the Department of Defense, DARPA, and commercial space organizations, he has built a career at the intersection of mathematics, engineering, and strategic decision making.
He is currently an Expert Cost Analyst with KBR, supporting national security missions. Previously, he developed independent cost estimates for NASA’s Jet Propulsion Laboratory and led development of NASA’s Analogy Software Cost Tool (ASCoT). Dr. Smart also served as Chief Scientist for Galorath Federal, where he applied advanced cost and risk methods for NASA, DARPA, the Space Development Agency, Virgin Galactic, and multiple Army programs, and led development of the SEER for Space Systems cost model.
Earlier, he directed the 125-person cost workforce at the Missile Defense Agency and oversaw all agency cost estimates. He has also held senior analytical roles at NASA centers and on the Review of U.S. Human Spaceflight Plans Committee.
Dr. Smart is the author of Solving for Project Risk Management (McGraw Hill, 2020), has published numerous award-winning papers, and has served in leadership roles within ICEAA. His contributions to federal cost methodology include work on the GAO Cost Estimating and Assessment Guide and ICEAA’s Software Cost Estimating Body of Knowledge. His achievements have earned top honors, including the ICEAA Frank Freiman Lifetime Achievement Award and NASA’s Exceptional Public Service Medal.
Dr. Smart earned a Ph.D. in Applied Mathematics from the University of Alabama in Huntsville and completed graduate coursework in machine learning at Georgia Tech. He also holds a B.S. in Economics and Mathematics from Jacksonville State University.

Abstract: 

 

Despite decades of use, risk analyses of both cost and schedule exhibit an overwhelming tendency to significantly underestimate the true extent of uncertainty. For example, cost and schedule risk analyses for multi-billion-dollar projects that take years to develop often have cost risk ranges that allow for only 10% variation; for schedule, it is often even less. In one NASA launch vehicle development project, the difference between the 95th and 5th percentiles for schedule risk was a window of only two weeks, even though the overall schedule spanned more than five years.
In this presentation, we discuss the problem and provide three solutions. One is a top-level, one-size-fits-all calibration method. Another uses WBS-level historical cost and schedule growth to adjust input ranges in order to calculate credible risk ranges. A third, currently at the conceptual stage, uses parametric calibration that incorporates specific project characteristics, such as new design and technology readiness levels.
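A minimal Monte Carlo sketch of the second idea, with entirely hypothetical WBS ranges and growth factor (the talk's actual calibration is more involved): widening each WBS element's input range using historical growth evidence broadens the resulting total-cost risk range.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical WBS elements: (low, mode, high) cost ranges in $M from SMEs
wbs = [(8, 10, 13), (18, 22, 30), (45, 55, 80)]
growth = 1.3   # hypothetical historical cost-growth factor for this WBS class

def total_cost(widen=1.0, n=100_000):
    """Monte Carlo total cost; widen > 1 stretches each triangular range
    about its mode using the historical growth evidence."""
    total = np.zeros(n)
    for low, mode, high in wbs:
        lo = mode - (mode - low) * widen
        hi = mode + (high - mode) * widen
        total += rng.triangular(lo, mode, hi, size=n)
    return total

naive, calibrated = total_cost(), total_cost(widen=growth)
naive_spread = np.subtract(*np.percentile(naive, [95, 5]))
calibrated_spread = np.subtract(*np.percentile(calibrated, [95, 5]))
```

Comparing the P5–P95 spreads of the two runs makes the underestimation concrete: the uncalibrated ranges produce the implausibly narrow risk bands the abstract criticizes.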


Christian Beels

Cadet, United States Military Academy
“Speech Recognition as a Threat Vector in Real-Time Doxxing”

Speaker Bio: 

 

Christian Dane Beels is a second-year cadet at the United States Military Academy at West Point pursuing a B.S. in Mathematical Sciences. His research focuses on the intersection of applied mathematics, data science, and cybersecurity. His current research project applies topological data analysis to network simulations to detect anomalous and malicious behavior. Outside of academics, he competes on the West Point Cyber Team and has placed in the top 1% individually in the National Cyber League.

Abstract: 

 

This research evaluates the feasibility of speech recognition as a potential threat vector in real-time doxxing operations, using consumer wearable devices such as the Ray-Ban Meta Glasses. The study was broken into two phases: Phase 1 validated a SpeechBrain-based Python model using audio data from the VoxCeleb2 database, and Phase 2 used live audio captured with the Meta Glasses, compared against a locally created dataset, to examine confidence in identifying a particular individual through speech. Phase 1 produced a 93% hit rate and a 100% correct-rejection rate at a cosine similarity threshold of 0.4, with a median processing latency of 0.67 seconds. In Phase 2, the system averaged a 96% correct selection rate between targets in the dataset; however, overall confidence in correct identification remained low (47% and 30%, respectively). These results indicate that speech-based real-time doxxing may be a feasible threat in certain scenarios, particularly with minimal background noise and close proximity between threat actor and target.
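The decision rule behind the 0.4 threshold reduces to comparing speaker-embedding vectors by cosine similarity. A toy sketch (embedding extraction by the speaker model is omitted; names and vectors are hypothetical):

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe, enrolled, threshold=0.4):
    """Return (speaker_id or None, best score): accept the closest enrolled
    speaker only if cosine similarity clears the Phase 1 threshold."""
    scores = {sid: cosine_similarity(probe, emb) for sid, emb in enrolled.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores[best]

# Toy 2-D "embeddings"; real embeddings come from a speaker-verification model
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
```

Thresholding is what drives the hit/correct-rejection tradeoff reported in Phase 1: raising the threshold rejects more impostors but also misses more genuine matches.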


Christine Knott

Research Mathematician, AFRL
“Probability of Detection: Evaluating the Reliability of Nondestructive Inspection Systems”

Speaker Bio: 

 

Christine E. Knott is a Research Mathematician in the Material State Awareness branch of the U.S. Air Force Research Laboratory’s Materials and Manufacturing Directorate (AFRL/RX). She uses statistical models to validate nondestructive inspection systems, which are used to find defects in aircraft structural and engine components, and her research focuses on Modern Methods for Probability of Detection. She earned her Ph.D. and M.S. in Applied Mathematics at the Air Force Institute of Technology. Dr. Knott started her federal government career in 2010 at the FAA, performing test and evaluation for the ADS-B system, followed by six years as a data analyst at NASIC before joining AFRL in 2016. She is a member of the American Society for Nondestructive Testing (ASNT), the American Society for Quality (ASQ), and the American Statistical Association (ASA).

Abstract: 

 

The reliability of nondestructive inspection (NDI) systems is estimated using statistical methods, the most thorough of which is the Probability of Detection (POD) methodology. The Department of the Air Force uses periodic nondestructive re-inspection of critical structural components to maintain aircraft safety, and POD helps establish the length of inspection intervals. The established methods for POD will be provided, followed by a discussion of contemporary statistical research which extends and improves upon these methods.
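A minimal sketch of the established hit/miss POD approach (in the spirit of MIL-HDBK-1823A, with fabricated illustrative data): fit a logistic model in log flaw size and invert it for a90, the flaw size detected with 90% probability, which feeds inspection-interval decisions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Hypothetical hit/miss inspection data: flaw sizes (mm) and detection outcomes
size = np.array([0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 3.0, 4.0])
hit  = np.array([0,   0,   0,   1,   0,   1,   1,   1,   1,   1])

def neg_log_lik(beta):
    # Logistic POD curve in log flaw size: POD(a) = expit(b0 + b1*log(a))
    p = np.clip(expit(beta[0] + beta[1] * np.log(size)), 1e-12, 1 - 1e-12)
    return -np.sum(hit * np.log(p) + (1 - hit) * np.log(1 - p))

res = minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")
b0, b1 = res.x
# a90: flaw size at which the fitted POD reaches 0.90
a90 = np.exp((np.log(0.9 / 0.1) - b0) / b1)
```

In practice the handbook methodology also attaches a confidence bound (a90/95); that step, and the signal-response variant of POD, are omitted here for brevity.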


David Han

Romo Endowed Professor, UT San Antonio
“Bayesian Generalized Linear Model for MTBF of Subsystems in Military Aircraft”

Speaker Bio: 

 

David Han M.S., Ph.D. is a Romo Endowed Professor at the University of Texas at San Antonio. He teaches statistics and data science. His research interests include statistical modeling and inference, machine learning, and artificial intelligence applied to lifetime analysis and reliability engineering.

Abstract: 

 

Ensuring a high level of reliability is essential for mission-focused operations, as it directly impacts success. Poor reliability can lead to significant consequences, including high repair and replacement costs, missed opportunities, service disruptions, and, in extreme cases, safety violations and loss of life. This research focuses on modeling the reliability of repairable subsystems within the framework of competing and complementary risks. We then extend this analysis to determine the overall reliability of a repairable system using group-censored maintenance data from aircraft systems. Assuming subsystem lifetimes follow non-identical exponential distributions, we justify modeling system reliability through Poisson processes. Given the complexity of parameter estimation, we adopt a Bayesian generalized linear model (GLM), incorporating manufacturer-supplied estimates for defining prior distributions. This methodology mitigates model uncertainty and the limitations of frequentist methods and allows for updates as new data become available.
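The full model in the talk is a Bayesian GLM; purely to illustrate the prior-updating idea, a conjugate Gamma–Poisson sketch shows how a manufacturer-supplied MTBF estimate can seed the prior and be revised by observed failure counts (all numbers hypothetical):

```python
# Manufacturer estimate seeds a Gamma prior on the failure rate lambda
prior_mtbf = 500.0        # hypothetical manufacturer MTBF estimate (hours)
prior_weight = 2.0        # prior carries ~2 pseudo-failures of evidence
alpha0, beta0 = prior_weight, prior_weight * prior_mtbf

# Group-censored maintenance data: failures observed over total fleet hours
failures, hours = 7, 2800.0

# Conjugate update: Gamma(alpha0, beta0) -> Gamma(alpha0 + r, beta0 + T)
alpha_post, beta_post = alpha0 + failures, beta0 + hours
mtbf_post = beta_post / alpha_post   # MTBF implied by posterior mean rate
```

Here the field data pull the MTBF estimate down from the manufacturer's 500 hours toward the observed 2800/7 = 400 hours, with the prior weight controlling how fast the manufacturer estimate is discounted; the GLM in the talk generalizes this by letting covariates enter the rate.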


David Steinberg

Professor, Tel Aviv University
“Entropy-Driven Design of Sensitivity Experiments”

Speaker Bio: 

 

David M. Steinberg is Professor Emeritus of Statistics at Tel Aviv University. He has published more than 160 papers in refereed journals along with several book chapters and conference proceedings. His field of research specialization is the statistical design of experiments, including factorial experiments, Latin hypercubes, computer experiments, robust parameter design experiments and seismic networks. He has worked on numerous applications in a variety of fields.

In 2013 he received the George Box Medal from the European Network for Business and Industrial Statistics (ENBIS). In 2017 he was awarded the Walter Shewhart Medal from the American Society for Quality. Both awards recognize lifetime contributions to research in quality and in industrial statistics.

From 2008-2010, he served as Editor of the leading journal Technometrics (and as Editor-Elect during 2007).  He is currently on the editorial boards of the Technometrics, Journal of Quality Technology and the International Statistical Review.

Abstract: 

 

Sensitivity experiments assess the relationship of a stress variable to a binary response, for example how drop height affects the probability of product damage. Sequential designs are used, with the data obtained thus far dictating the next stress level. For example, the popular Bruceton (up/down) experiment increases the stimulus if no response is observed and decreases it following a positive response. In this work we present a new entropy-driven design algorithm that chooses stress levels to maximize the mutual information between the next observation and the goals of the inference. The approach is parametric, so the goals could be the parameters themselves or functions of them, such as a quantile or a set of quantiles. We use a Bayesian analysis that quantifies the entropy of the goals from the current posterior distribution. The method can also be used when there are two or more stress variables (e.g., drop height and the hardness of the landing surface). We illustrate the use of the algorithm by simulation and on real experimental data. I will also present our interactive Shiny web app, which implements the algorithm and makes it a practical tool for planning, monitoring, and analyzing sensitivity tests.

This work is joint with Rotem Rozenblum and Amit Teller.
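A sketch of the entropy-driven selection step under stated assumptions (a logistic response curve, a discretized grid posterior over its parameters, and hypothetical grids; this is not the authors' implementation): the next stress level maximizes the mutual information I(Y; θ) = H(E[p]) − E[H(p)] under the current posterior.

```python
import numpy as np
from scipy.special import expit

# Discretized posterior over (mu, sigma) of a logistic response curve
mus = np.linspace(0, 10, 41)
sigmas = np.linspace(0.5, 4.0, 15)
MU, SIG = np.meshgrid(mus, sigmas, indexing="ij")
post = np.full(MU.shape, 1.0 / MU.size)          # flat prior

def bernoulli_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def next_level(post, candidates):
    """Pick the stress level maximizing mutual information between the
    next binary observation and the parameters."""
    best_x, best_mi = None, -np.inf
    for x in candidates:
        p = expit((x - MU) / SIG)                # P(response | theta, x)
        marg = np.sum(post * p)                  # predictive P(response | x)
        mi = bernoulli_entropy(marg) - np.sum(post * bernoulli_entropy(p))
        if mi > best_mi:
            best_x, best_mi = x, mi
    return best_x, best_mi

def update(post, x, y):
    p = expit((x - MU) / SIG)
    post = post * (p if y == 1 else 1 - p)       # Bayes update on outcome y
    return post / post.sum()

x, mi = next_level(post, np.linspace(0, 10, 21))
post = update(post, x, 1)                        # suppose a response at level x
```

Targeting a specific quantile rather than the full parameter vector would replace the parameter entropy with the entropy of that derived quantity, as described in the abstract.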


Delante Moore

Research Staff Member, IDA (OED)
“Linear Mixed Effects Models in Continuous Operational Evaluation”

Speaker Bio: 

 

Dr. Moore joined OED in January 2025 and works on the Fire Support and Space/Strategic Systems Tasks, splitting time between the Land and Space Portfolios. His projects include GMLRS and the Proliferated Warfighter Space Architecture. Prior to working at IDA, Dr. Moore served 23 years in the Army, including as an Assistant Professor in both the Systems Engineering and Math Departments at West Point. In addition to teaching, his career has included operations research analysis and planning at the operational level and in the intelligence community.

Abstract: 

 

This talk confronts a core challenge facing defense acquisition: how to deliver operationally effective, suitable, and survivable capabilities at the pace of rapidly evolving threats and emerging technologies. The traditional serial sequence of contractor tests, developmental tests, and a single operational test produces sparse data and often uncovers issues late in the program. Drawing on the recently released Warfighter Acquisition Strategy, this talk proposes a longitudinal approach to operational evaluation that continuously collects and analyzes data across the contractor, developmental, and operational phases.

We demonstrate how linear mixed-effects models capture the longitudinal nature of testing by treating the test increment as a random effect, alongside random intercepts for platforms, crews, and missions, while modeling design factors such as threat posture as fixed effects. This specification borrows strength across test phases, allowing the model to quantify variability among increments without assuming identical time effects and to estimate overall performance trajectories. Partial pooling accommodates sparse or missing observations, making it possible to detect degradation or improvement over time. A radar-detection example illustrates how this approach supports data-driven decisions across CT, DT, and OT and shifts operational data collection from a single end-of-test event to a continuous practice integrated throughout the entire development process.
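A hedged sketch of such a specification (simulated data, hypothetical variable names, a continuous performance score for simplicity rather than a binary detection outcome, and `statsmodels` as an assumed tool rather than the authors' implementation):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate longitudinal test data across 5 increments and 4 crews
rng = np.random.default_rng(1)
inc_eff = rng.normal(0, 0.5, 5)        # increment-to-increment variability
crew_eff = rng.normal(0, 0.3, 4)       # crew-to-crew variability
rows = []
for i in range(5):
    for c in range(4):
        for _ in range(12):
            threat = int(rng.integers(0, 2))   # fixed-effect design factor
            score = 10 + 0.8 * threat + inc_eff[i] + crew_eff[c] + rng.normal(0, 1)
            rows.append({"increment": i, "crew": c, "threat": threat, "score": score})
df = pd.DataFrame(rows)

# Random intercept for increment (grouping) plus a crew variance component;
# threat posture enters as a fixed effect
model = smf.mixedlm("score ~ threat", df, groups="increment",
                    vc_formula={"crew": "0 + C(crew)"})
fit = model.fit()
```

The fixed-effect estimate for `threat` recovers the simulated 0.8 effect while the variance components absorb increment- and crew-level scatter, which is exactly the "borrowing strength" behavior described above.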


Mr. DJ Akers

Research Engineer, GTRI
“Panel: Digitally Transforming the Test and Evaluation Landscape”

Speaker Bio: 

 

DJ is a passionate Research Engineer I in the Digital Engineering Threads and Environments Branch at GTRI (Georgia Tech Research Institute). He earned his Bachelor of Science in Aerospace Engineering from Georgia Tech in December 2021 and joined GTRI as a full-time Model-Based Systems Engineer in January 2022. Recently, he has also earned a Master’s degree in Systems Engineering from Georgia Tech, further broadening his expertise in the field. Currently, DJ is a Growing Contributor supporting a diverse range of projects, where he specializes in applying the Systems Modeling Language (SysML) using the Cameo Systems Modeler (CSM) authoring tool. Additionally, he develops innovative tools and integrations leveraging the OpenAPI specification, pushing the boundaries of digital engineering capabilities. DJ’s primary research interests include harnessing digital engineering threads for advanced decision analysis, applying digital engineering practices to modernize legacy programs, and exploring the integration of human-centered design methodologies with formal systems engineering practices. His work reflects a strong focus on bridging cutting-edge technical tools with impactful solutions for complex engineering challenges. In his role, DJ remains committed to driving innovation and collaboration in the digital engineering space to support the next generation of systems engineering methodologies.

Abstract: 

 

Since the publication of the United States Department of Defense Digital Engineering (DE) Strategy in 2018, the Department, the services, and the supporting industrial base have been working to integrate digital engineering tooling that captures the designs of the complex weapon systems that the uniformed and civilian workforce depend upon to serve the nation. Digital engineering methods have advanced rapidly for system design, hardware and software design, modeling and simulation, and ultimately test and evaluation, to ensure that systems are delivered rapidly with the most advanced capability. To no great surprise, parallel development has led to a diverse set of products, data structures, and methods that do not necessarily integrate. To maximize efficiency, a team from Developmental Test, Evaluation, and Accreditation; the Director, Operational Test and Evaluation; and university researchers from the Acquisition Innovation Research Center joined forces to mature tools and methods that work seamlessly together in support of advancing test and evaluation and realizing the DE Strategy. A panel of experts and practitioners from these organizations will discuss their work, the results, the benefits, and some of the struggles they have faced in pursuit of the Department’s objectives.


Douglas Schmidt

Dean of the School of Computing, Data Sciences & Physics, William & Mary
“Navigating Our AI-Enabled Future in High-Stakes Domains”

Speaker Bio: 

 

Dr. Douglas C. Schmidt is the inaugural Dean of William & Mary’s School of Computing, Data Sciences & Physics. He previously served as the President-appointed, Senate-confirmed Director of Operational Test & Evaluation, advising the Secretary of Defense on testing Department of Defense systems. Earlier roles include Cornelius Vanderbilt Professor of Engineering, Associate Provost for Research, and Co-Director of the Data Science Institute at Vanderbilt University. He has also held research positions at Carnegie Mellon University’s Software Engineering Institute and served with DARPA and the Air Force Scientific Advisory Board. His research spans software patterns, optimization, middleware for cyber-physical systems, and prompt engineering for generative AI. He earned sociology degrees from William & Mary and computer science degrees from UC Irvine.

Abstract: 

 

In this talk I explore the rapid rise of generative AI and its transformative impact on software engineering, national security, healthcare, and other critical sectors. Drawing from my experience as the Director of Operational Test & Evaluation, I emphasize that AI should be viewed as “augmented intelligence,” enhancing human capability—not replacing it. I introduce a taxonomy mapping AI’s role in both software development and system operations and highlight the importance of prompt engineering—both in the small (individual tasks) and in the large (entire lifecycle processes). I also urge a shift-left mindset in testing and a focus on AI tools for analyzing non-code artifacts, such as requirements and regulatory documents. I warn against overhyping AI’s capabilities or underestimating its risks, especially in high-stakes safety- and mission-critical domains. Ultimately, I advocate for embracing AI literacy as a professional imperative, likening today’s shift to the biggest technological transformation in 2,400 years—and calling on engineers to be co-architects of a trustworthy, AI-augmented future.


Elizabeth Gregory

Research Engineer, NASA Langley
“Data Curation for AI Training, Testing, and Validation”

Speaker Bio: 

 

Dr. Elizabeth Gregory began her career at ATK Aerospace Structures (now Northrop Grumman Utah) as a design and analysis engineer working on the Ares I composite structures design. She went on to focus on NDE while earning her PhD in Aerospace Engineering at Iowa State University, home to the Center for Nondestructive Evaluation. During her nine years at NASA Langley Research Center, she has developed an automated detection framework, a parallelized simulation tool that utilizes high-performance computing resources, and model-assisted probability of detection tools.

Abstract: 

 

AI and machine learning tools are quickly becoming commonplace in almost all areas of science and engineering. The performance of AI/ML tools is often advertised as remarkable, but as many are learning, the failures of such tools can be significant and surprising. It is possible to build a tool that performs better than humans, but when it does fail, that failure can be catastrophic.
This presentation will focus on best practices for curating training data to ensure that AI performance is consistent across a feature space. It will discuss the best uses and limitations of augmented, simulated, and AI-generated data when observed data are sparse. This talk will also present a case for clear disclosure of a validated performance feature space as opposed to open-ended performance claims.
We are still in the early days of AI/ML adoption, and regulation has yet to be written. With many AI tools making their way into fields where the real-world consequences of failure are dire, now is the time to begin developing guidelines and best practices. This presentation is meant to contribute to that discussion.
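One concrete practice implied here, sketched with hypothetical names and toy numbers: report performance per region of the feature space rather than as a single aggregate, so regions with thin training coverage surface as weak bins instead of hiding inside a strong overall score.

```python
import numpy as np

def per_bin_accuracy(feature, y_true, y_pred, bin_edges):
    """Accuracy within each feature-space bin: a strong aggregate score can
    hide bins where the model was never adequately trained or validated."""
    idx = np.digitize(feature, bin_edges)
    return {int(b): float(np.mean(y_true[idx == b] == y_pred[idx == b]))
            for b in np.unique(idx)}

# Toy demo: the model is 50% accurate overall but fails an entire region
feature = np.array([0.1, 0.2, 0.9, 0.95])
y_true  = np.array([1, 1, 1, 1])
y_pred  = np.array([1, 1, 0, 0])
by_bin = per_bin_accuracy(feature, y_true, y_pred, bin_edges=[0.5])
```

The per-bin breakdown is also a natural artifact for the "validated performance feature space" disclosure the talk advocates: the bins with evidence define the envelope inside which performance claims are supportable.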


Emma Meno

Research Associate, Virginia Tech
“Emerging Best Practices for Testing & Evaluation of Reinforcement Learning”

Speaker Bio: 

 

Emma Meno is a Research Associate with the Intelligent Systems Division at Virginia Tech National Security Institute (VTNSI). Emma’s technical expertise centers at the intersection of artificial intelligence/machine learning (AI/ML), cybersecurity and test & evaluation (T&E). At VTNSI, she serves as a primary technical contributor (including as PI and co-PI) for various projects. She also holds supervisory and leadership roles for pre-college and undergraduate experiential learning and outreach activities with cybersecurity applications. Her graduate thesis work focused on neural cryptanalysis, utilizing neural networks to predict encrypted bits as a measure of cipher strength. Emma holds her M.S. in Computer Science & Applications and B.S. in Computer Science from Virginia Tech.

Abstract: 

 

Reinforcement learning (RL) is an artificial intelligence (AI) technique based on “learning by interaction,” often presented as a candidate method for automated sequential decision making in dynamic and evolving environments. RL introduces unique challenges and constructs for test, including a state-action space, optimal policy search, underlying stochasticity, and continuous learning. While research and guidance are more thoroughly established for the testing & evaluation (T&E) of supervised learning methods (e.g., deep learning and computer vision), the T&E of RL remains more novel and less defined. This talk aligns RL verification, validation, and uncertainty quantification (VVUQ) with RL T&E to present emerging best practices. This talk will also present a survey of existing work in T&E of RL, much of which focuses on safety, robustness, and generalizability. Lastly, the talk will present considerations for RL test by walking through a network security use case. Overall, this presentation serves as a foundational step toward rigorously framing necessary methods and metrics for RL T&E.


Erin Lanus

Research Associate Professor, Virginia Tech National Security Institute
“Testing LLM Prompt Robustness with Sequence Covering Arrays”

Speaker Bio: 

 

Erin Lanus is a Research Associate Professor at the National Security Institute and Affiliate Faculty Computer Science at Virginia Tech. Her research interests include the adaptation of combinatorial testing to the input space of artificial intelligence and machine learning (AI/ML), metrics and algorithms for designing test sets with coverage-related properties, and data security concerns in AI/ML systems. Dr. Lanus received a Ph.D. in computer science and a B.A. in psychology, both from Arizona State University.

Abstract: 

 

Generative AI Large Language Models (LLMs) have evolved to demonstrate remarkable capability in generating human-like text. General, all-purpose models are increasingly being deployed in workflows consisting of a variety of tasks across problem domains. Given their versatile capabilities, test and evaluation (T&E) of LLMs assesses a model across a variety of tasks, such as sentiment analysis, text completion, and question answering. Multiple choice questions (MCQ) are a widely used evaluation task that can be performed across subject areas to assess an LLM’s understanding and reasoning capabilities within and across problem domains. Studies have demonstrated that LLM correctness on MCQ tasks varies not only with the problem domain but also with the ordering of the provided options. This variation suggests a lack of prompt robustness, implying that tests should include multiple orderings of each prompt. Given the need for rapid T&E and limited test budgets, there is a tradeoff between assessing performance across a variety of problem domains and estimating variation due to a lack of prompt robustness within a domain. While exhaustive testing of all prompt orderings is infeasible, random selection may not provide sufficient coverage. In this talk, we describe an approach based on combinatorial testing that leverages sequence covering arrays to design the test across prompt orderings. We present results from an experimental evaluation of GPT-3.5 Turbo, a pre-trained LLM, on the widely used Measuring Massive Multitask Language Understanding (MMLU) MCQ dataset. The results demonstrate that this approach closely approximates the prompt robustness measured by exhaustive testing while using only a quarter of the tests.
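A greedy sketch of the sequence-covering-array idea for four answer options (the construction here is a generic greedy illustration, not the authors' algorithm): pick orderings until every ordered triple of distinct options appears as a subsequence of some tested ordering, which needs far fewer than all 24 permutations.

```python
from itertools import combinations, permutations

def subsequences3(perm):
    # ordered triples appearing as subsequences (combinations keeps order)
    return set(combinations(perm, 3))

def greedy_sca(symbols):
    """Greedy 3-way sequence covering array: choose orderings until every
    ordered triple of distinct answer options appears in some ordering."""
    needed = set(permutations(symbols, 3))
    chosen = []
    while needed:
        best = max(permutations(symbols),
                   key=lambda p: len(subsequences3(p) & needed))
        chosen.append(best)
        needed -= subsequences3(best)
    return chosen

orders = greedy_sca("ABCD")   # orderings of four answer options A-D
```

Each returned ordering becomes one prompt variant per question, giving 3-way coverage of option orderings at a fraction of the exhaustive cost, in the spirit of the quarter-of-the-tests result reported above.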


Francesca McFadden

Graduate Student, University of Maryland Baltimore County
“Competence Estimation Impact in Multi Agent Systems”

Speaker Bio: 

 

Francesca R. McFadden has worked on modeling, simulation, and analysis to evaluate system architectures at the Johns Hopkins University Applied Physics Laboratory since 2010. She earned a bachelor’s degree in computational mathematics with an additional major in statistics from Carnegie Mellon University and a master’s degree in Applied Mathematics from North Carolina State University. She is a doctoral candidate at the University of Maryland Baltimore County studying under the advisement of Dr. Matthias K. Gobbert.

Abstract: 

 

Ensemble learning employs voting schemes to combine, fuse, or select among the predictions of multiple base models. The base models may embody diverse assumptions depending on the application, e.g., atmospheric or kinematic conditions. Competence scores, and similarly motivated trust scores, aim to estimate a base model’s suitability for prediction on a given input. An approach for enhancing voting schemes by considering model competence in prediction was presented at DATAWorks 2025 and published in the September 2025 issue of the ITEA Journal. Applied to an ensemble of classification models, the approach demonstrated that integrating competence score estimation into ensemble learning leads to better performance than the highest-confidence selection strategy. The strategy has since been extended to ensemble regression, with a demonstration of similar impact in both the model deployment and the model test and evaluation phases. This talk explores the impact of the two competence-enhanced ensemble learning approaches, on classification and regression, in multi-agent systems. The integration of the concept and a demonstration of its impact on a system will be shown.
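A minimal sketch of the contrast described above (model probabilities and competence values are hypothetical): fusing class probabilities with per-input competence weights versus simply trusting the single most confident model.

```python
import numpy as np

def competence_weighted_predict(probs, competence):
    """probs: (n_models, n_classes) class probabilities for one input;
    competence: (n_models,) estimated per-input suitability of each model.
    Vote with competence weights rather than raw confidence."""
    w = np.asarray(competence, float)
    fused = (w / w.sum()) @ np.asarray(probs, float)
    return int(np.argmax(fused))

def highest_confidence_predict(probs):
    """Baseline: take whichever model is most confident, ignoring competence."""
    probs = np.asarray(probs, float)
    return int(np.argmax(probs[np.argmax(probs.max(axis=1))]))

# A confidently wrong model (low competence) sways the baseline, not the fusion
probs = [[0.95, 0.05], [0.40, 0.60], [0.30, 0.70]]
competence = [0.1, 1.0, 1.0]
```

The same weighting idea carries over to the regression extension by replacing the class-probability fusion with a competence-weighted average of the base models' predictions.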


Frank Liu

Research Staff Member, IDA
“”

Speaker Bio: 

 

Frank Liu is a research staff member at the Institute for Defense Analyses (IDA). His work sits at the intersection of artificial intelligence, education, and national security. He received his Ph.D. in Computer Engineering from Arizona State University in 2024, where he was the Spring 2024 valedictorian, and his Bachelor of Science in Electrical Engineering from the University of Washington. Frank combines his technical expertise in artificial intelligence, signal processing, human-computer interaction, and education to teach about frontier artificial intelligence capabilities and how to rigorously evaluate frontier artificial intelligence systems.

Abstract: 

 



Gillen Brown

Research Staff Member, IDA
“Using Agile Software Demonstrations for Iterative OT&E”

Speaker Bio: 

 

Gillen Brown is a Research Staff Member at the Institute for Defense Analyses. His work at IDA has focused on the operational test and evaluation of enterprise-scale software and satellite communications systems. Gillen earned a B.S. in Physics from the University of Missouri–Kansas City and a Ph.D. in Astronomy and Astrophysics from the University of Michigan.

Abstract: 

 

Modern software development practices allow programs to rapidly deliver software. Operational test and evaluation (OT&E) practices need to adapt to these new approaches. In this presentation, I will describe how operational testers can evaluate software systems using data from a combination of iterative software demonstrations and operational test events. Collecting test data during demonstrations allows operational testers to deliver early evaluations and shrink the scope of later test events.


Grayson Peyovich

United States Military Academy
“Optimizing Army Parts Spend for Operational Readiness: A Vantage Decision Dashboard”

Speaker Bio: 

 

Born and raised in Bellevue, WA, Grayson Peyovich is an Applied Statistics & Data Science Major at West Point. Grayson is passionate about advancing cybersecurity in both the military and civilian sectors by leveraging data analytics and machine learning to develop innovative solutions for enhanced security and protection. He developed these interests through applied research and active membership in the Cadet Competitive Cyber Club (C3T). Beyond the Academy, he has interned in the Army Research Facilitation Laboratory and at Anduril Industries. In addition, he has completed the U.S. Army’s Air Assault School. He was awarded the Superintendent’s Award for Achievement and has made the Dean’s List every semester. Grayson currently serves as a Squad Leader in Second Regiment, India Company. Grayson hopes to continue his research in graduate school following his time at West Point, and branch Cyber.

Abstract: 

 

The U.S. Army routinely makes thousands of parts-purchasing decisions under tight budget constraints, but the readiness consequences of those purchases are often unclear at the point of decision – particularly when lead times, demand signals, and cross-equipment tradeoffs are not visible in a single workflow. This project develops and evaluates a data-driven recommendation method, implemented in Army Vantage (Palantir Foundry), that translates supply and maintenance data into actionable, explainable spending priorities delivered through an interactive decision-support dashboard. The approach integrates equipment status and mission capability, unit ownership, material demand/priority indicators, historical order-to-delivery lead times, and cost information to recommend purchases under user-defined constraints (unit scope, budget range, and maximum lead time), with optional prioritization of pacing items and customizable equipment weights to reflect commander intent. The dashboard provides two complementary outputs: an individualized view that ranks candidate purchases by expected readiness impact and a portfolio view that assembles feasible bundles of purchases to maximize readiness within the same constraints. Each recommendation includes a brief “why” explanation – linking the material to affected equipment and summarizing impact, cost, and timing tradeoffs – to support defensible decisions and rapid scenario testing. The intended contribution is a live, operationally relevant tool that improves the transparency and consistency of readiness-driven spending decisions while establishing a foundation for iterative user feedback and future performance assessment.
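The portfolio view described above amounts to a constrained selection problem. As a hedged illustration only (the part names, costs, readiness gains, and scoring below are invented stand-ins, not Army Vantage's actual data model or recommendation logic), a minimal 0/1-knapsack sketch in Python:

```python
# Each candidate purchase: (name, cost in dollars, expected readiness gain,
# lead time in days). All values are illustrative assumptions.
candidates = [
    ("track pads",      12_000, 0.40, 30),
    ("engine assembly", 95_000, 0.90, 120),
    ("radio antenna",    3_500, 0.15, 14),
    ("hydraulic pump",  28_000, 0.55, 45),
]

def bundle(candidates, budget, max_lead_time):
    """Pick the feasible set of purchases maximizing total readiness gain
    within budget, via 0/1 knapsack dynamic programming. Costs are
    discretized to $500 steps to keep the DP table small."""
    step = 500
    eligible = [c for c in candidates if c[3] <= max_lead_time]
    slots = budget // step
    best = [(0.0, ())] * (slots + 1)      # best[b] = (gain, chosen names)
    for name, cost, gain, _ in eligible:
        w = -(-cost // step)               # cost in steps, rounded up
        for b in range(slots, w - 1, -1):  # iterate backward: each item once
            prev_gain, prev_set = best[b - w]
            if prev_gain + gain > best[b][0]:
                best[b] = (prev_gain + gain, prev_set + (name,))
    return max(best)

# User-defined constraints mirror the dashboard's unit scope/budget/lead-time filters.
gain, names = bundle(candidates, budget=45_000, max_lead_time=60)
```

A real implementation would draw costs and readiness impacts from live supply and maintenance data and add the pacing-item and commander-weighting logic described in the abstract.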


Gregory Rubinstein

Research Staff Member, IDA
“Network Sensor Evaluation in a Cyber Range Environment”

Speaker Bio: 

 

Dr. Gregory Rubinstein is a Research Staff Member at IDA, where he primarily provides analytical support to the DOT&E Cyber Assessment Program. As part of this support to DOT&E, Dr. Rubinstein provides technical expertise to cyber assessments spanning from large-scale training exercises to tool testing in cyber range environments. His research areas include the analysis of cyber defenders and network sensors. Prior to coming to IDA, Dr. Rubinstein received his Ph.D. in Chemical Engineering from Princeton University and worked at an investment firm, where he modeled financial markets.

Abstract: 

 

This talk details the methodology employed to evaluate the performance of network sensors in a cyber range environment, and provides potential methodological improvements to facilitate more comprehensive data collection at future sensor assessments. Network sensors, which are a combination of hardware devices and software components, are operated by trained cyber analysts to detect, characterize, and respond to malicious cyber activity and network misconfigurations across a variety of networks and systems. Understanding the strengths and gaps of network sensors helps inform stakeholders and network owners on where to deploy these sensors, which can form a critical layer of the overall cybersecurity strategy. IDA supported a DOT&E-led sensor evaluation in October 2025, which assessed the performance of various sensors in detecting cyber adversarial behavior. These network-based sensors ingested packet capture (PCAP) data from network taps distributed throughout a cyber range environment, while a cyber Red Team aggressed the network. For this sensor evaluation, the employed methodology focused on ensuring that each sensor had access to the same PCAP data from the network taps. This tapping methodology provided each sensor with the PCAP data in a compatible format that allowed the sensor to utilize all of its capabilities. The results of this evaluation provide a framework for future studies of network sensors and their operators.


Hans Miller

MITRE

“Then What? The Need for Iterative Assessments to Achieve Successful Operational Capabilities”

Speaker Bio: 

 

Hans Miller, Col USAF (ret), is a Chief Engineer for the Research and Advanced Capabilities Department and a Senior Principal T&E SME at the MITRE Corporation. He has over 28 years of experience in combat operations, experimental flight test, international partnering, command and control, and transition strategies of defense weapon systems. Prior to his position at MITRE, Mr. Miller was the Division Chief of Policy, Programs and Resources at the USAF Headquarters for Test and Evaluation. Mr. Miller was the Commander of the 96th (now 406th) Test Group at Holloman AFB and the Commander of the Global Power Bomber Combined Test Force at Edwards AFB supporting B-1, B-2, and B-52 testing. Mr. Miller has experience working with international partners through a NATO assignment and as the program manager of the DoD Foreign Comparative Test Program. He has served as an operational and experimental flight test pilot in the B-1B and as an F-16 chase pilot. He flew combat missions in the B-1B in Operation Allied Force and Operation Enduring Freedom. Mr. Miller graduated from the United States Air Force Academy with a B.S. in Aeronautical Engineering and earned a master's degree in Aeronautics and Astronautics from Stanford University. He is a graduate of the USAF Air War College, USAF Test Pilot School, and USAF Weapons School.

Abstract: 

 



Dr. Harrison Schramm

Chief Technologist and Co-Founder, Attritable Machines, LLC
“”

Speaker Bio: 

 

Harrison Schramm is the Chief Technologist and co-founder of Attritable Machines, LLC. Harrison’s career has spanned military, industry and academia. His current research interests are at the intersection of small scale computing, AI and public policy. In addition, Harrison teaches courses in contested logistics and applied analysis at the Naval Postgraduate School. Harrison is published in every major sub-discipline of Operations Research. He is a past president of the Analytics Society of INFORMS and a past Vice-President of the Military Operations Research Society. Prior to his academic and industry roles, Harrison was a helicopter pilot in the US Navy, retiring at the rank of Commander.

Harrison’s awards include: The Clayton Thomas Prize, the Richard H. Barchi Prize, Naval Helicopter Association’s Aircrew of the Year (Deployed), Air Medal (single action, US Navy). He was elected by his peers to be a Fellow of MORS in December 2025.


Ilean Keltz

Technical Advisor, CDAO
“Demystifying DoDD 3000.09: A Practical Introduction for Acquisition Professionals”

Speaker Bio: 

 

Dr. Ilean Keltz is a systems engineer and retired Army Colonel specializing in the application of data analytics and artificial intelligence to solve large-scale national security challenges. With a career spanning senior roles on the Joint and Army Staffs, she has a proven track record of leading complex, data-driven initiatives from concept to enterprise-level implementation.
Currently, through an Intergovernmental Personnel Act (IPA) assignment, Dr. Keltz leads a high-priority tiger team for the Chief Digital and Artificial Intelligence Office (CDAO), where she is focused on scaling autonomous programs and navigating the DoD’s rigorous AI approval processes.

Key Data & Analytics Experience:

  • Joint Chiefs of Staff, J8: As Deputy Division Chief, Dr. Keltz managed the Secretary of Defense’s $180M Global Force Management Data Initiative (GFM DI). She led the effort to automate situational awareness and deliver critical operational data for senior leader decision-making, overseeing a multi-million dollar budget and multiple technical support contracts.
  • Program Analysis & Evaluation, Army G8: Served as a Program Analyst responsible for programming over $93B in funding. Her role was pivotal in using data analysis to inform the Army’s five-year fiscal guidance and resource allocation.
  • Institute for Defense Analyses (Future Role): Will lead operational test and evaluation for major Army and Marine aviation programs, a role centered on data-driven analysis of platform performance and effectiveness.

Dr. Keltz holds a Ph.D. in Systems Engineering from George Mason University, where she now serves as an Adjunct Professor teaching Human-Computer Interaction, and a Master of Science in Operations Research from Old Dominion University. She is a graduate of the United States Military Academy at West Point.

Abstract: 

 

This presentation provides a clear and practical introduction to Department of Defense Directive (DoDD) 3000.09, “Autonomy in Weapon Systems.” As the DoD increasingly turns to autonomous and AI-enabled capabilities, understanding this foundational policy is critical for all acquisition, engineering, and program management professionals. The core intent of DoDD 3000.09 is to ensure that commanders and operators maintain appropriate levels of human judgment over the use of force, thereby minimizing the likelihood and consequences of unintended engagements.

This session will demystify the directive’s requirements, focusing on the mandatory two-stage senior-level review process required for all autonomous and semi-autonomous weapon systems prior to formal development and fielding. We will walk through the key documentation tabs—from Technical Data and CONEMP to System Safety, T&E, AI Ethics, and Legal—that form the body of evidence for these reviews.

A key takeaway for attendees will be the understanding that compliance with DoDD 3000.09 is not about creating a new and duplicative paperwork drill. This briefing will demonstrate how programs can effectively leverage existing artifacts from standard acquisition processes, such as Systems Engineering Technical Reviews (SETRs), Program Protection Plans (PPPs), and safety reviews, to satisfy the directive’s requirements efficiently. Attendees will leave with a clear roadmap for navigating the review process and an understanding of the resources available to support them.


Irene Ji

Research Statistician Developer, JMP Statistical Discovery LLC
“MaLT: Machine-Learning-Guided Test Design & Fault Localization of Complex Software Systems”

Speaker Bio: 

 

Irene Ji obtained her Ph.D. degree in statistics from Duke University in May 2024. She is now working as a Research Statistician Developer at JMP. Her research interests include design of experiments, uncertainty quantification, and computer experiments.

Abstract: 

 

Software testing is essential for the reliable and robust development of complex software systems. Consider a critical flight software system such as the Traffic Alert and Collision Avoidance System (TCAS), which must operate correctly across a wide range of scenarios to ensure aircraft safety. TCAS software depends on many interacting input parameters, leading to a combinatorial explosion of possible scenarios. Given the high cost of test execution and fault diagnosis, exhaustive testing is infeasible, making an approach that combines combinatorial testing with machine learning desirable. This talk outlines a holistic machine-learning-guided test case design and fault localization (MaLT) framework, which combines efficient combinatorial testing techniques with probabilistic machine learning methods to accelerate the testing and fault diagnosis of complex software systems. MaLT consists of three steps: (i) construction of a suite of test cases using a covering array for initial testing; (ii) once test outcomes are available, investigation of posterior root cause probabilities via a Bayesian fault localization procedure; and (iii) use of this Bayesian analysis to guide selection of subsequent test cases via active learning. The proposed MaLT framework thus facilitates efficient identification and subsequent diagnosis of software faults with limited test runs.
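The Bayesian fault localization step can be illustrated with a toy example. The sketch below is a simplified single-fault model with made-up parameters and an exhaustive (rather than covering-array) test suite; it is not the actual MaLT implementation:

```python
from itertools import product

# Toy TCAS-style input space: 3 binary parameters (illustrative only).
params = {"alt_layer": [0, 1], "own_speed": [0, 1], "intruder_bearing": [0, 1]}

# Candidate root causes: single parameter-value assignments.
causes = [(p, v) for p, vals in params.items() for v in vals]

def likelihood(outcome_fail, test, cause, eps=0.05):
    """P(outcome | cause): a test fails (up to noise eps) iff it
    exercises the faulty parameter value."""
    p, v = cause
    p_fail = 1 - eps if test[p] == v else eps
    return p_fail if outcome_fail else 1 - p_fail

# Exhaustive suite here for simplicity; a covering array would shrink it
# while preserving t-way coverage.
tests = [dict(zip(params, combo)) for combo in product(*params.values())]

# Simulate outcomes with the true fault at own_speed = 1.
outcomes = [t["own_speed"] == 1 for t in tests]

# Posterior root-cause probabilities under a uniform prior.
post = {c: 1.0 for c in causes}
for t, y in zip(tests, outcomes):
    for c in causes:
        post[c] *= likelihood(y, t, c)
total = sum(post.values())
post = {c: w / total for c, w in post.items()}
top = max(post, key=post.get)  # most probable root cause
```

In the active-learning step (iii), the next test case would be chosen to most reduce the remaining uncertainty in this posterior.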


Jake Rizzo

Cadet, United States Military Academy at West Point
“A Satellite Image Segmentation Pipeline to Inform Autonomous Aerial Search Strategies”

Speaker Bio: 

 

I am a Second-Class Cadet at the United States Military Academy at West Point, where I am pursuing a degree in Operations Research within the Department of Mathematical Sciences. Beyond the classroom, I have academic experience working with special operations components to develop and integrate autonomous systems solutions with military applications. 

Abstract: 

 

Unmanned aerial systems (UAS) have rapidly evolved into critical assets for modern military and reconnaissance operations, offering the ability to operate at long ranges with reduced risk to personnel. However, the efficacy of autonomous search missions is often limited by a reliance on predefined, geometric flight paths such as spirals or lawnmower patterns that fail to account for the complexities of the underlying terrain. This paper presents the development of a novel pipeline that generates autonomous, probabilistic terrain maps to seed target belief management algorithms used for target acquisition and search persistence in non-permissive environments.

The primary success of this research lies in the creation of a hybrid segmentation model that outperforms traditional deep learning architectures in terms of deployability and integration. While Convolutional Neural Networks (CNNs) and Transformers like SegFormer often require massive labeled datasets and significant computational overhead, our approach integrates OpenStreetMap (OSM) data with the lightweight, probabilistic vegetation estimates of the Detectree algorithm. This combination allows for the generation of pixel-wise probability matrices that characterize terrain without the need for extensive retraining or specialized hardware. By avoiding rigid classification masks, the pipeline assigns probability vectors to every pixel, enabling the system to identify high-belief areas relative to the expected behavior of a specific target class. This probabilistic output was successfully integrated into a target belief management algorithm, providing a nuanced input for belief initialization.

A key feature of this work is the ability to shape initial belief based on mission-specific parameters, weighting the search distribution according to the probability of different target types such as vehicles, hikers, or boats occupying specific terrain features. This ensures that the initial particle distribution reflects both environmental feasibility and tactical reality, providing broad flexibility across a range of search missions. The evaluation of this pipeline demonstrates a superior balance between segmentation quality and computational feasibility. Ultimately, this work delivers a functional, rapidly deployable system that transforms raw satellite imagery into actionable, probabilistic data, providing a foundational capability for the next generation of context-aware autonomous search strategies.


Jake Curran

Cadet, West Point
“Transforming UAV Imagery into Queryable 3D Scene Representations”

Speaker Bio: 

 

Jake Curran is a fourth-year cadet at the United States Military Academy at West Point. His research applies artificial intelligence to convert UAV imagery into structured, queryable scene representations. His work focuses on reducing cognitive burden and enabling faster, mission-relevant insight from aerial intelligence to support military decision-making in time-constrained environments.

Abstract: 

 

The rapid introduction of unmanned aerial vehicles (UAVs) has transformed how militaries collect intelligence, offering persistent coverage on the modern battlefield. However, this advantage has created a new problem: commanders and analysts find themselves increasingly overwhelmed by vast volumes of imagery that are difficult to manually interpret and act upon in time-sensitive environments. This senior thesis addresses that gap by reframing UAV data not as imagery to be viewed, but as structured knowledge to be queried.

Rather than relying on manual exploitation of numerous video feeds, this work proposes an AI-enabled approach for converting UAV imagery into three-dimensional (3D) scene representations that capture what is present in an environment and how elements relate spatially. By organizing observations into coherent scene graphs, the system enables higher-level reasoning about terrain, objects, and constraints that matter operationally.

Equally important is how this information is accessed. A natural language interface allows users to ask mission-driven questions directly of the scene representation, reducing cognitive burden and shrinking the gap between raw sensor data and insight. The goal is decision advantage: faster comprehension, situational awareness, and informed choices under uncertainty for military decision-makers.

This research identifies one potential approach for efficiently querying video information in a way that supplements rather than replaces human decision-making.


James Theimer

STAT COE/Huntington Ingalls Industries
“Missing Data: Motivations, Methods, and Examples”

Speaker Bio: 

 

Dr. James Theimer is a Scientific Test and Analysis Techniques Expert employed by Huntington Ingalls Industries Technical Solutions, supporting the Homeland Security Center of Best Practices.
Dr. Theimer worked for the Air Force Research Laboratory and its predecessor organizations for more than 35 years, carrying out research on the simulation of sensors and devices as well as the analysis of data.

Abstract: 

 

Many datasets have datapoints that were never collected, referred to as missing data. If the missing data are not assessed properly, they may result in biased conclusions. If the probability of a value being missing does not depend on any predictor variable, the data are called missing completely at random (MCAR), and the missing data can be ignored without biasing results. If the probability of being missing depends on some fully observed predictor variable, the data are called missing at random (MAR), and it may be possible to develop a statistical model of the probability of the data being missing given the fully observed variables. If the probability of being missing cannot be predicted using any fully observed variable, the data are called not missing at random (NMAR). NMAR data can lead to biased conclusions, and it is impossible to know what those biases may be. For example, in a satisfaction survey the results will be biased if either very satisfied or very unsatisfied people are more likely to complete the survey, but one does not know who is not answering. Examples will be shown based on publicly available border-crossing data. If the data can be grouped based on fully observed variables, then the data can be weighted to correct the bias. If a Bayesian model can be fit to the data, then the values of missing data can be estimated for given values of the fully observed variables. Examples of these methods will be shown.
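To make the MAR case concrete, here is a small illustrative simulation (invented numbers, not the talk's border-crossing data) showing the naive bias and the weighting correction:

```python
import random

random.seed(1)

# Simulated satisfaction survey: group A responds 90% of the time, group B
# only 30%, so missingness depends on the fully observed group (MAR).
# True population mean satisfaction is (4.0 + 2.0) / 2 = 3.0.
population = [("A", 4.0)] * 500 + [("B", 2.0)] * 500
respond = {"A": 0.9, "B": 0.3}
observed = [(g, s) for g, s in population if random.random() < respond[g]]

# Naive mean over respondents is biased toward the over-responding group A.
naive = sum(s for _, s in observed) / len(observed)

# Inverse-probability weighting by group corrects the bias.
weighted = (sum(s / respond[g] for g, s in observed)
            / sum(1 / respond[g] for g, _ in observed))
```

The same weighting idea generalizes to any grouping by fully observed variables, provided the per-group response probabilities can be estimated.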


James Moreland, Jr.

Principal Engineer, MEI Innovative Solutions Inc
“Integrating Mission Engineering with Systems Engineering to Enable Navy Ship Design”

Speaker Bio: 

 

Dr. James Moreland, Jr. retired from the Senior Executive Service at the U.S. Department of Defense in 2020 and subsequently served as Vice President of Strategy and as Vice President of Mission Engineering and Integration at Raytheon Technologies, concluding five years of service with the company in April 2025. Prior to this industry experience, he served as the Executive Director Mission Engineering and Integration, Deputy Assistant Secretary of Defense for Tactical Warfare Systems, and Executive Director Naval Warfare in the Office of the Secretary of Defense. Prior to these Senior Executive positions, Dr. Moreland served as the Chief Engineer of the Naval Surface Warfare Center Dahlgren Division. He has served as a senior executive advisor to the National Science Board and White House Office of Science and Technology and will be serving on the Air Force Scientific Advisory Board starting in 2026. Dr. Moreland’s many honors include the OSD Exceptional Civilian Service Award, Navy Distinguished Civilian Service Award, Navy Superior Civilian Service Award, and multiple Joint and Navy unit commendations, among others. He is an internationally published expert, recipient of best technical paper awards, and a frequent speaker on mission engineering, complex systems integration, and defense strategy.

Abstract: 

 

This presentation examines how U.S. Navy Mission Engineering and Integration (MEI), integrated with model-based systems engineering (MBSE) and system-of-systems methods, can be applied to modern naval ship design to ensure alignment with operational objectives and evolving threats. Building on the Navy’s ME-to-SE strategy, it connects foundational MEI principles with contemporary ship design methodologies practiced within Virginia Tech’s Aerospace and Ocean Engineering Department, where multiple analytical tools are integrated to optimize performance, survivability, cost, and mission effectiveness. The presentation demonstrates that maturing MEI artifacts within a model-based digital environment strengthens traceability from mission need to architecture, requirements, and design decisions, extending a coherent digital thread within a Digital Engineering Ecosystem from mission analysis through design synthesis.


Jamie Thorpe

Sandia National Laboratories
“Pursuing Digital Assurance for High Consequence Systems through Semantic Reasoning”

Speaker Bio: 

 

Jamie Thorpe is a cybersecurity researcher at Sandia National Laboratories in Albuquerque, New Mexico, where she models and analyzes critical infrastructure systems. Her research interests include cyber resilience, system model development, and rigorous cyber experimentation. Jamie earned her BS in Mathematics and Computer Science from Millersville University in 2017 and her MS in Information Security from Carnegie Mellon University in 2019.

Abstract: 

 

Digital technologies permeate most of the systems we use day to day. Not only are these technologies foundational to everyday life, but they increasingly make up key components of some of our most critical systems. High consequence systems are systems which fulfill a very specific mission, and which could result in grave consequences should they fail. Such systems include hypersonics and space systems. A growing concern is the digital assurance (or cyber assurance) of these systems. Digital assurance refers to the ability to ensure traditional requirements of a digital system are met while also protecting against compromise or subversion. Cybersecurity practices, supply chain evaluation, and system risk assessment could all support the digital assurance of a system. Today, the issue of rapidly and efficiently evaluating digital assurance of high consequence systems with rigor and confidence poses a significant challenge.

In the long term, rigorous approaches to holistic, system-level risk assessment would significantly improve our ability to rapidly and efficiently assure high consequence systems. However, there exist a variety of obstacles to achieving this vision. Several such obstacles are not necessarily technical but are largely embedded in the culture and the language we use to assess individual digital technologies today. Given that holistic system assessments require engagement and collaboration from multidisciplinary teams, language differences can be a significant hindrance to success.

  •  Conflicting terminology causes language barriers between relevant communities, researchers, and system experts
  •  Valuable digital assurance techniques produce specialized outputs, making it difficult for results to compose and build on each other
  •  The community at large lacks methods to assess aggregate implications of a series of digital assurance assessments against the holistic system

To start to address some of these challenges, a team of researchers proposed an effort entitled Jabberwocky: Translating Method Outputs to a Common Reasoning Language for High Consequence Systems. Jabberwocky aims to provide common language and structure to support easier planning, execution, reporting, and overall effective communication of system-level risk assessment across a multidisciplinary assessment team. Over the past two years, Jabberwocky has used expert elicitation and concept mapping to explore this research space and look for gaps which could be filled by established semantic reasoning-based solutions.

This talk will focus on the approach taken by the Jabberwocky team to tackle the early stages of a long-term problem. We will discuss both our successes and failures from this work, and why we believe that concept mapping, ontology development, and semantic reasoning will be critical to finding long-term solutions in this space. This talk will also frame the work of Jabberwocky in the context of the broad problem of digital assurance for high consequence systems, and what Sandia National Laboratories is doing to help tackle the problem.

This talk comments on evaluating cybersecurity and digital assurance in high consequence systems (e.g., space systems, hypersonics), and is relevant to the workshop topic of Advancing Test & Evaluation of Emerging and Prevalent Technologies.


Dr. Jane Pinelis

Chief Scientist for Special Operations, Johns Hopkins University Applied Physics Laboratory
“AI Assurance for Operational Use of Generative AI”

Speaker Bio: 

 

Dr. Jane Pinelis serves as Chief Scientist for Special Operations at the Johns Hopkins University Applied Physics Laboratory, where she leads the development, assurance, and operational integration of advanced AI capabilities for mission-critical defense applications. Previously, she was the inaugural Chief of AI Assurance at the Department of Defense’s Chief Digital and Artificial Intelligence Office and Joint Artificial Intelligence Center, where she directed test and evaluation and responsible AI efforts across the Department. Her career also includes leadership roles supporting Project Maven, the Office of the Director of Operational Test and Evaluation, and the Marine Corps Operational Test and Evaluation Activity. Dr. Pinelis has built her national security career at the intersection of AI assurance, operational testing, and defense innovation. Dr. Pinelis holds a PhD in Statistics from the University of Michigan, Ann Arbor.

Abstract: 

 

Co-Presentation with Dr. Julie Obenauer-Motley.

Generative AI (GenAI) tools are rapidly moving from experimentation to everyday use across operational and enterprise environments. Yet adoption does not happen simply because a tool is powerful. Empowering people to use GenAI effectively requires justified confidence that the system will perform reliably under real-world conditions. This talk explores practical lessons learned from deploying GenAI across DoW applications, with a focus on building and communicating assurance for non-technical stakeholders.

At the heart of successful GenAI adoption is a clear understanding of operational realities. In real-world environments, users face time pressure, incomplete information, competing priorities, and evolving mission needs. AI systems must be resilient to these realities and perform as intended. We begin by examining how we identify the conditions that are most likely to impact operational outcomes: What tasks are users trying to accomplish? What types of errors would have meaningful consequences? How does the GenAI empower and enable the user?

The second focus of the talk is translating operational realities into meaningful test and evaluation approaches. GenAI generates open-ended and varied outputs and may behave differently across contexts. We give examples of evaluations of performance from real-world GenAI implementations, including assessing where GenAI may provide benefit in a workflow, designing experiments and identifying metrics to assess those benefits, and evaluating performance as a function of mission needs. Importantly, we also discuss how to evaluate and structure assurance for user interactions with GenAI workflows. Our experience highlights the importance of iterative testing, structured experimentation, and scenario-based evaluation that reflects operational pressures.

Finally, we address how to communicate assurance in ways that enable informed decision making. Technical reports and performance statistics alone rarely build user confidence. Instead, assurance evidence must be translated into clear, decision-relevant insights. We share approaches for presenting findings in accessible formats that help leaders and users understand where the system performs well, where caution is warranted, and how to apply appropriate safeguards.

This talk offers practical insights and examples for those seeking to leverage GenAI across their workflows. Attendees will leave with a clearer understanding of how to align AI assurance with real-world needs, design meaningful evaluation strategies, and communicate evidence in ways that empower decision makers.


Jitesh Panchal

Professor and Associate Head of Undergraduate Programs, Purdue University
“Developing Multi-Fidelity Test Plans for Evolving and Heterogeneous AI-Enabled Systems”

Speaker Bio: 

 

Dr. Jitesh Panchal is a Professor and Associate Head of Undergraduate Programs in the School of Mechanical Engineering at Purdue University. He received his BTech (2000) from the Indian Institute of Technology (IIT) Guwahati and MS (2003) and PhD (2005) in Mechanical Engineering from Georgia Tech. His research interests are in (1) design at the interface of social and physical phenomena, (2) computational methods and tools for digital engineering, and (3) secure design and manufacturing. He is a recipient of the NSF CAREER award and the Distinguished Alumni Award from IIT Guwahati. He received the Young Engineer Award, Guest Associate Editor Award, and three best paper awards from the ASME CIE division. He was recognized by the B.F.S. Schaefer Outstanding Young Faculty Scholar Award, the Ruth and Joel Spira Award, and as one of the Most Impactful Faculty Inventors at Purdue University. Dr. Panchal has co-authored two books and co-edited one book on engineering systems design. He served as an NSF Expert for the Engineering Design and Systems Engineering (EDSE) program and as an Associate Editor for the ASME Journal of Mechanical Design (JMD) and the ASME Journal of Computing and Information Science in Engineering (JCISE).

Abstract: 

 

The test and evaluation (T&E) of AI-enabled and autonomous systems faces growing challenges due to high costs, long timelines, and the increasing use of heterogeneous AI components that evolve over time. These challenges are particularly acute in Department of Defense acquisition contexts, where AI/ML components are often developed by third-party vendors, limiting access to training data, developmental test results, and assumptions about operational environments. As a result, existing AI T&E approaches do not adequately support operational decision-making under realistic constraints.

This talk presents a risk-informed, multi-fidelity approach to developing efficient test plans for AI-enabled systems that leverages information generated throughout the systems engineering lifecycle. By strategically combining results from lower-fidelity and developmental test environments with targeted high-fidelity operational testing, the proposed approach enables decision-makers to reduce uncertainty about requirement satisfaction while controlling cost and schedule. Sequential test planning methods are used to prioritize tests based on their expected value in reducing risk relative to mission and performance requirements. The approach is demonstrated using an autonomous vehicle (AV) perception system example, showing how informed combinations of tests across multiple fidelity levels can achieve confidence in system performance at substantially lower cost than relying solely on high-fidelity testing. The results illustrate how multi-fidelity, information-driven T&E can support integrated testing, continuous evaluation, and more timely acquisition decisions for AI-enabled autonomous systems.
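A toy version of value-driven test selection in the spirit of this talk can be sketched in a few lines: each candidate test carries a cost and a notional expected reduction in uncertainty about requirement satisfaction, and tests are chosen greedily by value per unit cost within a budget. All names and numbers below are invented for illustration; the talk's actual sequential planning method is richer than this greedy sketch.

```python
# Hypothetical sketch of risk-informed test selection across fidelity
# levels: pick tests greedily by (uncertainty reduction) / (cost) until
# the budget runs out. Candidate names and values are illustrative only.

def plan_tests(candidates, budget):
    """Greedily select tests maximizing uncertainty reduction per unit cost."""
    plan, remaining = [], budget
    pool = sorted(candidates, key=lambda t: t["value"] / t["cost"], reverse=True)
    for test in pool:
        if test["cost"] <= remaining:
            plan.append(test["name"])
            remaining -= test["cost"]
    return plan

candidates = [
    {"name": "simulation_sweep",  "cost": 1.0,  "value": 0.30},  # low fidelity
    {"name": "hardware_in_loop",  "cost": 5.0,  "value": 0.90},  # mid fidelity
    {"name": "operational_trial", "cost": 20.0, "value": 2.00},  # high fidelity
]
print(plan_tests(candidates, budget=6.0))  # cheap tests win on value per cost
```

Under this budget the low- and mid-fidelity tests are selected and the expensive operational trial is deferred, mirroring the abstract's point that informed combinations of lower-fidelity tests can buy down risk before committing to high-fidelity testing.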


Joe Gregory

Assistant Research Professor, University of Arizona
“”

Speaker Bio: 

 

Dr. Joe Gregory is an assistant research professor in the Department of Systems and Industrial Engineering at the University of Arizona. His research interests include digital engineering, model-based systems engineering, and semantic technologies. His current research focuses on the development of the Digital Engineering Factory, and the development of ontologies to support digital systems engineering. He is the co-chair of the Digital Engineering Information Exchange (DEIX) Ontology Working Group.

Abstract: 

 


John Introne

Undergraduate Student, College of William and Mary
“Fault Detection and Accommodation of Pressure Measurements in Hypersonic Vehicles”

Speaker Bio: 

 

John Introne is a junior at William & Mary studying mathematics and computer science. He has conducted research on hypersonic vehicle problems since his freshman year under the mentorship of Dr. Greg Hunt, an associate professor in the William & Mary math department. He also works on mechanistic interpretability problems in the SEMERU software engineering lab at William & Mary under the direction of Dr. Denys Poshyvanyk. He hopes to use machine learning to solve the most demanding challenges in engineering applications and beyond.

Abstract: 

 

The development of air-breathing hypersonic vehicles has significant implications for military defense and reconnaissance, enabling more maneuverable and quicker-striking vehicles. However, these vehicles pose significant challenges for test and evaluation due to their range of extreme operating conditions and sensitive fluid dynamics. When these vehicles reach supersonic speeds, a series of shockwaves forms inside the engine, causing drastic increases in pressure and temperature and a decrease in air speed, which aids combustion. The shock train leading edge (STLE), the position of the first shockwave and the point where pressure begins to rise, is a critical metric for vehicle performance and safety. Ideally, the STLE resides close to the opening of the engine to maximize the pressure rise, but it is inherently unsteady and moves rapidly within the engine, and ejection of the STLE through the inlet leads to engine unstart. Accurately tracking the STLE is therefore essential to controlling it and maintaining safe, efficient flight. Systems that use pressure measurements from inside the engine have shown success in tracking the STLE, but the hostile environment often causes sensors to fail over the course of sustained flight; the resulting distorted sensor information severely limits the utility and reliability of such tracking systems and complicates performance evaluation of the engine under realistic operating conditions. This work proposes and evaluates machine learning-based approaches, especially neural network autoencoder models, to identify and correct faulty pressure measurements arising from a variety of failure modes in real time. Using wind-tunnel and simulated datasets containing pressure and STLE measurements, with fault modes injected by simulation, this work assesses the models' abilities to identify anomalous pressures at a high rate, accurately reconstruct degraded signals, and preserve STLE prediction.
Results demonstrate that these methods consistently detect problematic faults and drastically improve STLE prediction and uncertainty relative to baselines, allowing for robust and reliable STLE tracking in real-world flight conditions. These findings illustrate the utility of data-driven machine learning techniques as part of the test and evaluation framework for hypersonic systems in extreme yet realistic environments.
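The detect-and-correct loop described in this abstract can be sketched in miniature. This is not the authors' model: a trained autoencoder is stood in for by a simple median smoother, and the threshold and readings are invented; only the flag-then-reconstruct logic is the point.

```python
# Illustrative sketch of fault detection and accommodation: reconstruct
# each pressure reading, flag sensors whose reconstruction error is too
# large, and substitute the reconstruction. A median smoother stands in
# for the trained autoencoder; all values here are invented.
from statistics import median

def reconstruct(pressures):
    """Stand-in for the autoencoder: reconstruct each reading from the
    median of a small window along the sensor array."""
    return [median(pressures[max(0, i - 1): i + 2]) for i in range(len(pressures))]

def accommodate(pressures, threshold):
    """Flag sensors whose reconstruction error exceeds the threshold and
    replace their readings with the reconstructed values."""
    recon = reconstruct(pressures)
    faulty = [i for i, (p, r) in enumerate(zip(pressures, recon)) if abs(p - r) > threshold]
    corrected = [recon[i] if i in faulty else p for i, p in enumerate(pressures)]
    return faulty, corrected

# A spike at sensor index 2 mimics a distorted transducer.
faulty, corrected = accommodate([1.0, 1.1, 9.0, 1.2, 1.3], threshold=2.0)
print(faulty, corrected)  # the spike is flagged and replaced
```

In the real system the reconstruction would come from an autoencoder trained on healthy pressure traces, so that the corrected signal can feed the downstream STLE tracker unchanged.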


John Dennis

Research Staff Member, Institute for Defense Analyses
“AI Assurance”

Speaker Bio: 

 

John W. Dennis is a Research Staff Member at the Institute for Defense Analyses, where he is a member of the AI Assurance, Test Science, and Talent Management, Readiness, and Resilience groups.  Jay specializes in Econometrics, Statistics, and Data Science and focuses his work in the Department’s AI Assurance, AI T&E, and Human Capital Management communities. His work involves developing and evaluating statistical and econometric methodology for use in Human Capital and Test & Evaluation communities; applications of forecasting, causal inference, and prospective policy analysis for personnel management; development of guidance for development, evaluation, and use of artificial intelligence and machine learning enabled capabilities; and evaluation and assurance of artificial intelligence within the Department. He joined IDA after receiving his PhD in Economics from the University of North Carolina at Chapel Hill, where he specialized in Econometric Theory and Financial Econometrics.

Abstract: 

 

CDAO defines AI assurance as the grounds for sufficient confidence that an AI system, while operating within its defined scope, will achieve its intended outcomes without introducing unacceptable risks, throughout its lifecycle. We’ve been discussing AI Assurance for years now, yet it still feels nascent. We discuss existing concepts for producing and sustaining AI assurance, offer some practical guidance on managing the assurance of an AI-enabled capability through the lens of a recent case study using a ML-enabled personnel system, and consider paths forward for the AI assurance community and AI assurance in DoW.


John Haman

Research Staff Member, IDA
“Statistics in support of policy in the updated DOT&E Hard Armor Test Protocol”

Speaker Bio: 

 

Dr. Haman leads the Test Science Team, a resource on statistical and human factors rigor in T&E. In this role, he ensures the statistical quality of OED’s contributions to the DOT&E mission by working with his team and across the portfolios. He also contributes individually to various projects at IDA, most recently as an analyst on performance-based logistics and a coauthor on DOT&E’s latest procedure for hard armor live fire testing. His overall research interest is identifying effective and pragmatic statistical methods that align with DoD’s assumptions and analytic goals.

Abstract: 

 

In 2025, the Department of War updated its guidance on the design and analysis of tests of hard body armor, the first update in 15 years. The update was necessary to recommend test procedures compatible with new sizes and types of armor, and to clarify test procedures. Throughout the protocol, statisticians supported policy and subject-matter experts by quantifying the risks and trade-offs of different potential test procedures.

I will provide two examples of how statistics support the policy, and discuss some lessons that I learned about how statistics can better support government test policy.


Jose Alvarado

Technical Advisor, AFOTEC Det 5
“Resource Implications and Benefits of Model-Based Acquisition Planning”

Speaker Bio: 

 

JOSE ALVARADO, Ph.D., is a senior test engineer and technical advisor for AFOTEC Detachment 5 at Edwards AFB, California, with over 34 years of developmental and operational test and evaluation experience. His research focuses on improving flight test engineering by applying MBSE (Model-Based Systems Engineering) concepts and implementing Model-Based Test and Evaluation (MBTE) to refine test processes. Dr. Alvarado holds a BS in Electrical Engineering from California State University, Fresno (1991), an MS in Electrical Engineering from California State University, Northridge (2002), and a PhD in Systems Engineering from Colorado State University (2024). He serves as an adjunct faculty member for the engineering and technical education departments at Antelope Valley College. He is a member of the ITEA, Antelope Valley Chapter and the INCOSE Colorado State University Chapter.

Abstract: 

 

The Department of Defense (DoD) Test and Evaluation (T&E) community is advancing digital engineering by adopting and developing model-based testing methodologies within the flight test community. This article expands upon the existing grey box model-driven test design (MDTD) approaches and incorporates flight test information into Systems Modeling Language (SysML) models. These models leverage model-based systems engineering (MBSE) artifacts to generate flight test plans. Additionally, this article introduces a methodology developed by the Air Force Operational Test and Evaluation Center (AFOTEC) to guide the operational test and evaluation (OT&E) test planning process. This methodology uses MBSE to create, develop, and maintain the information required for test plans within a digital model construct. These case studies demonstrate the methodology using SysML implemented through an MDTD process and showcase the methodology's applicability to different systems under test (SUT). The results include a set of System Usability Scale (SUS) metrics that measure the method's ability to utilize SysML elements within the model and assess its usability and effectiveness in generating flight test plans. Furthermore, the paper discusses the method's applicability to various scenarios, the benefits of model-based testing, and its relevance in the context of operational flight testing.


Julie Obenauer-Motley

Sr AI Technical Advisor, Johns Hopkins University Applied Physics Laboratory
“AI Assurance for Operational Use of Generative AI”

Speaker Bio: 

 

Dr. Julie Obenauer-Motley is a Senior AI Technical Advisor at the Johns Hopkins University Applied Physics Lab. Their current work focuses on advising senior U.S. and Department of Defense leaders on AI policy, governance, and implementation. They lead the AI and Autonomy for Strategic Advantage capability advancing AI strategy, research, and operational integration. Their work translates the complex realities of AI into actionable guidance, enabling institutions to harness AI while managing risk with precision and foresight. They have led strategic workshops and high-level working groups on generative AI, high-consequence AI applications, and human enhancement technologies. They have also authored and supported guidance, policy, and practice across the DoD, including within CDAO, T&E, and OUSD directorates and international fora.

Abstract: 

 

Generative AI (GenAI) tools are rapidly moving from experimentation to everyday use across operational and enterprise environments. Yet adoption does not happen simply because a tool is powerful. Empowering people to use GenAI effectively requires justified confidence that the system will perform reliably under real-world conditions. This talk explores practical lessons learned from deploying GenAI across DoW applications, with a focus on building and communicating assurance for non-technical stakeholders.

At the heart of successful GenAI adoption is a clear understanding of operational realities. In real-world environments, users face time pressure, incomplete information, competing priorities, and evolving mission needs. AI systems must be resilient to these realities and perform as intended. We begin by examining how we identify the conditions that are most likely to impact operational outcomes: What tasks are users trying to accomplish? What types of errors would have meaningful consequences? How does the GenAI empower and enable the user?

The second focus of the talk is translating operational realities into meaningful test and evaluation approaches. GenAI generates open-ended and varied outputs and may behave differently across contexts. We give examples of performance evaluations from real-world GenAI implementations. These include assessing where GenAI may provide benefit in a workflow, designing experiments and identifying metrics to assess those benefits, and evaluating performance as a function of mission needs. Importantly, we also discuss how to evaluate and structure assurance for user interactions with GenAI workflows. Our experience highlights the importance of iterative testing, structured experimentation, and scenario-based evaluation that reflects operational pressures.

Finally, we address how to communicate assurance in ways that enable informed decision making. Technical reports and performance statistics alone rarely build user confidence. Instead, assurance evidence must be translated into clear, decision-relevant insights. We share approaches for presenting findings in accessible formats that help leaders and users understand where the system performs well, where caution is warranted, and how to apply appropriate safeguards.

This talk offers practical insights and examples for those seeking to leverage GenAI across their workflows. Attendees will leave with a clearer understanding of how to align AI assurance with real-world needs, design meaningful evaluation strategies, and communicate evidence in ways that empower decision makers.


Justin Krometis

Research Associate Professor, Virginia Tech National Security Institute
“Recent Methodological Advances for Integrated T&E”

Speaker Bio: 

 

Justin Krometis is a Research Associate Professor with the Virginia Tech National Security Institute and holds an affiliate position in the Math Department at Virginia Tech. He has led the Integrated Testing & Continuous Evaluation line of effort for the Acquisition Innovation Research Consortium’s support of DOT&E for the last three years. His research is largely in the development of theoretical and computational frameworks for Bayesian data analysis. These include approaches to incorporating and balancing data and expert opinion into decision-making, estimating model parameters, including high- or even infinite-dimensional quantities, from noisy data, and designing experiments to maximize the information gained. His research interests include: Parameter Estimation, Uncertainty Quantification, Experimental Design, High-Performance Computing, Artificial Intelligence/Machine Learning (AI/ML), and Reinforcement Learning.

Abstract: 

 

This session will describe recent work on integrating disparate data sources, such as developmental and operational testing, to build a common understanding of system behavior and to inform subsequent test design. The tutorial will compare frequentist and Bayesian approaches for conducting this data synthesis and describe associated pitfalls and best practices. It will walk through worked examples on test data describing reliability, accuracy, and hit/miss or pass/fail outcomes. Finally, it will describe efforts to build these methods into hands-on tools to facilitate incorporation into defense and other programs. We will conclude with a short opportunity for discussion about shortfalls that might be preventing adoption and where research needs to go to translate these methods and tools into practice.
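One simple member of the Bayesian family this tutorial covers can be sketched directly for pass/fail data: developmental test (DT) results form a down-weighted Beta prior that operational test (OT) data then update. The discount weight and counts below are invented for illustration, not from the tutorial's examples.

```python
# Hypothetical sketch of Bayesian synthesis of DT and OT pass/fail data.
# DT evidence is discounted (dt_weight < 1) because developmental
# conditions may not reflect operational ones; OT data enter at full
# weight. All counts and the weight are illustrative assumptions.

def beta_posterior(dt_pass, dt_fail, ot_pass, ot_fail, dt_weight=0.5):
    """Return (alpha, beta) of the Beta posterior on success probability,
    starting from a uniform Beta(1, 1) prior."""
    alpha = 1 + dt_weight * dt_pass + ot_pass
    beta = 1 + dt_weight * dt_fail + ot_fail
    return alpha, beta

alpha, beta = beta_posterior(dt_pass=18, dt_fail=2, ot_pass=7, ot_fail=1)
print(alpha / (alpha + beta))  # posterior mean reliability
```

The same synthesis done with a frequentist pooled estimate would weight every trial equally; the discount parameter is exactly the kind of judgment call whose pitfalls the tutorial discusses.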


Kathleen Bostick

Researcher, ARLIS RISC
“Convergent Threats through the Intersection of Biotechnology and AI/ML”

Speaker Bio: 

 

Kathleen Bostick graduated magna cum laude from Spelman College with a Bachelor of Science degree. Her research experience began in plant biology, where she co-authored two publications identifying the effects of antibiotic exposure on soil and plant root-associated microbiota. She later conducted research in cognitive neuroscience at Emory University’s National Primate Center. The culmination of her work resulted in two more publications examining metacognition and compositional processing in rhesus macaques. In 2025, she joined the University of Maryland as a RISC intern and currently serves as a full-time researcher identifying emerging biotechnology. She co-authors this project with Trevor Casey.

Trevor Casey is currently pursuing a degree in Data Science with a focus in Biotechnology at The George Washington University. Over the past four years, he has developed a multidisciplinary, data-driven approach to research through various roles as a research assistant, contributing to projects ranging from color pattern evo-devo to global food system reform. In 2025, Trevor joined the summer RISC program as an intern, where he continues to support innovative research in parallel to his full-time academic studies.

Abstract: 

 

Artificial intelligence and machine learning (AI/ML) are advancing biotechnology at an unprecedented rate. The convergence of these technological domains is playing a pivotal role in protecting national security and influencing asymmetric advantage in the scientific field. It is vital to strengthen the Nation’s understanding of how AI/ML is influencing biotechnology. The convergence of genetic engineering (a domain within biotechnology) with AI/ML presents a unique opportunity to assess its [genetic engineering’s] innovations, articulate the forces that influence those innovations, and identify key stakeholders.

Specifically, this research project addresses how AI/ML is being applied to advance techniques and approaches in precision medicine (PM), and what factors most accelerate or inhibit biotechnological innovation. The project's intended outputs include a mapping of current bio-AI/ML convergent efforts, the identification of factors that accelerate or decelerate biotechnological research, and metrics that can be used to assess those accelerators and decelerators. The project employs a systematic approach that includes mapping where AI/ML is successfully being integrated into biotechnology Research & Development (R&D) pipelines. Major findings from this mapping will illuminate why biotechnology tools, such as breakthrough gene editing platforms, are strategic assets to the Nation's domestic innovation landscape. It will also provide the foundational groundwork necessary to begin assessing the bioconvergence ecosystem between the U.S. and adversarial countries.

A central dimension of this work is the identification and development of quantifiable signals that enable continuous, lifecycle-based evaluation of PM innovation drivers. As such, a key aspect of this project approach is the use of structured metric design and open data assessment to track how factors, such as government R&D investment, cross-sector collaboration, and system interoperability, accelerate or decelerate innovation over time. Future work will include the characterization of factors that influence the rate of innovation, continued efforts in the open data assessment of metrics being used to assess accelerators and decelerators, the identification and assessment of policies that influence the rate of innovation, and a deeper step-wise data evaluation of the bio-AI/ML capabilities shaping bioconvergence in the U.S. and adversarial countries. Lastly, future research aims to identify validation approaches to assess the reliability and accuracy of the project findings.


Kathryn Toppin

Research Staff Member, Institute for Defense Analyses
“How to Run Effective Focus Groups”

Speaker Bio: 

 

Dr. Kathryn Toppin serves as a Research Staff Member at the Institute for Defense Analyses, currently supporting the DOT&E Cyber Assessment Program. Her work focuses on the vital intersection of cyber operations—both offensive and defensive—and Operational Test and Evaluation (OT&E). Kathryn’s technical background is uniquely informed by nearly a decade of experience in the tech sector as a user researcher, where she honed her expertise in qualitative methods to build high-impact, user-centered experiences for cross-functional teams.

Abstract: 

 

This 90-minute mini-tutorial introduces the fundamentals of designing and conducting effective focus groups. Whether you are new to formal research or an experienced researcher looking to expand your methodology, this session covers the essentials: defining a clear purpose, crafting engaging questions, and applying a structured framework for data analysis. You will also learn how to set up your workflow, apply practical facilitation techniques to manage group dynamics, avoid common pitfalls, and use AI tools to help with planning and data analysis. Attendees will leave with a repeatable framework and actionable tips to conduct their focus groups with confidence.


Kelli Esser

Chief Strategy Officer, Virginia Tech National Security Institute (VTNSI)
“Panel: Digitally Transforming the Test and Evaluation Landscape”

Speaker Bio: 

 

Dr. Kelli Esser is Chief Strategy Officer at the Virginia Tech National Security Institute, where she leads initiatives that connect cutting-edge research with real-world mission needs across artificial intelligence, strategic planning, organizational transformation, and workforce development. With nearly 20 years of experience spanning government, academia, and industry, she has advised senior leaders on research investment, capability development, and enterprise modernization. Previously, she served as Strategic Advisor for R&D Coordination at the Department of Homeland Security. Dr. Esser holds a Ph.D. in Biophysics and an M.A. in Applied Economics from Johns Hopkins University, and a B.S. in Biochemistry from Duquesne University.

Abstract: 

 

Since the publication of the United States Department of Defense Digital Engineering (DE) Strategy in 2018, the Department, the services, and the supporting industrial base have been working to integrate digital engineering tooling that captures the designs of the complex weapon systems the uniformed and civilian workforce depend upon to serve the nation. Digital engineering methods have advanced rapidly for system design, hardware and software design, modeling and simulation, and ultimately test and evaluation, to ensure that systems are delivered rapidly with the most advanced capability. To no great surprise, parallel development has led to a diverse set of products, data structures, and methods that do not necessarily integrate. To maximize efficiency, a team from Developmental Test, Evaluation, and Accreditation; the Director, Operational Test and Evaluation; and university researchers from the Acquisition Innovation Research Center joined forces to mature tools and methods, intended to work seamlessly together, in support of advancing test and evaluation and realizing the DE Strategy. A panel of experts and practitioners from these organizations will discuss their work, its results and benefits, and some of the struggles they have faced in pursuit of the Department's objectives.



Kelly Tran

Researcher, IDA
“Network Sensor Evaluation in a Cyber Range Environment”

Speaker Bio: 

 

Dr. Kelly Tran is a researcher at the Institute for Defense Analyses, focusing on cybersecurity and test and evaluation.

Abstract: 

 

This talk details the methodology employed to evaluate the performance of network sensors in a cyber range environment, and provides potential methodological improvements to facilitate more comprehensive data collection at future sensor assessments. Network sensors, which are a combination of hardware devices and software components, are operated by trained cyber analysts to detect, characterize, and respond to malicious cyber activity and network misconfigurations across a variety of networks and systems. Understanding the strengths and gaps of network sensors helps inform stakeholders and network owners on where to deploy these sensors, which can form a critical layer of the overall cybersecurity strategy. IDA supported a DOT&E-led sensor evaluation in October 2025, which assessed the performance of various sensors in detecting cyber adversarial behavior. These network-based sensors ingested packet capture (PCAP) data from network taps distributed throughout a cyber range environment, while a cyber Red Team aggressed the network. For this sensor evaluation, the employed methodology focused on ensuring that each sensor had access to the same PCAP data from the network taps. This tapping methodology provided each sensor with the PCAP data in a compatible format that allowed the sensor to utilize all of its capabilities. The results of this evaluation provide a framework for future studies of network sensors and their operators.
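Because every sensor ingested identical PCAP data, one natural scoring approach is to match each sensor's alerts event-by-event against Red Team ground truth. The sketch below illustrates that idea with invented event identifiers and counts; it is an assumption-laden illustration, not the evaluation's actual scoring scheme.

```python
# Hypothetical sketch: score each sensor's alerts against red team
# ground truth using precision and recall. Event IDs, sensor names,
# and numbers are all invented for illustration.

def score_sensor(alerts, ground_truth):
    """Return (precision, recall) for one sensor's set of alerted events."""
    true_positives = len(alerts & ground_truth)
    precision = true_positives / len(alerts) if alerts else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

red_team_events = {"scan-01", "exploit-02", "exfil-03", "c2-04"}
sensor_alerts = {
    "sensor_a": {"scan-01", "exploit-02", "benign-07"},  # one false alarm
    "sensor_b": {"scan-01", "exfil-03", "c2-04"},        # misses the exploit
}
for name, alerts in sensor_alerts.items():
    print(name, score_sensor(alerts, red_team_events))
```

Comparing precision/recall profiles across sensors fed the same traffic is one way to surface the strengths and gaps the abstract describes.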


Kenneth Senechal

Senior Technical Advisor, KBR, Supporting the Test Resource Management Center
“Panel: Digitally Transforming the Test and Evaluation Landscape”

Speaker Bio: 

 

Mr. Senechal has 30 years of experience in DoD acquisition in both industry and government. While spending most of his career in T&E, he has had tours in program management and systems engineering. He is a graduate of the US Naval Test Pilot School and a NAVAIR Associate Fellow.

For the last decade, Mr. Senechal has been on the front line of transforming how the DoW does acquisition by leading transformation efforts at NAVAIR, Dept of Navy, and ultimately OSW levels. He is currently supporting the Office of the Under Secretary of War for Research and Engineering Test Resource Management Office as they stand up the DoW-wide digital engineering infrastructure capability, methodologies, and practices to support acquisition programs.

Abstract: 

 

Since the publication of the United States Department of Defense Digital Engineering (DE) Strategy in 2018, the Department, the services, and the supporting industrial base have been working to integrate digital engineering tooling that captures the designs of the complex weapon systems the uniformed and civilian workforce depend upon to serve the nation. Digital engineering methods have advanced rapidly for system design, hardware and software design, modeling and simulation, and ultimately test and evaluation, to ensure that systems are delivered rapidly with the most advanced capability. To no great surprise, parallel development has led to a diverse set of products, data structures, and methods that do not necessarily integrate. To maximize efficiency, a team from Developmental Test, Evaluation, and Accreditation; the Director, Operational Test and Evaluation; and university researchers from the Acquisition Innovation Research Center joined forces to mature tools and methods, intended to work seamlessly together, in support of advancing test and evaluation and realizing the DE Strategy. A panel of experts and practitioners from these organizations will discuss their work, its results and benefits, and some of the struggles they have faced in pursuit of the Department's objectives.


Landry Lee

Student, United States Military Academy
“An Adaptive Bayesian Framework for Risk-Controlled Skip-Lot Sampling”

Speaker Bio: 

 

Born in Indianapolis, IN, and raised in Columbus, OH, Landry Lee is a Systems Engineering major at West Point. Landry is passionate about applying data science and engineering to solve real-world problems, particularly in defense logistics and efficiency. He has conducted independent research on skip-lot sampling for ammunition testing and interned with Lockheed Martin and the Lake City Army Ammunition Plant. As a junior, he studied abroad at the United States Naval Academy for a semester exchange. At West Point, Landry serves as Company I's First Sergeant. He has completed Air Assault School, competed with his company's Sandhurst team, and plays tuba in the Spirit Band. Landry hopes to be commissioned as an Armor Officer with a Military Intelligence Branch Detail.

Abstract: 

 

Defense manufacturing programs operate under tight cost, schedule, and reliability constraints. While exhaustive inspection provides strong quality assurance, it is often resource-intensive or slow. Skip-lot sampling (SkSP) methods address this tradeoff by allowing some lots to bypass inspection when historical quality has been strong. However, most existing SkSP approaches use fixed inspection fractions and rely on static assumptions about process performance, limiting their ability to respond to changing production conditions.

This work presents SkSP-X, an adaptive inspection framework that treats lot testing as a risk-controlled decision problem under uncertainty. Rather than selecting from a small set of predefined skip levels, SkSP-X computes a continuous probability of testing f ∈ [0, 1] for each lot. The method maintains a Bayesian belief about the true nonconforming rate (TNCR) using a Beta model that incorporates historical lot outcomes with exponential decay, placing greater weight on recent data. This allows the inspection fraction to adapt to evolving or non-stationary manufacturing performance.

At each decision point, SkSP-X estimates the probability that the next lot will be above the acceptable nonconforming rate (a bad lot) and the probability that a bad lot would be accepted if tested. The testing probability f is then selected to maximize savings subject to a consumer risk constraint P(bad | accepted) ≤ r, where r represents the allowable risk of fielding a bad lot.

Simulation studies compare SkSP-X with traditional SkSP-2 and multi-level skip-lot plans across fixed, oscillating, and abrupt-failure quality scenarios representative of possible production environments. Results show that SkSP-X performs as well as other models when quality is stable but better adapts to varying production quality. Compared with conventional methods, SkSP-X demonstrates improved skipping efficiency under variable quality, reduced unnecessary skipping under poor quality, and improved control of consumer risk.

SkSP-X provides a data-driven, adaptive approach to inspection planning for defense manufacturing, supporting cost-effective quality assurance without sacrificing mission-critical reliability.
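The decayed Beta belief at the core of this framework can be sketched in a few lines. The decay factor, prior, and lot history below are illustrative assumptions, not values from the study, and the risk-constrained choice of the testing probability f is omitted.

```python
# Hypothetical sketch of the SkSP-X belief update: each lot outcome
# updates a Beta belief on the true nonconforming rate (TNCR), with
# exponential decay so that recent lots carry more weight.

def update_belief(alpha, beta, lot_bad, decay=0.9):
    """Decay the accumulated evidence, then add the newest lot outcome
    (lot_bad = 1 for a nonconforming lot, 0 otherwise)."""
    return decay * alpha + lot_bad, decay * beta + (1 - lot_bad)

def mean_tncr(alpha, beta):
    """Posterior mean of the nonconforming rate under the Beta belief."""
    return alpha / (alpha + beta)

# Start from a uniform Beta(1, 1) belief and observe five good lots:
# the estimated TNCR drops, which would support a higher skip rate.
a, b = 1.0, 1.0
for outcome in [0, 0, 0, 0, 0]:
    a, b = update_belief(a, b, outcome)
print(round(mean_tncr(a, b), 3))
```

Because old evidence decays geometrically, a sudden run of bad lots pulls the belief back up quickly, which is what lets the inspection fraction track non-stationary production quality.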



Laura White

Aerospace Engineer, NASA
“Validation Metrics for Applications with Mixed Uncertainty”

Speaker Bio: 

 

Dr. Laura White has a bachelors in Mathematics from Arkansas State University and PhD in Mathematics from University of Nebraska-Lincoln. She has been working as an Aerospace Engineer at NASA Langley Research center for the past 10 years and currently works in the Space Mission Analysis Branch. Her areas of expertise include aerospace applications in uncertainty quantification, surrogate modeling, and validation methods.

Abstract: 

 

Model validation is a process for determining how accurate a model is when compared to a true value. The methodology uses uncertainty analysis to assess the discrepancy between a measured and a predicted value. Several area metrics have been introduced in the literature to handle these types of discrepancies, and they have been applied to problems involving aleatory, epistemic, and mixed uncertainty. However, these methodologies cannot fully characterize the true differences between the experimental and prediction data when mixed uncertainty exists in the measurements and/or the predictions. This work introduces an area metric validation approach that aims to compensate for the shortcomings of current techniques. The approach will be described in detail, and comparisons with existing metrics will be shown. To demonstrate its applicability, the area metric will be applied to multiple applications using wind tunnel experiments conducted at NASA Langley Research Center.
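As background for the shortcomings this talk addresses, the classic area metric, the area between the empirical CDFs of measurement and prediction, can be computed in a few lines; for equal-size samples it reduces to the mean absolute difference between the order statistics. The sample values below are invented for illustration.

```python
# Minimal sketch of the classic area validation metric: the area between
# the empirical CDFs of the experimental and predicted samples.

def area_metric(experiment, prediction):
    xs, ys = sorted(experiment), sorted(prediction)
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

measured = [1.0, 1.2, 1.4, 1.6]
predicted = [1.1, 1.3, 1.5, 1.7]
print(area_metric(measured, predicted))  # ~0.1: the CDFs are offset by 0.1
```

This scalar summary is exactly what breaks down under mixed uncertainty, where epistemic intervals turn each CDF into a probability box rather than a single curve, motivating the extended metric the talk introduces.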


Lauren Milne

Research Assistant, University of Massachusetts
“Environment for Modeling and Simulation of Counter Autonomous Underwater Vehicle Systems”

Speaker Bio: 

 

Lauren Milne is a Ph.D. student in Engineering and Applied Science (Data Science) at the University of Massachusetts Dartmouth. Her research focuses on data-driven modeling and simulation of autonomous and cyber-physical systems, with an emphasis on resilience, performance evaluation, and agent-based modeling. She works on applications involving counter-autonomous underwater vehicle systems, decision-support analytics, and reproducible simulation frameworks for defense-related scenarios.

Abstract: 

 

Adversary autonomous underwater vehicles (AUVs) pose an increasing threat to harbors, restricted waterways, and critical naval infrastructure, motivating ongoing efforts to evaluate detection, tracking, and response concepts and strategies using data-driven methods. Recent research in this area has primarily focused on vehicle-level autonomy, sensing and tracking algorithms, or high-fidelity platform and environmental modeling, often evaluated in isolation. While substantial engineering progress has been made, there remains a need for flexible, data-driven modeling and simulation (M&S) environments that can systematically assess counter-AUV system performance across diverse scenarios, uncertainties, and evolving threat behaviors. To address this need, we build an agent-based modeling framework that simulates interactions between defenders and adversary AUVs. In this framework, defender assets (blue team) represent sensing, interception, and protected infrastructure, while adversary assets (red team) represent one or more attacking AUVs operating under configurable autonomy and tactics. Scenarios are executed to analyze interactions among scenario configuration, autonomous agent behavior, and automated performance evaluation. The environment supports script-based definition of blue defender and red adversary assets, parameterized sensing and interception models, and repeatable execution of Monte Carlo experiments. Quantitative measures of effectiveness, including probability of detection and interception, asset loss, operational readiness, and cost tradeoffs, are computed and summarized through statistical and distributional analyses to support sensitivity studies and stress testing.
Recent development of our framework has focused on counter-AUV software components developed with undergraduate researchers, emphasizing modular agent architectures, configurable autonomy models, and reproducible data pipelines that enable systematic exploration of deployment topology, threat density, and environmental variability. By coupling data-centric simulation, automated analysis, and interpretable performance metrics, this framework provides a practical platform for evaluating counter-AUV concepts and informing data-driven decision making for Navy and Department of Defense stakeholders.
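The Monte Carlo estimation of a measure of effectiveness such as probability of detection can be sketched as below. The geometry is deliberately toy (a straight-line transit past fixed point sensors, with invented function names), whereas the actual framework models autonomy, tactics, interception, and environment:

```python
import math
import random

def run_trial(sensors, r_detect, rng):
    """One highly simplified engagement: an adversary AUV crosses a 1x1
    harbor along a random straight line from the south edge to the north
    edge, and is detected if its path passes within r_detect of any
    fixed sensor."""
    x0, y0 = rng.random(), 0.0
    x1, y1 = rng.random(), 1.0
    dx, dy = x1 - x0, y1 - y0
    for sx, sy in sensors:
        # parameter of the closest point on the segment to the sensor
        t = ((sx - x0) * dx + (sy - y0) * dy) / (dx * dx + dy * dy)
        t = max(0.0, min(1.0, t))
        if math.hypot(x0 + t * dx - sx, y0 + t * dy - sy) <= r_detect:
            return True
    return False

def prob_detect(sensors, r_detect, n=2000, seed=1):
    """Monte Carlo estimate of the probability of detection over n
    independent, seeded (repeatable) trials."""
    rng = random.Random(seed)
    return sum(run_trial(sensors, r_detect, rng) for _ in range(n)) / n
```

Sweeping sensor positions, detection radii, and trial counts in such a loop is the kind of sensitivity and stress-testing study the environment automates.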


Leah Jones

Student, Virginia Tech
“Monitoring Supervised Learning Models with Statistical Control Charts”

Speaker Bio: 

 

Leah Jones is a second-year Statistics PhD candidate at Virginia Tech, advised by Dr. Stephanie DeHart. She earned her Bachelor’s degree in Statistics from Baylor University and her Master’s degree in Statistics from Virginia Tech. Her research focuses on statistical methods for monitoring and improving the reliability of data-driven models, with an emphasis on post-deployment performance in applied settings. Leah is passionate about translating rigorous statistical work into clear, actionable insights and enjoys collaborating with interdisciplinary teams to support research and decision-making.

Abstract: 

 

Machine learning (ML) is being integrated into a growing number of national security and defense systems, creating a need for continuous post-deployment monitoring to maintain reliable performance. Performance may deteriorate when the input data distribution changes (data drift) or when the underlying input-output relationship evolves (concept drift). This poster presents introductory considerations for applying control charting methods to monitor ML-enabled regression models and detect these forms of drift. We discuss where control charting concepts transfer directly and where they break down due to differences between traditional statistical and modern ML methods. We also summarize concept drift monitoring methods from prior literature and illustrate how they can be incorporated into control-chart frameworks. As a motivating case study, we focus on supervised learning with continuous outcomes.
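The most direct transfer from classical SPC is an individuals chart on regression residuals: estimate limits from an in-control baseline window, then flag post-deployment residuals that fall outside them. A hedged sketch with illustrative function names and the conventional three-sigma limits, not the poster's full treatment:

```python
def shewhart_limits(residuals_baseline, k=3.0):
    """Control limits from an in-control baseline window of residuals
    (y_true - y_pred), using the sample mean and standard deviation."""
    n = len(residuals_baseline)
    mean = sum(residuals_baseline) / n
    sd = (sum((r - mean) ** 2 for r in residuals_baseline) / (n - 1)) ** 0.5
    return mean - k * sd, mean + k * sd

def out_of_control(residuals_live, lcl, ucl):
    """Indices of post-deployment residuals that signal possible drift."""
    return [i for i, r in enumerate(residuals_live) if not (lcl <= r <= ucl)]
```

A chart like this reacts to concept drift (the input-output relationship changing, which inflates residuals) but not necessarily to data drift that leaves residuals small, which is one reason the poster treats the two forms of drift separately.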


Liam Wills

Cadet Researcher, United States Military Academy
“Humans vs. Machines in Aerial Search Missions: An Analysis”

Speaker Bio: 

 

Together with Cadets Nonthayod Pacharapongphalin and Caleb Watson, and under the supervision of research faculty COL James Bluman, Dr. John Steckenrider, and Dr. Joseph Dorta, Cadet Wills conducted this study, which builds on previous research by former Cadets at the United States Military Academy at West Point.

Abstract: 

 

Unmanned Aircraft Systems (UAS) are a primary tool for surveillance operations and aerial search missions. Currently, UAS are operated by both human operators and autonomous algorithms, depending on mission details and constraints. In this study, we compare the performance of human operators against a set of autonomous algorithms developed at USMA in the context of an aerial search mission. Participants play a series of short video games (less than 5 minutes each) in which they are presented with a two-dimensional rectangular domain filled with particles of varying sizes. The weight of a particle represents the likelihood of a target being at that location. Participants must then steer the flight path of a drone using a mouse, with the goal of flying over and observing as many particles as possible in the shortest period of time. The performance of human operators on a series of terrain-based maps (involving a mix of terrain such as roads, fields, and wooded areas) is measured by recording the number and weights of particles discovered over time, yielding a cumulative distribution function (CDF). The area above the CDF is a measure of the expected value of completing the mission. The CDFs and their expected values are compared between human operators and the autonomous systems to determine whether humans can beat the machines at aerial search.
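The scoring described above can be made concrete as follows. This is a sketch with assumed data structures, not the study's actual code: F(t) is the fraction of total particle weight discovered by time t, and the area above F over the mission horizon shrinks as weight is found earlier (any weight never found contributes the full horizon):

```python
def area_above_cdf(discovery_times, weights, horizon):
    """Score one search run. discovery_times[i] is when particle i was
    overflown (None if never found) and weights[i] is its probability
    mass. Returns the area above the discovery CDF on [0, horizon];
    lower is better."""
    total = sum(weights)
    events = sorted((t, w) for t, w in zip(discovery_times, weights)
                    if t is not None)
    area, found, prev_t = 0.0, 0.0, 0.0
    for t, w in events:
        if t > horizon:
            break
        # CDF is flat at found/total on [prev_t, t)
        area += (1.0 - found / total) * (t - prev_t)
        found += w
        prev_t = t
    area += (1.0 - found / total) * (horizon - prev_t)
    return area
```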


Lucas Borini

CDT, United States Military Academy
“Identification of Informal Modal Transition Points”

Speaker Bio: 

 

  • CDT at the United States Military Academy
  • Operations Research Major
  • 2 years of independent research experience
  • Worked with the Engineering Research and Development Center (ERDC) to assist in identifying and classifying informal modal transition points for Army Logistics Operations

Abstract: 

 

Modal transition points are locations where vehicles can move from one mode of transportation to another, usually at formal sites such as railyards or docks. These transitions require significant time and infrastructure, making them vulnerable and potentially unsuitable for operations in contested environments. The purpose of this project is to identify informal and unconventional locations that could serve as alternative modal transition points for the Army when conventional sites are inaccessible. Using a geospatial filtering methodology, the model evaluates terrain based on land classification, average elevation, proximity to roads, and direct access to rail lines. Candidate locations are then clustered to prevent oversaturation and ensure spatial diversity. Applying this approach to Rhineland-Palatinate, Germany produced 927 feasible transition sites, including 318 existing parking lots and 609 undeveloped land areas. Findings indicate that a substantial number of nontraditional locations meet baseline feasibility criteria, offering additional routing flexibility. The analysis is limited by reliance on average elevation values, which may overlook micro-terrain obstacles or unfavorable slopes; future work should incorporate higher-resolution topographic data and refine the weighting of terrain factors. Despite these limitations, the identified sites provide practical, unconventional alternatives to formal transition points, supporting more adaptable and resilient routing strategies in contested terrain.
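The filter-then-cluster pipeline can be sketched as below. Field names and thresholds are illustrative placeholders, not the study's actual criteria or weights, and the thinning step stands in for whatever clustering the model actually uses:

```python
import math

FEASIBLE_CLASSES = frozenset({"parking", "undeveloped"})

def filter_sites(cells, max_elev=500.0, max_road_km=1.0, max_rail_km=2.0):
    """Feasibility filter over terrain cells, mirroring the criteria
    named in the abstract: land classification, average elevation,
    road proximity, and rail access."""
    return [c for c in cells
            if c["land_class"] in FEASIBLE_CLASSES
            and c["elev"] <= max_elev
            and c["road_km"] <= max_road_km
            and c["rail_km"] <= max_rail_km]

def thin_clusters(sites, min_sep_km=5.0):
    """Greedy spatial thinning: keep a site only if it lies at least
    min_sep_km from every already-kept site, preventing oversaturation
    and ensuring spatial diversity."""
    kept = []
    for s in sites:
        if all(math.hypot(s["x"] - k["x"], s["y"] - k["y"]) >= min_sep_km
               for k in kept):
            kept.append(s)
    return kept
```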


Mark Herrera

IDA
“OED Cyber Lab Demonstrations”

Speaker Bio: 

 

I’m a physicist who makes a living by turning complex, data-heavy questions into objective, actionable analysis and test recommendations. My focus areas include the evaluation of naval platforms, mine warfare, and cybersecurity.

Abstract: 

 

The OED Cyber Lab poster session covers two interactive, live demonstrations of technologies that IDA has developed to support staff and community technical development:

(1) The ARINC 429 standard and its inherent lack of security, demonstrated with a hardware-in-the-loop (HITL) simulator that shows possible mission effects of a cyber compromise. ARINC 429 is a ubiquitous data bus for civil avionics, enabling safe and reliable communication between devices from disparate manufacturers. However, ARINC 429 lacks any form of encryption or authentication, making it an inherently insecure communication protocol and rendering any connected avionics vulnerable to a range of attacks.

(2) Demonstrations of RF wireless hacking concepts such as passive scanning, active injection, and the use of software-defined radios to flexibly sample the RF spectrum.

Stop by, ask questions, and play with our flight simulator and our software-defined radio!


Meghan Hall

RISC Intern, UMD ARLIS
“Supply Chain Illumination for the Quantum Computing Sector”

Speaker Bio: 

 

I am a student at the Georgetown Walsh School of Foreign Service, where I am pursuing an M.A. in Security Studies with a concentration in Technology and Security and a B.S.F.S. in International Politics with a concentration in Security. I have been a part of the University of Maryland's (UMD) Applied Research Laboratory for Intelligence and Security (ARLIS) Research for Intelligence and Security Challenges (RISC) internship program since June 2025.

Abstract: 

 

This University of Maryland (UMD) Applied Research Laboratory for Intelligence and Security (ARLIS) Research for Intelligence and Security Challenges (RISC) intern team's presentation aims to capture the corporate, investor, and supplier relationships in the quantum computing sector for the Department of War's (DoW) Office of Research and Engineering (R&E) through analytical tools and data visualization methods. Using data available on the PitchBook platform and ArcGIS visualization, the RISC team identified distinct supply chain layers ranging from raw and critical minerals to instrument suppliers and quantum computing manufacturers. Coupled with analysis of domestic versus foreign supply chain reach and investment origin, the poster highlights areas of R&D and fiscal concentration. This layered approach showcases critical supply chain connections, provides insight into emerging technology backers, and offers an encompassing look at the relationships and potential risks within the quantum computing sector.

The raw and critical minerals layer of the project maps global and domestic quantum supply chain relationships from critical mineral extraction through manufacturing and investment. The domestic manufacturing layer highlights areas of national activity and contrasts domestic and international R&D hubs, showing areas of supplier integration or the lack thereof. The investor layer focuses on mapping investment activity in quantum computing to improve visibility for the government sponsor. This integrated dataset highlights geographic concentrations, foreign dependencies, and capital origin, allowing for the identification of structural vulnerabilities and supply chain chokepoints. The resulting visualization and geospatial product offers actionable insight into areas of strategic risk and resilience within the quantum computing sector.

The presentation complements the poster by providing greater depth on the intern team's findings, walking through the process from project scoping and data synthesis through analysis and visualization. It also explains the replicability of this project for other emerging technologies and critical supply chains while emphasizing the necessity of supply chain illumination and its impact on US national security.


Miriam Armstrong

Research Staff Member, Institute for Defense Analyses
“Mini Tutorial: How to Run Effective Focus Groups”

Speaker Bio: 

 

Miriam Armstrong is a Human Factors Researcher at the Institute for Defense Analyses. She received her Ph.D. in Human Factors Psychology in 2021 from Texas Tech University. She uses her expertise in experimental design, survey methods, human performance measures, human-machine interaction, and operational testing to support the evaluation of human-systems integration for the Director of Operational Test and Evaluation.

Abstract: 

 

This 90-minute mini-tutorial introduces the fundamentals of designing and conducting effective focus groups. Whether you are new to formal research or an experienced researcher looking to expand your methodology, this session covers the essentials: defining a clear purpose, crafting engaging questions, and applying a structured framework for data analysis. You will also learn how to set up your workflow, apply practical facilitation techniques to manage group dynamics, avoid common pitfalls, and use AI tools to help with planning and data analysis. Attendees will leave with a repeatable framework and actionable tips to conduct their focus groups with confidence.


Nicole Hutchison

Research Associate Professor, Virginia Tech National Security Institute
“Readying the T&E Workforce to Leverage Digital Engineering”

Speaker Bio: 

 

Dr. Nicole Hutchison’s research focuses on workforce development. Prior to joining VTNSI, Nicole was a senior research scientist at the Systems Engineering Research Center (SERC). She personally helped collect most of the data and conducted substantial analyses into the skills, career paths, and personal enabling characteristics of systems engineers that comprised the Helix project. Her work on career paths formed the basis of her dissertation. She has over 19 years of analytic and systems engineering experience. At the SERC, she has developed competency models in system, mission, and digital engineering and is currently working on a competency model in artificial intelligence. Nicole is currently the Editor in Chief of the Systems Engineering Body of Knowledge (SEBoK). Prior to the SERC, she supported many branches of the federal government, including the Departments of Defense, Homeland Security, and Health and Human Services, through her prior role at Analytic Services (ANSER) in Virginia. She is a Certified Systems Engineering Professional (CSEP) through INCOSE and has over three dozen publications on systems engineering. Nicole is proudly married to a US Army officer and they have two boys who keep them on their toes.

Abstract: 

 

Digital Engineering is reshaping how systems are conceived, designed, and delivered across the Department, the services, and industry, but Test & Evaluation (T&E) often engages digital artifacts too late in the lifecycle to fully exploit their value. As model-based and data-centric practices mature, the opportunity is not merely to digitize test documentation but to embed T&E thinking directly into the digital development fabric.

In this talk, Dr. Hutchison explores how the T&E community can transition from consumers of digital artifacts to active participants in model-based development environments. She examines how digital engineering methods—MBSE, digital twins, integrated data architectures, and executable system representations—can reduce test planning cycle time, increase traceability, surface risk earlier, and improve mission-aligned validation.

By reframing Digital Engineering as a knowledge and workforce transformation challenge, not simply a tooling upgrade, this presentation outlines strategies for educating and empowering the T&E workforce to accelerate planning, enable earlier verification insight, and contribute more directly to lifecycle agility.


Olivia Beck

Statistician, Sandia National Labs
“Metadata Standards for Nuclear Deterrence Test Data”

Speaker Bio: 

 

Olivia Beck is an R&D Statistician at Sandia National Labs (SNL) with a PhD in Statistics and Social Data Analytics from Penn State University. She has experience in theoretical and applied statistics as well as machine learning methods. She is part of the Digital Transformation team at SNL working on establishing data governance practices for nuclear deterrence. Prior to joining SNL, her graduate work focused on quantifying influence in network data.

Abstract: 

 

Robust data governance practices are essential in modern digital engineering infrastructure to support informed decision-making and collaboration in complex environments such as defense R&D. While metadata compliance is not synonymous with AI readiness, establishing data governance practices is a critical step toward achieving scalable AI readiness.

In this presentation, we discuss Sandia National Labs’ initiatives in Nuclear Deterrence (ND) to establish metadata standards for ND test data. We introduce the ND Data Standards, which adapt existing frameworks (FAIR guidelines, PRIDE metadata definitions, etc.) to the ND context, and discuss strategies for enforcing data governance to guarantee data interoperability, quality, discoverability, and usability. Our ongoing collaboration with subject matter experts (SMEs) ensures that we capture the nuanced requirements of ND R&D-related data needed for test design and program evaluation.

Although designated as ND Metadata Standards, the insights gained from our experience in developing these standards and engaging with SMEs are applicable to broader R&D contexts within the defense and aerospace industries. This presentation aims to foster dialogue among T&E practitioners, sharing best practices that enhance the usability of test and evaluation data beyond its original use case, as well as addressing the challenges faced.


Pratyusha Sarkar

PhD in Statistics, University of Maryland Baltimore County
“Equation Discovery in Noisy Dynamical Systems”

Speaker Bio: 

 

I am currently a PhD student in Statistics at the University of Maryland Baltimore County, where my research is conducted under the supervision of Dr. Snigdhanshu (Ansu) Chatterjee. My primary project involves developing data-driven frameworks to discover underlying differential equation systems in stochastic dynamical models, particularly when working with sparse observational data. By utilizing local polynomial regression for state reconstruction and derivative estimation, I enable the identification of model structures and parameters without the need for probabilistic priors, a framework I have successfully demonstrated on exponential growth curves and the Lorenz-63 system.

Beyond my central project, I am studying the convergence of algorithms for structure learning in equilibrium networks using $L_{1}$-regularized estimation and investigating efficient numerical strategies for solving matrix Riccati equations. My research interests also extend to the identifiability of finite mixtures of bivariate distributions. Prior to my doctoral studies, I earned an M.Sc. in Statistics from the Indian Institute of Technology Kanpur, where I worked under the supervision of Dr. Dootika Vats on projects involving Bayesian Phylogenetic MCMC and high-dimensional Bayesian prediction models using LOO-CV. I am proficient in R, Python, C, and LaTeX.

Abstract: 

 

We propose a local polynomial regression framework for discovering governing equations of dynamical systems from noisy time-indexed observations. The latent state is observed indirectly with Gaussian measurement error, while its evolution follows an underlying differential equation. As a motivating example, we consider an exponential growth model, $\dot{\mu}(t) = \mu_1 \mu(t)$, and use local polynomial smoothing to obtain closed-form local weighted least squares estimators of both the state and its time derivative, where the intercept and slope provide pointwise estimates of the trajectory and dynamics. This approach enables recovery of the underlying differential relationship without imposing global parametric assumptions. We further extend the methodology by incorporating random effects in the growth parameters to accommodate subject-specific heterogeneity in initial conditions and growth rates. Simulation studies demonstrate accurate recovery of the underlying dynamics under moderate noise levels, illustrating the flexibility and interpretability of the proposed method for equation discovery in noisy longitudinal and repeated-measures settings.
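The state-and-derivative estimator at the heart of this approach can be sketched as generic local linear smoothing with a Gaussian kernel; the bandwidth and simulated data below are illustrative, not the paper's settings:

```python
import math

def local_linear(t_obs, y_obs, t0, h):
    """Local linear weighted least squares at t0 with Gaussian kernel
    bandwidth h. Returns (mu_hat, dmu_hat): the intercept estimates the
    state mu(t0) and the slope estimates its time derivative."""
    w = [math.exp(-0.5 * ((t - t0) / h) ** 2) for t in t_obs]
    # weighted moments of the centered design and response
    S0 = sum(w)
    S1 = sum(wi * (t - t0) for wi, t in zip(w, t_obs))
    S2 = sum(wi * (t - t0) ** 2 for wi, t in zip(w, t_obs))
    T0 = sum(wi * y for wi, y in zip(w, y_obs))
    T1 = sum(wi * (t - t0) * y for wi, t, y in zip(w, t_obs, y_obs))
    det = S0 * S2 - S1 * S1
    mu_hat = (S2 * T0 - S1 * T1) / det    # intercept: state estimate
    dmu_hat = (S0 * T1 - S1 * T0) / det   # slope: derivative estimate
    return mu_hat, dmu_hat
```

Under the exponential growth model, the pointwise ratio dmu_hat / mu_hat estimates the growth parameter without assuming a global parametric form, which is the sense in which the equation is "discovered" from data.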


Rafi Soule, PhD

Project Scientist, VMASC at Old Dominion University
“Integrating RAG, HCD, and PD in MBSE for Mission Problem Framing”

Speaker Bio: 

 

Rafi Soule is currently a Project Scientist at the Virginia Modeling, Analysis and Simulation Center (VMASC) at Old Dominion University (ODU). During her doctoral program, she has supported a spectrum project and served as a Model-Based Systems Engineer for the Spectrum Advanced Technology and Training Lab (SATTL) at the Old Dominion University Research Foundation (ODURF). In this role, she leads MBSE and UAF modeling in Cameo Systems Modeler and develops architecture artifacts that strengthen mission alignment and traceability. She also served as a Graduate Research Assistant in the Engineering Management and Systems Engineering department at ODU. Her research focuses on integrating MBSE, Digital Thread, Human-Centered Design, and Participatory Design into early-phase Mission Engineering.

Abstract: 

 

The initial phase of Mission Engineering (ME) is critical for defining mission problems or opportunities, but traditional methods in defense and space contexts rely on manual processes that are time-intensive and prone to knowledge gaps. This article introduces a Model-Based Systems Engineering (MBSE) framework that integrates Retrieval-Augmented Generation (RAG) with Human-Centered Design (HCD) and Participatory Design (PD) methodologies to improve early mission problem framing. Our approach embeds stakeholder workshops and surveys into a structured modeling workflow, where RAG dynamically retrieves relevant technical and operational knowledge to augment human insights (NVIDIA, 2025). This unified process yields demonstrable practitioner benefits: it produces more accurate mission problem definitions, fosters stakeholder consensus on objectives, and shortens the time required to reach alignment. We illustrate applications in defense and aerospace mission scenarios (INCOSE, 2007), showing how a single MBSE model can unify diverse information sources and drive clearer, faster decision-making.


Randall McCutcheon

Chief of Defence Force Fellow, Australian Defence Force
“Assuring Complex and Critical Artificially Intelligent Systems”

Speaker Bio: 

 

Randall McCutcheon joined the Royal Australian Air Force through the Australian Defence Force Academy and completed his degree in Aerospace Engineering at the Royal Melbourne Institute of Technology. Randall performed roles as a junior Engineering Officer before completing the Flight Test Engineer course at the Empire Test Pilots' School at Boscombe Down in the United Kingdom. Randall then undertook flight test roles, including F-35 Developmental and Operational Test and Evaluation at Edwards Air Force Base, California. Randall was appointed a Member of the Order of Australia for exceptional service in support of the Australian F-35 program and was awarded the United States Defense Meritorious Service Medal for supporting the United States Department of Defense through Initial Operational Test and Evaluation of the F-35. Randall has over 320 hours in more than 30 different aircraft types as a Flight Test Engineer.

Abstract: 

 

Research problem: The purpose of the study was to explore the concept of machine-learning-based artificial intelligence providing assurance for a physical system. Research question: Can a machine-learning-based monitor be used as cognitive instrumentation to provide assurance for an artificial intelligence system? Literature review: The researchers reviewed literature across the areas of test and evaluation, system monitors, real-time assurance, and machine-learning-based anomaly detection. AI assurance has been called for to maintain lower levels of risk through confidence in performance, understanding of failure modes, and the absence of vulnerabilities. System monitoring can support through-life assurance and has been used in several different areas. Real Time Assurance is a recognised industry standard that provides system monitoring functions for critical applications where safety requires a higher guarantee. Its functions need to be expressed in a formalised manner for traceability, with reversion to a known, certified system to ensure safety. Formal encoding may ultimately keep a system safe, but its rigid structure restricts the possibility of deeply instrumenting a system. Deep instrumentation could potentially expose issues before they manifest in a safety trip. Methodology: The research was experimental, using simulated data from previous research. We developed a simple multi-layer perceptron Assurance AI model (AAI) to monitor a baseline machine-learning-based model. The AAI used input-output pairs from the baseline as features, with binary categorised ground-truth performance of the baseline as the target variable. Results and conclusions: The AAI achieved an overall accuracy of 0.78 against the simulated data and produced an apparent response to a manually injected anomaly in the baseline input.
This sensitivity demonstrates the potential for this type of cognitive instrumentation to identify vulnerabilities or threats that have not yet manifested in performance degradation. The AAI did, however, show a false positive rate of 0.216, which would limit field utility in its current form. This research suggests ML-based cognitive instrumentation can provide an additional layer of assurance that traditional, reactive, formally encoded Real Time Assurance cannot achieve. The research was limited to simple machine learning models and limited datasets. Future research could explore additional methods of machine-learning-based cognitive instrumentation to reduce false-positive rates to more suitable levels.
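The reported figures (accuracy 0.78, false positive rate 0.216) are standard confusion-matrix quantities for a binary monitor whose positive class is "the baseline model is misbehaving". A generic sketch of those definitions, not the study's code:

```python
def monitor_metrics(y_true, y_pred):
    """Accuracy and false positive rate for a binary assurance monitor.
    y_true: ground-truth misbehaviour flags; y_pred: monitor alarms."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr
```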


Ryan Monagle

Student, United States Military Academy
“Deployment of Single Board Computer Mesh Networks for Remote Cyber Operations”

Speaker Bio: 

 

Ryan Monagle is an Operations Research major at the United States Military Academy who is passionate about graph theory, data science, and finance. At the Academy, he is a sector leader on the Asset Management Team and conducts research with the Department of Mathematical Sciences. He is part of the SOCOM Ignite program, doing research for special operations units throughout all branches of the military. He has also done research on demand forecasting and vehicle location optimization with Zipcar, a car-sharing company based in Boston. Over his summers he has completed the U.S. Army’s Air Assault School at West Point and interned at MIT Lincoln Laboratory in Lexington, Massachusetts working on optimal antenna placement for drone jamming. His personal interests include finance and investing. Ryan currently serves as the operations NCO for Charlie Company, First Regiment and hopes to branch Cyber and eventually serve in Functional Area 49, Operations Research and Systems Analysis.

Abstract: 

 

Modern military operations often benefit from suppression of the enemy’s cyber capabilities. Devices capable of suppressing internet connection and communication exist but are limited by their range, longevity, electromagnetic signature, and cost. We propose that a mesh network of single-board computers (SBCs) acting as cyber devices can extend the range and longevity of cyber capabilities without significantly increasing mission cost or detectability. This project presents a two-phase decision-support framework for designing and evaluating the placement of distributed commercial off-the-shelf SBCs within a mesh network. The first phase uses integer programming to solve a set-cover problem and find efficient placement options that minimize the count and detectability of SBCs while satisfying full-coverage, communication, and stealth constraints. The second phase applies discrete event simulation to evaluate candidate solutions under uncertainty by estimating reliability, battery endurance, and the likelihood of maintaining cyber effects over extended operations. This framework allows decision-makers to plan more effectively to integrate cyber operations into their missions and have confidence in the reliability of their devices.
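The coverage core of phase one can be illustrated with the classic greedy approximation to set cover: repeatedly pick the candidate SBC site that covers the most still-uncovered targets per unit cost. The study itself uses exact integer programming with communication and stealth constraints; this sketch shows only the coverage objective, with invented names:

```python
def greedy_set_cover(targets, candidates, cost):
    """Greedy set-cover placement.

    targets: set of target identifiers to cover.
    candidates: dict mapping site name -> set of targets it covers.
    cost: dict mapping site name -> placement cost (e.g., detectability).
    Returns the chosen sites in selection order."""
    uncovered = set(targets)
    chosen = []
    while uncovered:
        best = max(candidates,
                   key=lambda c: len(uncovered & candidates[c]) / cost[c])
        if not uncovered & candidates[best]:
            raise ValueError("remaining targets cannot be covered")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen
```

The greedy rule is a ln(n)-factor approximation; an integer program, as used in the project, can certify the true minimum while also encoding the full-coverage, communication, and stealth constraints as linear inequalities.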


Sam Procter

Principal Architecture Researcher, Software Engineering Institute
“Probabilistic Verification for Next-Generation Certification”

Speaker Bio: 

 

Sam Procter is a Principal Architecture Researcher at the SEI. Sam’s primary research interest is developing tools that make it easier for people to build safe, correct systems. His research and tooling use system models in a range of formats and at different levels of abstraction: from high-level usage specifications (in, e.g., SysML), to precise architectural descriptions (in, e.g., AADL), to formal models of behavior (in, e.g., TLA+). Prior to working at the SEI, he received his PhD in Computer Science from Kansas State University, where he was a member of the SAnToS Laboratory. You can contact him at sprocter@sei.cmu.edu or visit https://samprocter.com/research

Abstract: 

 

Probabilistic Verification for Next-Generation Certification (ProVer-Cert) is a new, experimental project being conducted at Carnegie Mellon University’s Software Engineering Institute. The goal of the effort is to evaluate the use of (a) formal, probabilistic analysis techniques to support (b) aspects of a potential shift in the certification of avionics software from a process-based approach to one that is based on properties of the system itself.

(a) These techniques use probabilistic model checking to establish confidence intervals over claims that certain system properties (e.g., the system meets its timing requirements) hold. To do so, they rely on data that can be derived from common system evaluation activities like testing, simulation, or real-world use. The project will evaluate whether this use of standard system evaluation data allows our approach to deliver significant benefits without requiring dramatic and potentially disruptive changes to existing evaluation approaches.

(b) Which evaluation approach to use is often guided by domain-specific certification standards. We are using software from an example avionics system, which is governed by DO-178C in the United States (as well as Canada and Europe). While DO-178C and its predecessors have been the foundation of an impressive safety record, they have drawbacks. In response to these issues, some users of the standard have proposed a new style of certification, termed the “Overarching Properties,” which is focused on establishing claims about the system itself, rather than about the development process used to create the system, as in DO-178C. Our goal is that the probabilistic analysis techniques can bring some of the advantages of DO-178C to the Overarching Properties while avoiding some of its disadvantages.
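To make (a) concrete: the simplest instance of turning pass/fail evaluation data into a quantified confidence claim is a binomial confidence interval on a property-satisfaction probability. This sketch uses a standard Wilson score interval, not ProVer-Cert's actual machinery, but it illustrates the flavor of "claim with confidence bounds derived from test, simulation, or fielded-use data":

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for the probability that a system property
    holds, from n evaluation runs in which it held `successes` times.
    z=1.96 gives an approximate 95% confidence interval."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

For example, a property that held in 98 of 100 runs yields an interval of roughly (0.93, 0.99), which is the shape of claim a property-based certification argument could consume.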

Though the project is just beginning and small in scope (it is only scheduled for US Government Fiscal Year 2026), we are interested in presenting the approach to practitioners and other experts in order to solicit feedback, ideas for improvement, and potential concerns or risks, and to identify collaborators who may be interested in further discussions or future evaluations.

————-

Copyright 2026 Carnegie Mellon University.

This material is based upon work supported by the Department of War under Air Force Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.

The opinions, findings, conclusions, and/or recommendations contained in this material are those of the author(s) and should not be construed as an official US Government position, policy, or decision, unless designated by other documentation.

DM26-0121


Sam Donald

Data Scientist, PNNL
“Incorporating LLMs into the Bayesian Optimization Process for Airfoil Design”

Speaker Bio: 

 

Sam Donald is a data scientist at the U.S. Department of Energy’s Pacific Northwest National Laboratory (PNNL), with a focus on applying modern machine learning techniques to support domain scientists. His research areas encompass graph neural networks, reinforcement learning, and large language models, with recent contributions spanning synthetic cyber-attack generation with targeted traits, automated hypothesis generation for genomic experiments, and uncertainty quantification of aerosol optical models. Sam holds a Master of Science in Computer Science from Virginia Tech and a Bachelor of Engineering in Mechatronics from the University of Canterbury, New Zealand, with previous work experience as an aerospace avionics systems engineer.

Abstract: 

 

Large language models (LLMs) present a novel opportunity to incorporate human expertise directly into analytical optimization processes. This research explores the integration of modern LLMs with classical Bayesian optimization techniques to solve engineering design challenges. Through a case study in designing a NACA airfoil to optimize lift-to-drag ratio during ascent, we investigate how domain-specific knowledge can aid sampling strategies in multi-dimensional optimization problems. In our current implementation, we focus on language-guided initialization of the optimization process and outline how the same approach can be extended to candidate generation and surrogate modeling. This work establishes how human expertise, expressed through natural language, can effectively complement traditional optimization frameworks, particularly for computationally expensive simulations.
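A language-guided initialization of this kind might be sketched as follows. This is a hedged illustration, not the authors' implementation: the LLM call is stubbed out, and the seed values, parameter names, and bounds are hypothetical.

```python
import random

def llm_suggested_seeds():
    """Stand-in for an LLM call. In a real workflow, a prompt such as
    'suggest promising NACA 4-digit parameters for high L/D at low
    angle of attack' would return candidate (camber, camber position,
    thickness) tuples. The values here are hypothetical."""
    return [(2, 4, 12), (4, 4, 12), (2, 3, 10)]

def init_design(n_total, bounds, seed=0):
    """Language-guided initialization: merge LLM-suggested seed points
    with random space-filling points to start Bayesian optimization."""
    random.seed(seed)
    pts = list(llm_suggested_seeds())
    while len(pts) < n_total:
        pts.append(tuple(random.uniform(lo, hi) for lo, hi in bounds))
    return pts

bounds = [(0, 9), (1, 9), (6, 24)]  # camber, camber position, thickness
design = init_design(8, bounds)
print(len(design), design[0])
```

The expert-suggested points bias early sampling toward promising regions, while the random fill keeps some coverage of the rest of the design space.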


Sara Wilson

NASA
“Standardized Test Method for Artemis Spacesuit Gloves”

Speaker Bio: 

 

Dr. Sara Wilson is the Statistician for the NASA Engineering and Safety Center (NESC) and Lead for the NASA Statistical Engineering Team. She serves as an Agency-wide statistical expert and supports the NESC in performing independent testing, analysis, and assessments of NASA’s high-risk projects to ensure safety and mission success.

Abstract: 

 

NASA’s Artemis spacesuit gloves will be the first line of protection used to shield the astronaut’s hands from the environments encountered during extravehicular activity (EVA). The Artemis missions will involve more extreme environments than those experienced on the International Space Station; therefore, glove development, verification, and validation require new test methods. A standardized test procedure was developed to compare abrasion resistance between lunar EVA glove fabrics using a rotary tumbler. Validation testing and statistical analysis of the newly developed tumbler abrasion test method were conducted, and results are being shared with suit vendors.


Shane Hall

Division Chief, US Army Test and Evaluation Command
“Workforce Education for Data Analytics”

Speaker Bio: 

 

Shane Hall graduated from Penn State in 2011 with a Bachelor’s in Statistics and a Master’s in Applied Statistics. He started his civilian career at US Army Public Health Command as its Command Statistician. In 2016 he transferred to the US Army Test and Evaluation Command’s Analytics Team. In 2020, he became the division chief for the Analytics & AI Division.

Abstract: 

 

OEC’s Analytics Team is a small group with statistical and data science expertise that supports an entire command of engineers, computer scientists, former military members, and project managers. Although we provide direct support, it is still our duty to improve the analytics knowledge of the command, as our team does not see all the data. The team has worked extremely hard over the past couple of years to improve the analytical skills of evaluators across AEC. Ultimately, we hope this talk generates discussion on how OTAs and government organizations, whose members are predominantly not experts in analytics, can grow the analytical expertise in their organizations.


Shelby Holdren

Autonomy and CV T&E Lead, CDAO
“CDAO Enterprise T&E Capabilities and Resources”

Speaker Bio: 

 

Shelby Holdren is the Autonomy Test & Evaluation Lead under R&E CDAO DCDAO for Enterprise. Her portfolio includes the development of Autonomy TEVV resources, including the Joint AI Test Infrastructure Capability (JATIC), and direct support to programs testing autonomous systems. Shelby is on an Intergovernmental Personnel Act (IPA) assignment to CDAO; prior to her assignment, she was part of the T&E team for an autonomous UAV capability, Smart Sensor.

Her education includes a Master of Science in Data Science from Johns Hopkins University and a B.S. in Mechanical Engineering and Aerospace Engineering from the University of Virginia. Her professional career has involved the evaluation of integrated systems and autonomous capabilities for the Department of War and the Department of Transportation.

Abstract: 

 

CDAO provides enterprise capabilities designed to enable programs to evaluate autonomous capabilities and computer vision models. These capabilities are integrated into workflows across the Department of War and enable the rapid development and testing of critically needed systems for the warfighter.

This presentation will provide a high-level overview of how model and autonomy developers and independent testers alike leverage CDAO resources to evaluate and improve their systems using automated and iterative T&E workflows.

The presentation will cover the Joint AI Test Infrastructure Capability (JATIC) products, the RAI Toolkit, and the repository of T&E guidance and codebooks, as well as the strategy used internally within the DCDAO Enterprise Autonomy Portfolio.


Stacey Allison

Research Staff Member, IDA
“CMOCKW: Exploring operational decisions and outcomes in cyberspace”

Speaker Bio: 

 

Dr. Lee Allison has worked at IDA for 8 years as a cyber analyst and subject matter expert across a wide variety of programs, including helicopters, UAVs, and missile programs. In 2024 he began contributing to the CMOCKW effort and was able to assist with the first ever execution of the wargame for an international customer, the Royal Moroccan Armed Forces, during USAFRICOM’s yearly African Lion exercise. Since then, Dr. Allison has been heavily involved in the development of additional CMOCKW capabilities and in execution of a second wargame for JFHQ-DODIN.

 

Co-authors: Mr. Erick McCroskey (IDA) and Dr. Jason Schlup (IDA)

Abstract: 

 

DOT&E developed the Cyber Maneuvers, Operations, and Combat Knowledge Wargame (CMOCKW) to explore cyberspace decisions at the operational level of war, particularly concepts for force employment and network defensive postures, via a realistic wargame replicating the uncertainty and imperfect information of cyberspace operations. During play, players on both the friendly-force (BLUFOR) and enemy-force (OPFOR) sides make decisions and work through the subsequent results. The insights gained following CMOCKW execution are intended to complement the data collected during other cyber assessments such as tactical data from Cyber Red Team attacks, associated Cyber Defender detections and responses, and strategic data from observing cross-CCMD and cross-Service interactions by senior leaders.

This poster session will give an overview of CMOCKW’s purpose, development, challenges, game play, components, and recent executions. There will also be a game board set up for attendees to interact with the game and get hands-on experience.



Stacey Allison

Research Staff Member, IDA
“Cyber Maneuvers, Operations, and Combat Knowledge Wargame (CMOCKW)”

Speaker Bio: 

 

Dr. Lee Allison has worked at IDA for 8 years as a cyber analyst and subject matter expert across a wide variety of programs, including helicopters, UAVs, and missile programs. In 2024 he began contributing to the CMOCKW effort and was able to assist with the first ever execution of the wargame for an international customer, the Royal Moroccan Armed Forces, during USAFRICOM’s yearly African Lion exercise. Since then, Dr. Allison has been heavily involved in the development of additional CMOCKW capabilities and in execution of a second wargame for JFHQ-DODIN.

Abstract: 

 

DOT&E developed CMOCKW to explore cyberspace decisions at the operational level of war, particularly concepts for force employment and network defensive postures, via a realistic wargame replicating the uncertainty and imperfect information of cyberspace operations. During play, players on both the friendly-force (BLUFOR) and enemy-force (OPFOR) sides make decisions and work through the subsequent results. The insights gained following CMOCKW execution are intended to complement the data collected during other cyber assessments such as tactical data from Cyber Red Team attacks, associated Cyber Defender detections and responses, and strategic data from observing cross-CCMD and cross-Service interactions by senior leaders.

The presentation will describe CMOCKW’s intent, core wargame mechanics and rules, and results of recent wargame executions. These results have revealed possible challenges in the operational employment of cyber forces and suggest that iterative use of CMOCKW to explore alternate courses of action or refine existing plans may help broaden the Department’s understanding of how to employ low-density, high-demand cyber forces.


Stephanie DeHart

Associate Professor of Practice, Virginia Tech
“Monitoring Artificial Intelligence-Enabled Systems”

Speaker Bio: 

 

Stephanie P. DeHart is an Associate Professor of Practice and Director of the Statistical Applications and Innovations Group in the Department of Statistics at Virginia Tech. With nearly twenty years of experience, she has collaborated across academia, government, and industry to develop data-driven solutions. She excels at translating complex statistical methods into clear, actionable insights for students, technical teams, and organizational leaders. Her current research focuses on adapting traditional industrial quality and statistical approaches to modern artificial intelligence and machine learning applications.

Anne R. Driscoll is a Collegiate Professor in the Department of Statistics at Virginia Tech where she also serves as Director of the Undergraduate Program. She received her PhD in Statistics from Virginia Tech. Her research interests include statistical process control, design of experiments, and statistics education. Anne maintains a connection with statistical practice through her collaboration with NASA. She is an active member of ASQ and ASA, having held many leadership positions in these organizations.

Abstract: 

 

While artificial intelligence (AI) has become commonplace, attention is shifting from efficiency to resiliency. Recent DoD policies emphasize the sustainability of AI-enabled systems post-deployment; however, best practices for system monitoring remain undefined. To fill this gap, we propose leveraging statistical quality control charts, which the manufacturing sector has successfully utilized for decades. In this presentation, we will provide an overview of control charts and discuss the possibilities, as well as issues, of adapting these techniques to AI-enabled systems.
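The control-chart idea can be sketched with a standard Shewhart individuals chart applied to a monitored model metric. This is a generic illustration, not the presenters' proposed method: the baseline and monitored accuracy values below are hypothetical.

```python
def individuals_control_limits(baseline):
    """Shewhart individuals (I) chart limits from in-control baseline
    data, estimating sigma from the average moving range
    (d2 = 1.128 for subgroups of size 2)."""
    n = len(baseline)
    mean = sum(baseline) / n
    moving_ranges = [abs(baseline[i] - baseline[i - 1]) for i in range(1, n)]
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return mean - 3 * sigma_hat, mean, mean + 3 * sigma_hat

def out_of_control(points, lcl, ucl):
    """Indices of monitored points (e.g., daily model accuracy) that
    fall outside the control limits, signalling possible drift."""
    return [i for i, x in enumerate(points) if not lcl <= x <= ucl]

# Hypothetical in-control accuracy from pre-deployment evaluation
baseline = [0.95, 0.94, 0.96, 0.95, 0.93, 0.95, 0.96, 0.94]
lcl, center, ucl = individuals_control_limits(baseline)
print(out_of_control([0.95, 0.94, 0.85], lcl, ucl))
```

Adapting this to AI-enabled systems raises the issues the talk discusses, such as what to chart when ground truth arrives late or not at all.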



T.J. Wignall

Research Engineer, NASA
“Recent Validation of Low Speed Aerodynamic Analysis for Space Launch System”

Speaker Bio: 

 

Dr. Wignall is an aerospace engineer with the Configuration Aerodynamics Branch at NASA Langley. He primarily works on surrogate modeling and low speed computational simulations for the SLS in support of the Artemis missions.

Abstract: 

 

Low speed aerodynamics work for the NASA Space Launch System (SLS) program, in support of the Artemis missions, focuses on prelaunch analysis, while the vehicle sits on the launchpad next to the mobile launcher, and on the early stages of flight below Mach 0.5. Both experimental and computational tools are used, with integrated loads generally derived from experimental data and distributed lineloads from computational data. This regime is dominated by massively separated flows, which makes computational techniques expensive and can cause unwanted model dynamics in the tunnel. As such, minimal validation has been performed until now. Recent analysis focuses on testing crucial assumptions used during database development. This talk will cover two assumptions that have recently been tested, along with some challenges to uncertainty quantification encountered during database development that are influenced by these assumptions.

The first and primary focus is on Reynolds number effects. Testing and analysis have focused on R = 0.6 x 10^6, while flight and prelaunch conditions are often on the order of R = 10 x 10^6. For a 2D cylinder, this range covers the transition from a supercritical flow regime to a transcritical one. While techniques are used to increase the effective Reynolds number, until now there has not been work to fully capture the differences expected between flight and analysis. Of additional concern is the behavior of the gap flow between the centerbody and the SRBs. Flow between them exhibits biased gap flow due to the Coandă effect, which can attach to either the SRBs or the centerbody, and a relationship is expected between Reynolds number and the severity of this effect. The challenge of quantifying the Coandă effect, and a proposed solution, will be discussed as well.

The second focus of validation efforts has been the quasi-steady-state assumptions used to generate the database. The early stages of flight are characterized by the vehicle accelerating off the pad, an extremely dynamic event in which the total angle of attack goes from 90° to less than 15° in about 5 seconds. As such, some of the flow features that dominate at midrange angles of attack may not have time to form. The most relevant of these is the asymmetric vortex system common on slender bodies at these conditions. Once formed, the system is stable; however, little work has been done on how long the system takes to form or on its effects on integrated loads. The challenge of quantifying the effect of the asymmetric vortices, and a proposed solution, will be discussed as well.


Tiffany Ceasor

Principal Researcher, Technological Society of Applied Research
“Compositional Assurance: Integrating Formal Verification with Statistical T&E”

Speaker Bio: 

 

Tiffany Alexis Ceasor is Technical Director of Data and AI Strategy for a federal government contractor, where she leads AI architecture, strategy, and prototype implementation, along with AI compliance and governance initiatives, for defense and national security programs. She serves as the founder and Principal Researcher at the Technological Society of Applied Research (TSAR), a nonprofit research institute dedicated to implementing emerging technologies for the public good across healthcare, government, national security, agriculture, and environmental conservation. Ms. Ceasor is a Ph.D. candidate at Florida Atlantic University, with dissertation research on SMT-based verification frameworks for Large Language Model compliance and security. Prior to her current roles, she spent five years as an AI/ML Scientist and Engineer at Microsoft.

Abstract: 

 

The proliferation of AI-enabled systems in defense applications presents a fundamental challenge for test and evaluation: how do we provide rigorous assurance for systems whose input spaces are effectively infinite? Traditional statistical T&E methods excel at characterizing system behavior over sampled distributions but cannot guarantee properties hold for all possible inputs. Conversely, formal verification methods can prove properties universally but face scalability limitations with complex neural architectures. This presentation introduces Compositional Assurance, a theoretical framework that strategically integrates formal verification with statistical T&E to achieve layered assurance guarantees exceeding what either approach delivers alone.
The framework rests on a key insight: different system properties admit different verification strategies. Safety-critical properties requiring universal guarantees (e.g., “the system never recommends unauthorized lethal force”) can be formally specified and verified using Satisfiability Modulo Theories (SMT) solvers, which provide mathematical certainty rather than statistical confidence. Performance properties requiring distributional characterization (e.g., “classification accuracy exceeds 95% on operationally representative inputs”) remain the domain of statistical T&E with appropriate uncertainty quantification. The compositional approach partitions system requirements by verification tractability, applies the strongest feasible method to each partition, and combines results into unified assurance arguments.
We present a four-tier assurance hierarchy: (1) Formally Verified properties proven to hold universally via SMT solving; (2) Statistically Bounded properties characterized with quantified confidence intervals; (3) Empirically Tested properties evaluated on representative scenarios without formal guarantees; and (4) Assumed properties accepted based on design rationale or heritage. This hierarchy enables evaluators to communicate assurance levels precisely, replacing binary “passed/failed” determinations with nuanced statements about what has been proven, what has been bounded, and what remains assumed.
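The contrast between the top two tiers can be sketched in a toy example. This is not the framework's implementation: exhaustive checking over a small finite grid stands in for SMT solving (a real solver such as Z3 proves the property over symbolic inputs), and the decision rule and sample data are hypothetical.

```python
import math
import random

def formally_verified(prop, domain):
    """Tier 1 stand-in: exhaustively check a safety property over a
    bounded, discretized input domain (an SMT solver would prove it
    over unbounded/symbolic inputs)."""
    return all(prop(x) for x in domain)

def statistically_bounded(passes, n, z=1.96):
    """Tier 2: 95% lower confidence bound on a pass rate estimated
    from n sampled, operationally representative inputs."""
    p = passes / n
    return p - z * math.sqrt(p * (1 - p) / n)

# Hypothetical rule: authorize only when a threat is present AND
# confidence >= 0.9; the safety claim is "never authorize below 0.9".
def decide(threat, conf):
    return "authorize" if threat and conf >= 0.9 else "hold"

def safe(case):
    threat, conf = case
    return not (decide(threat, conf) == "authorize" and conf < 0.9)

grid = [(t, c / 100) for t in (False, True) for c in range(101)]
print("safety proven:", formally_verified(safe, grid))

random.seed(0)
passes = sum(random.random() < 0.95 for _ in range(400))  # simulated results
print("pass-rate lower bound:", round(statistically_bounded(passes, 400), 3))
```

The safety claim is established for every input checked, while the performance claim carries only a quantified confidence bound, which is exactly the distinction the hierarchy is meant to communicate.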
The presentation demonstrates practical application through a case study: evaluating a Large Language Model decision-support system for compliance with operational constraints. We show how safety boundaries can be formally verified in milliseconds using SMT solvers while performance characteristics are statistically evaluated through designed experiments. The integration produces assurance documentation that satisfies both formal methods rigor and statistical T&E standards.
Key contributions include: (1) a theoretical framework positioning formal verification and statistical T&E as complementary rather than competing paradigms; (2) decision criteria for selecting verification strategies based on property characteristics and criticality; (3) methods for combining formal and statistical evidence into unified assurance cases; and (4) practical implementation guidance using mature tools (Z3, CVC5) alongside standard statistical methods.
Attendees will gain actionable understanding of when formal verification adds value to AI system T&E, how to specify verifiable properties in SMT-LIB format, and strategies for integrating verification results into test reports and accreditation packages. The framework directly addresses DOT&E priorities for rigorous AI evaluation while remaining accessible to T&E practitioners without formal methods backgrounds. All examples use unclassified, publicly releasable scenarios suitable for broad application across defense and aerospace programs.


Tom Donnelly

Systems Engineer, JMP Statistical Discovery LLC
“Active Learning with Bayesian Optimization: A Modern Framework for Faster Testing”

Speaker Bio: 

 

Tom Donnelly works as a Systems Engineer for JMP Statistical Discovery supporting users of JMP software in the Defense and Aerospace sector. He has been actively using and teaching Design of Experiments (DOE) methods for the past 40 years to develop and optimize products, processes, and technologies. Donnelly joined JMP in 2008 after working as an analyst for the Modeling, Simulation & Analysis Branch of the US Army’s Edgewood Chemical Biological Center – now DEVCOM CBC. There, he used DOE to develop, test, and evaluate technologies for detection, protection, and decontamination of chemical and biological agents. Prior to working for the Army, Tom was a partner in the first DOE software company for 20 years where he taught over 300 industrial short courses to engineers and scientists. Tom received his PhD in Physics from the University of Delaware.

Abstract: 

 

Design of Experiments (DoE) is frequently relied upon to conduct Test and Evaluation (T&E) to ensure systems work as intended, manage acquisition risks, and confirm technical performance, operational effectiveness, suitability, and survivability in realistic environments. To maintain the balance of an experimental design, we almost always end up running conditions where we fully expect (already know?) the system will work as intended. What if we could experiment sequentially, focusing testing on the regions of greater interest and choosing factor conditions that are likely to make performance metrics meet requirements?

Industrial R&D has begun to use Active Learning methods like Bayesian Optimization to speed up testing in the development of new products and processes.  These approaches to sequential experimentation promise greater efficiency while being more approachable to scientists, engineers, and testers.  The approach can leverage historical data to inform the initial model, support modifying factor ranges during sequential experimentation, and choose among candidate trials the ones that best satisfy goals.

Generalizing Bayesian optimization (Bayes opt.) to real-world complex problems involving multiple responses has proven challenging because, in its standard formulation, the Bayes opt. approach is inherently limited to a single response. In this tutorial we review the basics of Gaussian Process regression modeling and the standard approach to Bayes opt. We then introduce the generalization to multiple responses via the Bayesian Desirability framework. We will demonstrate the efficiency and approachability of the technique using new capabilities in JMP Pro 19.
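The standard single-response acquisition step can be sketched as follows. This is a generic illustration of the expected improvement criterion, not JMP's implementation; the Gaussian-process posterior values for the candidate settings are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement (maximization): how much a candidate is
    expected to beat the incumbent best, given the Gaussian-process
    posterior mean mu and standard deviation sigma at that point."""
    if sigma == 0:
        return 0.0
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm_cdf(z) + sigma * norm_pdf(z)

# Hypothetical GP posterior at three candidate factor settings:
# (predicted response, prediction std). Incumbent best response: 10.0.
candidates = [(10.2, 0.1), (9.8, 2.0), (10.5, 0.5)]
scores = [expected_improvement(mu, s, 10.0) for mu, s in candidates]
print("next trial:", max(range(len(scores)), key=scores.__getitem__))
```

Note that the highly uncertain second candidate wins despite its lower predicted mean: the criterion trades off exploitation against exploration, which is what steers sequential testing toward the regions of greater interest.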

In general, the reduced testing under an Active Learning approach like Bayesian Optimization will not yield as robust a predictive model covering as large an operational envelope as would be obtained using a response surface DOE, but one should acquire data in the more important test regions more quickly.  The data collected during the Active Learning testing can easily be augmented to support a more robust model.


Tom Donnelly

Systems Engineer, JMP Statistical Discovery LLC
“Effectively Leveraging Historical Data using Design of Experiments”

Speaker Bio: 

 

Tom Donnelly works as a Systems Engineer for JMP Statistical Discovery supporting users of JMP software in the Defense and Aerospace sector. He has been actively using and teaching Design of Experiments (DOE) methods for the past 40 years to develop and optimize products, processes, and technologies. Donnelly joined JMP in 2008 after working as an analyst for the Modeling, Simulation & Analysis Branch of the US Army’s Edgewood Chemical Biological Center – now DEVCOM CBC. There, he used DOE to develop, test, and evaluate technologies for detection, protection, and decontamination of chemical and biological agents. Prior to working for the Army, Tom was a partner in the first DOE software company for 20 years where he taught over 300 industrial short courses to engineers and scientists. Tom received his PhD in Physics from the University of Delaware.

Abstract: 

 

This presentation outlines several strategies for building trustworthy models from historical data. Although many machine learning techniques are easy to apply, real-world projects often struggle with limited amounts of data. The primary approach discussed treats the historical dataset as a candidate pool from which a designed experiment can be extracted to support the proposed model. The remaining observations (or a designated holdout subset) can then be used for validation. Bootstrap decision tree methods provide a robust way to identify the most influential factors for model construction, and when data are extremely scarce, Self-Validated Ensemble Models (SVEM) offer an alternative path for model development. The session will also review common pitfalls encountered when modeling from historical data.


Trisha Radocaj

Systems Engineer, Johns Hopkins University Applied Physics Lab
“Panel: Digitally Transforming the Test and Evaluation Landscape”

Speaker Bio: 

 

Trisha Radocaj is a systems engineer at the Johns Hopkins University Applied Physics Laboratory, Mission Engineering Group, Model Based and Digital Engineering Section. Her focus has been on pulling information from across the system lifecycle to support risk-based decision making as it pertains to capabilities meeting mission needs. Previously she worked on satellite and launch vehicle propulsion system design and analysis. She has a B.S. in Chemical Engineering and an M.S. in Aerospace Engineering from Purdue University.

Abstract: 

 

Since the publication of the United States Department of Defense Digital Engineering (DE) Strategy in 2018, the Department, the services, and the supporting industrial base have been working to integrate digital engineering tools that capture the designs of the complex weapon systems the uniformed and civilian workforce depend upon to serve the nation. Digital engineering methods have advanced rapidly for system design, hardware and software design, modeling and simulation, and ultimately test and evaluation, to ensure that systems are delivered rapidly with the most advanced capability. Unsurprisingly, parallel development has led to a diverse set of products, data structures, and methods that do not necessarily integrate. To maximize efficiency, a team from Developmental Test, Evaluation, and Accreditation; the Director, Operational Test and Evaluation; and university researchers from the Acquisition Innovation Research Center joined forces to mature tools and methods for advancing test and evaluation that work seamlessly together and help realize the DE Strategy. Experts and practitioners from these organizations will host a panel discussion describing their work, the results, the benefits, and some of the struggles they have faced in pursuit of the Department’s objectives.


Tyler Pleasant

Research Associate III, IDA
“Using Survival Analysis to Support Navy Supply Operations”

Speaker Bio: 

 

Tyler Pleasant is a Research Associate III at the Institute for Defense Analyses (IDA) within the Science, Systems, and Sustainment division. He holds an M.S. in Chemistry from the University of Chicago and a B.S. in Physics and Mathematics from the Massachusetts Institute of Technology. In addition to analysis of weapon system sustainment data, he also works on evaluation of detection, classification, and track generation technology performance and route optimization.

Abstract: 

 

The Naval Supply Systems Command (NAVSUP), similar to other supply commands or commercial wholesalers, has to make decisions about how its items are managed in an incomplete data environment. Although traditionally used for clinical trials and other biostatistics applications, survival analysis provides a useful set of tools to address missing data problems driven by time. For NAVSUP to make informed purchasing decisions, it needs to have an accurate picture of historical demand. Since NAVSUP receives many orders that are never fulfilled (e.g., customer cancelation, incorrect information), survival analysis can estimate how many were actually ordered and identify orders unlikely to be fulfilled. By only counting orders likely to be fulfilled, NAVSUP avoids purchasing more of some items than required and instead spends that money on understocked items where purchases will better support fleet readiness.

This talk will introduce the concepts of survival analysis, introduce how it has supported NAVSUP in the past, and explain how more advanced survival analysis concepts and techniques can be applied to improve NAVSUP’s demand planning.
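The core estimator behind this approach can be sketched with a small Kaplan-Meier computation. This is a generic illustration, not NAVSUP's production analysis: the order ages and fulfillment flags below are hypothetical, with still-open orders treated as right-censored observations.

```python
def kaplan_meier(durations, fulfilled):
    """Kaplan-Meier estimate of the probability that an order remains
    unfulfilled beyond each observed fulfillment time. Orders still
    open at the end of the data window are right-censored
    (fulfilled=False)."""
    event_times = sorted({t for t, f in zip(durations, fulfilled) if f})
    curve, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, f in zip(durations, fulfilled) if d == t and f)
        s *= 1 - events / at_risk
        curve.append((t, s))
    return curve

# Hypothetical order ages in days; False = still open (censored)
days      = [5, 8, 8, 12, 20, 20, 30]
fulfilled = [True, True, False, True, True, False, False]
for t, s in kaplan_meier(days, fulfilled):
    print(f"P(unfulfilled > {t} days) = {s:.3f}")
```

Because censored orders still count as "at risk" until they drop out, the estimate avoids the bias of simply discarding unfulfilled orders, which is the key to getting an accurate picture of historical demand.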


Tyler Morgan-Wall

Research Staff Member, IDA
“Design of Experiments and Interpretable Evaluation of GenAI-enabled Systems”

Speaker Bio: 

 

Dr. Tyler Morgan-Wall is a Research Staff Member at the Institute for Defense Analyses, and is the developer of the software library skpr: a package developed at IDA for optimal design generation and power evaluation in R. He is also the author of several other R packages for data visualization, mapping, and cartography. He has a PhD in Physics from Johns Hopkins University and lives in Silver Spring, MD.

Abstract: 

 

Design of Experiments (DoE) is a standard methodology for planning efficient, interpretable tests of engineered systems. Can the same methodology be applied to evaluating Generative AI? This talk argues yes: GenAI-enabled systems satisfy the minimal requirements for using DoE. I present several case studies showing how factorial designs can be used to quantify prompt effects, interactions, and sensitivity to problem framing. The talk addresses analytical challenges that arise in practice, especially with highly unstructured inputs that induce substantial per-input variability and can lead to quasi-separation and brittle estimation in binary outcomes. To mitigate these issues, I discuss design and analysis strategies including blocking at the input unit, mixed-effects models to separate average effects from input-driven variance, and robustness checks for more reliable inference. I discuss how statistical methods can provide insight into the underlying black-box behavior of the GenAI model, and how that insight can be used to aid system development as well as future tests.
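The basic move, treating prompt attributes as design factors, can be sketched with a full factorial design. This is a minimal illustration; the factor names and levels below are hypothetical, not drawn from the talk's case studies.

```python
import itertools

def full_factorial(factors):
    """Full factorial design: every combination of factor levels. For
    a GenAI test the factors are prompt attributes rather than
    physical settings."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]

# Hypothetical prompt factors for a GenAI evaluation
factors = {
    "persona": ["expert", "novice"],
    "format": ["bullet", "prose"],
    "temperature": [0.2, 0.8],
}
runs = full_factorial(factors)
print(len(runs))   # 2 x 2 x 2 = 8 runs
print(runs[0])
```

Each run is submitted as a prompt and scored, after which standard factorial analysis estimates main effects and interactions; replicating runs within each input unit (blocking) is what separates prompt effects from per-input variability.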


Wayne Adams

Senior STAT Expert, STAT COE
“Strategy and Tactics to Apply Design of Experiments to Testing of AI Enabled Systems”

Speaker Bio: 

 

Mr. Wayne Adams received his Master’s Degree in Applied Statistics from Western Michigan University. His career has been defined by encouraging the application of design of experiments to problems across multiple industries, including building fast emulators for long-running simulations. He enjoys gardening and gaming when not working at the Homeland Security Community of Best Practices (HSCOBP) and Scientific Test and Analysis Techniques Center of Excellence (STAT COE) helping clients efficiently solve their problems.

Abstract: 

 

This mini-tutorial covers a few methods to apply design of experiments (DOE) for better tests of AI-enabled systems (AIES) functionality before deployment. It will delve into what makes DOE suitable for AIES. A case study will be presented putting together multiple scientific test and analysis techniques to reduce an over-large simulation test into something manageable while still providing the insight necessary for the go/no-go decision.


William Fisher

MITRE
“Continuous Integration Test and Evaluation Concept of Operations”

Speaker Bio: 

 

Dr. William Fisher received a Ph.D. in Applied Physics from the University of Michigan. He is a Principal Systems Security Engineer with The MITRE Corporation supporting efforts to advance model-based engineering in T&E. He has been a system architect and lead systems engineer on large-scale cyber-physical systems and has spent over 20 years solving hard problems in superconductivity, materials processing, optics, algorithm development, cyber security, and more in defense and academia.

Abstract: 

 

This paper presents a Concept of Operations for Continuous Integration Test and Evaluation (CITE), an engineering method designed to address the growing complexity and risk to both system functionality and mission capability in the Department of War capability lifecycle. Building on advances in digital tools and model-based engineering, CITE elevates continual integration and test as a guiding tenet, enabling iterative development and agile responses to evolving operational needs. The approach leverages composable and executable models to facilitate frequent integration and testing, reduce risk, and accelerate delivery of warfighter capabilities.


Yan Li

JHUAPL
“Active and Passive Cellular Multi-Static ISAC for White-Space-Aware Drone Tracking”

Speaker Bio: 

 

Dr. Yan Li is a Senior Staff Engineer and Principal Investigator at the Johns Hopkins University Applied Physics Laboratory, specializing in the convergence of RF sensing, communications, and AI/ML. With over 15 years of experience, he leads R&D efforts in distributed passive tracking and spectral situational awareness. Dr. Li has published multiple papers on the use of deep learning for RF channel-adaptive waveforms and has a distinguished publication record in venues such as MILCOM, INFOCOM, and SenSys. He holds a PhD in Computer Science and multiple graduate degrees from the University of Maryland and Johns Hopkins University. His current work focuses on leveraging digital twins and machine learning to enhance UAS detection and secure communication links in contested environments.

Abstract: 

 

Dense 5G cellular deployments offer a promising infrastructure for Integrated Sensing and Communications (ISAC), particularly for countering small unmanned aerial systems (C-sUAS). However, unlike dedicated radar, cellular-based sensing must contend with fixed reconfigurable antenna patterns, limited transmit power, and traffic-dependent resource availability. This paper proposes a multi-static ISAC architecture for detecting and tracking low-altitude drones using cooperative gNodeBs (gNBs) operating at 3 GHz. Our contribution is three-fold: (1) a whitespace detection mechanism that opportunistically embeds sensing waveforms into the 5G time-frequency grid without degrading Quality-of-Service (QoS); (2) a tracking-aware optimization framework for joint waveform and beamforming design; and (3) a hybrid multi-sensor tracking algorithm that combines particle filtering for hypothesis management with conditional Kalman filtering for kinematic estimation. To validate our approach, we developed a high-fidelity digital twin using OpenStreetMap data, realistic gNB placements, and empirical drone radar cross-section (RCS) models. Results indicate that our framework significantly enhances tracking continuity and accuracy over passive-only baselines while maintaining nearly all baseline communication capacity.
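The kinematic-estimation half of the hybrid tracker rests on standard Kalman filtering. The following is a minimal textbook sketch of a 1-D constant-velocity Kalman filter, not the paper's actual algorithm or parameters; the matrices and noise levels are generic illustrative choices.

```python
# Generic constant-velocity Kalman filter sketch (illustrative values only).
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition: [position, velocity]
H = np.array([[1.0, 0.0]])              # we observe position only
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.25]])                  # measurement noise covariance

x = np.array([[0.0], [0.0]])            # initial state estimate
P = np.eye(2)                           # initial state covariance

def kf_step(x, P, z):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with measurement z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

rng = np.random.default_rng(1)
true_v = 1.5                            # simulated constant velocity
for t in range(1, 30):
    z = np.array([[true_v * t + rng.normal(0, 0.5)]])
    x, P = kf_step(x, P, z)

print(float(x[1, 0]))                   # velocity estimate, near 1.5
```

In the paper's multi-sensor setting, a particle filter would manage track hypotheses while a filter of this form, conditioned on each hypothesis, refines the kinematic state.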


Yosef Razin

Research Associate, IDA
“The Risks of AI-Assisted Coding”

Speaker Bio: 

 

Yosef S. Razin is a research associate at the Institute for Defense Analyses (IDA), where his work sits at the critical intersection of artificial intelligence, autonomy, and national security. He holds a B.S.E. in Mechanical & Aerospace Engineering from Princeton University and is a doctoral candidate in Robotics at the Georgia Institute of Technology. His research focuses on the dynamics of human-machine teaming, specifically the modeling and measurement of human-robot trust. By integrating principles from game theory, psychometrics, and cognitive engineering, he develops frameworks for the test and evaluation of AI-enabled systems. Yosef has brought technically rigorous, policy-relevant insight to questions at the frontier of AI’s role in national security and government decision-making.

Abstract: 

 

The increasing adoption of artificial intelligence (AI) for coding assistance presents both significant opportunities and risks within the Department of War (DoW). This presentation for DATAWorks 2026 examines the risks of AI-assisted coding for national security systems. Assessment of these risks must occur at every stage of machine learning operations (MLOps) and employment. The presentation surveys where risks have been identified, from systems to end users to the organizations that build or employ AI-generated code. It does not include every possible risk, but focuses on those that have already been observed in such systems. It is also limited by how recent and fast-moving this field is, as new risks are frequently discovered. Here, I introduce these risks while remaining cognizant of AI’s potential to achieve the DoW’s objectives quickly and at scale.

Although other summaries of these risks exist, they appear disparately, each addressing the coder, data integrity, or the foundation model; this work captures the vulnerabilities end to end. Furthermore, extant work focuses on personnel, open-source, or commercial applications. This talk takes a DoW-centric perspective, highlighting risks that are unique to, or amplified by, the use of AI-assisted coding in national security.


Zachary White

Student, Virginia Tech Statistics
“Applying Statistical Process Control to Monitor Deployed Machine Learning Models”

Speaker Bio: 

 

Zachary White is a third-year PhD student at Virginia Tech. His research focuses on ensuring the reliability and robustness of machine learning models, including developing high-performing models and monitoring their performance post-deployment to maintain quality under changing environments.

Abstract: 

 

Artificial intelligence (AI) and machine learning (ML) methods offer powerful, data-driven tools for decision-making across many application areas. For example, the use of image classification models to identify potential national security threats is increasing. Ensuring that these deployed models continue to accurately detect and classify objects over time is critical; however, the current literature lacks detailed approaches for monitoring ML model performance post-deployment. This presentation will demonstrate the feasibility and challenges of applying statistical process control (SPC) techniques to monitor advanced ML models.
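One simple instance of the idea is to treat each batch of post-deployment predictions as a sample and apply Shewhart-style control limits to the batch accuracy. The sketch below is a hedged illustration, not the presentation's method: the baseline accuracy, batch size, and 3-sigma p-chart limits are invented for the example.

```python
# Illustrative p-chart on a deployed classifier's batch accuracy.
# Baseline rate and batch size are hypothetical.
import math

p0, n = 0.92, 200                       # baseline accuracy, batch size
sigma = math.sqrt(p0 * (1 - p0) / n)    # binomial standard error
ucl, lcl = p0 + 3 * sigma, p0 - 3 * sigma

def check_batch(correct, total=n):
    """Flag a batch whose accuracy falls outside the control limits."""
    p_hat = correct / total
    return "signal" if (p_hat < lcl or p_hat > ucl) else "in control"

print(check_batch(182))   # 0.91 accuracy: within limits
print(check_batch(160))   # 0.80 accuracy: out-of-control signal
```

A sustained run of signals would suggest drift in the input distribution or label semantics, prompting retraining or recalibration; advanced ML models complicate this picture (correlated inputs, delayed labels), which is precisely the challenge the presentation addresses.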


Zoe Szajnfarber

Professor, GWU
“Architecting human-AI workflows to navigate performance-risk tradeoffs”

Speaker Bio: 

 

Zoe Szajnfarber is a Professor of Engineering Management and Systems Engineering (EMSE), International Affairs and Computer Science at the George Washington University. She is Senior Advisor to the GW President on AI Strategy, Faculty Director of the Pan-University GW Trustworthy AI initiative, and Chief Scientist of the Systems Engineering Research Center (SERC), a DoD University Affiliated Research Center.

Dr. Szajnfarber’s core research is on the design and development of complex socio-technical systems. Her work is deeply empirical and considers both organizational and technical system architectures to design in the ability to achieve performance goals across extended and highly uncertain operational lifetimes. Recent projects examine the nature and function of scientific and technical expertise in the design process, particularly when considering open innovation and AI-enabled systems.

Dr. Szajnfarber received her Ph.D. in Engineering Systems from the Massachusetts Institute of Technology in 2011. She also holds dual master’s degrees in Aeronautics and Astronautics and Technology and Policy from MIT. Her undergraduate degree is in Engineering Science from the University of Toronto. Outside of academia, she has worked on satellite and space robotics projects for government agencies and industry in North America and Europe.

Abstract: 

 

This talk will demonstrate the importance of considering the human(s)-AI(s) system when evaluating performance and risk tradeoffs, and the value of socio-technical testbeds for making that evaluation. It will present results from two use cases: a minefield traversal problem, and a customer service workflow. In each we will 1) show how the HAI “architecture” (the way tasks are decomposed, interfaces are designed, and work is allocated to human(s) and AI(s)) influences both performance and risk, at least as much as the capability of each component; 2) extract a typology of risks and benefits that arise through the HAI interaction; and 3) outline the dimensions of a socio-technical testbed.