April 27th Agenda

7:30 AM – 8:30 AM


8:30 AM – 8:40 AM

Room A+B

Virtual Session

Opening Remarks

Bram Lillard


V. Bram Lillard assumed the role of Director of the Operational Evaluation Division (OED) in early 2022. In this position, Bram provides strategic leadership, project oversight, and direction for the division’s research program, which primarily supports the Director, Operational Test and Evaluation (DOT&E) within the Office of the Secretary of Defense. He also oversees OED’s contributions to strategic studies, weapon system sustainment analyses, and cybersecurity evaluations for DOD and anti-terrorism technology evaluations for the Department of Homeland Security. Bram joined IDA in 2004 as a member of the research staff. In 2013-14, he was the acting science advisor to DOT&E. He then served as OED’s assistant director in 2014-21, ascending to deputy director in late 2021. Prior to his current position, Bram was embedded in the Pentagon where he led IDA’s analytical support to the Cost Assessment and Program Evaluation office within the Office of the Secretary of Defense. He previously led OED’s Naval Warfare Group in support of DOT&E. In his early years at IDA, Bram was the submarine warfare project lead for DOT&E programs. He is an expert in quantitative data analysis methods, test design, naval warfare systems and operations and sustainment analyses for Defense Department weapon systems. Bram has both a doctorate and a master’s degree in physics from the University of Maryland. He earned his bachelor’s degree in physics and mathematics from State University of New York at Geneseo. Bram is also a graduate of the Harvard Kennedy School’s Senior Executives in National and International Security program, and he was awarded IDA’s prestigious Goodpaster Award for Excellence in Research in 2017.

8:40 AM – 9:20 AM

Room A+B

Virtual Session

Keynote 3

The Honorable Christine Fox

(Johns Hopkins University Applied Physics Laboratory)

The Honorable Christine Fox currently serves as a member of the President’s National Infrastructure Advisory Council, participates on many governance and advisory boards, and is a Senior Fellow at the Johns Hopkins University Applied Physics Laboratory. Previously, she was the Assistant Director for Policy and Analysis at JHU/APL, a posi­tion she held from 2014 to early 2022. Before joining APL, she served as Acting Deputy Secretary of Defense from 2013 to 2014 and as Director of Cost Assessment and Program Evaluation (CAPE) from 2009-2013. As Director, CAPE, Ms. Fox served as chief analyst to the Secretary of Defense.  She officially retired from the Pentagon in May 2014. Prior to her DoD positions, she served as president of the Center for Naval Analyses from 2005 to 2009, after working there as a research analyst and manager since 1981. Ms. Fox holds a bachelor and master of science degree from George Mason University.

9:30 AM – 10:30 AM

Room A+B

Featured Panel: AI Assurance
Session Chair: Chad Bieber

    This panel discussion will bring together an international group of AI Assurance Case experts from Academia and Government labs to discuss the challenges and opportunities of applying assurance cases to AI-enabled systems.  The panel will discuss how assurance cases apply to AI-enabled systems, pitfalls in developing assurance cases, including human-system integration into the assurance case, and communicating the results of an assurance case to non-technical audiences.

    This panel discussion will be of interest to anyone who is involved in the development or use of AI systems.  It will provide insights into the challenges and opportunities of using assurance cases to provide justified confidence to all stakeholders, from the AI users and operators to executives and acquisition decision-makers.

  • Laura Freeman (Virginia Tech National Security Institute)

  • Alec Banks (Defence Science and Technology Laboratory)

  • Josh Poore (Applied Research Laboratory for Intelligence and Security, University of Maryland)

  • John Stogoski (Software Engineering Institute, Carnegie Mellon University)

10:30 AM – 10:50 AM


10:50 AM – 12:20 PM: Parallel Sessions

Room A

Virtual Session
Session 4A: Transforming T&E to Assess Modern DoD Systems
Session Chair: Kristen Alexander, DOT&E
  •  1 

    DOT&E Strategic Initiatives, Policy, and Emerging Technologies (SIPET) Mission Brief

    Jeremy Werner (DOT&E)

    SIPET, established in 2021, is a deputate within the office of the Director, Operational Test and Evaluation (DOT&E). DOT&E created SIPET to codify and implement the Director’s strategic vision and keep pace with science and technology to modernize T&E tools, processes, infrastructure, and workforce. That is, The mission of SIPET is to drive continuous innovation to meet the T&E demands of the future; support the development of a workforce prepared to meet the toughest T&E challenges; and nurture a culture of information exchange across the enterprise and update policy and guidance. SIPET proactively identifies current and future operational test and evaluation needs, gaps, and potential solutions in coordination with the Services and agency operational test organizations. Collaborating with numerous stakeholders, SIPET develops and refines operational test policy guidance that support new test methodologies and technologies in the acquisition and test communities. SIPET, in collaboration with the T&E community, is leading the development of the 2022 DOT&E Strategy Update Implementation Plan (I-Plan). I-Plan initiatives include:

    • Test the Way We Fight – Architect T&E around validated mission threads and demonstrate the operational performance of the Joint Force in multi-domain operations.
    • Accelerate the Delivery of Weapons that Work – Embrace digital technologies to deliver high-quality systems at more dynamic rates.
    • Increase the Survivability of DOD in Contested Environments – Identify, assess, and act on cyber, electromagnetic, spectrum, space, and other risks to DOD mission – at scale and at speed.
    • Pioneer T&E of Weapons Systems Built to Change Over Time – Implement fluid and iterative T&E across the entire system lifecycle to help assure continued combat credibility as the system evolves to meet warfighter needs.
    • Foster an Agile and Enduring T&E Enterprise Workforce – Centralize and leverage efforts to assess, curate, and engage T&E talent to quicken the pace of innovation across the T&E enterprise.

  •  1 

    T&E as a Continuum

    Orlando Flores (OUSD(R&E))

    A critical change in how Test and Evaluation (T&E) supports capability delivery is needed to maintain our advantage over potential adversaries. Making this change requires a new paradigm in which T&E provides focused and relevant information supporting decision-making continually throughout capability development and informs decision makers from the earliest stages of Mission Engineering (ME) through Operations and Sustainment (O&S). This new approach improves the quality of T&E by moving from a serial set of activities conducted largely independently of Systems Engineering (SE) and ME activities to a new integrative framework focused on a continuum of activities termed T&E as a Continuum.

    T&E as a Continuum has three key attributes – capability and outcome focused testing; an agile, scalable evaluation framework; and enhanced test design – critical in the conduct of T&E and improving capability delivery. T&E as a Continuum builds off the 2018 DoD Digital Engineering (DE) Strategy’s five critical goals with three key enablers – robust live, virtual, and constructive (LVC) testing; developing model-based environments; and a “digital” workforce knowledgeable of the processes and tools associated with MBSE, model-based T&E, and other model-based processes.

    T&E as a Continuum improves the quality of T&E through integration of traditional SE and T&E activities, providing a transdisciplinary, continuous process coupling design evolution with VV&A.

  •  1 

    NASEM Range Capabilities Study and T&E of Multi-Domain Operations

    Hans Miller (MITRE)

    The future viability of DoD’s range enterprise depends on addressing dramatic changes in technology, rapid advances in adversary military capabilities, and the evolving approach the United States will take to closing kill chains in a Joint All Domain Operations environment. This recognition led DoD’s former Director of Operational Test and Evaluation (OT&E), the Honorable Robert Behler, to request that the National Academies of Science, Engineering and Medicine examine the physical and technical suitability of DoD’s ranges and infrastructure through 2035. The first half of this presentation will cover the highlights and key recommendations of this study, to include the need to create the “TestDevOps” digital infrastructure for future operational test and seamless range enterprise interoperability. The second half of this presentation looks at the legacy frameworks for the relationships of physical and virtual test capabilities, and how those frameworks are becoming outdated. This briefing explores proposals on how the interaction of operations, physical test capabilities, and virtual test capabilities need to evolve to support new paradigms of the rapidly evolving technologies and changing nature of multi-domain operations.

Room B

Virtual Session
Session 4B: Artificial Intelligence Methods & Current Initiatives
Session Chair: Brian Vickers, IDA
  •  1 

    Gaps in DoD National Artificial Intelligence Test and Evaluation Infrastructure Capabilities

    Rachel Haga (IDA)

    Significant literature has been published in recent years calling for updated and new T&E infrastructure to allow for the credible, verifiable assessment of DoD AI-enabled capabilities (AIECs) . However, existing literature falls short in providing the detail necessary to justify investments in specific DoD Enterprise T&E infrastructure. The goal of this study was to collect data about current DoD programs with AIEC and corresponding T&E infrastructure to identify high priority investments by tracing AIECs to tailored, specific recommendations for enterprise AI T&E infrastructure. The study is divided into six bins of research, itemized below. This presentation provides an interim study update on the current state of programs with AIEC across DoD.

    • Goals : State specific enterprise AI T&E normative goal(s) and timeline, including rationale.
    • Demand: Generate T&E requirements and a demand forecast by querying existing and anticipated AI programs across the Department.
    • Supply Baseline: Catalog existing and planned AI T&E activities and infrastructure at the enterprise, Service, and Program levels, including existing resourcing levels.
    • Gaps: Identify specific gaps based on this supply and demand information.
    • Responsibilities: Clarify roles and responsibilities across OSD T&E stakeholders.
    • Actions: Identify differentiated lines of effort—including objectives, milestones, and cost estimates—that create a cohesive plan to achieve enterprise AI T&E goals.

  •  2 

    Assurance of Responsible AI/ML in the DOD Personnel Space

    John Dennis (IDA)

    Testing and assuring responsible use of AI/ML enabled capabilities is a nascent topic in the DOD with many efforts being spearheaded by CDAO. In general, black box models tend to suffer from consequences related to edge cases, emergent behavior, misplaced or lack of trust, and many other issues, so traditional testing is insufficient to guarantee safety and responsibility in the employment of a given AI enabled capability. Focus of this concern tends to fall on well-publicized and high-risk capabilities, such as AI enabled autonomous weapons systems. However, while AI/ML enabled capabilities supporting personnel processes and systems, such as algorithms used for retention and promotion decision support, tend to carry low safety risk, many concerns, some of them specific to the personnel space, run the risk of undermining the DOD’s 5 ethical principles for RAI. Examples include service member privacy concerns, invalid prospective policy analysis, disparate impact against marginalized service member groups, and unintended emergent service member behavior in response to use of the capability. Eroding barriers to use of AI/ML are facilitating an increasing number of applications while some of these concerns are still not well understood by the analytical community. We consider many of these issues in the context of an IDA ML enabled capability and propose mechanisms to assure stakeholders of the adherence to the DOD’s ethical principles.

  •  2 

    CDAO Joint AI Test Infrastructure Capability

    David Jin (MITRE)

    The Chief Digital and AI Office (CDAO) Test & Evaluation Directorate is developing the Joint AI Test Infrastructure Capability (JATIC) program of record, which is an interoperable set of state-of-the-art software capabilities for AI Test & Evaluation. It aims to provide a provide a comprehensive suite of integrated testing tools which can be deployed widely across the enterprise to address key T&E gaps. In particular, JATIC will capabilities will support the assessment of AI system performance, cybersecurity, adversarial resilience, and explainability – enabling the end-user to more effectively execute their mission. It is a key component of the digital testing infrastructure that the CDAO will provide in order to support the development and deployment of data, analytics, and AI across the Department.

Room C

Session 4C: Applications of Machine Learning and Uncertainty Quantification
Session Chair: Jonathan Rathsam, NASA Langley Research Center
  •  1 

    Advanced Automated Machine Learning System for Cybersecurity

    Himanshu Dayaram Upadhyay (Florida International University)

    Florida International University (FIU) has developed an Advanced Automated Machine Learning System (AAMLS) under the sponsored research from Department of Defense – Test Resource Management Center (DOD-TRMC), to provide Artificial Intelligence based advanced analytics solutions in the area of Cyber, IOT, Network, Energy, Environment etc. AAMLS is a Rapid Modeling & Testing Tool (RMTT) for developing machine learning and deep learning models in few steps by subject matter experts from various domains with minimum machine learning knowledge using auto & optimization workflows. AAMLS allows analysis of data collected from different test technology domains by using machine learning / deep learning and ensemble learning approaches to generate models, make predictions, then apply advanced analytics and visualization to perform analysis. This system enables automated machine learning using AI based Advanced Analytics and the Analytics Control Center platforms by connecting to multiple Data Sources. Artificial Intelligence based Advanced Analytics Platform: This platform is the analytics engine of AAML which provides pre-processing, feature engineering, model building and predictions. Primary components of this platform include:

    • Machine Learning Server: This module is deployed to build ML/DL models using the training data from the data sources and perform predictions/analysis of associated test data based on the AAML-generated ML/DL models.
    • Machine Learning Algorithms: ML algorithms like Logistic Regression, Linear regression, Decision tree, Random Forest, One Class Support Vector Machine, Jaccard Similarity etc. are available for model building.
    • Deep learning Algorithms: Deep learning algorithms such as Deep Neural Networks and Recurrent Neural Networks are available to perform classification & anomaly detection using the TensorFlow framework and the Keras API.
    Analytics Control Center: This platform is a centralized application to manage the AAML system. It consists of following main modules.
    • Data Source: This module allows the user to connect to the existing data to the AAML system to perform analytics. These data sources may reside in a Network File Share, Database or Big Data Cluster.
    • Model Development: This module provides the functionality to build ML/DL models with various AI algorithms. This is performed by engaging specific ML algorithms for five types of analysis: Classification, Regression, Time-Series, Anomaly Detection and Clustering
    • Predictions: This module provides the functionality to predict the outcome of an analysis of an associated data set based on model built during Model Development.
    • Manage Models and Predictions: These modules allow the user to manage the ML models that have been generated and resulting predictions of associated data sets.

  •  2 

    Uncertainty Aware Machine Learning for Accelerators

    Malachi Schram (Thomas Jefferson National Accelerator Facility)

    Standard deep learning models for classification and regression applications are ideal for capturing complex system dynamics. Unfortunately, their predictions can be arbitrarily inaccurate when the input samples are not similar to the training data. Implementation of distance aware uncertainty estimation can be used to detect these scenarios and provide a level of confidence associated with their predictions. We present results using Deep Gaussian Process Approximation (DGPA) methods for 1) anomaly detection at Spallation Neutron Source (SNS) accelerator and 2) uncertainty aware surrogate model for the Fermi National Accelerator Lab (FNAL) Booster Accelerator Complex.

  •  2 

    Well-Calibrated Uncertainty Quantification for Language Models in the Nuclear Domain

    Karl Pazdernik (Pacific Northwest National Laboratory)

    A key component of global and national security in the nuclear weapons age is the proliferation of nuclear weapons technology and development. A key component of enforcing this non-proliferation policy is developing an awareness of the scientific research being pursued by other nations and organizations. To support non-proliferation goals and contribute to nuclear science research, we trained a RoBERTa deep neural language model on a large set of U.S. Department of Energy Office of Science and Technical Information (OSTI) research article abstracts and then finetuned this model for classification of scientific abstracts into 60 disciplines, which we call NukeLM. This multi-step approach to training improved classification accuracy over its untrained or partially out-of-domain competitors. While it is important for classifiers to be accurate, there has also been growing interest in ensuring that classifiers are well-calibrated with uncertainty quantification that is understandable to human decision-makers. For example, in the multiclass problem, classes with a similar predicted probability should be semantically related. Therefore, we also introduced an extension of the Bayesian belief matching framework proposed by Joo et al. (2020) that easily scales to large NLP models, such as NukeLM, and better achieves the desired uncertainty quantification properties.

Room D

Spotlight on Data Literacy
Session Chair: Matthew Avery, IDA
  •  1 

    Data Literacy Within the Department of Defense

    Nicholas Clark (United States Military Academy)

    Data literacy, the ability to read, write, and communicate data in context, is fundamental for military organizations to create a culture where data is appropriately used to inform both operational and non-operational decisions. However, oftentimes organizations outsource data problems to outside entities and rely on a small cadre of data experts to tackle organizational problems. In this talk we will argue that data literacy is not solely the role or responsibility of the data expert. Ultimately, if experts develop tools and analytics that Army decision makers cannot use, or do not effectively understand the way the Army makes decisions, the Army is no more data rich than if it had no data at all. While serving on a sabbatical as the Chief Data Scientist for Joint Special Operations Command, COL Nick Clark (Department of Mathematical Sciences, West Point), noticed that a lack of basic data literacy skills was a major limitation to creating a data centric organization. As a result of this, he created 10 hours of training focusing on the fundamentals of data literacy. After delivering the course to JSOC, other DoD organizations began requesting the training. In response to this, a team from West Point joined with Army Talent Management Task Force to create mobile training teams. The teams have now delivered the training over 30 times to organizations ranging from tactical units up to strategic level commands. In this talk, we discuss what data literacy skills should be taught to the force and highlight best practices in educating soldiers, civilians, and contractors on the basics of data literacy. We will finally discuss strategies for assessing organizational Data Literacy and provide a framework for attendees to assess their own organizations data strengths and weaknesses.

Room E

Virtual Session
Mini-Tutorial 4
Session Chair: Gina Sigler, STAT COE
  •  1 

    An Overview of Methods, Tools, and Test Capabilities for T&E of Autonomous Systems

    Leonard Truett and Charlie Middleton (STAT COE)

    This tutorial will give an overview of selected methodologies, tools and test capabilities discussed in the draft “Test and Evaluation Companion Guide for Autonomous Military Systems.” This test and evaluation (T&E) companion guide is being developed to provide guidance to test and evaluation practitioners, to include program managers, test planners, test engineers, and analysts with test strategies, applicable methodologies, and tools that will help to improve rigor in addressing the challenges unique to the T&E of autonomy. It will also cover selected capabilities of test laboratories and ranges that support autonomous systems. The companion guide is intended to be a living document contributed by the entire community and will adapt to ensure the right information reaches the right audience.

12:20 PM – 1:30 PM


1:30 PM – 3:00 PM: Parallel Sessions

Room A

Virtual Session
Session 5A: Methods and Tools at National Labs
Session Chair: Karl Pazdernik, Pacific Northwest National Laboratory
  •  2 

    Test and Evaluation Methods for Authorship Attribution and Privacy Preservation

    Emily Saldanha (Pacific Northwest National Laboratory)

    The aim of the IARPA HIATUS program is to develop explainable systems for authorship attribution and author privacy preservation through the development of feature spaces which encode the distinguishing stylistic characteristics of authors independently of text genre, topic, or format. In this talk, I will discuss progress towards defining an evaluation framework for this task to provide robust insights into system strengths, weaknesses, and overall performance. Our evaluation strategy includes the use of an adversarial framework between attribution and privacy systems, development of a focused set of core metrics, analysis of system performance dependencies on key data factors, systematic exploration of experimental variables to probe targeted questions about system performance, and investigation of key trade-offs between different performance measures.

  •  2 

    Tools for Assessing Machine Learning Models’ Performance in Real-World Settings

    Carianne Martinez (Sandia National Laboratories)

    Machine learning (ML) systems demonstrate powerful predictive capability, but fielding such systems does not come without risk. ML can catastrophically fail in some scenarios, and in the absence of formal methods to validate most ML models, we require alternative methods to increase trust. While emerging techniques for uncertainty quantification and model explainability may seem to lie beyond the scope of many ML projects, they are essential tools for understanding deployment risk. This talk will share a practical workflow, useful tools, and lessons learned for ML development best practices. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2023-11982A

  •  2 

    Perspectives on T&E of ML for Assuring Reliability in Safety-Critical Applications

    Pradeep Ramuhalli (Oak Ridge National Laboratory)

    Artificial intelligence (AI) and Machine Learning (ML) are increasingly being examined for their utility in many domains. AI/ML solutions are being proposed for a broad set of applications, including surrogate modeling, anomaly detection, classification, image segmentation, control, etc. A lot of effort is being put into evaluating these solutions for robustness, especially in the context of safety critical applications. While traditional methods of verification and validation continue to be necessary, challenges exist in many safety critical applications given the limited ability to gather data covering all possible conditions, and limited ability to conduct experiments. This presentation will discuss potential approaches for testing and evaluating machine learning algorithms in such applications, as well as metrics for this purpose.

Room B

Virtual Session
Session 5B: Applications of Bayesian Statisics
Session Chair: Victoria Sieck, STAT COE
  •  3 

    A Bayesian Decision Theory Framework for Test & Evaluation

    James Ferry (Metron, Inc.)

    Decisions form the core of T&E: decisions about which tests to conduct and, especially, decisions on whether to accept or reject a system at its milestones. The traditional approach to acceptance is based on conducting tests under various conditions to ensure that key performance parameters meet certain thresholds with the required degree of confidence. In this approach, data is collected during testing, then analyzed with techniques from classical statistics in a post-action report. This work explores a new Bayesian paradigm for T&E based on one simple principle: maintaining a model of the probability distribution over system parameters at every point during testing. In particular, the Bayesian approach posits a distribution over parameters prior to any testing. This prior distribution provides (a) the opportunity to incorporate expert scientific knowledge into the inference procedure, and (b) transparency regarding all assumptions being made. Once a prior distribution is specified, it can be updated as tests are conducted to maintain a probability distribution over the system parameters at all times. One can leverage this probability distribution in a variety of ways to produce analytics with no analog in the traditional T&E framework. In particular, having a probability distribution over system parameters at any time during testing enables one to implement an optimal decision-making procedure using Bayesian Decision Theory (BDT). BDT accounts for the cost of various testing options relative to the potential value of the system being tested. When testing is expensive, it provides guidance on whether to conserve resources by ending testing early. It evaluates the potential benefits of testing for both its ability to inform acceptance decisions and for its intrinsic value to the commander of an accepted system. This talk describes the BDT paradigm for T&E and provides examples of how it performs in simple scenarios. In future work we plan to extend the paradigm to include the features, the phenomena, and the SME elicitation protocols necessary to address realistic T&E cases.

  •  2 

    Saving hardware, labor, and time using Bayesian adaptive design of experiments

    Daniel Ries (Sandia National Laboratories)

    Physical testing in the national security enterprise is often costly. Sometimes this is driven by hardware and labor costs, other times it can be driven by finite resources of time or hardware builds. Test engineers must make the most of their available resources to answer high consequence problems. Bayesian adaptive design of experiments (BADE) is one tool that should be in an engineer’s toolbox for designing and running experiments. BADE is sequential design of experiment approach which allows early stopping decisions to be made in real time using predictive probabilities (PP), allowing for more efficient data collection. BADE has seen successes in clinical trials, another high consequence arena, and it has resulted in quicker and more effective assessments of drug trials. BADE has been proposed for testing in the national security space for similar reasons of quicker and cheaper test series. Given the high-consequence nature of the tests performed in the national security space, a strong understanding of new methods is required before being deployed. The main contribution of this research is to assess the robustness of PP in a BADE under different modeling assumptions, and to compare PP results to its frequentist alternative, conditional power (CP). Comparisons are made based on Type I error rates, statistical power, and time savings through average stopping time. Simulation results show PP has some robustness to distributional assumptions. PP also tends to control Type I error rates better than CP, while maintaining relatively strong power. While CP usually recommends stopping a test earlier than PP, CP also tends to have more inconsistent results, again showing the benefits of PP in a high consequence application. An application to a real problem from Sandia National Laboratories shows the large potential cost savings for using PP. The results of this study suggest BADE can be one piece of an evidence package during testing to stop testing early and pivot, in order to decrease costs and increase flexibility. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

  •  2 

    Uncertainty Quantification of High Heat Microbial Reduction for NASA Planetary Protection

    Michael DiNicola (Jet Propulsion Laboratory, California Institute of Technology)

    Planetary Protection is the practice of protecting solar system bodies from harmful contamination by Earth life and protecting Earth from possible life forms or bioactive molecules that may be returned from other solar system bodies. Microbiologists and engineers at NASA’s Jet Propulsion Laboratory (JPL) design microbial reduction and sterilization protocols that reduce the number of microorganisms on spacecraft or eliminate them entirely. These protocols are developed using controlled experiments to understand the microbial reduction process. Many times, a phenomenological model (such as a series of differential equations) is posited that captures key behaviors and assumptions of the process being studied. A Sterility Assurance Level (SAL) – the probability that a product, after being exposed to a given sterilization process, contains one or more viable organisms – is a standard metric used to assess risk and define cleanliness requirements in industry and for regulatory agencies. Experiments performed to estimate the SAL of a given microbial reduction or sterilization protocol many times have large uncertainties and variability in their results even under rigorously implemented controls that, if not properly quantified, can make it difficult for experimenters to interpret their results and can hamper a credible evaluation of risk by decision makers. In this talk, we demonstrate how Bayesian statistics and experimentation can be used to quantify uncertainty in phenomenological models in the case of microorganism survival under short-term high heat exposure. We show how this can help stakeholders make better risk-informed decisions and avoid the unwarranted conservatism that is often prescribed when processes are not well understood. The experiment performed for this study employs a 6 kW infrared heater to test survivability of heat resistant Bacillus canaveralius 29669 to temperatures as high as 350 °C for time durations less than 30 sec. The objective of this study was to determine SALs for various time-temperature combinations, with a focus on those time-temperature pairs that give a SAL of 10^-6. Survival ratio experiments were performed that allow estimation of the number of surviving spores and mortality rates characterizing the effect of the heat treatment on the spores. Simpler but less informative fraction-negative experiments that only provide a binary sterile/not-sterile outcome were also performed once a sterilization temperature regime was established from survival ratio experiments. The phenomenological model considered here is a memoryless mortality model that underlies many heat sterilization protocols in use today. This discussion and poster will outline how the experiment and model were brought together to determine SALs for the heat treatment under consideration. Ramifications to current NASA planetary protection sterilization specifications and current missions under development such as Mars Sample Return will be discussed. This presentation/poster is also relevant to experimenters and microbiologists working on military and private medical device applications where risk to human life is determined by sterility assurance of equipment.

Room C

Session 5C: Methods and Tools for T&E
Session Chair: Sam McGregor, AFOTEC
  •  2 

    Confidence Intervals for Derringer and Suich Desirability Function Optimal Points

    Peter Calhoun (HQ AFOTEC)

    A shortfall of the Derringer and Suich (1980) desirability function for multi-objective optimization has been a lack of inferential methods to quantify uncertainty. Most articles for addressing uncertainty involve robust methods, providing a point estimate that is less affected by variation. Few articles address confidence intervals or bands but not specifically for the widely used Derringer and Suich method. 8 methods are presented to construct 100(1-alpha) confidence intervals around Derringer and Suich desirability function optimal values. First order and second order models using bivariate and multivariate data sets are used as examples to demonstrate effectiveness. The 8 proposed methods include a simple best/worst case method, 2 generalized methods, 4 simulated surface methods, and a nonparametric bootstrap method. One of the generalized methods, 2 of the simulated surface methods, and the nonparametric method account for covariance between the response surfaces. All 8 methods seem to perform decently on the second order models; however, the methods which utilize an underlying multivariate-t distribution, Multivariate Generalized (MG) and Multivariate t Simulated Surface (MVtSSig) are recommended methods from this research as they perform well with small samples for both first order and second order models with coverage only becoming unreliable at consistently non-optimal solutions. MG and MVtSSig inference could also be used in conjunction with robust methods such as Pareto Front Optimization to help ascertain which solutions are more likely to be optimal before constructing confidence interval.

  •  1 

    Skyborg Data Pipeline

    Alexander Malburg (AFOTEC/EX)

    The purpose of the Skyborg Data Pipeline is to allow for the rapid turnover of flight data collect during a test event, using collaborative easily access tool sets available in the AFOTEC Data Vault. Ultimately the goal of this data pipeline is to provide a working up to date dashboard that leadership can utilize shortly after a test event.

  •  2 

    Circular Error Probable and an Example with Multilevel Effects

    Jacob Warren (Marine Corps Operational Test and Evaluation Activity)

    Circular Error Probable (CEP) is a measure of a weapon system’s precision developed based on the Bivariate Normal Distribution. Failing to understanding the theory behind CEP can result in misuse of equations developed to help estimation. Estimation of CEP is also much more straightforward given situations such as single samples where factors are not being manipulated. This brief aims to help build a theoretical understanding of CEP, and then presents a non-trivial example in which CEP is estimated via multilevel regression. The goal is to help build an understanding of CEP so it can be properly estimated in trivial (single sample) and non-trivial cases (e.g. regression and multilevel regression).

Room E

Virtual Session
Mini-Tutorial 5
Session Chair: John Haman, IDA
  •  1 

    The Automaton General-Purpose Data Intelligence Platform

    Jeremy Werner (DOT&E)

    The Automaton general-purpose data intelligence platform abstracts data analysis out to a high level and automates many routine analysis tasks while being highly extensible and configurable – enabling complex algorithms to elucidate mission-level effects. Automaton is built primarily on top of R Project and its features enable analysts to build charts and tables, calculate aggregate summary statistics, group data, filter data, pass arguments to functions, generate animated geospatial displays for geospatial time series data, flatten time series data into summary attributes, fit regression models, create interactive dashboards, and conduct rigorous statistical tests. All of these extensive analysis capabilities are automated and enabled from an intuitive configuration file requiring no additional software code. Analysts or software engineers can easily extend Automaton to include new algorithms, however. Automaton’s development was started at Johns Hopkins University Applied Physics Laboratory in 2018 to support an ongoing military mission and perform statistically rigorous analyses that use Bayesian-inference-based Artificial Intelligence to elucidate mission-level effects. Automaton has unfettered Government Purpose Rights and is freely available. One of DOT&E’s strategic science and technology thrusts entails automating data analyses for Operational Test & Evaluation as well as developing data analysis techniques and technologies targeting mission-level effects; Automaton will be used, extended, demonstrated/trained on, and freely shared to accomplish these goals and collaborate with others to drive our Department’s shared mission forward. This tutorial will provide an overview of Automaton’s capabilities (first 30 min, for Action Officers and Senior Leaders) as well as instruction on how to install and use the platform (remaining duration for hands-on-time with technical practitioners).

     Installation instructions are below and depend upon the user installing Windows Subsystem for Linux or having access to another Unix environment (e.g., macOS):

     Please install WSL V2 on your machines before the tutorial:

    https://learn.microsoft.com/en-us/windows/wsl/install Then please download/unzip the Automaton demo environment and place in your home directory: https://www.edaptive.com/dataworks/automaton_2023-04-18_dry_run_1.tar

    Then open up powershell from your home directory and type:

    wsl –import automaton_2023-04-18_dry_run_1 automaton_2023-04-18_dry_run_1 automaton_2023-04-18_dry_run_1.tar wsl -d automaton_2023-04-18_dry_run_1

3:00 PM – 3:20 PM


3:20 PM – 4:20 PM: Parallel Sessions

Room A

Virtual Session
Session 6A: Tools for Decision-Making and Data Management
Session Chair: Tyler Lesthaeghe, University of Dayton Research Institute
  •  3 

    User-Friendly Decision Tools

    Clifford Bridges (IDA)

    Personal experience and anecdotal evidence suggest that presenting analyses to sponsors, especially technical sponsors, is improved by helping the sponsor understand how results were derived. Providing summaries of analytic results is necessary but can be insufficient when the end goal is to help sponsors make firm decisions. When time permits, engaging sponsors with walk-throughs of how results may change given different inputs is particularly salient in helping sponsors make decisions in the context of the bigger picture. Data visualizations and interactive software are common examples of what we call “decision tools” that can walk sponsors through varying inputs and views of the analysis. Given long-term engagement and regular communication with a sponsor, developing user-friendly decision tools is a helpful practice to support sponsors. This talk presents a methodology for building decision tools that combines leading practices in agile development and STEM education. We will use a Python-based app development tool called Streamlit to show implementations of this methodology.

  •  1 

    Seamlessly Integrated Materials Labs at AFRL

    Lauren Ferguson (Air Force Research Laboratory)

    One of the challenges to conducting research in the Air Force Research Laboratory is that many of our equipment controllers cannot be directly connected to our internal networks, due to older or specialized operating systems and the need for administrative privileges for proper functioning. This means that the current data collection process is often highly manual, with users documenting experiments in physical notebooks and transferring data via CDs or portable hard drives to connected systems for sharing or further processing. In the Materials & Manufacturing Directorate, we have developed a unique approach to seamlessly integrate our labs for more efficient data collection and transfer, which is specifically designed to help users ensure that data is findable for future reuse. In this talk, we will highlight our two enabling tools: NORMS, which assists users to easily generate metadata for direct association with data collected in the lab to eliminate physical notebooks; and Spike, which automates one-way data transfer from isolated systems to databases mirrored on other networks. In these databases, metadata can be used for complex search queries and data is automatically shared with project members without requiring additional transfers. The impact of this solution has been significantly faster data availability (including searchability) to all project members: a transfer and scanning process that used to take 3 hours can now take a few minutes. Future use cases will also enable Spike to transfer data directly into cloud buckets for in situ analysis, which would streamline collaboration with partners.

Room B

Virtual Session
Session 6B: Methods for DoD System Supply Chain and Performance Estimation
Session Chair: Margaret Zientek, IDA
  •  3 

    Applications of Network Methods for Supply Chain Review

    Zed Fashena (IDA)

    The DoD maintains a broad array of systems, each one sustained by an often complex supply chain of components and suppliers. The ways that these supply chains are interlinked can have major implications for the resilience of the defense industrial base as a whole, and the readiness of multiple weapon systems. Finding opportunities to improve overall resilience requires gaining visibility of potential weak links in the chain, which requires integrating data across multiple disparate sources. By using open-source data pipeline software to enhance reproducibility, and flexible network analysis methods, multiple stovepiped data sources can be brought together to develop a more complete picture of the supply chain across systems.

  •  1 

    Predicting Aircraft Load Capacity Using Regional Climate Data

    Abraham Holland (IDA)

    While the impact of local weather conditions on aircraft performance is well-documented, climate change has the potential to create long-term shifts in aircraft performance. Using just one metric, internal load capacity, we document operationally relevant performance changes for a UH-60L within the Indo-Pacific region. This presentation uses publicly available climate and aircraft performance data to create a representative analysis. The underlying methodology can be applied at varying geographic resolutions, timescales, airframes, and aircraft performance characteristics across the entire globe.

Room C

Session 6C: Statistical Methods for Ranking and Functional Data Types
Session Chair: Elise Roberts, JHU/APL
  •  2 

    An Introduction to Ranking Data and a Case Study of a National Survey of First Responders

    Adam Pintar (National Institute of Standards and Technology)

    Ranking data are collected by presenting a respondent with a list of choices, and then asking what are the respondent’s favorite, second favorite, and so on. The rankings may be complete, the respondent rank orders the complete list, or partial, only the respondent’s favorite two or three, etc. Given a sample of rankings from a population, one goal may be to estimate the most favored choice from the population. Another may be to compare the preferences of one subpopulation to another. In this presentation I will introduce ranking data and probability models that form the foundation for statistical inference for them. The Plackette-Luce model will be the main focus. After that I will introduce a real data set containing ranking data assembled by the National Institute of Standards and Technology (NIST) based on the results of a national survey of first responders. The survey asked about how first responders use communication technology. With this data set, questions such as do rural and urban/suburban first responders prefer the same types communication devices, can be explored. I will conclude with some ideas for incorporating rankning data into test and evaluation settings.

  •  3 

    Estimating Sparsely and Irregularly Observed Multivariate Functional Data

    Maximillian Chen (Johns Hopkins University Applied Physics Laboratory)

    With the rise in availability of larger datasets, there is a growing need of tools and methods to help inform data-driven decisions. Data that vary over a continuum, such as time, exist in a wide array of fields, such as defense, finance, and medicine. One such class of methods that addresses data varying over a continuum is functional data analysis (FDA). FDA methods typically make three assumptions that are often violated in real datasets: all observations exist over the same continuum interval (such as a closed interval [a,b]), all observations are regularly and densely observed, and if the dataset consists of multiple covariates, the covariates are independent of one another. We look to address violation of the latter two assumptions. In this talk, we will discuss methods for analyzing functional data that are irregularly and sparsely observed, while also accounting for dependencies between covariates. These methods will be used to estimate the reconstruction of partially observed multivariate functional data that contain measurement errors. We will begin with a high-level introduction of FDA. Next, we will introduce functional principal components analysis (FPCA), which is a representation of functions that our estimation methods are based on. We will discuss a specific approach called principal components analysis through conditional expectation (PACE) (Yao et al, 2005), which computes the FPCA quantities for a sparsely or irregularly sampled function. The PACE method is a key component that allows us to estimate partially observed functions based on the available dataset. Finally, we will introduce multivariate functional principal components analysis (MFPCA) (Happ & Greven, 2018), which utilizes the FPCA representations of each covariate’s functions in order to compute a principal components representation that accounts for dependencies between covariates. We will illustrate these methods through implementation on simulated and real datasets. We will discuss our findings in terms of the accuracy of our estimates with regards to the amount and portions of a function that is observed, as well as the diversity of functional observations in the dataset. We will conclude our talk with discussion on future research directions.

Room E

Virtual Session
Mini-Tutorial 6
Session Chair: Elizabeth Gregory, NASA Langley Research Center
  •  2 

    An Introduction to Uncertainty Quantification for Modeling & Simulation

    James Warner (NASA Langley Research Center)

    Predictions from modeling and simulation (M&S) are increasingly relied upon to inform critical decision making in a variety of industries including defense and aerospace. As such, it is imperative to understand and quantify the uncertainties associated with the computational models used, the inputs to the models, and the data used for calibration and validation of the models. The rapidly evolving field of uncertainty quantification (UQ) combines elements of statistics, applied mathematics, and discipline engineering to provide this utility for M&S. This mini tutorial provides an introduction to UQ for M&S geared towards engineers and analysts with little-to-no experience in the field but with some knowledge of probability and statistics. A brief review of basic probability will be provided before discussing some core UQ concepts in more detail, including uncertainty propagation and the use of Monte Carlo simulation for making probabilistic predictions with computational models, model calibration to estimate uncertainty in model input parameters using experimental data, and sensitivity analysis for identifying the most important and influential model inputs parameters. Examples from relevant NASA applications are included and references are provided throughout to point viewers to resources for further study.

4:20 PM – 4:40 PM

Room A+B

Virtual Session


Army Wilks Award

Wilks Award Winner to Be Announced

ASA SDNS Student Poster Awards

Student Winners to be Announced

DATAWorks Distinguished Leadership Award

Winner To Be Announced

4:40 PM – 4:50 PM

Room A+B

Virtual Session

Closing Remarks

General Norty Schwartz

(U.S. Air Force, retired / Institute for Defense Analyses)

Norton A. Schwartz serves as President of the Institute for Defense Analyses (IDA), a nonprofit corporation operating in the public interest. IDA manages three Federally Funded Research and Development Centers that answer the most challenging U.S. security and science policy questions with objective analysis leveraging extraordinary scientific, technical, and analytic expertise. At IDA, General Schwartz (U.S. Air Force, retired) directs the activities of more than 1,000 scientists and technologists employed by IDA. General Schwartz has a long and prestigious career of service and leadership that spans over 5 decades. He was most recently President and CEO of Business Executives for National Security (BENS). During his 6-year tenure at BENS, he was also a member of IDA’s Board of Trustees. Prior to retiring from the U.S. Air Force, General Schwartz served as the 19th Chief of Staff of the U.S. Air Force from 2008 to 2012. He previously held senior joint positions as Director of the Joint Staff and as the Commander of the U.S. Transportation Command. He began his service as a pilot with the airlift evacuation out of Vietnam in 1975. General Schwartz is a U.S. Air Force Academy graduate and holds a master’s degree in business administration from Central Michigan University. He is also an alumnus of the Armed Forces Staff College and the National War College. He is a member of the Council on Foreign Relations and a 1994 Fellow of Massachusetts Institute of Technology’s Seminar XXI. General Schwartz has been married to Suzie since 1981.