Invited Abstracts
Total Invited Abstracts: 63
# | Name / Org | Type | Abstract Title | Theme | Abstract |
---|---|---|---|---|---|
1 | Terril Hurst, Senior Engineering Fellow, Raytheon Technologies | Breakout | A Decision-Theoretic Framework for Adaptive Simulation Experiments | Design of Experiments | We describe a model-based framework for increasing the effectiveness of simulation experiments in the presence of uncertainty. Unlike conventionally designed simulation experiments, it adaptively chooses where to sample, based on the value of information obtained. A Bayesian perspective is taken to formulate and update the framework’s four models. A simulation experiment is conducted to answer some question. In order to define precisely how informative a run is for answering the question, the answer must be defined as a random variable. This random variable is called a query and has the general form of p(theta | y), where theta is the query parameter and y is the available data. Examples of each of the four models employed in the framework are briefly described below: 1. The continuous correlated beta process model (CCBP) estimates the proportions of successes and failures using beta-distributed uncertainty at every point in the input space. It combines results using an exponentially decaying correlation function. The output of the CCBP is used to estimate the value of a candidate run. 2. The mutual information model quantifies the reduction in uncertainty about one random variable obtained by observing another. The model quantifies the mutual information between any candidate run and the query, thereby scoring the value of running each candidate. 3. The cost model estimates how long future runs will take, based upon past runs, using, e.g., a generalized linear model. A given simulation might have multiple fidelity options that require different run times. It may be desirable to balance information with the cost of a mixture of runs using these multi-fidelity options. 4. The grid state model, together with the mutual information model, is used to select the next collection of runs for optimal information per cost, accounting for current grid load. The framework has been applied to several use cases, including model verification and validation with uncertainty quantification (VVUQ). Given a mathematically precise query, an 80 percent reduction in total runs has been observed. |
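To make the CCBP idea above concrete, the sketch below accumulates beta pseudo-counts at each grid point from pass/fail observations, weighted by an exponentially decaying correlation. It is a simplified 1-D illustration rather than the authors' implementation; the function name, prior counts, and length scale are assumptions.

```python
import numpy as np

def ccbp_estimate(x_obs, y_obs, x_grid, length_scale=0.5, a0=1.0, b0=1.0):
    """Simplified CCBP-style estimate: each grid point gets beta pseudo-counts
    accumulated from observed pass/fail results, weighted by an exponentially
    decaying correlation. Returns posterior mean and variance of the success
    proportion at every grid point."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)           # 1 = success, 0 = failure
    x_grid = np.asarray(x_grid, dtype=float)

    # Exponentially decaying correlation between grid points and observations
    w = np.exp(-np.abs(x_grid[:, None] - x_obs[None, :]) / length_scale)

    a = a0 + w @ y_obs                                # weighted successes
    b = b0 + w @ (1.0 - y_obs)                        # weighted failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))      # beta posterior variance
    return mean, var

# Toy usage: estimate the probability of success across a 1-D input space
mean, var = ccbp_estimate(x_obs=[0.1, 0.2, 0.5, 0.8, 0.9],
                          y_obs=[1, 1, 0, 0, 1],
                          x_grid=np.linspace(0, 1, 11))
# Candidate runs could then be scored by how much they would reduce this variance.
```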
2 | Kathy Ortega, System Integration Test & Evaluation Engineering Manager, Northrop Grumman Aerospace Systems | Breakout | Leveraging IBM Jazz Capabilities to Optimize T&E Processes | Test and Evaluation Methods for Emerging Technology | Many Test & Evaluation (T&E) processes within the Aerospace and Defense industry, specifically in the autonomous weapon systems area, are outdated, inefficient, and prone to user error. In addition, customer visibility is limited and the Digital Thread is not maintained throughout. Current test planning, procedure development, peer review, test execution, metrics collection, and defect reporting processes rely on a wide array of fragmented tools. Incorporating a single-location integrated suite of online tools is greatly beneficial to both defense contractors and customers alike. The online suite of tools that presents a major opportunity for modernization and tool consolidation is IBM Jazz. In this Breakout session, the Quality Management and Change and Configuration Management areas of IBM Jazz will be explored. More to follow. |
3 | J. Michael Gilmore, Research Staff Member, Institute for Defense Analyses | Breakout | Applying Design of Experiments to Cyber Testing | Design of Experiments | We describe a potential framework for applying DOE to cyber testing and provide an example of its application to testing of a hypothetical command and control system. |
4 | Joel Williamsen, Research Staff Member, Institute for Defense Analyses | Breakout | DebProp: Orbital Debris Collision Effects Prediction Tool for Satellite Constellations | Analysis Tools and Techniques | Based on observations gathered from the IDA Forum on Orbital Debris (OD) Risks and Challenges (October 8-9, 2020), the Director, Operational Test and Evaluation (DOT&E) needed first-order predictive tools to evaluate the effects of orbital debris on mission risk, catastrophic collision, and collateral damage to DoD spacecraft and other orbital assets – either from unintentional or intentional [Anti-Satellite (ASAT)] collisions. This lack of modeling capability hindered the ability of DOT&E to evaluate the risk to operational effectiveness and survivability of individual satellites and large constellations, as well as risks to the overall use of space assets in the future. This presentation describes an IDA-derived technique (DebProp) to evaluate the debris-propagating effects of large, trackable debris (>5 cm) or antisatellite weapons colliding with satellites within constellations. Using a Starlink-like satellite as a case study and working with Stellingwerf Associates, IDA researchers modified the Smooth Particle Hydrodynamic Code (SPHC) to output a file format that is readable as an input file for predicting orbital stability or debris re-entry for thousands of created particles, and to predict additional, short-term OD-induced losses to other satellites in the constellation. By pairing this technique with SatPen (an Excel-based tool for determining the probability and mission effects of >1mm OD impacts and penetration on individual satellites, with ORDEM 3.1 as an input, supplemented with typical damage prediction equations to support mission loss predictions), IDA can conduct long-term debris growth studies. |
5 | Joel Williamsen, Research Staff Member, IDA | Breakout | SatPen: Orbital Debris Risk Prediction Tool for Constellation Satellites | Analysis Tools and Techniques | Based on observations gathered from the IDA Forum on Orbital Debris (OD) Risks and Challenges (October 8-9, 2020), DOT&E needed first-order predictive tools to evaluate the effects of orbital debris on mission risk, catastrophic collision, and collateral damage to DoD spacecraft and other orbital assets – either from unintentional or intentional [Anti-Satellite (ASAT)] collisions. This lack of modeling capability hindered the ability of DOT&E to evaluate the risk to operational effectiveness and survivability of individual satellites and large constellations, as well as risks to the overall use of space assets in the future. This presentation describes an IDA-derived Excel-based tool (SatPen) for determining the probability and mission effects of >1mm orbital debris impacts and penetration on individual satellites in low Earth orbit (LEO). Using a Starlink-like satellite as a case study and NASA’s ORDEM 3.1 orbital debris environment as an input, supplemented with typical damage prediction equations to support mission loss predictions, IDA estimated the likelihood of satellite mission loss. By pairing this technique with DebProp (another IDA tool described at this forum for evaluating the debris-propagating effects of large, trackable debris or antisatellite weapons colliding with satellites within constellations), IDA can predict additional, short-term OD-induced losses to other satellites in the constellation and conduct long-term debris growth studies. |
6 | Joel Williamsen, Research Staff Member, IDA | Breakout | Orbital Debris Effects Prediction Tool for Satellite Constellations | Analysis Tools and Techniques | Based on observations gathered from the IDA Forum on Orbital Debris (OD) Risks and Challenges (October 8-9, 2020), DOT&E needed first-order predictive tools to evaluate the effects of orbital debris on mission risk, catastrophic collision, and collateral damage to DoD spacecraft and other orbital assets – either from unintentional or intentional [Anti-Satellite (ASAT)] collisions. This lack of modeling capability hindered DOT&E’s ability to evaluate the risk to operational effectiveness and survivability of individual satellites and large constellations, as well as risks to the overall use of space assets in the future. Part 1 of this presentation describes an IDA-derived Excel-based tool (SatPen) for determining the probability and mission effects of >1mm orbital debris impacts and penetration on individual satellites in low Earth orbit (LEO). IDA estimated the likelihood of satellite mission loss using a Starlink-like satellite as a case study and NASA’s ORDEM 3.1 orbital debris environment as an input, supplemented with typical damage prediction equations to support mission loss predictions. Part 2 of this presentation describes an IDA-derived technique (DebProp) to evaluate the debris-propagating effects of large, trackable debris (>5 cm) or antisatellite weapons colliding with satellites within constellations. IDA researchers again used a Starlink-like satellite as a case study and worked with Stellingwerf Associates to modify the Smooth Particle Hydrodynamic Code (SPHC) in order to predict the number and direction of fragments following a collision by a tracked satellite fragment. The result is a file format that is readable as an input file for predicting orbital stability or debris re-entry for thousands of created particles and for predicting additional, short-term OD-induced losses to other satellites in the constellation. By pairing these techniques, IDA can predict additional, short-term and long-term OD-induced losses to other satellites in the constellation and conduct long-term debris growth studies. |
7 | Zhi Wang, Data Scientist Contractor, Bayer Crop Science | Breakout | Estimating the time of sudden shift in the location or scale of an ergodic-stationary process | Analysis Tools and Techniques | Autocorrelated sequences arise in many modern-day industrial applications. In this paper, our focus is on estimating the time of sudden shift in the location or scale of a continuous ergodic-stationary sequence following a genuine signal from a statistical process control chart. Our general approach involves “clipping” the continuous sequence at the median or interquartile range (IQR) to produce a binary sequence, and then modeling the joint mass function for the binary sequence using a Bahadur approximation. We then derive a maximum likelihood estimator for the time of sudden shift in the mean of the binary sequence. Performance comparisons are made between our proposed change point estimator and two other viable alternatives. Although the literature contains existing methods for estimating the time of sudden shift in the mean and/or variance of a continuous process, most are derived under strict independence and distributional assumptions. Such assumptions are often too restrictive, particularly when applications involve Industry 4.0 processes where autocorrelation is prevalent and the distribution of the data is likely unknown. The change point estimation strategy proposed in this work easily incorporates autocorrelation and is distribution-free. Consequently, it is widely applicable to modern-day industrial processes. |
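A rough sketch of the clipping idea above: the series is clipped at its median to form a binary sequence, and a change point is chosen by maximizing a Bernoulli likelihood. This illustration assumes independence and omits the Bahadur correlation adjustment that the paper actually uses; the function name and toy data are placeholders.

```python
import numpy as np

def estimate_shift_time(x):
    """Estimate the time of a sudden location shift by clipping at the median
    and maximizing an independence-approximation Bernoulli likelihood."""
    x = np.asarray(x, dtype=float)
    b = (x > np.median(x)).astype(float)      # clip: 1 if above the median
    n = len(b)
    best_tau, best_ll = None, -np.inf
    eps = 1e-9                                # guard against log(0)
    for tau in range(1, n - 1):               # candidate change points
        p1 = np.clip(b[:tau].mean(), eps, 1 - eps)
        p2 = np.clip(b[tau:].mean(), eps, 1 - eps)
        ll = (b[:tau] * np.log(p1) + (1 - b[:tau]) * np.log(1 - p1)).sum() \
           + (b[tau:] * np.log(p2) + (1 - b[tau:]) * np.log(1 - p2)).sum()
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau

# Toy usage: the mean shifts upward at t = 60
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 60), rng.normal(1.5, 1, 40)])
print(estimate_shift_time(x))
```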
8 | Gina Sigler & Alex (Mary) McBride, Statisticians, Scientific Test and Analysis Techniques Center of Excellence (STAT COE) | Tutorial | Survey Dos and Don’ts | Analysis Tools and Techniques | How many surveys have you been asked to fill out? How many did you actually complete? Why those surveys? Did you ever feel like the answer you wanted to mark was missing from the list of possible responses? Surveys can be a great tool for data collection if they are thoroughly planned out and well designed. They are a relatively inexpensive way to collect a large amount of data from hard-to-reach populations. However, if they are poorly designed, the test team might end up with a lot of data and little to no information. Join the STAT COE for a short tutorial on the dos and don’ts of survey design and analysis. We’ll point out the five most common survey mistakes, compare and contrast types of questions, discuss the pros and cons of potential analysis methods (such as descriptive statistics, linear regression, principal component analysis, factor analysis, hypothesis testing, and cluster analysis), and highlight how surveys can be used to supplement other sources of information to provide value to an overall test effort. DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. CLEARED on 5 Jan 2022. Case Number: 88ABW-2022-0003 |
9 | Ryan Norman, Chief Data Officer, Test Resource Management Center | Breakout | TRMC Big Data Analytics Investments & Technology Review | Analysis Tools and Techniques | To properly test and evaluate today’s advanced military systems, the T&E community must utilize big data analytics (BDA) tools and techniques to quickly process, visualize, understand, and report on massive amounts of data. This presentation will inform the audience how to transform the current T&E data infrastructure and analysis techniques into one employing enterprise BDA and Knowledge Management (BDKM) that supports current warfighter T&E needs and the developmental and operational testing of future weapon platforms. The TRMC enterprise BDKM will improve acquisition efficiency, keep up with the rapid pace of acquisition technological advancement, and ensure that effective weapon systems are delivered to warfighters at the speed of relevance – all while enabling T&E analysts across the acquisition lifecycle to make better and faster decisions using data previously inaccessible or unusable. This capability encompasses a big data architecture framework – its supporting resources, methodologies, and guidance – to properly address the current and future data needs of systems testing and analysis, as well as an implementation framework, the Cloud Hybrid Edge-to-Enterprise Evaluation and Test Analysis Suite (CHEETAS). In combination with the TRMC’s Joint Mission Environment Test Capability (JMETC), which provides readily available connectivity to the Services’ distributed test capabilities and simulations, the TRMC has demonstrated that applying enterprise-distributed BDA tools and techniques to distributed T&E leads to faster and more informed decision-making – resulting in reduced overall program cost and risk. |
10 | Daniel A. Timme, PhD Candidate, Florida State University | Breakout | Nonparametric multivariate profile monitoring using regression trees | Test and Evaluation Methods for Emerging Technology | Monitoring noisy profiles for changes in behavior can be used to validate whether a process is operating under normal conditions over time. Change-point detection and estimation in sequences of multivariate functional observations is a common method utilized in monitoring such profiles. A nonparametric method utilizing Classification and Regression Trees (CART) to build a sequence of regression trees is proposed, which makes use of the Kolmogorov-Smirnov statistic to monitor profile behavior. Our novel method compares favorably to existing methods in the literature. |
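A minimal illustration in the spirit of the abstract above: a single regression tree is fit to in-control profiles, and a two-sample Kolmogorov-Smirnov test compares a new profile's residuals to the reference residuals. The proposed method builds a sequence of trees for multivariate profiles; this single-tree, univariate sketch is only a stand-in under those simplifying assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.tree import DecisionTreeRegressor

# Reference (in-control) profiles: y = f(x) + noise
rng = np.random.default_rng(0)
x_ref = rng.uniform(0, 1, (500, 1))
y_ref = np.sin(2 * np.pi * x_ref[:, 0]) + rng.normal(0, 0.1, 500)

tree = DecisionTreeRegressor(max_depth=4).fit(x_ref, y_ref)
ref_resid = y_ref - tree.predict(x_ref)

def profile_alarm(x_new, y_new, alpha=0.01):
    """Flag a new profile if its residuals (relative to the reference tree)
    differ from the in-control residuals per a two-sample KS test."""
    new_resid = y_new - tree.predict(x_new)
    stat, p_value = ks_2samp(ref_resid, new_resid)
    return p_value < alpha, stat

# New profile with a small upward shift in level
x_new = rng.uniform(0, 1, (100, 1))
y_new = np.sin(2 * np.pi * x_new[:, 0]) + 0.3 + rng.normal(0, 0.1, 100)
print(profile_alarm(x_new, y_new))
```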
11 | Keyla Pagán-Rivera, Research Staff Member, Institute for Defense Analyses | Tutorial | Case Study on Applying Sequential Methods in Operational Testing | Design of Experiments | Sequential methods concern statistical evaluation in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends on the information acquired during the investigation. Although sequential methods originated in ballistics testing for the Department of Defense (DoD), they are underutilized in the DoD. Expanding the use of sequential methods may save money and reduce test time. In this presentation, we introduce sequential methods, describe their potential uses in operational test and evaluation (OT&E), and present a method for applying them to the test and evaluation of defense systems. We evaluate the proposed method by performing simulation studies and applying the method to a case study. Additionally, we discuss some of the challenges we might encounter when using sequential analysis in OT&E. |
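Sequential methods let the amount of testing depend on the data as they arrive. As a generic illustration (not the specific method evaluated in the talk), the sketch below implements Wald's sequential probability ratio test for a success probability; the hypothesized rates and error levels are arbitrary assumptions.

```python
import math

def sprt_binomial(outcomes, p0=0.7, p1=0.9, alpha=0.05, beta=0.10):
    """Wald sequential probability ratio test for a success probability.
    Stops as soon as the log-likelihood ratio crosses a decision boundary,
    rather than waiting for a fixed, pre-determined sample size."""
    upper = math.log((1 - beta) / alpha)       # cross: conclude p >= p1
    lower = math.log(beta / (1 - alpha))       # cross: conclude p <= p0
    llr = 0.0
    for n, success in enumerate(outcomes, start=1):
        llr += math.log(p1 / p0) if success else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0 (high success rate)", n
        if llr <= lower:
            return "accept H0 (low success rate)", n
    return "continue testing", len(outcomes)

# Toy usage: a run of mostly successful trials
print(sprt_binomial([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]))
```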
12 | James Warner, Computational Scientist, NASA Langley Research Center | Breakout | Machine Learning for Uncertainty Quantification: Trusting the Black Box | Analysis Tools and Techniques | Adopting uncertainty quantification (UQ) has become a prerequisite for providing credibility in modeling and simulation (M&S) applications. It is well known, however, that UQ can be computationally prohibitive for problems involving expensive high-fidelity models, since a large number of model evaluations is typically required. A common approach for improving efficiency is to replace the original model with an approximate surrogate model (i.e., metamodel, response surface, etc.) using machine learning that makes predictions in a fraction of the time. While surrogate modeling has been commonplace in the UQ field for over a decade, many practitioners still remain hesitant to rely on “black box” machine learning models over trusted physics-based models (e.g., FEA) for their analyses. This talk discusses the role of machine learning in enabling computational speedup for UQ, including traditional limitations and modern efforts to overcome them. An overview of surrogate modeling and its best practices for effective use is first provided. Then, some emerging methods that aim to unify physics-based and data-based approaches for UQ are introduced, including multi-model Monte Carlo simulation and physics-informed machine learning. The use of both traditional surrogate modeling and these more advanced machine learning methods for UQ is highlighted in the context of applications at NASA, including trajectory simulation and spacesuit certification. |
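A minimal sketch of the surrogate-for-UQ workflow described above: a Gaussian process is trained on a handful of "expensive" model runs and then used for cheap Monte Carlo propagation of input uncertainty. The toy model, kernel, and input distribution are assumptions for illustration only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_model(x):
    """Stand-in for a costly high-fidelity simulation."""
    return np.sin(3 * x) + 0.5 * x ** 2

# A small number of expensive training runs
x_train = np.linspace(0, 2, 12).reshape(-1, 1)
y_train = expensive_model(x_train).ravel()

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                     normalize_y=True).fit(x_train, y_train)

# Monte Carlo uncertainty propagation on the cheap surrogate
rng = np.random.default_rng(0)
x_mc = rng.normal(1.0, 0.2, size=(10_000, 1))      # uncertain input
y_mc = surrogate.predict(x_mc)
print("mean:", y_mc.mean(),
      "95% interval:", np.percentile(y_mc, [2.5, 97.5]))
```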
13 | Elizabeth Claassen, Research Statistician Developer, JMP Statistical Discovery | Tutorial | Mixed Models: A Critical Tool for Dependent Observations | Analysis Tools and Techniques | The use of fixed and random effects has a rich history. They often go by other names, including blocking models, variance component models, nested and split-plot designs, hierarchical linear models, multilevel models, empirical Bayes, repeated measures, covariance structure models, and random coefficient models. Mixed models are one of the most powerful and practical ways to analyze experimental data, and investing time to become skilled with them is well worth the effort. Many, if not most, real-life data sets do not satisfy the standard statistical assumption of independent observations. Failure to appropriately model design structure can easily result in biased inferences. With an appropriate mixed model we can estimate primary effects of interest as well as compare sources of variability using common forms of dependence among sets of observations. Mixed models can readily become the handiest method in your analytical toolbox and provide a foundational framework for understanding statistical modeling in general. In this course we will cover many types of mixed models, including blocking, split-plot, and random coefficients. |
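As a small illustration of the kind of dependence mixed models handle, the sketch below fits a random-intercept model to simulated data with observations nested within batches. It uses statsmodels purely for illustration; the data and effect sizes are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: 8 observations nested within each of 10 batches
rng = np.random.default_rng(2)
batches = np.repeat(np.arange(10), 8)
batch_effect = rng.normal(0, 2, 10)[batches]          # random intercept per batch
x = rng.uniform(0, 1, len(batches))
y = 5 + 3 * x + batch_effect + rng.normal(0, 1, len(batches))
df = pd.DataFrame({"y": y, "x": x, "batch": batches})

# Random-intercept mixed model: fixed effect for x, random effect for batch
model = smf.mixedlm("y ~ x", df, groups=df["batch"]).fit()
print(model.summary())    # fixed-effect estimates plus variance components
```

Treating "batch" as a fixed blocking factor instead would ignore that batches are a sample of a larger population of batches; the random effect lets the model estimate that batch-to-batch variance component directly.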
14 | Christina Heinich, AST, Data Systems, NASA | Tutorial | An Introduction to Data Visualization | Special Topics | Data visualization can be used to present findings, explore data, and use the human eye to find patterns that a computer would struggle to locate. Borrowing tools from art, storytelling, data analytics, and software development, data visualization is an indispensable part of the analysis process. While data visualization usage spans multiple disciplines and sectors, most practitioners never receive formal training in the subject. As such, this tutorial will introduce key data visualization building blocks and how to best use those building blocks for different scenarios and audiences. We will also go over tips on accessibility, design, and interactive elements. While this will by no means be a complete overview of the data visualization field, by building a foundation and introducing some rules of thumb, attendees will be better equipped to communicate their findings to their audience. |
15 | Lindsey Butler, Research Staff Member, Institute for Defense Analyses | Breakout | A New Method for Planning Full-Up System-Level (FUSL) Live Fire Tests | Design of Experiments | Planning Full-Up System-Level (FUSL) Live Fire tests is a complex process that has historically relied solely on subject matter expertise. In particular, there is no established method to determine the appropriate number of FUSL tests necessary for a given program. We developed a novel method that is analogous to the Design of Experiments process that is used to determine the scope of Operational Test events. Our proposed methodology first requires subject matter experts (SMEs) to define all potential FUSL shots. For each potential shot, SMEs estimate the severity of that shot, the uncertainty of that severity estimate, and the similarity of that shot to all other potential shots. We developed a numerical optimization algorithm that uses the SME inputs to generate a prioritized list of FUSL events and a corresponding plot of the total information gained with each successive shot. Together, these outputs can help analysts determine the adequate number of FUSL tests for a given program. We illustrate this process with an example on a notional ground vehicle. Future work is necessary prior to implementation on a program of record. |
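A hedged sketch of the kind of greedy prioritization the abstract describes: shots are selected in order of remaining information, discounting shots that are similar to ones already chosen, and a cumulative-information trace is returned. This is not the authors' optimization algorithm; the discount rule, function name, and toy inputs are assumptions.

```python
import numpy as np

def prioritize_shots(information, similarity):
    """Greedy prioritization sketch: repeatedly pick the candidate shot with
    the most remaining information, where information already 'covered' by
    previously selected, similar shots is discounted.

    information : (n,) SME-estimated information value of each candidate shot
    similarity  : (n, n) SME-estimated pairwise similarity in [0, 1]
    """
    remaining = np.asarray(information, dtype=float).copy()
    similarity = np.asarray(similarity, dtype=float)
    order, cumulative, total = [], [], 0.0
    for _ in range(len(remaining)):
        i = int(np.argmax(remaining))
        total += remaining[i]
        order.append(i)
        cumulative.append(total)
        # Discount the information of shots similar to the one just selected
        remaining *= (1.0 - similarity[i])
        remaining[i] = 0.0
    return order, cumulative

# Toy example with 4 candidate shots
information = [0.9, 0.8, 0.5, 0.4]
similarity = np.array([[1.0, 0.7, 0.1, 0.0],
                       [0.7, 1.0, 0.2, 0.0],
                       [0.1, 0.2, 1.0, 0.3],
                       [0.0, 0.0, 0.3, 1.0]])
print(prioritize_shots(information, similarity))
```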
16 | Jane E. Valentine, Senior Biomedical Engineer, Johns Hopkins University Applied Physics Laboratory | Breakout | Stochastic Modeling and Characterization of a Wearable-Sensor-Based Surveillance Network f | Special Topics | Current disease outbreak surveillance practices reflect underlying delays in the detection and reporting of disease cases, relying on individuals who present symptoms to seek medical care and enter the health care system. To accelerate the detection of outbreaks resulting from possible bioterror attacks, we introduce a novel two-tier, human sentinel network (HSN) concept composed of wearable physiological sensors capable of pre-symptomatic illness detection, which prompt individuals to enter a confirmatory stage where diagnostic testing occurs at a certified laboratory. Both the wearable alerts and test results are reported automatically and immediately to a secure online platform via a dedicated application. The platform aggregates the information and makes it accessible to public health authorities. We evaluated the HSN against traditional public health surveillance practices for outbreak detection of 80 Bacillus anthracis (Ba) release scenarios in mid-town Manhattan, NYC. We completed an end-to-end modeling and analysis effort, including the calculation of anthrax exposures and doses based on computational atmospheric modeling of release dynamics, and the development of a custom-built probabilistic model to simulate the resulting wearable alerts, diagnostic test results, symptom onsets, and medical diagnoses for each exposed individual in the population. We developed a novel measure of network coverage, formulated new metrics to compare the performance of the HSN to public health surveillance practices, completed a Design of Experiments to optimize the test matrix, characterized the performant trade-space, and performed sensitivity analyses to identify the most important engineering parameters. Our results indicate that a network covering greater than ~10% of the population would yield approximately a 24-hour time advantage over public health surveillance practices in identifying outbreak onset, and provide a non-target-specific indication (in the form of a statistically aberrant number of wearable alerts) of approximately 36 hours; these earlier detections would enable faster and more effective public health and law enforcement responses to support incident characterization and decrease morbidity and mortality via post-exposure prophylaxis. |
17 | William J. Doebler, Research Aerospace Engineer, NASA Langley Research Center | Breakout | Experiment Design and Visualization Techniques for an X-59 Low-boom Variability Study | Design of Experiments | This presentation outlines the design of experiments approach and data visualization techniques for a simulation study of sonic booms from NASA’s X-59 supersonic aircraft. The X-59 will soon be flown over communities across the contiguous USA as it produces a low-loudness sonic boom, or low-boom. Survey data on human perception of low-booms will be collected to support development of potential future commercial supersonic aircraft noise regulatory standards. The macroscopic atmosphere plays a critical role in the loudness of sonic booms. The extensive sonic boom simulation study presented herein was completed to assess climatological, geographical, and seasonal effects on the variability of the X-59’s low-boom loudness and noise exposure region size in order to inform X-59 community test planning. The loudness and extent of the noise exposure region make up the “sonic boom carpet.” Two spatial and temporal resolutions of atmospheric input data to the simulation were investigated. A Fast Flexible Space-Filling Design was used to select the locations across the USA for the two spatial resolutions. Analysis of simulated X-59 low-boom loudness data within a regional subset of the northeast USA was completed using a bootstrap forest to determine the final spatial and temporal resolution of the countrywide simulation study. Atmospheric profiles from NOAA’s Climate Forecast System Version 2 database were used to generate over one million simulated X-59 carpets at the final selected 138 locations across the USA. Effects of aircraft heading, season, geography, and climate zone on low-boom levels and noise exposure region size were analyzed. Models were developed to estimate loudness metrics throughout the USA for X-59 supersonic cruise overflight, and results were visualized on maps to show geographical and seasonal trends. These results inform regulators and mission planners on expected variations in boom levels and carpet extent from atmospheric variations. Understanding potential carpet variability is important when planning community noise surveys using the X-59. |
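The study above used JMP's Fast Flexible Space-Filling Design to place its 138 simulation sites. As a loose, generic stand-in for that step (not the study's actual design), the sketch below draws a Latin hypercube sample of 138 points over a rough bounding box for the contiguous USA; the box coordinates are assumptions for illustration.

```python
from scipy.stats import qmc

# Generic space-filling design over a 2-D (longitude, latitude) box.
sampler = qmc.LatinHypercube(d=2, seed=7)
unit_points = sampler.random(n=138)               # 138 sites, as in the study
l_bounds = [-125.0, 25.0]                         # rough western lon, southern lat
u_bounds = [-67.0, 49.0]                          # rough eastern lon, northern lat
sites = qmc.scale(unit_points, l_bounds, u_bounds)
print(sites[:5])                                  # first few candidate locations
```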
18 | Andrew Farina, Assistant Professor, Department of Behavioral Sciences and Leadership, United States Military Academy | Breakout | Using the R ecosystem to produce a reproducible data analysis pipeline | Data Management and Reproducible Research | Advances in open-source software have brought powerful machine learning and data analysis tools requiring little more than a few coding basics. Unfortunately, the very nature of rapidly changing software can contribute to legitimate concerns surrounding the reproducibility of research and analysis. Borrowing from current practices in data science and software engineering fields, a more robust process using the R ecosystem to produce a version-controlled data analysis pipeline is proposed. By integrating the data cleaning, model generation, manuscript writing, and presentation scripts, a researcher or data analyst can ensure small changes at any step will automatically be reflected throughout using the Rmarkdown, targets, renv, and xaringan R packages. |
19 | Joseph Fabritius and Kyle Remley, Research Staff Members, Institute for Defense Analyses | Breakout | Taming the beast: making questions about the supply system tractable by quantifying risk | Analysis Tools and Techniques | The DoD sustainment system is responsible for managing the supply of millions of different spare parts, most of which are infrequently and inconsistently requisitioned, and many of which have procurement lead times measured in years. The DoD must generally buy items in anticipation of need, yet it simply cannot afford to buy even one copy of every unique part it might be called upon to deliver. Deciding which items to purchase necessarily involves taking risks, both military and financial. However, the huge scale of the supply system makes these risks difficult to quantify. We have developed methods that use raw supply data in new ways to support this decision-making process. First, we have created a method to identify areas of potential overinvestment that could safely be reallocated to areas at risk of underinvestment. Second, we have used raw requisition data to create an item priority list for individual weapon systems in terms of importance to mission success. Together, these methods allow DoD decision makers to make better-informed decisions about where to take risks and where to invest scarce resources. |
20 | William C. Schneck, III, Research AST, NASA LaRC | Breakout | Enabling Enhanced Validation of NDE Computational Models and Simulations | Data Management and Reproducible Research | Authors: William C. Schneck, III, Ph.D. and Elizabeth D. Gregory, Ph.D., NASA Langley Research Center. Computer simulations of physical processes are increasingly used in the development, design, deployment, and life-cycle maintenance of many engineering systems [1] [2]. Non-Destructive Evaluation (NDE) and Structural Health Monitoring (SHM) must employ effective methods to inspect increasingly complex structural and material systems developed for new aerospace systems. Reliably and comprehensively interrogating this multidimensional [3] problem domain from a purely experimental perspective can become cost and time prohibitive. The emerging way to confront these new complexities in a timely and cost-effective manner is to utilize computer simulations. These simulations must be Verified and Validated [4] [5] to assure reliable use for these NDE/SHM applications. Beyond the classical use of models for engineering applications for equipment or system design efforts, NDE/SHM are necessarily applied to as-built and as-used equipment. While most structural or CFD models are applied to ascertain performance of as-designed systems, the performance of an NDE/SHM system is necessarily tied to the indications of damage/defects/deviations (collectively, flaws) within as-built and as-used structures and components. Therefore, the models must have sufficient fidelity to determine the influence of these aberrations on the measurements collected during interrogation. To assess the accuracy of these models, the Validation data sets must adequately encompass these flaw states. Due to the extensive parametric spaces that this coverage would entail, this talk proposes an NDE Benchmark Validation Data Repository, which should contain inspection data covering representative structures and flaws. This data can be reused from project to project, amortizing the cost of performing high-quality Validation testing. Works Cited: [1] Director, Modeling and Simulation Coordination Office, “Department of Defense Standard Practice: Documentation of Verification, Validation, and Accreditation (VV&A) for Models and Simulations,” Department of Defense, 2008. [2] Under Secretary of Defense (Acquisition, Technology and Logistics), “DoD Modeling and Simulation (M&S) Verification, Validation, and Accreditation (VV&A),” Department of Defense, 2003. [3] R. C. Martin, Clean Architecture: A Craftsman’s Guide to Software Structure and Design, Boston: Prentice Hall, 2018. [4] C. J. Roy and W. L. Oberkampf, “A Complete Framework for Verification, Validation, and Uncertainty Quantification in Scientific Computing (Invited),” in 48th AIAA Aerospace Sciences Meeting, Orlando, 2010. [5] ASME Performance Test Code Committee 60, “Guide for Verification and Validation in Computational Solid Mechanics,” ASME International, New York, 2016. |
21 | Greg Hunt, Assistant Professor, William & Mary | Breakout | Everyday Reproducibility | Data Management and Reproducible Research | Modern data analysis is typically quite computational. Correspondingly, sharing scientific and statistical work now often means sharing code and data in addition to writing papers and giving talks. This type of code sharing faces several challenges. For example, it is often difficult to take code from one computer and run it on another due to software configuration, version, and dependency issues. Even if the code runs, writing code that is easy to understand or interact with can be difficult. This makes it difficult to assess third-party code and its findings. In this talk we describe a combination of two computing technologies that help make analyses shareable, interactive, and completely reproducible. These technologies are (1) analysis containerization, which leverages virtualization to fully encapsulate analysis, data, code, and dependencies into an interactive and shareable format, and (2) code notebooks, a literate programming format for interacting with analyses. This talk reviews the problems at a high level and also provides concrete solutions to the challenges faced. |
22 | Jonathan Rathsam, Technical Lead, NASA Langley Research Center | Breakout | An Overview of NASA’s Low Boom Flight Demonstration | Test and Evaluation Methods for Emerging Technology | NASA will soon begin a series of tests that will collect nationally representative data on how people perceive low noise supersonic overflights. For half a century, civilian aircraft have been required to fly slower than the speed of sound over land to prevent “creating an unacceptable situation” on the ground due to sonic booms. However, new aircraft shaping techniques have led to dramatic changes in how shockwaves from supersonic flight merge together as they travel to the ground. What used to sound like a boom on the ground will be transformed into a thump. NASA is now building a full-scale, piloted demonstration aircraft called the X-59 to demonstrate low noise supersonic flight. In 2024, the X-59 aircraft will commence a national series of community overflight tests to collect data on how people perceive “sonic thumps.” The community response data will be provided to national and international noise regulators as they consider creating new standards that allow supersonic flight over land at acceptably low noise levels. |
23 | Jay Wilkins, Research Staff Member, Institute for Defense Analyses | Shortcourse | Topological Modeling of Human-Machine Teams | Special Topics | A Human-Machine Team (HMT) is a group of agents consisting of at least one human and at least one machine, all functioning collaboratively towards one or more common objectives. As industry and defense find more helpful, creative, and difficult applications of AI-driven technology, the need to effectively and accurately model, simulate, test, and evaluate HMTs will continue to grow and become even more essential. Going along with that growing need, new methods are required to evaluate whether a human-machine team is performing effectively as a team in testing and evaluation scenarios. You cannot predict team performance from knowledge of the individual team agents alone; interaction between the humans and machines – and interaction between team agents, in general – increases the problem space and adds a measure of unpredictability. Collective team or group performance, in turn, depends heavily on how a team is structured and organized, as well as the mechanisms, paths, and substructures through which the agents in the team interact with one another – i.e., the team’s topology. With the tools and metrics for measuring team structure and interaction becoming more highly developed in recent years, we will propose and discuss a practical, topological HMT modeling framework that not only takes into account but is actually built around the team’s topological characteristics, while still utilizing the individual human and machine performance measures. |
24 | Newton Campbell, Senior Computer Scientist, NASA GSFC/SAIC | Breakout | Developing EnDEVR: An Environment for Data Engineering in Virtual Reality | Special Topics | The US Government is going through a massive Digital Transformation effort. The US Digital Service Playbook is guiding all agencies to make their processes data-driven with the help of emerging technologies. As a result, organizations across every government sector are integrating new digital technologies into existing processes to improve how they operate and deliver value to customers. However, when integrating artificial intelligence services, the investment of time and effort in obtaining product licenses, support, training, and other product-specific services can unwittingly bind agencies to specific product vendors. This concept is known as “vendor lock-in,” and it limits government organizations in the data science operations they can perform and who they can perform them with. The NASA Langley Research Center Digital Transformation Group has developed the Environment for Data Engineering in Virtual Reality (EnDEVR) to address this concern. EnDEVR is an applied mathematics and data science ecosystem that allows users to command and investigate customizable data analyses from a VR environment. EnDEVR permits a user to load and quickly preprocess data, develop a data processing and analysis pipeline from a VR environment, and kick off computation of that pipeline on a back-end computing server. In addition, it allows thorough exploration of the analysis results from within VR. EnDEVR users construct pipelines using algorithm widgets provided by a crowdsourced algorithm database. The NASA user community submits science, engineering, data processing, and machine learning algorithm code that EnDEVR automatically converts to widgets. As a result, the system quickly adopts the functionality of new open-source or proprietary data analysis libraries and promotes collaboration across NASA Centers. When coupled with an ability to share custom algorithm code, EnDEVR creates a separation layer between specific vendor implementations and an intended function, permitting users to quickly adopt new APIs and services. Several NASA teams have prototyped use cases in the EnDEVR system, validated in the Oculus Rift S and Quest environments and supported by NASA High-End Computing (HEC). The EnDEVR team has developed several AI capabilities to guide research and automation within the environment and to facilitate algorithm sharing across NASA. Based on the recent report “NASA Framework for the Ethical Use of Artificial Intelligence,” the team has investigated ethical considerations that will impact the system’s functionality and effectiveness. Key features and considerations that the EnDEVR system has addressed and will address in future iterations are: ● Concept of Operations – What are the goals of developing the EnDEVR system? How do we deploy and continue development after deployment? Where and when is AI involved? ● Democratization of Artificial Intelligence – What are the pros and cons of democratizing AI in this way, within our organization? How do we establish policies for democratization? ● Accountability – What does accountability look like in this framework? What would an accountable human need to know before developing or using algorithms in EnDEVR? ● Future Development – What processes will future developers follow to ensure ethical design and analysis of the EnDEVR system? We will discuss these EnDEVR features and considerations during this breakout session. |
25 | Akshay Jain, Data Science Fellow, Institute for Defense Analyses | Breakout | Forecasting with Machine Learning | Analysis Tools and Techniques | The Department of Defense (DoD) has a considerable interest in forecasting key quantities of interest including demand signals, personnel flows, and equipment failure. Many forecasting tools exist to aid in predicting future outcomes, and there are many methods to evaluate the quality and uncertainty in those forecasts. When used appropriately, these methods can facilitate planning and lead to dramatic reductions in costs. This talk explores the application of machine learning algorithms, specifically gradient-boosted tree models, to forecasting and presents some of the various advantages and pitfalls of this approach. We conclude with an example where we use gradient-boosted trees to forecast Air National Guard personnel retention. |
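A minimal sketch of the forecasting approach described above: lagged values of a series become features for a gradient-boosted tree regressor, with the last year held out for evaluation. The simulated series and hyperparameters are placeholders, not the Air National Guard analysis.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def make_lagged(series, n_lags=6):
    """Turn a univariate series into a supervised matrix of lagged features."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return np.array(X), np.array(y)

# Toy monthly series with trend and seasonality standing in for, e.g., retention counts
rng = np.random.default_rng(3)
t = np.arange(120)
series = 100 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120)

X, y = make_lagged(series)
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                  max_depth=3).fit(X[:-12], y[:-12])
preds = model.predict(X[-12:])                    # hold out the last 12 months
print("holdout MAE:", np.abs(preds - y[-12:]).mean())
```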
26 | Thomas A. Donnelly, Principal Systems Engineer, JMP Statistical Discovery LLC | Breakout | Using Sensor Stream Data as Both an Input and Output in a Functional Data Analysis | Analysis Tools and Techniques | A case study will be presented where patients wearing continuous glycemic monitoring systems provide sensor stream data of their glucose levels before and after consuming 1 of 5 different types of snacks. The goal is to be able to better predict a new patient’s glycemic-response-over-time trace after being given a particular type of snack. Functional Data Analysis (FDA) is used to extract eigenfunctions that capture the longitudinal shape information of the traces and principal component scores that capture the patient-to-patient variation. FDA is used twice. First it is used on the “before” baseline glycemic-response-over-time traces. Then a separate analysis is done on the snack-induced “after” response traces. The before FPC scores and the type of snack are then used to model the after FPC scores. This final FDA model will then be used to predict the glycemic response of new patients given a particular snack and their existing baseline response history. Although the case study is for medical sensor data, the methodology employed would work for any sensor stream where an event perturbs the system, thus affecting the shape of the sensor stream post event. |
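A rough sketch of the two-stage workflow described above, approximating functional principal components by PCA on discretized curves: before-trace scores plus snack type predict after-trace scores, which are then mapped back to curves. The simulated traces and the use of plain PCA (rather than basis-smoothed FPCA) are simplifying assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n_patients, n_times, n_snacks = 60, 50, 5
time = np.linspace(0, 1, n_times)
snack = rng.integers(0, n_snacks, n_patients)

# Simulated baseline ("before") and snack-induced ("after") glucose traces
amp = rng.normal(1.0, 0.2, n_patients)
before = amp[:, None] * np.sin(2 * np.pi * time) \
         + rng.normal(0, 0.05, (n_patients, n_times))
after = (amp[:, None] + 0.3 * snack[:, None]) * np.exp(-((time - 0.4) ** 2) / 0.02) \
        + rng.normal(0, 0.05, (n_patients, n_times))

# FDA-like step: principal components of the discretized curves
fpca_before = PCA(n_components=3).fit(before)
fpca_after = PCA(n_components=3).fit(after)
scores_before = fpca_before.transform(before)     # per-patient "before" FPC scores
scores_after = fpca_after.transform(after)        # per-patient "after" FPC scores

# Model after-scores from before-scores plus snack type, then rebuild curves
snack_dummies = np.eye(n_snacks)[snack]
X = np.hstack([scores_before, snack_dummies])
reg = LinearRegression().fit(X, scores_after)
predicted_after = fpca_after.inverse_transform(reg.predict(X))
print("trace RMSE:", np.sqrt(((predicted_after - after) ** 2).mean()))
```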
27 | Mikhail Smirnov, Research Staff Member, Institute for Defense Analyses | Breakout | Structural Dynamic Programming Methods for DOD Research | Analysis Tools and Techniques | Structural dynamic programming models are a powerful tool to help guide policy under uncertainty. By creating a mathematical representation of the intertemporal optimization problem of interest, these models can answer questions that static models cannot address. Applications can be found from military personnel policy (how does future compensation affect retention now?) to inventory management (how many aircraft are needed to meet readiness objectives?). Recent advances in statistical methods and computational algorithms allow us to develop dynamic programming models of complex real-world problems that were previously too difficult to solve. |
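To show what an intertemporal optimization of this kind looks like in miniature, the sketch below runs value iteration for a toy spares-inventory problem with ordering, holding, and stockout costs. All costs, demand probabilities, and the discount factor are made-up inputs; real structural models estimate such parameters from data.

```python
import numpy as np

# Value iteration for a tiny inventory problem: how many spares to order each
# period, given holding costs, stockout penalties, and random demand.
max_stock = 10
order_cost, holding_cost, stockout_penalty = 4.0, 1.0, 20.0
demand_probs = {0: 0.3, 1: 0.4, 2: 0.3}
beta = 0.95                                    # discount factor

V = np.zeros(max_stock + 1)                    # value (expected cost) per stock level
for _ in range(500):                           # iterate to (approximate) convergence
    V_new = np.empty_like(V)
    for s in range(max_stock + 1):
        best = np.inf
        for order in range(max_stock - s + 1):
            cost = order_cost * order
            for d, p in demand_probs.items():
                next_s = max(s + order - d, 0)
                shortage = max(d - (s + order), 0)
                cost += p * (holding_cost * next_s
                             + stockout_penalty * shortage
                             + beta * V[next_s])
            best = min(best, cost)
        V_new[s] = best
    V = V_new
print(np.round(V, 2))    # expected discounted cost starting from each stock level
```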
28 | Dr. Michelle Rodio, HPC Developer Relations, Next Silicon Inc. | Breakout | Lowering the barrier to emerging technologies | Test and Evaluation Methods for Emerging Technology | An old adage says “if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.” The same concept can apply to domain scientists. When put into a situation where they need to quickly become experts in emerging technologies with steep learning curves, their confidence is challenged. Moreover, new technologies typically affect their scientific output when they end up spending more time testing, evaluating, and implementing new technology than focusing on the science they are trying to explore. The designers of emerging and revolutionary technologies need to consider the typical high barriers to installing, learning, and using new state-of-the-art tech and the impact this has on productivity and user experience. New technologies tend to focus on very specific problems (e.g., new chips for artificial intelligence or development of quantum computing), rather than being broadly applicable across scientific domains. To integrate specialized technologies, domain scientists are forced to learn new ecosystems before they can become effective. For example, new computing technologies might require new and proprietary programming languages and stringent demands for porting applications to the new ecosystems. This distracts domain experts from researching their science effectively. Imagine a world where scientists can go back to focusing solely on their fundamental research – where there is no need to spend time learning new languages and hardware ecosystems that may become obsolete anyway. How can new technology work for scientists to improve their performance and research outcomes, rather than against them? Makers of emerging technologies can encourage deployment and adoption of their innovations by giving more attention during design, testing, and evaluation to broad applicability, compatibility with existing systems, and ease of installation and operation, as well as cybersecurity issues. |
29 | Alan B. Gelder, Research Staff Member, Institute for Defense Analyses | Breakout | Legal, Moral, and Ethical Implications of Machine Learning | Analysis Tools and Techniques | Machine learning algorithms can help to distill vast quantities of information to support decision making. However, machine learning also presents unique legal, moral, and ethical concerns – ranging from potential discrimination in personnel applications to misclassifying targets on the battlefield. Building on foundational principles in ethical philosophy, this presentation summarizes key legal, moral, and ethical criteria applicable to machine learning and provides pragmatic considerations and recommendations. |
30 | Brittany Fischer, PhD Candidate, Arizona State University | Breakout | Optimal Designs for Multiple Response Distributions | Design of Experiments | Designed experiments can be a powerful tool for gaining fundamental understanding of systems and processes or maintaining or optimizing systems and processes. There are usually multiple performance and quality metrics that are of interest in an experiment, and these multiple responses may include data from nonnormal distributions, such as binary or count data. A design that is optimal for a normal response can be very different from a design that is optimal for a nonnormal response. This work includes a two-phase method that helps experimenters identify a hybrid design for a multiple response problem. Mixture and optimal design methods are used with a weighted optimality criterion for a three-response problem that includes a normal, a binary, and a Poisson model, but could be generalized to an arbitrary number and combination of responses belonging to the exponential family. A mixture design is utilized to identify the optimal weights in the criterion presented. |
31 | Brian Woolley, T&E Enclave Manager, Joint Artificial Intelligence Center | Breakout | Test and Evaluation Framework for AI Enabled Systems | Test and Evaluation Methods for Emerging Technology | In the current moment, autonomous and artificial intelligence (AI) systems are emerging at a dizzying pace. Such systems promise to expand the capacity and capability of individuals by delegating increasing levels of decision making down to the agent level. In this way, operators can set high-level objectives for multiple vehicles or agents and need only intervene when alerted to anomalous conditions. Test and evaluation efforts at the Joint AI Center are focused on exercising a prescribed test strategy for AI-enabled systems. This new AI T&E Framework recognizes the inherent complexity that follows from incorporating dynamic decision makers into a system (or into a system-of-systems). The AI T&E Framework is composed of four high-level types of testing that examine an AI-enabled system from different angles to provide as complete a picture as possible of the system’s capabilities and limitations: algorithmic, system integration, human-system integration, and operational tests. These testing categories provide stakeholders with appropriate qualitative and quantitative assessments that bound the system’s use cases in a meaningful way. The algorithmic tests characterize the AI models themselves against metrics for effectiveness, security, robustness, and responsible AI principles. The system integration tests exercise the system itself to ensure it operates reliably, functions correctly, and is compatible with other components. The human-machine testing asks what human operators think of the system, whether they understand what the system is telling them, and whether they trust the system under appropriate conditions. All of this culminates in an operational test that evaluates how the system performs in a realistic environment with realistic scenarios and adversaries. Interestingly, counter to traditional approaches, this framework is best applied during and throughout the development of an AI-enabled system. Our experience is that programs that conduct independent T&E alongside development do not suffer delays, but instead benefit from the feedback and insights gained from incremental and iterative testing, which leads to the delivery of a better overall capability. |
32 | Anna Rubinstein, Director of Test and Evaluation, MORSE Corporation | Breakout | Test & Evaluation of ML Models | Test and Evaluation Methods for Emerging Technology | Machine learning models have been incredibly impactful over the past decade; however, testing those models and comparing their performance has remained challenging and complex. In this presentation, I will demonstrate novel methods for measuring the performance of computer vision object detection models, including running those models against still imagery and video. The presentation will start with an introduction to the pros and cons of various metrics, including traditional metrics like precision, recall, average precision, mean average precision, F1, and F-beta. The talk will then discuss more complex topics such as tracking metrics, handling multiple object classes, visualizing multi-dimensional metrics, and linking metrics to operational impact. Anecdotes will be shared discussing different types of metrics that are appropriate for different types of stakeholders, how system testing fits in, best practices for model integration, best practices for data splitting, and cloud vs. on-prem compute lessons learned. The presentation will conclude by discussing what software libraries are available to calculate these metrics, including the MORSE-developed library Charybdis. |
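A small, library-free sketch of the basic detection metrics the talk covers (not the Charybdis implementation): predictions are greedily matched to ground-truth boxes by intersection-over-union, and precision, recall, and F-beta are computed for a single image. The box format and threshold are assumptions.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_metrics(preds, truths, iou_thresh=0.5, beta=1.0):
    """Precision, recall, and F-beta for one image via greedy IoU matching.
    preds: list of (confidence, box); truths: list of boxes."""
    preds = sorted(preds, key=lambda p: -p[0])     # highest confidence first
    matched, tp = set(), 0
    for _, box in preds:
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(truths):
            if j not in matched and iou(box, gt) >= best_iou:
                best_j, best_iou = j, iou(box, gt)
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(truths) - tp
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall + 1e-9)
    return precision, recall, fbeta

# Toy usage: one true positive, one false positive, one missed object
preds = [(0.9, [0, 0, 10, 10]), (0.7, [50, 50, 60, 60])]
truths = [[1, 1, 11, 11], [80, 80, 90, 90]]
print(detection_metrics(preds, truths))
```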
33 | Matthew Avery, Assistant Director, Operational Evaluation Division, Institute for Defense Analyses | Breakout | Evaluating and Evolving Data Management | Data Management and Reproducible Research | Improving data management within organizations and projects requires first understanding the status quo, developing policies and best practices for staff to follow, and using appropriate tools to measure progress. Over the past three years, IDA has conducted internal assessments, crafted policy, and developed data management assessment tools. IDA’s Data Management Survey and Data Management Maturity Assessment provided a baseline understanding of IDA’s data posture. They identified pockets of excellence within IDA as well as areas where future investments could have the broadest impact. IDA also developed best practice documents and internal policies, driven initially by project teams, Divisions, and cross-divisional working groups. These efforts culminated in centralized policy and guidance outlining data management requirements throughout the data lifecycle. Continuing to evolve and improve data management requires measurement. IDA developed the Data Organization and Lifecycle Planning Rating (DOPLR) tool to assess data management at the program level across the full data lifecycle. This tool will help IDA understand the current state of data management maturity for all of its projects, set goals for improvement, and track progress towards those goals. |
34 | Zachary Szlendak, Research Staff Member, Institute for Defense Analyses | Breakout | The Mental Health Impact of Local COVID-19 Cases Prior to the Mass Availability of Vaccine | Special Topics | During the COVID-19 pandemic, the majority of Americans experienced many new mental health stressors, including isolation, economic instability, fear of exposure to COVID-19, and the effects of themselves or loved ones catching COVID-19. Service members, veterans, and their families experienced these stressors differently from the general public. In this seminar we examine how local COVID-19 case counts affected mental health outcomes prior to the mass availability of vaccines. We show that households we identify as likely military households and TRICARE and Military Health System beneficiaries reported higher mental health quality than their general population peers, but VA beneficiaries did not. We find that local case counts are an important factor in determining which demographic groups reported drops in mental health during the pandemic. |
35 | Andrew C. Flack and Han G. Yi, Research Staff Members, IDA (OED) | Breakout | M&S approach for quantifying readiness impact of sustainment investment scenarios | Special Topics | Sustainment for weapon systems involves multiple components that influence readiness outcomes through a complex array of interactions. While military leadership can use simple analytical approaches to yield insights into current metrics (e.g., a dashboard for top downtime drivers) or historical trends of a given sustainment structure (e.g., correlative studies between stock sizes and backorders), these approaches are inadequate tools for guiding decision-making due to their inability to quantify the impact on readiness. In this talk, we discuss the power of IDA’s end-to-end modeling and simulation (M&S) approach, which estimates time-varying readiness outcomes based on real-world data on operations, supply, and maintenance. These models are designed to faithfully emulate fleet operations at the level of individual components and operational units, as well as to incorporate the multi-echelon inventory system used in military sustainment. We showcase a notional example in which our M&S approach produces a set of recommended component-level investments and divestments in wholesale supply that would improve the readiness of a weapon system. We argue for the urgency of increased end-to-end M&S efforts across the Department of Defense to guide senior leadership in its data-driven decision-making for readiness initiatives. |
36 | Benjamin Ashwell, Research Staff Member, Institute for Defense Analyses | Breakout | From Gripe to Flight: Building an End-to-End Picture of DOD Sustainment | Data Management and Reproducible Research | The DOD has to maintain readiness across a staggeringly diverse array of modern weapon systems, yet no single person or organization in the DOD has an end-to-end picture of the sustainment system that supports them. This shortcoming can lead to bad decisions when it comes to allocating resources in a funding-constrained environment. The underlying problem is driven by stovepiped databases, a reluctance to share data even internally, and a reliance on tribal knowledge of often cryptic data sources. Notwithstanding these difficulties, we need to create a comprehensive picture of the sustainment system to be able to answer pressing questions from DOD leaders. To that end, we have created a documented and reproducible workflow that shepherds raw data from DOD databases through cleaning and curation steps, and then applies logical rules, filters, and assumptions to transform the raw data into concrete values and useful metrics. This process gives us accurate, up-to-date data that we use to support quick-turn studies, and to rapidly build (and efficiently maintain) a suite of readiness models for a wide range of complex weapon systems. |
37 | Mark Herrera Research Staff Member,Institute for Defense Analyses |
Breakout Publish |
Adversaries and Airwaves – Compromising Wireless and Radio Frequency Communications | Special Topics | Wireless and radio frequency (RF) technologies are ubiquitous in our daily lives, appearing in laptops, key fobs, remote sensors, and antennas. These devices, while oftentimes portable and convenient, can potentially be susceptible to adversarial attack over the air. This breakout session will provide a short introduction to wireless hacking concepts such as passive scanning, active injection, and the use of software defined radios to flexibly sample the RF spectrum. We will also ground these concepts in live demonstrations of attacks against both wireless and wired systems. | Add to Speakers |
38 | V. Bram Lillard and Megan L. Gelsinger Deputy Director, OED (VBL) and Research Staff Member (MLG),Institute for Defense Analyses (IDA) |
Breakout Publish |
An Introduction to Sustainment: The Importance and Challenges of Analyzing System Readiness | Special Topics | The Department of Defense (DoD) spends the majority of its annual budget on making sure that systems are ready to perform when called to action. Even with large investments, though, maintaining adequate system readiness poses a major challenge for the DoD. Here, we discuss why readiness is so difficult to maintain and introduce the tools IDA has developed to aid readiness and supply chain analysis and decision-making. Particular emphasis is placed on “honeybee,” the tool developed to clean, assemble, and mine data across a variety of sources in a well-documented and reproducible way. Using a notional example, we demonstrate the utility of this tool and others like it in our suite; these tools lower the barrier to performing meaningful analysis, constructing and estimating input data for readiness models, and aiding the DoD’s ability to tie resources to readiness outcomes. | Add to Speakers |
39 | Curtis Miller Research Staff Member,Institute for Defense Analyses |
Shortcourse Publish |
Introducing git for reproducible research | Data Management and Reproducible Research | Version control software manages different versions of files, providing an archive of files, a means to manage multiple versions of a file, and, in some cases, distribution. Perhaps the most popular version control program in the computer science community is git, which serves as the backbone for websites such as GitHub, Bitbucket, and others. In this mini-tutorial we will introduce the basics of version control in general and git in particular. We explain what role git plays in a reproducible research context. The goal of the course is to get participants started using git. We will create and clone repositories, add and track files in a repository, and manage git branches. We also discuss a few git best practices. | Add to Speakers |
40 | Kelsey Cannon Materials Engineer,Lockheed Martin |
Tutorial No Publish |
STAT and UQ Implementation Lessons Learned | Special Topics | David Harrison and Kelsey Cannon from Lockheed Martin Space will present on STAT and UQ implementation lessons learned within Lockheed Martin. Faced with training 60,000 engineers in statistics, David and Kelsey formed a plan to make STAT and UQ processes the standard at Lockheed Martin. The presentation covers a range of topics, from the initial communications plan, to obtaining leader adoption, to training engineers across the corporation. Not all programs initially accepted this process, but implementation lessons have been learned over time as many compounding successes and savings have been recorded. ©2022 Lockheed Martin, all rights reserved | Add to Speakers |
41 | Dominik Alder Project Management & Planning Operations Rep Senior Staff,Program Management |
Breakout No Publish |
Applications for Monte Carlo Analysis within Job Shop Planning | Analysis Tools and Techniques | Summary overview of Discrete Event Simulation (DES) for optimizing scheduling operations in a high-mix, low-volume job shop environment. The DES model employs Monte Carlo simulation to minimize schedule conflicts and prioritize work, while taking into account competition for limited resources. Iterative simulation balancing to dampen model results and arrive at a globally optimized schedule plan will be contrasted with traditional deterministic scheduling methodologies. | Add to Speakers |
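To make the idea concrete, the sketch below is a minimal Python illustration of Monte Carlo schedule-risk analysis for a job shop; the jobs, triangular duration distributions, due dates, and candidate sequence are invented for illustration and are not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical jobs queued on one shared machine: (optimistic, most likely, pessimistic) hours
jobs = {"J1": (2, 3, 6), "J2": (1, 2, 4), "J3": (4, 5, 9)}
due_dates = {"J1": 8.0, "J2": 5.0, "J3": 16.0}
sequence = ["J2", "J1", "J3"]          # candidate priority order to evaluate

n_sims = 10_000
late = {j: 0 for j in jobs}
for _ in range(n_sims):
    clock = 0.0
    for j in sequence:
        lo, mode, hi = jobs[j]
        clock += rng.triangular(lo, mode, hi)   # sampled duration for this replication
        if clock > due_dates[j]:
            late[j] += 1

# Probability each job misses its due date under this sequence;
# comparing sequences on these probabilities is one way to pick a schedule plan.
print({j: round(c / n_sims, 3) for j, c in late.items()})
```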
42 | John Haman Research Staff Member,Institute for Defense Analyses |
Breakout |
What statisticians should do to improve M&S validation studies | Special Topics | It is often said that many research findings, from the social sciences, medicine, economics, and other disciplines, are false. This claim is trumpeted in the media and by many statisticians. There are several reasons that false research is published, but to what extent should we be worried about these problems in defense testing, and in particular in modeling and simulation validation studies? In this talk I will present several recommendations for actions that statisticians and data scientists can take to improve the quality of our validations and evaluations. | Add to Speakers |
43 | Brian Vickers Research Staff Member,IDA |
Breakout |
Measuring training efficacy: Structural validation of the Operational Assessment of Training Scale | Analysis Tools and Techniques | Effective training of the broad set of users/operators of systems has downstream impacts on usability, workload, and ultimate system performance that are related to mission success. In order to measure training effectiveness, we designed a survey called the Operational Assessment of Training Scale (OATS) in partnership with the Army Test and Evaluation Center (ATEC). Two subscales were designed to assess the degrees to which training covered relevant content for real operations (Relevance subscale) and enabled self-rated ability to interact with systems effectively after training (Efficacy subscale). The full list of 15 items was given to over 700 users/operators across a range of military systems and test events (comprising both developmental and operational testing phases). Systems included vehicles, aircraft, C3 systems, and dismounted squad equipment, among other types. We evaluated the reliability of the factor structure across these military samples using confirmatory factor analysis. We confirmed that OATS exhibited a two-factor structure for training relevance and training efficacy. Additionally, a shortened, six-item measure of the OATS with three items per subscale continues to fit observed data well, allowing for quicker assessments of training. We discuss various ways that the OATS can be applied to one-off, multi-day, multi-event, and other types of training events. Additional OATS details and information about other scales for test and evaluation are available at the Institute for Defense Analyses’ web site, https://testscience.org/validated-scales-repository/. | Add to Speakers |
44 | Roshan Patel Systems Engineer/Data Scientist,US Army |
Shortcourse Publish |
Data Integrity For Deep Learning Models | Data Management and Reproducible Research | Deep learning models are built from algorithmic frameworks that fit parameters over a large set of structured historical examples. Model robustness relies heavily on the accuracy and quality of the input training datasets. This mini-tutorial seeks to explore the practical implications of data quality issues when attempting to build reliable and accurate deep learning models. The tutorial will review the basics of neural networks and model building, and then dive into data quality considerations using practical examples. An understanding of data integrity and data quality is pivotal for verification and validation of deep learning models, and this tutorial will provide students with a foundation in this topic. | Add to Speakers |
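The short Python sketch below illustrates the kind of data-integrity checks the tutorial motivates (duplicates, missing labels, class imbalance, out-of-spec inputs); the toy table and thresholds are hypothetical and not part of the tutorial materials.

```python
import pandas as pd

# Hypothetical training-data table for an image-classification model
df = pd.DataFrame({
    "image_path": ["a.png", "b.png", "b.png", "c.png", "d.png"],
    "label": ["cat", "dog", "dog", None, "dog"],
    "width": [224, 224, 224, 224, 31],
})

report = {
    "rows": len(df),
    "duplicate_rows": int(df.duplicated().sum()),            # exact duplicates inflate apparent accuracy
    "missing_labels": int(df["label"].isna().sum()),          # unlabeled examples cannot be used as-is
    "label_balance": df["label"].value_counts(normalize=True).to_dict(),  # severe imbalance skews training
    "suspect_sizes": int((df["width"] < 32).sum()),           # out-of-spec inputs hint at corrupt files
}
print(report)
```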
45 | Derek Young Associate Professor of Statistics,University of Kentucky |
Breakout Publish |
Computing Statistical Tolerance Regions Using the R Package ‘tolerance’ | Analysis Tools and Techniques | Statistical tolerance intervals of the form (1−α, P) provide bounds to capture at least a specified proportion P of the sampled population with a given confidence level 1−α. The quantity P is called the content of the tolerance interval and the confidence level 1−α reflects the sampling variability. Statistical tolerance intervals are ubiquitous in regulatory documents, especially regarding design verification and process validation. Examples of such regulations are those published by the Food and Drug Administration (FDA), the Environmental Protection Agency (EPA), the International Atomic Energy Agency (IAEA), and the standard 16269-6 of the International Organization for Standardization (ISO). Research and development in the area of statistical tolerance intervals has undoubtedly been guided by the needs and demands of industry experts. Some of the broad applications of tolerance intervals include their use in quality control of drug products, setting process validation acceptance criteria, establishing sample sizes for process validation, assessing biosimilarity, and establishing statistically-based design limits. While tolerance intervals are available for numerous parametric distributions, procedures are also available for regression models, mixed-effects models, and multivariate settings (i.e., tolerance regions). Alternatively, nonparametric procedures can be employed when assumptions of a particular parametric model are not met. Tools for computing such tolerance intervals and regions are a necessity for researchers and practitioners alike. This was the motivation for designing the R package ‘tolerance,’ which not only has the capability of computing a wide range of tolerance intervals and regions for both standard and non-standard settings, but also includes some supplementary visualization tools. This session will provide a high-level introduction to the ‘tolerance’ package and its many features. Relevant data examples will be integrated with the computing demonstration, and specifically designed to engage researchers and practitioners from industry and government. A recently-launched Shiny app corresponding to the package will also be highlighted. | Add to Speakers |
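As a concrete illustration of the (1−α, P) idea, here is a minimal Python sketch that computes a two-sided normal tolerance interval using Howe's k-factor approximation; it is not the R package 'tolerance' featured in the session, which provides exact and approximate factors for many more settings, and the sample data are invented.

```python
import numpy as np
from scipy import stats

def normal_tolerance_interval(x, coverage=0.90, confidence=0.95):
    """Two-sided (1 - alpha, P) normal tolerance interval via Howe's k-factor approximation."""
    n = len(x)
    nu = n - 1
    z = stats.norm.ppf((1 + coverage) / 2)
    chi2 = stats.chi2.ppf(1 - confidence, nu)      # lower chi-square quantile at alpha = 1 - confidence
    k = z * np.sqrt(nu * (1 + 1 / n) / chi2)       # tolerance factor
    m, s = np.mean(x), np.std(x, ddof=1)
    return m - k * s, m + k * s

rng = np.random.default_rng(6)
sample = rng.normal(100.0, 5.0, 40)                # hypothetical measurement data
lo, hi = normal_tolerance_interval(sample)
print(f"(0.95, 0.90) tolerance interval: ({lo:.1f}, {hi:.1f})")
```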
46 | Tyler Cody Research Assistant Professor,Virginia Tech National Security Institute |
Breakout Publish |
A Systems Perspective on Bringing Reliability and Prognostics to Machine Learning | Test and Evaluation Methods for Emerging Technology | Machine learning is being deployed into the real world, yet the body of knowledge on testing, evaluating, and maintaining machine learning models is overwhelmingly centered on component-level analysis. But machine learning and engineered systems are tightly coupled. This is evidenced by the extreme sensitivity of ML to changes in system structure and behavior. Thus, reliability, prognostics, and other efforts related to test and evaluation for ML cannot be divorced from the system. That is, machine learning and its system go hand-in-hand. Any other way makes an unjustified assumption about the existence of an independent variable. This talk explores foundational reasons for this phenomenon, and the foundational challenges it poses to existing practice. Cases in machine health monitoring and in cyber defense are used to motivate the position that machine learning is not independent of physical changes to the system with which it interacts, and ML is not independent of the adversaries it defends against. By acknowledging these couplings, systems and mission engineers can better align test and evaluation practices with the fundamental character of ML. | Add to Speakers |
47 | David Tate Senior Analyst,Institute for Defense Analyses |
Breakout No Publish |
Let’s stop talking about “transparency” with regard to AI | Test and Evaluation Methods for Emerging Technology | For AI-enabled and autonomous systems, issues of safety, security, and mission effectiveness are not separable; the same underlying data and software give rise to interrelated risks in all of these dimensions. If these risks are treated separately, there is considerable unnecessary duplication (and sometimes mutual interference) among efforts needed to satisfy commanders, operators, and certification authorities of the systems’ dependability. Assurance cases, pioneered within the safety and cybersecurity communities, provide a structured approach to simultaneously verifying all dimensions of system dependability with minimal redundancy of effort. In doing so, they also provide a more concrete and useful framework for system development and explanation of behavior than is generally seen in discussions of “transparency” and “trust” in AI and autonomy. Importantly, trust generally cannot be “built in” to systems, because the nature of the assurance arguments needed for various stakeholders requires iterative identification of evidence structures that cannot be anticipated by developers. | Add to Speakers |
48 | Sarah Burke Analyst,The Perduco Group |
Breakout No Publish |
Applications of Equivalence Testing in T&E | Analysis Tools and Techniques | Traditional hypothesis testing is used extensively in test and evaluation (T&E) to determine if there is a difference between two or more populations. For example, we can analyze a designed experiment using t-tests to determine if a factor affects the response or not. Rejecting the null hypothesis would provide evidence that the factor changes the response value. However, there are many situations in T&E where the goal is to actually show that things didn’t change; the response is actually the same (or nearly the same) after some change in the process or system. If we use traditional hypothesis testing to assess this scenario, we would want to “fail to reject” the null hypothesis; however, this doesn’t actually provide evidence that the null hypothesis is true. Instead, we can orient the analysis to the decision that will be made and use equivalence testing. Equivalence testing initially assumes the populations are different; the alternative hypothesis is that they are the same. Rejecting the null hypothesis provides evidence that the populations are the same, matching the objective of the test. This talk provides an overview of equivalence testing with examples demonstrating its applicability in T&E. We also discuss additional considerations for planning a test where equivalence testing will be used, including sample size and what “equivalent” really means. | Add to Speakers |
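For readers unfamiliar with the mechanics, the sketch below shows one common equivalence procedure, the two one-sided tests (TOST) for the difference of two means; the samples, equivalence margin, and pooled-variance assumption are illustrative only and are not drawn from the talk.

```python
import numpy as np
from scipy import stats

def tost_two_sample(x, y, delta):
    """Two one-sided tests (TOST) for equivalence of two means within +/- delta.

    Returns the TOST p-value; a small value is evidence the means are equivalent.
    Assumes independent samples and equal variances (pooled standard error).
    """
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    t_lower = (diff + delta) / se        # H0: diff <= -delta  vs  H1: diff > -delta
    t_upper = (diff - delta) / se        # H0: diff >= +delta  vs  H1: diff < +delta
    p_lower = 1 - stats.t.cdf(t_lower, df)
    p_upper = stats.t.cdf(t_upper, df)
    return max(p_lower, p_upper)         # reject both one-sided nulls to conclude equivalence

rng = np.random.default_rng(5)
before = rng.normal(10.0, 1.0, 30)       # hypothetical response before a process change
after = rng.normal(10.1, 1.0, 30)        # hypothetical response after the change
print(f"TOST p-value: {tost_two_sample(before, after, delta=0.5):.3f}")
```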
49 | Jane E. Valentine Senior Biomedical Engineer,Johns Hopkins University Applied Physics Laboratory |
Breakout No Publish |
Stochastic Modeling and Characterization of a Wearable-Sensor-Based Surveillance Network f | Special Topics | Current disease outbreak surveillance practices reflect underlying delays in the detection and reporting of disease cases, relying on individuals who present symptoms to seek medical care and enter the health care system. To accelerate the detection of outbreaks resulting from possible bioterror attacks, we introduce a novel two-tier, human sentinel network (HSN) concept composed of wearable physiological sensors capable of pre-symptomatic illness detection, which prompt individuals to enter a confirmatory stage where diagnostic testing occurs at a certified laboratory. Both the wearable alerts and test results are reported automatically and immediately to a secure online platform via a dedicated application. The platform aggregates the information and makes it accessible to public health authorities. We evaluated the HSN against traditional public health surveillance practices for outbreak detection of 80 Bacillus anthracis (Ba) release scenarios in mid-town Manhattan, NYC. We completed an end-to-end modeling and analysis effort, including the calculation of anthrax exposures and doses based on computational atmospheric modeling of release dynamics, and development of a custom-built probabilistic model to simulate resulting wearable alerts, diagnostic test results, symptom onsets, and medical diagnoses for each exposed individual in the population. We developed a novel measure of network coverage, formulated new metrics to compare the performance of the HSN to public health surveillance practices, completed a Design of Experiments to optimize the test matrix, characterized the performance trade-space, and performed sensitivity analyses to identify the most important engineering parameters. Our results indicate that a network covering greater than ~10% of the population would yield approximately a 24-hour time advantage over public health surveillance practices in identifying outbreak onset, and provide a non-target-specific indication (in the form of a statistically aberrant number of wearable alerts) of approximately 36 hours; these earlier detections would enable faster and more effective public health and law enforcement responses to support incident characterization and decrease morbidity and mortality via post-exposure prophylaxis. | Add to Speakers |
50 | John Lipp LM Fellow,Systems Engineering |
Breakout Publish |
Kernel Regression, Bernoulli Trial Responses, and Designed Experiments | Analysis Tools and Techniques | Boolean responses are common for both tangible and simulation experiments. Well-known approaches to fitting models to Boolean responses include ordinary regression with normal approximations or variance-stabilizing transforms, and logistic regression. Less well known is kernel regression. This session will present properties of kernel regression, its application to Bernoulli trial experiments, and other lessons learned from using kernel regression in the wild. Kernel regression is a non-parametric method. This requires modifications to many analyses, such as the required sample size. Unlike ordinary regression, the experiment design and model solution interact with each other. Consequently, the number of experiment samples for a desired modeling accuracy depends on the true state of nature. There has been a trend toward increasingly large simulation sample sizes as computing horsepower has grown. With kernel regression there is a point of diminishing return on sample sizes. That is, an experiment is better off with more data sites once a sufficient sample size is reached. Confidence interval accuracy is also dependent on the true state of nature. Parsimonious model tuning is required for accurate confidence intervals. Kernel tuning to build a parsimonious model using cross validation methods will be illustrated. | Add to Speakers |
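A minimal Python sketch of kernel regression on Bernoulli trial data (a Nadaraya-Watson estimator with a Gaussian kernel) follows; the design sites, true response curve, and fixed bandwidth are invented for illustration, and in practice the bandwidth would be tuned, for example by the cross-validation methods the session describes.

```python
import numpy as np

def nw_probability(x_grid, x_data, y_data, bandwidth):
    """Nadaraya-Watson kernel regression estimate of P(success | x) from Bernoulli trials."""
    # Gaussian kernel weights between every grid point and every data site
    w = np.exp(-0.5 * ((x_grid[:, None] - x_data[None, :]) / bandwidth) ** 2)
    return (w * y_data).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(4)
x_data = rng.uniform(0, 1, 300)                      # hypothetical design sites
p_true = 1 / (1 + np.exp(-10 * (x_data - 0.5)))      # unknown true success probability
y_data = rng.binomial(1, p_true)                     # Bernoulli trial outcomes

x_grid = np.linspace(0, 1, 50)
p_hat = nw_probability(x_grid, x_data, y_data, bandwidth=0.05)
print(np.round(p_hat[::10], 2))                      # estimated success probability along the grid
```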
51 | Christopher Gotwalt Chief Data Scientist,JMP Statistical Discovery LLC |
Breakout No Publish |
Safe Machine Learning Prediction and Optimization via Extrapolation Control | Analysis Tools and Techniques | Uncontrolled model extrapolation leads to two serious kinds of errors: (1) the model may be completely invalid far from the data, and (2) the combinations of variable values may not be physically realizable. Optimizing models that are fit to observational data can lead, without any warning, to extrapolated solutions that are of no practical use. In this presentation we introduce a general approach to identifying extrapolation based on a regularized Hotelling T-squared metric. The metric is robust to certain kinds of messy data and can handle models with both continuous and categorical inputs. The extrapolation model is intended to be used in parallel with a machine learning model to identify when the machine learning model is being applied to data that are not close to that model’s training set, or as a non-extrapolation constraint when optimizing the model. The methodology described was introduced into the JMP Pro 16 Profiler. | Add to Speakers |
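The sketch below illustrates the general idea of flagging extrapolation with a regularized Hotelling T-squared distance; it is an assumption-laden stand-in rather than the JMP Pro 16 implementation, and the regularization constant, data, and control limit are arbitrary choices made for the example.

```python
import numpy as np

def regularized_t2(X_train, X_query, lam=1e-3):
    """Regularized Hotelling T-squared distance of query points from the training data cloud.

    A large value suggests a query point extrapolates beyond the training data.
    """
    mu = X_train.mean(axis=0)
    S = np.cov(X_train, rowvar=False)
    S_reg = S + lam * np.eye(S.shape[0])        # regularization keeps the inverse stable
    S_inv = np.linalg.inv(S_reg)
    diff = X_query - mu
    # per-row quadratic form: diff_i^T S_inv diff_i
    return np.einsum("ij,jk,ik->i", diff, S_inv, diff)

rng = np.random.default_rng(3)
X_train = rng.normal(size=(500, 4))                      # hypothetical model training inputs
X_query = np.vstack([np.zeros(4), np.full(4, 6.0)])      # one interior point, one far outside

t2 = regularized_t2(X_train, X_query)
threshold = np.quantile(regularized_t2(X_train, X_train), 0.99)   # empirical control limit
print(t2, "flagged as extrapolation:", t2 > threshold)
```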
55 | Laura Freeman Director, Intelligent Systems Division,Virginia Tech |
Breakout Publish |
Digital Transformation & Data – Keys to Evolving DOD Acquisition Processes | Data Management and Reproducible Research | Recent advances in data collection, curation, and sharing, coupled with increased storage and processing power, have fueled organizational digital transformations. They have enabled the application of technology advances such as data science, machine learning, and artificial intelligence. Also, the digitization of engineering practices, coupled with advances in data science, has the potential to revolutionize DoD acquisition, test and evaluation, and sustainment. In this talk I will share a framework for digital transformation synthesized from industry leaders in a digital transformation forum. I will address how best practices in data management and reproducible research translate to improvements in our DOD processes. I will also capture the new challenges presented by digital transformation. | Add to Speakers |
56 | Rachel Haga Research Associate,Institute for Defense Analyses |
Breakout No Publish |
T&E of Responsible AI | Test and Evaluation Methods for Emerging Technology | Getting Responsible AI (RAI) right is difficult and demands expertise. All AI-relevant skill sets, including ethics, are in high demand and short supply, especially regarding AI’s intersection with test and evaluation (T&E). Frameworks, guidance, and tools are needed to empower working-level personnel across DOD to generate RAI assurance cases with support from RAI SMEs. At a high level, the framework should address the following points: 1. T&E is a necessary piece of the RAI puzzle: testing provides a feedback mechanism for system improvement and builds public and warfighter confidence in our systems, and RAI should be treated just like performance, reliability, and safety requirements. 2. We must intertwine T&E and RAI across the cradle-to-grave product life cycle. Programs must embrace T&E and RAI from inception; as development proceeds, these two streams must be integrated in tight feedback loops to ensure effective RAI implementation. Furthermore, many AI systems, along with their operating environments and use cases, will continue to update and evolve and thus will require continued evaluation after fielding. 3. The five DOD RAI principles are a necessary north star, but alone they are not enough to implement or ensure RAI. Programs will have to integrate multiple methodologies and sources of evidence to construct holistic arguments for how much the programs have reduced RAI risks. 4. RAI must be developed, tested, and evaluated in context: T&E without operationally relevant context will fail to ensure that fielded tools achieve RAI. Mission success depends on technology that must interact with warfighters and other systems in complex environments, while constrained by processes and regulation. AI systems will be especially sensitive to operational context and will force T&E to expand what it considers. | Add to Speakers |
57 | Paul Fanto Research Staff Member, System Evaluation Division,Institute for Defense Analyses |
Breakout Publish |
Method for Evaluating Bayesian Reliability Models for Developmental Testing | Analysis Tools and Techniques | For analysis of military Developmental Test (DT) data, frequentist statistical models are increasingly challenged to meet the needs of analysts and decision-makers. Bayesian models have the potential to address this challenge. Although there is a substantial body of research on Bayesian reliability estimation, there appears to be a paucity of Bayesian applications to issues of direct interest to DT decision makers. To address this deficiency, this research accomplishes two tasks. First, this work provides a motivating example that analyzes reliability for a notional but representative system. Second, to enable the motivated analyst to apply Bayesian methods, it provides a foundation and best practices for Bayesian reliability analysis in DT. The first task is accomplished by applying Bayesian reliability assessment methods to notional DT lifetime data generated using a Bayesian reliability growth planning methodology (Wayne 2018). The tested system is assumed to be a generic complex system with a large number of failure modes. Starting from the Bayesian assessment methodology of (Wayne and Modarres, A Bayesian Model for Complex System Reliability 2015), this work explores the sensitivity of the Bayesian results to the choice of the prior distribution and compares the Bayesian results for the reliability point estimate and uncertainty interval with analogous results from traditional reliability assessment methods. The second task is accomplished by establishing a generic structure for systematically evaluating relevant statistical Bayesian models. First, it identifies what have been implicit reliability issues for DT programs using a structured poll of stakeholders combined with interviews of a selected set of Subject Matter Experts. Second, candidate solutions are identified in the literature. Third, solutions are matched to issues using criteria designed to evaluate the capability of a solution to improve support for decision-makers at critical points in DT programs. The matching process uses a model taxonomy structured according to decisions at each DT phase, plus criteria for model applicability and data availability. The end result is a generic structure that allows an analyst to identify and evaluate a specific model for use with a program and issue of interest. Wayne, Martin. 2018. “Modeling Uncertainty in Reliability Growth Plans.” 2018 Annual Reliability and Maintainability Symposium (RAMS). 1-6. Wayne, Martin, and Mohammad Modarres. 2015. “A Bayesian Model for Complex System Reliability.” IEEE Transactions on Reliability 64: 206-220. | Add to Speakers |
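As a minimal illustration of the Bayesian reliability point estimates and uncertainty intervals discussed above, the sketch below performs a conjugate gamma update of a constant failure rate from notional test hours and failure counts; it is not the Wayne and Modarres complex-system model, and the prior parameters and data are invented.

```python
import numpy as np
from scipy.stats import gamma

# Hypothetical DT data: total test time and observed failure count for a complex system
total_hours = 1200.0
failures = 4

# Gamma prior on the failure rate lambda (shape, rate), an assumed stand-in for prior knowledge
a0, b0 = 2.0, 500.0

# Conjugate update for exponential lifetimes: posterior is Gamma(a0 + failures, b0 + total_hours)
post = gamma(a=a0 + failures, scale=1.0 / (b0 + total_hours))

lam_hat = post.mean()                          # posterior mean failure rate
mtbf_hat = 1.0 / lam_hat                       # point estimate of MTBF
lam_lo, lam_hi = post.ppf([0.025, 0.975])      # 95% credible interval on the rate
print(f"Posterior mean MTBF: {mtbf_hat:.0f} hours")
print(f"95% credible interval on MTBF: ({1/lam_hi:.0f}, {1/lam_lo:.0f}) hours")
```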
58 | Timothy Dawson Lead Mobility Test Operations Analyst,AFOTEC Detachment 5 |
Keynote Publish |
Leveraging Data Science and Cloud Tools to Enable Continuous Reporting | Analysis Tools and Techniques | The DoD’s challenge to provide test results at the “Speed of Relevance” has generated many new strategies to accelerate data collection, adjudication, and analysis. As a result, the Air Force Operational Test and Evaluation Center (AFOTEC), in conjunction with the Air Force Chief Data Office’s Visible, Accessible, Understandable, Linked and Trusted Data Platform (VAULT), is developing a Survey Application. This new cloud-based application will be deployable on any AFNET-connected computer or tablet and merges a variety of tools for collection, storage, analytics, and decision-making into one easy-to-use platform. By placing cloud-computing power in the hands of operators and testers, authorized users can view report-quality visuals and statistical analyses the moment a survey is submitted. Because the data is stored in the cloud, demanding computations such as machine learning are run at the data source to provide even more insight into both quantitative and qualitative metrics. The T-7A Red Hawk will be the first operational test (OT) program to utilize the Survey Application. Over 1000 flying and simulator test points have been loaded into the application, with many more coming from developmental test partners. The Survey app development will continue as USAF testing commences. Future efforts will focus on making the Survey Application configurable to other research and test programs to enhance their analytic and reporting capabilities. | Add to Speakers |
59 | Nicholas Ashby Student,United States Military Academy |
Breakout No Publish |
Combining data from scanners to inform cadet physical performance | Test and Evaluation Methods for Emerging Technology | Digital anthropometry obtained from 3D body scanners has already revolutionized the clothing and fitness industries. Within seconds, these scanners collect hundreds of anthropometric measurements which are used by tailors to customize an article of clothing or by fitness trainers to track their clients’ progress towards a goal. Three-dimensional body scanners have also been used in military applications, such as predicting injuries at Army basic training and checking a soldier’s compliance with body composition standards. In response to this increased demand, several 3D body scanners have become commercially available, each with a proprietary algorithm for measuring specific body parts. Individual scanners may suffice to collect measurements from a small population; however, they are not practical for use in creating the large data sets necessary to train artificial intelligence (AI) or machine learning algorithms. This study fills the gap between these two applications by correlating body circumferences taken from a small population (n = 109) on three different body scanners and creating a standard scale for pooling data from the different scanners into one large AI-ready data set. This data set is then leveraged in a separate application to understand the relationship between body shape and performance on the Army Combat Fitness Test (ACFT). | Add to Speakers |
60 | Eli Golden Statistician,US Army DEVCOM Armaments Center |
Breakout Publish |
Next Gen Breaching Technology: A Case Study in Deterministic Binary Response Emulation | Analysis Tools and Techniques | Combat Capabilities Development Command Armaments Center (DEVCOM AC) is developing the next generation breaching munition, a replacement for the M58 Mine Clearing Line Charge. A series of M&S experiments was conducted to aid with the design of mine-neutralizing submunitions, utilizing space-filling designs, support vector machines, and hyper-parameter optimization. A probabilistic meta-model of the FEA-simulated performance data was generated with Platt scaling in order to facilitate optimization, which in turn was used to generate several candidate designs for follow-up live testing. This paper will detail the procedure used to iteratively explore and extract information from a deterministic process with a binary response. | Add to Speakers |
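A minimal Python sketch of the Platt-scaling step, building a probabilistic meta-model from a support vector machine fit to a deterministic binary response, is shown below; the design space, response rule, and scikit-learn settings are illustrative assumptions, not the DEVCOM AC workflow.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical deterministic binary response over a 2-D design space
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] ** 2 + 0.5 * X[:, 1] > 0.6).astype(int)   # stand-in for simulated pass/fail output

# Step 1: fit a support vector machine to the binary outcomes
svm = SVC(kernel="rbf", C=10.0).fit(X, y)

# Step 2: Platt scaling -- fit a logistic model mapping SVM decision values to probabilities
# (in practice the sigmoid is usually fit on held-out or cross-validated decision values)
d = svm.decision_function(X).reshape(-1, 1)
platt = LogisticRegression().fit(d, y)

# Probabilistic meta-model: P(success) at new candidate design points
X_new = rng.uniform(0, 1, size=(5, 2))
p_new = platt.predict_proba(svm.decision_function(X_new).reshape(-1, 1))[:, 1]
print(np.round(p_new, 3))
```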
62 | Brendan Croom Postdoctoral Fellow,JHU Applied Physics Laboratory |
Breakout No Publish |
Deep learning aided inspection of additively manufactured metals | Test and Evaluation Methods for Emerging Technology | The performance and reliability of additively manufactured (AM) metals are limited by the ubiquitous presence of void- and crack-like defects that form during processing. Many applications require non-destructive evaluation of AM metals to detect potentially critical flaws. To this end, we propose a deep learning approach that can help with the interpretation of inspection reports. Convolutional neural networks (CNN) are developed to predict the elastic stress fields in images of defect-containing metal microstructures, and therefore directly identify critical defects. A large dataset consisting of the stress response of 100,000 random microstructure images is generated using high-resolution Fast Fourier Transform-based finite element (FFT-FE) calculations, which is then used to train a modified U-Net style CNN model. The trained U-Net model more accurately predicted the stress response compared to previous CNN architectures, exceeded the accuracy of low-resolution FFT-FE calculations, and was evaluated more than 100 times faster than conventional FE techniques. The model was applied to images of real AM microstructures with severe lack-of-fusion defects, and predicted a strong linear increase of maximum stress as a function of pore fraction. This work shows that CNNs can aid the rapid and accurate inspection of defect-containing AM material. | Add to Speakers |
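The sketch below shows, in miniature, the kind of image-to-image convolutional model the abstract describes: a toy encoder-decoder trained to map a microstructure image to a per-pixel stress field. It is far smaller than the paper's U-Net, and the random tensors stand in for the FFT-FE training data.

```python
import torch
import torch.nn as nn

class TinyStressNet(nn.Module):
    """Minimal encoder-decoder that predicts a per-pixel field from a one-channel input image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),  # 32 -> 64
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),                    # per-pixel stress prediction
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Toy training loop on random data standing in for microstructure images / simulated stress fields
model = TinyStressNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
images = torch.rand(8, 1, 64, 64)    # hypothetical microstructure images
stress = torch.rand(8, 1, 64, 64)    # hypothetical normalized stress fields
for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(images), stress)
    loss.backward()
    opt.step()
print(f"final toy training loss: {loss.item():.4f}")
```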
Contributed Abstracts
Total Contributed Abstracts: 28
Show Contributed Abstracts
# | Name / Org | Type | Abstract Title | Theme | Abstract | Add to Speakers |
---|---|---|---|---|---|---|
1 | Lauren H. Perry Senior Project Engineer,The Aerospace Corporation |
Presentation | Trust Throughout the Artificial Intelligence Lifecycle | Test & Evaluation Methods for Emerging Technology | AI and machine learning have become widespread throughout the defense, government, and commercial sectors. This has led to increased attention on the topic of trust and the role it plays in successfully integrating AI into high-consequence environments where tolerance for risk is low. Driven by recent successes of AI algorithms in a range of applications, users and organizations rely on AI to provide new, faster, and more adaptive capabilities. However, along with those successes have come notable pitfalls, such as bias, vulnerability to adversarial attack, and inability to perform as expected in novel environments. Many types of AI are data-driven, meaning they operate on and learn their internal models directly from data. Therefore, tracking how data were used at each stage (e.g., training, validation, and testing) is crucial not only to ensure a high-performing model, but also to understand if the AI should be trusted. MLOps, an offshoot of DevSecOps, is a set of best practices meant to standardize and streamline the end-to-end lifecycle of machine learning. In addition to supporting the software development and hardware requirements of AI-based systems, MLOps provides a scaffold by which the attributes of trust can be formally and methodically evaluated. Additionally, MLOps encourages reasoning about trust early and often in the development cycle. To this end, we present a framework that encourages the development of AI-based applications that can be trusted to operate as intended and function safely both with and without human interaction. This framework offers guidance for each phase of the AI lifecycle, utilizing MLOps, through a detailed discussion of pitfalls resulting from not considering trust, metrics for measuring attributes of trust, and mitigation strategies for when risk tolerance is low. | |
2 | Gregory J. Hunt Assistant Professor,William & Mary |
Presentation | Everyday Reproducibility | Data Management / Reproducible Research | Modern data analysis is typically quite computational. Correspondingly, sharing scientific and statistical work now often means sharing code and data in addition to writing papers and giving talks. This type of code sharing faces several challenges. For example, it is often difficult to take code from one computer and run it on another due to software configuration, version, and dependency issues. Even if the code runs, writing code that is easy to understand or interact with can be difficult. This makes it difficult to assess third-party code and its findings, for example, in a review process. In this talk we describe a combination of two computing technologies that help make analyses shareable, interactive, and completely reproducible. These technologies are (1) analysis containerization, which leverages virtualization to fully encapsulate the analysis, data, code, and dependencies into an interactive and shareable format, and (2) code notebooks, a literate programming format for interacting with analyses. This talk reviews the problems at a high level and also provides concrete solutions to the challenges faced. In addition to discussing reproducibility and data/code sharing generally, we will touch upon several such issues that arise specifically in the defense and aerospace communities. | |
3 | William Raymond Whitledge Research Staff Member,IDA |
Poster Session | Analysis Apps for the Operational Tester | Analysis Tools and Techniques | In the acquisition and testing world, data analysts repeatedly encounter certain categories of data, such as time or distance until an event (e.g., failure, alert, detection), binary outcomes (e.g., success/failure, hit/miss), and survey responses. Analysts need tools that enable them to produce quality and timely analyses of the data they acquire during testing. This poster presents four web-based apps that can analyze these types of data. The apps are designed to assist analysts and researchers with simple repeatable analysis tasks, such as building summary tables and plots for reports or briefings. Using software tools like these apps can increase reproducibility of results, timeliness of analysis and reporting, attractiveness and standardization of aesthetics in figures, and accuracy of results. The first app models reliability of a system or component by fitting parametric statistical distributions to time-to-failure data. The second app fits a logistic regression model to binary data with one or two independent continuous variables as predictors. The third calculates summary statistics and produces plots of groups of Likert-scale survey question responses. The fourth calculates the system usability scale (SUS) scores for SUS survey responses and enables the app user to plot scores versus an independent variable. These apps are available for public use on the Test Science Interactive Tools webpage https://new.testscience.org/interactive-tools/. | |
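As an example of the kind of calculation the fourth app automates, here is a minimal Python sketch of standard SUS scoring; the survey responses are hypothetical and not taken from the poster.

```python
import numpy as np

def sus_score(responses):
    """Compute the System Usability Scale score for one respondent.

    `responses` is a length-10 sequence of 1-5 Likert answers in the standard SUS item
    order (odd items positively worded, even items negatively worded).
    """
    r = np.asarray(responses, dtype=float)
    odd = r[0::2] - 1          # items 1,3,5,7,9 contribute (response - 1)
    even = 5 - r[1::2]         # items 2,4,6,8,10 contribute (5 - response)
    return 2.5 * (odd.sum() + even.sum())   # scales the 0-40 total to 0-100

# hypothetical survey responses from three operators
surveys = [[4, 2, 5, 1, 4, 2, 5, 2, 4, 1],
           [3, 3, 4, 2, 3, 3, 4, 3, 3, 2],
           [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]]
scores = [sus_score(s) for s in surveys]
print(scores, "mean:", np.mean(scores))
```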
4 | Terril Hurst Senior Engineering Fellow,Raytheon Technologies |
Presentation | A Decision-Theoretic Framework for Adaptive Simulation Experiments | Design of Experiments | This paper describes a framework for increasing effectiveness of high-performance computing (HPC) to support decision-making in the presence of uncertainty intrinsic to queries, models, and simulation results. Given a mathematically precise query, the framework adaptively chooses where to sample. Unlike conventionally designed simulation experiments, which specify beforehand where to sample, the framework optimally schedules sampling predicated upon four interconnected models: (a) the surrogate model, e.g., a continuous correlated beta process, which globally estimates the response using beta distributions; (b) a value model, e.g., mutual information, for estimating the benefit of candidate runs to answering the query; (c) a cost model for predicting time to execute candidate runs, possibly from multi-fidelity simulation options; and (d) a grid state model. Runs are chosen by maximizing information per cost. A Bayesian perspective is taken to formulate and update each of these models as simulation results arrive for use in the iterative run-selection and scheduling phase. For a precisely stated query, up to an 80 percent reduction in total runs has been observed. The paper illustrates use of the framework with simple examples. A simulation experiment is conducted to answer some question. In order to define precisely how informative a run is for answering the question, the answer must be defined as a random variable. This random variable is called a query and has the general form of p(theta | y), where theta is the query parameter and y is the available data. Example models employed in the framework are briefly described below: 1. The continuous correlated beta process model (CCBP) estimates the proportions of successes and failures using beta-distributed uncertainty at every point in the input space. It combines results using an exponentially decaying correlation function. The output of the CCBP is used to estimate the value of a candidate run. 2. The mutual information model quantifies uncertainty in one random variable that is reduced by observing the other one. The model quantifies the mutual information between any candidate runs and the query, thereby scoring the value of running each candidate. 3. The cost model estimates how long future runs will take, based upon past runs using, e.g., a generalized linear model. A given simulation might have multiple fidelity options that require different run times. It may be desirable to balance information with the cost of a mixture of runs using these multi-fidelity options. 4. The grid state model, together with the mutual information model, is used to select the next collection of runs for optimal information per cost, accounting for current grid load. The framework has been applied to multiple use cases involving a variety of queries: (a) assessing compliance with a performance requirement, (b) sensitivity analysis to system input factors, (c) design optimization in the presence of uncertainty, and (d) calibration of simulations using field data, i.e., model verification and validation that includes quantifying uncertainty (VVUQ). The paper describes several aspects that emerge when applying the framework to each of these use cases. | |
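A heavily simplified Python sketch of the information-per-cost selection step follows: each candidate point carries an independent Beta posterior, its value is the mutual information between one new Bernoulli observation and the local success probability, and the candidate with the highest value-to-cost ratio is run next. This ignores the CCBP's spatial correlation, the grid state model, and the query structure described in the paper; all numbers are invented.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# Hypothetical grid of 20 candidate input settings, each with a Beta posterior on success probability
alpha = np.ones(20)                      # posterior successes + 1 at each candidate point
beta_p = np.ones(20)                     # posterior failures + 1 at each candidate point
cost = rng.uniform(1.0, 5.0, size=20)    # predicted run time per candidate (cost model stand-in)

def expected_information(a, b):
    """Mutual information between one new Bernoulli observation at this point and the
    local success probability: prior entropy minus expected posterior entropy."""
    p = a / (a + b)                                    # predictive probability of success
    h_now = beta(a, b).entropy()
    h_after = p * beta(a + 1, b).entropy() + (1 - p) * beta(a, b + 1).entropy()
    return h_now - h_after

scores = np.array([expected_information(a, b) for a, b in zip(alpha, beta_p)]) / cost
next_run = int(np.argmax(scores))        # schedule the candidate with the best information per cost
print(f"Next candidate to run: {next_run}, score {scores[next_run]:.4f}")
```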
5 | Neil Ashton WW Principal CFD Specialist Solution Architect, HPC,Amazon Web Services |
Presentation | Cloud Computing for Computational Fluid Dynamics (CFD) in T&E | Test & Evaluation Methods for Emerging Technology | In this talk we’ll focus on exploring the motivation for using cloud computing for Computational Fluid Dynamics (CFD) in Federal Government Test & Evaluation. Using examples from automotive, aerospace, and manufacturing, we’ll look at benchmarks for a number of CFD codes using CPUs (x86 & Arm) and GPUs, and we’ll look at how the development of high-fidelity CFD (e.g., WMLES, HRLES) is accelerating the need for access to large-scale HPC. The onset of COVID-19 has also meant a large increase in the need for remote visualization, with greater numbers of researchers and engineers needing to work from home. This has also accelerated the adoption of the approaches needed for pre- and post-processing of peta/exa-scale CFD simulations, and we’ll look at how these are more easily accessed via cloud infrastructure. Finally, we’ll explore perspectives on integrating ML/AI into CFD workflows using data lakes from a range of sources and where the next decade may take us. | |
6 | Kazu Okumoto CEO,Sakura Software Solutions (3S) LLC |
Presentation | A revolutionary approach to software reliability modeling for software quality assurance | Analysis Tools and Techniques | This tutorial introduces a revolutionary approach to software reliability modeling for software quality assurance. It is an integration of traditional software reliability modeling with recent advances in computing science. Software reliability models have been based on a single mathematical curve, i.e., either an S-shaped or an exponential curve, to represent a defect detection process. We have developed an innovative method for automatically generating multiple curves to accurately represent a defect detection process by identifying inflection points. It’s a piece-wise application of well-known exponential statistical models based on a non-homogeneous Poisson process. It is a good representation of the complex nature of current software development and test processes. With the advancement in computing science, the innovative algorithm has been implemented in Python to run in a cloud environment. This approach enables the analytics to run in a real-time environment and to share results with other project team members, making it more practical to use. In addition to the defect detection process, we also address the importance of the defect closure process. In practice, we need to allocate development resources to fix software defects or bugs. We have developed a method for predicting the software defect closure curve based on the defect detection curve. The difference between the detection curve and the closure curve is called the defect open curve, which represents the number of defects to be fixed. It’s an important measure of software quality at a delivery date. By combining the detection and open curves, project management will be able to balance the development and test resource allocation for a given delivery date. Next, we address early defect prediction without actual defect data during a planning phase. For this purpose, we use development and test effort data, which are a good representation of software complexity. We then analyze the relationship between defects detected and effort spent from previous releases. Our approach is not only technically sound but also useful for real software development organizations seeking to deliver a high-quality software product. Key steps, from concept and proof of concept (prototype and trials) to productization of the innovative algorithm, have been demonstrated. We have developed an online tool, called STAR. Its presentation of the output is focused on visualization using charts and tables. It will help project managers to quantitatively strike a balance between software delivery deadlines and quality. STAR can be used for internal test defects or acceptance test defects. It also includes early defect prediction without actual data using planning data such as development and test effort data during a planning period. It’s applicable to various software projects ranging from small-scale to large-scale development. In addition, STAR interactively provides the quality impact of corrective actions such as a delay of the delivery date or additional development resources. Analytics are all based on a zero-touch automation algorithm. User input is basically delivery dates and defect data, plus planning data as an option. The tutorial will cover an online demonstration of STAR as time permits. | |
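For context, the sketch below fits a single exponential NHPP segment (the Goel-Okumoto mean value function) to notional cumulative defect counts with least squares; the tutorial's approach applies such curves piecewise between inflection points and with different estimation machinery, so this is only a starting-point illustration with invented data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical weekly cumulative defect counts from a test phase
weeks = np.arange(1, 13, dtype=float)
cum_defects = np.array([5, 14, 26, 41, 52, 60, 67, 72, 75, 78, 80, 81], dtype=float)

def goel_okumoto(t, a, b):
    """Mean value function of the exponential NHPP: expected cumulative defects by time t."""
    return a * (1.0 - np.exp(-b * t))

(a_hat, b_hat), _ = curve_fit(goel_okumoto, weeks, cum_defects, p0=[100.0, 0.1])
remaining_at_delivery = a_hat - goel_okumoto(weeks[-1], a_hat, b_hat)
print(f"Estimated total defects a = {a_hat:.1f}, detection rate b = {b_hat:.3f}")
print(f"Expected residual defects at week {weeks[-1]:.0f}: {remaining_at_delivery:.1f}")
```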
7 | Hyoshin Kim ,North Carolina State University |
Poster Session | Risk Comparison and Planning for Bayesian Assurance Tests | Design of Experiments | Designing a Bayesian assurance test plan requires choosing a test plan that guarantees a product of interest is good enough to satisfy the consumer’s criteria, but is not so demanding that the producer is concerned about failing the test. Bayesian assurance tests are especially useful because they can incorporate previous product information into the test planning and explicitly control levels of risk for the consumer and producer. We demonstrate an algorithm for efficiently computing a test plan given desired levels of risk in binomial and exponential testing. Numerical comparisons with the Operating Characteristic (OC) curve, Probability Ratio Sequential Test (PRST), and a simulation-based Bayesian sample size determination approach are also considered. | |
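A minimal sketch of how consumer and producer risks might be averaged over a prior for a binomial test plan; the Beta prior, the good/bad thresholds, and the accept-if-at-most-c-failures plan form are assumptions for illustration, not the authors' algorithm.

```python
# Hedged illustration: average consumer and producer risks for a binomial test plan
# (n trials, accept if <= c failures) under an assumed Beta prior on failure probability.
import numpy as np
from scipy.stats import binom

def risks(n, c, prior_a, prior_b, p_good=0.05, p_bad=0.15, n_draws=100_000, seed=1):
    """Return (consumer_risk, producer_risk), averaged over the prior."""
    rng = np.random.default_rng(seed)
    p = rng.beta(prior_a, prior_b, n_draws)       # prior draws of per-trial failure probability
    accept = binom.cdf(c, n, p)                   # P(pass the test | p)
    bad, good = p >= p_bad, p <= p_good
    consumer_risk = accept[bad].mean() if bad.any() else 0.0          # pass although the product is bad
    producer_risk = (1 - accept[good]).mean() if good.any() else 0.0  # fail although the product is good
    return consumer_risk, producer_risk

# Scan candidate plans against hypothetical risk targets.
for n in range(10, 101, 10):
    cr, pr = risks(n, c=1, prior_a=1, prior_b=9)
    print(f"n={n:3d}, c=1: consumer risk={cr:.3f}, producer risk={pr:.3f}")
```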
8 | Kazuhira Okumoto CEO,Sakura Software Solutions (3S) LLC |
Poster Session | Live demo of a revolutionary online tool for software quality assurance | Analysis Tools and Techniques | This poster session presents a revolutionary online tool, STAR, for software quality assurance. STAR implements a world-leading software defect prediction method with a user-friendly interface and visualization. Key measurements used as input are the detected date and closed date, along with severity and impacted component, for each defect during the internal test and customer deployment periods. STAR automatically generates output such as how many more defects are expected by delivery, how many more defects are expected after deployment, and how many defects are still open at delivery, enabling software development companies to quantitatively strike a balance between delivery deadlines and quality. With a direct connection to a customer’s fault database, it automatically extracts the data and updates predictions in real time; the results can be shared with other project members via online access. A live demonstration will show the user input upload process and the interpretation of STAR output using demo data sets. The defect data for the demonstration were generated based on over 40 years of experience working with real projects. STAR can be used for internal test defects or acceptance test defects and is applicable to software projects ranging from small-scale to large-scale development. It contains several output views comparing actuals with predictions: an executive summary of the current quality assessment and predicted quality metrics at delivery, defect arrival and closure trends, defects by severity and component, a release-over-release view, and prediction stability over time. STAR also includes early defect prediction without actual data, using planning data such as development and test effort data during the planning period. In addition, it interactively provides the quality impact of corrective actions, such as a delay of the delivery date or additional developers. Analytics are all based on a zero-touch automation algorithm; user input is essentially delivery dates and defect data, plus planning data as an option. This poster session will be helpful for both academic and industry members of the software reliability community. Academics will be able to understand the need for making current reliability modeling more practical. Practitioners will be able to understand the power of the online analytics tool, STAR, and begin to collect the right sets of data from their own development projects, allowing them to focus on developing quality improvement plans. An earlier version of this tool has been used extensively at Nokia. | |
9 | James Brownlow Mathematical Statistician,USAF |
Presentation | Analysis of Target Location Error using Stochastic Differential Equations | Analysis Tools and Techniques | This paper presents an analysis of target location error (TLE) based on the Cox-Ingersoll-Ross (CIR) model. In brief, this model characterizes TLE as a function of range based on the stochastic differential equation dX(r) = a(b - X(r))dr + sigma*sqrt(X(r))dW(r), where X(r) is the TLE at range r, b is the long-term (terminal) mean of the TLE, a is the rate of reversion of X(r) to b, sigma is the process volatility, and W(r) is the standard Wiener process. Multiple flight test runs under the same conditions exhibit different realizations of the TLE process. This approach to TLE analysis models each flight test run as a realization of the CIR process. Fitting a CIR model to multiple data runs then provides a characterization of the TLE system under test. This paper presents an example use of the CIR model. Maximum likelihood estimates of the CIR model parameters are found from a collection of TLE data runs. The resulting CIR model is then used to characterize overall system TLE performance as a function of range to the target, as well as the asymptotic estimate of long-term TLE. | |
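Because the abstract states the CIR-type SDE explicitly, a short Euler-Maruyama simulation can illustrate the kind of TLE realizations being modeled; the parameter values below are hypothetical, not estimates from flight-test data.

```python
# Minimal sketch: simulate CIR-type TLE realizations over range via Euler-Maruyama,
# using the SDE quoted in the abstract, dX = a(b - X)dr + sigma*sqrt(X)dW.
# Parameter values are hypothetical, not fitted from flight-test data.
import numpy as np

def simulate_cir(a, b, sigma, x0, r_max, n_steps, n_paths, seed=0):
    rng = np.random.default_rng(seed)
    dr = r_max / n_steps
    x = np.full(n_paths, float(x0))
    paths = np.empty((n_steps + 1, n_paths))
    paths[0] = x
    for i in range(1, n_steps + 1):
        dw = rng.normal(0.0, np.sqrt(dr), n_paths)
        x = x + a * (b - x) * dr + sigma * np.sqrt(np.maximum(x, 0.0)) * dw
        x = np.maximum(x, 0.0)          # full-truncation scheme keeps TLE non-negative
        paths[i] = x
    return paths

# Ten notional runs: TLE starts at 50 m and reverts toward a long-term mean of 10 m.
paths = simulate_cir(a=0.3, b=10.0, sigma=1.5, x0=50.0, r_max=20.0, n_steps=400, n_paths=10)
print("terminal TLE estimates (m):", np.round(paths[-1], 2))
```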
10 | Andrew Mastin Operations Research Scientist,Lawrence Livermore National Laboratory |
Presentation | Sparse Models for Detecting Malicious Behavior in OpTC | Analysis Tools and Techniques | Host-based sensors are standard tools for generating event data to detect malicious activity on a network. There is often interest in detecting activity using as few event classes as possible in order to minimize host processing slowdowns. Using DARPA’s Operationally Transparent Cyber (OpTC) Data Release, we consider the problem of detecting malicious activity using event counts aggregated over five-minute windows. Event counts are categorized by eleven features according to MITRE CAR data model objects. In the supervised setting, we use regression trees with all features to show that malicious activity can be detected at above a 90% true positive rate with a negligible false positive rate. Using forward and exhaustive search techniques, we show that the same performance can be obtained using a sparse model with only three features. In the unsupervised setting, we show that the isolation forest algorithm is somewhat successful at detecting malicious activity, and that a sparse three-feature model performs comparably. Finally, we consider various search criteria for identifying sparse models and demonstrate that the RMSE criterion is generally optimal. | |
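An illustrative sketch of the supervised/unsupervised comparison on synthetic event counts (not the OpTC data): a regression tree on all eleven features, an exhaustive search over three-feature subsets, and an isolation forest on the sparse subset.

```python
# Hedged sketch on synthetic data standing in for windowed event counts.
import numpy as np
from itertools import combinations
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, n_features = 2000, 11                      # eleven MITRE CAR-style object-count features
X = rng.poisson(5, size=(n, n_features)).astype(float)
y = (X[:, 0] + 2 * X[:, 3] + 3 * X[:, 7] + rng.normal(0, 1, n) > 35).astype(float)  # 0/1 label

# Full model vs. exhaustive search over three-feature subsets.
full_score = cross_val_score(DecisionTreeRegressor(max_depth=5), X, y, cv=5).mean()
best = max(combinations(range(n_features), 3),
           key=lambda cols: cross_val_score(DecisionTreeRegressor(max_depth=5),
                                            X[:, cols], y, cv=5).mean())
print("full-model CV score:", round(full_score, 3), "| best 3-feature subset:", best)

# Unsupervised: score each window with an isolation forest on the sparse subset.
iso = IsolationForest(random_state=0).fit(X[:, best])
print("mean anomaly score of malicious windows:",
      round(iso.score_samples(X[:, best])[y == 1].mean(), 3))
```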
12 | Poornima Madhavan Capability Steward for Social/Behavioral Sciences,MITRE |
Presentation | A Framework to Guide Human-Systems Integration and Resilience in Sociotechnical Systems | Test and Evaluation Methods for Emerging Technology | The goal of this work is to support technology development teams in building advanced technologies and transitioning them into existing systems in a way that strengthens the adaptivity and resilience of high-consequence work systems, including military, healthcare, air traffic, petrochemical, and utility systems. These work systems are complex: They feature a large variety of interacting factors; the variety and interactions produce emergent conditions that can challenge or exceed the system’s capability envelope. Furthermore, in these systems there is constant potential for excessive demands, outages and malfunctions, anomalies, threats such as ransomware, and crises. Given the variety and unpredictability of operating conditions as well as the high cost of failure, these work systems must be capable of adapting and evolving, i.e., demonstrate resilience. In today’s work systems, humans are the actuators and guides of system adaptation and evolution. For humans to effectively guide these processes, the system must have certain features, referred to as system resilience sources, which tend to be inherent in all resilient complex systems. We have developed the “Transform with Resilience during Upgrades to Socio-Technical Systems” (TRUSTS) Framework to guide system modernization and advanced-technology acquisition. The framework specifies complex-system resilience sources, allowing system stakeholders to identify modernization strategies that improve system resilience and enabling technology developers to design and transition technologies in ways that contribute to system resilience. Currently, we are translating the framework’s system resilience sources into engineering methods and tools. This presentation will provide an overview of the TRUSTS Framework and describe our efforts towards integrating it into systems development practice. | |
13 | David Spalding Research Staff Member,Institute for Defense Analyses |
Presentation | Method for Evaluating Bayesian Reliability Models for Developmental Testing | Analysis Tools and Techniques | For analysis of military Developmental Test (DT) data, frequentist statistical models are increasingly challenged to meet the needs of analysts and decision-makers. This is largely due to tightening constraints on test resources and schedule that reduce the quantity and increase the complexity of test data. Bayesian models have the potential to address this challenge. However, although there is a substantial body of research on Bayesian reliability estimation, there appears to be a paucity of Bayesian applications to issues of direct interest to DT decision makers. Due to user unfamiliarity with the characteristics and data needs of Bayesian models, the potential for such models appears to be unexploited. To address this deficiency, the purpose of this research is to provide a foundation and best practices for the use of Bayesian reliability analysis in DT. This is accomplished by establishing a generic structure for systematically evaluating relevant statistical Bayesian models. First, reliability issues for DT programs are identified using a structured poll of stakeholders combined with interviews of a selected set of Subject Matter Experts. Second, candidate solutions are identified in the literature; third, solutions are matched to issues using criteria designed to evaluate the capability of a solution to improve support for decision-makers at critical points in DT programs. The matching process uses a model taxonomy structured according to decisions at each DT phase, plus criteria for model applicability and data availability. The end result is a generic structure that allows an analyst to identify and evaluate a specific model for use with a program and issue of interest. This work includes example applications to models described in the statistical literature. | |
14 | Anthony Sgambellone ,Huntington Ingalls Industries |
Presentation | Building Bridges: a Case Study of Assisting a Program from the Outside | Special Topics | As STAT practitioners, we often find ourselves outsiders to the programs we assist. This session presents a case study that demonstrates some of the obstacles in communicating capabilities, purpose, and expectations that may arise when approaching a project externally. Incremental value may open the door to greater collaboration in the future, and this presentation discusses potential solutions to provide greater benefit to testing programs in the face of obstacles that arise from working outside the program team. DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. CLEARED on 5 Jan 2022. Case Number: 88ABW-2022-0002 | |
15 | Daniel Ries ,Sandia National Laboratories |
Speed Session / Poster | Exploring the behavior of Bayesian adaptive design of experiments | Design of Experiments | Physical experiments in the national security arena, including nuclear deterrence, are often expensive and time-consuming, resulting in small sample sizes that make it difficult to achieve desired statistical properties. Bayesian adaptive design of experiments (BADE) is a sequential design-of-experiments approach that updates the test design in real time in order to optimally collect data. BADE recommends ending an experiment early when, with sufficiently high probability, the experiment would have ended in efficacy or futility had the testing run to completion. This is done by updating the Bayesian posterior distribution in near real time with the data already collected and marginalizing over the remaining uncollected data. BADE has seen successes in clinical trials, resulting in quicker and more effective assessments of drug trials while also reducing ethical concerns. BADE has typically been used only in futility studies rather than efficacy studies for clinical trials, although there has not been much debate about this current paradigm. BADE has been proposed for testing in the national security space for similar reasons of quicker and cheaper test series. Given the high-consequence nature of the tests performed in the national security space, a strong understanding of new methods is required before they are deployed. The main contribution of this research was to reproduce results seen in previous studies for different aspects of model performance. A large simulation inspired by a real testing problem at Sandia National Laboratories was performed to understand the behavior of BADE under various scenarios, including shifts to the mean, standard deviation, and distributional family, in addition to the presence of outliers. The results help explain the behavior of BADE under various assumption violations. Using the results of this simulation, combined with previous work related to BADE in this field, it is argued that this approach could be used as part of an “evidence package” for deciding to stop testing early due to futility, or, with stronger evidence, efficacy. The combination of expert knowledge with statistical quantification provides the stronger evidence necessary for a method in its infancy in a high-consequence, new application area such as national security. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. | |
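A minimal sketch of the BADE early-stopping idea in an assumed beta-binomial setting (not the Sandia simulation): marginalize over the unobserved remainder of the test and report the predictive probability that the completed experiment would conclude efficacy.

```python
# Hedged sketch: posterior-predictive probability of concluding efficacy at completion,
# given interim binomial data and a Beta prior. Thresholds and data are hypothetical.
import numpy as np
from scipy.stats import betabinom, beta as beta_dist

def predictive_prob_efficacy(successes, n_obs, n_total, p0=0.8, post_thresh=0.95,
                             prior_a=1.0, prior_b=1.0):
    """P(final posterior Pr(p > p0) >= post_thresh | interim data)."""
    a, b = prior_a + successes, prior_b + (n_obs - successes)
    n_rem = n_total - n_obs
    prob = 0.0
    for k in range(n_rem + 1):                      # marginalize over future successes k
        w = betabinom.pmf(k, n_rem, a, b)           # posterior-predictive weight
        final_post = 1.0 - beta_dist.cdf(p0, a + k, b + (n_rem - k))
        prob += w * (final_post >= post_thresh)
    return prob

# Interim look: 18 successes in 20 of a planned 40 trials.
pp = predictive_prob_efficacy(successes=18, n_obs=20, n_total=40)
print(f"predictive probability of concluding efficacy: {pp:.3f}")
print("recommend early stop for efficacy" if pp > 0.9 else "continue testing")
```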
16 | Takayuki Iguchi PhD Student,Florida State University |
Speed Session / Poster | Profile Monitoring via Eigenvector Perturbation | Test and Evaluation Methods for Emerging Technology | Control charts are often used to monitor the quality characteristics of a process over time to ensure undesirable behavior is quickly detected. The escalating complexity of the processes we wish to monitor spurs the need for more flexible control charts, such as those used in profile monitoring. Additionally, designing a control chart with an acceptable false alarm rate for a practitioner is a common challenge. Alarm fatigue can occur if the sampling rate is high (say, once a millisecond) and the control chart is calibrated to an average in-control run length (ARL0) of 200 or 370, as is often done in the literature. Because alarm fatigue is not merely an annoyance but can have detrimental effects on product quality, control chart designers should seek to minimize the false alarm rate. Unfortunately, reducing the false alarm rate typically comes at the cost of detection delay, or average out-of-control run length (ARL1). Motivated by recent work on eigenvector perturbation theory, we develop a computationally fast control chart, called the Eigenvector Perturbation Control Chart, for nonparametric profile monitoring. The control chart monitors the l_2 perturbation of the leading eigenvector of a correlation matrix and requires only a sample of known in-control profiles to determine control limits. Through a simulation study we demonstrate that it outperforms its competitors by achieving an ARL1 close to or equal to 1, even when the control limits result in a large ARL0 on the order of 10^6. Additionally, non-zero false alarm rates with a change point after 10^4 in-control observations were observed only in scenarios that are either pathological or truly difficult for a correlation-based monitoring scheme. | |
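A hedged sketch of the monitoring idea on synthetic profiles (not the authors' exact chart): track the l_2 distance between the leading eigenvector of each new window's correlation matrix and the in-control leading eigenvector, with a control limit set only from in-control data.

```python
# Illustrative sketch; data, window sizes, and the limit-setting rule are assumptions.
import numpy as np

def leading_eigvec(profiles):
    corr = np.corrcoef(profiles, rowvar=False)
    vals, vecs = np.linalg.eigh(corr)
    v = vecs[:, -1]
    return v if v[np.argmax(np.abs(v))] > 0 else -v    # fix sign for comparability

rng = np.random.default_rng(0)
p, n_ic = 20, 500
loadings = np.linspace(0.5, 1.5, p)
factor = rng.normal(size=(n_ic, 1))
in_control = factor * loadings + 0.5 * rng.normal(size=(n_ic, p))   # common-factor profiles
v0 = leading_eigvec(in_control)

# Control limit from resampled in-control windows.
stats = [np.linalg.norm(leading_eigvec(in_control[rng.choice(n_ic, 100)]) - v0)
         for _ in range(200)]
limit = np.quantile(stats, 0.999)

# Out-of-control window: half the variables decouple from the common factor.
out = rng.normal(size=(100, 1)) * loadings + 0.5 * rng.normal(size=(100, p))
out[:, :10] = rng.normal(size=(100, 10))
stat = np.linalg.norm(leading_eigvec(out) - v0)
print(f"statistic={stat:.3f}, limit={limit:.3f}, alarm={stat > limit}")
```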
17 | Paul Fanto Research Staff Member,Institute for Defense Analyses |
Presentation | Bayesian Reliability Methods for Developmental Testing of a Generic Complex System | Analysis Tools and Techniques | Improving the statistical methods used to predict and assess defense system reliability in developmental test (DT) programs would benefit Department of Defense (DoD) acquisition. Current methods for reliability planning tend to produce high reliability goals that are difficult to achieve in practice. Moreover, in many current applications, traditionally used frequentist methods for reliability assessment generally do not combine information across DT segments and tend to produce large uncertainty intervals of limited use to program decision makers. Much recent work has demonstrated the advantages of using Bayesian statistical methods for planning and assessing system reliability. This work investigates the application of Bayesian reliability assessment methods to notional DT lifetime data. The notional data are generated with the Bayesian reliability growth planning methodology of Wayne (2018), and the Bayesian assessment methods are based on the methodology of Wayne and Modarres (2015). The system under test is assumed to be a generic complex system, in which the large number of individual failure modes are not described individually. This work explores the sensitivity of the Bayesian results to the choice of the prior distribution and the amount of DT data available. Furthermore, this work compares the Bayesian results for the reliability point estimate, uncertainty interval, and probability of passing a demonstration test to analogous results from traditional reliability assessment methods. Defining point estimates of key reliability metrics, in particular the mean time before system failure (MTBF), is discussed. The effect of relaxing the assumption of a generic complex system and attributing failures to individual failure modes is considered. Finally, the results of this study are compared with recent case studies that apply Bayesian reliability assessment methods. Wayne, Martin. 2018. “Modeling Uncertainty in Reliability Growth Plans.” 2018 Annual Reliability and Maintainability Symposium (RAMS). 1-6. Wayne, Martin, and Mohammad Modarres. 2015. “A Bayesian Model for Complex System Reliability Growth Under Arbitrary Corrective Actions.” IEEE Transactions on Reliability 64: 206-220. | |
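A minimal conjugate sketch of Bayesian reliability assessment for a generic complex system (not the Wayne 2018 or Wayne and Modarres 2015 methodology): a Gamma prior on the failure rate updated with notional DT data, producing an MTBF credible interval and a probability of passing a notional demonstration test.

```python
# Hedged sketch; prior parameters, test hours, and demonstration requirement are hypothetical.
import numpy as np
from scipy.stats import gamma

prior_shape, prior_rate = 2.0, 200.0       # prior belief: MTBF around 100 hours
failures, test_hours = 8, 1200.0           # notional DT segment data

post_shape = prior_shape + failures
post_rate = prior_rate + test_hours
rate_post = gamma(post_shape, scale=1.0 / post_rate)   # posterior for the failure rate

mtbf_draws = 1.0 / rate_post.rvs(100_000, random_state=1)
print("posterior median MTBF:", round(np.median(mtbf_draws), 1), "hours")
print("80% credible interval:", np.round(np.percentile(mtbf_draws, [10, 90]), 1))

# Probability of passing a notional demonstration test: no failures in 300 hours.
demo_hours = 300.0
p_pass = np.mean(np.exp(-rate_post.rvs(100_000, random_state=2) * demo_hours))
print("probability of passing the demonstration test:", round(p_pass, 3))
```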
18 | Victoria Sieck Deputy Director / Assistant Professor of Statistics,Scientific Test & Analysis Techniques Center of Excellence (STAT COE) / Air Force Institute of Technology (AFIT) |
Presentation | A Framework for Using Priors in a Continuum of Testing | Analysis Tools and Techniques | A strength of the Bayesian paradigm is that it allows for the explicit use of all available information—to include subject matter expert (SME) opinion and previous (possibly dissimilar) data. While frequentists are constrained to including only data in an analysis (that is to say, only information that can be observed), Bayesians can easily consider both data and SME opinion, or any other related information that could be constructed. This can be accomplished through the development and use of priors. When prior development is done well, a Bayesian analysis will not only lead to more direct probabilistic statements about system performance, but can also result in smaller standard errors around fitted values when compared to a frequentist approach. Furthermore, by quantifying the uncertainty surrounding a model parameter through the construct of a prior, Bayesians are able to capture the uncertainty across the test space of consideration. This presentation develops a framework for thinking about how different priors can be used throughout the continuum of testing. In addition to types of priors, how priors can change or evolve across the continuum of testing—especially when a system changes (e.g., is modified or adjusted) during phases of testing—will be addressed. Priors that strive to provide no information (reference priors) will be discussed first, building up to priors that contain available information (informative priors). Informative priors—both those based on institutional knowledge or summaries from databases, as well as those developed from previous testing data—will be discussed, with a focus on how to consider previous data that is dissimilar in some way relative to the current test event. What priors might be more common in various phases of testing, types of information that can be used in priors, and how priors evolve as information accumulates will all be discussed. | |
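A small, hypothetical beta-binomial illustration of the continuum idea: the same current-phase data analyzed under a reference-style prior and under an informative prior built from downweighted earlier-phase data (a power-prior-style choice assumed here, not the presenter's framework).

```python
# Hedged illustration with made-up numbers: reference vs. informative prior for a
# success probability in the current test phase.
from scipy.stats import beta

successes, trials = 17, 20                      # current test event

# Informative prior from a previous phase, downweighted (a0 = 0.5) because the
# earlier system configuration was dissimilar to the current one.
prev_s, prev_n, a0 = 40, 50, 0.5
priors = {
    "Jeffreys (reference)": (0.5, 0.5),
    "downweighted previous-phase data": (0.5 + a0 * prev_s, 0.5 + a0 * (prev_n - prev_s)),
}

for name, (a, b) in priors.items():
    post = beta(a + successes, b + (trials - successes))
    lo, hi = post.interval(0.9)
    print(f"{name:35s} posterior mean={post.mean():.3f}, 90% interval=({lo:.3f}, {hi:.3f})")
```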
19 | Priscila Silva Graduate Student,University of Massachusetts Dartmouth |
Speed Session / Poster | Bayesian Estimation for Covariate Defect Detection Model Based on Discrete Cox Proportional Hazards Model | Analysis Tools and Techniques | Traditional methods to assess software characterize the defect detection process as a function of testing time or effort to quantify failure intensity and reliability. More recent innovations include models incorporating covariates that explain defect detection in terms of underlying test activities. These covariate models are elegant and introduce only a single additional parameter per testing activity. However, the model forms typically exhibit a high degree of non-linearity. Hence, stable and efficient model-fitting methods are needed to enable widespread use by the software community, which often lacks mathematical expertise. To overcome this limitation, this poster presents Bayesian estimation methods for covariate models, including the specification of informed priors as well as confidence intervals for the mean value function and failure intensity, which often serves as a metric of software stability. The proposed approach is compared to traditional alternatives such as maximum likelihood estimation. Our results indicate that Bayesian methods with informed priors converge most quickly and achieve the best model fits. Incorporating these methods into tools should therefore encourage widespread use of the models to quantitatively assess software. | |
20 | Mark Fuller Associate Professor,University of Massachusetts Dartmouth |
Presentation | Quantifying the Impact of Staged Rollout Policies on Software Process and Product Metrics | Analysis Tools and Techniques | Software processes define specific sequences of activities performed to effectively produce software, whereas tools provide concrete computational artifacts by which these processes are carried out. Tool-independent modeling of processes and related practices enables quantitative assessment of software and competing approaches. This talk presents a framework to assess an approach employed in modern software development known as staged rollout, which releases new or updated software features to a fraction of the user base in order to accelerate defect discovery without imposing the possibility of failure on all users. The framework quantifies process metrics such as delivery time and product metrics, including reliability, availability, security, and safety, enabling tradeoff analysis to objectively assess the quality of software produced by vendors, establish baselines, and guide process and product improvement. Failure data collected during software testing is employed to emulate the approach as if the project were ongoing. The underlying problem is to identify a policy that decides when to perform various stages of rollout based on the software’s failure intensity. The illustrations examine how alternative policies impose tradeoffs between two or more of the process and product metrics. | |
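A sketch of one possible staged-rollout policy of the kind such a framework might evaluate; the policy form, thresholds, and failure-intensity estimator below are assumptions, not the authors' framework.

```python
# Hedged sketch: advance the fraction of users receiving a release only when the
# estimated failure intensity drops below a per-stage threshold.
import numpy as np

def failure_intensity(failure_times, window=7.0):
    """Recent failures per unit time, estimated over a trailing window (days)."""
    failure_times = np.asarray(failure_times, dtype=float)
    if failure_times.size == 0:
        return 0.0
    t_now = failure_times.max()
    return np.sum(failure_times > t_now - window) / window

# Rollout stages: (fraction of users, failure-intensity threshold required to advance).
stages = [(0.01, 0.8), (0.10, 0.4), (0.50, 0.2), (1.00, None)]

def next_stage(current_stage, failure_times):
    frac, threshold = stages[current_stage]
    if threshold is not None and failure_intensity(failure_times) < threshold:
        return current_stage + 1
    return current_stage

# Hypothetical failure times (days) observed while 1% of users have the release.
observed = [0.5, 1.2, 1.3, 2.0, 4.5, 9.0]
stage = next_stage(0, observed)
print(f"failure intensity={failure_intensity(observed):.2f}/day -> "
      f"rollout fraction {stages[stage][0]:.0%}")
```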
21 | Leonard Lombardo Mathematician,U.S. Army Aberdeen Test Center |
Presentation | Likelihood Ratio Test Comparing V50 Values for a Partially Nested Generalized Linear Mixed Model | Analysis Tools and Techniques | Ballistic limit testing is a type of sensitivity testing in which the stressor is the velocity of a kinetic energy threat (fixed effect) and the response is the penetration result. A generalized linear model (GLM) may be used to analyze the data. If there is an additional random effect (e.g., lot number for a threat), then a generalized linear mixed model (GLMM) may be used. In both cases, the V50 (the velocity at which there is a 50% probability of penetration) is often used as a metric of performance. Surrogates are developed to improve repeatability, increase the availability of test resources, and decrease the cost of testing. Examples include a surrogate threat to replace a foreign round and a human skin simulant to replace the need for cadaver testing. Testing is required to ensure that the surrogate’s performance is similar to that of the material it is replacing. A Wald statistical test can compare V50 values between the actual item and a surrogate. Although both tests rely on large-sample approximations, likelihood ratio tests tend to outperform Wald tests for smaller samples. A likelihood ratio test on V50 has previously been proposed for the GLM case. This work extends that method to the comparison of V50 values for a partially nested GLMM, as would be seen when evaluating the performance of a surrogate. A simulation study is conducted, and quantile-quantile plots are used to investigate the performance of this likelihood ratio test. | |
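A simplified sketch of the comparison on simulated data with fixed effects only (a GLM, not the partially nested GLMM of the talk): fit logistic penetration curves, extract V50, and compare a common-curve null against separate curves with a likelihood ratio test.

```python
# Hedged sketch; velocities, scales, and sample sizes are hypothetical.
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def simulate(v50, scale, n=60):
    """Simulated penetration results (1 = penetration) around a true V50."""
    v = rng.uniform(v50 - 150, v50 + 150, n)
    y = rng.random(n) < 1 / (1 + np.exp(-(v - v50) / scale))
    return v, y.astype(int)

def fit_loglik(v, y, group=None):
    """Unpenalized logistic fit (scikit-learn >= 1.2) and its log-likelihood."""
    X = v.reshape(-1, 1) if group is None else np.column_stack([v, group])
    m = LogisticRegression(penalty=None, max_iter=5000).fit(X, y)
    p = np.clip(m.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    return m, np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

v_a, y_a = simulate(900.0, 25.0)     # actual threat
v_s, y_s = simulate(930.0, 25.0)     # surrogate
v, y = np.concatenate([v_a, v_s]), np.concatenate([y_a, y_s])
group = np.r_[np.zeros(len(v_a)), np.ones(len(v_s))]

m_full, ll_full = fit_loglik(v, y, group)   # separate intercepts: different V50s allowed
m_null, ll_null = fit_loglik(v, y)          # common curve: equal V50s
lrt = 2 * (ll_full - ll_null)
print(f"V50 (actual threat) ~ {-m_full.intercept_[0] / m_full.coef_[0][0]:.1f} m/s")
print(f"LRT statistic = {lrt:.2f}, p-value = {chi2.sf(lrt, df=1):.4f}")
```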
22 | Christian Ellis Journeyman Fellow,Army Research Laboratory |
Presentation | Assurance Techniques for Learning Enabled Autonomous Systems which Aid Systems Engineering | Test and Evaluation Methods for Emerging Technology | It is widely recognized that the complexity and resulting capabilities of autonomous systems created using machine learning methods, which we refer to as learning enabled autonomous systems (LEAS), pose new challenges to systems engineering test, evaluation, verification, and validation (TEVV) compared to their traditional counterparts. This presentation provides a preliminary attempt to map recently developed technical approaches in the LEAS assurance and TEVV literature to a traditional systems engineering v-model. The mapping categorizes such techniques into three top-level lifecycle phases: development, acquisition, and sustainment. It reviews the latest techniques for developing safe, reliable, and resilient learning enabled autonomous systems, without recommending radical and impractical changes to existing systems engineering processes. By performing this mapping, we seek to assist acquisition professionals by (i) informing comprehensive test and evaluation planning, and (ii) objectively communicating risk to leaders. The inability to translate qualitative assessments into quantitative metrics that measure system performance hinders adoption. Without understanding the capabilities and limitations of existing assurance techniques, defining safety and performance requirements that are both clear and testable remains out of reach. We accompany recent literature reviews on autonomy assurance and TEVV by mapping such developments to distinct steps of a well-known systems engineering model chosen due to its prevalence, namely the v-model. For each of the three top-level lifecycle phases (development, acquisition, and sustainment), a section of the presentation outlines recent technical developments for autonomy assurance. This representation helps identify where the latest methods for TEVV fit in the broader systems engineering process while also enabling systematic consideration of potential sources of defects, faults, and attacks. Note that we use the v-model only to assist the classification of where TEVV methods fit; this is not a recommendation to use a certain software development lifecycle over another. | |
23 | Caitlan Fealing Data Science Fellow,Institute for Defense Analyses |
Speed Session / Poster | Predicting Trust in Automated Systems: Validation of the Trust of Automated Systems Test | Test and Evaluation Methods for Emerging Technology | Over the past three years, researchers in OED have developed a scale to measure trust in automated systems, called the Trust of Automated Systems Test (TOAST), and provided initial evidence of its validity. This poster will describe how accurately the TOAST scale predicts trust in an automated system by measuring the extent to which a civilian will use a provided system. The main question we plan to answer is whether the scale can predict what we consider to be a “steady” level of reliance, i.e., the level of reliance that a person reaches that does not vary as time continues. We also have two supporting questions designed to help us understand how people determine levels of trust in high-accuracy and low-accuracy systems and how long it takes for people to reach their steady level of trust. We believe that this scale should be used to evaluate the trust level of any human using any system, including predicting when operators will misuse or disuse complex, automated and autonomous systems. | |
24 | Anna Vinnedge Student,United States Military Academy |
Presentation | Utilizing Machine Learning Models to Predict Success in Special Operations Assessment | Data Management and Reproducible Research | The 75th Ranger Regiment is an elite Army unit responsible for some of the most physically and mentally challenging missions. Entry to the unit is based on an assessment process called the Ranger Assessment and Selection Program (RASP), which consists of a variety of tests and challenges of strength, intellect, and grit. This study explores the psychological and physical profiles of candidates who attempt to pass RASP. Using a Random Forest model and a penalized logistic regression model, we identify initial entry characteristics that are predictive of success in RASP. We focus on the differences between racial sub-groups and military occupational specialty (MOS) sub-groups to provide information for recruiters to identify underrepresented groups who are likely to succeed in the selection process. | |
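A hedged sketch on synthetic candidate data with hypothetical feature names (not the study's data): a random forest and an L1-penalized logistic regression fit to predict a selection outcome, with importances and coefficients inspected afterward.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 1500
# Hypothetical entry characteristics; none of these names come from the study.
X = pd.DataFrame({
    "run_time_sec": rng.normal(870, 60, n),
    "pushups": rng.normal(60, 12, n),
    "grit_score": rng.normal(3.8, 0.5, n),
    "gt_score": rng.normal(110, 10, n),
})
logit = (-0.01 * (X["run_time_sec"] - 870) + 0.04 * (X["pushups"] - 60)
         + 1.2 * (X["grit_score"] - 3.8))
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # synthetic pass/fail outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("random forest accuracy:", round(rf.score(X_te, y_te), 3))
print("feature importances:", dict(zip(X.columns, np.round(rf.feature_importances_, 3))))

scaler = StandardScaler().fit(X_tr)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lasso.fit(scaler.transform(X_tr), y_tr)
print("penalized-logistic accuracy:", round(lasso.score(scaler.transform(X_te), y_te), 3))
print("coefficients:", dict(zip(X.columns, np.round(lasso.coef_[0], 3))))
```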
25 | David Shteinman CEO/Managing Director,Industrial Sciences Group |
Presentation | USE OF DESIGN & ANALYSIS OF COMPUTER EXPERIMENTS (DACE) IN SPACE MISSION TRAJECTORY DESIGN | Design of Experiments | Numerical astrodynamics simulations are characterized by a large input space and complex, nonlinear input-output relationships. Standard Monte Carlo runs of these simulations are typically time-consuming and numerically costly. We adapt the Design and Analysis of Computer Experiments (DACE) approach to astrodynamics simulations to improve runtimes and increase information gain. Space-filling designs such as Latin Hypercube Sampling (LHS), Maximin, and Maximum Projection Sampling, combined with the surrogate modelling techniques of DACE such as Radial Basis Functions and Gaussian Process Regression, gave significant improvements for astrodynamics simulations, including: reduced run time of Monte Carlo simulations, improved speed of sensitivity analysis, confidence intervals for non-Gaussian behavior, determination of outliers, and identification of extreme output cases not found by standard simulation and sampling methods. Four case studies are presented on novel applications of DACE to mission trajectory design and conjunction assessments with space debris: 1) Gaussian Process Regression modelling of maneuver and navigation uncertainties for commercial cislunar and NASA CLPS lunar missions; 2) development of a surrogate model for predicting collision risk and miss-distance volatility between debris and satellites in Low Earth Orbit; 3) prediction of the displacement of an object in orbit using laser photon pressure; 4) prediction of eclipse durations for the NASA IBEX-extended mission. The surrogate models are assessed by k-fold cross validation, and the relative performance of the surrogate models is verified by the Root Mean Square Error (RMSE) of predictions at untried points. To improve the sampling of manoeuvre and navigational uncertainties within trajectory design for lunar missions, a maximin LHS was used in combination with the Gates model for thrusting uncertainty. This led to improvements in simulation efficiency, producing a non-parametric ΔV distribution that was processed with Kernel Density Estimation to resolve a ΔV99.9 prediction with confidence bounds. In a collaboration with the NASA Conjunction Assessment Risk Analysis (CARA) group, the changes in probability of collision (Pc) for two objects in LEO were predicted using a network of 13 Gaussian Process Regression-based surrogate models that determined the future trends in covariance and miss-distance volatility, given the data provided within a conjunction data message. This allowed for determination of the trend in the probability distribution of Pc up to three days from the time of closest approach, as well as the interpretation of this prediction in the form of an urgency metric that can assist satellite operators in the manoeuvre decision process. The main challenge in adapting the methods of DACE to astrodynamics simulations was to deliver a direct benefit to mission planning and design. This was achieved by delivering improvements in confidence and predictions for metrics including the propellant required to complete a lunar mission (expressed as ΔV), statistical validation of the simulation models used, and advice on when a sufficient number of simulation runs have been made to verify convergence to an adequate confidence interval. Future applications of DACE for mission design include determining an optimal tracking schedule plan for a lunar mission and robust trajectory design for low-thrust propulsion. | |
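A minimal DACE-style sketch on a toy function (not the mission analyses above): a Latin hypercube design, a Gaussian process surrogate, and k-fold cross-validated RMSE as the adequacy check.

```python
# Hedged sketch; the toy simulator, design size, and kernel choice are assumptions.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.model_selection import cross_val_score

def toy_simulator(x):
    """Stand-in for an expensive astrodynamics run (2 inputs -> 1 output)."""
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1]) + 0.1 * x[:, 1] ** 2

# Space-filling design: 40 Latin hypercube points in [0, 1]^2.
sampler = qmc.LatinHypercube(d=2, seed=0)
X = sampler.random(40)
y = toy_simulator(X)

# Gaussian process surrogate with an RBF kernel, checked by 5-fold CV RMSE.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=[0.2, 0.2]),
                              normalize_y=True, random_state=0)
rmse = np.sqrt(-cross_val_score(gp, X, y, cv=5,
                                scoring="neg_mean_squared_error")).mean()
print(f"5-fold CV RMSE of the surrogate: {rmse:.4f}")

gp.fit(X, y)
mean, sd = gp.predict(np.array([[0.5, 0.5]]), return_std=True)
print(f"surrogate prediction at an untried point: {mean[0]:.3f} +/- {2 * sd[0]:.3f}")
```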
26 | Nadeem Damanhuri Cadet,USAFA |
Speed Session / Poster | Machine Learning for Efficient Fuzzing | Test and Evaluation Methods for Emerging Technology | A high level of security in software is a necessity in today’s world; the best way to achieve confidence in security is through comprehensive testing. This paper covers the development of a fuzzer that explores the massively large input space of a program using machine learning to find the inputs most associated with errors. A formal methods model of the software in question is used to generate and evaluate test sets. Using those test sets, a two-part algorithm is applied: inputs are modified according to their Hamming distance from error-causing inputs, and then a tree-based model learns the relative importance of each variable in causing errors. This architecture was tested against a model of an aircraft’s thrust reverser, with predefined model properties offering a starting test set. From there, the Hamming-distance algorithm and the importance model expand upon the original set to offer a more informed set of test cases. This system has great potential for producing efficient and effective test sets and has further applications in verifying the security of software programs and cyber-physical systems, contributing to national security in the cyber domain. | |
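A conceptual sketch of the two-part algorithm on a toy bit-vector program (all details assumed, not the paper's fuzzer): mutate non-error inputs toward their nearest error-causing inputs by Hamming distance, then learn per-variable importance with a tree ensemble.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
N_BITS = 16

def toy_program_errors(inputs):
    """Stand-in system under test: error when bits 2 and 9 are set and bit 5 is clear."""
    return (inputs[:, 2] == 1) & (inputs[:, 9] == 1) & (inputs[:, 5] == 0)

def mutate_toward(candidates, error_inputs, n_flips=2):
    """Flip bits of each candidate to reduce its Hamming distance to the nearest error input."""
    out = candidates.copy()
    for row in out:
        nearest = error_inputs[np.argmin((error_inputs != row).sum(axis=1))]
        differing = np.flatnonzero(nearest != row)
        if differing.size == 0:
            continue
        flips = rng.choice(differing, size=min(n_flips, differing.size), replace=False)
        row[flips] = nearest[flips]
    return out

# Initial random test set and the errors it exposes.
tests = rng.integers(0, 2, size=(500, N_BITS))
errors = toy_program_errors(tests)
new_tests = mutate_toward(tests[~errors][:100], tests[errors])

# Tree-based model of which input bits matter for causing errors.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(tests, errors)
top = np.argsort(clf.feature_importances_)[::-1][:3]
print("error rate of mutated test cases:", toy_program_errors(new_tests).mean())
print("most error-relevant input bits:", top)
```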
27 | Prarabdha Ojwaswee Yonzon Cadet,United States Military Academy (West Point) |
Speed Session / Poster | Convolutional Neural Networks and Semantic Segmentation for Cloud and Ice Detection | Special Topics | Recent research shows the effectiveness of machine learning for image classification and segmentation. The use of artificial neural networks (ANNs) on image datasets such as the MNIST dataset of handwritten digits is highly effective. However, when presented with more complex images, ANNs and other simple computer vision algorithms tend to fail. This research uses Convolutional Neural Networks (CNNs) to determine how we can differentiate between ice and clouds in imagery of the Arctic. Instead of using ANNs, which analyze the problem in one dimension, CNNs identify features using the spatial relationships between the pixels in an image. This technique allows us to extract spatial features, yielding higher accuracy. Using a CNN known as the Cloud-Net model, we analyze how a CNN performs on satellite images. First, we examine recent research on the Cloud-Net model’s effectiveness on satellite imagery, specifically Landsat data with four channels: red, green, blue, and infrared. We extend and modify this model to analyze data from the most common channels used by satellites: red, green, and blue. By training on different combinations of these three channels, we extend this analysis by testing on an entirely different data set: GOES imagery. This gives us an understanding of the impact of each individual channel on image classification. By selecting GOES images that cover the same geographic locations as the Landsat scenes and contain both ice and clouds, we test the CNN’s generalizability. Finally, we present the CNN’s ability to accurately identify clouds and ice in the GOES data versus the Landsat data. | |
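A minimal semantic-segmentation sketch using a tiny fully convolutional network (not the Cloud-Net architecture): 3-channel RGB tiles mapped to per-pixel cloud/ice/background logits, with one training step on random stand-in data; tile size and class count are assumptions.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy encoder-decoder that outputs per-pixel class logits."""
    def __init__(self, in_channels=3, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes, 1),        # per-pixel class logits
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# One training step on random tensors standing in for Landsat/GOES RGB tiles.
model = TinySegNet()
images = torch.rand(4, 3, 64, 64)                      # batch of RGB tiles
masks = torch.randint(0, 3, (4, 64, 64))               # per-pixel labels (cloud/ice/background)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.CrossEntropyLoss()(model(images), masks)
loss.backward()
optimizer.step()
print("logits shape:", tuple(model(images).shape), "| loss:", float(loss))
```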
28 | John Cilli Computer Scientist,Picatinny Arsenal |
Presentation | Data Science & ML-Enabled Terminal Effects Optimization | Analysis Tools and Techniques | Warhead design and performance optimization against a range of targets is a foundational aspect of the Department of the Army’s mission on behalf of the warfighter. The existing procedures used to perform this basic design task do not fully leverage the exponential growth in data science, machine learning, distributed computing, and computational optimization. Although sound in practice and methodology, existing implementations are laborious and computationally expensive, thus limiting the ability to fully explore the trade space of all potentially viable solutions. An additional complicating factor is the fast-paced nature of many Research and Development programs, which require equally fast-paced conceptualization and assessment of warhead designs. By utilizing methods that take advantage of data analytics, the workflow to develop and assess modern warheads will enable earlier insights, discovery through advanced visualization, and optimal integration of multiple engineering domains. Additionally, a framework built on machine learning would allow for the exploitation of past studies and designs to better inform future developments. Combining these approaches will allow for rapid conceptualization and assessment of new and novel warhead designs. US overmatch capability is quickly eroding across many tactical and operational weapon platforms. Traditional incremental improvement approaches are no longer generating appreciable performance improvements to warrant investment. Novel next-generation techniques are required to find efficiencies in designs and leap-forward technologies to maintain US superiority. The proposed approach seeks to shift the existing design mentality to meet this challenge. |