All Abstracts
Total Abstracts: 99
Total Contributed Abstracts: 31
# | Name / Org | Type | Level | Abstract Title | Theme | Abstract | Invitation | Add to Speakers |
---|---|---|---|---|---|---|---|---|
64 | Daniel Ries Senior Member of the Technical Staff, Sandia National Laboratories |
Presentation Publish |
2 | Saving hardware, labor, and time using Bayesian adaptive design of experiments | Improving the Quality of Test & Evaluation | Physical testing in the national security enterprise is often costly. Sometimes this is driven by hardware and labor costs, other times it can be driven by finite resources of time or hardware builds. Test engineers must make the most of their available resources to answer high consequence problems. Bayesian adaptive design of experiments (BADE) is one tool that should be in an engineer’s toolbox for designing and running experiments. BADE is a sequential design of experiments approach which allows early stopping decisions to be made in real time using predictive probabilities (PP), allowing for more efficient data collection. BADE has seen successes in clinical trials, another high consequence arena, and it has resulted in quicker and more effective assessments of drug trials. BADE has been proposed for testing in the national security space for similar reasons of quicker and cheaper test series. Given the high-consequence nature of the tests performed in the national security space, a strong understanding of new methods is required before being deployed. The main contribution of this research is to assess the robustness of PP in a BADE under different modeling assumptions, and to compare PP results to its frequentist alternative, conditional power (CP). Comparisons are made based on Type I error rates, statistical power, and time savings through average stopping time. Simulation results show PP has some robustness to distributional assumptions. PP also tends to control Type I error rates better than CP, while maintaining relatively strong power. While CP usually recommends stopping a test earlier than PP, CP also tends to have more inconsistent results, again showing the benefits of PP in a high consequence application. An application to a real problem from Sandia National Laboratories shows the large potential cost savings for using PP. The results of this study suggest BADE can be one piece of an evidence package during testing to stop testing early and pivot, in order to decrease costs and increase flexibility. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. | Victoria Sieck | Add to Speakers |
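A minimal illustrative sketch of the kind of interim calculation described above, assuming a simple pass/fail test with a notional reliability requirement: it compares the Bayesian predictive probability (PP) of eventual test success against the frequentist conditional power (CP). All sample sizes, priors, and thresholds are made up for illustration and are not values from the Sandia study.

```python
import numpy as np
from scipy import stats

# Hypothetical interim data: 14 successes in 18 trials toward a planned total of 30 trials.
# Assumed requirement for illustration: demonstrate success probability p > 0.80.
n_obs, y_obs, n_total, p0, alpha = 18, 14, 30, 0.80, 0.05
n_rem = n_total - n_obs

# Which final outcomes would "pass" the test (one-sided exact binomial test rejects H0: p <= p0)?
def rejects(y_final):
    return stats.binomtest(y_final, n_total, p0, alternative="greater").pvalue < alpha

reject = np.array([rejects(y_obs + y) for y in range(n_rem + 1)])

# Bayesian predictive probability: Beta(1,1) prior -> Beta posterior -> beta-binomial
# predictive distribution for the remaining successes.
rng = np.random.default_rng(1)
p_draws = rng.beta(1 + y_obs, 1 + n_obs - y_obs, size=100_000)
y_rem = rng.binomial(n_rem, p_draws)
pp = reject[y_rem].mean()

# Frequentist conditional power: plug in the current point estimate of p.
p_hat = y_obs / n_obs
cp = sum(stats.binom.pmf(y, n_rem, p_hat) for y in range(n_rem + 1) if reject[y])

print(f"Predictive probability of eventual success: {pp:.3f}")
print(f"Conditional power:                          {cp:.3f}")
```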
65 | Kazu Okumoto CEO, Sakura Software Solutions (3S) LLC |
Presentation Publish |
2 | STAR: A Cloud-based Innovative Tool for Software Quality Analysis | Sharing Analysis Tools, Methods, and Collaboration Strategies | Traditionally, subject matter experts perform software quality analysis using custom spreadsheets which produce inconsistent output and are challenging to share and maintain across teams. This talk will introduce and demonstrate STAR – a cloud-based, data-driven tool for software quality analysis. The tool is aimed at practitioners who manage software quality and make decisions based on its readiness for delivery. Being web-based and fully automated allows teams to collaborate on software quality analysis across multiple projects and releases. STAR is an integration of SaaS and automated analytics. It is a digital engineering tool for software quality practice. To use the tool, all users need to do is upload their defect and development effort (optional) data to the tool and set a couple of planned release milestones, such as test start date and delivery dates for customer trial and deployment. The provided data is then automatically processed and aggregated into a defect growth curve. The core innovation of STAR is in its set of statistically sound algorithms that are then used to fit a defect prediction curve to the provided data. This is achieved through the automated identification of inflection points in the original defect data and their use in generating piece-wise exponential models that make up the final prediction curve. Moreover, during the early days of software development, when no defect data is available, STAR can use the development effort plan and learn from previous software releases’ defects and effort data to make predictions for the current release. Finally, the tool implements a range of what-if scenarios that enable practitioners to evaluate several potential actions to correct course. Thanks to the use of an earlier version of STAR by a large software development group at Nokia and the current trialing collaboration with NASA, the features and accuracy of the tool have improved to be better than traditional single curve fitting. In particular, the defect prediction is stable several weeks before the planned software release, and the multiple metrics provided by the tool make the analysis of software quality straightforward, guiding users in making an intelligent decision regarding the readiness for high-quality software delivery. | Lance Fiondella | Add to Speakers |
66 | Jonathan Rathsam Senior Research Engineer, NASA Langley Research Center |
Presentation No Publish |
1 | An Overview of the NASA Quesst Community Test Campaign with the X-59 Aircraft | Advancing Test & Evaluation of Emerging and Prevalent Technologies | In its mission to expand knowledge and improve aviation, NASA conducts research to address sonic boom noise, the prime barrier to overland supersonic flight. For half a century civilian aircraft have been required to fly slower than the speed of sound when over land to prevent sonic boom disturbances to communities under the flight path. However, lower noise levels may be achieved via new aircraft shaping techniques that reduce the merging of shockwaves generated during supersonic flight. As part of its Quesst mission, NASA is building a piloted, experimental aircraft called the X-59 to demonstrate low noise supersonic flight. After initial flight testing to ensure the aircraft performs as designed, NASA will begin a national campaign of community overflight tests to collect data on how people perceive the sounds from this new design. The data collected will support national and international noise regulators’ efforts as they consider new standards that would allow supersonic flight over land at low noise levels. This presentation provides an overview of the community test campaign, including the scope, key objectives, stakeholders, and challenges. | Peter Parker | Add to Speakers |
67 | Nathan B. Cruze Statistician, NASA Langley Research Center |
Presentation No Publish |
2 | Infusing Statistical Thinking into the NASA Quesst Community Test Campaign | Advancing Test & Evaluation of Emerging and Prevalent Technologies | Statistical thinking permeates many important decisions as NASA plans its Quesst mission, which will culminate in a series of community overflights using the X-59 aircraft to demonstrate low-noise supersonic flight. Month-long longitudinal surveys will be deployed to assess human perception and annoyance to this new acoustic phenomenon. NASA works with a large contractor team to develop systems and methodologies to estimate noise doses, to test and field socio-acoustic surveys, and to study the relationship between the two quantities, dose and response, through appropriate choices of statistical models. This latter dose-response relationship will serve as an important tool as national and international noise regulators debate whether overland supersonic flights could be permitted once again within permissible noise limits. In this presentation we highlight several areas where statistical thinking has come into play, including issues of sampling, classification and data fusion, and analysis of longitudinal survey data that are subject to rare events and the consequences of measurement error. We note several operational constraints that shape the appeal or feasibility of some decisions on statistical approaches, and we identify several important remaining questions to be addressed. | Peter Parker | Add to Speakers |
68 | Andrew Hollis Graduate Student, North Carolina State University |
Speed Presentation Publish |
2 | Uncertain Text Classification for Proliferation Detection | Sharing Analysis Tools, Methods, and Collaboration Strategies | A key global security concern in the nuclear weapons age is the proliferation and development of nuclear weapons technology, and a crucial part of enforcing non-proliferation policy is developing an awareness of the scientific research being pursued by other nations and organizations. Deep, transformer-based text classification models are an important piece of systems designed to monitor scientific research for this purpose. For applications like proliferation detection involving high-stakes decisions, there has been growing interest in ensuring that we can perform well-calibrated, interpretable uncertainty quantification with such classifier models. However, because modern transformer-based text classification models have hundreds of millions of parameters and the computational cost of uncertainty quantification typically scales with the size of the parameter space, it has been difficult to produce computationally tractable uncertainty quantification for these models. We propose a new variational inference framework that is computationally tractable for large models and meets important uncertainty quantification objectives including producing predicted class probabilities that are well-calibrated and reflect our prior conception of how different classes are related. | Alyson Wilson | Add to Speakers |
69 | Michael Anthony DiNicola Systems Engineer, Jet Propulsion Laboratory, California Institute of Technology |
Speed Presentation No Publish |
2 | Uncertainty Quantification of High Heat Microbial Reduction for NASA Planetary Protection | Solving Program Evaluation Challenges | Planetary Protection is the practice of protecting solar system bodies from harmful contamination by Earth life and protecting Earth from possible life forms or bioactive molecules that may be returned from other solar system bodies. Microbiologists and engineers at NASA’s Jet Propulsion Laboratory (JPL) design microbial reduction and sterilization protocols that reduce the number of microorganisms on spacecraft or eliminate them entirely. These protocols are developed using controlled experiments to understand the microbial reduction process. Many times, a phenomenological model (such as a series of differential equations) is posited that captures key behaviors and assumptions of the process being studied. A Sterility Assurance Level (SAL) – the probability that a product, after being exposed to a given sterilization process, contains one or more viable organisms – is a standard metric used to assess risk and define cleanliness requirements in industry and for regulatory agencies. Experiments performed to estimate the SAL of a given microbial reduction or sterilization protocol often have large uncertainties and variability in their results, even under rigorously implemented controls; if not properly quantified, these uncertainties can make it difficult for experimenters to interpret their results and can hamper a credible evaluation of risk by decision makers. In this talk, we demonstrate how Bayesian statistics and experimentation can be used to quantify uncertainty in phenomenological models in the case of microorganism survival under short-term high heat exposure. We show how this can help stakeholders make better risk-informed decisions and avoid the unwarranted conservatism that is often prescribed when processes are not well understood. The experiment performed for this study employs a 6 kW infrared heater to test survivability of heat resistant Bacillus canaveralius 29669 to temperatures as high as 350 °C for time durations less than 30 sec. The objective of this study was to determine SALs for various time-temperature combinations, with a focus on those time-temperature pairs that give a SAL of 10^-6. Survival ratio experiments were performed that allow estimation of the number of surviving spores and mortality rates characterizing the effect of the heat treatment on the spores. Simpler but less informative fraction-negative experiments that only provide a binary sterile/not-sterile outcome were also performed once a sterilization temperature regime was established from survival ratio experiments. The phenomenological model considered here is a memoryless mortality model that underlies many heat sterilization protocols in use today. This discussion and poster will outline how the experiment and model were brought together to determine SALs for the heat treatment under consideration. Ramifications to current NASA planetary protection sterilization specifications and current missions under development such as Mars Sample Return will be discussed. This presentation/poster is also relevant to experimenters and microbiologists working on military and private medical device applications where risk to human life is determined by sterility assurance of equipment. | Kelli McCoy | Add to Speakers |
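A minimal sketch of the memoryless (log-linear) mortality model referenced above and how a SAL follows from it, using a Poisson approximation for the probability of at least one survivor. The bioburden, D-value, reference temperature, and z-value below are illustrative placeholders, not JPL or study values.

```python
import numpy as np

# Illustrative parameters for first-order (memoryless) kill kinetics.
N0 = 1e6          # initial bioburden (spores)
D_ref = 10.0      # D-value (sec): time for a 10x reduction at the reference temperature
T_ref = 300.0     # reference temperature, deg C
z = 20.0          # z-value (deg C): temperature change giving a 10x change in D

def expected_survivors(T, t):
    """Expected surviving spores after t seconds at temperature T (deg C)."""
    D_T = D_ref * 10 ** (-(T - T_ref) / z)   # D-value adjusted for temperature
    return N0 * 10 ** (-t / D_T)             # log-linear survival curve

def sal(T, t):
    """Sterility assurance level: P(at least one survivor), Poisson approximation."""
    return 1.0 - np.exp(-expected_survivors(T, t))

for T, t in [(325.0, 10.0), (350.0, 10.0), (350.0, 20.0)]:
    print(f"T={T:5.1f} C, t={t:4.1f} s -> E[survivors]={expected_survivors(T, t):9.2e}, "
          f"SAL={sal(T, t):.2e}")
```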
70 | Zheming Gou Graduate Research assistant, University of Southern California |
Speed Presentation No Publish |
1 | A data-driven approach of uncertainty quantification on Reynolds stress based on DNS | Sharing Analysis Tools, Methods, and Collaboration Strategies | High-fidelity simulation capabilities have progressed rapidly over the past decades in computational fluid dynamics (CFD), resulting in plenty of high-resolution flow field data. Uncertainty quantification remains an unsolved problem due to the high-dimensional input space and the intrinsic complexity of turbulence. Here we developed an uncertainty quantification method to model the Reynolds stress based on Karhunen-Loeve expansion (KLE) and projection pursuit basis adaptation polynomial chaos expansion (PPA). First, different representative volume elements (RVEs) were randomly drawn from the flow field, and KLE was used to reduce them into a moderate dimension. Then, we built polynomial chaos expansions of Reynolds stress using PPA. Results show that this method can yield a surrogate model with a test accuracy of up to 90%. PCE coefficients also show that Reynolds stress strongly depends on second-order KLE random variables instead of first-order terms. Regarding data efficiency, we built another surrogate model using a neural network (NN) and found that our method outperforms NN in limited data cases. | Kelli McCoy | Add to Speakers |
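A minimal sketch of the dimension-reduction step, assuming synthetic snapshot data: a Karhunen-Loeve expansion computed via the SVD, followed by an ordinary least-squares quadratic surrogate in the reduced coordinates. This is only a stand-in for the projection pursuit basis adaptation PCE described in the abstract.

```python
import numpy as np

# Synthetic stand-in for RVE snapshots: low-rank structure plus small noise.
rng = np.random.default_rng(0)
n_samples, n_grid, n_modes_true = 200, 500, 5
modes_true = rng.standard_normal((n_modes_true, n_grid))
xi_true = rng.standard_normal((n_samples, n_modes_true))
snapshots = xi_true @ modes_true + 0.01 * rng.standard_normal((n_samples, n_grid))
y = np.tanh(xi_true[:, 0]) + 0.5 * xi_true[:, 1] ** 2 + 0.05 * rng.standard_normal(n_samples)

# KLE: center the snapshots, take the SVD, keep modes explaining ~99% of the variance.
mean = snapshots.mean(axis=0)
U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.99)) + 1
xi = U[:, :k] * s[:k]              # reduced (KLE) coordinates of each sample

# Simple quadratic surrogate in the KLE coordinates (ordinary least squares).
feats = np.hstack([np.ones((n_samples, 1)), xi, xi**2])
coef, *_ = np.linalg.lstsq(feats, y, rcond=None)
r2 = 1 - np.sum((y - feats @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"KLE modes retained: {k}; surrogate training R^2: {r2:.3f}")
```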
71 | Nicholas Clark Associate Professor, United States Military Academy |
Presentation Publish |
1 | Data Literacy Within the Department of Defense | Sharing Analysis Tools, Methods, and Collaboration Strategies | Data literacy, the ability to read, write, and communicate data in context, is fundamental for military organizations to create a culture where data is appropriately used to inform both operational and non-operational decisions. However, oftentimes organizations outsource data problems to outside entities and rely on a small cadre of data experts to tackle organizational problems. In this talk we will argue that data literacy is not solely the role or responsibility of the data expert. Ultimately, if experts develop tools and analytics that Army decision makers cannot use, or do not effectively understand the way the Army makes decisions, the Army is no more data rich than if it had no data at all. While serving on a sabbatical as the Chief Data Scientist for Joint Special Operations Command, COL Nick Clark (Department of Mathematical Sciences, West Point) noticed that a lack of basic data literacy skills was a major limitation to creating a data-centric organization. As a result, he created 10 hours of training focusing on the fundamentals of data literacy. After delivering the course to JSOC, other DoD organizations began requesting the training. In response, a team from West Point joined with the Army Talent Management Task Force to create mobile training teams. The teams have now delivered the training over 30 times to organizations ranging from tactical units up to strategic level commands. In this talk, we discuss what data literacy skills should be taught to the force and highlight best practices in educating soldiers, civilians, and contractors on the basics of data literacy. We will finally discuss strategies for assessing organizational data literacy and provide a framework for attendees to assess their own organizations’ data strengths and weaknesses. | Nicholas Clark | Add to Speakers |
72 | Lance Fiondella Associate Professor, University of Massachusetts |
Presentation No Publish |
2 | Covariate Software Vulnerability Discovery Model to Support Cybersecurity T&E | Advancing Test & Evaluation of Emerging and Prevalent Technologies | Vulnerability discovery models (VDM) have been proposed as an application of software reliability growth models (SRGM) to software security related defects. VDM model the number of vulnerabilities discovered as a function of testing time, enabling quantitative measures of security. Despite their obvious utility, past VDM have been limited to parametric forms that do not consider the multiple activities software testers undertake in order to identify vulnerabilities. In contrast, covariate SRGM characterize the software defect discovery process in terms of one or more test activities. However, data sets documenting multiple security testing activities suitable for application of covariate models are not readily available in the open literature. To demonstrate the applicability of covariate SRGM to vulnerability discovery, this research identified a web application to target as well as multiple tools and techniques to test for vulnerabilities. The time dedicated to each test activity and the corresponding number of unique vulnerabilities discovered were documented and prepared in a format suitable for application of covariate SRGM. Analysis and prediction were then performed and compared with a flexible VDM without covariates, namely the Alhazmi-Malaiya Logistic Model (AML). Our results indicate that covariate VDM significantly outperformed the AML model on predictive and information theoretic measures of goodness of fit, suggesting that covariate VDM are a suitable and effective method to predict the impact of applying specific vulnerability discovery tools and techniques. | Lance Fiondella | Add to Speakers |
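A minimal sketch of the covariate idea, assuming made-up test intervals and activity names: the number of unique vulnerabilities found in each interval is modeled as Poisson with a rate driven by the effort spent on each activity. This is not the specific covariate SRGM used in the study, only an illustration of how activity-level covariates enter such a model.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic effort data: hours per interval spent on two testing activities.
rng = np.random.default_rng(6)
n = 20
effort = np.column_stack([rng.uniform(0, 8, n),    # hours of fuzzing per interval (assumed)
                          rng.uniform(0, 4, n)])   # hours of static analysis per interval (assumed)
true_b = np.array([0.2, 0.35, 0.15])               # intercept + activity effects on the log scale
counts = rng.poisson(np.exp(true_b[0] + effort @ true_b[1:]))

def nll(b):
    # Poisson negative log-likelihood (dropping the constant log(y!) term).
    lam = np.exp(b[0] + effort @ b[1:])
    return np.sum(lam - counts * np.log(lam))

b = minimize(nll, np.zeros(3), method="BFGS").x
print("Estimated multiplicative effect per hour of each activity:", np.round(np.exp(b[1:]), 2))
next_interval = np.array([6.0, 2.0])               # planned effort for the next interval
print("Predicted discoveries next interval:", round(float(np.exp(b[0] + next_interval @ b[1:])), 1))
```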
73 | Aaron B. Vaughn Research Aerospace Engineer, NASA Langley Research Center |
Presentation No Publish |
1 | Dose-Response Data Considerations for the NASA Quesst Community Test Campaign | Advancing Test & Evaluation of Emerging and Prevalent Technologies | Key outcomes for NASA’s Quesst mission are noise dose and perceptual response data to inform regulators on their decisions regarding noise certification standards for the future of overland commercial supersonic flight. Dose-response curves are commonly utilized in community noise studies to describe the annoyance of a community to a particular noise source. The X-59 aircraft utilizes shaped-boom technology to demonstrate low noise supersonic flight. For X-59 community studies, the sound level from X-59 overflights constitutes the dose, while the response is an annoyance rating selected from a verbal scale, e.g., “slightly annoyed” and “very annoyed.” Dose-response data will be collected from individual flyovers (single event dose) and an overall response to the accumulation of single events at the end of the day (cumulative dose). There are quantifiable sources of error in the noise dose due to uncertainty in microphone measurements of the sonic thumps and uncertainty in predicted noise levels at survey participant locations. Assessing and accounting for error in the noise dose is essential to obtain an accurate dose-response model. There is also a potential for error in the perceptual response. This error is due to the ability of participants to provide their response in a timely manner and participant fatigue after responding to up to one hundred surveys over the course of a month. This talk outlines various challenges in estimating noise dose and perceptual response and the methods considered in preparation for X-59 community tests. | Pete Parker | Add to Speakers |
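A minimal sketch of a dose-response fit, assuming synthetic single-event doses and binary "highly annoyed" responses: a two-parameter logistic curve estimated by maximum likelihood. The dose range, coefficients, and the 80 PL dB centering constant are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Synthetic single-event doses (PL dB) and binary "highly annoyed" responses.
rng = np.random.default_rng(7)
dose = rng.uniform(65, 95, size=400)
x = dose - 80.0                                  # center the dose for numerical stability
annoyed = rng.binomial(1, expit(0.25 * x))

def nll(beta):
    p = np.clip(expit(beta[0] + beta[1] * x), 1e-12, 1 - 1e-12)
    return -np.sum(annoyed * np.log(p) + (1 - annoyed) * np.log(1 - p))

b0, b1 = minimize(nll, np.zeros(2), method="BFGS").x
print(f"P(highly annoyed) = logistic({b0:.2f} + {b1:.3f} * (dose - 80))")
print("Estimated P(HA) at 75 PL dB:", round(float(expit(b0 + b1 * (75 - 80))), 3))
```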
74 | Himanshu Dayaram Upadhyay Associate Professor (ECE), Florida International University |
Presentation No Publish |
1 | Advanced Automated Machine Learning System for Cybersecurity | Sharing Analysis Tools, Methods, and Collaboration Strategies | Florida International University (FIU) has developed an Advanced Automated Machine Learning System (AAMLS) under sponsored research from the Department of Defense – Test Resource Management Center (DOD-TRMC) to provide Artificial Intelligence based advanced analytics solutions in the areas of Cyber, IoT, Network, Energy, Environment, etc. AAMLS is a Rapid Modeling & Testing Tool (RMTT) for developing machine learning and deep learning models in a few steps by subject matter experts from various domains with minimum machine learning knowledge, using auto & optimization workflows. AAMLS allows analysis of data collected from different test technology domains by using machine learning / deep learning and ensemble learning approaches to generate models, make predictions, then apply advanced analytics and visualization to perform analysis. This system enables automated machine learning using the AI based Advanced Analytics and the Analytics Control Center platforms by connecting to multiple Data Sources. Artificial Intelligence based Advanced Analytics Platform: This platform is the analytics engine of AAML which provides pre-processing, feature engineering, model building and predictions. Primary components of this platform include: • Machine Learning Server: This module is deployed to build ML/DL models using the training data from the data sources and perform predictions/analysis of associated test data based on the AAML-generated ML/DL models. • Machine Learning Algorithms: ML algorithms such as Logistic Regression, Linear Regression, Decision Tree, Random Forest, One-Class Support Vector Machine, and Jaccard Similarity are available for model building. • Deep Learning Algorithms: Deep learning algorithms such as Deep Neural Networks and Recurrent Neural Networks are available to perform classification & anomaly detection using the TensorFlow framework and the Keras API. Analytics Control Center: This platform is a centralized application to manage the AAML system. It consists of the following main modules. • Data Source: This module allows the user to connect existing data to the AAML system to perform analytics. These data sources may reside in a Network File Share, Database, or Big Data Cluster. • Model Development: This module provides the functionality to build ML/DL models with various AI algorithms. This is performed by engaging specific ML algorithms for five types of analysis: Classification, Regression, Time-Series, Anomaly Detection, and Clustering. • Predictions: This module provides the functionality to predict the outcome of an analysis of an associated data set based on the model built during Model Development. • Manage Models and Predictions: These modules allow the user to manage the ML models that have been generated and the resulting predictions of associated data sets. | Dr. Jeremy Werner & Dr. Kelly Avery | Add to Speakers |
75 | Melissa Hooke Systems Engineer, NASA Jet Propulsion Laboratory |
Speed Presentation No Publish |
1 | Fully Bayesian Data Imputation using Stan Hamiltonian Monte Carlo | Sharing Analysis Tools, Methods, and Collaboration Strategies | When doing multivariate data analysis, one common obstacle is the presence of incomplete observations, i.e., observations for which one or more covariates are missing data. Rather than deleting entire observations that contain missing data, which can lead to small sample sizes and biased inferences, data imputation methods can be used to statistically “fill in” missing data. Imputing data can help combat small sample sizes by using the existing information in partially complete observations with the end goal of producing less biased and higher confidence inferences. In aerospace applications, imputation of missing data is particularly relevant because sample sizes are small and quantifying uncertainty in the model is of utmost importance. In this paper, we outline the benefits of a fully Bayesian imputation approach which samples simultaneously from the joint posterior distribution of model parameters and the imputed values for the missing data. This approach is preferred over multiple imputation approaches because it performs the imputation and modeling steps in one step rather than two, making it more compatible with complex model forms. An example of this imputation approach is applied to the NASA Instrument Cost Model (NICM), a model used widely across NASA to estimate the cost of future spaceborne instruments. The example models are implemented in Stan, a statistical-modeling tool enabling Hamiltonian Monte Carlo (HMC). | Kelli McCoy | Add to Speakers |
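A minimal sketch of the "impute and fit in one step" idea, assuming a toy linear cost-versus-mass model with roughly 30% of the covariate missing: a hand-rolled Gibbs sampler that draws the regression parameters and the missing covariate values from their joint posterior. The abstract's models use Stan and HMC; this is only an illustration of the concept, with all data and units made up.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.normal(10, 2, size=n)                 # e.g., instrument mass (made-up units)
y = 5 + 2.0 * x + rng.normal(0, 1.5, size=n)  # e.g., cost
miss = rng.random(n) < 0.3                    # ~30% of x values are missing
x_obs = np.where(miss, np.nan, x)

x_cur = np.where(miss, np.nanmean(x_obs), x_obs)      # initialize missing x at the observed mean
a, b, sigma2, mu_x, tau2 = 0.0, 1.0, 1.0, np.nanmean(x_obs), np.nanvar(x_obs)
draws = []

for it in range(4000):
    # 1) Impute missing x from its full conditional: prior x ~ N(mu_x, tau2) combined
    #    with the regression likelihood for the observed y.
    prec = 1 / tau2 + b**2 / sigma2
    mean = (mu_x / tau2 + b * (y[miss] - a) / sigma2) / prec
    x_cur[miss] = rng.normal(mean, np.sqrt(1 / prec))

    # 2) Regression parameters given the completed data (flat priors).
    X = np.column_stack([np.ones(n), x_cur])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2 = (resid @ resid) / rng.chisquare(n - 2)
    a, b = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(X.T @ X))

    # 3) Covariate-model parameters given the completed x (flat priors).
    tau2 = np.sum((x_cur - x_cur.mean()) ** 2) / rng.chisquare(n - 1)
    mu_x = rng.normal(x_cur.mean(), np.sqrt(tau2 / n))

    if it >= 1000:
        draws.append((a, b))

a_draws, b_draws = np.array(draws).T
print(f"Posterior mean intercept {a_draws.mean():.2f}, slope {b_draws.mean():.2f}")
```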
76 | Peng Liu Principal Research Statistician Developer, JMP Statistical Discovery LLC |
Mini-Tutorial Publish |
2 | A Tour of JMP Reliability Platforms and Bayesian Methods for Reliability Data | Sharing Analysis Tools, Methods, and Collaboration Strategies | JMP is a comprehensive, visual, and interactive statistical discovery software with a carefully curated graphical user interface. The software is packed with traditional and modern statistical analysis capabilities and many unique innovative features. The software hosts several suites of tools that are especially valuable to the DATAWorks audience. The software includes suites for Design of Experiments, Quality Control, Process Analysis, and Reliability Analysis. JMP has been building its reliability suite for the past fifteen years. The reliability suite in JMP is a comprehensive and mature collection of JMP platforms. The suite empowers reliability engineers with tools for analyzing time-to-event data, accelerated life test data, observational reliability data, competing cause data, warranty data, cumulative damage data, repeated measures degradation data, destructive degradation data, and recurrence data. For each type of data, there are numerous models and one or more methodologies that are applicable based on the nature of data. In addition to reliability data analysis platforms, the suite also provides capabilities of reliability engineering for system reliability from two distinct perspectives, one for non-repairable systems and the other for repairable systems. The capability of the JMP reliability suite is also at the frontier of advanced research on reliability data analysis. Inspired by the research of Prof. William Meeker at Iowa State University, we have implemented Bayesian inference methodologies for analyzing the three most important types of reliability data. The tutorial will start with an overall introduction to JMP’s reliability platforms. Then the tutorial will focus on analyzing time-to-event data, accelerated life test data, and repeated measures degradation data. The tutorial will present analyzing these types of reliability data using traditional methods, and highlight when, why, and how to analyze them in JMP using Bayesian methods. | Andrea Brown | Add to Speakers |
77 | Christopher Dimapasok Graduate Student/IDA Summer Associate, Johns Hopkins University |
Speed Presentation Publish |
1 | Implementing Fast Flexible Space Filling Designs In R | Improving the Quality of Test & Evaluation | Modeling and simulation (M&S) can be a useful tool for testers and evaluators when they need to augment the data collected during a test event. During the planning phase, testers use experimental design techniques to determine how much and which data to collect. When designing a test that involves M&S, testers can use Space-Filling Designs (SFD) to spread out points across the operational space. Fast Flexible Space-Filling Designs (FFSFD) are a type of SFD that are useful for M&S because they work well in nonrectangular design spaces and allow for the inclusion of categorical factors. Both of these are recurring features in defense testing. Guidance from the Deputy Secretary of Defense and the Director of Operational Test and Evaluation encourages the use of open and interoperable software and recommends the use of SFD. This project aims to address both recommendations. IDA analysts developed a function to create FFSFD using the free statistical software R. To our knowledge, there are no R packages for the creation of an FFSFD that could accommodate a variety of user inputs, such as categorical factors. Moreover, by using this function, users can share their code to make their work reproducible. This presentation starts with background information about M&S and, more specifically, SFD. The briefing uses a notional missile system example to explain FFSFD in more detail and show the FFSFD R function inputs and outputs. The briefing ends with a summary of the future work for this project. | Jason, Rebecca, Kelly, and Keyla | Add to Speakers |
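A minimal sketch of a fast flexible space-filling design, assuming a notional constrained range-altitude region and a two-level categorical factor: sample many feasible candidate points, cluster them, and use the cluster centroids as design points. Plain k-means is used here as a simplified stand-in for the hierarchical clustering in the published fast flexible filling algorithm, and the factor names and constraint are made up; the abstract's actual implementation is an R function.

```python
import numpy as np

rng = np.random.default_rng(11)

def candidates(n):
    # Candidate points in a notional "range (km) vs. altitude (km)" space with an
    # assumed feasibility constraint that makes the region nonrectangular.
    pts = rng.uniform([0.0, 0.0], [100.0, 30.0], size=(n, 2))
    return pts[pts[:, 1] <= 0.4 * pts[:, 0]]

def kmeans(X, k, iters=50):
    # Plain Lloyd's algorithm; centroids of the clusters become the design points.
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers

design = []
for level in ["seeker A", "seeker B"]:            # illustrative categorical factor
    centers = kmeans(candidates(20_000), k=10)    # 10 space-filling runs per level
    design += [(level, round(r, 1), round(a, 1)) for r, a in centers]

for run in design:
    print(run)
```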
78 | Mindy Hotchkiss Technical Specialist, Aerojet Rocketdyne |
Presentation Publish |
2 | Assessing Predictive Capability and Contribution for Binary Classification Models | Sharing Analysis Tools, Methods, and Collaboration Strategies | Classification models for binary outcomes are in widespread use across a variety of industries. Results are commonly summarized in a misclassification table, also known as an error or confusion matrix, which indicates correct vs incorrect predictions for different circumstances. Models are developed to minimize both false positive and false negative errors, but the optimization process to train/obtain the model fit necessarily results in cost-benefit trades. However, how to obtain an objective assessment of the performance of a given model in terms of predictive capability or benefit is less well understood, due both to the plethora of options described in the literature and to the largely overlooked influence of noise factors, specifically class imbalance. Many popular measures are susceptible to effects due to underlying differences in how the data are allocated by condition, which cannot be easily corrected. This talk considers the wide landscape of possibilities from a statistical robustness perspective. Results are shown from sensitivity analyses across a variety of conditions for several popular metrics, and issues are highlighted, including potential concerns with respect to machine learning or ML-enabled systems. Recommendations are provided to correct for imbalance effects, as well as how to conduct a simple statistical comparison that will disentangle the beneficial effects of the model itself from those of imbalance. Results are generalizable across model type. | Lance Fiondella | Add to Speakers |
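A minimal sketch of the imbalance effect discussed above: the same classifier, with assumed fixed sensitivity and specificity, is evaluated at two prevalences. Prevalence-dependent metrics move even though the classifier has not changed, while balanced accuracy stays put; the numbers are illustrative only.

```python
import numpy as np

sens, spec = 0.90, 0.95          # assumed true positive / true negative rates of the model

def metrics(n_pos, n_neg):
    # Expected confusion-matrix cells and common summary metrics.
    tp, fn = sens * n_pos, (1 - sens) * n_pos
    tn, fp = spec * n_neg, (1 - spec) * n_neg
    acc = (tp + tn) / (n_pos + n_neg)
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    bal_acc = (sens + spec) / 2
    mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, prec, f1, bal_acc, mcc

for n_pos, n_neg in [(5000, 5000), (500, 9500)]:           # balanced vs. 5% prevalence
    acc, prec, f1, bal_acc, mcc = metrics(n_pos, n_neg)
    print(f"prevalence={n_pos/(n_pos+n_neg):.0%}: accuracy={acc:.3f} precision={prec:.3f} "
          f"F1={f1:.3f} balanced-acc={bal_acc:.3f} MCC={mcc:.3f}")
```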
79 | Addison Adams Summer Associate VI, IDA |
Speed Presentation Publish |
1 | Comparing Normal and Binary D-optimal Design of Experiments by Statistical Power | Sharing Analysis Tools, Methods, and Collaboration Strategies | In many Department of Defense (DoD) Test and Evaluation (T&E) applications, binary response variables are unavoidable. Many have considered D-optimal design of experiments (DOEs) for generalized linear models (GLMs). However, little consideration has been given to assessing how these new designs perform in terms of statistical power for a given hypothesis test. Monte Carlo simulations and exact power calculations suggest that normal D-optimal designs generally yield higher power than binary D-optimal designs, despite using logistic regression in the analysis after data have been collected. Results from using statistical power to compare designs contradict traditional DOE comparisons which employ D-efficiency ratios and fractional design space (FDS) plots. Power calculations suggest that practitioners who are primarily interested in the resulting statistical power of a design should use normal D-optimal designs over binary D-optimal designs when logistic regression is to be used in the data analysis after data collection. | Jason Sheldon | Add to Speakers |
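A minimal sketch of the Monte Carlo power-comparison machinery, assuming two placeholder 32-run designs and assumed true effect sizes; the designs below are simple stand-ins, not actual normal- or binary-D-optimal constructions. Responses are analyzed with logistic regression and a likelihood-ratio test, matching the analysis setting in the abstract.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import chi2

rng = np.random.default_rng(5)
beta_true = np.array([0.0, 1.0, 0.7])          # intercept, factor A, factor B (assumed effects)

def nll(beta, X, y):
    p = expit(X @ beta)
    return -np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def lrt_pvalue(X, y):
    # Likelihood-ratio test for the last column of X (factor B).
    full = minimize(nll, np.zeros(X.shape[1]), args=(X, y), method="BFGS").fun
    red = minimize(nll, np.zeros(X.shape[1] - 1), args=(X[:, :-1], y), method="BFGS").fun
    return chi2.sf(2 * (red - full), df=1)

def power(design, n_sims=500, alpha=0.05):
    X = np.column_stack([np.ones(len(design)), design])
    rejections = 0
    for _ in range(n_sims):
        y = rng.binomial(1, expit(X @ beta_true))
        rejections += lrt_pvalue(X, y) < alpha
    return rejections / n_sims

corners = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
design_a = np.repeat(corners, 8, axis=0)                     # 32-run two-level design
design_b = np.repeat(np.array([[-1, -0.5], [-1, 0.5], [1, -0.5], [1, 0.5]]), 8, axis=0)
print("Estimated power, design A:", power(design_a))
print("Estimated power, design B:", power(design_b))
```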
80 | Michael Thompson Research Associate, Naval Postgraduate School |
Presentation No Publish |
2 | Cyber Testing Embedded Systems with Digital Twins | Sharing Analysis Tools, Methods, and Collaboration Strategies | Dynamic cyber testing and analysis require instrumentation to facilitate measurements, e.g., to determine which portions of code have been executed, or detection of anomalous conditions which might not manifest at the system interface. However, instrumenting software causes execution to diverge from the execution of the deployed binaries. Instrumentation also requires mechanisms for storing and retrieving testing artifacts on target systems. RESim is a dynamic testing and analysis platform that does not instrument software. Instead, RESim instruments high fidelity models of target hardware upon which software-under-test executes, providing detailed insight into program behavior. Multiple modeled computer platforms run within a single simulation that can be paused, inspected and run forward or backwards to selected events such as the modification of a specific memory address. Integration of Google’s AFL fuzzer with RESim avoids the need to create fuzzing harnesses because programs are fuzzed in their native execution environment, commencing from selected execution states with data injected directly into simulated memory instead of I/O streams. RESim includes plugins for the IDA Pro and NSA’s Ghidra disassembler/debuggers to facilitate interactive analysis of individual processes and threads, providing the ability to skip to selected execution states (e.g., a reference to an input buffer) and “reverse execution” to reach a breakpoint by appearing to run backwards in time. RESim simulates networks of computers through use of Wind River’s Simics platform of high fidelity models of processors, peripheral devices (e.g., network interface cards), and memory. The networked simulated computers load and run firmware and software from images extracted from the physical systems being tested. Instrumenting the simulated hardware allows RESim to observe software behavior from the other side of the hardware, i.e., without affecting its execution. Simics includes tools to extend and create high fidelity models of processors and devices, providing a clear path to deploying and managing digital twins for use in developmental test and evaluation. The simulations can include optional real-world network and bus interfaces to facilitate integration into networks and test ranges. Simics is a COTS product that runs on commodity hardware and is able to execute several parallel instances of complex multi-component systems on a typical engineering workstation or server. This presentation will describe RESim and strategies for using digital twins for cyber testing of embedded systems. The presentation will also discuss some of the challenges associated with fuzzing non-trivial software systems. | Mark Herrera | Add to Speakers |
81 | Kathryn Lahman Program Manager, Johns Hopkins University Applied Physics Laboratory |
Presentation Publish |
1 | T&E Landscape for Advanced Autonomy | Sharing Analysis Tools, Methods, and Collaboration Strategies | The DoD is making significant investments in the development of autonomous systems, spanning from basic research, at organizations such as DARPA and ONR, to major acquisition programs, such as PEO USC. In this talk we will discuss advanced autonomous systems as complex, fully autonomous systems and systems of systems, rather than specific subgenres of autonomous functions – e.g., basic path planning autonomy or vessel controllers for moving vessels from point A to B. As a community, we are still trying to understand how to integrate these systems in the field with the warfighter to fully optimize their added capabilities. A major goal of using autonomous systems is to support multi-domain, distributed operations. We have a vision for how this may work, but we don’t know when, or if, these systems will be ready to implement these visions. We must identify trends, analyze bottlenecks, and find scalable approaches to fielding these capabilities, such as identifying certification criteria or optimizing methods of testing and evaluating (T&E) autonomous systems. Traditional T&E methods are not sufficient for cutting edge autonomy and artificial intelligence (AI). Not only do we have to test the traditional aspects of system performance (speed, endurance, range, etc.) but also the decision-making capabilities that would have previously been performed by humans. This complexity increases when an autonomous system changes based on how it is applied in the real world. Each domain, environment, and platform an autonomy is run on presents unique autonomy considerations. Complexity is further compounded when we begin to stack these autonomies and integrate them into a fully autonomous system of systems. Currently, there are no standard processes or procedures for testing these nested, complex autonomies; yet there are numerous areas for growth and improvement in this space. We will dive into identified capability gaps in Advanced Autonomy T&E that we have recognized and provide approaches for how the DOD may begin to tackle these issues. It is important that we make critical contributions towards testing, trusting and certifying these complex autonomous systems. Primary focus areas that are addressed include: – Recommending the use of bulk testing through Modeling and Simulation (M&S), while ensuring that the virtual environment is representative of the operational environment. – Developing intelligent tests and test selection tools to locate and discriminate areas of interest faster than through traditional Monte-Carlo sampling methods. – Building methods for testing black box autonomies faster than real time, and with fewer computational requirements. – Providing data analytics that assess autonomous systems in ways that provide human decision makers a means for certification. – Expanding the concept of what trust means, how to assess and, subsequently, validate trustworthiness of these systems across stakeholders. – Testing experimental autonomous systems in a safe and structured manner that encourages rapid fielding and iteration on novel autonomy components. | Dr. Jeremy Werner, DOT&E | Add to Speakers |
82 | Ryan Lekivetz Manager, Advanced Analytics R&D, JMP Statistical Discovery |
Presentation No Publish |
1 | On the Validation of Statistical Software | Sharing Analysis Tools, Methods, and Collaboration Strategies | Validating statistical software involves a variety of challenges. Of these, the most difficult is the selection of an effective set of test cases, sometimes referred to as the “test case selection problem”. To further complicate matters, for many statistical applications, development and validation are done by individuals who often have limited time to validate their application and may not have formal training in software validation techniques. As a result, it is imperative that the adopted validation method is efficient, as well as effective, and it should also be one that can be easily understood by individuals not trained in software validation techniques. As it turns out, the test case selection problem can be thought of as a design of experiments (DOE) problem. This talk discusses how familiar DOE principles can be applied to validating statistical software. | Andrea Brown | Add to Speakers |
83 | Yeng Saanchi Analytic Software Tester, JMP Statistical Discovery LLC |
Presentation No Publish |
1 | Validating the Prediction Profiler with Disallowed Combination: A Case Study | Sharing Analysis Tools, Methods, and Collaboration Strategies | The prediction profiler is an interactive display in JMP statistical software that allows a user to explore the relationships between multiple factors and responses. A common use case of the profiler is for exploring the predicted model from a designed experiment. For experiments with a constrained design region defined by disallowed combinations, the profiler was recently enhanced to obey such constraints. In this case study, we show how a DOE-based approach to validating statistical software was used to validate this enhancement. | Andrea Brown | Add to Speakers |
84 | Jacob Warren Assistant Scientific Advisor, Marine Corps Operational Test and Evaluation Activity |
Presentation Publish |
2 | Circular Error Probable and an Example with Multilevel Effects | Sharing Analysis Tools, Methods, and Collaboration Strategies | Circular Error Probable (CEP) is a measure of a weapon system’s precision developed based on the Bivariate Normal Distribution. Failing to understand the theory behind CEP can result in misuse of equations developed to help estimation. Estimation of CEP is also much more straightforward in situations such as single samples where factors are not being manipulated. This brief aims to help build a theoretical understanding of CEP, and then presents a non-trivial example in which CEP is estimated via multilevel regression. The goal is to help build an understanding of CEP so it can be properly estimated in trivial (single sample) and non-trivial cases (e.g. regression and multilevel regression). | Shane Hall | Add to Speakers |
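A minimal sketch of CEP estimation for the trivial single-sample case, assuming simulated impact points from a circular bivariate normal distribution: an empirical estimate (median radial miss) alongside the parametric relationship CEP = sigma * sqrt(2 ln 2). The dispersion value is made up.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_true = 4.0                                  # meters (illustrative)
impacts = rng.normal(0.0, sigma_true, size=(200, 2))

radial_miss = np.linalg.norm(impacts, axis=1)
cep_empirical = np.median(radial_miss)            # distribution-free estimate

sigma_hat = np.sqrt(np.mean(impacts ** 2))        # pooled sigma estimate over both axes
cep_parametric = sigma_hat * np.sqrt(2 * np.log(2))

print(f"Empirical CEP (median radial miss): {cep_empirical:.2f} m")
print(f"Parametric CEP (1.1774 * sigma):    {cep_parametric:.2f} m")
print(f"True CEP:                           {sigma_true * np.sqrt(2 * np.log(2)):.2f} m")
```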
85 | Adam Pintar Mathematical Statistician, National Institute of Standards and Technology |
Presentation Publish |
2 | An Introduction to Ranking Data and a Case Study of a National Survey of First Responders | Sharing Analysis Tools, Methods, and Collaboration Strategies | Ranking data are collected by presenting a respondent with a list of choices, and then asking for the respondent’s favorite, second favorite, and so on. The rankings may be complete (the respondent rank orders the entire list) or partial (only the respondent’s favorite two or three, etc.). Given a sample of rankings from a population, one goal may be to estimate the most favored choice from the population. Another may be to compare the preferences of one subpopulation to another. In this presentation I will introduce ranking data and probability models that form the foundation for statistical inference for them. The Plackett-Luce model will be the main focus. After that I will introduce a real data set containing ranking data assembled by the National Institute of Standards and Technology (NIST) based on the results of a national survey of first responders. The survey asked about how first responders use communication technology. With this data set, questions such as whether rural and urban/suburban first responders prefer the same types of communication devices can be explored. I will conclude with some ideas for incorporating ranking data into test and evaluation settings. | James Warner | Add to Speakers |
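A minimal sketch of the Plackett-Luce model on made-up partial rankings of notional first-responder devices: each item has a positive "worth," a ranking is built by repeatedly choosing among the remaining items with probability proportional to worth, and the worths are estimated by maximum likelihood. The item names and rankings are illustrative, not data from the NIST survey.

```python
import numpy as np
from scipy.optimize import minimize

items = ["LMR radio", "smartphone", "tablet", "pager"]
# Each ranking lists item indices from most to least preferred; partial rankings allowed.
rankings = [[0, 1, 2], [0, 1], [1, 0, 3], [0, 2], [1, 0], [0, 1, 3], [0, 3], [1, 2]]

def neg_loglik(theta):
    # Fix the first log-worth at 0 for identifiability.
    w = np.exp(np.concatenate([[0.0], theta]))
    ll = 0.0
    for r in rankings:
        remaining = list(range(len(items)))
        for chosen in r:
            ll += np.log(w[chosen]) - np.log(w[remaining].sum())
            remaining.remove(chosen)
    return -ll

fit = minimize(neg_loglik, np.zeros(len(items) - 1), method="BFGS")
worths = np.exp(np.concatenate([[0.0], fit.x]))
worths /= worths.sum()
for item, w in sorted(zip(items, worths), key=lambda t: -t[1]):
    print(f"{item:12s} estimated worth {w:.2f}")
```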
86 | David Jin Senior AI Engineer, The MITRE Corporation |
Presentation Publish |
2 | CDAO Joint AI Test Infrastructure Capability | Advancing Test & Evaluation of Emerging and Prevalent Technologies | The Chief Digital and AI Office (CDAO) Test & Evaluation Directorate is developing the Joint AI Test Infrastructure Capability (JATIC) program of record, which is an interoperable set of state-of-the-art software capabilities for AI Test & Evaluation. It aims to provide a comprehensive suite of integrated testing tools which can be deployed widely across the enterprise to address key T&E gaps. In particular, JATIC capabilities will support the assessment of AI system performance, cybersecurity, adversarial resilience, and explainability – enabling the end-user to more effectively execute their mission. It is a key component of the digital testing infrastructure that the CDAO will provide in order to support the development and deployment of data, analytics, and AI across the Department. | Chad Bieber | Add to Speakers |
87 | Peter A. Calhoun Operational Test Analyst, HQ AFOTEC |
Presentation No Publish |
2 | Confidence Intervals for Derringer and Suich Desirability Function Optimal Points | Sharing Analysis Tools, Methods, and Collaboration Strategies | A shortfall of the Derringer and Suich (1980) desirability function for multi-objective optimization has been a lack of inferential methods to quantify uncertainty. Most articles for addressing uncertainty involve robust methods, providing a point estimate that is less affected by variation. Few articles address confidence intervals or bands, and those that do are not specific to the widely used Derringer and Suich method. Eight methods are presented to construct 100(1-alpha) confidence intervals around Derringer and Suich desirability function optimal values. First order and second order models using bivariate and multivariate data sets are used as examples to demonstrate effectiveness. The eight proposed methods include a simple best/worst case method, 2 generalized methods, 4 simulated surface methods, and a nonparametric bootstrap method. One of the generalized methods, 2 of the simulated surface methods, and the nonparametric method account for covariance between the response surfaces. All 8 methods seem to perform decently on the second order models; however, the methods which utilize an underlying multivariate-t distribution, Multivariate Generalized (MG) and Multivariate t Simulated Surface (MVtSSig), are recommended methods from this research as they perform well with small samples for both first order and second order models, with coverage only becoming unreliable at consistently non-optimal solutions. MG and MVtSSig inference could also be used in conjunction with robust methods such as Pareto Front Optimization to help ascertain which solutions are more likely to be optimal before constructing confidence intervals. | Rachel Milliron | Add to Speakers |
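A minimal sketch of one of the approaches named above, the nonparametric bootstrap: fit quadratic response surfaces to synthetic two-response data, compute a Derringer-Suich overall desirability on a grid, and resample the runs to interval-estimate the optimal desirability. The desirability limits, models, and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.tile(np.linspace(-1, 1, 9), 3)
y1 = 60 + 8 * x - 10 * x**2 + rng.normal(0, 1.5, x.size)    # response to maximize
y2 = 20 - 5 * x + 6 * x**2 + rng.normal(0, 1.0, x.size)     # response to minimize

def fit_quadratic(x, y):
    X = np.column_stack([np.ones_like(x), x, x**2])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def desirability(c1, c2, xg):
    p1 = c1[0] + c1[1] * xg + c1[2] * xg**2
    p2 = c2[0] + c2[1] * xg + c2[2] * xg**2
    d1 = np.clip((p1 - 50) / (70 - 50), 0, 1)    # larger-is-better: 50 unacceptable, 70 ideal
    d2 = np.clip((25 - p2) / (25 - 15), 0, 1)    # smaller-is-better: 25 unacceptable, 15 ideal
    return np.sqrt(d1 * d2)                       # geometric mean, equal weights

xg = np.linspace(-1, 1, 201)
D_hat = desirability(fit_quadratic(x, y1), fit_quadratic(x, y2), xg)
x_opt, D_opt = xg[D_hat.argmax()], D_hat.max()

boot_D = []
for _ in range(2000):
    idx = rng.integers(0, x.size, x.size)         # resample runs with replacement
    D_b = desirability(fit_quadratic(x[idx], y1[idx]), fit_quadratic(x[idx], y2[idx]), xg)
    boot_D.append(D_b.max())
lo, hi = np.percentile(boot_D, [2.5, 97.5])
print(f"Optimal x ~ {x_opt:.2f}, D ~ {D_opt:.2f}, 95% bootstrap CI for D: ({lo:.2f}, {hi:.2f})")
```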
88 | Giuseppe Cataldo Head, Planetary Protection, MSR CCRS, NASA |
Presentation No Publish |
2 | The Containment Assurance Risk Framework of the Mars Sample Return Program | Sharing Analysis Tools, Methods, and Collaboration Strategies | The Mars Sample Return campaign aims at bringing rock and atmospheric samples from Mars to Earth through a series of robotic missions. These missions would collect the samples being cached and deposited on Martian soil by the Perseverance rover, place them in a container, and launch them into Martian orbit for subsequent capture by an orbiter that would bring them back. Given there exists a non-zero probability that the samples contain biological material, precautions are being taken to design systems that would break the chain of contact between Mars and Earth. These include techniques such as sterilization of Martian particles, redundant containment vessels, and a robust reentry capsule capable of accurate landings without a parachute. Requirements exist that the probability of containment not assured (i.e., release of Martian-contaminated material into Earth’s biosphere) be less than one in a million. To demonstrate compliance with this strict requirement, a statistical framework was developed to assess the likelihood of containment loss during each sample return phase and make a statement about the total combined mission probability of containment not assured. The work presented here describes this framework, which considers failure modes or fault conditions that can initiate failure sequences ultimately leading to containment not assured. Reliability estimates are generated from databases, design heritage, component specifications, or expert opinion in the form of probability density functions or point estimates and provided as inputs to the mathematical models that simulate the different failure sequences. The probabilistic outputs are then combined following the logic of several fault trees to compute the ultimate probability of containment not assured. Given the multidisciplinary nature of the problem and the different types of mathematical models used, the statistical tools needed for analysis are required to be computationally efficient. While standard Monte Carlo approaches are used for fast models, a multi-fidelity approach to rare event probabilities is proposed for expensive models. In this paradigm, inexpensive low-fidelity models are developed for computational acceleration purposes while the expensive high-fidelity model is kept in the loop to retain accuracy in the results. This work presents an example of end-to-end application of this framework highlighting the computational benefits of a multi-fidelity approach. The decision to implement Mars Sample Return will not be finalized until NASA’s completion of the National Environmental Policy Act process. This document is being made available for information purposes only. | Kelly McCoy | Add to Speakers |
89 | Dr. Olivia Gozdz, Dr. Kyle Remley, and Dr. Benjamin Ashwell Research Staff Member, Institute for Defense Analyses |
Presentation Publish |
2 | Back to the Future: Implementing a Time Machine to Improve and Validate Model Predictions | Solving Program Evaluation Challenges | At a time when supply chain problems are challenging even the most efficient and robust supply ecosystems, the DOD faces the additional hurdles of primarily dealing in low volume orders of highly complex components with multi-year procurement and repair lead times. When combined with perennial budget shortfalls, it is imperative that the DOD spend money efficiently by ordering the “right” components at the “right time” to maximize readiness. What constitutes the “right” components at the “right time” depends on model predictions that are based upon historical demand rates and order lead times. Given that the time scales between decisions and results are often years long, even small modeling errors can lead to months-long supply delays or tens of millions of dollars in budget shortfalls. Additionally, we cannot evaluate the accuracy and efficacy of today’s decisions for some years to come. To address this problem, as well as a wide range of similar problems across our Sustainment analysis, we have built “time machines” to pursue retrospective validation – for a given model, we rewind DOD data sources to some point in the past and compare model predictions, using only data available at the time, against known historical outcomes. This capability allows us to explore different decisions and the alternate realities that would manifest in light of those choices. In some cases, this is relatively straightforward, while in others it is made quite difficult by problems familiar to any time-traveler: changing the past can change the future in unexpected ways. | Rebecca Medlin | Add to Speakers |
90 | Daniel Timme PhD Candidate, Florida State University |
Speed Presentation No Publish |
3 | A Bayesian Approach for Nonparametric Multivariate Process Monitoring using Universal Residuals | Sharing Analysis Tools, Methods, and Collaboration Strategies | In Quality Control, monitoring sequential-functional observations for characteristic changes through change-point detection is a common practice to ensure that a system or process produces high-quality outputs. Existing methods in this field often only focus on identifying when a process is out-of-control without quantifying the uncertainty of the underlying decision-making processes. To address this issue, we propose using universal residuals under a Bayesian paradigm to determine if the process is out-of-control and assess the uncertainty surrounding that decision. The universal residuals are computed by combining two non-parametric techniques: regression trees and kernel density estimation. These residuals have the key feature of being uniformly distributed when the process is in control. To test if the residuals are uniformly distributed across time (i.e., that the process is in-control), we use a Bayesian approach for hypothesis testing, which outputs posterior probabilities for events such as the process being out-of-control at the current time, in the past, or in the future. We perform a simulation study and demonstrate that the proposed methodology has remarkable detection performance and a low false alarm rate. | Andrea Brown | Add to Speakers |
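A minimal sketch of the universal-residual idea, assuming synthetic in-control data: transform new observations by an estimated in-control predictive CDF (here a simple KDE-based probability integral transform), then compute a posterior probability of being out of control by comparing a uniform model against a Beta(a, b) alternative averaged over a parameter grid. This stands in for, and is far simpler than, the regression-tree/KDE construction and the full Bayesian test described in the abstract.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
in_control = rng.normal(50, 2, size=500)              # historical in-control data
kde = stats.gaussian_kde(in_control)                  # estimated in-control density

def universal_residuals(x):
    # Probability integral transform via the KDE: in control, these are ~ Uniform(0, 1).
    return np.array([kde.integrate_box_1d(-np.inf, xi) for xi in x])

def prob_out_of_control(u, prior_out=0.5):
    # Marginal likelihood under "uniform" is 1; under the alternative, average the
    # Beta(a, b) likelihood over a coarse grid prior on (a, b).
    grid = np.linspace(0.5, 5.0, 25)
    a, b = np.meshgrid(grid, grid)
    loglik = np.array([stats.beta.logpdf(u, ai, bi).sum()
                       for ai, bi in zip(a.ravel(), b.ravel())])
    m1 = np.exp(loglik - loglik.max()).mean() * np.exp(loglik.max())
    return prior_out * m1 / (prior_out * m1 + (1 - prior_out) * 1.0)

new_ok = rng.normal(50, 2, size=30)                   # still in control
new_bad = rng.normal(53, 2, size=30)                  # mean shift
print("P(out of control), in-control batch:", round(prob_out_of_control(universal_residuals(new_ok)), 3))
print("P(out of control), shifted batch:   ", round(prob_out_of_control(universal_residuals(new_bad)), 3))
```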
91 | Miriam Armstrong Research Staff Member, Institute of Defense Analyses |
Presentation Publish |
1 | Towards Scientific Practices for Situation Awareness Evaluation in Operational Testing | Improving the Quality of Test & Evaluation | Situation Awareness (SA) plays a key role in decision making and human performance; higher operator SA is associated with increased operator performance and decreased operator errors. In the most general terms, SA can be thought of as an individual’s “perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future.” While “situational awareness” is a common suitability parameter for systems under test, there is no standardized method or metric for quantifying SA in operational testing (OT). This leads to varied and suboptimal treatments of SA across programs and test events. Current measures of SA are exclusively subjective and paint an inadequate picture. Future advances in system connectedness and mission complexity will exacerbate the problem. We believe that technological improvements will necessitate increases in the complexity of the warfighters’ mission, including changes to team structures (e.g., integrating human teams with human-machine teams), command and control (C2) processes (e.g., expanding C2 frameworks toward joint all-domain C2), and battlespaces (e.g., overcoming integration challenges for multi-domain operations). Operational complexity increases the information needed for warfighters to maintain high SA, and assessing SA will become increasingly important and difficult to accomplish. IDA’s Test science team has proposed a piecewise approach to improve the measurement of situation awareness in operational evaluations. The aim of this presentation is to promote a scientific understanding of what SA is (and is not) and encourage discussion amongst practitioners tackling this challenging problem. We will briefly introduce Endsley’s Model of SA, review the trade-offs involved in some existing measures of SA, and discuss a selection of potential ways in which SA measurement during OT may be improved. | Rebecca Medlin | Add to Speakers |
92 | Michael R. Smith Principal Member, Sandia National Laboratories |
Keynote Publish |
1 | Test and Evaluation of Systems with Embedded Artificial Intelligence Components | Advancing Test & Evaluation of Emerging and Prevalent Technologies | As Artificial Intelligence (AI) continues to advance, it is being integrated into more systems. Often, the AI component represents a significant portion of the system that reduces the burden on the end user or significantly improves the performance of a task. The AI component represents an unknown complex phenomenon that is learned from collected data without the need to be explicitly programmed. Despite the improvement in performance, the models are black boxes. Evaluating the credibility and the vulnerabilities of AI models poses a gap in current test and evaluation practice. For high consequence applications, the lack of testing and evaluation procedures represents a significant source of uncertainty and risk. To help reduce that risk, we have developed a red-teaming inspired methodology to evaluate systems embedded with an AI component. This methodology highlights the key expertise and components that are needed beyond what a typical red team generally requires. As opposed to academic evaluation of AI models, we present a system-level evaluation rather than evaluating the AI model in isolation. We outline three axes along which to evaluate an AI component: 1) Evaluating the performance of the AI component to ensure that the model functions as intended and is developed based on best practices established by the AI community. This process entails more than simply evaluating the learned model. As the model operates on data used for training as well as perceived by the system, peripheral functions such as feature engineering and the data pipeline need to be included. 2) AI components necessitate supporting infrastructure in deployed systems. The support infrastructure may introduce additional vulnerabilities that are overlooked in traditional test and evaluation processes. Further, the AI component may be subverted by modifying key configuration files or data pipeline components. 3) AI models introduce possible vulnerabilities to adversarial attacks. These could be attacks designed to evade detection by the model, poison the model, steal the model or its data, or misuse the model to act inappropriately. Within the methodology, we highlight tools that may be applicable as well as gaps that need to be addressed by the community. SNL is managed and operated by NTESS under DOE NNSA contract DE-NA0003525 | Mark R Herrera | Add to Speakers |
93 | Brian Conway Research Staff Member, IDA |
Presentation No Publish |
1 | Planning for Public Sector Test and Evaluation in the Commercial Cloud | Advancing Test & Evaluation of Emerging and Prevalent Technologies | As the public sector shifts IT infrastructure toward commercial cloud solutions, the government test community needs to adjust its test and evaluation (T&E) methods to provide useful insights into a cloud-hosted system’s cyber posture. Government entities must protect what they develop in the cloud by enforcing strict access controls and deploying securely configured virtual assets. However, publicly available research shows that doing so effectively is difficult, with accidental misconfigurations leading to the most commonly observed exploitations of cloud-hosted systems. Unique deployment configurations and identity and access management across different cloud service providers increase the burden of knowledge on testers. More care must be taken during the T&E planning process to ensure that test teams are poised to succeed in understanding the cyber posture of cloud-hosted systems and finding any vulnerabilities present in those systems. The T&E community must adapt to this new paradigm of cloud-hosted systems to ensure that vulnerabilities are discovered and mitigated before an adversary has the opportunity to use those vulnerabilities against the system. | Mark Herrera | Add to Speakers |
94 | Shing-hon Lau Senior Cybersecurity Engineer, Carnegie Mellon University — Software Engineering Institute |
Presentation Publish |
2 | Test and Evaluation of AI Cyber Defense Systems | Improving the Quality of Test & Evaluation | Adoption of Artificial Intelligence and Machine Learning powered cybersecurity defenses (henceforth, AI defenses) has outpaced testing and evaluation (T&E) capabilities. Industrial and governmental organizations around the United States are employing AI defenses to protect their networks in ever-increasing numbers, with the commercial market for AI defenses currently estimated at $15 billion and expected to grow to $130 billion by 2030. This adoption of AI defenses is powered by a shortage of over 500,000 cybersecurity staff in the United States, by a need to expeditiously handle routine cybersecurity incidents with minimal human intervention and at machine speeds, and by a need to protect against highly sophisticated attacks. It is paramount to establish, through empirical testing, trust and understanding of the capabilities and risks associated with employing AI defenses. While some academic work exists for performing T&E of individual machine learning models trained using cybersecurity data, we are unaware of any principled method for assessing the capabilities of a given AI defense within an actual network environment. The ability of AI defenses to learn over time poses a significant T&E challenge, above and beyond those faced when considering traditional static cybersecurity defenses. For example, an AI defense may become more (or less) effective at defending against a given cyberattack as it learns over time. Additionally, a sophisticated adversary may attempt to evade the capabilities of an AI defense by obfuscating attacks to maneuver them into its blind spots, by poisoning the training data utilized by the AI defense, or both. Our work provides an initial methodology for performing T&E of on-premises network-based AI defenses on an actual network environment, including the use of a network environment with generated user network behavior, automated cyberattack tools to test the capabilities of AI cyber defenses to detect attacks on that network, and tools for modifying attacks to include obfuscation or data poisoning. Discussion will also center on some of the difficulties with performing T&E on an entire system, instead of just an individual model. | Mark Herrera | Add to Speakers |
95 | Miriam Armstrong Research Staff Member, Institute for Defense Analyses |
Poster Presentation No Publish |
1 | Framework for Operational Test Design: An Example Application of Design Thinking | Improving the Quality of Test & Evaluation | Design thinking is a problem-solving approach that promotes the principles of human-centeredness, iteration, and diversity. The poster provides a five-step framework for how to incorporate these design principles when building an operational test. In the first step, test designers conduct research on test users and the problems they encounter. In the second step, designers articulate specific user needs to address in the test design. In the third step, designers generate multiple solutions to address user needs. In the fourth step, designers create prototypes of their best solutions. In the fifth step, designers refine prototypes through user testing. | Rebecca Medlin | Add to Speakers |
96 | Zed Fashena Research Associate, IDA |
Presentation No Publish |
3 | Applications of Network Methods for Supply Chain Review | Sharing Analysis Tools, Methods, and Collaboration Strategies | The DoD maintains a broad array of systems, each one sustained by an often complex supply chain of components and suppliers. The ways that these supply chains are interlinked can have major implications for the resilience of the defense industrial base as a whole, and the readiness of multiple weapon systems. Finding opportunities to improve overall resilience requires gaining visibility of potential weak links in the chain, which requires integrating data across multiple disparate sources. By using open-source data pipeline software to enhance reproducibility, together with flexible network analysis methods, multiple stovepiped data sources can be brought together to develop a more complete picture of the supply chain across systems. | Margaret Zientek | Add to Speakers |
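A minimal sketch of the kind of network analysis described in the abstract above, not the authors' actual pipeline or data: two hypothetical stovepiped tables (supplier-to-component and component-to-system) are merged into one directed graph with networkx, and simple graph queries flag potential weak links. All file-free data, column names, and thresholds here are illustrative assumptions.

```python
# Sketch: combine two hypothetical supply chain data sources into one graph
# and flag single-sourced components and structural chokepoints.
import networkx as nx
import pandas as pd

# Hypothetical stovepiped sources
suppliers = pd.DataFrame({"supplier": ["S1", "S1", "S2"],
                          "component": ["C1", "C2", "C2"]})
systems = pd.DataFrame({"component": ["C1", "C2", "C2"],
                        "system": ["Sys-A", "Sys-A", "Sys-B"]})

G = nx.DiGraph()
G.add_edges_from(suppliers.itertuples(index=False, name=None))
G.add_edges_from(systems.itertuples(index=False, name=None))

# Components fed by only one supplier are potential weak links.
single_sourced = [c for c in suppliers["component"].unique() if G.in_degree(c) == 1]
print("Single-sourced components:", single_sourced)

# Articulation points of the undirected view suggest chokepoints whose loss
# would disconnect parts of the supply network shared across systems.
print("Chokepoints:", list(nx.articulation_points(G.to_undirected())))
```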
97 | Dean Thomas Researcher, George Mason University |
Poster Presentation No Publish |
1 | Comparison of Magnetic Field Line Tracing Methods | Sharing Analysis Tools, Methods, and Collaboration Strategies | At George Mason University, we are developing swmfio, a Python package for processing Space Weather Modeling Framework (SWMF) magnetosphere and ionosphere results, which is used to study the sun, heliosphere, and the magnetosphere. The SWMF framework centers around a high-performance magnetohydrodynamic (MHD) model, the Block Adaptive Tree Solar-wind Roe Upwind Scheme (BATS-R-US). This analysis uses swmfio and other methods to trace magnetic field lines, compare the results, and identify why the methods differ. While the earth’s magnetic field protects the planet from solar radiation, solar storms can distort the earth’s magnetic field, allowing them to damage satellites and electrical grids. Being able to trace magnetic field lines helps us understand space weather. In this analysis, the September 1859 Carrington Event is examined. This event is the most intense geomagnetic storm in recorded history. We use three methods to trace magnetic field lines in the Carrington Event, and compare the field lines generated by the different methods. We consider two factors in the analysis. First, we directly compare methods by measuring the distances between field lines generated by different methods. Second, we consider how sensitive the methods are to initial conditions. We note that swmfio’s linear interpolation, which is customized for the BATS-R-US adaptive mesh, provides expected results. It is insensitive to small changes in initial conditions and terminates field lines at boundaries. We observe that, for any method, results may not be accurate when the mesh size becomes large. | Rebecca Medlin | Add to Speakers |
98 | James Wisnowski (co-author Andrew Karl) Principal Consultant, Adsurgo |
Presentation Publish |
1 | Effective Application of Self-Validated Ensemble Models in Challenging Test Scenarios | Advancing Test & Evaluation of Emerging and Prevalent Technologies | We test the efficacy of SVEM versus alternative variable selection methods in a mixture experiment setting. These designs have built-in dependencies that require modifications of the typical design and analysis methods. The usual design metric of power is not helpful for these tests and analyzing results becomes quite challenging, particularly for factor characterization. We provide some guidance and lessons learned from hypersonic fuel formulation experience. We also show through simulation favorable combinations of design and Generalized Regression analysis options that lead to the best results. Specifically, we quantify the impact of changing run size, including complex design region constraints, using space-filling vs optimal designs, including replicates and/or center runs, and alternative analysis approaches to include full model, backward stepwise, SVEM forward selection, SVEM Lasso, and SVEM neural network. | Andrea Brown | Add to Speakers |
99 | Aayushi Verma Data Science Fellow, IDA |
Presentation Publish |
1 | I-TREE: a tool for characterizing research using taxonomies | Solving Program Evaluation Challenges | IDA is developing a Data Strategy to establish solid infrastructure and practices that allow for a rigorous data-centric approach to answering U.S. security and science policy questions. The data strategy implements data governance and data architecture strategies to leverage data to gain trusted insights, and establishes a data-centric culture. One key component of the Data Strategy is a set of research taxonomies that describe and characterize the research done at IDA. These research taxonomies, broadly divided into six categories, are a vital tool to help IDA researchers gain insight into the research expertise of staff and divisions, in terms of the research products that are produced for our sponsors. We have developed an interactive web application which consumes numerous disparate sources of data related to these taxonomies, research products, researchers, and divisions, and unites them to create quantified analytics and visualizations to answer questions about research at IDA. This tool, titled I-TREE (IDA-Taxonomical Research Expertise Explorer), will enable staff to answer questions like ‘Who are the researchers most commonly producing products for a specified research area?’, ‘What is the research profile of a specified author?’, ‘What research topics are most commonly addressed by a specified division?’, ‘Who are the researchers most commonly producing products in a specified division?’, and ‘What divisions are producing products for a specified research topic?’. These are essential questions whose answers allow IDA to identify subject-matter expertise areas, methodologies, and key skills in response to sponsor requests, and to identify common areas of expertise to build a research team with a broad range of skills. I-TREE demonstrates the use of data science and data management techniques that enhance the company’s data strategy while actively enabling researchers and management to make informed decisions. | Margaret Zientek | Add to Speakers |
100 | James Ferry Senior Research Scientist, Metron, Inc. |
Presentation Publish |
3 | A Bayesian Decision Theory Framework for Test & Evaluation | Improving the Quality of Test & Evaluation | Decisions form the core of T&E: decisions about which tests to conduct and, especially, decisions on whether to accept or reject a system at its milestones. The traditional approach to acceptance is based on conducting tests under various conditions to ensure that key performance parameters meet certain thresholds with the required degree of confidence. In this approach, data is collected during testing, then analyzed with techniques from classical statistics in a post-action report. This work explores a new Bayesian paradigm for T&E based on one simple principle: maintaining a model of the probability distribution over system parameters at every point during testing. In particular, the Bayesian approach posits a distribution over parameters prior to any testing. This prior distribution provides (a) the opportunity to incorporate expert scientific knowledge into the inference procedure, and (b) transparency regarding all assumptions being made. Once a prior distribution is specified, it can be updated as tests are conducted to maintain a probability distribution over the system parameters at all times. One can leverage this probability distribution in a variety of ways to produce analytics with no analog in the traditional T&E framework. In particular, having a probability distribution over system parameters at any time during testing enables one to implement an optimal decision-making procedure using Bayesian Decision Theory (BDT). BDT accounts for the cost of various testing options relative to the potential value of the system being tested. When testing is expensive, it provides guidance on whether to conserve resources by ending testing early. It evaluates the potential benefits of testing for both its ability to inform acceptance decisions and for its intrinsic value to the commander of an accepted system. This talk describes the BDT paradigm for T&E and provides examples of how it performs in simple scenarios. In future work we plan to extend the paradigm to include the features, the phenomena, and the SME elicitation protocols necessary to address realistic T&E cases. | Dr. Jeremy Werner | Add to Speakers |
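A toy sketch of the core idea in the abstract above, not the presenter's framework: maintain a posterior over a single system parameter (a success probability, via a Beta-Binomial model) and compare the expected costs of accepting, rejecting, or running one more test. The prior, requirement, and cost values are illustrative assumptions.

```python
# Sketch: Beta-Binomial posterior over a success probability plus a
# one-step-look-ahead expected-cost comparison of accept / reject / test more.
from scipy import stats

a, b = 1.0, 1.0                 # Beta prior before any testing
successes, failures = 14, 2     # test results observed so far
post = stats.beta(a + successes, b + failures)

req = 0.80                      # required success probability (illustrative)
cost_accept_bad = 100.0         # cost of accepting a system below requirement
cost_reject_good = 30.0         # cost of rejecting a system above requirement
cost_per_test = 1.0

def decide_cost(s, f):
    """Expected cost of the better of accept/reject given data (s, f)."""
    p_below = stats.beta(a + s, b + f).cdf(req)
    return min(cost_accept_bad * p_below, cost_reject_good * (1 - p_below))

p_below = post.cdf(req)
exp_cost_accept = cost_accept_bad * p_below
exp_cost_reject = cost_reject_good * (1 - p_below)

# Expected cost of running one more test, then making the best decision.
p_next_success = post.mean()
exp_cost_test = cost_per_test + (
    p_next_success * decide_cost(successes + 1, failures)
    + (1 - p_next_success) * decide_cost(successes, failures + 1))

print({"accept": exp_cost_accept, "reject": exp_cost_reject, "test": exp_cost_test})
```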
101 | Clifford Bridges Research Staff Member, Institute for Defense Analyses |
Presentation No Publish |
3 | User-Friendly Decision Tools | Sharing Analysis Tools, Methods, and Collaboration Strategies | Personal experience and anecdotal evidence suggest that presenting analyses to sponsors, especially technical sponsors, is improved by helping the sponsor understand how results were derived. Providing summaries of analytic results is necessary but can be insufficient when the end goal is to help sponsors make firm decisions. When time permits, engaging sponsors with walk-throughs of how results may change given different inputs is particularly salient in helping sponsors make decisions in the context of the bigger picture. Data visualizations and interactive software are common examples of what we call “decision tools” that can walk sponsors through varying inputs and views of the analysis. Given long-term engagement and regular communication with a sponsor, developing user-friendly decision tools is a helpful practice to support sponsors. This talk presents a methodology for building decision tools that combines leading practices in agile development and STEM education. We will use a Python-based app development tool called Streamlit to show implementations of this methodology. | Margaret Zientek | Add to Speakers |
102 | Lauren Ferguson Digital Transformation Lead, Materials & Manufacturing Directorate, Air Force Research Laboratory |
Presentation Publish |
1 | Seamlessly Integrated Materials Labs at AFRL | Sharing Analysis Tools, Methods, and Collaboration Strategies | One of the challenges to conducting research in the Air Force Research Laboratory is that many of our equipment controllers cannot be directly connected to our internal networks, due to older or specialized operating systems and the need for administrative privileges for proper functioning. This means that the current data collection process is often highly manual, with users documenting experiments in physical notebooks and transferring data via CDs or portable hard drives to connected systems for sharing or further processing. In the Materials & Manufacturing Directorate, we have developed a unique approach to seamlessly integrate our labs for more efficient data collection and transfer, which is specifically designed to help users ensure that data is findable for future reuse. In this talk, we will highlight our two enabling tools: NORMS, which helps users easily generate metadata for direct association with data collected in the lab to eliminate physical notebooks; and Spike, which automates one-way data transfer from isolated systems to databases mirrored on other networks. In these databases, metadata can be used for complex search queries and data is automatically shared with project members without requiring additional transfers. The impact of this solution has been significantly faster data availability (including searchability) to all project members: a transfer and scanning process that used to take 3 hours can now take a few minutes. Future use cases will also enable Spike to transfer data directly into cloud buckets for in situ analysis, which would streamline collaboration with partners. | Tyler Lesthaeghe | Add to Speakers |
103 | Emily Saldanha Senior Data Scientist, Pacific Northwest National Laboratory |
Presentation No Publish |
2 | Test and Evaluation Methods for Authorship Attribution and Privacy Preservation | Advancing Test & Evaluation of Emerging and Prevalent Technologies | The aim of the IARPA HIATUS program is to develop explainable systems for authorship attribution and author privacy preservation through the development of feature spaces which encode the distinguishing stylistic characteristics of authors independently of text genre, topic, or format. In this talk, I will discuss progress towards defining an evaluation framework for this task to provide robust insights into system strengths, weaknesses, and overall performance. Our evaluation strategy includes the use of an adversarial framework between attribution and privacy systems, development of a focused set of core metrics, analysis of system performance dependencies on key data factors, systematic exploration of experimental variables to probe targeted questions about system performance, and investigation of key trade-offs between different performance measures. | Karl Pazdernik | Add to Speakers |
104 | Carianne Martinez Principal Computer Scientist, Sandia National Laboratories |
Presentation No Publish |
2 | Tools for Assessing Machine Learning Models’ Performance in Real-World Settings | Advancing Test & Evaluation of Emerging and Prevalent Technologies | Machine learning (ML) systems demonstrate powerful predictive capability, but fielding such systems does not come without risk. ML can catastrophically fail in some scenarios, and in the absence of formal methods to validate most ML models, we require alternative methods to increase trust. While emerging techniques for uncertainty quantification and model explainability may seem to lie beyond the scope of many ML projects, they are essential tools for understanding deployment risk. This talk will share a practical workflow, useful tools, and lessons learned for ML development best practices. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2023-11982A | Justin Newcomer | Add to Speakers |
105 | Nathan Pond Program Manager – Business Enterprise Systems, Edaptive Computing, Inc. |
Presentation No Publish |
1 | Digital Transformation Enabled by Enterprise Automation | Advancing Test & Evaluation of Emerging and Prevalent Technologies | Digital transformation is a broad term that means a variety of things to people in many different operational domains, but the underlying theme is consistent: using digital technologies to improve business processes, culture, and efficiency. Digital transformation results in streamlining communications, collaboration, and information sharing while reducing errors. Properly implemented digital processes provide oversight and cultivate accountability to ensure compliance with business processes and timelines. A core tenet of effective digital transformation is automation. The elimination or reduction of human intervention in processes provides significant gains to operational speed, accuracy, and efficiency. DOT&E uses automation to streamline the creation of documents and reports which need to include up-to-date information. By using Smart Documentation capabilities, authors can define and automatically populate sections of documents with the most up-to-date data, ensuring that every published document always has the most current information. This session discusses a framework for driving digital transformation to automate nearly any business process. | Dr. Jeremy Werner | Add to Speakers |
106 | John W. Dennis Research Staff Member (Economist), Institute for Defense Analyses |
Presentation No Publish |
2 | Assurance of Responsible AI/ML in the DOD Personnel Space | Advancing Test & Evaluation of Emerging and Prevalent Technologies | Testing and assuring responsible use of AI/ML enabled capabilities is a nascent topic in the DOD with many efforts being spearheaded by CDAO. In general, black box models tend to suffer from consequences related to edge cases, emergent behavior, misplaced or lack of trust, and many other issues, so traditional testing is insufficient to guarantee safety and responsibility in the employment of a given AI enabled capability. Focus of this concern tends to fall on well-publicized and high-risk capabilities, such as AI enabled autonomous weapons systems. However, while AI/ML enabled capabilities supporting personnel processes and systems, such as algorithms used for retention and promotion decision support, tend to carry low safety risk, many concerns, some of them specific to the personnel space, run the risk of undermining the DOD’s five ethical principles for Responsible AI (RAI). Examples include service member privacy concerns, invalid prospective policy analysis, disparate impact against marginalized service member groups, and unintended emergent service member behavior in response to use of the capability. Eroding barriers to use of AI/ML are facilitating an increasing number of applications while some of these concerns are still not well understood by the analytical community. We consider many of these issues in the context of an IDA ML enabled capability and propose mechanisms to assure stakeholders of the adherence to the DOD’s ethical principles. | Rebecca Medlin | Add to Speakers |
107 | Kelli McCoy Senior Systems Engineer, NASA Jet Propulsion Laboratory |
Speed Presentation Publish |
1 | Systems Engineering Applications of UQ in Space Mission Formulation | Sharing Analysis Tools, Methods, and Collaboration Strategies | It is critical to link the scientific phenomenology under investigation and the operating environment directly to the spacecraft design, mission design, and concept of operations. With many missions of discovery, the large uncertainty in the science phenomenology and the operating environment necessitates mission architecture solutions that are robust and resilient to these unknowns, to maximize probability of achieving the mission objectives. Feasible mission architectures are assessed against performance, cost, and risk, in the context of large uncertainties. For example, despite Cassini observations of Enceladus, significant uncertainties exist in the moon’s surface properties and the surrounding Enceladus environment. Orbilander or any other mission to Enceladus will need to quantify or bound these uncertainties to formulate a viable design and operations trade space that addresses a range of mission objectives within the imposed technical and programmatic constraints. Uncertainty quantification (UQ) utilizes a portfolio of stochastic, data science, and mathematical methods to characterize uncertainty of a system and inform decision-making. This discussion will focus on a formulation of a workflow and an example of an Enceladus mission development use case. | Kelli McCoy | Add to Speakers |
108 | James Oreluk Postdoctoral Researcher, Sandia National Laboratories |
Speed Presentation No Publish |
3 | A Bayesian Optimal Experimental Design for High-dimensional Physics-based Models | Solving Program Evaluation Challenges | Many scientific and engineering experiments are developed to study specific questions of interest. Unfortunately, time and budget constraints make operating these controlled experiments over wide ranges of conditions intractable, thus limiting the amount of data collected. In this presentation, we discuss a Bayesian approach to identify the most informative conditions, based on the expected information gain. We will present a framework for finding optimal experimental designs that can be applied to physics-based models with high-dimensional inputs and outputs. We will study a real-world example where we aim to infer the parameters of a chemically reacting system, but there are uncertainties in both the model and the parameters. A physics-based model was developed to simulate the gas-phase chemical reactions occurring between highly reactive intermediate species in a high-pressure photolysis reactor coupled to a vacuum-ultraviolet (VUV) photoionization mass spectrometer. This time-of-flight mass spectrum evolves in both kinetic time and VUV energy producing a high-dimensional output at each design condition. The high-dimensional nature of the model output poses a significant challenge for optimal experimental design, as a surrogate model is built for each output. We discuss how accurate low-dimensional representations of the high-dimensional mass spectrum are necessary for computing the expected information gain. Bayesian optimization is employed to maximize the expected information gain by efficiently exploring a constrained design space, taking into account any constraint on the operating range of the experiment. Our results highlight the trade-offs involved in the optimization, the advantage of using optimal designs, and provide a workflow for computing optimal experimental designs for high-dimensional physics-based models. | Kelli McCoy | Add to Speakers |
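A toy illustration of the expected-information-gain criterion mentioned in the abstract above, estimated with a nested Monte Carlo estimator for a one-dimensional linear-Gaussian model. This is not the chemically reacting system, the surrogate, or the Bayesian-optimization search described in the talk; the prior, noise level, and candidate designs are invented for illustration.

```python
# Sketch: nested Monte Carlo estimate of expected information gain (EIG)
# for designs d under the toy model y = theta * d + noise.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
sigma = 0.1                                   # observation noise std (assumed)
prior = lambda n: rng.normal(0.0, 1.0, n)     # theta ~ N(0, 1) (assumed)

def log_lik(y, theta, d):
    return -0.5 * ((y - theta * d) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def eig(d, n_outer=500, n_inner=500):
    theta_out = prior(n_outer)
    y = theta_out * d + rng.normal(0.0, sigma, n_outer)
    theta_in = prior(n_inner)
    total = 0.0
    for yi, ti in zip(y, theta_out):
        log_evidence = logsumexp(log_lik(yi, theta_in, d)) - np.log(n_inner)
        total += log_lik(yi, ti, d) - log_evidence
    return total / n_outer

designs = np.linspace(0.1, 2.0, 8)
print("most informative design:", max(designs, key=eig))
```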
109 | Jo Anna Capp Research Staff Member, Institute for Defense Analyses |
Presentation Publish |
2 | Model verification in a digital engineering environment: an operational test perspective | Sharing Analysis Tools, Methods, and Collaboration Strategies | As the Department of Defense adopts digital engineering strategies for acquisition systems in development, programs are embracing the use of highly federated models to assess the end-to-end performance of weapon systems, to include the threat environment. Often, due to resource limitations or political constraints, there is limited live data with which to validate the end-to-end performance of these models. In these cases, careful verification of the model, including from an operational factor-space perspective, early in model development can assist testers in prioritizing resources for model validation in later system development. This presentation will discuss how using Design of Experiments to assess the operational factor space can shape model verification efforts and provide data for model validation focused on the end-to-end performance of the system. | Rebecca Medlin | Add to Speakers |
110 | Malachi Head of Data Science Dept., Thomas Jefferson National Accelerator Facility |
Presentation Publish |
2 | Uncertainty Aware Machine Learning for Accelerators | Improving the Quality of Test & Evaluation | Standard deep learning models for classification and regression applications are ideal for capturing complex system dynamics. Unfortunately, their predictions can be arbitrarily inaccurate when the input samples are not similar to the training data. Implementation of distance aware uncertainty estimation can be used to detect these scenarios and provide a level of confidence associated with their predictions. We present results using Deep Gaussian Process Approximation (DGPA) methods for 1) anomaly detection at the Spallation Neutron Source (SNS) accelerator and 2) an uncertainty-aware surrogate model for the Fermi National Accelerator Lab (FNAL) Booster Accelerator Complex. | Jonathan Rathsam | Add to Speakers |
111 | James Warner Computational Scientist, NASA Langley Research Center |
Mini-Tutorial No Publish |
2 | An Introduction to Uncertainty Quantification for Modeling & Simulation | Solving Program Evaluation Challenges | Predictions from modeling and simulation (M&S) are increasingly relied upon to inform critical decision making in a variety of industries including defense and aerospace. As such, it is imperative to understand and quantify the uncertainties associated with the computational models used, the inputs to the models, and the data used for calibration and validation of the models. The rapidly evolving field of uncertainty quantification (UQ) combines elements of statistics, applied mathematics, and discipline engineering to provide this utility for M&S. This mini tutorial provides an introduction to UQ for M&S geared towards engineers and analysts with little-to-no experience in the field but with some knowledge of probability and statistics. A brief review of basic probability will be provided before discussing some core UQ concepts in more detail, including uncertainty propagation and the use of Monte Carlo simulation for making probabilistic predictions with computational models, model calibration to estimate uncertainty in model input parameters using experimental data, and sensitivity analysis for identifying the most important and influential model input parameters. Examples from relevant NASA applications are included and references are provided throughout to point viewers to resources for further study. | James Warner | Add to Speakers |
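A minimal sketch of the uncertainty-propagation step described in the tutorial abstract above: sample the uncertain inputs, push them through a placeholder computational model, and summarize the output distribution. The model, input distributions, and failure criterion are made-up stand-ins, not anything from the tutorial.

```python
# Sketch: forward uncertainty propagation with Monte Carlo simulation.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Uncertain inputs (illustrative distributions)
load = rng.normal(50.0, 5.0, n)          # applied load
strength = rng.lognormal(4.1, 0.08, n)   # material strength

def model(load, strength):
    # Placeholder computational model: safety margin
    return strength - load

margin = model(load, strength)
print("mean margin:", margin.mean())
print("5th/95th percentiles:", np.percentile(margin, [5, 95]))
print("P(failure) = P(margin < 0):", (margin < 0).mean())
```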
112 | Anthony Salvatore Cappetta Cadet, United States Military Academy at West Point |
Speed Presentation Publish |
2 | Topological Data Analysis’ involvement in Cyber Security | Advancing Test & Evaluation of Emerging and Prevalent Technologies | The purpose of this research is to see the use and application of Topological Data Analysis (TDA) in the realm of Cyber Security. The methods used in this research include an exploration of different Python libraries or C++ Python interfaces in order to explore the shape of the data involved using TDA. These methods include, but are not limited to, the GUDHI, GIOTTO, and Scikit-tda libraries. The project’s results will show where the literal holes in cyber security lie and will offer methods on how to better analyze these holes and breaches. | Rebecca Medlin | Add to Speakers |
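A small sketch of one TDA computation using GUDHI, one of the libraries named in the abstract above: build a Rips complex on a point cloud and read off persistent 1-dimensional features ("holes"). The random points stand in for whatever embedded cyber data the project uses; the edge length and dimensions are arbitrary choices.

```python
# Sketch: persistent homology of a point cloud with GUDHI.
import numpy as np
import gudhi

rng = np.random.default_rng(2)
points = rng.random((200, 3))   # stand-in for embedded traffic/security features

rips = gudhi.RipsComplex(points=points, max_edge_length=0.7)
st = rips.create_simplex_tree(max_dimension=2)
st.persistence()

# Long-lived 1-dimensional intervals are candidate "holes" worth inspecting.
loops = st.persistence_intervals_in_dimension(1)
lifetimes = loops[:, 1] - loops[:, 0] if len(loops) else np.array([0.0])
print("number of loops:", len(loops), "longest lifetime:", lifetimes.max())
```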
113 | Karl Pazdernik Senior Data Scientist, Pacific Northwest National Laboratory |
Presentation No Publish |
2 | Well-Calibrated Uncertainty Quantification for Language Models in the Nuclear Domain | Advancing Test & Evaluation of Emerging and Prevalent Technologies | A key concern for global and national security in the nuclear weapons age is the proliferation of nuclear weapons technology and development. A key component of enforcing this non-proliferation policy is developing an awareness of the scientific research being pursued by other nations and organizations. To support non-proliferation goals and contribute to nuclear science research, we trained a RoBERTa deep neural language model on a large set of U.S. Department of Energy Office of Science and Technical Information (OSTI) research article abstracts and then fine-tuned this model for classification of scientific abstracts into 60 disciplines, which we call NukeLM. This multi-step approach to training improved classification accuracy over its untrained or partially out-of-domain competitors. While it is important for classifiers to be accurate, there has also been growing interest in ensuring that classifiers are well-calibrated with uncertainty quantification that is understandable to human decision-makers. For example, in the multiclass problem, classes with a similar predicted probability should be semantically related. Therefore, we also introduced an extension of the Bayesian belief matching framework proposed by Joo et al. (2020) that easily scales to large NLP models, such as NukeLM, and better achieves the desired uncertainty quantification properties. | Alyson Wilson | Add to Speakers |
29 | Elie Alhajjar Senior Research Scientist, USMA |
Presentation Publish |
1 | Novelty Detection in Network Traffic: Using Survival Analysis for Feature Identification | Improving the Quality of Test & Evaluation | Over the past decade, Intrusion Detection Systems have become an important component of many organizations’ cyber defense and resiliency strategies. However, one of the greatest downsides of these systems is their reliance on known attack signatures for successful detection of malicious network events. When it comes to unknown attack types and zero-day exploits, modern Intrusion Detection Systems often fall short. Since machine learning algorithms for event classification are widely used in this realm, it is imperative to analyze the characteristics of network traffic that can lead to novelty detection using such classifiers. In this talk, we introduce a novel approach to identifying network traffic features that influence novelty detection based on survival analysis techniques. Specifically, we combine several Cox proportional hazards models to predict which features of a network flow are most indicative of a novel network attack and likely to confuse the classifier as a result. We also implement Kaplan-Meier estimates to predict the probability that a classifier identifies novelty after the injection of an unknown network attack at any given time. The proposed model is successful at pinpointing PSH Flag Count, ACK Flag Count, URG Flag Count, and Down/Up Ratio as the main features to impact novelty detection via Random Forest, Bayesian Ridge, and Linear SVR classifiers. | Add to Speakers | |
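A minimal sketch of the two survival-analysis building blocks named in the abstract above, using the lifelines library. The tiny data frame, its column names, and the censoring convention are assumptions for illustration only, not the authors' data or their combined-model approach.

```python
# Sketch: Cox proportional hazards on "time until the classifier flags novelty"
# with flow features as covariates, plus a Kaplan-Meier curve of detection times.
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter

# Hypothetical per-flow records: time to detection, whether novelty was
# detected (1) or the observation was censored (0), and two flow features.
df = pd.DataFrame({
    "time": [5, 12, 7, 30, 16, 30, 9, 22],
    "detected": [1, 1, 1, 0, 1, 0, 1, 1],
    "psh_flag_count": [3, 0, 5, 1, 2, 0, 1, 4],
    "down_up_ratio": [0.8, 1.5, 0.6, 2.0, 1.1, 1.7, 0.9, 1.3],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="detected")
cph.print_summary()   # hazard ratios suggest which features speed up detection

kmf = KaplanMeierFitter()
kmf.fit(df["time"], event_observed=df["detected"])
print(kmf.survival_function_)   # P(novelty still undetected) over time
```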
30 | Caleb King Research Statistician Developer, JMP Statistical Discovery, LLC |
Presentation No Publish |
3 | Empirical Calibration for a Linearly Extrapolated Lower Tolerance Bound | Sharing Analysis Tools, Methods, and Collaboration Strategies | In many industries, the reliability of a product is often determined by a quantile of a distribution of a product’s characteristics meeting a specified requirement. A typical approach to address this is to assume a distribution model and compute a one-sided confidence bound on the quantile. However, this can become difficult if the sample size is too small to reliably estimate a parametric model. Linear interpolation between order statistics is a viable nonparametric alternative if the sample size is sufficiently large. In most cases, linear extrapolation from the extreme order statistics can be used, but can result in inconsistent coverage. In this talk, we’ll present an empirical study from our submitted manuscript used to generate calibrated weights for linear extrapolation that greatly improves the accuracy of the coverage across a feasible range of distribution families with positive support. We’ll demonstrate this calibration technique using two examples from industry. | Add to Speakers | |
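For context on the baseline the talk improves: a sketch of the standard distribution-free lower tolerance bound built from order statistics. It is not the calibrated-weight extrapolation described in the abstract; when no order statistic qualifies (sample too small), that is exactly the situation where extrapolation below the minimum, and hence the talk's calibration, becomes necessary. The content/confidence levels and the Weibull sample are illustrative.

```python
# Sketch: distribution-free one-sided lower tolerance bound from order statistics.
import numpy as np
from scipy import stats

def lower_tolerance_bound(x, p=0.90, gamma=0.95):
    """Largest order statistic covering proportion p with confidence gamma."""
    x = np.sort(np.asarray(x))
    n = len(x)
    # P(X_(r) <= (1-p) quantile) = P(Binomial(n, 1-p) >= r)
    for r in range(n, 0, -1):
        if stats.binom.sf(r - 1, n, 1 - p) >= gamma:
            return x[r - 1]
    return None  # no valid order statistic; extrapolation would be required

rng = np.random.default_rng(3)
sample = rng.weibull(2.0, size=50) * 100
print(lower_tolerance_bound(sample, p=0.90, gamma=0.95))
```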
31 | Gregory Hunt Assistant Professor, William & Mary |
Presentation No Publish |
2 | Analysis of Surrogate Strategies and Regularization with Application to High-Speed Flows | Sharing Analysis Tools, Methods, and Collaboration Strategies | Surrogate modeling is an important class of techniques used to reduce the burden of resource-intensive computational models by creating fast and accurate approximations. In aerospace engineering, surrogates have been used to great effect in design, optimization, exploration, and uncertainty quantification (UQ) for a range of problems, like combustor design, spacesuit damage assessment, and hypersonic vehicle analysis. Consequently, the development, analysis, and practice of surrogate modeling is of broad interest. In this talk, several widely used surrogate modeling strategies are studied as archetypes in a discussion on parametric/nonparametric surrogate strategies, local/global model forms, complexity regularization, uncertainty quantification, and relative strengths/weaknesses. In particular, we consider several variants of two widely used classes of methods: polynomial chaos and Gaussian process regression. These surrogate models are applied to several synthetic benchmark test problems and examples of real high-speed flow problems, including hypersonic inlet design, thermal protection systems, and shock-wave/boundary-layer interactions. Through analysis of these concrete examples, we analyze the trade-offs that modelers must navigate to create accurate, flexible, and robust surrogates. | Add to Speakers | |
32 | Carrington Metts Data Science Fellow, IDA |
Speed Presentation No Publish |
3 | Development of a Wald-Type Statistical Test to Compare Live Test Data and M&S Predictions | Sharing Analysis Tools, Methods, and Collaboration Strategies | This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&A) efforts for modeling and simulation (M&S) environments. The test decides between a null hypothesis of agreement between the simulation and reality, and an alternative hypothesis stating the simulation and reality do not agree. To do so, it generates a Wald-type statistic that compares the coefficients of two generalized linear models that are estimated on live test data and analogous simulated data, then determines whether any of the coefficient pairs are statistically different. The test was applied to two logistic regression models that were estimated from live torpedo test data and simulated data from the Naval Undersea Warfare Center’s (NUWC) Environment Centric Weapons Analysis Facility (ECWAF). The test did not show any significant differences between the live and simulated tests for the scenarios modeled by the ECWAF. While more work is needed to fully validate the ECWAF’s performance, this finding suggests that the facility is adequately modeling the various target characteristics and environmental factors that affect in-water torpedo performance. The primary advantage of this test is that it is capable of handling cases where one or more variables are estimable in one model but missing or inestimable from the other. While it is possible to simply create the linear models on the common set of variables, this results in the omission of potentially useful test data. Instead, this approach identifies the mismatched coefficients and combines them with the model’s intercept term, thus allowing the user to consider models that are created on the entire set of available data. Furthermore, the test was developed in a generalized manner without any references to a specific dataset or system. Therefore, other researchers who are conducting VV&A processes on other operational systems may benefit from using this test for their own purposes. | Add to Speakers | |
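A sketch of the basic Wald comparison underlying the test described above, restricted to the common-coefficient case (the talk's handling of mismatched coefficients via the intercept is not reproduced here). The simulated data, sample sizes, and factor structure are invented for illustration.

```python
# Sketch: fit the same logistic regression to "live" and "simulated" data and
# form a Wald statistic on the difference of the coefficient vectors.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)

def fit_logit(n):
    X = sm.add_constant(rng.normal(size=(n, 2)))      # two shared factors
    beta = np.array([-0.5, 1.0, 0.3])                 # illustrative truth
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))
    return sm.Logit(y, X).fit(disp=0)

live, sim = fit_logit(300), fit_logit(3000)

d = live.params - sim.params                     # coefficient differences
V = live.cov_params() + sim.cov_params()         # independent samples
wald = float(d @ np.linalg.inv(V) @ d)           # ~ chi-square under H0
p_value = stats.chi2.sf(wald, df=len(d))
print(f"Wald statistic = {wald:.2f}, p = {p_value:.3f}")
```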
34 | Paul Fanto Research Staff Member, Institute for Defense Analyses |
Presentation Publish |
1 | Best Practices for Using Bayesian Reliability Analysis in Developmental Testing | Improving the Quality of Test & Evaluation | Traditional methods for reliability analysis are challenged in developmental testing (DT) as systems become increasingly complex and DT programs become shorter and less predictable. Bayesian statistical methods, which can combine data across DT segments and use additional data to inform reliability estimates, can address some of these challenges. However, Bayesian methods are not widely used. I will present the results of a study aimed at identifying effective practices for the use of Bayesian reliability analysis in DT programs. The study consisted of interviews with reliability subject matter experts, together with a review of relevant literature on Bayesian methods. This analysis resulted in a set of best practices that can guide an analyst in deciding whether to apply Bayesian methods, in selecting the appropriate Bayesian approach, and in applying the Bayesian method and communicating the results. | Add to Speakers | |
35 | Sumit Kumar Kar Ph.D. candidate, University of North Carolina at Chapel Hill |
Presentation Publish |
1 | A generalized influence maximization problem | Sharing Analysis Tools, Methods, and Collaboration Strategies | The influence maximization problem is a popular topic in social networks with several applications in viral marketing and epidemiology. One possible way to understand the problem is from the perspective of a marketer who wants to achieve the maximum influence on a social network by choosing an optimum set of nodes of a given size as seeds. The marketer actively influences these seeds, followed by a passive viral process based on a certain influence diffusion model, in which influenced nodes influence other nodes without external intervention. Kempe et al. showed that a greedy algorithm-based approach can provide a (1-1/e)-approximation guarantee compared to the optimal solution if the influence spreads according to the Triggering model. In our current work, we consider a much more general problem in which the goal is to maximize the total expected reward obtained from the nodes that are influenced by a given time (which may be finite or infinite). In this setting, the reward obtained by influencing a set of nodes can depend on the set itself (not necessarily the sum of rewards from the individual nodes) as well as the times at which each node gets influenced; the seeds may be restricted to a subset of the network; multiple units of budget may be assigned to a single node (where the maximum number of budget units that may be assigned to a node can depend on the node); and a seeded node actually gets influenced with a certain probability that is a non-decreasing function of the number of budget units assigned to that node. We have formulated a greedy algorithm that provides a (1-1/e)-approximation guarantee compared to the optimal solution of this generalized influence maximization problem if the influence spreads according to the Triggering model. | Add to Speakers | |
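A sketch of the classic greedy baseline that the abstract above generalizes: Monte Carlo estimation of expected spread under an independent cascade (a special case of the Triggering model) with greedy seed selection. The random graph, propagation probability, and simulation counts are illustrative, and none of the generalizations (set-dependent rewards, budgets per node, seeding probabilities) are implemented here.

```python
# Sketch: greedy influence maximization with Monte Carlo spread estimation.
import random
import networkx as nx

def simulate_spread(G, seeds, p=0.1):
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in G.successors(u):
                if v not in active and random.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def expected_spread(G, seeds, p=0.1, n_sims=200):
    return sum(simulate_spread(G, seeds, p) for _ in range(n_sims)) / n_sims

def greedy_seeds(G, k, p=0.1):
    seeds = []
    for _ in range(k):
        best = max((v for v in G if v not in seeds),
                   key=lambda v: expected_spread(G, seeds + [v], p))
        seeds.append(best)
    return seeds

G = nx.gnp_random_graph(100, 0.05, seed=0, directed=True)
print(greedy_seeds(G, k=3))
```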
36 | Ebenezer Yawlui Master’s Student, University of Massachusetts Dartmouth |
Speed Presentation No Publish |
2 | Optimal Release Policy for Covariate Software Reliability Models | Sharing Analysis Tools, Methods, and Collaboration Strategies | The optimal time to release software is a common problem of broad concern to software engineers, where the goal is to minimize cost by balancing the cost of fixing defects before or after release as well as the cost of testing. However, the vast majority of these models are based on defect discovery models that are a function of time and can therefore only provide guidance on the amount of additional effort required. To overcome this limitation, this paper presents a software optimal release model based on cost criteria, incorporating the covariate software defect detection model based on the Discrete Cox Proportional Hazards Model. The proposed model provides more detailed guidance recommending the amount of each distinct test activity performed to discover defects. Our results indicate that the approach can be utilized to allocate effort among alternative test activities in order to minimize cost. | Add to Speakers | |
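For context, a sketch of the classical time-based release-cost trade-off that the covariate approach above generalizes: a Goel-Okumoto NHPP mean value function with illustrative cost coefficients, minimized over release time. The parameters and costs are invented; the talk's covariate model allocates effort across test activities instead of a single time axis.

```python
# Sketch: classical optimal release time under a Goel-Okumoto NHPP cost model.
import numpy as np
from scipy.optimize import minimize_scalar

a, b = 120.0, 0.05          # expected total defects, detection rate (illustrative)
c_test = 2.0                # cost per unit of testing time
c_fix_before = 1.0          # cost to fix a defect found during test
c_fix_after = 15.0          # cost to fix a defect found in the field

m = lambda t: a * (1 - np.exp(-b * t))                       # mean defects by time t
cost = lambda t: c_test * t + c_fix_before * m(t) + c_fix_after * (a - m(t))

res = minimize_scalar(cost, bounds=(0, 400), method="bounded")
print(f"release at t = {res.x:.1f}, expected cost = {res.fun:.1f}")
```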
37 | Sushovan Bhadra Master’s student, University of Massachusetts |
Speed Presentation No Publish |
2 | A Stochastic Petri Net Model of Continuous Integration and Continuous Delivery | Sharing Analysis Tools, Methods, and Collaboration Strategies | Modern software development organizations rely on continuous integration and continuous delivery (CI/CD), since it allows developers to continuously integrate their code in a single shared repository and automates the delivery process of the product to the user. While modern software practices improve the performance of the software life cycle, they also increase the complexity of this process. Past studies make improvements to the performance of the CI/CD pipeline. However, there are fewer formal models to quantitatively guide process and product quality improvement or characterize how automated and human activities compose and interact asynchronously. Therefore, this talk develops a stochastic Petri net model to analyze a CI/CD pipeline to improve process performance in terms of the probability of successfully delivering new or updated functionality by a specified deadline. The utility of the model is demonstrated through a sensitivity analysis to identify stages of the pipeline where improvements would most significantly improve the probability of timely product delivery. In addition, this research provides an enhanced version of the conventional CI/CD pipeline to examine how it can improve process performance in general. The results indicate that the augmented model outperforms the conventional model, and sensitivity analysis suggests that failures in later stages are more important and can impact the delivery of the final product. | Add to Speakers | |
38 | Sean Fiorito Contractor, IDA |
Speed Presentation No Publish |
1 | Introducing TestScience.org | Sharing Analysis Tools, Methods, and Collaboration Strategies | The Test Science Team facilitates data-driven decision-making by disseminating various testing and analysis methodologies. One way they disseminate these methodologies is through the annual workshop, DATAWorks; another way is through the website, TestScience.org. The Test Science website includes video training, interactive tools, and a related research library, as well as the DATAWorks Archive. “Introducing TestScience.org”, a presentation at DATAWorks, could include a poster and an interactive guided session through the site content. The presentation would inform interested DATAWorks attendees of the additional resources throughout the year. It could also be used to inform the audience about ways to participate, such as contributing interactive Shiny tools, training content, or research. “Introducing TestScience.org” would highlight the following sections of the website: 1. The DATAWorks Archives 2. Learn (Video Training) 3. Tools (Interactive Tools) 4. Research (Library) 5. Team (About and Contact) Incorporating into DATAWorks an introduction to TestScience.org would inform attendees of additional valuable resources available to them, and could encourage broader participation in TestScience.org, adding value to both the DATAWorks attendees and the TestScience.org efforts. | Add to Speakers | |
39 | Olga Chen Computer Scientist, U.S. Naval Research Laboratory |
Presentation Publish |
2 | Test and Evaluation Tool for Stealthy Communication | Improving the Quality of Test & Evaluation | Stealthy communication allows the transfer of information while hiding not only the content of that information but also the fact that any hidden information was transferred. One way of doing this is embedding information into network covert channels, e.g., timing between packets, header fields, and so forth. We describe our work on an integrated system for the design, analysis, and testing of such communication. The system consists of two main components: the analytical component, the NExtSteP (NRL Extensible Stealthy Protocols) testbed, and the emulation component, consisting of CORE (Common Open Research Emulator), an existing open source network emulator, and EmDec, a new tool for embedding stealthy traffic in CORE and decoding the result. We developed the NExtSteP testbed as a tool to evaluate the performance and stealthiness of embedders and detectors applied to network traffic. NExtSteP includes modules to: generate synthetic traffic data or ingest it from an external source (e.g., emulation or network capture); embed data using an extendible collection of embedding algorithms; classify traffic, using an extendible collection of detectors, as either containing or not containing stealthy communication; and quantify, using multiple metrics, the performance of a detector over multiple traffic samples. This allows us to systematically evaluate the performance of different embedders (and embedder parameters) and detectors against each other. Synthetic data are easy to generate with NExtSteP. We use these data for initial experiments to broadly guide parameter selection and to study asymptotic properties that require numerous long traffic sequences to test. The modular structure of NExtSteP allows us to make our experiments increasingly realistic. We have done this in two ways: by ingesting data from captured traffic and then doing embedding, classification, and detector analysis using NExtSteP, and by using EmDec to produce external traffic data with embedded communication and then using NExtSteP to do the classification and detector analysis. The emulation component was developed to build and evaluate proof-of-concept stealthy communications over existing IP networks. The CORE environment provides a full network, consisting of multiple nodes, with minimal hardware requirements and allows testing and orchestration of real protocols. Our testing environment allows for replay of real traffic and generation of synthetic traffic using the MGEN (Multi-Generator) network testing tool. The EmDec software was created with the existing NRL-developed protolib (protocol library). EmDec, running on CORE networks and orchestrated using a set of scripts, generates sets of data which are then evaluated for effectiveness by NExtSteP. In addition to evaluation by NExtSteP, development of EmDec allowed us to discover multiple novelties that were not apparent while using theoretical models. We describe the current status of our work, the results so far, and our future plans. | Add to Speakers | |
40 | Craig Andres Mathematical Statistician, DEVCOM Analysis Center |
Presentation No Publish |
1 | Standard Army vulnerability measures sensitivity to High Explosive zdata characterization | Improving the Quality of Test & Evaluation | AJEM is a joint forces model developed by the US Army that provides survivability/vulnerability/lethality (S/V/L) predictions for threat/target interactions. This complex model primarily generates a probability response for various components, scenarios, loss of capabilities, or summary conditions. Sensitivity analysis (SA) and uncertainty quantification (UQ), referred to jointly as SA/UQ, are disciplines that provide a working space for understanding the model, including how its estimates change with respect to changes in input variables. A summary from two sensitivity studies will be presented, covering the variability in Mean Area of Effects (MAE) from High-Explosive (HE) fragmentation arena tests (zdata files) and in anti-tank Single-Shot Pk|h (Probability of Kill given a hit) from varying Behind Armor Debris (BAD) characteristics. The sensitivity of MAE estimates was developed for two different munitions with individual zdata characteristics for each of three horizontal tests and the combined zdata against three target vehicles. The combined zdata also has two vertical tests included for a superior main beam spray characteristic. In addition to the four different zdata characterizations per munition, thirty-three different irregular fragment characterizations were modelled, with/without gravity effects, and single/secondary particles were all included in the analysis. | Add to Speakers | |
41 | Priscila Silva (additional authors: Andrew Bajumpaa, Drew Borden, and Christian Taylor) Graduate Research Assistant, University of Massachusetts Dartmouth |
Speed Presentation No Publish |
2 | Covariate Resilience Modeling | Sharing Analysis Tools, Methods, and Collaboration Strategies | Resilience is the ability of a system to respond, absorb, adapt, and recover from a disruptive event. Dozens of metrics to quantify resilience have been proposed in the literature. However, fewer studies have proposed models to predict these metrics or the time at which a system will be restored to its nominal performance level after experiencing degradation. This talk presents three alternative approaches to model and predict performance and resilience metrics with techniques from reliability engineering, including (i) bathtub-shaped hazard functions, (ii) mixture distributions, and (iii) a model incorporating covariates related to the intensity of events that degrade performance as well as efforts to restore performance. Historical data sets on job losses during seven different recessions in the United States are used to assess the predictive accuracy of these approaches, including the recession that began in 2020 due to COVID-19. Goodness of fit measures and confidence intervals as well as interval-based resilience metrics are computed to assess how well the models perform on the data sets considered. The results suggest that both bathtub-shaped functions and mixture distributions can produce accurate predictions for data sets exhibiting V, U, L, and J shaped curves, but that W and K shaped curves, which experience multiple shocks, deviate from the assumption of a single decrease and subsequent increase, or suffer a sudden drop in performance, cannot be characterized well by either of the proposed classes. In contrast, the model incorporating covariates is capable of tracking all of the types of curves noted above very well, including W and K shaped curves such as the two successive shocks the U.S. economy experienced in 1980 and the sharp degradation in 2020. Moreover, covariate models outperform the simpler models on all of the goodness of fit measures and interval-based resilience metrics computed for all seven data sets considered. These results suggest that classical reliability modeling techniques such as bathtub-shaped hazard functions and mixture distributions are suitable for modeling and prediction of some resilience curves possessing a single decrease and subsequent recovery, but that covariate models that explicitly incorporate explanatory factors and domain specific information are much more flexible and achieve higher goodness of fit and greater predictive accuracy. Thus, the covariate modeling approach provides a general framework for data collection and predictive modeling for a variety of resilience curves. | Add to Speakers | |
42 | Phillip Koshute, Johns Hopkins University Applied Physics Laboratory |
Presentation No Publish |
2 | Case Study on Test Planning and Data Analysis for Comparing Time Series | Solving Program Evaluation Challenges | Several years ago, the US Army Research Institute of Environmental Medicine developed an algorithm to estimate core temperature in military working dogs (MWDs). This canine thermal model (CTM) is based on thermophysiological principles and incorporates environmental factors and acceleration. The US Army Medical Materiel Development Activity is implementing this algorithm in a collar-worn device that includes computing hardware, environmental sensors, and an accelerometer. Among other roles, Johns Hopkins University Applied Physics Laboratory (JHU/APL) is coordinating the test and evaluation of this device. The device’s validation is ultimately tied to field tests involving MWDs. However, to minimize the burden to MWDs and the interruptions to their training, JHU/APL seeks to leverage non-canine laboratory-based testing to the greatest possible extent. For example, JHU/APL is testing the device’s accelerometers with shaker tables that vertically accelerate the device according to specified sinusoidal acceleration profiles. This test yields time series of acceleration and related metrics, which are compared to ground-truth measurements from a reference accelerometer. Statistically rigorous comparisons between the CTM and reference measurements must account for the potential lack of independence between measurements that are close in time. Potentially relevant techniques include downsampling, paired difference tests, hypothesis tests of absolute difference, hypothesis tests of distributions, functional data analysis, and bootstrapping. These considerations affect both test planning and subsequent data analysis. This talk will describe JHU/APL’s efforts to test and evaluate the CTM accelerometers and will outline a range of possible methods for comparing time series. | Add to Speakers | |
43 | Anders Grau, United States Military Academy |
Poster Presentation No Publish |
1 | Developing a Domain-Specific NLP Topic Modeling Process for Army Experimental Data | Sharing Analysis Tools, Methods, and Collaboration Strategies | Researchers across the U.S. Army are conducting experiments on the implementation of emerging technologies on the battlefield. Key data points from these experiments include text comments on the technologies’ performances. Researchers use a range of Natural Language Processing (NLP) tasks to analyze such comments, including text summarization, sentiment analysis, and topic modeling. Based on the successful results from research in other domains, this research aims to yield greater insights by implementing military-specific language as opposed to a generalized corpus. This research is dedicated to developing a methodology to analyze text comments from Army experiments and field tests using topic models trained on an Army domain-specific corpus. The methodology is tested on experimental data agglomerated in the Forge database, an Army Futures Command (AFC) initiative to provide researchers with a common operating picture of AFC research. As a result, this research offers an improved framework for analysis with domain-specific topic models for researchers across the U.S. Army. | Add to Speakers | |
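A minimal sketch of the generic topic-modeling pipeline referenced in the abstract above, using scikit-learn rather than the Forge data or any domain-adapted corpus; the three stand-in comments, topic count, and vocabulary choices are illustrative only.

```python
# Sketch: vectorize free-text comments and fit an LDA topic model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [
    "radio lost connectivity during the movement to contact",
    "sensor feed improved situational awareness for the platoon",
    "battery life limited the duration of the reconnaissance",
]  # stand-ins for experiment comments

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(comments)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```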
44 | Fatemeh Salboukh PhD Student, University of Massachusetts Dartmouth |
Speed Presentation No Publish |
2 | Application of Recurrent Neural Network for Software Defect Prediction | Sharing Analysis Tools, Methods, and Collaboration Strategies | Traditional software reliability growth models (SRGM) characterize software defect detection as a Read More function of testing time. Many of those SRGM are modeled by the non-homogeneous Poisson process (NHPP). However, those models are parametric in nature and do not explicitly encode factors driving defect or vulnerability discovery. Moreover, NHPP models are characterized by a mean value function that predicts the average number of defects discovered by a certain point in time during the testing interval, but may not capture all of the changes and details present in the data. More recent studies proposed SRGM incorporating covariates, where defect discovery is a function of one or more test activities documented and recorded during the testing process. These covariate models introduce an additional parameter per testing activity, which adds a high degree of non-linearity to traditional NHPP models, and parameter estimation becomes complex since it is limited to maximum likelihood estimation or expectation maximization. Therefore, this talk assesses the potential use of neural networks to predict software defects due to their ability to remember trends. Three different neural networks are considered, including (i) recurrent neural networks (RNNs), (ii) long short-term memory (LSTM), and (iii) gated recurrent units (GRU). The neural network approaches are compared with the covariate model to evaluate their predictive ability. Results suggest that GRU and LSTM present better goodness-of-fit measures such as SSE, PSSE, and MAPE compared to RNN and covariate models, indicating more accurate predictions. | Add to Speakers | |
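As an illustration of the recurrent-network approach described above, the sketch below fits a small GRU to a synthetic cumulative defect series using a sliding window; the data, window size, and training settings are assumptions, not the study's configuration.

```python
# Sketch: a GRU that predicts the next cumulative defect count from a
# sliding window of previous counts. Data, window size, and training
# settings are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
counts = torch.cumsum(torch.poisson(3 * torch.ones(60)), dim=0)  # toy defect data
counts = counts / counts.max()                                   # scale to [0, 1]

window = 5
X = torch.stack([counts[i:i + window] for i in range(len(counts) - window)])
y = counts[window:]
X = X.unsqueeze(-1)                      # shape: (samples, window, 1 feature)

class DefectGRU(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.head(out[:, -1, :]).squeeze(-1)   # use last time step

model = DefectGRU()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print("final training MSE:", loss.item())
```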
45 | Elijah Abraham Dabkowski Student, United States Military Academy |
Poster Presentation Publish |
2 | The Application of Semi-Supervised Learning in Image Classification | Sharing Analysis Tools, Methods, and Collaboration Strategies | In today’s Army, one of the fastest growing and most important areas in the effectiveness of our Read More military is data science. One aspect of this field is image classification, which has applications such as target identification. However, one drawback within this field is that when an analyst begins to deal with a multitude of images, it becomes infeasible for an individual to examine all the images and classify them accordingly. My research presents a methodology for image classification which can be used in a military context, utilizing a typical unsupervised classification approach involving K-Means to classify a majority of the images while pairing this with user input to determine the label of designated images. The user input comes in the form of manual classification of certain images which are deliberately selected for presentation to the user, allowing this individual to select which group the image belongs in and refine the current image clusters. This shows how a semi-supervised approach to image classification can efficiently improve the accuracy of the results when compared to a traditional unsupervised classification approach. | Add to Speakers | |
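A minimal sketch of the semi-supervised idea described above: K-Means clusters feature vectors, the image nearest each centroid is presented to the user for manual labeling, and the label is propagated to the cluster. The features and class labels are synthetic placeholders.

```python
# Sketch: cluster image feature vectors with K-Means, ask a user to label
# only the image nearest each centroid, then propagate that label to the
# whole cluster. Features and labels here are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 64)),   # pretend class A images
    rng.normal(loc=5.0, scale=1.0, size=(100, 64)),   # pretend class B images
])

k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)

# Indices of the images closest to each centroid -- these are the ones
# that would be shown to the analyst for manual labeling.
closest_idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, features)
manual_labels = {0: "vehicle", 1: "building"}   # user-supplied labels (hypothetical)

# Propagate each manually assigned label to every image in that cluster.
propagated = np.array([manual_labels[c] for c in km.labels_])
print(propagated[:5], propagated[-5:])
```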
46 | Jack Perreault Cadet, United States Military Academy |
Poster Presentation Publish |
2 | Multimodal Data Fusion: Enhancing Image Classification with Text | Sharing Analysis Tools, Methods, and Collaboration Strategies | Image classification is a critical part of gathering information on high-value targets. To this end, Read More Convolutional Neural Networks (CNN) have become the standard model for image and facial classification. However, CNNs alone are not entirely effective at image classification, and especially at human classification, due to their lack of robustness and their susceptibility to bias. Recent advances in CNNs, however, allow for data fusion to help reduce the uncertainty in their predictions. In this project, we describe a multimodal algorithm designed to increase confidence in image classification with the use of a joint fusion model with image and text data. Our work utilizes CNNs for image classification and bag-of-words for text categorization on Wikipedia images and captions relating to the same classes as the CIFAR-100 dataset. Using data fusion, we combine the vectors of the CNN and bag-of-words models and utilize a fully connected network on the joined data. We measure improvements by comparing the softmax outputs of the joint fusion model and the image-only CNN. | Add to Speakers | |
47 | Karen Alves da Mata Master Student, University of Massachusetts – Dartmouth |
Speed Presentation No Publish |
2 | Neural Networks for Quantitative Resilience Prediction | Sharing Analysis Tools, Methods, and Collaboration Strategies | System resilience is the ability of a system to survive and recover from disruptive events, which Read More finds applications in several engineering domains, such as cyber-physical systems and infrastructure. Most studies emphasize resilience metrics to quantify system performance, whereas more recent studies propose resilience models to project system recovery time after degradation using traditional statistical modeling approaches. Moreover, past studies are either performed on data collected after recovery or are limited to idealized trends. Therefore, this talk considers alternative machine learning approaches such as (i) Artificial Neural Networks (ANN), (ii) Recurrent Neural Networks (RNN), and (iii) Long Short-Term Memory (LSTM) to model and predict system performance for alternative trends beyond those previously considered. These approaches include negative and positive factors driving resilience to understand and precisely quantify the impact of disruptive events and restorative activities. A hybrid feature selection approach is also applied to identify the most relevant covariates. Goodness of fit measures are calculated to evaluate the models, including (i) mean squared error, (ii) predictive-ratio risk, and (iii) adjusted R squared. The results indicate that LSTM models outperform ANN and RNN models while requiring fewer neurons in the hidden layer in most of the data sets considered. In many cases, ANN models performed better than RNNs but required more time to be trained. These results suggest that neural network models for predictive resilience are both feasible and accurate relative to traditional statistical methods and may find practical use in many important domains. | Add to Speakers | |
48 | Zakaria Faddi Master Student, University of Massachusetts Dartmouth |
Speed Presentation No Publish |
2 | Application of Software Reliability and Resilience Models to Machine Learning | Sharing Analysis Tools, Methods, and Collaboration Strategies | Machine Learning (ML) systems such as Convolutional Neural Networks (CNNs) are susceptible to Read More adversarial scenarios. In these scenarios, an attacker attempts to manipulate or deceive a machine learning model by providing it with malicious input. This can result in the model making incorrect predictions or decisions, which can have severe consequences in applications such as security, healthcare, and finance, necessitating quantitative reliability and resilience evaluation of ML algorithms. Failure in the ML algorithm can lead not just to failures in the application domain but also to failures in the system to which it provides functionality, which may have a performance requirement; hence the need for the application of software reliability and resilience methods. This talk demonstrates the applicability of software reliability and resilience tools to ML algorithms, providing an objective approach to assess recovery after degradation from known adversarial attacks. The results indicate that software reliability growth models and tools can be used to monitor the performance and quantify the reliability and resilience of ML models in the many domains in which machine learning algorithms are applied. | Add to Speakers | |
49 | Christian Ellis Journeyman Fellow, Army Research Laboratory |
Speed Presentation No Publish |
3 | Utilizing Side Information alongside Human Demonstrations for Safe Robot Navigation | Advancing Test & Evaluation of Emerging and Prevalent Technologies | Rather than wait until the test and evaluation stage of a given system to evaluate safety, this talk Read More proposes a technique which explicitly considers safety constraints during the learning process while providing probabilistic guarantees on performance subject to the operational environment’s stochasticity. We provide evidence that such an approach results in an overall safer system than non-explicit counterparts in the context of wheeled robotic ground systems learning autonomous waypoint navigation from human demonstrations. Specifically, inverse reinforcement learning (IRL) provides a means by which humans can demonstrate desired behaviors for autonomous systems to learn environmental rewards (or inversely costs). The proposed presentation addresses two limitations of existing IRL techniques. First, previous algorithms require an excessive amount of data due to the information asymmetry between the expert and the learner. When a demonstrator avoids a state, it is not clear if it was because the state is sub-optimal or dangerous. The proposed talk explains how safety can be explicitly incorporated in IRL by using task specifications defined using linear temporal logic. Referred to as side information, this approach enables autonomous ground robots to avoid dangerous states both during training and evaluation. Second, previous IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs), which induce state uncertainty. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. The intrinsic nonconvexity of the underlying problem is managed in a scalable manner through a sequential linear programming scheme that guarantees local convergence. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the safety specifications while inducing similar behavior to the expert by leveraging the provided side information. | Add to Speakers | |
50 | Mark Bobinski Cadet, United States Military Academy |
Poster Presentation Publish |
1 | Predicting Success and Identifying Key Characteristics in Special Forces Selection | Sharing Analysis Tools, Methods, and Collaboration Strategies | The United States Military possesses special forces units that are entrusted to engage in the most Read More challenging and dangerous missions that are essential to fighting and winning the nation’s wars. Entry into special forces is based on a series of assessments called Special Forces Assessment and Selection (SFAS), which consists of numerous challenges that test a soldier’s mental toughness, physical fitness, and intelligence. Using logistic regression, random forest classification, and neural network classification, the researchers in this study aim to create a model that both accurately predicts whether a candidate passes SFAS and identifies which variables are significant indicators of passing selection. Logistic regression proved to be the most accurate model, while also highlighting physical fitness, military experience, and intellect as the most significant indicators associated with success. | Add to Speakers | |
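An illustrative sketch of the logistic-regression portion of the analysis; the predictor names, data, and pass/fail mechanism below are simulated stand-ins, not SFAS records.

```python
# Sketch: fit a logistic regression for a pass/fail outcome and inspect the
# coefficients as rough indicators of association. The predictor names and
# data are synthetic, not actual SFAS records.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "fitness_score": rng.normal(270, 20, n),
    "years_service": rng.integers(1, 12, n),
    "gt_score": rng.normal(115, 10, n),
})
logit = -40 + 0.1 * df["fitness_score"] + 0.2 * df["years_service"] + 0.08 * df["gt_score"]
df["passed"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="passed"), df["passed"], test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print(dict(zip(X_train.columns, model.coef_[0].round(3))))
```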
51 | Skyler Chauff Student, United States Military Academy at West Point |
Poster Presentation Publish |
2 | The Calculus of Mixed Meal Tolerance Test Trajectories | Sharing Analysis Tools, Methods, and Collaboration Strategies | BACKGROUND Post-prandial glucose response resulting from a mixed meal tolerance test is evaluated Read More from trajectory data of measured glucose, insulin, C-peptide, GLP-1 and other measurements of insulin sensitivity and β-cell function. In order to compare responses between populations or different composition of mixed meals, the trajectories are collapsed into the area under the curve (AUC) or incremental area under the curve (iAUC) for statistical analysis. Both AUC and iAUC are coarse distillations of the post-prandial curves and important properties of the curve structure are lost. METHODS Visual Basic Application (VBA) code was written to automatically extract seven different key calculus-based curve-shape properties of post-prandial trajectories (glucose, insulin, C-peptide, GLP-1) beyond AUC. Through two-sample t-tests, the calculus-based markers were compared between outcomes (reactive hypoglycemia vs. healthy) and against demographic information. RESULTS Statistically significant p-values (p < .01) between multiple curve properties in addition to AUC were found between each molecule studied and the health outcome of subjects based on the calculus-based properties of their molecular response curves. A model was created which predicts reactive hypoglycemia based on individual curve properties most associated with outcomes. CONCLUSIONS There is a predictive power using response curve properties that was not present using solely AUC. In future studies, the response curve calculus-based properties will be used for predicting diabetes and other health outcomes. In this sense, response-curve properties can predict an individual's susceptibility to illness prior to its onset using solely mixed meal tolerance test results. | Add to Speakers | |
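The AUC/iAUC calculation and a few of the curve-shape properties described above can be sketched directly in NumPy; the time grid, glucose values, and the particular shape features shown are illustrative choices, not the study's VBA implementation.

```python
# Sketch: compute AUC, incremental AUC (area above the baseline value), and
# two simple curve-shape features for a post-prandial glucose trajectory.
# The time grid and glucose values are illustrative.
import numpy as np

time = np.array([0, 15, 30, 45, 60, 90, 120])           # minutes after the meal
glucose = np.array([90, 130, 155, 140, 120, 100, 88])   # mg/dL

auc = np.trapz(glucose, time)                            # total area under the curve
iauc = np.trapz(np.clip(glucose - glucose[0], 0, None), time)  # area above baseline

peak = glucose.max()
time_to_peak = time[glucose.argmax()]
max_rise_rate = np.max(np.diff(glucose) / np.diff(time))  # steepest upward slope

print(f"AUC = {auc:.0f}, iAUC = {iauc:.0f}")
print(f"peak = {peak}, time to peak = {time_to_peak} min, "
      f"max rise rate = {max_rise_rate:.2f} mg/dL per min")
```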
52 | Dominic Rudakevych Cadet / Student, United States Military Academy |
Presentation Publish |
2 | An Evaluation Of Periodic Developmental Reviews Using Natural Language Processing | Improving the Quality of Test & Evaluation | As an institution committed to developing leaders of character, the United States Military Academy Read More (USMA) holds a vested interest in measuring character growth. One such tool, the Periodic Developmental Review (PDR), has been used by the Academy’s Institutional Effectiveness Office for over a decade. PDRs are written counseling statements evaluating how a cadet is developing with respect to his/her peers. The objective of this research was to provide an alternate perspective of the PDR system by using statistical and natural language processing (NLP) based approaches to find whether certain dimensions of PDR data were predictive of a cadet’s overall rating. This research implemented multiple NLP tasks and techniques, including sentiment analysis, named entity recognition, tokenization, part-of-speech tagging, and word2vec, as well as statistical models such as linear regression and ordinal logistic regression. The ordinal logistic regression model concluded PDRs with optional written summary statements had more predictable overall scores than those without summary statements. Additionally, those who wrote the PDR on the cadet (Self, Instructor, Peer, Subordinate) held strong predictive value towards the overall rating. When compared to a self-reflecting PDR, instructor-written PDRs were 62.40% more probable to have a higher overall score, while subordinate-written PDRs had a probability of improvement of 61.65%. These values were amplified to 70.85% and 73.12% respectively when considering only those PDRs with summary statements. These findings indicate that different writer demographics have a different understanding of the meaning of each rating level. Recommendations for the Academy would be implementing a forced distribution or providing a deeper explanation of overall rating in instructions. Additionally, no written language facets analyzed demonstrated predictive strength, meaning written statements do not introduce unwanted bias and could be made a required field for more meaningful feedback to cadets. | Add to Speakers | |
53 | Grant Parker Cadet, United States Military Academy |
Poster Presentation Publish |
1 | Using Multi-Linear Regression to Understand Cloud Properties’ Impact on Solar Radiance | Improving the Quality of Test & Evaluation | With solar energy being the most abundant energy source on Earth, it is no surprise that the Read More reliance on solar photovoltaics (PV) has grown exponentially in the past decade. The increasing costs of fossil fuels have made solar PV more competitive and renewable energy more attractive, and the International Energy Agency (IEA) forecasts that solar PV’s installed power capacity will surpass that of coal by 2027. Crucial to the management of solar PV power is the accurate forecasting of solar irradiance, which is heavily impacted by different types and distributions of clouds. Many studies have aimed to develop models that accurately predict the global horizontal irradiance (GHI) while accounting for the volatile effects of clouds; in this study, we aim to develop a statistical model that helps explain the relationship between various cloud properties and the solar radiance reflected by the clouds themselves. Using 2020 GOES-16 data from the GOES R-Series Advanced Baseline Imager (ABI), we investigated the effect that cloud optical depth, cloud top temperature, solar zenith angle, and look zenith angle had on cloud solar radiance while accounting for differing longitudes and latitudes. Using these variables as the explanatory variables, we developed a linear model using multi-linear regression that, when tested on untrained data sets from different days (same time of day as the training set), results in a coefficient of determination (R^2) between 0.70 and 0.75. Lastly, after analyzing the variables’ degree of contribution to the cloud solar radiance, we presented error maps that highlight areas where the model succeeds and fails in prediction accuracy. | Add to Speakers | |
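A minimal sketch of the multi-linear regression step described above, evaluated with R^2 on a held-out set; the variable names follow the abstract, but the data are simulated stand-ins for GOES-16 retrievals.

```python
# Sketch: multiple linear regression of cloud solar radiance on cloud
# properties, evaluated with R^2 on held-out rows. Simulated data only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "cloud_optical_depth": rng.gamma(2.0, 5.0, n),
    "cloud_top_temp_K": rng.normal(240, 15, n),
    "solar_zenith_deg": rng.uniform(10, 70, n),
    "look_zenith_deg": rng.uniform(0, 60, n),
})
df["radiance"] = (
    3.0 * np.log1p(df["cloud_optical_depth"])
    - 0.05 * (df["cloud_top_temp_K"] - 240)
    - 0.1 * df["solar_zenith_deg"]
    + rng.normal(0, 2.0, n)
)

train, test = df.iloc[:1500], df.iloc[1500:]
X_cols = ["cloud_optical_depth", "cloud_top_temp_K", "solar_zenith_deg", "look_zenith_deg"]

model = LinearRegression().fit(train[X_cols], train["radiance"])
pred = model.predict(test[X_cols])
print("held-out R^2:", round(r2_score(test["radiance"], pred), 3))
```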
54 | Madison McGovern Student, United States Military Academy |
Poster Presentation No Publish |
2 | Data Fusion: Using Data Science to Facilitate the Fusion of Multiple Streams of Data | Sharing Analysis Tools, Methods, and Collaboration Strategies | Today there are an increasing number of sensors on the battlefield. These sensors collect data that Read More includes, but is not limited to, images, audio files, videos, and text files. With today’s technology, the data collection process is strong, and there is a growing opportunity to leverage multiple streams of data, each coming in different forms. This project aims to take multiple types of data, specifically images and audio files, and combine them to increase our ability to detect and recognize objects. The end state of this project is the creation of an algorithm that utilizes and merges voice recordings and images to allow for easier recognition. Most research tends to focus on one modality or the other, but here we focus on the prospect of simultaneously leveraging both modalities for improved entity resolution. With regards to audio files, the most successful deconstruction and dimension reduction technique is a deep auto encoder. For images, the most successful technique is the use of a convolutional neural network. To combine the two modalities, we focused on two different techniques. The first was running each data source through a neural network and multiplying the resulting class probability vectors to capture the combined result. The second technique focused on running each data source through a neural network, extracting a layer from each network, concatenating the layers for paired image and audio samples, and then running the concatenated object through a fully connected neural network. | Add to Speakers | |
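A minimal sketch of the first fusion technique described above (multiplying the class-probability vectors from the two modality-specific networks); the class names and probability vectors are placeholders for the outputs of an image CNN and an audio autoencoder-based classifier.

```python
# Sketch: late fusion -- run each modality through its own classifier, then
# multiply the resulting class-probability vectors and renormalize.
import numpy as np

classes = ["truck", "helicopter", "generator"]

p_image = np.array([0.60, 0.25, 0.15])   # softmax output from the image model
p_audio = np.array([0.40, 0.10, 0.50])   # softmax output from the audio model

fused = p_image * p_audio
fused /= fused.sum()                     # renormalize to a probability vector

for name, p in zip(classes, fused):
    print(f"{name}: {p:.3f}")
print("fused prediction:", classes[int(np.argmax(fused))])
```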
55 | James P Theimer Operations Research Analyst/STAT Expert, Homeland Security Community of Best Practices |
Presentation No Publish |
2 | Comparison of Bayesian and Frequentist Methods for Regression | Sharing Analysis Tools, Methods, and Collaboration Strategies | Statistical analysis is typically conducted using either a frequentist or Bayesian approach. But Read More what is the impact of choosing one analysis method over another? This presentation will compare the results of both linear and logistic regression using Bayesian and frequentist methods. The data set combines information on simulated diffusion of material and anticipated background signal to imitate sensor output. The sensor is used to estimate the total concentration of material, and a threshold will be set such that the false alarm rate (FAR) due to the background is a constant. The regression methods are used to relate the probability of detection, for a given FAR, to predictor variables, such as the total amount of material released. The presentation concludes with a comparison of the similarities and differences between the two methods given the results. | Add to Speakers | |
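A small sketch of the flavor of comparison described: ordinary least squares versus a conjugate Bayesian linear regression with a Gaussian prior and assumed-known noise variance. The data, prior scale, and model form are illustrative and far simpler than the sensor/FAR analysis in the talk.

```python
# Sketch: frequentist (OLS) versus conjugate Bayesian linear regression on
# simulated data; prior and noise settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # intercept + amount released
beta_true = np.array([0.5, 1.2])
sigma = 2.0
y = X @ beta_true + rng.normal(0, sigma, n)

# Frequentist: ordinary least squares.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Bayesian: prior beta ~ N(0, tau^2 I), noise variance assumed known.
tau = 10.0
post_cov = np.linalg.inv(X.T @ X / sigma**2 + np.eye(2) / tau**2)
post_mean = post_cov @ (X.T @ y) / sigma**2
post_sd = np.sqrt(np.diag(post_cov))

print("OLS estimate:   ", beta_ols.round(3))
print("posterior mean: ", post_mean.round(3))
print("posterior 95% interval for slope:",
      (post_mean[1] - 1.96 * post_sd[1]).round(3),
      (post_mean[1] + 1.96 * post_sd[1]).round(3))
```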
56 | Naomi Edegbe , United States Military Academy |
Presentation Publish |
1 | Energetic Defect Characterizations | Improving the Quality of Test & Evaluation | Energetic defect characterization in munitions is a task requiring further refinement in military Read More manufacturing processes. Convolutional neural networks (CNN) have shown promise in defect localization and segmentation in recent studies. These studies suggest that we may utilize a CNN architecture to localize casting defects in X-ray images. The U.S. Armament Center has provided munition images for training to develop a system against MILSPEC requirements to identify and categorize defective munitions. In our approach, we utilize preprocessed munitions images and transfer learning from prior studies’ model weights to compare the localization accuracy of this dataset for application in the field. | Add to Speakers | |
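A minimal transfer-learning sketch in the spirit of the approach described above: load pretrained weights, freeze the backbone, and retrain a small head for defect classification. The backbone choice, two-class setup, and dummy data are assumptions (and the weights API assumes a recent torchvision); this is not the study's configuration.

```python
# Sketch: transfer learning for a defect / no-defect patch classifier.
import torch
import torch.nn as nn
from torchvision import models

# Pretrained ImageNet weights stand in for "prior studies' model weights".
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False          # keep the pretrained features fixed

backbone.fc = nn.Linear(backbone.fc.in_features, 2)   # defect vs. no defect

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of grayscale X-ray patches
# replicated to three channels to match the backbone's expected input.
images = torch.rand(8, 1, 224, 224).repeat(1, 3, 1, 1)
labels = torch.randint(0, 2, (8,))

optimizer.zero_grad()
loss = loss_fn(backbone(images), labels)
loss.backward()
optimizer.step()
print("loss after one step:", loss.item())
```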
57 | Justin Krometis Research Assistant Professor, Virginia Tech National Security Institute |
Presentation Publish |
1 | Avoiding Pitfalls in AI/ML Packages | Sharing Analysis Tools, Methods, and Collaboration Strategies | Recent years have seen an explosion in the application of artificial intelligence and machine Read More learning (AI/ML) to practical problems from computer vision to game playing to algorithm design. This growth has been mirrored and, in many ways, been enabled by the development and maturity of publicly-available software packages such as PyTorch and TensorFlow that make model building, training, and testing easier than ever. While these packages provide tremendous power and flexibility to users, and greatly facilitate learning and deploying AI/ML techniques, they and the models they provide are extremely complicated and as a result can present a number of subtle but serious pitfalls. This talk will present three examples from the presenter’s recent experience where obscure settings or bugs in these packages dramatically changed model behavior or performance – one from a classic deep learning application, one from training of a classifier, and one from reinforcement learning. These examples illustrate the importance of thinking carefully about the results that a model is producing and carefully checking each step in its development before trusting its output. | Add to Speakers | |
58 | Alexei Skurikhin scientist, Los Alamos National Laboratory |
Speed Presentation Publish |
2 | Post-hoc UQ of Deep Learning Models Applied to Remote Sensing Image Scene Classification | Solving Program Evaluation Challenges | Post-hoc Uncertainty Quantification of Deep Learning Models Applied to Remote Sensing Image Scene Read More Classification Steadily growing quantities of high-resolution UAV, aerial, and satellite imagery provide an exciting opportunity for global transparency and geographic profiling of activities of interest. Advances in deep learning, such as deep convolutional neural networks (CNNs) and transformer models, offer more efficient ways to exploit remote sensing imagery. Transformers, in particular, are capable of capturing contextual dependencies in the data. Accounting for context is important because activities of interest are often interdependent and reveal themselves in co-occurrence of related image objects or related signatures. However, while transformers and CNNs are powerful models, their predictions are often taken as point estimates, also known as pseudo probabilities, as they are computed by the softmax function. They do not provide information about how confident the model is in its predictions, which is important information in many mission-critical applications, and therefore limits their use in this space. Model evaluation metrics can provide information about the predictive model’s performance. We present and discuss results of post-hoc uncertainty quantification (UQ) of deep learning models, i.e., UQ application to trained models. We consider an application of CNN and transformer models to remote sensing image scene classification using satellite imagery, and compare confidence estimates of scene classification predictions of these models using evaluation metrics, such as expected calibration error, reliability diagram, and Brier score, in addition to conventional metrics, e.g. accuracy and F1 score. For validation, we use the publicly available and well-characterized Remote Sensing Image Scene Classification (RESISC45) dataset, which contains 31,500 images, covering 45 scene categories with 700 images in each category, and with the spatial resolution that varies from 30 to 0.2 m per pixel. This dataset was collected over different locations and under different conditions and possesses rich variations in translation, viewpoint, object pose and appearance, spatial resolution, illumination, background, and occlusion. | Add to Speakers | |
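The calibration metrics named above can be sketched in a few lines; the synthetic predictions below stand in for CNN or transformer outputs on RESISC45, and the 10-bin ECE definition is one common formulation.

```python
# Sketch: expected calibration error (ECE) with equal-width confidence bins
# and the multiclass Brier score for a set of model predictions.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 1000, 5

# Fake softmax outputs and true labels standing in for model predictions.
logits = rng.normal(size=(n_samples, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, n_classes, n_samples)

confidence = probs.max(axis=1)
predicted = probs.argmax(axis=1)
correct = (predicted == labels).astype(float)

# Expected calibration error with 10 equal-width bins.
bins = np.linspace(0.0, 1.0, 11)
ece = 0.0
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (confidence > lo) & (confidence <= hi)
    if in_bin.any():
        gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
        ece += in_bin.mean() * gap

# Multiclass Brier score: mean squared error against one-hot labels.
one_hot = np.eye(n_classes)[labels]
brier = np.mean(np.sum((probs - one_hot) ** 2, axis=1))

print(f"ECE = {ece:.3f}, Brier score = {brier:.3f}")
```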
59 | Karen O’Brien Principal Data Scientist, Modern Technology Solutions, Inc |
Presentation Publish |
3 | Reinforcement Learning Approaches to the T&E of AI/ML-based Systems Under Test | Advancing Test & Evaluation of Emerging and Prevalent Technologies | Designed experiments provide an efficient way to sample the complex interplay of essential factors Read More and conditions during operational testing. Analysis of these designs provides more detailed and rigorous insight into the system under test’s (SUT) performance than top-level summary metrics provide. The introduction of artificial intelligence and machine learning (AI/ML) capabilities in SUTs creates a challenge for test and evaluation because the factors and conditions that constitute the AI SUT’s “feature space” are more complex than those of a mechanical SUT. Executing the equivalent of a full-factorial design quickly becomes infeasible. This presentation will demonstrate an approach to efficient, yet rigorous, exploration of the AI/ML-based SUT’s feature space that achieves many of the benefits of a traditional design of experiments – allowing more operationally meaningful insight into the strengths and limitations of the SUT than top-level AI summary metrics (like ‘accuracy’) provide. The approach uses an algorithmically defined search method within a reinforcement learning-style test harness for AI/ML SUTs. An adversarial AI (or AI critic) efficiently traverses the feature space and maps the resulting performance of the AI/ML SUT. The process identifies interesting areas of performance that would not otherwise be apparent in a roll-up metric. Identifying ‘toxic performance regions’, in which combinations of factors and conditions result in poor model performance, provides critical operational insights for both testers and evaluators. The process also enables T&E to explore the SUT’s sensitivity and robustness to changes in inputs and the boundaries of the SUT’s performance envelope. Feedback from the critic can be used by developers to improve the AI/ML SUT and by evaluators to interpret in terms of effectiveness, suitability, and survivability. This procedure can be used for white box, grey box, and black box testing. | Add to Speakers | |
61 | Nelson Walker Mathematical Statistician, USAF |
Presentation Publish |
1 | Under Pressure? Using Unsupervised Machine Learning for Classification May Help | Improving the Quality of Test & Evaluation | Classification of fuel pressure states is a topic of aerial refueling that is open to interpretation Read More from subject matter experts when primarily visual examination is utilized. Fuel pressures are highly stochastic, so there are often differences in classification based on the experience level and judgment calls of particular engineers. This hurts reproducibility and defensibility between test efforts, in addition to being highly time-consuming. The Pruned Exact Linear Time (PELT) changepoint detection algorithm is an unsupervised machine learning method that has shown promise for producing consistent and reproducible classifications. This technique combined with classification rules shows promise to classify oscillatory behavior, transient spikes, and steady states, all while having malleable features that can adjust the sensitivity to identify key chunks of fuel pressure states across multiple receivers and tankers. | Add to Speakers | |
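A minimal sketch of PELT changepoint detection using the open-source ruptures package (one available implementation, not necessarily the one used in this work); the signal, cost model, and penalty value are illustrative.

```python
# Sketch: changepoint detection on a noisy pressure-like signal with PELT.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(0)
signal = np.concatenate([
    rng.normal(50, 1.0, 200),    # steady state
    rng.normal(65, 4.0, 150),    # oscillatory / elevated segment
    rng.normal(48, 1.0, 200),    # return to steady state
])

algo = rpt.Pelt(model="l2", min_size=20).fit(signal)
breakpoints = algo.predict(pen=200)   # penalty tuned to control sensitivity
print("detected segment boundaries:", breakpoints)
```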
62 | Michael Thompson Research Associate, Naval Postgraduate School |
Presentation No Publish |
2 | Cyber Testing Embedded Systems with Digital Twins | Sharing Analysis Tools, Methods, and Collaboration Strategies | Dynamic cyber testing and analysis require instrumentation to facilitate measurements, e.g., to Read More determine which portions of code have been executed, or to detect anomalous conditions that might not manifest at the system interface. However, instrumenting software causes execution to diverge from the execution of the deployed binaries. Instrumentation also requires mechanisms for storing and retrieving testing artifacts on target systems. RESim is a dynamic testing and analysis platform that does not instrument software. Instead, RESim instruments high fidelity models of target hardware upon which software-under-test executes, providing detailed insight into program behavior. Multiple modeled computer platforms run within a single simulation that can be paused, inspected, and run forward or backward to selected events such as the modification of a specific memory address. Integration of Google’s AFL fuzzer with RESim avoids the need to create fuzzing harnesses because programs are fuzzed in their native execution environment, commencing from selected execution states with data injected directly into simulated memory instead of I/O streams. RESim includes plugins for the IDA Pro and NSA’s Ghidra disassembler/debuggers to facilitate interactive analysis of individual processes and threads, providing the ability to skip to selected execution states (e.g., a reference to an input buffer) and “reverse execution” to reach a breakpoint by appearing to run backwards in time. RESim simulates networks of computers through use of Wind River’s Simics platform of high fidelity models of processors, peripheral devices (e.g., network interface cards), and memory. The networked simulated computers load and run firmware and software from images extracted from the physical systems being tested. Instrumenting the simulated hardware allows RESim to observe software behavior from the other side of the hardware, i.e., without affecting its execution. Simics includes tools to extend and create high fidelity models of processors and devices, providing a clear path to deploying and managing digital twins for use in developmental test and evaluation. The simulations can include optional real-world network and bus interfaces to facilitate integration into networks and test ranges. Simics is a COTS product that runs on commodity hardware and is able to execute several parallel instances of complex multi-component systems on a typical engineering workstation or server. This presentation will describe RESim and strategies for using digital twins for cyber testing of embedded systems. The presentation will also discuss some of the challenges associated with fuzzing non-trivial software systems. | Mark Herrera | Add to Speakers |
63 | Chris Gotwalt JMP Chief Data Scientist, JMP |
Presentation No Publish |
2 | Introducing Self-Validated Ensemble Models (SVEM) – Bringing Machine Learning to DOEs | Advancing Test & Evaluation of Emerging and Prevalent Technologies | DOE methods have evolved over the years, as have the needs and expectations of experimenters. Read More Historically, the focus emphasized separating effects to reduce bias in effect estimates and maximizing hypothesis-testing power, which are largely a reflection of the methodological and computational tools of their time. Often DOE in industry is done to predict product or process behavior under possible changes. We introduce Self-Validated Ensemble Models (SVEM), an inherently predictive algorithmic approach to the analysis of DOEs, generalizing the fractional bootstrap to make machine learning and bagging possible for small datasets common in DOE. In many DOE applications the number of rows is small, and the factor layout is carefully structured to maximize information gain in the experiment. Applying machine learning methods to DOE is generally avoided because they begin by partitioning the rows into a training set for model fitting and a holdout set for model selection. This alters the structure of the design in undesirable ways such as randomly introducing effect aliasing. SVEM avoids this problem by using a variation of the fractionally weighted bootstrap to create training and validation versions of the complete data that differ only in how rows are weighted. The weights are reinitialized, and the models are refit multiple times so that our final SVEM model is a model average, much like bagging. We find this allows us to fit models where the number of estimated effects exceeds the number of rows. We will present simulation results showing that in these supersaturated cases SVEM outperforms existing approaches like forward selection as measured by prediction accuracy. | Tom Donnelly | Add to Speakers |
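A rough sketch of the SVEM idea as described above, under stated assumptions: anti-correlated fractional training and validation weights drawn from the same uniform draw (an assumed weighting scheme), a lasso base learner standing in for the actual SVEM model-fitting procedure, and an alpha grid chosen only for illustration.

```python
# Sketch: fractionally weighted bootstrap with anti-correlated training /
# validation weights, model selection on weighted validation loss, and a
# final model average. All settings here are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 16, 10                               # more candidate effects than a small DOE supports cleanly
X = rng.uniform(-1, 1, size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, n)

alphas = np.logspace(-3, 0, 20)
n_boot = 200
predictions = np.zeros((n_boot, n))

for b in range(n_boot):
    u = rng.uniform(size=n)
    w_train = -np.log(u)                    # fractional bootstrap weights
    w_valid = -np.log(1.0 - u)              # anti-correlated validation weights

    best_alpha, best_loss = None, np.inf
    for alpha in alphas:
        model = Lasso(alpha=alpha, max_iter=10000)
        model.fit(X, y, sample_weight=w_train)
        resid = y - model.predict(X)
        loss = np.sum(w_valid * resid**2)   # weighted validation loss
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss

    best = Lasso(alpha=best_alpha, max_iter=10000).fit(X, y, sample_weight=w_train)
    predictions[b] = best.predict(X)

ensemble_prediction = predictions.mean(axis=0)   # model-averaged fit
print("in-sample RMSE of ensemble:",
      np.sqrt(np.mean((y - ensemble_prediction) ** 2)).round(3))
```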
64 | Alexander Malburg Data Analyst, AFOTEC/EX |
Presentation No Publish |
1 | Skyborg Data Pipeline | Sharing Analysis Tools, Methods, and Collaboration Strategies | The purpose of the Skyborg Data Pipeline is to allow for the rapid turnover of flight data collected Read More during a test event, using collaborative, easily accessed tool sets available in the AFOTEC Data Vault. Ultimately the goal of this data pipeline is to provide a working, up-to-date dashboard that leadership can utilize shortly after a test event. | AFOTEC | Add to Speakers |
65 | Justin Krometis Research Assistant Professor, Virginia Tech National Security Institute |
Presentation Publish |
1 | Dodging Pitfalls in Popular AI/ML Packages | Sharing Analysis Tools, Methods, and Collaboration Strategies | Recent years have seen an explosion in the application of artificial intelligence and machine Read More learning (AI/ML) to practical problems from computer vision to game playing to algorithm design. This growth has been mirrored and, in many ways, been enabled by the development and maturity of publicly-available software packages such as PyTorch and TensorFlow that make model building, training, and testing easier than ever. While these packages provide tremendous power and flexibility to users, and greatly facilitate learning and deploying AI/ML techniques, they and the models they provide are extremely complicated and as a result can present a number of subtle but serious pitfalls. This talk will present three examples from the presenter’s recent experience where obscure settings or bugs in these packages dramatically changed model behavior or performance – one from a classic deep learning application, one from training of a classifier, and one from reinforcement learning. These examples illustrate the importance of thinking carefully about the results that a model is producing and carefully checking each step in its development before trusting its output. | Laura Freeman | Add to Speakers |
66 | Tyler Morgan-Wall Research Staff Member, IDA |
Mini-Tutorial No Publish |
1 | Introduction to Design of Experiments in R: Generating and Evaluating Designs with skpr | Sharing Analysis Tools, Methods, and Collaboration Strategies | The Department of Defense requires rigorous testing to support the evaluation of effectiveness and Read More suitability of oversight acquisition programs. These tests are performed in a resource constrained environment and must be carefully designed to efficiently use those resources. The field of Design of Experiments (DOE) provides methods for testers to generate optimal experimental designs taking these constraints into account, and computational tools in DOE can support this process by enabling analysts to create designs tailored specifically for their test program. In this tutorial, I will show how you can run these types of analyses using “skpr”: a free and open source R package developed by researchers at IDA for generating and evaluating optimal experimental designs. This software package allows you to perform DOE analyses entirely in code; rather than using a graphical user interface to generate and evaluate individual designs one-by-one, this tutorial will demonstrate how an analyst can use “skpr” to automate the creation of a variety of different designs using a short and simple R script. Attendees will learn the basics of using the R programming language and how to generate, save, and share their designs. Additionally, “skpr” provides a straightforward interface to calculate statistical power. Attendees will learn how to use built-in parametric and Monte Carlo power evaluation functions to compute power for a variety of models and responses, including linear models, split-plot designs, blocked designs, generalized linear models (including logistic regression), and survival models. Finally, I will demonstrate how you can conduct an end-to-end DOE analysis entirely in R, showing how to generate power versus sample size plots and other design diagnostics to help you design an experiment that meets your program’s needs. | Rebecca Medlin | Add to Speakers |
67 | Jouni Susiluoto Data Scientist, Jet Propulsion Laboratory, California Institute of Technology |
Presentation No Publish |
3 | Large-scale cross-validated Gaussian processes for efficient multi-purpose emulators | Sharing Analysis Tools, Methods, and Collaboration Strategies | We describe recent advances in Gaussian process emulation, which allow us both to save computation Read More time and to apply inference algorithms that previously were too expensive for operational use. Specific examples are given from the Orbiting Carbon Observatory and the future Surface Biology and Geology missions, as well as dynamical systems and other applications. While Gaussian processes are a well-studied field, there are surprisingly important choices that the community has not paid much attention to thus far, including dimension reduction, kernel parameterization, and objective function selection. This talk will highlight some of those choices and help the audience understand their practical implications. | Kelly McCoy | Add to Speakers |
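A small Gaussian process emulator sketch using scikit-learn; the toy "simulator", kernel choice, and design points are illustrative and unrelated to the missions named above.

```python
# Sketch: emulate an expensive forward model with a Gaussian process.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_simulator(x):
    return np.sin(3 * x) + 0.5 * x          # stand-in for a costly forward model

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 3, size=(25, 1))    # small training design
y_train = expensive_simulator(X_train).ravel()

kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

X_new = np.linspace(0, 3, 7).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)   # emulator mean and uncertainty
for x, m, s in zip(X_new.ravel(), mean, std):
    print(f"x = {x:.2f}: emulator = {m:+.3f} +/- {2 * s:.3f}, "
          f"truth = {expensive_simulator(x):+.3f}")
```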
68 | Daniel Lee Cadet, US Military Academy West Point |
Poster Presentation No Publish |
1 | Assessing Risk with Cadet Candidates and USMA Admissions | Sharing Analysis Tools, Methods, and Collaboration Strategies | Though the United States Military Academy (USMA) graduates approximately 1,000 cadets annually, over Read More 100 cadets from the initial cohort fail to graduate and are separated or resign at great expense to the federal government. Graduation risk among incoming cadet candidates is difficult to measure; based on current research, the strongest predictors of college graduation risk are high school GPA and, to a lesser extent, standardized test scores. Other predictors include socioeconomic factors, demographics, culture, and measures of prolonged and active participation in extra-curricular activities. For USMA specifically, a cadet candidate’s Whole Candidate Score (WCS), which includes measures to score leadership and physical fitness, has historically proven to be a promising predictor of a cadet’s performance at USMA. However, predicting graduation rates and identifying risk variables still proves to be difficult. Using data from the USMA Admissions Department, we used logistic regression, k-Nearest Neighbors, random forests, and gradient boosting algorithms to better predict which cadets would be separated or resign using potential variables that may relate to graduation risk. Using measures such as p-values for statistical significance, correlation coefficients, and the Area Under the Curve (AUC) scores to determine true positives, we found supplementing the current admissions criteria with data on the participation of certain extra-curricular activities improves prediction rates on whether a cadet will graduate. | Rebecca M. Medlin | Add to Speakers |
69 | Abraham Holland Research Staff Member, Institute for Defense Analyses |
Presentation Publish |
1 | Predicting Aircraft Load Capacity Using Regional Climate Data | Sharing Analysis Tools, Methods, and Collaboration Strategies | While the impact of local weather conditions on aircraft performance is well-documented, climate Read More change has the potential to create long-term shifts in aircraft performance. Using just one metric, internal load capacity, we document operationally relevant performance changes for a mock CH-53E Super Stallion within the Indo-Pacific region. This presentation uses publicly available climate and aircraft performance data to create a representative analysis. The underlying methodology can be applied at varying geographic resolutions, timescales, airframes, and aircraft performance characteristics across the entire globe. | John Dennis III | Add to Speakers |
70 | Curtis Miller Research Staff Member, Institute for Defense Analyses |
Presentation Publish |
1 | Recommendations for statistical analysis of modeling and simulation environment outputs | Sharing Analysis Tools, Methods, and Collaboration Strategies | Modeling and simulation (M&S) environments feature frequently in test and evaluation (T&E) Read More of Department of Defense (DoD) systems. Testers may generate outputs from M&S environments more easily than collecting live test data, but M&S outputs nevertheless take time to generate, cost money, require training to generate, and are accessed directly only by a select group of individuals. Nevertheless, many M&S environments do not suffer many of the resourcing limitations associated with live test. We thus recommend testers apply higher resolution output generation and analysis techniques compared to those used for collecting live test data. Doing so will maximize stakeholders’ understanding of M&S environments’ behavior and help utilize its outputs for activities including M&S verification, validation, and accreditation (VV&A), live test planning, and providing information for non-T&E activities. This presentation provides recommendations for collecting outputs from M&S environments such that a higher resolution analysis can be achieved. Space filling designs (SFDs) are experimental designs intended to fill the operational space for which M&S predictions are expected. These designs can be coupled with statistical metamodeling techniques that estimate a model that flexibly interpolates or predicts M&S outputs and their distributions at both observed settings and unobserved regions of the operational space. Analysts can use the resulting metamodels as a surrogate for M&S outputs in situations where the M&S environment cannot be deployed. They can also study metamodel properties to decide if a M&S environment adequately represents the original systems. IDA has published papers recommending specific space filling design and metamodeling techniques; this presentation briefly covers the content of those papers. | Rebecca Medlin | Add to Speakers |
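A minimal sketch of generating a space-filling design with SciPy's quasi-Monte Carlo module; the factor names, ranges, and run count are hypothetical, and the specific design and metamodeling recommendations in the cited IDA papers are not reproduced here.

```python
# Sketch: a Latin hypercube space-filling design over three notional factors,
# scaled to operational ranges, with a discrepancy measure of uniformity.
import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=1)
unit_design = sampler.random(n=40)                     # 40 runs in [0, 1]^3

# Notional factors: target range (km), closing speed (m/s), sensor noise level.
lower = [5.0, 100.0, 0.0]
upper = [50.0, 600.0, 1.0]
design = qmc.scale(unit_design, lower, upper)

print("centered discrepancy:", round(qmc.discrepancy(unit_design), 4))
print("first three runs:\n", np.round(design[:3], 2))
```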
71 | Maximillian Chen Senior Professional Staff, Johns Hopkins University Applied Physics Laboratory |
Presentation Publish |
3 | Estimating Sparsely and Irregularly Observed Multivariate Functional Data | Sharing Analysis Tools, Methods, and Collaboration Strategies | With the rise in availability of larger datasets, there is a growing need of tools and methods to Read More help inform data-driven decisions. Data that vary over a continuum, such as time, exist in a wide array of fields, such as defense, finance, and medicine. One such class of methods that addresses data varying over a continuum is functional data analysis (FDA). FDA methods typically make three assumptions that are often violated in real datasets: all observations exist over the same continuum interval (such as a closed interval [a,b]), all observations are regularly and densely observed, and if the dataset consists of multiple covariates, the covariates are independent of one another. We look to address violation of the latter two assumptions. In this talk, we will discuss methods for analyzing functional data that are irregularly and sparsely observed, while also accounting for dependencies between covariates. These methods will be used to estimate the reconstruction of partially observed multivariate functional data that contain measurement errors. We will begin with a high-level introduction of FDA. Next, we will introduce functional principal components analysis (FPCA), which is a representation of functions that our estimation methods are based on. We will discuss a specific approach called principal components analysis through conditional expectation (PACE) (Yao et al, 2005), which computes the FPCA quantities for a sparsely or irregularly sampled function. The PACE method is a key component that allows us to estimate partially observed functions based on the available dataset. Finally, we will introduce multivariate functional principal components analysis (MFPCA) (Happ & Greven, 2018), which utilizes the FPCA representations of each covariate’s functions in order to compute a principal components representation that accounts for dependencies between covariates. We will illustrate these methods through implementation on simulated and real datasets. We will discuss our findings in terms of the accuracy of our estimates with regards to the amount and portions of a function that is observed, as well as the diversity of functional observations in the dataset. We will conclude our talk with discussion on future research directions. | Elise Roberts | Add to Speakers |
72 | Hans Miller Department Chief Engineer, MITRE |
Presentation Publish |
1 | NASEM Range Capabilities Study and T&E of Multi-Domain Operations | Advancing Test & Evaluation of Emerging and Prevalent Technologies | The future viability of DoD’s range enterprise depends on addressing dramatic changes in technology, Read More rapid advances in adversary military capabilities, and the evolving approach the United States will take to closing kill chains in a Joint All Domain Operations environment. This recognition led DoD’s former Director of Operational Test and Evaluation (OT&E), the Honorable Robert Behler, to request that the National Academies of Science, Engineering and Medicine examine the physical and technical suitability of DoD’s ranges and infrastructure through 2035. The first half of this presentation will cover the highlights and key recommendations of this study, to include the need to create the “TestDevOps” digital infrastructure for future operational test and seamless range enterprise interoperability. The second half of this presentation looks at the legacy frameworks for the relationships of physical and virtual test capabilities, and how those frameworks are becoming outdated. This briefing explores proposals on how the interaction of operations, physical test capabilities, and virtual test capabilities need to evolve to support new paradigms of the rapidly evolving technologies and changing nature of multi-domain operations. | Dr Jeremy Werner | Add to Speakers |
73 | Yvan Christophe General Engineer, CCDC-AC |
Poster Presentation Publish |
1 | Multi-Digit Serial Number Recognition in X-ray Images | Sharing Analysis Tools, Methods, and Collaboration Strategies | This paper presents a methodology for multi-digit serial number recognition using deep learning and Read More image processing. The samples used in this paper come from digital X-ray images of munition energetics. These munitions need to be labeled properly so that the X-ray images can be cross-checked and validated with their respective inspection reports for future data analysis. The task of identifying and labeling serial numbers within the sampled X-ray images can be challenging due to the presence of a myriad of factors. Certain digits within our samples of X-ray images have inconsistent placement, brightness, contrast, and orientation. To address this challenge this paper will focus on several key steps, including image binarization, identifying and extracting regions of interest, and the use of the YOLO (You Only Look Once) algorithm. Additionally, this paper will leverage data from associated inspection reports to make use of prediction similarity scores from our model. The approach’s accuracy will depend on the precise location of each region of interest in each X-ray image, and the confidence level set across all number digit predictions. The results from the proposed method will demonstrate the accuracy of recognizing serial numbers in X-ray images relative to the samples associated with this study. This method can be utilized in various settings such as military and industrial environments, where accurate identification of serial numbers is crucial for safety and security purposes. Overall, this paper will provide a valuable contribution to the field of image processing, energetics, and deep learning, specifically in the context of recognizing and predicting multi-digit serial numbers in X-ray images. | Kelly M Avery | Add to Speakers |
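An illustrative sketch of the binarization and region-of-interest extraction steps described above, using OpenCV on a synthetic image; thresholds and size filters are assumptions, and the YOLO digit-recognition stage that would follow is not shown.

```python
# Sketch: binarize a grayscale image and extract candidate regions of interest.
import cv2
import numpy as np

# Synthetic image: dark background with a few bright rectangular "digits"
# standing in for an X-ray frame.
image = np.zeros((200, 400), dtype=np.uint8)
for x in (40, 90, 140):
    cv2.rectangle(image, (x, 80), (x + 30, 130), color=200, thickness=-1)

# Binarize with Otsu's method to separate bright marks from background.
_, binary = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Extract candidate regions of interest from the contours.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rois = []
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    if w * h > 100:                      # ignore tiny specks
        rois.append((x, y, w, h))

print("candidate digit regions (x, y, w, h):", sorted(rois))
```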
74 | Leonard Truett Senior STAT Expert, STAT COE |
Mini-Tutorial No Publish |
1 | An Overview of Methods, Tools, and Test Capabilities for T&E of Autonomous Systems | Advancing Test & Evaluation of Emerging and Prevalent Technologies | This tutorial will give an overview of selected methodologies, tools and test capabilities discussed Read More in the draft “Test and Evaluation Companion Guide for Autonomous Military Systems.” This test and evaluation (T&E) companion guide is being developed to provide guidance to test and evaluation practitioners, to include program managers, test planners, test engineers, and analysts with test strategies, applicable methodologies, and tools that will help to improve rigor in addressing the challenges unique to the T&E of autonomy. It will also cover selected capabilities of test laboratories and ranges that support autonomous systems. The companion guide is intended to be a living document contributed by the entire community and will adapt to ensure the right information reaches the right audience. | MJ Seick | Add to Speakers |
75 | Buck Thome Research Staff Member, Institute for Defense Analyses |
Poster Presentation No Publish |
1 | Overarching Tracker of DOT&E Actions | Improving the Quality of Test & Evaluation | OED’s Overarching Tracker of DOT&E Actions distills information from DOT&E’s operational Read More test reports and memoranda on test plan and test strategy approvals to generate informative metrics on the office’s activities. In FY22, DOT&E actions covered 68 test plans, 28 strategies, and 28 reports, relating to 74 distinct programs. This poster presents data from those documents and highlights findings on DOT&E’s effectiveness, suitability, and survivability determinations and other topics related to the state of T&E. | Rebecca Medlin | Add to Speakers |
76 | Matthew Avery Assistant Director, IDA |
Mini-Tutorial Publish |
1 | Data Management for Research, Development, Test, and Evaluation | Sharing Analysis Tools, Methods, and Collaboration Strategies | It is important to manage data from research, development, test, and evaluation effectively. Read More Well-managed data makes research more efficient and promotes better analysis and decision-making. At present, numerous federal organizations are engaged in large-scale reforms to improve the way they manage their data, and these reforms are already affecting the way research is executed. Data management affects every part of the research process. Thoughtful, early planning sets research projects on the path to success by ensuring that the resources and expertise required to effectively manage data throughout the research process are in place when they are needed. This interactive tutorial will discuss the planning and execution of data management for research projects. Participants will build a data management plan, considering data security, organization, metadata, reproducibility, and archiving. By the conclusion of the tutorial, participants will be able to define data management and understand its importance, understand how the data lifecycle relates to the research process, and be able to build a data management plan. | Kelly Avery | Add to Speakers |
77 | Jeremy Werner Chief Scientist, DOT&E |
Presentation Publish |
1 | DOT&E Strategic Initiatives, Policy, and Emerging Technologies (SIPET) Mission Brief | Advancing Test & Evaluation of Emerging and Prevalent Technologies | SIPET, established in 2021, is a deputate within the office of the Director, Operational Test and Read More Evaluation (DOT&E). DOT&E created SIPET to codify and implement the Director’s strategic vision and keep pace with science and technology to modernize T&E tools, processes, infrastructure, and workforce. That is, the mission of SIPET is to drive continuous innovation to meet the T&E demands of the future; support the development of a workforce prepared to meet the toughest T&E challenges; and nurture a culture of information exchange across the enterprise and update policy and guidance. SIPET proactively identifies current and future operational test and evaluation needs, gaps, and potential solutions in coordination with the Services and agency operational test organizations. Collaborating with numerous stakeholders, SIPET develops and refines operational test policy guidance that supports new test methodologies and technologies in the acquisition and test communities. SIPET, in collaboration with the T&E community, is leading the development of the 2022 DOT&E Strategy Update Implementation Plan (I-Plan). I-Plan initiatives include: • Test the Way We Fight – Architect T&E around validated mission threads and demonstrate the operational performance of the Joint Force in multi-domain operations. • Accelerate the Delivery of Weapons that Work – Embrace digital technologies to deliver high-quality systems at more dynamic rates. • Increase the Survivability of DOD in Contested Environments – Identify, assess, and act on cyber, electromagnetic, spectrum, space, and other risks to the DOD mission – at scale and at speed. • Pioneer T&E of Weapons Systems Built to Change Over Time – Implement fluid and iterative T&E across the entire system lifecycle to help assure continued combat credibility as the system evolves to meet warfighter needs. • Foster an Agile and Enduring T&E Enterprise Workforce – Centralize and leverage efforts to assess, curate, and engage T&E talent to quicken the pace of innovation across the T&E enterprise. | self | Add to Speakers |
78 | Jeremy Werner Chief Scientist, DOT&E |
Mini-Tutorial Publish |
1 | The Automaton General-Purpose Data Intelligence Platform | Sharing Analysis Tools, Methods, and Collaboration Strategies | The Automaton general-purpose data intelligence platform abstracts data analysis out to a high level Read More and automates many routine analysis tasks while being highly extensible and configurable – enabling complex algorithms to elucidate mission-level effects. Automaton is built primarily on top of R Project and its features enable analysts to build charts and tables, calculate aggregate summary statistics, group data, filter data, pass arguments to functions, generate animated geospatial displays for geospatial time series data, flatten time series data into summary attributes, fit regression models, create interactive dashboards, and conduct rigorous statistical tests. All of these extensive analysis capabilities are automated and enabled from an intuitive configuration file requiring no additional software code. Analysts or software engineers can easily extend Automaton to include new algorithms, however. Automaton’s development was started at Johns Hopkins University Applied Physics Laboratory in 2018 to support an ongoing military mission and perform statistically rigorous analyses that use Bayesian-inference-based Artificial Intelligence to elucidate mission-level effects. Automaton has unfettered Government Purpose Rights and is freely available. One of DOT&E’s strategic science and technology thrusts entails automating data analyses for Operational Test & Evaluation as well as developing data analysis techniques and technologies targeting mission-level effects; Automaton will be used, extended, demonstrated/trained on, and freely shared to accomplish these goals and collaborate with others to drive our Department’s shared mission forward. This tutorial will provide an overview of Automaton’s capabilities as well as instruction on how to install and use the platform. Installation instructions will be provided ahead of the tutorial, and depend upon the user installing Windows Subsystem for Linux or having access to another Unix environment (e.g., macOS). | self | Add to Speakers |
79 | Tom Johnson Research Staff Member, IDA |
Poster Presentation Publish |
2 | The Component Damage Vector Method: A statistically rigorous method for validating AJEM us | Sharing Analysis Tools, Methods, and Collaboration Strategies | As the Test and Evaluation community increasingly relies on Modeling and Simulation (M&S) to Read More supplement live testing, M&S validation has become critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV method compares components that were damaged in FUSL testing to simulated representations of that damage from AJEM. | Kelly Avery | Add to Speakers |