All Abstracts

Total Abstracts: 97
# Name / Org Type Level Theme Abstract Title / Abstract Invitation
1 Allison Holston
Mathematical Statistician, Army Evaluation Center
Poster Presentation
Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies
Improving Data Visualizations Through R Shiny
Shiny is an R package and framework used to build applications that run in a web browser, so the resulting apps can be used without any programming experience. This capability allows the power and functionality of R to be accessible to audiences that would typically not utilize R because of the roadblocks of learning a programming language. The Army Evaluation Center (AEC) has rapidly increased its use of R Shiny to develop apps for use in Test and Evaluation. These apps have allowed evaluators to upload data, perform calculations, and then create data visualizations that can be customized for their reporting needs. The apps have streamlined and standardized the analysis process. The poster outlines the before and after of three data visualizations built with the Shiny apps. The first displays survey data using the likert package. The second graphs a cyberattack cycle timeline. The third displays individual qualification course results and calculates qualification scores. The apps are hosted in a cloud environment, and their usage is tracked with an additional Shiny app.
2 Shane Hall
Division Chief – Analytics and Artificial Intelligence, Army Evaluation Center
Poster Presentation
No Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies
Functional Data Analysis of Radar Tracking Data
Functional data are an ordered series of data collected over a continuous scale, such as time or distance. The data are collected in ordered x,y pairs and can be viewed as a smoothed line with an underlying function. The Army Evaluation Center (AEC) has identified multiple instances where functional data analysis could have been applied, but instead evaluators used more traditional and/or less statistically rigorous methods to evaluate the data. One of these instances is radar tracking data. This poster highlights historical shortcomings in how AEC currently analyzes functional data such as radar tracking data, and our vision for future applications. Using notional data from a real radar example, the response of 3D track error is plotted against distance, where each function represents a unique run, with additional factors held constant throughout a given run. The example includes the selected model, functional principal components, the resulting significant factors, and summary graphics used for reporting. Additionally, the poster will highlight historical analysis methods and the improvements the functional data analysis method brings. The analysis and output from this poster will utilize JMP’s functional data analysis platform.
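As an illustration of the core idea (the poster's analysis itself uses JMP's functional data analysis platform), here is a minimal Python sketch of functional principal components computed from simulated track-error curves; the data, factor structure, and smoothing choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate notional 3D track-error curves: one row per run, one column per distance point.
distance = np.linspace(0, 100, 200)               # km
n_runs = 30
base = 5 + 0.04 * distance                        # error grows with distance
curves = np.stack([
    base + rng.normal(0, 1) * np.sin(distance / 15) + rng.normal(0, 0.3, distance.size)
    for _ in range(n_runs)
])

# Functional PCA via the singular value decomposition of the centered curves.
mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

fpc_scores = U * S                                # one score per run per component
explained = S**2 / np.sum(S**2)

print("Variance explained by first two FPCs:", explained[:2].round(3))
# The leading-component scores can then be modeled against run-level factors
# (e.g., with regression or ANOVA) to identify significant effects.
```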
4 Terril N Hurst
Senior Engineering Fellow, Raytheon
Presentation
Publish
1 Identifying opportunities to increase efficiency and effectiveness through strategic resource sharing
Re-visiting the DASE Axioms: A Bayesian Approach for Simulation Development and Usage
During the 2016 Conference on Applied Statistics in Defense (CASD), we presented a paper describing “The DASE Axioms.” The paper included several “divide-and-conquer” strategies for addressing the Curse of Dimensionality that is typical in simulated systems. Since then, a new, integrate-and-conquer approach has emerged, which applies decision-theoretic concepts from Bayesian Analysis (BA). This paper and presentation re-visit the DASE axioms from the perspective of BA. Over the past fifteen years, we have tailored and expanded conventional design-of-experiments (DOE) principles to take advantage of the flexibility that is offered by modeling, simulation, and analysis (MSA). The result is embodied within three high-level checklists: (a) the Model Description and Report (MDR) protocol enables iteratively developing credible models and simulations (M&S) for an evolving intended use; (b) the 7-step Design & Analysis of Simulation Experiments (DASE) protocol guides credible M&S usage; and (c) the Bayesian Analysis (BA) protocol enables fully quantifying the uncertainty that accumulates, both when building and when using M&S. When followed iteratively by all MSA stakeholders throughout the product lifecycle, the MSA protocols result in effective and efficient risk-informed decision making. The paper and presentation include several quantitative examples to show how the three MSA protocols interact. For example, we show how to use BA to combine simulation and field data for calibrating M&S. Thereafter, given a well-specified query, adaptive sampling is illustrated for optimizing usage of high-performance computing (HPC), either to minimize the resources required to answer a specific query, or to maximize HPC utilization within a fixed time period. The Bayesian approach to M&S development and usage reflects a shift in perspective, from viewing MSA as mainly a design tool to treating it as a digital test and evaluation venue. This change renders fully relevant all of the attendant operational constraints and associated risks regarding M&S scheduling, availability, cost, accuracy, and the delay in analyzing inappropriately large HPC data sets. The MSA protocols employ statistical models and other aspects of Scientific Test and Analysis Techniques (STAT) that are being taught and practiced within the operational test and evaluation community.
5 David Niblick
AI Evaluator, Army Test and Evaluation Command
Presentation
Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Development, Test, and Evaluation of Small-Scale Artificial Intelligence Models
As data becomes more commoditized across all echelons of the DoD, developing Artificial Intelligence (AI) solutions, even at small scales, offers significant opportunities for advanced data analysis and processing. However, these solutions require intimate knowledge of the data in question, as well as robust Test and Evaluation (T&E) procedures to ensure performance and trustworthiness. This paper presents a case study and recommendations for developing and evaluating small-scale AI solutions. The model automates an acoustic trilateration system. First, the system accurately identifies the precise times of acoustic events across a variable number of sensors using a neural network. It then matches the events across the sensors through a heuristic correspondence process. Finally, using the correspondences and difference of times, the system trilaterates a physical location. We find that even a relatively simple dataset requires extensive understanding at all phases of the process. Techniques like data augmentation and data synthesis, which must capture the unique attributes of the real data, were necessary both for improved performance and for robust T&E. The T&E metrics and pipeline required unique approaches to account for the AI solution, which lacked traceability and explainability. As leaders leverage the growing availability of AI tools to solve problems within their organizations, strong data analysis skills must remain at the core of the process.
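For readers unfamiliar with the final step, here is a hedged sketch of trilateration from time differences of arrival via nonlinear least squares; the sensor layout, arrival times, and propagation speed are illustrative, and the neural-network event detection and heuristic matching stages are not shown.

```python
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # assumed acoustic propagation speed in air, m/s

# Hypothetical sensor positions (m) and measured arrival times (s) for one matched event.
sensors = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
arrival_times = np.array([0.288, 0.222, 0.222, 0.124])

def tdoa_residuals(xy):
    """Residuals between predicted and measured time differences, relative to sensor 0."""
    ranges = np.linalg.norm(sensors - xy, axis=1)
    predicted_tdoa = (ranges - ranges[0]) / C
    measured_tdoa = arrival_times - arrival_times[0]
    return predicted_tdoa[1:] - measured_tdoa[1:]

solution = least_squares(tdoa_residuals, x0=sensors.mean(axis=0))
print("Estimated source location (m):", solution.x.round(1))
```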
Rachel Milliron
6 Adam Miller
RSM, IDA
Poster Presentation
Publish
1 Improving the Quality of Test & Evaluation
Poor survey design invalidates mean confidence intervals
Surveys play an important role in quantifying user opinion during test and evaluation. However, it can be difficult to analyze data that comes from custom surveys with unknown design properties. This is because surveys record continuous user opinions as discrete responses, and the fidelity of this transformation depends on the design of the survey. Here, I quantify the effects of three survey design properties—spacing, order, and categorizability—on the validity of mean confidence intervals (MCIs). To do this, I repeatedly simulated the survey process of transforming continuous user opinions into discrete survey responses, and then I evaluated the accuracy of MCIs computed from the survey responses. I found that for well-designed surveys, MCIs calculated from survey responses were similar to those computed from continuous user opinions, both in terms of their precision (MCI width) and accuracy (the proportion of true means captured). However, for poorly designed surveys—such as those with irregular, disordered, or overlapping response categories—the confidence intervals computed from survey responses were inaccurate and imprecise. As the survey design became more irregular or disordered, the MCIs became less accurate. Likewise, overlapping or disordered response categories decreased the precision of MCIs relative to those computed on continuous data. These findings support existing IDA guidance on the use of validated surveys and on treating validated survey data as continuous.
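A minimal sketch of the simulation idea, under assumed (not IDA's actual) cutpoints and opinion distribution: discretize continuous opinions with a survey design, compute the mean confidence interval from the discrete responses, and check how often it covers the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean, n_respondents, n_sims = 3.4, 30, 2000

# Category boundaries for a well-designed (evenly spaced) and a poorly designed survey.
designs = {
    "even spacing": [1.5, 2.5, 3.5, 4.5],
    "irregular spacing": [1.2, 1.8, 4.3, 4.6],
}

def mean_ci(x, level=0.95):
    """Standard t-based confidence interval for the mean."""
    half = stats.sem(x) * stats.t.ppf(0.5 + level / 2, len(x) - 1)
    return x.mean() - half, x.mean() + half

for name, cuts in designs.items():
    covered = 0
    for _ in range(n_sims):
        opinions = rng.normal(true_mean, 1.0, n_respondents).clip(1, 5)  # continuous opinions
        responses = np.digitize(opinions, cuts) + 1                      # discrete 1-5 responses
        lo, hi = mean_ci(responses.astype(float))
        covered += (lo <= true_mean <= hi)
    print(f"{name}: MCI coverage of true mean = {covered / n_sims:.2f}")
```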
7 Dean Thomas
Researcher, George Mason University

No Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies
What drove the Carrington Event, the largest solar storm in recorded history?
The 1859 Carrington Event is the most intense geomagnetic storm in recorded history. This storm produced large changes to the geomagnetic field observed on the Earth’s surface, damaged telegraph systems, and created aurora visible over large portions of the Earth. The literature provides numerous explanations for which phenomena drove the observed effects. Previous analyses typically relied on the historic magnetic field data from the event, newspaper reports, and empirical models. These analyses generally focus on whether one current system (e.g., magnetospheric currents) is more important than another (e.g., ionospheric currents). We expand the analysis by using results from the Space Weather Modeling Framework (SWMF), a complex magnetohydrodynamics code, to compute the contributions that various currents and geospace regions make to the northward magnetic field on the Earth’s surface. The analysis considers contributions from magnetospheric currents, ionospheric currents, and gap region field-aligned currents (FACs). In addition, we evaluate contributions from specific regions: the magnetosheath (between the Earth and the Sun), near Earth (within 6.6 Earth radii), and the neutral sheet (behind the Earth). Our analysis indicates that the magnetic field changes observed during the Carrington Event involved a combination of current systems and regions rather than being driven by one specific current or region.
DATAWorks Co-Chairs
8 Thomas Donnelly
Principal Systems Engineer, JMP Statistical Discovery
Presentation
No Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Spectral Analysis using Functional Data – Predict Spectra Shape or Chemical Composition
Curves and spectra are fundamental to understanding many scientific and engineering applications. As a result, curve or spectral data are created by many types of analytical, test, and manufacturing equipment. When these data are used as part of a designed experiment or a machine learning application, most software requires the practitioner to extract “landmark” features from the data prior to modeling. This leads to models that are more difficult to interpret and are less accurate than models that treat spectral/curve data as first-class citizens. This talk will present an overview of functional data analysis applied to spectral data. It will feature a case study showing a reanalysis of published NMR spectra for 231 blends of three alcohols – propanol, butanol, and pentanol. Small subsets of the full data set are modeled and used to predict either the spectra or the composition of checkpoint blends not used in the analysis. Functional data analysis was performed using wavelets as the basis functions to break the spectra into Shape Functions and Shape Weights (Functional Principal Component scores). A prediction profiler can then be used to predict spectral shape as a function of the Shape Weights and Shape Functions. Predictions as a function of the Shape Weights are difficult to use practically, as the weights are not components in the mixture. However, by modeling the Shape Weights as functions of the proportions of the mixture components, a prediction profiler can be used to predict the shape of any blend of these three alcohols, as confirmed using the checkpoint formulations. Furthermore, by minimizing the integrated error from target of the spectra for a blend not used in the modeling, the chemical proportions of this unknown blend can be closely determined.
9 Francesca McFadden
Data Analyst, JHU/APL
Presentation
Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Clustering based approach to Competence Estimation of Machine Learned Models
Calibrated trust is the concept that a machine learned model’s predictions are neither over- nor under-trusted by an end user. While early in its use end users may not entirely trust a model, over time the model predictions may be trusted too often. Eventually the model will be trusted to make predictions on inputs it is not competent to perform on. Historically, trust in model use has centered on the reported confidence in a prediction. A growing area of interest is novelty detection to ascertain how statistically different inputs are from the training data. Confidence estimation and out-of-distribution indicators should both be factored in to provide a measure of model prediction competence. The goal of model competence estimation is to provide the end user with more than model confidence, and to calibrate trust in a specific decision with a metric for how likely it is that the input is truly outside the prediction space of the model. This metric should indicate when and where the user should trust the model based on the current prediction space, including whether the model will ever be able to tell the end user that an input falls into the true category of the current input. DARPA announced the 2019 Competency-Aware Machine Learning program to “develop machine learning systems that continuously assess their own performance in time-critical, dynamic situations and communicate that information to human team members in an easily understood format.” This sparked a research trend into what could be done to achieve this capability. At the NeurIPS 2019 conference, the authors of Accurate Layer-wise Interpretable Competence Estimation (ALICE) provided a rigorous definition of competence and introduced a score for how likely it is that inputs are outside a model’s prediction space. Application of ALICE requires tuning of multiple parameters tailored to the training data and domain. Application of ALICE to multiple domains has shown that the ALICE scores indicative of model competence often do not translate to a threshold that meets human standards for trust. Depending on the similarity measure used for out-of-distribution detection, it may not be suitable for broad application. The purpose of this concept is to enhance current approaches to estimating model competence so they may be more easily applied and used at the speed of decision. The described methodology applies unsupervised learning to expand current approaches to recognize when a machine learned model lacks prediction competence. Clustering of training data is used as an efficient out-of-distribution and prediction measure to aid in model competence estimation. The out-of-distribution terms in the ALICE measure are replaced with clustering suited to the modes in the training data and Mahalanobis distance similarity measures for statistical interpretation. The amount of data to be stored and evaluated against scales with the number of features and the number of clusters in the training data set. Therefore, this method may be applied with limited stored data and is computationally highly efficient for in-line use.
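A hedged sketch of the clustering-based out-of-distribution score described above, using k-means clusters of training features and the Mahalanobis distance to the nearest cluster; the data, cluster count, and thresholds are illustrative and not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in for training-set feature embeddings (e.g., penultimate-layer activations).
X_train, _ = make_blobs(n_samples=600, centers=3, n_features=4, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# Per-cluster mean and inverse covariance for Mahalanobis distances.
cluster_stats = []
for k in range(kmeans.n_clusters):
    members = X_train[kmeans.labels_ == k]
    cov = np.cov(members, rowvar=False) + 1e-6 * np.eye(members.shape[1])
    cluster_stats.append((members.mean(axis=0), np.linalg.inv(cov)))

def ood_score(x):
    """Smallest Mahalanobis distance from x to any training cluster (larger = more novel)."""
    dists = [np.sqrt((x - mu) @ inv_cov @ (x - mu)) for mu, inv_cov in cluster_stats]
    return min(dists)

in_dist = X_train[0]
far_out = X_train[0] + 25.0
print("in-distribution score:", round(ood_score(in_dist), 2))
print("out-of-distribution score:", round(ood_score(far_out), 2))
# A competence estimate can then combine this score with the model's own confidence.
```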
10 Gina Sigler
STAT COE contractor, HII/STAT COE
Presentation
No Publish
3 Solving Program Evaluation Challenges
Failure Distributions for Parallel Dependent Identical Weibull Components
For a parallel system, when one component fails, the failure distribution of the remaining components will have an increased failure rate. This research takes a novel approach to finding the associated failure distribution of the full system using order statistic distributions for correlated Weibull components, allowing for unknown correlations between the dependent components. A Taylor series approximation is presented for two components; system failure time distributions are also derived for two failures in a two-component system, two failures in an n-component system, three failures in a three-component system, and k failures in an n-component system. Additionally, a case study is presented on aircraft turnbuckles. Simulated data are used to illustrate how the derived formulas can be used to create a maintenance plan for the second turnbuckle in the two-component system.
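As a complementary check on derived formulas of this kind, here is a hedged Monte Carlo sketch of a two-component parallel system with dependent identical Weibull lifetimes generated through a Gaussian copula; the shape, scale, and correlation values are illustrative, and the copula is a simplified dependence structure rather than the load-sharing model analyzed in the presentation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
shape, scale, rho, n = 2.0, 1000.0, 0.5, 100_000  # illustrative Weibull parameters and correlation

# Gaussian copula: correlated normals -> uniforms -> dependent Weibull lifetimes (hours).
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
u = stats.norm.cdf(z)
lifetimes = stats.weibull_min.ppf(u, c=shape, scale=scale)

# A two-component parallel system fails when the second (last) component fails.
first_failure = lifetimes.min(axis=1)
system_failure = lifetimes.max(axis=1)

print("mean time to first component failure:", round(first_failure.mean(), 1))
print("mean time to system failure:        ", round(system_failure.mean(), 1))
# Empirical quantiles of system_failure can be compared against derived order-statistic
# distributions, or used to plan maintenance of the surviving component.
```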
Corinne Stafford
11 Russell Kupferer
Naval Warfare Action Officer, DOT&E
Presentation
Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Threat Integration for Full Spectrum Survivability Assessments
The expansion of DOT&E’s oversight role to cover full spectrum survivability and lethality assessments includes a need to reexamine how threats are evaluated in a Live Fire Test and Evaluation (LFT&E) rubric. Traditionally, threats for LFT&E assessment have been considered in isolation, with a focus only on conventional weapon hits that have the potential to directly damage the system under test. The inclusion of full spectrum threats – including electronic warfare, directed energy, CBRNE, and cyber – requires a new approach to how LFT&E assessments are conducted. Optimally, assessment of full spectrum threats will include integrated survivability vignettes appropriate to how our systems will actually be used in combat and how combinations of adversary threats are likely to be used against them. This approach will require new assessment methods with an increased reliance on data from testing at design sites, component/surrogate tests, and digital twins.
Maura McCoy Klink – DOT&E
12 Priscila Silva
Graduate Research Assistant, University of Massachusetts Dartmouth, Department of Electrical and Computer Engineering
Speed Presentation
No Publish
1 Improving the Quality of Test & Evaluation
Regression and Time Series Mixture Approaches to Predict Resilience
Resilience engineering is the practice of building and sustaining systems that can deal effectively with disruptive events. Previous resilience engineering research focuses on metrics to quantify resilience and models to characterize system performance. However, resilience metrics are normally computed after disruptions have occurred, and existing models lack the ability to predict one or more shocks and subsequent recoveries. To address these limitations, this talk presents three alternative approaches to model system resilience with statistical techniques based on (i) regression, (ii) time series, and (iii) a combination of regression and time series. These approaches track and predict how system performance will change when exposed to multiple shocks and stresses of different intensity and duration, provide structure for planning tests to assess system resilience against particular shocks and stresses, and guide the data collection necessary to conduct tests effectively. These modeling approaches are general and can be applied to systems and processes in multiple domains. A historical data set on job losses during the 1980 recessions in the United States is used to assess the predictive accuracy of these approaches. Goodness-of-fit measures and confidence intervals are computed, and interval-based and point-based resilience metrics are predicted to assess how well the models perform on the data set considered. The results suggest that resilience models based on statistical methods such as multiple linear regression and multivariate time series models are capable of modeling and predicting resilience curves exhibiting multiple shocks and subsequent recoveries. However, models that combine regression and time series account for changes in performance due to current and time-delayed effects from disruptions most effectively, demonstrating superior performance in long-term predictions and higher goodness-of-fit despite increased parametric complexity.
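A minimal sketch of the regression-style approach on a synthetic performance series, using shock indicators and time-since-shock (recovery) covariates; the data and coefficients are illustrative, not the fitted models from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 120
t = np.arange(T)

# Synthetic performance series with two shocks (at t=30 and t=80) and gradual recoveries.
shock1 = (t >= 30).astype(float)
shock2 = (t >= 80).astype(float)
recovery1 = np.clip(t - 30, 0, None) * shock1
recovery2 = np.clip(t - 80, 0, None) * shock2
performance = (100 - 20 * shock1 + 0.4 * recovery1 - 10 * shock2 + 0.3 * recovery2
               + rng.normal(0, 1.0, T))

# Multiple linear regression with shock and time-since-shock (recovery) covariates.
X = np.column_stack([np.ones(T), shock1, recovery1, shock2, recovery2])
coef, *_ = np.linalg.lstsq(X, performance, rcond=None)
fitted = X @ coef

print("estimated shock drops:    ", coef[[1, 3]].round(2))
print("estimated recovery slopes:", coef[[2, 4]].round(2))
print("R^2:", round(1 - np.var(performance - fitted) / np.var(performance), 3))
# Time-series terms (e.g., lagged performance) can be added as extra columns to capture
# the time-delayed effects the mixture approach is designed to model.
```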
13 Kevin Quinlan
Applied Statistician, Lawrence Livermore National Laboratory
Presentation
Publish
2 Improving the Quality of Test & Evaluation
Constructing Aerodynamic Databases with Non-Uniform Multi-Fidelity Active Learning
Constructing aerodynamic databases with high-fidelity CFD can be time consuming, and in higher dimensions techniques such as grid sampling are infeasible for generating a well-characterized database. Therefore, it is of interest to generate new databases in an efficient manner, especially when considering multiple competing designs. To reduce the total simulation time, our approach leverages fast low-fidelity CFD solvers as well as interim trajectory information within an active learning loop to inform the generation of new high-fidelity CFD samples. In this talk, we describe the details of our approach and present the advantages of non-uniform active learning. Finally, we apply this approach to an example multi-fidelity aero-database problem to demonstrate its effectiveness in practice. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-858150
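A hedged, single-design sketch of an active learning loop with a Gaussian process surrogate, in which a fast low-fidelity solver supplies an extra input feature and new high-fidelity runs are placed where the surrogate is least certain; the solvers and kernel are stand-ins, and the talk's non-uniform, trajectory-informed scheme is more involved.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(4)

def low_fidelity(x):    # fast, biased solver (stand-in)
    return np.sin(8 * x) + 0.3 * x

def high_fidelity(x):   # expensive solver (stand-in for high-fidelity CFD)
    return np.sin(8 * x) + 0.3 * x + 0.25 * np.cos(20 * x)

# Candidate pool over a normalized flight condition (e.g., Mach or angle of attack).
candidates = np.linspace(0, 1, 200)
features = np.column_stack([candidates, low_fidelity(candidates)])  # low-fi output as a feature

# Seed design: a few high-fidelity runs, then acquire where predictive uncertainty is largest.
idx = list(rng.choice(len(candidates), size=5, replace=False))
kernel = ConstantKernel(1.0) * RBF(length_scale=0.2)

for step in range(10):
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(
        features[idx], high_fidelity(candidates[idx]))
    _, std = gp.predict(features, return_std=True)
    std[idx] = 0.0                       # do not re-select existing runs
    idx.append(int(np.argmax(std)))

print("high-fidelity runs selected at:", np.sort(candidates[idx]).round(2))
```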
14 Karen Alves da Mata
Graduate Research Assistant, University of Massachusetts Dartmouth
Speed Presentation
No Publish
3 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Quantitative Reliability and Resilience Assessment of a Machine Learning Algorithm
Advances in machine learning (ML) have led to applications in safety-critical domains, including security, defense, and healthcare. These ML models are confronted with the dynamically changing and actively hostile conditions characteristic of real-world applications, requiring systems incorporating ML to be reliable and resilient. Many studies propose techniques to improve the robustness of ML algorithms. However, fewer consider quantitative methods to assess the reliability and resilience of these systems. To address this gap, this study demonstrates how to collect, during the training and testing of ML, data suitable for applying software reliability models (with and without covariates) and resilience models, and how to interpret the resulting analyses. The proposed approach promotes quantitative risk assessment of machine learning technologies, providing the ability to track and predict degradation and improvement in ML model performance and assisting ML and system engineers with an objective approach to compare the relative effectiveness of alternative training and testing methods. The approach is illustrated in the context of an image recognition model subjected to two generative adversarial attacks and then iteratively retrained to improve the system’s performance. Our results indicate that software reliability models incorporating covariates characterized the misclassification discovery process more accurately than models without covariates. Moreover, the resilience model based on multiple linear regression incorporating interactions between covariates tracked and predicted degradation and recovery of performance best. Thus, software reliability and resilience models offer rigorous quantitative assurance methods for ML-enabled systems and processes.
15 Vishal Subedi
PhD Student, University of Maryland Baltimore County
Presentation
Publish
1 Identifying opportunities to increase efficiency and effectiveness through strategic resource sharing
Classifying violent anti-government conflicts in Mexico: A machine learning framework
Domestic crime, conflict, and instability pose a significant threat to many contemporary governments. These challenges have proven to be particularly acute within modern-day Mexico. While there have been significant developments in predicting intrastate armed and electoral conflict in various contemporary settings, such efforts have thus far been limited in their use of spatial and temporal correlations, as well as in the features they have considered. Machine learning, especially deep learning, has proven to be highly effective in predicting future conflicts using word embeddings in Convolutional Neural Networks (CNNs), but it lacks spatial structure and, due to its black box nature, cannot explain the importance of predictors. We develop a novel methodology using machine learning that can accurately classify future anti-government violence in Mexico. We further demonstrate that our approach can identify important leading predictors of such violence. This can help policymakers make informed decisions and can also help governments and NGOs better allocate security and humanitarian resources, which could prove beneficial in tackling this problem. Using a variety of political event aggregations from the ICEWS database alongside other textual and demographic features, we trained various classical machine learning algorithms, including but not limited to Logistic Regression, Random Forest, XGBoost, and a Voting classifier. The development of this research was a stepwise process in four phases, where each phase built upon the shortcomings of the previous one. In the first phase, we considered a mix of CNN and Long Short-Term Memory (LSTM) networks to decode the spatial and temporal relationships in the data. The performance of the black box deep learning models was not on par with the classical machine learning models. The second phase deals with the analysis of the temporal relationships in the data to identify the dependency of the conflicts over time and its lagged relationship. This also serves as a method to reduce the feature dimension space by removing variables not covered within the cutoff lag. The third phase addresses the general variable selection methodologies used to further reduce the feature space, identifying the important predictors that fuel anti-government violence along with their directional effects using Shapley additive values. The voting classifier, utilizing a subset of features derived from LASSO across 100 simulations, consistently surpasses alternative models in performance and demonstrates efficacy in accurately classifying future anti-government conflicts. Notably, Random Forest feature importance indicates that features including but not limited to homicides, accidents, material conflicts, and positively worded citizen information sentiments emerge as pivotal predictors in the classification of anti-government conflicts. Finally, in the fourth phase, we conclude the research by analyzing the spatial structure of the data using an extension of Moran’s I index for spatiotemporal data to identify global spatial dependency and local clusters, followed by modeling the data spatially and evaluating it using Gaussian Process Boosting (GPBoost). The global spatial autocorrelation is minimal, characterized by localized conflict clusters within the region. Furthermore, the Voting Classifier demonstrates superior performance over GPBoost, leading to the inference that no substantial spatial dependency exists among the various locations.
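A hedged sketch of the final-phase pipeline (L1-penalized feature selection feeding a soft-voting ensemble) on synthetic data; XGBoost is replaced by scikit-learn's gradient boosting so the sketch stays self-contained, and the features and labels are stand-ins for the ICEWS-derived data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for location-week feature vectors and a binary conflict label.
X, y = make_classification(n_samples=2000, n_features=60, n_informative=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# LASSO-style (L1) logistic regression selects a sparse feature subset.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1, random_state=0))

voter = VotingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),   # stand-in for XGBoost
    ],
    voting="soft",
)

model = make_pipeline(StandardScaler(), selector, voter).fit(X_tr, y_tr)
print("held-out accuracy:", round(model.score(X_te, y_te), 3))
```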
16 Sambit Bhattacharya
Professor, Fayetteville State University
Poster Presentation
Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies
AI for Homeland Security: A Comprehensive Approach for Detecting Sex Trafficking
Sex trafficking remains a global problem, requiring new innovations to detect and disrupt such criminal enterprises. Our research project applies artificial intelligence (AI) methods, together with knowledge from social science and homeland security, to the detection and understanding of the operational models of sex trafficking networks (STNs). Our purpose is to enhance the AI capabilities of software-based detection technologies and support the homeland defense community in detecting and countering human sex trafficking, including the trafficking of underage victims. To accomplish this, we propose a novel architecture capable of jointly representing and learning from multiple modalities, including images and text. The interdisciplinary nature of this work involves the fusion of computer vision, natural language processing, and deep neural networks (DNNs) to address the complexities of sex trafficking detection from online advertisements. This research proposes the creation of a software prototype as an extension of the Image Surveillance Assistant (ISA) built by our research team, focusing on cross-modal information retrieval and the context understanding critical for identifying potential sex trafficking cases. Our initiative aligns with the objectives outlined in the DHS Strategic Plan, aiming to counter both terrorism and security threats, specifically focusing on the victim-centered approach to align with security threat segments. We leverage current AI and machine learning techniques integrated by the project to create a working software prototype. DeepFace, a DNN for biometric analysis of facial image features such as age, race, and gender from images, is utilized. Few-shot text classification, utilizing the scikit-learn Python library and Large Language Models (LLMs), is enabling the detection of written trafficking advertisements. The prime funding agency, the Department of Homeland Security (DHS), mandates the use of synthetic data for this unclassified project, so we have developed code to leverage Application Programming Interfaces (APIs) that connect to LLMs and generative AI for images to create synthetic training and test data for the DNN models. Test and evaluation with synthetic data are the core capabilities of our approach to build prototype software that can potentially be used for real applications with real data. Ongoing work includes creating a program to fuse the outputs from AI models on a single advertisement input. The fusion program will provide a numeric value for the likelihood of the class of advertisement, ranging from legal advertisement to different categories of trafficking. This research project is a potential contribution to the development of deployment-ready software for intelligence agencies, law enforcement, and border security. We currently show high accuracy for detecting advertisements related to victims of specific demographic categories. We have identified areas where increased accuracy is needed, and we are collecting more training data to address those gaps. The AI-based capabilities emerging from our research hold promise for enhancing the understanding of STN operational models, addressing the technical challenges of sex trafficking detection, and emphasizing the broader societal impact and alignment with national security goals.
17 Meghan Galiardi Sahakian
Principal Member of Technical Staff, Sandia National Laboratories
Presentation
Publish
3 Improving the Quality of Test & Evaluation
Design of In-Flight Cyber Experimentation for Spacecraft
Cyber resilience technologies are critical to ensuring the survival of mission-critical assets for space systems. Such emerging cyber resilience technologies ultimately need to be proven out through in-flight experimentation. However, there are significant technical challenges in proving that new technologies actually enhance the resilience of spacecraft. In particular, in-flight experimentation suffers from a “low data” problem due to many factors, including: 1) lack of physical access limits what types of data can be collected; 2) even if data can be collected, size, weight, and power (SWaP) constraints of the spacecraft make it difficult to store large amounts of data; 3) even if data can be stored, bandwidth constraints limit the transfer of data to the ground in a timely manner; and 4) only a limited number of trials can be performed due to spacecraft scheduling and politics. This talk will discuss a framework developed for the design and execution of in-flight cyber experimentation as well as statistical techniques appropriate for analyzing the data. More specifically, we will discuss how data from ground-based test beds can be used to augment the results of in-flight experiments. The discussed framework and statistical techniques will be demonstrated on a use case. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2023-14847A.
AB
18 Cameron Liang
Research Staff Member, IDA
Presentation
No Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies
Demystifying Deep Learning – Aircraft Identification from Satellite Images
In the field of Artificial Intelligence and Machine Learning (AI/ML), the literature can be filled with technical language and/or buzzwords, making it challenging for readers to understand the content. This will be a pedagogical talk focused on demystifying “Artificial Intelligence” by providing a mathematical, but most importantly an intuitive, understanding of how deep learning really works. I will provide some existing tools and practical steps for how one can train their own neural networks, using an example of automatically identifying aircraft and their attributes (e.g., civil vs. military, engine types, and size) from satellite images. Audience members with some knowledge of linear regression and coding will be armed with an increased understanding, confidence, and practical tools to develop their own AI applications.
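In the spirit of the talk, a from-scratch sketch showing that a small neural network is linear algebra plus nonlinearities trained by gradient descent; the two-feature synthetic data stand in for image-derived features and are not satellite imagery.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic two-class problem (stand-in for "military vs. civil" image features).
X = rng.normal(size=(400, 2))
y = ((X[:, 0] ** 2 + X[:, 1] ** 2) > 1.2).astype(float).reshape(-1, 1)

# One hidden layer of 16 units; weights learned by plain gradient descent.
W1, b1 = rng.normal(scale=0.5, size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr = 0.5

for epoch in range(2000):
    h = np.tanh(X @ W1 + b1)                      # hidden layer
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))          # sigmoid output = predicted probability
    # Backpropagation of the cross-entropy loss.
    dlogit = (p - y) / len(X)
    dW2, db2 = h.T @ dlogit, dlogit.sum(axis=0)
    dh = dlogit @ W2.T * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    W1, b1, W2, b2 = W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

accuracy = ((p > 0.5) == y).mean()
print("training accuracy:", round(float(accuracy), 3))
```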
19 Matthew Wilkerson
Undergraduate Researcher, Intelligent Systems Laboratory, Fayetteville State University
Poster Presentation
Publish
3 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Advancing Edge AI: Benchmarking ResNet50 for Image Analysis on Diverse Hardware Platforms
The ability to run AI at the edge can be transformative for applications that need to process data to make decisions at the location where sensing and data acquisition take place. Deep neural networks (DNNs) have a huge number of parameters and consist of many layers, including nodes and edges that contain mathematical relationships that must be computed when the DNN is run during deployment. This is why it is important to benchmark DNNs on edge computers, which are constrained in hardware resources and usually run on a limited supply of battery power. The objective of our NASA-funded project, which is aligned to the mission of robotic space exploration, is to enable AI through fine-tuning of convolutional neural networks (CNNs) for extraterrestrial terrain analysis. This research currently focuses on the optimization of the ResNet50 model, which consists of 4.09 GFLOPs and 25.557 million parameters, to set performance baselines on various edge devices using the Mars Science Laboratory (MSL) v2.1 dataset. Although our initial focus is on Martian terrain classification, the research is potentially impactful for other sectors where efficient edge computing is critical. We addressed a critical imbalance in the dataset by augmenting the underrepresented class with an additional 167 images, improving the model’s classification accuracy substantially. Pre-augmentation, these images were frequently misclassified as another class, as indicated by our confusion matrix analysis. Post-augmentation, the fine-tuned ResNet50 model achieved an exceptional test accuracy of 99.31% with a test loss of 0.0227, setting a new benchmark for similar tasks. The core objective of this project extends beyond classification accuracy; it aims to establish a robust development environment for testing efficient edge AI models suitable for deployment in resource-constrained scenarios. The fine-tuned ResNet50-MSL-v2.1 model serves as a baseline for this development. The model was converted into TorchScript format to facilitate cross-platform deployment and inference consistency. Our comprehensive cross-platform evaluation included four distinct hardware configurations, chosen to mirror a variety of deployment scenarios. The NVIDIA Jetson Nano achieved an average inference time of 62.04 milliseconds with 85.83% CPU usage, highlighting its utility in mobile contexts. An Intel NUC with a Celeron processor, adapted for drone-based deployment, registered an inference time of 579.87 milliseconds at near-maximal CPU usage of 99.77%. A standard PC equipped with an RTX 3060 GPU completed inference in just 6.52 milliseconds, showcasing its capability for high-performance, stationary tasks. Lastly, an AMD FX 8350 CPU-only system demonstrated a reasonable inference time of 215.17 milliseconds, suggesting its appropriateness for less demanding edge computing applications. These results not only showcase the adaptability of ResNet50 across diverse computational environments but also emphasize the importance of considering both model complexity and hardware capabilities when deploying AI at the edge. Our findings indicate that with careful optimization and platform-specific tuning, it is possible to deploy advanced AI models like ResNet50 effectively on a range of hardware, from low-power edge devices to high-performance ground stations. Our ongoing research will use these established baselines to further explore efficient AI model deployment in resource-constrained settings.
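A hedged sketch of the TorchScript conversion and inference-timing procedure using an untrained torchvision ResNet50 (assuming a recent torchvision); the warm-up count, batch size, and CPU-only timing are illustrative, and the fine-tuned MSL weights are not included.

```python
import time

import torch
import torchvision

# Untrained ResNet50 as a stand-in for the fine-tuned ResNet50-MSL-v2.1 model.
model = torchvision.models.resnet50(weights=None).eval()

# Convert to TorchScript via tracing for portable, framework-light deployment.
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("resnet50_traced.pt")

# Measure average single-image inference latency on the current device (CPU here).
loaded = torch.jit.load("resnet50_traced.pt").eval()
with torch.no_grad():
    for _ in range(5):                      # warm-up iterations
        loaded(example)
    n_runs = 50
    start = time.perf_counter()
    for _ in range(n_runs):
        loaded(example)
    elapsed_ms = (time.perf_counter() - start) / n_runs * 1000

print(f"average inference time: {elapsed_ms:.2f} ms")
```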
20 Fatemeh Salboukh
PhD Student, University of Massachusetts Dartmouth
Speed Presentation
No Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies
Enhancing Time Series-based Resilience Model Prediction with Transfer Functions
Resilience engineering involves creating and maintaining systems capable of efficiently managing disruptive incidents. Past research in this field has employed various statistical techniques to track and forecast the system’s recovery process within the resilience curve. However, many of these techniques fall short in terms of flexibility, struggling to accurately capture the details of shocks. Moreover, most of them are not able to predict long-term dependencies. To address these limitations, this paper introduces an advanced statistical method, the transfer function, which effectively tracks and predicts changes in system performance when subjected to multiple shocks and stresses of varying intensity and duration. This approach offers a structured methodology for planning resilience assessment tests tailored to specific shocks and stresses and guides the necessary data collection to ensure efficient test execution. Although resilience engineering is domain-specific, the transfer function is a versatile approach, making it suitable for various domains. To assess the effectiveness of the transfer function model, we conduct a comparative analysis with the interaction regression model, using historical data on job losses during the 1980 recessions in the United States. This comparison not only underscores the strengths of the transfer function in handling complex temporal data but also reaffirms its competitiveness compared to existing methods. Our numerical results using goodness-of-fit measures provide compelling evidence of the transfer function model’s enhanced predictive power, offering an alternative for advancing resilience prediction in time series analysis.
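A hedged sketch of a transfer-function-style fit using statsmodels, with a shock indicator and a time-since-shock term entering as exogenous inputs and an ARMA(1,1) noise model; the series and specification are synthetic and illustrative, not the paper's fitted model.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(6)
T = 150
t = np.arange(T)

# Exogenous input: a shock (step) at t = 60; output: performance that drops and recovers.
shock = (t >= 60).astype(float)
recovery = np.clip(t - 60, 0, None) * shock
performance = 100 - 15 * shock + 0.2 * recovery + rng.normal(0, 1.0, T)

# ARMAX / transfer-function-style fit: the shock terms enter as exogenous regressors,
# while the ARMA(1,1) component captures serially correlated dynamics.
exog = np.column_stack([shock, recovery])
model = SARIMAX(performance, exog=exog, order=(1, 0, 1)).fit(disp=False)

future_exog = np.column_stack([np.ones(12), np.arange(T, T + 12) - 60])
forecast = model.get_forecast(steps=12, exog=future_exog)
print(model.summary().tables[1])
print("12-step-ahead mean forecast:", np.asarray(forecast.predicted_mean).round(1))
```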
22 Jason Schlup
Research Staff, Institute for Defense Analyses
Poster Presentation
No Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Operationally Representative Data and Cybersecurity for Avionics Demonstration
This poster session considers the ARINC 429 standard and its inherent lack of security by using a hardware-in-the-loop (HITL) simulator to demonstrate possible mission effects from a cyber compromise. ARINC 429 is a ubiquitous data bus for civil avionics, enabling safe and reliable communication between devices from disparate manufacturers. However, ARINC 429 lacks any form of encryption or authentication, making it an inherently insecure communication protocol and rendering any connected avionics vulnerable to a range of attacks. This poster session includes a hands-on demonstration of possible mission effects due to a cyber compromise of the ARINC 429 data bus by putting the audience at the controls of the HITL flight simulator with ARINC 429 buses. The HITL simulator uses commercial off-the-shelf avionics hardware, including a multi-function display and an Enhanced Ground Proximity Warning System, to generate operationally realistic ARINC 429 messages. Realistic flight controls and flight simulation software are used to further increase the simulator’s fidelity. The cyberattack is based on a system with a malicious device physically connected to the ARINC 429 bus network. The cyberattack degrades the multi-function display through a denial-of-service attack that disables important navigational aids. The poster also describes how testers can plan to test similar buses found on vehicles and can observe and document data from this type of testing event.
23 Cooper Klein
Cadet, United States Military Academy, West Point
Poster Presentation
No Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Enhancing Battlefield Intelligence with ADS-B Change Detection
The ability to detect change in flight patterns using air traffic control (ATC) communication can better inform battlefield intelligence. ADS-B (Automatic Dependent Surveillance-Broadcast) technology has the capability to capture the movement of both military and civilian aircraft over conflict zones. Leveraging the inclusivity of ADS-B in flight tracking and its widespread global availability, we focus on its application in understanding changes leading up to conflicts, with a specific case study on Ukraine. In this presentation we analyze the days leading up to Russia’s February 24, 2022 invasion to understand how ADS-B technology can indicate change in Russo-Ukrainian military movements. The proposed detection algorithm encourages the use of ADS-B technology in future intelligence efforts. The potential for fusion with GICB (Ground-Initiated Comm-B) ATC communication and other modes of data is also explored. This is a submission for the Student Poster Competition.
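One simple change-detection approach, shown as a hedged sketch: a one-sided CUSUM statistic on synthetic daily flight counts; the poster's actual algorithm and ADS-B data are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic daily ADS-B flight counts over a region: a stable baseline, then a drop on day 40.
counts = np.concatenate([rng.poisson(120, 40), rng.poisson(80, 20)]).astype(float)

# One-sided CUSUM for a downward shift in the mean.
baseline_mean = counts[:30].mean()
baseline_sd = counts[:30].std(ddof=1)
k = 0.5 * baseline_sd          # allowance (half the shift considered meaningful)
h = 5.0 * baseline_sd          # decision threshold

cusum = 0.0
alarm_day = None
for day, x in enumerate(counts):
    cusum = max(0.0, cusum + (baseline_mean - x) - k)
    if cusum > h and alarm_day is None:
        alarm_day = day

print("change detected on day:", alarm_day)
```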
24 Nicholas Wagner
Research Staff Member, Institute for Defense Analyses
Presentation
Publish
3 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Emerging Strategies for Detecting and Mitigating LLM Prompt Injections and Hallucinations
Before large language models (LLMs) achieve widespread enterprise deployment, more engineering work must be done to mitigate their weaknesses. Prompt injections, where a prompter crafts an LLM input that causes it to perform actions not intended by its system operator, and hallucinations, where an LLM generates responses that sound plausible but lack factual validity, are two major known issues with today’s LLMs. I will discuss some of the approaches being taken today in the academic literature and in industry to address these problems, and their relative success.
25 Todd Remund
Staff Data Scientist, Northrop Grumman
Presentation
No Publish
3 Improving the Quality of Test & Evaluation
Rocket Motor Design Qualification Through Enhanced Reliability Assurance Testing
Composite pressure vessel designs for rocket motors must be qualified for use in both military and space applications. By intent, demonstration testing methods ignore a priori information about a system, which inflates typically constrained test budgets and often results in a low probability of test success. On the other hand, reliability assurance tests encourage the use of previous test data and other relevant information about a system. Thus, an assurance testing approach can dramatically reduce the cost of a qualification test. This work extends reliability assurance testing to allow scenarios with right-censored and exact failure possibilities. This enhancement increases the probability of test success and provides a post-test re-evaluation of test results. The method is demonstrated by developing a rocket motor design qualification assurance test.
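For context, the classical zero-failure binomial demonstration test that assurance testing improves upon sizes the test as follows (a standard textbook result, not a formula from this work):

```latex
% Zero-failure demonstration test: to demonstrate reliability R with confidence C,
% the number of required successful trials n must satisfy R^n \le 1 - C, i.e.
\[
  n \;\ge\; \frac{\ln(1 - C)}{\ln R}.
\]
% Example: R = 0.95, C = 0.90 gives n \ge \ln(0.10)/\ln(0.95) \approx 45 trials with no failures.
% An assurance test instead requires a posterior statement of the form
% P(R \ge R_0 \mid \text{prior evidence and new test data}) \ge C,
% so credible prior information can substantially reduce the number of new trials.
```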
N/A – Contributed
26 Jose Alvarado
Technical Advisor, AFOTEC Det 5/CTO
Presentation
Publish
1 Developing Model-Based Flight Test Scenarios
The Department of Defense (DoD) is undergoing a digital engineering transformation in every process of the systems engineering lifecycle. This transformation drives the requirement that DoD Test and Evaluation (T&E) processes begin implementing and executing model-based testing methodologies. This paper describes and assesses a grey box model-driven test design (MDTD) approach to create flight test scenarios based on model-based systems engineering artifacts. To illustrate the methodology and evaluate the expected outcomes of the process in practice, a case study is presented using a model representation of a training system used to train new Air Force Operational Test and Evaluation Center (AFOTEC) members in conducting operational test and evaluation (OT&E). The results of the grey box MDTD process are a set of activity diagrams that are validated to generate the same test scenario cases as the traditional document-centric approach. Using artifacts represented in the Systems Modeling Language (SysML), this paper discusses key comparisons between the traditional and MDTD processes and demonstrates the costs and benefits of model-based testing and their relevance in the context of operational flight testing.
Maj. Rachel M. Milliron
27 Nikolai Lipscomb
Research Staff Member, Institute for Defense Analyses
Presentation
Publish
1 Identifying opportunities to increase efficiency and effectiveness through strategic resource sharing
A Mathematical Programming Approach to Wholesale Planning
The DOD’s materiel commands generally rely on working capital funds (WCFs) to fund their purchases of spares. A WCF insulates the materiel commands against the disruptions of the yearly appropriations cycle and allows for long-term planning and contracting. A WCF is expected to cover its own costs by allocating its funds judiciously and adjusting the prices it charges to the end customer, but the multi-year lead times associated with most items mean that items must be ordered years in advance of anticipated need. Being financially conservative (ordering less) leads to backorders, while minimizing backorders (ordering more) often introduces financial risk by buying items that may not be sold in a timely manner. In this work, we develop an optimization framework that produces a “Buy List” of repairs and procurements for each fiscal year. The optimizer seeks to maximize a financial and readiness-minded objective function subject to constraints such as budget limitations, contract priorities, and the historical variability of demand signals. Buy Lists for each fiscal year provide a concrete baseline for examining the repair/procurement decisions of real wholesale planners and comparing performance via simulation of different histories.
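A hedged toy sketch of the optimization idea with SciPy's linear programming solver: maximize a readiness-weighted objective subject to a budget, with quantities treated as continuous; the item data are invented, and the real formulation adds lead times, contract priorities, demand variability, and typically integer variables.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical items: unit cost ($K), readiness value per unit, and max useful quantity.
costs = np.array([12.0, 45.0, 3.5, 20.0, 8.0])
readiness_value = np.array([4.0, 9.0, 1.0, 7.0, 2.5])
max_qty = np.array([50, 20, 200, 40, 100])
budget = 1500.0   # $K available for the fiscal year

# linprog minimizes, so negate the readiness objective to maximize it.
result = linprog(
    c=-readiness_value,
    A_ub=costs.reshape(1, -1), b_ub=[budget],      # total spend <= budget
    bounds=list(zip(np.zeros(5), max_qty)),         # 0 <= quantity <= max useful quantity
    method="highs",
)

buy_list = result.x.round(1)
print("buy quantities:", buy_list)
print("total cost ($K):", round(float(costs @ buy_list), 1))
print("readiness score:", round(float(readiness_value @ buy_list), 1))
```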
IDA
28 Chris Jenkins
R&D S&E, Cybersecurity, Sandia National Labs
Presentation
Publish
3 Advancing Test & Evaluation of Emerging and Prevalent Technologies
Moving Target Defense for Space Systems
Space systems provide many critical functions to the military, federal agencies, and infrastructure networks. In particular, MIL-STD-1553 serves as a common command and control network for space systems, nuclear weapons, and DoD weapon systems. Nation-state adversaries have shown the ability to disrupt critical infrastructure through cyber-attacks targeting systems of networked, embedded computers. Moving target defenses (MTDs) have been proposed as a means for defending various networks and systems against potential cyber-attacks. In addition, MTDs could be employed as an ‘operate through’ mitigation for improving cyber resilience. We devised an MTD algorithm and tested its application to a MIL-STD-1553 network. We demonstrated and analyzed four aspects of the MTD algorithm’s usage: 1) characterized the performance, unpredictability, and randomness of the core algorithm; 2) demonstrated feasibility by conducting experiments on actual commercial hardware; 3) conducted an exfiltration experiment in which the reduction in adversarial knowledge was 97%; and 4) employed a long short-term memory (LSTM) machine learning model to see if it could defeat the algorithm and to gauge the algorithm’s resistance to machine learning attacks. Given the above analysis, we show that the algorithm has the ability to be used in real-time bus networks as well as in other (non-address) applications.
DATAWorks Technical Program Committee
29 Giri Gopalan
Scientist, Los Alamos
Presentation
Publish
3 Advancing Test & Evaluation of Emerging and Prevalent Technologies
A Statistical Framework for Benchmarking Foundation Models with Uncertainty
Modern artificial intelligence relies upon foundation models (FMs), which are prodigious, multi-purpose machine learning models, typically deep neural networks, trained on a massive data corpus. Many benchmarks assess FMs by evaluating their performance on a battery of tasks that the FMs are adapted to solve, but uncertainty is usually not accounted for in such benchmarking practices. This talk will present statistical approaches for performing uncertainty quantification with benchmarks meant to compare FMs. We demonstrate bootstrapping of task evaluation data, Bayesian hierarchical models for task evaluation data, rank aggregation techniques, and visualization of model performance under uncertainty with different task weightings. The utility of these statistical approaches is illustrated with real machine learning benchmark data, and a crucial finding is that the incorporation of uncertainty leads to less clear-cut distinctions in FM performance than would otherwise be apparent.
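A minimal sketch of the bootstrapping approach on a hypothetical task-by-model score table; the Bayesian hierarchical models, rank aggregation, and task weightings discussed in the talk are not shown.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical per-item correctness (0/1) for two foundation models on three tasks.
n_items = {"taskA": 200, "taskB": 150, "taskC": 300}
scores = {
    "FM-1": {t: rng.binomial(1, p, n) for (t, n), p in zip(n_items.items(), [0.82, 0.71, 0.64])},
    "FM-2": {t: rng.binomial(1, p, n) for (t, n), p in zip(n_items.items(), [0.80, 0.74, 0.66])},
}

def aggregate(model_scores, rng):
    """Bootstrap-resample items within each task, then average the task accuracies."""
    task_accs = []
    for items in model_scores.values():
        resampled = rng.choice(items, size=len(items), replace=True)
        task_accs.append(resampled.mean())
    return np.mean(task_accs)

B = 2000
diffs = np.array([aggregate(scores["FM-1"], rng) - aggregate(scores["FM-2"], rng)
                  for _ in range(B)])
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"FM-1 minus FM-2 aggregate accuracy: 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
# An interval that straddles zero indicates the ranking is not clear-cut under uncertainty.
```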
30 Victoria Nilsen
Operations Research Analyst, NASA HQ
Presentation
Publish
2 Solving Program Evaluation Challenges
Examining the Effects of Implementing Data-Driven Uncertainty in Cost Estimating Models
When conducting probabilistic cost analysis, correlation assumptions are key inputs and often a driver of the total output or point estimate of a cost model. Although the National Aeronautics and Space Administration (NASA) has an entire community dedicated to the development of statistical cost estimating tools and techniques to manage program and project performance, the application of accurate and data-driven correlation coefficients within these models is often overlooked. Due to the uncertain nature of correlation between random variables, NASA has had difficulty quantifying the relationships between spacecraft subsystems with specific, data-driven correlation matrices. Previously, the NASA cost analysis community has addressed this challenge by either selecting a blanket correlation value to address uncertainty within the model or opting out of using any correlation value altogether. One hypothesized method of improving NASA cost estimates involves deriving subsystem correlation coefficients from the residuals of the regression equations for the cost estimating relationships (CERs) of various spacecraft subsystems and support functions. This study investigates the feasibility of this methodology using the CERs from NASA’s Project Cost Estimating Capability (PCEC) model. The correlation coefficients for each subsystem of the NASA Work Breakdown Structure were determined by correlating the residuals of PCEC’s subsystem CERs. These correlation coefficients were then compiled into a 20×20 correlation matrix and implemented in PCEC as an uncertainty factor influencing the model’s pre-existing cost distributions. Once this correlation matrix was implemented into the cost distributions of PCEC, the Latin hypercube sampling function of the Microsoft Excel add-in Argo was used to simulate PCEC results for 40 missions within the PCEC database. These steps were repeated three additional times using the following correlation matrices: (1) a correlation matrix assuming the correlation between each pair of subsystems is zero, (2) a correlation matrix assuming the correlation between each pair of subsystems is 1, and (3) a correlation matrix using a blanket value of 0.3. The results of these simulations showed that the correlation matrix derived from the residuals of the subsystem CERs significantly reduced bias and error within PCEC’s estimating capability. The results also indicated that the probability density function and cumulative distribution function of each mission in the PCEC database were altered significantly by the correlation matrices implemented into the model. This research produced (1) a standard subsystem correlation matrix that has been shown to improve estimating accuracy within PCEC and (2) a replicable methodology for creating this correlation matrix that can be used in future cost estimating models. This information can help the NASA cost analysis community understand the effects of applying uncertainty within cost models and perform sensitivity analyses on project cost estimates. This is significant because NASA has frequently been critiqued for underestimating project costs, and this methodology has shown promise in improving NASA’s future cost estimates and painting a more realistic picture of the total possible range of spacecraft development costs.
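A hedged sketch of the two core steps on invented data: correlate CER residuals across subsystems, then induce that correlation in sampled cost uncertainties through a normal copula; the PCEC CERs, Argo, and the 20×20 matrix are not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Hypothetical CER residuals (log-space) for 3 subsystems across 40 historical missions.
true_corr = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]])
residuals = rng.multivariate_normal(np.zeros(3), 0.04 * true_corr, size=40)

# Step 1: data-driven correlation matrix from the CER residuals.
corr = np.corrcoef(residuals, rowvar=False)
print("residual correlation matrix:\n", corr.round(2))

# Step 2: sample correlated uncertainty factors with a normal copula (Cholesky transform),
# then map to lognormal multipliers applied to each subsystem's point estimate.
L = np.linalg.cholesky(corr)
z = rng.standard_normal((10_000, 3)) @ L.T
multipliers = stats.lognorm.ppf(stats.norm.cdf(z), s=0.25)   # ~25% lognormal cost uncertainty

point_estimates = np.array([120.0, 80.0, 45.0])              # $M, illustrative
total_cost = (multipliers * point_estimates).sum(axis=1)
print("total cost 50th/80th percentiles ($M):",
      np.percentile(total_cost, [50, 80]).round(1))
```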
NASA Add to Speakers
31 Steven Movit
Research Staff Member, Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/movit-picture-150×150.jpg
Presentation
No Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Operationally Representative Data and Cybersecurity for Avionics
This talk discusses the ARINC 429 standard and its inherent lack of security, demonstrates proven mission effects in a hardware-in-the-loop (HITL) simulator, and presents a data set collected from real avionics. ARINC 429 is a ubiquitous data bus for civil avionics, enabling safe and reliable communication between devices from disparate manufacturers. However, ARINC 429 lacks any form of encryption or authentication, making it an inherently insecure communication protocol and rendering any connected avionics vulnerable to a range of attacks. We constructed a HITL simulator with ARINC 429 buses to explore these vulnerabilities, and to identify potential mission effects. The HITL simulator includes commercial off-the-shelf avionics hardware, including a multi-function display and an Enhanced Ground Proximity Warning System, as well as a realistic flight simulator. We performed a denial-of-service attack against the multi-function display via a compromised transmit node on an ARINC 429 bus, using commercially available tools, which succeeded in disabling important navigational aids. This simple replay attack demonstrates how effectively a “leave-behind” device can cause serious mission effects. This proven adversarial effect on physical avionics illustrates the risk inherent in ARINC 429 and the need for the ability to detect, mitigate, and recover from these attacks. One potential solution is an intrusion detection system (IDS) trained using data collected from the electrical properties of the physical bus. Although previous research has demonstrated the feasibility of an IDS on an ARINC 429 bus, no such IDS has been trained on data generated by actual avionics hardware.
Kelly Avery Add to Speakers
32 Andrei Gribok
Distinguished Scientist, Idaho National Laboratory
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Gribok.bmp
Presentation
Publish
3 Regularization Approach to Learning Bioburden Density for Planetary Protection
Over the last two years, the scientific community and the general public both saw a surge of practical applications of artificial intelligence (AI) and machine learning (ML) to numerous technological and everyday problems. The emergence of AI/ML data-driven tools was enabled by decades of research in statistics, neurobiology, optimization, neural networks, statistical learning theory, and other fields—research that synergized into an overarching discipline of learning from data. Learning from data is one of the most fundamental problems facing empirical science. In the most general setting, it may be formulated as finding the true data-generating function or dependency given a set of noisy empirical observations. In statistics, the most prominent example is estimation of the cumulative distribution function or probability density function from a limited number of observations. The principal difficulty in learning functional dependencies from a limited set of noisy data is the ill-posed nature of this problem. Here, “ill-posed” is used in the sense suggested by Hadamard—namely, that the problem’s solution lacks existence, uniqueness, or stability with respect to minor variations in the data. In other words, ill-posed problems are underdetermined as the data do not contain all the information necessary to arrive at a unique, stable solution. Finding functional dependencies from noisy data may in fact be hindered by all three conditions of ill-posedness: the data may not contain information about the solution, numerous solutions can be found to fit the data, and the solution may be unstable with respect to minor variations in the data. To deal with ill-posed problems, a regularization method was proposed for augmenting the information contained in the data with some additional information about the solution (e.g., its smoothness). In this presentation, we demonstrate how regularization techniques, as applied to learning functional dependencies with neural networks, can be successfully applied to the planetary protection problem of estimating microbial bioburden density (i.e., spores per square meter) on spacecraft. We shall demonstrate that the problem of bioburden density estimation can be formulated as a solution to a least squares problem, and that this problem is indeed ill-posed. This presentation will elucidate the relationship between maximum likelihood estimates and the least squares solution by demonstrating their mathematical equivalence. It will be shown that the maximum likelihood estimation is identical to the differentiation of the cumulative count of colony-forming units, which can be represented as a least squares problem. Since the problem of differentiation of noisy data is ill-posed, the method of regularization will be applied to obtain a stable solution. We will demonstrate that the problem of bioburden density estimation can be cast as a problem of regularized differentiation of the cumulative count of colony-forming units found on the spacecraft. The regularized differentiation will be shown to be a shrinkage estimator and its performance compared with other shrinkage estimators commonly used in statistics for simultaneously estimating parameters of a set of independent Poisson distributions. The strengths and weaknesses of the regularized differentiation will then be highlighted in comparison to the other shrinkage estimators.
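As a companion to the abstract, the following sketch illustrates Tikhonov-regularized differentiation of a noisy cumulative curve, which is the core idea described above. The data are synthetic and the regularization parameter is chosen arbitrarily, so this is an illustration of the technique rather than the authors' estimator.

```python
# Minimal sketch of regularized differentiation: recover a density from a noisy
# cumulative count by solving a Tikhonov-regularized least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
true_density = 5.0 * np.exp(-((x - 0.4) / 0.15) ** 2)
cumulative = np.cumsum(true_density) * h + rng.normal(scale=0.05, size=n)  # noisy "counts"

# Forward operator: integration (lower-triangular cumulative-sum matrix)
A = np.tril(np.ones((n, n))) * h
# Penalty operator: first differences, enforcing smoothness of the estimated density
D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]

lam = 1e-2  # regularization parameter; in practice chosen by, e.g., the discrepancy principle
lhs = A.T @ A + lam * D.T @ D
rhs = A.T @ cumulative
density_hat = np.linalg.solve(lhs, rhs)   # regularized (shrunken) derivative estimate
print(density_hat[:5])
```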
Michael Dinicola Add to Speakers
33 Morgan Alexis Brown
Undergrad Researcher, USMA
https://dataworks.testscience.org/wp-content/uploads/formidable/32/MorganBrown-150×150.jpeg
Poster Presentation
Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies Using AI to Classify Combat Vehicles in Degraded Environments
In the last decade, warfare has come to be characterized by rapid technological advances and the increased integration of artificial intelligence platforms. From China’s growing emphasis on advanced technological development programs to Ukraine’s use of facial recognition technologies in the war with Russia, the prevalence of artificial intelligence (AI) is undeniable. Currently, the United States is innovating the use of machine learning (ML) and AI through a variety of projects. Various systems use cutting-edge sensing technologies and emerging ML algorithms to automate the target acquisition process. As the United States attempts to increase its use of automatic and aided target recognition (ATR and AiTR) systems, it is important to consider the inaccuracy that may occur as a result of environmental degradations, such as smoke, fog, or rain. Therefore, this project aims to mimic various battlefield degradations through the implementation of different types of noise, namely Uniform, Gaussian, and Impulse noise, to determine the effect of these degradations on a Commercial-off-the-Shelf image classification system’s ability to correctly identify combat vehicles. This is an undergraduate research project which we wish to present via a Poster Presentation.
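The three degradations named above are straightforward to reproduce. The sketch below shows one hedged way to inject Uniform, Gaussian, and Impulse noise into an image array before classification; the classifier itself and the project's specific noise levels are not shown and the parameters here are assumptions.

```python
# Minimal sketch of the three degradations applied to an image array.
import numpy as np

def add_gaussian_noise(img, sigma=25.0, rng=None):
    rng = rng or np.random.default_rng()
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255).astype(np.uint8)

def add_uniform_noise(img, spread=50.0, rng=None):
    rng = rng or np.random.default_rng()
    return np.clip(img + rng.uniform(-spread, spread, img.shape), 0, 255).astype(np.uint8)

def add_impulse_noise(img, p=0.05, rng=None):
    rng = rng or np.random.default_rng()
    out = img.copy()
    mask = rng.random(img.shape[:2]) < p                           # pixels hit by impulse noise
    out[mask] = rng.choice([0, 255], size=(int(mask.sum()), 1))    # salt or pepper, all channels
    return out

# Degrade a frame before sending it to the classifier (classifier not shown)
img = np.random.default_rng(0).integers(0, 256, (128, 128, 3), dtype=np.uint8)
degraded = [f(img) for f in (add_gaussian_noise, add_uniform_noise, add_impulse_noise)]
```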
Add to Speakers
34 Jamie Thorpe
Cybersecurity R&D, Sandia National Laboratories
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Thorpe_photo_20220228-150×150.png
Presentation
No Publish
2 Solving Program Evaluation Challenges Lessons Learned for Study of Uncertainty Quantification in Cyber-Physical System Emulation
Over the past decade, the number and severity of cyber-attacks on critical infrastructure have continued to increase, necessitating a deeper understanding of these systems and potential threats. Recent advancements in high-fidelity system modeling, also called emulation, have enabled quantitative cyber experimentation to support analyses of system design, planning decisions, and threat characterization. However, much remains to be done to establish scientific methodologies for performing these cyber analyses more rigorously. Without a rigorous approach to cyber experimentation, it is difficult for analysts to fully characterize their confidence in the results of an experiment, degrading the ability to make decisions based upon analysis results, and often defeating the purpose of performing the analysis. This issue is particularly salient when analyzing critical infrastructures or similarly impactful systems, where confident, well-informed decision making is imperative. Thus, the integration of tools for rigorous scientific analysis with platforms for emulation-driven experimentation is crucial. This work discusses one such effort to integrate the tools necessary to perform uncertainty quantification (UQ) on an emulated model, motivated by a study on a notional critical infrastructure use case. The goal of the study was to determine how variations in the aggressiveness of the given threat affected how resilient the system was to the attacker. Resilience was measured using a series of metrics which were designed to capture the system’s ability to perform its mission in the presence of the attack. One reason for the selection of this use case was that the threat and system models were believed to be fairly deterministic and well-understood. The expectation was that results would show a linear correlation between the aggressiveness of the attacker and the resilience of the system. Surprisingly, this hypothesis was not supported by the data. The initial results showed no correlation, and they were deemed inconclusive. These findings spurred a series of mini-analyses, leading to extensive evaluation of the data, methodology, and model to identify the cause of these results. Significant quantities of data collected as part of the initial UQ study enabled closer inspection of data sources and metrics calculation. In addition, tools developed during this work facilitated supplemental statistical analyses, including a noise study. These studies all supported the conclusion that the system model and threat model chosen were far less deterministic than initially assumed, highlighting key lessons learned for approaching similar analyses in the future. Although this work is discussed in the context of a specific use case, the authors believe that the lessons learned are generally applicable to similar studies applying statistical testing to complex, high-fidelity system models. Insights include the importance of deeply understanding potential sources of stochasticity in a model, planning how to handle or otherwise account for such stochasticity, and performing multiple experiments and looking at multiple metrics to gain a more holistic understanding of a modeled scenario. These results highlight the criticality of approaching system experimentation with a rigorous scientific mindset.
Add to Speakers
35 Patrick Bjornstad
Systems Engineer I, Jet Propulsion Laboratory
https://dataworks.testscience.org/wp-content/uploads/formidable/32/PatrickBjornstadPhotoCropped-150×150.jpg
Presentation
No Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Unlocking our Collective Knowledge: LLMs for Data Extraction from Long-Form Documents
As the primary mode of communication between humans, natural language (oftentimes found in the form of text) is one of the most prevalent sources of information across all domains. From scholarly articles to industry reports, textual documentation pervades every facet of knowledge dissemination. This is especially true in the world of aerospace. While other structured data formats may struggle to capture complex relationships, natural language excels by allowing for detailed explanations that a human can understand. However, the flexible, human-centered nature of text has made it traditionally difficult to incorporate into quantitative analyses, leaving potentially valuable insights and features hidden within the troves of documents collecting dust in various repositories. Large Language Models (LLMs) are an emerging technology that can bridge the gap between the expressiveness of unstructured text and the practicality of structured data. Trained to predict the next most likely word following a sequence of text, LLMs built on large and diverse datasets must implicitly learn knowledge related to a variety of fields in order to perform prediction effectively. As a result, modern LLMs have the capability to interpret the underlying semantics of language in many different contexts, allowing them to digest long-form, domain-specific textual information in a fraction of the time that a human could. Among other things, this opens up the possibility of knowledge extraction: the transformation of unstructured textual knowledge to a structured format that is consistent, queryable, and amenable to being incorporated in future statistical or machine learning analyses. Specifically, this work begins by highlighting the use of GPT-4 for categorizing NASA work contracts based on JPL’s organizational structure using textual descriptions of the contract’s work, allowing the lab to better understand how different divisions will be impacted by the increasingly outsourced work environment. Despite its simplicity, the task demonstrates the capability of LLMs to ingest unstructured text and produce structured results (categorical features for each contract indicating the JPL organization that the work would involve) useful for statistical analysis. Potential extensions to this proof of concept are then highlighted, such as the generation of knowledge-graphs/ontologies to encode domain and mission-specific information. Access to a consistent, structured graphical knowledge base would not only improve data-driven decision making in engineering contexts by exposing previously out-of-reach data artifacts to traditional analyses (e.g., numerical data extracted from text, or even graph embeddings which encode entities/nodes as vectors in a way that captures the entity’s relation to the overall structure of the graph), but could also accelerate the development of specialized capabilities like the mission Digital Twin (DT) by enabling access to a reliable, machine-readable database of mission and domain expertise.
Add to Speakers
36 Garrett Chrisman
Cadet, United States Military Academy
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Heqadshot_standing-2-150×150.png
Speed Presentation
Publish
2 Wildfire Burned Area Mapping Using Sentinel-1 SAR and Sentinel-2 MSI with Convolutional Neural Networks
The escalating environmental and societal repercussions of wildfires, underscored by the occurrence of four of the five largest wildfires in Colorado within the past five years, necessitate efficient mapping of burned areas to enhance emergency response and fire control strategies. This study investigates the potential of the Synthetic Aperture Radar (SAR) capabilities of the Sentinel-1 satellite, in conjunction with optical imagery from Sentinel-2, to expedite the assessment of wildfire conditions and progression. Our research is structured into four distinct cases, each applied to our dataset comprising seven Colorado wildfires. In each case, we iteratively refined our methods to mitigate the inherent challenges associated with SAR data. Our results demonstrate that while SAR imagery may not match the precision of traditional methodologies, it offers a valuable trade-off by providing a sufficiently accurate estimate of burned areas in significantly less time. Furthermore, we developed a deep learning framework for predicting burn severity using both Sentinel-1 SAR and Sentinel-2 MSI data acquired during wildfire events. Our findings underscore the potential of spaceborne imagery for real-time burn severity prediction, providing valuable insights for the effective management of wildfires. This research contributes to the advancement of wildfire monitoring and response, particularly in regions prone to such events, like Colorado, and underscores the significance of remote sensing technologies in addressing contemporary environmental challenges.
Add to Speakers
37 Jacob Langley
Data Science Fellow II, IDA
https://dataworks.testscience.org/wp-content/uploads/formidable/32/photo-150×150.png
Poster Presentation
No Publish
1 Promoting access to federally-funded R&D resources, including laboratories, equipment, data, and expertise An Analysis of Topics in Artificial Intelligence at Institutions of Higher Education
This study employs cosine similarity topic modeling to analyze the curriculum content of AI (Artificial Intelligence) bachelor’s and master’s degrees, comparing them with Data Science bachelor’s and master’s degrees, as well as Computer Science (CS) bachelor’s degrees with concentrations in AI. In total, 97 programs were compared. Fifty-two topics of interest were identified at the course level. The analysis creates a representation for each of the 52 identified topics by compiling the course descriptions whose course titles match that topic into a bag-of-words. Cosine similarity is employed to compare the topic coverage of each program against all course descriptions of required courses from within that program. Subsequently, K-means and hierarchical clustering methods are applied to the results to investigate potential patterns and similarities among the programs. The primary objective was to discern whether there are distinguishable differences in the topic coverage of AI degrees in comparison to CS bachelor’s degrees with AI concentrations and Data Science degrees. The findings reveal a notable similarity between AI bachelor’s degrees and CS bachelor’s degrees with AI concentrations, suggesting a shared thematic focus. In contrast, both AI and CS bachelor’s programs exhibit distinct dissimilarities in topic coverage when compared to Data Science bachelor’s and master’s degrees. A notable difference is that the Data Science degrees exhibit much higher coverage of math and statistics than the AI and CS bachelor’s degrees. This research contributes to our understanding of the academic landscape, and helps scope the field as public and private interest in AI is at an all-time high.
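A compact sketch of this pipeline is shown below, with toy topic bags-of-words and program course descriptions in place of the real curriculum data; CountVectorizer, cosine similarity, and K-means stand in for the steps described above, and all strings are illustrative.

```python
# Minimal sketch of the comparison pipeline with toy stand-ins for the real data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

# Each topic is represented by a bag of words compiled from matching course descriptions
topics = {
    "machine_learning": "supervised learning neural networks gradient descent",
    "statistics": "probability inference regression hypothesis testing",
    "ethics": "fairness accountability transparency societal impact",
}
# Each program is represented by the concatenated descriptions of its required courses
programs = {
    "AI_BS": "neural networks deep learning agents search ethics of AI",
    "DS_MS": "regression probability statistical inference data wrangling",
}

vec = CountVectorizer()
X = vec.fit_transform(list(topics.values()) + list(programs.values()))
topic_X, program_X = X[: len(topics)], X[len(topics):]

# Rows: programs, columns: topics -> topic-coverage profile per program
coverage = cosine_similarity(program_X, topic_X)

# Cluster programs on their coverage profiles (hierarchical clustering would be analogous)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coverage)
print(dict(zip(programs, labels)))
```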
Add to Speakers
38 Aayushi Verma
Data Science Fellow II, Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/aayushi_verma_headshot-150×150.jpg
Speed Presentation
Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies From Text to Metadata: Automated Product Tagging with Python and NLP
As a research organization, the Institute for Defense Analyses (IDA) produces a variety of deliverables, such as reports, memoranda, slides, and other formats, for our sponsors. Due to their length and volume, summarizing these products quickly for efficient retrieval of information on specific research topics poses a challenge. IDA has led numerous initiatives for historical tagging of documents, but this is a manual and time-consuming process, and must be repeated periodically to tag newer products. To address this challenge, we have developed a Python-based automated product tagging pipeline using natural language processing (NLP) techniques. This pipeline utilizes NLP keyword extraction techniques to identify descriptive keywords within the content. Filtering these keywords with IDA’s research taxonomy terms produces a set of product tags, serving as metadata. This process also enables standardized tagging of products, compared to the manual tagging process, which introduces variability in tagging quality across project leaders, authors, and divisions. Instead, the tags produced through this pipeline are consistent and descriptive of the contents. This product-tagging pipeline facilitates an automated and standardized process for streamlined topic summarization of IDA’s research products, and has many applications for quantifying and analyzing IDA’s research in terms of these product tags.
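A minimal sketch of the tagging idea follows; the keyword extractor (TF-IDF n-grams) and the taxonomy terms are illustrative assumptions, not IDA's actual pipeline or taxonomy.

```python
# Minimal sketch: extract candidate keywords from a document, then keep only
# those that appear in a research taxonomy to produce standardized tags.
from sklearn.feature_extraction.text import TfidfVectorizer

taxonomy = {"cybersecurity", "test and evaluation", "machine learning", "logistics"}

def tag_document(text, top_k=10):
    # Score unigrams, bigrams, and trigrams within the document
    vec = TfidfVectorizer(ngram_range=(1, 3))
    scores = vec.fit_transform([text]).toarray()[0]
    ranked = [t for _, t in sorted(zip(scores, vec.get_feature_names_out()), reverse=True)]
    # Filter candidate keywords against the taxonomy
    return [t for t in ranked[:200] if t in taxonomy][:top_k]

print(tag_document("This report evaluates machine learning models for cybersecurity "
                   "test and evaluation of fielded systems."))
```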
Add to Speakers
39 Jonathan Rathsam
Senior Research Engineer, NASA Langley Research Center
https://dataworks.testscience.org/wp-content/uploads/formidable/32/LRC-2022-B701_P-02670-150×150.jpg
Presentation
No Publish
1 Overview of a survey methods test for the NASA Quesst community survey campaign
In its mission to expand knowledge and improve aviation, NASA conducts research to address sonic boom noise, the prime barrier to overland supersonic flight. NASA is currently preparing for a community survey campaign to assess response to noise from the new X-59 aircraft. During each community survey, a substantial number of observations must be collected over a limited timeframe to generate a dose-response relationship. A sample of residents will be recruited in advance to fill out a brief survey each time X-59 flies over, approximately 80 times throughout a month. In preparation, NASA conducted a month-long test of survey methods in 2023. A sample of 800 residents was recruited from a simulated fly-over area. Because there were no actual X-59 fly-overs, respondents were asked about their reactions to noise from normal aircraft operations. The respondents chose whether to fill out the survey on the web or via a smartphone application. Evaluating response rates and how they evolved over time was a specific focus of the test. Also, a graduated incentive structure was implemented to keep respondents engaged. Finally, location data was collected from respondents since it will be needed to estimate individual noise exposure from X-59. The results of this survey test will help determine the design of the community survey campaign. This is an overview presentation that will cover the key goals, results, and lessons learned from the survey test.
N/A – Contributed Add to Speakers
40 Mohammad Ahmed
NASA Program Data Analyst, OCFO
https://dataworks.testscience.org/wp-content/uploads/formidable/32/ma-headshot-150×150.jpg
Presentation
Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies ADICT – A Power BI visualization tool used to transform budget forecasting
AVATAR Dynamic Interactive Charting Tool (ADICT) is an advanced Power BI tool that has been developed by the National Aeronautics and Space Administration (NASA) to transform budget forecasting and assist in critical budget decision-making. This innovative tool leverages the power of the M language within Power BI to provide organizations with a comprehensive 15-year budget projection system that ensures real-time accuracy and efficiency. It is housed in Power BI for its capability to update the model simultaneously from our Excel file, AVATAR. One of the standout features of ADICT is its capability to allow users to define and apply rate changes. This feature empowers organizations, including NASA, to customize their budget projections by specifying rate variations, resulting in precise and adaptable financial forecasting. NASA integrates ADICT with SharePoint to host the model, avoiding local drives and allowing the model to be seamlessly updated to adjust for any scenario. The tool is also used as a scenario-based planner, as it can provide support for workforce planning and budget decisions. ADICT seamlessly integrates with source Excel sheets, offering dynamic updates as data evolves. This integration eliminates the need for manual data manipulation, enhancing the overall decision-making process. It ensures that financial projections remain current and reliable, enabling organizations like NASA to respond swiftly to changing economic conditions and emerging challenges. At its core, ADICT enhances budgeting by transforming complex financial data into interactive visualizations, enabling NASA to gain deeper insights into their financial data and make agile decisions.
NASA Add to Speakers
41 Starr D’Auria
NDE Engineer, Extende
https://dataworks.testscience.org/wp-content/uploads/formidable/32/20230204_142448-002-150×150.jpg
Presentation
Publish
2 Improving the Quality of Test & Evaluation CIVA NDT Simulation: Improving Inspections Today for a Better Tomorrow
CIVA NDT simulation software is a powerful tool for non-destructive testing (NDT) applications. It allows users to design, optimize, and validate inspection procedures for various NDT methods, such as ultrasonic, eddy current, radiographic, and guided wave testing. Come learn about the benefits of using CIVA NDT simulation software to improve the reliability, efficiency, and cost-effectiveness of NDT inspections.
Peter Juarez Add to Speakers
42 Karly Parcell
Cadet, United States Military Academy
https://dataworks.testscience.org/wp-content/uploads/formidable/32/headshot-3-150×150.jpg
Poster Presentation
Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies Object Identification and Classification in Threat Scenarios
In rapidly evolving threat scenarios, the accurate and timely identification of hostile enemies armed with weapons is crucial to strategic advantage and personnel safety. This study aims to develop both a timely and accurate model utilizing YOLOv5 for the detection of weapons and persons in real-time drone footage and generate an alert containing the count of weapons and persons detected. Existing methods in this field often focus on either minimizing type I/type II errors or the speed at which the model runs. In our current work, we have focused on two main points of emphasis throughout training our model: minimizing type II error (instances of weapons or persons present but not detected) and keeping accuracy and precision consistent while increasing the speed of our model to keep up with real-time footage. Various parameters were adjusted within our model, including but not limited to speed, freezing layers, and image size. Going from our first to the final adjusted model, overall precision and recall went from 71.9% to 89.2% and 63.7% to 77.5%, respectively. The occurrences of misidentification produced by our model decreased dramatically, from 27% of persons misidentified as either weapon or background noise to 14%, and the misidentification of weapons from 50% to 34%. An important consideration for future work is mitigating overfitting the model to a particular dataset when training. In real-world implementation, our model needs to perform well across a variety of conditions and angles, not all of which were introduced in the training data set.
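For readers unfamiliar with the tooling, the sketch below shows the general shape of such a detection-and-alert loop using the public YOLOv5 hub model; the project's fine-tuned weights, confidence threshold, and weapon classes are not public, so those parts (and the blank test frame) are assumptions.

```python
# Minimal sketch (assumes the public ultralytics/yolov5 hub model and internet access;
# the study fine-tunes its own weights, which are not shown here).
import numpy as np
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4  # confidence threshold: trades type I vs. type II error

frame = np.zeros((640, 640, 3), dtype=np.uint8)   # stand-in for one drone-video frame
results = model(frame)
detections = results.xyxy[0]                      # rows: [x1, y1, x2, y2, confidence, class]
names = [results.names[int(c)] for c in detections[:, 5]]

# Alert payload: counts of persons and weapons in the frame. The pretrained COCO
# model has no "weapon" class, so a custom-trained class is assumed in practice.
alert = {
    "persons": names.count("person"),
    "weapons": sum(n in {"knife", "weapon"} for n in names),
}
print(alert)
```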
Add to Speakers
43 Joshua Ostrander
Research Staff Member, Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/photo-1-150×150.jpg
Presentation
No Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Packing for a Road Trip: Provisioning Deployed Units for a Contested Logistics Environment
In the event of a conflict, the Department of Defense (DOD) anticipates significant disruptions to its ability to resupply deployed units with the spare components required to repair their equipment. Simply giving units enough additional spares to last the entirety of the mission without resupply is the most straightforward and risk-averse approach to ensure success. However, this approach is also the most expensive, as a complete duplicate set of spares must be purchased for each unit, reducing the number of systems that can be so augmented on a limited budget. An alternative approach would be to support multiple combatant units with a common set of forward-positioned spares, reducing the duplicative purchasing of critical items with relatively low failure rates and freeing up funding to support additional systems. This approach, while cost-effective, introduces a single point of failure, and presupposes timely local resupply. We have used Readiness Based Sparing (RBS) tools and discrete event simulations to explore and quantify the effectiveness of different strategies for achieving high availability in a contested logistics environment. Assuming that local, periodic resupply of spares is possible, we found that creating a centralized pool of forward-positioned spares dramatically decreases the overall cost for a given readiness target compared to augmenting each individual unit with additional spares. Our work ties dollars spent to readiness outcomes, giving DOD leadership the tools to make quantitative tradeoffs.
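The pooling trade-off described above can be illustrated with a very small Monte Carlo model. The sketch below is not the RBS tooling; all failure rates, quantities, and the resupply rule are notional assumptions, and it only compares operational availability for per-unit spares versus a shared forward pool.

```python
# Minimal Monte Carlo sketch: per-unit spares vs. a shared pool under periodic resupply.
import numpy as np

rng = np.random.default_rng(7)
n_units, n_days, resupply_every = 6, 180, 30
daily_failure_rate = 0.02                       # per unit, per day, for one critical item

def simulate(pooled, spares_total, trials=500):
    ready_days = 0
    for _ in range(trials):
        stock = np.full(1 if pooled else n_units,
                        spares_total if pooled else spares_total // n_units)
        backorders = np.zeros(n_units, dtype=int)
        for day in range(n_days):
            if day > 0 and day % resupply_every == 0:      # periodic local resupply
                stock[:] = spares_total if pooled else spares_total // n_units
                backorders[:] = 0
            failures = rng.random(n_units) < daily_failure_rate
            for u in np.flatnonzero(failures):
                idx = 0 if pooled else u
                if stock[idx] > 0:
                    stock[idx] -= 1            # repair immediately from available spares
                else:
                    backorders[u] += 1         # unit is down until the next resupply
            ready_days += np.count_nonzero(backorders == 0)
    return ready_days / (trials * n_days * n_units)        # rough operational availability

for spares in (6, 12):
    print(spares, "spares  pooled:", round(simulate(True, spares), 3),
          " per-unit:", round(simulate(False, spares), 3))
```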
Add to Speakers
44 Matthew Kursar
System Analyst, DEVCOM Armaments Center
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Kursar_Matthew_MS-Teams-150×150.jpg
Presentation
Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Design of Experiments for Lethality Analysis
Lethality analysis is the modeling and simulation of a weapon and munition pairing against a target. Historically, lethality analyses are conducted in a 2D or 3D grid-like pattern over a space of interest to create a vulnerability map. Other factors of interest, such as terminal velocity, angle of fall, target, kill criteria, and warhead (z-data file), are typically held constant for a given map, and multiple vulnerability maps may be generated if there is interest in seeing the effects of changing those other factors. Results are typically presented as a color-coded visualization of the probability of kill (or other metric of interest) for each point, which might be the average of up to 100 Monte Carlo simulations of that same location/scenario. The set of relevant vulnerability maps may then be used as a lookup table for weapon effects during a follow-on analysis incorporating weapon delivery error. This approach can quickly result in a large computational burden, with millions, billions, or sometimes even trillions of runs required as the number of factors of interest increases and the spacing needed between points decreases. Here, a methodology is presented that incorporates design of experiments and machine learning with artificial neural networks (ANN) to reduce this computational burden and make accurate lethality predictions over the entire effects space of interest. Once the ANN model has been fit to the data, the result is a closed-form, fast-calculating equation that can be used to predict the mean lethality metric of interest at any point in the design space, not just the specific points that were simulated. If there is a need to create 2D or 3D vulnerability maps, then that can be done with the model predictions. With an appropriately fit model, these predictions should be as accurate as, or possibly even more accurate than, the lethality plot otherwise created from the average of a potentially un-converged number of Monte Carlo runs. And by applying design of experiments techniques, it is likely that orders of magnitude fewer simulations are required to fit the model than to directly simulate the dense grids.
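The surrogate-modeling step can be sketched compactly: run the simulation only at designed points, fit a neural network, and predict anywhere in the space. In the sketch below, the toy probability-of-kill function, the Latin Hypercube design, and the scikit-learn MLP are stand-ins for the actual lethality code, experimental design, and ANN described above.

```python
# Minimal sketch: designed runs -> ANN surrogate -> fast predictions over a dense map.
import numpy as np
from scipy.stats import qmc
from sklearn.neural_network import MLPRegressor

def toy_pk(x):  # stand-in for a Monte Carlo lethality code: Pk vs. (range, velocity)
    r, v = x[:, 0], x[:, 1]
    return 1.0 / (1.0 + np.exp(4.0 * (r - 0.5) - 2.0 * (v - 0.5)))

design = qmc.LatinHypercube(d=2, seed=0).random(n=200)     # designed runs, not a dense grid
y = toy_pk(design)

ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0).fit(design, y)

# Fast predictions anywhere in the design space, e.g. a 100x100 vulnerability map
g = np.linspace(0, 1, 100)
grid = np.array(np.meshgrid(g, g)).reshape(2, -1).T
pk_map = ann.predict(grid).reshape(100, 100)
print(float(np.abs(pk_map - toy_pk(grid).reshape(100, 100)).max()))   # surrogate error on the toy
```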
Add to Speakers
45 Kyle Risher
Undergraduate Research Intern, Virginia Tech National Security Institute
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Headshot-5-150×150.jpg
Poster Presentation
Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies Automated Tools for Improved Accessibility of Bayesian Analysis Methods
Statistical analysis is integral to the evaluation of defense systems throughout the acquisition process. Unlike traditional frequentist statistical methods, newer Bayesian statistical analysis incorporates prior information, such as historical data and expert knowledge, into the analysis for a more integrated approach to test data analysis. With Bayesian techniques, practitioners can more easily decide what data to leverage in their analysis, and how much that data should impact their analysis results. This provides a more flexible, informed framework for decision making in the testing and evaluation of DoD systems. However, the application of Bayesian statistical analyses is often challenging due to the advanced statistical knowledge and technical coding experience necessary for the utilization of current Bayesian programming tools. The development of automated analysis tools can help address these barriers and make modern Bayesian analysis techniques available to a wide range of stakeholders, regardless of technical background. By making new methods more readily available, collaboration and decision-making are made easier and more effective within the T&E community. To facilitate this, we have developed a web application with the R Shiny package. This application uses an intuitive user interface to enable the implementation of our Bayesian reliability analysis approach by non-technical users without the need for any coding knowledge or advanced statistical background. Users can upload reliability data from the developmental testing and operational testing stages of a system of interest and tweak parameters of their choosing to automatically generate plots and estimates of system reliability performance based on their uploaded data and prior knowledge of system behavior.
Add to Speakers
46 Lucas Villanti
Cadet, US Military Academy
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Villanti_Headshot-150×150.jpg
Poster Presentation
No Publish
2 Identifying factors associated with starting pitcher longevity
Background: Current practices on how starting pitchers are pulled vary, are inconsistent, and are not transparent. With the new major league baseball (MLB) data available through Statcast, such decisions can be more consistently supported by combining measured data. Methods: To address this gap, we scraped pitch-level data from Statcast, a new technology system that collects real-time MLB game measurements using laser technology. Here, we used Statcast data within a Cox regression for survival analysis to identify measurable factors that are associated with pitcher longevity. Measurements from 696,743 pitches were extracted for analysis from the 2021 MLB season. The pitcher was considered “surviving” the pitch if they remained in the game. Mortality was defined as the pitcher’s last pitch. Analysis began at the second inning to account for high variation during the first inning. Results: Statistically significant factors include HSR (Hits to Strike Ratio), runs per batter faced, and total bases per batter faced, which yielded the highest hazard coefficients (ranging from 10 to 23), indicating a higher risk of being relieved. Conclusions: Our findings indicate that HSR, runs per batter faced, and total bases per batter faced provide decision-making information for relieving the starting pitcher.
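For context, a Cox proportional hazards fit of this kind can be set up in a few lines. The sketch below uses the lifelines package with simulated columns standing in for the Statcast-derived covariates named above; the column names and data are assumptions, not the study's dataset.

```python
# Minimal sketch of a Cox proportional hazards survival model with lifelines.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "pitch_count": rng.integers(20, 110, n),        # duration: pitches thrown after inning 1
    "pulled": rng.integers(0, 2, n),                # event: 1 if this was the pitcher's last pitch
    "hits_to_strike_ratio": rng.uniform(0, 0.6, n),
    "runs_per_batter_faced": rng.uniform(0, 0.5, n),
    "total_bases_per_batter_faced": rng.uniform(0, 1.0, n),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="pitch_count", event_col="pulled")
cph.print_summary()   # hazard ratios show how each factor changes the risk of being relieved
```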
Add to Speakers
47 Patricia Gallagher and Patrick Bjornstad
Data Scientist, Jet Propulsion Laboratory
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Headshot-1-150×150.png
Presentation
Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Optimizing Mission Concept Development: A Bayesian Approach Utilizing DAGs
This study delves into the application of influence diagrams in mission concept development at the Jet Propulsion Laboratory, emphasizing the importance of understanding how technical variables influence mission costs. Concept development is an early stage in the design process that requires extensive decision-making, in which the tradeoffs between time, money, and scientific goals are explored to generate a wide range of project ideas from which new missions can be selected. Utilizing influence diagrams is one strategy for optimizing decision making. An influence diagram represents decision scenarios in a graphical and mathematical manner, providing an intuitive interpretation of the relationships between input variables (which are functions of the decisions made in a trade space) and outcomes. These input-to-outcome relationships may be mediated by intermediate variables, and the influence diagram provides a convenient way to encode the hypothesized “trickle-down” structure of the system. In the context of mission design and concept development, influence diagrams can inform analysis and decision-making under uncertainty to encourage the design of realistic projects within imposed cost limitations, and better understand the impacts of trade space decisions on outcomes like cost. This project addresses this initiative by focusing on the analysis of an influence diagram framed as a Directed Acyclic Graph (DAG), a graphical structure where vertices are connected by directed edges that do not form loops. Edge weights in the DAG represent the strength and direction of relationships between variables. The DAG aims to model the trickle-down effects of mission technical parameters, such as payload mass, payload power, delta V, and data volume, on mission cost elements. A Bayesian multilevel regression model with random effects is used for estimating the edge weights in a specific DAG (constructed according to expert opinion) that is meant to represent a hypothesized trickle-down structure from technical parameters to cost. This Bayesian approach provides a flexible and robust framework, allowing us to incorporate prior knowledge, handle small datasets effectively, and leverage its capacity to capture the inherent uncertainty in our data.
Michael Dinicola Add to Speakers
48 Dhruv Patel
Research Staff Member, Institute for Defense Analysis
https://dataworks.testscience.org/wp-content/uploads/formidable/32/headshot-6-150×150.jpg
Presentation
No Publish
2 A practitioner’s framework for federated model V&V resource allocation.
Recent advances in computation and statistics have led to an increasing use of Federated Models for system evaluation. A federated model is a collection of interconnected sub-models in which the outputs of one sub-model act as inputs to subsequent models. However, the process of verifying and validating federated models is poorly understood, and testers often struggle with determining how to best allocate limited test resources for model validation. We propose a graph-based representation of federated models, where the graph encodes the connections between sub-models. Vertices of the graph are given by sub-models. A directed edge from vertex a to vertex b is drawn if a inputs into b. We characterize sub-models through vertex attributes and quantify their uncertainties through edge weights. The graph-based framework allows us to quantify the uncertainty propagated through the model and optimize resource allocation based on the uncertainties.
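A minimal sketch of that graph encoding is given below using networkx; the node attributes, edge weights, and the simple additive propagation rule are illustrative assumptions, not the authors' uncertainty quantification.

```python
# Minimal sketch: sub-models as vertices, input/output dependencies as directed
# edges, with edge weights carrying an uncertainty measure.
import networkx as nx

G = nx.DiGraph()
G.add_node("threat_model", fidelity="medium")
G.add_node("sensor_model", fidelity="high")
G.add_node("engagement_model", fidelity="low")

# Edge weight = uncertainty (e.g., a variance-like score) passed downstream
G.add_edge("threat_model", "engagement_model", weight=0.20)
G.add_edge("sensor_model", "engagement_model", weight=0.05)

# Toy propagation: accumulate upstream uncertainty along topological order
propagated = {}
for node in nx.topological_sort(G):
    incoming = [propagated[u] + d["weight"] for u, _, d in G.in_edges(node, data=True)]
    propagated[node] = max(incoming, default=0.0)
print(propagated)   # a candidate basis for prioritizing V&V resources on the noisiest paths
```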
Institute for Defense Analyses Add to Speakers
49 Jason Schlup
Research Staff, Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/photo-1-150×150.png
Presentation
No Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Silence of the Logs: A Cyber Red Team Data Collection Framework
Capturing the activities of Cyber Red Team operators as they conduct their mission in a way that is both reproducible and granular enough for detailed analysis poses a challenge to test organizations for cyber testing. Cyber Red Team members act as both operators and data collectors, all while keeping a busy testing schedule and working within a limited testing window. Data collection often suffers at the expense of meeting testing objectives. Data collection assistance may therefore be beneficial to support Cyber Red Team members so they can conduct cyber operations while still delivering the needed data. To assist in data collection, DOT&E, IDA, Johns Hopkins University Applied Physics Lab, and MITRE are developing a framework, including a data standard that supports data collection requirements, for Cyber Red Teams called Silence of the Logs (SotL). The goal of delivering SotL is to have Red Teams continue operations as normal while automatically logging activity in the SotL data standard and generating data needed for analyses. In addition to the data standard and application framework, the SotL development team has created example capabilities that record logs from a commonly used commercial Red Team tool in the data standard format. As Cyber Red Teams adopt other Red Team tools, they can use the SotL data standard and framework to create their own logging mechanisms to meet data collection requirements. Analysts also benefit from the SotL data standard as it enables reproducible data analysis. This talk demonstrates current SotL capabilities and presents possible data analysis techniques enabled by SotL.
Add to Speakers
50 Christian Frederiksen
PhD Student, Tulane University
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Picture-150×150.jpg
Speed Presentation
Publish
2 Improving the Quality of Test & Evaluation Bayesian Design of Experiments and Parameter Recovery
With recent advances in computing power, many Bayesian methods that were once impracticably expensive are becoming increasingly viable. Parameter recovery problems present an exciting opportunity to explore some of these Bayesian techniques. In this talk we briefly introduce Bayesian design of experiments and look at a simple case study comparing its performance to classical approaches. We then discuss a PDE inverse problem and present ongoing efforts to optimize parameter recovery in this more complicated setting. This is joint work with Justin Krometis, Nathan Glatt-Holtz, Victoria Sieck, and Laura Freeman.
Add to Speakers
51 James Starling
Associate Professor, United States Military Academy
https://dataworks.testscience.org/wp-content/uploads/formidable/32/jks-150×150.jpg
Presentation
No Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Creating Workflows for Synthetic Data Generation and Advanced Military Image Classification
The US Government has a specific need for tools that intelligence analysts can use to search and filter data effectively. Artificial Intelligence (AI), through the application of Deep Neural Networks (DNNs), can assist in a multitude of military applications, requiring a constant supply of relevant data sets to keep up with the always-evolving battlefield. Existing imagery does not adequately represent the evolving nature of modern warfare; therefore, finding a way to simulate images of future conflicts could give us a strategic advantage against our adversaries. Additionally, using physical cameras to capture a sufficient variety of lighting and environmental conditions is nearly impossible. The technical challenge in this area is to create software tools for edge computing devices integrated with cameras to process the video feed locally without having to send the video data through bandwidth-constrained networks to servers in data centers. The ability to collect and process data locally, often in austere environments, can accelerate decision making and action taken in response to emergency situations. An important part of this challenge is to create labeled datasets that are relevant to the problem and are needed for training the edge-efficient AI. Teams from Fayetteville State University (FSU) and The United States Military Academy (USMA) will present their proposed workflows that will enable accurate detection of various threats using Unreal Engine (UE) to generate synthetic training data. In principle, production of synthetic data is unlimited and can be customized to location, various environmental variables, and human and crowd characteristics. Together, both teams address the challenges of realism and fidelity; diversity and variability; and integration with real data. The focus of the FSU team is on creating semi-automated workflows to create simulated human-crowd behaviors and the ability to detect anomalous behaviors. It will provide methods of specifying collective behaviors to create crowd simulations of many human agents, and for selecting a few of those agents to exhibit behaviors that are outside of the defined range of normality. The analysis is needed for rapid detection of anomalous activities that can pose security threats and cost human lives. The focus of the USMA team will be on creating semi-autonomous workflows that evaluate the ability of DNNs to identify key military assets under various environmental conditions, specifically armored vehicles and personnel. We aim to vary environmental parameters to simulate varying light conditions and introduce obscuration experiments using artificial means like smoke and natural phenomena like fog to add complexity to the scenarios. Additionally, the USMA team will explore a variety of camouflage patterns and various levels of defilade. The outcome of both teams is to provide workflow solutions that maximize the use of UE to provide realistic datasets that simulate future battlefields and emergency scenarios for evaluating and training existing models. These studies pave the way for creating advanced models trained specifically for military application. Creating adaptive models that can keep up with today’s evolving battlefield will give the military a great advantage in the race for artificial intelligence applications.
Add to Speakers
52 Kelly Koser
Senior Project Manager & Statistician, Johns Hopkins University Applied Physics Laboratory
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Koser_Kelly_Headshot.png-150×150.jpg
Presentation
No Publish
2 Improving the Quality of Test & Evaluation Adaptive Sequential Experimental Design for Strategic Reentry Simulated Environment
To enable the rapid design and evaluation of survivable reentry systems, the Johns Hopkins University Applied Physics Laboratory (JHU/APL) developed a simulation environment to quickly explore the reentry system tradespace. As part of that effort, a repeatable process for designing and assessing the tradespace was implemented utilizing experimental design and statistical modeling techniques. This talk will discuss the utilization of the fast flexible filling experimental design and maximum value-weighted squared error (MaxVSE) adaptive sequential experimental design methods and Gaussian Process modeling techniques for assessing features that impact reentry system trajectories and enabling continuous model refinements. The repeatable scripts used to implement these methods allow for integration into other software tools for a complete end-to-end simulation of reentry systems.
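The adaptive loop can be illustrated with a simple stand-in criterion: the sketch below fits a Gaussian Process to completed runs and adds the candidate point with the largest predictive standard deviation. That criterion is a simplification standing in for MaxVSE, and the response function is a toy, not the reentry simulation.

```python
# Minimal sketch of adaptive sequential augmentation with a Gaussian Process surrogate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

def simulate(x):                        # stand-in for the reentry simulation response
    return np.sin(6 * x[:, 0]) * x[:, 1]

X = rng.random((12, 2))                 # initial space-filling batch of runs
y = simulate(X)

candidates = rng.random((2000, 2))      # dense candidate set over the tradespace
for _ in range(10):                     # add one informative run per iteration
    gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True).fit(X, y)
    _, sd = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(sd)][None, :]       # largest predictive uncertainty
    X, y = np.vstack([X, x_next]), np.append(y, simulate(x_next))
print(X.shape)   # the design grows sequentially as the surrogate is refined
```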
Joseph Warfield Add to Speakers
53 Andrew Cooper
Graduate Student, Virginia Tech Department of Statistics
https://dataworks.testscience.org/wp-content/uploads/formidable/32/IMG_20200827_131341-150×150.jpg
Speed Presentation
Publish
3 Improving the Quality of Test & Evaluation Data-Driven Robust Design of an Aeroelastic Wing
This paper applies a Bayesian Optimization approach to the design of a wing subject to stress and aeroelastic constraints. The parameters of these constraints, which correspond to various flight conditions and uncertain parameters, are prescribed by a finite number of scenarios. Chance-constrained optimization is used to seek a wing design that is robust to the parameter variation prescribed by such scenarios. This framework enables computing designs with varying degrees of robustness. For instance, we can deliberately eliminate a given number of scenarios in order to obtain a lighter wing that is more likely to violate a requirement, or we might seek a conservative wing design that satisfies the constraints for as many scenarios as possible.
Add to Speakers
54 Yosef Razin
, IDA
https://dataworks.testscience.org/wp-content/uploads/formidable/32/YRazin_HeadShot-150×150.png
Presentation
No Publish
1 Developing AI Trust: From Theory to Testing and the Myths In Between
The Director, Operational Test and Evaluation (DOT&E) and the Institute for Defense Analyses (IDA) are developing recommendations for how to account for trust and trustworthiness in AI-enabled systems during Department of Defense (DoD) Operational Testing (OT). Trust and trustworthiness have critical roles in system adoption, system use and misuse, and performance of human-machine teams. The goal, however, is not to maximize trust, but to calibrate the human’s trust to the system’s trustworthiness. Trusting more than a system warrants can result in shattered expectations, disillusionment, and remorse. Conversely, under-trusting implies that humans are not making the most of available resources. Trusted and trustworthy systems are commonly referenced as essential for the deployment of AI by political and defense leaders and thinkers. Executive Order 14110 requires “safe, secure, and trustworthy development and use” of AI. Furthermore, the desired end state of the Department of Defense Responsible AI Strategy is trust. These terms are not well characterized and there is no standard, accepted model for understanding, or method for quantifying, trust or trustworthiness for test and evaluation (T&E). This has resulted in trust and trust calibration rarely being assessed in T&E. This is, in part, due to the contextual and relational nature of trustworthiness. For instance, the developmental tester requires a different level of algorithmic transparency than the operational tester or the operator; whereas the operator may need more understandability than transparency. This means that to successfully operationally test AI-enabled systems, such testing must be done at the right level, with the actual operators and commanders and up-to-date CONOPS as well as sufficient time for training and experience for trust to evolve. The need for testing over time is further amplified by particular features of AI, wherein machine behaviors are no longer as predictable or static as traditional systems but may continue to be updated and adaptive. Thus, testing for trust and trustworthiness cannot be one and done. It is critical to ensure that those who work within AI – in its design, development, and testing – understand exactly what trust actually means, why it is important, and how to operationalize and measure it. This session will empower testers by: • Establishing a common foundation for understanding what trust and trustworthiness are. • Defining key terms related to trust, enabling testers to think about trust more effectively. • Demonstrating the importance of trust calibration for system acceptance and use and the risks of poor calibration. • Decomposing the factors within trust to better elucidate how trust functions and what factors and antecedents have been shown to affect trust in human-machine interaction. • Introducing concepts on how to design AI-enabled systems for better trust calibration, assurance, and safety. • Proposing validated and reliable survey measures for trust. • Discussing common cognitive biases implicated in trust and AI and both the positive and negative roles biases play. • Addressing common myths around trust in AI, including that trust or its measurement doesn’t matter, or that trust in AI can be “solved” with ever more transparency, understandability, and fairness.
Miriam Armstrong Add to Speakers
55 Karen O’Brien
Senior Principal Data Scientist, Modern Technology Solutions, Inc
https://dataworks.testscience.org/wp-content/uploads/formidable/32/OBrien_Headshot_Cropped-150×150.png
Presentation
Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies Toward an integrated T&E framework for AI-enabled systems
The classic DoD T&E paradigm (Operational Effectiveness-Suitability-Survivability-Safety) benefits from 40 years of formalism and refinement and has produced numerous specialized testing disciplines (e.g., the -ilities), governing regulations, and rigorous analysis procedures. T&E of AI-enabled systems is still very new and has yet to achieve enough consensus to downselect and formalize the many competing concepts for testing. Currently, significant resources are being invested in demonstrating measures of performance. To some extent, this comes at the expense of other measures such as operational suitability, effectiveness, and top-level critical operational issues. Stepping back from confusion matrix performance metrics reveals a much larger landscape of evaluation issues that derive from various policy sources — issues that are at risk of being overlooked. Borrowing from the classic “survivability onion” conceptual model, we propose a set of integrated and nested evaluation questions for AI-enabled systems that covers the full range of classic T&E considerations, plus a few that are unique to AI technologies within the military operational environment, as well as larger policy considerations such as Responsible AI and law. All requirements for rigorous analytical and statistical techniques are preserved and new opportunities to apply test science are identified. We hope to prompt an exchange of ideas that moves the community toward filling significant T&E capability gaps.
Add to Speakers
56 Zaki Hasnain
Data Scientist, NASA Jet Propulsion Laboratory
https://dataworks.testscience.org/wp-content/uploads/formidable/32/zaki_profile_photo_125_100-150×150.jpg
Presentation
No Publish
1 Onboard spacecraft thermal modeling using physics informed machine learning
Modeling thermal states for complex space missions, such as the surface exploration of airless bodies, requires high computation, whether used in ground-based analysis for spacecraft design or during onboard reasoning for autonomous operations. For example, a finite-element-method (FEM) thermal model with hundreds of elements can take significant time to simulate on a typical workstation, which makes it unsuitable for onboard reasoning during time-sensitive scenarios such as descent and landing, proximity operations, or in-space assembly. Further, the lack of fast and accurate thermal modeling drives thermal designs to be more conservative and leads to spacecraft with larger mass and higher power budgets. The emerging paradigm of physics-informed machine learning (PIML) presents a class of hybrid modeling architectures that address this challenge by combining simplified physics models (e.g., analytical, reduced-order, and coarse mesh models) with sample-based machine learning (ML) models (e.g., deep neural networks and Gaussian processes), resulting in models which maintain both interpretability and robustness. Such techniques enable designs with reduced mass and power through onboard thermal-state estimation and control and may lead to improved onboard handling of off-nominal states, including unplanned down-time (e.g., GOES-7 and M2020). The PIML model, or hybrid model, presented here consists of a neural network which predicts reduced nodalizations (coarse mesh size) given on-orbit thermal load conditions; subsequently, a (relatively coarse) finite-difference model operates on this mesh to predict thermal states. We compare the computational performance and accuracy of the hybrid model to a purely data-driven model, and a high-fidelity finite-difference model (on a fine mesh) of a prototype Earth-orbiting small spacecraft. This hybrid thermal model promises to achieve 1) faster design iterations, 2) reduction in mission costs by circumventing worst-case-based conservative planning, and 3) safer thermal-aware navigation and exploration.
NASA Add to Speakers
57 Rachel Sholder & Kathy Kha
Parametric Analyst, JHU/APL
https://dataworks.testscience.org/wp-content/uploads/formidable/32/headshot2-150×150.jpg
Presentation
No Publish
1 Solving Program Evaluation Challenges Cost Considerations for Estimating Small Satellite Integration & Test
In the early phases of project formulation, mission integration and test (I&T) costs are typically estimated via a wrap factor approach, analogies to similar missions adjusted for mission specifics, or a Bottom Up Estimate (BUE). The wrap factor approach estimates mission I&T costs as a percentage of payload and spacecraft hardware costs. This percentage is based on data from historical missions, with the assumption that the project being estimated shares similar characteristics with the underlying data set used to develop the wrap factor. This technique has worked well for traditional spacecraft builds, since I&T costs typically grow as hardware costs grow. However, with the emergence of CubeSats and nanosatellites, the cost basis of hardware is just not large enough to use the same approach. This suggests that there is a cost “floor” that covers basic I&T tasks, such as a baseline of labor and testing. This paper begins the process of developing a cost estimating relationship (CER) for estimating Small Satellite (SmallSat) Integration & Test (I&T) costs. CERs are a result of a cost estimating methodology using statistical relationships between historical costs and other program variables. The objective in generating a CER equation is to show a relationship between the dependent variable, cost, and one or more independent variables. The results of this analysis can be used to better predict SmallSat I&T costs.
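The CER-fitting step itself is standard regression. The sketch below fits a log-linear CER on synthetic mission data to show the form such a relationship might take; the drivers, units, and coefficients are notional assumptions, not the paper's dataset or final CER.

```python
# Minimal sketch: log-linear CER relating I&T cost to candidate cost drivers.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 30
hw_cost = rng.uniform(1, 40, n)            # $M, payload + spacecraft hardware cost (notional)
duration = rng.uniform(3, 18, n)           # months of I&T (notional)
it_cost = np.exp(0.5 + 0.6 * np.log(hw_cost) + 0.04 * duration + rng.normal(0, 0.2, n))

X = sm.add_constant(np.column_stack([np.log(hw_cost), duration]))
fit = sm.OLS(np.log(it_cost), X).fit()
print(fit.params, fit.rsquared)            # fitted CER coefficients and goodness of fit

# CER prediction for a new SmallSat: I&T cost = exp(b0) * hw_cost^b1 * exp(b2 * duration)
b0, b1, b2 = fit.params
print(np.exp(b0) * 5.0 ** b1 * np.exp(b2 * 6.0))
```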
Victoria Nilsen Add to Speakers
58 Keltin Grimes
Assistant Machine Learning Research Scientist, Software Engineering Institute
https://dataworks.testscience.org/wp-content/uploads/formidable/32/headshot_4x5-150×150.jpg
Presentation
Publish
2 Solving Program Evaluation Challenges Statistical Validation of Fuel Savings from In-Flight Data Recordings
The efficient use of energy is a critical challenge for any organization, but especially in aviation, where entities such as the United States Air Force operate on a global scale, using many millions of gallons of fuel per year and requiring a massive logistical network to maintain operational readiness. Even very small modifications to aircraft, whether physical, digital, or operational, can accumulate into substantial changes in a fleet’s fuel consumption. We have developed a prototype system to quantify changes in fuel use due to the application of an intervention, with the purpose of informing decision-makers and promoting fuel-efficient practices. Given a set of in-flight sensor data from a certain type of aircraft and a list of sorties for which an intervention is present, we use statistical models of fuel consumption to provide confidence intervals for the true fuel efficiency improvements of the intervention. Our analysis shows that, for some aircraft, we can reliably detect the presence of interventions with as little as a 1% fuel rate improvement and only a few hundred sorties, enabling rapid mitigation of even relatively minor issues.
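A minimal sketch of the kind of analysis described, using synthetic sortie data rather than actual in-flight recordings: a linear fuel-rate model with an intervention indicator, whose coefficient and confidence interval quantify the fuel savings. The covariates and effect size are assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400                                    # notional number of sorties
gross_weight = rng.normal(120, 10, n)      # covariates a fuel model might use (invented)
cruise_alt = rng.normal(30, 3, n)
treated = rng.integers(0, 2, n)            # 1 if the intervention was present on the sortie

# Synthetic "truth": roughly a 1% fuel-rate reduction on treated sorties, plus noise
base = 10 + 0.05 * gross_weight - 0.08 * cruise_alt
fuel_rate = base * (1 - 0.01 * treated) + rng.normal(0, 0.15, n)

X = sm.add_constant(np.column_stack([gross_weight, cruise_alt, treated]))
fit = sm.OLS(fuel_rate, X).fit()
lo, hi = fit.conf_int()[3]                 # 95% CI on the intervention coefficient
print(f"Estimated fuel-rate change: {fit.params[3]:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```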
Add to Speakers
59 Charles Wheaton
Student, United States Military Academy
https://dataworks.testscience.org/wp-content/uploads/formidable/32/45144754_hx5ocfexh4-1-USflag-Charis-150×150.jpg
Poster Presentation
Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Improving Image Classifiers in Military Settings through Enhanced Training Data
The prevalence of unmanned aerial systems (UAS) and remote sensing technology on the modern battlefield has laid the foundation for an automated targeting system. However, no computer vision model has been trained to support such a system. Difficulties arise in creating these models due to a lack of available battlefield training data. This work aims to investigate the use of synthetic images generated in Unreal Engine as supplementary training data for a battlefield image classifier. We test state-of-the-art computer vision models to determine their performance on drone images of modern battlefields and the suitability of synthetic images as training data. Our results suggest that synthetic training images can improve the performance of state-of-the-art models in battlefield computer vision tasks. This is an abstract for a student poster.
Add to Speakers
60 James Theimer
Operations Research Analyst, HS COBP
https://dataworks.testscience.org/wp-content/uploads/formidable/32/52765067579_88c3ca8bdc_o-150×150.jpg
Presentation
No Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Using Bayesian Network of Subsystem Statistical Models to Assess System Behavior
Situations exist in which a system-level test is rarely accomplished or simply not feasible. When subsystem testing is available, including data sufficient to create subsystem statistical models, an approach is required to combine these models. A Bayesian Network (BN) is one approach to this problem. A BN models system behavior using subsystem statistical models. The system is decomposed into a network of subsystems, and the interactions between the subsystems are described. Each subsystem is in turn described by a statistical model, which determines the subjective probability distribution of the outputs given a set of inputs. Previous methods have been developed for validating the performance of the subsystem models and, subsequently, for determining what can be known about system performance. This work defined a notional system, created the subsystem statistical models, generated synthetic data, and developed the Bayesian Network. The subsystem models are then validated, followed by a discussion of how system-level information is derived from the Bayesian Network.
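As a toy illustration of the approach (not the notional system used in this work), the snippet below forward-samples a two-subsystem Bayesian network in which each subsystem is represented by a statistical model, and marginal and conditional system-level probabilities are read off the samples. All models and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000                                   # Monte Carlo samples pushed through the network

# Subsystem 1 (e.g., detection): statistical model of P(detect | range); parameters invented
range_km = rng.uniform(5, 25, n)
p_detect = 1.0 / (1.0 + np.exp(-(4.0 - 0.2 * range_km)))
detect = rng.random(n) < p_detect

# Subsystem 2 (e.g., engagement): model conditioned on subsystem 1's output
engage = detect & (rng.random(n) < 0.85)

# System-level node: success requires both subsystem outcomes
print(f"P(system success): {engage.mean():.3f}")

# The network also supports conditional queries, e.g., success at long range
long_range = range_km > 20
print(f"P(system success | range > 20 km): {engage[long_range].mean():.3f}")
```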
Corinne Stafford Add to Speakers
61 Karen O’Brien
Senior Principal Data Scientist, Modern Technology Solutions, Inc
https://dataworks.testscience.org/wp-content/uploads/formidable/32/OBrien_Headshot_Cropped-1-150×150.png
Presentation
Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Hypersonic Glide Vehicle Trajectories: A conversation about synthetic data in T&E
The topic of synthetic data in test and evaluation is steeped in controversy – and rightfully so. Generative AI techniques can be erratic, producing non-credible results that should give evaluators pause. At the same time, there are mission domains that are difficult to test, and these rely on modeling and simulation to generate insights for evaluation. High fidelity modeling and simulation can be slow, computationally intensive, and burdened by large volumes of data – challenges which become prohibitive as test complexity grows. To mitigate these challenges, we posit a defensible, physically valid generative AI approach to creating fast-running synthetic data for M&S studies of hard-to-test scenarios. We create an exemplar generative AI model of high-fidelity Hypersonic Glide Vehicle trajectories, characterized as a “Narrow Digital Twin.” The model produces a set of trajectories that meets user-specified criteria (particularly as directed by a Design of Experiments) and that can be validated against the equations of motion that govern these trajectories. This presentation will identify the characteristics of the model that make it suitable for generating synthetic data and propose easy-to-measure acceptability criteria. We hope to advance a conversation about appropriate and rigorous uses of synthetic data within T&E.
Add to Speakers
62 Robert Edman
Machine Learning Research Scientist, Software Engineering Institute
https://dataworks.testscience.org/wp-content/uploads/formidable/32/profile_photo_headshot-150×150.png
Presentation
Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Sensor Fusion for Automated Gathering of Labeled Data in Edge Settings
Data labeling has been identified as the most significant bottleneck and expense in the development of ML-enabled systems. High-quality labeled data also plays a critical role in the testing and deployment of AI/ML-enabled systems by providing a realistic measurement of model performance in a realistic environment. Moreover, the lack of agreement on test and production data is a commonly cited failure mode for ML systems. This work focuses on methods for automatic label acquisition using sensor fusion methods, specifically in edge settings where multiple sensors, including multi-modal sensors, provide multiple views of an object. When multiple sensors provide probable detection of an object, the detection capabilities of the overall system (as opposed to those of each component of the system) can be improved to highly probable or nearly certain. This is accomplished via belief propagation across the sensor network, which fuses the observations of an object from multiple sensors. These nearly certain detections can, in turn, be used as labels in a manner akin to semi-supervised learning. Once the detection likelihood exceeds a specified threshold, the data and the associated label can be used in retraining to produce higher-performing models in near real time and improve overall detection capabilities. Automated edge retraining scenarios pose a particular challenge for test and evaluation because they also require high-confidence tests that generalize to potentially unseen environments. The rapid and automated collection of labels enables edge retraining, federated training, dataset construction, and improved model performance. Additionally, improved model performance is an enabling capability for downstream system tasks, including more rapid model deployment, faster time to detect, fewer false positives, simplified data pipelines, and decreased network bandwidth requirements. To demonstrate these benefits, we have developed a scalable reference architecture and dataset that allows repeatable experimentation for edge retraining scenarios. This architecture allows exploration of the complex design space for sensor fusion systems, with variation points including methods for belief propagation, automated labeling methods, automatic retraining triggers, and drift detection mechanisms. Our reference architecture exercises all of these variation points using multi-modal data (overhead imaging, ground-based imaging, and acoustic data).
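A simplified stand-in for the belief-propagation fusion described above: assuming conditionally independent sensors, per-sensor detection probabilities are combined on the log-odds scale and a near-certainty threshold gates the automatic label. The sensor probabilities and threshold are hypothetical.

```python
import numpy as np

def fuse_log_odds(sensor_probs, prior=0.5):
    """Combine per-sensor detection probabilities assuming conditional independence;
    a simplified stand-in for belief propagation across the sensor network."""
    logit = lambda p: np.log(p / (1.0 - p))
    fused = logit(prior) + sum(logit(p) - logit(prior) for p in sensor_probs)
    return 1.0 / (1.0 + np.exp(-fused))

# Example: overhead imager, ground camera, and acoustic sensor each report a moderate detection
p_fused = fuse_log_odds([0.80, 0.75, 0.70])
print(f"Fused detection probability: {p_fused:.3f}")

# Auto-labeling rule: only near-certain fused detections become pseudo-labels for retraining
THRESHOLD = 0.95
if p_fused > THRESHOLD:
    print("Confident enough to emit a label for edge retraining.")
```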
Add to Speakers
63 Nathan R. Wray
Senior Operator, DOT&E ACO
https://dataworks.testscience.org/wp-content/uploads/formidable/32/184A4860-150×150.jpg
Presentation
Publish
2 Improving the Quality of Test & Evaluation Cobalt Strike: A Cyber Tooling T&E Challenge
Cyber Test and Evaluation serves a critical role in the procurement process of Red Team tools; however, once a tool is vetted and approved for use at the Red Team level, it is generally incorporated into their steady-state operations without additional concern regarding testing or maintenance of the tool. As a result, approved tools may not undergo routine in-depth T&E as new versions are released. This presents a major concern for the Red Team community, as new versions can change the operational security of those tools. Similarly, cyber defenders – either through lack of training or limited resources – have been known to upload Red Team tools to commercial malware analysis platforms, which inadvertently releases potentially sensitive information about Red Team operations. The DOT&E Advanced Cyber Operations team, as part of the Cyber Assessment Program, performed in-depth analysis of Cobalt Strike, versions 4.8 and newer, an adversary simulation software widely used across the Department of Defense and the United States Government. Advanced Cyber Operations identified several operational security concerns that could disclose sensitive information to an adversary with access to payloads generated by Cobalt Strike. This highlights the need to improve the test and evaluation of cyber tooling, at a minimum for major releases of tools utilized by Red Teams. Advanced Cyber Operations recommends in-depth, continuous test and evaluation of offensive operations tools and continued evaluation to mitigate potential operational security concerns.
Add to Speakers
64 Kate Maffey
Data Scientist, U.S. Army Artificial Intelligence Integration Center
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Maffey_Headshot_Mar_2023-150×150.jpeg
Presentation
No Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities
Many organizations seek to ensure that . . . machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we built MLTE (Machine Learning Test and Evaluation, colloquially referred to as “melt”), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling, a Python package, supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results. In this presentation, we will discuss current MLTE details as well as future plans to support developmental testing (DT) and operational testing (OT) organizations and teams. A problem in the Department of Defense (DoD) is that test and evaluation (T&E) organizations are segregated: OT organizations work independently from DT organizations, which leads to inefficiencies. Model developers doing contractor testing (CT) may not have access to mission and system requirements and therefore fail to adequately address the real-world operational environment. Motivation to solve these two problems has generated a push for Integrated T&E — or T&E as a Continuum — in which testing is iteratively updated and refined based on previous test outcomes, and is informed by mission and system requirements. MLTE helps teams to better negotiate, evaluate, and document ML model and system qualities, and will aid in the facilitation of this iterative testing approach. As MLTE matures, it can be extended to further support Integrated T&E by (1) providing test data and artifacts that OT can use as evidence to make risk-based assessments regarding the appropriate level of OT and (2) ensuring that CT and DT testing of ML models accurately reflects the challenges and constraints of real-world operational environments.
Dr. Jeremy Werner, DOT&E Add to Speakers
65 Giuseppe Cataldo
Planetary Protection Lead, NASA
https://dataworks.testscience.org/wp-content/uploads/formidable/32/IMG_5975-150×150.jpg
Presentation
No Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Global Sensitivity Analyses for Test Planning under Constraints with Black-box Models
This work describes sensitivity analyses performed on complex black-box models used to support experimental test planning under limited resources in the context of the Mars Sample Return program, which aims to bring rock and atmospheric samples from Mars to Earth. We develop a systematic workflow that allows analysts to simultaneously obtain quantitative insights on the key drivers of uncertainty, the direction of their impact, and the presence of interactions. We apply novel optimal transport-based global sensitivity measures to tackle the multivariate nature of the output. On the modeling side, we apply multi-fidelity techniques that leverage low-fidelity models to speed up the calculations and make up for the limited number of high-fidelity samples, while keeping the latter in the loop for accuracy guarantees. The sensitivity analysis reveals insights that help analysts understand the model’s behavior and identify the factors to focus on during testing, maximizing the value of the information extracted from limited test resources and supporting mission success.
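As a simpler stand-in for the optimal transport-based measures used in this work, the sketch below estimates standard variance-based first-order sensitivity indices for a toy black-box model by binning each input and comparing the variance of the conditional means to the total variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n, bins = 200_000, 40

# Toy stand-in for an expensive black-box model with three uncertain inputs
x = rng.uniform(0, 1, (n, 3))
y = np.sin(2 * np.pi * x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.1 * x[:, 2] * x[:, 0]

def first_order_index(xi, y, bins):
    """Estimate S_i = Var(E[Y | X_i]) / Var(Y) by binning X_i and averaging Y within bins."""
    edges = np.quantile(xi, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.digitize(xi, edges[1:-1]), 0, bins - 1)
    cond_means = np.array([y[idx == b].mean() for b in range(bins)])
    return cond_means.var() / y.var()

for i in range(3):
    print(f"first-order index for input {i + 1}: {first_order_index(x[:, i], y, bins):.2f}")
```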
Add to Speakers
66 John W. Dennis
Research Staff Member (Economist), Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/jay-dennis-picture-150×150.jpg
Presentation
Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Data VV&A for AI Enabled Capabilities
Data – collection, preparation, and curation – is a crucial need in the AI lifecycle. Ensuring that the data are consistent, correct, and representative for the intended use is critical to ensuring the efficacy of an AI-enabled system. Data verification, validation, and accreditation (VV&A) is meant to address this need. The dramatic increase in the prevalence of AI-enabled capabilities and analytic tools across the DoD has emphasized the need for a unified understanding of data VV&A, as quality data forms the foundation of AI models. In practice, data VV&A and associated activities are often applied in an ad hoc manner that may limit their ability to support the development and testing of AI-enabled capabilities. However, existing DoD frameworks for data VV&A are applicable to the AI lifecycle and embody important supporting activities for T&E of AI-enabled systems. We highlight the importance of data VV&A, relying on established definitions, and outline some concerns and best practices.
Kelly Avery/DATAWorks 2024 Co-Chairs Add to Speakers
67 Alex Margolis
Subject Matter Expert, Edaptive Computing, Inc
https://dataworks.testscience.org/wp-content/uploads/formidable/32/IMG_20240126_165826661-150×150.jpg
Presentation
No Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Bringing No-Code Machine Learning to the average user
In the rapidly evolving landscape of . . . technology, Artificial Intelligence (AI) and Machine Learning (ML) have emerged as powerful tools with transformative potential. However, the adoption of these advanced technologies has often been limited to individuals with coding expertise, leaving a significant portion of the population, particularly those without programming skills, on the sidelines. This shift towards user-friendly AI/ML interfaces not only enhances inclusivity but also opens new avenues for innovation. A broader spectrum of individuals can combine the benefits of these cutting-edge technologies with their own domain knowledge to solve complex problems rapidly and effectively. Bringing no-code AI/ML to subject matter experts is necessary to ensure that the massive amount of data being produced by the DoD is properly analyzed and valuable insights are captured. This presentation delves into the importance of making AI and ML accessible to individuals with no coding experience. By doing so, it opens a world of possibilities for diverse participants to engage with and reap the benefits of the AI revolution. While the prospect of making AI and ML accessible to individuals without coding experience is promising, it comes with its own set of challenges, particularly in addressing the barriers for individuals lacking a background in data analysis. One significant hurdle lies in the complexity of AI and ML algorithms, which often require a nuanced understanding of statistical concepts, data preprocessing, and model evaluation. Individuals without a foundation in analysis may find it challenging to interpret results accurately, hindering their ability to derive meaningful insights from AI-driven applications. Another challenge is the availability of data, especially in the defense domain. Many models require large amounts of data to be effective. Ensuring the quality and consistency of the chosen dataset is a challenge, as individuals may encounter missing values, outliers, or inaccuracies that can adversely impact the performance of their ML models. Data preprocessing steps such as categorical variable encoding, interpolation, and normalization can be performed automatically, but it is important to understand when to use these techniques and why. Applying transformations such as logarithmic or polynomial transformations can enhance model performance. However, individuals with limited experience may struggle to determine when and how to apply these techniques effectively. The lack of familiarity with key concepts such as feature engineering, model selection, and hyperparameter tuning can impede users from effectively utilizing AI tools. The black-box nature of some advanced models further complicates matters, as users may struggle to comprehend the inner workings of these algorithms, raising concerns about transparency and trust in AI-generated outcomes. Ethical considerations and biases inherent in AI models also pose substantial challenges. Users without an analysis background may inadvertently perpetuate biases or misinterpret results, underscoring the need for education and awareness to navigate these ethical complexities. In this talk, we delve into the multifaceted challenges of bringing AI and ML to individuals without a background in analysis, emphasizing the importance of developing solutions that empower individuals to harness the potential of these technologies while mitigating potential pitfalls.
Kelly Avery Add to Speakers
68 Austin Amaya
Lead, Algorithm Testing & Evaluation, MORSE
https://dataworks.testscience.org/wp-content/uploads/formidable/32/20230314_134244-150×150.jpg
Presentation
No Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies The Role of Bayesian Multilevel Models in Performance Measurement and Prediction
T&E relies on a series of observations under varying conditions to assess overall performance. Traditional evaluation methods can oversimplify complex structures in the data, where variance within groups of observations made under identical experimental conditions differs significantly from the variance between such groups, introducing biases and potentially misrepresenting true performance capabilities. To address these challenges, MORSE is implementing Bayesian multilevel models. These models capture the nuanced group-wise structure inherent in T&E data, simultaneously estimating intragroup and intergroup parameters while efficiently pooling information across different model levels. This methodology is particularly adept at regressing against experimental parameters, a feature that conventional models often overlook. A distinct advantage of Bayesian approaches lies in their ability to generate comprehensive uncertainty distributions for all model parameters, providing a more robust and holistic understanding of how performance varies. Our application of these Bayesian multilevel models has been instrumental in generating credible intervals for performance metrics for applications with varying levels of risk tolerance. Looking forward, our focus will shift from measuring performance toward modeling performance.
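A minimal sketch of a Bayesian multilevel model of grouped test data, assuming PyMC is available; the grouping structure, priors, and data are invented. Group means are partially pooled toward a population mean, and credible intervals are read from the posterior draws.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(3)
n_groups, n_per = 8, 5                             # e.g., 8 test conditions, 5 trials each
group = np.repeat(np.arange(n_groups), n_per)
y = rng.normal(rng.normal(0.7, 0.1, n_groups)[group], 0.05)   # notional performance measurements

with pm.Model():
    mu = pm.Normal("mu", 0.5, 0.5)                 # population-level mean performance
    tau = pm.HalfNormal("tau", 0.2)                # intergroup (between-condition) spread
    theta = pm.Normal("theta", mu, tau, shape=n_groups)   # partially pooled group means
    sigma = pm.HalfNormal("sigma", 0.1)            # intragroup (within-condition) noise
    pm.Normal("obs", theta[group], sigma, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

# Credible intervals for each group's performance, informed by the whole hierarchy
draws = idata.posterior["theta"].values.reshape(-1, n_groups)
for g in range(n_groups):
    lo, hi = np.quantile(draws[:, g], [0.05, 0.95])
    print(f"condition {g}: 90% credible interval ({lo:.3f}, {hi:.3f})")
```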
N/A Contributed Add to Speakers
69 Logan Ausman
Research Staff, Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Me-1-150×150.jpeg
Presentation
No Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies A Framework for OT&E of Rapidly Changing Software Systems: C3I and Business Systems
Operational test and evaluation . . . (OT&E) of a system provides the opportunity to examine how representative individuals and units use the system to accomplish their missions, and complements functionality-focused automated testing conducted throughout development. Operational evaluations of software acquisitions need to consider more than just the software itself; they must account for the complex interactions between the software, the end users, and supporting personnel (such as maintainers, help desk staff, and cyber defenders) to support the decision-maker who uses information processed through the software system. We present a framework for meeting OT&E objectives while enabling the delivery schedule for software acquisitions by identifying potential areas for OT&E efficiencies. The framework includes continuous involvement beginning in the early stages of the acquisition program to prepare a test strategy and infrastructure for the envisioned pace of activity during the develop and deploy cycles of the acquisition program. Key early OT&E activities are to acquire, develop, and accredit test infrastructure and tools for OT&E, and embed the OT&E workforce in software acquisition program activities. Early OT&E community involvement in requirements development and program planning supports procedural efficiencies. It further allows the OT&E community to determine whether the requirements address the collective use of the system and include all potential user roles. OT&E during capability development and deployment concentrates on operational testing efficiencies via appropriately scoped, dedicated tests while integrating information from all sources to provide usable data that meets stakeholder needs and informs decisions. The testing aligns with deliveries starting with the initial capability release and continuing with risk-informed approaches for subsequent software deployments.
Rebecca Medlin Add to Speakers
70 Tyler Morgan-Wall
Research Staff Member, IDA
https://dataworks.testscience.org/wp-content/uploads/formidable/32/tyler-headshot-150×150.jpg
Presentation
Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Simulation Insights on Power Analysis with Binary Responses: From SNR Methods to ‘skprJMP’
Logistic regression is a commonly used method for analyzing tests with probabilistic responses in the test community, yet calculating power for these tests has historically been challenging. This difficulty prompted the development of methods based on signal-to-noise ratio (SNR) approximations over the last decade, tailored to address the intricacies of logistic regression’s binary outcomes and complex probability distributions. Originally conceived as a solution to the limitations of then-available statistical software, these approximations provided a necessary, albeit imperfect, means of power analysis. However, advancements in statistical software and computational power have reduced the need for such approximate methods. Our research presents a detailed simulation study that compares SNR-based power estimates with those derived from exact Monte Carlo simulations, highlighting the inadequacies of SNR approximations. To address these shortcomings, we will discuss improvements in the open-source R package “skpr” as well as present “skprJMP,” a new plug-in that offers more accurate and reliable power calculations for logistic regression analyses for organizations that prefer to work in JMP. Our presentation will outline the challenges initially encountered in calculating power for logistic regression, discuss the findings from our simulation study, and demonstrate the capabilities and benefits that “skpr” and “skprJMP” provide to an analyst.
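The exact-simulation idea can be illustrated outside of skpr or JMP. The sketch below (Python and statsmodels rather than the R/JMP tools discussed) estimates power for a two-level factor by repeatedly simulating binary responses and refitting the logistic regression; the design size and effect sizes are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

def monte_carlo_power(n_runs=60, p_low=0.6, p_high=0.8, alpha=0.05, n_sim=500):
    """Power for detecting a two-level factor effect in a logistic regression test design,
    estimated by simulating binary responses and refitting the model."""
    x = np.tile([0.0, 1.0], n_runs // 2)                 # balanced two-level factor
    b0 = np.log(p_low / (1 - p_low))                     # intercept on the logit scale
    b1 = np.log(p_high / (1 - p_high)) - b0              # effect of the high level
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    rejections = 0
    for _ in range(n_sim):
        y = (rng.random(n_runs) < p).astype(float)       # simulated pass/fail responses
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        rejections += fit.pvalues[1] < alpha             # Wald test on the factor effect
    return rejections / n_sim

print(f"Estimated power: {monte_carlo_power():.2f}")
```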
Rebecca Medlin Add to Speakers
71 Ryan Krolikowski
CDT, United States Military Academy
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Headshot-7-150×150.jpg
Poster Presentation
Publish
3 Improving the Quality of Test & Evaluation Army HRC Battalion Command Data Analysis
The Army’s Human Resources Command (HRC) . . . annually takes on the critical task of the Centralized Selection List (CSL) process, where approximately 400 officers are assigned to key battalion command roles. This slating process is a cornerstone of the Army’s broader talent management strategy, involving collaborative input from branch proponent officers and culminating in the approval of the Army Chief of Staff. The study addresses crucial shortcomings in the existing process for officer assignments, focusing on the biases and inconsistent weighting that affect slate selection outcomes. It examines the effects of incorporating specific criteria like Skill Experience Match (SEM), Knowledge, Skills, Behaviors (KSB), Order of Merit List (OML), and Officer Preferences (PREF) into the selection process of a pilot program. Our research specifically addresses the terms efficiency, strength, and weakness within the context of the pairing process. Our objective is to illuminate the potential advantages of a more comprehensive approach to decision-making in officer-job assignments, ultimately enhancing the effectiveness of placing the most suitable officer in the most fitting role.
Add to Speakers
72 Thomas A Ulrich
Human Factors and Reliability Research Scientist, Idaho National Laboratory
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Ulrich_P-11535-2_square_150ppi-150×150.png
Presentation
Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Rancor-HUNTER: A Virtual Plant and Operator Environment for Predicting Human Performance
Advances in simulation capabilities to model physical systems have outpaced the development of simulations for the humans using those physical systems. There is an argument that the infinite span of potential human behaviors inherently renders human modeling more challenging than modeling physical systems. Despite this challenge, the need to model humans interacting with these complex systems is paramount. As technologies have improved, many of the failure modes originating from the physical systems have been solved. This means the overall proportion of human errors has increased, such that it is not uncommon for human error to be the primary driver of system failure in modern complex systems. Moreover, technologies such as automated systems may introduce emerging contexts that can cause new, unanticipated modes of human error. Therefore, it is now more important than ever to develop models of human behavior to realize overall system error reductions and achieve established safety margins. To support new and novel concepts of operations for the anticipated wave of advanced nuclear reactor deployments, human factors and human reliability analysis researchers need to develop advanced simulation-based approaches. This talk presents a simulation environment suitable both for collecting data and for performing Monte Carlo simulations to evaluate human performance and develop better models of human behavior. Specifically, the Rancor Microworld Simulator models a complex energy production system in a simplified manner. Rancor includes computer-based procedures, which serve as a framework to automatically classify human behaviors without manual, subjective experimenter coding during scenarios. This method supports a detailed level of analysis at the task level and makes it feasible to collect the large sample sizes required to develop quantitative modeling elements, which have historically challenged traditional full-scope simulator study approaches. Additionally, the other portion of this experimental platform, the Human Unimodel for Nuclear Technology to Enhance Reliability (HUNTER), is presented to show how the collected data can be used to evaluate novel scenarios based on the contextual factors, or performance shaping factors, derived from Rancor simulations. Rancor-HUNTER is being used to predict operator performance with new procedures, such as those resulting from control room modernization or new-build situations. Rancor-HUNTER is also proving to be a useful surrogate platform for modeling human performance in other complex systems.
Andrea Brown Add to Speakers
73 Jason Ingersoll
Cadet, West Point
https://dataworks.testscience.org/wp-content/uploads/formidable/32/headshot_Ingersoll.pdf
Poster Presentation
Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies Tactical Route Optimization
Military planners frequently face the challenging task of devising a route plan based solely on a map and a grid coordinate of their objective. This traditional approach is not only time-consuming but also mentally taxing. Moreover, it often compels planners to make broad assumptions, resulting in a route that is based more on educated guesses than on data-driven analysis. To address these limitations, this research explores the potential of using a path-finding algorithm, such as A*, to assist planners. Specifically, our algorithm aims to identify the route that minimizes the likelihood of enemy detection, thereby providing a more optimized and data-driven path for mission success. We have developed a model that takes satellite imagery data and produces a feasible route that minimizes detection given the location of an enemy. Future work includes improving the graphical interface and developing k-distinct paths to provide planners with multiple options.
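A compact illustration of the idea, with invented inputs: A* over a grid whose cells carry a notional detection probability (in practice derived from imagery), where each step costs one unit plus weighted risk, so Manhattan distance remains an admissible heuristic.

```python
import heapq
from itertools import count
import numpy as np

rng = np.random.default_rng(5)
risk = rng.random((20, 20)) * 0.9           # notional P(detection) per cell, e.g. from imagery
start, goal = (0, 0), (19, 19)

def a_star(risk, start, goal, risk_weight=10.0):
    """A* where each step costs 1 + weighted detection risk of the entered cell.
    Manhattan distance is admissible because every step costs at least 1."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = count()
    frontier = [(h(start), 0.0, next(tie), start, None)]
    came_from, best_g = {}, {start: 0.0}
    while frontier:
        _, g, _, cur, parent = heapq.heappop(frontier)
        if cur in came_from:
            continue
        came_from[cur] = parent
        if cur == goal:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < risk.shape[0] and 0 <= nxt[1] < risk.shape[1]:
                ng = g + 1 + risk_weight * risk[nxt]
                if ng < best_g.get(nxt, np.inf):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), ng, next(tie), nxt, cur))
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = came_from[node]
    return path[::-1]

route = a_star(risk, start, goal)
print(f"Route of {len(route)} cells, accumulated risk {sum(risk[p] for p in route):.2f}")
```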
Add to Speakers
74 Max Felter
CDT, USCC West Point
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Maxwell_Felter.heic
Poster Presentation
Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies CGI Applied to Computer Vision
As the battlefield undergoes constant . . . evolution, and we anticipate future conflicts, there is a growing need for apt computer vision models tailored toward military applications. The heightened use of drones and other technology on the modern battlefield has led to a demand for effective models specifically trained on military equipment. However, there has not been a proper effort to assemble or utilize data from recent wars for training future-oriented models. Creating new quality data poses costs and challenges that make it unrealistic for the sole purpose of training these models. This project explores a way around these barriers with the use of synthetic data generation using the Unreal Engine, a prominent computer graphics gaming engine. The ability to create computer-generated videos representative of the battlefield can impact model training and performance. I will be limiting the scope to focus on armored vehicles and the point of view of a consumer drone. Simulating a drone’s point of view in the Unreal Engine, I will create a collection of videos with ample variation. Using this data, I will experiment with various training methods to provide commentary on the best use of synthetic imagery for this task. If shown to be promising, this method can provide a feasible solution to prepare our models and military for what comes next.
Add to Speakers
75 Mason Zoellner
Undergraduate Research Assistant, Hume Center for National Security and Technology
https://dataworks.testscience.org/wp-content/uploads/formidable/32/thumbnail_D2B12B734BE54CC1A2839C96A97FD3BE-150×150.jpg
Poster Presentation
No Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies Automatic Target Recognition Project
The goal of this project was to develop an algorithm that automatically detects boats in a maritime environment and predicts their future positions. We integrated this algorithm into an intelligent combat system developed by the Naval Surface Warfare Center Dahlgren Division (NSWCDD). The algorithm used YOLOv8 for computer vision detection and a linear Kalman filter for prediction. The data underwent extensive augmentation and integration of third-party sources. The algorithm was tested at a Live, Virtual, and Constructive (LVC) event held at NSWCDD this past fall (October 2023). The initial models faced challenges with overfitting. However, through processes such as data augmentation, incorporation of third-party data, and layer-freezing techniques, we were able to develop a more robust model. Various datasets were processed with tools to improve data robustness. By further labeling the data, we were able to obtain ground truth data to evaluate the Kalman filter. The Kalman filter was chosen for its versatility and predictive tracking capabilities. Qualitative and quantitative analyses were performed for both the YOLO and Kalman filter models. Much of the project’s contribution lay in its ability to adapt to a variety of data. YOLO displayed effectiveness across various maritime scenarios, and the Kalman filter excelled in predicting boat movements across difficult situations such as abrupt camera movements. In preparation for the live fire test event, our algorithm was integrated into the NSWCDD system and code was written to produce the expected output files. In summary, this project successfully developed an algorithm for detecting boats and predicting their motion in a maritime environment, and demonstrated the potential of the intersection of machine learning, rapid integration of technology, and maritime security.
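The prediction half of the pipeline can be sketched generically. The constant-velocity linear Kalman filter below tracks a detector-reported box center; the noise settings and measurements are invented, and the project's actual filter configuration may differ.

```python
import numpy as np

dt = 1.0                                    # time between detections (one frame)
F = np.array([[1, 0, dt, 0],                # constant-velocity state transition
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                 # only position (the detector's box center) is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01                        # process noise (boat maneuvering)
R = np.eye(2) * 4.0                         # measurement noise (detector jitter, in pixels)

x = np.zeros(4)                             # state [px, py, vx, vy]
P = np.eye(4) * 100.0                       # initial uncertainty

def kf_step(x, P, z):
    x = F @ x                               # predict
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                     # update with the measured box center z
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

rng = np.random.default_rng(6)
for t in range(1, 6):                       # a boat drifting ~2 px/frame in x, 0.5 px/frame in y
    z = np.array([2.0 * t, 0.5 * t]) + rng.normal(0, 2, 2)
    x, P = kf_step(x, P, z)
print(f"Estimated position {x[:2].round(1)}, velocity {x[2:].round(1)}")
```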
Add to Speakers
76 Jasmine Ratchford
Research Scientist, CMU Software Engineering Institute
https://dataworks.testscience.org/wp-content/uploads/formidable/32/1I5A7952-1-150×150.jpeg
Presentation
No Publish
1 Improving the Quality of Test & Evaluation Experimental Design for Usability Testing of LLMs
Large language models (LLMs) are poised . . . to dramatically impact the process of composing, analyzing, and editing documents, including within DoD and IC communities. However, there have been few studies that focus on understanding human interactions and perceptions of LLM outputs, and even fewer still when one considers only those relevant to a government context. Furthermore, there is a paucity of benchmark datasets and standardized data collection schemes necessary for assessing the usability of LLMs in complex tasks, such as summarization, across different organizations and mission use cases. Such usability studies require an understanding beyond the literal content of the document; the needs and interests of the reader must be considered, necessitating an intimate understanding of the operational context. Thus, adequately measuring the effectiveness and suitability of LLMs requires usability testing to be incorporated into the testing and evaluation process. However, measures of usability are stymied by three challenges. First, there is an unsatisfied need for mission-relevant data that can be used for assessment, a critical first step. Agencies must provide data for assessment of LLM usage, such as report summarization, to best evaluate the effectiveness of LLMs. Current widely available datasets for assessing LLMs consist primarily of ad hoc exams ranging from the LSAT to sommelier exams. High performance on these exams offers little insight into LLM performance on mission tasks, which possess a unique lexicon, set of high-stakes mission applications, and DoD and IC userbase. Notably, our prior work indicates that currently available curated datasets are unsuitable proxies for government reporting. Our search for proxy data for intelligence reports led us on a path to create our own dataset in order to evaluate LLMs within mission contexts. Second, a range of experimental design techniques exists for collecting human-centric measures of LLM usability, each with their own benefits and disadvantages. Navigating the tradeoffs between these different techniques is challenging, and the lack of standardization across different groups inhibits comparison between groups. A discussion is provided on the potential usage of commonly conducted usability studies, including heuristic evaluations, observational and user experience studies, and tool instrumentation focusing on LLMs used in summarization. We will describe the pros and cons of each study, crafting guidance against their approximate required resources in terms of time (planning, participant recruitment, study, and analysis), compute, and data. We will demonstrate how our data collection prototype for summarization tasks can be used to streamline the above. The final challenge involves associating human-centric measures, such as ratings of fluency, to other more quantitative and mission-level metrics. We will provide an overview of measures for summarization quality, including ratings for accuracy, concision, fluency, and completeness, and discuss current efforts and existing challenges in associating those measures to quantitative and qualitative metrics. We will also discuss the value of such efforts in building a more comprehensive assessment of LLMs, as well as the relevance of these efforts to document summarization.
Rebecca Medlin Add to Speakers
77 Adam Miller
Research Staff Member, IDA
https://dataworks.testscience.org/wp-content/uploads/formidable/32/photo_ampm_copy-1-150×150.png
Presentation
No Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies Operational T&E of AI-supported Data Integration, Fusion, and Analysis Capabilities
AI will play an important role in future military systems. However, large questions remain about how to test AI systems, especially in operational settings. Here, we discuss operational test and evaluation (OT&E) methods for evaluating systems with AI-supported data integration, fusion, and analysis capabilities. Our approach includes existing OT&E methods that remain applicable, and new methods for dealing with AI-specific challenges. We then present a notional test concept for the Distributed Common Ground System-Army Capability Drop 2, which includes multiple AI-supported components. The test concept focuses on how to evaluate the system’s novel AI components from the perspective of its technical performance (How accurate is the AI output?), human systems interaction (How does the AI work with users?), and adversarial resilience (Can adversaries affect AI performance?).
Rebecca Medlin Add to Speakers
78 Nicholas Jones
STAT Expert, STAT COE
https://dataworks.testscience.org/wp-content/uploads/formidable/32/NJ_headshot-150×150.jpg
Presentation
No Publish
1 Sharing Analysis Tools, Methods, and Collaboration Strategies Case Study: State Transition Maps for Mission Model Development and Test Objective Identification

Coming soon

Corinne Stafford Add to Speakers
79 Russell Gilabert
Researcher/Engineer, NASA Langley Research Center
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Headshot-8-150×150.jpg
Presentation
Publish
2 Advancing Test & Evaluation of Emerging and Prevalent Technologies Simulated Multipath Using Software Generated GPS Signals
Depending on the environment, multipath . . . can be one of the largest error sources contributing to degradation in Global Navigation Satellite System (GNSS) (e.g., GPS) performance. Multipath is a phenomenon that occurs as radio signals reflect off of surfaces, such as buildings, producing multiple copies of the original signal. When this occurs with GPS signals, it results in one or more delayed signals arriving at the receiver with or without the on-time/direct GPS signal. The receiver measures the composite of these signals which, depending on the severity of the multipath, can substantially degrade the accuracy of the receiver’s calculated position. Multipath is commonly experienced in cities due to tall buildings and its mitigation is an ongoing area of study. This research demonstrates a novel approach for simulating GPS multipath through the modification of an open-source tool, GPS-SDR-SIM. The resulting additional testing capability could allow for improved development of multipath mitigating technologies. Currently, open-source tools for simulating GPS signals are available and can be used in the testing and evaluation of GPS receiver equipment. These tools can generate GPS signals that, when used by a GPS receiver, result in computation of a position solution that was pre-determined at the time of signal generation. That is, the signals produced are properly formed for the pre-determined location and result in the receiver reporting that position. This allows for a GPS receiver under test to be exposed to various simulated locations and conditions without having to be physically subjected to them. Additionally, while these signals are generated by a software simulation, they can be processed by real or software defined GPS receivers. This work utilizes the GPS-SDR-SIM software tool for GPS signal generation and while this tool does implement some sources of error that are inherent to GPS, it cannot inject multipath. GPS-SDR-SIM was modified in this effort to produce additional copies of signals with pre-determined delays. These additional delayed signals mimic multipath and represent what happens to GPS signals in the real world as they reflect off of surfaces and arrive at a receiver in place of or alongside the direct GPS signal. A successful proof of concept was prototyped and demonstrated using this modified version of GPS-SDR-SIM to produce simulated GPS signals as well as additional simulated multipath signals. The generated data was processed using a software defined GPS receiver and it was found that the introduction of simulated multipath signals successfully produced the expected characteristics of a composite multipath signal. Further maturation of this work could allow for the development of a GPS receiver testing and evaluation framework and aid in the development of multipath mitigating technologies.
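A conceptual baseband illustration of the composite signal described above, not GPS-SDR-SIM's actual signal generation: a direct spread-spectrum signal is summed with a delayed, attenuated copy of itself. The chip sequence, sample rate, delay, and attenuation are all stand-ins.

```python
import numpy as np

fs = 4_000_000                        # sample rate [Hz] (illustrative)
t = np.arange(0, 0.001, 1 / fs)       # 1 ms of baseband samples

# Stand-in for a spread-spectrum chip sequence (not a real GPS C/A code)
rng = np.random.default_rng(7)
chips = rng.choice([-1.0, 1.0], size=1023)
direct = chips[(t * 1_023_000).astype(int) % 1023]     # 1.023 Mchip/s chipping

# Multipath: a reflected copy arriving 0.5 microseconds late at 60% amplitude
delay_s, alpha = 0.5e-6, 0.6
delay_n = int(round(delay_s * fs))
reflected = alpha * np.concatenate([np.zeros(delay_n), direct[:-delay_n]])

composite = direct + reflected         # what the receiver's correlator actually sees
print(f"Reflection delayed {delay_n} samples; composite amplitude range "
      f"{composite.min():.2f} to {composite.max():.2f}")
```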
NA, contributed abstract Add to Speakers
80 Leo Blanken
Assoc Professor, Naval Postgraduate School
https://dataworks.testscience.org/wp-content/uploads/formidable/32/headshot1-150×150.png
Presentation
No Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies The Trade-Offs in Choosing Different Apertures for the Modeling of Defense Planning
We propose a framework to enable an updated approach for modeling national security decisions. The basis of our approach is to treat national security as the multi-stage production of a service provided by the state to foster a nation’s welfare. The challenge in analyzing this activity stems from the fact that it is a complex process conducted by a vast number of actors spanning four discrete stages of production: budgeting, planning, coercion, and warfighting. We argue that decisions made at any given stage of the process that fail to consider incentive problems that may occur at all stages of the process constitute a myopic approach to national security. In this presentation we will present our larger Feasible Production Framework approach (a formal framework based on Principal-Agent analysis) before focusing on the planning stage of production for this audience. The presentation will highlight the trade-offs in modeling within a narrow “single-stage aperture” versus a holistic “multi-stage aperture”.
McCoy Klink Add to Speakers
81 Christian Smart
, Jet Propulsion Laboratory
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Christian-Smart-Headshot-150×150.jpg
Presentation
Publish
2 Solving Program Evaluation Challenges Best of Both Worlds: Combining Parametric Cost Risk Analysis with Earned Value Management
Murray Cantor, Ph.D., Cantor Consulting; Christian Smart, Ph.D., Jet Propulsion Laboratory, California Institute of Technology. Cost risk analysis and earned value data are typically used separately and independently to produce Estimates at Completion (EAC). However, there is significant value in combining the two in order to improve the accuracy of EAC forecasting. In this paper, we provide a rigorous method for doing so using Bayesian methods. In earned value management (EVM), the Estimate at Completion (EAC) is perhaps the critical metric. It is used to forecast the effort’s total work cost as it progresses. In particular, it is used to see if the work is running over or under its planned budget, specified as the budget at completion (BAC). Separate probability distribution functions (PDFs) of the EAC at the onset of the effort and after some activities have been completed show the probability that the EAC will fall within the BAC and, conversely, the probability it won’t. At the onset of an effort, the budget is fixed, and the EAC is uncertain. As the work progresses, some of the actual costs (AC) are reported. The EAC uncertainty should then decrease and the likelihood of meeting the budget should increase. If the area under the curve to the left of the BAC decreases as the work progresses, the budget is in jeopardy, and some management action is warranted. This paper explains how to specify the initial PDF and learn the later PDFs from the data tracked in EVM. We describe a technique called Bayesian parameter learning (BPL). We chose this technique because it is the most robust for exploiting small sets of progress data and is most easily used by practitioners. This point will be elaborated further in the paper.
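A minimal, conjugate illustration of Bayesian parameter learning applied to EVM, not the paper's formulation: a normal prior on mean log-CPI is updated with observed period CPIs (observation noise assumed known), and the posterior is propagated to an EAC distribution by Monte Carlo. All budget figures and priors are notional.

```python
import numpy as np

rng = np.random.default_rng(8)

BAC, EV, AC = 10_000.0, 4_000.0, 4_600.0        # notional budget, earned value, actuals [$K]
cpi_obs = np.array([0.92, 0.88, 0.85, 0.90])    # observed period cost performance indices

# Conjugate normal-normal update on mean log-CPI (period-to-period noise assumed known)
m0, s0 = 0.0, 0.10          # prior: CPI centered on 1.0 with roughly 10% uncertainty
sigma = 0.05
y = np.log(cpi_obs)
s_post = 1.0 / np.sqrt(1.0 / s0**2 + len(y) / sigma**2)
m_post = s_post**2 * (m0 / s0**2 + y.sum() / sigma**2)

# Propagate the posterior to an EAC distribution: EAC = AC + (BAC - EV) / future CPI
eac = AC + (BAC - EV) / np.exp(rng.normal(m_post, s_post, 20_000))
print(f"Posterior mean EAC: ${eac.mean():,.0f}K")
print(f"P(EAC <= BAC): {np.mean(eac <= BAC):.2f}")
```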
Michael DiNicola Add to Speakers
82 Peter Juarez
Research Engineer, NASA
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Photo1-150×150.jpg
Presentation
No Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Applications for Inspection Simulation at NASA
The state of the art in numerical simulation of nondestructive evaluation (NDE) has begun to transition from research to application. The simulation software, both commercial and custom, has reached a level of maturity where it can be readily deployed to solve real-world problems. The next area of research beginning to emerge is determining when and how NDE simulation should be applied. At NASA Langley Research Center, NDE simulations have already been utilized for several aerospace projects to facilitate or enhance understanding of inspection optimization and interpretation of results. Researchers at NASA have identified several different scenarios where it is appropriate to utilize NDE simulations. In this presentation, we will describe these scenarios, give examples of each instance, and demonstrate how NDE simulations were applied to solve problems, with an emphasis on the mechanics of integrating with other workgroups. These examples will include inspection planning for multi-layer pressure vessels as well as on-orbit inspections.
Elizabeth Gregory Add to Speakers
83 Andrew Simpson
Ph.D. Student, South Dakota State University
https://dataworks.testscience.org/wp-content/uploads/formidable/32/6A7A7278-copy-150×150.jpg
Speed Presentation
No Publish
2 Clustering Singular and Non-Singular Covariance Matrices for Classification
In classification problems where one works in high dimensions with a large number of classes and few observations per class, linear discriminant analysis (LDA) requires the strong assumption of a covariance matrix shared across all classes, while quadratic discriminant analysis (QDA) leads to singular or unstable covariance matrix estimates. Both of these can lead to lower-than-desired classification performance. We introduce a novel, model-based clustering method which can relax the shared covariance assumption of LDA by clustering sample covariance matrices, either singular or non-singular. This leads to covariance matrix estimates which are pooled within each cluster. We show, using simulated and real data, that our method for classification tends to yield better discrimination compared to other methods.
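A simplified stand-in for the model-based clustering proposed here: sample covariance matrices (singular when observations per class are few) are vectorized, clustered with k-means, and pooled within clusters. The data-generating covariances and cluster count are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
p, n_per_class, n_classes = 5, 4, 12          # few observations per class relative to dimension

# Simulate classes whose true covariances come from two underlying shapes
shapes = [np.diag([3.0, 1, 1, 1, 1]), np.diag([1, 1, 1, 1, 3.0])]
samples = [rng.multivariate_normal(np.zeros(p), shapes[k % 2], size=n_per_class)
           for k in range(n_classes)]

# Sample covariance per class: singular here, since n_per_class - 1 < p
covs = np.array([np.cov(s, rowvar=False) for s in samples])

# Cluster the vectorized covariance matrices, then pool within each cluster
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(covs.reshape(n_classes, -1))
for c in np.unique(labels):
    pooled = covs[labels == c].mean(axis=0)
    print(f"cluster {c}: {np.sum(labels == c)} classes, pooled covariance rank "
          f"{np.linalg.matrix_rank(pooled)} of {p}")
```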
Rebecca M. Medlin Add to Speakers
84 Dr. David “Fuzzy” Wells, CMSP
Principal Cyber Simulationist, The MITRE Corporation
https://dataworks.testscience.org/wp-content/uploads/formidable/32/headshot-1-1-150×150.png
Presentation
No Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies Recommendations for Cyber Test & Evaluation of Space Systems
WAITING ON PUBLIC RELEASE. This presentation marks the conclusion of a study aimed at understanding the current state of cyber test and evaluation (T&E) activities within the space domain. This includes topics such as cyber T&E challenges unique to the space domain (e.g., culture and motivations, space system architectures and threats, cyber T&E resources), cyber T&E policy and guidance, and results from a space cyber T&E survey and a set of interviews. Recommendations include establishing a cyber T&E helpdesk and rapid response team, establishing contracting templates, incentivizing space cyber T&E innovation, growing and maturing the space cyber T&E workforce, and learning from cyber ranges. NO VIDEOTAPING OF THIS PRESENTATION.
Dr. Mark Herrera Add to Speakers
85 Harris Bernstein
Data Scientist, Johns Hopkins University Applied Physics Lab
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Harris-headshot-2-150×150.jpg
Presentation
Publish
1 Improving the Quality of Test & Evaluation Bayesian Reliability Growth Planning for Discrete Systems
Developmental programs for complex . . . systems with limited resources often face the daunting task of predicting the time needed to achieve system reliability goals. Traditional reliability growth plans rely heavily on operational testing. They use confidence estimates to determine the required sample size, and then work backward to calculate the amount of testing required during the developmental test program to meet the operational test goal and satisfy a variety of risk metrics. However, these strategies are resource-intensive and do not take advantage of the information present in the developmental test period. This presentation introduces a new method for projecting the reliability growth of a discrete, one-shot system. This model allows for various corrective actions to be considered, while accounting for both the uncertainty in the corrective action effectiveness and the management strategy used to parameterize those actions. Solutions for the posterior distribution on the system reliability are found numerically, while allowing for a variety of prior distributions on the corrective action effectiveness and the management strategy. Additionally, the model can be extended to account for system degradation across testing environments. A case study demonstrates how this model can use historical data with limited failure observations to inform its parameters, making it even more valuable for real-world applications. This work builds upon previous research in Reliability Growth planning from Drs. Brian Hall and Martin Wayne.
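A stripped-down illustration of projecting one-shot reliability after a corrective action with uncertain effectiveness, in the spirit of, but much simpler than, the model presented: a Beta posterior on initial reliability is combined with a Beta prior on fix effectiveness by Monte Carlo. All counts and priors are notional.

```python
import numpy as np

rng = np.random.default_rng(10)
draws = 100_000

# Developmental testing before the fix: 37 successes in 40 one-shot trials
successes, trials = 37, 40
r_initial = rng.beta(1 + successes, 1 + trials - successes, draws)   # posterior, uniform prior

# Corrective action with uncertain effectiveness d (fraction of observed failure modes removed)
d = rng.beta(6, 3, draws)                      # belief: the fix is probably, not certainly, effective
r_projected = 1 - (1 - r_initial) * (1 - d)    # projected reliability after the corrective action

lo, hi = np.quantile(r_projected, [0.1, 0.9])
print(f"Projected reliability: mean {r_projected.mean():.3f}, 80% interval ({lo:.3f}, {hi:.3f})")
print(f"P(projected reliability >= 0.95): {np.mean(r_projected >= 0.95):.2f}")
```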
Dr. Jo Anna Capp Add to Speakers
86 Nathan Gaw
Assistant Professor of Data Science
https://dataworks.testscience.org/wp-content/uploads/formidable/32/AF_Unofficial_Nathan-Gaw-150×150.jpg
Presentation
No Publish
3 Advancing Test & Evaluation of Emerging and Prevalent Technologies Assessing the Calibration and Performance of Attention-based Spatiotemporal Neural Network
In the last decade, deep learning models . . . have proven capable of learning complex spatiotemporal relations and producing highly accurate short-term forecasts, known as nowcasts. Various models have been proposed to forecast precipitation associated with storm events hours before they happen. More recently, neural networks have been developed to produce accurate lightning nowcasts, using various types of satellite imagery, past lightning data, and other weather parameters as inputs to their model. Furthermore, the inclusion of attention mechanisms into these spatiotemporal weather prediction models has shown increases in the model’s predictive capabilities. However, the calibration of these models and other spatiotemporal neural networks is rarely discussed. In general, model calibration addresses how reliable model predictions are, and models are typically calibrated after the model training process using scaling and regression techniques. Recent research suggests that neural networks are poorly calibrated despite being highly accurate, which brings into question how accurate the models are. This research develops attention-based and non-attention-based deep-learning neural networks that uniquely incorporate reliability measures into the model tuning and training process to investigate the performance and calibration of spatiotemporal deep-learning models. All of the models developed in this research prove capable of producing lightning occurrence nowcasts using common remotely sensed weather modalities, such as radar and satellite imagery. Initial results suggest that the inclusion of attention mechanisms into the model architecture improves the model’s accuracy and predictive capabilities while improving the model’s calibration and reliability.
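For readers unfamiliar with calibration metrics, the sketch below computes a standard expected calibration error on synthetic nowcast probabilities; it is generic and does not use the authors' models or data.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 50_000

# Synthetic lightning-occurrence nowcasts from an overconfident forecaster
true_p = rng.uniform(0, 1, n)
occurred = rng.random(n) < true_p
forecast = np.clip(0.5 + 1.4 * (true_p - 0.5), 0.01, 0.99)   # probabilities pushed toward 0 and 1

def expected_calibration_error(p, y, n_bins=10):
    """Bin forecasts, then average |observed frequency - mean forecast| weighted by bin size."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():
            ece += m.mean() * abs(y[m].mean() - p[m].mean())
    return ece

print(f"ECE, overconfident nowcast: {expected_calibration_error(forecast, occurred):.3f}")
print(f"ECE, perfectly calibrated:  {expected_calibration_error(true_p, occurred):.3f}")
```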
ASA/SDNS Add to Speakers
87 Gavin Collins
R&D Statistician, Sandia National Laboratories
https://dataworks.testscience.org/wp-content/uploads/formidable/32/headshot-9.jpg
Presentation
No Publish
3 Sharing Analysis Tools, Methods, and Collaboration Strategies Bayesian Projection Pursuit Regression
In projection pursuit regression (PPR), a univariate response variable is approximated by the sum of M “ridge functions,” which are flexible functions of one-dimensional projections of a multivariate input variable. Traditionally, optimization routines are used to choose the projection directions and ridge functions via a sequential algorithm, and M is typically chosen via cross-validation. We introduce a novel Bayesian version of PPR, which has the benefit of accurate uncertainty quantification. To infer appropriate projection directions and ridge functions, we apply novel adaptations of methods used for the single ridge function case (M = 1), called the Bayesian Single Index Model, and use a reversible jump Markov chain Monte Carlo algorithm to infer the number of ridge functions M. We evaluate the predictive ability of our model in 20 simulated scenarios and for 23 real datasets, in a bake-off against an array of state-of-the-art regression methods. Finally, we generalize this methodology and demonstrate the ability to accurately model multivariate response variables. Its effective performance indicates that Bayesian projection pursuit regression is a valuable addition to the existing regression toolbox.
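The structure of PPR with a single ridge function (M = 1) can be sketched without the Bayesian machinery described here (the reversible jump sampler is well beyond a short example). The frequentist toy below scans candidate projection directions in two dimensions and fits a polynomial ridge function to each projection.

```python
import numpy as np

rng = np.random.default_rng(12)
X = rng.uniform(-1, 1, (500, 2))
y = np.sin(2 * (0.8 * X[:, 0] + 0.6 * X[:, 1])) + rng.normal(0, 0.05, 500)   # one true ridge direction

def ridge_fit_sse(w, degree=5):
    """Project onto direction w, fit a flexible 1-D ridge function by polynomial least squares,
    and return the sum of squared residuals."""
    z = X @ w
    resid = y - np.polyval(np.polyfit(z, y, degree), z)
    return np.sum(resid**2)

# Projection pursuit with a single ridge function: scan candidate unit directions
thetas = np.linspace(0, np.pi, 180, endpoint=False)
directions = np.column_stack([np.cos(thetas), np.sin(thetas)])
sse = np.array([ridge_fit_sse(w) for w in directions])
w_best = directions[sse.argmin()]
print(f"Recovered direction {w_best.round(2)} (truth ~ [0.80, 0.60]), SSE {sse.min():.2f}")
```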
Rebecca Medlin (SDNS Student Paper Competition) Add to Speakers
88 Christina Houfek
Lead PjM, VT-ARC
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Christina-Houfek-150×150.jpg
Presentation
Publish
1 Improving the Quality of Test & Evaluation Joint Test Concept
The Joint force will likely be contested . . . in all domains during the execution of distributed and potentially non-contiguous, combat operations. This challenge inspires the question, “How do we effectively reimagine efficient T&E within the context of expected contributions to complex Joint kill/effects webs?” The DOT&E sponsored Joint Test Concept applies an end-to-end capability lifecycle campaign of learning approach, anchored in mission engineering, and supported by a distributed live, virtual, constructive environment to assess material and non-material solutions’ performance, interoperability, and impact to service and Joint mission execution. Relying on input from the expanding JTC community of interest and human centered design facilitation, the final concept is intended to ensure data quality, accessibility, utility, and analytic value across existing and emergent Joint mission (kill/effects) webs for all systems under test throughout the entire capability lifecycle. Using modeling and simulation principles, the JTC team is developing an evaluation model to assess the impact of the JTC within the current T&E construct to identify the value proposition across a diverse stakeholder population.
Andrea Brown Add to Speakers
89 Christina Houfek
Lead PjM, VT-ARC
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Christina-Houfek-1-150×150.jpg
Mini-Tutorial
Publish
1 Improving the Quality of Test & Evaluation Leading Change: Applying Human Centered Design Facilitation Techniques
First introduced in 1987, modern design thinking was popularized by the Stanford Design School and the global design and innovation company IDEO. Design thinking is now recognized as a “way of thinking which leads to transformation, evolution and innovation” and has been so widely accepted across industry and within the DoD that universities offer graduate degrees in the discipline. Building on this design-thinking foundation, the human centered design facilitation technique used by the Decision Science Division (DSD) of the Virginia Tech Applied Research Corporation (VT-ARC) integrates related methodologies, including liberating structures and open thinking. Liberating structures are “simple and concrete tools that can enhance group performance in diverse organizational settings.” Open thinking, popularized by Dan Pontefract, provides a comprehensive approach to decision-making that incorporates critical and creative thinking techniques. The combination of these methodologies enables tailored problem framing, innovative solution discovery, and creative adaptability to harness collaborative analytic potential, overcome the limitations of cognitive biases, and lead change. DSD applies this approach to complex and wicked challenges to deliver solutions that address implementation challenges and diverse stakeholder requirements. Operating under the guiding principle that collaboration is key to success, DSD regularly partners with other research organizations, such as the Virginia Tech National Security Institute (VT NSI), in human centered design activities to help further the understanding, use, and benefits of the approach. This experiential session will provide attendees with some basic human centered design facilitation tools and an understanding of how these techniques might be applied across a multitude of technical and non-technical projects.
Andrea Brown Add to Speakers
90 James Ferry
Principal Research Scientist, Metron, Inc.
Presentation
Publish
3 Improving the Quality of Test & Evaluation Dynamo: Adaptive T&E via Bayesian Decision Theory
The Dynamo paradigm for T&E compares a set of test options for a system by computing which of them provides the greatest expected operational benefit relative to the cost of testing. This paradigm will be described and demonstrated for simple, realistic cases. Dynamo stands for DYNAmic Knowledge + MOneyball. These two halves of Dynamo are its modeling framework and its chief evaluation criterion, respectively. A modeling framework for T&E is what allows test results (and domain knowledge) to be leveraged to predict operational system performance. Without a model, one can only predict, qualitatively, that operational performance will be similar to test performance in similar environments. For quantitative predictions, one can formulate a model that inputs a representation of an operational environment and outputs the probabilities of the various possible outcomes of using the system there. Such models are typically parametric: they have a set of unknown parameters to be calibrated during test. The more knowledge one has about a suitable model’s parameters, the better predictions one can make about the modeled system’s operational performance. The Bayesian approach to T&E encodes this knowledge as a probability distribution over the model parameters. This knowledge is initialized with data from previous testing and with subject matter expertise, and it is “dynamic” because it is updated whenever new test results arrive. An evaluation criterion is a metric for the operational predictions provided by the modeling framework. One type of metric concerns whether test results indicate that a system meets requirements: this question can be addressed with increasing nuance as one employs more sophisticated modeling frameworks. Another type of metric is how well a test design will tighten knowledge about model parameters, regardless of what the test results themselves are. The Dynamo paradigm can leverage either, but it uses a “Moneyball” metric for recommending test decisions. A Moneyball metric quantifies the expected value of the knowledge one would gain from testing (whether from an entire test event, or from just a handful of trials) in terms of the operational value this knowledge would provide. It requires a Bayesian modeling framework so that incremental gains in knowledge can be represented and measured. A Moneyball metric quantifies stakeholder preferences in the same units as testing costs, which enables a principled cost/benefit analysis not only of which tests to perform, but of whether to conduct further testing at all. The essence of Dynamo is that it applies Bayesian Decision Theory to T&E to maintain and visualize the state of knowledge about a system under test at all times, and that it can make recommendations at any time about which test options to conduct to provide the greatest expected benefit to stakeholders relative to the cost of testing. This talk will discuss the progress to date in developing Dynamo and some of the future work remaining to make it more easily adaptable to testing specific systems. (A toy value-of-testing sketch follows this entry.)
Kelly Avery Add to Speakers
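To make the cost/benefit idea concrete, here is a hedged toy sketch in the spirit of the paradigm described above (not the authors' implementation): a beta-binomial model of a success probability, a simple field/don't-field decision, and a Monte Carlo preposterior estimate of the net expected value of n additional trials. All priors, utilities, and costs below are invented illustration values.

    # Toy value-of-testing sketch: expected value of sample information minus
    # testing cost, under a Beta-binomial model and a linear notional utility.
    import numpy as np

    a, b = 4.0, 2.0                                  # assumed prior Beta(a, b) on success probability
    value_if_fielded = lambda p: 100.0 * p - 60.0    # notional utility of fielding at success rate p
    value_not_fielded = 0.0
    cost_per_trial = 0.5

    def best_expected_value(a_post, b_post):
        p_mean = a_post / (a_post + b_post)          # utility is linear in p, so E[U] = U(E[p])
        return max(value_if_fielded(p_mean), value_not_fielded)

    def expected_value_of_testing(n_trials, n_sims=20000, seed=0):
        rng = np.random.default_rng(seed)
        p = rng.beta(a, b, size=n_sims)              # draws from the prior
        successes = rng.binomial(n_trials, p)        # prior-predictive test outcomes
        post_values = np.array([best_expected_value(a + s, b + n_trials - s)
                                for s in successes])
        evsi = post_values.mean() - best_expected_value(a, b)
        return evsi - cost_per_trial * n_trials      # net of testing cost

    for n in (0, 5, 10, 20):
        print(n, round(expected_value_of_testing(n), 3))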
91 Peter Parker
, NASA
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Parker-Headshot-291×300-1.jpg-150×150.webp
Presentation
Publish
1 Practical Experimental Design Strategies for Binary Responses under Operational Constraint
Defense and aerospace testing commonly involves binary responses to changing levels of a system configuration or an explanatory variable. Examples of binary responses are hit or miss, detect or not detect, and success or fail, and they are a special case of categorical responses with multiple discrete levels. The test objective is typically to estimate a statistical model that predicts the probability of occurrence of the binary response as a function of the explanatory variable(s). Statistical approaches are readily available for modeling binary responses; however, they often assume that the design features large sample sizes that provide responses distributed across the range of the explanatory variable. In practice, these assumptions are often challenged by small sample sizes and response levels focused over a limited range of the explanatory variable(s). These practical restrictions are due to experimentation cost, operational constraints, and a primary interest in one response level; for example, testing may be more focused on hits than on misses. This presentation provides strategies to address these challenges, with an emphasis on collaboration techniques to develop experimental design approaches under practical constraints. Case studies are presented to illustrate these strategies, from estimating human annoyance in response to low-noise supersonic overflights in NASA’s Quesst mission to evaluating the detection capability of nondestructive evaluation methods for fracture-critical human-spaceflight components. This presentation offers practical guidance on experimental design strategies for binary responses under operational constraints. (A brief logistic-regression sketch follows this entry.)
Rebecca Medlin Add to Speakers
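As a small, generic illustration of the modeling task described above (not drawn from the NASA case studies), the following Python sketch fits a logistic model to a synthetic, small-sample binary response and predicts the probability of occurrence across the explanatory variable.

    # Logistic model for a small binary-response data set (synthetic data).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 10.0, 25)                  # small sample over a limited range
    p_true = 1.0 / (1.0 + np.exp(-(x - 6.0)))       # true probability of a "hit"
    y = rng.binomial(1, p_true)                     # observed hit/miss outcomes

    X = sm.add_constant(x)
    fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    print(fit.summary())

    x_new = np.linspace(0.0, 10.0, 5)
    print(fit.predict(sm.add_constant(x_new)))      # estimated P(hit | x)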
92 Justin Krometis, Adam Ahmed, Jim Ferry
Research Assistant Professor, Virginia Tech
https://dataworks.testscience.org/wp-content/uploads/formidable/32/krometis_portrait_cropped-150×150.jpg
Mini-Tutorial
Publish
2 Improving the Quality of Test & Evaluation Leveraging Bayesian Methods to support Integrated Testing
This mini-tutorial will outline approaches for applying Bayesian methods to the test and evaluation process, from development of tests to interpretation of test results to translating that understanding into decision-making. We will begin by outlining the basic concepts that underlie the Bayesian approach to statistics and the potential benefits of applying that approach to test and evaluation. We will then walk through application to an example (notional) program, setting up data models and priors on the associated parameters, and interpreting the results. From there, techniques for integrating results from multiple stages of tests will be discussed, building understanding of system behavior as evidence accumulates. Finally, we will conclude by describing how Bayesian thinking can be used to translate information from test outcomes into requirements and decision-making. The mini-tutorial will assume some background in statistics, but the audience need not have prior exposure to Bayesian methods. (A minimal sequential-updating sketch follows this entry.)
Rebecca Medlin/Kelly Avery Add to Speakers
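A minimal sketch of the kind of sequential updating the mini-tutorial describes, assuming a conjugate Beta-binomial model and made-up stage results; the tutorial's own example is notional and may use different models.

    # Sequential Bayesian updating of a success probability across test stages.
    from scipy import stats

    a, b = 1.0, 1.0                       # flat Beta(1, 1) prior (assumed)
    stages = [("contractor test", 8, 2),  # (label, successes, failures) - notional
              ("developmental test", 17, 3),
              ("operational test", 9, 1)]

    for label, s, f in stages:
        a, b = a + s, b + f               # conjugate Beta-binomial update
        post = stats.beta(a, b)
        lo, hi = post.interval(0.8)
        print(f"after {label}: mean={post.mean():.3f}, 80% interval=({lo:.3f}, {hi:.3f})")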
93 Boris Chernis
Research Associate, Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Me_Skype-150×150.png
Mini-Tutorial
No Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies Advancing Reproducible Research: Concepts, Compliance, and Practical Applications
Reproducible research principles ensure that analyses can be verified and defended by meeting the criterion that conducting the same analysis on the same data should yield identical results. Not only are reproducible analyses more defensible and less susceptible to errors, but they also enable faster iteration and yield cleaner results. In this seminar, we will delve into how to conceptualize reproducible research and explore how reproducible research practices align with government policies. Additionally, we will provide hands-on examples, using Python and MS Excel, illustrating various approaches for conducting reproducible research. (A small reproducible-script sketch follows this entry.)
Institute for Defense Analyses Add to Speakers
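The following short Python sketch illustrates a few reproducible-analysis habits of the sort the abstract mentions (scripted inputs, a fixed random seed, outputs written by the script); the file name and columns are placeholders, not materials from the seminar.

    # Reproducible-analysis habits: scripted input, fixed seed, scripted output.
    import numpy as np
    import pandas as pd

    SEED = 20240101                      # fixed seed so reruns give identical results
    rng = np.random.default_rng(SEED)

    # In practice this might be pd.read_csv("data/raw/measurements.csv");
    # a synthetic frame keeps the sketch self-contained.
    df = pd.DataFrame({"group": rng.choice(["A", "B"], size=100),
                       "value": rng.normal(size=100)})

    summary = df.groupby("group")["value"].agg(["mean", "std", "count"])
    summary.to_csv("summary_v1.csv")     # same script + same data -> same output
    print(summary)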
94 Gabriela Parasidis
Lead Systems Engineer, MITRE
https://dataworks.testscience.org/wp-content/uploads/formidable/32/DSC_0297-3-150×150.jpg
Presentation
Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Mission Engineering
The US Department of Defense (DoD) has expanded its emphasis on the application of systems engineering approaches to ‘missions’. As originally defined in the Defense Acquisition Guidebook, Mission Engineering (ME) is “the deliberate planning, analyzing, organizing, and integrating of current and emerging operational and system capabilities to achieve desired operational mission effects”. Based on experience to date, the new definition reflects ME as “an interdisciplinary approach and process encompassing the entire technical effort to analyze, design, and integrate current and emerging operational needs and capabilities to achieve desired mission outcomes”. This presentation describes the current mission engineering methodology, explains how it is currently being applied, and explores the role of T&E in the ME process. Mission engineering applies systems engineering to missions; that is, it engineers a system of systems (including organizations, people, and technical systems) to provide the desired impact on mission or capability outcomes. Traditionally, systems of systems engineering focused on designing systems or systems of systems to achieve specified technical performance. Mission engineering goes one step further to assess whether the system of systems, when deployed in a realistic user environment, achieves the user mission or capability objectives. Mission engineering applies digital model-based engineering approaches to describe the sets of activities needed to execute the mission in the form of ‘mission threads’ (or activity models) and then adds information on the players and systems used to implement these activities in the form of ‘mission engineering threads.’ These digital ‘mission models’ are then implemented in operational simulations to assess how well they achieve user capability objectives. Gaps are identified and models are updated to reflect proposed changes, including reorientation of systems and insertion of new candidate solutions, which are then assessed relative to changes in overall mission effectiveness.
Andrea Brown Add to Speakers
95 Roger Ghanem
Professor, University of Southern California
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Ghanem_Photo-150×150.jpg
Mini-Tutorial
Publish
1 Advancing Test & Evaluation of Emerging and Prevalent Technologies Introduction to Uncertainty Quantification
Uncertainty quantification (UQ) sits at the confluence of data, computers, basic science, and operation. It has emerged with the need to inform risk assessment with rapidly evolving science and to bring the full power of sensing and computing to bear on its management. With this role, UQ must provide analytical insight into several disparate disciplines, a task that may seem daunting and highly technical. But not necessarily so. In this mini-tutorial, I will present foundational concepts of UQ, showing how it is the simplicity of the underlying ideas that allows them to straddle multiple disciplines. I will also describe how operational imperatives have helped shape the evolution of UQ and discuss how current research at the forefront of UQ can in turn affect these operations.
Kelli McCoy Add to Speakers
96 Shaffer
Research Staff Member, Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Placeholder_Image-150×150.png
Presentation
Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies Meta-analysis of the SALIANT procedure for assessing team situation awareness
Many Department of Defense (DoD) systems aim to increase or maintain Situational Awareness (SA) at the individual or group level. In some cases, maintenance or enhancement of SA is listed as a primary function or requirement of the system. However, during test and evaluation, SA is examined inconsistently or not measured at all. Situational Awareness Linked Indicators Adapted to Novel Tasks (SALIANT) is an empirically based methodology meant to measure SA at the team, or group, level. While research using the SALIANT model suggests that it effectively quantifies team SA, no study has examined the effectiveness of SALIANT across the entirety of the existing empirical research. The aim of the current work is to conduct a meta-analysis of previous research to examine the overall reliability of SALIANT as an SA measurement tool. This meta-analysis will assess when and how SALIANT can serve as a reliable indicator of performance in testing. Additional applications of SALIANT in non-traditional operational testing domains will also be discussed. (An illustrative random-effects pooling sketch follows this entry.)
Miriam Armstrong Add to Speakers
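As a hedged illustration of the pooling step in a meta-analysis like the one described above, the following Python sketch applies DerSimonian-Laird random-effects pooling to invented per-study effect estimates; the numbers are placeholders, and the actual analysis may use different effect-size metrics.

    # Random-effects meta-analysis (DerSimonian-Laird) on notional study results.
    import numpy as np

    y = np.array([0.62, 0.71, 0.55, 0.68])      # per-study effect estimates (notional)
    v = np.array([0.010, 0.015, 0.020, 0.012])  # per-study sampling variances (notional)

    w = 1.0 / v                                  # fixed-effect weights
    y_fixed = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - y_fixed) ** 2)           # heterogeneity statistic
    k = len(y)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)           # between-study variance estimate

    w_star = 1.0 / (v + tau2)                    # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    print(f"pooled effect = {pooled:.3f} +/- {1.96 * se:.3f} (tau^2 = {tau2:.4f})")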
97 Miller
Research Staff Member, Institute for Defense Analyses
https://dataworks.testscience.org/wp-content/uploads/formidable/32/HS4-150×150.jpg
Presentation
Publish
2 Sharing Analysis Tools, Methods, and Collaboration Strategies A preview of functional data analysis for modeling and simulation validation
Modeling and simulation (M&S) validation for operational testing often involves comparing live data with simulation outputs. The set of statistical methods known as functional data analysis (FDA) provides techniques for analyzing large data sets (“large” meaning that a single trial has a lot of information associated with it), such as radar tracks. We preview how FDA methods could assist M&S validation by providing statistical tools for handling these large data sets. This may facilitate analyses that make use of more of the available data and thus allow for better detection of differences between M&S predictions and live test results. We demonstrate some fundamental FDA approaches with a notional example of live and simulated radar tracks of a bomber’s flight. (A notional smoothing-and-comparison sketch follows this entry.)
Kelly Avery Add to Speakers
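The following notional Python sketch shows one basic FDA-style step consistent with the preview described above: smoothing each track-error curve onto a common grid with splines and comparing the mean live curve to the mean simulated curve. The data and smoothing choices are assumptions for illustration, not the presenters' methods.

    # Smooth synthetic live and simulated track-error curves onto a common grid,
    # then compare the pointwise functional means.
    import numpy as np
    from scipy.interpolate import splrep, splev

    rng = np.random.default_rng(3)
    t_grid = np.linspace(0.0, 1.0, 200)              # common evaluation grid

    def smooth_tracks(n_tracks, shift):
        curves = []
        for _ in range(n_tracks):
            t = np.sort(rng.uniform(size=60))        # irregular observation times
            err = np.sin(2 * np.pi * t) + shift + 0.1 * rng.normal(size=t.size)
            tck = splrep(t, err, s=0.5)              # smoothing spline fit
            curves.append(splev(t_grid, tck))
        return np.vstack(curves)

    live = smooth_tracks(10, shift=0.00)
    sim = smooth_tracks(10, shift=0.05)

    mean_diff = live.mean(axis=0) - sim.mean(axis=0) # pointwise functional difference
    print("max |mean difference|:", np.abs(mean_diff).max())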
98 Upadhyay
Associate Professor - ECE, Florida International University
https://dataworks.testscience.org/wp-content/uploads/formidable/32/Hiimanshu_Upadhyay_Picture-150×150.webp
Presentation
No Publish
2 Generative AI - Large Language Model Introduction
Generative artificial intelligence (AI) is a rapidly advancing field and transformative technology that involves the creation of new content. Generative AI encompasses AI models that produce novel data, information, or documents in response to prompts. This technology has gained significant attention due to the emergence of models like DALL-E, Imagen, and ChatGPT. Generative AI excels in generating content across various domains. The versatility of Generative AI extends to generating text, software code, images, videos, and music by statistically analyzing patterns in training data. One of the most prominent applications of Generative AI is ChatGPT, developed by OpenAI. ChatGPT is a sophisticated language model trained on vast amounts of text data from diverse sources. It can engage in conversations, answer questions, write essays, generate code snippets, and more. Generative AI’s strengths lie in its ability to produce diverse and seemingly original outputs quickly. Large Language Models (LLMs) are advanced deep learning algorithms that can understand, summarize, translate, predict, and generate content using extensive datasets. These models work by being trained on massive amounts of data, deriving relationships between words and concepts, and then using transformer neural network processes to understand and generate responses. LLMs are widely used for tasks like text generation, translation, content summarization, rewriting, classification, and categorization. They are trained on huge datasets to understand language better and provide accurate responses when given prompts or queries. The key algorithms used in LLMs include:
• Word Embedding: This algorithm represents the meaning of words in a numerical format, enabling the AI model to process and analyze text data efficiently.
• Attention Mechanisms: These algorithms allow the AI to focus on specific parts of the input text, such as sentiment-related words, when generating an output, leading to more accurate responses.
• Transformers: Transformers are a type of neural network architecture designed to solve sequence-to-sequence tasks efficiently by using self-attention mechanisms. They excel at handling long-range dependencies in data sequences and learn context and meaning by tracking relationships between elements in a sequence.
This presentation will focus on the basics of large language models, their algorithms, and applications to nuclear decommissioning knowledge management. (A toy attention-mechanism sketch follows this entry.)
Rebecca Medlin Add to Speakers
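As a toy illustration of the attention mechanism listed among the key LLM algorithms above, the following numpy sketch implements scaled dot-product attention; the shapes and values are arbitrary and not tied to any particular model in the presentation.

    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)   # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)           # query-key similarity
        weights = softmax(scores, axis=-1)        # attention weights per query
        return weights @ V, weights               # weighted sum of values

    rng = np.random.default_rng(4)
    Q = rng.normal(size=(3, 8))   # 3 query tokens, dimension 8
    K = rng.normal(size=(5, 8))   # 5 key tokens
    V = rng.normal(size=(5, 8))   # values aligned with keys
    out, w = scaled_dot_product_attention(Q, K, V)
    print(out.shape, w.sum(axis=-1))   # (3, 8); each row of weights sums to 1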