DATAWorks Speakers and Abstracts

Dr. Karl Pazdernik

Pacific Northwest National Laboratory
Text Analysis: Introduction to Advanced Language Modeling
Day 1, Room: D, 9:00 AM-4:00 PM

Dr. Karl Pazdernik is a senior data scientist at Pacific Northwest National Laboratory. He is also a research assistant professor at North Carolina State University (NCSU) and the former chair of the American Statistical Association Section on Statistics in Defense and National Security. His research has focused on the dynamic modeling of multi-modal data with a particular interest in text analytics, spatial statistics, pattern recognition, anomaly detection, Bayesian statistics, and computer vision. Recent projects include natural language processing of multilingual unstructured financial data, anomaly detection in combined open-source data streams, automated biosurveillance and disease forecasting, and deep learning for defect detection and element mass quantification in nuclear materials. He received a Ph.D. in Statistics from Iowa State University and was a postdoctoral scholar at NCSU under the Consortium for Nonproliferation Enabling Capabilities.

Abstract: Text Analysis: Introduction to Advanced Language Modeling

This course provides a broad overview of text analysis and natural language processing (NLP), beginning with a substantial amount of introductory material and extending to state-of-the-art methods. All aspects of the text analysis pipeline will be covered, including data preprocessing, converting text to numeric representations (from simple aggregation methods to more complex embeddings), and training supervised and unsupervised learning methods for standard text-based tasks such as named entity recognition (NER), sentiment analysis, topic modeling, and text generation using Large Language Models (LLMs). The course will alternate between presentations and hands-on exercises in Python. Translations from Python to R will be provided for students who are more comfortable with R. Attendees should be familiar with Python (preferably), R, or both, and should have a basic understanding of statistics, machine learning, or both. Attendees will gain the practical skills necessary to begin applying text analysis tools to their own tasks, an understanding of the strengths and weaknesses of these tools, and an appreciation for the ethical considerations of using them in practice.
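
As a rough illustration of the first two pipeline stages named above (preprocessing and a simple aggregation-based numeric representation), the following is a minimal bag-of-words sketch using only the Python standard library. It is not taken from the course materials; the function names `preprocess` and `bag_of_words` and the two sample sentences are hypothetical, chosen only for this example.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of raw documents.

    Each vector holds the number of times each vocabulary term
    appears in the corresponding document.
    """
    tokenized = [preprocess(doc) for doc in docs]
    # Sorted vocabulary gives every document the same column order.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # missing terms count as 0
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Hypothetical sample documents, for illustration only.
docs = [
    "The model reads text.",
    "Text becomes numbers; numbers feed the model.",
]
vocab, vectors = bag_of_words(docs)
for doc, vec in zip(docs, vectors):
    print(vec, "<-", doc)
```

Count vectors like these discard word order and context, which is one reason a pipeline might move on to the more complex embeddings mentioned in the abstract.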

Session Materials: https://dataworks.testscience.org/wp-content/uploads/formidable/23/Text_Analysis_DATAWorks_2024.pdf