Daniel Spokoyny

About
I am a postdoc at UC San Diego, co-leading an LLM security project on social engineering scams with Nikolai and collaborating with Stefan Savage and Geoffrey M. Voelker.
I defended my PhD at CMU, where I worked with Taylor Berg-Kirkpatrick. During my PhD, I conducted core NLP research on modeling and reasoning over numerical quantities in text. This included developing architectures for continuous number prediction, jointly modeling quantities and units, and evaluating how well existing transformer models handle novel, numerically focused tasks involving correlation and measurement understanding. I also explored applying language models to the climate domain: introducing benchmarks for classifying and analyzing unstructured climate documents, aligning national climate plans with sustainability goals, and using semi-structured climate questionnaires as weak supervision to improve transfer learning to real-world climate texts.
Previously, I studied computer science at the College of Creative Studies at UC Santa Barbara and was advised by Murat Karaorman, Fermin Moscoso del Prado Martin, and William Wang.
Publications
- Aligning Unstructured Paris Agreement Climate Plans with Sustainable Development Goals
- Daniel Spokoyny, Janelle Cai, Tom Corringham, and Taylor Berg-Kirkpatrick
- ACL 2024 Workshop on Natural Language Processing Meets Climate Change
- NLP Tutorials for Climate Change AI Summer School 2023
- Numerical Correlation in Text
- Daniel Spokoyny, Chien-Sheng Wu, and Caiming Xiong
- EMNLP Workshop on Mathematical Natural Language Processing 2022
- Towards Answering Climate Questionnaires from Unstructured Climate Reports
- ClimaBench: A Benchmark Dataset For Climate Change Text Understanding in English
- Daniel Spokoyny, Tanmay Laud, Tom Corringham, and Taylor Berg-Kirkpatrick
- EMNLP Workshop on NLP for Positive Impact 2022
- arXiv
- Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context
- Daniel Spokoyny, Ivan Lee, Zhao Jin, and Taylor Berg-Kirkpatrick
- Findings of NAACL 2022
- BERT Classification of Paris Agreement Climate Action Plans
- Tom Corringham, Daniel Spokoyny, Eric Xiao, Christopher Cha, Colin Lemarchand, Mandeep Syal, Ethan Olson, and Alexander Gershunov
- ICML 2021 Workshop on Tackling Climate Change with Machine Learning
- Paper / Slides / Talk
- An Empirical Investigation of Contextualized Number Prediction
- Daniel Spokoyny and Taylor Berg-Kirkpatrick
- EMNLP 2020
- Lagging Inference Networks and Posterior Collapse in Variational Autoencoders
- Junxian He, Daniel Spokoyny, Graham Neubig, and Taylor Berg-Kirkpatrick
- ICLR 2019 (arXiv, code)
Professional Experience
- Research Intern, Salesforce Research, Palo Alto, Summer 2022
- Defined a new numerical comprehension task: predicting the correlation between quantities in text. Used crowdworkers to label a new dataset for this task and released it.
- Research Project, Boeing, 2021
- Worked on email thread summarization. Together with engineers and technicians, we collected a dataset, implemented several pretraining strategies, and evaluated the resulting models.
- Research Intern, Microsoft Research, Virtual, Summer 2020
- As part of MSR’s Deep-Excel team, designed and trained transformer architectures on self-supervised tasks using Excel spreadsheet data, then fine-tuned the models for various downstream tasks.
- Research Intern, Google AI, Mountain View, Summer 2019
- As part of Google Search’s Web Answers team, developed and experimented with numerical regression pretraining and transfer learning methods to improve accuracy on a downstream question answering task.
Talks
- Turning Climate Reports into Climate Action with AI at the Voices of Data Science Symposium, UMass Amherst, 2025
- ClimaBench and ClimaQA at the NLP for Climate Adaptation Workshop, Wageningen University & Research, 2024
- Applying Artificial Intelligence for Climate Action for the Climate in HiTech course at Afeka Tel Aviv, 2022
- Posters: Numerical Correlation in Text, and ClimaBench: A Benchmark Dataset For Climate Change Text Understanding in English, at SoCal NLP 2022
- Modeling Numerical Quantities to Extract Measurements from Climate Text Sources at the Doctoral Consortium on Computational Sustainability, 2022
- NLP for Climate Documents at the AI for Climate Change Bootcamp, Stanford, 2022
- Machine Learning Model Compression for the Computing for the Cloud and Internet of Things class at UCSB, 2022
Teaching
- Co-Lead, Seminar on Training Large Language Models, UCSD (Spring 2023)
- Teaching Assistant, Undergraduate and Graduate Natural Language Processing, CMU (Fall 2021)
- Teaching Assistant, Undergraduate Machine Learning, UCSD (Winter 2019)
- Co-Instructor, Topics in Machine Learning, UCSB (Spring 2017)
- Instructor, Graduate Seminar on Safety in AI, UCSB (Fall 2016)
- Co-Instructor, Introduction to NLP and Machine Learning, UCSB (Winter 2016)
- Co-Instructor, Graduate Statistical Learning Theory Colloquium, UCSB (Winter 2015)
Volunteer
Participated
Email: dspokoyny@ucsd.edu
Office: UCSD CSE 3242