Daniel Spokoyny
About
I am a sixth-year PhD student at CMU working with Taylor Berg-Kirkpatrick (permanently remote in San Diego).
My research focuses on modeling numerical quantities in unstructured text, tables, long documents, and spreadsheets.
I also aim to apply numerically pretrained transformers to downstream applications, such as extracting information from unstructured climate reports.
Previously, I studied computer science at the College of Creative Studies at UC Santa Barbara.
- NLP Tutorials for Climate Change AI Summer School 2023
- Numerical Correlation in Text
- Daniel Spokoyny and Chien-Sheng Wu and Caiming Xiong
- EMNLP Workshop on Mathematical Natural Language Processing 2022
- Towards Answering Climate Questionnaires from Unstructured Climate Reports
- ClimaBench: A Benchmark Dataset For Climate Change Text Understanding in English
- Daniel Spokoyny and Tanmay Laud and Tom Corringham and Taylor Berg-Kirkpatrick
- EMNLP Workshop on NLP for Positive Impact 2022
- arXiv
- Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context
- Daniel Spokoyny and Ivan Lee and Zhao Jin and Taylor Berg-Kirkpatrick
- Findings of NAACL 2022
- BERT Classification of Paris Agreement Climate Action Plans
- Tom Corringham, Daniel Spokoyny, Eric Xiao, Christopher Cha, Colin Lemarchand, Mandeep Syal, Ethan Olson, Alexander Gershunov
- ICML 2021 Workshop on Tackling Climate Change with Machine Learning
- Paper/Slides/Talk
- An Empirical Investigation of Contextualized Number Prediction
- Daniel Spokoyny and Taylor Berg-Kirkpatrick
- EMNLP 2020
- Lagging Inference Networks and Posterior Collapse in Variational Autoencoders
- Junxian He, Daniel Spokoyny, Graham Neubig, Taylor Berg-Kirkpatrick
- ICLR 2019
- arXiv/Code
Professional Experience
- Research Intern, Salesforce Research, Palo Alto, Summer 2022
- Defined a new numerical comprehension task: predicting the correlation relationship between quantities in text. Used crowdworkers to label a new dataset for this task and released it.
- Research Project, Boeing, 2021
- Worked on email thread summarization. Together with engineers and technicians, collected a dataset, implemented several pretraining strategies, and evaluated the models' performance.
- Research Intern, Microsoft Research, Virtual, Summer 2020
- As part of MSR’s Deep-Excel team, designed transformer architectures, trained them on self-supervised tasks using Excel spreadsheet data, and fine-tuned the models for various downstream tasks.
- Research Intern, Google AI, Mountain View, Summer 2019
- As part of Google Search’s Web Answers team, developed and evaluated numerical regression pretraining and transfer learning methods to improve accuracy on a downstream question answering task.
Talks
- "Applying Artificial Intelligence for Climate Action" for the Climate in HiTech course at Afeka Tel Aviv, 2022
- Poster: "Numerical Correlation in Text" and "ClimaBench: A Benchmark Dataset For Climate Change Text Understanding in English" at SoCal NLP 2022
- "Modeling Numerical Quantities to Extract Measurements from Climate Text Sources" at the Doctoral Consortium on Computational Sustainability 2022
- "NLP for Climate Documents" for the AI for Climate Change Bootcamp at Stanford, 2022
- "Machine Learning Model Compression" for the Computing for the Cloud and Internet of Things class at UCSB, 2022
Teaching
- Co-Lead, Seminar on Training Large Language Models, UCSD (Spring 2023)
- Teaching Assistant, Undergraduate and Graduate Natural Language Processing, CMU (Fall 2021)
- Teaching Assistant, Undergraduate Machine Learning, UCSD (Winter 2019)
- Co-Instructor, Topics in Machine Learning, UCSB (Spring 2017)
- Instructor, Graduate Seminar on Safety in AI, UCSB (Fall 2016)
- Co-Instructor, Introduction to NLP and Machine Learning, UCSB (Winter 2016)
- Co-Instructor, Graduate Statistical Learning Theory Colloquium, UCSB (Winter 2015)
Volunteer
Participated
Email: dspokoyn@cs.cmu.edu
Office: UCSD CSE 3242