Announcement Poster (PDF)

Daily Schedule

First week
  9:00–12:30 Embeddings in NLP (Eva-Maria Vecchi, LMU Munich)
  14:00–17:30 Reflected Text Analysis beyond Linguistics (Nils Reiter, University of Stuttgart)
Second week
  9:00–12:30 Ethics in NLP (Thomas Arnold, TU Darmstadt)
  14:00–17:30 Syntactic Structures Can’t Be Just Anything (Meaghan Fowlie, Utrecht University)

Evening Lectures

Social Activities

Course Abstracts

Embeddings in Natural Language Processing (Eva-Maria Vecchi)

In this course, we will discuss the design and construction of embeddings in Natural Language Processing tasks. In particular, we will explore the trend in NLP to capture and use semantic embeddings to approximate a human’s ability to understand words and phrases. The course will emphasize both theory and practice, providing a strong background to motivate the use of various techniques, as well as a hands-on environment for working with real-world implementations.

The course will motivate the use of embeddings. We will provide a background in how we can define meaning in language use, then explore the mathematical structures necessary to model this information within a computationally practical framework. The course will focus on the use, flexibility, and limitations of these structures in NLP tasks, and how best to ground and assess these models.

This course aims to provide an opportunity to study the state of the art in the field of Computational Semantics, specifically Distributional Semantics. We will discuss and experiment with various types of embeddings, from simple count-based representations to neural network implementations. We will also explore a range of NLP tasks tackled using these methods and discuss directions that remain unexplored.

Reflected Text Analysis beyond Linguistics (Nils Reiter)

Computational linguistics (CL) tools, methods and ideas are increasingly used in other scientific areas. This class looks at digital humanities (DH) in particular, and discusses how and for what purpose CL methods can and should be applied within DH.

The class first covers the creation of reference data. Reference data used in quantitative scenarios needs to be annotated coherently, which usually presupposes inter-subjective agreement on the annotation categories. We will look at ways of establishing annotation guidelines during the annotation process. Second, the reference data is used in machine learning scenarios. This section discusses several (basic) ML algorithms and their use in a shared-task-like scenario. We will also talk about ways of harvesting large, out-of-domain data sets (domain adaptation). The third section covers the evaluation and critically reflected interpretation of the results, and their integration into other research workflows from the humanities.

The class will be a mixture of lectures and exercises, and does not presuppose previous knowledge in computational linguistics. The class is also open to students from the (digital) humanities with an interest in reflected text analytics.

Ethics in Natural Language Processing (Thomas Arnold)

Machine learning and natural language technologies are integrated into more and more aspects of our lives. Therefore, the decisions we make about our methods and data are closely tied to their impact on our world and society. In this course, we present real-world, state-of-the-art applications of natural language processing and their associated ethical questions and consequences. We also discuss philosophical foundations of ethics in research.

Core topics of this course:

Syntactic Structures Can’t Be Just Anything (Meaghan Fowlie)

How do people automatically figure out the structure of a sentence just by hearing it?

What kinds of mistakes can people make in trying to figure out the structure, and why?

This course will explore these questions from a formal perspective by examining the mathematical properties of human language syntax, including the complexity of human language and parsing algorithms for context-free and mildly context-sensitive language families.