Do liars use specific words and sentence structures more often than people who tell the truth? Can a computer judge the quality of a text and make suggestions for improvement? How can a publisher quickly check if a manuscript contains plagiarism?

These are just a few questions that can be answered using ‘text analytics’, an umbrella term for various processes for extracting high-quality information from text, mostly relying on modern computational techniques.

By the end of this course:

The student will be able to work with various applications of text analytics, ranging from simple tools for searching text with regular expressions (e.g., to find all words that end in ‘-ish’) and calculating type-token ratios (e.g., to explore lexical richness) to more sophisticated techniques that allow document comparison (e.g., to determine the semantic similarity between two or more texts) and automated syntactic parsing (e.g., to determine whether ‘brief’ is an adjective or a verb in a sentence).

The student will understand and be able to reflect on the possibilities and limitations of these applications.

The student will be familiar with the properties of several relevant open-access text corpora and be able to use these corpora for answering simple research questions.

The student will be able to prepare (or ‘pre-process’) a text for a specific computational analysis.

The student will be able to write a text analytics report and present and visualize text analytics results.
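
To give a flavour of the simpler tools named in the first learning outcome, here is a minimal sketch of a regular-expression search for words ending in ‘-ish’ and of a type-token ratio calculation. It is written in Python as an assumption; the course description does not specify a programming language or tool. The function names, the tokenization rule, and the sample sentence are all hypothetical and chosen only for illustration.

```python
import re


def words_ending_in(text, suffix):
    """Return every word in `text` that ends with `suffix` (case-insensitive)."""
    # \b marks a word boundary; \w+ requires at least one character before the suffix.
    pattern = re.compile(r"\b\w+" + re.escape(suffix) + r"\b", re.IGNORECASE)
    return pattern.findall(text)


def type_token_ratio(text):
    """Return the number of distinct words (types) divided by the total number of words (tokens)."""
    # Assumption: a token is a run of letters or apostrophes, compared in lower case.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample sentence, not taken from any course corpus.
sample = "The boyish clerk gave a foolish answer to the Spanish tourist."

ish_words = words_ending_in(sample, "ish")
ttr = type_token_ratio(sample)

print(ish_words)      # ['boyish', 'foolish', 'Spanish']
print(round(ttr, 3))  # 0.909 (10 types / 11 tokens, because 'the' occurs twice)
```

A type-token ratio depends strongly on text length and on how the text is tokenized, which is one reason pre-processing choices matter when results are compared across texts.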

The course consists of one 2-hour meeting each week for 7 weeks. The meetings will be a mix of lectures and practicals, in which the student will gain hands-on experience with several analytical tools (including, but not limited to, those described above) and text corpora.

Attendance at the meetings is obligatory and active participation is required. The grade will be based on several assignments during the course (e.g., a 5-minute presentation on a software survey) and a small individual research project. Depending on the number of students in the course, some of the assignments will be done in small groups.

Students will have to bring their own laptops to the meetings.

