Title | Computer-based Content Analysis – Text |
Type of course | Lecture and practical exercises |
Level | Bachelor, Ph.D. |
ECTS points | 6 |
SWS | 2 |
Language | English |
Time and place of lectures | Thursdays, 13:45–15:15, Schloss Ehrenhof West - EW 163 |
Lecturer | |
Assisted computer pool times | Wednesdays, 15:00–17:00, PI-Pool |
Attention: The room has changed. From now on, the lecture will take place in Schloss Ehrenhof West - EW 163.
New: Assisted Computer Pool time. See above.
First appointment:
- 08.09.2011 (lecture)
Prerequisites:
- Foundations of linear algebra and probability theory (high school level)
- Computer skills sufficient to become familiar with complex applications quickly
Grading is based on:
- Implementation of a project
- Final presentation
- Report (~ 15 pages)
Attendance Modalities (new!):
- Lecture part: attendance *voluntary*
- Project presentation part: attendance *mandatory*
For more details, attend the first lecture on Thursday, 08.09.2011.
Content of the Lecture
The course presents methods for the computer-assisted automatic analysis of digital documents as a basis for further quantitative content analyses used in the social and cultural sciences.
In the beginning, we will use the software GATE to present some of the analyses that computational linguistics can offer to the social and cultural sciences. This is followed by a short course in the Python programming language, which introduces a more flexible way of preprocessing texts as well as access to text data through web crawling and the conversion of different file formats. Before the break, more advanced methods for text classification and clustering are presented, along with further tools that can be used. In the second part of the course, participants will present their own project work to each other.
Dates and Topics
Date | Topic | Material (PDF) | Exercises (PDF) |
|---|---|---|---|
Introduction | |||
08.09. | Overview & Goals; Introduction to Named Entity Recognition & GATE | – | |
15.09. | Regular Expressions & JAPE | ||
Programming with Python | |||
22.09. | Introduction to Python | ||
29.09. | Introduction to Python II | ||
06.10 | Text preprocessing with NLTK | ||
13.10 | Crawling Websites & Document Conversion | ||
Diving into Theory & Tools | |||
20.10. | Information Retrieval | ||
27.10. | Text Classification & Machine Learning; RapidMiner | ||
03.11. | Project Assignments | – | |
Project Work | |||
10.11 & 17.11 | Project Time without Lectures | ☕ | ☕ |
24.11 | Presentations | Felix Lorenz Christopher Markert (Slides) | – |
1.12. | Presentations | Linda Gierich & Judith Klingenstein Seyhan Özkan Markus Baumann | – |
8.12. | Presentations | Simone Krug Yipeng Liu Matthias Haber | – |
Reading recommendations
Description | Title |
|---|---|
Application of NLP methods (NLTK) | |
Application of NLP methods | Automated Discovery and Analysis of Social Networks from Threaded Discussions |
Exercises
We will hand out (non-mandatory) exercises that will help you understand the presented technologies and methods. We strongly suggest that you take the time to work on them. In our experience, hands-on exercises make it a lot easier to follow a course like this.

