Utrecht, Netherlands

Data Science: Applied Text Mining

when 15 July 2024 - 19 July 2024
language English
duration 1 week
credits 1.5 EC
fee EUR 850

This course introduces basic and advanced concepts in text mining and natural language processing. Students will learn how to apply text mining methods to text data and analyse those data in a pipeline with machine learning and deep learning algorithms. The course has a strongly practical, hands-on focus: students will gain experience in using text mining on real data from the social sciences, humanities, and healthcare, and in interpreting the results.

Given the rapid rate at which text data are being digitally gathered in many domains of science, there is a growing need for automated tools that can analyze, classify, and interpret this kind of data. Text mining techniques can be applied to create a structured representation of text, making its content more accessible for researchers. Applications of text mining are everywhere: social media, web search, advertising, emails, customer service, healthcare, marketing, etc. This course offers an extensive exploration of text mining with Python. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline. The topics in this course include preprocessing text, text classification, topic modeling, word embedding, deep learning models, and responsible text mining.

In this course, participants will:

Review the fundamental approaches to text mining
Understand and apply current methods for analyzing texts
Define a text mining pipeline given a practical data science problem
Implement all steps in a text mining pipeline: feature extraction, feature selection, model learning, model evaluation
Understand and apply state-of-the-art methods in text mining
Implement word embedding and advanced deep learning techniques

The course starts by reviewing basic concepts of text mining and implementing advanced concepts in natural language processing. By the end of the week, participants will have mastered advanced text mining skills in Python.
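
The pipeline steps named above (feature extraction, feature selection, model learning, model evaluation) can be illustrated with a minimal sketch. It is not taken from the course materials: it uses only the Python standard library, a hand-written Naive Bayes classifier, and a tiny invented dataset. All function names and the `min_df` threshold are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_vocabulary(docs, min_df=2):
    """Feature selection: keep tokens occurring in at least min_df documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {tok for tok, n in doc_freq.items() if n >= min_df}


def extract_features(doc, vocabulary):
    """Feature extraction: bag-of-words counts restricted to the vocabulary."""
    return Counter(tok for tok in tokenize(doc) if tok in vocabulary)


def train_naive_bayes(docs, labels, vocabulary):
    """Model learning: multinomial Naive Bayes with add-one smoothing."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(extract_features(doc, vocabulary))
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(token_counts[label].values()) + len(vocabulary)
        log_prior = math.log(n_docs / len(docs))
        log_likelihood = {
            tok: math.log((token_counts[label][tok] + 1) / total)
            for tok in vocabulary
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, doc, vocabulary):
    """Return the label with the highest log-probability score."""
    features = extract_features(doc, vocabulary)
    scores = {
        label: log_prior + sum(n * log_lik[tok] for tok, n in features.items())
        for label, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


def accuracy(model, docs, labels, vocabulary):
    """Model evaluation: fraction of documents classified correctly."""
    hits = sum(predict(model, d, vocabulary) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Invented toy data, for illustration only.
train_docs = [
    "great course and great teacher",
    "really great practical sessions",
    "boring lectures and poor slides",
    "poor organisation and boring material",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["great practical course", "boring and poor"]
test_labels = ["pos", "neg"]

vocabulary = select_vocabulary(train_docs, min_df=2)
model = train_naive_bayes(train_docs, train_labels, vocabulary)
print(accuracy(model, test_docs, test_labels, vocabulary))
```

In practice, libraries such as the ones listed under the course skills would likely replace most of these hand-written steps; the sketch only shows how the four stages connect.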

Participants should have basic knowledge of scripting and programming in Python, and the motivation to work with it.

Participants are requested to bring their own laptop computer. Software will be available online.

This course is part of a series of 5 courses in the Summer School Data Science specialisation taught by UU’s department of Methodology & Statistics. Please see here for more information about the full specialisation. This course can also be taken separately.

Summer School Data Science specialisation:

Data science: Statistical Programming with R (S24)
Data science: Introduction to Text Mining with R (S41)
Data science: Multiple Imputation in Practice (S28)
Data science: Data analysis (S31)
Data science: Applied Text Mining (this course)

Upon completing 3 out of 5 courses in the specialisation (no more than one text mining course), students can obtain a certificate. Each course may also be taken separately.

Course leader

Dr. Ayoub Bagheri

Target group

This course works best for learners who are comfortable programming in Python, want to acquire skills in text mining approaches, and have a basic knowledge of machine learning.

Participants should also have basic knowledge of scripting and programming in Python, and the motivation to work with it. Participants from a variety of fields, including sociology, psychology, education, human development, marketing, business, biology, medicine, political science, and communication sciences, will benefit from the course. A maximum of 80 participants will be admitted to this course. Please note that places are allocated on a first-come, first-served basis.

Course aim

The course teaches students basic and advanced text mining techniques using Python, applied to a variety of problems in many domains of science.

The skills addressed in this course are:

Python environment;
Preprocessing text and feature extraction;
NLTK, Gensim, spaCy;
Text classification;
Sentiment classification;
Text clustering;
Topic modeling;
Word embedding;
CBOW vs Skip-gram;
Convolutional neural networks;
Recurrent neural networks;
Attention models;
Responsible text mining;
Text summarisation.
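
As an illustration of the "CBOW vs Skip-gram" item above, the following sketch shows how the two word embedding schemes frame their training examples differently: skip-gram predicts each context word from the centre word, while CBOW predicts the centre word from its surrounding context. It is not taken from the course materials; the function names and the window size are assumptions, and no embedding model is actually trained.

```python
def context_indices(tokens, i, window):
    """Indices of the tokens within `window` positions of position i, excluding i."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return [j for j in range(lo, hi) if j != i]


def skipgram_pairs(tokens, window=1):
    """Skip-gram: one (centre, context) pair per context word."""
    return [
        (centre, tokens[j])
        for i, centre in enumerate(tokens)
        for j in context_indices(tokens, i, window)
    ]


def cbow_examples(tokens, window=1):
    """CBOW: one (context words, centre) example per position."""
    return [
        ([tokens[j] for j in context_indices(tokens, i, window)], centre)
        for i, centre in enumerate(tokens)
    ]


tokens = "text mining with python".split()
print(skipgram_pairs(tokens))
print(cbow_examples(tokens))
```

With a window of 1, the four-token sentence yields six skip-gram pairs but only four CBOW examples, since CBOW groups all context words for a position into a single input.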

Fee info

EUR 850: Course + course materials
EUR 250: Housing fee (optional)

Register for this course on the course website.