A Humorous Introduction to Natural Language Processing
A Humorous Introduction to Natural Language Processing is a mini-course which took place online from 14 to 25 November. Our lead expert for the course was Pavel Braslavski, Associate Professor and Senior Research Fellow, Faculty of Computer Science, HSE University.
The mini-course was a brief introduction to natural language processing (NLP) that used the generation of funny news titles as a running example. It covered basic NLP concepts and approaches and provided hands-on experience with various NLP tools. As part of the course, we also became acquainted with subtasks such as:
- tokenization,
- POS-tagging,
- semantic distance,
- sentiment analysis,
- edit distance,
- evaluation of results.
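Edit distance is one way to find pun candidates for a funny title: words that differ from the original by only a letter or two. As a minimal sketch (not the course's own code, which is not reproduced here), the Levenshtein distance can be computed in plain Python with the standard dynamic-programming recurrence; the function name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (or keep a match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("pear", "bear"))       # 1
```

Only two rows of the dynamic-programming table are kept at a time, so memory grows with the length of `b` rather than with the product of both lengths.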
The mini-course consisted of two parts: theory and practice. The theory part featured five lectures with hands-on assignments, and the practice part focused on individual project work and included technical support sessions led by teaching assistants from the Exactpro team.
Prerequisites: basic Python skills and a Google account to work in Colab
Theory dates/time: 14-18 November, 18:00 GET/19:30 SLST
Practice dates/time: 21-25 November, TBA
Exactpro teaching assistants: Stanislav Glushkov, DocOps Engineer; Tornike Baramidze, QA Analyst; Julia Emelianova, Researcher.
The full mini-course agenda featured the following topics:
- Natural Language Processing: A Very Short Introduction. Computational Humor. Task and Data Used in the Mini-Course.
- Tokenization and Part-of-Speech (POS) Tagging. Stanza Package.
- Semantic Resources: WordNet. Word Embeddings: fastText.
- Similar-Sounding and Similarly Spelled Words: Soundex, Levenshtein Distance. Datamuse API. Evaluation in NLP; a Measure of Inter-Rater Agreement: Cohen's Kappa.
- Sentiment Analysis: Sentiment Dictionaries, Sentiment Classifiers.
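To give a flavor of the evaluation topic, here is a minimal pure-Python sketch of Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance from each rater's label frequencies. The function `cohen_kappa` and the ratings below are hypothetical illustrations, not data from the course; in practice, a library function such as scikit-learn's `cohen_kappa_score` does the same job.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters labeled identically
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' label frequencies
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label throughout; kappa is 0/0 here,
        # and returning 1.0 is a convention chosen for this sketch
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings: is each generated title funny (1) or not (0)?
rater_a = [1, 1, 0, 1, 0, 0, 1, 0]
rater_b = [1, 0, 0, 1, 0, 1, 1, 0]
print(cohen_kappa(rater_a, rater_b))  # 0.5
```

Here the two hypothetical raters agree on 6 of 8 titles (p_o = 0.75), but with balanced labels the chance agreement is already p_e = 0.5, so κ = 0.5: noticeably lower than the raw agreement rate.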
You can also check out our Python4ML Bootcamp. The bootcamp introduces Python programming basics and reviews various beginner-level tasks that typically need to be solved in data science and data analysis.