Friday, April 21, 2017

Lecture 8: syntactic parsing (1/2)

Introduction to syntax. Context-free grammars and languages. Treebanks. Normal forms. Dependency grammars. Syntactic parsing: top-down and bottom-up. Structural ambiguity. Backtracking vs. dynamic programming for parsing. The CKY algorithm. Neural transition-based dependency parsing.
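To make the dynamic-programming idea concrete, here is a minimal CKY sketch in Python. It assumes a tiny hypothetical grammar already in Chomsky Normal Form; the rule tables `BINARY` and `LEXICAL` are made up for illustration. Instead of building trees, each chart cell counts how many derivations each nonterminal has over a span, so structural ambiguity shows up as a count greater than one.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar, already in Chomsky Normal Form.
# Binary rules A -> B C, indexed by their right-hand side (B, C).
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}
# Lexical rules A -> w, indexed by the word w.
LEXICAL = {
    "I": {"NP"},
    "saw": {"V"},
    "the": {"Det"},
    "man": {"N"},
    "telescope": {"N"},
    "with": {"P"},
}

def cky_count(words, start="S"):
    """Count the parse trees rooted in `start` that span all of `words`."""
    n = len(words)
    # chart[i][j][A] = number of ways nonterminal A derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for a in LEXICAL.get(w, ()):
            chart[i][i + 1][a] += 1
    # Fill longer spans bottom-up, trying every split point k.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (b, nb), (c, nc) in product(chart[i][k].items(), chart[k][j].items()):
                    for a in BINARY.get((b, c), ()):
                        chart[i][j][a] += nb * nc
    return chart[0][n][start]

print(cky_count("I saw the man with the telescope".split()))  # → 2
```

The two parses differ only in where the prepositional phrase attaches (to the verb phrase or to the noun phrase), the textbook case of structural ambiguity. Filling the chart bottom-up takes O(n³·|G|) time, whereas naive backtracking can re-derive the same sub-spans many times over.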

Friday, April 7, 2017

Lecture 7: part-of-speech tagging

Introduction to part-of-speech (POS) tagging. POS tagsets: the Penn Treebank tagset and the Google Universal Tagset. Rule-based POS tagging. Stochastic part-of-speech tagging. Hidden Markov models. Deleted interpolation. Linear and logistic regression: Maximum Entropy models. Transformation-based POS tagging. Handling out-of-vocabulary words. The Stanford POS tagger. Neural POS tagging with bidirectional LSTMs. Presentation of homework 2.
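Decoding with an HMM tagger is usually done with the Viterbi algorithm. The sketch below assumes a three-tag toy model whose start, transition and emission probabilities are invented for illustration, not estimated from a treebank; it works in log space to avoid numerical underflow on long sentences.

```python
import math

TAGS = ["DET", "NOUN", "VERB"]
# Hypothetical toy parameters, not estimated from any real corpus.
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {  # P(tag_i | tag_{i-1})
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {  # P(word | tag); words missing from a row have probability 0
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "barks": 0.1, "cat": 0.5},
    "VERB": {"barks": 0.6, "sees": 0.4},
}

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[i][t]: log-probability of the best path over words[:i+1] ending in tag t
    best = [{t: log(START[t]) + log(EMIT[t].get(words[0], 0.0)) for t in TAGS}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            # Choose the predecessor tag that maximises the path score into t.
            prev = max(TAGS, key=lambda s: best[i - 1][s] + log(TRANS[s][t]))
            best[i][t] = (best[i - 1][prev] + log(TRANS[prev][t])
                          + log(EMIT[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]

print(viterbi("the dog barks".split()))  # → ['DET', 'NOUN', 'VERB']
```

In this toy model "barks" can be a noun or a verb: after "the dog" it is tagged VERB, while in "the barks" the DET→NOUN transition wins and it is tagged NOUN.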


Friday, March 31, 2017

Lecture 6: deep learning; intro to part-of-speech tagging

Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. Practical session on character-based LSTMs with Keras. Introduction to part-of-speech tagging.
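For reference, one common formulation of the LSTM cell (without peephole connections; the notation is ours) computes, at each time step t, from the input x_t and the previous hidden state h_{t-1}:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here σ is the logistic sigmoid, ⊙ is element-wise multiplication, and the W, U, b are learned parameters. The additive update of c_t is the usual explanation for why LSTMs carry information across many more time steps than plain RNNs.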


Monday, March 27, 2017

Lecture 5: practical session on Keras; more on NNs for NLP; word embeddings

Practical session on Keras. More on NNs for NLP: hierarchical softmax; negative sampling. Vector representations. Word2vec. Word embeddings and their properties.
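Negative sampling replaces the full softmax over the vocabulary with a handful of binary decisions. Below is a minimal sketch of the skip-gram negative-sampling loss for a single (word, context) pair. The 2-d vectors are hypothetical toys, and the negative samples are passed in explicitly, whereas word2vec draws them from the unigram distribution raised to the 3/4 power.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgns_loss(word_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one (word, context) pair:
    -[log sigma(c . w) + sum_k log sigma(-n_k . w)].
    Minimising it raises the score of the observed context
    and lowers the scores of the sampled negative words."""
    loss = -log_sigmoid(dot(context_vec, word_vec))
    for neg in negative_vecs:
        loss -= log_sigmoid(-dot(neg, word_vec))
    return loss

# Hypothetical vectors: a context aligned with the word gives the lower loss.
print(sgns_loss([1.0, 0.0], [2.0, 0.0], [[-2.0, 0.0]]))
print(sgns_loss([1.0, 0.0], [-2.0, 0.0], [[2.0, 0.0]]))
```

Each training step only touches the word vector, one context vector and k negative vectors, instead of normalizing over the whole vocabulary.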


Friday, March 17, 2017

Lecture 4: language modeling (2); neural networks and NLP

We discussed perplexity and its close relationship with entropy, and introduced smoothing and interpolation techniques to deal with data sparsity. Practical session on language modeling with Python and the Berkeley LM toolkit.
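As an illustration of how smoothing and perplexity fit together, the sketch below uses add-one (Laplace) smoothing, the simplest smoothing technique, on a hypothetical three-sentence toy corpus. It is written in plain Python, not with the toolkit used in class, and it reserves no probability mass for out-of-vocabulary words.

```python
import math
from collections import Counter

def train_laplace_bigram(sentences):
    """Bigram model with add-one (Laplace) smoothing:
    P(w | prev) = (C(prev w) + 1) / (C(prev) + V)."""
    history_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens[1:])  # every token that can follow a history
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    V = len(vocab)

    def prob(prev, word):
        return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + V)

    return prob

def perplexity(prob, sentences):
    """Perplexity = 2 ** H, where H is the average negative log2-probability per token."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens[:-1], tokens[1:]):
            log_sum += math.log2(prob(prev, word))
            n += 1
    return 2 ** (-log_sum / n)

CORPUS = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
model = train_laplace_bigram(CORPUS)
print(perplexity(model, ["I am Sam"]))            # familiar word order: lower
print(perplexity(model, ["green ham like Sam"]))  # unseen bigrams: higher
```

Without the +1, the unseen bigrams in the second sentence would get probability zero and its perplexity would be infinite; lower perplexity means the model finds the text less surprising.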

Friday, March 10, 2017

Lecture 3: morphological analysis: practical session; homework 1; language modeling (1)

We had a practical session on morphological analysis in Python and Java. We reviewed basic probability concepts and introduced N-gram models (unigrams, bigrams, trigrams), including how their probabilities are estimated and the issues involved.
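A minimal sketch of maximum-likelihood bigram estimation in Python, on a hypothetical three-sentence toy corpus padded with `<s>` and `</s>` boundary markers:

```python
from collections import Counter

def train_bigram_mle(sentences):
    """MLE bigram estimates: P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1})."""
    history_counts, bigram_counts = Counter(), Counter()
    for sent in sentences:
        # Pad with boundary markers so the first and last words get a context.
        tokens = ["<s>"] + sent.split() + ["</s>"]
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {(prev, w): c / history_counts[prev] for (prev, w), c in bigram_counts.items()}

CORPUS = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
probs = train_bigram_mle(CORPUS)
print(probs[("<s>", "I")])   # → 0.6666666666666666
print(probs[("am", "Sam")])  # → 0.5
```

Any bigram absent from the training data gets probability zero under these estimates, which is the data-sparsity problem that smoothing addresses.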

We also discussed homework 1 (see post on the class group).

Friday, March 3, 2017

Lecture 2: intro (2); morphological analysis

We introduced words and morphemes. Before delving into morphology and morphological analysis, we introduced regular expressions as a powerful tool to handle the different forms of a word. We then introduced recent work on morphological analysis based on machine learning: unsupervised (Morfessor) and supervised (based on Conditional Random Fields, CRFs).
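Regular expressions capture regular inflection well. The sketch below builds a pattern from a stem plus an optional `-s`, `-ed` or `-ing` suffix; it is a deliberately simplified, hypothetical pattern, not a morphological analyzer.

```python
import re

def inflection_pattern(stem):
    # Hypothetical pattern: stem plus an optional regular suffix (-s, -ed, -ing).
    # The \b word boundaries stop it from matching inside longer words.
    return re.compile(r"\b" + re.escape(stem) + r"(?:s|ed|ing)?\b")

def find_forms(stem, text):
    return inflection_pattern(stem).findall(text)

print(find_forms("walk", "She walks, they walked, we are walking, I walk past the walkway."))
# → ['walks', 'walked', 'walking', 'walk']
```

The same pattern misses irregular forms such as "ran" and spelling changes such as "stopped", which is one reason to move from hand-written patterns to learned morphological analyzers.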


Saturday, February 25, 2017

Lecture 1: Introduction to NLP

We gave an introduction to the course and to the field it covers, Natural Language Processing, with particular attention to the Turing Test as a tool to understand whether "machines can think". We also discussed the test's limitations, including Searle's Chinese Room argument.


Thursday, January 19, 2017

Ready, steady, go!

Welcome to the Sapienza NLP course blog! This year there will be important changes: first, projects will be lightweight for attending students; second, homework assignments will be part of the final project (so attending students will complete more than 50% of their project before the end of the course); third, the class will cover the newest trends in neural networks; fourth, this year's (class) project will be... the development of an intelligent chatbot working on Telegram!
IMPORTANT: In 2017, classes will be held on Fridays, 2.30pm-5.45pm. Please sign up for the NLP class!


Friday, May 27, 2016

Lecture 12: statistical machine translation

Introduction to Machine Translation. Rule-based vs. Statistical MT. Statistical MT: the noisy channel model. The language model and the translation model. The phrase-based translation model. Learning the model from training data. Phrase-translation tables. Parallel corpora. Extracting phrases from word alignments. Word alignments.
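The noisy channel model splits translation into the language model and the translation model. Writing f for the foreign (source) sentence and e for a candidate English translation, Bayes' rule gives:

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}}\;
                       \underbrace{P(e)}_{\text{language model}}
```

P(f) is constant for a given input sentence, so it can be dropped from the argmax.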

IBM models for word alignment. Many-to-one and many-to-many alignments. IBM Model 1 and the HMM alignment model. Training the alignment models: the Expectation Maximization (EM) algorithm. Symmetrizing alignments for phrase-based MT: symmetrizing by intersection; the growing heuristic. Calculating the phrase translation table. Decoding: stack decoding. Evaluation of MT systems. BLEU.
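A minimal sketch of EM training for IBM Model 1 on a hypothetical three-pair German-English toy corpus. It estimates the lexical translation probabilities t(e | f) and, for brevity, omits the NULL word that the full model adds to every foreign sentence.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate IBM Model 1 lexical translation probabilities t(e | f) with EM.

    pairs: list of (foreign_tokens, english_tokens). This sketch omits the
    NULL word that the full model adds to every foreign sentence.
    """
    e_vocab = {e for _, es in pairs for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: share each English word among the foreign words it could
        # align to, in proportion to the current t(e | f).
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalise the expected counts into new probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

PAIRS = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(PAIRS)
for e in ("the", "house", "book"):
    print(e, round(t[(e, "das")], 3))
```

Every pair is ambiguous in isolation, but "the" is the only English word that appears with "das" in both of its pairs, so EM gradually concentrates probability on t(the | das); the same reasoning applies to t(book | Buch).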