Friday, May 27, 2016

Lecture 12: statistical machine translation

Introduction to Machine Translation. Rule-based vs. Statistical MT. Statistical MT: the noisy channel model. The language model and the translation model. The phrase-based translation model. Learning a model of training. Phrase-translation tables. Parallel corpora. Extracting phrases from word alignments. Word alignments

IBM models for word alignment. Many-to-one and many-to-many alignments. IBM model 1 and the HMM alignment model. Training the alignment models: the Expectation Maximization (EM) algorithm. Symmetrizing alignments for phrase-based MT: symmetrizing by intersection; the growing heuristic. Calculating the phrase translation table. Decoding: stack decoding. Evaluation of MT systems. BLEU.

Saturday, May 21, 2016

Lecture 11: semantic parsing (2), AMR, research in Rome

Unsupervised semantic parsing, semi-supervised semantic parsing, Abstract Meaning Representation (AMR). NLP research in Rome.

Friday, May 13, 2016

Lecture 10: semantic role labeling and semantic parsing

PropBank, FrameNet, semantic role labeling. Introduction to semantic parsing. Presentation of the projects.

Friday, May 6, 2016

Lecture 9: Neural Networks, word embeddings and deep learning

Motivation. The perceptron. Input encoding, sum and activation functions; objective function. Linearity of the perceptron. Neural networks. Training. Backpropagation. Connection to Maximum Entropy. Connection to language. Vector representations. NN for the bigram language model. Word2vec: CBOW and skip-gram. Word embeddings. Deep learning. Language modeling with NN. The big picture.

Tuesday, May 3, 2016

Lecture 8: Entity Linking

Entity Linking. Main approaches. AIDA, TagMe, Wikifier, DBpedia spotlight, Babelfy. The MASC annotated corpus. Demo di sistemi di WSD e Entity Linking. Introduction to Neural Networks.

Friday, April 22, 2016

Lecture 7: Word Sense Disambiguation

Introduction to Word Sense Disambiguation (WSD). Motivation. The typical WSD framework. Lexical sample vs. all-words. WSD viewed as lexical substitution and cross-lingual lexical substitution. Knowledge resources. Representation of context: flat and structured representations. Main approaches to WSD: Supervised, unsupervised and knowledge-based WSD. Two important dimensions: supervision and knowledge. Supervised Word Sense Disambiguation: pros and cons. Vector representation of context. Main supervised disambiguation paradigms: decision trees, neural networks, instance-based learning, Support Vector Machines. Unsupervised Word Sense Disambiguation: Word Sense Induction. Context-based clustering. Co-occurrence graphs: curvature clustering, HyperLex. Knowledge-based Word Sense Disambiguation. The Lesk and Extended Lesk algorithm. Structural approaches: similarity measures and graph algorithms. Conceptual density. Structural Semantic Interconnections. Evaluation: precision, recall, F1, accuracy. Baselines.

Saturday, April 9, 2016

Lecture 6: computational semantics

Introduction to computational semantics. Syntax-driven semantic analysis. Semantic attachments. First-Order Logic. Lambda notation and lambda calculus for semantic representation. Lexicon, lemmas and word forms. Word senses: monosemy vs. polysemy. Special kinds of polysemy. Computational sense representations: enumeration vs. generation. Graded word sense assignment. Encoding word senses: paper dictionaries, thesauri, machine-readable dictionary, computational lexicons. WordNet. Wordnets in other languages. BabelNet.

Friday, April 1, 2016

Lecture 5: syntax

Introduction to syntax. Context-free grammars and languages. Treebanks. Normal forms. Dependency grammars. Syntactic parsing: top-down and bottom-up. Structural ambiguity. Backtracking vs. dynamic programming for parsing. The CKY algorithm. The Earley algorithm. Probabilistic CFGs (PCFGs). PCFGs for disambiguation: the probabilistic CKY algorithm. PCFGs for language modeling. Demo: The Stanford Dependency parser.

Saturday, March 19, 2016

Lecture 4: Part-of-Speech Tagging

Introduction to part-of-speech (POS) tagging. POS tagsets: the Penn Treebank tagset and the Google Universal Tagset. Rule-based POS tagging. Stochastic part-of-speech tagging. Hidden markov models. Deleted interpolation. Linear and logistic regression: Maximum Entropy models. Transformation-based POS tagging. Handling out-of-vocabulary words. The Stanford POS tagger.

Friday, March 11, 2016

Lecture 3: language modeling

We introduced N-gram models (unigrams, bigrams, trigrams), together with their probability modeling and issues. We discussed perplexity and its close relationship with entropy, we introduced smoothing and interpolation techniques to deal with the issue of data sparsity. The KYOTO and Berkley Language Model toolkits.

We also discussed the homework 1 in more detail (see slides on the class group).