Name: Knowledge representation
- 1 Time, place, result
- 2 Assumed background
- 3 Focus
- 4 Books to use
- 5 Practical work
- 6 Lecture block 1: basics and representing simple facts.
- 7 Lecture block 2: capturing meaning in natural language
- 8 Lecture block 3: large common-sense knowledge bases
Time, place, result
Lectures: Wednesdays 12:00-13:30 room U06A-229
Practical work: Wednesdays 14:00-15:30, room ICT-121, ICT-122
Practical work will contribute 40% and the exam 60% of the points underlying the final grade. The exam will consist of several small exercises.
The first practical work time on 30. January at 14:00 will be used as a second conventional lecture of the day.
Some of the last lecture times at the end of the course may be used as additional seminars/labs.
You should have studied the course "Basics of AI and Machine Learning" or get acquainted with the logic and probability parts of that course yourself.
The main focus of the course is on knowledge representation and reasoning (KR), on a spectrum from simple to very complex: representing and using knowledge in databases, sentences in natural language and commonsense knowledge.
The overall goal of the labwork is to understand the issues facing the task of building a natural-language question-answering system à la Watson, and to see and experiment with existing parts of the solution.
The course contains the following blocks:
- Background and basics. Representing and using simple facts and rules.
- Knowledge conveyed by natural sentences: both statistical methods like word2vec and logic-based methods.
- General-knowledge databases: Wikidata, WordNet, YAGO, ConceptNet, NELL, Cyc.
- Reasoners and question answering systems.
- Context, time, location, events, causality.
- Different kinds and dimensions of uncertain knowledge.
- Indexes and search.
Check out this description of the whole KR area.
Books to use
- Recommended for NLP: Speech and Language Processing by Dan Jurafsky and James H. Martin. Here is the 2nd edition and here the web page for the draft 3rd edition.
- About half of the course themes are covered in this book, with freely accessible PDFs.
- The freely accessible PDF of the Handbook of Knowledge Representation gives a detailed and thorough coverage of the subject; far more than necessary for the course.
Observe that a noticeable part of the course content is not covered by these books: use the course materials and the links to papers, standards and tutorials provided.
There are three labs. They are all steps in a single project: build a simple natural-language question-answering system.
The labs have to be presented to the course teachers and to all students present at labwork time.
The labs can be prepared alone or in teams of two. The first lab task will be given on 6. February.
First, read the explanation of the overall task. The following labs are steps on the path to the end goal.
Deadline: 14. March (after this there will be a penalty).
If you manage to fulfill the task earlier, start on the second lab.
Please read about the task for the first lab.
The task in the second lab is actually answering simple questions from the input text, using a small set of rules you write and a reasoner.
The third lab is optional: completing it gives as many points as lab 1 or 2 towards the final result, changing the weighting to practical work 60% and exam 40%.
The task in the third lab is the open-world scenario: answering questions based on Wikipedia, using large downloaded rule sets à la YAGO, WordNet, etc. with a reasoner.
It is a plus to be able to give uncertain answers (likely, unlikely) and to handle fuzzy properties like boy, girl, grownup, rich, etc.
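To give a feel for what "a small set of rules and a reasoner" means in lab 2, here is a minimal forward-chaining reasoner sketch in pure Python. The rule format (tuples with `?x`-style variables) and the family-relation example are my own illustration, not the course's actual reasoner or rule syntax.

```python
# A toy forward-chaining reasoner: facts are tuples, rules pair a list of
# premise patterns (strings starting with "?" are variables) with one conclusion.

def match(pattern, fact, bindings):
    """Try to unify one pattern with one fact under existing bindings."""
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if p in b and b[p] != f:
                return None        # variable already bound to something else
            b[p] = f
        elif p != f:
            return None            # constant mismatch
    return b

def substitute(pattern, bindings):
    """Replace variables in a pattern with their bound values."""
    return tuple(bindings.get(t, t) for t in pattern)

def all_matches(premises, facts, bindings):
    """Yield every binding that satisfies all premises against the facts."""
    if not premises:
        yield bindings
        return
    first, rest = premises[0], premises[1:]
    for fact in facts:
        if len(fact) == len(first):
            b = match(first, fact, bindings)
            if b is not None:
                yield from all_matches(rest, facts, b)

def forward_chain(facts, rules):
    """Apply rules until no new facts appear (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for premises, conclusion in rules:
            for bindings in all_matches(premises, facts, {}):
                derived = substitute(conclusion, bindings)
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new

facts = {("parent", "tom", "bob"), ("parent", "bob", "ann")}
rules = [
    ([("parent", "?x", "?y")], ("ancestor", "?x", "?y")),
    ([("parent", "?x", "?y"), ("ancestor", "?y", "?z")], ("ancestor", "?x", "?z")),
]
derived = forward_chain(facts, rules)
print(("ancestor", "tom", "ann") in derived)  # transitive ancestry is derived
```

Real reasoners add indexing, negation and unification of nested terms, but the fixpoint loop above is the core idea.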
Lecture block 1: basics and representing simple facts.
Lectures 1 and 2: Overview of the course. Background and basics: SQL, logic, NLP
Lecture 2: RDF and RDFS (and OWL)
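To preview what RDF and RDFS look like operationally, here is a toy illustration (deliberately not using rdflib): triples are plain tuples, and two standard RDFS entailment rules (subclass transitivity and type propagation) are applied to a fixpoint. The `ex:` names are made-up examples.

```python
# Two RDFS entailment rules over a set of (subject, predicate, object) triples:
#   (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
#   (x type A),       (A subClassOf B)  =>  (x type B)

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

triples = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}

def rdfs_closure(triples):
    """Apply the two rules repeatedly until no new triples appear."""
    triples = set(triples)
    while True:
        new = set()
        for s, p, o in triples:
            if p == SUBCLASS:
                for s2, p2, o2 in triples:
                    if p2 == SUBCLASS and s2 == o:
                        new.add((s, SUBCLASS, o2))   # transitivity
                    if p2 == TYPE and o2 == s:
                        new.add((s2, TYPE, o))       # type propagation
        new -= triples
        if not new:
            return triples
        triples |= new

closed = rdfs_closure(triples)
print(("ex:rex", TYPE, "ex:Animal") in closed)  # rex is inferred to be an Animal
```

In practice you would use a triple store or a library such as rdflib, which implement these (and more) entailments for you; the point here is only that RDFS semantics reduces to simple rules over triples.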
Lecture block 2: capturing meaning in natural language
Lecture 3: Intro to homework and NLP
- nlp and watson
- Jurafsky and Martin book: conversational agents
- start reading the Jurafsky and Martin book (see links above)
Lecture 4: vector representation of words
This lecture will be given by Priit Järv.
- First, listen to the text vectorization episode from the Linear Digressions podcast.
- Second, listen to the word2vec episode from the Linear Digressions podcast.
- Read the beginning (until you get tired ...) of the Vector Semantics chapter from the Jurafsky & Martin book.
Useful additional materials from (roughly) easier to more complex:
- word vectors: a short intro
- vector semantics part I presentation from the Jurafsky & Martin book for chapter 6.
- vector representation: a short tutorial with code examples.
- word2vec tutorial from tensorflow
- vector semantics part II presentation from the Jurafsky & Martin book for chapter 6.
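Before the neural word2vec approach, it helps to see the count-based starting point of vector semantics (the simplified setting at the start of chapter 6): represent each word by its co-occurrence counts and compare words by cosine similarity. The tiny corpus below is an invented example.

```python
# Count-based word vectors: each word's vector is a Counter of the words it
# co-occurs with (window = the whole sentence, for simplicity).

import math
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

vectors = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j, c in enumerate(words):
            if i != j:
                vectors[w][c] += 1     # count every co-occurring context word

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: math.sqrt(sum(n * n for n in x.values()))
    return dot / (norm(u) * norm(v))

# "cat" and "dog" share many contexts (the, sat, on), so their similarity is high.
print(round(cosine(vectors["cat"], vectors["dog"]), 2))
```

word2vec replaces these raw counts with dense vectors learned by predicting context words, but the underlying distributional idea, "you shall know a word by the company it keeps", is the same.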
Also interesting to read:
- Aetherial Symbols by Geoffrey Hinton, the "godfather of machine learning"
- Autocomplete using Markov chains
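The Markov-chain autocomplete idea from the last link can be sketched in a few lines: learn which word follows which (a first-order Markov chain over bigrams), then suggest the most frequent successor. The training sentence is a made-up example.

```python
# First-order Markov chain over words: count successors, suggest the most common.

from collections import Counter, defaultdict

def train(text):
    """Count which word follows which in the training text."""
    successors = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        successors[prev][nxt] += 1
    return successors

def suggest(successors, word):
    """Return the most frequent successor of `word`, or None if unseen."""
    if word not in successors:
        return None
    return successors[word].most_common(1)[0][0]

model = train("the cat sat on the mat and the cat ran to the door")
print(suggest(model, "the"))  # "cat" follows "the" most often in this text
```

Sampling successors proportionally to their counts instead of taking the top one turns the same model into a simple text generator.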
Lecture block 3: large common-sense knowledge bases
Lecture 5: First look into main large knowledge bases
We will have a look at the goals, main content and differences between: