Teadmiste formaliseerimine 2019
Name: Knowledge representation
Contents
- 1 NB! This is an archive from 2019, not actual course contents
- 2 Exam results
- 3 Exam
- 4 Time, place, result
- 5 Assumed background
- 6 Focus
- 7 Books to use
- 8 Practical work
- 9 Lecture block 1: basics and representing simple facts.
- 10 Lecture block 2: capturing meaning in natural language
- 11 Lecture block 3 and start of 4: large common-sense knowledge bases and reasoning
- 12 Lecture block 5: context and uncertainty
- 12.1 Lecture 9: intro to representing context and uncertainty
- 12.2 Lecture 10: various classical ideas for discrete reasoning with uncertainty
- 12.3 Lecture 11: various classical ideas for numeric reasoning with uncertainty
- 12.4 Lecture 12: seminar-kind of work on the second lab
- 12.5 Lecture 13: seminar-kind of work on the third lab
NB! This is an archive from 2019, not actual course contents
Exam results
Here is the final result after the third exam: Teadmiste formaliseerimine lõpptulemus 2019, with lab points and grade included.
What the correction in between was about: there were two different interpretations of how to add up the lab points: either the third lab is part of the 0..40 total, or it is a pure addition to the sum of the first two labs. The initial grade followed the first interpretation, the corrected grade the second. As a result, several grades went up.
The last exam takes place on Wednesday, 12 June at 12:00 in the IT building, room ICT-A2.
Here are the results of the first exam (lab points not added): Teadmiste formaliseerimine esimene eksam 2019
If you got below 30 points, please retake the exam. Everyone may take the exam twice.
Exam
All exam times on Wednesdays at 12:00 (lecture timeslot):
- 22 May 12:00 in the economics building SOC-211A
- 29 May 12:00 in the economics building SOC-213
- 12 June 12:00 in the IT building ICT-A2
The exam is in written form; no materials may be used. The exam will last 3 hours, but most likely you can finish in about 1.5 hours.
Here are notes on What will be asked in the ITI8700 exam, including the materials to study.
Time, place, result
Lectures: Wednesdays 12:00-13:30 room U06A-229
Practical work: Wednesdays 14:00-15:30, room ICT-121, ICT-122
Practical work will give 40% and the exam 60% of the points underlying the final grade.
The exam will consist of several small exercises.
The first practical work slot on 30 January at 14:00 will be used as a second conventional lecture of the day.
Some of the last lecture times at the end of the course may be used as additional seminars/labs.
Assumed background
You should have studied the course "Basics of AI and Machine Learning" or get acquainted with the logic and probability parts of that course on your own.
In particular, it is useful to read these course materials and exercises: logic AIMA book, wumpus world in AIMA book, uncertainty in AIMA book, probability models in AIMA book, prolog lab, bayes lab
Focus
The main focus of the course is on knowledge representation and reasoning (KR), on a spectrum from simple to very complex: representing and using knowledge in databases, sentences in natural language and commonsense knowledge.
The overall goal of the labwork is to understand the issues facing the task of building a natural language question answering system a la Watson, and to see and experiment with existing parts of the solution.
The course contains the following blocks:
- Background and basics. Representing and using simple facts and rules.
- Knowledge conveyed by natural sentences: both statistical methods like word2vec and logic-based methods.
- General-knowledge databases: wikidata, wordnet, yago, conceptnet, nell, cyc.
- Reasoners and question answering systems.
- Context, time, location, events, causality.
- Different kinds and dimensions of uncertain knowledge.
- Indexes and search.
Check out this description of the whole KR area.
Books to use
- Recommended for NLP: Speech and Language Processing by Dan Jurafsky and James H. Martin. here is the 2nd edition and here the web page for the draft 3rd edition
- About half of the course themes are covered in this book, with freely accessible PDFs.
- The freely accessible PDF of the Handbook of Knowledge Representation gives a detailed and thorough coverage of the subject; far more than necessary for the course.
Note that a noticeable part of the course contents is not covered by these books: use the course materials and the links to papers, standards and tutorials provided.
Practical work
There are three labs. They are all steps in a single project: build a simple natural-language question-answering system.
The labs have to be presented to the course teachers and all students present at labwork time.
The labs can be prepared alone or in teams of two people. The first lab task will be given on 6 February.
First, read the explanation of the overall task. The following labs are steps on the path to the end goal.
NB! You have to register for the lab like this:
First, go to
Search for: ITI8700
Log in with the "UNI ID".
Enroll on the course page you found: click the "enrol me" button.
Second, go to http://gitlab.cs.ttu.ee and log in
Click on "Create a project"
Name of the project must be exactly that: iti8700-2019
Visibility level: private
Upload your code, presentation, examples, etc. that you use for the presentation.
First lab
Deadline: 14 March (after this there will be a penalty).
If you manage to fulfill the task earlier, start doing the second lab.
Please read about the task for the first lab
Second lab
Deadline: 15 May. You have to give your presentation at the 15 May lab time at the latest (better at 8 May).
The task in the second lab is actually answering simple questions from the input text, using a small set of rules you write and a reasoner.
Please start by experimenting with a real prover: look at and run the Reasoner examples with gkc
We use the gkc prover for the examples (instructions are on the Reasoner examples ... page above), but you could also try out the old classic prover Otter, which actually works OK on Windows even though it is really old.
Make sure to get the latest release of gkc from gkc releases (currently delta) and either download a binary or compile it using the basic instructions on the gkc github page. Also, see the few examples in the Examples folder.
You can use a trivial NLP-to-reasoner parser in python nlp.py as a starting point.
Have a look at the Lecture 12 materials.
Interesting large question-answering datasets and challenge problems: Stanford SQUAD, Allen institute arc2 challenge, Google natural questions, Fujitsu NLP challenge, amazon qa dataset
Third lab
Deadline: 15 May. You have to give your presentation at the 15 May lab time at the latest (better at 8 May).
The third lab is optional and will simply give as many points as lab 1 or 2 towards the final result and the grade (practical work 40% and exam 60%).
The task in the third lab is the open-world scenario of answering questions based on wikipedia and using large downloaded rule sets a la yago, wordnet etc with a reasoner.
It is a plus to be able to give uncertain answers (likely, unlikely) and handle fuzzy properties like boy, girl, grownup, rich etc.
Have a look at the Lecture 13 materials
Lecture block 1: basics and representing simple facts.
Lectures 1 and 2: Overview of the course. Background and basics: SQL, logic, NLP
Lecture materials:
Lecture 2: RDF and RDFS (and OWL)
Lecture materials:
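As a quick illustration of the RDF data model (a hand-made example, not taken from the lecture slides): everything is a subject-predicate-object triple, and RDFS adds schema vocabulary such as rdfs:subClassOf on top. The sketch below applies one step of the standard RDFS type-propagation rule in Python.

# RDF data as plain subject-predicate-object triples (hand-made example)
triples = [
    ("ex:John", "rdf:type", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
]

# one step of the RDFS rule: x rdf:type C and C rdfs:subClassOf D => x rdf:type D
inferred = set(triples)
for s, p, o in triples:
    if p == "rdf:type":
        for s2, p2, o2 in triples:
            if p2 == "rdfs:subClassOf" and s2 == o:
                inferred.add((s, "rdf:type", o2))

print(sorted(inferred))  # now also contains ("ex:John", "rdf:type", "ex:Agent")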
Lecture block 2: capturing meaning in natural language
Lecture 3: Intro to homework and NLP
Lecture materials:
- nlp and watson
- Jurafsky and Martin book: conversational agents
- start reading the Jurafsky and Martin book (see links above)
Lecture 4: vector representation of words
This lecture will be given by Priit Järv.
Lecture materials:
- First, listen to text vectorization episode from the linear digressions podcast.
- Second, listen to word2vec episode from the linear digressions podcast.
- Read the beginning (until you get tired ...) of the Vector Semantics chapter from the Jurafsky & Martin book.
Useful additional materials from (roughly) easier to more complex:
- word vectors: a short intro
- vector semantics part I presentation from the Jurafsky & Martin book for chapter 6.
- vector representation: a short tutorial with code examples.
- word2vec tutorial from tensorflow
- vector semantics part II presentation from the Jurafsky & Martin book for chapter 6.
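To make the vector-semantics intuition concrete, here is a minimal cosine-similarity sketch (toy co-occurrence counts invented for the example; real word2vec vectors are learned, not counted):

import numpy as np

# made-up co-occurrence counts with the context words (data, sweet, eat)
vectors = {
    "apple":  np.array([1.0, 6.0, 5.0]),
    "orange": np.array([0.0, 5.0, 6.0]),
    "server": np.array([7.0, 0.0, 1.0]),
}

def cosine(a, b):
    # cosine similarity: dot product of the two normalized vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vectors["apple"], vectors["orange"]))  # high: similar contexts
print(cosine(vectors["apple"], vectors["server"]))  # low: different contexts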
Probabilistic models:
Also interesting to read:
- Aetherial Symbols by the ultimate machine learning guru aka "godfather of machine learning" Geoffrey Hinton
- Autocomplete using Markov chains
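For the Markov-chain autocomplete item above, a minimal sketch (toy corpus invented for the example): count which word follows which, then always continue with the most frequent successor.

from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the rat".split()

# successor counts: word -> Counter of the words following it
chain = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    chain[w][nxt] += 1

def autocomplete(word, length=4):
    out = [word]
    for _ in range(length):
        if word not in chain:
            break
        # take the most frequent successor (a real system would sample)
        word = chain[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(autocomplete("the"))  # "the cat sat on the"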
Lecture block 3 and start of 4: large common-sense knowledge bases and reasoning
Lecture 5: First look into main large knowledge bases
We will have a look at the goals, main content and differences between:
- wordnet
- dbpedia
- yago
- babelnet
- conceptnet
- nell in wiki and a good paper. The nell website is currently down.
- cyc
Lecture 6: Big annotation systems and intro to rule reasoners
First, big annotation systems:
- Google structured data on the webpage.
- Facebook Open Graph markup on the webpage.
In connection, see also:
- schema.org: property markup vocabulary suggested by Google, Microsoft and others.
- json-ld: currently most popular rdf syntax (in json), also recommended by Google.
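As a small hand-written illustration of json-ld with the schema.org vocabulary (not taken from the course materials): the same subject-property-value structure as RDF, serialized as ordinary JSON.

import json

# a schema.org description of a (fictional) course as JSON-LD
doc = {
    "@context": "https://schema.org",
    "@type": "Course",
    "name": "Knowledge representation",
    "courseCode": "ITI8700",
    "provider": {"@type": "Organization", "name": "TalTech"},
}
print(json.dumps(doc, indent=2))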
Second, intro to rule reasoners:
- Lecture material as ppt or as pdf
Additionally you may want to look at:
Lecture 7: rule reasoners
Please have a look and run the same experiments yourself with the gkc prover:
We use the gkc prover for the examples (instructions are on the Reasoner examples ... page above), but you could also try out the old classic prover Otter, which actually works OK on Windows even though it is really old.
Lecture 8: starting to parse natural language for interaction with rule reasoners
We will start building a tiny question-answering tool, translating natural language to rule reasoner input.
Our first, very simple goal: translate the text "John is a father of Pete. Pete is a father of Mark. Who is the father of Pete?" to a rule reasoner input
father(john,pete). father(pete,mark). -father(X,pete) | ans(X).
then run the reasoner, fetch the ans(john) and give an answer "John is".
You can use a trivial NLP-to-reasoner parser in python nlp.py as a starting point.
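A minimal sketch of that translation step (written independently of the actual nlp.py, with a made-up pattern set): match a few sentence templates with regular expressions and emit gkc clauses.

import re

def to_clauses(text):
    clauses = []
    for s in re.split(r"[.?]\s*", text):
        s = s.strip().lower()
        # "X is a father of Y" becomes the fact father(x,y).
        m = re.match(r"(\w+) is a father of (\w+)", s)
        if m:
            clauses.append("father(%s,%s)." % m.groups())
        # "who is the father of Y" becomes the question clause
        m = re.match(r"who is the father of (\w+)", s)
        if m:
            clauses.append("-father(X,%s) | ans(X)." % m.group(1))
    return clauses

text = "John is a father of Pete. Pete is a father of Mark. Who is the father of Pete?"
print("\n".join(to_clauses(text)))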
We will then proceed to gradually make questions and sentences more complex and add rules to be able to answer them.
In the final part of the course we will start looking at uncertain reasoning: right now we will only try to do simple strict reasoning.
We will have a look at potential tools for semantic parsing, from simpler to more complex:
- parsetron: a really simple and very limited tool to build your own parser.
- sippycup: has a detailed tutorial/course attached, check the first lab
- ccg2lambda: parses to classical logic strings, which are embedded in XML output.
Other interesting things to look at:
- Framenet: map meaning to form in English through the theory of Frame Semantics.
- Sling: Google project for getting facts from Wikipedia using Frame Semantics.
- Spider: Yale Semantic Parsing and Text-to-SQL Challenge.
- WikiSQL: A Salesforce crowd-sourced dataset for developing NLP interfaces for relational databases.
- Allennlp from the Allen Institute for AI. See also their semantic parsing tutorial.
Lecture block 5: context and uncertainty
Lecture 9: intro to representing context and uncertainty
Blocks world.
Some axiom sets:
And some planning problems to solve from these axiom sets:
Also:
Next, starting uncertainty.
We will first look at default logic.
Lecture 10: various classical ideas for discrete reasoning with uncertainty
We will look at some discrete ways:
- default logic: refresher from previous lecture.
- circumscription
- Autoepistemic logic: long tutorial Ijcai93.pdf
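To make the default-logic intuition concrete before moving on, a minimal negation-as-failure sketch (the classic Tweety example, data invented): a default conclusion holds unless the case is explicitly known to be abnormal.

birds = {"tweety", "polly"}
abnormal = {"tweety"}  # tweety is a penguin, so the default is blocked

def flies(x):
    # default rule: bird(x) & not abnormal(x) => flies(x),
    # where "not" is negation as failure: absence counts as false
    return x in birds and x not in abnormal

print(flies("polly"))   # True: the default applies
print(flies("tweety"))  # False: the known exception blocks the default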
And start with numeric:
- Uncertain_prob_fuzzy.ppt Intro to Bayesian and fuzzy reasoning.
- Notes about numerical confidence, popularity, etc measures
Lecture 11: various classical ideas for numeric reasoning with uncertainty
Classical numeric ways to reason about uncertainty:
- Uncertain_prob_fuzzy.ppt Intro to both Bayesian and Fuzzy reasoning.
Some newer approaches:
- Vienna_tanel_4.pdf Additional examples and combining.
- Problog start with a tutorial
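A minimal numeric sketch of the two classical ideas above (all numbers invented for the example): Bayes' rule updates a probability from evidence, while a fuzzy membership function assigns a degree of truth.

# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E), where
# P(E) = P(E|H) * P(H) + P(E|not H) * P(not H)
p_h = 0.01        # prior: P(disease)
p_e_h = 0.9       # P(positive test | disease)
p_e_not_h = 0.05  # P(positive test | no disease)
p_e = p_e_h * p_h + p_e_not_h * (1 - p_h)
print(p_e_h * p_h / p_e)  # posterior P(disease | positive test), ca 0.15

# fuzzy membership: the degree to which a person counts as "rich",
# rising linearly between two hand-picked income thresholds
def rich(income):
    return min(1.0, max(0.0, (income - 20000) / 80000))

print(rich(10000), rich(60000), rich(150000))  # 0.0 0.5 1.0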
Lecture 12: seminar-kind of work on the second lab
Background to start with:
- Take and unpack gkc.zip from gkc release gamma
- Start with the example file p1.txt
father(john,pete).
% either the first or a second or both facts are true:
father(pete,mark).
% equivalent to (father(X,Y) & father(Y,Z)) => grandfather(X,Z).
-father(X,Y) | -father(Y,Z) | grandfather(X,Z).
-grandfather(john,X) | ans(X).
- Run
gkc p1.txt
- Start with the trivial NLP-to-reasoner parser in python nlp.py .
- Our initial goal is to process the text "John is a father of Andrew. Andrew is a man. Who is the father of Andrew?"
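A minimal way to drive gkc from Python for this goal (a sketch assuming the gkc binary is on your PATH; the answer-extraction regex is a guess at the proof output format, adjust it to what your gkc version actually prints):

import re
import subprocess
import tempfile

clauses = """father(john,pete).
father(pete,mark).
-father(X,Y) | -father(Y,Z) | grandfather(X,Z).
-grandfather(john,X) | ans(X).
"""

# write the clauses to a temporary file and run gkc on it
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(clauses)
    path = f.name

out = subprocess.run(["gkc", path], capture_output=True, text=True).stdout
print(out)
# pull answers like ans(mark) out of the proof output
print(re.findall(r"ans\((\w+)\)", out))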
And the examples with various ways to encode normal/abnormal cases (not really needed for lab):
- Reasoner example for being likely married: works out ok
- Reasoner example for birds: does not work out well
- Reasoner another example for birds: does not work out well
Syntax and command line:
- gkc command line: gkc readme in github: see also Examples folder for a few trivial examples.
- For simple clausal syntax see otter manual clause syntax.
- For complex syntax options see TPTP manual: seems to be sometimes down.
Lecture 13: seminar-kind of work on the third lab
Ways to get external data in addition to our own rules and data:
- Raw data (hard to use): wikipedia, wikidata
- Processed raw data (a bit easier): dbpedia (processed wikipedia/wikidata)
- Well-formed and connected options (use one of these):
- wordnet: exists in several versions, including an external TPTP version usable by the gkc reasoner, direct link to TPTP axiom NLP001+0.ax
- yago, connecting wikipedia with wordnet and geonames. Look at the downloads section SIMPLETAX and download yagoSimpleTypes and yagoSimpleTaxonomy. You will need to write a program to filter and convert the yago datasets to the prover-understandable form. See some ideas for converting yago.
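A minimal conversion sketch (the exact column layout of the yago .tsv files is an assumption here; check the files you downloaded): read tab-separated triples and print the type assertions as prover-readable facts.

import re
import sys

def clean(term):
    # strip <...> brackets and replace characters clause syntax will not accept
    return re.sub(r"\W", "_", term.strip("<>")).lower()

# assumed layout: id, subject, predicate, object, separated by tabs
for line in open(sys.argv[1], encoding="utf-8"):
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 4:
        continue
    _, subj, pred, obj = parts[:4]
    if pred == "rdf:type":
        # <Albert_Einstein> rdf:type <wikicat_Physicists> becomes a unary fact
        print("%s(%s)." % (clean(obj), clean(subj)))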