Teadmiste formaliseerimine 2019
Name: Knowledge representation
Contents
- 1 NB! This is an archive from 2019, not actual course contents
- 2 Exam results
- 3 Exam
- 4 Time, place, result
- 5 Assumed background
- 6 Focus
- 7 Books to use
- 8 Practical work
- 9 Lecture block 1: basics and representing simple facts.
- 10 Lecture block 2: capturing meaning in natural language
- 11 Lecture block 3 and start of 4: large common-sense knowledge bases and reasoning
- 12 Lecture block 5: context and uncertainty
- 12.1 Lecture 9: intro to representing context and uncertainty
- 12.2 Lecture 10: various classical ideas for discrete reasoning with uncertainty
- 12.3 Lecture 11: various classical ideas for numeric reasoning with uncertainty
- 12.4 Lecture 12: seminar-kind of work on the second lab
- 12.5 Lecture 13: seminar-kind of work on the third lab
NB! This is an archive from 2019, not actual course contents
Exam results
Here is the final result after the third exam: Teadmiste formaliseerimine lõpptulemus 2019, with lab points and grade included.
What the correction in between was about: there were two different interpretations of how to add up the lab points: either the third lab is part of the 0..40 total, or it is a pure addition to the sum of the first two labs. The initial grade followed the first interpretation, the corrected grade the second. As a result, several grades went up.
The last exam takes place on Wednesday, 12 June at 12:00 in the IT building, room ICT-A2.
Here are the results of the first exam (lab points not added): Teadmiste formaliseerimine esimene eksam 2019
If you got below 30 points, please retake the exam. Everyone may take the exam twice.
Exam
All exam times on Wednesdays at 12:00 (lecture timeslot):
- 22 May 12:00 in the economics building SOC-211A
- 29 May 12:00 in the economics building SOC-213
- 12 June 12:00 in the IT building ICT-A2
The exam is in written form; no materials may be used. The exam will last 3 hours, but most likely you can finish in about 1.5 hours.
Here are notes on What will be asked in the ITI8700 exam, including the materials to study.
Time, place, result
Lectures: Wednesdays 12:00-13:30 room U06A-229
Practical work: Wednesdays 14:00-15:30, room ICT-121, ICT-122
Practical work will give 40% and the exam 60% of the points underlying the final grade.
The exam will consist of several small exercises.
The first practical work slot on 30 January at 14:00 will be used as a second conventional lecture of the day.
Some of the last lecture times at the end of the course may be used as additional seminars/labs.
Assumed background
You should have studied the course "Basics of AI and Machine Learning" or get acquainted with the logic and probability parts of that course on your own.
In particular, it is useful to read these course materials and exercises: logic AIMA book, wumpus world in AIMA book, uncertainty in AIMA book, probability models in AIMA book, prolog lab, bayes lab
Focus
The main focus of the course is on knowledge representation and reasoning (KR), on a spectrum from simple to very complex: representing and using knowledge in databases, sentences in natural language and commonsense knowledge.
The overall goal of the labwork is to understand the issues facing the task of building a natural language question answering system a la Watson, and to see and experiment with existing parts of the solution.
The course contains the following blocks:
- Background and basics. Representing and using simple facts and rules.
- Knowledge conveyed by natural sentences: both statistical methods like word2vec and logic-based methods.
- General-knowledge databases: wikidata, wordnet, yago, conceptnet, nell, cyc.
- Reasoners and question answering systems.
- Context, time, location, events, causality.
- Different kinds and dimensions of uncertain knowledge.
- Indexes and search.
Check out this description of the whole KR area.
Books to use
- Recommended for NLP: Speech and Language Processing by Dan Jurafsky and James H. Martin. here is the 2nd edition and here the web page for the draft 3rd edition
- About half of the course themes are covered in this book, with freely accessible PDFs.
- The freely accessible PDF of the Handbook of Knowledge Representation gives a detailed and thorough coverage of the subject; far more than necessary for the course.
Note that a noticeable part of the course contents is not covered by these books: use the course materials and the links to papers, standards and tutorials provided.
Practical work
There are three labs. They are all steps in a single project: build a simple natural-language question-answering system.
The labs have to be presented to the course teachers and all students present at labwork time.
The labs can be prepared alone or in teams of two people. The first lab task will be given on 6 February.
First, read the explanation of the overall task. The following labs are steps on the path to the end goal.
NB! You have to register for the lab like this:
First, go to
Search for: ITI8700
Log in with the "UNI ID".
Enroll on the course page you found: click the "enrol me" button.
Second, go to http://gitlab.cs.ttu.ee and log in
Click on "Create a project"
Name of the project must be exactly that: iti8700-2019
Visibility level: private
Upload your code, presentation, examples, etc. that you use for the presentation.
First lab
Deadline: 14 March (after this there will be a penalty).
If you manage to fulfill the task earlier, start doing the second lab.
Please read about the task for the first lab
Second lab
Deadline: 15 May. You have to give your presentation at the 15 May lab time at the latest (better at 8 May).
The task in the second lab is actually answering simple questions from the input text, using a small set of rules you write and a reasoner.
Please start by experimenting with a real prover: look at and run the Reasoner examples with gkc
We use the gkc prover for the examples (instructions are on the Reasoner examples ... page above), but you could also try out the old classic prover Otter, which actually works OK on Windows even though it is really old.
Make sure to get the latest release of gkc from gkc releases (currently delta) and either download a binary or compile it using the basic instructions on the gkc github page. Also, see the few examples in the Examples folder.
You can use a trivial NLP-to-reasoner parser in python nlp.py as a starting point.
Have a look at the Lecture 12 materials.
Interesting large question-answering datasets and challenge problems: Stanford SQUAD, Allen institute arc2 challenge, Google natural questions, Fujitsu NLP challenge, amazon qa dataset
Third lab
Deadline: 15 May. You have to give your presentation at the 15 May lab time at the latest (better at 8 May).
The third lab is optional and will simply give as many points as lab 1 or 2 towards the final result and the grade (practical work 40% and exam 60%).
The task in the third lab is the open-world scenario of answering questions based on wikipedia and using large downloaded rule sets a la yago, wordnet etc with a reasoner.
It is a plus to be able to give uncertain answers (likely, unlikely) and handle fuzzy properties like boy, girl, grownup, rich etc.
Have a look at the Lecture 13 materials
Lecture block 1: basics and representing simple facts.
Lectures 1 and 2: Overview of the course. Background and basics: SQL, logic, NLP
Lecture materials:
Lecture 2: RDF and RDFS (and OWL)
Lecture materials:
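As a quick illustration of the RDF data model (a hand-made example, not taken from the lecture slides): everything is a subject-predicate-object triple, and RDFS adds schema vocabulary such as rdfs:subClassOf on top. The sketch below applies one step of the standard RDFS type-propagation rule in Python.

# RDF data as plain subject-predicate-object triples (hand-made example)
triples = [
    ("ex:John", "rdf:type", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
]

# one step of the RDFS rule: x rdf:type C and C rdfs:subClassOf D => x rdf:type D
inferred = set(triples)
for s, p, o in triples:
    if p == "rdf:type":
        for s2, p2, o2 in triples:
            if p2 == "rdfs:subClassOf" and s2 == o:
                inferred.add((s, "rdf:type", o2))

print(sorted(inferred))  # now also contains ("ex:John", "rdf:type", "ex:Agent")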
Lecture block 2: capturing meaning in natural language
Lecture 3: Intro to homework and NLP
Lecture materials:
- nlp and watson
- Jurafsky and Martin book: conversational agents
- start reading the Jurafsky and Martin book (see links above)
Lecture 4: vector representation of words
This lecture will be given by Priit Järv.
Lecture materials:
- First, listen to text vectorization episode from the linear digressions podcast.
- Second, listen to word2vec episode from the linear digressions podcast.
- Read the beginning (until you get tired ...) of the Vector Semantics chapter from the Jurafsky & Martin book.
Useful additional materials from (roughly) easier to more complex:
- word vectors: a short intro
- vector semantics part I presentation from the Jurafsky & Martin book for chapter 6.
- vector representation: a short tutorial with code examples.
- word2vec tutorial from tensorflow
- vector semantics part II presentation from the Jurafsky & Martin book for chapter 6.
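To make the vector-semantics intuition concrete, here is a minimal cosine-similarity sketch (toy co-occurrence counts invented for the example; real word2vec vectors are learned, not counted):

import numpy as np

# made-up co-occurrence counts with the context words (data, sweet, eat)
vectors = {
    "apple":  np.array([1.0, 6.0, 5.0]),
    "orange": np.array([0.0, 5.0, 6.0]),
    "server": np.array([7.0, 0.0, 1.0]),
}

def cosine(a, b):
    # cosine similarity: dot product of the two normalized vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vectors["apple"], vectors["orange"]))  # high: similar contexts
print(cosine(vectors["apple"], vectors["server"]))  # low: different contexts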
Probabilistic models:
Also interesting to read:
- Aetherial Symbols by the ultimate machine learning guru aka "godfather of machine learning" Geoffrey Hinton
- Autocomplete using Markov chains
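For the Markov-chain autocomplete item above, a minimal sketch (toy corpus invented for the example): count which word follows which, then always continue with the most frequent successor.

from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the rat".split()

# successor counts: word -> Counter of the words following it
chain = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    chain[w][nxt] += 1

def autocomplete(word, length=4):
    out = [word]
    for _ in range(length):
        if word not in chain:
            break
        # take the most frequent successor (a real system would sample)
        word = chain[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(autocomplete("the"))  # "the cat sat on the"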
Lecture block 3 and start of 4: large common-sense knowledge bases and reasoning
Lecture 5: First look into main large knowledge bases
We will have a look at the goals, main content and differences between:
- wordnet
- dbpedia
- yago
- babelnet
- conceptnet
- nell in wiki and a good paper. The nell website is currently down.
- cyc
Lecture 6: Big annotation systems and intro to rule reasoners
First, big annotation systems:
- Google structured data on the webpage.
- Facebook Open Graph markup on the webpage.
In connection, see also:
- schema.org: property markup vocabulary suggested by Google, Microsoft and others.
- json-ld: currently most popular rdf syntax (in json), also recommended by Google.
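As a small hand-written illustration of json-ld with the schema.org vocabulary (not taken from the course materials): the same subject-property-value structure as RDF, serialized as ordinary JSON.

import json

# a schema.org description of a (fictional) course as JSON-LD
doc = {
    "@context": "https://schema.org",
    "@type": "Course",
    "name": "Knowledge representation",
    "courseCode": "ITI8700",
    "provider": {"@type": "Organization", "name": "TalTech"},
}
print(json.dumps(doc, indent=2))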
Second, intro to rule reasoners:
- Lecture material as ppt or as pdf
Additionally you may want to look at:
Lecture 7: rule reasoners
Please have a look and run the same experiments yourself with the gkc prover:
We use the gkc prover for the examples (instructions are on the Reasoner examples ... page above), but you could also try out the old classic prover Otter, which actually works OK on Windows even though it is really old.
Lecture 8: starting to parse natural language for interaction with rule reasoners
We will start building a tiny question-answering tool, translating natural language to rule reasoner input.
Our first, very simple goal: translate the text "John is a father of Pete. Pete is a father of Mark. Who is the father of Pete?" to a rule reasoner input
father(john,pete). father(pete,mark). -father(X,pete) | ans(X).
then run the reasoner, fetch the ans(john) and give an answer "John is".
You can use a trivial NLP-to-reasoner parser in python nlp.py as a starting point.
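A minimal sketch of that translation step (written independently of the actual nlp.py, with a made-up pattern set): match a few sentence templates with regular expressions and emit gkc clauses.

import re

def to_clauses(text):
    clauses = []
    for s in re.split(r"[.?]\s*", text):
        s = s.strip().lower()
        # "X is a father of Y" becomes the fact father(x,y).
        m = re.match(r"(\w+) is a father of (\w+)", s)
        if m:
            clauses.append("father(%s,%s)." % m.groups())
        # "who is the father of Y" becomes the question clause
        m = re.match(r"who is the father of (\w+)", s)
        if m:
            clauses.append("-father(X,%s) | ans(X)." % m.group(1))
    return clauses

text = "John is a father of Pete. Pete is a father of Mark. Who is the father of Pete?"
print("\n".join(to_clauses(text)))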
We will then proceed to gradually make questions and sentences more complex and add rules to be able to answer them.
In the final part of the course we will start looking at uncertain reasoning: right now we will only try to do simple strict reasoning.
We will have a look at potential tools for semantic parsing, from simpler to more complex:
- parsetron: a really simple and very limited tool to build your own parser.
- sippycup: has a detailed tutorial/course attached, check the first lab
- ccg2lambda: parses to classical logic strings, which are embedded in XML output.
Other interesting things to look at:
- Framenet: map meaning to form in English through the theory of Frame Semantics.
- Sling: Google project for getting facts from Wikipedia using Frame Semantics.
- Spider: Yale Semantic Parsing and Text-to-SQL Challenge.
- WikiSQL: A Salesforce crowd-sourced dataset for developing NLP interfaces for relational databases.
- Allennlp from the Allen Institute for AI. See also their semantic parsing tutorial.
Lecture block 5: context and uncertainty
Lecture 9: intro to representing context and uncertainty
Blocks world.
Some axiom sets:
And some planning problems to solve from these axiom sets:
Also:
Next, starting uncertainty.
We will first look at default logic.
Lecture 10: various classical ideas for discrete reasoning with uncertainty
We will look at some discrete ways:
- default logic: refresher from previous lecture.
- circumscription
- Autoepistemic logic: long tutorial Ijcai93.pdf
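To make the default-logic intuition concrete before moving on, a minimal negation-as-failure sketch (the classic Tweety example, data invented): a default conclusion holds unless the case is explicitly known to be abnormal.

birds = {"tweety", "polly"}
abnormal = {"tweety"}  # tweety is a penguin, so the default is blocked

def flies(x):
    # default rule: bird(x) & not abnormal(x) => flies(x),
    # where "not" is negation as failure: absence counts as false
    return x in birds and x not in abnormal

print(flies("polly"))   # True: the default applies
print(flies("tweety"))  # False: the known exception blocks the default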
And start with numeric:
- Uncertain_prob_fuzzy.ppt Intro to Bayesian and fuzzy reasoning.
- Notes about numerical confidence, popularity, etc measures
Lecture 11: various classical ideas for numeric reasoning with uncertainty
Classical numeric ways to reason about uncertainty:
- Uncertain_prob_fuzzy.ppt Intro to both Bayesian and Fuzzy reasoning.
Some newer approaches:
- Vienna_tanel_4.pdf Additional examples and combining.
- Problog start with a tutorial
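A minimal numeric sketch of the two classical ideas above (all numbers invented for the example): Bayes' rule updates a probability from evidence, while a fuzzy membership function assigns a degree of truth.

# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E), where
# P(E) = P(E|H) * P(H) + P(E|not H) * P(not H)
p_h = 0.01        # prior: P(disease)
p_e_h = 0.9       # P(positive test | disease)
p_e_not_h = 0.05  # P(positive test | no disease)
p_e = p_e_h * p_h + p_e_not_h * (1 - p_h)
print(p_e_h * p_h / p_e)  # posterior P(disease | positive test), ca 0.15

# fuzzy membership: the degree to which a person counts as "rich",
# rising linearly between two hand-picked income thresholds
def rich(income):
    return min(1.0, max(0.0, (income - 20000) / 80000))

print(rich(10000), rich(60000), rich(150000))  # 0.0 0.5 1.0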
Lecture 12: seminar-kind of work on the second lab
Background to start with:
- Take and unpack gkc.zip from gkc release gamma
- Start with the example file p1.txt
father(john,pete).
% either the first or a second or both facts are true:
father(pete,mark).
% equivalent to (father(X,Y) & father(Y,Z)) => grandfather(X,Z).
-father(X,Y) | -father(Y,Z) | grandfather(X,Z).
-grandfather(john,X) | ans(X).
- Run
gkc p1.txt
- Start with the trivial NLP-to-reasoner parser in python nlp.py .
- Our initial goal is to process the text "John is a father of Andrew. Andrew is a man. Who is the father of Andrew?"
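A minimal way to drive gkc from Python for this goal (a sketch assuming the gkc binary is on your PATH; the answer-extraction regex is a guess at the proof output format, adjust it to what your gkc version actually prints):

import re
import subprocess
import tempfile

clauses = """father(john,pete).
father(pete,mark).
-father(X,Y) | -father(Y,Z) | grandfather(X,Z).
-grandfather(john,X) | ans(X).
"""

# write the clauses to a temporary file and run gkc on it
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(clauses)
    path = f.name

out = subprocess.run(["gkc", path], capture_output=True, text=True).stdout
print(out)
# pull answers like ans(mark) out of the proof output
print(re.findall(r"ans\((\w+)\)", out))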
And the examples with various ways to encode normal/abnormal cases (not really needed for lab):
- Reasoner example for being likely married: works out ok
- Reasoner example for birds: does not work out well
- Reasoner another example for birds: does not work out well
Syntax and command line:
- gkc command line: gkc readme in github: see also Examples folder for a few trivial examples.
- For simple clausal syntax see otter manual clause syntax.
- For complex syntax options see TPTP manual: seems to be sometimes down.
Lecture 13: seminar-kind of work on the third lab
Ways to get external data in addition to our own rules and data:
- Raw data (hard to use): wikipedia, wikidata
- Processed raw data (a bit easier): dbpedia (processed wikipedia/wikidata)
- Well-formed and connected options (use one of these):
- wordnet: exists in several versions, including an external TPTP version usable by the gkc reasoner, direct link to TPTP axiom NLP001+0.ax
- yago, connecting wikipedia with wordnet and geonames. Look at the downloads section SIMPLETAX and download yagoSimpleTypes and yagoSimpleTaxonomy. You will need to write a program to filter and convert the yago datasets to the prover-understandable form. See some ideas for converting yago.
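A minimal conversion sketch (the exact column layout of the yago .tsv files is an assumption here; check the files you downloaded): read tab-separated triples and print the type assertions as prover-readable facts.

import re
import sys

def clean(term):
    # strip <...> brackets and replace characters clause syntax will not accept
    return re.sub(r"\W", "_", term.strip("<>")).lower()

# assumed layout: id, subject, predicate, object, separated by tabs
for line in open(sys.argv[1], encoding="utf-8"):
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 4:
        continue
    _, subj, pred, obj = parts[:4]
    if pred == "rdf:type":
        # <Albert_Einstein> rdf:type <wikicat_Physicists> becomes a unary fact
        print("%s(%s)." % (clean(obj), clean(subj)))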