Special Course on Algorithms and Data Structures
Course name: Advanced Algorithms and Data Structures in Automated Reasoning
Course code: ITI8590
On Monday only the lecture at 10:00; an optional practice session at 16:00
In the Monday lecture at 10:00 we will plan the exam contents and go over the practice assignments.
The 14:00 practice session is cancelled: I will be giving an interview to ERR at that time.
If it suits people, I would gladly hold the practice session later, at 16:00.
Exams
All exams take place on Mondays at 12:00 and at 15:00; choose the specific time yourself:
- 06.01.2019 at 12:00 and 15:00 in room U02-102
- 13.01.2019 at 12:00 and 15:00 in room U06A-229
- 20.01.2019 at 12:00 and 15:00 in room U02-102
Exam topics
Theoretical basis:
- Geoff exam-style questions, p. 8 and p. 15
Clausification:
- Geoff exercises along the lines of p. 25
- Schulz pp. 18-27 as a small supplement
General principles, for understanding
- Schulz up to p. 40
Unification exercises and the resolution method (a small unification code sketch follows this topic list)
- from the Estonian-language logic textbook, pp. 319-320
- Geoff exercises along the lines of p. 36
The ability to carry out a resolution search/proof (you build an example proof) both forward (forward reasoning) and backward (backward reasoning), following
- the examples in the Estonian-language logic textbook, pp. 332-334
Search strategies
- Geoff pp. 44-61 and the ability to apply a strategy to a given example problem (you carry out the search by hand on a small example using that strategy)
Equational systems:
- From the Estonian-language logic textbook, pp. 366-375, term rewriting systems: the ability to draw conclusions from a small equational system and to find a complete confluent equational system (if one can be built)
- Geoff pp. 67-73
Indexing: which method is suitable for what purpose and how it works
- Schulz pp. 88-112
Clause and literal selection: which methods are the most widely used / simplest
- Schulz pp. 117-118
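To help with checking hand-worked unification exercises, here is a minimal first-order unification sketch. It is only an illustrative aid, not part of the required material: the term representation (variables as capitalized strings, compound terms as tuples) and all names are assumptions made for this example.

# Minimal first-order unification sketch for checking hand-worked exercises.
# Illustrative assumptions: variables are strings starting with an uppercase
# letter, compound terms are tuples (functor, arg1, ..., argN), constants are
# lowercase strings.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings until reaching a non-variable or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: does variable v occur inside term t under the substitution?
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst=None):
    # Returns a most general unifier as a dict, or None if unification fails.
    if subst is None:
        subst = {}
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

# Example: unify f(X, g(a)) with f(b, g(Y))  ->  {'X': 'b', 'Y': 'a'}
print(unify(('f', 'X', ('g', 'a')), ('f', 'b', ('g', 'Y'))))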
Time and place
Lectures: every Monday at 10:00-11:45 at ICT403
Practice sessions: Mondays at 14:00, in room 411
Learning outcomes
After completing the course, the student:
- knows the working principles of a number of specialized algorithms
- is able to apply these algorithms to solve different problems
- knows the working principles of a number of data structures and is able to apply them
- is able to analyse the complexity of sequential and recursive algorithms
Brief description of the course
The course focuses on the internals of the methods of automated reasoning:
- The main bulk of the course: algorithms and data structures used for full first-order logic (FOL) reasoners.
- The latter part of the course covers algorithms and data structures for extending standard FOL reasoners to commonsense reasoning tasks.
- We will also have a look at:
- Solving propositional formulae
- SMT methods
Course materials
Will be given during the course. Materials are either presentations for specific methods or scientific papers covering such methods.
- Introductory material for the field is the set of course notes from Geoff Sutcliffe.
- A simpler introduction, in Estonian, is the book chapter by Tanel Tammet, pages 297-345.
- The second most important material is the set of course notes by Stephan Schulz: these cover implementation aspects.
- In case you want to see a highly theoretical introduction to the field, check out Christoph Weidenbach's coursebook.
Practice work
Will contain all of these:
- Writing problem converters
- Writing small, toy prover components
- Experimenting with automated reasoning systems on specific problem sets
Practice work 1
Get some provers running (at least two, but more is better!):
- Go check the current top provers for the First-order Theorems division of the latest CASC competition. You may want to have a brief look at the short system descriptions.
- Download and install at least Vampire (the overall top prover) and GKC (the new prover by T.Tammet, targeting large problems)
- Download and unpack the TPTP problem set (here is the direct download link).
- Have a quick look at the TPTP technical manual
- Try to get PyRes running: it is a pedagogical prover in Python by Stephan Schulz.
You may also run these on dijkstra like this:
ssh username@dijkstra.cs.ttu.ee
Practice work 2
Familiarize yourself with the provers:
- Run Vampire and GKC on some smaller problems in TPTP: start with initial problems from the PUZ domain (puzzles) and then the KRS domain (knowledge representation). Read the output and understand the basics.
- Encode some small problems from the Tammet book chapter or Sutcliffe material and run the provers on these. Read the output and understand the basics.
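For the encoding part, here is a minimal sketch of how one might write a tiny problem in TPTP FOF syntax from Python and run a prover on it; the file name and the assumption that the vampire binary is on the PATH are choices made only for this example, so adapt them to your setup.

# Minimal sketch: write a tiny TPTP FOF problem to a file and run a prover on it.
# Assumptions to adapt: the file name and that the "vampire" binary is on the PATH.
import subprocess

problem = """
fof(all_humans_mortal, axiom, ! [X] : (human(X) => mortal(X))).
fof(socrates_human, axiom, human(socrates)).
fof(socrates_mortal, conjecture, mortal(socrates)).
"""

with open("socrates.p", "w") as f:
    f.write(problem)

# Run Vampire on the generated problem and inspect its output
# (Vampire normally prints an SZS status line for the conjecture).
result = subprocess.run(["vampire", "socrates.p"], capture_output=True, text=True, timeout=30)
print(result.stdout)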
Practice work 3
Do some exercises and exam-style questions from the first chapters of the Sutcliffe tutorial (pick a selection from the exercises and exam-style questions yourself)
Practice work 4
Build a script to run both Vampire and GKC on the TPTP KRS domain, considering only problems ending with +*.p like KRS119+1.p:
- Find if there are any provable problems Vampire fails on in 30 seconds, list these.
- Find if there are any provable problems GKC fails on in 30 seconds, list these.
NB! A significant percentage of the problems in KRS are known not to be provable: these have the Status line value of either CounterSatisfiable or Satisfiable (explained in the TPTP Tech Report), like
% Status : Satisfiable
Please remove these problems from the test set: gkc will never report "satisfiable"; vampire sometimes reports satisfiability, but cannot do so in harder cases.
Also, please remove problems with a name containing the caret ^, like KRS275^7.p: these are in a typed/higher-order language.
You will probably want to run Vampire like this (try also vampire --help):
vampire --mode casc --include ~/TPTP ~/TPTP/Problems/ALG/ALG150+1.p
where the --mode casc option makes vampire try out many strategies, not just a single one.
Gkc runs many strategies by default, unless you give it an extra strategy selection file as a command line arg.
The gkc method for finding included axioms (the first two are the TPTP suggestions provers mostly follow):
- try the Axioms folder under the problem folder (note that you could set up a symbolic link there or just copy the folder)
- try the TPTP environment variable and look for the Axioms folder under it
- try the /opt/TPTP folder (assuming we have /opt/TPTP/Axioms/... there)
This may be helpful: an example script for running provers. It is *not* exactly the script you will finally need, but it may give you useful inspiration/ideas; a rough sketch is also given below.
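A minimal sketch of such a comparison script in Python follows. It is only a starting point: the TPTP location, the prover binaries being on the PATH, and the exact success markers in the prover output are assumptions you should verify against your installation and the example script above.

# Sketch of a KRS comparison script for Practice work 4.
# Assumptions to verify: TPTP location, prover binaries on the PATH,
# and the exact success strings in the prover output.
import glob, os, re, subprocess

TPTP = os.path.expanduser("~/TPTP")
TIMEOUT = 30

# Per the axiom-finding rules above, gkc can use the TPTP environment variable.
os.environ.setdefault("TPTP", TPTP)

def status(path):
    # Read the "% Status : ..." header line of a TPTP problem file.
    with open(path, errors="ignore") as f:
        for line in f:
            m = re.match(r"%\s*Status\s*:\s*(\S+)", line)
            if m:
                return m.group(1)
    return None

def solved(cmd, path):
    # True if the prover reports a proof within roughly TIMEOUT seconds.
    try:
        out = subprocess.run(cmd + [path], capture_output=True, text=True, timeout=TIMEOUT).stdout
    except subprocess.TimeoutExpired:
        return False
    # Success markers are assumptions: adjust to the actual prover output.
    return "SZS status Theorem" in out or "proof found" in out.lower()

problems = []
for path in sorted(glob.glob(os.path.join(TPTP, "Problems/KRS/*.p"))):
    name = os.path.basename(path)
    if "+" not in name or "^" in name:        # keep FOF (+) problems, skip typed (^) ones
        continue
    if status(path) in ("Satisfiable", "CounterSatisfiable"):
        continue                              # skip problems known not to be provable
    problems.append(path)

vampire_cmd = ["vampire", "--mode", "casc", "--include", TPTP]
gkc_cmd = ["gkc"]                             # gkc runs its default strategies

vampire_fails = [p for p in problems if not solved(vampire_cmd, p)]
gkc_fails = [p for p in problems if not solved(gkc_cmd, p)]

print("Vampire failed on:", *map(os.path.basename, vampire_fails), sep="\n  ")
print("GKC failed on:", *map(os.path.basename, gkc_fails), sep="\n  ")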
Practice work 5
The goal of this exercise is to create a large number of search strategies and try out which ones are able to solve the most problems.
For gkc, either make many small strategy files or make a single large strategy file containing many runs: give each run 1 second and finally produce statistics on which combination of strategies seems to be best for this problem class (experiment with KRS initially).
A strategy file example as given in the https://github.com/tammet/gkc top-level README:
{ "print":1, "print_level": 15, "max_size": 0, "max_depth": 0, "max_length": 0, "max_seconds": 0, "equality":0, "runs":[ {"max_seconds": 1, "strategy":["negative_pref"], "query_preference": 1}, {"max_seconds": 1, "strategy":["negative_pref"], "query_preference": 0}, {"max_seconds": 1, "strategy":["query_focus"], "query_preference": 2}, {"max_seconds": 5, "strategy":["negative_pref"], "query_preference": 1}, {"max_seconds": 5, "strategy":["negative_pref"], "query_preference": 0}, {"max_seconds": 5, "strategy":["hardness_pref"], "query_preference": 0}, {"max_seconds": 5, "strategy":["unit"], "query_preference": 0} ] }
Practice work 6
This practice work is a project. You can choose from the following list of initially proposed projects: the list may be extended/modified during discussions.
It is recommended that each student or two-student team picks a different project; the choices will be discussed during the lecture/practice session.
Some useful background:
- first 10000 words in a combined wikipedia/childwords/wordfreq table
- readme for the previous file
- Marcus and Davis intro/overview of commonsense reasoning
Two new ideas:
- Write a standalone implementation for selecting potentially useful rules/facts from a large set of axioms, using both ideas from SinE (see http://resources.mpi-inf.mpg.de/departments/rg1/conferences/deduction10/slides/krystof-hoder.pdf ) and word vectors (see https://link.springer.com/chapter/10.1007/978-3-030-29436-6_15 as a starting point); a simplified selection sketch is given after this list.
- Find out if there is a feasible way to programmatically build a large set of default properties for the most popular noun categories in wordnet/wikipedia, a la "physical things typically cannot fly", "birds typically can fly", "penguins typically cannot fly", "dogs are typically hairy" etc. Start by looking at rules in conceptnet/nell/etc and continue by looking at the beginnings of Wikipedia articles: is there hope of automatically parsing texts to obtain such properties? Also, maybe they can be gleaned from existing word vectors?
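For the first of these ideas, a much-simplified sketch of SinE-style selection (without the word-vector part) is given below; the representation of axioms as bare symbol sets, the tolerance value and all names are illustrative assumptions rather than the actual SinE implementation.

# Much-simplified SinE-style axiom selection sketch (word vectors left out).
# Illustrative assumptions: each axiom is represented only by the set of
# symbols occurring in it; tolerance and depth limit are arbitrary defaults.
from collections import Counter

def sine_select(axioms, goal_symbols, tolerance=1.5, max_depth=5):
    # axioms: dict mapping axiom name -> set of symbols occurring in it.
    # goal_symbols: set of symbols occurring in the conjecture.
    # Returns the set of selected axiom names.
    # Generality of a symbol: in how many axioms does it occur?
    occ = Counter(sym for syms in axioms.values() for sym in syms)

    def triggers(sym, syms):
        # A symbol triggers an axiom if it is (roughly) among the rarest
        # symbols of that axiom, within the given tolerance.
        return occ[sym] <= tolerance * min(occ[s] for s in syms)

    known = set(goal_symbols)
    selected = set()
    for _ in range(max_depth):
        newly = {name for name, syms in axioms.items()
                 if name not in selected and any(triggers(s, syms) for s in syms & known)}
        if not newly:
            break
        selected |= newly
        for name in newly:
            known |= axioms[name]
    return selected

# Tiny illustrative example: starting from the symbol "penguin" only the
# axiom mentioning penguins is selected, since "bird" is a more common symbol.
axioms = {"a1": {"bird", "fly"}, "a2": {"penguin", "bird"}, "a3": {"dog", "hairy"}}
print(sine_select(axioms, {"penguin"}))    # -> {'a2'}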
Previous ideas, valid as before:
- Investigating available frameworks/tools suitable to be used for converting natural language (English) to a logical formalism, assuming it is ok to use confidence of statements/rules, default logic, word vector based similarities and other such extensions as necessary. After selection, using available tools to build a prototype converter.
It may be useful to google AMR parsing. The main direct link: https://amr.isi.edu/language.html and one actual parser https://github.com/mdtux89/amr-eager . See also https://paperswithcode.com/task/amr-parsing
- Have a brief look at
- Investigating and converting http://conceptnet5.media.mit.edu/ to a logical formalism, using all the non-standard representation mechanisms (again, confidence of statements/rules, default logic, word vector based similarities and other such extensions as necessary). A TPTP dump of conceptnet is available. Ideally integrating the ruleset with https://wordnet.princeton.edu/ . A TPTP dump of a taxonomy of wordnet is available. It may be a good idea to select only rules for the "most common" words: the "commonness" table is available.
- Analogously to the previous task, investigating and converting http://rtw.ml.cmu.edu/rtw/ to a logical formalism.
- Working out the principles for unification, subsumption etc in case the word-vec similarities between words are given and confidences are used. Assisting in experimenting with the implementation, mainly by doing experiments with the system in development and suggesting modifications/improvements etc.
- Working out the principles for unification, subsumption etc in case location, time and context are encoded into facts and rules as special terms. Again, assisting in experimenting with the implementation, mainly by doing experiments with the system in development and suggesting modifications/improvements etc.
- Taking up the first-stage-finished instantiation-based reasoning system (used by http://www.cs.man.ac.uk/~korovink/iprover/ ) in gkc, understanding the instantiation-based method in depth and assisting in implementing the final phases.
- Taking up some harder open-question-answering benchmarks, working out several examples to see what information / rules would be needed to answer "hard" questions not easily solvable by word vector based methods alone. Here are some useful links for starters:
- https://en.wikipedia.org/wiki/Winograd_Schema_Challenge
- https://leaderboard.allenai.org/open_book_qa/submissions/get-started and a paper about the benchmark
- https://gluebenchmark.com/
- https://ai.googleblog.com/2019/01/natural-questions-new-corpus-and.html and https://ai.google.com/research/NaturalQuestions
- A master's thesis concerning answering Winograd schemas: https://repository.asu.edu/attachments/143311/content/Sharma_asu_0010N_14458.pdf and a paper by the author and others https://www.ijcai.org/Proceedings/15/Papers/190.pdf
- https://digital.lib.washington.edu/researchworks/bitstream/handle/1773/26336/Fader_washington_0250E_13164.pdf?sequence=1&isAllowed=y
- https://www.aclweb.org/anthology/D18-1455/