Teadmiste otsing, formaliseerimine ja hoidmine

Code: ITV0060
Link: http://www.lambda.ee/index.php/Teadmiste_otsing,_formaliseerimine_ja_hoidmine or http://www.lambda.ee/index/itv0060
Lecturer: Tanel Tammet
Contact: tanel.tammet@ttu.ee, 6203457, TTÜ ICT-426
Archives of previous years: 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, older.

Contents

1 NB! This is an archive from 2015
2 Exam
3 Time, place, result
4 Focus
5 Practical work
6 Lecture block 1: basics and representing simple facts.
7 Lecture block 2: representing rules
8 Lecture block 3: time, planning and uncertain knowledge
9 Lecture block 4: indexes and search
10 18. May: guest lecture

NB! This is an archive from 2015

Exam

Date and time (two separate occasions):

2. June (Tuesday) at 10:00 in room SOC-214 (Majandus- ja sotsiaalteaduste maja)
9. June (Tuesday) at 10:00 in room SOC-210 (Majandus- ja sotsiaalteaduste maja)

4 questions, one for each block. Each question is a small task in encoding information. No programming exercises.

1. task: encoding data in a conventional relational database in RDF, inventing suitable ID-s, converting selects with where conditions and joins to first order rule form.

2. task: encoding data/rules in RDFS (perhaps additionally in RDFa). Writing first order rules about the meaning of the RDFS built-in axioms and deriving data using the RDFS built-in axioms. Concrete derivation systems are not required: just deriving correct information. It is also important to understand ways to use URIs as global IDs and to encode data both in the HTML metainfo part and in pages, using RDFa.

3. task: encoding uncertain knowledge. First, you will have to encode a small scenario using several different systems: default logic, probabilistic logic, fuzzy logic. Second, you have to show what you can derive using one or another formalism.

4. task: building indexes. Given one or more small datasets, you will have to build several indexes. Which indexes: trie indexes for words in a text, discrimination tree indexes for generalizations of terms, path indexes for concretisations of terms. For the latter two, use the McCune paper.
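As an illustration of the first kind of index in task 4, here is a minimal sketch of a trie over the words of a text. The function names, the "$" end-of-word marker and the sample sentence are assumptions made for this example, not part of the course materials; the sketch also assumes words never contain "$".

```python
def build_trie(text):
    """Build a trie as nested dicts: one level per character.

    The special key "$" (assumed never to occur in a word) marks the end
    of a word and stores the positions where that word occurs in the text.
    """
    root = {}
    for position, word in enumerate(text.lower().split()):
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(position)
    return root


def lookup(trie, word):
    """Return the positions of an exact word, or [] if it is absent."""
    node = trie
    for ch in word.lower():
        if ch not in node:
            return []
        node = node[ch]
    return node.get("$", [])


def with_prefix(trie, prefix):
    """Return all indexed words starting with the prefix, alphabetically."""
    node = trie
    for ch in prefix.lower():
        if ch not in node:
            return []
        node = node[ch]
    found = []

    def walk(current, word_so_far):
        for key in sorted(current):
            if key == "$":
                found.append(word_so_far)
            else:
                walk(current[key], word_so_far + key)

    walk(node, prefix.lower())
    return found


trie = build_trie("the tree index treats terms in the trie")
print(lookup(trie, "the"))
print(with_prefix(trie, "tr"))
```

Words sharing a prefix share a path from the root, so a prefix query only walks the subtree below the prefix instead of scanning every word.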

NB! The information in the following "this is in exam ..." is old and not relevant.

Time, place, result

Semester: spring
Grading: exam

Lectures: every Monday 16:00-17:30, room ICT-A1
Practical work: Mondays on odd weeks (2. Feb, 16. Feb, ...) 17:45-19:15, room ICT-403
Practical work will give 40% and the exam 60% of the points underlying the final grade. The exam will consist of several small exercises.

Focus

The main focus of the course is on KR (knowledge representation): how to represent nontrivial information in programs and databases, how to build and use indexes for efficient search through large sets of knowledge. Check out this description.

The course contains four blocks built on each other:

Background and basics. Representing simple facts.
Representing rules.
Time, planning and uncertain knowledge.
Indexes and search.

Books to use:

About half of the course themes are covered in this book with freely accessible PDFs.
The freely accessible PDF of the Handbook of Knowledge Representation gives a detailed and thorough coverage of the subject; far more than necessary for the course.
Interesting to browse: recent conference proceedings
Interesting to browse: course materials, course materials

Observe that a noticeable part of the course contents is not covered by these books: use the course materials and the links to papers, standards and tutorials provided.

Practical work

There are three labs: the first two are obligatory, the third is optional. The labs have to be presented to the lecturer and to all students present at the practical work session.

First lab 2015

The goal of the first lab is to write a software system able to scrape factual raw data about an address (like "Akadeemia tee 12, Tallinn") given to the program.

Basically, search for web pages containing the name or similar names and extract relevant words from a list you create: words that are closer to the name and occur more often are more important, i.e. a better match. Titles/headlines above the word are also more likely to be important. It may be a good idea to do some searches together with interesting keywords already. Also, whenever you do a search or pull a page, it is a good idea to store the search result or HTML source in a file to avoid exhausting your search quota and wasting bandwidth and time.
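The idea that closer and more frequent words matter more can be sketched as follows. This is only one possible scoring scheme, not the required one: the function name, the 1/(1+distance) weighting and the sample text are assumptions for illustration, and distance is measured in tokens from the start of the nearest occurrence of the address.

```python
import re
from collections import defaultdict


def score_words(text, name, candidates):
    """Score candidate words by frequency and closeness to the name.

    Each occurrence of a candidate adds 1 / (1 + distance), where distance
    is the number of tokens to the nearest occurrence of the name.
    Returns an empty dict if the name does not occur in the text at all.
    """
    tokens = re.findall(r"\w+", text.lower())
    name_tokens = re.findall(r"\w+", name.lower())
    n = len(name_tokens)
    # Token positions where the full name starts.
    anchors = [i for i in range(len(tokens) - n + 1)
               if tokens[i:i + n] == name_tokens]
    if not anchors:
        return {}
    scores = defaultdict(float)
    for i, token in enumerate(tokens):
        if token in candidates:
            distance = min(abs(i - a) for a in anchors)
            scores[token] += 1.0 / (1 + distance)
    return dict(scores)


page = ("Office for rent at Akadeemia tee 12 near the university. "
        "The office has parking.")
scores = score_words(page, "Akadeemia tee 12",
                     {"office", "university", "parking"})
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(word, round(score, 3))
```

A fuller version could add a bonus for words found in titles or headlines, as suggested above.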

It makes sense to search for addresses in several ways: using Google and Bing, plus searching known real estate portals.

Deadline: 16. March (after this there will be a penalty).

Second lab 2015

The goal of the second lab is to write and use rules to categorize/tag addresses according to the data obtained during the first lab. An address should get a number of tags with numerical indicators showing our trust in that the tag really applies to the address, plus (sometimes) a number indicating the degree to which the tag applies.

See details for KR lab 2.
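A minimal sketch of what such tagging rules might look like, assuming the first lab produced a dictionary of scored words for an address. The rule format, the example rules and the way trust values from several rules are combined (treating them as independent pieces of evidence) are all assumptions for illustration; the actual lab requirements are in the linked details.

```python
# Hypothetical rules: (keyword found for the address, tag, trust in the tag).
RULES = [
    ("office", "commercial", 0.6),
    ("rent", "commercial", 0.5),
    ("apartment", "residential", 0.7),
]


def tag_address(word_scores, rules=RULES):
    """Return {tag: trust} for the rules whose keyword was found.

    When several rules support the same tag, their trust values are
    combined as 1 - (1 - t1) * (1 - t2), i.e. as independent evidence.
    This combination rule is an assumption, not a course requirement.
    """
    tags = {}
    for keyword, tag, trust in rules:
        if keyword in word_scores:
            previous = tags.get(tag, 0.0)
            tags[tag] = 1 - (1 - previous) * (1 - trust)
    return tags


tags = tag_address({"office": 0.325, "rent": 0.5})
print(tags)
```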

Third lab 2014

This lab is optional and will simply give as many points as lab 1 or 2 towards the final result and the grade: practical work 60% and exam 40%.

The goal of the lab is to use WordNet or Teksaurus to create an additional ruleset and to use it alongside your own rules from lab 2.

Lab ideas

KR lab idea by Priit Järv

Student ideas are also welcome, but they have to be agreed with Tanel first.

A cool idea is to investigate:

Lecture block 1: basics and representing simple facts.

Lecture 1 (2. Feb) Overview of the course. Background and basics. First lab.

Lecture materials:

Intro lecture: declarative and procedural representations
Core logic refresher.
Details of the first lab.

Lecture 2 (9 Feb) Programming and databases. SQL: meaning and representation of facts

Encoding data in programming languages.
Relation of plain data in databases to logic & representing complex structures in databases
Core ideas of non-relational databases, mostly RDF
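To make the relation between relational data, RDF-style triples and logic concrete, here is a small sketch. The tables, the `http://example.org/` namespace and the way IDs are invented from table name plus primary key are assumptions for illustration only. The join query is read as a first order rule: name(P, PN) & city_id(P, C) & name(C, CN) -> lives_in(PN, CN).

```python
# Hypothetical relational tables: each row is a dict keyed by column name.
people = [
    {"id": 1, "name": "Mari", "city_id": 10},
    {"id": 2, "name": "Jaan", "city_id": 20},
]
cities = [
    {"id": 10, "name": "Tallinn"},
    {"id": 20, "name": "Tartu"},
]

BASE = "http://example.org/"  # assumed namespace for the invented IDs


def rows_to_triples(table, rows, foreign_keys=None):
    """Turn each row into (subject, property, value) triples.

    The subject is an invented URI built from the table name and the
    primary key. Foreign-key columns point to the URI of the referenced
    row instead of holding the raw number.
    """
    foreign_keys = foreign_keys or {}
    triples = set()
    for row in rows:
        subject = f"{BASE}{table}/{row['id']}"
        for column, value in row.items():
            if column == "id":
                continue
            if column in foreign_keys:
                value = f"{BASE}{foreign_keys[column]}/{value}"
            triples.add((subject, f"{BASE}{table}#{column}", value))
    return triples


triples = rows_to_triples("person", people, {"city_id": "city"})
triples |= rows_to_triples("city", cities)


def lives_in(triples):
    """SELECT p.name, c.name FROM person p JOIN city c ON p.city_id = c.id,
    evaluated as a rule over the triples."""
    result = set()
    for p, prop1, person_name in triples:
        if prop1 != f"{BASE}person#name":
            continue
        for p2, prop2, city in triples:
            if p2 != p or prop2 != f"{BASE}person#city_id":
                continue
            for c2, prop3, city_name in triples:
                if c2 == city and prop3 == f"{BASE}city#name":
                    result.add((person_name, city_name))
    return result


print(sorted(lives_in(triples)))
```

Each shared variable in the rule (P and C) corresponds to a join condition, and a WHERE condition would simply become one more atom or comparison in the rule body.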

Lecture materials:

Lecture 3 (16 Feb) HTML annotations. Microformats, microdata, RDFa

Two student presentations ca 30 min each:

Microformats, microdata, RDFa: Madis Taimre
Facebook open graph: Madis Allikmaa

Lecture materials:

Html annotations

Understand main parts:

Lecture 4 (2. March) RDF last parts + RDFS start

Continuing with RDF:

Continue with the lecture material key/value pairs and rdf, compared to the relational model
Check out http://www.w3.org/RDF/ and http://www.w3.org/standards/techs/rdf#w3c_all
And read more from http://www.w3.org/TR/rdf11-primer/ and http://www.w3.org/TR/rdf11-new/

Starting RDFS:

Start with the lecture material RDFS: rdf schema and as ppt
Read RDFs wikipedia
Check out w3c RDF schema
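Some of the RDFS built-in axioms can be read as first order rules and applied until nothing new is derived. The sketch below covers only four of them (subclass transitivity, type inheritance, domain and range) and uses invented `ex:` names; it is a naive illustration, not a complete RDFS reasoner.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply four RDFS rules repeatedly until no new triple appears.

    subClassOf(C, D) & subClassOf(D, E) -> subClassOf(C, E)
    subClassOf(C, D) & type(X, C)       -> type(X, D)
    domain(P, C)     & P(X, Y)          -> type(X, C)
    range(P, C)      & P(X, Y)          -> type(Y, C)
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                if p == SUBCLASS and p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
                if p == DOMAIN and p2 == s:
                    new.add((s2, TYPE, o))
                if p == RANGE and p2 == s:
                    new.add((o2, TYPE, o))
        if new <= facts:
            return facts
        facts |= new


# Hypothetical example data.
data = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:attends", DOMAIN, "ex:Student"),
    ("ex:attends", RANGE, "ex:Course"),
    ("ex:mari", "ex:attends", "ex:itv0060"),
}
closure = rdfs_closure(data)
for triple in sorted(closure - data):
    print(triple)
```

From the single fact that ex:mari attends ex:itv0060, the rules derive that ex:mari is a Student, a Person and an Agent, and that ex:itv0060 is a Course.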

Choosing the student presentations from:

Nosql links and notes

Lecture block 2: representing rules

Lecture 5 (9. March) Student presentations on NELL and nosql databases + intro to the second lab

Student lectures on the theme of data and knowledge extraction from the web:

Nell and systems like Nell (are there any? Check also ConceptNet): Andrei. See some details about Nell as a nice example.
Student presentations chosen from Nosql links and notes:
- Rait: docs.mongodb.org/manual/ (done)
- Raigo: http://www.neo4j.org/ (done)

Understand RDFS:

Lecture material:

Continue with RDFS: rdf schema and as ppt

Additional details (not part of exam):

Lecture 6 (16. March) continuing RDFs and looking into other KR languages

Lecture material:

Continue with RDFS: rdf schema and as ppt

Additional details (not part of exam):

Important KR languages:

RuleML RuleML
OWL OWL wikipedia, w3c OWL guide
KIF
CL CL main page,
ontologies
wordnet
cyc
TPTP language TPTP

Lecture 7 (23. March): OWL and ontologies

OWL background: description logics (not part of exam)

description logic intro
brief alternative intro to description logic
detailed course in description logics (not necessary for exam)

Understand basics of owl:

OWL wikipedia

not part of exam:

Notes about RDF and OWL: logical meaning

Start looking at interesting ontologies:

Lecture 8 (30 March): Restricted english

Attempto restricted English: as an intro, use this talk
Inform system for interactive fiction, see also the older book

Attempto details not necessary for the exam.

Lecture block 3: time, planning and uncertain knowledge

Rules in planning and robotics

Lecture material:

Logic for uncertain knowledge

nonmonotonic logic: not necessary for exam.
default logic: for exam, the main material for default logic.

For the exam: you should be able to create and solve small examples with default logic.
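A small example of the kind meant here is the classic "birds normally fly" default, written bird(x) : flies(x) / flies(x), together with the strict rule penguin(x) -> not flies(x). The sketch below is a deliberately simplified evaluator with assumed names: it handles only unary predicates and normal defaults, and it does not deal with conflicting defaults or multiple extensions, which general default logic allows.

```python
def negate(predicate):
    """Map "flies" to "not_flies" and back."""
    return predicate[4:] if predicate.startswith("not_") else "not_" + predicate


def close(known, strict_rules):
    """Close the fact set under strict rules (premise -> conclusion)."""
    changed = True
    while changed:
        changed = False
        for premise, conclusion in strict_rules:
            for predicate, arg in list(known):
                if predicate == premise and (conclusion, arg) not in known:
                    known.add((conclusion, arg))
                    changed = True
    return known


def extension(facts, strict_rules, defaults):
    """Apply each normal default prerequisite : conclusion / conclusion
    whenever the prerequisite holds and the negated conclusion is not known.
    Simplification: the case where a later derivation contradicts an
    already applied default is not handled."""
    known = close(set(facts), strict_rules)
    applied = True
    while applied:
        applied = False
        for prerequisite, conclusion in defaults:
            for predicate, arg in sorted(known):
                blocked = (negate(conclusion), arg) in known
                if (predicate == prerequisite and not blocked
                        and (conclusion, arg) not in known):
                    known.add((conclusion, arg))
                    close(known, strict_rules)
                    applied = True
    return known


facts = {("bird", "tweety"), ("bird", "pingu"), ("penguin", "pingu")}
strict_rules = [("penguin", "not_flies")]
defaults = [("bird", "flies")]
result = extension(facts, strict_rules, defaults)
print(sorted(result - facts))
```

The default is applied to tweety, but it is blocked for pingu because "not flies" is already known for pingu.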

Fuzzy and probabilistic logic

Uncertain_prob_fuzzy.ppt Intro.
Vienna_tanel_2.pdf Additional examples and combining.

For the exam: understand the differences between fuzzy and probabilistic logic and be able to present small examples.
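One way to see the difference on a small example: a fuzzy value is a degree to which a statement is true, while a probability is the chance that a crisp statement is true. The sketch below assumes the common min/max fuzzy connectives and, for the probabilistic case, assumes the two statements are independent; the numbers and statement names are invented.

```python
def fuzzy_and(a, b):
    """Fuzzy conjunction with the common min operator."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy disjunction with the common max operator."""
    return max(a, b)


def prob_and(a, b):
    """P(A and B), valid only under the assumption that A and B are independent."""
    return a * b


def prob_or(a, b):
    """P(A or B) under the same independence assumption."""
    return a + b - a * b


# Read as fuzzy degrees: the house is large to degree 0.7, expensive to degree 0.6.
# Read as probabilities: it is large with probability 0.7, expensive with 0.6.
large, expensive = 0.7, 0.6

print("fuzzy and:", fuzzy_and(large, expensive))
print("fuzzy or: ", fuzzy_or(large, expensive))
print("prob and: ", round(prob_and(large, expensive), 2))
print("prob or:  ", round(prob_or(large, expensive), 2))
```

With the same input numbers the two formalisms give different answers, because fuzzy connectives are computed directly from the degrees while probabilities of compound statements depend on how the statements are related.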

Logic of belief and knowledge

Ijcai93.pdf Overview.

For exam: understand referential transparency and core ideas about encoding belief and knowledge. Modal logic not necessary for the exam.

Lecture block 4: indexes and search

27. April. Indexes: intro and mainstream

Traditional database indexes incl B+ tree:

G. Molina lecture 4

Hash indexes:

wiki.

Bitmap indexes:

wiki.

For exam: understand the core usage scenarios and be able to create small examples.
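As a small example of a bitmap index, the sketch below keeps one bitmap per distinct column value, using a Python integer as the bit vector (bit i is set when row i has that value). The table contents and function names are assumptions for illustration.

```python
# Hypothetical table: the row number is the position in the list.
rows = [
    {"city": "Tallinn", "type": "flat"},
    {"city": "Tartu", "type": "house"},
    {"city": "Tallinn", "type": "house"},
    {"city": "Tallinn", "type": "flat"},
]


def build_bitmap_index(rows, column):
    """Return {value: bitmap}, where bit i is set if rows[i][column] == value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def matching_rows(bitmap):
    """Decode a bitmap back into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


city_index = build_bitmap_index(rows, "city")
type_index = build_bitmap_index(rows, "type")

# WHERE city = 'Tallinn' AND type = 'flat' becomes a bitwise AND.
both = city_index["Tallinn"] & type_index["flat"]
print(matching_rows(both))
```

Bitmap indexes suit columns with few distinct values, since conditions over several such columns combine with cheap bitwise AND and OR operations.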

4. May Multi-field and geoindexes, fulltext and term indexes

A few words about both multi-field indexes and geoindexes.
Fulltext indexes: good intro and overview
Term indexes: mccune paper

NB! Presentation: geoindexes, 4. May, Daniel.

not part of exam:

4. May: Fancier term indexes

Path indexes

On from here.

11 May: Nosql indexes plus information about the exam

Document bases
Graph bases

NB! Indexing in Graph and Document (nosql) databases. 11. May: Vassili.

NB! Last day for the practical work presentations.

About the exam.

18. May: guest lecture

Kalle Tomingas. Impact analysis and ontologies.

http://demo.dlineage.com
