Teadmiste otsing, formaliseerimine ja hoidmine (Knowledge search, formalization and storage)


Code: ITV0060
Link: http://www.lambda.ee/index.php/Teadmiste_otsing,_formaliseerimine_ja_hoidmine or http://www.lambda.ee/index/itv0060
Lecturer: Tanel Tammet
Contact: tanel.tammet@ttu.ee, 6203457, TTÜ ICT-426
Archives of previous years: 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, older.

Exam

Date and time (two separate occasions):

  • 2. June (Tuesday) at 10:00 in room SOC-214 (Majandus- ja sotsiaalteaduste maja, the Economics and Social Sciences building)
  • 9. June (Tuesday) at 10:00 in room SOC-210 (Majandus- ja sotsiaalteaduste maja, the Economics and Social Sciences building)

4 questions, one for each block. Each question is a small task in encoding information. No programming exercises.

  • 1. task: encoding data from a conventional relational database in RDF, inventing suitable IDs, and converting SELECT queries with WHERE conditions and joins into first-order rule form.
  • 2. task: encoding data/rules in RDFS (perhaps additionally in RDFa). Writing first-order rules about the meaning of the RDFS built-in axioms and deriving data using these axioms. Concrete derivation systems are not required: just deriving the correct information. It is also important to understand ways to use URIs as global IDs, and to encode data both in the HTML meta-information part and in the page contents, using RDFa.
  • 3. task: encoding uncertain knowledge. You will first have to encode a small scenario using several different systems: default logic, probabilistic logic, fuzzy logic. Second, you have to show what you can derive using one or another formalism.


  • 4. task: building indexes. Given one or more small datasets, you will have to build several indexes: trie indexes for words in a text, discrimination tree indexes for retrieving generalizations of terms, and path indexes for retrieving instances (concretizations) of terms. For the latter two, use the McCune paper.
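A minimal Python sketch of task 1, assuming a hypothetical person(id, name, city_id) / city(id, name) schema and an invented http://example.org/ prefix for the global IDs: every row becomes a subject URI, every non-key column a triple, and an SQL join with a WHERE condition becomes a rule whose body atoms are matched against the triples.

```python
# Hypothetical relational tables: person(id, name, city_id) and city(id, name)
persons = [(1, "Mari", 10), (2, "Jaan", 20)]
cities = [(10, "Tallinn"), (20, "Tartu")]

EX = "http://example.org/"  # hypothetical prefix used to invent global IDs (URIs)


def to_triples():
    """Encode every row as (subject, predicate, object) triples."""
    triples = set()
    for pid, name, city_id in persons:
        subject = f"{EX}person/{pid}"
        triples.add((subject, f"{EX}name", name))
        # The foreign key becomes a link to the city's URI instead of a bare number
        triples.add((subject, f"{EX}livesIn", f"{EX}city/{city_id}"))
    for cid, name in cities:
        triples.add((f"{EX}city/{cid}", f"{EX}name", name))
    return triples


# SQL:  SELECT p.name FROM person p JOIN city c ON p.city_id = c.id
#       WHERE c.name = 'Tallinn'
# Rule: answer(N) <- name(P, N) & livesIn(P, C) & name(C, "Tallinn")
def answer(triples):
    """Evaluate the rule body by joining matching triples on the shared variables."""
    result = set()
    for p, pred, n in triples:
        if pred != f"{EX}name":
            continue
        for p2, pred2, c in triples:
            if p2 == p and pred2 == f"{EX}livesIn" and (c, f"{EX}name", "Tallinn") in triples:
                result.add(n)
    return result


print(answer(to_triples()))  # {'Mari'}
```

The nested loops are a naive join; the point is only that each SQL table reference turns into a body atom and each join condition into a shared variable.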

NB! The information in the "this is in exam ..." notes below is old and not relevant.

Time, place, result

Semester: spring
Grading: exam

Lectures: every Monday 16:00-17:30, room ICT-A1
Practical work: Mondays on odd weeks (2. Feb, 16. Feb, ...) 17:45-19:15, room ICT-403
Practical work will give 40% and the exam 60% of the points underlying the final grade. The exam will consist of several small exercises.

Focus

The main focus of the course is on KR (knowledge representation): how to represent nontrivial information in programs and databases, how to build and use indexes for efficient search through large sets of knowledge. Check out this description.

The course contains four blocks built on each other:

  • Background and basics. Representing simple facts.
  • Representing rules.
  • Time, planning and uncertain knowledge.
  • Indexes and search.

Books to use:

Note that a noticeable part of the course content is not covered by these books: use the course materials and the links to papers, standards and tutorials provided.

Practical work

There are three labs: the first two are obligatory, the third is optional. The labs have to be presented to the lecturer and to all students present at the lab session.

First lab 2015

The goal of the first lab is to write a software system able to scrape factual raw data about an address (like "Akadeemia tee 12, Tallinn") given to the program.

Basically, search for web pages containing the address or similar names, and extract relevant words from the pages you collect: words that occur closer to the address and more often are more important, i.e. a better match. Titles and headlines above an occurrence of the address are also more likely to be important. It may be a good idea to combine some searches with interesting keywords right away. Also, whenever you do a search or pull a page, it is a good idea to store the search result or HTML source in a file, to avoid exhausting your search quota and wasting bandwidth and time.
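One possible way to turn this advice into code, as a sketch only: cached_fetch stores every fetched page under a hash of its URL, and score_words gives each occurrence of a word a weight of 1/distance to the nearest occurrence of an address token, so frequent and nearby words score highest. The function names, the cache layout and the 1/distance weighting are assumptions; extra weight for titles/headlines is left out.

```python
import hashlib
import os
import re
from collections import Counter
from urllib.request import urlopen


def cached_fetch(url, cache_dir="cache", fetch=None):
    """Return the page source, reading it from a local file if fetched before."""
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    if fetch is None:
        def fetch(u):  # default: plain HTTP GET via the standard library
            with urlopen(u) as resp:
                return resp.read().decode("utf-8", errors="replace")
    text = fetch(url)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


def score_words(text, address_token, stopwords=frozenset()):
    """Score words by frequency and closeness to the address token."""
    tokens = re.findall(r"\w+", text.lower())
    target = address_token.lower()
    anchors = [i for i, t in enumerate(tokens) if t == target]
    scores = Counter()
    if not anchors:
        return scores
    for i, tok in enumerate(tokens):
        if tok == target or tok in stopwords:
            continue
        distance = min(abs(i - a) for a in anchors)
        scores[tok] += 1.0 / distance  # every occurrence adds; nearer ones add more
    return scores


scores = score_words("Akadeemia tee 12: university campus. The campus has a library.",
                     "Akadeemia", stopwords={"the", "a", "has"})
print(scores.most_common(3))
```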

It makes sense to search for addresses in several ways: Google, Bing, plus searching known real estate portals.

Deadline: 16. March (after this there will be a penalty).

See also notes for KR lab 1.

Second lab 2015

The goal of the second lab is to write and use rules that categorize/tag addresses according to the data obtained during the first lab. An address should get a number of tags, each with a numerical indicator of our confidence that the tag really applies to the address, plus (sometimes) a number indicating the degree to which the tag applies.
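A hypothetical shape for such rules, assuming the word scores from lab 1 are available as a dictionary: each rule returns a tag with a confidence in [0, 1] and, optionally, a degree. The rule names, keywords, saturation values and the fixed 0.5 confidence are illustrative only, not prescribed by the lab.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Tag:
    name: str
    confidence: float               # how much we trust that the tag applies (0..1)
    degree: Optional[float] = None  # optionally, how strongly it applies (0..1)


Rule = Callable[[Dict[str, float]], Optional[Tag]]


def keyword_rule(tag: str, keywords: List[str], saturation: float) -> Rule:
    """Hypothetical rule: more keyword evidence gives higher confidence, capped at 1."""
    def rule(scores: Dict[str, float]) -> Optional[Tag]:
        evidence = sum(scores.get(k, 0.0) for k in keywords)
        if evidence == 0:
            return None  # no evidence at all: do not tag
        return Tag(tag, confidence=min(1.0, evidence / saturation))
    return rule


def quiet_rule(scores: Dict[str, float]) -> Optional[Tag]:
    """Hypothetical graded tag: traffic-related words lower the degree of 'quiet'."""
    noise = sum(scores.get(k, 0.0) for k in ("traffic", "highway", "tram"))
    return Tag("quiet", confidence=0.5, degree=max(0.0, 1.0 - noise))


def tag_address(scores: Dict[str, float], rules: List[Rule]) -> List[Tag]:
    """Run every rule and keep the tags that fired."""
    return [tag for tag in (rule(scores) for rule in rules) if tag is not None]


scores = {"university": 1.5, "library": 0.5, "tram": 0.25}
rules = [keyword_rule("near_university", ["university", "campus"], saturation=2.0),
         keyword_rule("near_sea", ["sea", "beach"], saturation=1.0),
         quiet_rule]
for tag in tag_address(scores, rules):
    print(tag)
```

Keeping confidence and degree as separate fields mirrors the two kinds of numbers the lab asks for: "is this tag true at all" versus "how much does it apply".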


See details for KR lab 2.

Third lab 2014

This lab is optional and simply gives as many points as lab 1 or 2 towards the final result and the grade: practical work 60% and exam 40%.

The goal of the lab is to use WordNet or Teksaurus to create an additional ruleset and to use it in addition to your own rules from lab 2.

Lab ideas

Student ideas are also welcome, but have to be agreed with Tanel first.

A cool idea is to investigate:



Lecture block 1: basics and representing simple facts.

Lecture 1 (2. Feb) Overview of the course. Background and basics. First lab.

Lecture materials:

Lecture 2 (9 Feb) Programming and databases. SQL: meaning and representation of facts

Lecture materials:

Lecture 3 (16 Feb) HTML annotations. Microformats, microdata, RDFa

Two student presentations, ca. 30 min each:

  • Microformats, microdata, RDFa: Madis Taimre
  • Facebook open graph: Madis Allikmaa

Lecture materials:

Understand main parts:

Lecture 4 (2. March) RDF last parts + RDFS start

Continuing with RDF:

Starting RDFS:

Choosing the student presentations from:

Lecture block 2: representing rules

Lecture 5 (9. March) Student presentations on NELL and NoSQL databases + intro to the second lab

Student lectures on the theme of data and knowledge extraction from the web:

  • NELL and systems like NELL (are there any? Check also ConceptNet): Andrei. See some details about NELL as a nice example.
  • Student presentations chosen from the NoSQL links and notes:

Understand RDFS:

Lecture material:

Additional details (not part of exam):

Lecture 6 (16. March) Continuing RDFS and looking into other KR languages

Lecture material:

Additional details (not part of exam):


Important KR languages:

Lecture 7 (23. March): OWL and ontologies

OWL background: description logics (not part of the exam)

Understand the basics of OWL:

Not part of the exam:

Notes about RDF and OWL: logical meaning
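The logical meaning of the RDFS built-in axioms can be written as first-order rules; for example, rdfs9 says type(X, A) ∧ subClassOf(A, B) → type(X, B). The sketch below applies a subset of these rules (rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, rdfs11 from the RDF Semantics specification) by naive forward chaining over triples until no new triples appear; the ex: data is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply a subset of the RDFS entailment rules until a fixpoint is reached."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))  # rdfs11: subClassOf is transitive
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))  # rdfs9: instances inherit superclasses
                if p == SUBPROP and p2 == SUBPROP and o == s2:
                    new.add((s, SUBPROP, o2))   # rdfs5: subPropertyOf is transitive
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))         # rdfs7: P(x, y) & P subPropertyOf Q -> Q(x, y)
                if p2 == DOMAIN and p == s2:
                    new.add((s, RDF_TYPE, o2))  # rdfs2: subject gets the domain class
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))  # rdfs3: object gets the range class
        if new <= facts:
            return facts
        facts |= new


data = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:mari", RDF_TYPE, "ex:Student"),
    ("ex:teaches", DOMAIN, "ex:Lecturer"),
    ("ex:teaches", SUBPROP, "ex:involvedIn"),
    ("ex:tanel", "ex:teaches", "ex:itv0060"),
}
derived = rdfs_closure(data) - data
for triple in sorted(derived):
    print(triple)
```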

Start looking at interesting ontologies:

Lecture 8 (30 March): Restricted English

Attempto details are not necessary for the exam.

Lecture block 3: time, planning and uncertain knowledge

Rules in planning and robotics

Lecture material:

See also (not part of exam):

Logic for uncertain knowledge

For the exam: you should be able to create and solve small examples with default logic.
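A small default logic example as a naive Python sketch: the classic default bird(X) : flies(X) / flies(X) applied to two hypothetical individuals. Only normal defaults and hand-instantiated (variable-free) rules are handled, defaults are applied one at a time, and just one extension is built; this is not a general default logic reasoner.

```python
def neg(literal):
    """Complement of a literal: 'p' <-> '-p'."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def closure(facts, rules):
    """Apply strict ground rules (body -> head) until nothing new follows."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in facts and all(b in facts for b in body):
                facts.add(head)
                changed = True
    return facts


def extension(facts, rules, defaults):
    """Naively build one extension with normal defaults (prereq : concl / concl)."""
    facts = closure(facts, rules)
    applied = True
    while applied:
        applied = False
        for prereq, concl in defaults:
            # Apply the default if its prerequisite holds and its conclusion is consistent
            if prereq in facts and concl not in facts and neg(concl) not in facts:
                facts = closure(facts | {concl}, rules)
                applied = True
    return facts


# Ground example: penguins are birds, penguins do not fly, by default birds fly.
facts = {"bird(tweety)", "penguin(pingu)"}
rules = [(["penguin(pingu)"], "bird(pingu)"),
         (["penguin(pingu)"], "-flies(pingu)")]
defaults = [("bird(tweety)", "flies(tweety)"),
            ("bird(pingu)", "flies(pingu)")]

print(sorted(extension(facts, rules, defaults)))
```

The default concludes flies(tweety), but it is blocked for pingu, because -flies(pingu) already follows from the strict rules.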

Fuzzy and probabilistic logic

For the exam: understand the differences between fuzzy and probabilistic logic and be able to present small examples.
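The difference can be shown with a few numbers. In this sketch (function names are illustrative), fuzzy truth values are combined with the usual min/max/1-x operators, whereas the probability of a conjunction depends on how the events are related: the product holds only under independence, otherwise only bounds are known.

```python
# Fuzzy logic: a value in [0, 1] is a degree of truth (Zadeh operators).
def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_not(a):
    return 1.0 - a


# Probabilistic logic: a value in [0, 1] is the probability that a crisp statement is true.
def prob_and_independent(p, q):
    """P(A and B) when A and B are known to be independent."""
    return p * q


def prob_and_bounds(p, q):
    """Tightest bounds on P(A and B) when nothing is known about dependence."""
    return max(0.0, p + q - 1.0), min(p, q)


big, cheap = 0.7, 0.4  # hypothetical values for "the house is big" / "the house is cheap"
print(fuzzy_and(big, cheap))             # degree to which the house is big and cheap
print(prob_and_independent(big, cheap))  # assumes independence
print(prob_and_bounds(big, cheap))       # without the independence assumption
print(fuzzy_and(0.5, fuzzy_not(0.5)))    # fuzzy "A and not A" can be 0.5
```

Here fuzzy AND gives 0.4, while the probability that both hold is 0.28 under independence and otherwise lies between about 0.1 and 0.4. Likewise, fuzzy "A and not A" with A = 0.5 gives 0.5, whereas P(A and not A) is always 0.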

Logic of belief and knowledge

For the exam: understand referential transparency and the core ideas of encoding belief and knowledge. Modal logic is not necessary for the exam.

Lecture block 4: indexes and search

27. April. Indexes: intro and mainstream

Traditional database indexes, including B+ trees:

Hash indexes:

Bitmap indexes:

For the exam: understand the core usage scenarios and be able to create small examples.
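Bitmap indexes in particular are easy to try by hand. A minimal sketch with hypothetical rows: one bitmap per distinct column value (Python integers serve as bit vectors), so a conjunctive WHERE condition becomes a bitwise AND and a disjunction a bitwise OR.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """One bitmap per distinct value; bit i is set when row i has that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)


def row_ids(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


rows = [
    {"city": "Tallinn", "rooms": 2},
    {"city": "Tartu", "rooms": 3},
    {"city": "Tallinn", "rooms": 3},
    {"city": "Pärnu", "rooms": 1},
]
by_city = build_bitmap_index(rows, "city")
by_rooms = build_bitmap_index(rows, "rooms")

# WHERE city = 'Tallinn' AND rooms = 3  ->  AND of two bitmaps
print(row_ids(by_city["Tallinn"] & by_rooms[3]))  # [2]
# WHERE city = 'Tartu' OR rooms = 1  ->  OR of two bitmaps
print(row_ids(by_city["Tartu"] | by_rooms[1]))    # [1, 3]
```

This works well for low-cardinality columns such as city or number of rooms; for a column with mostly unique values, a B+ tree or hash index is the usual choice.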

4. May Multi-field and geoindexes, fulltext and term indexes

NB! Presentation: geoindexes, 4. May, Daniel.
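For fulltext search, a trie index over the words of a text can be sketched as follows: each path from the root spells a word, and the node where a word ends stores that word's positions in the text. Class and method names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> child node
        self.positions = []  # positions of the word that ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, position):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.positions.append(position)

    def lookup(self, word):
        """Positions of an exact word, or [] if it was never indexed."""
        node = self._find(word)
        return node.positions if node else []

    def with_prefix(self, prefix):
        """All indexed words starting with prefix, mapped to their positions."""
        node = self._find(prefix)
        result = {}
        stack = [(node, prefix)] if node else []
        while stack:
            current, word = stack.pop()
            if current.positions:
                result[word] = current.positions
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return result

    def _find(self, s):
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node


trie = Trie()
for i, word in enumerate("the cat sat on the mat then the cattle came".split()):
    trie.insert(word, i)
print(trie.lookup("the"))       # [0, 4, 7]
print(trie.with_prefix("cat"))  # words starting with "cat" and their positions
```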

Not part of the exam:

4. May: Fancier term indexes

  • Path indexes
  • On from here.
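A simplified path index in the spirit of the McCune paper, as a sketch: terms are nested tuples, strings starting with an uppercase letter are variables, and every key is a path (ancestor symbols plus argument positions) together with the symbol found there. Retrieving instances of a query intersects the term sets of the query's non-variable paths; since repeated variables, as in f(Y, Y), are not captured by paths, the candidates are finally checked by matching. The names and representation are assumptions, not McCune's code.

```python
from collections import defaultdict


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def paths(term, prefix=()):
    """Yield (path, symbol) pairs; a path lists ancestor symbols and argument indices."""
    if is_var(term):
        yield prefix, "*"  # variables are indexed as a wildcard
        return
    yield prefix, term[0] if isinstance(term, tuple) else term
    if isinstance(term, tuple):
        for i, arg in enumerate(term[1:], start=1):
            yield from paths(arg, prefix + (term[0], i))


def match(pattern, term, subst=None):
    """True if term is an instance of pattern (one-way matching)."""
    subst = {} if subst is None else subst
    if is_var(pattern):
        if pattern in subst:
            return subst[pattern] == term
        subst[pattern] = term
        return True
    if isinstance(pattern, tuple):
        return (isinstance(term, tuple) and len(term) == len(pattern)
                and pattern[0] == term[0]
                and all(match(p, t, subst) for p, t in zip(pattern[1:], term[1:])))
    return pattern == term


class PathIndex:
    def __init__(self):
        self.index = defaultdict(set)  # (path, symbol) -> ids of terms having it
        self.terms = []

    def insert(self, term):
        tid = len(self.terms)
        self.terms.append(term)
        for key in paths(term):
            self.index[key].add(tid)

    def instances(self, query):
        """Intersect the sets of the query's non-variable paths, then verify."""
        candidates = set(range(len(self.terms)))
        for key in paths(query):
            if key[1] != "*":
                candidates &= self.index.get(key, set())
        return [self.terms[i] for i in sorted(candidates) if match(query, self.terms[i])]


idx = PathIndex()
idx.insert(("f", "a", ("g", "b")))  # f(a, g(b))
idx.insert(("f", "a", ("g", "X")))  # f(a, g(X))
idx.insert(("f", "b", "b"))         # f(b, b)
print(idx.instances(("f", "a", ("g", "Z"))))  # both f(a, g(...)) terms
print(idx.instances(("f", "Y", "Y")))         # only f(b, b)
```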

11 May: NoSQL indexes plus information about the exam

  • Document bases
  • Graph bases

NB! Presentation: indexing in graph and document (NoSQL) databases, 11. May, Vassili.

NB! Last day for the practical work presentations.

  • About the exam.

18. May: guest lecture

Kalle Tomingas. Impact analysis and ontologies.

http://demo.dlineage.com

