Knowledge search, formalization and storage (Teadmiste otsing, formaliseerimine ja hoidmine)
Code: ITV0060
Contents
- 1 NB! This is an archive from 2015
- 2 Exam
- 3 Time, place, result
- 4 Focus
- 5 Practical work
- 6 Lecture block 1: basics and representing simple facts.
- 7 Lecture block 2: representing rules
- 8 Lecture block 3: time, planning and uncertain knowledge
- 9 Lecture block 4: indexes and search
- 10 18. May: guest lecture
NB! This is an archive from 2015
Exam
Date and time (two separate occasions):
- 2. June (Tuesday) at 10:00 in room SOC-214 (Majandus- ja sotsiaalteaduste maja)
- 9. June (Tuesday) at 10:00 in room SOC-210 (Majandus- ja sotsiaalteaduste maja)
4 questions, one for each block. Each question is a small task in encoding information. No programming exercises.
- 1. task: encoding data from a conventional relational database in RDF, inventing suitable IDs, and converting selects with where conditions and joins to first-order rule form.
- 2. task: encoding data/rules in RDFS (perhaps additionally in RDFa). Writing first-order rules about the meaning of the RDFS built-in axioms and deriving data using those axioms. Concrete derivation systems are not required: just deriving correct information. It is also important to understand how to use URIs as global IDs and how to encode data both in the HTML metainfo part and in the page body, using RDFa.
- 3. task: encoding uncertain knowledge. First, you will have to encode a small scenario using several different systems: default logic, probabilistic logic, fuzzy logic. Second, you have to show what you can derive using one formalism or another.
- 4. task: building indexes. Given one or more small datasets, you will have to build several indexes: trie indexes for words in a text, discrimination tree indexes for generalizations of terms, path indexes for concretisations of terms. For the latter two, use the McCune paper.
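As an illustration of the first kind of index in task 4, here is a minimal sketch of a trie over the words of a text. The class and function names (`TrieNode`, `build_trie`, `lookup`, `words_with_prefix`) are made up for this example and do not come from the course materials; splitting on whitespace and lowercasing are simplifying assumptions.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.positions = []  # word offsets in the text where a word ends at this node


def build_trie(text):
    """Index every word of `text` by its position in the word sequence."""
    root = TrieNode()
    for position, word in enumerate(text.lower().split()):
        node = root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
        node.positions.append(position)
    return root


def find_node(root, prefix):
    """Walk down from the root along `prefix`; return None if the path is missing."""
    node = root
    for char in prefix:
        node = node.children.get(char)
        if node is None:
            return None
    return node


def lookup(root, word):
    """Return the positions at which `word` occurs (empty list if it does not)."""
    node = find_node(root, word.lower())
    return node.positions if node else []


def words_with_prefix(root, prefix):
    """Return all indexed words starting with `prefix`, in sorted order."""
    prefix = prefix.lower()
    start = find_node(root, prefix)
    if start is None:
        return []
    found = []
    stack = [(start, prefix)]
    while stack:
        node, word = stack.pop()
        if node.positions:
            found.append(word)
        for char, child in node.children.items():
            stack.append((child, word + char))
    return sorted(found)
```

For the text "the cat and the car", the words "cat" and "car" share the path c-a and only branch at the last letter, which is the property a hand-drawn trie in the exam should show as well.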
NB! The "this is in exam ..." information in the sections that follow is old and not relevant.
Time, place, result
Semester: spring
Grading: exam
Lectures: every Monday 16:00-17:30, room ICT-A1
Practical work: Mondays on odd weeks (2. Feb, 16 Feb, ...) 17:45-19:15 room ICT-403
Practical work will give 40% and exam 60% of points underlying the final grade.
The exam will consist of several small exercises.
Focus
The main focus of the course is on KR (knowledge representation): how to represent nontrivial information in programs and databases, how to build and use indexes for efficient search through large sets of knowledge. Check out this description.
The course contains four blocks built on each other:
- Background and basics. Representing simple facts.
- Representing rules.
- Time, planning and uncertain knowledge.
- Indexes and search.
Books to use:
- About half of the course themes are covered in this book with freely accessible pdf-s.
- The freely accessible pdf of the handbook of knowledge representation gives a detailed and thorough coverage of the subject; far more than necessary for the course.
- Interesting to browse: recent conference proceedings
- Interesting to browse: course materials, course materials
Observe that a noticeable part of the course content is not covered by these books: use the course materials and the links to papers, standards and tutorials provided.
Practical work
There are three labs: the first two are obligatory, the third is optional. The labs have to be presented to the professor and to all students present at the lab session.
First lab 2015
The goal of the first lab is to write a software system able to scrape factual raw data about an address (like "Akadeemia tee 12, Tallinn") given to the program.
Basically, search for web pages containing the name or similar names and extract relevant words from the list you create: words closer to the name and with more occurrences are also more important / a better match. Titles/headlines above the word are also more likely to be important. It may be a good idea to do some searches together with interesting keywords already. Also, whenever you do a search / pull a page, it is a good idea to store the search result / html source in a file to avoid exhausting your search quota and wasting bandwidth and time.
It makes sense to search for addresses in several ways: googling, binging, plus searching known real estate portals.
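The two practical hints above (store every pulled page in a file, and rank words by closeness to the name and by number of occurrences) could be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the `page_cache` directory, the injected `fetch` callable (any function that takes a URL and returns the page source) and the `1 / (1 + distance)` scoring formula are not prescribed by the lab.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path


def cached_fetch(url, fetch, cache_dir="page_cache"):
    """Return the page source for `url`, calling fetch(url) only on a cache miss."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Hash the URL to get a safe file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = directory / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    source = fetch(url)
    path.write_text(source, encoding="utf-8")
    return source


def score_words(text, name, keywords):
    """Score keywords: each occurrence adds 1 / (1 + distance to the nearest name mention).

    Distance is counted in tokens from the first token of the mention.
    """
    tokens = re.findall(r"\w+", text.lower())
    name_tokens = re.findall(r"\w+", name.lower())
    n = len(name_tokens)
    mentions = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == name_tokens]
    if not mentions:
        return {}
    wanted = {keyword.lower() for keyword in keywords}
    scores = defaultdict(float)
    for i, token in enumerate(tokens):
        if token in wanted:
            distance = min(abs(i - m) for m in mentions)
            scores[token] += 1.0 / (1 + distance)
    return dict(scores)
```

Passing the fetching function in as a parameter keeps the cache independent of whichever search engine or portal is queried, and makes it possible to try the scoring on saved pages without using up any quota.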
Deadline: 16. March (after this there will be a penalty).
See also notes for KR lab 1.
Second lab 2015
The goal of the second lab is to write and use rules to categorize/tag addresses according to the data obtained during the first lab. An address should get a number of tags, each with a numerical indicator showing how much we trust that the tag really applies to the address, plus (sometimes) a number indicating the degree to which the tag applies.
See details for KR lab 2.
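One possible shape for such rules is sketched below. It is only an assumption about how tags, trust and degree might be computed: the rule format (a dict with `tag`, `keywords`, `max_trust`, `saturation`), the function name `tag_address` and both formulas are invented for this example, not taken from the lab description.

```python
def tag_address(word_counts, rules):
    """Apply keyword rules to scraped word counts; return {tag: (trust, degree)}."""
    tags = {}
    for rule in rules:
        hits = sum(word_counts.get(word, 0) for word in rule["keywords"])
        if hits == 0:
            continue
        # Trust grows with the amount of evidence but stays below the rule's ceiling.
        trust = rule["max_trust"] * hits / (hits + 1)
        # Degree says how strongly the tag applies; it saturates at 1.0.
        degree = min(1.0, hits / rule["saturation"])
        previous = tags.get(rule["tag"])
        # If several rules give the same tag, keep the most trusted result.
        if previous is None or trust > previous[0]:
            tags[rule["tag"]] = (trust, degree)
    return tags
```

With word counts `{"office": 3, "park": 1}` and a rule that maps "office" or "shop" to a "commercial" tag with `max_trust` 0.8 and `saturation` 4, the address is tagged "commercial" with trust of about 0.6 and degree 0.75.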
Third lab 2014
This lab is optional and will simply give as many points as lab 1 or 2 towards the final result and the grade: practical work 60% and exam 40%.
The goal of the lab is to use WordNet or a thesaurus to create an additional ruleset and to use that in addition to your own rules in lab 2.
Lab ideas
Student ideas are also welcome: you have to agree on them with Tanel first.
A cool idea is to investigate:
- http://threatpost.com/fbstalker-automates-facebook-graph-search-data-mining
- https://github.com/milo2012/osintstalker
Lecture block 1: basics and representing simple facts.
Lecture 1 (2. Feb) Overview of the course. Background and basics. First lab.
Lecture materials:
- Intro lecture: declarative and procedural representations
- Core logic refresher.
- Details of the first lab.
Lecture 2 (9 Feb) Programming and databases. SQL: meaning and representation of facts
- Encoding data in programming languages.
- relation of plain data in databases to logic & representing complex structures in databases
- Core ideas of non-relational databases, mostly RDF
Lecture materials:
Lecture 3 (16 Feb) HTML annotations. Microformats, microdata, RDFa
Two student presentations ca 30 min each:
- Microformats, microdata, RDFa: Madis Taimre
- Facebook open graph: Madis Allikmaa
Lecture materials:
Understand main parts:
- Google rich snippets
- wiki intro
- RDFa lite and w3c rdfa primer
- open graph protocol (i.e. Facebook stuff)
- microformats
- Media:portaalidekoosvoime.ppt or as pdf Media:portaalidekoosvoime.pdf
Lecture 4 (2. March) RDF last parts + RDFS start
Continuing with RDF:
- Continue with the lecture material key/value pairs and rdf, compared to the relational model
- Check out http://www.w3.org/RDF/ and http://www.w3.org/standards/techs/rdf#w3c_all
- And read more from http://www.w3.org/TR/rdf11-primer/ and http://www.w3.org/TR/rdf11-new/
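To make the comparison between the relational model and RDF concrete, the sketch below turns table rows into (subject, predicate, object) triples. The URI scheme (`http://example.org/<table>/<key>` for rows, `http://example.org/<table>#<column>` for columns) and the function name `rows_to_triples` are assumptions chosen for this example; only the `rdf:type` URI is the standard one.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(table, key, rows, foreign_keys=None, base="http://example.org/"):
    """Encode relational rows as RDF-style triples.

    Each row becomes a subject URI built from the table name and primary key,
    each column becomes a predicate, and each cell value becomes an object.
    `foreign_keys` maps a column name to the table it refers to, so that the
    object becomes the URI of the referenced row instead of a plain value.
    """
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        subject = f"{base}{table}/{row[key]}"
        triples.append((subject, RDF_TYPE, f"{base}{table}"))
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is already in the subject; NULLs are simply absent
            if column in foreign_keys:
                value = f"{base}{foreign_keys[column]}/{value}"
            triples.append((subject, f"{base}{table}#{column}", value))
    return triples
```

A row `{"id": 1, "name": "Ann", "dept": 7}` of table `person` thus yields three triples: one typing `http://example.org/person/1` as a `person`, one for the name, and one linking it to `http://example.org/department/7` if `dept` is declared as a foreign key to `department`.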
Starting RDFS:
- Start with the lecture material RDFS: rdf schema and as ppt
- Read RDFs wikipedia
- Check out w3c RDF schema
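Deriving data with the RDFS built-in axioms can be tried out with a small forward-chaining loop. The sketch below covers only four of the RDFS entailment rules (rdfs2 for `rdfs:domain`, rdfs3 for `rdfs:range`, rdfs9 for instances of subclasses, rdfs11 for transitivity of `rdfs:subClassOf`) and uses prefixed names as plain strings; the function name `rdfs_closure` and the example vocabulary are made up for illustration.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Repeatedly apply rdfs2, rdfs3, rdfs9 and rdfs11 until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == SUBCLASS:
                for s2, p2, o2 in facts:
                    if p2 == SUBCLASS and s2 == o:
                        new.add((s, SUBCLASS, o2))   # rdfs11: subclass transitivity
                    if p2 == TYPE and o2 == s:
                        new.add((s2, TYPE, o))       # rdfs9: instance of a subclass
            elif p == DOMAIN:
                for s2, p2, o2 in facts:
                    if p2 == s:
                        new.add((s2, TYPE, o))       # rdfs2: subject gets the domain class
            elif p == RANGE:
                for s2, p2, o2 in facts:
                    if p2 == s:
                        new.add((o2, TYPE, o))       # rdfs3: object gets the range class
        if new <= facts:
            return facts
        facts |= new
```

Written as a first-order rule, rdfs9 reads: for all x, c, d, if `type(x, c)` and `subClassOf(c, d)` then `type(x, d)`. The loop simply applies such rules to a set of facts until a fixed point is reached.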
Choosing the student presentations from:
Lecture block 2: representing rules
Lecture 5 (9. March) Student presentations on NELL and nosql databases + intro to the second lab
Student lectures on the theme of data and knowledge extraction from the web:
- Nell and systems like Nell (are there any? Check also ConceptNet): Andrei. See some details about Nell as a nice example.
- Student presentations chosen from Nosql links and notes:
- Rait: docs.mongodb.org/manual/ (done)
- Raigo: http://www.neo4j.org/ (done)
Understand RDFS:
Lecture material:
- Continue with RDFS: rdf schema and as ppt
Additional details (not part of exam):
Lecture 6 (16. March) continuing RDFS and looking into other KR languages
Lecture material:
- Continue with RDFS: rdf schema and as ppt
Additional details (not part of exam):
Important KR languages:
- RuleML: RuleML
- OWL: OWL wikipedia, w3c OWL guide
- KIF
- CL: CL main page
- ontologies
- wordnet
- cyc
- TPTP language: TPTP
Lecture 7 (23. March): OWL and ontologies
OWL background: description logics (not part of exam)
- description logic intro
- brief alternative intro to description logic
- detailed course in description logics (not necessary for exam)
Understand basics of owl:
not part of exam:
Notes about RDF and OWL: logical meaning
Start looking at interesting ontologies:
Lecture 8 (30 March): Restricted english
- Attempto restricted English: as an intro, use this talk
- Inform system for interactive fiction, see also the older book
Attempto details not necessary for the exam.
Lecture block 3: time, planning and uncertain knowledge
Rules in planning and robotics
Lecture material:
See also (not part of exam):
Logic for uncertain knowledge
- nonmonotonic logic: not necessary for the exam.
- default logic: for the exam, the main material for default logic.
For the exam: you should be able to create and solve small examples with default logic.
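A small default-logic example can be checked mechanically with a sketch like the one below. It is a strong simplification, stated here as an assumption: knowledge is restricted to propositional literals (negation written with a leading `~`), strict rules are forward-chained without contraposition, and only normal defaults of the form "prerequisite : c / c" are handled, applied greedily in the order given. The names `negate`, `close` and `extension` are invented for this example.

```python
def negate(literal):
    """Return the complementary literal: p <-> ~p."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def close(facts, rules):
    """Forward-chain strict rules (premises, conclusion) until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


def extension(facts, rules, defaults):
    """Build one extension by applying normal defaults (prerequisite, consequent) in order."""
    current = close(facts, rules)
    changed = True
    while changed:
        changed = False
        for prerequisite, consequent in defaults:
            if prerequisite not in current or consequent in current:
                continue
            if negate(consequent) in current:
                continue  # justification fails: the consequent is inconsistent
            candidate = close(current | {consequent}, rules)
            if any(negate(literal) in candidate for literal in candidate):
                continue  # adding the consequent would lead to a contradiction
            current = candidate
            changed = True
    return current
```

With the fact `bird` and the default "bird : flies / flies", the extension contains `flies`. Adding the fact `penguin` and the strict rule "penguin implies ~flies" blocks the default. For the Nixon diamond (facts `quaker`, `republican`; defaults towards `pacifist` and `~pacifist`) the result depends on the order of the defaults, which reflects the fact that the theory has two extensions.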
Fuzzy and probabilistic logic
- Uncertain_prob_fuzzy.ppt Intro.
- Vienna_tanel_2.pdf Additional examples and combining.
For the exam: understand the differences between fuzzy and probabilistic logic and be able to present small examples.
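The core difference shows up already in how the two formalisms combine values. The sketch below uses the common min/max (Zadeh) operators for fuzzy logic and, for probabilities, the product rule, which is valid only under the explicitly stated assumption that the two events are independent. The function names are chosen for this example.

```python
def fuzzy_and(a, b):
    """Fuzzy conjunction (min operator): truth degree of 'A and B'."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy disjunction (max operator): truth degree of 'A or B'."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy negation: truth degree of 'not A'."""
    return 1.0 - a


def prob_and(p_a, p_b):
    """P(A and B), assuming A and B are independent events."""
    return p_a * p_b


def prob_or(p_a, p_b):
    """P(A or B), assuming A and B are independent events."""
    return p_a + p_b - p_a * p_b
```

For two values of 0.5 the fuzzy conjunction is 0.5, while the probability of both independent events is 0.25. Fuzzy degrees are truth-functional: "tall and not tall" for a person who is tall to degree 0.5 gets `fuzzy_and(0.5, fuzzy_not(0.5))`, that is 0.5, whereas the probability of "A and not A" is always 0 and cannot be computed from P(A) and P(not A) with the product rule, because the two events are not independent.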
Logic of belief and knowledge
- Ijcai93.pdf Overview.
For exam: understand referential transparency and core ideas about encoding belief and knowledge. Modal logic not necessary for the exam.
Lecture block 4: indexes and search
27. April. Indexes: intro and mainstream
Traditional database indexes incl B+ tree:
Hash indexes:
- wiki.
Bitmap indexes:
- wiki.
For exam: understand the core usage scenarios and be able to create small examples.
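A small example of both index types over the same rows might look like the sketch below. The row format (a list of dicts), the function names and the use of Python integers as bit vectors are assumptions made for this illustration.

```python
from collections import defaultdict


def build_hash_index(rows, column):
    """Map each value of `column` to the list of row ids holding it (exact-match lookups)."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[column]].append(row_id)
    return dict(index)


def build_bitmap_index(rows, column):
    """Map each value of `column` to an integer whose bit i is set if row i has that value."""
    bitmaps = defaultdict(int)
    for row_id, row in enumerate(rows):
        bitmaps[row[column]] |= 1 << row_id
    return dict(bitmaps)


def bitmap_to_row_ids(bitmap):
    """List the row ids whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
```

A hash index suits equality lookups on a column with many distinct values. A bitmap index suits columns with few distinct values, because conditions on several columns can be combined with bitwise operations: for example `city_index["Tallinn"] & type_index["flat"]` gives the bitmap of rows satisfying both conditions without touching the rows themselves.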
4. May Multi-field and geoindexes, fulltext and term indexes
- A few words about both multi-field indexes and geoindexes.
- Fulltext indexes: good intro and overview
- Term indexes: mccune paper
NB! Presentation: geoindexes, 4. May, Daniel.
not part of exam:
4. May: Fancier term indexes
- Path indexes
- On from here.
11 May: Nosql indexes plus information about the exam
- Document bases
- Graph bases
NB! Indexing in Graph and Document (nosql) databases. 11. May: Vassili.
NB! Last day for the practical work presentations.
- About the exam.
18. May: guest lecture
Kalle Tomingas. Impact analysis and ontologies.