Details for KR lab 2
The main goal of the second lab is to extend the person data obtained in the first (web-scraping) lab, using rules and a reasoning engine.
You can do the lab either alone or together with one fellow student. Groups of three are not allowed.
Technology (language, operating system, etc.) is completely free, EXCEPT that you have to use a full first-order logic reasoning engine. Otter version 3.3 is the suggested reasoning engine, although it can be interesting to find and experiment with a more modern and more powerful one.
Ideally you should create both (a) your own rules for improving data or deriving new data and (b) rules from the WordNet taxonomy to obtain generalisations from derived facts.
The lab is graded on the following (see below for details):
- your own rules: how good/sensible they are and what results you get for a selected domain of objects
- quality and interestingness of the overall results
What should the app do
Your system should take as input the data obtained during the first lab for some address and derive new facts, each augmented with a confidence number.
The new facts for the object have to be derived using Otter and rules of your own making.
As a result, the full app should:
- take a phrase like "Ehitajate tee 50" as input
- scrape raw data from the web for the phrase (use the first lab you have completed)
- derive new data
- output the full set of data, including the derived and the raw data, with confidence numbers and a raw/derived indicator for each fact
- filter/summarise the data to determine the categories and possible additional information snippets for the address
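The last two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Fact` record, the tab-separated output layout and the `category` predicate are hypothetical choices, not something the lab prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    predicate: str     # hypothetical predicate name, e.g. "category"
    args: tuple        # e.g. ("addr", "education")
    confidence: float  # assumed to lie between 0.0 and 1.0
    derived: bool      # False = scraped in lab 1, True = produced by the reasoner


def format_fact(fact):
    """Render one fact with its confidence number and raw/derived indicator."""
    origin = "derived" if fact.derived else "raw"
    return f"{fact.predicate}({', '.join(fact.args)})\t{fact.confidence:.2f}\t{origin}"


def report(facts):
    """List raw facts first, then derived ones, highest confidence first."""
    ordered = sorted(facts, key=lambda f: (f.derived, -f.confidence))
    return [format_fact(f) for f in ordered]


def top_categories(facts, threshold=0.5):
    """Keep the best confidence per category and drop categories below the threshold."""
    best = {}
    for fact in facts:
        if fact.predicate == "category":  # assumed predicate name
            name = fact.args[-1]
            best[name] = max(best.get(name, 0.0), fact.confidence)
    kept = [name for name, conf in best.items() if conf >= threshold]
    return sorted(kept, key=lambda name: -best[name])
```

Calling `report(facts)` gives one printable line per fact, and `top_categories(facts)` gives the category names that survive the (arbitrarily chosen) threshold.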
Subtasks of the lab
The lab is essentially split into these subtasks:
- the main task: create the ruleset suitable for your data and the reasoner
- format the output of your first lab so that it is suitable for the reasoner: basically, use the correct syntax.
- create the input file for the reasoner: just compose it from a header, data block, rule block and a footer.
- run the reasoner and send output to a file.
- filter out the derived facts from the reasoner output.
- present the full resulting dataset.
- write a program filtering/summarising the data to determine whether a certain ad is suitable or which kinds of ads are suitable.
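The middle subtasks (composing the input file, running the reasoner, filtering the derived facts) could look roughly like the Python sketch below. Several things here are assumptions to check against the course examples and the Otter manual: the flags in `HEADER`, putting facts into `list(sos)` and rules into `list(usable)`, the `otter` binary being on the path and reading its input from standard input, and the `** KEPT ...` line format matched by the regular expression. All function names are hypothetical.

```python
import re
import subprocess

# Assumed settings; replace them with the ones from the course examples.
HEADER = "set(hyper_res).\nset(print_kept).\nassign(max_seconds, 10).\n"
FOOTER = "% end of generated input\n"


def to_constant(text):
    """Turn scraped text into a lowercase constant name.

    The "c_" prefix keeps names such as "university" from starting with
    u..z, which Otter reads as variables inside clauses by default.
    """
    return "c_" + re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def compose_input(facts, rules, header=HEADER, footer=FOOTER):
    """Compose the input text from a header, data block, rule block and footer."""
    data_block = "list(sos).\n" + "".join(f + "\n" for f in facts) + "end_of_list.\n"
    rule_block = "list(usable).\n" + "".join(r + "\n" for r in rules) + "end_of_list.\n"
    return header + "\n" + data_block + "\n" + rule_block + "\n" + footer


def run_otter(input_text, output_path, binary="otter"):
    """Feed the input to the reasoner on stdin and save its stdout to a file."""
    result = subprocess.run([binary], input=input_text, capture_output=True, text=True)
    with open(output_path, "w", encoding="utf-8") as out:
        out.write(result.stdout)
    return result.stdout


# Assumed shape of a kept-clause line: "** KEPT (pick-wt=3): 4 [hyper,3,1] p(c_a)."
KEPT = re.compile(r"^\*\* KEPT[^:]*:\s*\d+\s*\[[^\]]*\]\s*(.+?)\.\s*$")


def derived_facts(otter_output):
    """Return the clauses the reasoner reported as kept, without the final period."""
    facts = []
    for line in otter_output.splitlines():
        match = KEPT.match(line)
        if match:
            facts.append(match.group(1))
    return facts
```

A typical call chain would be `derived_facts(run_otter(compose_input(facts, rules), "out.txt"))`, where each fact and rule is a string such as `mentions(c_addr, c_university).` or `-mentions(x, c_university) | category(x, c_education).`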
Creating the ruleset
The main task of the lab is the ruleset creation. This requires creative thinking and experimentation with the reasoner. The ruleset should improve the confidence of already derived facts by combining them, derive completely new facts, or both.
Start by downloading and installing Otter version 3.3, then experiment a bit.
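The lab text does not fix a formula for confidence numbers, so the following Python sketch shows just one possible convention, stated as an assumption: a fact derived by a rule gets the rule's confidence multiplied by the confidences of its premises, and several independent derivations of the same fact are merged with a noisy-or. Both function names are hypothetical.

```python
from math import prod


def chain_confidence(rule_confidence, premise_confidences):
    """Confidence of a fact derived by one rule from the given premises.

    Assumes the rule and the premises are independent, so the values multiply.
    """
    return rule_confidence * prod(premise_confidences)


def combine_confidence(confidences):
    """Noisy-or merge of independent derivations of the same fact.

    Each extra derivation can only raise the result, never lower it.
    """
    return 1.0 - prod(1.0 - c for c in confidences)
```

For example, two independent derivations with confidences 0.6 and 0.5 combine to about 0.8 under this convention, which is higher than either one alone.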
It is a good idea to use these examples containing suitable settings:
A new example:
Two old examples for people:
Use the Otter manual for additional details and settings.
The ruleset, as said, should contain two parts: