KR 2019 homework: overall task
The overall project built during the three homeworks is a small and simple system for understanding natural language (English): more concretely, a question answering system.
For example, given the text "John is a father of Mike. Mike is a father of Pete.", your system should answer "Is John male?" with "yes", "Who is a grandfather of Pete?" with "John", and "Is Mike female?" with "no".
Our domain: what kinds of knowledge we care about
We will focus on a limited domain of human knowledge and limited textual input and questions: basic properties of and relations between people.
Properties can be thought of as predicates like male(john), female(mary), alive(mary), has_children(mary), which can also be negated, like -female(john).
Relations can be thought of as predicates like father(john,mike), mother(mary,mike), husband(john,margaret) which can also be negated, like -father(john,margaret).
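One possible way to hold such predicates in a program is sketched below. The names `Literal` and `parse_literal` are hypothetical, chosen only for illustration; the assignment does not prescribe any particular representation.

```python
from typing import NamedTuple, Tuple


class Literal(NamedTuple):
    """A possibly negated predicate over constants, e.g. -father(john,margaret)."""
    predicate: str
    args: Tuple[str, ...]
    positive: bool = True

    def negate(self) -> "Literal":
        # Flip the sign, keep predicate and arguments.
        return Literal(self.predicate, self.args, not self.positive)

    def __str__(self) -> str:
        sign = "" if self.positive else "-"
        return f"{sign}{self.predicate}({','.join(self.args)})"


def parse_literal(text: str) -> Literal:
    """Parse strings such as 'male(john)' or '-father(john,margaret)'."""
    text = text.strip()
    positive = not text.startswith("-")
    if not positive:
        text = text[1:]
    name, _, rest = text.partition("(")
    args = tuple(a.strip() for a in rest.rstrip(")").split(","))
    return Literal(name.strip(), args, positive)


if __name__ == "__main__":
    print(parse_literal("male(john)"))
    print(parse_literal("-father(john, margaret)"))
```

A property is then a literal with one argument and a relation is a literal with two, so both kinds of knowledge can live in the same set of facts.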
You do not have to manage to answer really complex questions about people, but the more you manage, the better.
The properties we should certainly care about (the list is not exhaustive) are
- strict: male, female, alive, dead, has children, has a university degree, is an entrepreneur.
- or fuzzy: boy, girl, rich, poor, famous, etc.
Keep in mind that we are not always sure of the answer. It is enough to broadly classify our confidence (if we have any grounds for an answer at all) as:
- certainly yes
- likely
- unlikely
- certainly not
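The four confidence classes above could be encoded as a small enumeration, as in the sketch below. The `UNKNOWN` member (for the case where there are no grounds for an answer) and the `negate` helper are assumptions added for illustration, not part of the assignment.

```python
from enum import Enum


class Confidence(Enum):
    CERTAINLY_YES = "certainly yes"
    LIKELY = "likely"
    UNLIKELY = "unlikely"
    CERTAINLY_NOT = "certainly not"
    UNKNOWN = "unknown"  # assumption: used when there are no grounds for an answer


# Confidence in "-p" given the confidence in "p" (an illustrative convention).
_OPPOSITE = {
    Confidence.CERTAINLY_YES: Confidence.CERTAINLY_NOT,
    Confidence.LIKELY: Confidence.UNLIKELY,
    Confidence.UNLIKELY: Confidence.LIKELY,
    Confidence.CERTAINLY_NOT: Confidence.CERTAINLY_YES,
    Confidence.UNKNOWN: Confidence.UNKNOWN,
}


def negate(confidence: Confidence) -> Confidence:
    """Return the confidence of the negated statement."""
    return _OPPOSITE[confidence]


if __name__ == "__main__":
    print(negate(Confidence.LIKELY).value)
```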
The relations we should certainly care about (the list is not exhaustive) are: father, mother, parent, child, grandfather, grandmother, grandparent, ancestor, brother, sister, husband, wife, friend, works for, employee of, owner of, lives in, born in year.
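Several of these relations can be derived from others by rules. The sketch below applies a few hand-written rules repeatedly until nothing new appears (naive forward chaining). The triple format `(relation, a, b)`, read as "a is a relation of b", and the function name `derive` are assumptions made for this example; a real solution might use a prover or a rule engine instead.

```python
def derive(facts):
    """Close a set of (relation, a, b) triples under a few family rules."""
    facts = set(facts)
    while True:
        new = set()
        for rel, a, b in facts:
            if rel in ("father", "mother"):
                new.add(("parent", a, b))   # a father or mother is a parent
                new.add(("child", b, a))    # and the other person is their child
            if rel == "parent":
                new.add(("ancestor", a, b))  # every parent is an ancestor
        for rel1, a, b in facts:
            for rel2, c, d in facts:
                if b != c:
                    continue
                if rel2 == "parent":
                    if rel1 == "father":
                        new.add(("grandfather", a, d))
                    if rel1 == "mother":
                        new.add(("grandmother", a, d))
                    if rel1 == "parent":
                        new.add(("grandparent", a, d))
                if rel1 == "ancestor" and rel2 == "ancestor":
                    new.add(("ancestor", a, d))  # ancestor is transitive
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= new


if __name__ == "__main__":
    base = {("father", "john", "mike"), ("father", "mike", "pete")}
    for triple in sorted(derive(base) - base):
        print(triple)
```

The nested loop is quadratic in the number of facts, which is fine for a short input text but would need indexing for a large downloaded ruleset.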
Two kinds of input we care about
The first scenario (closed world) is obligatory. In this scenario your system is given:
- Input text like "John is a father of Mike. Mike is a father of Pete."
- Input ruleset: we will use both
  - Some rules you write yourself
  - Some large ruleset(s) you download from the web and convert to a form usable by your system (a la wikidata, Yago, Conceptnet, wordnet)
- A question to answer like the initial example: "Is John male?" as "yes", "Who is a grandfather of Pete?" as "John" and "Is Mike female?" as "no".
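To make the three inputs concrete, here is a deliberately tiny end-to-end sketch for the example above. It assumes sentences of the fixed shape "X is a R of Y." and only two question patterns. All names (`parse_text`, `apply_rules`, `answer`) and the rule that male and female exclude each other are illustrative assumptions, not a prescribed design.

```python
import re

FACT_RE = re.compile(r"(\w+) is an? (\w+) of (\w+)")
MALE_RELS = {"father", "grandfather", "brother", "husband"}
FEMALE_RELS = {"mother", "grandmother", "sister", "wife"}


def parse_text(text):
    """Turn 'John is a father of Mike.' into ('father', 'john', 'mike')."""
    return {(rel.lower(), a.lower(), b.lower())
            for a, rel, b in FACT_RE.findall(text)}


def apply_rules(facts):
    """Add grandfather facts and infer gender properties from relations."""
    facts = set(facts)
    for rel1, a, b in list(facts):
        for rel2, c, d in list(facts):
            if rel1 == "father" and rel2 in ("father", "mother") and b == c:
                facts.add(("grandfather", a, d))
    props = set()
    for rel, a, _ in facts:
        if rel in MALE_RELS:
            props.add(("male", a))
        if rel in FEMALE_RELS:
            props.add(("female", a))
    return facts, props


def answer(question, facts, props):
    q = question.strip().rstrip("?")
    m = re.fullmatch(r"Is (\w+) (male|female)", q)
    if m:
        person, prop = m.group(1).lower(), m.group(2)
        other = "female" if prop == "male" else "male"
        if (prop, person) in props:
            return "yes"
        if (other, person) in props:  # assumption: male and female exclude each other
            return "no"
        return "unknown"
    m = re.fullmatch(r"Who is an? (\w+) of (\w+)", q)
    if m:
        rel, person = m.group(1).lower(), m.group(2).lower()
        names = sorted(a for r, a, b in facts if r == rel and b == person)
        return ", ".join(n.capitalize() for n in names) if names else "unknown"
    return "unknown"


if __name__ == "__main__":
    facts, props = apply_rules(
        parse_text("John is a father of Mike. Mike is a father of Pete."))
    for q in ("Is John male?", "Who is a grandfather of Pete?", "Is Mike female?"):
        print(q, "->", answer(q, facts, props))
```

Regular expressions only cover the simplest sentences. Anything beyond this fixed pattern would need a real parser, and the rules here stand in for the hand-written and downloaded rulesets.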
The second scenario (open world) is part of the third, optional lab. In this scenario your system is not given an input text. Instead, it relies on a large dataset you download from the web (like wikidata) plus the rulesets, and answers questions like "Is Donald Trump female" as "no", "Where does Donald Trump live" as "U.S", "Washington", "Trump tower" or similar, and "Is Melania Knauss a mother of Donald John Trump Jr" as "no".
Obviously, you cannot answer really complicated questions based on wikidata, Yago, Conceptnet etc. alone. Try to do a reasonably good job, and present both examples you can answer and examples you cannot answer.