KR 2021 homework part 1

Source: Lambda

The goal of the lab is to explore and prepare data for the question answering system to be built in the following labs. We will later modify, filter and format this data to make it suitable for simple commonsense reasoning / question answering.

Due date: 1.04.2021. On that day each group will give a presentation; any groups that do not get their turn can present during the next practice session.


Yago for geodata

As a source for geographical data we use Yago (here is their old page). Investigate parts of it and download what you need from the download page (again, the old download page is here). The second important taxonomy source is WordNet: Yago contains some (?) of it, but you can also use WordNet directly in your system.

Yago is large. You do not need to incorporate all of Yago in your database: the geographical facts for, say, one country, together with the taxonomies, are enough.

Your task is to

  • select a sensible sub-part of the geographical facts in Yago (for example, one country) and store it in an SQL database, using both standard SQL and, where it seems useful, JSON. PostgreSQL is the best option because of its special capabilities for handling JSON. SQLite is the second-best option because it is simple to use.
  • select a sensible sub-part of Yago taxonomies and store it in the same SQL database.
  • investigate whether your Yago taxonomy contains the relevant parts of WordNet: if not, also incorporate WordNet into your database.
  • perform some sample queries to verify that you can actually find information: search for simple facts, and also use the taxonomies to search for more abstract concepts.
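The storage and query steps above can be sketched with SQLite from Python. This is only a minimal illustration under stated assumptions: the table names (`fact`, `taxonomy`), the column names, and all sample rows are made up for the example and are not the actual Yago schema or data. The point is the shape of the solution: one triple table, one subclass table, and a recursive query that walks the taxonomy.

```python
import json
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few made-up facts and taxonomy edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE fact (
            subject   TEXT NOT NULL,
            predicate TEXT NOT NULL,
            object    TEXT NOT NULL,
            extra     TEXT  -- optional JSON text for irregular attributes
        );
        CREATE TABLE taxonomy (
            subclass   TEXT NOT NULL,
            superclass TEXT NOT NULL
        );
        """
    )
    # Hypothetical sample rows, not copied from Yago.
    facts = [
        ("Tallinn", "type", "city", json.dumps({"labels": {"et": "Tallinn"}})),
        ("Tartu", "type", "city", None),
        ("Emajogi", "type", "river", None),
        ("Tartu", "isLocatedIn", "Estonia", None),
    ]
    edges = [
        ("city", "populated_place"),
        ("populated_place", "location"),
        ("river", "body_of_water"),
        ("body_of_water", "location"),
    ]
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", facts)
    conn.executemany("INSERT INTO taxonomy VALUES (?, ?)", edges)
    conn.commit()
    return conn


def lookup(conn, subject, predicate):
    """Simple fact search: all objects for a given subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM fact WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    ).fetchall()
    return [row[0] for row in rows]


def instances_of(conn, class_name):
    """Abstract search: entities typed as class_name or any transitive subclass."""
    rows = conn.execute(
        """
        WITH RECURSIVE descendants(name) AS (
            SELECT ?
            UNION
            SELECT t.subclass
            FROM taxonomy AS t
            JOIN descendants AS d ON t.superclass = d.name
        )
        SELECT DISTINCT f.subject
        FROM fact AS f
        JOIN descendants AS d ON f.object = d.name
        WHERE f.predicate = 'type'
        ORDER BY f.subject
        """,
        (class_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout, `lookup(conn, "Tartu", "isLocatedIn")` answers a simple fact question, while `instances_of(conn, "location")` also finds cities and rivers through the taxonomy. PostgreSQL supports the same `WITH RECURSIVE` construct, so the query should carry over with little change.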

Quasimodo for basic commonsense knowledge

As a source for basic commonsense knowledge we explore and possibly use Quasimodo. Beware: its coverage is limited and it contains a lot of weird statements, like "estonia, has_color, yellow".

In particular, please

  • Find out whether the dataset contains a meaningful amount of information about the geographical places (or additional information about the facts about these places) for the sub-part you chose from Yago.
  • If yes, please store the relevant, meaningful part of that information in the SQL database, using both standard SQL and, where it seems useful, JSON.
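One possible way to do this filtering is sketched below. It assumes a tab-separated file whose first four columns are subject, predicate, object, and a numeric plausibility score; that column layout, the `commonsense` table, the 0.5 threshold, and the sample lines are all assumptions made for illustration, so check the header of the file you actually download and adjust the indices. The score goes into a JSON column, as an example of where JSON may be useful for metainformation.

```python
import json
import sqlite3

# Made-up sample lines in the ASSUMED layout: subject, predicate, object, score.
SAMPLE_LINES = [
    "subject\tpredicate\tobject\tscore",
    "estonia\thas_color\tyellow\t0.10",
    "estonia\thas_property\tsmall\t0.80",
    "cat\thas_body_part\ttail\t0.95",
]


def parse_statements(lines, places, min_score=0.5):
    """Keep statements about the chosen places whose score is high enough."""
    kept = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 4:
            continue  # not enough columns for the assumed layout
        subject, predicate, obj, raw_score = parts[:4]
        try:
            score = float(raw_score)
        except ValueError:
            continue  # header row or malformed score
        if subject.lower() in places and score >= min_score:
            meta = json.dumps({"score": score})
            kept.append((subject.lower(), predicate, obj, meta))
    return kept


def store_statements(conn, rows):
    """Insert the filtered statements, keeping metainformation as JSON text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commonsense ("
        "subject TEXT, predicate TEXT, object TEXT, meta TEXT)"
    )
    conn.executemany("INSERT INTO commonsense VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

For a real file, `lines` can be an open file handle, and `places` would be the lower-cased names of the places you selected from Yago. A score threshold removes some of the weird statements but not all, so inspect a sample of what remains by hand.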

Conceptnet for basic commonsense knowledge

As with Quasimodo, explore ConceptNet and use it if it makes sense. It contains a lot more than Quasimodo, but its metainformation (context, plausibility, etc.) is not as good.

In particular,

  • Find out whether the dataset contains a meaningful amount of information about the geographical places (or additional information about the facts about these places) for the sub-part you chose from Yago.
  • If yes, please store a significant amount of the relevant, meaningful information in the SQL database, using both standard SQL and, where it seems useful, JSON. Since ConceptNet is big, do not attempt to store all the seemingly relevant information.
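A capped selection like this could look roughly as follows. The sketch relies on the ConceptNet assertions dump being tab-separated with five columns (edge URI, relation, start node, end node, JSON metadata including a `weight`); the `conceptnet_edge` table, the `limit` cap, and the sample lines are assumptions made for illustration. Edges are ranked by weight so that the cap keeps the strongest ones.

```python
import json
import sqlite3

# Made-up sample lines in the five-column assertions layout.
SAMPLE_LINES = [
    '/a/[/r/IsA/,/c/en/tallinn/,/c/en/city/]\t/r/IsA\t/c/en/tallinn\t/c/en/city\t{"weight": 2.0}',
    '/a/[/r/AtLocation/,/c/en/cat/,/c/en/house/]\t/r/AtLocation\t/c/en/cat\t/c/en/house\t{"weight": 4.0}',
    '/a/[/r/PartOf/,/c/en/tallinn/,/c/en/estonia/]\t/r/PartOf\t/c/en/tallinn\t/c/en/estonia\t{"weight": 1.0}',
    '/a/[/r/RelatedTo/,/c/en/estonia/n/,/c/en/baltic/]\t/r/RelatedTo\t/c/en/estonia/n\t/c/en/baltic\t{"weight": 0.5}',
]


def select_edges(lines, concepts, limit=1000):
    """Return up to `limit` edges touching `concepts`, strongest first."""

    def matches(node):
        # Accept the bare concept and sense-tagged variants such as /c/en/estonia/n.
        return any(node == c or node.startswith(c + "/") for c in concepts)

    edges = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 5:
            continue  # skip lines that do not fit the five-column layout
        _uri, relation, start, end, raw_meta = parts
        if not (matches(start) or matches(end)):
            continue
        weight = float(json.loads(raw_meta).get("weight", 1.0))
        edges.append((relation, start, end, weight, raw_meta))
    edges.sort(key=lambda edge: edge[3], reverse=True)
    return edges[:limit]


def store_edges(conn, edges):
    """Store edges with the weight as a column and the full metadata as JSON text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conceptnet_edge ("
        "relation TEXT, start_node TEXT, end_node TEXT, weight REAL, meta TEXT)"
    )
    conn.executemany("INSERT INTO conceptnet_edge VALUES (?, ?, ?, ?, ?)", edges)
    conn.commit()
```

Here `concepts` would hold node URIs such as `/c/en/tallinn` for the places chosen from Yago. Note that `select_edges` collects every matching edge in memory before applying the cap, which is fine for one country but may need a streaming variant for broader selections.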