Experiment with natural language rules, ChatGPT and ProofWriter

Source: Lambda

This lab option builds upon the ChatGPT work in the second lab, going into more detail and covering more systems.

Your goal is to experiment with both the powerful ChatGPT and the much weaker ProofWriter systems.
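You can run the experiments interactively, but if you prefer to script the ChatGPT side, a minimal sketch using the official OpenAI Python client might look as follows. The model name and the prompt skeleton are illustrative assumptions, not part of the assignment.

```python
# A minimal sketch for querying ChatGPT programmatically, assuming the
# official openai Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Facts: ...\n"      # your selected facts go here
    "Rules: ...\n"      # your rules from the second lab go here
    "Question: ...\n"
    "Answer using only the facts and rules above."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use any chat model you have access to
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output makes runs easier to compare
)
print(response.choices[0].message.content)
```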

The experiment you have to conduct is to use the rules developed in the second lab, plus a selection of relevant facts from the first and second labs, to answer questions posed in natural language. Find a number of example cases and general principles for all three of the following:

  • The examples and types of formulations/questions the systems answer correctly (as they should, using the facts/rules you give).
  • The examples and types of questions the systems answer incorrectly.
  • The examples and types of questions the systems fail to answer.

Do not just give one example for each subtask above: an optimal number is around 10.

Importantly, the sets of rules and facts you give should be sufficient for actually answering the question, without additional knowledge.
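For example, a minimal self-contained set could look like this (the identifiers and wording are invented for illustration):

```
Facts:    person_17 was born in city_42.
          city_42 is in country_7.
Rule:     If a person was born in a city X and the city X is in a
          country Y, then the person was born in the country Y.
Question: Was person_17 born in country_7?    (expected answer: yes)
```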

Notice that ChatGPT has a much higher chance of finding correct answers than ProofWriter: for the former your main problem is finding rulesets/questions where it should answer correctly but fails, while for the latter your main problem is finding rulesets/questions where it actually succeeds.

Observe that ChatGPT has a huge amount of world knowledge, while ProofWriter has very little. Thus you really need to provide detailed rules for ProofWriter, while ChatGPT has some internal knowledge of your rules (and maybe some of the facts) already.

In order to make your rules really count for ChatGPT, invent nonexistent words to replace the natural language words in your facts, rules and questions. For example, instead of "If a person was born in a city X and the city X is in a country Y, then the person was born in the country Y", write something like "If a fooxer was greimed in a gream X and the gream X is in a fiiz Y, then the fooxer was greimed in the fiiz Y".
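A systematic way to do this is a small substitution script; a sketch along these lines, where the substitution table is a hypothetical example you would extend with your own vocabulary:

```python
# Replace real words with invented nonsense words in facts, rules and
# questions, so ChatGPT cannot fall back on its world knowledge.
import re

# Hypothetical substitution table; extend it to cover your vocabulary.
NONSENSE = {
    "person": "fooxer",
    "born": "greimed",
    "city": "gream",
    "country": "fiiz",
}

def obfuscate(text: str) -> str:
    """Replace every known real word with its invented counterpart."""
    def swap(match: re.Match) -> str:
        return NONSENSE[match.group(0).lower()]
    pattern = re.compile(r"\b(" + "|".join(NONSENSE) + r")\b", re.IGNORECASE)
    return pattern.sub(swap, text)

print(obfuscate("If a person was born in a city X and the city X is in "
                "a country Y, then the person was born in the country Y."))
# -> "If a fooxer was greimed in a gream X and the gream X is in a fiiz Y,
#     then the fooxer was greimed in the fiiz Y."
```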

For ChatGPT, please also experiment with rules given directly in logic: maybe it can use these as well?
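For instance, the birthplace rule above could be handed to ChatGPT as a first-order formula, here written with the invented predicate names from the example; the exact notation is up to you:

```latex
\forall x\, \forall y\, \forall z\;
  \bigl( \mathit{greimed}(x, y) \land \mathit{in}(y, z)
         \rightarrow \mathit{greimed}(x, z) \bigr)
```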

Useful observations to keep in mind and use:

  • Neither system survives a huge input text: thus you cannot give them the whole dataset, but need to filter out a small subset of facts which should be sufficient for answering the query (see the sketch after this list).
  • ChatGPT is more prone to errors if you give it a larger set of facts and rules, possibly not relevant to the question, interleaved with the relevant facts and rules. In this case it will often focus on the useless rules and facts and fail.
  • Most importantly for ChatGPT: pose questions which should not be correctly answerable by a rough guess, like "Was Fritz Weidenbach born in Germany?" or "Was Jaan Tatikas born in Estonia?", because it might guess the answer based on typical features or common practice, such as the name sounding German or Estonian. It is better to use IDs instead of names, etc.
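For the filtering step, a naive sketch of pre-selecting facts before building the prompt; the word-overlap heuristic and the example facts are illustrative assumptions, and any relevance heuristic of your own will do as well:

```python
# Keep only the facts most likely to be relevant to the question,
# ranked by simple word overlap.
def words(text: str) -> set[str]:
    """Lowercased words with trailing punctuation stripped."""
    return {w.strip(".,?!") for w in text.lower().split()}

def relevant_facts(question: str, facts: list[str], limit: int = 15) -> list[str]:
    """Keep the `limit` facts sharing the most words with the question."""
    qwords = words(question)
    return sorted(facts, key=lambda f: len(qwords & words(f)), reverse=True)[:limit]

facts = [
    "id_117 was greimed in gream_42.",
    "gream_42 is in fiiz_7.",
    "id_203 owns a blarp.",  # irrelevant noise that should be filtered out
]
print(relevant_facts("Was id_117 greimed in fiiz_7?", facts, limit=2))
# -> ['id_117 was greimed in gream_42.', 'gream_42 is in fiiz_7.']
```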

The result of your task should be a report:

  • Indicating, for all three subtasks above, several suitable examples along with the general principles you observed while experimenting.
  • Giving a brief overview of your experimentation process: how you started, what you observed, what you changed, etc.