Experiment with neurosymbolic reasoning: LLM as a parser plus a symbolic reasoner for images: 2026

Source: Lambda

The task here is to build a prototype neurosymbolic pipeline that either (a) finds an image matching criteria given in natural language or (b) verifies whether a given image satisfies criteria given in natural language.

A hypothetical real-world system could convert the symbolic query into a database query to quickly find images in a very large collection, as Google Photos does.
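To make the database idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (`fact` with columns `image`, `obj`, `pred`, `val`), the sample facts, and the helper `find_images` are all assumptions made for illustration; they are not part of the lab format. The sketch turns a conjunction of conditions about one object into a chain of self-joins.

```python
import sqlite3

# Hypothetical schema: one row per fact about an object in an image.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (image TEXT, obj TEXT, pred TEXT, val TEXT)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?, ?)",
    [
        ("img1", "lamp1", "type", "lamp"),
        ("img1", "lamp1", "has_color", "blue"),
        ("img2", "lamp2", "type", "lamp"),
        ("img2", "lamp2", "has_color", "red"),
        ("img2", "chair1", "type", "chair"),
    ],
)


def find_images(conn, conditions):
    """Return images containing one object that satisfies every (pred, val) pair."""
    joins, params = [], []
    for i, (pred, val) in enumerate(conditions):
        # One self-join per condition, all tied to the same image and object.
        joins.append(
            f"JOIN fact f{i} ON f{i}.image = b.image AND f{i}.obj = b.obj "
            f"AND f{i}.pred = ? AND f{i}.val = ?"
        )
        params += [pred, val]
    sql = (
        "SELECT DISTINCT b.image FROM fact b "
        + " ".join(joins)
        + " ORDER BY b.image"
    )
    return [row[0] for row in conn.execute(sql, params)]


# "Find an image with a blue lamp" as a conjunctive query.
print(find_images(conn, [("type", "lamp"), ("has_color", "blue")]))
```

The values are passed as query parameters rather than pasted into the SQL string, which matters when the conditions come from LLM output.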

The pipeline

The way to do this is to build the following pipeline:

(a) Use an LLM to take English or Estonian as input and build a query in logic, which a solver can later use.

(b) Create an input problem/task file by concatenating

    • Logical description of the objects/relations in one or several images: the stuff you built in the second lab.
    • Any rules for operating on or generalizing from such a description: again the stuff you built in the second lab.
    • The logic-form question built by the LLM.

(c) Actually run a solver/prover and give expected answers like "Found image 2" / "True" / "Unknown".
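The three steps can be sketched end to end as follows. This is a toy under stated assumptions, not the expected solution: `fake_llm` is a stand-in that returns canned queries instead of calling an LLM, the predicates `type`, `has_color` and `on` are illustrative, and `solve` is a small backtracking matcher for conjunctive queries that ignores rules entirely. A real pipeline would write the concatenated problem to a file and hand it to an actual prover.

```python
# Step (b) input: facts per image as (predicate, arg1, arg2). Names are illustrative.
FACTS = {
    1: [("type", "lamp1", "lamp"), ("has_color", "lamp1", "blue")],
    2: [("type", "lamp2", "lamp"), ("has_color", "lamp2", "red"),
        ("type", "table1", "table"), ("on", "lamp2", "table1")],
}


def fake_llm(question):
    """Stand-in for step (a): a real pipeline would call an LLM here."""
    canned = {
        "a blue lamp": [("type", "?X", "lamp"), ("has_color", "?X", "blue")],
        "a lamp on a table": [("type", "?X", "lamp"), ("type", "?Y", "table"),
                              ("on", "?X", "?Y")],
        "a green chair": [("type", "?X", "chair"), ("has_color", "?X", "green")],
    }
    return canned[question]


def build_problem(facts, rules, query):
    """Step (b): concatenate description, rules and query into one problem."""
    return {"facts": list(facts), "rules": list(rules), "query": list(query)}


def match(atom, fact, binding):
    """Match one query atom against one ground fact; return new binding or None."""
    if len(atom) != len(fact):
        return None
    new = dict(binding)
    for a, f in zip(atom, fact):
        if a.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return new


def solve(problem):
    """Step (c), toy version: backtracking search; rules are NOT applied."""
    def search(atoms, binding):
        if not atoms:
            return binding
        for fact in problem["facts"]:
            new = match(atoms[0], fact, binding)
            if new is not None:
                result = search(atoms[1:], new)
                if result is not None:
                    return result
        return None
    return search(problem["query"], {})


def find_image(question):
    """Run the whole pipeline and report the first matching image."""
    query = fake_llm(question)
    for image, facts in FACTS.items():
        if solve(build_problem(facts, [], query)) is not None:
            return f"Found image {image}"
    return "Unknown"


for q in ("a blue lamp", "a lamp on a table", "a green chair"):
    print(q, "->", find_image(q))
```

Note that a failed search yields "Unknown" rather than "No": the matcher only shows that no proof was found from the given facts.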

NB! It is extremely hard to get such a pipeline to run with high quality. Your task is to build a simple prototype which works for a few rather simple cases. If you cannot really get the pipeline to work, you can do just part (a) and put the rest of the pipeline together as a manual process, i.e. compose the full problem/task file by hand and give it to https://logictools.org. This will be graded lower than a real automatic pipeline, but is still acceptable. It is also a good idea to write the code with the help of an LLM: this is not penalized, but rather recommended. You need to understand the code, though.

How the lab is graded

  • Being able to find an image is graded higher than only verifying whether one single image corresponds to a query.
  • Building an actual automated pipeline is graded higher than doing most of the parts manually.
  • A wider range of complex examples/queries/rules is graded higher than simplistic input.

So, in the worst/simplest case this lab can be done via a browser (no strict need to install anything or build code), but this will lead to a lower grade.

Notes and recommendations

  • Start small: first build a manual "pipeline" for checking whether one image satisfies a natural-language criterion. Once this works, automate parts.
  • Use the format you used for the first and second labs. Instruct the LLM to use this format (these predicates with this meaning). Give examples using these predicates. You do not want the LLM to first describe a blue lamp as "has_color(lamp1,blue)" and later pose a question containing "color(lamp1,blue)" or "has_color(blue,lamp1)".
  • Unless you use gk, it is very hard to get "no" as an answer. It is OK to get either a positive answer or "unknown". It is also hard to handle numbers and sets: do not bother. Similarly, it is hard to get answers to negative questions like "an image without a chair": do not bother.
  • The main issue is getting the LLM to output an OK query. Start by building some queries by hand and experimenting with them until they actually work with the solver and all the input. Only then start working with the LLM for query-building. You need to (a) give the LLM instructions on what to do and how, and (b) give it a number of examples. You will probably also need to (sometimes) programmatically clean up the LLM output (remove JSON wrapping, balance parentheses).
  • As a high-end example of all of this, see https://github.com/tammet/nlpsolver/tree/main/llmpipe . You may, but do not need to, run it. Look at the large prompts in the prompts/ subfolder: this does parsing in two stages, but do not go into such complexities. Look briefly at the stage 1 instructions and more closely at the stage 2 instructions and examples to get inspiration/ideas. Your prompt should be much simpler.
  • You are free to choose the prover/solver. If you want to experiment with what is probably the strongest existing commonsense solver, try gk (and also look at llmpipe). For a basic first-order logic prover it is an OK idea to take some release of gkc: the old release v0.6 is probably fine. But you can search for and use something completely different. A cool idea is to use ProbLog.
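The clean-up and predicate-consistency points above can be sketched in a few lines. Everything here is an assumption made for illustration: the `ALLOWED` vocabulary, the `{"query": ...}` JSON wrapper shape, and the helper names `clean_llm_output` and `unknown_predicates`. Adapt them to the predicates and output format you actually use.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker that LLMs often wrap output in
ALLOWED = {"type": 2, "has_color": 2, "on": 2}  # hypothetical predicate -> arity


def clean_llm_output(raw):
    """Strip a code fence and JSON wrapping, then balance parentheses."""
    text = raw.strip()
    # Remove an opening fence (with optional language tag) and a closing fence.
    text = re.sub(rf"^{FENCE}[a-zA-Z]*\s*|\s*{FENCE}$", "", text).strip()
    # Unwrap {"query": "..."} or a bare JSON string, if that is what we got.
    try:
        data = json.loads(text)
        if isinstance(data, dict) and isinstance(data.get("query"), str):
            text = data["query"]
        elif isinstance(data, str):
            text = data
    except ValueError:
        pass  # not JSON: treat the text as the query itself
    # Drop unmatched closing parentheses and append missing ones at the end.
    out, depth = [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                continue
            depth -= 1
        out.append(ch)
    return "".join(out) + ")" * depth


def unknown_predicates(query):
    """Return 'name/arity' for every atom not in the agreed vocabulary."""
    bad = []
    for name, args in re.findall(r"([A-Za-z_]\w*)\(([^()]*)\)", query):
        arity = len([a for a in args.split(",") if a.strip()])
        if ALLOWED.get(name) != arity:
            bad.append(f"{name}/{arity}")
    return bad


raw = FENCE + 'json\n{"query": "type(X,lamp) & has_color(X,blue"}\n' + FENCE
query = clean_llm_output(raw)
print(query)
print(unknown_predicates("type(X,lamp) & color(X,blue)"))
```

This check only catches wrong predicate names and arities. It cannot detect swapped arguments such as "has_color(blue,lamp1)", so clear instructions and examples in the prompt are still needed.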

Finally

Create several examples which work and several which do not. Then prepare a small presentation of what you did, what worked and what did not.