Nlpsolver knowledge representation principles: draft

Source: Lambda




The schema predicate is "isa", used like this:

Note: no context is currently added to "isa".


"John is a man.":     ["isa","man","c1_John"]  
"Bears are animals.": [["isa","bear","?:S2"], "=>", ["isa","animal","?:S2"]]  i.e. ["or", ["-isa","bear","?:S2"], ["isa","animal","?:S2"]]
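
The equivalence between the two forms of "Bears are animals." can be sketched in Python as follows. This is only an illustration, not the parser's actual code: the function names negate and implication_to_clause are hypothetical, and the sketch assumes the premise is a single atom.

```python
def negate(atom):
    """Negate an atom by toggling the "-" prefix on its predicate name."""
    pred = atom[0]
    new_pred = pred[1:] if pred.startswith("-") else "-" + pred
    return [new_pred] + list(atom[1:])

def implication_to_clause(formula):
    """Turn [premise, "=>", conclusion] into ["or", negated premise, conclusion].

    Assumption: the premise is a single atom, as in the "isa" example.
    """
    premise, arrow, conclusion = formula
    if arrow != "=>":
        raise ValueError("expected an implication of the form [A, '=>', B]")
    return ["or", negate(premise), conclusion]

rule = [["isa", "bear", "?:S2"], "=>", ["isa", "animal", "?:S2"]]
clause = implication_to_clause(rule)
print(clause)
```

Applied to the rule above, this yields the same "or" clause as written next to the example.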

Note: variables are anything starting with ?:, but I use a readability-enhancing convention:
?:S subject
?:O object
?:A action
?:Tense tense (past/present)
?:Fv situation number for a given tense
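
Since a variable is simply any string starting with ?:, finding the variables of a formula is a plain recursive walk. A minimal Python sketch, with the hypothetical helper names is_variable and collect_variables:

```python
def is_variable(term):
    """A variable is any string starting with "?:"."""
    return isinstance(term, str) and term.startswith("?:")

def collect_variables(term):
    """Return the variables of a nested list term, in first-occurrence order."""
    found = []

    def walk(t):
        if is_variable(t):
            if t not in found:
                found.append(t)
        elif isinstance(t, list):
            for item in t:
                walk(item)

    walk(term)
    return found

clause = ["or", ["-isa", "bear", "?:S2"], ["isa", "animal", "?:S2"]]
print(collect_variables(clause))
```

The suffixes such as S, O or Tense are only a naming convention, so the sketch does not interpret them.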

        "property"   permanent (big) and temporary (angry) 


The schema predicate is "prop", with these arguments:
["prop", actual_property, object, strength, class, context]
where
  strength is the strength of the property: 1 (small), "$generic" (not indicated) or 3 (strong),
  class is the class the property is relative to (as in "small bear"): "$generic" if missing.

About context: 
* If the statement always holds, use a variable for the context, like ["prop","nice","c1_John","$generic","$generic","?:Ctxt"]
* The context structure will be extended in the future with more parameters
* The current context structure is ["$ctxt", past_pres_or_future, concrete_situation_number], where past_pres_or_future is currently either "Past" or "Pres", and situation numbers are enumerated separately for past, present and future
"John is nice":          ["prop","nice","c1_John","$generic","$generic",["$ctxt","Pres",1]]     
"John is somewhat nice": ["prop","nice","c1_John",1,"$generic",["$ctxt","Pres",1]]
"John is very nice":     ["prop","nice","c1_John",3,"$generic",["$ctxt","Pres",1]]

"John is a big mouse":   ["prop","big","c1_John","$generic","mouse",["$ctxt","Pres",1]]
"John is a nice mouse":  ["prop","nice","c1_John","$generic","$generic",["$ctxt","Pres",1]]

Notice that in the last example, "nice mouse", the word "nice" is not considered class-related. Only a fixed list
of property words such as "big", "small", etc. is considered class-related.
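
The argument layout of "prop" can be illustrated with a small Python builder. This is a sketch under assumptions: the names ctxt, prop and CLASS_RELATED are hypothetical, and the set of class-related words contains only the two words named in this text, while the real list is longer.

```python
GENERIC = "$generic"

# Assumption: stand-in for the fixed list of class-related property words;
# only "big" and "small" are named in the text.
CLASS_RELATED = {"big", "small"}

def ctxt(tense="Pres", situation=1):
    """Build a context: ["$ctxt", tense, situation number]."""
    return ["$ctxt", tense, situation]

def prop(word, obj, strength=GENERIC, cls=GENERIC, context=None):
    """Build ["prop", property, object, strength, class, context]."""
    if context is None:
        context = ctxt()
    # The class is kept only for class-related words such as "big".
    if word not in CLASS_RELATED:
        cls = GENERIC
    return ["prop", word, obj, strength, cls, context]

print(prop("big", "c1_John", cls="mouse"))        # "John is a big mouse"
print(prop("nice", "c1_John", cls="mouse"))       # "John is a nice mouse"
print(prop("nice", "c1_John", strength=3))        # "John is very nice"
print(prop("nice", "c1_John", context="?:Ctxt"))  # holds in every context
```

The second call shows the point made above: "mouse" is dropped for "nice" because "nice" is not class-related.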


        "hasa"    possessions and body parts


The schema predicate is "rel2" with the first argument "have", for an undetermined number of things,
plus functions for single values, counted sets and measures:
  where logic_expression may contain pseudo-lambda-parameters $arg1,$arg2 etc  
where "unit_of_measurement" may be "$generic" if not relevant, and the type words are limited,
currently: heavy,light, long,short, tall,short, wide,narrow, deep,shallow, warm,hot,cold,cost,cheap

Example for undetermined/unmeasured number of things:
"John has a car.": 

First we make a formula:


and normalize it to 


Another example:

"Elephants have a trunk."

and normalize it (observe skolemizing the "exists ?:O1" to ["cs1","?:S2"]):

["or", ["isa","trunk",["cs1","?:S2"]], ["-isa","elephant","?:S2"]]
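
The skolemization step visible in the clause above, where the existential ?:O1 becomes ["cs1","?:S2"], is a substitution over the nested list. A minimal Python sketch follows. The function names are hypothetical and the input literal ["isa","trunk","?:O1"] is an assumed pre-skolemization form, not taken from the parser output.

```python
def substitute(term, var, replacement):
    """Replace every occurrence of var in a nested list term."""
    if term == var:
        return replacement
    if isinstance(term, list):
        return [substitute(t, var, replacement) for t in term]
    return term

def skolemize(term, exist_var, skolem_name, universal_vars):
    """Replace an existential variable by a skolem term.

    With enclosing universal variables the result is a function term like
    ["cs1", "?:S2"]; with none it is a plain constant like "cs2".
    """
    if universal_vars:
        skolem_term = [skolem_name] + list(universal_vars)
    else:
        skolem_term = skolem_name
    return substitute(term, exist_var, skolem_term)

# Assumed literal before skolemization, inside "for all ?:S2".
print(skolemize(["isa", "trunk", "?:O1"], "?:O1", "cs1", ["?:S2"]))
# No enclosing universal variable: a skolem constant.
print(skolemize(["isa", "car", "?:O1"], "?:O1", "cs2", []))
```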

A full example illustrating that for questions and conditions we cannot pre-skolemize, but have to
use the formula with quantifiers before the final normalization:

"John has a red car. John has a car?":

  { @question: [$def0] }


{"@logic": ["prop","red","cs2","$generic","$generic",["$ctxt","Pres",1]]},
{"@logic": ["isa","car","cs2"]},
{"@logic": ["rel2","have","c1_John","cs2",["$ctxt","Pres",1]]},
{"@logic": ["or", ["isa","car","cs3"], ["-$def0"]]},
{"@logic": ["or", ["rel2","have","c1_John","cs3",["$ctxt","Pres","?:Fv5"]], ["-$def0"]]},
{"@logic": ["or",
{"@question": ["$def0"]}

Next the functional having:


Next the countable having:

"John has three red cars":
with the main statement there normalized as


Next the measurable having:

"Nile has the length 10 kilometers" or
"The length of Nile is 10 kilometers" etc

with the main statement there normalized as


        "capability"   what can it do   (verbs?)

The schema predicates are can1 and can2:
[can1, verb_which_can_do, subject_who_can, action_or_capability_id, context]
[can2, verb_which_can_do, subject_who_can, object_of_action, action_or_capability_id, context]

and closely related predicates act1 and act2 with the same arguments, for actually doing something:
[act1, verb_which_can_do, subject_who_can, action_or_capability_id, context]
[act2, verb_which_can_do, subject_who_can, object_of_action, action_or_capability_id, context]

NB! The action verb (eat), doer/subject, context and optionally object are present as arguments, 
but location, helpers, qualities of action etc are indicated separately as properties
of the action/capability id.
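
The four argument layouts differ only in the predicate name and the optional object, so they can be sketched with one Python builder. The function name action_atom and its parameters are hypothetical, and the ids and contexts in the usage lines are example values only.

```python
def action_atom(verb, subject, action_id, context, obj=None, actual=False):
    """Build a can1/can2 atom (capability) or an act1/act2 atom (actual doing).

    The arity-2 variants carry the object of the action before the action id.
    """
    base = "act" if actual else "can"
    if obj is None:
        return [base + "1", verb, subject, action_id, context]
    return [base + "2", verb, subject, obj, action_id, context]

context = ["$ctxt", "Pres", 1]
print(action_atom("fly", "c1_John", "cs2", context))
print(action_atom("drive", "c1_John", "cs3", context, obj="cs4"))
print(action_atom("drive", "c1_John", "cs3", ["$ctxt", "Past", 1], obj="cs4", actual=True))

# Qualities of the action are attached to the action id as separate facts:
fast = ["prop", "fast", "cs2", "$generic", "$generic", context]
print(fast)
```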

Example for can1:

"John can fly":

which is normalized to


Another example for can1:

"Birds can fly." 

which is normalized to

Yet another example for can1:

"Penguins cannot fly."

which is normalized to

Example for can2:

"John can drive a car."

which is normalized to

Another example for can2:

"Bears can eat honey."

which is normalized to

["or", ["isa","honey",["cs1","?:S2"]], ["-isa","bear","?:S2"]],

Full example illustrating properties of the action/verb:

"John can fly fast. John can fly?"

{"@logic": ["prop","fast","cs2","$generic","$generic",["$ctxt","?:Tense2",1]]},
{"@logic": ["can1","fly","c1_John","cs2",["$ctxt","?:Tense3",1]]},
{"@logic": ["or", ["-$def0"], ["can1","fly","c1_John","cs3",["$ctxt","cs4","?:Fv6"]]]},
{"@logic": ["or", ["$def0"], ["-can1","fly","c1_John","?:A4",["$ctxt","?:Tense5","?:Fv6"]]]},
{"@question": ["$def0"]}

NB! There is also actually _doing_ something:

"John drove the red car":


        "comparative"  arity 3    subject bigger subject2

The schema predicate is rel2_than for non-measurable and "=", "$less", "$lesseq", "$greater", "$greatereq" for measurable:
["=", counted_measure1, counted_measure2]
where "counted_measure" has the same structure and meaning as above for the "having" relation.

NB!! We should probably modify rel2_than to contain the somewhat/much distinction,
or add the distinction to the action id, or drop the action id.

Example for non-measurable comparison:

"John is nicer than Eve." 


which is normalized to


Example for measurable:

"The length of Nile is equal to the length of Amazon." 

which is normalized to


where probably only the last one is actually needed and the rest can be skipped.


        "partof"     membership

The schema predicate is "rel2_of" in combination with "part", or "rel2" in combination with "in".

NB! Maybe the action_relation_id should be dropped, or maybe some sensible use can be found?
NB! Also, maybe a special relation should be created?

Example for "rel2_of"+"part":

"Trunks are a part of an elephant."

normalized to

Example for "rel2"+"in":

"Elephants contain trunks"

which is normalized to



        "subjectto"   what happens to it (can include events)

Not thought through yet: needs work ASAP.


        "location"   where is it normally found

For actual location the schema is "rel2" in combination with "in", "on", "at", "near", "above", "under":

However, for typical location we should think a bit more, see below.


"John is in a room."


which is normalized to


NB! I propose representing the typical generic location like this, with a low probability and a blocker attached:


This latter thing is currently not properly implemented in the parser.
Alternative ideas are also welcome.

These need thought, no clear ideas yet:
meta stuff.
It is mostly clear how these can connect events (X, Y),
but mostly unclear how to combine e.g. "causes" and "property".
        "causes"     causes X
        "prevents"     Y prevents doing X
        "dependency"    X requires Y
        "usedfor"      subject is used for X
        "createdby"     subject is created by X
        "madeof"        subject is made of object (substance)
        "have_goal"    subject wants to do X / X to happen

        "time"         X happens at time


Time is represented (a) in a context, (b) like location above, with the words "in", "at", "on", "during", "before", "after",
plus the "$time" constructor:
where the time constructed element is used as a special typed variable:
where the "type_of_time_indicator" can be "$generic".


"On Monday, John jumped in a house."


which is normalized as



event roles
        "event_type"     stab
        "event_actor"    senators
        "event_theme"     Caesar
        "event_method"    brutally
        "event_instrument"   knife
        "event_type_modifier"  if type is go: go IN, go OUT, ...

These may need more thought, but for now we have:

* type,actor,theme are given as act1/act2 arguments, see above
* method and instrument are properties of the action id, indicated with "rel2"
  in combination with the actual word like "with": what the "with" means
  needs additional reasoning rules or procedural derivation of new facts.


"Senators stabbed Caesar with a knife in curia"

is normalized as


Observe that the fact that there were several senators should be given,
but this is currently not done for that example.


These need further thinking:

event meta
		"event_parallel"    X and Y are simultaneous
		"event_after"		Y happens after X
		"event_content"		Y is subevent of X (may be broken, other mixed use in db)
special use
		"similar"	semantic similarity


I am attaching the current small ruleset I am using while debugging the parser:
it is intentionally small.

very high level commonsense rules
	transitivity of "be"
	symmetry of "similar"
	inference using taxonomy of object (can leap |- can jump)
