KR 2019 homework part 1: overall task
Task
The end result of the lab is a presentation: about five minutes during which you explain:
- which systems you experimented with
- which gave useful output from your input
- examples of output from your input (show actual textual examples)
The task of the first lab is to build and demonstrate a system which can
- tokenize text (split any text into words: trivial)
- run an NER tool on the text, giving Wikipedia URLs as output. See WebAIDA. Notice that:
- some NER tools do not give wiki URLs, just entity types; this is also useful.
- for the second overall scenario (like Wikidata), however, we do need wiki URLs.
- run and experiment with a (preferably pre-trained) word vectorization system like word2vec; as an example, see this tutorial. We will come back to that.
- run and experiment with a parser tool on the text. There are several ways to parse, some more useful for our purposes than others. We will come back to that.
You do not have to be wildly successful with all these subtasks to pass. You will, though, have to try out each of these subtasks and report on how it went (got OK results, got bad results, could not get it to work, ...).
The second lab will build upon the results here. In particular, some choices in the second lab depend on what you managed to do with NER, word2vec, and parsers.
Technology and tools
You are free to use any programming language, but Python is recommended.
You are free to use NLP tools, APIs, and datasets, but your program should drive them from beginning to end (i.e. your program takes input files, calls the tools, and modifies and prints the output). It is better to use the tools than not to use them!
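The "drive them from beginning to end" requirement might look like the skeleton below. The file handling and the regex tokenizer are hypothetical placeholders; the real version would call your chosen NLP tools at the marked points.

```python
# Hypothetical end-to-end driver skeleton: read input, call tools, print output.
import re
import sys

def tokenize(text):
    # Placeholder for a real tool call (spaCy, NLTK, ...):
    # words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def process(text):
    tokens = tokenize(text)
    # Here the real program would also call NER, word2vec and a parser.
    return " ".join(tokens)

if __name__ == "__main__":
    # Usage: python lab1.py input.txt [more_inputs.txt ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(process(f.read()))
```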
Popular toolkits for NLP
The recommendation is to use the first (spaCy) or the second (NLTK) toolkit.
- spaCy toolkit for Python
- NLTK: the main Python toolkit, see also this tutorial
- Google SyntaxNet (see in github)
- CoreNLP: the main Stanford NLP tool, part of a larger set of Stanford NLP toolkits such as Stanford NER; see also this NER tutorial
- Pattern toolkit for Python
- OpenNLP
- PyNLP for Python
- NER tutorial for Linux in the context of a larger practical tutorial
Web APIs
- Google Cloud Natural Language API
- OpenCalais (free registration required)