C++ interface between SISTA's CoreNLP processors and SwiRL.
This project is maintained by trananh
This project provides C++ integration between Mihai Surdeanu's SISTA CoreNLP processors (written in Scala) and his SwiRL system (written in C++).
The code supplies seamless serialization & deserialization of SISTA's CoreNLP documents in C++. This allows us to use SwiRL to add semantic role labels to a document previously annotated with CoreNLP, which has the advantage of using CoreNLP's POS tagger and named-entity engine to bootstrap SwiRL.
WordNet
Make sure you have WordNet installed and the environment variable $WNHOME is set to the root of the installation directory. You can download and install WordNet from here, then set $WNHOME accordingly. See example below.
$ # If you installed wordnet to /usr/local/WordNet-3.0/
$ export WNHOME='/usr/local/WordNet-3.0/'
$ echo $WNHOME
/usr/local/WordNet-3.0/
SwiRL
Make sure SwiRL is installed. Download, build, and install SwiRL from here. Alternatively if you are developing in OSX, check out SwiRL-OSX.
For common build errors, please refer to this FAQ.
Before compiling, you may need to update the installation paths of SwiRL in the
Makefile. By default, the include path for SwiRL points to
/usr/local/include/swirl
and the lib path is set to /usr/local/lib
.
If you installed SwiRL to a different location, please update the variables at
the top of the Makefile accordingly.
# SWIRL's installation
SWIRLINC = /usr/local/include/swirl
SWIRLLIB = /usr/local/lib
Once all the paths are resolved. The usual make will compile the project.
$ cd /path/to/SwiRL-NLP
$ make
The following snippet demonstrates how to read CoreNLP annotation from file, and then using SwiRL to further parse the document for role labels. It shows how one can integrate the output of CoreNLP's POS & named entity taggers into the input for SwiRL.
Alternatively, you can take a look at src/demo.cpp
.
#include <iostream>
#include <fstream>
#include <string>
#include <Swirl.h>
#include "DocumentSerializer.h"
using namespace std;
using namespace processors;
using namespace srl;
int main(int argc, char ** argv) {
// IMPORTANT: Set these paths accordingly
string swirl = "./model_swirl";
string charniak = "./model_charniak";
string filename = "/path/to/nlp-annotation.txt";
// Initialize Swirl
bool caseSensitive = true;
if (!Swirl::initialize(swirl.c_str(), charniak.c_str(), caseSensitive)) {
cerr << "Failed to initialize SRL system!\n";
exit(1);
}
// Open NLP annotation
ifstream stream(filename.c_str());
if (stream.is_open()) {
// Load annotation
Document doc = DocumentSerializer::load(stream);
// For each sentence, parse with swirl
TreePtr tree;
string text, swirlCode;
const char * line;
for (int i = 0; i < (int) doc.sentences.size(); i++) {
// form the sentence text with the necessary POS and entities
Sentence sentence = doc.sentences.at(i);
if (sentence.entities.size() == sentence.words.size()) {
if (sentence.tags.size() == sentence.words.size()) {
text = sentence.getTokenizedTextWithTagsEntities();
swirlCode = "1";
} else {
text = sentence.getTokenizedTextWithEntities();
swirlCode = "2";
}
} else {
text = sentence.getTokenizedText();
swirlCode = "3";
}
cout << text << endl << endl;
text = swirlCode + string(" ") + text;
line = text.c_str();
// classify all predicates in this sentence
tree = Swirl::parse(line);
// dump extended Treebank format
if (tree != (const Tree *) NULL) {
tree->serialize(cout);
cout << endl << endl;
}
// dump extended CoNLL format
Swirl::serialize(tree, line, cout);
}
stream.close();
} else {
cerr << "Failed to find annotation file!\n";
exit(1);
}
return 0;
}