Tuesday, October 14, 2014

Training a Named Entity Recognizer model using OpenNLP

A Named Entity Recognizer (NER) automatically extracts named entities, such as Person, Location, or Traffic Level, from sentences.
First, you have to download OpenNLP.



Next, you have to tag the named entities in your sentences properly and save them in a text file, with one sentence per line.

Here is a small example for locations:
RT moogater: heavy traffic in <START> wellawatta <END> & <START> dehiwala <END> .
RT PasinduJay: Traffic Jam in <START> Kandy <END>
RT nalintharanga: Heavy traffic in <START> Dehiwela <END> Avoid
RT maheshnegombo: traffic in <START> weyangoda <END> (new bridge) towards <START> negombo <END> .
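
To make the markup concrete, here is a minimal sketch in plain Java (no OpenNLP needed) that lists the names tagged in one training line. The class TaggedLineParser and the method extractNames are hypothetical helpers made up for this illustration. The pattern also accepts the typed form <START:location>, which the OpenNLP documentation uses in its own examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: lists the names tagged in one training line.
class TaggedLineParser {

    // Matches "<START> name <END>" and the typed form "<START:location> name <END>".
    private static final Pattern TAGGED =
            Pattern.compile("<START(?::[^>\\s]+)?>\\s+(.+?)\\s+<END>");

    // Returns the tagged names in the order they appear in the line.
    static List<String> extractNames(String line) {
        List<String> names = new ArrayList<>();
        Matcher m = TAGGED.matcher(line);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String line = "RT moogater: heavy traffic in <START> wellawatta <END> & <START> dehiwala <END> .";
        System.out.println(extractNames(line)); // prints [wellawatta, dehiwala]
    }
}
```

Note that the tags are separated from the surrounding words by spaces. The sketch relies on that spacing, just as the example sentences above do.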



Then you can use the OpenNLP Java APIs to train a model and save it.

Here is the code that I used to train my model. You have to import all four OpenNLP jar files into your project.


import java.io.File;
import java.io.FileOutputStream;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.featuregen.AdaptiveFeatureGenerator;


public class trainModel {

 /**
  * Training location model
  * @param args
  */
 public static void main(String[] args) {
  // locate the tagged training data file

  File inFile = new File("location new");
  // create a NameSampleDataStream, which converts the tagged
  // sentences in the training data into NameSample objects
  // that are consumed in the next step

  NameSampleDataStream nss = null;
  try {
   nss =
         new NameSampleDataStream(
                                  new PlainTextByLineStream(new java.io.FileReader(inFile)));
  } catch (Exception ex) {
   System.out.println(ex.getMessage());
  }
  // create the "location" model
  TokenNameFinderModel model = null;
  int iterations = 300;
  int cutoff = 5;
  try {
   model =
           NameFinderME.train("en", // language of the training data
                              // type of model
                              "location",
                              // the NameSample stream created above
                              // (null selects the default feature generator)
                              nss, (AdaptiveFeatureGenerator) null,
                              Collections.<String, Object>emptyMap(), iterations, cutoff);
  } catch (Exception ex) {
   System.out.println(ex.getMessage());
  }
  // save the model to disk
  // used in testing and production
  File outFile = new File("en-location.bin");
  // try-with-resources closes the stream even if serialization fails
  try (FileOutputStream outFileStream = new FileOutputStream(outFile)) {
   model.serialize(outFileStream);
  } catch (Exception ex) {
   System.out.println(ex.getMessage());
  }
 }
}
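
The program above only prints the exception message when something goes wrong, so a problem in the training file can be hard to track down. As a rough sketch, and not part of OpenNLP, the following plain Java check reads the training file as UTF-8 and reports the first line where the number of <START> tags differs from the number of <END> tags. The class TrainingFileCheck and its methods are hypothetical names, and the UTF-8 encoding is an assumption about how the file was saved.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical pre-flight check, not part of OpenNLP.
class TrainingFileCheck {

    // Counts non-overlapping occurrences of token in line.
    static int count(String line, String token) {
        int n = 0;
        for (int i = line.indexOf(token); i >= 0; i = line.indexOf(token, i + token.length())) {
            n++;
        }
        return n;
    }

    // Returns the 1-based number of the first line whose <START> and <END>
    // counts differ, or -1 if every line is balanced.
    static int firstUnbalancedLine(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // "<START" matches both "<START>" and typed tags like "<START:location>"
            if (count(line, "<START") != count(line, "<END>")) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // "location new" is the training file name used in the code above;
        // UTF-8 is an assumption about the file's encoding.
        List<String> lines = Files.readAllLines(Paths.get("location new"), StandardCharsets.UTF_8);
        int bad = firstUnbalancedLine(lines);
        System.out.println(bad < 0 ? "All lines balanced" : "Unbalanced tags on line " + bad);
    }
}
```

Running it before training gives a line number to look at, which the trainer's own error message may not provide.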


1 comment:

  1. I got an error on this line: PlainTextByLineStream(new java.io.FileReader(inFile)));
