Mine That Record

February 4, 2010

UIMA

Filed under: Classification,General musings — Dr. H @ 3:29 pm

Thanks to a great presentation put together by Chuck, I have a clearer understanding of what UIMA is all about.

  1. UIMA is not just Apache UIMA. UIMA stands for Unstructured Information Management Architecture and is, in fact, a standard.
  2. UIMA can handle more than just text documents. It works just fine on audio and video.
  3. The most common implementation is Apache UIMA.
  4. UIMA is built around configurable workflows (including conditional ones) in which documents are passed to annotators. Each annotator can examine the document and any annotations already attached to it, and then appends its own annotations.
  5. The order in which annotators execute is therefore important, because an annotator may depend on the output of another annotator.
  6. An annotator can annotate whatever it deems important and is free to ignore the rest of the record.
  7. This workflow is fed from a “factory” that can generate documents however it deems appropriate: crawling the web, reading a database, reading files, and so on. The “reading documents” part of the workflow is therefore isolated from the annotators.
  8. The end product of the workflow is consumed by a consumer class that can do whatever it wants with the annotations. Typical choices are to write them to a file or insert them into a database.
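
To make points 4 through 8 concrete, here is a minimal sketch of the reader, annotator and consumer flow in plain Python. It is not the Apache UIMA API (which is Java); every name in it (Document, Annotation, read_documents, token_annotator, number_annotator, run_pipeline, collect) is made up for illustration. It only tries to show why annotator order matters: the number annotator relies on the tokens produced before it.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Annotation:
    kind: str   # hypothetical annotation type, e.g. "Token"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


def read_documents(texts: Iterable[str]) -> Iterator[Document]:
    """The "factory": here it wraps in-memory strings, but it could just as
    well crawl the web or read a database without the annotators knowing."""
    for text in texts:
        yield Document(text)


def token_annotator(doc: Document) -> None:
    """Marks every whitespace-delimited chunk as a Token."""
    for match in re.finditer(r"\S+", doc.text):
        doc.annotations.append(Annotation("Token", match.start(), match.end()))


def number_annotator(doc: Document) -> None:
    """Depends on earlier Token annotations; ignores everything else."""
    tokens = [a for a in doc.annotations if a.kind == "Token"]
    for tok in tokens:
        if doc.text[tok.begin:tok.end].isdigit():
            doc.annotations.append(Annotation("Number", tok.begin, tok.end))


def run_pipeline(
    reader: Iterable[Document],
    annotators: List[Callable[[Document], None]],
    consumer: Callable[[Document], None],
) -> None:
    """Passes each document through the annotators in order, then hands it
    to the consumer."""
    for doc in reader:
        for annotate in annotators:
            annotate(doc)
        consumer(doc)


def collect(
    texts: Iterable[str], annotators: List[Callable[[Document], None]]
) -> List[Tuple[str, str]]:
    """Runs the pipeline with a consumer that gathers (kind, covered text)
    pairs in a list; a real consumer might write to a file or a database."""
    rows: List[Tuple[str, str]] = []

    def consumer(doc: Document) -> None:
        for a in doc.annotations:
            rows.append((a.kind, doc.text[a.begin:a.end]))

    run_pipeline(read_documents(texts), annotators, consumer)
    return rows


if __name__ == "__main__":
    for row in collect(["age 42 years"], [token_annotator, number_annotator]):
        print(row)
```

Swapping the two annotators in the list makes the Number annotation disappear, because number_annotator then runs before any Token exists. That is the ordering dependency from point 5.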

There, that’s UIMA in a nutshell. For our purposes, we’re interested in two UIMA pipelines: MedKAT/P and cTAKES.
