Thanks to a great presentation put together by Chuck, I have a clearer understanding of what UIMA is all about.
- UIMA is not just Apache UIMA. UIMA stands for Unstructured Information Management Architecture and is, in fact, an OASIS standard.
- UIMA can handle more than just text documents. It works just fine on audio and video.
- The most common implementation is Apache UIMA.
- UIMA is built around configurable workflows (including conditional workflows) in which documents are passed to annotators. Each annotator can examine the document and any annotations already attached to it, and then append its own annotations.
- The order in which annotators execute is therefore important, because an annotator may depend on the output of another annotator.
- An annotator can annotate whatever it deems important and is free to ignore the rest of the record.
- This workflow is fed from a “factory” that can generate documents however it deems appropriate: crawling the web, reading a database, reading files, and so on. The “reading documents” part of the workflow is isolated from the rest.
- The end product of the workflow is consumed by a consumer class that can do whatever it wants with the annotations. Typical choices are to write them to a file or insert them into a database.
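
To make the moving parts concrete, here is a minimal sketch of that flow in plain Python. It does not use the Apache UIMA API at all; every name in it (`Document`, `Annotation`, `read_documents`, `token_annotator`, `number_annotator`, `run_pipeline`) is made up for illustration, and the sample text is invented. It only tries to show the three roles (factory, annotators, consumer) and why annotator order matters.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Annotation:
    kind: str   # label chosen by the annotator, e.g. "token" or "number"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, a: Annotation) -> str:
        return self.text[a.begin:a.end]


def read_documents(texts: Iterable[str]) -> Iterator[Document]:
    """The "factory": the only part that knows where documents come from."""
    for text in texts:
        yield Document(text)


def token_annotator(doc: Document) -> None:
    """Annotates every whitespace-separated token."""
    for m in re.finditer(r"\S+", doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end()))


def number_annotator(doc: Document) -> None:
    """Relies on earlier "token" annotations and ignores everything else."""
    tokens = [a for a in doc.annotations if a.kind == "token"]
    for a in tokens:
        if doc.covered_text(a).isdigit():
            doc.annotations.append(Annotation("number", a.begin, a.end))


def run_pipeline(
    source: Iterable[Document],
    annotators: List[Callable[[Document], None]],
    consumer: Callable[[Document], None],
) -> None:
    """Passes each document through the annotators in order, then to the consumer."""
    for doc in source:
        for annotate in annotators:
            annotate(doc)
        consumer(doc)


def numbers_in(doc: Document) -> List[str]:
    return [doc.covered_text(a) for a in doc.annotations if a.kind == "number"]


if __name__ == "__main__":
    # The consumer here just prints; it could equally write to a file or a database.
    def show(doc: Document) -> None:
        print(numbers_in(doc))

    run_pipeline(read_documents(["dose 40 mg"]), [token_annotator, number_annotator], show)
    # Swapping the order leaves number_annotator with no tokens to look at.
    run_pipeline(read_documents(["dose 40 mg"]), [number_annotator, token_annotator], show)
```

In the second run the number annotator finds nothing, because the token annotations it depends on have not been added yet. A real UIMA pipeline is configured through descriptors rather than a Python list, so treat this purely as a mental model.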
There, that’s UIMA in a nutshell. For our purposes we’re interested in two UIMA pipelines: MedKAT/P and cTAKES.