org.apache.uima.tools.components
Class FileSystemCollectionReader

java.lang.Object
  extended by org.apache.uima.resource.Resource_ImplBase
      extended by org.apache.uima.resource.ConfigurableResource_ImplBase
          extended by org.apache.uima.collection.CollectionReader_ImplBase
              extended by org.apache.uima.tools.components.FileSystemCollectionReader
All Implemented Interfaces:
BaseCollectionReader, CollectionReader, ConfigurableResource, Resource

public class FileSystemCollectionReader
extends CollectionReader_ImplBase

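A simple collection reader that reads documents from a directory in the filesystem. It is configured through the parameters named by the PARAM_* constants listed in the Field Summary below: the input directory (PARAM_INPUTDIR) is required, while the encoding, language, and XCAS parameters are optional.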


Field Summary
static java.lang.String PARAM_ENCODING
          Name of configuration parameter that contains the character encoding used by the input files.
static java.lang.String PARAM_INPUTDIR
          Name of configuration parameter that must be set to the path of a directory containing input files.
static java.lang.String PARAM_LANGUAGE
          Name of optional configuration parameter that contains the language of the documents in the input directory.
static java.lang.String PARAM_LENIENT
           
static java.lang.String PARAM_XCAS
          Optional configuration parameter that specifies XCAS input files.
 
Fields inherited from interface org.apache.uima.resource.Resource
PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_PARAM_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
 
Constructor Summary
FileSystemCollectionReader()
           
 
Method Summary
 void close()
          Closes this CollectionReader, after which it may no longer be used.
static CollectionReaderDescription getDescription()
          Parses and returns the descriptor for this collection reader.
static java.net.URL getDescriptorURL()
           
 void getNext(CAS aCAS)
          Gets the next element of the collection.
 int getNumberOfDocuments()
          Gets the total number of documents that will be returned by this collection reader.
 Progress[] getProgress()
          Gets information about the number of entities and/or amount of data that has been read from this CollectionReader, and the total amount that remains (if that information is available).
 boolean hasNext()
          Gets whether there are any elements remaining to be read from this CollectionReader.
 void initialize()
          This method is called during initialization, and does nothing by default.
 
Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
 
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
 
Methods inherited from class org.apache.uima.resource.Resource_ImplBase
getCasManager, getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.uima.resource.ConfigurableResource
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
 
Methods inherited from interface org.apache.uima.resource.Resource
getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
 

Field Detail

PARAM_INPUTDIR

public static final java.lang.String PARAM_INPUTDIR
Name of configuration parameter that must be set to the path of a directory containing input files.

See Also:
Constant Field Values

PARAM_ENCODING

public static final java.lang.String PARAM_ENCODING
Name of configuration parameter that contains the character encoding used by the input files. If not specified, the default system encoding will be used.

See Also:
Constant Field Values
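
The fallback described for PARAM_ENCODING can be illustrated with a small stand-alone sketch that uses only JDK classes. The class name EncodingFallback and its resolve method are hypothetical and are not part of UIMA; the sketch also assumes that an empty string is treated like a missing value, which this page does not state.

```java
import java.nio.charset.Charset;

/** Hypothetical helper for illustration only; not part of UIMA. */
final class EncodingFallback {
    private EncodingFallback() {}

    /**
     * Returns the charset named by the configured value, or the platform
     * default charset when no value was configured.
     * Treating "" like null is an assumption made for this sketch.
     */
    static Charset resolve(String configuredEncoding) {
        if (configuredEncoding == null || configuredEncoding.isEmpty()) {
            return Charset.defaultCharset();
        }
        // Charset.forName throws an unchecked exception for unknown or illegal names.
        return Charset.forName(configuredEncoding);
    }
}
```

Resolving the charset once, before any file is read, keeps the choice of encoding consistent across every document in the input directory.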

PARAM_LANGUAGE

public static final java.lang.String PARAM_LANGUAGE
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified this information will be added to the CAS.

See Also:
Constant Field Values

PARAM_XCAS

public static final java.lang.String PARAM_XCAS
Optional configuration parameter that specifies XCAS input files.

See Also:
Constant Field Values

PARAM_LENIENT

public static final java.lang.String PARAM_LENIENT
See Also:
Constant Field Values

Constructor Detail

FileSystemCollectionReader

public FileSystemCollectionReader()

Method Detail

initialize

public void initialize()
                throws ResourceInitializationException
Description copied from class: CollectionReader_ImplBase
This method is called during initialization, and does nothing by default. Subclasses should override it to perform one-time startup logic.

Overrides:
initialize in class CollectionReader_ImplBase
Throws:
ResourceInitializationException - if a failure occurs during initialization.
See Also:
CollectionReader_ImplBase.initialize()

hasNext

public boolean hasNext()
Description copied from interface: BaseCollectionReader
Gets whether there are any elements remaining to be read from this CollectionReader.

Returns:
true if and only if there are more elements available from this CollectionReader.
See Also:
BaseCollectionReader.hasNext()

getNext

public void getNext(CAS aCAS)
             throws java.io.IOException,
                    CollectionException
Description copied from interface: CollectionReader
Gets the next element of the collection. The element will be stored in the provided CAS object. If this is a consuming CollectionReader (see BaseCollectionReader.isConsuming()), this element will also be removed from the collection.

Parameters:
aCAS - the CAS to populate with the next element of the collection
Throws:
java.io.IOException - if an I/O failure occurs
CollectionException - if there is some other problem with reading from the Collection
See Also:
CollectionReader.getNext(org.apache.uima.cas.CAS)
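
The hasNext()/getNext(CAS) contract can be pictured with a stand-alone sketch that walks a directory using only JDK classes. It is not the UIMA implementation: the class DirectoryDocumentCursor is hypothetical, it returns each document as a String instead of populating a CAS, and it assumes that only regular files directly inside the directory count as documents and that they are visited in sorted path order.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical cursor over the files of one directory; not part of UIMA. */
final class DirectoryDocumentCursor {
    private final List<Path> files = new ArrayList<>();
    private final Charset charset;
    private int index = 0;

    DirectoryDocumentCursor(Path inputDir, Charset charset) throws IOException {
        this.charset = charset;
        // Assumption: no recursion into subdirectories.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // deterministic order for the sketch
    }

    /** True if and only if at least one document has not been returned yet. */
    boolean hasNext() {
        return index < files.size();
    }

    /** Reads the next file fully and decodes it with the configured charset. */
    String next() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException("no documents left");
        }
        Path file = files.get(index++);
        return new String(Files.readAllBytes(file), charset);
    }

    /** Total number of documents found when the cursor was created. */
    int documentCount() {
        return files.size();
    }
}
```

A caller would loop with while (cursor.hasNext()) and handle each returned document in turn. With the real reader the loop has the same shape, except that each call fills the CAS passed to getNext.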

close

public void close()
           throws java.io.IOException
Description copied from interface: BaseCollectionReader
Closes this CollectionReader, after which it may no longer be used.

Throws:
java.io.IOException - if an I/O failure occurs
See Also:
BaseCollectionReader.close()

getProgress

public Progress[] getProgress()
Description copied from interface: BaseCollectionReader
Gets information about the number of entities and/or amount of data that has been read from this CollectionReader, and the total amount that remains (if that information is available).

This method returns an array of Progress objects so that results can be reported using different units. For example, the CollectionReader could report progress in terms of the number of documents that have been read and also in terms of the number of bytes that have been read. In many cases, it will be sufficient to return just one Progress object.

Returns:
an array of Progress objects. Each object may have different units (for example number of entities or bytes).
See Also:
BaseCollectionReader.getProgress()
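
As a rough illustration of reporting progress in more than one unit, the following stand-alone sketch builds an array with one entry for documents and one for bytes. ProgressSketch is a hypothetical class written for this example, not the UIMA Progress interface; the unit strings and the use of -1 for an unknown total are assumptions of the sketch.

```java
/** Hypothetical stand-in for one progress measurement; not the UIMA Progress interface. */
final class ProgressSketch {
    /** Marker used by this sketch when a total is not available. */
    static final long UNKNOWN = -1;

    final long completed;
    final long total; // UNKNOWN when the total cannot be determined
    final String unit;

    ProgressSketch(long completed, long total, String unit) {
        this.completed = completed;
        this.total = total;
        this.unit = unit;
    }

    /** Amount still to be read, or UNKNOWN when the total is not available. */
    long remaining() {
        return total == UNKNOWN ? UNKNOWN : total - completed;
    }

    /** Builds one measurement per unit: documents first, then bytes. */
    static ProgressSketch[] report(long docsRead, long docsTotal,
                                   long bytesRead, long bytesTotal) {
        return new ProgressSketch[] {
            new ProgressSketch(docsRead, docsTotal, "entities"),
            new ProgressSketch(bytesRead, bytesTotal, "bytes")
        };
    }
}
```

Returning an array lets a caller pick whichever unit it can display, and ignore entries whose total is unknown.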

getNumberOfDocuments

public int getNumberOfDocuments()
Gets the total number of documents that will be returned by this collection reader. This is not part of the general collection reader interface.

Returns:
the number of documents in the collection

getDescription

public static CollectionReaderDescription getDescription()
                                                  throws InvalidXMLException
Parses and returns the descriptor for this collection reader. The descriptor is stored in the uima.jar file and located using the ClassLoader.

Returns:
an object containing all of the information parsed from the descriptor.
Throws:
InvalidXMLException - if the descriptor is invalid or missing

getDescriptorURL

public static java.net.URL getDescriptorURL()


Copyright © 2010 The Apache Software Foundation. All Rights Reserved.