TextDocumentExtractor (Cognitive Foundry)

java.lang.Object
- gov.sandia.cognition.util.AbstractCloneableSerializable
  - gov.sandia.cognition.text.document.extractor.AbstractDocumentExtractor
    - gov.sandia.cognition.text.document.extractor.AbstractSingleDocumentExtractor
      - gov.sandia.cognition.text.document.extractor.TextDocumentExtractor

All Implemented Interfaces:

DocumentExtractor, SingleDocumentExtractor, CloneableSerializable, java.io.Serializable, java.lang.Cloneable
```
public class TextDocumentExtractor
extends AbstractSingleDocumentExtractor
```
Extracts text from plain text documents.

Since:

3.0

Author:

Justin Basilico

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`CONTENT_TYPE` The content type is "text/plain".
`static java.util.List<java.lang.String>`	`DEFAULT_TEXT_FILE_EXTENSIONS` The default set of file extensions for text files.
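
This page does not list the entries of `DEFAULT_TEXT_FILE_EXTENSIONS`, nor does it say how the extractor uses them. As a rough illustration only, an extension check against such a list might look like the sketch below. The class name `TextExtensionSketch`, the method `hasTextExtension`, and the extension values `.txt` and `.text` are all assumptions made for this example and are not taken from Cognitive Foundry.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TextExtensionSketch {
    // Hypothetical extension list; the real DEFAULT_TEXT_FILE_EXTENSIONS
    // values are not stated on this page.
    static final List<String> TEXT_FILE_EXTENSIONS = Arrays.asList(".txt", ".text");

    // Returns true if the URI path ends with one of the listed extensions,
    // compared case-insensitively.
    static boolean hasTextExtension(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            // Opaque URIs (for example mailto:) have no path component.
            return false;
        }
        String lower = path.toLowerCase(Locale.ROOT);
        for (String extension : TEXT_FILE_EXTENSIONS) {
            if (lower.endsWith(extension)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, `hasTextExtension(URI.create("file:///tmp/notes.TXT"))` is true, while a URI whose path ends in `.html` is rejected.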

Constructor Summary

Constructors
Constructor and Description
`TextDocumentExtractor()` Creates a new `TextDocumentExtractor`.

Method Summary

Modifier and Type	Method and Description
`boolean`	`canExtract(java.net.URI uri)` Determines if the given file can be extracted by this extractor.
`boolean`	`canExtract(java.net.URLConnection connection)` Determines if the given file can be extracted by this extractor.
`Document`	`extractDocument(java.net.URLConnection connection)` Attempts to extract a document from the given file.

Methods inherited from class gov.sandia.cognition.text.document.extractor.AbstractSingleDocumentExtractor
extractAll, extractAll, extractAll, extractDocument, extractDocument

Methods inherited from class gov.sandia.cognition.text.document.extractor.AbstractDocumentExtractor
canExtract

Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface gov.sandia.cognition.text.document.extractor.DocumentExtractor
canExtract

- Field Detail
  - CONTENT_TYPE
```
public static final java.lang.String CONTENT_TYPE
```
    The content type is "text/plain".
    
    See Also:
    
    Constant Field Values
  - DEFAULT_TEXT_FILE_EXTENSIONS
```
public static final java.util.List<java.lang.String> DEFAULT_TEXT_FILE_EXTENSIONS
```
    The default set of file extensions for text files.
- Constructor Detail
  - TextDocumentExtractor
```
public TextDocumentExtractor()
```
    Creates a new TextDocumentExtractor.
- Method Detail
  - canExtract
```
public boolean canExtract(java.net.URI uri)
                   throws java.io.IOException
```
    Description copied from interface: DocumentExtractor
    
    Determines if the given file can be extracted by this extractor.
    
    Parameters:
    
    uri - The URI of the file to extract.
    
    Returns:
    
    True if this extractor can extract the file and false otherwise.
    
    Throws:
    
    java.io.IOException - If there is an IO error.
  - canExtract
```
public boolean canExtract(java.net.URLConnection connection)
                   throws java.io.IOException
```
    Description copied from interface: DocumentExtractor
    
    Determines if the given file can be extracted by this extractor.
    
    Parameters:
    
    connection - The connection to the file to extract.
    
    Returns:
    
    True if this extractor can extract the file and false otherwise.
    
    Throws:
    
    java.io.IOException - If there is an IO error.
  - extractDocument
```
public Document extractDocument(java.net.URLConnection connection)
                         throws java.io.IOException
```
    Description copied from interface: SingleDocumentExtractor
    
    Attempts to extract a document from the given file.
    
    Parameters:
    
    connection - The connection to the file to extract.
    
    Returns:
    
    The document extracted from the given file.
    
    Throws:
    
    java.io.IOException - If there is an IO error.
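
The method descriptions above say only that the extractor decides whether a connection can be extracted and then produces a document from it. The sketch below shows one plausible way to do both with nothing but the JDK: compare the connection's reported content type against `"text/plain"` (the documented `CONTENT_TYPE` value), then read the stream into a string. The class `PlainTextSketch`, its method names, and the choice of UTF-8 are assumptions for illustration. This is not the library's implementation, and it returns a `String` rather than a Cognitive Foundry `Document`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PlainTextSketch {
    // Same value as the documented CONTENT_TYPE constant.
    static final String CONTENT_TYPE = "text/plain";

    // Returns true if the header value names text/plain, ignoring case and
    // any parameters such as "; charset=UTF-8".
    static boolean isPlainText(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semicolon = contentType.indexOf(';');
        String mime = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return mime.trim().equalsIgnoreCase(CONTENT_TYPE);
    }

    // Reads the whole body of the connection as text.
    // Assumption: the content is UTF-8 encoded.
    static String readText(URLConnection connection) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }
}
```

A caller would typically pass `connection.getContentType()` to `isPlainText` before calling `readText`, so that non-text resources are skipped rather than decoded as text.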
