tikara.util package
Submodules
tikara.util.java module
Java and JVM utilities mostly focused on I/O operations.
- tikara.util.java.get_jars() → list[Path]
Get path to bundled Tika JAR file(s).
- Parameters:
tika_version (str) – The version of Tika to use.
- Returns:
The list of paths to the Tika JAR file(s) to be included in the JVM classpath.
- Return type:
list[Path]
- tikara.util.java.initialize_jvm(tika_jar_override: Path | None = None, extra_jars: list[Path] | None = None) → None
Initialize the JVM.
Tries to start the JVM with the Tika JAR file(s) on the classpath. If the JVM is already running, checks whether the Tika JAR file(s) are on the classpath.
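The classpath check described above can be illustrated with a small pure-Python sketch. It is not tikara's implementation: the helper name `missing_from_classpath` and the JAR paths are hypothetical, and the sketch assumes the classpath is available as a single string joined with the platform separator.

```python
import os
from pathlib import Path


def missing_from_classpath(classpath: str, required_jars: list[Path]) -> list[Path]:
    """Return the required JARs that do not appear in a JVM classpath string."""
    # Split on the platform separator (":" on POSIX, ";" on Windows) and
    # compare resolved paths so relative and absolute spellings match.
    entries = {Path(entry).resolve() for entry in classpath.split(os.pathsep) if entry}
    return [jar for jar in required_jars if jar.resolve() not in entries]


# Hypothetical JAR locations, used only for illustration.
tika_jar = Path("libs/tika.jar")
extra_jar = Path("libs/extra.jar")
classpath = os.pathsep.join([str(tika_jar.resolve()), "libs/other.jar"])

missing = missing_from_classpath(classpath, [tika_jar, extra_jar])
print(missing)
```

A caller could treat a non-empty result as a signal that the running JVM was started without the JARs it needs.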
- tikara.util.java.input_stream_as_binary_stream(java_input_stream: InputStream) → BinaryIO
Convert a Java InputStream to a Python binary stream.
- Parameters:
java_input_stream (InputStream) – The Java InputStream to convert.
- Returns:
The Python binary stream that reads from the Java InputStream.
- Return type:
BinaryIO
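One way such a conversion can work is to wrap the stream in an `io.RawIOBase` subclass. The sketch below is an assumption about the general technique, not tikara's code: `FakeInputStream` is a hypothetical stand-in that mimics `java.io.InputStream.read(byte[])`, so the example runs without a JVM, and `InputStreamAdapter` is an illustrative name.

```python
import io


class FakeInputStream:
    """Hypothetical stand-in for java.io.InputStream, for illustration only."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def read(self, buffer: bytearray) -> int:
        # Mirrors InputStream.read(byte[]): fill the buffer and return the
        # number of bytes read, or -1 at end of stream.
        if self._pos >= len(self._data):
            return -1
        chunk = self._data[self._pos : self._pos + len(buffer)]
        buffer[: len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    def close(self) -> None:
        pass


class InputStreamAdapter(io.RawIOBase):
    """Expose an InputStream-like object as a readable Python raw stream."""

    def __init__(self, stream) -> None:
        super().__init__()
        self._stream = stream

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        scratch = bytearray(len(b))
        count = self._stream.read(scratch)
        if count <= 0:
            return 0  # Python signals end of stream with 0, not -1
        b[:count] = scratch[:count]
        return count

    def close(self) -> None:
        if not self.closed:
            self._stream.close()
        super().close()


stream = io.BufferedReader(InputStreamAdapter(FakeInputStream(b"hello from the JVM")))
first = stream.read(5)
rest = stream.read()
```

Wrapping the adapter in `io.BufferedReader` supplies `read()`, `readline()`, and buffering on top of the single `readinto()` method.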
- tikara.util.java.input_stream_to_file(input_stream: InputStream, output_file: Path) → Path
Stream the contents of a Java InputStream to a file.
- Parameters:
input_stream (InputStream) – The Java InputStream to read from.
output_file (Path) – The file to write the contents to. The file will be overwritten.
- Returns:
The path to the output file.
- Return type:
Path
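The overwrite-and-return-path behavior can be sketched in plain Python with a chunked copy. This is a minimal illustration under stated assumptions, not the library's implementation: `binary_stream_to_file` is a hypothetical helper, and an `io.BytesIO` object stands in for a stream obtained from Java.

```python
import io
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO


def binary_stream_to_file(stream: BinaryIO, output_file: Path, chunk_size: int = 64 * 1024) -> Path:
    """Copy a binary stream to output_file in chunks and return the path."""
    # Mode "wb" truncates an existing file, matching the documented overwrite behavior.
    with output_file.open("wb") as out:
        shutil.copyfileobj(stream, out, chunk_size)
    return output_file


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "extracted.bin"
    target.write_bytes(b"stale contents that should disappear")
    written = binary_stream_to_file(io.BytesIO(b"fresh bytes"), target)
    result = written.read_bytes()
    same_path = written == target
```

Copying in fixed-size chunks keeps memory use bounded regardless of how large the source stream is.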
- tikara.util.java.output_stream_or_reader_stream_to_file(source: Reader | ByteArrayOutputStream, output_file: Path) → Path
Stream the contents of a Java Reader or ByteArrayOutputStream to a file.
- Parameters:
source (Reader | ByteArrayOutputStream) – Either a Java Reader or ByteArrayOutputStream to read from.
output_file (Path) – The file to write the contents to. The file will be overwritten.
- Returns:
The path to the output file.
- Return type:
Path
- tikara.util.java.output_stream_to_reader(java_output_stream: ByteArrayOutputStream) → Reader
Convert a Java ByteArrayOutputStream to a Java Reader.
- Parameters:
java_output_stream (ByteArrayOutputStream) – The Java output stream containing the data.
- Returns:
A Java Reader that can read the output stream’s contents.
- Return type:
Reader
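- tikara.util.java.read_to_string(source: Reader | ByteArrayOutputStream) → str
Read content into a Python string.
- Parameters:
source (Reader | ByteArrayOutputStream) – Either a Java Reader or ByteArrayOutputStream to read from.
- Returns:
The content of the source as a Python string.
- Return type:
str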
- tikara.util.java.reader_as_binary_stream(source: Reader | ByteArrayOutputStream) → BinaryIO
Convert a Java Reader or ByteArrayOutputStream to a Python binary stream.
- Parameters:
source (Reader | ByteArrayOutputStream) – Either a Java Reader or ByteArrayOutputStream to convert.
- Returns:
The Python binary stream that reads from the source.
- Return type:
BinaryIO
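A Reader yields characters rather than bytes, so exposing it as a binary stream involves an encoding step. The sketch below shows one possible approach in plain Python. It is not tikara's implementation: `EncodingReader` is a hypothetical name, `io.StringIO` stands in for a Java Reader, and UTF-8 is an assumed encoding that the documentation above does not confirm.

```python
import io


class EncodingReader(io.RawIOBase):
    """Expose a character reader as a raw byte stream by encoding on the fly."""

    def __init__(self, reader, encoding: str = "utf-8", chunk_chars: int = 4096) -> None:
        super().__init__()
        self._reader = reader
        self._encoding = encoding  # assumption: UTF-8 unless told otherwise
        self._chunk_chars = chunk_chars
        self._pending = b""  # encoded bytes not yet handed to the caller

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        if not self._pending:
            text = self._reader.read(self._chunk_chars)
            if not text:
                return 0  # end of stream
            self._pending = text.encode(self._encoding)
        count = min(len(b), len(self._pending))
        b[:count] = self._pending[:count]
        self._pending = self._pending[count:]
        return count


# io.StringIO plays the role of the character source; a tiny chunk size
# exercises the path where the text is encoded piece by piece.
source_text = "naïve café"
binary = io.BufferedReader(EncodingReader(io.StringIO(source_text), chunk_chars=3))
payload = binary.read()
```

Encoding whole chunks of characters means a multi-byte character is never split at the encoding step; leftover bytes are simply held in `_pending` until the next read.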
tikara.util.misc module
Miscellaneous utility functions.
tikara.util.tika module
Collection of utility functions and classes for interacting with the underlying Apache Tika library.
Module contents
Collection of utility classes and functions.