tikara.util package

Submodules

tikara.util.java module

Java and JVM utilities mostly focused on I/O operations.

tikara.util.java.get_jars() → list[Path]

Get path to bundled Tika JAR file(s).

Parameters:

tika_version (str) – The version of Tika to use.

Returns:

The list of paths to the Tika JAR file(s) to be included in the JVM classpath.

Return type:

list[Path]

tikara.util.java.initialize_jvm(tika_jar_override: Path | None = None, extra_jars: list[Path] | None = None) → None

Initialize the JVM.

Attempts to start the JVM with the Tika JAR file(s) on the classpath. If the JVM is already running, checks whether the Tika JAR file(s) are on its classpath.
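The start-or-verify behavior described above can be modeled in plain Python. This is a minimal sketch, not tikara's implementation: `FakeJvm` and `ensure_jvm` are hypothetical names, the JAR file name is invented, and raising an error when a running JVM lacks a required JAR is an assumption, since the text only says the classpath is checked.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FakeJvm:
    """Hypothetical stand-in for a JVM handle; not part of tikara or JPype."""

    started: bool = False
    classpath: list = field(default_factory=list)

    def start(self, classpath: list) -> None:
        self.started = True
        self.classpath = list(classpath)


def ensure_jvm(jvm: FakeJvm, required_jars: list) -> None:
    """Start the JVM once; on later calls only verify the classpath."""
    if not jvm.started:
        jvm.start(required_jars)
        return
    missing = [jar for jar in required_jars if jar not in jvm.classpath]
    if missing:
        # Assumption: the classpath of a running JVM is not extended here,
        # so a missing JAR is reported as an error.
        raise RuntimeError(f"JVM already running without: {missing}")


jvm = FakeJvm()
jars = [Path("tika-app.jar")]  # hypothetical file name
ensure_jvm(jvm, jars)  # first call starts the JVM
ensure_jvm(jvm, jars)  # second call only checks the classpath
```

Making the function safe to call repeatedly lets every entry point call it without tracking whether the JVM is already up.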

tikara.util.java.input_stream_as_binary_stream(java_input_stream: InputStream) → BinaryIO

Convert a Java InputStream to a Python binary stream.

Parameters:

java_input_stream (InputStream) – The Java InputStream to convert.

Returns:

The Python binary stream that reads from the Java InputStream.

Return type:

BinaryIO
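One common way to build such an adapter is to subclass `io.RawIOBase` and translate Java's "return -1 at end of stream" convention into Python's "return 0". The sketch below assumes nothing about how tikara does it: `FakeInputStream` is a hypothetical pure-Python stand-in that mimics `java.io.InputStream.read(byte[])`, and `InputStreamAdapter` is an illustrative name.

```python
import io


class FakeInputStream:
    """Hypothetical stand-in mimicking java.io.InputStream.read(byte[])."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def read(self, buffer: bytearray) -> int:
        # Java convention: number of bytes read, or -1 at end of stream.
        if self._pos >= len(self._data):
            return -1
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    def close(self) -> None:
        pass


class InputStreamAdapter(io.RawIOBase):
    """Expose a Java-style stream through Python's raw I/O interface."""

    def __init__(self, stream: FakeInputStream) -> None:
        super().__init__()
        self._stream = stream

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        scratch = bytearray(len(b))
        n = self._stream.read(scratch)
        if n <= 0:
            return 0  # Python signals end of stream with 0, not -1
        b[:n] = scratch[:n]
        return n

    def close(self) -> None:
        if not self.closed:
            self._stream.close()
        super().close()


# BufferedReader adds buffering and the usual read()/readline() methods.
binary = io.BufferedReader(InputStreamAdapter(FakeInputStream(b"hello tika")))
first = binary.read(5)
rest = binary.read()
```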

tikara.util.java.input_stream_to_file(input_stream: InputStream, output_file: Path) → Path

Stream the contents of a Java InputStream to a file.

Parameters:
  • input_stream (InputStream) – The Java InputStream to read from.

  • output_file (Path) – The file to write the contents to. An existing file is overwritten.

Returns:

The path to the output file.

Return type:

Path
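The copy-and-overwrite behavior can be illustrated with a chunked loop over any binary stream. In this sketch `io.BytesIO` stands in for the Java InputStream, and `stream_to_file` and its `chunk_size` parameter are hypothetical names, not part of tikara.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO


def stream_to_file(source: BinaryIO, output_file: Path, chunk_size: int = 8192) -> Path:
    """Copy a binary stream to output_file in fixed-size chunks."""
    # Mode "wb" truncates an existing file, matching the documented overwrite.
    with output_file.open("wb") as sink:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            sink.write(chunk)
    return output_file


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out.bin"
    target.write_bytes(b"stale contents that should disappear")
    result = stream_to_file(io.BytesIO(b"fresh data"), target, chunk_size=4)
    written = result.read_bytes()
```

Reading in chunks keeps memory use bounded regardless of how large the source is.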

tikara.util.java.output_stream_or_reader_stream_to_file(source: Reader | ByteArrayOutputStream, output_file: Path) → Path

Stream the contents to a file.

Parameters:
  • source (Reader | ByteArrayOutputStream) – Either a Java Reader or a ByteArrayOutputStream to read from.

  • output_file (Path) – The file to write the contents to. An existing file is overwritten.

Returns:

The path to the output file.

Return type:

Path

tikara.util.java.output_stream_to_reader(java_output_stream: ByteArrayOutputStream) → Reader

Convert a Java ByteArrayOutputStream to a Java Reader.

Parameters:

java_output_stream (ByteArrayOutputStream) – The Java output stream containing data

Returns:

A Java Reader that can read the output stream’s contents

Return type:

Reader
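A Python analogue of this conversion is to snapshot the bytes written so far and wrap them in a text reader. The sketch below uses `io.BytesIO` in place of ByteArrayOutputStream and `io.TextIOWrapper` in place of a Java Reader; `buffer_to_reader` is a hypothetical name, and UTF-8 is an assumed encoding that the text above does not specify.

```python
import io


def buffer_to_reader(buffer: io.BytesIO, encoding: str = "utf-8") -> io.TextIOWrapper:
    """Return a text reader over a snapshot of the bytes written so far."""
    snapshot = io.BytesIO(buffer.getvalue())  # copy, so later writes do not leak in
    return io.TextIOWrapper(snapshot, encoding=encoding)


out = io.BytesIO()
out.write("café\n".encode("utf-8"))
reader = buffer_to_reader(out)
out.write(b"ignored")  # written after the snapshot was taken
text = reader.read()
```

Copying the bytes first means the reader sees a stable view even if the output buffer keeps growing.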

tikara.util.java.read_to_string(source: Reader | ByteArrayOutputStream) → str

Read content into a Python string.

Parameters:

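source (Reader | ByteArrayOutputStream) – Either a Java Reader or a ByteArrayOutputStream to read from.

Returns:

The content as a Python string.

Return type:

str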
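Accepting two source types usually comes down to a type check followed by either a decode or a drain loop. The following sketch shows that dispatch with Python stand-ins (`io.StringIO` for a Reader, `io.BytesIO` for a ByteArrayOutputStream); `read_all` is a hypothetical name and the UTF-8 default is an assumption.

```python
import io
from typing import Union


def read_all(source: Union[io.TextIOBase, io.BytesIO], encoding: str = "utf-8") -> str:
    """Return the full content of a text reader or a byte buffer as str."""
    if isinstance(source, io.BytesIO):
        # Byte buffer: decode everything written so far.
        return source.getvalue().decode(encoding)
    # Text reader: drain it in chunks until it reports end of stream.
    parts = []
    while True:
        chunk = source.read(4096)
        if not chunk:
            break
        parts.append(chunk)
    return "".join(parts)


from_reader = read_all(io.StringIO("plain text"))
from_buffer = read_all(io.BytesIO("naïve".encode("utf-8")))
```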

tikara.util.java.reader_as_binary_stream(source: Reader | ByteArrayOutputStream) → BinaryIO

Convert a Java Reader or ByteArrayOutputStream to a Python binary stream.

Parameters:

source (Reader | ByteArrayOutputStream) – Either a Java Reader or a ByteArrayOutputStream to convert.

Returns:

The Python binary stream that reads from the source.

Return type:

BinaryIO
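Turning a character source into a byte stream requires encoding text on the fly. A minimal sketch of that idea, with `io.StringIO` standing in for the Java Reader: `ReaderAsBinary` is a hypothetical class, the 1024-character chunk size is arbitrary, and UTF-8 is an assumed encoding. Encoding chunk by chunk like this is only safe for encodings without a per-call prefix such as a byte order mark.

```python
import io


class ReaderAsBinary(io.RawIOBase):
    """Serve the text of a reader as encoded bytes, one chunk at a time."""

    def __init__(self, reader: io.TextIOBase, encoding: str = "utf-8") -> None:
        super().__init__()
        self._reader = reader
        self._encoding = encoding
        self._pending = b""  # encoded bytes not yet handed to the caller

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        if not self._pending:
            chars = self._reader.read(1024)
            if not chars:
                return 0  # end of stream
            self._pending = chars.encode(self._encoding)
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


binary = io.BufferedReader(ReaderAsBinary(io.StringIO("héllo wörld")))
payload = binary.read()
```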

tikara.util.misc module

Miscellaneous utility functions.

tikara.util.tika module

Collection of utility functions and classes for interacting with the underlying Apache Tika library.

Module contents

Collection of utility classes and functions.