Pitfalls of Processing a Stream from an External Program

How do you design a standalone program that produces a large amount of binary data, and what are the pitfalls of this approach?


A good example is a file converter (for images, MP3s, documents, etc.).

Standalone Producer Application

There are many ways to create a standalone application; one of the easiest and most straightforward is Spring Boot (pom.xml):

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <!-- Your own application should inherit from spring-boot-starter-parent -->
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.7.RELEASE</version>
    </parent>         
    <groupId>cz.net21.ttulka.eval</groupId>
    <artifactId>StandaloneBytesProducer</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.16.18</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

StandaloneBytesProducerApplication.java:

package cz.net21.ttulka.eval.bytesproducer;

import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import lombok.extern.apachecommons.CommonsLog;

@SpringBootApplication
@CommonsLog
public class StandaloneBytesProducerApplication implements ApplicationRunner {
    @Override
    public void run(ApplicationArguments args) {
        log.info("StandaloneBytesProducerApplication started.");
        try {
            int bytesAmount = 1000;
            if (args.containsOption("bytes")) {
                bytesAmount = Integer.parseInt(args.getOptionValues("bytes").get(0));   
            }
            for (int i = 0; i < bytesAmount; i++) {
                System.out.write(i % Byte.MAX_VALUE);   // we're writing to the standard output stream
            }
            System.out.flush();   // make sure buffered bytes are written before System.exit
        } catch (Exception e) {
            log.error("Unexpected error.", e);
            System.exit(1);
        }
        System.exit(0);
    }
    public static void main(String[] args) throws Exception {
        SpringApplication.run(StandaloneBytesProducerApplication.class, args);
    }
}
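
Writing one byte per call is fine for a thousand bytes, but for large payloads a buffered stream with an explicit flush is usually the safer choice. The following is a minimal sketch, not part of the original project: the names `PatternWriter` and `writePattern` are made up for illustration, and the sketch assumes the same `i % Byte.MAX_VALUE` pattern as the loop above.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of the original project.
class PatternWriter {

    // Writes `amount` bytes that follow the producer's pattern (i % Byte.MAX_VALUE).
    static void writePattern(OutputStream target, long amount) throws IOException {
        OutputStream out = new BufferedOutputStream(target, 8192);
        for (long i = 0; i < amount; i++) {
            out.write((int) (i % Byte.MAX_VALUE));
        }
        // Flush explicitly: bytes still sitting in this buffer would be lost
        // if the JVM were stopped with System.exit() right afterwards.
        out.flush();
    }
}
```

In the producer, the loop could then be replaced by a call such as `PatternWriter.writePattern(System.out, bytesAmount);`.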

Compile and run it:

mvn clean package
mvn spring-boot:run

It looks good. Of course, a consumer will run it directly from the JAR:

java -jar target\StandaloneBytesProducer-1.0.0-SNAPSHOT.jar

The result is what we expected: the Spring Boot ASCII logo, some log messages and our byte stream.

And this is exactly the first pitfall: all this junk corrupts our result, because the byte stream is the only thing we need on the standard output.


Spring Boot uses its own logging setup (behind the commons-logging API), pulled in through the artifact spring-boot-starter-logging. To get rid of it we can exclude this artifact from the build:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>

When we run the program now, the log messages look different. With the Spring Boot logging excluded, commons-logging falls back to one of its built-in implementations (on a current JDK typically the java.util.logging wrapper, otherwise SimpleLog).

By default, both of these send all messages, for all defined loggers, to stderr. We can verify this by redirecting the standard output into a file:

java -jar target\StandaloneBytesProducer-1.0.0-SNAPSHOT.jar > out.dat

Indeed, the log messages are still written to the console, and the file contains only the Spring Boot logo and our bytes.
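
To see what actually landed in `out.dat`, the file content can be compared against the expected pattern. This is only a rough sketch: `PatternCheck` and `firstMismatch` are hypothetical names, and it assumes the producer's pattern of `i % Byte.MAX_VALUE`.

```java
// Hypothetical helper, not part of the original project.
class PatternCheck {

    // Returns the index of the first byte that deviates from i % Byte.MAX_VALUE,
    // or -1 if every byte matches the pattern.
    static int firstMismatch(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] != (byte) (i % Byte.MAX_VALUE)) {
                return i;
            }
        }
        return -1;
    }
}
```

A call such as `PatternCheck.firstMismatch(Files.readAllBytes(Paths.get("out.dat")))` would be expected to report a mismatch near the start of the file for as long as the logo precedes the bytes, and -1 once the output is clean.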

Getting rid of the logo is easy: just put an application.yml into the resources directory:

spring:
  main:
    banner-mode: "off"

Now the standard output contains only the result bytes. It's time to implement a consumer...
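
Configuration alone does not stop some library from calling `System.out.println` later on. A defensive option, sketched here as an assumption rather than something the original project does, is to keep the real standard output for the payload and point `System.out` at stderr for everything else. `StdoutGuard` and `reserveStdout` are hypothetical names.

```java
import java.io.PrintStream;

// Hypothetical helper, not part of the original project.
class StdoutGuard {

    // Returns the real standard output and redirects System.out to stderr,
    // so that later println calls can no longer mix text into the binary payload.
    static PrintStream reserveStdout() {
        PrintStream payload = System.out;
        System.setOut(System.err);
        return payload;
    }
}
```

The idea would be to call `reserveStdout()` as the first statement of `main`, write the bytes only to the returned stream, and flush it before exiting.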

Standalone Consumer Application

The consumer can be built in the same manner; this time we don't care much about logging (pom.xml):

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <!-- Your own application should inherit from spring-boot-starter-parent -->
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.7.RELEASE</version>
    </parent>         
    <groupId>cz.net21.ttulka.eval</groupId>
    <artifactId>StandaloneBytesConsumer</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.16.18</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project> 

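An important pitfall to be aware of here: all of the producer's output streams (stdout and stderr) must be consumed. Each of them reaches the consumer through a pipe with a limited buffer. If nobody reads stderr, the producer can block as soon as that buffer is full, and the consumer then waits forever for the rest of stdout.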
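
If the producer's log output is of no interest to the consumer, one way to avoid the blocked pipe is to not create a stderr pipe at all. The sketch below uses `ProcessBuilder.redirectError` with `Redirect.INHERIT`, which lets the child write directly to the consumer's own stderr. `ProducerLauncher` and `forJar` are made-up names, and the sketch assumes that `java` is on the PATH, as the consumer in this article does.

```java
import java.lang.ProcessBuilder.Redirect;

// Hypothetical helper, not part of the original project.
class ProducerLauncher {

    // Builds (but does not start) a process whose stderr goes straight to the parent's stderr.
    // Only stdout remains a pipe that the consumer has to read.
    static ProcessBuilder forJar(String pathToJar) {
        ProcessBuilder builder = new ProcessBuilder("java", "-jar", pathToJar);
        builder.redirectError(Redirect.INHERIT);
        return builder;
    }
}
```

Note that `redirectErrorStream(true)` would also remove the second pipe, but it merges the log text into stdout and would corrupt the binary payload again.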

The error output can be either consumed and discarded, or consumed and printed into the log (StandaloneBytesConsumerApplication.java):

package cz.net21.ttulka.eval.bytesconsumer;

import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import lombok.extern.apachecommons.CommonsLog;

@SpringBootApplication
@CommonsLog
public class StandaloneBytesConsumerApplication implements ApplicationRunner {
    @Override
    public void run(ApplicationArguments args) {        
        String pathToJar = System.getProperty("PATH_TO_JAR");
        log.info("StandaloneBytesConsumerApplication started: " + pathToJar);
        ProcessBuilder builder = new ProcessBuilder("java", "-jar", pathToJar);        
        try {
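            Process process = builder.start();
            processErrors(process.getErrorStream());    // drained on a separate thread
            processStream(process.getInputStream());    // drained on the current thread
            int exitCode = process.waitFor();           // both pipes are being read, so the producer can finish
            if (exitCode != 0) {
                log.error("Producer exited with code " + exitCode);
                System.exit(exitCode);
            }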
        } catch (Exception e) {
            log.error("Unexpected error.", e);
            System.exit(1);
        }
        System.exit(0);
    }
    private void processStream(InputStream stream) throws IOException {
        int b;
        while ((b = stream.read()) != -1) {
            // TODO do something with the stream
        }
        stream.close();
    }
    private void processErrors(final InputStream in) {
        new Thread(new Runnable() {
            @Override
            public void run() {
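                // 0 - ERROR, 1 - WARN, 2 - INFO, 3 - DEBUG; the level is kept across lines,
                // so continuation lines (e.g. stack traces) are logged at the level of their message
                int logLevel = 3;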
                Scanner scanner = new Scanner(in);
                while (scanner.hasNextLine()) {
                    String line = scanner.nextLine();
                    if (line.startsWith("ERROR") || line.startsWith("FATAL")) {
                        logLevel = 0;
                    }
                    if (line.startsWith("WARN")) {
                        logLevel = 1;
                    }
                    if (line.startsWith("INFO")) {
                        logLevel = 2;
                    }
                    if (line.startsWith("DEBUG") || line.startsWith("TRACE")) {
                        logLevel = 3;
                    }
                    switch (logLevel) {
                        case 0:
                            log.error(line);
                            break;
                        case 1:
                            log.warn(line);
                            break;
                        case 2:
                            log.info(line);
                            break;
                        default:
                            log.debug(line);
                            break;
                    }
                }
            }
        }).start();
    }
    public static void main(String[] args) throws Exception {
        SpringApplication.run(StandaloneBytesConsumerApplication.class, args);
    }
} 
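
Two details of the consumer above could be tightened: reading one byte per call is slow for large payloads, and `System.exit` may run before the stderr thread has logged everything. The following sketch shows both pieces in isolation. `StreamPump`, `copy` and `drainLines` are hypothetical helpers, not code from the original project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helpers, not part of the original project.
class StreamPump {

    // Copies everything from `in` to `out` in 8 KiB chunks and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    // Starts a thread that hands every line of `in` to `sink`; the caller can join() it later.
    static Thread drainLines(InputStream in, Consumer<String> sink) {
        Thread thread = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.accept(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "stderr-drain");
        thread.start();
        return thread;
    }
}
```

In the consumer, the thread returned by `drainLines(process.getErrorStream(), ...)` could be joined after `copy(process.getInputStream(), ...)` has returned, so that no log line is cut off by the final `System.exit`.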

Compile and run it:

mvn clean package
mvn spring-boot:run -DPATH_TO_JAR=..\StandaloneBytesProducer\target\StandaloneBytesProducer-1.0.0-SNAPSHOT.jar

Source code: StandaloneBytesProducer and StandaloneBytesConsumer

Happy byting!