>In this entry I set out to show how a BufferedOutputStream can improve performance over an OutputStream.
Somebody presented a challenge "Fastest way to find all the prime numbers up to 1 billion without exceeding 64mb of ram". So I wrote a version of the Sieve Of Eratosthenes. To meet the 64megs of ram criteria I wrote a version of the sieve that uses chunks of the sieve and writes prime numbers to a file at intermediate steps. The more chunks, the less ram the program needs. I set the program to run so that it would only use one large chunk to find all of the prime numbers.
When the program was running I noticed it was taking a long time, but it was writing prime numbers relatively soon. That meant that it had found all of the prime numbers and it was just being held up on the io. I had assumed the io would be fast because I was writing each prime number as four bytes using a DataOutputStream.
DataOutputStream stream = new DataOutputStream( new FileOutputStream(outfile,true) ); for(int i = 0; i<new_primes; i++){ stream.writeInt(working[i]); } stream.close();
The implementation ended up taking almost 3 hours to run. A majority of that time was spent reading and writing. To put this in context it was taking about 3 minutes to write a 200meg file.
I tried using an actual output stream and to do my own int->byte[], I tried getting the output stream from the .nio package but the main thing that helped was to use a buffered writer.
DataOutputStream stream = new DataOutputStream( new BufferedOutputStream( Files.newOutputStream( Paths.get(outfile), StandardOpenOption.APPEND ) ) );
The only real change that I made was to wrap the OutputStream in a BufferedOutputStream and it changed my implementation from nearly 3 hours to run down to 3 minutes. Writing a 200meg file went from 3 minutes down to 1.3 seconds.
If you are wondering why this is it is because of the number of system calls. From the javadoc for buffered output stream:
"""
By setting up such an output stream, an application can write bytes to the underlying output stream without necessarily causing a call to the underlying system for each byte written.BufferedOutputStream
By using a loop that writes 4 bytes per int we end up making a system call on every iteration. The buffered writer uses a byte[] to buffer the data, so there are less system calls. The number of system calls will manifest itself in two ways: the size of the buffer can influence the speed and or we could use our own buffering strategy and possibly speed up the process.
Here is an example of three different ways to write int[] to a file as bytes: i) Using a BufferedOutputStream, ii) Writing a single byte[] to the file, and iii) Writing each int using an un-buffered DataOutputStream. The differences are large enough that System.currentTimeMillis().
The output for the following program (found here for a formatted version.)
buffered: 4854 one chunk: 1343 non-buffered: 342481
The output from time is:
real 5m48.912s user 1m30.400s sys 4m18.530s
This demonstrates that a majority of the time is spent making system calls. My "buffering strategy" is actually just creating one large buffer and writing the whole byte array.
tl;dr; The reason to use a BufferedOutputStream instead of an OutputStream is to reduce the number of system calls.
import java.io.*; import java.nio.*; import java.nio.file.*; import java.nio.charset.*; public class CompareBuffered{ public static void main(String[] args){ int[] values = new int[50000000]; long start, end; start = System.currentTimeMillis(); writeBuffered(values); end = System.currentTimeMillis(); System.out.println("buffered: " + (end - start)); start = System.currentTimeMillis(); writeChunk(values); end = System.currentTimeMillis(); System.out.println("one chunk: " + (end - start)); start = System.currentTimeMillis(); writeNoBuffer(values); end = System.currentTimeMillis(); System.out.println("non-buffered: " + (end - start)); } /** * This will write the bytes of an integer array to a file. It uses a * data output stream w/out a buffered stream. * * @param values the array of ints that will be written to a file. * **/ static void writeNoBuffer(int[] values){ try{ DataOutputStream stream = new DataOutputStream( Files.newOutputStream(Paths.get("non-buffered.bin")) ); for(int i = 0; i<values.length; i++){ stream.writeInt(values[i]); } stream.close(); } catch(Exception e){ e.printStackTrace(); System.exit(1); } } /** * Writes the bytes of an integer array, this uses a buffered output * stream. * * @param values the array of ints that will be written to a file. * **/ static void writeBuffered(int[] values){ try{ DataOutputStream stream = new DataOutputStream( new BufferedOutputStream( Files.newOutputStream(Paths.get("buffered.bin")) ) ); for(int i = 0; i<values.length; i++){ stream.writeInt(values[i]); } stream.close(); } catch(Exception e){ e.printStackTrace(); System.exit(1); } } /** * Writes the bytes of an integer array, this uses a non-buffered * output stream, but it writes all of the data in one array of bytes. * * @param values the array of ints that will be written to a file. * **/ static void writeChunk(int[] values){ byte[] bytes = new byte[4*values.length]; for(int i = 0; i<values.length; i++){ for(int j = 0; j<4; j++){ bytes[4*i + j] = (byte)(values[i]>>(8*(3-j))); } } try{ OutputStream stream = Files.newOutputStream(Paths.get("chunk.bin")); stream.write(bytes); stream.close(); } catch(Exception e){ e.printStackTrace(); System.exit(1); } } }