What are Arrays good for in java?

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
Just seems like ArrayList is better in every way, is it an efficiency thing or what?

Am i right in thinking that there's nothing an Array can do that an ArrayList cant also do? ArrayList syntax is in line with the rest of the java collection framework as well.
 

dwell

pics?
Oct 9, 1999
5,185
2
0
Shouldn't be a shocker to anyone:

Code:
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
...
    private transient Object[] elementData;
...
}

Measuring the overhead (unscientifically):

Code:
public class Main
{
    public static void main(String[] args) throws Exception
    {
        List<String> list = new ArrayList<String>();

        list.add("One");
        list.add("Two");
        list.add("Three");
        list.add("Four");
        list.add("Five");

        String[] arr = {"One", "Two", "Three", "Four", "Five"};

        System.out.println(sizeOf(list));
        System.out.println(sizeOf(arr));
    }

    public static int sizeOf(Object obj) throws java.io.IOException
    {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);

        oos.writeObject(obj);
        oos.flush();
        oos.close();
        baos.close();

        return baos.toByteArray().length;
    }
}

Output:

Code:
92
78

14 bytes overhead? Not so bad, if accurate. Might have to find a better way to get the size of an object in Java.
 

Leros

Lifer
Jul 11, 2004
21,867
7
81
Overhead matters a lot more when you're dealing with larger amounts of data. ArrayLists will do things like resize the internal container which can result in using more space than necessary and also takes CPU cycles. Also ArrayLists can't hold primitives, which adds extra overhead (an Integer is larger than an int).

Let's compare an array of a million ints to an ArrayList of a million Integers. Borrowing dwell's sizeOf() method:

Code:
static final int MILLION = 1000000;

public static void main(String[] args) throws Exception {
    int[] array = new int[MILLION];
    ArrayList<Integer> list = new ArrayList<Integer>();
    for (int i = 0; i < MILLION; ++i) {
        array[i] = i;
        list.add(i);
    }
    System.out.println(sizeOf(array));
    System.out.println(sizeOf(list));
}

Output:
Code:
4000027
10000125

The ArrayList is 2.5x larger.

The int[] takes 3.8 MB and the ArrayList takes 9.5 MB. In a larger applicaiton, that really adds up. It could be the difference between your program using 400MB or 1000MB.
 
Last edited:

Danimal1209

Senior member
Nov 9, 2011
355
0
0
I prefer ArrayList since it expands and contracts if you need to add or take away an object/variable.

Arrays need to be set in size when created, ArrayList does not.
 

Leros

Lifer
Jul 11, 2004
21,867
7
81
There is also a decent speed difference.

Measuring how long it takes to insert 1 million integers into an array vs an ArrayList (including allocation of the array/list):

Code:
static final int MILLION = 1000000;

public static void main(String[] args) throws Exception {
    // array
    long start = System.currentTimeMillis();
    int[] array = new int[MILLION];
    for (int i = 0; i < MILLION; ++i) {
        array[i] = i;
    }
    long end = System.currentTimeMillis();
    System.out.println("Array took " + (end-start) + " ms");

    // ArrayList
    start = System.currentTimeMillis();
    ArrayList<Integer> list = new ArrayList<Integer>();
    for (int i = 0; i < MILLION; ++i) {
        list.add(i);
    }
    end = System.currentTimeMillis();
    System.out.println("List took " + (end-start) + " ms");
}

Output:

Code:
Array took 16 ms
List took 117 ms

The ArrayList took 7.3 times longer
 
Last edited:

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
Overhead matters a lot more when you're dealing with larger amounts of data. ArrayLists will do things like resize the internal container which can result in using more space than necessary and also takes CPU cycles. Also ArrayLists can't hold primitives, which adds extra overhead (an Integer is larger than an int).

Let's compare an array of a million ints to an ArrayList of a million Integers. Borrowing dwell's sizeOf() method:

Code:
static final int MILLION = 1000000;

public static void main(String[] args) throws Exception {
    int[] array = new int[MILLION];
    ArrayList<Integer> list = new ArrayList<Integer>();
    for (int i = 0; i < MILLION; ++i) {
        array[i] = i;
        list.add(i);
    }
    System.out.println(sizeOf(array));
    System.out.println(sizeOf(list));
}

Output:
Code:
4000027
10000125

The ArrayList is 2.5x larger.

The int[] takes 3.8 MB and the ArrayList takes 9.5 MB. In a larger applicaiton, that really adds up. It could be the difference between your program using 400MB or 1000MB.

Hmm interesting, ill keep this in mind. I prefer ArrayLists for their flexibility as im still a noob at this but ill definitely try to use an Array if possible.
 

Leros

Lifer
Jul 11, 2004
21,867
7
81
Hmm interesting, ill keep this in mind. I prefer ArrayLists for their flexibility as im still a noob at this but ill definitely try to use an Array if possible.

It all depends on the application. If you're trying to manage a few hundred things, by all means, use the ArrayList for its convenience. If you're working megabytes of data, start thinking about efficiency.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
It all depends on the application. If you're trying to manage a few hundred things, by all means, use the ArrayList for its convenience. If you're working megabytes of data, start thinking about efficiency.

It's fine to think about efficiency, but few programmers--especially beginners--really need to worry about performance as a first-order constraint. Focus on doing the thing that:
- Has the simplest algorithmic complexity,
- Is the easiest to read,
- Is the easiest to write.

Premature optimization is the root of all evil.
 

EagleKeeper

Discussion Club Moderator<br>Elite Member
Staff member
Oct 30, 2000
42,589
5
0
ArrayLists has built in overhead and CPU cycles that are used trying to optimize the use of the list as efficiently as possible.
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
The overhead of access while there isn't normally a good reason to choose one over the other. The main reason is for the storage of primitives where arrays are significantly smaller as they don't have to box them in objects.

Most of the times you should choose a List, Set or Map implementation, and the defaults of ArrayList, HashSet and HashMap will do 95% of everything.
 

esun

Platinum Member
Nov 12, 2001
2,214
0
0
I haven't written Java in a long time, but based on my reading the ArrayList is analogous to the C++ std::vector, meaning it is a dynamically resizable array (and not, for example, a linked list).

This means that if you pre-allocate by setting the capacity ahead of time, then inserting a million elements should be as fast as a plain old array (or very close). The size difference should also not be large if you pre-allocate. Of course if you don't, then based on the doubling scheme the size will be something like the next power of 2, which could be much larger than you need.

I'd go read up on dynamic array implementations to understand the performance of dynamically resizing arrays. Also read up on linked lists to understand why their overhead is often non-trivial, unlike ArrayLists or std::vectors.
 

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
Hmm interesting, ill keep this in mind. I prefer ArrayLists for their flexibility as im still a noob at this but ill definitely try to use an Array if possible.

Ive been toying around with your code in java and telling the ArrayList its size beforehand as esun suggested seems to speed it up by around 50%, consecutive runs speed it up more, im assuming because the needed data is already in memory after hitting run a few times i get:

Array took 5 ms
List took 12 ms

Just wondering if you made your own sizeOf() method? I dont appear to have one and googling it tells me java hasent got one by default :(
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
Code:
public class ArraySizesTest {
	private static int MB = 1024*1024;
	
	@Test
	public void shouldCreateArray() throws Exception {
		byte[] array = new byte[10*MB];
		for(int i=0;i<array.length;++i) {
			array[i] = (byte)(i % 128);
		}
		
		measure("array");
		
		for(int i=0;i<array.length;++i) {
			array[i] = 0;
		}
	}
	
	@Test
	public void shouldCreateArrayList() throws Exception {
		ArrayList<Byte> arrayList = new ArrayList<Byte>(10*MB);
		for(int i=0;i<arrayList.size();++i) {
			arrayList.set(i, (byte)(i % 128));
		}
		
		measure("arrayList");
		
		for(int i=0;i<arrayList.size();++i) {
			arrayList.set(i,(byte) 0);
		}
	}
	
	private void measure(String testName) {
		System.gc();
		System.gc();
		Runtime runtime = Runtime.getRuntime();
		long usedMemory = runtime.totalMemory()- runtime.freeMemory();
		
		System.out.println(testName+": " +usedMemory +" bytes");
	}
}

Here you see a technique for determine the memory used. There is no sizeOf that works reliably, so we measure the memory used in microbenchmarks like these.

For me this outputs

array: 12275216 bytes
arrayList: 42388408 bytes

And Junit is telling me that timings of:
shouldCreateArray = 0.020s
shouldCreateArrayList = 0.130s

Bytes is about the worst example you can have for a collection, because the overhead is at its maximum. You'll see less of a difference with longs and doubles than you do with the smaller data types.
 

Leros

Lifer
Jul 11, 2004
21,867
7
81
I haven't written Java in a long time, but based on my reading the ArrayList is analogous to the C++ std::vector, meaning it is a dynamically resizable array (and not, for example, a linked list).

This means that if you pre-allocate by setting the capacity ahead of time, then inserting a million elements should be as fast as a plain old array (or very close). The size difference should also not be large if you pre-allocate. Of course if you don't, then based on the doubling scheme the size will be something like the next power of 2, which could be much larger than you need.

I'd go read up on dynamic array implementations to understand the performance of dynamically resizing arrays. Also read up on linked lists to understand why their overhead is often non-trivial, unlike ArrayLists or std::vectors.

The main difference is that collections in Java can hold objects. I think C++ collections can hold primitives. There is a big difference in storing bunch of ints as compared to making a bunch of Integer objects and then storing those.
 

Markbnj

Elite Member <br>Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
The main difference is that collections in Java can hold objects. I think C++ collections can hold primitives. There is a big difference in storing bunch of ints as compared to making a bunch of Integer objects and then storing those.

It's a template class...

Code:
template < class T, class Allocator = allocator<T> > class vector;

So it can be instantiated for any type of object (there may be some constraints I'm not aware of any longer, but generally any type of object).
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
Java has cheap generics where only one version of the code is created and autoboxing is used for dealing with primitives. It produces inefficient storage for primitives which is why arrays are still useful in Java. There actually are implementations of ArrayList that are native byte/int/short etc out there in libraries.

Scala does this in a different way on the JVM and can provide good storage and the flexibility with its implicit array like structures.
 

esun

Platinum Member
Nov 12, 2001
2,214
0
0
The main difference is that collections in Java can hold objects. I think C++ collections can hold primitives. There is a big difference in storing bunch of ints as compared to making a bunch of Integer objects and then storing those.

Perhaps a better way to say this is collections in Java can only store objects and not primitives, while C++ data structures can store both primitives and objects (although usually we store pointers to objects when possible to avoid unnecessary copies).