# python – How to access the ith column of a NumPy multidimensional array?

## The Question :


2021-01-12

Suppose I have:

    test = numpy.array([[1, 2], [3, 4], [5, 6]])

`test[i]` gets me the *ith* row of the array (e.g. `[1, 2]`). How can I access the *ith* column (e.g. `[1, 3, 5]`)? Also, would this be an expensive operation?

## The Answer 1

    >>> test[:,0]
    array([1, 3, 5])

Similarly,

    >>> test[1,:]
    array([3, 4])

lets you access rows. This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience, since the slice does not copy any data. It’s certainly much quicker than accessing each element in a loop.
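As a small sketch of why the slice is cheap (the variable names `col`, `row` and `col_loop` are only illustrative), the following checks that `test[:, i]` shares memory with `test` instead of copying it, and builds the same column with an element-by-element loop for comparison:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

i = 0
col = test[:, i]   # basic slice: selects column i without copying data
row = test[1, :]   # the same idea for a row

print(col)                           # column i as a 1-D array
print(col.shape)                     # one entry per row of test
print(np.shares_memory(col, test))   # the slice is a view onto test

# Loop-based alternative: same values, built one element at a time.
col_loop = np.array([test[r, i] for r in range(test.shape[0])])
print(np.array_equal(col, col_loop))
```

Because the slice is a view, writing to `col` would also change `test`; call `.copy()` on it if an independent array is needed.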

## The Answer 2

And if you want to access more than one column at a time, you could do:

    >>> test = np.arange(9).reshape((3,3))
    >>> test
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> test[:,[0,2]]
    array([[0, 2],
           [3, 5],
           [6, 8]])
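One thing worth knowing about the list form: indexing with a list of column numbers returns a new array, while a plain slice returns a view. A minimal sketch of the difference (the names `picked` and `sliced` are made up for this example):

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

picked = test[:, [0, 2]]   # list of indices (advanced indexing): a new array
sliced = test[:, 0:2]      # basic slice: a view onto test

picked_shares = np.shares_memory(picked, test)
sliced_shares = np.shares_memory(sliced, test)
print(picked_shares, sliced_shares)

picked[0, 0] = 100   # changes only the copy
sliced[0, 1] = -1    # writes through to test[0, 1]
print(test)
```

So selecting several arbitrary columns costs a copy of those columns, whereas a contiguous or regularly spaced range of columns can be taken with a slice at no copying cost.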

## The Answer 3

    >>> test[:,0]
    array([1, 3, 5])

This command gives you a 1-D array of shape `(3,)`, which behaves like a row vector. If you just want to loop over it, that’s fine, but if you want to `hstack` it with some other array of dimension 3xN, you will get

    ValueError: all the input arrays must have same number of dimensions

while

    >>> test[:,[0]]
    array([[1],
           [3],
           [5]])

gives you a column vector of shape `(3, 1)`, so that you can do a `concatenate` or `hstack` operation, e.g.

    >>> np.hstack((test, test[:,[0]]))
    array([[1, 2, 1],
           [3, 4, 3],
           [5, 6, 5]])
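For reference, here is a short sketch of a few equivalent ways to keep the column two-dimensional, plus the failing 1-D case. The variable names are illustrative only; all of the calls are standard NumPy:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_list = test[:, [0]]                   # list index keeps the axis (copy)
col_slice = test[:, 0:1]                  # length-1 slice keeps the axis (view)
col_newaxis = test[:, 0][:, np.newaxis]   # add the axis back explicitly
col_reshape = test[:, 0].reshape(-1, 1)   # reshape to one column

stacked = np.hstack((test, col_slice))
print(stacked)

# np.column_stack accepts the 1-D column directly and treats it as a column.
stacked_1d = np.column_stack((test, test[:, 0]))

# Stacking the 1-D column with hstack fails because the dimensions differ.
try:
    np.hstack((test, test[:, 0]))
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```

The slice form `test[:, 0:1]` avoids the copy that `test[:, [0]]` makes, which may matter for large arrays.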

## The Answer 4

You could also transpose and return a row:

    In [4]: test.T[0]
    Out[4]: array([1, 3, 5])

## The Answer 5

Although the question has been answered, let me mention some nuances.

Let’s say you are interested in one column (here, the column at index 1) of the array

    arr = numpy.array([[1, 2], [3, 4], [5, 6]])

As you already know from other answers, to get it in the form of a “row vector” (an array of shape `(3,)`), you use slicing:

    arr_col1_view = arr[:, 1]         # creates a view of column 1 of arr
    arr_col1_copy = arr[:, 1].copy()  # creates a copy of column 1 of arr

To check whether an array is a view or a copy of another array, you can do the following:

    arr_col1_view.base is arr  # True
    arr_col1_copy.base is arr  # False

See `ndarray.base`.

Besides the obvious difference between the two (modifying `arr_col1_view` will affect `arr`), the number of bytes stepped over when traversing each of them is different. With a 4-byte integer dtype such as `int32` (with the 8-byte `int64` dtype the values are 16 and 8):

    arr_col1_view.strides[0]  # 8 bytes
    arr_col1_copy.strides[0]  # 4 bytes
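Both differences can be seen in a few lines. This is only a sketch: the dtype is fixed to `int32` here (an assumption, so that the stride values do not depend on the platform's default integer type), and the names `view` and `copy` are illustrative:

```python
import numpy as np

# int32 is chosen explicitly so each element is 4 bytes on every platform.
arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)

view = arr[:, 1]
copy = arr[:, 1].copy()

view[0] = 20   # writes through: arr[0, 1] is now 20
copy[1] = 40   # arr is not affected

print(arr)
print(arr.strides)    # bytes to step per row and per column of arr
print(view.strides)   # the view steps one full row of arr per element
print(copy.strides)   # the copy is packed: one element per step
```

The view's stride equals the row stride of `arr` (two 4-byte elements), while the copy's stride is a single element.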

Why is this important? Imagine that you have a very big array `A` instead of `arr`:

    A = np.random.randint(2, size=(10000, 10000), dtype='int32')
    A_col1_view = A[:, 1]
    A_col1_copy = A[:, 1].copy()

and you want to compute the sum of all the elements of that column, i.e. `A_col1_view.sum()` or `A_col1_copy.sum()`. Using the copied version is much faster:

    %timeit A_col1_view.sum()  # ~248 µs
    %timeit A_col1_copy.sum()  # ~12.8 µs

This is due to the different stride lengths mentioned before: consecutive elements of the view are 40000 bytes apart in memory, whereas those of the copy are adjacent.

    A_col1_view.strides[0]  # 40000 bytes
    A_col1_copy.strides[0]  # 4 bytes

Although it might seem that using column copies is better, that is not always true, because making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create `A_col1_copy`). However, if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are OK with sacrificing memory for speed, then making a copy is the way to go.
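Outside IPython, a comparison along these lines can be run with the standard `timeit` module. This is a rough sketch with a smaller array (2000 x 2000, chosen here only to keep memory use modest); the absolute numbers will differ from the ones quoted above and from machine to machine:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(2000, 2000), dtype=np.int32)

col_view = A[:, 1]          # strided view into A
col_copy = A[:, 1].copy()   # contiguous copy

runs = 1000
t_view = timeit.timeit(col_view.sum, number=runs)
t_copy = timeit.timeit(col_copy.sum, number=runs)
t_make = timeit.timeit(lambda: A[:, 1].copy(), number=runs)

print(f"sum over view:   {t_view / runs * 1e6:.1f} µs per call")
print(f"sum over copy:   {t_copy / runs * 1e6:.1f} µs per call")
print(f"making the copy: {t_make / runs * 1e6:.1f} µs per call")
```

Comparing the third number with the difference between the first two gives a feel for how many operations on the column are needed before the copy pays for itself.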

If we are interested in working mostly with columns, it could be a good idea to create our array in column-major (`'F'`) order instead of the default row-major (`'C'`) order, and then do the slicing as before to get a column without copying it:

    A = np.asfortranarray(A)   # or np.array(A, order='F')
    A_col1_view = A[:, 1]
    A_col1_view.strides[0]     # 4 bytes
    %timeit A_col1_view.sum()  # ~12.6 µs vs ~248 µs

Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
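The effect of the memory order on strides can be checked on a tiny array. A minimal sketch, assuming an `int32` dtype and using the illustrative names `C` and `F` for the two layouts:

```python
import numpy as np

C = np.arange(12, dtype=np.int32).reshape(3, 4)   # row-major (default)
F = np.asfortranarray(C)                          # same values, column-major

print(C.strides, F.strides)               # layout of the full arrays
print(C[:, 1].strides, F[:, 1].strides)   # column slice in each layout
print(C[1, :].strides, F[1, :].strides)   # row slice in each layout

# A column of F is contiguous in memory, a column of C is not.
print(C[:, 1].flags['C_CONTIGUOUS'], F[:, 1].flags['C_CONTIGUOUS'])
```

Note the trade-off: in column-major order it is the rows that become strided, so this layout helps only if column access really dominates.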

Finally, let me note that transposing an array and using row-slicing is the same as using column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array. For the original row-major `A`:

    A[:, 1].strides[0]    # 40000 bytes
    A.T[1, :].strides[0]  # 40000 bytes
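The shape and stride swap is easy to verify on a small array. A quick sketch, again assuming an `int32` dtype (the array `M` is just an example, not the `A` from above):

```python
import numpy as np

M = np.zeros((5, 3), dtype=np.int32)

print(M.shape, M.strides)       # original shape and strides
print(M.T.shape, M.T.strides)   # both tuples reversed, no data moved

# The transpose shares the same memory buffer as the original.
print(np.shares_memory(M, M.T))

# Column 1 of M and row 1 of M.T are the same strided view.
print(M[:, 1].strides, M.T[1, :].strides)
```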

## The Answer 6

To get several independent columns, just use:

    >>> test[:,[0,2]]

and you will get columns 0 and 2.

## The Answer 7

    >>> test
    array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]])
    >>> ncol = test.shape[1]
    >>> ncol
    5

Then you can select the 2nd to 4th columns this way:

    >>> test[0:, 1:(ncol - 1)]
    array([[1, 2, 3],
           [6, 7, 8]])
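The same selection can be written more compactly with a negative stop index, which avoids computing `ncol` at all. A small sketch (the names `middle` and `same` are only for illustration):

```python
import numpy as np

test = np.arange(10).reshape(2, 5)
ncol = test.shape[1]

middle = test[0:, 1:(ncol - 1)]   # explicit form from above
same = test[:, 1:-1]              # equivalent: every row, drop first and last column

print(middle)
print(np.array_equal(middle, same))
```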