python – Selecting a row of pandas series/dataframe by integer index

The Question :

436 people think this question is useful

I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work.

In [26]: df.ix[2]
A    1.027680
B    1.514210
C   -1.466963
D   -0.162339
Name: 2000-01-03 00:00:00

In [27]: df[2:3]
                  A        B         C         D
2000-01-03  1.02768  1.51421 -1.466963 -0.162339

I would expect df[2] to work the same way as df[2:3] to be consistent with Python indexing convention. Is there a design reason for not supporting indexing row by single integer?

The Question Comments :
  • df.ix[2] does not work – at least not in pandas version '0.19.2'
  • To see the difference between row and column selection via the indexing operator, [], see this answer below. Also NEVER USE .ix, it is deprecated

The Answer 1

599 people think this answer is useful

echoing @HYRY, see the new docs in 0.11

Here we have new operators, .iloc to explicity support only integer indexing, and .loc to explicity support only label indexing

e.g. imagine this scenario

In [1]: df = pd.DataFrame(np.random.rand(5,2),index=range(0,10,2),columns=list('AB'))

In [2]: df
          A         B
0  1.068932 -0.794307
2 -0.470056  1.192211
4 -0.284561  0.756029
6  1.037563 -0.267820
8 -0.538478 -0.800654

In [5]: df.iloc[[2]]
          A         B
4 -0.284561  0.756029

In [6]: df.loc[[2]]
          A         B
2 -0.470056  1.192211

[] slices the rows (by label location) only

The Answer 2

73 people think this answer is useful

The primary purpose of the DataFrame indexing operator, [] is to select columns.

When the indexing operator is passed a string or integer, it attempts to find a column with that particular name and return it as a Series.

So, in the question above: df[2] searches for a column name matching the integer value 2. This column does not exist and a KeyError is raised.

The DataFrame indexing operator completely changes behavior to select rows when slice notation is used

Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.


This will slice beginning from the row with integer location 2 up to 3, exclusive of the last element. So, just a single row. The following selects rows beginning at integer location 6 up to but not including 20 by every third row.


You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.

I almost never use this slice notation with the indexing operator as its not explicit and hardly ever used. When slicing by rows, stick with .loc/.iloc.

The Answer 3

24 people think this answer is useful

You can think DataFrame as a dict of Series. df[key] try to select the column index by key and returns a Series object.

However slicing inside of [] slices the rows, because it’s a very common operation.

You can read the document for detail:

The Answer 4

15 people think this answer is useful

To index-based access to the pandas table, one can also consider numpy.as_array option to convert the table to Numpy array as

np_df = df.as_matrix()

and then


would work.

The Answer 5

7 people think this answer is useful

you can loop through the data frame like this .

for ad in range(1,dataframe_c.size):

The Answer 6

6 people think this answer is useful

You can take a look at the source code .

DataFrame has a private function _slice() to slice the DataFrame, and it allows the parameter axis to determine which axis to slice. The __getitem__() for DataFrame doesn’t set the axis while invoking _slice(). So the _slice() slice it by default axis 0.

You can take a simple experiment, that might help you:

print df._slice(slice(0, 2))
print df._slice(slice(0, 2), 0)
print df._slice(slice(0, 2), 1)

Add a Comment