loc and iloc

How to use loc and iloc to choose data in Pandas

When it comes to selecting data from a DataFrame, Pandas loc and iloc are two top favorites. They are fast, easy to read, and sometimes interchangeable.

In this article we will look at the differences between loc and iloc, look at their similarities, and see how to perform data selection with them. We will cover the following topics:

  1. Differences between loc and iloc
  2. Select via a single value
  3. Select via a list of values
  4. Select a range of data via slicing
  5. Select via conditions and callable
  6. loc and iloc are interchangeable when labels are 0-based integers

Please check the notebook for the source code.

1. Differences between loc and iloc

The main difference between loc and iloc is:

  • loc is label-based, which means you must specify rows and columns based on their row and column labels.
  • iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).

The sections below walk through the differences and similarities between loc and iloc.

For demonstration, we create a DataFrame and load it with the Day column as the index.

import pandas as pd

df = pd.read_csv('data/data.csv', index_col=['Day'])
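The CSV file itself is not reproduced in this article, so here is a minimal sketch that builds a comparable DataFrame in memory. The Temperature column and the Friday row follow the outputs shown in this article; the remaining Weather, Wind, and Humidity values are assumptions made up for illustration.

```python
import pandas as pd

# Stand-in for data/data.csv. Only the Temperature column and the Friday
# row come from the article's outputs; every other value is an assumption.
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Rain", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

print(df.index.tolist())    # row labels, which loc works with
print(df.columns.tolist())  # column labels, which loc works with
print(df.shape)             # iloc positions run from 0 up to shape - 1
```

With Day as the index, the row labels are the day names and the columns are Weather, Temperature, Wind, and Humidity at positions 0 to 3.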

2. Select via a single value

Both loc and iloc allow the input to be a single value. We can use the following syntax for data selection:

  • loc[row_label, column_label]
  • iloc[row_position, column_position]

For example, let’s say we’d like to retrieve Friday’s temperature value.

With loc, we can pass the row label 'Fri' and the column label 'Temperature'.

# To get Friday's temperature
>>> df.loc['Fri', 'Temperature']
10.51

The equivalent iloc statement must take the row number 4 and the column number 1.

# The equivalent `iloc` statement
>>> df.iloc[4, 1]
10.51
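If you know the labels but need the positions for iloc, one option is to look them up with Index.get_loc. The sketch below assumes an in-memory stand-in for the article's data; values other than the Temperature column and the Friday row are made up.

```python
import pandas as pd

# Illustrative stand-in for the article's data (non-Temperature values
# outside the Friday row are assumptions).
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Rain", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# Translate labels into 0-based integer positions.
row_pos = df.index.get_loc("Fri")
col_pos = df.columns.get_loc("Temperature")

by_label = df.loc["Fri", "Temperature"]
by_position = df.iloc[row_pos, col_pos]

print(row_pos, col_pos)
print(by_label == by_position)
```

This only gives a single integer when the label is unique in the index, which is the case here.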

We can also use : to return all data. For example, to get all rows:

# To get all rows
>>> df.loc[:, 'Temperature']
Day
Mon    12.79
Tue    19.67
Wed    17.51
Thu    14.44
Fri    10.51
Sat    11.07
Sun    17.50
Name: Temperature, dtype: float64

# The equivalent `iloc` statement
>>> df.iloc[:, 1]

And to get all columns:

# To get all columns
>>> df.loc['Fri', :]
Weather        Shower
Temperature     10.51
Wind               26
Humidity           79
Name: Fri, dtype: object

# The equivalent `iloc` statement
>>> df.iloc[4, :]

Note that the above 2 outputs are Series. loc and iloc will return a Series when the result is 1-dimensional data.

3. Select via a list of values

We can pass a list of labels to loc to select multiple rows or columns:

# Multiple rows
>>> df.loc[['Thu', 'Fri'], 'Temperature']
Day
Thu    14.44
Fri    10.51
Name: Temperature, dtype: float64

# Multiple columns
>>> df.loc['Fri', ['Temperature', 'Wind']]
Temperature    10.51
Wind              26
Name: Fri, dtype: object

Similarly, a list of integer values can be passed to iloc to select multiple rows or columns. Here are the equivalent statements using iloc:

>>> df.iloc[[3, 4], 1]
Day
Thu    14.44
Fri    10.51
Name: Temperature, dtype: float64

>>> df.iloc[4, [1, 2]]
Temperature    10.51
Wind              26
Name: Fri, dtype: object

All of the above outputs are Series because their results are 1-dimensional data.

The output will be a DataFrame when the result is 2-dimensional data, for example, when accessing multiple rows and columns:

# Multiple rows and columns
rows = ['Thu', 'Fri']
cols = ['Temperature', 'Wind']
df.loc[rows, cols]

The equivalent iloc statement is:

rows = [3, 4]
cols = [1, 2]
df.iloc[rows, cols]
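The following sketch checks how the shape of the indexer decides the return type, and that the loc and iloc forms above select the same data. It assumes an in-memory stand-in for the article's data; values other than the Temperature column and the Friday row are made up.

```python
import pandas as pd

# Illustrative stand-in for the article's data (partly assumed values).
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Rain", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# List on one axis, single label on the other: 1-dimensional, so a Series.
temps = df.loc[["Thu", "Fri"], "Temperature"]

# Lists on both axes: 2-dimensional, so a DataFrame.
by_label = df.loc[["Thu", "Fri"], ["Temperature", "Wind"]]
by_position = df.iloc[[3, 4], [1, 2]]

# A one-element list still counts as a list, so the result stays a DataFrame.
one_column = df.loc[["Thu", "Fri"], ["Temperature"]]

print(type(temps).__name__, type(by_label).__name__, type(one_column).__name__)
print(by_label.equals(by_position))
```

Wrapping a single label in a list is a handy way to keep a DataFrame when later code expects 2-dimensional data.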

4. Select a range of data via slicing

Slicing (written as start:stop:step) is a powerful technique that makes it possible to select a range of data. This is very useful if we want to select everything between two items.

loc with slicing

With loc, we can use the syntax A:B to select data from label A to label B (both A and B are included):

# Slicing column labels
rows = ['Thu', 'Fri']
df.loc[rows, 'Temperature':'Humidity']

# Slicing row labels
cols = ['Temperature', 'Wind']
df.loc['Mon':'Thu', cols]

We can use the syntax A:B:S to select data from label A to label B with step size S (both A and B are included):

# Slicing with step
df.loc['Mon':'Fri':2 , :]
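To make the inclusive endpoint concrete, here is a small sketch on an in-memory stand-in for the article's data (values other than the Temperature column and the Friday row are assumptions).

```python
import pandas as pd

# Illustrative stand-in for the article's data (partly assumed values).
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Rain", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# Label slices include both endpoints on each axis.
inclusive = df.loc["Mon":"Thu", "Temperature":"Humidity"]

# With a step of 2, every second row from 'Mon' through 'Fri' is kept.
stepped = df.loc["Mon":"Fri":2, :]

print(inclusive.index.tolist(), inclusive.columns.tolist())
print(stepped.index.tolist())
```

Both 'Thu' and 'Humidity' appear in the first result, and 'Fri' appears in the second, because label slices keep the stop label.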

iloc with slicing

With iloc, we can also use the syntax n:m to select data from position n (included) to position m (excluded). The main difference from loc is that the endpoint (m) is excluded from the result.

For example, select columns from position 0 to 3 (excluded):

df.iloc[[1, 2], 0:3]

Similarly, we can use the syntax n:m:s to select data from position n (included) to position m (excluded) with step size s. Note that the endpoint m is excluded.

df.iloc[0:4:2, :]
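The same two statements can be checked on an in-memory stand-in for the article's data (values other than the Temperature column and the Friday row are assumptions). This is only a sketch to show which rows and columns survive the exclusive endpoint.

```python
import pandas as pd

# Illustrative stand-in for the article's data (partly assumed values).
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Rain", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# Columns at positions 0, 1 and 2; position 3 (Humidity) is excluded.
first_three_cols = df.iloc[[1, 2], 0:3]

# Rows at positions 0 and 2; position 4 (Fri) is excluded.
stepped = df.iloc[0:4:2, :]

print(first_three_cols.columns.tolist())
print(stepped.index.tolist())
```

This mirrors ordinary Python list slicing, where the stop position is never part of the result.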

5. Select via conditions and callable

Conditions

loc with conditions

Often we want to filter the data based on conditions. For example, we may need to find the rows where humidity is greater than 50.

With loc, we just need to pass the condition to the loc statement.

# One condition
df.loc[df.Humidity > 50, :]

Sometimes we may need to use multiple conditions to filter our data. For example, find all the rows where humidity is more than 50 and the weather is Shower:

## multiple conditions
df.loc[
    (df.Humidity > 50) & (df.Weather == 'Shower'), 
    ['Temperature','Wind'],
]
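As a runnable sketch of the two filters above, the example below uses an in-memory stand-in for the article's data. Values other than the Temperature column and the Friday row are assumptions, so the rows that pass the filter here are illustrative only.

```python
import pandas as pd

# Illustrative stand-in for the article's data (partly assumed values).
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Rain", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# A comparison on a column yields a boolean Series aligned on the index.
humid = df.loc[df["Humidity"] > 50, :]

# Each condition needs its own parentheses, because & binds more tightly
# than > and == in Python.
humid_showers = df.loc[
    (df["Humidity"] > 50) & (df["Weather"] == "Shower"),
    ["Temperature", "Wind"],
]

print(humid.index.tolist())
print(humid_showers.index.tolist())
```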

iloc with conditions

For iloc, we will get a ValueError if we pass the condition straight into the statement:

# Getting ValueError
df.iloc[df.Humidity > 50, :]

We get the error because iloc cannot accept a boolean Series. It only accepts a boolean list or array. We can use the list() function to convert a Series into a boolean list.

# Single condition
df.iloc[list(df.Humidity > 50)]

Similarly, we can use list() to convert the output of several conditions into a boolean list:

## multiple conditions
df.iloc[
    list((df.Humidity > 50) & (df.Weather == 'Shower')), 
    :,
]
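The sketch below reproduces the failure and shows two ways around it: list(), as used above, and Series.to_numpy(), which gives a boolean array without going through a Python list. It assumes an in-memory stand-in for the article's data, with values other than the Temperature column and the Friday row made up.

```python
import pandas as pd

# Illustrative stand-in for the article's data (partly assumed values).
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Rain", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

mask = df["Humidity"] > 50  # boolean Series carrying the Day index

# iloc rejects the indexed mask. NotImplementedError is caught as well,
# since pandas raises that instead when the mask has an integer index.
try:
    df.iloc[mask]
    raised = False
except (ValueError, NotImplementedError):
    raised = True

# Dropping the index from the mask makes it purely positional.
via_list = df.iloc[list(mask)]
via_array = df.iloc[mask.to_numpy()]

print(raised)
print(via_array.index.tolist())
```

Both workarounds select the same rows as df.loc[mask]; the difference is only that iloc needs a mask without labels.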

Callable function

loc with callable

loc accepts a callable as an indexer. The callable must be a function with one argument that returns valid output for indexing.

For example, to select columns:

# Selecting columns
df.loc[:, lambda df: ['Humidity', 'Wind']]

And to filter data with a callable:

# With condition
df.loc[lambda df: df.Humidity > 50, :]

iloc with callable

iloc can also take a callable as an indexer.

df.iloc[lambda df: [0,1], :]

Filtering data with a callable in iloc requires converting the output of the condition into a boolean list with list():

df.iloc[lambda df: list(df.Humidity > 50), :]
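Callables are most useful in method chains, where the intermediate DataFrame has no variable name to refer to. The sketch below is one possible illustration: the rename step is a hypothetical preprocessing step, and the data is an in-memory stand-in in which values other than the Temperature column and the Friday row are assumptions.

```python
import pandas as pd

# Illustrative stand-in for the article's data (partly assumed values).
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Rain", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# The lambda receives the renamed frame, so it can use the new column names.
chained_loc = (
    df.rename(columns=str.lower)
      .loc[lambda d: d["humidity"] > 50, ["weather", "humidity"]]
)

# Same selection by position; the mask is converted to a boolean array
# because iloc does not accept a boolean Series.
chained_iloc = (
    df.rename(columns=str.lower)
      .iloc[lambda d: (d["humidity"] > 50).to_numpy(), [0, 3]]
)

print(chained_loc.index.tolist())
print(chained_loc.equals(chained_iloc))
```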

6. loc and iloc are interchangeable when labels are 0-based integers

For demonstration, let’s create a DataFrame with 0-based integers as headers and index labels.

df = pd.read_csv(
    'data/data.csv', 
    header=None, 
    skiprows=[0],
)

With header=None, Pandas will generate 0-based integer values as headers. With skiprows=[0], the header row we have been using (Weather, Temperature, etc.) will be skipped.

Now loc, a label-based data selector, can accept a single integer and a list of integer values. For example:

>>> df.loc[1, 2]
19.67
>>> df.loc[1, [1, 2]]
1    Sunny
2    19.67
Name: 1, dtype: object

The reason they work is that those integer values (1 and 2) are interpreted as labels of the index. This usage is not an integer position along the index and is a little confusing.

In this case, loc and iloc are interchangeable when you select via a single value or a list of values.

>>> df.loc[1, 2] == df.iloc[1, 2]
True
>>> df.loc[1, [1, 2]] == df.iloc[1, [1, 2]]
1    True
2    True
Name: 1, dtype: bool

Note that loc and iloc will return different results when selecting via slicing and conditions. They are essentially different because:

  • Slicing: the endpoint is excluded from the iloc result, but included in loc
  • Conditions: loc accepts a boolean Series, but iloc can only accept a boolean list.
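A short sketch of the slicing difference on a frame with default 0-based integer labels follows. The frame is built in memory rather than read from the CSV, and apart from the Tuesday row and the temperatures its values are assumptions.

```python
import pandas as pd

# No index or column names are given, so pandas assigns 0-based integer
# labels on both axes. Weather values other than Tuesday's are assumed.
df = pd.DataFrame(
    [
        ["Mon", "Sunny", 12.79],
        ["Tue", "Sunny", 19.67],
        ["Wed", "Cloudy", 17.51],
        ["Thu", "Rain", 14.44],
    ]
)

# Single values agree: label 1 / label 2 is also position 1 / position 2.
same_value = df.loc[1, 2] == df.iloc[1, 2]

# Slices do not agree: loc keeps the row labelled 2, iloc stops before it.
loc_rows = df.loc[0:2]
iloc_rows = df.iloc[0:2]

print(same_value)
print(len(loc_rows), len(iloc_rows))
```

So even with 0-based integer labels, the same slice expression returns one more row with loc than with iloc.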

Conclusion

Finally, here’s a summary

loc is label-based and the allowed inputs are:

  • A single label 'A' or 2 (Note that 2 is interpreted as a label of the index.)
  • A list of labels ['A', 'B', 'C'] or [1, 2, 3] (Note that 1, 2, 3 are interpreted as labels of the index.)
  • A slice with labels 'A':'C' (Both are included)
  • Conditions, a boolean Series or a boolean array
  • A callable function with one argument

iloc is integer position-based and the allowed inputs are:

  • An integer, e.g. 2
  • A list or array of integers [1, 2, 3]
  • A slice with integers 1:7 (the endpoint 7 is excluded)
  • Conditions, but it only accepts a boolean array
  • A callable function with one argument

loc and iloc are interchangeable when the labels of a Pandas DataFrame are 0-based integers.

I hope this article will help you save time learning Pandas data selection. I recommend checking the documentation to know about other things you can do.

References

How to use loc and iloc for selecting data in Pandas | by B. Chen

How are iloc and loc different?
