## The Question

*382 people think this question is useful*

I have a dictionary which looks like this: `di = {1: "A", 2: "B"}`

I would like to apply it to the “col1” column of a dataframe similar to:

  col1 col2
0    w    a
1    1    2
2    2  NaN

to get:

  col1 col2
0    w    a
1    A    2
2    B  NaN

How can I best do this? For some reason googling terms relating to this only shows me links about how to make columns from dicts and vice-versa :-/

## The Answer 1

*411 people think this answer is useful*

You can use `.replace`. For example:

>>> df = pd.DataFrame({'col2': {0: 'a', 1: 2, 2: np.nan}, 'col1': {0: 'w', 1: 1, 2: 2}})
>>> di = {1: "A", 2: "B"}
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> df.replace({"col1": di})
  col1 col2
0    w    a
1    A    2
2    B  NaN

or directly on the `Series`, i.e. `df["col1"].replace(di, inplace=True)`.
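
On newer pandas versions, `inplace=True` on a single selected column may not propagate back to the parent DataFrame, so a safer sketch (an addition, not part of the original answer) is to assign the result back:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['w', 1, 2], 'col2': ['a', 2, np.nan]})
di = {1: "A", 2: "B"}

# assign the replaced column back instead of relying on inplace=True
df['col1'] = df['col1'].replace(di)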

## The Answer 2

*307 people think this answer is useful*

**`map` can be much faster than `replace`**

If your dictionary has more than a couple of keys, using `map` can be much faster than `replace`. There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values (and also whether you want non-matches to keep their values or be converted to NaNs):

### Exhaustive Mapping

In this case, the form is very simple:

df['col1'].map(di)    # note: if the dictionary does not exhaustively map all
                      # entries then non-matched entries are changed to NaNs

Although `map` most commonly takes a function as its argument, it can alternatively take a dictionary or series: see the documentation for `pandas.Series.map`.
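
For instance, passing a Series instead of a dict does the lookup against the Series' index; a small sketch, equivalent to passing `di` directly:

di = {1: "A", 2: "B"}
# wrap the dict in a Series: map() looks values up by the Series' index (1 -> "A", 2 -> "B")
df['col1'].map(pd.Series(di))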

### Non-Exhaustive Mapping

If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add `fillna`:

df['col1'].map(di).fillna(df['col1'])

as in @jpp’s answer here: Replace values in a pandas series via dictionary efficiently

### Benchmarks

Using the following data with pandas version 0.23.1:

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H" }
df = pd.DataFrame({ 'col1': np.random.choice( range(1,9), 100000 ) })

and testing with `%timeit`, it appears that `map` is approximately 10x faster than `replace`.
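
A sketch of how such a comparison can be run in IPython/Jupyter (exact timings will vary with your data and pandas version):

%timeit df['col1'].map(di)                      # exhaustive map
%timeit df['col1'].map(di).fillna(df['col1'])   # non-exhaustive map
%timeit df['col1'].replace(di)                  # replace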

Note that your speedup with `map` will vary with your data. The largest speedup appears to be with large dictionaries and exhaustive replaces. See @jpp's answer (linked above) for more extensive benchmarks and discussion.

## The Answer 3

*67 people think this answer is useful*

There is a bit of ambiguity in your question. There are at least three interpretations:

- the keys in `di` refer to index values
- the keys in `di` refer to `df['col1']` values
- the keys in `di` refer to index locations (not the OP's question, but thrown in for fun)

Below is a solution for each case.

**Case 1:**
If the keys of `di` are meant to refer to index values, then you could use the `update` method:

df['col1'].update(pd.Series(di))

For example,

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {0: "A", 2: "B"}
# The value at the 0-index is mapped to 'A', the value at the 2-index is mapped to 'B'
df['col1'].update(pd.Series(di))
print(df)

yields

  col1 col2
1    w    a
2    B   30
0    A  NaN

I've modified the values from your original post so it is clearer what `update` is doing. Note how the keys in `di` are associated with index values. The order of the index values (that is, the index *locations*) does not matter.

**Case 2:**
If the keys in `di` refer to `df['col1']` values, then @DanAllan and @DSM show how to achieve this with `replace`:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
print(df)
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {10: "A", 20: "B"}
# The values 10 and 20 are replaced by 'A' and 'B'
df['col1'].replace(di, inplace=True)
print(df)

yields

  col1 col2
1    w    a
2    A   30
0    B  NaN

Note how in this case the keys in `di` were changed to match *values* in `df['col1']`.

**Case 3:**
If the keys in `di` refer to index locations, then you could use

df['col1'].put(list(di.keys()), list(di.values()))

since

df = pd.DataFrame({'col1': ['w', 10, 20],
                   'col2': ['a', 30, np.nan]},
                  index=[1, 2, 0])
di = {0: "A", 2: "B"}
# The values at the 0 and 2 index locations are replaced by 'A' and 'B'
df['col1'].put(list(di.keys()), list(di.values()))
print(df)

yields

  col1 col2
1    A    a
2   10   30
0    B  NaN

Here, the first and third rows were altered, because the keys in `di` are `0` and `2`, which with Python's 0-based indexing refer to the first and third locations.
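
Note that `Series.put` has been removed in recent pandas versions. If it is unavailable, a rough positional equivalent (a sketch, not part of the original answer) is:

# replace by integer position: di maps positions 0 and 2 to new values
df.iloc[list(di.keys()), df.columns.get_loc('col1')] = list(di.values())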

## The Answer 4

*5 people think this answer is useful*

DSM has the accepted answer, but the code doesn't seem to work for everyone. Here is one that works with the current version of pandas (0.23.4 as of 8/2018):

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 2, 3, 1],
                   'col2': ['negative', 'positive', 'neutral', 'neutral', 'positive']})
conversion_dict = {'negative': -1, 'neutral': 0, 'positive': 1}
df['converted_column'] = df['col2'].replace(conversion_dict)
print(df.head())

You’ll see it looks like:

   col1      col2  converted_column
0     1  negative                -1
1     2  positive                 1
2     2   neutral                 0
3     3   neutral                 0
4     1  positive                 1

See the documentation for `pandas.DataFrame.replace` for more details.

## The Answer 5

*4 people think this answer is useful*

Adding to this question, if you ever have more than one column to remap in a dataframe:

def remap(data, dict_labels):
    """
    This function takes in a dictionary of labels, dict_labels,
    and replaces the values (previously label-encoded) with the strings.
    ex: dict_labels = {'col1': {1: 'A', 2: 'B'}}
    """
    for field, values in dict_labels.items():
        print("I am remapping %s" % field)
        data.replace({field: values}, inplace=True)
    print("DONE")

    return data
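
A quick usage sketch with made-up data (the column names and labels here are only illustrative):

df = pd.DataFrame({'col1': [1, 2, 2], 'col2': [1, 1, 2]})
dict_labels = {'col1': {1: 'A', 2: 'B'}, 'col2': {1: 'X', 2: 'Y'}}
df = remap(df, dict_labels)
print(df)
#   col1 col2
# 0    A    X
# 1    B    X
# 2    B    Y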

Hope it can be useful to someone.

Cheers

## The Answer 6

*2 people think this answer is useful*

Or do `apply`:

df['col1'].apply(lambda x: {1: "A", 2: "B"}.get(x, x))

Demo:

>>> df['col1'] = df['col1'].apply(lambda x: {1: "A", 2: "B"}.get(x, x))
>>> df
  col1 col2
0    w    a
1    A    2
2    B  NaN
>>>

## The Answer 7

*2 people think this answer is useful*

Given `map` is faster than `replace` (@JohnE's solution), you need to be careful **with non-exhaustive mappings where you intend to map specific values to `NaN`**. The proper method in this case requires that you `mask` the Series when you `.fillna`, else you undo the mapping to `NaN`.

import pandas as pd
import numpy as np
d = {'m': 'Male', 'f': 'Female', 'missing': np.NaN}
df = pd.DataFrame({'gender': ['m', 'f', 'missing', 'Male', 'U']})

keep_nan = [k for k,v in d.items() if pd.isnull(v)]
s = df['gender']
df['mapped'] = s.map(d).fillna(s.mask(s.isin(keep_nan)))

    gender  mapped
0        m    Male
1        f  Female
2  missing     NaN
3     Male    Male
4        U       U

## The Answer 8

*1 people think this answer is useful*

A nice complete solution that keeps a map of your class labels:

labels = features['col1'].unique()
labels_dict = dict(zip(labels, range(len(labels))))
features = features.replace({"col1": labels_dict})

This way, you can at any point refer to the original class label from `labels_dict`.
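
If you later need to map the integer codes back to the original labels, a small sketch (assuming `features` and `labels_dict` as above; `inverse_dict` is just an illustrative name) is to invert the dictionary:

# invert the encoding: integer code -> original class label
inverse_dict = {v: k for k, v in labels_dict.items()}
features['col1'] = features['col1'].map(inverse_dict)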

## The Answer 9

*1 people think this answer is useful*

As an extension to what has been proposed by Nico Coallier (apply to multiple columns) and U10-Forward (using the apply style of methods), and summarising it into a one-liner, I propose:

df.loc[:, ['col1', 'col2']].transform(lambda x: x.map(lambda x: {1: "A", 2: "B"}.get(x, x)))

The `.transform()` method processes each column as a series, contrary to `.apply()`, which passes the columns aggregated in a DataFrame. Consequently you can apply the Series method `map()`.

Finally, and I discovered this behaviour thanks to U10, you can use the whole Series in the `.get()` expression. Unless I have misunderstood its behaviour and it processes the series sequentially instead of element-wise.

The `.get(x, x)` accounts for the values you did not mention in your mapping dictionary, which would otherwise be converted to NaN by the `.map()` method.
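
A minimal runnable sketch with the question's data (assuming both columns should be remapped; unmatched values pass through thanks to `.get(x, x)`):

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['w', 1, 2], 'col2': ['a', 2, np.nan]})
di = {1: "A", 2: "B"}

# remap both columns; values not found in di are left unchanged
df[['col1', 'col2']] = df[['col1', 'col2']].transform(
    lambda s: s.map(lambda x: di.get(x, x)))
print(df)
#   col1 col2
# 0    w    a
# 1    A    B
# 2    B  NaN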

## The Answer 10

*0 people think this answer is useful*

A more native pandas approach is to apply a replace function as below:

import re

def multiple_replace(mapping, text):
    # Regex substitution works on strings, so normalise keys/values to str
    mapping = {str(k): str(v) for k, v in mapping.items()}
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, mapping.keys())))
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: mapping[mo.group(0)], text)

Once you have defined the function, you can apply it to your dataframe (note that the cell values are cast to strings, since regex substitution only works on text):

di = {1: "A", 2: "B"}
df['col1'] = df.apply(lambda row: multiple_replace(di, str(row['col1'])), axis=1)