Using df.at() for assignment changes views of certain columns, but not all of them · Issue #22372 · pandas-dev/pandas · GitHub

@liesb

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

def f2(seed=789, niter=10):
    rng = np.random.RandomState(seed=seed)
    df = pd.DataFrame(index=np.arange(5))
    df['x'] = rng.uniform(0, 2, size=5)
    df['y'] = rng.uniform(0, 2, size=5)
    df['cost'] = rng.uniform(0, 10, size=5)
    # print the initial df
    print(df)
    # for loop: replace the worst param combo and evaluate
    for i in range(niter):
        # idx of param combo with highest cost
        worst_idx = np.argmax(df.cost.values)
        # IF line below is commented out, I don't get the discrepancy
        centroid = df.loc[rng.choice(np.arange(5), size=1)].mean(axis=0)
        for p in ['x', 'y']:
            new_param = 4
            # THIS IS THE OFFENDING CALL to df.at!
            # Change it to .loc and it doesn't give the same issue
            df.at[worst_idx, p] = new_param
            # df.loc[worst_idx, p] = new_param
        # The at below is not the offending at
        df.at[worst_idx, 'cost'] = rng.uniform(low=0, high=10)
        # df.loc[worst_idx, 'cost'] = rng.uniform(low=0, high=10, size=(1,))
    return df

test = f2()
# Compare the cost values shown by each of the following:
print(test)
print(test.cost)
print(test[['cost']])
print(test.iloc[4])

Problem description

The cost column is displayed incorrectly when we display the whole frame with test. It is correct when we use test.cost, but not when we use test[['cost']] or test.iloc[4]. The other columns appear to be displayed correctly.
This doesn't happen when we use .loc instead of .at to assign new_param to the x and y columns.
It also doesn't happen when we comment out centroid = df.loc[rng.choice(np.arange(5), size=1)].mean(axis=0).

The current behaviour is problematic because different ways of viewing the same data yield different answers, which shouldn't be the case.

Expected Output

  • The cost column shown in test matches test.cost.
  • test[['cost']] and test.cost give the same values.
  • test.iloc[4] shows the same cost value as the entry at position 4 of test.cost.
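
The three expectations above could be checked programmatically rather than by eye. The following is a minimal sketch, not part of the original report: the helper name access_paths_agree is hypothetical, and it assumes a column without NaN values, since NaN never compares equal to itself.

```python
import numpy as np
import pandas as pd


def access_paths_agree(df, col):
    """Return True if three ways of reading `col` yield identical values.

    Hypothetical helper for illustration; assumes `col` holds no NaN.
    """
    # Path 1: single-column selection (what test.cost shows).
    via_column = df[col].values
    # Path 2: list-of-columns selection (what test[['cost']] shows).
    via_list = df[[col]].values.ravel()
    # Path 3: row-by-row positional access (what test.iloc[i] shows).
    via_rows = np.array([df.iloc[i][col] for i in range(len(df))])
    return bool(
        np.array_equal(via_column, via_list)
        and np.array_equal(via_column, via_rows)
    )


if __name__ == "__main__":
    # A freshly built frame is expected to be consistent.
    frame = pd.DataFrame({"x": [0.1, 0.2], "y": [1.0, 2.0], "cost": [5.0, 6.0]})
    print(access_paths_agree(frame, "cost"))
```

Calling access_paths_agree(f2(), 'cost') on the reproducer should return True if the behaviour is as expected. Based on the description above, it would presumably return False on the affected version (pandas 0.23.4).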

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 36.4.0
Cython: None
numpy: 1.15.0
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug; Indexing (Related to indexing on series/frames, not to indexes themselves)
