TST: add test to read empty array by nakatomotoi · Pull Request #43459 · pandas-dev/pandas · GitHub

Conversation

@nakatomotoi
Contributor

@jreback jreback added IO Parquet parquet, feather Testing pandas testing functions or related to the test suite labels Sep 8, 2021
Contributor

@jreback jreback left a comment

ping when updated and green

@nakatomotoi nakatomotoi changed the title TST: add test to read zero-chunked array TST: add test to read empty array Sep 12, 2021
@nakatomotoi nakatomotoi requested a review from jreback September 12, 2021 08:22
@jreback jreback added this to the 1.4 milestone Sep 12, 2021
@nakatomotoi nakatomotoi requested a review from jreback September 14, 2021 14:06
Member

@mzeitlin11 mzeitlin11 left a comment

Can you add back the original test you added for all types? It doesn't need to include the nullable types, since they are covered in the other test that uses use_nullable_dtypes.

assert isinstance(result._mgr, pd.core.internals.BlockManager)

@pytest.mark.parametrize(
"dtype", ["Int64", "UInt8", "boolean", "object", "datetime64[ns, UTC]"]
Contributor

can you add more types here: float, int, period[D], category, Float64, string

Contributor Author

@jreback
I get an "AssertionError" when I add "category". Is "category" necessary as a test case?

self = <pandas.tests.io.test_parquet.TestParquetPyArrow object at 0x7f020ec01640>, pa = 'pyarrow', dtype = 'category'

    @pytest.mark.parametrize(
        "dtype",
        [
            "Int64",
            "UInt8",
            "boolean",
            "object",
            "datetime64[ns, UTC]",
            "float",
            "int",
            "period[D]",
            "category",
            "Float64",
            "string",
        ],
    )
    def test_read_empty_array(self, pa, dtype):
        # GH #41241
        df = pd.DataFrame(
            {
                "value": pd.array([], dtype=dtype),
            }
        )
>       check_round_trip(df, pa)

pandas/tests/io/test_parquet.py:957: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/tests/io/test_parquet.py:221: in check_round_trip
    compare(repeat)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

repeat = 2

    def compare(repeat):
        for _ in range(repeat):
            df.to_parquet(path, **write_kwargs)
            with catch_warnings(record=True):
                actual = read_parquet(path, **read_kwargs)
    
>           tm.assert_frame_equal(
                expected,
                actual,
                check_names=check_names,
                check_like=check_like,
                check_dtype=check_dtype,
            )
E           AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="value") are different
E           
E           Attribute "dtype" are different
E           [left]:  CategoricalDtype(categories=[], ordered=False)
E           [right]: object

pandas/tests/io/test_parquet.py:211: AssertionError
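
The traceback points at the dtype comparison rather than at the (empty) values. A minimal sketch of just that comparison step, without any parquet file, assuming the column read back is plain object dtype as the [right] side of the error reports. The names `written` and `read_back` are illustrative only and do not come from the test suite:

```python
import pandas as pd
import pandas.testing as tm

# What the test writes: an empty column with categorical dtype.
written = pd.DataFrame({"value": pd.array([], dtype="category")})

# Assumption: a stand-in for what read_parquet returned in the traceback,
# i.e. an empty column with object dtype. No parquet I/O happens here.
read_back = pd.DataFrame({"value": pd.array([], dtype=object)})

try:
    tm.assert_frame_equal(written, read_back)
except AssertionError as err:
    # Both frames are empty, so the failure is about dtype, not values.
    dtype_mismatch = True
    print(err)
else:
    dtype_mismatch = False
```

This only mirrors the final assertion in `compare`; it says nothing about why the parquet round trip loses the categorical dtype for an empty column.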

"value": pd.array([], dtype=dtype),
}
)
check_round_trip(df, pa)
Contributor

need to add read_kwargs={'use_nullable_dtypes': True}

}
)
check_round_trip(df, pa)

Contributor

move this test near here:

    def test_use_nullable_dtypes(self, engine):
        import pyarrow.parquet as pq

        if engine == "fastparquet":
            # We are manually disabling fastparquet's
            # nullable dtype support pending discussion
            pytest.skip("Fastparquet nullable dtype support is disabled")

and in fact structure it similarly

@nakatomotoi
Contributor Author

@jreback
I get a timeout on the pandas-dev.pandas test, could you tell me why?
If you know the solution, I'd like to hear it.

@jreback
Contributor

jreback commented Sep 28, 2021

> @jreback I get a timeout on the pandas-dev.pandas test, could you tell me why? If you know the solution, I'd like to hear it.

you can merge master and see if that fixes it, but sometimes this times out for reasons unrelated to your change

@nakatomotoi nakatomotoi requested a review from jreback September 29, 2021 08:28
@jreback jreback merged commit f10bbe9 into pandas-dev:master Sep 29, 2021
@jreback
Contributor

jreback commented Sep 29, 2021

thanks @nakatomotoi very nice.

if you would like to do a PR with an empty category, that would be great as well (likely just need to construct it properly)
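
The thread does not spell out what constructing it "properly" means. One possible reading, sketched here under that assumption, is to build the empty categorical with explicit categories rather than through `pd.array([], dtype="category")`. The category labels `"a"` and `"b"` are hypothetical, and whether this frame round-trips through parquet is not verified here:

```python
import pandas as pd

# Assumption: explicit categories, chosen purely for illustration.
empty_cat = pd.Categorical([], categories=["a", "b"], ordered=False)

# An empty frame whose single column keeps the categorical dtype.
df = pd.DataFrame({"value": empty_cat})

print(df.dtypes)
print(df["value"].cat.categories)
```

A follow-up test could pass a frame like this to `check_round_trip`, but the outcome would need to be confirmed against the installed pyarrow version.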

@nakatomotoi nakatomotoi deleted the add-read-parquet-test branch September 29, 2021 13:07
gasparitiago pushed a commit to gasparitiago/pandas that referenced this pull request Oct 9, 2021

Development

Successfully merging this pull request may close these issues.

BUG: New param [use_nullable_dtypes] of pd.read_parquet() can't handle empty parquet file

3 participants