Description
This is reproducible with the current latest pandas release, 1.5.2.
In Python, the zipfile.Path class is intended to behave similarly (but not identically!) to pathlib.Path. pandas accepts the latter but not the former.
Steps to reproduce:
- Create a zip file named foo.zip containing one CSV file named bar.csv.
- Create a path object pointing directly to that CSV file inside the zip file: zp = zipfile.Path('foo.zip', 'bar.csv')
- Pass that path object (zp) to pandas.read_csv() as the path.
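The steps above can be sketched as a small script. This is only a minimal sketch: the helper names make_archive and try_read_csv and the CSV contents are made up for illustration, and the archive is created in a caller-supplied directory instead of the current one.

```python
import zipfile
import tempfile
from pathlib import Path


def make_archive(directory: Path) -> zipfile.Path:
    """Create foo.zip containing bar.csv and return a zipfile.Path to that member."""
    archive = directory / "foo.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        # Illustrative content only; any CSV data will do.
        zf.writestr("bar.csv", "a,b\n1,2\n3,4\n")
    return zipfile.Path(archive, "bar.csv")


def try_read_csv(zp: zipfile.Path) -> None:
    """Hand the zipfile.Path to pandas.read_csv() and report what happens."""
    import pandas as pd  # imported here so make_archive() works without pandas

    try:
        print(pd.read_csv(zp))
    except ValueError as exc:
        # With pandas 1.5.2 this is where the error from this report shows up.
        print(f"ValueError: {exc}")
```

Calling try_read_csv(make_archive(Path(tmp))) inside a with tempfile.TemporaryDirectory() as tmp: block should show the ValueError on pandas 1.5.2; I have not checked other versions.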
Because of that part of your code
Lines 446 to 452 in 3b09765
    # is_file_like requires (read | write) & __iter__ but __iter__ is only
    # needed for read_csv(engine=python)
    if not (
        hasattr(filepath_or_buffer, "read") or hasattr(filepath_or_buffer, "write")
    ):
        msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
        raise ValueError(msg)
Python raises "ValueError: Invalid file path or buffer object type: <class 'zipfile.Path'>".
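To illustrate why the object is rejected, here is a small sketch that applies the same hasattr test to a zipfile.Path and to the handle returned by its open() method. It assumes Python 3.9 or later (where zipfile.Path.open() accepts a mode argument) and uses an in-memory archive with made-up contents.

```python
import io
import zipfile

# Build a small archive in memory so the sketch needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("bar.csv", "a,b\n1,2\n")

zp = zipfile.Path(zipfile.ZipFile(buffer), "bar.csv")

# Same test as in the snippet quoted in this report: the path object itself
# has neither read() nor write(), so it is not treated as a buffer.
path_is_file_like = hasattr(zp, "read") or hasattr(zp, "write")

# Opening the member yields a binary handle that does expose read().
with zp.open("rb") as handle:
    handle_is_file_like = hasattr(handle, "read")
    first_line = handle.readline()

print(path_is_file_like, handle_is_file_like, first_line)
```

As far as I can tell, passing such an opened handle to pandas.read_csv() instead of the path object is a possible workaround, but it forces the caller to treat zipfile.Path differently from pathlib.Path.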
EDIT:
I'm aware that pandas.read_csv() offers the compression argument and can read compressed CSV files on its own. But this doesn't help in my case. I'm using pandas as a backend for a higher-level API that reads data files; pandas is just one part of it. One shortcoming of pandas here is that it cannot deal with ZIP files containing multiple CSV files.
pathlib.Path and zipfile.Path are both part of the Python standard library, and IMHO pandas should be able to deal with both.
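For context, this is roughly what a caller has to write today to handle an archive with several CSV members. It is a sketch only: read_all_csv is a hypothetical helper, not a pandas function, and the reader callable is an assumption that lets the same loop work with pandas.read_csv or any other function accepting a binary file handle.

```python
import zipfile
from typing import Any, Callable, Dict


def read_all_csv(archive: Any, reader: Callable[[Any], Any]) -> Dict[str, Any]:
    """Apply `reader` to every *.csv member of `archive`, keyed by member name.

    `archive` may be a file name or a binary file object, as accepted by
    zipfile.ZipFile.
    """
    results: Dict[str, Any] = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".csv"):
                # Each member is opened as a binary, read-only handle.
                with zf.open(name) as handle:
                    results[name] = reader(handle)
    return results
```

With pandas installed, read_all_csv("foo.zip", pandas.read_csv) would be the intended use, returning one DataFrame per CSV member.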