File path or file-like object containing ORC data.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Chloe', 'David', 'Eva'],
    'age': [25, 30, 35, 40, 45],
    'department': ['HR', 'Engineering', 'Marketing', 'Engineering', 'HR']
}
df = pd.DataFrame(data)
df.to_orc('data.orc')
# Read the data back from the ORC file
df_read = pd.read_orc(path='data.orc')
print(df_read)
'''
Output:
   id   name  age   department
0   1  Alice   25           HR
1   2    Bob   30  Engineering
2   3  Chloe   35    Marketing
3   4  David   40  Engineering
4   5    Eva   45           HR
'''
A list of column names to read from the ORC file. If it is not specified (None), all columns are read.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Chloe', 'David', 'Eva'],
    'age': [25, 30, 35, 40, 45],
    'department': ['HR', 'Engineering', 'Marketing', 'Engineering', 'HR']
}
df = pd.DataFrame(data)
df.to_orc('data.orc')
# Read the data back from the ORC file
df_read = pd.read_orc(path='data.orc', columns=["id", "name"])
print(df_read)
'''
Output:
   id   name
0   1  Alice
1   2    Bob
2   3  Chloe
3   4  David
4   5    Eva
'''
The dtype_backend parameter, new in pandas 2.0, specifies the backend data type used for the columns of the resulting DataFrame. Accepted values are 'numpy_nullable' (nullable extension dtypes) and 'pyarrow' (PyArrow-backed dtypes).
It specifies the file system object (fsspec or PyArrow) to use when reading the ORC file. This parameter is particularly useful when working with cloud storage systems (e.g., Amazon S3, Google Cloud Storage) or custom file systems.
Additional keyword arguments, which are passed on to the underlying PyArrow ORC reader.