path: File path or file-like object containing the Parquet data; a string path, a URL, or any object with a binary read() method is accepted.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Chloe', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Write the DataFrame to a Parquet file
df.to_parquet('data.parquet', index=False)
# Read the Parquet file into a DataFrame
df_read = pd.read_parquet(path='data.parquet')
print(df_read)
'''
Output:
   id   name  age  salary
0   1  Alice   25   50000
1   2    Bob   30   60000
2   3  Chloe   35   70000
3   4  David   40   80000
'''
engine: Parquet reader engine to use: 'auto' (the default), 'pyarrow', or 'fastparquet'. With 'auto', pandas tries pyarrow first and falls back to fastparquet.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Chloe', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Write the DataFrame to a Parquet file
df.to_parquet('data.parquet', index=False)
# Read the Parquet file into a DataFrame
df_read = pd.read_parquet(path='data.parquet', engine='auto')
print(df_read)
'''
Output:
   id   name  age  salary
0   1  Alice   25   50000
1   2    Bob   30   60000
2   3  Chloe   35   70000
3   4  David   40   80000
'''
columns: List of column names to read; only these columns are loaded from the file instead of the full set.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Chloe', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Write the DataFrame to a Parquet file
df.to_parquet('data.parquet', index=False)
# Read only the 'id' and 'name' columns into a DataFrame
df_read = pd.read_parquet(path='data.parquet', columns=['id', 'name'])
print(df_read)
'''
Output:
   id   name
0   1  Alice
1   2    Bob
2   3  Chloe
3   4  David
'''
storage_options: Dictionary of storage-specific options, such as credentials for cloud storage; the entries are forwarded to the underlying file-system library (for example, s3fs for s3:// URLs).
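As a minimal sketch, the snippet below passes credentials through storage_options when reading from Amazon S3. The bucket name and credential values are placeholders, and the s3fs package is assumed to be installed so that pandas can resolve the s3:// URL.
import pandas as pd
# Hypothetical bucket and placeholder credentials; requires the s3fs package
df_read = pd.read_parquet(
    path='s3://my-bucket/data.parquet',
    storage_options={
        'key': 'YOUR_ACCESS_KEY_ID',
        'secret': 'YOUR_SECRET_ACCESS_KEY'
    }
)
print(df_read)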
use_nullable_dtypes: If True, use dtypes that use pd.NA as the missing-value indicator for the resulting DataFrame. Deprecated since version 2.0.0; use dtype_backend instead.
dtype_backend: New in pandas 2.0. It selects the backend used for the dtypes of the resulting DataFrame: 'numpy_nullable' or 'pyarrow'.
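The snippet below is a minimal sketch of dtype_backend, using a reduced version of the sample DataFrame; it assumes pyarrow is installed. Requesting the 'pyarrow' backend yields pyarrow-backed dtypes instead of the default NumPy dtypes.
import pandas as pd
# Create and write a small sample DataFrame
df = pd.DataFrame({'id': [1, 2, 3, 4], 'age': [25, 30, 35, 40]})
df.to_parquet('data.parquet', index=False)
# Read it back with pyarrow-backed dtypes
df_read = pd.read_parquet(path='data.parquet', dtype_backend='pyarrow')
print(df_read.dtypes)
'''
Output:
id     int64[pyarrow]
age    int64[pyarrow]
dtype: object
'''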
filesystem: The file-system object to use when reading the Parquet file. This parameter is particularly useful when working with cloud storage systems (e.g., Amazon S3, Google Cloud Storage) or custom file systems; it is only implemented for the pyarrow engine.
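As a sketch of the filesystem parameter (added in pandas 2.1), the snippet below passes an explicit pyarrow local filesystem object; a cloud filesystem such as pyarrow.fs.S3FileSystem could be substituted for the local one.
import os
import pandas as pd
import pyarrow.fs
# Explicit filesystem object; an absolute path avoids ambiguity with pyarrow's LocalFileSystem
fs = pyarrow.fs.LocalFileSystem()
df_read = pd.read_parquet(path=os.path.abspath('data.parquet'), engine='pyarrow', filesystem=fs)
print(df_read)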
filters: Row-group filters to apply while reading, given as a list of (column, operator, value) tuples; rows that do not satisfy the conditions are skipped.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Chloe', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Write the DataFrame to a Parquet file
df.to_parquet('data.parquet', index=False)
# Read only the rows where age is greater than 30
df_read = pd.read_parquet(path='data.parquet', filters=[("age", ">", 30)])
print(df_read)
'''
Output:
   id   name  age  salary
0   3  Chloe   35   70000
1   4  David   40   80000
'''
**kwargs: Additional keyword arguments passed through to the underlying Parquet reader engine.
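For example, a keyword understood by the engine can be passed straight through. The sketch below forwards pyarrow's use_threads option to pyarrow.parquet.read_table; it assumes the pyarrow engine is installed and that data.parquet exists from the earlier examples.
import pandas as pd
# use_threads is not a pandas parameter; it is forwarded to the pyarrow reader
df_read = pd.read_parquet(path='data.parquet', engine='pyarrow', use_threads=False)
print(df_read)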