path: File path or file-like object containing the Parquet data; a string path, a URL, or any object with a binary read() method is accepted.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Chloe', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Write the DataFrame to a Parquet file
df.to_parquet('data.parquet', index=False)
# Read the Parquet file into a DataFrame
df_read = pd.read_parquet(path='data.parquet')
print(df_read)
'''
Output:
   id   name  age  salary
0   1  Alice   25   50000
1   2    Bob   30   60000
2   3  Chloe   35   70000
3   4  David   40   80000
'''
engine: Parquet reader engine to use: 'auto' (the default), 'pyarrow', or 'fastparquet'. With 'auto', pandas tries pyarrow first and falls back to fastparquet.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Chloe', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Write the DataFrame to a Parquet file
df.to_parquet('data.parquet', index=False)
# Read the Parquet file into a DataFrame
df_read = pd.read_parquet(path='data.parquet', engine='auto')
print(df_read)
'''
Output:
   id   name  age  salary
0   1  Alice   25   50000
1   2    Bob   30   60000
2   3  Chloe   35   70000
3   4  David   40   80000
'''
columns: List of column names to read; only these columns are loaded from the file instead of the full set.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Chloe', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Write the DataFrame to a Parquet file
df.to_parquet('data.parquet', index=False)
# Read only the 'id' and 'name' columns into a DataFrame
df_read = pd.read_parquet(path='data.parquet', columns=['id', 'name'])
print(df_read)
'''
Output:
   id   name
0   1  Alice
1   2    Bob
2   3  Chloe
3   4  David
'''
storage_options: Dictionary of storage-specific options, such as credentials for cloud storage; the entries are forwarded to the underlying file-system library (for example, s3fs for s3:// URLs).
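As a minimal sketch, the snippet below passes credentials through storage_options when reading from Amazon S3. The bucket name and credential values are placeholders, and the s3fs package is assumed to be installed so that pandas can resolve the s3:// URL.
import pandas as pd
# Hypothetical bucket and placeholder credentials; requires the s3fs package
df_read = pd.read_parquet(
    path='s3://my-bucket/data.parquet',
    storage_options={
        'key': 'YOUR_ACCESS_KEY_ID',
        'secret': 'YOUR_SECRET_ACCESS_KEY'
    }
)
print(df_read)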
use_nullable_dtypes: If True, use dtypes that use pd.NA as the missing-value indicator for the resulting DataFrame. Deprecated since version 2.0.0; use dtype_backend instead.
dtype_backend: New in pandas 2.0. It selects the backend used for the dtypes of the resulting DataFrame: 'numpy_nullable' or 'pyarrow'.
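The snippet below is a minimal sketch of dtype_backend, using a reduced version of the sample DataFrame; it assumes pyarrow is installed. Requesting the 'pyarrow' backend yields pyarrow-backed dtypes instead of the default NumPy dtypes.
import pandas as pd
# Create and write a small sample DataFrame
df = pd.DataFrame({'id': [1, 2, 3, 4], 'age': [25, 30, 35, 40]})
df.to_parquet('data.parquet', index=False)
# Read it back with pyarrow-backed dtypes
df_read = pd.read_parquet(path='data.parquet', dtype_backend='pyarrow')
print(df_read.dtypes)
'''
Output:
id     int64[pyarrow]
age    int64[pyarrow]
dtype: object
'''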
filesystem: The file-system object to use when reading the Parquet file. This parameter is particularly useful when working with cloud storage systems (e.g., Amazon S3, Google Cloud Storage) or custom file systems; it is only implemented for the pyarrow engine.
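As a sketch of the filesystem parameter (added in pandas 2.1), the snippet below passes an explicit pyarrow local filesystem object; a cloud filesystem such as pyarrow.fs.S3FileSystem could be substituted for the local one.
import os
import pandas as pd
import pyarrow.fs
# Explicit filesystem object; an absolute path avoids ambiguity with pyarrow's LocalFileSystem
fs = pyarrow.fs.LocalFileSystem()
df_read = pd.read_parquet(path=os.path.abspath('data.parquet'), engine='pyarrow', filesystem=fs)
print(df_read)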
filters: Row-group filters to apply while reading, given as a list of (column, operator, value) tuples; rows that do not satisfy the conditions are skipped.
import pandas as pd
# Create a sample DataFrame
data = {
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Chloe', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Write the DataFrame to a Parquet file
df.to_parquet('data.parquet', index=False)
# Read only the rows where age is greater than 30
df_read = pd.read_parquet(path='data.parquet', filters=[("age", ">", 30)])
print(df_read)
'''
Output:
   id   name  age  salary
0   3  Chloe   35   70000
1   4  David   40   80000
'''
**kwargs: Additional keyword arguments passed through to the underlying Parquet reader engine.
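For example, a keyword understood by the engine can be passed straight through. The sketch below forwards pyarrow's use_threads option to pyarrow.parquet.read_table; it assumes the pyarrow engine is installed and that data.parquet exists from the earlier examples.
import pandas as pd
# use_threads is not a pandas parameter; it is forwarded to the pyarrow reader
df_read = pd.read_parquet(path='data.parquet', engine='pyarrow', use_threads=False)
print(df_read)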