It is built on top of the pd.read_table() method.
Path or file-like object containing the fixed-width file.
List of tuples giving the half-open (start, end) positions of each column; the default 'infer' detects the boundaries automatically from the whitespace pattern of the first rows.
import pandas as pd
# Create a fixed-width dataset
data = "123Alice   25\n456Bob     30\n789Charlie 35"
file_path = "file2.fwf"
with open(file_path, "w") as f:
    f.write(data)
# Specify column start and end positions
colspecs = [(0, 3), (3, 10), (10, 13)]
df1 = pd.read_fwf(file_path, colspecs='infer')
df2 = pd.read_fwf(file_path, colspecs=colspecs)
print(f"Infer:\n{df1}\nList of tuples:\n{df2}")
'''
Output:
Infer:
     123Alice  25
0      456Bob  30
1  789Charlie  35
List of tuples:
   123    Alice  25
0  456      Bob  30
1  789  Charlie  35
'''
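According to the pandas documentation, a colspecs tuple may also use None to mean "from the start of the line" or "to the end of the line". A small sketch (the file name here is illustrative):

```python
import pandas as pd

# Create a fixed-width dataset
data = "123Alice   25\n456Bob     30\n789Charlie 35"
file_path = "file2b.fwf"  # hypothetical file name
with open(file_path, "w") as f:
    f.write(data)

# (None, 3) runs from the start of the line; (10, None) runs to its end
df = pd.read_fwf(file_path, colspecs=[(None, 3), (3, 10), (10, None)])
print(df)
```

This parses the same three columns as the explicit [(0, 3), (3, 10), (10, 13)] specification above.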
The widths parameter is an alternative to colspecs: a list of field widths, one per column.
import pandas as pd
# Create a fixed-width dataset
data = "123Alice   25\n456Bob     30\n789Charlie 35"
file_path = "file3.fwf"
with open(file_path, "w") as f:
    f.write(data)
# Specify the width of each column (3 + 7 + 3 characters)
widths = [3, 7, 3]
df = pd.read_fwf(file_path, widths=widths)
print(df)
'''
Output:
   123    Alice  25
0  456      Bob  30
1  789  Charlie  35
'''
Number of rows used to infer column widths when colspecs='infer' (default 100).
import pandas as pd
# Create a fixed-width dataset
data = "123 Alice   25\n456 Bob     30\n789 Charlie 35"
file_path = "file4.fwf"
with open(file_path, "w") as f:
    f.write(data)
# Infer column widths from only the first 2 rows; the wider
# "Charlie" in the third row is then truncated
df = pd.read_fwf(file_path, colspecs="infer", infer_nrows=2)
print(df)
'''
Output:
   123  Alice  25
0  456    Bob  30
1  789  Charl  35
'''
The dtype_backend parameter, new in Pandas 2.0, specifies which backend ("numpy_nullable" or "pyarrow") is used for the resulting DataFrame's data types when reading a file.
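A short sketch of dtype_backend (requires Pandas 2.0 or later); the file name and data below are illustrative:

```python
import pandas as pd

# Create a fixed-width dataset
data = "ID  Name    Age\n123 Alice    25\n456 Bob      30"
file_path = "file5.fwf"  # hypothetical file name
with open(file_path, "w") as f:
    f.write(data)

# "numpy_nullable" returns pandas' nullable extension dtypes
# (e.g. Int64 instead of int64) rather than plain NumPy dtypes
df = pd.read_fwf(file_path, dtype_backend="numpy_nullable")
print(df.dtypes)
```

With the default backend, the numeric columns would come back as plain int64; the nullable backend lets them hold missing values without being upcast to float.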
The iterator parameter controls the return type: when True, read_fwf() returns a TextFileReader that yields the file in chunks; when False (the default), it returns the full DataFrame.
import pandas as pd
# Create a fixed-width dataset
data = "ID  Name    Age\n123 Alice    25\n456 Bob      30\n789 Charlie  35"
file_path = "file6.fwf"
with open(file_path, "w") as f:
    f.write(data)
# iterator=False (the default) returns the whole DataFrame at once
df = pd.read_fwf(file_path, iterator=False)
print(df)
'''
Output:
    ID     Name  Age
0  123    Alice   25
1  456      Bob   30
2  789  Charlie   35
'''
It enables memory-mapped file reading, which uses the operating system's virtual memory to map the file's contents directly into memory, which can improve performance for very large files.
import pandas as pd
# Create a fixed-width dataset
data = "ID  Name    Age\n123 Alice    25\n456 Bob      30\n789 Charlie  35"
file_path = "file6.fwf"
with open(file_path, "w") as f:
    f.write(data)
# Memory-map the file while reading it
df = pd.read_fwf(file_path, memory_map=True)
print(df)
'''
Output:
    ID     Name  Age
0  123    Alice   25
1  456      Bob   30
2  789  Charlie   35
'''
Number of rows per chunk when using an iterator.
import pandas as pd
# Create a fixed-width dataset
data = "ID  Name    Age\n123 Alice    25\n456 Bob      30\n789 Charlie  35"
file_path = "file6.fwf"
with open(file_path, "w") as f:
    f.write(data)
# Use an iterator to read the file in chunks of 2 rows
df_iterator = pd.read_fwf(file_path, iterator=True, chunksize=2)
print(next(df_iterator))
'''
Output:
    ID   Name  Age
0  123  Alice   25
1  456    Bob   30
'''
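Rather than calling next() by hand, the reader can be looped over to process every chunk and then recombine them. A sketch (the file name is illustrative):

```python
import pandas as pd

# Create a fixed-width dataset
data = "ID  Name    Age\n123 Alice    25\n456 Bob      30\n789 Charlie  35"
file_path = "file7.fwf"  # hypothetical file name
with open(file_path, "w") as f:
    f.write(data)

# Passing chunksize alone already returns a TextFileReader;
# iterating over it yields one DataFrame per chunk
chunks = []
with pd.read_fwf(file_path, chunksize=2) as reader:
    for chunk in reader:
        # process each 2-row chunk here, then keep it
        chunks.append(chunk)

# Recombine the processed chunks into a single DataFrame
df = pd.concat(chunks, ignore_index=True)
print(df)
```

This pattern keeps peak memory bounded by the chunk size, which is the usual reason to read a large fixed-width file iteratively.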
Additional keyword arguments (**kwds) are forwarded to the underlying pd.read_table() method, so options such as names, skiprows, or dtype work with read_fwf() as well.
import pandas as pd
# Create a fixed-width dataset without a header row
data = "123 Alice   25\n456 Bob     30\n789 Charlie 35"
file_path = "file8.txt"
with open(file_path, "w") as f:
    f.write(data)
# names is forwarded to pd.read_table() and labels the columns
df = pd.read_fwf(file_path, names=["ID", "Name", "Age"])
print(df)
'''
Output:
    ID     Name  Age
0  123    Alice   25
1  456      Bob   30
2  789  Charlie   35
'''
'''