data: Input data used to populate the DataFrame (e.g., a dictionary, an ndarray, or another DataFrame).
import pandas as pd
# Creating a DataFrame
data = {'Name': ['Better', 'Docs'], 'Age': [25, 30]}
df = pd.DataFrame(data=data)
print(df)
'''
Output:
     Name  Age
0  Better   25
1    Docs   30
'''
If data is empty or omitted, the constructor returns an empty DataFrame.
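The empty case can be checked with a short sketch. The variable names here are illustrative and not part of the original example.

```python
import pandas as pd

# No data at all: the result has no rows and no columns
no_data_df = pd.DataFrame()

# An empty dictionary behaves the same way
empty_dict_df = pd.DataFrame(data={})

print(no_data_df.empty)      # True
print(empty_dict_df.shape)   # (0, 0)
```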
index: A sequence of row labels for the data. If None, a default integer index (0, 1, 2, ...) is assigned.
import pandas as pd
# Creating a DataFrame with custom row labels
data = {'Name': ['Better', 'Docs'], 'Age': [25, 30]}
df = pd.DataFrame(data=data, index=["Row1", "Row2"])
print(df)
'''
Output:
        Name  Age
Row1  Better   25
Row2    Docs   30
'''
If index is provided for list-like or array-like data, its length must match the number of rows in the data; otherwise pandas raises a ValueError.
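A minimal sketch of the length requirement, reusing the data dictionary from the example above with one label too many. The name mismatch_error is only for illustration.

```python
import pandas as pd

data = {'Name': ['Better', 'Docs'], 'Age': [25, 30]}

# Two rows of data but three index labels: pandas rejects the mismatch
try:
    pd.DataFrame(data=data, index=["Row1", "Row2", "Row3"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```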
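columns: Specifies the column labels. If None, the labels are taken from the data when it carries them (for example, dictionary keys); otherwise a default integer range (0, 1, 2, ...) is assigned.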
import pandas as pd
# Creating a DataFrame with explicit row and column labels
data = {'Name': ['Better', 'Docs'], 'Age': [25, 30]}
df = pd.DataFrame(data=data, index=("Row1", "Row2"), columns=["Name", "Age"])
print(df)
'''
Output:
        Name  Age
Row1  Better   25
Row2    Docs   30
'''
When data is a dictionary, the names in columns select keys from the data, and keys that are not listed are dropped.
If a name has no matching key, the values for that particular column will be NaN.
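A short sketch of the non-matching case. The column name City is hypothetical and deliberately absent from the data.

```python
import pandas as pd

data = {'Name': ['Better', 'Docs'], 'Age': [25, 30]}

# "City" is not a key in data, so that column is filled with NaN;
# "Age" is not requested, so it is left out of the result
df = pd.DataFrame(data=data, columns=["Name", "City"])

print(df)
print(df["City"].isna().all())  # True
```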
dtype: Specifies a single data type to force on the whole DataFrame. If not provided, the type of each column is inferred from the input.
import pandas as pd
# Creating a DataFrame with every column stored as the object dtype
data = {'Name': ['Better', 'Docs'], 'Age': [25, 30]}
df = pd.DataFrame(data=data, index=("Row1", "Row2"), columns=["Name", "Age"], dtype='object')
print(df)
'''
Output:
        Name  Age
Row1  Better   25
Row2    Docs   30
'''
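Because dtype accepts only one type, it is applied to every column. A minimal sketch with purely numeric data; the columns A and B are made up for illustration.

```python
import pandas as pd

numeric_data = {'A': [1, 2], 'B': [3, 4]}

# Without dtype, integer columns are inferred as a 64-bit integer type
inferred_df = pd.DataFrame(data=numeric_data)

# With dtype, the same type is forced on all columns
float_df = pd.DataFrame(data=numeric_data, dtype='float32')

print(float_df.dtypes)
```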
int8: 8-bit signed integer (range: -128 to 127).
int16: 16-bit signed integer (range: -32,768 to 32,767).
int32: 32-bit signed integer (range: -2,147,483,648 to 2,147,483,647).
int64: 64-bit signed integer (range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
uint8: 8-bit unsigned integer (range: 0 to 255).
uint16: 16-bit unsigned integer (range: 0 to 65,535).
uint32: 32-bit unsigned integer (range: 0 to 4,294,967,295).
uint64: 64-bit unsigned integer (range: 0 to 18,446,744,073,709,551,615).
float16: Half precision floating-point (16-bit, for low-precision computations).
float32: Single precision floating-point (32-bit).
float64: Double precision floating-point (64-bit, the default float in NumPy).
float128: Extended precision floating-point (128-bit, availability depends on system).
complex64: Complex number represented by two 32-bit floats (for real and imaginary parts).
complex128: Complex number represented by two 64-bit floats (default complex dtype).
complex256: Complex number represented by two 128-bit floats (system-dependent).
bool: Boolean type, either True or False (each value occupies one full byte).
bytes: Fixed-length byte string, specified by S + length (e.g., S10 for a 10-byte string).
str: Fixed-length Unicode string, specified by U + length (e.g., U10 for a 10-character string).
object: Allows storing any Python object, including mixed types, strings, or other arrays. Useful for heterogeneous data but slower than native NumPy types.
datetime64: Stores dates and times with varying precisions (e.g., Y, M, D, h, m, s, ms, us, ns, ps, fs, as). Example: np.datetime64('2003-10-02')
timedelta64: Represents time durations with units (same units as datetime64).
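The ranges and sizes in the list above can be checked directly in NumPy. This sketch uses np.iinfo and np.dtype; the variable names are illustrative.

```python
import numpy as np

# Integer limits reported by NumPy
int8_info = np.iinfo(np.int8)
uint32_info = np.iinfo(np.uint32)
print(int8_info.min, int8_info.max)
print(uint32_info.min, uint32_info.max)

# Size in bytes of one element for several dtypes
type_names = ("bool", "float16", "float32", "float64", "complex64", "complex128")
sizes = {name: np.dtype(name).itemsize for name in type_names}
print(sizes)

# A date given without a time is stored at day precision
day = np.datetime64('2003-10-02')
print(day.dtype)
```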
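copy: If True, a copy of the input data is created. This is useful if you want to ensure the original data remains unchanged.
copy=False: The DataFrame may share memory with the input, so changes to the new object can also affect the original array.
copy=True: A new DataFrame is created, and the data is copied.
copy matters mainly when the data is an np.ndarray (or another DataFrame); for dictionary input, the default already behaves like copy=True.
Without a copy, a modification to either original_array or new_array affects both arrays.
Refer to the NumPy documentation for details on copies and views.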
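A minimal sketch of the difference, assuming a small integer array named original_array as in the text. The names shared_df and copied_df are hypothetical, and the sharing behavior with copy=False can depend on the pandas version and on the array's dtype.

```python
import numpy as np
import pandas as pd

original_array = np.array([[1, 2], [3, 4]])

# copy=False: the DataFrame is allowed to reuse the array's memory
shared_df = pd.DataFrame(original_array, copy=False)

# copy=True: the DataFrame gets its own independent copy of the values
copied_df = pd.DataFrame(original_array, copy=True)

# Modify the source array after both DataFrames exist
original_array[0, 0] = 99

shared_value = shared_df.iloc[0, 0]   # follows the change to original_array
copied_value = copied_df.iloc[0, 0]   # keeps the original value
print(shared_value, copied_value)
```

Passing copy=True costs extra memory, but it keeps later edits to the source array from leaking into the DataFrame.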