The two primary components of pandas are the Series and the DataFrame. A Series is essentially a single column, and a DataFrame is a two-dimensional table made up of a collection of Series.
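The relationship between the two structures can be sketched as follows. This is a minimal illustration with made-up data; the names `names`, `ages`, and `people` are assumptions for the example, not part of the original text.

```python
import pandas as pd

# Two Series with the same default index (hypothetical example data)
names = pd.Series(["Ann", "Bob", "Cy"], name="name")
ages = pd.Series([31, 25, 47], name="age")

# Combine the Series column-wise into a DataFrame
people = pd.concat([names, ages], axis=1)

# Selecting a single column gives back a Series
age_column = people["age"]
print(type(age_column).__name__)
print(people.shape)
```

Selecting one column with square brackets returns a Series, which is one way to see that a DataFrame behaves like a collection of Series sharing an index.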
A pandas DataFrame is a two-dimensional, labeled data structure in the pandas library for Python. Resembling a table or spreadsheet, it consists of rows and columns, where each column can hold a different data type (numeric, string, boolean, etc.). The DataFrame provides tools for data manipulation, cleaning, analysis, and exploration. It supports indexing, slicing, merging, reshaping, and aggregating data, which makes it a fundamental tool in data science and analysis workflows.
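As a rough sketch of what this looks like in practice, the example below builds a small DataFrame whose columns have different data types, then filters and aggregates it. The data and the column names (`city`, `visitors`, `raining`) are invented for illustration.

```python
import pandas as pd

# Hypothetical data: each column holds a different data type
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Lima"],
    "visitors": [120, 80, 100, 60],
    "raining": [True, False, False, True],
})

# Each column has its own dtype
print(df.dtypes)

# Slicing: keep only the rows where it is raining (boolean indexing)
rainy = df[df["raining"]]
print(rainy)

# Aggregating: total visitors per city
totals = df.groupby("city")["visitors"].sum()
print(totals)
```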
A pandas Series is a one-dimensional labeled array capable of holding various data types (integers, strings, floats, etc.). It resembles a column in a spreadsheet, or a simple array/list with an associated index label for each element.
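To make the idea of index labels concrete, here is a small sketch that attaches custom labels to a Series. The weekday labels and the variable name `counts` are assumptions chosen for the example.

```python
from pandas import Series

# Hypothetical data: three values labelled by weekday
counts = Series(data=[32, 41, 56], index=["Mon", "Tue", "Wed"])

# Look up a value by its index label
print(counts["Tue"])

# Look up a value by its position
print(counts.iloc[0])

# The labels themselves are available through the index
print(list(counts.index))
```

If no `index` argument is given, pandas falls back to integer labels starting at 0.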
Creating a Series
There are different ways we can add data to a Series. We start out with a simple list:
from pandas import Series # Note the initial upper-case letter
sneeze_counts = Series(data=[32, 41, 56, 62, 30, 22, 17])
print(sneeze_counts)
0    32
1    41
2    56
3    62
4    30
5    22
6    17
dtype: int64
Note: The Series automatically adds an index on the left side. It also infers the best-fitting data type for the elements (here int64, a 64-bit integer). pandas introduces the Series as a new data type (like int, str, and all the others), so the value of sneeze_counts is the whole series at once.
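The points in this note can be checked directly. The sketch below reuses the sneeze_counts data from above; the extra names `doubled` and `mixed` are introduced only for illustration.

```python
from pandas import Series

sneeze_counts = Series(data=[32, 41, 56, 62, 30, 22, 17])

# The variable refers to the whole Series object, not to a single number
print(type(sneeze_counts).__name__)
print(sneeze_counts.dtype)

# Operations apply to every element at once
doubled = sneeze_counts * 2
print(doubled)
print(sneeze_counts.sum())

# A single float in the input changes the inferred data type
mixed = Series(data=[1, 2.5, 3])
print(mixed.dtype)
```

Because the Series is handled as one value, arithmetic such as `sneeze_counts * 2` needs no explicit loop.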
Extra Information