The two primary components of pandas are the Series and the DataFrame.
A Series is essentially a column, and a DataFrame is a two-dimensional table made up of a collection of Series.
A pandas DataFrame is a two-dimensional, labeled data structure in the pandas library for Python. Like a table or spreadsheet, it consists of rows and columns, and each column can hold its own data type (numeric, string, boolean, etc.). The DataFrame provides tools for data manipulation, cleaning, analysis, and exploration: indexing, slicing, merging, reshaping, and aggregating data are all straightforward, which makes it a versatile and fundamental tool in data science and analysis workflows.
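As a minimal sketch of these ideas (the column names and values are made up for illustration), the following builds a small DataFrame whose columns have different types, shows that selecting one column gives back a Series, and runs a simple filter and aggregation:

```python
import pandas as pd

# Hypothetical example data: each key becomes a column, each list its values
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "age": [34, 28, 45, 28],
    "member": [True, False, True, True],
})

# Each column keeps its own data type
print(df.dtypes)

# Selecting a single column returns a Series
ages = df["age"]
print(isinstance(ages, pd.Series))  # True

# Boolean filtering keeps only the rows where the condition holds
members = df[df["member"]]
print(members)

# Aggregation: average age per membership status
avg_age = df.groupby("member")["age"].mean()
print(avg_age)
```

The exact dtype shown for the text column depends on the installed pandas version, so no output is listed for `df.dtypes` here.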
A pandas Series is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, Python objects, etc.). It resembles a single column in a spreadsheet, or a simple array/list in which each element has an associated index label.
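To illustrate the index labels, here is a short sketch that passes explicit labels through the `index` parameter of `Series` (the weekday labels and temperature values are hypothetical) and then looks elements up by label and by position:

```python
import pandas as pd

# Hypothetical data: temperatures labeled by weekday instead of 0, 1, 2, ...
temps = pd.Series(data=[21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"])

# Look up a value by its label
print(temps["Tue"])   # 23.0

# Look up a value by its integer position
print(temps.iloc[0])  # 21.5

# Slicing by label keeps the labels attached (the end label is included)
print(temps.loc["Mon":"Tue"])
```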
Creating a Series
There are several ways to put data into a Series. We start with a simple list:
from pandas import Series # Note the initial upper-case letter
sneeze_counts = Series(data=[32, 41, 56, 62, 30, 22, 17])
print(sneeze_counts)
0 32
1 41
2 56
3 62
4 30
5 22
6 17
dtype: int64
Note: The Series automatically adds an index on the left side (0 to 6 here). It also infers the best-fitting data type for the elements (here int64, a 64-bit integer). pandas introduces Series as a new data type (just like int, str and the other built-in types), so the value of sneeze_counts is the whole series at once.
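The index and inferred data type mentioned in the note can be inspected directly. The sketch below reuses the sneeze_counts data from above; because the variable holds the whole series, arithmetic and methods such as `sum()` apply to every element at once:

```python
from pandas import Series

sneeze_counts = Series(data=[32, 41, 56, 62, 30, 22, 17])

# The automatically created index: the positions 0 through 6
print(list(sneeze_counts.index))  # [0, 1, 2, 3, 4, 5, 6]

# The inferred data type
print(sneeze_counts.dtype)        # int64

# Operations apply to the whole series at once (element-wise)
doubled = sneeze_counts * 2
print(doubled.tolist())           # [64, 82, 112, 124, 60, 44, 34]

# Including a float makes pandas infer a floating-point dtype instead
mixed = Series(data=[32, 41.5, 56])
print(mixed.dtype)                # float64

# Aggregates summarize the whole series in one call
print(sneeze_counts.sum())        # 260
```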
Extra Information