Introduction to Pandas
Prerequisites: you should have a basic understanding of Python programming. Pandas is built on top of NumPy and uses much of its functionality, so we suggest going through our NumPy tutorial before reading this article.
Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. Its development started from the need for a high-performance, flexible tool for data analysis.
Before pandas, Python was mainly used for data munging and preparation. With pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze.
Today, Python with pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics.
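The five steps mentioned above (load, prepare, manipulate, model, analyze) can be sketched on a tiny data set. This is only an illustrative sketch: the CSV content, the column names (city, month, sales), and the variable names are all made up for this example, and the "model" step is reduced to a simple per-city aggregate, which is a stand-in rather than a real statistical model.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the example needs no external file.
raw_csv = io.StringIO(
    "city,month,sales\n"
    "Pune,Jan,100\n"
    "Pune,Feb,\n"
    "Delhi,Jan,80\n"
    "Delhi,Feb,120\n"
)

# 1. Load: read the CSV text into a DataFrame.
df = pd.read_csv(raw_csv)

# 2. Prepare: replace the missing sales value with 0.
df["sales"] = df["sales"].fillna(0)

# 3. Manipulate: add a derived column.
df["sales_thousands"] = df["sales"] / 1000

# 4. Model (stand-in): summarize sales per city.
summary = df.groupby("city")["sales"].agg(["sum", "mean"])

# 5. Analyze: find the city with the highest total sales.
best_city = summary["sum"].idxmax()

print(summary)
print(best_city)
```

In real work the load step would usually read from a file or database, and the model step would use a dedicated statistics or machine-learning library.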
Features of Pandas:
• High-performance merging and joining of data.
• Fast and efficient DataFrame object with default and customized indexing.
• Integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based indexing, slicing, and subsetting of large data sets.
• Group-by functionality for aggregating and transforming data.
• Tools for loading data from different file formats into in-memory data objects.
• Time-series functionality.
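A few of the features listed above (missing-data handling, merging, group by, label-based indexing, and pivoting) can be tried out in a short sketch. The two tables, their column names, and the variable names below are invented for illustration; the pandas calls themselves (fillna, merge, groupby, loc, pivot_table) are standard.

```python
import numpy as np
import pandas as pd

# Hypothetical order and customer tables.
orders = pd.DataFrame(
    {
        "customer_id": [1, 2, 1, 3],
        "amount": [250.0, np.nan, 100.0, 75.0],
    }
)
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3],
        "region": ["North", "South", "North"],
    }
)

# Missing data: treat an unknown amount as 0.
orders["amount"] = orders["amount"].fillna(0.0)

# Merging and joining: attach each customer's region to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Group by and aggregate: total amount per region.
region_totals = merged.groupby("region")["amount"].sum()

# Label-based indexing: look up a value by its index label.
north_total = region_totals.loc["North"]

# Reshaping and pivoting: regions as rows, customers as columns.
pivoted = merged.pivot_table(
    index="region", columns="customer_id", values="amount", aggfunc="sum"
)

print(region_totals)
print(pivoted)
```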
To install pandas, use the Python package installer pip and type the following command in the terminal:
pip install pandas
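One simple way to confirm that the installation worked is to import the library in Python and print its version. The variable name installed_version is only illustrative.

```python
import pandas as pd

# pd.__version__ holds the version string of the installed pandas package.
installed_version = pd.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, pandas was most likely installed for a different Python interpreter than the one being run.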
Pandas Data Structures:
Pandas deals with three data structures; each higher-dimensional data structure is a container of its lower-dimensional data structure.
• Series: a one-dimensional labeled array of homogeneous data. Its size is immutable.
Example: a Series can hold a collection of odd numbers, for instance 1, 3, 5, 7, 9.
• DataFrame: a 2D labeled tabular structure with heterogeneously typed columns, or in other words a container of Series. Its size is mutable.
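• Panel: a 3D labeled structure, or in other words a container of DataFrames. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current releases.
Note: DataFrame is widely used and is one of the most important data structures. Panel was used much less.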
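The Series and DataFrame structures described above can be created as in the following sketch. The values, labels, and column names are made up for illustration. Panel is left out because it is not available in recent pandas releases.

```python
import pandas as pd

# A Series of odd numbers with the default integer index 0..4.
odd_numbers = pd.Series([1, 3, 5, 7, 9])

# A Series with custom labels as its index.
labeled = pd.Series([1, 3, 5], index=["a", "b", "c"])

# A DataFrame whose columns hold different types (text, integers, floats).
people = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],
        "age": [31, 25, 42],
        "score": [88.5, 92.0, 79.5],
    }
)

# Each column of a DataFrame is itself a Series.
age_column = people["age"]

# A DataFrame is size-mutable: a new column can be added in place.
people["passed"] = people["score"] >= 80

print(odd_numbers)
print(labeled.loc["b"])
print(people)
```

Selecting a single column returns a Series, which is what is meant by calling a DataFrame a container of Series.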