

Data Structure - DataFrame | Pandas Tutorial

This article briefly explains the pandas DataFrame data structure: what the DataFrame constructor is, which parameters it takes, and the different data inputs that can be used to create a DataFrame.

Data Structure – DataFrames

Before diving into this article, I suggest reading my previous articles, part-1 and part-2. In this article, you will learn the following about the pandas DataFrame data structure in brief.

Topics that we will cover here are:

• What is a DataFrame?
• What is the DataFrame constructor?
• What data inputs can we use to create a DataFrame?
• Different ways of creating a DataFrame.

What is a data frame?


• A DataFrame is a container for multiple pandas Series, one per column.
• A DataFrame holds data aligned in a tabular layout (rows and columns).
• It is a two-dimensional data structure of pandas.
• It is mutable in size: rows and columns can be added or removed.
• Different columns can hold different data types.
• Arithmetic operations can be performed on both rows and columns.
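
The properties listed above can be checked with a short sketch. The column names and values here are hypothetical sample data chosen only for illustration; they are not part of any real dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["SID", "MONA"], "age": [20, 30]})

# Each column of a DataFrame is a pandas Series.
age_is_series = isinstance(df["age"], pd.Series)

# Size is mutable: a new column can be added after creation.
# This is also column-wise arithmetic on the 'age' column.
df["age_next_year"] = df["age"] + 1

# Row-wise arithmetic: sum the numeric columns across each row.
row_totals = df[["age", "age_next_year"]].sum(axis=1).tolist()

print(df.dtypes)   # the text column and the numeric columns have different dtypes
print(df.shape)    # two rows, three columns after the new column is added
```

Here `df.dtypes` reports one data type per column, which is how a single DataFrame can mix text and numbers.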

What is the data frame constructor?

Syntax:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Parameters

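data: The input data. It can be an ndarray, Series, list, dict, constant (scalar), or another DataFrame.
index: The row labels for the resulting frame. Optional; if no index is passed, it defaults to the integers 0 to n-1 (like np.arange(n)).
columns: The column labels for the resulting frame. Optional; if no column labels are passed and none can be taken from the data, they default to the integers 0 to n-1.
dtype: A single data type to force on the data. Optional; if it is not passed, pandas infers a type for each column.
copy: Whether to copy the input data. Older pandas releases document the default as False; newer releases use None and decide based on the type of input.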
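
As a minimal sketch of how these parameters fit together, the example below passes all five explicitly. The row and column labels (`row1`, `x`, and so on) are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4]])

df = pd.DataFrame(
    data=values,              # a 2-D ndarray as the input data
    index=["row1", "row2"],   # hypothetical row labels
    columns=["x", "y"],       # hypothetical column labels
    dtype="float64",          # force a single dtype on all columns
    copy=True,                # work on a copy of the input array
)

# Without index/columns, both default to the integers 0..n-1.
default_df = pd.DataFrame(np.array([[1, 2], [3, 4]]))

# Because copy=True was passed, changing the source array
# afterwards does not change df.
values[0, 0] = 99

print(df)
print(default_df)
```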

What are the data inputs we can use to create a dataframe?

A pandas DataFrame can be created from several kinds of data input, listed below:

• Lists
• Dicts
• Series
• NumPy ndarrays
• Another DataFrame
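
Lists, dicts, and dicts of Series each get their own section in this article. For the remaining inputs in the list, a rough sketch might look like this (the names `arr`, `s`, and `score` are made up for the example):

```python
import numpy as np
import pandas as pd

# From a NumPy ndarray: each row of the array becomes a row of the frame.
arr = np.array([[1, 2], [3, 4]])
from_ndarray = pd.DataFrame(arr, columns=["a", "b"])

# From a single Series: the Series name becomes the column name.
s = pd.Series([10, 20, 30], name="score")
from_series = pd.DataFrame(s)

# From another DataFrame: the result has the same labels and values.
from_frame = pd.DataFrame(from_ndarray)

print(from_ndarray)
print(from_series)
print(from_frame)
```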

Different ways of creating a dataframe.

A). Creating an Empty DataFrame
Code:

#import the pandas library and alias it as pd
import pandas as pd
df = pd.DataFrame()
print(df)

output:

Empty DataFrame
Columns: []
Index: []

B). Creating a DataFrame from Lists

#using a single list
Code:

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

output:

   0
0  1
1  2
2  3
3  4
4  5

#using a list of lists
Code:

#import the pandas library and alias it as pd
import pandas as pd
data = [['SID', 20], ['MONA', 30], ['BOB', 40], ['SOHAN', 50]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

output:
    Name  Age
0    SID   20
1   MONA   30
2    BOB   40
3  SOHAN   50

C). Creating a DataFrame from a Dict of Lists

#create a DataFrame using a dict with a custom index.
Code:

import pandas as pd
data = {'Name':['Rohan', 'Sohan', 'Sid', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data,index=['rank1','rank2','rank3','rank4'])
print(df)

#output:
        Name  Age
rank1  Rohan   28
rank2  Sohan   34
rank3    Sid   29
rank4  Ricky   42

D). Create a DataFrame from a List of Dictionaries
By default, the dictionary keys are taken as column names.

# create a DataFrame by passing a list of dictionaries.
Code:

import pandas as pd
data = [{'apple': 1, 'banana': 2},{'pear': 5, 'guava': 10, 'grapes': 20}]
df = pd.DataFrame(data)
print(df)

#output:
   apple  banana  pear  guava  grapes
0    1.0     2.0   NaN    NaN     NaN
1    NaN     NaN   5.0   10.0    20.0
Note: NaN (Not a Number) is filled in wherever a value is missing.
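
NaN is a floating-point value, which is why the integer inputs are printed as 1.0, 2.0, and so on. As a small sketch, the missing entries can be counted with `isna()` and replaced with `fillna()`. Filling with 0 is only an illustrative choice here, not a recommendation for real data.

```python
import pandas as pd

data = [{'apple': 1, 'banana': 2}, {'pear': 5, 'guava': 10, 'grapes': 20}]
df = pd.DataFrame(data)

# Count the missing (NaN) entries in each column.
missing_per_column = df.isna().sum().to_dict()

# Replace NaN with 0 (an illustrative choice) and convert back to integers.
filled = df.fillna(0).astype(int)

print(missing_per_column)
print(filled)
```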

#create a DataFrame by passing a list of dictionaries and the row indices.
Code:

import pandas as pd
data = [{'apple': 1, 'banana': 2},{'pear': 5, 'guava': 10, 'grapes': 20}]
df = pd.DataFrame(data,index=['first', 'second'])
print(df)

#output:
        apple  banana  pear  guava  grapes
first     1.0     2.0   NaN    NaN     NaN
second    NaN     NaN   5.0   10.0    20.0

#create a DataFrame with a list of dictionaries, row indices, and column indices.

code:

import pandas as pd
data = [{'a': 100, 'b': 200},{'a': 5, 'b': 10, 'c': 20}]

#With two column labels that both match dictionary keys

df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

#With two column labels, one of which ('b1') is not a dictionary key

df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)


#output:
          a    b
first   100  200
second    5   10
          a  b1
first   100 NaN
second    5 NaN


E). Create a DataFrame from Dict of Series
Code:

import pandas as pd

d = {'x1' : pd.Series([100, 200, 300], index=['a', 'b', 'c']),
'x2' : pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

#output:
      x1  x2
a  100.0  10
b  200.0  20
c  300.0  30
d    NaN  40
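
The NaN in column x1 comes from index alignment: the resulting frame uses the union of the two Series indexes, and x1 has no value for label 'd'. A brief sketch that checks this, reusing the same data:

```python
import pandas as pd

d = {'x1': pd.Series([100, 200, 300], index=['a', 'b', 'c']),
     'x2': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# The row labels are the union of both Series indexes.
row_labels = list(df.index)

# Only label 'd' is missing from x1, so only that entry is NaN.
x1_missing = df['x1'].isna().tolist()

print(row_labels)
print(x1_missing)
```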