Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Plot category, proportion, total

I am trying to make a bar plot in Python using name, prop, and total. The idea is to show each name together with its total streams and the proportion of those streams that are male.

I have the following example data:

NAME    prop_male    total 
GGD     0.254147    727240
CCG     0.216658    323510
PPT     0.265414    251023
MMMA    0.185105    210416
JKK     0.434557    201594
BBD     0.279319    198998
KNL.    0.277761    190246
TSK     0.277653    171030
LIS     0.218444    165168
BRK     0.44755     161124

I tried this, but somehow I'm missing a trick:

import pandas as pd
import seaborn as sns

x, y, hue = "name", "proportion", "total"

(df[x]
 .groupby(df[hue])
 .value_counts(normalize=True)
 .rename(y)
 .reset_index()
 .pipe((sns.barplot, "data"), x=x, y=y, hue=hue))

Could someone suggest a meaningful plot that shows all three pieces of information together?

Thanks in advance.



1 Reply


There are countless ways to plot this information; however, the scales of the columns are very different, which is a problem if you want to summarise them in a single (readable) bar chart.

The best way is probably the one suggested by Mr. T, and the plot looks really nice (I'd add a legend, however, to explain that the dark blue bar is the male count while the light blue bar is the total).
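Mr. T's code is not reproduced on this page, so the following is only a sketch of the idea as described above: a dark blue male-count bar drawn over a light blue total bar, with a legend. The function name `plot_overlay` and the percentage labels on top of the bars are my own assumptions, not part of that suggestion.

```python
import matplotlib.pyplot as plt

names = ['GGD', 'CCG', 'PPT', 'MMMA', 'JKK', 'BBD', 'KNL']
prop_male = [0.254147, 0.216658, 0.265414, 0.185105, 0.434557, 0.279319, 0.277761]
total = [727240, 323510, 251023, 210416, 201594, 198998, 190246]


def plot_overlay(names, prop_male, total):
    """Draw total bars with the male share overlaid (hypothetical helper)."""
    # Convert each proportion into an absolute count so both bars share one scale
    male = [p * t for p, t in zip(prop_male, total)]
    x = list(range(len(names)))

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar(x, total, color='lightblue', label='Total')
    ax.bar(x, male, color='darkblue', label='Male')  # drawn on top of the total bars

    # Write the male proportion above each total bar so all three values are visible
    for xi, t, p in zip(x, total, prop_male):
        ax.text(xi, t, f"{p:.1%}", ha='center', va='bottom')

    ax.set_xticks(x)
    ax.set_xticklabels(names)
    ax.set_ylabel('Streams')
    ax.legend()
    return fig, ax
```

Calling `fig, ax = plot_overlay(names, prop_male, total)` followed by `plt.show()` (or `fig.savefig(...)`) displays the chart. Because the male bar is a count rather than a proportion, no rescaling of either column is needed.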

For completeness, here are two other options, which give less interpretable results:

1. Scale the "total" column so that both quantities are visible in the same bar chart.
2. Make a scatter plot.

import matplotlib.pyplot as plt
import matplotlib
import numpy as np

Name = ['GGD', 'CCG', 'PPT', 'MMMA', 'JKK', 'BBD',  'KNL']
prop_male = [0.254147, 0.216658, 0.265414, 0.185105, 0.434557, 0.279319, 
0.277761]
total = [727240, 323510, 251023, 210416, 201594, 198998,  190246]

#Plot as bar

x = np.arange(len(Name))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots(1,2, figsize=(20,8))
rects1 = ax[0].bar(x - width/2, [float(i)/1e6 for i in total], width, 
             label=r'Total $\times$ 1e-6')
rects2 = ax[0].bar(x + width/2, prop_male, width, label='Prop_male')

ax[0].set_xticks(x)
ax[0].set_xticklabels(Name, size=15)
ax[0].legend()

ax[0].set_ylabel("Counts [a.u.]", size=15)

#plot as scatter

# One color per name, taken from a discrete version of viridis
cmap = plt.get_cmap('viridis', len(Name))
norm = matplotlib.colors.Normalize(vmin=0, vmax=len(Name))
colors = cmap(np.arange(len(Name)))

for p, t, c in zip(prop_male, total, colors):
    ax[1].plot(p, t, 'o', color=c, markersize=10, alpha=0.8)

sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])
# Attach the colorbar to the scatter axes and center one tick in each color band
cbar = fig.colorbar(sm, ax=ax[1], ticks=np.arange(len(Name)) + 0.5)
cbar.ax.set_yticklabels(Name)
cbar.set_label('Name', size=15)

ax[1].set_xlabel("prop_male", size=15)
ax[1].set_ylabel("total", size=15)

The result should look something like this:

[Figure: scaled bar chart (left) and colored scatter plot (right) produced by the code above]
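Another common way to cope with the very different scales mentioned at the start is a secondary y-axis, so that neither column has to be rescaled. This is not part of the original answer; it is a minimal sketch, and the function name `plot_twin_axes` and the color choices are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_twin_axes(names, prop_male, total):
    """Plot totals on the left y-axis and proportions on the right (hypothetical helper)."""
    x = np.arange(len(names))  # the label locations
    width = 0.35               # the width of the bars

    fig, ax_total = plt.subplots(figsize=(10, 6))
    ax_prop = ax_total.twinx()  # second y-axis sharing the same x-axis

    ax_total.bar(x - width/2, total, width, color='tab:blue', label='total')
    ax_prop.bar(x + width/2, prop_male, width, color='tab:orange', label='prop_male')

    ax_total.set_xticks(x)
    ax_total.set_xticklabels(names)
    ax_total.set_ylabel('total')
    ax_prop.set_ylabel('prop_male')
    ax_prop.set_ylim(0, 1)  # proportions live between 0 and 1

    # Merge the legend entries of both axes into a single legend
    handles_t, labels_t = ax_total.get_legend_handles_labels()
    handles_p, labels_p = ax_prop.get_legend_handles_labels()
    ax_total.legend(handles_t + handles_p, labels_t + labels_p)
    return fig, ax_total, ax_prop
```

For example, `plot_twin_axes(['GGD', 'CCG'], [0.254147, 0.216658], [727240, 323510])` followed by `plt.show()` draws two pairs of bars. Two y-axes can be harder to read than a single one, so the bars should be clearly tied to their axis through colors or labels.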

