Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Julia - Joining Multiple Data Frames

I am wondering if there is a way in Julia's DataFrames.jl to join multiple data frames in one go:

using DataFrames

employer = DataFrame(
    ID = Array{Int64}([01,02,03,04,05,09,11,20]),
    name = Array{String}(["Matthews","Daniella", "Kofi", "Vladmir", "Jean", "James", "Ayo", "Bill"])
)

salary = DataFrame(
    ID = Array{Int64}([01,02,03,04,05,06,08,23]),
    amount = Array{Int64}([2050,3000,3500,3500,2500,3400,2700,4500])
)

hours = DataFrame(
    ID = Array{Int64}([01,02,03,04,05,08,09,23]),
    time = Array{Int64}([40,40,40,40,40,38,45,50])
)

# I tried passing them as an array, but of course that results in an error
empSalHrs = innerjoin([employer,salary,hours], on = :ID)

# In Python you can achieve this using
import pandas as pd
from functools import reduce

df = reduce(lambda l,r : pd.merge(l,r, on = "ID"), [employer, salary, hours])

Is there a similar way to do this in Julia?

Question from: https://stackoverflow.com/questions/65649732/joining-multiple-data-frames


1 Reply


You were almost there. As described in the DataFrames.jl manual, innerjoin accepts more than two data frames, so you just need to pass each one as a separate positional argument instead of wrapping them in an array.

using DataFrames

employer = DataFrame(
    ID = [01,02,03,04,05,09,11,20],
    name = ["Matthews","Daniella", "Kofi", "Vladmir", "Jean", "James", "Ayo", "Bill"]
)

salary = DataFrame(
    ID = [01,02,03,04,05,06,08,23],
    amount = [2050,3000,3500,3500,2500,3400,2700,4500]
)

hours = DataFrame(
    ID = [01,02,03,04,05,08,09,23],
    time = [40,40,40,40,40,38,45,50]
)

empSalHrs = innerjoin(employer,salary,hours, on = :ID)

If for some reason you need to keep your data frames in a Vector, you can use splatting (the ... operator) to achieve the same result:

empSalHrs = innerjoin([employer,salary,hours]..., on = :ID)
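For readers coming from the Python snippet in the question, the following is a minimal sketch of a pairwise fold with Base's reduce, shown next to the splatted call. The names tables, folded and splatted are introduced here purely for illustration, and the IDs are written without leading zeros. Row order after a join is not assumed, so the comparison sorts by ID first.

```julia
using DataFrames

employer = DataFrame(
    ID = [1, 2, 3, 4, 5, 9, 11, 20],
    name = ["Matthews", "Daniella", "Kofi", "Vladmir", "Jean", "James", "Ayo", "Bill"]
)
salary = DataFrame(
    ID = [1, 2, 3, 4, 5, 6, 8, 23],
    amount = [2050, 3000, 3500, 3500, 2500, 3400, 2700, 4500]
)
hours = DataFrame(
    ID = [1, 2, 3, 4, 5, 8, 9, 23],
    time = [40, 40, 40, 40, 40, 38, 45, 50]
)

# Hypothetical container name; any Vector of data frames works the same way
tables = [employer, salary, hours]

# Pairwise fold, mirroring reduce(lambda l, r: pd.merge(l, r, on="ID"), ...)
folded = reduce((l, r) -> innerjoin(l, r, on = :ID), tables)

# Splatted multi-argument call, equivalent to innerjoin(employer, salary, hours, on = :ID)
splatted = innerjoin(tables..., on = :ID)

# Only IDs 1 through 5 occur in all three tables
same_rows = sort(folded, :ID) == sort(splatted, :ID)
```

Both forms keep only the IDs present in every table. The splatted call is shorter, while the fold may be handy if each step needs different keyword arguments.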

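Also, note that I've slightly changed the definitions of the data frames. Array{Int} is not a concrete type, because it leaves the number of dimensions unspecified. Used as a converter, as in Array{Int64}([1, 2, 3]), it still produces an ordinary Vector{Int64}, so it makes no practical difference in this particular scenario. Non-concrete types do hurt performance, however, when they are used to annotate struct fields or container element types, so it's better to build good habits from the start. Instead of Array{Int} one can use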

  • Array{Int, 1}([1, 2, 3, 4])
  • Vector{Int}([1, 2, 3, 4])
  • Int[1, 2, 3]
  • [1, 2, 3]

The last one works because Julia can infer the element type of the array literal on its own in this simple scenario.
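The point about concreteness can be checked directly with Base functions. This is a small sketch that needs no packages; the variable names a through e and all_vectors are illustrative only.

```julia
# Array{Int} does not fix the number of dimensions, so it is not a concrete type
loose_is_concrete = isconcretetype(Array{Int})     # false
tight_is_concrete = isconcretetype(Vector{Int})    # true

# Every spelling below yields the same concrete type for the value itself
a = Array{Int}([1, 2, 3])
b = Array{Int, 1}([1, 2, 3])
c = Vector{Int}([1, 2, 3])
d = Int[1, 2, 3]
e = [1, 2, 3]

all_vectors = all(x -> typeof(x) === Vector{Int}, (a, b, c, d, e))   # true
```

Since the constructed values are identical in type, the choice between these spellings is mostly about readability. Concreteness matters in places where the type is used as an annotation.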

