Spark is known for breaking a big job down into individual tasks and running those tasks in parallel. But this doesn't mean it can run two independent jobs in parallel. This article will help you maximize the parallelism you can get out of Spark.

One way to do that is asynchronous programming: a type of parallel programming in which a unit of work is allowed to run separately from the primary application thread. When the work is complete, it notifies the main thread of the worker thread's completion or failure. In Scala, you can achieve this using Future.

Scala Futures

Futures are a means of performing asynchronous programming in Scala. A Future gives you a simple way to run a job inside your Spark application concurrently.

Let's look at the usual way we write our Spark code and then see how Future can help us.

    // Read the three input datasets
    // (parquet is assumed here; the read call was lost in extraction)
    val employee = spark.read.parquet("s3://****/employee")
    val salary   = spark.read.parquet("s3://****/salary")
    val ratings  = spark.read.parquet("s3://****/ratings")

    println("Joining employee with salary")
    employee.join(salary, Seq("employee_id")).
      exportToS3AndJSON("s3://****/employee_salary")

    println("Joining employee with ratings")
    employee.join(ratings, Seq("employee_id")).
      exportToS3AndJSON("s3://****/employee_ratings")

In the above code, we read three datasets: employee, salary and ratings. In the first statement, we join the Employee and Salary tables on Employee_ID and save the result in parquet and JSON format (exportToS3AndJSON is a helper that writes both formats to S3). The second statement does the same for employee and ratings. Because the two joins are independent, running them one after the other leaves parallelism on the table.
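To make the idea concrete, here is a minimal sketch of how the two independent jobs above could be wrapped in Futures so they run concurrently rather than sequentially. The two join functions are hypothetical stand-ins (simple computations instead of Spark joins) so the sketch runs without a Spark cluster; in real code their bodies would be the join-and-write statements shown earlier.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical placeholders for the two Spark jobs above.
def joinEmployeeWithSalary(): String = {
  println("Joining employee with salary")
  "employee_salary written"
}

def joinEmployeeWithRatings(): String = {
  println("Joining employee with ratings")
  "employee_ratings written"
}

// Wrapping each job in a Future submits it to the implicit global
// ExecutionContext, so both jobs run concurrently on worker threads.
val salaryJob  = Future { joinEmployeeWithSalary() }
val ratingsJob = Future { joinEmployeeWithRatings() }

// Block the main thread until both jobs complete (or either fails).
val results = Await.result(Future.sequence(Seq(salaryJob, ratingsJob)), 10.minutes)
results.foreach(println)
```

Note that Await.result is only used at the very end, after all the work has been kicked off; blocking between the two Future blocks would serialize the jobs again.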