Monday, August 15, 2022

[FIXED] How to write output to a single CSV file from inside Google Cloud Data Fusion

Tags: csv, google-cloud-data-fusion, google-cloud-platform, output, pipeline

Issue

I'm running an ETL pipeline through Google Cloud Data Fusion. A quick summary of the pipeline's actions:

  1. Take in a CSV file containing a list of names
  2. Take in a table from bigquery-public-data
  3. Join the two together and output the results to a table
  4. Also send the results to a Group By, which consolidates duplicates and sums their scores
  5. Output the resulting list of author names and scores to both a table and a CSV file in a Google Cloud Storage bucket

All of this appears to be working properly: the two tables contain the correct data and are queryable.

However, the CSV output from the Group By arrives in the GCS bucket as 37 separate parts, each named with the default naming scheme ("part-r-00000" to "part-r-00036"). The parts are in CSV format (both the text/csv and application/csv settings have produced usable CSV files).

I want the output to be written to the GCS bucket folder as a single CSV file with a given name (author_rankings.csv). Below I'm attaching a screenshot of the pipeline and an image of some of the output. Please let me know if I can provide any additional information.

Thank you for any insight.

[Screenshot: Data Fusion pipeline]

[Screenshot: current output as many files]


Solution

You can use the RDD Repartitioner plugin from the Hub. Place it immediately before the CSV output sink and configure it to create 1 partition. That single partition will then be written out as a single file. Please look at the Documentation tab of the plugin for more details.
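To see why the number of partitions determines the number of output files, here is a toy model in plain Python. It is not Data Fusion or Spark code: `repartition` and `write_partitions` are hypothetical helpers invented for this illustration, and the sample rows are made up. It only mimics the "one file per partition" behaviour and the part-r-NNNNN naming described in the question.

```python
import csv
import tempfile
from pathlib import Path


def repartition(rows, num_partitions):
    """Toy stand-in for a repartition step: deal rows round-robin into N lists."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def write_partitions(partitions, out_dir):
    """Write each partition to its own CSV file named part-r-NNNNN."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, partition in enumerate(partitions):
        path = out_dir / f"part-r-{index:05d}"
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(partition)
        paths.append(path)
    return paths


# Hypothetical sample data: (author, score) pairs.
rows = [(f"author_{n}", n * 10) for n in range(100)]

with tempfile.TemporaryDirectory() as tmp:
    many = write_partitions(repartition(rows, 37), Path(tmp) / "many")
    single = write_partitions(repartition(rows, 1), Path(tmp) / "single")
    print(len(many), many[0].name, many[-1].name)  # 37 part-r-00000 part-r-00036
    print(len(single), single[0].name)             # 1 part-r-00000
```

In this toy model the single file still carries the default part name rather than author_rankings.csv. Whether the real sink behaves the same way, and whether a rename step is needed afterwards, is worth checking against the plugin documentation.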


Thanks and Regards,

Sagar



Answered By - Sagar Kapare
Answer Checked By - Gilberto Lyons (PHPFixing Admin)