PHPFixing

Tuesday, March 1, 2022

[FIXED] Airflow Data Handling

 March 01, 2022     airflow, composer-php, python     No comments   

Issue

New to Airflow, learning via Udemy and reading the documentation.

Could some kind soul help me understand what must be terribly basic? I just can't seem to locate a good explanation, and now it's getting in the way of my progress.

How does one task, or activity, pass its results on to the next for subsequent processing? I understand that tasks/activities should be idempotent: task #1, for example, gets some data, task #2 does something to it, and each time they run the same things happen.

I realize that there is no concept of data flowing from one task to another as Airflow is really the orchestrator.

Let's take as an example something like the SimpleHttpOperator, which I am using in a task to GET some data from a web resource. How can I make that data available to the next task? As far as I can see, there are no methods inside SimpleHttpOperator that allow me to control where to put that data so that the next task can then work with it.

There must be some 'standard' way of placing the data retrieved by the SimpleHttpOperator so that the next task is able to read it and work with it?

I'm sure I must be missing something really basic and would appreciate any pointers.


Solution

Until recently, this would have been my answer:

You can use XComs to pass metadata between tasks. As you rightly noticed, Airflow should not pass DATA itself, so you should never use SimpleHttpOperator to retrieve DATA. The best you can do is use it to retrieve some metadata that you pass to another operator, so that it knows what to do with DATA stored elsewhere (for example, in a GCS or S3 bucket).

Usually, Airflow calls some external service that places the DATA somewhere and returns its location as METADATA.
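The pattern above can be sketched with two plain Python callables. In Airflow, the value returned by a PythonOperator callable (or a function decorated with @task) is pushed to XCom under the key return_value, and a downstream task can read it with ti.xcom_pull(task_ids=...). The function names, the file name, and the local temp directory standing in for a GCS/S3 bucket are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for external storage such as a GCS or S3 bucket.
# Assumption: a local temp directory, so the sketch runs without cloud access.
STORAGE_DIR = Path(tempfile.mkdtemp(prefix="fake_bucket_"))


def fetch_and_store(records):
    """Task 1: put the DATA in external storage and return only its location."""
    # A fixed file name keeps the task idempotent: a rerun overwrites the file.
    target = STORAGE_DIR / "records.json"
    target.write_text(json.dumps(records))
    # This short string is the METADATA that would travel through XCom.
    return str(target)


def process(location):
    """Task 2: receive the location, load the DATA from storage, and use it."""
    records = json.loads(Path(location).read_text())
    return sum(record["value"] for record in records)


if __name__ == "__main__":
    # Outside Airflow the hand-off is a plain function call; inside a DAG the
    # scheduler would run the two callables as separate tasks.
    location = fetch_and_store([{"value": 1}, {"value": 2}])
    print(process(location))
```

Only the location string crosses the task boundary, so the XCom row stays small no matter how large the stored file is.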

However, things change.

Recently we introduced custom XCom backends (which can be backed by S3 or GCS, for example), so you can now use the very same XComs to actually pass DATA.
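As a rough sketch of the idea, a custom backend swaps the value for a small reference when it is written and resolves the reference when it is read. A real backend would subclass airflow.models.xcom.BaseXCom and be selected through the xcom_backend option in the [core] section of the Airflow configuration. The class below deliberately does not subclass it, so that it runs without Airflow installed, and the exact method signatures differ between Airflow versions. The class name, the reference prefix, and the local directory standing in for a bucket are all hypothetical.

```python
import json
import tempfile
import uuid
from pathlib import Path

# Stand-in for an S3/GCS bucket (assumption: a local temp directory).
BUCKET = Path(tempfile.mkdtemp(prefix="xcom_bucket_"))
# Hypothetical marker identifying a stored value as a reference.
PREFIX = "xcom-ref://"


class OffloadingXComSketch:
    """Mimics the two hooks a custom XCom backend typically overrides.

    Not a real Airflow class: it only illustrates the offloading logic.
    """

    @staticmethod
    def serialize_value(value):
        # Write the actual DATA to the "bucket" under a unique object name.
        object_name = f"{uuid.uuid4().hex}.json"
        (BUCKET / object_name).write_text(json.dumps(value))
        # Only this short reference would be stored in the metadata database.
        return (PREFIX + object_name).encode("utf-8")

    @staticmethod
    def deserialize_value(stored):
        # Turn the stored reference back into the object name.
        reference = stored.decode("utf-8")
        object_name = reference[len(PREFIX):]
        # Load the actual DATA for the downstream task.
        return json.loads((BUCKET / object_name).read_text())


if __name__ == "__main__":
    stored = OffloadingXComSketch.serialize_value({"rows": [1, 2, 3]})
    print(stored)
    print(OffloadingXComSketch.deserialize_value(stored))
```

From the point of view of the tasks nothing changes: they still return and pull values as usual, while the backend decides where the bytes actually live.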

We just finished Airflow Summit 2021, and you can read more about the whole custom XCom backends concept here: https://airflowsummit.org/sessions/2021/customizing-xcom-to-enhance-data-sharing-between-tasks/

Fresh from the press!



Answered By - Jarek Potiuk