
Dataflow: Difference between revisions

From Wikipedia, the free encyclopedia
add to nebulous lead, more work needed before {{lead too short}} can be removed
Added short description
Tags: Mobile edit Mobile app edit Android app edit
 
(30 intermediate revisions by 19 users not shown)
Line 1: Line 1:
{{Short description|Computing concept}}
{{about|software engineering|the flow of data within a computer network|Traffic flow (computer networking)|the graphical representation of flow of data within an information system|data flow diagram|the hardware architecture|dataflow architecture}}
{{about|software engineering|the flow of data within a computer network|Traffic flow (computer networking)|the graphical representation of flow of data within an information system|data flow diagram|the hardware architecture|Dataflow architecture|the Dubai-based company|DataFlow Group}}
{{Multiple issues|
{{Multiple issues|
{{lead too short|date=November 2013}}
{{lead too short|date=November 2013}}
Line 5: Line 6:
}}
}}


'''''Dataflow''''' is a term used in [[computing]] which has various meanings depending on application and the context in which the term is used. In the context of [[software architecture]], data flow relates to [[stream processing]] or [[reactive programming]].
In [[computing]], '''dataflow''' is a broad concept, which has various meanings depending on the application and context. In the context of [[software architecture]], data flow relates to [[stream processing]] or [[reactive programming]].


==Software architecture==
==Software architecture==
Dataflow is a software paradigm based on the idea of disconnecting computational actors into stages (pipelines) that can execute concurrently. Dataflow can also be called [[stream processing]] or [[reactive programming]].<ref>[http://www.jonathanbeard.io/blog/2015/09/19/streaming-and-dataflow.html A Short Intro to Stream Processing]</ref>
[[Dataflow programming|Dataflow computing]] is a software paradigm based on the idea of representing computations as a [[directed graph]], where nodes are computations and data flow along the edges.<ref name="sig">{{cite web |last1=Schwarzkopf |first1=Malte |title=The Remarkable Utility of Dataflow Computing |url=https://www.sigops.org/2020/the-remarkable-utility-of-dataflow-computing/ |website=ACM SIGOPS |access-date=31 July 2022 |date=7 March 2020}}</ref> Dataflow can also be called [[stream processing]] or [[reactive programming]].<ref>[http://www.jonathanbeard.io/blog/2015/09/19/streaming-and-dataflow.html A Short Intro to Stream Processing]</ref>


There have been multiple data-flow/stream processing languages of various forms (see [[Stream processing]]). Data-flow hardware (see [[Dataflow architecture]]) is an alternative to the classic [[Von Neumann architecture]]. The most obvious example of data-flow programming is the subset known as [[reactive programming]] with spreadsheets. As a user enters new values, they are instantly transmitted to the next logical "actor" or formula for calculation.
There have been multiple data-flow/stream processing languages of various forms (see [[Stream processing]]). Data-flow hardware (see [[Dataflow architecture]]) is an alternative to the classic [[von Neumann architecture]]. The most obvious example of data-flow programming is the subset known as [[reactive programming]] with spreadsheets. As a user enters new values, they are instantly transmitted to the next logical "actor" or formula for calculation.


[[Distributed data flow]]s have also been proposed as a programming abstraction that captures the dynamics of distributed multi-protocols. The data-centric perspective characteristic of data flow programming promotes high-level functional specifications and simplifies formal reasoning about system components.
[[Distributed data flow]]s have also been proposed as a programming abstraction that captures the dynamics of distributed multi-protocols. The data-centric perspective characteristic of data flow programming promotes high-level functional specifications and simplifies formal reasoning about system components.
Line 16: Line 17:
==Hardware architecture==
==Hardware architecture==
{{main|Dataflow architecture}}
{{main|Dataflow architecture}}
Hardware architectures for dataflow was a major topic in [[Computer architecture]] research in the 1970s and early 1980s. [[Jack Dennis]] of [[MIT]] pioneered the field of static dataflow architectures. Designs that use conventional memory addresses as data dependency tags are called static dataflow machines. These machines did not allow multiple instances of the same routines to be executed simultaneously because the simple tags could not differentiate between them. Designs that use [[Content-addressable memory]] are called dynamic dataflow machines by [[Arvind (computer scientist)|Arvind]]. They use tags in memory to facilitate parallelism.
Hardware architectures for dataflow was a major topic in [[computer architecture]] research in the 1970s and early 1980s. [[Jack Dennis]] of the [[Massachusetts Institute of Technology]] (MIT) pioneered the field of static dataflow architectures. Designs that use conventional memory addresses as data dependency tags are called static dataflow machines. These machines did not allow multiple instances of the same routines to be executed simultaneously because the simple tags could not differentiate between them. Designs that use [[content-addressable memory]] are called dynamic dataflow machines by [[Arvind (computer scientist)|Arvind]]. They use tags in memory to facilitate parallelism.
Data flows around the computer through the components of the computer. It gets entered from the input devices and can leave through output devices (printer etc.).
Data flows around the computer through the components of the computer. It gets entered from the input devices and can leave through output devices (printer etc.).


Line 23: Line 24:


In [[Kahn process networks]], named after [[Gilles Kahn]], the processes are ''determinate''. This implies that each determinate process computes a [[continuous function]] from input streams to output streams, and that a network of determinate processes is itself determinate, thus computing a continuous function. This implies that the behavior of such networks can be described by a set of recursive equations, which can be solved using [[fixed point theory]]. The movement and transformation of the data is represented by a series of shapes and lines.
In [[Kahn process networks]], named after [[Gilles Kahn]], the processes are ''determinate''. This implies that each determinate process computes a [[continuous function]] from input streams to output streams, and that a network of determinate processes is itself determinate, thus computing a continuous function. This implies that the behavior of such networks can be described by a set of recursive equations, which can be solved using [[fixed point theory]]. The movement and transformation of the data is represented by a series of shapes and lines.

== Other meanings ==
Dataflow can also refer to:
* [[Power BI]] Dataflow, a [[Power Query]] implementation in the cloud used for transforming source data into [[Data cleansing|cleansed]] Power BI Datasets to be used by Power BI report developers through the [[Microsoft Dataverse]] (formerly called Microsoft Common Data Service).
* [[Google Cloud Dataflow]], a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.


==See also==
==See also==
{{Wiktionary-inline|dataflow}}
* [[BMDFM]]
* [[Binary Modular Dataflow Machine]] (BMDFM)
* [[Communicating Sequential Processes]]
* [[Communicating sequential processes]]
* [[Complex event processing]]
* [[Complex event processing]]
* [[Data flow diagram]]
* [[Data-flow diagram]]
* [[Data-flow analysis]], a type of program analysis
* [[Data-flow analysis]], a type of program analysis
* [[Data stream]]
* [[Data stream]]
* [[Dataflow programming]] (a programming language paradigm)
* [[Dataflow programming]] (a programming language paradigm)
* [[Erlang (programming language)]]
* [[Flow-based programming]] (FBP)
* [[Flow-based programming]] (FBP)
* [[Flow control (data)]]
* [[Functional reactive programming]]
* [[Functional reactive programming]]
* [[Lazy evaluation]]
* [[Lazy evaluation]]
Line 40: Line 49:
* [[Pipeline (computing)]]
* [[Pipeline (computing)]]
* [[Pure Data]]
* [[Pure Data]]
* [[State transition]]
* [[TensorFlow]]
* [[TensorFlow]]
* [[Theano]]
* [[Theano_(software)|Theano]]
* [[Ward-Mellor methodology]]


== References ==
== References ==
{{Reflist}}
{{Reflist}}


== External links ==
{{Wiktionary|dataflow}}
{{External links|date=May 2017}}
* [http://dataflowanalytics.com DataFlow Analytics]: Composable Analytics - Flexible Business Intelligence.
* [http://bmdfm.com BMDFM]: Binary Modular Dataflow Machine, [[BMDFM]].
* [http://greta.cs.ioc.ee/~khoros2/k2tools/cantata/cantata.html Cantata]: Dataflow Visual Language for [[image processing]].
* [http://common-lisp.net/project/cells/ Cells]: Dataflow extension to [[Common Lisp]] [[Common Lisp Object System|Object System]], CLOS.
* [http://code.google.com/p/dc-lib/ DC]: Library that allows the embedding of one-way dataflow constraints in a C/C++ program.
* [http://www.iseesystems.com/softwares/Education/StellaSoftware.aspx Stella]: Dataflow Visual Language for dynamic dataflow [[Mathematical model|modeling]] and [[Computer simulation|simulation]].
* [http://www-sop.inria.fr/members/Jean-Vivien.Millo/kpassa/index.php KPASSA] : a tool for static-scheduling, performance analysis and optimizations for DataFlow models.
* [http://www.pointillistic.com/open-REBOL/moa/steel/liquid/index.html Liquid Rebol]
* [http://www.es.ele.tue.nl/sdf3 SDF3] : Performance analysis tool for DataFlow Model
* [https://github.com/larrytheliquid/dataflow/tree/master Ruby Dataflow] : Ruby gem adding Dataflow variable support
* Acar ''et al.'', [http://citeseer.ist.psu.edu/old/752721.html Adaptive Functional Programming], POPL 2002
* [https://web.archive.org/web/20130119045517/http://doc.akka.io/docs/akka/snapshot/scala/dataflow.html Scala Dataflow] : The Akka toolkit provides (among other things) dataflow concurrency in Scala
* [http://www.tensorflow.org/ TensorFlow] : Google's open source ([[Apache License|Apache 2.0]]) second-generation [[Python (programming language)|Python]] and [[C++]] machine learning library using dataflow graphs
* [http://flink.apache.org/ Apache Flink] : An open-source stream processing framework based on the dataflow programming model<ref>Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S. et al. (2015) Apache flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 36(4)</ref>



<!-- [[Category:Computer data]] Dataflow has nothing to do with this category! -->

<!-- [[Category:Computer data]] Dataflow has nothing to do with this category! -->
[[Category:Computer architecture]]
[[Category:Computer architecture]]
[[Category:Models of computation]]
[[Category:Models of computation]]

[[it:Dataflow (microprocessori)]]

Latest revision as of 13:49, 25 June 2024

In computing, dataflow is a broad concept whose meaning depends on the application and context. In software architecture, data flow relates to stream processing or reactive programming.

Software architecture


Dataflow computing is a software paradigm based on the idea of representing computations as a directed graph, where nodes are computations and data flow along the edges.[1] Dataflow can also be called stream processing or reactive programming.[2]
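The graph view described above can be illustrated with a minimal Python sketch. The function `run_graph` and the node names (`a`, `b`, `sum`, `double`) are hypothetical and chosen only for illustration; the sketch assumes an acyclic graph and evaluates each node once all of its predecessors have produced a value.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_graph(nodes, edges, inputs):
    """Evaluate a dataflow graph.

    nodes:  {name: function}; each function receives its predecessors' values
    edges:  {name: [predecessor names, in argument order]}
    inputs: {name: value} for source nodes that have no function
    """
    values = dict(inputs)
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(edges).static_order():
        if name in values:
            continue  # source node: its value was supplied from outside
        args = [values[p] for p in edges.get(name, [])]
        values[name] = nodes[name](*args)
    return values


# Hypothetical example graph: (a + b) feeds a doubling node.
nodes = {
    "sum": lambda a, b: a + b,
    "double": lambda s: s * 2,
}
edges = {"sum": ["a", "b"], "double": ["sum"]}
result = run_graph(nodes, edges, {"a": 3, "b": 4})
print(result["double"])  # 14
```

Here the data dependencies alone determine the order of evaluation; nodes that do not depend on each other could in principle run concurrently.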

There have been multiple data-flow/stream processing languages of various forms (see Stream processing). Data-flow hardware (see Dataflow architecture) is an alternative to the classic von Neumann architecture. The most familiar example of data-flow programming is the spreadsheet, which belongs to the subset known as reactive programming. As a user enters new values, they are immediately transmitted to the next logical "actor" or formula for calculation.
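The spreadsheet behaviour described above can be sketched in a few lines of Python. The `Cell` class below is a hypothetical illustration, not the mechanism of any particular spreadsheet: each formula cell records which cells it reads, and entering a new value pushes recomputation to its dependents.

```python
class Cell:
    """A spreadsheet-like cell: either a plain value or a formula over other cells."""

    def __init__(self, value=None, formula=None, inputs=()):
        self.formula = formula
        self.inputs = list(inputs)
        self.dependents = []
        for cell in self.inputs:
            cell.dependents.append(self)  # register for change notifications
        self.value = value
        if formula is not None:
            self._recompute()

    def set(self, value):
        """Enter a new value and push the change to dependent formulas."""
        self.value = value
        for dep in self.dependents:
            dep._recompute()

    def _recompute(self):
        self.value = self.formula(*(c.value for c in self.inputs))
        for dep in self.dependents:
            dep._recompute()


a1 = Cell(2)
a2 = Cell(3)
a3 = Cell(formula=lambda x, y: x + y, inputs=(a1, a2))  # like =A1+A2
a4 = Cell(formula=lambda s: s * 10, inputs=(a3,))       # like =A3*10

a1.set(5)
print(a3.value, a4.value)  # 8 80
```

This naive propagation may recompute a cell more than once when two paths lead to it; a more careful implementation would order updates by dependency.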

Distributed data flows have also been proposed as a programming abstraction that captures the dynamics of distributed multi-protocols. The data-centric perspective characteristic of data flow programming promotes high-level functional specifications and simplifies formal reasoning about system components.

Hardware architecture


Hardware architectures for dataflow were a major topic in computer architecture research in the 1970s and early 1980s. Jack Dennis of the Massachusetts Institute of Technology (MIT) pioneered the field of static dataflow architectures. Designs that use conventional memory addresses as data dependency tags are called static dataflow machines. These machines did not allow multiple instances of the same routine to be executed simultaneously because the simple tags could not differentiate between them. Designs that use content-addressable memory are called dynamic dataflow machines by Arvind. They use tags in memory to facilitate parallelism. Data flows through the components of the computer: it is entered from input devices and can leave through output devices (such as a printer).
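The role of tags in the dynamic design can be illustrated with a small software model. This is only a sketch under stated assumptions, not the design of any actual machine: the `MatchingStore` class and its `receive` method are hypothetical, and a Python dictionary keyed by (instruction, tag) stands in for content-addressable memory. An instruction fires once all operands carrying the same tag have arrived, so two instances of the same instruction can be in flight at once.

```python
from collections import defaultdict


class MatchingStore:
    """Collects operand tokens and releases them once every operand
    with the same (instruction, tag) pair has arrived."""

    def __init__(self, arity):
        self.arity = arity                # {instruction: number of operands}
        self.waiting = defaultdict(dict)  # (instruction, tag) -> {port: value}

    def receive(self, instruction, tag, port, value):
        key = (instruction, tag)
        self.waiting[key][port] = value
        if len(self.waiting[key]) == self.arity[instruction]:
            operands = self.waiting.pop(key)
            return [operands[p] for p in sorted(operands)]  # ready to fire
        return None  # still waiting for a matching operand


store = MatchingStore({"add": 2})
# Tokens for two instances (tags 0 and 1) of the same instruction interleave.
print(store.receive("add", 0, 0, 10))   # None
print(store.receive("add", 1, 0, 100))  # None
print(store.receive("add", 1, 1, 200))  # [100, 200]
print(store.receive("add", 0, 1, 20))   # [10, 20]
```

Without the tag in the key, the tokens of the two instances would be indistinguishable, which mirrors the limitation of static machines described above.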

Concurrency


A dataflow network is a network of concurrently executing processes or automata that can communicate by sending data over channels (see message passing).
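A dataflow network of this kind can be sketched with threads as processes and FIFO queues as channels. The process names and the `DONE` end-of-stream sentinel are assumptions of this sketch rather than part of any standard.

```python
import queue
import threading

DONE = object()  # sentinel marking end of stream (an assumption of this sketch)


def producer(out):
    for n in range(5):
        out.put(n)
    out.put(DONE)


def square(inp, out):
    # Blocks on inp.get() until the upstream process sends a value.
    while (item := inp.get()) is not DONE:
        out.put(item * item)
    out.put(DONE)


def collect(inp, results):
    while (item := inp.get()) is not DONE:
        results.append(item)


a, b = queue.Queue(), queue.Queue()  # channels between the processes
results = []
threads = [
    threading.Thread(target=producer, args=(a,)),
    threading.Thread(target=square, args=(a, b)),
    threading.Thread(target=collect, args=(b, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 4, 9, 16]
```

The three processes run concurrently, yet the result is the same on every run because each channel has a single writer and a single reader and preserves order.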

In Kahn process networks, named after Gilles Kahn, the processes are determinate. This implies that each determinate process computes a continuous function from input streams to output streams, and that a network of determinate processes is itself determinate, thus computing a continuous function. Consequently, the behavior of such networks can be described by a set of recursive equations, which can be solved using fixed point theory. The movement and transformation of the data are represented by a series of shapes and lines.
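The idea of describing a network by recursive equations over streams can be illustrated with Python generators, which behave like processes that block until their input is available. In this sketch (the names `increment` and `naturals` are hypothetical) a one-process network with a feedback channel is described by the equation X = 0 followed by increment(X), and the generator produces the stream that solves it.

```python
from itertools import islice


def increment(stream):
    """A determinate process: a function from an input stream to an output stream."""
    for x in stream:
        yield x + 1


def naturals():
    # Recursive stream equation: X = 0 followed by increment(X).
    # The body runs lazily, so the self-reference is only unfolded on demand.
    yield 0
    yield from increment(naturals())


print(list(islice(naturals(), 5)))  # [0, 1, 2, 3, 4]
```

Each further element unfolds the equation one more level, so this form is suitable for illustration rather than for long streams.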

Other meanings


Dataflow can also refer to:

  • Power BI Dataflow, a Power Query implementation in the cloud used for transforming source data into cleansed Power BI Datasets to be used by Power BI report developers through the Microsoft Dataverse (formerly called Microsoft Common Data Service).
  • Google Cloud Dataflow, a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.

See also


The dictionary definition of dataflow at Wiktionary

References

  1. Schwarzkopf, Malte (7 March 2020). "The Remarkable Utility of Dataflow Computing". ACM SIGOPS. Retrieved 31 July 2022.
  2. "A Short Intro to Stream Processing".