{{Short description|Open source software suite}}
{{Infobox software
| name = Sector/Sphere
| caption = Logo
| developer = The Sector Alliance
| status = Active
| latest release version = 2.8
| latest release date = {{release date|2012|10|08}}
| genre = [[Distributed file system|Distributed File System]]
| license = [[Apache License 2.0]]
| website = {{URL|sector.sourceforge.net/}}
}}
'''Sector/Sphere''' is an [[open source software]] suite for high-performance [[distributed data storage]] and [[Distributed processing|processing]]. It can be broadly compared to [[Google]]'s [[Google File System|GFS]] and [[MapReduce]] technology. Sector is a [[distributed file system]] targeting [[Computer data storage|data storage]] over a large number of commodity computers. Sphere is the programming [[architecture framework|framework]] that supports massive in-storage parallel processing of data stored in Sector. Sector/Sphere can also operate in a [[wide area network]] (WAN) setting.
 
The system was created by Dr. Yunhong Gu (the author of [[UDP-based Data Transfer Protocol|UDT]]) in 2006 and was then maintained by a group of other developers.
 
==Architecture==
Sector/Sphere consists of four components. The security server maintains system security policies such as user accounts and the IP access control list. One or more master servers control the operation of the overall system and respond to user requests. The slave nodes store the data files and process them on request. The clients are the users' computers from which system access and data processing requests are issued.

Sector/Sphere is written in [[C++]]. A comparison published by the project claims that its architecture achieves two to four times the performance of the competing [[Hadoop]], which is written in [[Java (software)|Java]].<ref>[http://sector.sourceforge.net/pub/Sector%20vs%20Hadoop%20-%20v1.pdf Sector vs. Hadoop - A Brief Comparison Between the Two Systems]</ref> This claim is supported by an [[Aster Data Systems]] [[Benchmark (computing)|benchmark]]<ref name="benchmark">[http://decisionstats.com/2010/09/26/sector-sphere-faster-than-hadoopmapreduce-at-terasort/ Sector/ Sphere – Faster than Hadoop/Mapreduce at Terasort] September 26, 2010 Ajay Ohri</ref> and by the system's wins of the "Bandwidth Challenge" at the [[Supercomputing Conference]] in 2006,<ref>[https://web.archive.org/web/20091015073135/http://www.hpcwire.com/offthewire/17889209.html NCDM Wins Bandwidth Challenge at SC06, HPCWire, November 24, 2006]</ref> 2008,<ref>[https://web.archive.org/web/20091015073216/http://www.hpcwire.com/offthewire/UIC_Groups_Win_Bandwidth_Challenge_Award.html? UIC Groups Win Bandwidth Challenge Award, HPCWire, November 20, 2008]</ref> and 2009.<ref name="BWCSC09">[https://web.archive.org/web/20120229221150/http://www.hpcwire.com/hpcwire/2009-12-08/open_cloud_testbed_wins_bandwidth_challenge_at_sc09.html Open Cloud Testbed Wins Bandwidth Challenge at SC09, December 8, 2009]</ref>
 
[[File:Sector-arch.jpg|thumb|300px|Architecture of Sector/Sphere with its four components.]]
 
===Sector Distributed File System===
Sector is a user-space file system that relies on the local (native) file system of each node to store uploaded files. Sector provides fault tolerance at the file-system level through replication, so it does not require hardware fault tolerance such as [[RAID]], which is usually expensive.
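
To illustrate how replication of whole files can replace hardware fault tolerance, the following is a minimal C++ sketch. It assumes a simple placement rule in which copies go to consecutive slave nodes, starting from a node derived from a hash of the file name. The names <code>placeReplicas</code> and <code>isReadable</code> are hypothetical and are not part of Sector's API. Sector's real placement policy is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical illustration only: not Sector's actual placement policy or API.
// Places `k` copies of a whole file on `k` distinct slave nodes.
std::vector<std::string> placeReplicas(const std::string& fileName,
                                       const std::vector<std::string>& nodes,
                                       std::size_t k) {
    assert(k >= 1 && k <= nodes.size());  // cannot place more copies than nodes
    // Start at a hash-derived node and walk the node list, so different files
    // tend to start on different nodes; the k indices are always distinct.
    std::size_t start = std::hash<std::string>{}(fileName) % nodes.size();
    std::vector<std::string> replicas;
    for (std::size_t i = 0; i < k; ++i) {
        replicas.push_back(nodes[(start + i) % nodes.size()]);
    }
    return replicas;
}

// A file stays readable as long as at least one node holding a copy is alive.
bool isReadable(const std::vector<std::string>& replicas,
                const std::set<std::string>& failedNodes) {
    for (const auto& node : replicas) {
        if (failedNodes.count(node) == 0) return true;
    }
    return false;
}
```

Under this assumed rule, a file with three copies stays readable after any two of its nodes fail. It becomes unreadable only when every node that holds a copy is lost.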
 
The Sector client provides an [[API]] for application development that allows user applications to interact directly with Sector. The software also comes prepackaged with a set of command-line tools for accessing the file system. Finally, Sector supports the [[Filesystem in Userspace|FUSE]] interface, which presents a mountable file system that is accessible through standard command-line tools.
 
===Sphere Parallel Data Processing Engine===
Sphere is a parallel data processing engine integrated into Sector, and it processes data stored in Sector in parallel. It can be broadly compared to [[MapReduce]], but it uses generic user-defined functions (UDFs) instead of separate map and reduce functions. A UDF can act as a map function, a reduce function, or another kind of processing function.
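
The core idea of the UDF model is that one user-supplied function runs independently on every data segment. The following minimal C++ sketch expresses that idea by running one asynchronous task per segment with <code>std::async</code>. It assumes that segments are plain in-memory strings. The name <code>applyUdf</code> is hypothetical, and the sketch does not reproduce Sphere's real interface, which also handles data location and scheduling across nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical sketch of the UDF model; not Sphere's actual API.
// The same user-defined function `udf` is applied independently to every
// data segment, one asynchronous task per segment.
template <typename Udf>
std::vector<std::invoke_result_t<const Udf&, const std::string&>>
applyUdf(const std::vector<std::string>& segments, Udf udf) {
    using Out = std::invoke_result_t<const Udf&, const std::string&>;
    std::vector<std::future<Out>> pending;
    for (const auto& seg : segments) {
        // Each task gets its own copy of the UDF and reads its segment by reference.
        pending.push_back(std::async(std::launch::async,
                                     [udf, &seg] { return udf(seg); }));
    }
    std::vector<Out> results;
    for (auto& task : pending) {
        results.push_back(task.get());  // waits for this task to finish
    }
    assert(results.size() == segments.size());  // one output per segment
    return results;
}
```

Because the function is generic, the same driver can run a map-like UDF, such as a lambda that returns each segment's length, or any other per-segment computation.
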
 
Benefiting from the underlying Sector file system and the flexibility of the UDF model, Sphere can control the locality of both input and output data, so it can effectively support multiple input datasets, combinative and iterative operations, and even legacy application executables.
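
One way to picture how control over output location enables combinative operations is to route a UDF's output records into buckets by key. A second, reduce-like step can then process each bucket independently. The following C++ sketch assumes a hash-of-key bucketing rule. The names <code>Record</code>, <code>routeToBuckets</code>, and <code>sumBucket</code> are hypothetical and illustrate the concept only. They do not describe Sphere's documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A (key, value) record emitted by a processing stage.
using Record = std::pair<std::string, int>;

// Hypothetical sketch, not Sphere's API: route each output record to one of
// `numBuckets` buckets chosen from a hash of its key, so that all records
// sharing a key end up in the same bucket.
std::vector<std::vector<Record>> routeToBuckets(const std::vector<Record>& records,
                                                std::size_t numBuckets) {
    assert(numBuckets > 0);
    std::vector<std::vector<Record>> buckets(numBuckets);
    for (const auto& rec : records) {
        std::size_t id = std::hash<std::string>{}(rec.first) % numBuckets;
        buckets[id].push_back(rec);
    }
    return buckets;
}

// A second, reduce-like step: sum the values per key within one bucket.
std::map<std::string, int> sumBucket(const std::vector<Record>& bucket) {
    std::map<std::string, int> totals;
    for (const auto& [key, value] : bucket) {
        totals[key] += value;
    }
    return totals;
}
```

Because every record with a given key lands in the same bucket, each bucket can be summed without looking at the others. Chaining such stages, with one stage's buckets serving as the next stage's input, is one way to express iterative processing.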
 
Because Sector does not split user files, Sphere can wrap many existing applications that accept files or directories as input without rewriting them, which gives it greater compatibility with legacy applications.{{Citation needed|date=October 2012}}
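
Since each stored file is whole, an unmodified program can be pointed at one complete file per invocation. The following minimal C++ sketch only builds the command lines and does not execute them. It assumes a POSIX shell and single-quote escaping. The names <code>shellQuote</code>, <code>buildCommands</code>, and the program name <code>legacy_tool</code> are hypothetical and do not belong to Sphere.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Quote a path for a POSIX shell: wrap it in single quotes and
// replace each embedded ' with '\'' (close quote, escaped quote, reopen).
std::string shellQuote(const std::string& path) {
    std::string out = "'";
    for (char c : path) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Hypothetical sketch, not Sphere's API: build one command line per whole
// input file for an unmodified legacy program.
std::vector<std::string> buildCommands(const std::string& program,
                                       const std::vector<std::string>& files) {
    assert(!program.empty());
    std::vector<std::string> commands;
    for (const auto& file : files) {
        commands.push_back(program + " " + shellQuote(file));
    }
    return commands;
}
```

For example, <code>buildCommands("legacy_tool", {"part1.txt"})</code> yields <code>legacy_tool 'part1.txt'</code>. The program never needs to know the data is distributed, because the file it receives is complete.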
 
==See also==
{{Portal|Free and open-source software}}
* [[Pentaho]] - Open source data integration (Kettle), analytics, reporting, visualization and predictive analytics directly from Hadoop nodes
* [[Nutch]] - An effort to build an open source search engine based on [[Lucene]] and Hadoop, created by Doug Cutting
* [[Datameer]] Analytics Solution (DAS) - data source integration, storage, analytics engine and visualization
* [[Apache Accumulo]] - Secure [[Bigtable]] implementation
* [[HBase]] - [[Bigtable]]-model database
* [[Hypertable]] - HBase alternative
* [[MapReduce]] - The programming model underlying Hadoop's data processing
 
==Literature==
* Yunhong Gu, Robert Grossman, [https://doi.org/10.1098/rsta.2009.0053 Sector and Sphere: The Design and Implementation of a High Performance Data Cloud], Theme Issue of the Philosophical Transactions of the Royal Society A: Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructure, 28 June 2009, vol. 367, no. 1897, pp. 2429–2445.
 
==References==