Semi-structured data: Difference between revisions

{{Short description|Data organized by tags but not tables}}
'''Semi-structured data'''<ref>{{cite web |author=Peter Buneman |title=Semistructured data |journal=Symposium on Principles of Database Systems |date=1997 |url=https://homepages.inf.ed.ac.uk/opb/papers/PODS1997a.pdf}}</ref> is a form of [[structured data]] that does not obey the tabular structure of data models associated with [[relational database]]s or other forms of [[Table (database)|data tables]], but nonetheless contains [[tag (metadata)|tags]] or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as [[self-describing]] structure.
 
In semi-structured data, the entities belonging to the same class may have different [[attribute (research)|attribute]]s even though they are grouped together, and the attributes' order is not important.
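
As a minimal sketch of this property, the following Python fragment groups two records of the same class that carry different attributes. The record contents and the helper name <code>attribute_names</code> are hypothetical and chosen only for illustration.

```python
# Hypothetical records: both describe a "person", but carry different attributes.
people = [
    {"name": "Ada", "email": "ada@example.org"},
    {"phone": "555-0100", "name": "Grace", "languages": ["COBOL", "FORTRAN"]},
]


def attribute_names(records):
    """Collect every attribute used by at least one record."""
    names = set()
    for record in records:
        names.update(record)  # iterating a dict yields its keys
    return names


all_attributes = attribute_names(people)

# Attribute order carries no meaning: these two records compare equal.
same_record = {"name": "Ada", "email": "ada@example.org"} == {
    "email": "ada@example.org",
    "name": "Ada",
}
```

No single record has to supply all of the collected attributes, which is what distinguishes such a grouping from a table with fixed columns.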
Semi-structured data have become increasingly common since the advent of the [[Internet]], where [[full-text]] [[documents]] and [[databases]] are no longer the only forms of data and different applications need a medium for [[information exchange|exchanging information]]. Semi-structured data are also often found in [[Object database|object-oriented databases]].
 
== Types of Semi-structured data ==
 
===XML===
[[XML]],<ref>[http://db.cis.upenn.edu/research/SS_XML.html The Penn database group has a semi-structured and XML data project]</ref> other markup languages, [[email]], and [[Electronic Data Interchange|EDI]] are all forms of semi-structured data. [[Object Exchange Model|OEM]] (Object Exchange Model)<ref>[http://infolab.stanford.edu/lore/home/index.html Stanford University's Lore DBMS]</ref> was created prior to XML as a means of self-describing a data structure. XML has been popularized by web services that are developed utilizing [[SOAP]] principles.
 
Some types of data described here as "semi-structured", especially XML, suffer from the impression that they are incapable of structural rigor at the same functional level as relational tables and rows. Indeed, the view of XML as inherently semi-structured (previously, it was referred to as "unstructured") has handicapped its use for a widening range of data-centric applications. Even documents, normally thought of as the epitome of semi-structure, can be designed with virtually the same rigor as a [[database schema]], enforced by an [[XML schema]] and processed by both commercial and custom software programs without reducing their usability by human readers.
 
In view of this fact, XML might be referred to as having "flexible structure" capable of human-centric flow and hierarchy as well as highly rigorous element structure and data typing.
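
The self-describing character of XML can be sketched with Python's standard <code>xml.etree.ElementTree</code> module. The document below is a hypothetical example. The fragment only reads the tags and does not perform schema validation, which would need an additional library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tags name each field, and the two records
# do not carry the same set of fields.
DOCUMENT = """
<people>
  <person id="1">
    <name>Ada</name>
    <email>ada@example.org</email>
  </person>
  <person id="2">
    <name>Grace</name>
    <phone>555-0100</phone>
  </person>
</people>
""".strip()

root = ET.fromstring(DOCUMENT)

# Recover each record's fields from its tags alone, without a table definition.
records = [
    {child.tag: child.text for child in person}
    for person in root.findall("person")
]

# Attributes on an element are available as a mapping of names to strings.
ids = [person.get("id") for person in root.findall("person")]
```

Each entry in <code>records</code> is a mapping from tag name to text, so a consumer learns the field names from the data itself.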
 
===JSON===
[[JSON]], or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and a web application, as an alternative to XML. JSON has been popularized by web services developed utilizing [[REST]] principles.
 
Databases such as [[MongoDB]] and [[Couchbase]] store data natively in JSON format, leveraging the advantages of semi-structured data architecture.
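
A minimal sketch of JSON's attribute–value structure, using Python's standard <code>json</code> module. The object contents are hypothetical. The fragment shows only parsing and serialization, not how any particular database stores documents.

```python
import json

# Hypothetical JSON text: attribute-value pairs, with nesting and a list.
TEXT = '{"name": "Ada", "address": {"city": "London"}, "tags": ["math", "computing"]}'

# Parsing turns the text into nested dictionaries and lists.
obj = json.loads(TEXT)
city = obj["address"]["city"]

# Serializing and parsing again yields an equal object.
round_trip = json.loads(json.dumps(obj)) == obj
```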
 
==Pros and cons==
{{Unreferenced section|auto=yes|date=June 2024}}
 
===Advantages===
===Disadvantages===
* The traditional relational data model has a popular and ready-made query language, [[SQL]].
* Prone to "garbage in, garbage out"; because constraints are removed from the data model, less forethought is required to operate a data application.
 
==Semi-structured model==
{{Unreferenced section|auto=yes|date=June 2024}}
 
The '''semi-structured model''' is a [[database model]] where there is no separation between the [[Data (computing)|data]] and the [[Database schema|schema]], and the amount of structure used depends on the purpose.
 
The advantages of this model are the following:
* It can represent the information of some data sources that cannot be constrained by schema.
* It provides a flexible format for data exchange between different types of databases.
* It can be helpful to view structured data as semi-structured (for browsing purposes).
* The schema can easily be changed.
* The data transfer format may be portable.
 
The primary trade-off being made in using a semi-structured [[database model]] is that queries cannot be made as efficiently as in a more constrained structure, such as in the [[relational model]]. Typically the records in a semi-structured database are stored with unique IDs that are referenced with pointers to their location on disk. This makes navigational or path-based queries quite efficient, but for doing searches over many records (as is typical in [[SQL]]), it is not as efficient because it has to seek around the disk following pointers.
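
The difference between the two access patterns can be sketched in Python. The store layout, the record contents, and the function names <code>follow_path</code> and <code>scan</code> are hypothetical. An in-memory dictionary stands in for records on disk, so the sketch illustrates how many records each kind of query touches rather than actual disk seeks.

```python
# Hypothetical store: each record is kept under a unique ID, and references
# to other records are stored as IDs (standing in for on-disk pointers).
store = {
    1: {"name": "Ada", "employer": 10},
    2: {"name": "Grace", "employer": 11},
    10: {"name": "Acme", "city": "London"},
    11: {"name": "Initech", "city": "Austin"},
}


def follow_path(store, start_id, path):
    """Navigational query: follow a chain of attribute names from one record."""
    value = store[start_id]
    for attribute in path:
        value = value[attribute]
        # If the value is the ID of another record, dereference it.
        if isinstance(value, int) and value in store:
            value = store[value]
    return value


def scan(store, attribute, wanted):
    """Search query: visit every record, since any of them may have the attribute."""
    return sorted(
        record_id
        for record_id, record in store.items()
        if record.get(attribute) == wanted
    )


employer_city = follow_path(store, 1, ["employer", "city"])
in_london = scan(store, "city", "London")
```

The path query touches only the records along the path, while the search has to inspect every record in the store.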
 
The [[Object Exchange Model]] (OEM) is one standard for expressing semi-structured data; another is [[XML]].
 
==See also==
* [[Semi-structured model]]
* [[NoSQL]]
* [[Structured data]]
* [[Unstructured data]]
 
== References ==
<references/>
 
== External links ==
* [http://db.cis.upenn.edu/research/SS_XML.html UPenn Database Group]{{snd}} semi-structured data and XML
* [http://www.ibmbigdatahub.com/blog/semi-structured-data-analytics-relational-or-hadoop-platform-part-1 Semi-Structured data analytics: Relational or Hadoop platform?] by IBM