
Statistical unit: Difference between revisions

From Wikipedia, the free encyclopedia
Undid revision 772520118 by Jytdog: There are 3 types of units that researchers need to understand: scientific, experimental, and observational, discussed in the citation. Hardly citation spam.
(34 intermediate revisions by 23 users not shown)
Line 1: Line 1:
{{Short description|Individual entity for statistical purposes}}
A '''unit''' in a statistical analysis refers to one member of a set of entities being studied. It is the material source for the mathematical abstraction of a "[[random variable]]". Common examples of a unit would be a single person, animal, plant, manufactured item, or country that belongs to a larger collection of such entities being studied.
{{Multiple issues|
{{more citations needed|date=June 2019}}
{{No footnotes|date=November 2019}}
}}
In [[statistics]], a '''unit''' is one member of a set of entities being studied. It is the main source for the mathematical abstraction of a "[[random variable]]". Common examples of a unit would be a single person, animal, plant, manufactured item, or country that belongs to a larger collection of such entities being studied.


==Experimental and sampling units==
Units are often referred to as being either '''experimental units''', '''sampling units''' or, more generally, [[Unit of observation|units of observation]]:
Units are often referred to as being either '''experimental units''', '''sampling units''' or [[Unit of observation|units of observation]]:


* An "experimental unit" is typically thought of as one member of a set of objects that are initially equivalent, with each object then subjected to one of several experimental treatments. In designed experiments, the experimental unit may differ from the unit on which observations are made (observational unit), and the unit of scientific interest, or what the hypothesis is about (scientific unit) <ref>{{cite book|last1=Lazic|first1=SE|title=Experimental Design for Laboratory Biologists: Maximising Information and Improving Reproducibility|date=2016|publisher=Cambridge University Press|isbn=9781107424883|url=http://www.cambridge.org/Lazic}}</ref>.
* An "experimental unit" is typically thought of as one member of a set of objects that are initially equal, with each object then subjected to one of several experimental treatments. Put simply, it is the smallest entity to which a treatment is applied.
* A "sampling unit" is typically thought of as an object that has been sampled from a [[statistical population]]. This term is commonly used in [[opinion polling]] and [[survey sampling]].
* A "sampling unit" is typically thought of as an object that has been sampled from a [[statistical population]]. This term is commonly used in [[opinion polling]] and [[survey sampling]].


For example, in an experiment on educational methods, methods may be applied to classrooms of students. This would make the classroom as the experimental unit. Measurements of progress may be obtained from individual students, as observational units. But the treatment (teaching method) being applied to the class would not be applied independently to the individual students. Hence the student could not be regarded as the experimental unit. The class, or the teacher by method combination if the teacher had multiple classes, would be the appropriate experimental unit.

==Implementation==
In most statistical studies, the goal is to generalize from the observed units to a larger set consisting of all comparable units that exist but are not directly observed. For example, if we randomly sample 100 people and ask them which candidate they intend to vote for in an election, our main interest is in the voting behavior of all eligible voters, not exclusively on the 100 observed units.
In most statistical studies, the goal is to generalize from the observed units to a larger set consisting of all comparable units that exist but are not directly observed. For example, if we randomly sample 100 people and ask them which candidate they intend to vote for in an election, our main interest is in the voting behavior of all eligible voters, not exclusively on the 100 observed units.


In some cases, the observed units may not form a sample from any meaningful population, but rather constitute a [[accidental sampling|convenience sample]], or may represent the entire population of interest. In this situation, we may study the units [[descriptive statistics|descriptively]], or we may study their [[dynamic model|dynamics]] over time. But it typically does not make sense to talk about generalizing to a larger population of such units. Studies involving [[country|countries]] or [[business|business firms]] are often of this type. [[Clinical trial]]s also typically use convenience samples, however the aim is often to make inferences about the efficacy of treatments in other patients, and given the inclusion and exclusion criteria for some clinical trials, the sample may not be representative of the majority of patients with the condition or disease.
In some cases, the observed units may not form a sample from any meaningful population, but rather constitute a [[accidental sampling|convenience sample]], or may represent the entire population of interest. In this situation, we may study the units [[descriptive statistics|descriptively]], or we may study their [[dynamic model|dynamics]] over time. But it typically does not make sense to talk about generalizing to a larger population of such units. Studies involving [[country|countries]] or [[business|business firms]] are often of this type. [[Clinical trial]]s also typically use convenience samples, however the aim is often to make inferences about the effectiveness of treatments in other patients, and given the inclusion and exclusion criteria for some clinical trials, the sample may not be representative of the majority of patients with the condition or disease.


In simple [[data]] sets, the units are in one-to-one correspondence with the data values. In more complex data sets, multiple measurements are made for each unit. For example, if blood pressure measurements are made daily for a week on each subject in a study, there would be seven data values for each statistical unit. Multiple measurements taken on an individual are not [[independent and identically distributed random variables|independent]] (they will be more alike compared to measurements taken on different individuals). Ignoring these dependencies during the analysis can lead to an inflated sample size or [[pseudoreplication]].
In simple [[data]] sets, the units are in one-to-one correspondence with the data values. In more complex data sets, multiple measurements are made for each unit. For example, if blood pressure measurements are made daily for a week on each subject in a study, there would be seven data values for each statistical unit. Multiple measurements taken on an individual are not [[Independent and identically distributed random variables|independent]] (they will be more alike compared to measurements taken on different individuals). Ignoring these dependencies during the analysis can lead to an inflated sample size or [[pseudoreplication]].


While a ''unit'' is often the lowest level at which observations are made, in some cases, a ''unit'' can be further decomposed as a [[statistical assembly]].
While a ''unit'' is often the lowest level at which observations are made, in some cases, a ''unit'' can be further decomposed as a [[statistical assembly]].


Many statistical analyses use quantitative [[data]] that have [[units of measurement]]. This is a distinct and non-overlapping use of the term "unit."
Many statistical analyses use quantitative [[data]] that have [[units of measurement]]. This is a distinct and non-overlapping use of the term "unit."

==Units of collection and analysis==
Statistical units are divided into two. They are:

* Unit of collection: Units in which figures relating to a particular problem are either enumerated or estimated. The units of collection may be simple or composite.
**A simple unit is one which represents a single condition without any qualification.
**A composite unit is one which is formed by adding a qualification word or phrase to a simple unit. Example: labour-hours and passenger-kilometer.

* Unit of analysis and interpretation: Units in term of which statistical data are analysed and interpreted. Example: ratios, percentage, co-efficient etc.


== See also ==
== See also ==
* [[Census tract]]
* [[Research subject]]
* [[Research subject]]
* [[Laboratory specimen|Specimen]]
* [[Laboratory specimen|Specimen]]
* [[Sample point]]
* [[Statistical model]]
* [[Statistical model]]
* [[Unit of analysis]]
* [[Unit of analysis]]

==References==
{{reflist}}


==Bibliography==
==Bibliography==
Line 29: Line 46:
===Design of experiments===
===Design of experiments===


* {{cite book |author=[http://www.maths.qmw.ac.uk/~rab/ Bailey, R. A]|title=Design of Comparative Experiments|url=http://www.maths.qmul.ac.uk/~rab/DOEbook/|publisher=[http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521683579 Cambridge University Press]|year=2008 |isbn=978-0-521-68357-9}} Pre-publication chapters are available on-line.
* {{cite book |author=Bailey, R. A.|title=Design of Comparative Experiments|url=http://www.maths.qmul.ac.uk/~rab/DOEbook/ |publisher=Cambridge University Press|year=2008 |isbn=978-0-521-68357-9}} Pre-publication chapters are available on-line.
<!-- *{{cite book
<!-- *{{cite book
|author=Hinkelmann, Klaus and [[Oscar Kempthorne|Kempthorne, Oscar]]
|author=Hinkelmann, Klaus and [[Oscar Kempthorne|Kempthorne, Oscar]]
Line 39: Line 56:
|isbn=978-0-470-38551-7}} -->
|isbn=978-0-470-38551-7}} -->
*{{cite book
*{{cite book
|last1=Hinkelmann
|author=Hinkelmann, Klaus and [[Oscar Kempthorne|Kempthorne, Oscar]]
|first1=Klaus
|last2=Kempthorne
|first2=Oscar
|author-link2=Oscar Kempthorne
|year=2008
|year=2008
|title=Design and Analysis of Experiments, Volume I: Introduction to Experimental Design
|title=Design and Analysis of Experiments, Volume I: Introduction to Experimental Design
|url=https://books.google.com/books?id=T3wWj2kVYZgC&printsec=frontcover&cad=4_0
|url=https://books.google.com/books?id=T3wWj2kVYZgC
|edition=Second
|edition=Second
|publisher=Wiley
|publisher=[http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471727563.html Wiley]
|isbn=978-0-471-72756-9
|isbn=978-0-471-72756-9
}}
}}
Line 59: Line 80:
===Sampling===
===Sampling===
* {{cite book
* {{cite book
|author=[[William Gemmell Cochran|Cochran, William G.]]
|author=Cochran, William G.
|year=1977
|year=1977
|title=Sampling Techniques
|title=Sampling Techniques
Line 65: Line 86:
|publisher=Wiley
|publisher=Wiley
|isbn=0-471-16240-X
|isbn=0-471-16240-X
|author-link=William Gemmell Cochran
}}
}}
* {{cite book
* {{cite book

Revision as of 07:45, 20 December 2023

In statistics, a unit is one member of a set of entities being studied. It is the main source for the mathematical abstraction of a "random variable". Common examples of a unit would be a single person, animal, plant, manufactured item, or country that belongs to a larger collection of such entities being studied.

Experimental and sampling units

Units are often referred to as being either experimental units, sampling units or units of observation:

  • An "experimental unit" is typically thought of as one member of a set of objects that are initially equal, with each object then subjected to one of several experimental treatments. Put simply, it is the smallest entity to which a treatment is applied.
  • A "sampling unit" is typically thought of as an object that has been sampled from a statistical population. This term is commonly used in opinion polling and survey sampling.

For example, in an experiment on educational methods, methods may be applied to classrooms of students. This would make the classroom the experimental unit. Measurements of progress may be obtained from individual students, as observational units. But the treatment (teaching method) being applied to the class would not be applied independently to the individual students. Hence the student could not be regarded as the experimental unit. The class, or the teacher-by-method combination if the teacher had multiple classes, would be the appropriate experimental unit.
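
The classroom example can be illustrated with a minimal sketch. The scores, classroom labels, and function names below are hypothetical and chosen only for illustration. The sketch collapses student-level observations to one value per classroom, so that each teaching method is summarised over classrooms (the experimental units) rather than over students (the observational units).

```python
from statistics import mean

# Hypothetical data: (classroom, teaching method, student score).
# Students are the observational units; classrooms are the experimental units.
records = [
    ("A", "method_1", 70), ("A", "method_1", 74), ("A", "method_1", 72),
    ("B", "method_1", 65), ("B", "method_1", 69),
    ("C", "method_2", 80), ("C", "method_2", 78), ("C", "method_2", 82),
    ("D", "method_2", 75), ("D", "method_2", 77),
]

def classroom_means(rows):
    """Collapse student-level scores to one mean per classroom."""
    by_class = {}
    for classroom, method, score in rows:
        by_class.setdefault((classroom, method), []).append(score)
    return {key: mean(scores) for key, scores in by_class.items()}

def method_summary(rows):
    """Return (mean of classroom means, number of classrooms) per method."""
    by_method = {}
    for (_, method), class_mean in classroom_means(rows).items():
        by_method.setdefault(method, []).append(class_mean)
    return {m: (mean(v), len(v)) for m, v in by_method.items()}

result = method_summary(records)
print(result)
```

Each method is represented here by two classrooms, not five students, which is the sample size a comparison of methods would rest on under this design.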

Implementation

In most statistical studies, the goal is to generalize from the observed units to a larger set consisting of all comparable units that exist but are not directly observed. For example, if we randomly sample 100 people and ask them which candidate they intend to vote for in an election, our main interest is in the voting behavior of all eligible voters, not only that of the 100 observed units.
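
As a rough illustration of generalizing from 100 sampled units to the wider population, the sketch below computes a sample proportion and an approximate 95% interval. The count of 56 supporters is invented, the function name is hypothetical, and the normal-approximation interval is one common choice that assumes simple random sampling; it is not the only way to quantify the uncertainty.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Sample proportion with a normal-approximation interval.

    Assumes the n units are a simple random sample from the population.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p, p - z * se, p + z * se

# Hypothetical poll: 56 of the 100 sampled people favour a candidate.
p_hat, low, high = proportion_interval(56, 100)
print(f"estimate {p_hat:.2f}, approximate 95% interval ({low:.3f}, {high:.3f})")
```

The interval describes the population of eligible voters, not the 100 observed units, whose answers are known exactly.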

In some cases, the observed units may not form a sample from any meaningful population, but rather constitute a convenience sample, or may represent the entire population of interest. In this situation, we may study the units descriptively, or we may study their dynamics over time. But it typically does not make sense to talk about generalizing to a larger population of such units. Studies involving countries or business firms are often of this type. Clinical trials also typically use convenience samples; however, the aim is often to make inferences about the effectiveness of treatments in other patients, and given the inclusion and exclusion criteria for some clinical trials, the sample may not be representative of the majority of patients with the condition or disease.

In simple data sets, the units are in one-to-one correspondence with the data values. In more complex data sets, multiple measurements are made for each unit. For example, if blood pressure measurements are made daily for a week on each subject in a study, there would be seven data values for each statistical unit. Multiple measurements taken on an individual are not independent (they will be more alike than measurements taken on different individuals). Ignoring these dependencies during the analysis can lead to an inflated sample size or pseudoreplication.
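
The blood pressure example can be sketched as follows. The readings and subject labels are made up for illustration. The sketch contrasts the number of data values with the number of statistical units, and reduces each subject's seven readings to a single summary, which is one simple way (among others, such as mixed-effects models) to avoid treating repeated measurements as independent.

```python
from statistics import mean

# Hypothetical daily systolic readings over one week, keyed by subject.
readings = {
    "subject_1": [120, 122, 119, 121, 123, 120, 122],
    "subject_2": [135, 133, 136, 134, 137, 135, 135],
    "subject_3": [110, 112, 111, 109, 110, 111, 107],
}

# Counting every reading as independent overstates the sample size.
naive_n = sum(len(values) for values in readings.values())

# The statistical units are the subjects, not the individual readings.
unit_n = len(readings)

# One summary value per unit keeps the analysis at the unit level.
subject_means = {subject: mean(values) for subject, values in readings.items()}

print(naive_n, unit_n, subject_means)
```

Here there are 21 data values but only 3 units; an analysis that used 21 as the sample size would be an instance of the inflation described above.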

While a unit is often the lowest level at which observations are made, in some cases, a unit can be further decomposed as a statistical assembly.

Many statistical analyses use quantitative data that have units of measurement. This is a distinct and non-overlapping use of the term "unit."

Units of collection and analysis

Statistical units are divided into two kinds:

  • Unit of collection: Units in which figures relating to a particular problem are either enumerated or estimated. The units of collection may be simple or composite.
    • A simple unit is one which represents a single condition without any qualification.
    • A composite unit is one which is formed by adding a qualification word or phrase to a simple unit. Examples: labour-hours and passenger-kilometres.
  • Unit of analysis and interpretation: Units in terms of which statistical data are analysed and interpreted. Examples: ratios, percentages, coefficients, etc.
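
A small sketch may help separate the two kinds of unit. The trip figures and variable names below are hypothetical. Passengers and kilometres are recorded in simple units, passenger-kilometres is a composite unit of collection built from them, and the percentage computed at the end is a unit of analysis and interpretation.

```python
# Hypothetical trips: (passengers carried, distance in kilometres).
trips = [(40, 12.5), (25, 8.0), (60, 20.0)]

# Composite unit of collection: passenger-kilometres for each trip.
passenger_km = [passengers * km for passengers, km in trips]
total_passenger_km = sum(passenger_km)

# Unit of analysis: the busiest trip's share of the total, as a percentage.
largest_share_pct = max(passenger_km) / total_passenger_km * 100

print(total_passenger_km, round(largest_share_pct, 1))
```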

See also

Bibliography

Design of experiments

  • Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press. ISBN 978-0-521-68357-9. Pre-publication chapters are available on-line.
  • Hinkelmann, Klaus; Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (Second ed.). Wiley. ISBN 978-0-471-72756-9.

Sampling