Statistical unit: Difference between revisions


Revision as of 07:45, 20 December 2023

In statistics, a unit is one member of a set of entities being studied. It is the main source for the mathematical abstraction of a "random variable". Common examples of a unit would be a single person, animal, plant, manufactured item, or country that belongs to a larger collection of such entities being studied.

Experimental and sampling units

Units are often referred to as being either experimental units, sampling units or units of observation:

An "experimental unit" is typically thought of as one member of a set of objects that are initially equal, with each object then subjected to one of several experimental treatments. Put simply, it is the smallest entity to which a treatment is applied.
A "sampling unit" is typically thought of as an object that has been sampled from a statistical population. This term is commonly used in opinion polling and survey sampling.

For example, in an experiment on educational methods, methods may be applied to classrooms of students. This would make the classroom the experimental unit. Measurements of progress may be obtained from individual students, who are the observational units. But the treatment (teaching method) applied to the class is not applied independently to the individual students, so the student cannot be regarded as the experimental unit. The class, or the teacher-by-method combination if the teacher had multiple classes, would be the appropriate experimental unit.
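The classroom example can be sketched in Python as follows. This is a minimal illustration, not a prescribed analysis: the records, classroom labels, and method names are hypothetical, and the helper functions `classroom_means` and `method_summary` are invented for this sketch. It collapses student-level scores (observational units) to one value per classroom (experimental unit) before comparing methods.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (classroom, teaching method, student score).
records = [
    ("class_1", "A", 72), ("class_1", "A", 75), ("class_1", "A", 78),
    ("class_2", "A", 64), ("class_2", "A", 66), ("class_2", "A", 68),
    ("class_3", "B", 80), ("class_3", "B", 82), ("class_3", "B", 84),
    ("class_4", "B", 70), ("class_4", "B", 74), ("class_4", "B", 78),
]


def classroom_means(rows):
    """Collapse student-level scores to one mean per classroom."""
    scores = defaultdict(list)
    method_of = {}
    for classroom, method, score in rows:
        scores[classroom].append(score)
        method_of[classroom] = method
    return {c: (method_of[c], mean(v)) for c, v in scores.items()}


def method_summary(rows):
    """Summarise each method using classrooms, not students, as replicates."""
    by_method = defaultdict(list)
    for method, class_mean in classroom_means(rows).values():
        by_method[method].append(class_mean)
    return {m: {"n_units": len(v), "mean": mean(v)} for m, v in by_method.items()}


summary = method_summary(records)
print(summary)
```

Each method is summarised over two classrooms rather than six students, which reflects the number of units that actually received the treatment independently.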

Implementation

In most statistical studies, the goal is to generalize from the observed units to a larger set consisting of all comparable units that exist but are not directly observed. For example, if we randomly sample 100 people and ask them which candidate they intend to vote for in an election, our main interest is in the voting behavior of all eligible voters, not exclusively in that of the 100 observed units.
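As a rough illustration of generalizing from 100 sampled units to the population, the sketch below computes a sample proportion and an approximate 95% interval. The count of 56 supporters is a made-up figure, the function name `proportion_interval` is hypothetical, and the interval uses the simple normal approximation, which assumes a simple random sample.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Sample proportion with a normal-approximation interval."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p, p - z * se, p + z * se


# Hypothetical poll: 56 of 100 sampled people favour a candidate.
p_hat, low, high = proportion_interval(56, 100)
print(f"estimate {p_hat:.2f}, approximate 95% interval {low:.3f} to {high:.3f}")
```

The interval describes uncertainty about all eligible voters, not about the 100 people who were asked.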

In some cases, the observed units may not form a sample from any meaningful population, but rather constitute a convenience sample, or may represent the entire population of interest. In this situation, we may study the units descriptively, or we may study their dynamics over time, but it typically does not make sense to talk about generalizing to a larger population of such units. Studies involving countries or business firms are often of this type. Clinical trials also typically use convenience samples; however, the aim is often to make inferences about the effectiveness of treatments in other patients. Given the inclusion and exclusion criteria for some clinical trials, the sample may not be representative of the majority of patients with the condition or disease.

In simple data sets, the units are in one-to-one correspondence with the data values. In more complex data sets, multiple measurements are made for each unit. For example, if blood pressure measurements are made daily for a week on each subject in a study, there would be seven data values for each statistical unit. Multiple measurements taken on an individual are not independent (they will be more alike than measurements taken on different individuals). Ignoring these dependencies during the analysis can lead to an inflated sample size or pseudoreplication.
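The blood pressure example can be made concrete with a small sketch. The readings and subject labels below are invented for illustration. The code contrasts the number of data values with the number of statistical units and reduces each subject's week to a single summary value, which is one simple way (among several) to avoid treating repeated measurements as independent.

```python
from statistics import mean

# Hypothetical daily systolic readings for one week, keyed by subject.
readings = {
    "subject_1": [120, 122, 119, 121, 123, 120, 122],
    "subject_2": [135, 133, 136, 134, 137, 135, 134],
    "subject_3": [110, 112, 111, 109, 113, 110, 112],
}

n_values = sum(len(v) for v in readings.values())  # number of data values
n_units = len(readings)                            # number of statistical units

# One summary value per unit, so the analysis uses n_units replicates.
subject_means = {s: mean(v) for s, v in readings.items()}

print(n_values, n_units, subject_means)
```

Treating all 21 values as independent would overstate the amount of information, since only three subjects were studied.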

While a unit is often the lowest level at which observations are made, in some cases, a unit can be further decomposed as a statistical assembly.

Many statistical analyses use quantitative data that have units of measurement. This is a distinct and non-overlapping use of the term "unit."

Units of collection and analysis

Statistical units are divided into two types:

Unit of collection: Units in which figures relating to a particular problem are either enumerated or estimated. The units of collection may be simple or composite.
- A simple unit is one which represents a single condition without any qualification.
- A composite unit is one which is formed by adding a qualifying word or phrase to a simple unit. Examples: labour-hours and passenger-kilometres.

Unit of analysis and interpretation: Units in terms of which statistical data are analysed and interpreted. Examples: ratios, percentages, and coefficients.
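The distinction between the two kinds of unit can be sketched with a toy example. All figures and route names below are hypothetical. Passenger-kilometres serve as a composite unit of collection (passengers multiplied by distance), and percentages serve as the unit of analysis.

```python
# Hypothetical trip records collected in simple units (passengers, km).
trips = [
    {"route": "north", "passengers": 40, "km": 10},
    {"route": "south", "passengers": 25, "km": 20},
    {"route": "east", "passengers": 10, "km": 10},
]

# Composite unit of collection: passenger-kilometres per route.
passenger_km = {t["route"]: t["passengers"] * t["km"] for t in trips}

# Unit of analysis: each route's share of the total, as a percentage.
total = sum(passenger_km.values())
share_percent = {r: 100 * v / total for r, v in passenger_km.items()}

print(passenger_km, share_percent)
```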

Bibliography

Design of experiments

Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press. ISBN 978-0-521-68357-9. Pre-publication chapters are available on-line.
Hinkelmann, Klaus; Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (Second ed.). Wiley. ISBN 978-0-471-72756-9.

Sampling

Cochran, William G. (1977). Sampling Techniques (Third ed.). Wiley. ISBN 0-471-16240-X.
Särndal, Carl-Erik; Swensson, Bengt; Wretman, Jan (1992). Model Assisted Survey Sampling. Springer-Verlag. ISBN 0-387-40620-4.

@@ Line 1: / Line 1: @@
+{{Short description|Individual entity for statistical purposes}}
-A '''unit''' in a statistical analysis refers to one member of a set of entities being studied.  It is the material source for the mathematical abstraction of a "[[random variable]]". Common examples of a unit would be a single person, animal, plant, manufactured item, or country that belongs to a larger collection of such entities being studied.
+{{Multiple issues|
+{{more citations needed|date=June 2019}}
+{{No footnotes|date=November 2019}}
+}}
+In [[statistics]], a '''unit''' is one member of a set of entities being studied. It is the main source for the mathematical abstraction of a "[[random variable]]". Common examples of a unit would be a single person, animal, plant, manufactured item, or country that belongs to a larger collection of such entities being studied.
+==Experimental and sampling units==
-Units are often referred to as being either '''experimental units''', '''sampling units''' or, more generally, [[Unit of observation|units of observation]]:
+Units are often referred to as being either '''experimental units''', '''sampling units''' or [[Unit of observation|units of observation]]:
-* An "experimental unit" is typically thought of as one member of a set of objects that are initially equivalent, with each object then subjected to one of several experimental treatments. In designed experiments, the experimental unit may differ from the unit on which observations are made (observational unit), and the unit of scientific interest, or what the hypothesis is about (scientific unit) <ref>{{cite book|last1=Lazic|first1=SE|title=Experimental Design for Laboratory Biologists: Maximising Information and Improving Reproducibility|date=2016|publisher=Cambridge University Press|isbn=9781107424883|url=http://www.cambridge.org/Lazic}}</ref>.
+* An "experimental unit" is typically thought of as one member of a set of objects that are initially equal, with each object then subjected to one of several experimental treatments. Put simply, it is the smallest entity to which a treatment is applied.
 * A "sampling unit" is typically thought of as an object that has been sampled from a [[statistical population]]. This term is commonly used in [[opinion polling]] and [[survey sampling]].
+For example, in an experiment on educational methods, methods may be applied to classrooms of students. This would make the classroom as the experimental unit. Measurements of progress may be obtained from individual students, as observational units. But the treatment (teaching method) being applied to the class would not be applied independently to the individual students. Hence the student could not be regarded as the experimental unit. The class, or the teacher by method combination if the teacher had multiple classes, would be the appropriate experimental unit.
+==Implementation==
 In most statistical studies, the goal is to generalize from the observed units to a larger set consisting of all comparable units that exist but are not directly observed.  For example, if we randomly sample 100 people and ask them which candidate they intend to vote for in an election, our main interest is in the voting behavior of all eligible voters, not exclusively on the 100 observed units.
-In some cases, the observed units may not form a sample from any meaningful population, but rather constitute a [[accidental sampling|convenience sample]], or may represent the entire population of interest.  In this situation, we may study the units [[descriptive statistics|descriptively]], or we may study their [[dynamic model|dynamics]] over time.  But it typically does not make sense to talk about generalizing to a larger population of such units.  Studies involving [[country|countries]] or [[business|business firms]] are often of this type. [[Clinical trial]]s also typically use convenience samples, however the aim is often to make inferences about the efficacy of treatments in other patients, and given the inclusion and exclusion criteria for some clinical trials, the sample may not be representative of the majority of patients with the condition or disease.
+In some cases, the observed units may not form a sample from any meaningful population, but rather constitute a [[accidental sampling|convenience sample]], or may represent the entire population of interest.  In this situation, we may study the units [[descriptive statistics|descriptively]], or we may study their [[dynamic model|dynamics]] over time.  But it typically does not make sense to talk about generalizing to a larger population of such units.  Studies involving [[country|countries]] or [[business|business firms]] are often of this type. [[Clinical trial]]s also typically use convenience samples, however the aim is often to make inferences about the effectiveness of treatments in other patients, and given the inclusion and exclusion criteria for some clinical trials, the sample may not be representative of the majority of patients with the condition or disease.
-In simple [[data]] sets, the units are in one-to-one correspondence with the data values.  In more complex data sets, multiple measurements are made for each unit.  For example, if blood pressure measurements are made daily for a week on each subject in a study, there would be seven data values for each statistical unit. Multiple measurements taken on an individual are not [[independent and identically distributed random variables|independent]] (they will be more alike compared to measurements taken on different individuals). Ignoring these dependencies during the analysis can lead to an inflated sample size or [[pseudoreplication]].
+In simple [[data]] sets, the units are in one-to-one correspondence with the data values.  In more complex data sets, multiple measurements are made for each unit.  For example, if blood pressure measurements are made daily for a week on each subject in a study, there would be seven data values for each statistical unit. Multiple measurements taken on an individual are not [[Independent and identically distributed random variables|independent]] (they will be more alike compared to measurements taken on different individuals). Ignoring these dependencies during the analysis can lead to an inflated sample size or [[pseudoreplication]].
 While a ''unit'' is often the lowest level at which observations are made, in some cases, a ''unit'' can be further decomposed as a [[statistical assembly]].
 Many statistical analyses use quantitative [[data]] that have [[units of measurement]].  This is a distinct and non-overlapping use of the term "unit."
+==Units of collection and analysis==
+Statistical units are divided into two. They are:
+* Unit of collection: Units in which figures relating to a particular problem are either enumerated or estimated. The units of collection may be simple or composite.
+**A simple unit is one which represents a single condition without any qualification.
+**A composite unit is one which is formed by adding a qualification word or phrase to a simple unit. Example: labour-hours and passenger-kilometer.
+* Unit of analysis and interpretation: Units in term of which statistical data are analysed and interpreted. Example: ratios, percentage, co-efficient etc.
 == See also ==
+* [[Census tract]]
 * [[Research subject]]
 * [[Laboratory specimen|Specimen]]
+* [[Sample point]]
 * [[Statistical model]]
 * [[Unit of analysis]]
-==References==
-{{reflist}}
 ==Bibliography==
@@ Line 29: / Line 46: @@
 ===Design of experiments===
-* {{cite book |author=[http://www.maths.qmw.ac.uk/~rab/ Bailey, R. A]|title=Design of Comparative Experiments|url=http://www.maths.qmul.ac.uk/~rab/DOEbook/|publisher=[http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521683579 Cambridge University Press]|year=2008 |isbn=978-0-521-68357-9}} Pre-publication chapters are available on-line.
+* {{cite book |author=Bailey, R. A.|title=Design of Comparative Experiments|url=http://www.maths.qmul.ac.uk/~rab/DOEbook/ |publisher=Cambridge University Press|year=2008 |isbn=978-0-521-68357-9}} Pre-publication chapters are available on-line.
 <!-- *{{cite book
 |author=Hinkelmann, Klaus and [[Oscar Kempthorne|Kempthorne, Oscar]]
@@ Line 39: / Line 56: @@
 |isbn=978-0-470-38551-7}} -->
 *{{cite book
+|last1=Hinkelmann
-|author=Hinkelmann, Klaus and [[Oscar Kempthorne|Kempthorne, Oscar]]
+|first1=Klaus
+|last2=Kempthorne
+|first2=Oscar
+|author-link2=Oscar Kempthorne
 |year=2008
 |title=Design and Analysis of Experiments, Volume I: Introduction to Experimental Design
-|url=https://books.google.com/books?id=T3wWj2kVYZgC&printsec=frontcover&cad=4_0
+|url=https://books.google.com/books?id=T3wWj2kVYZgC
 |edition=Second
+|publisher=Wiley
-|publisher=[http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471727563.html Wiley]
 |isbn=978-0-471-72756-9
 }}
@@ Line 59: / Line 80: @@
 ===Sampling===
 * {{cite book
-|author=[[William Gemmell Cochran|Cochran, William G.]]
+|author=Cochran, William G.
 |year=1977
 |title=Sampling Techniques
@@ Line 65: / Line 86: @@
 |publisher=Wiley
 |isbn=0-471-16240-X
+|author-link=William Gemmell Cochran
 }}
 * {{cite book
