
Random Selection as Correlative Construct

2022, Institute of Psychology, Karl Franzens University, Austria

Since a random selection (or assignment) of elements (persons) from a population is of fundamental importance for both approximate and exact procedures, the concept of random selection is discussed in more detail. Two examples are given for illustration: Scenario A illustrates the formal mechanism of random selection. In Scenario B this formal mechanism is applied to the selection of samples from a population.

D. G. Schrausser / Thesis chapter 2: Random selection as correlative construct (2022)

Thesis chapter 2: Random selection as correlative construct

Dietmar G. Schrausser
Institute of Psychology, Karl Franzens University, Universitätsplatz 2, 8010 Graz, Austria
Thesis, July 1996¹; English translation, July 2022

Introduction

Random selection (or assignment) of elements (persons) from populations is of fundamental importance for both approximate and exact procedures, so this concept is discussed here in more detail. Two examples are given for illustration. Scenario A (Section 2.1) illustrates the formal mechanism of random selection. Scenario B (Section 2.2) applies this formal mechanism to the selection of samples from a population. Analogies to error variance and treatment variance are discussed. The mechanism of random selection is then defined as the absence of any relationship between a selection process A and the characteristics of a population P from which a sample S is selected (Section 2.3).

2.1 Scenario A: Of panels and cuboids

2.1.1 Case 1

Let P be a set, namely a panel of uniform thickness e (in arbitrary units). Furthermore, let A be an arrangement of cuboids of different diameters; the cuboids all have the same length. This set A will be called the 'shaping matrix'. Now imagine that A is positioned parallel to the flat panel P, on its left, and penetrates it from left to right. Figure 2.1 illustrates this process.

Figure 2.1 Pictorial representation of a selection process of elements S from a set P determined by a set A. The sizes of the elements of S are completely determined by the size property of set A, since P is constant with respect to this property.

¹ Schrausser, D. G. (1996). Permutationstests: Theoretische und praktische Arbeitsweise von Permutationsverfahren beim unverbundenen 2 Stichprobenproblem. Thesis. Institute of Psychology, Karl Franzens University, Austria. DOI: 10.13140/RG.2.2.24500.32640/1

The individual parts that result from this 'shaping process' are the cuboids of a set S. This set S will be called the shaped quantity. The side length distribution of S is determined by the thickness of panel P and the cuboid diameters of shaping matrix A. Each cuboid of S has two relevant side lengths:

(a) the thickness e of panel P;
(b) the diameter of the corresponding cuboid of matrix A, which differs from cuboid to cuboid.

See Figure 2.2.

Figure 2.2

Side length (a) is constant (always e units wide), so its distribution is degenerate: it has zero variance. If one neglects this constant side length (a), which is defined by P, then only the distribution of matrix A is visible in the distribution of S. In other words, the distribution information² of the shaped quantity S contains only the distribution information of shaping matrix A.

2.1.2 Case 2

Again, two sets are defined: a panel P and a shaping matrix A, which forms cuboids out of P. This time, however, P does not have a uniform thickness e. Its thickness varies in the same way as the cuboid diameters of shaping matrix A varied in case 1, which yields an unevenly shaped panel (see Figure 2.3). All cuboids of matrix A now have the same diameter e, and again they all have the same length. Matrix A contains the same number of cuboids as in case 1 (see Figure 2.3). Forming cuboids out of the panel yields a quantity S whose cuboids have side lengths determined by both basic sets A and P. Again, one side length is constant: (b) equals e, while (a) varies (see Figure 2.2).
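Cases 1 and 2 can be mimicked with a small simulation. The sketch below rests on assumptions that are not in the original. Each cuboid of S is summarized by its size a × b; all cuboids share the same length, so this is proportional to volume. The varying thicknesses or diameters are drawn uniformly between 1 and 5 units, and e = 2. The helper names `shape` and `pearson_r` are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shape(panel_thickness, matrix_diameter):
    """One cuboid of S per position: side (a) comes from P, side (b) from A.
    Returns the size a * b of each shaped cuboid."""
    return [a * b for a, b in zip(panel_thickness, matrix_diameter)]


rng = random.Random(1)
n, e = 1000, 2.0
varying = [rng.uniform(1.0, 5.0) for _ in range(n)]  # assumed spread of sizes

# Case 1: uniform panel thickness e, cuboid diameters of A vary.
s_case1 = shape([e] * n, varying)
# Case 2: panel thickness varies, all cuboids of A have diameter e.
s_case2 = shape(varying, [e] * n)

# In both cases the sizes in S are an exact linear image of the varying set,
# so S carries the distribution information of A (case 1) or of P (case 2).
print(round(pearson_r(s_case1, varying), 6))  # 1.0
print(round(pearson_r(s_case2, varying), 6))  # 1.0
```

In this sketch the two shaped quantities are even identical element by element. This fits the observation that cases 1 and 2 are equivalent and differ only in their prerequisites.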
This time, however, it is exactly the opposite. The distribution information that can be neglected (because it is constant) comes not from panel P but from shaping matrix A, whose cuboids all have the same diameter. If one neglects the constant side length (b) of S, which is defined by matrix A, then only the distribution of P is visible in the distribution of S. That is, the distribution information of the shaped quantity S contains only the distribution information of panel P.

Figure 2.3 Pictorial representation of case 2. Again, a set A and a set P determine a set S. Here, the size properties of set S are determined by the size properties of set P, since the sizes of the cuboids of set A are constant and carry over unchanged to S.

Cases 1 and 2 are therefore equivalent; only the prerequisites differ. In case 1, the 'selecting' set (shaping matrix A) varied in its distribution information, i.e. it had the higher information content (cf. Shannon and Weaver, 1949). In case 2, it was the set being 'selected' from (panel P) that varied in its distribution information. In both cases, the distribution information of the cuboids of S can be used to deduce the distribution information of one of the initial sets: in case 1, that of set A; in case 2, that of set P.

² The way in which the elements of a set differ from one another.

2.1.3 Case 3

The sets A and P are defined as in the previous two cases, but now both vary (see Figure 2.4). Panel P varies in thickness (as in case 2), and the cuboids of matrix A vary in diameter (as in case 1). The shaping process proceeds as before.
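Case 3 can also be sketched numerically under assumptions that are not in the original. Sizes are summarized as a × b. Panel thicknesses (P) and cuboid diameters (A) are drawn independently and uniformly between 1 and 5 units. The helper `pearson_r` is hypothetical. The sizes in S then correlate with both initial sets, but perfectly with neither.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
n = 10_000
# Case 3: panel thickness (P) and cuboid diameter (A) both vary, independently.
thickness_p = [rng.uniform(1.0, 5.0) for _ in range(n)]
diameter_a = [rng.uniform(1.0, 5.0) for _ in range(n)]
sizes_s = [a * b for a, b in zip(thickness_p, diameter_a)]

r_sp = pearson_r(sizes_s, thickness_p)
r_sa = pearson_r(sizes_s, diameter_a)
# Both correlations lie clearly below 1 (about 0.68 in expectation for this
# setup), so the size distribution of S mixes the information of P and A.
print(f"r(S, P) = {r_sp:.2f}, r(S, A) = {r_sa:.2f}")
```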
The cuboids of the quantity S created by this shaping process again take their side lengths from both initial sets. Side (a) comes from the varying thickness of panel P, and side (b) from the cuboid diameters of matrix A (see Figure 2.2). However, since A and P now both vary, neither side length is constant. It is therefore no longer possible (if the cuboids of S were mixed) to identify the distribution pattern of either initial set (A or P) from the distribution pattern of the cuboid sizes of S. The side lengths of the cuboids of S reflect both the varying diameters of the cuboids of A and the varying thicknesses of P. The distribution of the shaped quantity S contains the distribution of panel P and the distribution of shaping matrix A at the same time.

Figure 2.4 Pictorial representation of case 3. The size properties of set S are determined by the size properties of both set P and set A.

2.2 Scenario B: From panel to population, from cuboid to person

2.2.1 For case 1

Let panel P be a population of people in whom a hypothetical variable, say long-term memory performance, is to be measured. Memory performance (e.g. the number of pictures a person can retain) is analogous to the thickness³ of panel P. In case 1, where P had a uniform thickness, all people would have the same memory capacity: all could memorize e images (the analogue of panel thickness e).

Shaping matrix A (cuboids of different diameters) now represents a selection process. This set will be called the selection set A (cf. Table 2.2.1), and its purpose is to select individuals from the total population P. Selection proceeds exactly as above: the cuboids of selection set A form cuboids out of the panel, now population P. These formed cuboids, now the selected subjects, constitute a random sample S (analogous to the shaped quantity S).
The persons selected into sample S by the shaping process would be determined by two factors: their memory performance in population P (a) and the cuboid diameters of selection set A (b). This is analogous to the side lengths (a) and (b) (cf. Figure 2.2) of the cuboids of the shaped quantity S in case 1, which were determined by the panel thickness (panel P) and the cuboid diameters (shaping matrix A). Since panel thickness (= memory performance) is constant at e, it can be neglected. Only the cuboid diameter distribution of set A is visible in S, and the distribution information of sample S contains only the distribution information of selection set A.

³ Memory performance (= panel thickness) is continuously distributed.

2.2.2 For case 2

This time the people in population P differ in their ability to retain images, i.e. in their memory performance. This is analogous to the panel P of unequal thickness in case 2. Accordingly, selection set A has the properties of shaping matrix A in case 2 (constant cuboid diameter). If people are selected as described above, sample S will again contain the distribution information of selection set A (cuboid diameter) and of population P (memory performance). The cuboid diameter is constant and can be neglected, so only the memory performance distribution of population P is visible in sample S. The sample distribution information contains only the population distribution information: in terms of distribution information, sample S is an image of population P.

2.2.3 For case 3

A panel P of varying thickness and a shaping matrix A with cuboids of varying diameter result in a shaped quantity S with cuboids of varying side lengths (cf. Section 2.1.3).
This now corresponds to a varying distribution of memory performance in population P and varying cuboid diameters in selection set A. Together, these produce a mixed distribution of memory performance in sample S. The cuboids of quantity S contain the distribution information of both the panel thickness of P and the cuboid diameters of shaping matrix A. In the same way, sample S contains both the memory performance distribution of population P and the distribution information of selection set A. Both are now implicitly contained in the distribution information of sample S and cannot be considered separately: in terms of distribution information, sample S is an image of both population P and selection set A. The distribution of memory performance in sample S can therefore no longer be used to determine the distribution of memory performance in population P, since the distribution of selection set A is unknown. In this context, Bohm (1987) describes related implications with reference to holograms.

The following analogies are noticeable:

(1) The cuboid diameter distribution is analogous to error variance: the more the cuboid diameters of selection set A differ, the more the cuboids in sample S differ in size.
(2) The panel thickness distribution is analogous to person-related error variance: the more the panel thicknesses of population P differ, the more the cuboids in sample S differ in size.
(3) Treatment variance can be used instead of person-related error variance: the thicker a part of the panel, the greater the effect a treatment (= experimental treatment) achieves for a specific individual.

Further analogies (e.g. the mutual relationship of A and P, the definition of A1 by A0, S as P) and possible extensions of the principle to other scenarios (e.g. conditional probabilities, sequences of letters, or color mixtures) are not discussed here.
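Analogies (1) and (2) can be checked numerically. This sketch rests on assumptions that are not in the original. A and P are represented by evenly spaced levels around a common centre of 3 units. Every level of P is paired with every level of A, a fully crossed design in which large diameters meet small and large thicknesses equally often, so A and P are unrelated by construction. The size of each cuboid of S is taken as a × b. All function names are hypothetical.

```python
from statistics import pstdev


def levels(center, half_width, k=11):
    """k evenly spaced values from center - half_width to center + half_width."""
    step = 2 * half_width / (k - 1)
    return [center - half_width + i * step for i in range(k)]


def shaped_sizes(p_levels, a_levels):
    """Cross every thickness level of P with every diameter level of A and
    return the size a * b of each resulting cuboid of S."""
    return [p * a for p in p_levels for a in a_levels]


def spread_of_s(half_width_a, half_width_p):
    """Population standard deviation of the cuboid sizes in S."""
    return pstdev(shaped_sizes(levels(3.0, half_width_p), levels(3.0, half_width_a)))


# Rows: spread of A; columns: spread of P. sd(S) grows along both directions.
for w_a in (0.0, 1.0, 2.0):
    row = [f"{spread_of_s(w_a, w_p):6.3f}" for w_p in (0.0, 1.0, 2.0)]
    print(f"half-width of A = {w_a}: sd(S) for P half-widths 0, 1, 2 ->", *row)
```

When neither set varies, every cuboid of S has the same size and sd(S) is 0. Widening either set increases the spread of S. This mirrors the roles of error variance (A) and person-related variance (P) in the analogies.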
Table 2.2.1 Comparative representation of the terms from Scenarios A and B and their analogies.

2.3 Random selection or representative selection

The thickness distribution of P (person-related error variance or treatment variance) should be visible in the distribution of the cuboids (individuals) of sample S. For this, there must be no relationship between diameter A (selection process) and thickness P (the characteristic to be surveyed in the population). It must be ensured that large cuboid diameters of A encounter small panel thicknesses as often as large panel thicknesses. The same applies to small cuboid diameters. Only if this non-relationship between the distribution information of A and P holds will the distribution information of P be reproduced in S. This holds all the more, the smaller the variance in cuboid diameter A (= error variance) and the greater the variance in thickness P (= person-related error variance or treatment variance).

2.4 Discussion

Suppose one requires that only the distribution information of a population P be transferred to a sample S, so that the sample is a small-scale image of the population. Then the distribution information of a selection set A (selection process) must not be contained in sample S. This can be clarified by expressing the occurrence of panel characteristics in a population (thickness of P, memory ability of the people in population P) as a statement of probability. Each feature interval occurs with a probability, or relative frequency, given by the population distribution.
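The requirement that selection be unrelated to the surveyed feature can be illustrated by simulation. The sketch assumes a hypothetical population of 20,000 people whose memory performance (number of pictures retained) is roughly normal with mean 20 and standard deviation 5. These numbers, the 10 % selection rate, and the helper `select` are illustrative assumptions. One selection process ignores the feature (no relationship between A and P). The other makes selection more likely for better performers.

```python
import random
from statistics import mean

rng = random.Random(42)
# Hypothetical population P: number of pictures each person can retain.
population = [max(0, round(rng.gauss(20, 5))) for _ in range(20_000)]


def select(pop, inclusion_prob, rng):
    """Keep each person independently with probability inclusion_prob(score)."""
    return [x for x in pop if rng.random() < inclusion_prob(x)]


# Selection process unrelated to the feature: everyone has a 10 % chance.
unrelated = select(population, lambda score: 0.10, rng)
# Selection process related to the feature: the chance grows with the score.
related = select(population, lambda score: min(1.0, 0.005 * score), rng)

print(f"population mean:        {mean(population):.2f}")
print(f"sample, unrelated to P: {mean(unrelated):.2f}")
print(f"sample, related to P:   {mean(related):.2f}")
```

Only the unrelated selection reproduces the population mean up to sampling error. When the inclusion probability is proportional to the score, the sample is size-biased: its expected mean is E[X²]/E[X], about 21.25 for these settings instead of 20.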
Suppose the selection process (forming cuboids out of a panel) has no connection whatsoever with the feature to be surveyed (cuboid diameter with panel thickness). Then the selection probability (cuboid diameter) is the same for every feature interval (panel thickness). No feature interval is preferentially selected, i.e. no panel thickness is preferentially 'distorted' by a particular cuboid diameter. The only factor that determines how often a feature or feature interval is selected is then its probability or frequency in the population. Since this applies to all intervals, every interval has the same probability of occurrence in the sample as in the population: sample and population have approximately the same distribution information. Representative samples S of a given population P are therefore obtained only if there is no correlation between population P and selection process A:

$$r(A,P) = 0 \tag{2.4.1}$$

We speak of random selection if two conditions hold: sample S contains the distribution information of the population P from which it was selected, and the distribution information of P is unrelated to the distribution information of selection A. Psychological experiments depend almost exclusively on random samples if general statements are to be made. Genuine random selection of elements (= subjects) is therefore essential. Bortz (1993) sums up this problem in the context of conditional probabilities:

Strictly speaking, the statement "In this random experiment, event A has a probability of p(A)" should be replaced by the statement "In this random experiment, event A has a probability of p(A), assuming the random experiment was carried out correctly (event B)". ... (i.e. that the probability of a correct random experiment is 1, or that p(B) = 1, ...). (p. 54)

The Bayesian approach offers an alternative to reasoning based on random experiments. However, estimating the a priori probabilities requires assumptions that make an objective justification of the results problematic. In addition, the distribution pattern of a population cannot be inferred if one relativizes the random element⁴ and thereby tolerates a hybrid distribution structure of the sample (cf. case 3). The approach is therefore probably of more value for the philosophy of science than for practice. Fundamental treatments are given by Chernoff and Moses (1959), Pratt et al. (1965), Bühlmann et al. (1967), DeGroot (1970), LaValle (1970) and Bortz (1984).

⁴ There are some doubts as to whether valid statements can be made at all on the basis of random mechanisms, since chance can itself, by chance, produce non-random results (cf. Urbach, 1985).

Randomness need not refer only to the selection of elements. Under certain experimental conditions it can also be applied to their allocation, in which case selection set A would simply be renamed allocation set A. Random assignment of subjects to treatment conditions is crucial for those permutation methods in which random selection is absent or impossible (cf. Edgington, 1995).

References

Bohm, D. (1987). Wholeness and the implicate order. London: Routledge & Kegan Paul PLC.
Bortz, J. (1984). Lehrbuch der empirischen Forschung. Berlin: Springer.
Bortz, J. (1993). Statistik für Sozialwissenschaftler (4th ed.). Berlin: Springer.
Bühlmann, H., Löffel, H., Nievergelt, E. (1967). Einführung in die Theorie und Praxis der Entscheidung bei Unsicherheit. Heidelberg: Springer.
Chernoff, H., Moses, L. E. (1959). Elementary decision theory. New York: Wiley.
DeGroot, M. H. (1970). Optimal statistical decisions. New York: McGraw-Hill.
Edgington, E. S. (1995). Randomization tests (3rd ed.). New York: Marcel Dekker.
LaValle, J. H. (1970). An introduction to probability, decision and inference. New York: Holt, Rinehart and Winston.
Pratt, J. W., Raiffa, H., Schlaifer, R. (1965). Introduction to statistical decision theory. New York: McGraw-Hill.
Shannon, C. E., Weaver, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.
Urbach, P. (1985). Randomization and the design of experiments. Philosophy of Science, 52(2), 256–273.