Properties of designs

This section shows the structural and statistical properties of the orthogonal arrays, conference designs and D-efficient designs generated by the Orthogonal Array package. The properties of the arrays and designs are calculated using the array_link object or functions from the package.

Definitions of arrays and designs

Before introducing the structural and statistical properties, we define orthogonal arrays, conference designs and D-efficient designs:

An orthogonal array (OA) of strength \({t}\), \({N}\) runs and \({n}\) factors at \({s}\) levels is an \({N}\times {n}\) array of symbols \(0, \ldots,({s}-1)\), such that for every subset of \({t}\) columns, every \({t}\)-tuple occurs equally often [Rao47]. The set of all strength-\({t}\) OAs with \({N}\) runs and \({n}\) factors at \({s}\) levels is denoted by \({\operatorname{OA}({N}; {t}; {s}^{n})}\). If \({s=2}\), the OA is called a two-level OA and the set of all strength-\({t}\) two-level OAs with \({N}\) runs and \({n}\) factors is denoted as \({\operatorname{OA}({N}; {t}; {2}^{n})}\).

For \({N}\) even, a conference design [SEG19] \(C\) is an \({N}\times {n}\) array which satisfies \({C}^{T}C = (n-1) I_{n}\), with \({C}_{ii} = 0\) and \({C}_{ij} \in \{-1,1\}\), for \({i} \neq {j}\) and \({i}, {j} = 1, \ldots, n\). A \({N}\times {N}\) conference design \(E\) such that \(E{E}^{T} = (n-1) I_{n}\) is called a conference matrix; see [EN95], [CD06] and [XLB12].

A D-optimal design [DAT07] (\(X\)) is an \({N}\times {n}\) array which maximizes the \(D\)-efficiency, defined as \({(\operatorname{det}({X}^{T}_{M}{X}^{\phantom{T}}_{M})^{1/p})/N}\), for a given \({N}\times {p}\) model matrix \({X}_{M}\) (for details see Model matrices). The Orthogonal Array package uses a coordinate-exchange algorithm to generate designs that optimize the \(D\)-efficiency. Since there is no guarantee that the resulting designs have the largest possible \(D\)-efficiency, we refer to them as D-efficient designs in this documentation. An orthogonal array is called D-optimal orthogonal array if it provides the largest \(D\)-efficiency among all comparable orthogonal arrays.

Structural properties of an array

The OApackage can calculate the rank of an array, defined as the maximum number of linearly independent column or row vectors in the array. The rank of an array is useful for several other functions in the package. For two-level arrays, the OApackage can also check if the arrays are foldover arrays. A two-level array is called a foldover array if half of its runs are mirror images of the other half, in the sense that the factor levels are changed from 0 to 1 and from 1 to 0.

For example, to calculate the rank of a two-level orthogonal array and determine whether the array is a foldover array, one can use the methods array_link::rank() and array_link::foldover():

Calculate rank of array and test for foldover

>>> array = oapackage.exampleArray(1) # Select an example two-level orthogonal array
>>> array.showarray() # Show the two-level orthogonal array
array:
  0   0   0   0   0
  0   0   0   0   0
  0   0   0   1   1
  0   0   1   0   1
  0   1   0   1   0
  0   1   1   0   0
  0   1   1   1   1
  0   1   1   1   1
  1   0   0   1   1
  1   0   1   0   1
  1   0   1   1   0
  1   0   1   1   0
  1   1   0   0   1
  1   1   0   0   1
  1   1   0   1   0
  1   1   1   0   0
 >>> print(array.rank()) # Calculate the rank of the array
 5
 >>> print(array.foldover()) # Determine if the array is foldover
 False

Other structural properties such as whether an array involves two levels or is symetric can be found in the documentation of array_link, which shows the full set of methods available.

Model matrices

For orthogonal arrays and conference designs, the OApackage can calculate different model matrices. The model matrices available depend on the type of array and design.

Model matrices for two-level orthogonal arrays

For two-level orthogonal arrays, the levels of the array are first coded according to the map \({0 \rightarrow -1}\) and \({1 \rightarrow +1}\). The coded matrix is referred to as the design matrix. The main effect contrast vectors are given by the columns in the design matrix. The contrast vectors associated to the two-factor interactions are calculated by taking products between two different columns in the design matrix. The model matrix consists of the intercept column (i.e. a columns of ones) and the contrast vectors associated to the main effects and, optionally, the two-factor interactions.

Model matrices for conference designs

The model matrix for a conference design consists of the intercept column (i.e. a columns of ones) and the contrast vectors associated to the main effects and, optionally, the second-order effects (two-factor interactions and quadratic effects). The main effect contrast vectors are given by the columns in the conference design. The contrast vectors associated to the second-order effects are calculated by taking products between two columns in the conference design.

Model matrices for mixed-level orthogonal arrays

For mixed-level orthogonal arrays, the main effect contrast vectors are defined by the Helmert contrasts. The contrast vectors associated to the two-factor interactions are calculated by taking products between two different columns in the matrix containing the Helmert contrasts of the array; see Model matrices for mixed-level orthogonal arrays for details. The model matrix consists of the intercept column (i.e. a columns of ones) and the contrast vectors associated to the main effects and, optionally, the two-factor interactions.

An example on how to generate an interaction model matrix for a two-level orthogonal array is shown below.

Calculate interaction effects model matrix

>>> array=oapackage.exampleArray(0,1)
exampleArray 0: array in OA(8,2, 2^2)
>>> array.showarray()
array:
  0   0
  0   0
  0   1
  0   1
  1   0
  1   0
  1   1
  1   1
>>> M=oapackage.array2modelmatrix(array, 'i')
>>> print(M)
[[ 1. -1. -1.  1.]
 [ 1. -1. -1.  1.]
 [ 1. -1.  1. -1.]
 [ 1. -1.  1. -1.]
 [ 1.  1. -1. -1.]
 [ 1.  1. -1. -1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]

Statistical properties of orthogonal arrays

Orthogonal arrays are commonly evaluated in terms of their generalized wordlength pattern [XW01] (GWLP). Two-level OAs are also commonly evaluated in terms of their \({J}_{k}\)-characteristics and \(F\)-vectors [DT99]. The OApackage can calculate all these statistical criteria: array_link::GWLP(), array_link::Fvalues(), array_link::Jcharacteristics().

The following example shows how to calculate the GWLP, \({F}_{k}\)-values and \({J}_{k}\)-characteristics from an array_link object:

Calculate GWLP and F-values

>>> al=oapackage.exampleArray(1,1) # Select an example array
exampleArray 1: array 3 in OA(16, 2, 2^5)
>>> gwlp = al.GWLP() # Calculate its generalized word length pattern
>>> print('GWLP: %s'% str(gwlp) )
GWLP: (1.0, 0.0, 0.0, 1.0, 1.0, 0.0)
>>> print('F3-value: %s' % str(al.Fvalues(3))) # Calculate the F_3 values
F3-value: (4, 6)
>>> print('F4-value: %s' % str(al.Fvalues(4))) # Calculate the F_3 values
F4-value: (1, 4)
>>> print('J3-characteristics: %s' % str(al.Jcharacteristics(3))) # Calculate the J_3-characteristics
J3-characteristics: (-8, -8, 0, 0, 0, -8, 0, -8, 0, 0)

We now briefly mention some technical details of the \({J}_{k}\)-characteristics, the \({F}_{k}\)-values and the GWLP.

\({J}_{k}\)-characteristics

To calculate \({J}_{k}\)-characteristics of a two-level OA, the OApackage codes the levels of the array as \(-1\) and \(+1\). To this end, the package uses the mapping \({0 \rightarrow -1}\) and \({1 \rightarrow +1}\). Let \(D\) be an \({N}\times {n}\) with coded levels \(-1\) and \(+1\). For \({S} = \{l_1, \ldots, l_k\}\), a subset of \(k\) different factors of \(D = (d_{il})\), define

\[j_k (S; D) = \sum_{i = 1}^{N} d_{i l_1} \cdots d_{i l_k}.\]

The \({|{j}_{k} (S; D)|}\) values are called the \({J}_{k}\)-characteristics, which necessarily equal \(N - 4q\) [DT02], where \({q} \leq N/4\) is a non-negative integer.

\({F}_{k}\)-values

The \({F}_{k}\)-vector collects the frequencies of all the \({J}_{k}\)-characteristics. More specifically, the vector \({F}_{k} = (f_{k1}, \ldots, f_{kv})\), where \(v = N/4\) and \(f_{ku}\) denotes the frequency of the \({J}_{k}\)-characteristics which are equal to \(4(v + 1 - u)\). When calculating an \({F}_{k}\)-vector, the OApackage shows only the vector \((f_{k1}, \ldots, f_{kv})\), whose elements are referred to as the \({F}_{k}\)-values.

Generalized word length pattern

Consider an OA, \({D}\), of strength \({t}\) with \({N}\) runs and \({n}\) factors at \({s}\) levels. Let \({X_0}\) be a column of ones, \({X_1}\) the matrix involving the contrast vectors associated with the main effects, and \({X_j}\) the matrix involving the contrast vectors associated with the \({j}\)-factor interactions, \({j \geq 2}\). We assume that the column vectors in \({X_1}\) are normalized so that they have the same length \({\sqrt{N}}\). For \({j = 0, \ldots, n}\), let

\[A_j (D) = N^{-2} 1_{N}^{T} X_{j} X_{j}^{T} 1_{N}^{\phantom{T}},\]

where \(1_{N}\) denotes the \(N \times 1\) column of ones. The value of \({A}_{j}(D)\) is invariant to the choice of the orthonormal contrasts used; see [XW01] for details. The vector \({(A_0(D), \ldots, A_n (D) )}\) is called the generalized word length pattern (GWLP). To increase the speed of the computations for the GWLP, the OApackage uses the distance distribution and the MacWilliams identities as in [XW01] and [Xu09].

Optimality criteria for D-efficient designs

In [ES17], D-efficient designs for the model including the intercept, all main effects and all two-factor interactions are generated. The OApackage provides functionality to compute the optimality criteria used to generate the D-efficient designs in [ES17]. Moreover, the package can calculate the well-known \(A\)- and \(E\)-optimality criteria from the literature on Optimal Experimental Design [DAT07]. The functions to perform the calulcations are array_link::Defficiency(), array_link::DsEfficiency(), array_link::Aefficiency(), array_link::Eefficiency().

The following example shows how to calculate the \(D\)-, \({D}_{s}\)-, \(A\)- and \(E\)-efficiency for a design that permits the estimation of the interaction model.

Calculate optimality criteria for D-efficient designs

# Select an array that can estimate the interaction model
>>> al = oapackage.exampleArray(11, 1)
exampleArray 11: D-optimal array in OA(44, 2^8)
>>> print('D-efficiency: %.4f' % al.Defficiency())
D-efficiency: 0.8879
>>> print('Ds-efficiency (Eendebak and Schoen, 2017): %.4f' % al.DsEfficiency())
Ds-efficiency (Eendebak and Schoen, 2017): 0.8059
>>> print('A-efficiency for the interaction model: %.4f' % al.Aefficiency())
A-efficiency for the interaction model: 0.7906
>>> print('E-efficiency for the interaction model: %.4f' % al.Eefficiency())
E-efficiency for the interaction model: 0.3602

Calculation of \(D\)-, \(A\)- and \(E\)-efficiency

Let \({X}\) be again the \({N}\times {p}\) interaction model matrix (see section Model matrices) consisting of a column of ones and the contrast vectors associated to the main and two-factor interactions of \({n}\) factors, where \({p = 1 + n + (n)(n-1)/2}\). The \(D\)-, \(A\)- and \(E\)-efficiency are calculated using the eigenvalues of the singular-value decomposition (SVD) of \({X}\). To calculate the rank of a matrix, the lower-upper (LU) decomposition, as implemented in the Eigen package [GJ+10], is used.

Let \(\lambda_1, \ldots, \lambda_p\) be the eigenvalues of the SVD of \({X}\). The OApackage calculates the \(D\)-, \(A\)- and \(E\)-efficiency of a design \(D\) as follows:

\[\begin{split}{D_{\text{eff}}(D)} = (\prod_j \lambda_j)^{1/p} / N \label{formula:Defficiency}, \\ {A_{\text{eff}}(D)} = N (\sum_j \lambda_j^{-1})/m \label{formula:VIF}, \\ {E_{\text{eff}}(D)} = \min_j \lambda_j. \label{formula:E-efficiency}\end{split}\]

\(D_s\)-efficiency and \(D_1\)-efficiency

In [ES17], the \(D_s\)-efficiency is used to assess the joint precision of the main effects in the interaction model. Let the interaction model matrix \({X}\) be split into \({X_{1}}\), containing the contrast vectors associated with the main effects only, and \({X_{02}}\), containing the intercept column and the contrast vectors associated to the two-factor interactions. The \(D_{s}\)-criterion of a design \(D\) is defined as

\[{D_{s,\text{crit}}(D)} = \operatorname{det}(X^{T}X) / \operatorname{det}(X_{02}^{T} X_{02}^{\phantom{T}}), \label{formula:Dsefficiency}\]

where \({X_{02}}\) is necessarily of full rank. Similar to the calculations of the \(D\)-efficiency, the OApackage calculates the \(D_{s}\)-criterion using the eigenvalues of the SVD of the matrices \({X}\) and \({X_{01}}\). Finally, the package calculates the \(D_{s}\)-efficiency of \(D\) as \(D_{s,\text{eff}}(A) = D_{s,\text{crit}}(A)^{1/m}\), where \(m\) is the number of factors.

In a similar way the \(D_{1}\)-efficiency of a design \({A}\) with \(n\) factors and model matrix of intercept and main effects \({X_{01}}\), is defined as

\[D_{1,\text{eff}}(A) = ( \operatorname{det}((X_{01})^{T}(X_{01}) )^{1/(n+1)} \label{formula:D1efficiency}\]

Projection capacities

Other relevant statistical criteria to evaluate a two-level design with \(N\) runs and \(k\) factors include the so-called projection estimation capacity (PEC) and projection information capacity (PIC) [LST07]. These criteria focus on the projections of the two-level design onto a smaller number of factors. More specifically, the PEC and PIC summarize the performance of all the \(N\)-run subdesigns with \(l \leq k\) factors in terms of the capacity to estimate the interaction model and the \(D\)-efficiency for this model, respectively.

The PEC and PIC are based on the so-called PEC and PIC sequences, which are formally defined as follows. Let \(PEC_{l}\) denote the proportion of \(N\)-run \(l\)-factor subdesigns that permit the estimation of the interaction model in \(l\) factors, that is, the model including the intercept, all \(l\) main effects and all \(l(l-1)/2\) two-factor interactions. The PEC sequence is the vector \((PEC_{1}, PEC_{2}, \ldots, PEC_{k})\). Now, let \(PIC_{l}\) denote the average \(D\)-efficiency for the interaction model in \(l\) factors accross all \(N\)-run \(l\)-factor subdesigns. The PIC sequence is the vector \((PIC_{1}, PIC_{2}, \ldots, PIC_{k})\). The OApackage can calculate the PEC and PIC sequences of two-level designs with PECsequence() and PICsequence(), respectively.

The following example shows how to compute the PEC and PIC sequences of a two-level orthogonal array using the OApackage.

Calculate the PEC and PIC sequences

>>> al=oapackage.exampleArray(1,1)
exampleArray 1: array 3 in OA(16, 2, 2^5)
>>> PEC = al.PECsequence()
>>> print('PEC sequence: %s'% ','.join(['%.2f' % x for x in PEC]) )
PEC sequence: 1.00,1.00,1.00,0.80,0.00
>>> PIC = al.PICsequence()
>>> print('PIC sequence: %s'% ','.join(['%.2f' % x for x in PIC]) )
PIC sequence: 1.00,1.00,0.95,0.66,0.00

Properties of conference designs

In [SEG19], it is shown that the \(F_4\) vector is useful for classifying definitive screening designs [XLB12] that are generated by folding over a conference design. To calculate the \(F_4\) vector, we first need to compute the \(J_4\)-characteristics of the conference design. The calculations for the \(J_k\)-characteristics of conference designs are similar as for orthogonal arrays; see Statistical properties of orthogonal arrays. Consider a definitive screening design constructed from an \(N\)-run conference design with at least four factors. The \(F_4\) vector of this design collects the frequencies of the \(J_4\)-characteristics of \({2N − 8\lambda}\) for \({\lambda = 1,\ldots,N/4}\) when \(N\) is a multiple of 4, or \({\lambda = 1, \ldots, (N − 2)/4}\) when \(N\) is an odd multiple of 2.

Note: the \(J_4\)-values of a definitive screening design generated by folding over a conference design are twice the value of the \(J_4\)-values of the conference design. The \(F_4\) vector of a conference design and the corresponding definitive screening design are equal.

Calculate the F4 vector for a conference design

>>> import oapackage
>>> array=oapackage.exampleArray(47,1)
exampleArray 47: third conference design in C(20,8)
>>> F4=array.FvaluesConference(4)
>>> print(F4)
(0, 2, 4, 51, 13)
>>> definitive_screening_design = oapackage.conference2DSD(array)

The individual \(J_k\)-characteristics can be calculated with the method Jcharacteristics_conference(). For conference designs, we can calculate the projection statistics using conferenceProjectionStatistics().

Calculate projection statistics for conference designs

>>> array = oapackage.exampleArray(46, 1)
exampleArray 46: second conference design in C(20,8)
>>> pec, pic, ppc = oapackage.conference.conferenceProjectionStatistics(array)
>>> print('Projection estimation capacity for 4 columns: %.3f'  % pec)
Projection estimation capacity for 4 columns: 0.986
>>> J3 = oapackage.Jcharacteristics_conference(array, number_of_columns = 3)