Numerical Experiments with Model-Assisted Pattern Search

Chris Siefert

Department of Mathematics
College of William and Mary
cmsief@cs.wm.edu

1. Introduction

A planned set of evaluations of a deterministic function is called a computer experiment. The problem of optimization of such experiments is a highly non-trivial one. Typically, evaluating function the f, requires running a computationally expensive simulation of some kind of physical system e.g. the helicopter rotor blade studied by Booker (1996). Due to high-frequency, low-amplitude variations in the function f, often induced by numerical error in the computer simulation, derivative information if it can be found, is not helpful. Hence, derivative-based optimization methods, such as quasi-Newton methods would be inappropriate for this problem. Likewise, due to the high computational expense of evaluating the objective function, direct search methods would also be inappropriate for computer experiments, as they require a large number of function evaluations. Since these two mainstays of traditional optimization would fail to be effective for computer experiments, different methods are required.

The idea behind the Surrogate Management Framework (SMF), outlined by Booker et al.(1999), is to use a global approximation to our objective function f, to accelerate the search for a minimizer. The SMF is uses this approximation to guide the pattern search, which is a type of direct search. The specific implementation of the Surrogate Management Framework that was completed by Trosset and Torczon (1997) is called the Model-Assisted Grid Search (MAGS). The algorithm has since been modified, and has been renamed the Model-Assisted Pattern Search (MAPS) by Trosset and Torczon (1999).

The purpose of this report is to study the Model-Assisted Pattern Search more fully and to subject it to more comprehensive testing than has been done to this date. This report explains and discusses the MAPS algorithm in Section 2, discusses previous computational experiments in Section 3 and proposes new computational experiments in Section 4. Section 5 offers preliminary results of the proposed experiments in Section 4, and Section 6 outlines future plans for completing this research. Finally, the paper concludes in Section 7.

2. Model-Assisted Pattern Search (MAPS)

The essential logic of the Model-Assisted Pattern Search is shown below in the figure reproduced from Trosset and Torczon (1999).

1.
Specify an initial grid, $\Gamma_0$, on [a,b].
2.
Perform an initial computer experiment :
(a)
Select initial design sites $x_1,\ldots,x_N\in \Gamma_0$.
(b)
Compute $f(x_1),\ldots,f(x_N)$.
(c)
Construct $\hat{f_0}$ by kriging.

3.
Let $\Gamma_c = \Gamma_0$. Let $x_c=\mbox{argmin}\left(f(x_1),\ldots,f(x_N)\right)$.
Let $\hat{f_c}=\hat{f_0}$ and $\mbox{Eval}_c=\{x_1,\ldots,x_N\}$.

4.
Do until convergence:
(a)
Do until $\mbox{Core}(x_c) \subset \mbox{Eval}_c$:
i.
Apply an optimization algorithm to Sc (the search criterion) to obtain $x_t\in\Gamma_c \backslash \mbox{Eval}_c$.
ii.
Compute f(xt). Let $\mbox{Eval}_c = \mbox{Eval}_c \cup
 \{x_t\}$. Update Sc.
iii.
If f(xt)<f(xc), then let xc=xt.
(b)
Refine $\Gamma_c$.


As a special case of SMF, MAPS inherits its guarantee of global convergence. This guarantee is derived from the fact that SMF is based on the pattern search. See Booker et al.(1999) and Lewis et al.(1998) for information about the convergence of pattern search. Like pattern search, MAPS searches on a sequence of grids.

Grid are refined according to pattern search rules, i.e. only when f(xt) > f(xc) for every xt in the ``core pattern'' of points at xc, denoted as $\mbox{Core}(x_c)$. $\mbox{Eval}_c$ is the set of sites at which f has been evaluated, so when $\mbox{Core}(x_c) \subset \mbox{Eval}_c$ the pattern search grid refinement criterion has been satisfied, and grid refinement is allowed.

A key feature of MAPS is its use of global approximations to help the pattern search find promising trial points. The current global approximation, $\hat{f_c}$, is constructed by a technique known as kriging, explained by Sacks et al.(1989) and Koehler and Owen (1996). Before the pattern search segment of MAPS commences, an initial approximation is constructed by selecting an initial set of design sites, evaluating f there, and then kriging the results. Previous studies of MAGS/MAPS selected the initial design sites with Latin hypercube sampling.

To search for trial points, MAPS optimizes a search criterion, Sc. The original MAGS algorithm used $\hat{f_c}$ as the search criterion; however Torczon and Trosset (1998) and Trosset (1998) argued that this choice is problematic. They suggested that MAGS is too single-minded in its pursuit of a minimizer of $\hat{f_c}$ and that it gives insufficient attention to improving the quality of $\hat{f_c}$. Trosset (1998) proposed setting Sc equal to $\hat{f_c}-w_c\cdot D_c(x)$, where Dc(x) measures a point's potential to improve the approximation and wc is the weight placed on Dc(x). Influenced by Cox and John (1997), Trosset and Torczon (1997) adopted $D_c(x) = \sqrt{\widehat{MSE}(\left[ \hat{f_c}(x)\right])}$.

3. Previous Computational Results

Previous experiments with MAGS/MAPS have focused on the importance of sequential experimentation. Trosset and Torczon (1997, 1999) compared MAGS/MAPS to an the DACE approach proposed by Welch and Sacks (1991). In DACE, a single experiment is performed, a single approximation is constructed and minimized, and a single validation evaluation is performed at the putative minimizer. Trosset and Torczon (1997) found that MAGS outperformed DACE for several choice of V, the permitted number of function evaluations. However, these experiments considered just one objective function. Trosset and Torczon (1999) considered 20 objective functions, constructed by generating 10 realizations of a stochastic process and adding a constant or a quadratic trend. One of these functions was used in this report as well. Again, in Trosset and Torczon's (1999) report MAPS performed DACE. However, only one implementation of MAPS was tested.

Other than this limited testing, however, very little testing has been performed on the MAPS algorithm. There has been no experimentation with varying the number of evaluations, N, devoted to the initial design, and there has also been no experimentation on the effects of a small value of N. Likewise, little testing has been done with varying the mesh size $\Gamma$, and the weighting schedule which determines wc for each iteration of the loop in the MAPS algorithm.

4. Proposed Computational Experiments

Many computational experiments remain to be performed. This study investigates four specific questions, outlined below.

1.
The role of the initial Latin hypercube initial design, should be investigated. Randomly generating Latin hypercube designs with a small number (N) of design sites may result in a set of points that ``clump'' together. This condition may be responsible for the occasional breakdown in MAPS performance. Is this really a problem, and if so, how disastrous is the effect?
2.
Initial designs should also be investigated. Specifically, the proportion of the function budget V, to allocate to the initial design should be a topic for investigation. Allocating the entire budget is unwise, as it removes the advantages of the sequential nature of the MAPS algorithm; and more evaluations than variables are needed to fit a simple linear approximation. However, it is not known what proportion N/V is ideal.

3.
The effect of the initial mesh size on MAPS performance is another area for research. Do large meshes waste function evaluations evaluating Core patterns prior to mesh refinement? Or do small meshes allow the pattern of points sampled to degenerate, making it difficult to improve the approximation. To what extent is this dilemma resolved by the use of a design criterion? These questions lead to a host of possible topics to be investigated.

4.
The role of the weighting schedule for the search criterion should also be investigated. There has been no research on the initial weight to place on the design criterion, nor on how to vary that weight with each iteration of the pattern search. What kind of weighting schedules do well, and how do they interact with mesh sizes? Do certain kinds of schedules work better with certain mesh sizes?

5. Preliminary Results

The experiments described in this section were performed a on randomly generated objective functions of p=2 variables. This function was generated using methods described by Trosset (1999). It is one of the objective functions studied by Trosset and Torczon (1997), the sum of their realizations of a Gaussian process and a constant trend. Unless otherwise noted, all experiments were performed with a budget of function evaluations V=2, N=5 initial design sites, a weighting scheme of $w_0=2, w_{+} = 0.8\cdot w_c$, and a initial mesh fineness of 0.1. A plot of the function over the region of interest is attached in Appendix A.

Several tests of the MAPS algorithm were performed n order to study the effects of the proportion N/V, of function evaluations allocated to the initial design. These tests had 20 runs of MAPS each, and were run for N=3,5,7,11,13 for V=20 and V=25 function evaluations. The figures below illustrate the results of these tests.


  
Figure 1: Variations of Initial Design, V=20
\begin{figure}
\begin{center}
 
\scalebox {.4}{\includegraphics[angle=270]{id20_l.eps}}

 \end{center}\end{figure}


  
Figure 2: Variations of Initial Design, V=25
\begin{figure}
\begin{center}
 
\scalebox {.4}{\includegraphics[angle=270]{id25_l.eps}}

 \end{center}\end{figure}

Figure 1 and 2 support the theory outlined above -- MAPS tended to perform poorly if small quantities of function evaluations were allocated to the initial design. Likewise, MAPS performance also suffered on average, when a relatively high proportion of function evaluation was allocated to the initial design. Although the ideal balance cannot be determined by these preliminary results, the goal of finding such a balance should certainly be a future research goal.

Varying the mesh size produced mixed results. For the function tested in Figure 3, the results seem to suggest that smaller mesh sizes perform the best. This supports the hypothesis that larger mesh sizes tended to waste function evaluations evaluating Core patterns prior to grid refinement.


  
Figure 3: Variations of Mesh Size, V=20
\begin{figure}
\begin{center}
 
\scalebox {.3}{\includegraphics[angle=270]{mesh20_l.01.eps}}

 \end{center}\end{figure}

Another question dealt with in this report is that of the weighting schedules. Specifically, how do weighting schedules interact with mesh size. It is reasonable to suppose that Trosset and Torczon's weighting schedules, $w_0=2, w_{+} = 0.8\cdot w_c$, will do better for most functions when given a smaller mesh. But how are other weighting schedules affected by mesh size? Specifically, how are the schedules based on Cox and John's SDO design (1997) with $w_c=2,
\forall c$ and the original MAGS weight-free design $w_c=0, \forall
c$ affected by the mesh size.


  
Figure 4: Trosset and Torczon's Weighting Schedule, V=20
\begin{figure}
\begin{center}
 
\scalebox {.3}{\includegraphics[angle=270]{wmiact.tnt.01.eps}}

 \end{center}\end{figure}

The results here are fairly interesting. Since the function evaluated in Figure 4 is the same function tested in the testing of mesh sizes, the results are fairly clear -- smaller mesh sizes tend to yield a significant performance gain in the MAPS algorithm. Likewise, the Cox and John inspired weighting schedule shown in Figure 5 shows a similar gain.


  
Figure 5: Cox and John inspired Weighting Schedule, V=20
\begin{figure}
\begin{center}
 
\scalebox {.3}{\includegraphics[angle=270]{wmiact.cox.john.01.eps}}

 \end{center}\end{figure}

However, the ``No Weight'' weighting schedule has a significantly smaller performance gain when it uses a smaller mesh. In fact, this apparent gain is in all likelihood illusory, a product of the small sample size used for these preliminary results. So it is reasonable to infer from these results that there is some sort of interaction between mesh size and weighting scheme.


  
Figure 6: ``No Weight'' Schedule, V=20
\begin{figure}
\begin{center}
 
\scalebox {.3}{\includegraphics[angle=270]{wmiact.noweight.01.eps}}

 \end{center}\end{figure}

Finally, testing on Latin hypercube initial designs has also yielded some interesting results. The figure below shows four initial Latin hypercube designs. In examining these figures, one notes that there is no distinguishing features which sets the designs that yield good results apart from the design that yield poor results. More specifically, it seems to be difficult to discern, a priori if a given Latin hypercube initial design is ``good''. Thus alternatives to Latin hypercubes initial designs should be investigated.


 

\begin{figure}
 \begin{center}
\scalebox {.2}{\includegraphics{glhc.1.eps}}

 
\...
 ....1.eps}}

 
\scalebox
{.2}{\includegraphics{blhc.2.eps}}\end{center}\end{figure}

6. Future Plans

Currently, the code used for the MAPS algorithm is written in S-PLUS, a high-level mathematical programming language. Although S-PLUS supports many advanced features, such non-linear optimization, it is also quite inefficient when it comes to execution time. The present version takes hours to run a set of tests on a mid-level pentium. Over the summer, the author plans to implement the MAPS algorithm in C++. This language is far more portable than S-PLUS, as nearly every system has a C++ compiler available. Also, execution time should be much faster for the C++ implementation, as the C++ can take advantage of optimizing compilers, while S-PLUS is stuck using its interpreter.

Across the course of the next academic year, the author plans to study more closely the affect of these facets of MAPS, as well as others, on the algorithm performance. The author also intends to test MAPS on a wide variety of randomly-generated functions, and in doing so produce much more definitive results than those in the present report.

7. Conclusions

In conclusion, the MAPS algorithm provides a potentially useful method for optimization on a limited budget. However, there has been little study into the effect of algorithm's various parameters, and this report has taken a preliminary look into this area. The tests performed here suggest that MAPS performance is inversely related to mesh size, for certain weighting schedules, but that the choice of weighting schedule can mitigate or possibly reverse that effect. This report also suggests MAPS tends to perform best on allocations of evaluations to the initial design that are medium-sized. However, finding the optimal level is an area for future research. Finally, as for Latin hypercube initial designs, this report can say little, other than to suggest that more research should go into the use of other experimental designs.

Acknowledgments

The author would like to thank Professors Trosset and Torczon for their aid and assistance in the production of this report. The author would also like to thank all other participants in Math490-02, especially Laura Taylor, Tony Padula and Erin Parker for their support in all my efforts.

A. Plot of the Studied Function


 

\begin{figure}
\begin{center}
 
\scalebox {0.65}{\includegraphics{function.01.eps}}

 \end{center}\end{figure}

References

1
Booker A.J. (1996). Case studies in design and analysis of computer experiments. In Proceedings of the Section on Physical and Engineering Sciences, pages 244-248. American Statistical Association.

2
Booker, A.J.; Dennis, J.E.; Frank, P.D.; Serafini, D.B.; Torczon, V. and Trosset, M.W. (1999). A rigorous framework for optimization of expensive functions by surrogates. Structural Optimization, 17(1):1-13.

3
Cox, D.D. and John, S. (1997). SDO: a statistical method for global optimization. In Alexandrov, N. and Hussaini M.Y., editors Multidisciplinary Design Optimization: State of the Art, pages 315-329. SIAM, Philadelphia.

4
Goldstein, A.A. and Price, J.F. (1977). On descent from local minima. Mathematics of Computation, 25:569-574.

5
Koehler, J.R. and Owen, A.B. (1996). Computer experiments. In Ghosh, S. and Rao, C.R., editors, Handbook of Statistics, Volume 13, pages 261-308. Elsevier Science, New York.

6
Lewis, R.M., Torczon V., Trosset, M.W. (1998). Why pattern search works. Optima, (59)1-7.

7
Sacks, J.; Welch, W.J.; Mitchell, T.J. and Wynn, H.P. (1989). Design and analysis of computer experiments. Statistical Science. 1989. Vol 4. No. 4. 409-435.

8
Torczon, V. and Trosset, M.W. (1998). Using approximations to accelerate engineering design optimization. In 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization: A Collection of Technical Papers, Part 2, pages 738-748. American Institute of Aeronautics and Astronautics.

9
Trosset, M.W. (1998). Optimization on a limited budget. In 1998 Proceedings of the Section on Physical and Engineering Sciences. American Statistical Association. To appear.

10
Trosset, M.W. (1999). The krigifier: a procedure for generating pseudorandom nonlinear objective functions for computational experimentation. Interim Report 35, Institute for Computing Applications in Science & Engineering, NASA Langley Research Center, Hampton, VA 23681-0001.

11
Trosset, M. W. and Torczon V. (1997). Numerical optimization using computer experiments. Technical Report 97-38. Institute for Computing Applications in Science & Engineering, NASA Langley Research Center, Hampton, VA 23681-2199.

12
Trosset M.W. and Torczon V. (1999). Sequential Computer Experiments for Numerical Optimization.

13
Welch W.J., and Sacks, J. (1991). A system for quality improvement via computer experiments. Communications in Statistics -- Theory and Methods, 20:477-495.

About this document ...

This document was generated using the LaTeX2HTML translator Version 97.1 (release) (July 13th, 1997)

Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html -no_navigation -split 0 maps_math490.tex.

The translation was initiated by Chris Siefert on 5/17/1999


Chris Siefert
5/17/1999