Presentation Transcript
Getting started with GEM-SA: Marc Kennedy
This talk: Starting the GEM-SA program
Creating input and output files
Explanation of the menus, toolbars, etc.
Description of the project window
Starting GEM-SA: Double-click the GEM-SA icon to start
The main window appears, with
Menu
Toolbar
Sensitivity analysis output grid
Log window
Slide4: [Screenshot of the main window with labels: menu, toolbar, sensitivity analysis output grid, log window]
Toolbar icons: New project
Open project
Save project
Print output report
Edit project
Generate input design points
Rescale an input
Standardise design
Copy input design to clipboard
Convert input to integer
Run the analysis
Help
Sensitivity analysis output grid: Reports the sensitivity results once the analysis is complete
One line for each input parameter
One line for each pair of inputs, if joint effects are selected
Log window output: Tells us
Which training data are being loaded/saved
Transformations applied to the data
Fitted Gaussian process parameters
Summary of the uncertainty analysis
Creating a GEM project: To build the emulator we first need 3 files:
Data file of code inputs
Data file of code outputs
GEM-SA project file
Restrictions on input/output data: Single output
Multiple outputs must be treated individually
Max 30 input parameters
Max 400 training points
The data files are plain text files
One line for each point
Input file can be space or tab delimited
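These restrictions can be checked before a project is created. The following is a minimal sketch only: `check_training_data` is a hypothetical helper, not part of GEM-SA, and it assumes the output file holds exactly one value per line.

```perl
use strict;
use warnings;

# Limits quoted on the slide.
my $MAX_INPUTS = 30;
my $MAX_POINTS = 400;

# Hypothetical helper: takes the lines of an input file and of an output
# file (array refs) and returns a list of problems found (empty if none).
sub check_training_data {
    my ($input_lines, $output_lines) = @_;
    my @problems;

    # split ' ' handles both space- and tab-delimited lines
    my @inputs  = map { [ split ' ', $_ ] } grep { /\S/ } @$input_lines;
    my @outputs = map { [ split ' ', $_ ] } grep { /\S/ } @$output_lines;

    push @problems, "more than $MAX_POINTS training points"
        if @inputs > $MAX_POINTS;
    push @problems, "input and output files have different numbers of points"
        if @inputs != @outputs;

    my $ncols = @inputs ? scalar @{ $inputs[0] } : 0;
    push @problems, "more than $MAX_INPUTS input parameters"
        if $ncols > $MAX_INPUTS;
    push @problems, "input rows have differing numbers of columns"
        if grep { @$_ != $ncols } @inputs;
    push @problems, "an output line does not hold exactly one value"
        if grep { @$_ != 1 } @outputs;

    return @problems;
}

# Small in-memory example: three points, two inputs, one output each.
my @in    = ("0.1 2.0\n", "0.5\t3.5\n", "0.9 1.0\n");
my @out   = ("4.2\n", "7.9\n", "3.3\n");
my @found = check_training_data(\@in, \@out);
print @found ? join("\n", @found) . "\n" : "data look consistent\n";
```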
Generating a new input design: Designs can be generated using the toolbar icon or the menu: Input > Generate…
The design dialog appears
Generating a new input design: Click OK and fill in the required range for each input
Click OK again
Editing input designs: If you select a column, you can rescale the values of that input or round them to integers
Designs can be loaded into or saved from this window using the Inputs menu. Use the "Copy input design to clipboard" toolbar icon to copy the points to the clipboard for use in other programs
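For readers who prepare designs outside GEM-SA, the two column operations could look roughly like this. It is a sketch under assumptions: the linear rescaling rule and the rounding rule are guesses at reasonable behaviour, and `rescale_column` and `round_column` are hypothetical names.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Linearly map a column of values from its current range onto [$lo, $hi].
# This is an assumed rescaling rule, not taken from GEM-SA itself.
sub rescale_column {
    my ($values, $lo, $hi) = @_;
    my ($old_lo, $old_hi) = (min(@$values), max(@$values));
    return (($lo) x @$values) if $old_hi == $old_lo;   # constant column
    return map { $lo + ($_ - $old_lo) * ($hi - $lo) / ($old_hi - $old_lo) } @$values;
}

# Round each value to the nearest integer (halves round away from zero).
sub round_column {
    my ($values) = @_;
    return map { $_ < 0 ? int($_ - 0.5) : int($_ + 0.5) } @$values;
}

my @column   = (0.0, 0.25, 0.5, 1.0);
my @rescaled = rescale_column(\@column, 10, 50);   # 10 20 30 50
my @rounded  = round_column([ 1.2, 2.5, -0.7 ]);   # 1 3 -1
print "@rescaled\n@rounded\n";
```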
Types of design: GEM-SA can generate 2 types of design
LP-τ (LP-tau) designs
Maximin Latin Hypercube designs
Both have good space-filling properties
Ensure all regions of the input space are well represented
LP-τ design: Very quick to generate
Deterministic set of uniform points
Increasing the sample size just adds points to the smaller design
Making it useful for sequential analysis
Only have to generate the extra runs
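The nesting property can be illustrated with any deterministic low-discrepancy sequence. The sketch below uses a Halton sequence as a stand-in; it is not the LP-τ sequence that GEM-SA generates, and `halton_design` is a hypothetical name.

```perl
use strict;
use warnings;

# Radical inverse of integer $i in base $base (van der Corput sequence).
sub radical_inverse {
    my ($i, $base) = @_;
    my ($value, $frac) = (0, 1 / $base);
    while ($i > 0) {
        $value += ($i % $base) * $frac;
        $i      = int($i / $base);
        $frac  /= $base;
    }
    return $value;
}

# First $n points of a Halton sequence, one dimension per base given.
sub halton_design {
    my ($n, @bases) = @_;
    return map { my $i = $_; [ map { radical_inverse($i, $_) } @bases ] } 1 .. $n;
}

my @small = halton_design(4, 2, 3);
my @large = halton_design(8, 2, 3);

# The first four rows of the larger design repeat the smaller design,
# so only rows 5 to 8 would need new simulator runs.
for my $k (0 .. $#small) {
    print join(' ', map { sprintf('%.4f', $_) } @{ $large[$k] }), "\n";
}
```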
Maximin Latin hypercube design: Maximin Latin hypercube designs
Maximise the minimum distance amongst all pairs of points
Can take a long time to generate
Univariate projections are equally spaced
Each input has all its range represented
Good when only a few inputs are active
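The idea can be sketched with a crude random search: draw many random Latin hypercubes and keep the one whose closest pair of points is furthest apart. The algorithm GEM-SA actually uses is not described in the slides, so this is an illustration only; all sub names are hypothetical and the design lives on the unit cube.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# One random Latin hypercube on [0,1]^d: each column is a random
# permutation of the n cell midpoints, so every one-dimensional
# projection is equally spaced.
sub random_lhs {
    my ($n, $d) = @_;
    my @cols = map { [ shuffle(map { ($_ + 0.5) / $n } 0 .. $n - 1) ] } 1 .. $d;
    return [ map { my $i = $_; [ map { $_->[$i] } @cols ] } 0 .. $n - 1 ];
}

# Smallest Euclidean distance between any two points (needs n >= 2).
sub min_distance {
    my ($design) = @_;
    my $best;
    for my $i (0 .. $#$design - 1) {
        for my $j ($i + 1 .. $#$design) {
            my $sq = 0;
            $sq += ($design->[$i][$_] - $design->[$j][$_]) ** 2
                for 0 .. $#{ $design->[$i] };
            $best = $sq if !defined $best || $sq < $best;
        }
    }
    return sqrt($best);
}

# Crude maximin search: keep the best of $tries random Latin hypercubes.
sub maximin_lhs {
    my ($n, $d, $tries) = @_;
    my ($best, $best_dist);
    for (1 .. $tries) {
        my $cand = random_lhs($n, $d);
        my $dist = min_distance($cand);
        ($best, $best_dist) = ($cand, $dist)
            if !defined $best_dist || $dist > $best_dist;
    }
    return ($best, $best_dist);
}

my ($design, $dist) = maximin_lhs(10, 3, 200);
printf "minimum distance %.3f\n", $dist;
print join(' ', map { sprintf('%.2f', $_) } @$_), "\n" for @$design;
```

More tries give a better design at the cost of run time, which matches the slide's remark that these designs can take a long time to generate.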
Creating output points from these inputs: This is the tricky part…
Each row from the input design must be used to generate a single output, e.g. using
Spreadsheet
Simple, but requires functional form
Script
Only need executable code
Loop through inputs, modify code input file
Modify code to loop through the points
Messy, need source code
Example: using a spreadsheet: Copy the input design to the clipboard using the toolbar icon
Open Excel and paste inputs
Create formula in final column
Copy formula for all rows of the design
Copy, then paste special (values) into a new sheet
Save as text file
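When the functional form is known, the same result can be produced without a spreadsheet. In this sketch `model_output` is a hypothetical formula standing in for whatever would be typed into the final spreadsheet column.

```perl
use strict;
use warnings;

# Hypothetical functional form standing in for the spreadsheet formula.
sub model_output {
    my @x = @_;
    return $x[0] + 2 * $x[1] ** 2;
}

# Turn design lines (space or tab delimited) into one output line per point.
sub outputs_for_design {
    my @design_lines = @_;
    return map { model_output(split ' ', $_) . "\n" } grep { /\S/ } @design_lines;
}

my @design = ("0.1 0.2\n", "0.5\t1.0\n");
print outputs_for_design(@design);   # one output value per design row
```

Redirecting the printed lines to a text file gives an output file with one line per design point.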
Example: using a script: Read base input file
Read training inputs file
Loop through training file lines
Replace target inputs using training line
Write new base input file
Run code
Calculate single output and add to training output file
Slide19:
use strict;
use warnings;

my $pftchangeline = 21;               # array index (0-based) of the input-file line to change for each run
my @pftchangecols = (11, 14, 23, 19); # columns (0-based) within that line to modify
my @pftinlh       = (0, 1, 2, 3);     # ordering of these parameters within the training inputs

# get the initial (fixed) input file used by sdgvm0 and store its lines in @lines
open(my $base, '<', 'input.dat') or die "Cannot open input.dat: $!";
my @lines = <$base>;
close $base;

open(my $lh, '<', 'training_inputs.txt') or die "Cannot open training_inputs.txt: $!";
my $newpftline   = $lines[$pftchangeline];
my @newpftpoints = split(' ', $newpftline);
while (<$lh>) {                       # assigns each line in turn to $_
    chomp;
    my @lhpoints = split;             # one value per training input
    @newpftpoints[@pftchangecols] = @lhpoints[@pftinlh];      # modify the line
    $lines[$pftchangeline] = join(' ', @newpftpoints) . "\n"; # double quotes give a real newline
    open(my $in, '>', 'inputfile.dat') or die "Cannot write inputfile.dat: $!";
    print $in @lines;
    close $in;
    `sdgvm0 inputfile.dat`;           # run sdgvm0 with the modified input (assumes it takes the file name as its argument)
    # now do something with the output files....
    ...
}
close $lh;
The project window: Appears whenever you
Load a project
Edit a project
Create new project
This window has 3 tabs
Options
Files
Simulations
Slide21: How many inputs? What are the input names?
Slide22: Which joint effects should be calculated? What should be calculated, and how?
Slide23: Are the inputs uncertain? What prior mean for the output?
Slide24: What kind of prediction? What kind of cross validation?
Slide25: Names for the input files; names for the output files
Slide26: MCMC control parameters; how many points are used to calculate main effects and joint effects; how many realisations of predictions, main effects and joint effects to generate
Input parameter names: This window appears if you press the Names… button
Giving names is optional, but useful later when looking at GEM-SA output
Ordering can be changed using the arrows
Selecting joint effects: If you select "calculate joint effects", individual items in the joint effects window can be highlighted for inclusion in the joint effect calculations
You need to unselect the default "all inputs" first, unless you want to consider all pairs
Other checkboxes: Sum effects
Use this if you want main effects of the 2 inputs to be included in the realisations of the joint effect of a pair
The sensitivity measure, which computes joint sensitivity indices separately from the component main effects
Other checkboxes: Code has numerical error
Use this if your code has numerical errors which you want to smooth out
The variance of the error will be estimated as part of the fitting process
Can make the fitting process quite unstable, so avoid if possible!
Other checkboxes: Use MCMC for emulator parameters
For serious Bayesians only!
Takes into account uncertainty in the fitting of the emulator
Slows down the computation substantially, usually with minimal effect on the results
Auto-tune Metropolis algorithm
Use only with MCMC
Input uncertainty options: All unknown, product normal
Inputs are independent, normally distributed
All unknown, uniform
Inputs are independent, distributed uniformly between the min and max values of the training data
All known
No uncertainty analysis required
Input uncertainty options: Some known, rest product normal
Some input values will be fixed (in the dialog window or in a prediction file)
Others will be given normal input parameters
Prior mean options: If you believe the output is roughly a linear function of its inputs, select ‘linear term for each input’
Otherwise a single value will be used to represent the prior overall level of the output
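In the notation commonly used for Gaussian process emulators, the two choices can be written as below. The symbols (f for the code output, β for the regression coefficients, p for the number of inputs) are assumptions of this sketch and do not appear on the slides.

```latex
% single value for the prior overall level of the output
\mathrm{E}[f(x) \mid \beta] = \beta_0

% linear term for each input, with x = (x_1, \dots, x_p)
\mathrm{E}[f(x) \mid \beta] = \beta_0 + \sum_{i=1}^{p} \beta_i x_i
```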
Input normal parameters: Window appears if you click OK having selected normal inputs
Input fixed and normal parameters: Window appears if you click OK having selected some fixed inputs, rest normal
For fixed inputs, tick the box and enter the fixed value in the first text box
Selecting prediction type: Predictions can be
Correlated realisations of outputs at the prediction inputs
Similar to main effect outputs
Marginal means and variances of outputs at the prediction inputs
Faster to compute, especially with many prediction points
Easy to interpret
Selecting cross validation type: Choice of none, leave-one-out or leave final 20% out
Leave-one-out
Hyper-parameters are estimated using all the data and are then fixed when prediction is carried out for each omitted point
Leave final 20% out
Hyper-parameters are estimated using the reduced data subset
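The two schemes differ in which rows are held back. The sketch below only builds the row-index sets; fitting the emulator is left out. The sub names are hypothetical, and rounding 20% down to a whole number of rows is an assumption.

```perl
use strict;
use warnings;

# Leave-one-out: one fold per training point. Per the slide, the
# hyper-parameters are fitted once on all data and then held fixed.
sub leave_one_out_folds {
    my ($n) = @_;
    my @folds;
    for my $held (0 .. $n - 1) {
        my @train = grep { $_ != $held } 0 .. $n - 1;
        push @folds, { train => \@train, test => [$held] };
    }
    return @folds;
}

# Leave final 20% out: the last fifth of the rows is held back and the
# hyper-parameters are estimated from the remaining rows only.
sub leave_final_fifth_out {
    my ($n) = @_;
    my $n_test = int($n / 5);          # assumed rounding rule (round down)
    my @train  = 0 .. $n - $n_test - 1;
    my @test   = $n - $n_test .. $n - 1;
    return { train => \@train, test => \@test };
}

my @loo   = leave_one_out_folds(5);
my $split = leave_final_fifth_out(10);
print scalar(@loo), " leave-one-out folds\n";
print "train: @{ $split->{train} }\n";
print "test:  @{ $split->{test} }\n";
```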