

Data Menu

Menus for the chosen domain are listed in the menu bar. The default domain is D(iscrete) for certain discrete distributions.

The data menu contains entries to generate, transform and convert data sets. The parametric options depend on the selected mode and domain. The following options are provided by the Data Menu:

Read Data
Generate Univariate Data
Generate Bivariate Data
Generate Multivariate Data
Generate Time Series
Generate Counting/Point Process
Transform Data
Convert to
Choose Data
List Data
Quit


Read Data

Load a data set from a file by means of the Read Data option. Several data sets are stored in the dat subdirectory. There are the following options in the dialog box:
Filename
All data sets with predefined suffix *.dat in your working directory are displayed in the dialog box. To load data sets which are stored in a different directory, one must choose a different path.
File format
In this list, select a predefined file format. Current formats are *.dat and *.*.
Directories and drives
Here, specify any valid drive or directory into which data can be stored.

Generate Univariate Data

The following options enable the generation of a univariate data set. Note that the distributions belong to different domains.

Discrete domain: Uniform, Binomial, Poisson, Negative Binomial
SUM domain: Gaussian, Gaussian-GCauchy, Student Distributions, Non-central Student, Sum-Stable Distributions
MAX domain: Gumbel (EV 0), Frechet (EV 1), Weibull (EV 2), EV
POT domain: Exponential (GP 0), Pareto (GP 1), Beta (GP 2), GP
Animation for Discrete Data
Visualizing Continuous Data

Discrete Uniform

Generate a data set of a selected sample size according to the uniform distribution on the integers from r to s.

Options:
r integer
s integer > r
Samplesize positive integer
Filename Select a filename, and, optionally, a directory.

The stored data set of type Xtremes Discrete Data is now the active one. Execute the Animation option to generate the data interactively.

Binomial

Generate a data set according to a binomial B(n,p) distribution.

Options:
n positive integer
p [ 0 , 1 ]
Samplesize positive integer
Filename Select a filename, and, optionally, a directory.

The stored data set is now the active one. Execute the Animation option to generate the data interactively.

Poisson

Generate a data set according to a Poisson P(lambda) distribution.

Options:
lambda positive real
Samplesize positive integer
Filename Select a filename, and, optionally, a directory.

The stored data set is now the active one. Execute the Animation option to generate the data interactively.

Negative Binomial

Generate a data set according to the negative binomial distribution, which is a mixed Poisson distribution with respect to a gamma distribution.

Options:
r parameter nonnegative real
p parameter ( 0 , 1 )
Samplesize positive integer
Filename Select a filename, and, optionally, a directory.

Note that r is the shape parameter of the mixing gamma distribution. Moreover, p = 1/( 1+sigma ), where sigma > 0 is the scale parameter of the mixing gamma distribution.
The stored data set is now the active one. Execute the Animation option to generate the data interactively.
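
The mixing relation stated above (gamma shape r, gamma scale sigma with p = 1/(1+sigma)) can be illustrated with a small sketch. It is not the Xtremes generator; the function names are hypothetical, and the sketch assumes r > 0 although the dialog accepts r = 0.

```python
import random

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate by counting unit-rate exponential
    interarrival times that fit into the interval [0, lam]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= lam:
        count += 1
        total += rng.expovariate(1.0)
    return count

def negative_binomial_sample(r, p, size, seed=None):
    """Sample the negative binomial law as a gamma-mixed Poisson law.

    Assumption: r > 0 is the gamma shape, and p = 1 / (1 + sigma)
    determines the gamma scale sigma = (1 - p) / p.
    """
    if r <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need r > 0 and 0 < p < 1")
    rng = random.Random(seed)
    sigma = (1.0 - p) / p
    # First draw the random intensity, then a Poisson count given it.
    return [poisson_draw(rng.gammavariate(r, sigma), rng) for _ in range(size)]
```

Under this parameterization the mean of the generated counts is r sigma = r (1 - p)/p.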

Gaussian

Generate a data set according to a Gaussian distribution.

Options:
mu location parameter real
sigma scale parameter positive real
Filename Select a filename, and, optionally, a directory.

Gaussian-GCauchy

Generate a data set according to a mixture of a Gaussian and a GCauchy distribution.

Options:
mu location parameter real
sigma scale parameter (GCauchy) positive real
d contamination parameter [ 0 , 1 ]
alpha shape parameter positive real
sigma 1 scale parameter (Gaussian) positive real
Filename Select a filename, and, optionally, a directory.

The contamination parameter d determines the weight of the GCauchy part of the distribution. The location parameter mu is used for the GCauchy distribution as well.

Student Distributions

Generate a data set according to a Student distribution.

Options:
sigma scale parameter positive real
alpha shape parameter positive real
Filename Select a filename, and, optionally, a directory.

Xtremes uses the parameterization as given in Statistical Analysis, page 94, with the Cauchy distribution for alpha = 1 and the Gaussian distributions as a limiting case when alpha goes to infinity.

Non-central Student

Generate a data set according to a non-central Student distribution.


Sum-Stable Distributions

Generate a data set according to a sum-stable distribution.

Options:
alpha shape parameter [ 0 , 2 ]
skewness skewness ( -1 , 1 )
mu location parameter real
sigma scale parameter positive real
Filename Select a filename, and, optionally, a directory.

Xtremes uses the continuous parameterization (see Statistical Analysis, Section 6.3). The procedure is taken from J.M. Chambers, C.L. Mallows and B.W. Stuck (1976): A Method for Simulating Stable Random Variables, Journal of the American Statistical Association 71, 340-344.

Gumbel (EV 0)

Generate a data set according to a Gumbel distribution.

Options:
mu location parameter real
sigma scale parameter positive real
Filename Select a filename, and, optionally, a directory.

Frechet (EV 1)

Generate a data set according to a Frechet (EV 1) distribution.

Options:
alpha shape positive real
mu location real
sigma scale positive real
Filename Select a filename, and, optionally, a directory.

Weibull (EV 2)

Generate a data set according to a Weibull (EV 2) distribution.

Options:
alpha shape negative real
mu location real
sigma scale positive real
Filename Select a filename, and, optionally, a directory.

EV

Generate a data set according to an EV distribution using the unified von Mises parameterization.

Options:
gamma shape parameter real
mu location parameter real
sigma scale parameter positive real
Filename Select a filename, and, optionally, a directory.
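
As an illustration of the unified parameterization, the following sketch samples by inverting the standard EV df G(x) = exp(-(1 + gamma x)^(-1/gamma)), with the Gumbel df exp(-exp(-x)) for gamma = 0. The function names are hypothetical, and inversion is only one possible method; how Xtremes generates the data is not stated here.

```python
import math
import random

def ev_qf(q, gamma, mu=0.0, sigma=1.0):
    """Quantile function of the EV distribution with shape gamma,
    location mu and scale sigma (von Mises parameterization)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    y = -math.log(q)
    if gamma == 0.0:
        z = -math.log(y)                      # Gumbel case (EV 0)
    else:
        z = (y ** (-gamma) - 1.0) / gamma     # Frechet/Weibull type
    return mu + sigma * z

def ev_sample(gamma, mu, sigma, size, seed=None):
    """Inverse transform sampling: apply the qf to uniform variates."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < size:
        u = rng.random()
        if u > 0.0:                           # random() may return exactly 0.0
            sample.append(ev_qf(u, gamma, mu, sigma))
    return sample
```

For gamma < 0 the generated values are bounded above by mu - sigma/gamma, and for gamma > 0 they are bounded below by the same expression.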

Exponential (GP 0)

Generate a data set according to an exponential distribution.

Options:
mu location parameter real
sigma scale parameter positive real
Filename Select a filename, and, optionally, a directory.

Pareto (GP 1)

Generate a data set according to a Pareto distribution.

Options:
alpha shape parameter positive real
mu location parameter real
sigma scale parameter positive real
Filename Select a filename, and, optionally, a directory.

Beta (GP 2)

Generate a data set according to a Beta (GP 2) distribution.

Options:
alpha shape parameter negative real
mu location parameter real
sigma scale parameter positive real
Filename Select a filename, and, optionally, a directory.

GP

Generate a data set according to a GP distribution using the unified von Mises parameterization.

Options:
gamma shape parameter real
mu location parameter real
sigma scale parameter positive real
Filename Select a filename, and, optionally, a directory.
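
A comparable sketch for the GP family inverts the standard GP df W(x) = 1 - (1 + gamma x)^(-1/gamma), with the exponential df for gamma = 0. The names gp_qf and gp_sample are hypothetical, and the code only illustrates the parameterization, not the routine used by Xtremes.

```python
import math
import random

def gp_qf(q, gamma, mu=0.0, sigma=1.0):
    """Quantile function of the GP distribution with shape gamma,
    location mu and scale sigma (von Mises parameterization)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    if gamma == 0.0:
        z = -math.log(1.0 - q)                        # exponential case (GP 0)
    else:
        z = ((1.0 - q) ** (-gamma) - 1.0) / gamma     # Pareto/beta type
    return mu + sigma * z

def gp_sample(gamma, mu, sigma, size, seed=None):
    """Inverse transform sampling: apply the qf to uniform variates."""
    rng = random.Random(seed)
    return [gp_qf(rng.random(), gamma, mu, sigma) for _ in range(size)]
```

With gamma = -1 the qf reduces to that of the uniform distribution on [mu, mu + sigma], which is a convenient sanity check.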

Animation for Discrete Data

Two windows open for plotting

Generate data by clicking on

The scatterplot and the sample histogram for the generated data are displayed. The data set is saved to a file and becomes the active one as soon as the specified Samplesize is attained.
In addition, a dialog box opens with +1 and +20 buttons by which one can also increase the number of generated data until the selected Samplesize is attained.

Visualizing Continuous Data

Data are generated interactively. The underlying df is displayed in a graphics window. Generate data by

The sample df for the given data is displayed. The data set is saved to a file and becomes active as soon as the size specified in the dialog box is attained.

Generate Bivariate Data

This submenu generates data from distributions in the following bivariate EV models:

Gumbel-McFadden
Marshall-Olkin
Huesler-Reiss

It is only available in the multivariate mode within the MAX domain.

Gumbel-McFadden

Generate a bivariate data set according to the Gumbel-McFadden distribution with univariate Weibull (EV 2) margins with shape parameter alpha = -1, i.e. exponential distributions on the negative half line.

Options:
mu1 location parameter real
sigma1 scale parameter positive real
mu2 location parameter real
sigma2 scale parameter positive real
lambda dependence parameter larger/equal to 1
Sample Size positive integer
Filename Select a filename, and, optionally, a directory.

The stored data set is now the active one. Note that the generated data set is of type Xtremes Multivariate Data.

Marshall-Olkin

Generate a bivariate data set according to the Marshall-Olkin distribution with univariate Weibull (EV 2) margins with shape parameter alpha = -1, i.e. exponential distributions on the negative half line.

Options:
mu1 location parameter real
sigma1 scale parameter positive real
mu2 location parameter real
sigma2 scale parameter positive real
lambda dependence parameter [ 0 , 1 ]
Sample Size positive integer
Filename Select a filename, and, optionally, a directory.

The stored data set is now the active one. Note that the generated data set is of type Xtremes Multivariate Data.

Huesler-Reiss

Generate a bivariate data set according to a Huesler-Reiss distribution with univariate Gumbel margins with location and scale parameters mu1 and sigma1, or mu2 and sigma2, respectively.

Options:
mu1 location parameter real
sigma1 scale parameter positive real
mu2 location parameter real
sigma2 scale parameter positive real
lambda correlation coefficient [ -1 , 1 ]
Sample Size positive integer
Filename Select a filename, and, optionally, a directory.

The stored data set is now the active one. Note that the generated data set is of type Xtremes Multivariate Data.

Generate Multivariate Data

This option is only available in the multivariate mode within the SUM domain. One can generate bi- and trivariate Gaussian samples.

Bivariate Gaussian

Generate a bivariate Gaussian sample. The first component is distributed according to a Gaussian distribution with location parameter mu1 and scale parameter sigma1, the second one with mu2 and sigma2. The correlation coefficient rho determines the degree of dependence between both components.

Options:
mu1 location parameter real
sigma1 scale parameter positive real
mu2 location parameter real
sigma2 scale parameter positive real
rho correlation coefficient [ -1 , 1 ]
Sample Size positive integer
Filename Select a filename, and, optionally, a directory.

The stored data set is now the active one. Note that the generated data set is of type Xtremes Multivariate Data.
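
The role of the five parameters can be sketched with the standard construction of a correlated Gaussian pair from two independent standard Gaussian variates. The function name is hypothetical and the code is an illustration, not the Xtremes implementation.

```python
import math
import random

def bivariate_gaussian(mu1, sigma1, mu2, sigma2, rho, size, seed=None):
    """Generate pairs (x1, x2) with Gaussian margins N(mu1, sigma1^2) and
    N(mu2, sigma2^2) and correlation coefficient rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu1 + sigma1 * z1
        # Mix z1 into the second component to induce the correlation rho.
        x2 = mu2 + sigma2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        pairs.append((x1, x2))
    return pairs
```

For rho = 0 the two components are independent; for rho = 1 or rho = -1 the second component is an affine function of the first.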

Trivariate Gaussian

Generate a trivariate Gaussian-distributed data set.

Options:
Covariances positive real
Location location parameters real
Sample Size positive integer
Filename Select a filename, and, optionally, a directory.

Let (X(1), X(2), X(3)) be distributed according to a trivariate Gaussian distribution. In the dialog box, enter Cov(X(1),X(1))=Var(X(1)) (first column of option Covariances), Cov(X(2),X(1)),Cov(X(2),X(2))=Var(X(2)) (second column) and Cov(X(3),X(1)),Cov(X(3),X(2)),Cov(X(3),X(3))=Var(X(3)) (third column). Under Location, enter the location parameters for each component.
The stored data set is now the active one.
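
The lower-triangular covariance input described above determines the full covariance matrix by symmetry. The following sketch, with hypothetical function names and an assumed input layout [[c11], [c21, c22], [c31, c32, c33]], shows one common way to sample from it, namely a Cholesky factorization. It is not claimed that Xtremes proceeds this way.

```python
import math
import random

def cholesky_lower(cov):
    """Return the lower-triangular factor L with L L^T = cov."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix must be positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def trivariate_gaussian(lower_cov, location, size, seed=None):
    """Sample (X(1), X(2), X(3)) from the lower triangle of the covariances."""
    # Fill in the upper triangle by symmetry.
    cov = [[lower_cov[max(i, j)][min(i, j)] for j in range(3)] for i in range(3)]
    low = cholesky_lower(cov)
    rng = random.Random(seed)
    sample = []
    for _ in range(size):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        sample.append(tuple(
            location[i] + sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(3)
        ))
    return sample
```

The factorization fails with an error if the entered covariances do not form a positive definite matrix, which is a useful check on the input.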

Bivariate Student

Trivariate Student

Generate Time Series

The following options are provided to generate time series data.

Gaussian AR(1)

Generate a data set according to a Gaussian AR(1) process.

Options:
mu location parameter real
sigma scale parameter positive real
d correlation coefficient [ 0 , 1 ]
Sample Size positive integer
Filename Select a filename, and, optionally, a directory.

The stored data set is now the active one.
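
A Gaussian AR(1) series can be sketched as follows. The recursion and the scaling are assumptions, since the exact model is not spelled out here: the sketch uses X[t] - mu = d (X[t-1] - mu) + e[t] and chooses the innovation standard deviation so that every X[t] is Gaussian with location mu and scale sigma. The function name is hypothetical.

```python
import math
import random

def gaussian_ar1(mu, sigma, d, size, seed=None):
    """Stationary Gaussian AR(1) series with lag-1 correlation d."""
    if size < 1 or sigma <= 0.0 or not 0.0 <= d <= 1.0:
        raise ValueError("need size >= 1, sigma > 0 and d in [0, 1]")
    rng = random.Random(seed)
    # Assumption: sigma is the marginal scale, so the innovations are
    # scaled by sqrt(1 - d^2) to keep the variance constant over time.
    innovation_sd = sigma * math.sqrt(1.0 - d * d)
    x = mu + sigma * rng.gauss(0.0, 1.0)      # start in the stationary law
    series = [x]
    for _ in range(size - 1):
        x = mu + d * (x - mu) + innovation_sd * rng.gauss(0.0, 1.0)
        series.append(x)
    return series
```

With d = 0 the series is an iid Gaussian sample, and with d = 1 it is constant.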

Moving Average MA(q)

Generate a data set according to a Moving Average MA(q) process.

Options:
Coefficients of Moving Average
Enter the coefficients of the MA polynomial, separated by blanks.
Qf of initial Random Variables
Enter qf of the white noise random variables. Use the predefined function calls provided by the UserFormula facility. For example, enter paretoqf( 2,x ) to generate standard Pareto data under the shape parameter alpha = 2 .
Additional options:
Sample Size positive integer
Filename Select a filename, and, optionally, a directory.
The stored data set is now the active one.
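
The two inputs, coefficients and a qf for the white noise, combine as in the sketch below. It assumes that the entered coefficients c[0], ..., c[q] are used as given in X[t] = c[0] Z[t] + ... + c[q] Z[t-q], and that the noise is obtained by applying the qf to uniform variates. pareto_qf is a hypothetical stand-in for the UserFormula call paretoqf and uses the standard Pareto qf (1 - u)^(-1/alpha).

```python
import random

def pareto_qf(alpha, u):
    """Standard Pareto qf (hypothetical stand-in for paretoqf)."""
    return (1.0 - u) ** (-1.0 / alpha)

def moving_average(coeffs, qf, size, seed=None):
    """MA(q) series with q = len(coeffs) - 1 and white noise Z = qf(U)."""
    rng = random.Random(seed)
    q = len(coeffs) - 1
    # q extra noise values are needed to start the first moving window.
    noise = [qf(rng.random()) for _ in range(size + q)]
    return [
        sum(c * noise[t + q - j] for j, c in enumerate(coeffs))
        for t in range(size)
    ]
```

For example, moving_average([1.0, 0.5], lambda u: pareto_qf(2, u), 100) yields an MA(1) series driven by standard Pareto noise with shape parameter alpha = 2.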

ARMA(p, q) Process

Generate a data set according to a Gaussian ARMA(p,q) process. The simulation makes use of the innovation algorithm.

Options:
AR polynomial
Enter coefficients of the AR polynomial, separated by blanks (the leading coefficient phi (0) = 1 must be omitted).
MA polynomial
Enter coefficients of the MA polynomial, separated by blanks (the leading coefficient theta (0) = 1 must be omitted).
If the AR polynomial is unequal to zero on the unit circle, then there is a causal ARMA process (which has a representation as a moving average).
Additional options:
Sample Size positive integer
Filename Select a filename, and, optionally, a directory.
The stored data set is now the active one.

Generate Counting/Point Process

Let 0 <= T[1] <= T[2] <= T[3] <= ... denote the arrival times of data X[1], X[2], X[3], ... . Up to a time horizon T such arrival times are generated and stored to a file as Xtremes Univariate Data.
In addition, the path of the pertaining counting process N(t), t >= 0, representing the number of data occurring up to time t, can be plotted by using the Visualizing button. Alternatively, adopt the option Path in the Visualize menu.
In the bivariate case the marks X[1], X[2], X[3], ... are added resulting in points (T[1],X[1]), (T[2],X[2]), (T[3],X[3]), ....
For further details, see Statistical Analysis, pages 197 - 200.

Poisson Process

Recall the general remarks about arrival times in Generate Counting/Point Process. We start with the simplest case, the homogeneous Poisson process on the positive half-line.
The first arrival process is the homogeneous Poisson process with intensity lambda. The interarrival times Y[i] = T[i] - T[i-1] (with T[0] = 0) are iid exponential random variables with expectation 1/lambda. The numbers N(t) are Poisson distributed with parameter lambda t.

Parameters are:

lambda intensity positive real
T time horizon nonnegative real
Filename filename

Recall that the stored arrival times are of type Xtremes Univariate Data. This is now the active data set.
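
The construction via exponential interarrival times translates directly into a short sketch. The function names are hypothetical; the code accumulates interarrival times with expectation 1/lambda until the time horizon T is passed and evaluates the counting process N(t) on the result.

```python
import bisect
import random

def poisson_arrivals(lam, horizon, seed=None):
    """Arrival times of a homogeneous Poisson process on [0, horizon]."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(lam)          # first interarrival time, mean 1/lam
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(lam)
    return times

def counting_path(times, t):
    """N(t): number of arrival times up to and including time t."""
    return bisect.bisect_right(times, t)
```

The number of generated arrival times, counting_path(times, horizon), is then Poisson distributed with parameter lambda T.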

Polya-Lundberg Process

Recall the general remarks about arrival times in Generate Counting/Point Process. The second arrival process is the Polya-Lundberg process, which is a mixed Poisson process: a parameter lambda is drawn according to a gamma density with shape parameter alpha and scale parameter sigma, and the arrival times are then drawn according to the Poisson process with intensity lambda.

alpha shape positive real
sigma scale positive real
Filename filename

Recall that the stored arrival times are of type Xtremes Univariate Data. This is now the active data set.
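
The two-stage mechanism can be sketched as follows: one random intensity is drawn from the gamma distribution, and a homogeneous Poisson process with that intensity is then simulated. The function name is hypothetical, and the time horizon argument is an assumption taken from the general remarks, since it is not listed among the options above.

```python
import random

def polya_lundberg_arrivals(alpha, sigma, horizon, seed=None):
    """Return (lam, times): the drawn intensity and the arrival times
    of the resulting Poisson process on [0, horizon]."""
    rng = random.Random(seed)
    # Stage 1: random intensity with gamma shape alpha and scale sigma.
    lam = rng.gammavariate(alpha, sigma)
    # Stage 2: homogeneous Poisson process given that intensity.
    times = []
    t = rng.expovariate(lam)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(lam)
    return lam, times
```

Note that the intensity is drawn once per path, so different paths can have markedly different numbers of arrivals.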

Marked Poisson Process

Generate a time series with arrival times according to a homogeneous Poisson process with intensity lambda (also see Poisson Process) and marks according to a generalized Pareto (GP) distribution.

Transform Data

Data sets can be transformed by means of several predefined operations. The resulting data set is of the same data type as the original one. Choose the option Convert to to convert a data set to a different type.

Change Sign

The signs of the data of the active univariate or multivariate data set are changed. Enter a filename to store the transformed values in a separate file.

This option may also be applied to a time series. In this case, the second component is transformed; the first one remains unchanged.

Affine Transformation

An affine transformation is applied to the active univariate or multivariate data set, i.e. the transformation
f(x) = mu + sigma x
is done for each point of the data set. The transformed values are written to a file.

This option may also be applied to a time series. In this case, the second component is transformed; the first one remains unchanged.

Save Exceedances

This option is applicable for Xtremes Univariate Data, Xtremes Time Series and Xtremes Multivariate Data. The values exceeding the specified threshold are written to a new data set. In the multivariate case, recall that a multivariate sample has a matrix form. One must select one component (column) of the active data set. The lines containing exceedances over the specified threshold in the selected column are written to a new multivariate data set.

Save Blocks Maxima

Given a univariate data set x[1], ..., x[n] of size n, Xtremes builds blocks x[1], ..., x[k]; x[k+1], ..., x[2k]; ... ; x[lk], ..., x[n] with k denoting the Block size and l = [n/k] the number of blocks. The maximum of each block is saved to a file. Enter block size and filename in the pertaining edit fields. The transformed data set is stored under the selected name.
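
A minimal sketch of the block construction is given below. The function name is hypothetical. Since the description leaves open how an incomplete final block is treated, the sketch keeps only the l = [n/k] complete blocks by default and offers the remainder as an option.

```python
def block_maxima(data, k, keep_remainder=False):
    """Maxima of consecutive, non-overlapping blocks of size k."""
    if k < 1:
        raise ValueError("block size must be a positive integer")
    n_full = len(data) // k
    maxima = [max(data[i * k:(i + 1) * k]) for i in range(n_full)]
    # Assumption: a shorter trailing block is only used on request.
    if keep_remainder and len(data) % k:
        maxima.append(max(data[n_full * k:]))
    return maxima
```

For the data 3, 1, 4, 1, 5, 9, 2, 6 and block size 3 this gives the maxima 4 and 9, plus 6 if the remainder is kept. Replacing max by sum gives the analogous block sums.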

Save Moving Maxima

Given a univariate data set x[1], ..., x[n], Xtremes builds moving blocks x[1], ..., x[k]; x[2], ..., x[k+1], .... If the sample size n is exceeded, blocks will be filled with values x[1], x[2], ... again. The maxima of all moving blocks are calculated and written to a new data set. Enter block size and filename in the pertaining edit fields. The transformed data set is stored under the selected name.
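
The wrap-around rule can be sketched with modular indexing. The function name is hypothetical, and the sketch assumes that one block is started at each of the n observations, so that the result again has n values.

```python
def moving_maxima(data, k):
    """Maxima of the moving blocks x[i], ..., x[i+k-1], where indices
    beyond the end of the sample wrap around to the beginning."""
    n = len(data)
    return [max(data[(i + j) % n] for j in range(k)) for i in range(n)]
```

For the data 3, 1, 4, 1, 5 and block size 2, the blocks are (3, 1), (1, 4), (4, 1), (1, 5) and the wrapped block (5, 3).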

Save Blocks Sums

Given a univariate data set x[1], ..., x[n], Xtremes builds blocks x[1], ..., x[k], x[k+1], ..., x[2k], ..., x[lk], ..., x[n] with k denoting the Block size and l the number of blocks. The sum of each block is saved to a file. Enter block size and filename in the pertaining edit fields. The transformed data set is stored under the selected name.

Save Moving Sums

Given a univariate data set x[1], ..., x[n], Xtremes builds moving blocks x[1], ..., x[k]; x[2], ..., x[k+1], .... If the sample size n is exceeded, blocks will be filled with values x[1], x[2], ... again. The sums of all moving blocks are calculated and written to a new data set. Enter block size and filename in the pertaining edit fields. The transformed data set is stored under the selected name.

Save Cluster Maxima

Given a data set of type Xtremes Time Series x[1], ..., x[n], consider the exceedances x[i(1)], ..., x[i(k)] over a predetermined threshold u. The values i(j) are addressed as exceedance times. Clusters of exceedance times are built in the following way: Fix some positive integer r. Any run of at least r consecutive observations x[i] below the threshold u separates two clusters. In other words: between two consecutive clusters of exceedance times there is a minimal gap of length r. Xtremes calculates the maxima of all clusters and writes them to a new data set.

Dialog options:
Run length
Enter the minimum gap between two consecutive clusters.
Threshold
Enter threshold for data set.
Filename
The stored data set is now the active one.
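
The run-length rule can be sketched as a single pass over the values of the time series. The function name is hypothetical, and the sketch assumes that an exceedance means a value strictly larger than the threshold.

```python
def cluster_maxima(values, threshold, run_length):
    """Maxima of exceedance clusters, where a run of at least run_length
    consecutive values at or below the threshold closes a cluster."""
    maxima = []
    current = None        # maximum of the currently open cluster, if any
    gap = 0               # consecutive non-exceedances since the last exceedance
    for x in values:
        if x > threshold:
            current = x if current is None else max(current, x)
            gap = 0
        elif current is not None:
            gap += 1
            if gap >= run_length:
                maxima.append(current)
                current = None
                gap = 0
    if current is not None:       # cluster still open at the end of the series
        maxima.append(current)
    return maxima
```

A larger run length merges neighboring exceedances into fewer clusters, so fewer maxima are returned.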

Order Data

The active univariate data set is sorted. The sorted values are written to a file. In the case of Xtremes Multivariate Data, one must first select a component of the active data set. The values in the selected column are sorted in ascending order while each line is kept intact; that is, if the position of a specific value changes, the whole line containing it is moved (recall that multivariate samples have a matrix structure).

Cumulate Data

The active data set is cumulated, i.e. the k-th value of the cumulated data set contains the sum of the first k values of the original one.

Symmetrize

The active data set is symmetrized around zero or the median, i.e. given a sample x[1], ..., x[n], Xtremes generates a data set x[1], -x[1]+m, ..., x[n], -x[n]+m with either m = 0 or m = F^(-1)(1/2) (with F denoting the underlying df).

Select Columns

Generate a new multivariate data set by selecting single columns of the active one. The components of the active data set are displayed on the left-hand side of the dialog box, those of the new one on the right-hand side. They can be moved from one side to the other using the arrow buttons. The new multivariate data set is composed of the selected columns. This option can also be used to rearrange the components of the active data set.

Dialog option:
Filename
The stored data set is now the active one.

Date Transformation

This operation requires the date (format: day-month-year) in the first three columns of the data set. If necessary, apply the Select Columns option. The transformation enumerates the days in ascending order; in other words, for each day, Xtremes calculates the difference (in days) to day zero. Missing days leave gaps in the enumeration: if, for instance, Monday 24th is addressed as 0, then Tuesday 25th will be 1 and Thursday 27th will be 3 when Wednesday is missing. The transformed data set is of type Xtremes Multivariate Data and can be converted to an Xtremes Time Series.
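
The day enumeration can be sketched with calendar arithmetic. The function name is hypothetical, and the sketch assumes that the earliest date in the data set serves as day zero and that each line has the form (day, month, year, further values).

```python
from datetime import date

def date_transform(rows):
    """Replace the leading (day, month, year) columns of each line by the
    number of days elapsed since the earliest date in the data set."""
    dates = [date(int(y), int(m), int(d)) for d, m, y, *_ in rows]
    day_zero = min(dates)             # assumption: earliest date is day 0
    return [((dt - day_zero).days, *row[3:]) for dt, row in zip(dates, rows)]
```

For lines dated 24, 25 and 27 May 2004 (a Monday, Tuesday and Thursday), the first column becomes 0, 1 and 3, matching the example with the missing Wednesday.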

Relative Frequencies

Given data of type Xtremes Discrete Data, the frequencies in the second component are replaced by the pertaining relative frequencies. The data are of type Xtremes Multivariate Data.

Fill missing (only multivariate mode)

Missing values of a given multivariate data set are imputed according to a procedure explained in Statistical Analysis on page 220: A value is randomly selected from the k nearest neighbors.

Convert to

Data sets can be converted to other types by means of several predefined operations. Choose the option Transform Data to perform transformations within a data type.

Convert to Grouped Data

The active univariate data set is converted to a grouped data set. Specify a partition and a filename for the generated data set.

The edit fields from, to and step width may be utilized to create a partition.

This option may be applied to Xtremes Multivariate Data and Xtremes Time Series as well. In the multivariate case, one must first select two components of the original data set. A new data set is generated, with the first component containing the "cells", the second one the "data". The new data set is of type Xtremes Grouped Data, while the data themselves have not been changed. In case of Xtremes Time Series, the sample remains unchanged, while only the data type is converted to Xtremes Grouped Data.

Additional dialog options:
Filename
Enter filename and, optionally, directory. The stored data set is now the active one.

Convert to Univariate Data

The active data set is converted to a univariate one. The option may be applied to the following data types:

Censored The censoring information is removed from the data set.
Grouped The data are distributed equally within the interval defined by the partition.
Multivariate The user is prompted for a component of the data set.
Discrete A data set with multiple points is generated.
Time Series The first component containing the time is removed.

Additional dialog options:
Filename
Enter filename and, optionally, directory. The stored data set is now the active one.

Convert to Discrete Data

Convert to Time Series

The active data set is converted to an Xtremes Time Series and written to a text file. Time series data are given by pairs ( i, x[i] ), i = 1, ..., n, of discrete times i and reals x[i]. This option can be applied to univariate and multivariate data sets.

Dialog options:
Filename
Enter filename and, optionally, directory. The stored data set is now the active one.
If the active data set is of type Xtremes Multivariate Data, the dialog box provides another option:
Please select two components of the multivariate data set
Select one component in the left-hand list, this will be considered as the time index. The other component, chosen from the right-hand list, represents the values of the time series.

Convert to Censored Data

The primary interest concerns data x[1], ..., x[n], but only the pairs (z[1], d[1]), ..., (z[n], d[n]) are observed. We have z[i] = min (x[i], y[i]), where y[i] are the censoring values. Moreover, d[i] indicates whether censoring has taken place or not. We have

d[i] = 1, if x[i] is not censored, and
d[i] = 0, if x[i] is censored by y[i].

Let x[1], ..., x[n] be the active data of type Xtremes Univariate Data. Fix a censoring distribution in the dialog box. Xtremes will generate the censoring values y[1], ..., y[n] under this distribution and compute d[i] and z[i]. The data set (z[1], d[1]), ..., (z[n], d[n]) is of type Xtremes Censored Data.
Selection of Censoring Quantile Function
Enter the qf of the censoring distribution. Use the predefined function calls provided by the UserFormula facility. For example, enter paretoqf(2,x) to generate standard Pareto data y[i], under the shape parameter alpha = 2 , as censoring values.
Filename
Enter a filename and, optionally, a directory. The stored data set is now the active one.
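
The computation of the pairs (z[i], d[i]) can be sketched as follows. The function name is hypothetical; the sketch assumes that the censoring values are obtained by applying the entered qf to uniform variates and that a tie x[i] = y[i] counts as not censored.

```python
import random

def censor(data, censoring_qf, seed=None):
    """Return the pairs (z, d) with z = min(x, y), where d = 1 marks an
    uncensored value and d = 0 a value censored by y."""
    rng = random.Random(seed)
    pairs = []
    for x in data:
        y = censoring_qf(rng.random())    # censoring value y = qf(U)
        if x <= y:
            pairs.append((x, 1))          # x itself is observed
        else:
            pairs.append((y, 0))          # only the censoring value is observed
    return pairs
```

For instance, censor(data, lambda u: (1.0 - u) ** (-1.0 / 2)) censors by standard Pareto values with shape parameter alpha = 2, written here with an explicit formula in place of paretoqf(2,x).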

Convert to Multivariate Data

An arbitrary number of data sets can be combined into a multivariate data set. Mark those you want to combine in the list box showing all loaded data sets.

Zeros will be added at the end of a column if data sets of different sizes are combined. Grouped and discrete data sets are not processed any further; Xtremes treats them like bivariate data sets when this option is applied.

Dialog options:
Filename
Enter filename and, optionally, directory. The stored data set is now the active one.
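
The zero padding of shorter columns can be sketched in a few lines. The function name is hypothetical; the sketch treats each data set as one column and returns the lines of the combined matrix.

```python
from itertools import zip_longest

def combine_columns(*data_sets):
    """Combine univariate data sets column-wise into lines of a matrix,
    appending zeros to the end of every column shorter than the longest."""
    return [list(line) for line in zip_longest(*data_sets, fillvalue=0.0)]
```

Because the added zeros are indistinguishable from observed zeros, it can be useful to note the original sample sizes before combining.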

Choose Data

The Choose Data dialog allows the user to choose a new active data set from those already loaded or generated.
One can also delete data sets from the memory. Such a deletion will not affect the associated files on your disk. All curves and estimators based on the data set will be deleted automatically.
This option is also available by means of a right-click within the Xtremes main window.

List Data

The active data set is displayed in a text window.

Quit

The Quit option terminates Xtremes.

© 2005
Xtremes Group · updated Jun 21, 2005