International Conference on Statistical Distributions and Applications
ICOSDA 2019

 Oct. 10-12, 2019, at Eberhard Conference Center, Grand Rapids, MI, USA


 

Titles and abstracts for Keynote and Plenary speakers are on the ‘Keynotes & Plenary Speakers’ Page.

 

Abstracts – Topic-Invited Speakers (Alphabetically Ordered)

 

TI_1_0

Abdelrazeq, Ibrahim

Rhodes College 

Title

Goodness-of-Fit Tests

In general, goodness-of-fit tests are used to test whether sampled data fit a claimed distribution, a particular model, or even a stochastic process. This area has become very vast, and many approaches are now used to construct an appropriate goodness-of-fit test: parametric, non-parametric, classical, or even Bayesian. The talks in this session explore goodness-of-fit tests that exemplify many of these different approaches.
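
As a minimal illustration of the session's theme (a sketch, not drawn from any particular talk): the classical Kolmogorov-Smirnov test of whether a sample fits a claimed standard normal distribution, using SciPy. The sample here is simulated for demonstration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# Null hypothesis: the data come from a standard normal distribution.
statistic, p_value = stats.kstest(sample, "norm")
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3f}")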

TI_1_4

Abdelrazeq, Ibrahim

Rhodes College 

Title

The Spread Dynamics of S&P 500 vs Lévy-Driven OU Processes

When an Ornstein-Uhlenbeck process is assumed and observed at discrete times 0, h, 2h, ..., [T/h]h, the unobserved driving process can be approximated from the observed process. The approximated increments of the driving process are used to test the assumption that the process is Lévy-driven. The asymptotic behavior of the test statistic at high sampling frequencies is developed assuming that the model parameters are known. The behavior of the test statistic using an estimated parameter is also studied. The performance of the test is illustrated through simulation.
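
For intuition, here is a hedged sketch of the increment-recovery step for an OU model dX_t = -lam*X_t dt + dL_t observed on the grid 0, h, 2h, ...: the driver is taken to be Brownian motion, lam is assumed known, and the talk's actual test statistic is not reproduced.

import numpy as np

lam, h, n = 1.0, 0.01, 10_000
rng = np.random.default_rng(1)

# Exact simulation of a Brownian-driven (Gaussian) OU process on the grid.
x = np.zeros(n + 1)
sd = np.sqrt((1 - np.exp(-2 * lam * h)) / (2 * lam))
for i in range(n):
    x[i + 1] = np.exp(-lam * h) * x[i] + sd * rng.normal()

# Euler-type approximation of the unobserved driving increments.
dL_hat = x[1:] - x[:-1] + lam * h * x[:-1]
print(dL_hat.std(), np.sqrt(h))  # the two should be close for a Brownian driver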

TI_3_4

Abdurasul, Emad

James Madison University

Title

The Product Limit Survival Function Distribution with Small-Sample Inference

Our contribution is to derive the exact distribution of the product limit estimator and to develop a mid-p population tolerance interval for it. We then develop a saddlepoint-based method for the population survival function from the product limit (PL) survival function estimator, under the proportional hazards model, to generate small-sample confidence bands. The saddlepoint technique depends upon the Mellin transform of the zero-truncated product limit estimator. This transform is inverted through saddlepoint approximations to yield highly accurate approximations to the cumulative distribution function of the corresponding cumulative hazard function estimator. We then compare our saddlepoint confidence interval with the interval obtained from the exact distribution and with that obtained from the large-sample method. Our simulation study shows that the saddlepoint confidence interval is very close to the confidence interval derived from the exact distribution, while being much less difficult to compute, and that it outperforms the competing large-sample methods in terms of coverage probability.

TI_48_4

Aburweis, Mohamed

University of Central Florida

Title

Comparative study of the distribution of repetitive DNA in model organisms

Repetitive DNA elements are abundant in the genomes of a wide range of organisms. In mammals, repetitive elements comprise about 40-50% of the total genome. However, their biological functions remain largely unknown. Analysis of their abundance and distribution may shed some light on how they affect genome structure, function, and evolution. We conducted a detailed comparative analysis of repetitive DNA elements across ten different eukaryotic organisms, including chicken (G. gallus), zebrafish (D. rerio), Fugu (T. rubripes), fruit fly (D. melanogaster), and nematode worm (C. elegans), along with five mammalian organisms: human (H. sapiens), mouse (M. musculus), cow (B. taurus), rat (R. norvegicus), and rhesus (M. mulatta). Our results show that repetitive DNA content varies widely, from 7.3% in the Fugu genome to 52% in the zebrafish, based on RepeatMasker data. The most frequently observed transposable elements (TEs) in mammals are SINEs (Short Interspersed Nuclear Elements), followed by LINEs (Long Interspersed Nuclear Elements). In contrast, LINEs, DNA transposons, simple repeats, and low complexity repeats are the most frequently observed repeat classes in the chicken, zebrafish, fruit fly, and nematode worm genomes, respectively. LTRs (Long Terminal Repeats) have significant genomic coverage and diversity, which may make them suitable for regulatory roles. With the exception of the nematode worm and fruit fly, the frequency of the repetitive elements follows a log-normal distribution, characterized by a few highly prevalent repeats in each organism. In mammals, SINEs are enriched near genic regions, and LINEs are often found away from genes. We also identified many LTRs that are specifically enriched in promoter regions, some with a strong bias towards the same strand as the nearby gene. This raises the possibility that the LTRs may play a regulatory role. Surprisingly, most intronic repeats, with the exception of DNA transposons, have a strong tendency to be on the opposite DNA strand from the host gene. One possible explanation is that intronic RNAs that result from splicing may contribute to retrotransposition back to the original intronic loci.

TI_2_3

Ahmad, Morad

University of Jordan 

Title

On the class of Transmuted-G Distributions 

In this talk, we compare the reliability and the hazard function between a baseline distribution and the corresponding transmuted-G distribution. Some examples based on existing transmuted-G distributions in the literature are used. Three tests of parameter significance are utilized to test the importance of a transmuted-G distribution over the baseline distribution, and real data are used in an application of the inference about the importance of transmuted-G distributions.

TI_47_0

Akinsete, Alfred

Marshall University, Huntington, WV 

Title

A new class of generalized distributions 

This session presents a new class of generalized statistical distributions, which may provide robustness and versatility for scientists and practitioners dealing with real-life data. Each paper presents detailed mathematical and statistical properties of a distribution, parameter estimation, and applications to various types of datasets.

TI_2_0

Al-Aqtash, Raid

Marshall University 

Title

Generalized Distributions and Applications

The first speaker, Dr. Elkadry, presents his work relating to Bayesian statistics with application to real-life data. The other speakers, Drs. Aljarrah, Ahmed, and Al-Aqtash, present their work on recently developed generalized statistical distributions with application to real data.

TI_2_4

Al-Aqtash, Raid

Marshall University 

Title

On the Gumbel-Burr XII Distribution: Regression and Application

Additional properties of the Gumbel-Burr XII distribution GBXII(L) are studied. We consider useful characterizations for the GBXII(L) distribution in addition to some structural properties, including mean deviations and the distribution of the order statistics. A simulation study is conducted to assess the performance of the MLEs, and the usefulness of the GBXII(L) distribution is illustrated by means of real data. A log-GBXII(L) regression model is proposed, and a survival data set is used in an application of the proposed regression model.

TI_5_3

Aldeni, Mahmoud

Western Carolina University

Title

TX Family and Survival Models

We introduce a generalized family of lifetime distributions, namely, the uniform-R{generalized lambda} (U-R{GL}) family, and derive the corresponding survival models. Two members of this family are derived, namely, the U-Weibull{GL} (U-W{GL}), a generalized Weibull distribution, and the U-loglogistic{GL} (U-LL{GL}), a generalized loglogistic distribution. The hazard function of the U-R{GL} family can be monotonic, bathtub, upside-down bathtub, N-shaped, or bimodal. The U-W{GL} distribution is applied to fit two lifetime data sets. The survival model based on the U-W{GL} distribution is applied to fit a right-censored lifetime data set.

TI_2_2

Aljarrah, Mohammad A.

Tafila Technical University, Tafila, Jordan 

Title

A new generalized normal regression model

We develop a regression model using the new generalized normal distribution. Assuming censored data, maximum likelihood estimates for the model parameters are obtained. The implementation of this model is demonstrated through applications to censored survival data. A diagnostic analysis and a model check were performed based on martingale-type residuals.

TI_1_1

Al-Labadi, Luai

University of Toronto, Mississauga

Title

A Bayesian Nonparametric Test for Assessing Multivariate Normality 

A novel Bayesian nonparametric test for assessing multivariate normal models is presented. The use of the procedure has been illustrated through several examples, in which the proposed approach shows excellent performance. 

TI_16_1

Al-Mofleh, Hazem

Tafila Technical University, Tafila, Jordan

Title

Wrapped Circular Statistical Distributions and Applications

Measurement of directions is common in science and in real-life data observations. Therefore, a circular distribution with a random angle is used to describe these phenomena. There are many techniques for obtaining a circular distribution from an underlying density function; one of the most effective is called “wrapping”.

TI_31_3

Almohalwas, Akram

UCLA

Title

Analysis of Donald Trump's Twitter Data Using Text Mining and Social Network Analysis 

As the U.S. grows more accustomed to social media, it has been incorporated into many aspects of American life; thus, it has become one of the most efficient “weapons” for politicians campaigning and communicating with the public. One of the most famous examples is Donald Trump on Twitter. Twitter is one of the best-known social media tools, and it generates a huge amount of data that needs to be sifted through to gain insights into the owner of a Twitter account.

TI_5_1

Almomani, Ayman

Almomany Trade

Title

TX: The Extended Family

Consider two CDFs T and F with supports [0,1] and S, respectively; then G(x) = T(F(x)) is a CDF whose support is S and whose parameters include both those of T and F. The distribution T is called a complementary distribution, and its choice is crucial in determining the distributional properties and moments of the newly generated G. We investigate the connection between complementary distributions and the TX family and present different ways of extending the TX family through different choices of the function T. We make recommendations on how to select appropriate T-transformations.
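
As a concrete instance of the composition (a sketch, with illustrative parameter values): taking T to be a beta CDF (support [0,1]) and F the standard normal CDF yields the classical beta-normal distribution.

import numpy as np
from scipy import stats

def G_cdf(x, a, b):
    """CDF of the generated distribution: T(F(x)) with T = Beta(a, b)."""
    return stats.beta.cdf(stats.norm.cdf(x), a, b)

def G_pdf(x, a, b):
    """Density by the chain rule: t(F(x)) * f(x)."""
    return stats.beta.pdf(stats.norm.cdf(x), a, b) * stats.norm.pdf(x)

x = np.linspace(-4.0, 4.0, 9)
print(G_cdf(x, a=0.5, b=0.5))  # G inherits the parameters of both T and F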

TI_14_3

Alzaatreh, Ayman

American University of Sharjah

Title

Truncated T-X family of distributions

The time and cost to start a business are highly related to the degree of transparency of business information, which strongly impacts the loss due to illicit financial flows. In order to study the distributional characteristics of time and cost to start a business, we introduce right-truncated and left-truncated T-X families of distributions. These families are used to construct new generalized families of continuous distributions. Relationships between the families are investigated. Real data sets including time and cost to start a business are analyzed and the results show that the truncated families perform very well for fitting highly skewed data.

TI_3_0

Alzaghal, Ahmad

State University of New York at Farmingdale

Title

Distributions and Applications

 

 

TI_37_2

Alzaghal, Ahmad

State University of New York at Farmingdale

Title

A Generalized Family of Lindley Distribution: Properties and Applications

In this talk, we introduce new families of generalized Lindley distributions, using the T-R{Y} framework, named the T-Lindley family of distributions. The new families are generated using the quantile functions of the uniform, exponential, Weibull, logistic, log-logistic, and Cauchy distributions. Several general properties of the T-Lindley family are studied in detail, including moments, mean deviations, the mode, and Shannon's entropy. Several new members of the T-Lindley family are studied in more detail. The distributions in the T-Lindley family can be skewed to the right, symmetric, skewed to the left, or bimodal. A data set is used to demonstrate the flexibility and usefulness of the T-Lindley family of distributions.

TI_4_0

Amezziane, Mohamed

Central Michigan University 

Title

Models for Complex Data

This session covers models for densities, spatial autoregressive inference, post-selection inference, and false discovery rate control.

TI_15_2

Andrews, Beth

Northwestern University

Title

Partially specified spatial autoregressive model with artificial neural network

For spatial modeling and prediction, we propose a spatial autoregressive model with a nonlinear neural network component. This allows for model flexibility in describing the relationship between the dependent variable and covariates. We consider model/variable selection and use a maximum likelihood technique for parameter estimation. The estimators are consistent and asymptotically normal under general conditions. Simulation results indicate the asymptotic theory holds for finite, large samples, and we use our methods to model United States voting patterns.
This is joint work with Wenqian Wang (Northwestern University).

TI_6_0

Arslan, Olcay

Ankara University

Title

Some non-normal distributions and their applications in robust statistical analysis

In this topic-invited session, some non-Gaussian distributions used for modeling as alternatives to the normal distribution will be discussed, and some new extensions of these distributions will be proposed. Several different applications will be given to demonstrate the performance of these distributions for conducting robust statistical analysis of data sets that may have non-normal empirical distributions.

TI_6_1

Arslan, Olcay

Ankara University

Title

Multivariate Laplace and multivariate skewed Laplace distributions and their applications in robust statistical analysis 

In this study, we will consider the multivariate Laplace distribution and its skew extension, which can be used as alternatives to the multivariate normal or other multivariate distributions for modeling non-normal data sets. One advantage of these distributions is that they can model thick-tailed and skewed data sets and have a simpler form than other multivariate or skew multivariate distributions. Concerning the number of parameters, these distributions have the same number of parameters as the multivariate normal distribution and its skew extensions, which is an advantage for parameter estimation. We will explore some properties of these distributions and study parameter estimation via the EM algorithm. We will also discuss some applications to demonstrate the modeling strength of these distributions.

TI_47_1

Aryal, Gokarna

Purdue University Northwest, Hammond, IN

Title

Transmuted-G Poisson Family 

In this talk, we present a new family of distributions called the Transmuted-G Poisson (TGP) family. This family of distributions is constructed by using the genesis of the zero-truncated Poisson (ZTP) distribution and the transmutation map. Some mathematical and statistical properties of the TGP family are provided. The parameter estimation and simulation procedures are also discussed. The usefulness of the TGP family is illustrated by modeling a couple of real-life data sets.

TI_9_3

Babic, Sladana

Ghent University

Title

Comparison and classification of flexible distributions for multivariate skew and heavy-tailed data

We present, compare and classify the most popular families of flexible multivariate distributions. By flexible distribution we mean that, besides the usual location and scale parameters, the distribution has also both skewness and tail parameters. The following families are presented: elliptical distributions, skew-elliptical distributions, multiple scaled mixtures of multinormal distributions, multivariate distributions based on the transformation approach, copula-based multivariate distributions and meta-elliptical distributions. Our classification is based on the tail behavior (a single tail weight parameter or multiple tail weight parameters) and the type of symmetry (spherical, elliptical, central symmetry or asymmetry). We compare the flexible families both theoretically (comparing the relevant properties and distinctive features) and with a Monte Carlo study (comparing the fitting abilities in finite samples). 

TI_5_4

Bahadi, Taoufik

University of Tampa

Title

TX Family of Link functions for Binary Regression

The link function in binary regression is used to specify how the probability of success is linked to the model's systematic component. These link functions are chosen to be quantile functions of popular distributions such as the logistic (logit), Gaussian (probit), and Gumbel (cloglog) distributions. We choose new flexible link functions from the TX family of distributions, build an inference framework for their regression models, and derive a new model validation procedure.
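
The sketch below illustrates the mechanism with two standard links available in statsmodels (probit and cloglog); a TX-family link would plug its own quantile function into the same slot. Data and coefficients are simulated for illustration, assuming a recent statsmodels release.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(500, 1)))
p = sm.families.links.Probit().inverse(X @ np.array([0.3, 1.2]))
y = rng.binomial(1, p)

# The link is the quantile function of a distribution; swapping it changes
# how the success probability is tied to the linear predictor.
for link in (sm.families.links.Probit(), sm.families.links.CLogLog()):
    fit = sm.GLM(y, X, family=sm.families.Binomial(link=link)).fit()
    print(type(link).__name__, fit.params.round(3))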

TI_46_1

Bandyopadhyay, Tathagata

St. Ambrose University

Title

Inference problems in binary regression model with misclassified responses

The problem of predicting a future outcome based on past and currently available samples arises in many applications. Applications of prediction intervals (PIs) based on continuous distributions are well known. Compared to continuous distributions, results on constructing PIs for discrete distributions are very limited. The problems of constructing prediction intervals for the binomial, Poisson, and negative binomial distributions are considered here. Available approximate, exact, and conditional methods for these distributions are reviewed and compared. Simple approximate prediction intervals based on the joint distribution of the past samples and the future sample are proposed. Exact coverage studies and expected widths of prediction intervals show that the new prediction intervals are comparable to or better than the available ones in most cases.
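
For context, here is one standard approximate construction (not necessarily the authors' proposal): having observed X ~ Bin(n, p), a normal approximation to Y - (m/n)X gives a simple prediction interval for a future Y ~ Bin(m, p).

import numpy as np
from scipy import stats

def binom_prediction_interval(x, n, m, level=0.95):
    """Approximate PI for Y ~ Bin(m, p) after observing x successes in n trials."""
    p_hat = x / n
    z = stats.norm.ppf(0.5 + level / 2)
    # Var(Y - (m/n) X) = m p(1-p) (1 + m/n) under independence.
    half = z * np.sqrt(m * p_hat * (1 - p_hat) * (1 + m / n))
    return max(0.0, m * p_hat - half), min(float(m), m * p_hat + half)

print(binom_prediction_interval(x=37, n=100, m=50))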

TI_7_3

Baron, Michael

American University

Title

Sequential testing and post-analysis of credibility 

Actuaries routinely make decisions that are sequential in nature. During each insured period, the new claims and losses data are collected, and together with the new economic and financial situation and other factors, they are taken into account for the calculation of revised premiums and risks. This talk focuses on the assessment of credibility, estimation of credibility factors, and testing for full credibility based on sequentially collected actuarial data. Proposed sequential tests for full credibility control the overall error rate and power. They result in a rigorous set of conditions under which an insured cohort becomes fully credible. Following sequential decisions, methods are developed for the computation of sequential p-values. Inversion of the derived sequential test leads to a construction of a sequence of repeated confidence intervals for the credibility factor. Methods are detailed for Gamma, Weibull, and Pareto loss distributions and applied to CAS Public Loss Simulator data sets. 

TI_9_2

Bekker, Andriette

University of Pretoria, South Africa.

Title

Class of matrix variate distributions: a flexible approach based on the mean-mixture of normal model 

Limited research has been conducted on matrix variate distributions that can describe skewness present in data. This paper introduces a new class of matrix variate distributions based on the mean-mixture of normal (MMN) model. The properties of the new matrix variate class (stochastic representation, moments and characteristic function, linear and quadratic forms, as well as marginal and conditional distributions) are investigated. Three special cases, including the restricted skew-normal, exponentiated MMN, and half-normal exponentiated MMN matrix variate distributions, are highlighted. An EM algorithm is implemented to obtain maximum likelihood estimates of the parameters. The usefulness and practical utility of the proposed methodology are illustrated using two simulation studies. To investigate the performance of the developed model in real-world analysis, Landsat satellite data (LSD), originally obtained from NASA, are used. Numerical results show that the new models, within this proposed class, performed well when applied to skewed matrix variate experimental data.

TI_15_0

Berrocal, Veronica

University of California Irvine

Title

Comparing Spatial Fields

In weather forecast verification, the need for more advanced methods for analyzing high-resolution forecasts prompted the introduction of much new methodology, largely from image analysis and computer vision and some from spatial statistics. In this genre, it is important to capture information about how similar features within the fields are, and there has not been much, if any, work done on statistical inference in this arena, which is a more general topic than weather forecast verification alone. Deciding how close, or how far away, two spatial fields are in some context is an important question in many areas of research.

TI_15_1

Berrocal, Veronica

University of California Irvine

Title

Comparing spatial fields to detect systematic biases in regional climate models

Since their introduction in 1990, regional climate models (RCMs) have been widely used to study the impact of climate change on human health, ecology, and epidemiology. To ensure that the conclusions of impact studies are well founded, it is necessary to assess the uncertainty in RCMs. This is not an easy task because two major sources of uncertainty can undermine an RCM: uncertainty in the boundary conditions needed to initialize the model and uncertainty in the model itself. Using climate data for Southern Sweden over 45 years, in this paper, we present a statistical modeling framework to assess an RCM driven by analyses. More specifically, our scientific interest here is determining whether there exist time periods during which the RCM under consideration displays the same type of spatial discrepancies from the observations. The proposed model can be seen as an exploratory tool for atmospheric modelers to identify time periods that require further in-depth examination. Focusing on seasonal average temperature, our model relates the corresponding observed seasonal fields to the RCM output via a hierarchical Bayesian statistical model that includes a spatio-temporal calibration term. The latter, which represents the spatial error of the RCM, is in turn provided with a Dirichlet process prior, enabling clustering of the errors in time. We apply our modeling framework to data from Southern Sweden spanning the period 1 December 1962 to 30 November 2007, revealing intriguing tendencies with respect to the RCM spatial errors of seasonal average temperature.

TI_4_2

Bhattacharjee, Abhishek

University of Northern Colorado

Title

Empirical Bayes Intervals for the Selected Mean

Empirical Bayes (EB) methods are very useful for post selection inference. Following Datta et al. (2002), construction of EB confidence intervals for the selected population mean will be discussed in this presentation. The EB intervals are adjusted to achieve the target coverage probabilities asymptotically up to the second order. Both unconditional coverage probabilities of EB intervals and corresponding probabilities conditional on ancillary statistics are found.

TI_27_1

Bonner, Simon

University of Western Ontario 

Title

Modelling Score Based Data from Photo-Identification Studies of Wild Animals 

Photographic identification has become an invaluable tool for studying populations of animals that are hard to follow in the wild. Photographs are often compared in-silico with computer algorithms that produce continuous scores which are then classified to identify matches based on some predefined cut-off. This process is prone to errors (false positive or negative matches) which bias estimates of the population’s demographics. We present a general framework for modelling photo-id data based on the raw scores, describe the Bayesian framework for fitting this model, discuss computational issues, and present an application to a long-term study of whale sharks (Rhincodon typus).  

TI_7_0

Brazauskas, Vytaras

University of Wisconsin-Milwaukee 

Title

Actuarial Statistics 

In this session, we will discuss several statistical methodological techniques that appear in actuarial studies, including credibility, modeling of random variables affected by coverage modifications and dependence, and non-standard distributions relevant to insurance data.  

TI_7_4

Brazauskas, Vytaras

University of Wisconsin-Milwaukee 

Title

Modeling severity and measuring tail risk of Norwegian fire claims 

The probabilistic behavior of the claim severity variable plays a fundamental role in calculation of deductibles, layers, loss elimination ratios, effects of inflation, and other quantities arising in insurance. Among several alternatives for modeling severity, the parametric approach continues to maintain the leading position, which is primarily due to its parsimony and flexibility. In this paper, several parametric families are employed to model severity of Norwegian fire claims for the years 1981 through 1992. The probability distributions we consider include: generalized Pareto, lognormal-Pareto (two versions), Weibull-Pareto (two versions), and folded-t. Except for the generalized Pareto distribution, the other five models are fairly new proposals that recently appeared in the actuarial literature. We use the maximum likelihood procedure to fit the models and assess the quality of their fits using basic graphical tools (quantile-quantile plots), two goodness-of-fit statistics (Kolmogorov-Smirnov and Anderson-Darling), and two information criteria (AIC and BIC). In addition, we estimate the tail risk of 'ground up' Norwegian fire claims using the value-at-risk and tail-conditional median measures. We monitor the tail risk levels over time, for the period 1981 to 1992, and analyze predictive performances of the six probability models. In particular, we compute the next-year probability for a few upper tail events using the fitted models and compare them with the actual probabilities. 
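
A hedged sketch of the workflow described above: fit one candidate severity family (here the generalized Pareto) by maximum likelihood and read off a tail-risk measure. Simulated claims stand in for the Norwegian fire data, and the parameter values are illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
claims = stats.genpareto.rvs(c=0.4, scale=1.5, size=2_000, random_state=rng)

# Maximum likelihood fit with the location pinned at zero.
c, loc, scale = stats.genpareto.fit(claims, floc=0.0)

# Value-at-risk at the 99% level is simply the fitted 0.99 quantile.
var_99 = stats.genpareto.ppf(0.99, c, loc=loc, scale=scale)
print(f"shape={c:.3f}, scale={scale:.3f}, 99% VaR={var_99:.2f}")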

TI_16_4

Broniatowski, Michel

Université Pierre et Marie Curie (Sorbonne Université)

Title

A review on divergence-based inference in parametric and semiparametric models

The Csiszár class of divergences has the main advantage of fitting both parametric and non-parametric settings, in contrast with other classes of dissimilarity indexes. Starting from the dual representation of Csiszár divergences, the talk will first provide a unified treatment of parametric inference, with some accent on non-regular models, as occurs for the number and the nature of components in mixture models. We will then turn to semiparametric models of two kinds. Firstly, we will consider mixtures with a parametric component and a nonparametric one, a useful class of models for applications. Other semiparametric models defined by moment conditions have been widely considered in the literature, rooted in the well-known empirical likelihood paradigm (Owen 1988). We will show that divergence-based approaches can be applied in semiparametric models defined by conditions on moments of L-statistics; typical examples are provided by models defined as neighborhoods of parametric classes, such as Weibull or Pareto ones, when those neighborhoods are defined through conditions on their first L-moments. The basic dual representations of divergences in parametric and non-parametric models were considered independently by Liese and Vajda (2006) and by Broniatowski and Keziou (2006, 2009). Semiparametric mixtures have been considered in the frame of Csiszár divergence-based inference in Al Mohamad and Boumahdaf (2016), and inference under L-moment conditions has been studied by Broniatowski and Decurninge (2017).

TI_6_3

Bulut, Yakup Murat

Eskişehir Osman Gazi University

Title

Matrix variate extensions of symmetric and skew Laplace distributions: Properties, parameter estimation and applications  

In this work, we introduce symmetric and skew matrix variate Laplace distributions using mixture approaches. To obtain the symmetric version of the matrix variate Laplace distribution, we use a scale mixture approach. To derive a skew version of the matrix variate Laplace distribution, we apply the variance-mean mixture approach. Some statistical properties of the newly defined distributions are investigated. Further, we give an EM-based algorithm to estimate the unknown parameters. A small simulation study and a real data example are given to explore the performance of the proposed algorithm for finding the parameter estimates and also to illustrate the capacity of the proposed distribution for modeling matrix variate data sets.

TI_27_3

Burkett, Kelly

University of Ottawa

Title

Markov chain Monte Carlo sampling of gene genealogies conditional on genotype data from trios 

To discover genetic associations with disease, it is useful to model the latent ancestral trees (gene genealogies) that gave rise to the observed genetic variability. Though the true tree is unknown, we model its distribution conditional on observed genetic data and use Monte Carlo methods to sample from this distribution. In this presentation, I first describe my sampler, ‘sampletrees’, that conditions on data from unrelated individuals. I then discuss an extension to the algorithm when the observed data is from trios, consisting of two parents and a child. Finally, as illustration, the trio-based sampler will be applied to real data. 

TI_6_2

Çelikbıçak, Müge B.

Gendarmerie and Coast Guard Academy

Title

Parameter Estimation in MANOVA with Repeated Non-normal Measures

Repeated measures designs, in which multiple observations are made on each experimental unit, play an important role in the health and behavioral sciences. There are many methods for the analysis of repeated measures data; statistically, the difference between these methods lies in the assumptions underlying the models, and many of them are based on normality assumptions. In this study, we introduce an alternative non-normal distribution, a scale mixture of normal distributions, to analyze multivariate repeated measures data. We use the EM algorithm to obtain maximum likelihood estimators of the parameters of the analysis of variance model for multivariate repeated measures.

TI_19_3

Chacko, Manoj

University of Kerala, India

Title

Bayesian Analysis of Weibull distribution based on Progressive type-II Censored Competing Risks Data

In this work, we consider the analysis of competing risks data under progressive type-II censoring by assuming that the number of units removed at each stage is random and follows a binomial distribution. Bayes estimators are obtained by assuming the population under consideration follows a Weibull distribution. A simulation study is carried out to study the performance of the different estimators derived in this paper. A real data set is also used for illustration.

TI_11_4

Chaganty, Rao

Old Dominion University 

Title

Models for selecting differentially expressed genes in microarray experiments 

There have been many advances in microarray technology, enabling researchers to quantitatively analyze expression levels of thousands of genes simultaneously. Two types of microarray chips are currently in practice: the spotted cDNA chip developed by microbiologists at Stanford University in the mid-1990s and the oligonucleotide array first commercially released by Affymetrix Corporation in 1996. Our focus is on the spotted cDNA chip, which is more popular than the latter. In a cDNA microarray, or “two-channel array,” the experimental sample is tagged with red dye and hybridized along with a reference sample tagged with green dye on a chip which consists of thousands of spots. Each spot contains preset oligonucleotides. The red and green intensities are measured at each spot by using a fluorescent scanner. In this talk, we aim to discuss bivariate statistical models for the red and green intensities, which enable us to select differentially expressed genes.

TI_41_1

Chang, Won

University of Cincinnati 

Title

Ice Model Calibration using Semi-continuous Spatial Data 

Rapid changes in Earth's cryosphere caused by human activity can lead to significant environmental impacts. Computer models provide a useful tool for understanding the behavior and projecting the future of Arctic and Antarctic ice sheets. However, these models are typically subject to large parametric uncertainties due to poorly constrained model input parameters that govern the behavior of simulated ice sheets. Computer model calibration provides a formal statistical framework to reduce and quantify the uncertainty due to such parameters. Calibration of ice sheet models is often challenging because the relevant model output and observational data take the form of semi-continuous spatial data, with a point mass at zero and a right-skewed continuous distribution for positive values. The current calibration approaches cannot readily handle such data type. Here we introduce a hierarchical latent variable model that sequentially handles binary spatial patterns and positive continuous spatial patterns in two stages. To overcome challenges due to high-dimensionality we use likelihood-based generalized principal component analysis to impose low-dimensional structures on the latent variables for spatial dependence. We demonstrate that our proposed reduced-dimension method can successfully overcome the aforementioned challenges in the example of calibrating PSU-3D ice model for the Antarctic ice sheet and provide improved future ice-volume change projections. 

TI_8_0

Chatterjee, Arpita

Georgia Southern University 

Title

Statistical Advancements in Health Sciences 

Statistics plays a pivotal role in research, planning, and decision-making in the health sciences. In recent years there has been an increasing interest in new statistical methodologies in the field of biomedical sciences. This session will address statistical advances to explore complex data emerging from non-inferiority clinical trials and microarray experiments.

TI_8_4

Chatterjee, Arpita

Georgia Southern University 

Title

An Alternative Bayesian Testing to Establish Non-inferiority.  

Non-inferiority clinical trials have gained immense popularity in recent decades. Such trials are designed to demonstrate that a new experimental drug is not unacceptably worse than an active control by more than a pre-specified small margin. Three-arm non-inferiority trials have been widely acknowledged as the gold standard because they can simultaneously establish both non-inferiority and assay sensitivity. Bayesian testing based on the posterior probability has already been established for non-inferiority trials in the context of continuous and count data. We propose a Bayesian non-inferiority test based on Bayes factors. The performance of our proposed test is demonstrated through simulated data.

TI_22_0

Chen, Din (Org Lio, Yuhlong)

University of North Carolina at Chapel Hill

Title

Statistical Modeling for Degradation Data I

In recent years, statistical modeling and inference techniques have been developed based on different degradation measures. This invited session is based on the book “Statistical Modeling for Degradation Data,” co-edited by Professors Ding-Geng (Din) Chen, Yuhlong Lio, Hon Keung Tony Ng, and Tzong-Ru Tsai and published by Springer in 2017. The book strives to bring together experts engaged in statistical modeling and inference to present and discuss the most recent important advances in degradation data analysis and related applications. The speakers in this session contributed to this book and will present their recent developments in this research area.

TI_32_1

Chen, Din

University of North Carolina at Chapel Hill

Title

Homoscedasticity in the Accelerated Failure Time Model 

The semiparametric accelerated failure time (AFT) model is a popular linear model in survival analysis. Current research based on the AFT model assumes homoscedasticity of the survival data. Violation of this assumption has been shown to lead to inefficient and even unreliable estimation, and hence misleading conclusions for survival data analysis. However, there is no valid statistical test in the literature that can be utilized to test this homoscedasticity assumption. This talk will discuss a novel quasi-likelihood ratio test for the homoscedasticity assumption in the AFT model. Simulation studies are conducted to show the satisfactory performance of this novel statistical test. A real dataset is used to demonstrate the application of this developed test.

TI_9_1

Chen, Ding-Geng

University of Pretoria, South Africa.

Title

A statistical distribution for simultaneously modeling skewness, kurtosis and bimodality 

In our funded research on cusp catastrophe modeling supported by a USA NIH R01 grant, we revitalized a family of distributions defined as f(x; α, β) = φ exp[αx + (1/2)βx^2 - (1/4)x^4], where α is the asymmetry parameter, β is the bifurcation parameter, and φ is the normalizing constant. This distribution comes from cusp catastrophe theory, developed in the early 1970s by René Thom (Thom, R. 1975. Structural Stability and Morphogenesis. New York, NY: Benjamin-Addison-Wesley) as part of catastrophe theory in topological research, which included 7 elementary catastrophes (Fold, Cusp, Swallowtail, Elliptic Umbilic, Hyperbolic Umbilic, Butterfly, and Parabolic Umbilic). This distribution also belongs to the classical exponential family and can be used to statistically analyze data with skewness, kurtosis, and bimodality simultaneously. In this talk, we will show the properties of this distribution and parameter estimation via the theory of maximum likelihood. We further demonstrate applications of this distribution to real data.
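
A small numerical sketch of this density (parameter values illustrative, not from the talk): the normalizing constant φ has no closed form, but one-dimensional quadrature recovers it directly from the kernel.

import numpy as np
from scipy.integrate import quad

def cusp_kernel(x, a, b):
    """Unnormalized density exp(a*x + (1/2)*b*x**2 - (1/4)*x**4)."""
    return np.exp(a * x + 0.5 * b * x**2 - 0.25 * x**4)

def phi(a, b):
    """Normalizing constant: reciprocal of the kernel's total mass."""
    mass, _ = quad(cusp_kernel, -np.inf, np.inf, args=(a, b))
    return 1.0 / mass

# b > 0 with a near 0 gives a bimodal shape; b < 0 gives a unimodal one.
print(phi(0.0, 2.0) * cusp_kernel(np.array([-1.0, 0.0, 1.0]), 0.0, 2.0))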

TI_21_2

Chen, Guangliang

San Jose State University

Title

All data are "documents": A scalable spectral clustering framework based on landmark points and cosine similarity 

We present a unified scalable computing framework for various versions of spectral clustering. We first consider the special setting of cosine similarity for clustering sparse or low-dimensional data and show that in such cases, spectral clustering can be implemented without computing the weight matrix. Next, for general similarity, we introduce a landmark-based technique to convert the given data (and the selected landmarks) into a “document-term” matrix and then apply the scalable implementation of spectral clustering with cosine similarity to cluster them. We demonstrate the performance of our proposed algorithm on several benchmark data sets while comparing it with other methods. 
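
A sketch of the cosine-similarity shortcut (a plain illustration, not the authors' full landmark-based framework): with unit-normalized rows A, the similarity matrix is W = A A^T, so the spectral embedding can be computed from singular vectors of a degree-normalized A without ever forming the n x n matrix W. The toy data below are simulated.

import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def cosine_spectral_clustering(X, k):
    A = normalize(X)                     # unit rows, so W = A @ A.T (cosine)
    d = A @ (A.T @ np.ones(len(A)))      # degrees of W, computed via A only
    U, _, _ = svds(A / np.sqrt(d)[:, None], k=k)   # embedding from D^(-1/2) A
    return KMeans(n_clusters=k, n_init=10).fit_predict(normalize(U))

rng = np.random.default_rng(3)
m1, m2 = np.r_[np.ones(10), np.zeros(10)], np.r_[np.zeros(10), np.ones(10)]
X = np.vstack([rng.normal(m, 0.2, size=(100, 20)) for m in (m1, m2)])
print(np.bincount(cosine_spectral_clustering(X, k=2)))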

TI_10_2

Cheng, Chin-I

Central Michigan University 

Title

Bayesian estimators of the Odd Weibull distribution with actuarial application

The Odd Weibull distribution is a three-parameter generalization of the Weibull and the inverse Weibull distributions. The Bayesian approach with a Jeffreys-type informative prior for estimating the parameters of the Odd Weibull is considered. The propriety of the posterior distribution under the proposed prior is established. The Metropolis-Hastings algorithm and Adaptive Rejection Metropolis Sampling (ARMS) are adapted to generate random samples from the full conditionals for inference on the parameters. Estimates based on the Bayesian and maximum likelihood approaches are compared in an application to an actuarial data set.

TI_47_3

Chhetri, Sher B.

University of South Carolina, Sumter

Title

On the Beta-G Poisson Family  

In this talk, we present a new family of distributions which is defined by using the genesis of the truncated Poisson distribution and the beta distribution. Some mathematical properties of the new family will be discussed. We also discuss parameter estimation procedures and potential applications of this generalized family of distributions.

TI_9_0

Coelho, Carlos Agra

Universidade Nova de Lisboa, Portugal 

Title

Contemporary Methods in Distribution Theory and Likelihood Inference 

Recent results in the areas of Distribution Theory and Likelihood Inference that will be presented include: distributions adequate for simultaneously modeling skewness, kurtosis, and bimodality, as well as multivariate skewness and heavy tails, and likelihood ratio tests for elaborate covariance structures based on samples of random sizes.

TI_10_0

Cooray, Kahadawala

Central Michigan University 

Title

Parametric models for Actuarial Applications 

This session presents a new copula to account for negative association with a financial application, a new Pareto extension with applications to insurance data, new copula families obtained by distorting existing copulas with applications in financial risk management, and Bayesian estimation of the Odd Weibull parameters with applications to insurance data.

TI_15_4

Daniels, John

Central Michigan University

Title

Seeing RED:  A New Statistical Solution to an Old Categorical Data Problem

Dental morphological traits (DMT) are often used to conduct inference on cultural populations. Often, the statistical “distance” between various populations is described using techniques such as the Mean Measure of Divergence (MMD) or pseudo-Mahalanobis D2. These techniques, although common in anthropological research, have some significant drawbacks. First, MMD requires data compression into a dichotomized presence/absence indication at some arbitrary cutoff point. Second, the total sample size will be reduced in the presence of any missing values. This can be problematic with compromised or smaller data sets. A newly developed non-parametric method, the Robust Estimator of Differences (RED), is proposed as a viable alternative. Utilizing both actual data and simulated data (with a known relationship), we will use PCA and cluster analysis to determine the relationships between various cultural groups. The results will show that RED can outperform either method and is a viable alternative for anthropologists to consider.

TI_46_4

Davies, Katherine

University of Manitoba

Title

Progressively Type-II Censored Competing Risks Data from the Linear Exponential Distribution

Across different types of lifetime studies, whether it be in the medical or engineering sciences, the possibility of competing causes of failures needs to be addressed. Typically referred to as competing risks, in this paper we consider progressively type-II censored competing risks data when the lifetimes are assumed to come from a linear exponential distribution. We develop likelihood inference and demonstrate the performance of the estimators via an extensive Monte Carlo simulation study. We also provide an illustrative example using a small data set.

TI_20_3

Davila, Victor Hugo Lachos

University of Connecticut

Title

Finite mixture modeling of censored data using the multivariate skew-normal distribution

Longitudinal HIV-1 RNA viral load measures are often subjected to censoring due to upper and lower detection limits depending on the quantification assays. A complication arises when these continuous measures present a heavy-tailed behavior because inference can be seriously affected by the misspecification of their parametric distribution. For such data structures, we propose a robust nonlinear censored regression model based on the scale mixtures of normal distributions. For taking into account the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is considered. A stochastic approximation of the EM algorithm is developed to obtain the maximum likelihood estimates of the model parameters. The main advantage of this new procedure allows us to estimate the parameters of interest and evaluate the log-likelihood function in an easy and fast way. Furthermore, the standard errors of the fixed effects and predictions of unobservable values of the response can be obtained as a by-product. The practical utility of the proposed method is exemplified using both simulated and real data. 

TI_19_2

Dharmaja, S.H.S.

Govt. College for Women, Trivandrum, India

Title

On logarithmic Kies distribution

In this paper, we consider a logarithmic form of the Kies distribution and discuss some of its important properties. We derive explicit expressions for its percentile measures, raw moments, reliability measures, etc., and attempt maximum likelihood estimation of the parameters of the distribution. Certain real-life applications are also considered to illustrate the usefulness of the proposed distribution compared to existing models. Also, the asymptotic behaviour of the likelihood estimators is studied using simulated data sets.

TI_11_0

Diawara, Norou

Old Dominion University 

Title

Statistical Methods for Space and Time Applications

 

 

TI_15_3

Diawara, Norou

Old Dominion University 

Title

Density Estimation of Spatio-temporal Point Patterns using Moran's Statistic

In this paper, an Inflated Size-biased Modified Power Series Distribution (ISBMPSD), where inflation occurs at any of the support points, is studied. This class includes, among others, the size-biased generalized Poisson distribution, the size-biased generalized negative binomial distribution, and the size-biased generalized logarithmic series distribution as particular cases. We obtain the recurrence relations among ordinary, central, and factorial moments. The maximum likelihood and Bayesian estimation of the parameters of the inflated size-biased MPSD are obtained. As special cases, results are extracted for the size-biased generalized Poisson distribution, the size-biased generalized negative binomial distribution, and the size-biased generalized logarithmic series distribution. Finally, an example is presented for the size-biased generalized Poisson distribution to illustrate the results, and a goodness-of-fit test is performed using the maximum likelihood and Bayes estimators.

TI_43_1

Dong, Yuexiao

Temple University 

Title

On dual model-free variable selection with two groups of variables 

In the presence of two groups of variables, existing model-free variable selection methods only reduce the dimensionality of the predictors. We extend the popular marginal coordinate hypotheses (Cook, 2004) in the sufficient dimension reduction literature and consider the dual marginal coordinate hypotheses, where the role of the predictor and the response is not important. Motivated by canonical correlation analysis (CCA), we propose a CCA-based test for the dual marginal coordinate hypotheses and devise a joint backward selection algorithm for dual model-free variable selection. The performances of the proposed test and the variable selection procedure are evaluated through synthetic examples and a real data analysis. 

TI_33_4

Duval, Francis

Université du Québec à Montréal (UQAM) 

Title

Gradient Boosting-Based Model for Individual Loss Reserving 

Modeling based on data information is one of the most challenging research topics in actuarial science. Statistical learning approaches offer a set of tools that could be used to evaluate loss reserves in an individual framework. In this talk, we contrast some traditional aggregate techniques with individual models based on both parametric and gradient boosting algorithms. These models use information about each of the payments made for each of the claims in the portfolio, as well as characteristics of the insured. We provide an example based on a dataset from an insurance company and we discuss some points related to practical applications. 

TI_1_3

El Ktaibi, Farid

ZAYED university, UAE  

Title

Bootstrapping the Empirical Distribution of a Stationary Process with Change-point 

When detecting a change-point in the marginal distribution of a stationary time series, bootstrap techniques are required to determine critical values for the tests when the pre-change distribution is unknown. In this presentation, we propose a sequential moving block bootstrap and demonstrate its validity under a converging alternative. Furthermore, we demonstrate that power is still achieved by the bootstrap under a non-converging alternative. These results are applied to a linear process and are shown to be valid under very mild conditions on the existence of any moment of the innovations and a corresponding condition of summability of the coefficients. 

TI_2_1

Elkadry, Alaa

Marshall University

Title

Analyzing Continuous Randomized Response Data with an Indifference-Zone Selection Procedure

A randomized response model applicable to continuous data that considers a mixture of two normal distributions is considered. The target here is to select the population with the best parameter value. A study on how to choose the best population among k distinct populations using an indifference-zone procedure is provided. Also, the operating characteristics (OCs) of a subset ranking and selection procedure are derived for the randomized response model for continuous data. The operating characteristics for the subset selection procedures are considered for two parameter configurations: the slippage configuration and the equi-spaced configuration.

TI_23_2

Ferreira, Johan

University of Pretoria

Title

Alternative Dirichlet priors for estimation of Shannon entropy using countably discrete likelihoods

Claude Shannon's seminal paper “A Mathematical Theory of Communication” is widely considered the basis of information theory. Shannon entropy is a functional of a probability structure and is a measurement of the information contained in a system. It has been applied as a cryptographic measure for a key generator module, forming part of the security of the cipher system. In a machine-learning context, entropy is used to define an error function as part of the learning of weights in multilayer perceptrons in neural networks. The practical problem of estimating entropy from samples (sometimes small samples) in many applied settings remains a challenging and relevant one. In this presentation, previously unconsidered Dirichlet generators are introduced as possible priors for an underlying countably discrete model (in particular, the multinomial model). The resultant estimators for the entropy H(p) under the considered priors, assuming squared error loss, will be presented. Particular cases of these proposed priors will be of interest, and their effect on the estimation of entropy under different parameter scenarios will be investigated.
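
A minimal sketch of the setup (standard Bayes machinery, not the novel generators of the talk): with multinomial counts n and a Dirichlet(α) prior, the posterior is Dirichlet(n + α), and the squared-error-loss Bayes estimate of H(p) is the posterior mean, approximated here by Monte Carlo. The counts and prior values are illustrative.

import numpy as np

def entropy_posterior_mean(counts, alpha, draws=20_000, seed=4):
    """Monte Carlo posterior mean of H(p) under a Dirichlet(alpha) prior."""
    rng = np.random.default_rng(seed)
    post = np.clip(rng.dirichlet(np.asarray(counts) + alpha, size=draws),
                   1e-300, 1.0)  # guard against log(0) from underflow
    return float(np.mean(-np.sum(post * np.log(post), axis=1)))

counts = [12, 7, 3, 1, 0, 0]     # a small-sample multinomial data set
for alpha in (0.5, 1.0):         # Jeffreys- and uniform-type symmetric priors
    print(alpha, round(entropy_posterior_mean(counts, alpha), 4))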

TI_44_4

Fisher, Thomas

Miami University 

Title

A split and merge strategy to variable selection  

The curse of dimensionality, where p is large relative to n, is a well-known problem that can affect variable selection methods as well as model performance. We consider an algorithm similar to k-fold cross-validation where we segment the feature variables into subsets, variable selection (LASSO or others) is performed within each subset, and the final set of selected variables is aggregated for a final model. Simulations show that this approach has performance comparable to standard techniques, with the added benefit of improved computational run time. The method can easily be parallelized for further improved efficiency.

TI_12_0

Flegal, James M.

University of California, Riverside 

Title

Advances in Bayesian Theory and Computation 

Bayesian computation remains an active theoretical and practical research area.  Talks in this session consider Bayesian penalized regression models under a unified framework, locally adaptive shrinkage in the Bayesian framework, weighted batch means variance estimators for MCMC output analysis, and recent developments concerning a graph-based Bayesian approach to semi-supervised learning. 

TI_12_3

Flegal, James M.

University of California, Riverside 

Title

Weighted batch means estimators in Markov chain Monte Carlo 

We propose a family of weighted batch means variance estimators, which are computationally efficient and can be conveniently applied in practice. The focus is on Markov chain Monte Carlo simulations and estimation of the asymptotic covariance matrix in the Markov chain central limit theorem, where conditions ensuring strong consistency are provided. Finite sample performance is evaluated through auto-regressive, Bayesian spatial-temporal, and Bayesian logistic regression examples, where the new estimators show significant computational gains with a minor sacrifice in variance compared with existing methods. 
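
For reference, a sketch of the plain (equal-weight) batch means estimator that the weighted family generalizes; an AR(1) chain with known long-run variance serves as a numerical check.

import numpy as np

def batch_means_var(chain, batch_size):
    """Batch means estimate of the asymptotic variance in the MCMC CLT."""
    n = len(chain) - len(chain) % batch_size
    batch_avgs = chain[:n].reshape(-1, batch_size).mean(axis=1)
    return batch_size * batch_avgs.var(ddof=1)

# AR(1) check: x_t = rho*x_{t-1} + e_t has long-run variance 1/(1-rho)^2
# for the sample mean when the innovations have unit variance.
rng = np.random.default_rng(5)
rho, n = 0.5, 100_000
x = np.zeros(n)
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()
print(batch_means_var(x, batch_size=int(n ** 0.5)), 1 / (1 - rho) ** 2)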

TI_11_3

Fofana, Demba

University of Texas Rio Grande Valley

Title

Combining Assumptions and Graphical Network into Gene Expression Data Analysis 

Properly analyzing gene expression data is a daunting task that requires taking both assumptions and network relationships among genes into consideration. Combining these different elements can not only improve statistical power but also provide a better framework through which gene expression can be analyzed. We propose a novel statistical model that combines assumptions and gene network information in the analysis. Assumptions are important since every test statistic is valid only when the required assumptions hold. We incorporate gene network information into the analysis because neighboring genes share biological functions. This correlation factor is taken into account via similar prior probabilities for neighboring genes. Through a series of simulations, our approach is compared with other approaches. Our method, which combines assumptions and network information in the analysis, is shown to be more powerful. We will provide an R package to help use this approach.

TI_31_2

Galoppo, Travis + Kogan, Clark

ABB US Corporate Research

Title

A GPU Enhanced Bayesian Ordinal Logistic Regression Model of Hospital Antimicrobial Usage

Bayesian data analysis has a high computational demand, with a critical bottleneck in the evaluation of data likelihood. When data samples are independent, there is significant opportunity for parallelization of the data likelihood calculation. We demonstrate a prototype GPU enhanced Gibbs sampler implementation using NVIDIA CUDA, applying a Bayesian ordinal logistic regression to a large dataset of antimicrobial usage in hospitals. Our implementation offloads only the data likelihood calculation to the GPU, while maintaining the core sampling logic on the CPU. We compare our results to other popular software packages, both to verify correctness and to showcase performance.

TI_22_2

Gao, Yong

Ohio University

Title

A Hierarchical Bayesian Bi-exponential Wiener Process for Luminosity Degradation of Display Products

This presentation will discuss a nonlinear Wiener process degradation model for analyzing the luminosity degradation of display products. To account for the nonlinear two-phase pattern in the observed degradation paths, we assume the bi-exponential function as the drift function of the Wiener process degradation model. The hierarchical Bayesian modeling framework is adopted to construct the model. The failure-time distribution of a unit randomly selected from the population is obtained.  Prediction results are compared to the results from two alternative models, a bi-exponential degradation-path model and a time-scale transformed linear Wiener process.
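
A hedged sketch of simulating one such degradation path X(t) = m(t) + sigma*B(t); the parameterization m(t) = a1*exp(-l1*t) + a2*exp(-l2*t) is one common bi-exponential form and may differ from the talk's, and all parameter values are illustrative.

import numpy as np

def biexp_drift(t, a1, l1, a2, l2):
    """Assumed bi-exponential mean degradation path (one common form)."""
    return a1 * np.exp(-l1 * t) + a2 * np.exp(-l2 * t)

rng = np.random.default_rng(7)
t = np.linspace(0.0, 10.0, 201)
m = biexp_drift(t, a1=60.0, l1=1.5, a2=40.0, l2=0.1)

# Wiener process degradation: independent Gaussian increments around m(t).
sigma = 0.8
noise = np.concatenate([[0.0], rng.normal(0, sigma * np.sqrt(np.diff(t)))])
x = m + np.cumsum(noise)
print(x[:3].round(2), x[-3:].round(2))   # simulated luminosity readings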

TI_13_0

George, Olusegun

The University of Memphis

Title

Exchangeability in Statistical Inference - Theory and Applications

It is well documented that exchangeability is at the heart of statistical inference. The ground-breaking representation theorem of De Finetti (1931) on infinite exchangeability has had a profound impact on the modeling of clustered data. This special session is dedicated to recent applications of finite and infinite exchangeability to the analysis of clustered data.

TI_5_0

George, Tyler (Org - Amezziane,M.)

Central Michigan University 

Title

TX Family: Extensions and Inference

The TX family is a class of families formed by compounding distributions. This operation allows the generated distribution to inherit the parameters of the compounded distributions, but not necessarily their properties. This session explores different problems that can be solved using the flexibility of TX distributions.

TI_14_0

Ghosh, Indranil

University of North Carolina, Wilmington

Title

Probability and Statistical models with applications

This session presents some of the recent developments and noteworthy results in distribution theory (in both the discrete and continuous paradigms). In addition, several applications and a thorough discussion of the associated statistical inference are also presented.

TI_32_2

Ghosh, Indranil

University of North Carolina, Wilmington

Title

Bivariate Beta and Kumaraswamy Models developed using the Arnold-Ng Bivariate Beta Distribution 

In this paper we explore some mechanisms for constructing bivariate and multivariate beta and Kumaraswamy distributions. Specifically, we focus our attention on the Arnold-Ng (2011) eight parameter bivariate beta model. Several models in the literature are identified as special cases of this distribution including the Jones-Olkin-Liu-Libby-Novick bivariate beta model, and certain Kotz and Nadarajah bivariate beta models among others. The utility of such models in constructing bivariate Kumaraswamy models is investigated. Structural properties of such derived models are studied. Parameter estimation for the models is also discussed. For illustrative purposes, a real-life data set is considered to exhibit the applicability of these models in comparison with rival bivariate beta and Kumaraswamy models.  

TI_8_1

Ghosh, Santu

Medical College of Georgia, Augusta University 

Title

Two-sample Tests for High Dimensional Means with Prepivoting and Random Projection 

Within the medical field, the demand to store and analyze small-sample, large-variable data has become ever more abundant. Several two-sample tests for equality of means, including the revered Hotelling's T2 test, have already been established for the case where the combined sample size of both populations exceeds the dimension of the variables. However, tests such as Hotelling's T2 become either unusable or suffer from low power when the number of variables is greater than the combined sample size. We propose a test using both prepivoting and an Edgeworth expansion that maintains high power in this higher-dimensional scenario, known as the "large p, small n" problem. Our test's finite-sample performance is compared with that of other recently proposed tests designed to handle the large p, small n situation. We apply our test to a microarray gene expression data set and report competitive rates for both power and Type-I error.

TI_14_1

Ghosh, Souparno

Texas Tech University 

Title

Coherent Multivariate Feature Selection and Inference across multiple databases 

Random forest (RF) has become a widely popular prediction-generating mechanism. Its strength lies in its flexibility, interpretability, and ability to handle a large number of features, typically larger than the sample size. However, this methodology is of limited use if one wishes to identify statistically significant features. Several ranking schemes are available that provide information on the relative importance of the features, but there is a paucity of general inferential mechanisms, particularly in a multivariate setup. We use the conditional inference tree framework to generate an RF in which features are deleted sequentially based on explicit hypothesis testing; a sketch of this deletion loop is given below. The resulting sequential algorithm offers an inferentially justifiable, but model-free, variable selection procedure. Significant features are then used to generate a predictive RF. An added advantage of our methodology is that both variable selection and prediction are based on the conditional inference framework and hence are coherent. Next, we extend this methodology to model paired observations obtained from two pharmacogenomics databases in which the predictors are measured under different experimental protocols. Instead of simply taking the average of the paired predictors, we offer a latent variable approach that can impute over the databases and then perform variable selection over the full set of paired samples across the databases. We illustrate the performance of our Sequential Multi-Response Feature Selection approach through simulation studies and finally apply this methodology to the Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia databases to identify genetic characteristics that significantly impact drug sensitivities. The significant set of predictors obtained from our method is further validated from a biological perspective.
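The sequential deletion loop can be summarized in a few lines. This is a schematic sketch under our own naming: fit_forest and importance_pvalues are hypothetical stand-ins for fitting a conditional-inference forest and computing one hypothesis-test p-value per remaining feature.

    def sequential_selection(X, y, fit_forest, importance_pvalues, alpha=0.05):
        # Delete the least significant feature until every survivor is significant.
        active = list(range(X.shape[1]))
        while active:
            forest = fit_forest(X[:, active], y)
            pvals = importance_pvalues(forest, X[:, active], y)
            worst = max(range(len(active)), key=lambda j: pvals[j])
            if pvals[worst] <= alpha:      # all remaining features pass the test
                break
            del active[worst]              # drop the weakest feature and refit
        return active                      # indices of the selected features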

TI_26_3

Gunasekera, Sumith

The University of Tennessee at Chattanooga 

Title

On Estimating the Reliability in a Multicomponent System based on Progressively-Censored Data from Chen Distribution 

This research deals with the classical, Bayesian, and generalized estimation of the stress-strength reliability parameter R_{s,k} = Pr(at least s of (X_1, X_2, ..., X_k) exceed Y) = Pr(X_{k-s+1:k} > Y) of an s-out-of-k: G multicomponent system, based on progressively type-II right censored samples with random removals, when the stress (Y) and strength (X) are two independent Chen random variables. Under squared-error and LINEX loss functions, Bayes estimates are developed using Lindley's approximation and the Markov chain Monte Carlo method. Generalized estimates are developed using the generalized variable method, while classical estimates - the maximum likelihood estimators, their asymptotic distributions, asymptotic confidence intervals, and bootstrap-based confidence intervals - are also developed. A simulation study and a real-world data analysis are given to illustrate the proposed procedures. The size of the test, adjusted and unadjusted power of the test, coverage probability and expected lengths of the confidence intervals, and biases of the estimators are also computed, compared, and contrasted.
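For reference, when the k strengths are independent and identically distributed with distribution function F_X and independent of the stress Y ~ F_Y, the multicomponent reliability above takes the classical form (cf. Bhattacharyya and Johnson, 1974):

\[ R_{s,k} \;=\; \sum_{j=s}^{k} \binom{k}{j} \int_{-\infty}^{\infty} \bigl[1 - F_X(y)\bigr]^{j} \bigl[F_X(y)\bigr]^{k-j} \, dF_Y(y), \]

with the Chen-specific expressions obtained by substituting the Chen distribution functions for F_X and F_Y.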

TI_3_2

Hamdan, Hasan

James Madison University 

Title

Approximating and Characterizing Infinite Scale Mixtures 

In this talk, an efficient method for approximating any infinite scale mixture by a finite scale mixture, up to a specified tolerance level, will be presented. This method will then be applied to approximate many common classes of infinite scale mixtures. In particular, the method will be used to approximate infinite scale mixtures of normals, infinite scale mixtures of exponentials, and infinite scale mixtures of uniforms. Several important results related to infinite scale mixtures will be presented, with a focus on scale mixtures of normals. An extension to multivariate infinite scale mixtures and to the class of infinite scale-location mixtures will be discussed.

TI_3_1

Hamed, Duha

Winthrop University 

Title

New Families of Generalized Lomax Distributions: Properties and Applications

In this talk, we propose some families of generalized Lomax distributions named T-Lomax{Y}, using the methodology of the T-R{Y} framework. The T-Lomax{Y} families introduced arise from the quantile functions of the exponential, logistic, log-logistic, and Weibull distributions. The shapes of these T-Lomax{Y} distributions vary between unimodal and bimodal. Various structural properties of the new families are derived, including moments, modes, and Shannon entropies. Several new generalized Lomax distributions are studied, and the estimation of the model parameters for a member of the newly defined families is performed by the maximum likelihood method. An application to a real data set is used to demonstrate the flexibility of this family of distributions.

TI_16_0

Hannig, Jan (Organizer: Jana Jureckova)

The Czech Academy of Sciences,  Charles University

Title

Nonlinear Functionals of Probability Distributions 

The talks of the session characterize and estimate various functionals of probability distributions that are not merely parameters, but that also describe the shape of the distribution and its relation to other distributions, such as their mutual dependence or divergence.

TI_16_3

Hannig, Jan

University of North Carolina at Chapel Hill 

Title

Model Selection without penalty using Generalized Fiducial Inference 

Standard penalized methods of variable selection and parameter estimation rely on the magnitude of coefficient estimates to decide which variables to include in the final model.  However, coefficient estimates are unreliable when, for example, the design matrix is collinear.  To overcome this challenge an entirely new perspective on variable selection is presented within a generalized fiducial inference framework. We apply this idea to two different problems. First, this new procedure is able to effectively account for linear dependencies among subsets of covariates in a high-dimensional regression setting. Second, we apply our variable selection method to the sparse vector AR(1). 

TI_35_3

He, Wenqing

Western University

Title

Perturbed Variance Based Null Hypothesis Tests with An Application to Clayton Models 

Null hypothesis tests are widely used when no appropriate alternative hypothesis is available, especially in model assessment, where the assumed model is evaluated with no alternative model under consideration. Motivated by tests of Clayton models in multivariate survival analysis, a simple perturbed-variance resampling method is proposed for null hypothesis testing. The proposed method uses perturbation to estimate the covariance matrix of the estimator, thereby avoiding an intractable variance estimate. The proposed tests enjoy simplicity and theoretical justification. We apply the proposed method to modify tests for the assessment of Clayton models. The proposed methods have simpler procedures than both the parametric and nonparametric bootstrap and show promising performance in simulation studies. A colon cancer study further illustrates the proposed methods.

TI_33_3

Herrmann, Klaus

University of Sherbrooke

Title

The Extreme Value Limit Theorem for Dependent Sequences of Random Variables 

Extreme value theory is concerned with the limiting distribution of location-scale transformed block maxima M_n = max(X_1, ..., X_n) of a sequence of identically distributed random variables (X_i)_{i=1}^n defined on a common probability space (Ω, F, P). In case the X_i, i ∈ ℕ, are independent, the weak limiting behaviour of appropriately location-scale transformed M_n is adequately described by the classical Fisher-Tippett-Gnedenko theorem. In this presentation we are interested in the case of dependent random variables X_i, i ∈ ℕ, while keeping a common marginal distribution function F for all X_i. As dependence structures we consider Archimedean copulas and discuss the connection between block maxima and copula diagonals. This allows one to derive a generalization of the Fisher-Tippett-Gnedenko theorem for X_i, i ∈ ℕ, dependent according to an Archimedean copula. We discuss connections to exchangeability and upper tail independence. Finally, we illustrate the resulting limit laws and discuss their properties.
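For orientation, the classical i.i.d. statement being generalized is the Fisher-Tippett-Gnedenko theorem: if there exist norming sequences a_n > 0 and b_n with

\[ \lim_{n \to \infty} P\!\left( \frac{M_n - b_n}{a_n} \le x \right) = G(x) \]

for a non-degenerate G, then G belongs to the generalized extreme value family G_\xi(x) = \exp\{-(1 + \xi x)^{-1/\xi}\}, 1 + \xi x > 0 (with the Gumbel limit \exp\{-e^{-x}\} as \xi \to 0). The talk's contribution is the analogue of this limit when the X_i are coupled by an Archimedean copula.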

TI_11_2

Hitchcock, David

University of South Carolina

Title

A Spatio-temporal Model Relating Gage Height Data to Precipitation at South Carolina Locations 

The gage height of rivers (i.e., the height of the water’s surface) can be used to help define flood events. We use a Conditionally Autoregressive (CAR) model to relate gage height, measured daily over five years (2011-2015) at nearly 100 locations across South Carolina, to several covariates. An important covariate is the daily precipitation at these locations. Other covariates considered include the elevation at the locations and a fall-season indicator variable. We also include interactions in our model. The spatial dependency is specified by defining catchment basins as neighborhoods. We use a Bayesian approach to estimate our model parameters. Both the temporal and spatial correlations in the model are significant. Precipitation appears to have a positive effect on gage height, and this effect is significantly greater during the fall season. This is joint work with Haigang Liu and S. Zahra Samadi.

TI_41_3

Hu, Guanyu 

University of Connecticut 

Title

A Bayesian Joint Model of Marker and Intensity of Marked Spatial Point Processes with Application to Basketball Shot Chart 

The success rate of a basketball shot may be higher at locations in the court where a player makes more shots. In a marked spatial point process model, this means that the markers are dependent on the intensity of the process. We develop a Bayesian joint model of the marker and the intensity of marked spatial point processes, where the intensity is incorporated in the model of the marker as a covariate. Further, we allow variable selection through a spike-and-slab prior. Inferences are developed with a Markov chain Monte Carlo algorithm to sample from the posterior distribution. Two Bayesian model comparison criteria, the modified Deviance Information Criterion and the modified Logarithm of the Pseudo-Marginal Likelihood, are developed to assess the fit of different joint models. The empirical performance of the proposed methods is examined in extensive simulation studies. We apply the proposed methodology to the 2017-2018 regular season shot data of four professional basketball players in the NBA to analyze the spatial structure of shot selection and field goal percentage. The results suggest that the field goal percentages of all four players are significantly positively dependent on their shot intensities, and that different players have different predictors for their field goal percentages.

TI_48_0

Huang, Hsin-Hsiung

University of Central Florida

Title

Statistical Methodology for Big Data

In this session, the speakers will present various novel methods for handling problems of real data, which may involve large sample sizes, data from different locations, missing values, and other challenges.

TI_48_1

Huang, Hsin-Hsiung

University of Central Florida

Title

A new statistical strategy for predicting major depressive disorder using whole-exome genotyping data

Major depressive disorder (MDD) is a common and serious psychiatric disorder, which may cause significant morbidity and mortality and lead to high rates of suicide. Genetic factors have been shown to play important roles in the development of MDD. Recently, genome-wide association studies on common variants have been conducted. However, the large number of missing values influences the analysis results. In this paper, we propose to treat the missing values as distinct categories in various statistical classification models. The classification results improve significantly compared to imputing the missing values.

TI_22_4

Jayalath, Kalanka

University of Houston - Clear Lake

Title

A Bayesian Survival Analysis for the Inverse Gaussian Data

This talk focuses on a comprehensive survival analysis for the inverse Gaussian distribution employing Bayesian and fiducial approaches. Previous analyses in the literature required the distribution mean to be known, which is unrealistic and restricted the scope of the investigation; no such assumption is made here. This study also includes an illustration of survival analysis for data with randomly right-censored observations. Gibbs sampling is employed in estimation, and bootstrap comparisons are made between the Bayesian and fiducial estimates. It is concluded that the amount of censoring in the data and the shape of the inverse Gaussian distribution have the greatest impact on the two analyses, Bayesian vs. fiducial.

TI_3_3

Johnston, Douglas E

State University of New York at Farmingdale 

Title

A Recursive Bayesian Model for the Excess Distribution with Stochastic Parameters 

The generalized extreme value (GEV) and generalized Pareto (GPD) distributions are important tools for analyzing extreme values such as large losses in financial markets. In particular, the GPD is the canonical distribution for modelling excess losses above a “high” threshold. This conditional distribution is typically used for the computation of risk metrics such as expected shortfall (i.e., the conditional mean) and extreme quantiles. In our work, we propose a new approach for analyzing extreme values by applying a stochastic parametrization to the GPD, with the parameters following a hidden stochastic process; this results in a non-linear, non-Gaussian state-space model with unknown static parameters. This approach allows for dependencies, such as clustering of extremes, often witnessed in financial data. To compute the predictive excess loss distribution, we derive a Rao-Blackwellized particle filter that reduces the parameter space, and a concise, recursive solution is obtained. This has the benefit of improved filter performance and permits real-time implementation. We introduce a new risk measure that is a more robust estimate of the expected shortfall, and we illustrate our results using both simulated data and actual stock market returns from 1928-2018. Finally, we compare our results to traditional methods of estimating the excess loss distribution, such as maximum likelihood, to show the improvement obtained.
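As background for the stochastic parametrization, the static peaks-over-threshold model uses the GPD for the excess Y = X - u given X > u:

\[ H_{\xi,\beta}(y) \;=\; 1 - \left(1 + \frac{\xi y}{\beta}\right)^{-1/\xi}, \qquad y \ge 0, \]

with scale \beta > 0 and shape \xi. In the approach described above, (\xi, \beta) are no longer static but follow a hidden stochastic process whose posterior is tracked by the Rao-Blackwellized particle filter.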

TI_12_1

Jones, Galin L.

University of Minnesota 

Title

Fully Bayesian Penalized Regression with a Generalized Bridge Prior 

We consider penalized regression models under a unified framework. The particular method is determined by the form of the penalty term, which is typically chosen by cross-validation. We introduce a fully Bayesian approach that incorporates both sparse and dense settings and show how to use a type of model averaging to eliminate the nuisance penalty parameters and perform inference through the marginal posterior distribution of the regression coefficients. We establish tail robustness of the resulting estimator as well as conditional and marginal posterior consistency for the Bayesian model. We develop a component-wise Markov chain Monte Carlo algorithm for sampling. Numerical results show that the method tends to select the optimal penalty and performs well in both variable selection and prediction, being comparable to, and often better than, alternative methods. Both simulated and real data examples are provided.

TI_34_4

Kang, Sang (John)

The University of Western Ontario 

Title

Moment-based density approximation techniques as applied to heavy-tailed distributions 

Several advances for the approximation and estimation of heavy-tailed distributions are proposed. It is first explained that, on initially applying the Esscher transform to heavy-tailed density functions, one can utilize a moment-based technique whereby the tilted density functions are expressed as the product of a base density function and a polynomial adjustment. Alternatively, density approximants can be secured by appropriately truncating the distributions or mapping them onto compact supports. Extensions to the context of density estimation, in which case sample moments are employed in lieu of exact moments, are discussed, and illustrative applications involving actuarial data sets are presented.

TI_17_0

Kao, Ming-Hung (Jason)

Arizona State University 

Title

Design and analysis of complex experiments: Theory and applications 

The four talks on the design and analysis of complex experiments in this session cover subdata selection for big data, a large-data issue in computer experiments, a study of order-of-addition experiments, and an optimal experimental design approach for functional data analysis.

TI_24_4

Kapenga, John

Western Michigan University 

Title

Computation of High Dimensions Integrals 

Integrals in dimensions from 20 to a few thousand have recently been used in several applications, including finance, Bayesian statistics, and quantum physics. Even infinite-dimensional integrals have been attacked numerically. Traditional numerical methods and the usual Monte Carlo methods cannot be applied as the dimension increases beyond perhaps 20. A brief history and the status of effective current lattice methods, such as the fast CBC construction, will be presented. Several examples and timings will be included.
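As a small illustration of the lattice methods mentioned above, here is a randomly shifted rank-1 lattice rule in Python (our sketch; in practice the generating vector z comes from a fast CBC construction rather than being chosen by hand, and the toy vector below makes no optimality claim).

    import numpy as np

    def shifted_lattice_rule(f, z, n, n_shifts=8, rng=None):
        # Rank-1 lattice points x_i = frac(i * z / n); random shifts give
        # replicates from which a standard error can be estimated.
        rng = rng or np.random.default_rng(0)
        i = np.arange(n)[:, None]
        base = np.mod(i * z[None, :] / n, 1.0)        # (n, d) points in [0, 1)^d
        estimates = []
        for _ in range(n_shifts):
            shift = rng.random(z.size)
            estimates.append(np.mean(f(np.mod(base + shift, 1.0))))
        return np.mean(estimates), np.std(estimates, ddof=1) / np.sqrt(n_shifts)

    # Example: a 50-dimensional product integrand whose integral over [0,1]^50 is 1.
    d = 50
    z = np.array([1] + [(17 * k) % 101 for k in range(1, d)])  # toy generating vector
    val, err = shifted_lattice_rule(
        lambda x: np.prod(1 + 0.1 * (x - 0.5), axis=1), z, n=1009)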

TI_30_3

Kim, Jong Min

University of Minnesota-Morris 

Title

Change point detection method with copula conditional distribution to multistage sequential control chart

In this research, we propose a change point model for the multistage Statistical Process Control (SPC) chart for highly correlated multivariate data via copula conditional distributions, principal component analysis (PCA), and functional PCA. Furthermore, we review the currently available multistage statistical process control charts. In addition, to verify our proposed change point model, we compare the current change point models of the single-stage SPC chart via PCA with our change point model for the multistage SPC chart, using highly correlated multistage simulated and real data.

TI_18_0

Kozubowski, Tomasz

University of Nevada

Title

Discrete Stochastic Models and Applications 

Discrete stochastic models are an essential part of the statistician’s toolbox, as they are widely used across many areas of application. The session focuses on recent developments in this important area, and its scope is rather broad, ranging from univariate to multivariate discrete distributions, including hybrid models with discrete as well as continuous components, heavy-tailed distributions, and their applications.

TI_36_3

Kozubowski, Tomasz

University of Nevada

Title

Multivariate models connected with random sums and maxima of dependent Pareto components 

We present recent results concerning stochastic models for (X,Y,N), where X and Y, respectively, are the sum and the maximum of N dependent, heavy tailed Pareto components. Models of this form are desirable in many applications, ranging from hydro-climatology, to finance and insurance. 

Our construction is built upon a pivotal model involving a deterministic number of IID exponential variables, where the basic characteristics of the involved multivariate distributions admit explicit forms. In addition to theoretical results, we shall present real data examples illustrating the usefulness of these models.

TI_26_2

Krishnamoorthy, Kalimuthu

University of Louisiana at Lafayette 

Title

Fiducial Inference with Applications 

The fiducial distribution for a parameter is essentially a posterior distribution obtained without a prior distribution on the parameter. In this talk, we shall describe Fisher's method of finding a fiducial distribution for normal parameters and illustrate fiducial inference through examples involving well-known distributions such as the normal and related distributions. We then describe the approach for finding fiducial distributions for the parameters of a location-scale family and for discrete distributions. We illustrate the approach for the Weibull distribution and the delta-lognormal distribution. In particular, we shall see fiducial methods for finding confidence intervals, prediction intervals, and prediction limits for the mean of a future sample.

TI_19_0

Kumar, C. Satheesh

University of Kerala, Trivandrum, India

Title

Distribution Theory

The session consists of four talks: the first two will be on Weibull-related classes of distributions, and the third will be on the analysis of competing risks data under progressive type-II censoring. The session concludes with a talk on certain classes of discrete distributions of order k.

TI_19_4

Kumar, C. Satheesh

University of Kerala

Title

On a Wide Class of Discrete Distributions

Several types of discrete distributions of order k are available in the literature, and they have found extensive applications in many areas of scientific research. In the present talk, we discuss certain new classes of discrete distributions of order k, which are developed as distributions of random sums of certain independent and identically distributed Hirano-type random variables. We outline several important distributional properties of these families of distributions, along with a brief discussion of their mixtures and limiting cases.

TI_7_2

Lee, Gee

Michigan State University

Title

General insurance deductible ratemaking (and extensions) 

Insurance claims have deductibles, which must be considered when pricing insurance premiums. The deductible may cause censoring and truncation of the insurance claims. In this talk, an overview of deductible ratemaking will be provided, and the pros and cons of two deductible ratemaking approaches will be compared: the regression approach and the maximum likelihood approach. The regression approach turns out to have an advantage in predicting aggregate claims, while the maximum likelihood approach has an advantage when calculating theoretically correct relativities for deductible levels beyond those observed in empirical data. A comparison of selected models shows that the use of long-tail severity distributions may improve the deductible rating, while the zero-one inflated frequency model may have limited advantages due to estimation issues under censoring and truncation. For demonstration, loss models fit to the Wisconsin Local Government Property Insurance Fund (LGPIF) data will be illustrated, and examples will be provided for the ratemaking of per-loss deductibles offered by the fund.

TI_22_3

Lee, I-Chen

National Cheng-Kung University

Title

Global Planning of Accelerated Degradation Tests

The accelerated degradation test (ADT) is an efficient tool for assessing the lifetime information of highly reliable products. Without taking the experimental cost into consideration, an analytical approach was recently proposed in the literature to determine the optimum stress levels and the corresponding optimum sample size allocation simultaneously in a general class of exponential dispersion (ED) degradation models. However, conducting an ADT is very expensive, so constructing a cost-constrained ADT plan is a challenging issue for reliability analysts. Taking the experimental cost into consideration, this study further proposes a semi-analytical procedure to determine the total sample size, the measurement frequencies, and the number of measurements (within a degradation path) globally under the class of ED degradation models. An example demonstrates that our proposed method obtains the cost-constrained ADT plan very efficiently compared with the conventional optimum plan found by a grid search algorithm.

TI_24_2

Lee, Kevin

Western Michigan University 

Title

Temporal Exponential-Family Random Graph Models with Time-Evolving Latent Block Structure for Dynamic Networks 

Model-based clustering of dynamic networks has emerged as an essential research topic in statistical network analysis. We present a principled statistical clustering of dynamic networks through the temporal exponential-family random graph models with a hidden Markov structure. The temporal exponential-family random graph models allow us to detect groups based on interesting features of the dynamic networks and the hidden Markov structure is used to infer the time-evolving block structure of dynamic networks. The power of our proposed method is demonstrated in real-world applications. 

TI_20_0

Levine, Michael

Purdue University

Title

Recent advances involving latent variable models for various distributions 

This session is dedicated to some new developments in latent variable models. Models for specific distributions that are widely used in practice, as well as nonparametric latent variable models, will be discussed. Moreover, some models for new types of data lying in non-Euclidean spaces will also be considered. Taken together, the models discussed in this session are capable of modeling a very wide range of data with some hidden/unobservable structure.

TI_20_1

Levine, Michael

Purdue University

Title

Estimation of two-component skew normal mixtures where one component is known 

Two-component mixtures have special relevance for binary classification problems. In the standard setting for binary classification, labeled samples from both components are available as training data. However, many real-world problems do not fall in this standard paradigm. For example, in social networks users may only be allowed to click `like' (if there is no `dislike' button) for a particular product. Thus, labeled data can be collected only for one of the components (a sample containing users who clicked `like'), while unlabeled data from the mixture (a sample containing all users) are also available. To guarantee unimodality of the components and allow for skewness, we model the components with a skew normal family, a generalization of the Gaussian family with good theoretical properties and tractable inference. An efficient algorithm that estimates the mixture proportion as well as the parameters of the unknown component is proposed; a simplified sketch appears below. We illustrate its performance using a well-designed simulation study.
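To fix ideas, here is a simplified sketch of such an algorithm, assuming for brevity that the unknown component is plain normal rather than skew normal (the talk's algorithm targets the skew normal family); f0_pdf is the density of the known component.

    import numpy as np
    from scipy.stats import norm

    def em_known_component(x, f0_pdf, n_iter=200):
        # Mixture alpha * f1 + (1 - alpha) * f0, with f0 known and f1 ~ N(mu, sd).
        alpha, mu, sd = 0.5, float(np.mean(x)), float(np.std(x))
        for _ in range(n_iter):
            f1 = norm.pdf(x, mu, sd)
            w = alpha * f1 / (alpha * f1 + (1.0 - alpha) * f0_pdf(x))  # E-step
            alpha = float(np.mean(w))                                  # M-step
            mu = float(np.sum(w * x) / np.sum(w))
            sd = float(np.sqrt(np.sum(w * (x - mu) ** 2) / np.sum(w)))
        return alpha, mu, sd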

TI_21_0

Li, Daoji

California State University Fullerton 

Title

Big Data and Dimension Reduction  

This session will present recent advances in big data and dimension reduction, including optimal subsampling for massive data, a scalable spectral clustering framework, robust PCA, and high-dimensional interaction detection.

TI_21_4

Li, Daoji

California State University Fullerton

Title

High-dimensional interaction detection with false sign rate control 

Understanding how features interact with each other is of paramount importance in many scientific discoveries and contemporary applications. Yet interaction identification becomes challenging even for a moderate number of covariates. In this paper, we suggest an efficient and flexible procedure for interaction identification in ultra-high dimensions. Under a fairly general framework, we establish that for both interactions and main effects, the method enjoys oracle inequalities in selection. We prove that our method admits an explicit bound on the false sign rate, which can be asymptotically vanishing. Our method and theoretical results are supported by several simulation and real data examples. 

TI_48_2

Li, Keren

Northwestern University

Title

Score-Matching Representative Approach for Big Data Analysis with Generalized Linear Models

We propose a fast and efficient strategy, called the representative approach, for big data analysis with linear models and generalized linear models. Given a partition of the big dataset, this approach constructs a representative data point for each data block and fits the target model using the representative dataset. In terms of time complexity, it is as fast as the subsampling approaches in the literature; as for efficiency, its accuracy in estimating parameters is better than that of the divide-and-conquer method. Based on comprehensive simulation studies and theoretical justifications, we recommend two representative approaches. For linear models, or generalized linear models with a flat inverse link function and moderate coefficients for the continuous variables, we recommend mean representatives (MR). For other cases, we recommend score-matching representatives (SMR). In an illustrative application to the Airline on-time performance data, MR and SMR are as good as the full-data estimate when it is available. Furthermore, the proposed representative strategy is ideal for analyzing massive data dispersed over a network of interconnected computers.
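A minimal sketch of the mean-representative (MR) idea for a linear model, under our own naming: each block of the partition is collapsed to its coordinate-wise mean, and the model is fit by weighted least squares with block sizes as weights. The SMR variant replaces these means with score-matching representatives.

    import numpy as np

    def mr_linear_fit(blocks):
        # blocks: iterable of (X_b, y_b) pairs, one pair per block of the partition
        reps_X, reps_y, sizes = [], [], []
        for Xb, yb in blocks:
            reps_X.append(Xb.mean(axis=0))   # representative covariate vector
            reps_y.append(yb.mean())         # representative response
            sizes.append(len(yb))
        X, y = np.asarray(reps_X), np.asarray(reps_y)
        w = np.sqrt(np.asarray(sizes, dtype=float))
        # weighted least squares: each representative counts for its block size
        beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
        return beta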

TI_46_0

Lio, Yuhlong

University of South Dakota

Title

Statistical Modeling for Degradation Data II

In recent years, statistical modeling and inference techniques have been developed based on different degradation measures. This invited session is based on the book “Statistical Modeling for Degradation Data,” co-edited by Professors Ding-Geng (Din) Chen, Yuhlong Lio, Hon Keung Tony Ng, and Tzong-Ru Tsai, and published by Springer in 2017. The book strives to bring together experts engaged in statistical modeling and inference to present and discuss the most recent important advances in degradation data analysis and related applications. The speakers in this session are contributors to this book and will further present their recent developments in this research area.

TI_32_3

Lio, Yuhlong

University of South Dakota

Title

Estimation of Stress-Strength for Burr XII distribution based on the progressively first failure-censored samples 

Stress-strength reliability is studied under progressively first-failure-censored samples from Burr XII distributions. Confidence intervals for the stress-strength reliability, constructed using various procedures, are discussed. Some computational results from a simulation study are presented, and an illustrative example is provided for demonstration.

TI_40_2

Liu, Ruiqi

Indiana University Purdue University Indianapolis

Title

Optimal Nonparametric Inference via Deep Neural Network 

The deep neural network is a state-of-the-art method in modern science and technology. Much statistical literature has been devoted to understanding its performance in nonparametric estimation, but existing results are suboptimal due to a redundant logarithmic sacrifice. In this work, we show that such log-factors are not necessary. We derive upper bounds for the L^2 minimax risk in nonparametric estimation. Sufficient conditions on network architectures are provided such that the upper bounds become optimal (without the log-sacrifice). Our proof relies on an explicitly constructed network estimator based on tensor-product B-splines. We also derive asymptotic distributions for the constructed network and a related hypothesis testing procedure. The testing procedure is further proven to be minimax optimal under suitable network architectures.

TI_47_2

Long, Hongwei

Florida Atlantic University, Boca Raton, FL.

Title

The Beta Transmuted Pareto Distribution: Theory and Applications 

In this talk, we present a composite generalizer of the Pareto distribution. The genesis of the beta distribution and transmuted map is used to develop the so-called beta transmuted Pareto (BTP) distribution. Several mathematical properties including moments, mean deviation, probability weighted moments, residual life, distribution of order statistics and the reliability analysis are discussed. The method of maximum likelihood is proposed to estimate the parameters of the distribution. We illustrate the usefulness of the proposed distribution by presenting its application to model real-life data sets. 

TI_33_2

Mailhot, Melina

Concordia University

Title

Multivariate geometric expectiles and range value-at-risk

Geometric generalizations of expectiles and Range Value-at-Risk for d-dimensional multivariate distribution functions will be introduced. Multivariate geometric expectiles are unique solutions to a convex risk minimization problem and are given by d-dimensional vectors. Multivariate geometric Range Value-at-Risk is also a risk measure that considers tail events, and it has TVaR as a special case. These measures are well behaved under common data transformations. Properties and highlights of the influence of varying margins and dependence structures will be presented.

TI_8_2

Maity, Arnab Kumar

Pfizer Inc.  

Title

Bayesian Data Integration and Variable Selection for Pan-Cancer Survival Prediction using Protein Expression Data 

Accurate prognostic prediction using molecular information is a challenging area of research that is essential for developing precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, such as the survival time of the patients. There are considerable statistical and computational challenges due to the large dimension of the problems. Furthermore, the data are available for different tumor types; hence, data integration across tumors is desirable. Censored survival outcomes add one more level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models that accommodate all of the aforementioned challenges. We use a hierarchical Bayesian accelerated failure time (AFT) model for the survival regression and assume a sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated The Cancer Proteome Atlas (TCPA), which contains RPPA-based high-quality protein expression data as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors through the correlated prior structures.

TI_30_2

Makubate, Boikanyo

Botswana International university of Science and Technology 

Title

A New Generalized Weibull Distribution with Applications to Lifetime Data 

A new generalized Weibull-type distribution is developed and presented, and its properties are explored in detail. Estimation techniques, including the maximum likelihood method, are used to estimate the model parameters, and applications of the model to real data sets are presented to illustrate the usefulness of the proposed generalized distribution.

TI_14_4

Mallick, Avishek

Marshall University, West Virginia 

Title

An Inflated Geometric Distribution and its application  

Count data with an excess number of zeros, ones, twos, or threes are commonplace in experimental studies, but these inflated frequencies at particular counts may lead to overdispersion and can complicate data analysis. To obtain appropriate results and overcome possible anomalies in parameter estimation, a suitable inflated distribution may be needed. The inflated Poisson and inflated negative binomial distributions are most commonly used for modeling and analyzing such data; the geometric distribution is a special case of the negative binomial. This work deals with parameter estimation for a geometric distribution inflated at certain counts, which we call the Generalized Inflated Geometric (GIG) distribution. Parameter estimation is carried out using the method of moments, an empirical probability generating function based method, and the maximum likelihood approach. The three types of estimators are compared using simulation studies, and finally a Swedish fertility data set is modeled using a GIG distribution.
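One natural parameterization of such a pmf (our notation, not necessarily the authors') places extra point masses at the inflated counts on top of a deflated geometric baseline:

    def gig_pmf(y, p, inflation):
        # p: geometric success probability, support 0, 1, 2, ...
        # inflation: dict {count: extra mass}, with the masses summing to < 1
        geometric = (1.0 - p) ** y * p            # baseline geometric pmf
        deflate = 1.0 - sum(inflation.values())   # remaining geometric weight
        return deflate * geometric + inflation.get(y, 0.0)

    # e.g. a geometric distribution inflated at 0 and 2:
    # gig_pmf(0, p=0.4, inflation={0: 0.15, 2: 0.05})

By construction the masses still sum to one, since the inflation weights are taken out of the geometric part.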

TI_42_1

Mandal, Saumen

University of Manitoba 

Title

Constrained optimal designs for estimating probabilities in contingency tables 

Construction of optimizing probability distributions plays an important role in many areas of statistical research. One example is the estimation of cell probabilities in contingency tables. It is well known that unconstrained maximum likelihood estimation of the cell probabilities is quite straightforward; however, the presence of constraints on the probabilities makes the problem quite challenging. For example, the constraints could be based on a hypothesis of marginal homogeneity. In this work, we solve the constrained maximum likelihood problem using optimal design theory, Lagrangian theory, and simultaneous optimization techniques. This is an optimization problem with respect to variables that satisfy several constraints. We first formulate the Lagrangian function with the constraints and then transform the problem into that of maximizing a number of functions of the cell probabilities simultaneously. These functions have a common maximum of zero that is simultaneously attained at the optimum. We then apply the methodology to some real data sets. Finally, we discuss how our approach is flexible and provides a unified framework for various types of constrained optimization problems.

TI_23_0

Marques, Filipe

Universidade NOVA de Lisboa, Portugal

Title

Advances in distribution theory and statistical methodologies

 

TI_24_0

McKean, Joseph

Western Michigan University 

Title

Big Data: Algorithms, Methodology, and Applications 

Statisticians and data scientists must face the challenges of big data. In these talks, new algorithms and procedures (robust and traditional) are discussed to handle these challenges. Algorithm optimization in terms of error distributions is discussed. Application areas covered include astronomical data, network analysis, and numerical integration.

TI_10_1

Mdziniso, Nonhle Channon

Bloomsburg University of Pennsylvania 

Title

Odd Pareto families of distributions for modeling loss payment data

A three-parameter Odd Pareto (OP) distribution is presented, with a density function having a flexible upper tail for modeling loss payment data. The OP distribution is derived by considering the distributions of the odds of the Pareto and inverse Pareto distributions. Basic properties of the OP distribution are studied. Simulation studies based on the maximum likelihood method are conducted to compare the OP with other Pareto-type distributions. Furthermore, examples from the Norwegian fire insurance claims data set are provided to illustrate the upper-tail flexibility of the distribution. Extensions of the Odd Pareto distribution are also considered to improve the fitting of data.

TI_46_3

Melnykov, Volodymyr

The University of Alabama

Title

On Model-Based Clustering of Time-Dependent Categorical Sequences

Clustering categorical sequences is an important problem that arises in many fields such as medicine, sociology, and economics. It is a challenging task due to the lack of techniques for clustering categorical data, as the majority of traditional clustering procedures are designed for handling quantitative observations. Situations in which categorical data are related to time are even more troublesome. We propose a mixture-based approach for clustering categorical sequences and apply the developed methodology to a real-life data set containing sequences of life events for respondents participating in the British Household Panel Survey.

TI_25_4

Melnykov, Igor

Colorado State University

Title

Positive and negative equivalence constraints in the semi-supervised K-means algorithm

The K-means algorithm is a widely used clustering procedure thanks to its intuitive design and computational simplicity. The objective function of the algorithm has a clear interpretation when the algorithm is applied as an unsupervised method. In a semi-supervised setting, when certain restrictions are imposed on the solution, modifications of the objective function are necessary. We consider two classes of equivalence constraints that may influence the proposed clustering solution. We propose a method making both kinds of restrictions part of the fabric of the algorithm and provide the necessary modifications of its objective function.
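For contrast with the objective-function route taken here, the classic hard-constraint treatment (in the spirit of COP-k-means, not the authors' method) simply restricts the assignment step: a point may only join the nearest center that violates none of its constraints. A hypothetical sketch:

    import numpy as np

    def constrained_assign(i, X, centers, labels, must_link, cannot_link):
        # Try centers from nearest to farthest; return the first feasible one.
        # labels uses -1 for points that have not been assigned yet.
        order = np.argsort(np.linalg.norm(centers - X[i], axis=1))
        for c in order:
            ok_must = all(labels[j] in (c, -1) for j in must_link.get(i, ()))
            ok_cannot = all(labels[j] != c for j in cannot_link.get(i, ()))
            if ok_must and ok_cannot:
                return int(c)
        return -1  # no feasible cluster: constraints are infeasible for point i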

TI_25_0

Melnykov, Volodymyr

The University of Alabama

Title

New developments in finite mixture modeling with applications

Finite mixtures present a flexible tool for modeling heterogeneity in data. Model-based cluster analysis is the most famous application of mixture models. The session covers novel methodological developments in this area and considers various applications.

TI_25_1

Melnykov, Yana

The University of Alabama

Title

On finite mixture modeling of processes with change points

We consider a novel framework for modeling heterogeneous processes with change points. The proposed finite mixture model can effectively take into account the potential presence of change points. Conducted simulation studies show that the model can correctly assess the mixture order as well as the location of change points within mixture components. The application to real-life data yields promising results.

TI_25_2

Michael, Semhar

South Dakota State University

Title

Finite mixture of regression models for data from complex survey design

We explored the use of finite mixture regression models when the samples were drawn using a stratified sampling design. We developed a new design-based inference in which sampling weights are integrated into the complete-data log-likelihood function; the expectation-maximization algorithm was derived accordingly. A simulation study was conducted to compare the proposed method with the standard finite mixture of regressions model, using the bias and variance components of the mean squared error, with interesting results. Additionally, a simulation study was conducted to assess the ability of the Bayesian information criterion to select the optimal number of components under the proposed modeling approach.

TI_34_3

Mohsenipour, Akbar

Vivametrica 

Title

Approximating the distribution of various types of quadratic expressions on the basis of their moments 

Several moment-based approximations to the distribution of various types of quadratic forms and expressions, including those in singular Gaussian and in elliptically contoured random vectors are proposed. In the normal case, the moments are obtained recursively from the cumulants and the distribution of positive definite quadratic forms is approximated by means of two and three-parameter gamma-type distributions. Approximations to the density functions of Hermitian quadratic forms in normal vectors and quadratic forms in order statistics from a uniform population are provided as well. 

TI_27_0

Muthukumarana, Saman

University of Manitoba 

Title

Bayesian Methods with Applications 

This session will highlight the use of Bayesian modelling and inferential methods in discovering genetic associations with diseases, image analysis, the study of animal populations, and sports. Bayesian regression tree models, latent ancestral tree models, semi-parametric Bayesian methods using the Dirichlet process, and Bayesian models for photographic identification in animal populations are discussed.

TI_27_4

Muthukumarana, Saman

University of Manitoba 

Title

Model Based Estimation of Baseball Batting Metrics 

We consider the modeling of batting outcomes of baseball batters using a weighted likelihood approach and a semi-parametric Bayesian approach. The weighted likelihood allows the other batters to contribute to the inference so that the relevant information they contain is not lost; the weights are determined by their dissimilarities with the target batter, with Minimum Averaged Mean Squared Error (MAMSE) weights used as the likelihood weights. We then propose a semi-parametric Bayesian approach based on the Dirichlet process that enables borrowing information across batters. We demonstrate and compare these approaches using 2018 Major League Baseball data.

TI_28_0

Nayak, Tapan

George Washington University 

Title

Protection of Respondents' Privacy and Data Confidentiality 

Protecting respondents' privacy and data confidentiality has become a very important topic in recent years. This session is devoted to discussing recent developments in this area.

TI_28_4

Nayak, Tapan

George Washington University 

Title

Discussion 

I shall present some concluding remarks on protecting respondents' privacy and data confidentiality.

TI_22_1

Ng, Hon Keung Tony

Southern Methodist University

Title

Improved Techniques for Parametric and Nonparametric Evaluations of the First-Passage Time of Degradation Processes

Determining the first-passage time (FPT) distribution is an important topic in reliability analysis based on degradation data, because the FPT distribution provides valuable information on reliability characteristics. In this paper, we propose some improved techniques based on saddlepoint approximation to improve upon existing methods for approximating the FPT distribution of degradation processes. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. The limitations of the improved techniques are discussed, and some possible solutions to these limitations are proposed. Concluding remarks and practical recommendations are provided based on the results.
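Saddlepoint techniques of the kind being improved here typically start from the Lugannani-Rice approximation to a distribution function: with cumulant generating function K and saddlepoint \hat{s} solving K'(\hat{s}) = x,

\[ \hat{F}(x) \;=\; \Phi(\hat{w}) + \phi(\hat{w})\left(\frac{1}{\hat{w}} - \frac{1}{\hat{u}}\right), \qquad \hat{w} = \mathrm{sgn}(\hat{s})\sqrt{2\{\hat{s}x - K(\hat{s})\}}, \quad \hat{u} = \hat{s}\sqrt{K''(\hat{s})}, \]

where \Phi and \phi are the standard normal distribution and density functions.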

TI_32_0

Ng, Hon Keung Tony

Southern Methodist University

Title

Statistical Models and Methods for Analysis of Reliability and Survival Data 

This session focuses on statistical methodologies for analyzing different kinds of reliability and survival data in industrial and medical studies. These methods are important to reliability engineers and medical researchers because they make the extraction of lifetime characteristics possible through suitable statistical analysis and lead to better decision making.

TI_4_4

Nguyen, Yet

Old Dominion University

Title

A Histogram-Based Method for False Discovery Rate Control in Two Independent Experiments

In this talk, we present a new method to estimate and control the false discovery rate (FDR) when identifying simultaneous signals in two independent experiments. In one experiment, thousands or millions of features are tested for significance with respect to some factor of interest; in a second experiment, the same features are also tested for significance. Researchers are interested in identifying simultaneous signals, i.e., features that are significant in both experiments. We develop an FDR estimation and control procedure that generalizes the histogram-based FDR estimation and control procedure for one experiment. Asymptotic results and simulation studies are presented to investigate the performance of the proposed method and other existing methods.

TI_34_2

Nkurunziza, Sévérien

University of Windsor 

Title

Some identities for the risk and bias of shrinkage-type estimators in elliptically contoured distributions 

We consider an estimation problem regarding the mean of a random matrix whose distribution is elliptically contoured. In particular, we study the properties of a class of multidimensional shrinkage-type estimators in the context where the variance-covariance matrix of the shrinking random component is the sum of two Kronecker products. We present some identities for computing some mixed moments as well as two general formulas for the bias and risk functions of shrinkage-type estimators. As a by-product, we generalize some identities established in Gaussian sample cases for which the shrinking random component is represented by a single Kronecker product.  

TI_36_2

Nolan, John

American University 

Title

Multivariate Generalized Logistic Laws 

Multivariate Fréchet laws are a class of extreme value distributions that exhibit heavy tails and directional dependence controlled by an angular measure.  Multivariate generalized logistic laws are a recently described sub-class that are dense in a certain sense.  It is shown that these laws are related to positive multivariate sum stable laws, which gives a way to simulate from these laws.  The corresponding angular measure density is described, and expressions for the density of the distribution are given. 

TI_13_4

Olufadi, Yunusa

University of Memphis

Title

EM Bayesian variable selection for clustered discrete and continuous outcomes 

Feature selection for Gaussian and non-Gaussian linear models is common in the literature. However, to our knowledge, there are scant reports on clustered discrete and continuous outcomes that are high-dimensional. Mixed-outcome data of this kind are becoming increasingly common in developmental toxicity (DT) studies and several other settings. In a toxico-epigenomics study, for example, the interest might be to extract biomarkers of DT or to detect new biomarkers of DT. We develop a Bayesian hierarchical modeling procedure to guide both the estimation and the efficient extraction of the most useful features.

TI_30_0

Oluyede, Broderick

Georgia Southern University 

Title

Copulas, Informational Energy, Exponential Dominance and Uncertainty for Generalized and Multivariate Distributions

Copulas, exponential dominance, and uncertainty for generalized distributions are explored, and comparisons via the informational energy functional and differential entropy are presented in this session. The first talk deals with stochastic dominance and bounds for cross-discrimination and uncertainty measures for weighted reliability functions. In the second talk, new generalized distributions are developed. In the third talk, a change point model for highly correlated multivariate data via copula conditional distributions, principal component analysis (PCA), and functional PCA is presented. Finally, the last presentation deals with a class of stochastic SEIRS epidemic dynamic models.

TI_30_1

Oluyede, Broderick

Georgia Southern University 

Title

Informational Energy, Stochastic Inequalities and Bounds for Weighted Weibull-Type Distributions. 

In this talk, generalized distributions that are weighted distributions are presented.  Inequalities and dominance, uncertainty and informational measures for weighted and parent generalized Weibull-type distributions are developed. Comparisons of the weighted and parent generalized Weibull-type distributions via informational energy function and the differential entropy are presented. Moment-type and stochastic inequalities as well as bounds for cross-discrimination and uncertainty measures in weighted and parent life distribution functions and related reliability measures are given. 

TI_31_0

Omolo, Bernard

University of South Carolina – Upstate

Title

Statistical Methods for High‐Dimensional Data Analysis: Application to Genomics 

 

TI_31_1

Omolo, Bernard

University of South Carolina – Upstate

Title

A Model-based Approach to Genetic Association Testing in Malaria Studies

In this study, we propose a two-step approach to genetic association testing in malaria studies in a GWAS setting that may enhance the power of the tests by identifying the underlying genetic model before applying the association tests. This is performed through tests of significance of a given genetic effect, noting the minimum p-values across all the models and the proportion of tests in which a given genetic model was deemed the best, using simulated data. In addition, we fit generalized linear models for the genetic effects using case-control genotype data from Kenya, Gambia, and Malawi, available from MalariaGEN®.

TI_1_2

Oraby, Tamer

University of Texas - Rio Grande Valley  

Title

Modeling Progression of Co-Morbidity Using Bivariate Markov Chains 

In this work, we use a bivariate Markov chain (MC) to model the progression of two diseases or morbidities, such as obesity and diabetes, and the correlation between the two processes. We postulate that the MC has transition rates that depend on a set of covariates, such as age and gender, as well as treatment. The data include individuals who are dependent due to familial relationships. We will present the estimation of the model's parameters and discuss its goodness of fit.

TI_18_3

Otunuga, Olusegun

Marshall University

Title

Closed form probability distribution of number of infections at a given time in a stochastic SIS epidemic model

We study the effect of external fluctuations in the transmission rate of certain diseases and how this perturbation affects the distribution of the number of infections over time. To do this, we introduce random noise into the transmission rate of a deterministic SIS model and study how the number of infections behaves over time. The closed-form probability distribution of the number of infections at a given time in the resulting stochastic SIS epidemic model is derived. Using the Fokker-Planck equation, we reduce the differential equation governing the number of infections to a generalized Laguerre differential equation. The distribution is demonstrated using U.S. influenza data.
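A small simulation sketch of this kind of model (our formulation: the proportion infected I follows an SIS diffusion with the noise entering through the transmission rate; parameter names and the Euler-Maruyama scheme are ours, not the talk's):

    import numpy as np

    def simulate_sis(beta, gamma, sigma, i0, t_max, dt=0.01, seed=0):
        # dI = [beta*I*(1-I) - gamma*I] dt + sigma*I*(1-I) dW
        rng = np.random.default_rng(seed)
        n = int(t_max / dt)
        I = np.empty(n + 1)
        I[0] = i0
        for k in range(n):
            drift = beta * I[k] * (1 - I[k]) - gamma * I[k]
            diffusion = sigma * I[k] * (1 - I[k])
            I[k + 1] = I[k] + drift * dt + diffusion * rng.normal(0.0, np.sqrt(dt))
            I[k + 1] = min(max(I[k + 1], 0.0), 1.0)   # keep the proportion in [0, 1]
        return I

    # e.g. one five-year trajectory: simulate_sis(0.5, 0.2, 0.1, i0=0.01, t_max=1825)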

TI_23_1

Oyamakin, S. O.

Universidade de São Paulo 

Title

Some New Nonlinear Growth Models For Biological Processes based on Hyperbolic Sine Function

In this paper, we propose maximum a posteriori (MAP) estimators for the parameters of some survival distributions, which have simple closed-form expressions. In principle, we focus on the Nakagami distribution, which plays an essential role in communication engineering problems, particularly in modeling the fading of radio signals. Moreover, we show that the obtained results can be extended to other survival probability distributions, such as the gamma and generalized gamma distributions. Numerical results reveal that the MAP estimators outperform the existing estimators and produce almost unbiased estimates even for small sample sizes. Our applications are driven by embedded systems, which are commonly used in communication engineering. In particular, they can consist of an electronic system inside a microcontroller, which can be programmed to maintain communication between a transmitting antenna and mobile antennas operating at the same frequency. In this context, from the statistical point of view, closed-form estimators are needed, since they are embedded in mobile devices and must be sequentially recalculated in real time.

TI_6_4

Ozdemir, Senay

Afyon Kocatepe University   

Title

Combining Heavy-Tailed Distributions and the Empirical Likelihood Method for the Linear Regression Model

The empirical likelihood (EL) estimation method proposed by Owen (1991) is one of the nonparametric methods for estimating the parameters of a linear regression model. In the EL method, an EL function is maximized under constraints formed from the likelihood scores under normally distributed errors. In this paper, an alternative EL estimator for the parameter vector of a linear regression model is proposed, using the score functions of some popular heavy-tailed distributions as the constraints in the EL estimation method. Our numerical studies show that, when the data are subject to heavy-tailedness, the performance of the proposed EL estimator is remarkably superior to that of the EL estimator obtained under normally distributed error terms.

TI_32_4

Pal, Suvra

University of Texas at Arlington

Title

A New Estimation Algorithm for a Flexible Cure Rate Model 

In this talk, I will first present a flexible cure rate model that contains the mixture cure rate model and promotion time cure rate model as special cases. For the estimation of the model parameters, I will present the results of the well-known EM algorithm and then discuss some of the issues associated with the EM algorithm. To circumvent these issues, I will present a new optimization procedure based on non-linear conjugate gradient (NCG) algorithm. Through a simulation study, I will show the advantages of NCG algorithm over the EM algorithm. 

TI_41_2

Pal, Subhadip 

University of Louisville 

Title

A Bayesian Framework for Modeling Data on the Stiefel Manifold

Directional data emerge in a wide array of applications, ranging from atmospheric sciences to medical imaging. Modeling such data, however, poses unique challenges by virtue of their being constrained to non-Euclidean spaces such as manifolds. Here, we present a Bayesian framework for inference on the Stiefel manifold using the Matrix Langevin distribution. Specifically, we propose a novel family of conjugate priors and establish a number of theoretical properties relevant to statistical inference. Conjugacy enables the translation of these properties to the corresponding posteriors, which we exploit to develop the posterior inference scheme. For the implementation of the posterior computation, including posterior sampling, we adopt a novel computational procedure for evaluating the hypergeometric function of matrix argument that appears as a normalizing constant in the relevant densities.

TI_18_2

Panorska, Anna K.

University of Nevada, Reno 

Title

Discrete Pareto Distributions, Butterfly Diet Breadth, and Climate Change 

We propose a new discrete distribution with finite support, which generalizes truncated Pareto and beta distributions as well as uniform and Benford’s laws. We present its fundamental properties and consider parameter estimation. We include an illustration of the applications of this new stochastic model in ecology. 

TI_37_3

Pararai, Mavis

Indiana University of Pennsylvania

Title

The Weibull Linear Failure Rate Distribution and Its Applications

A new distribution called the Weibull Linear Failure Rate distribution is introduced, and the properties of this distribution and its sub-models are explored. Statistical properties of the proposed distribution and maximum likelihood estimation of its parameters are discussed. A simulation study examining the bias and mean square error of the maximum likelihood estimators of each parameter is presented. Finally, an application of the model to a real data set illustrates its usefulness.

TI_38_0

Peng, Hanxiang

Binghamton University

Title

Empirical Likelihood 

The session addresses topics centered around the empirical likelihood approach. 

TI_38_1

Peng, Hanxiang

Indiana University-Purdue University Indianapolis

Title

Maximum empirical likelihood estimation in U-statistics based general estimating equations. 

In this talk, we discuss maximum empirical likelihood estimates (MELEs) in U-statistics based general estimating equations. Our approach is the jackknife empirical likelihood (JEL). We derive the estimating equations for MELEs and establish asymptotic normality. We provide a class of MELEs that have less computational burden than the usual MELEs and can be implemented using existing software. We show that the MELEs are efficient. We present several examples of constructing efficient estimates for moment-based distribution characteristics in the presence of side information. In the end, we report some simulation results.

TI_13_2

Peng, Hanxiang

Indiana University-Purdue University Indianapolis

Title

An Empirical Likelihood Approach to Testing Multivariate Symmetries

We propose several empirical likelihood tests for spherical symmetry, rotational symmetry, antipodal symmetry, coordinate-wise symmetry, and exchangeability. We construct the tests by exploiting characterizations of these symmetries. The jackknife empirical likelihood for vector U-statistics is employed to incorporate side information. We show that the tests are distribution free and asymptotically chi-square distributed. We report some simulation results on the numerical performance of the tests.

TI_26_0

Peng, Jianan

Acadia University 

Title

Generalized and Fiducial Inference with Applications

Generalized inference, introduced by Weerahandi, has many applications. Fiducial inference, initiated by Fisher, is enjoying a revival, largely due to Hannig and other researchers. In this session we have two talks (including the one by Weerahandi) on generalized inference and two talks on (generalized) fiducial inference.

TI_26_4

Peng, Jianan

Acadia University 

Title

Successive Comparisons for One-way Layout under Heteroscedasticity 

Suppose that k (k>2) treatments in a one-way layout are ordered in a certain way. For example, the treatments may be increasing dose levels of a drug in dose-response studies. The experimenters may be interested in successive comparisons of the treatments. In this talk, we consider simultaneous confidence intervals for the successive comparisons under heteroscedasticity. We propose several methods, including the maxT method, the minP method, and the generalized fiducial confidence intervals, among others. We show that the generalized fiducial confidence intervals have asymptotically correct coverage probability. A simulation study and a real data example are given to illustrate the proposed procedures.

TI_11_1

Peng, Stephen

Georgetown University 

Title

A Flexible Univariate Autoregressive Time-Series Model for Dispersed Count Data 

Integer-valued time series data have an ever-increasing presence in various applications and need to be analyzed properly. While a Poisson autoregressive (PAR) model would seem like a natural choice to model such data, it is constrained by the equi-dispersion assumption. Hence, data that are over- or under-dispersed are improperly modeled, resulting in biased estimates and inaccurate forecasts. This work (coauthored by Stephen Peng and Ali Arab) instead develops a flexible integer-valued autoregressive (INAR) model for count data that contain over- or under-dispersion. Using the Conway-Maxwell-Poisson (COM-Poisson or CMP) distribution and related distributions as motivation, we develop a first-order sum-of-Conway-Maxwell-Poisson autoregressive (SCMPAR(1)) model that will instead offer a generalizable construct that captures the PAR, negative binomial AR (NBAR), and binomial AR (BAR) models respectively as special cases, and serve as an overarching representation connecting these three special cases through the dispersion parameter. We illustrate the SCMPAR model's flexibility through simulated and real data examples. 
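For readers unfamiliar with the CMP family underpinning the SCMPAR construction, a small sketch of its pmf, p(y) proportional to λ^y/(y!)^ν, where ν = 1 recovers the Poisson, ν < 1 gives over-dispersion, and ν > 1 gives under-dispersion (the truncation point K below is a numerical convenience, not part of the model):

    import numpy as np
    from scipy.special import gammaln

    def cmp_pmf(y, lam, nu, K=200):
        """Conway-Maxwell-Poisson pmf, normalized by truncating the series at K."""
        j = np.arange(K + 1)
        log_terms = j * np.log(lam) - nu * gammaln(j + 1)   # log of lambda^j / (j!)^nu
        log_Z = np.logaddexp.reduce(log_terms)              # log normalizing constant
        return np.exp(y * np.log(lam) - nu * gammaln(y + 1) - log_Z)

    y = np.arange(40)
    for nu in (0.5, 1.0, 2.0):   # over-, equi-, and under-dispersed cases
        p = cmp_pmf(y, lam=3.0, nu=nu)
        mean = (y * p).sum(); var = ((y - mean) ** 2 * p).sum()
        print(f"nu={nu}: mean={mean:.2f}, var={var:.2f}")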

TI_17_2

Phoa, Frederick

Academia Sinica 

Title

A systematic construction of cost-efficient designs for order-of-addition experiments

An order-of-addition (OofA) experiment investigates how the order of factor inputs affects the experimental response, which is of great interest in clinical trials and industrial processes. Recent studies of OofA designs have focused on algebraic optimality rather than cost-efficiency. In this talk, we propose a systematic construction of cost-efficient OofA designs in which each pair of level settings from two different factors appears exactly once. Furthermore, unlike recent studies on OofA experiments, our designs can handle experimental factors with more than one level. Note that the use of a placebo, or the choice among different doses, reveals the practicality of our designs in clinical trials, for example.

TI_33_0

Pigeon, Mathieu

Université du Québec à Montréal (UQAM) , Canada

Title

Recent developments in predictive distribution modelling with applications in insurance 

 

TI_23_3

Piperigou, Violetta

University of Patras, Greece

Title

Maximum Likelihood Estimators for a Class of Bivariate Discrete Distributions

"The method of maximum likelihood (ML) yields estimators which, asymptotically, are normally distributed, unbiased and with minimum variance. In this method, computational difficulties are encountered when families of univariate discrete distributions are considered such as convolutions and compound distributions. For these types of distributions the probabilities are given through recurrence relations and consequently the ML estimators require iterative procedures to be obtained. It has been shown that in a large class of univariate discrete distributions, the ML equations can be reduced by one, which is replaced by the first equation of the method of moments. As examples of two-parameter distributions the Charlier and the Neyman are presented, where only a single equation need be solved iteratively to derive the estimators. The parameterization used, when working with these distributions, often leads to extremely high correlations of the ML estimators. A reparameterization that reduces or eliminates such correlation is desirable. If the MLE's are asymptotically uncorrelated the parameterization is orthogonal. It is discussed such a reparameterization for a class of discrete distributions, where one of the orthogonal parameters is the mean. This class includes, among

others, Delaporte and Hermite univariate distributions. These results are extended to a class of bivariate discrete distributions and the properties of MLE's are given. The case of a three-parameter bivariate Poisson is extensively discussed and some

examples of applications are given."

TI_47_4

Pokhrel, Keshav P.

University of Michigan-Dearborn 

Title

Reliability Models Using the Composite Generalizers of Weibull Distribution 

In this article, we study the composite generalizers of Weibull distribution using exponentiated, Kumaraswamy, transmuted and beta distributions. The composite generalizers are constructed using both forward and reverse order of each of these distributions. The usefulness and effectiveness of the composite generalizers and their order of composition is investigated by studying the reliability behavior of the resulting distributions.  Two sets of real-world data are analyzed using the proposed generalized Weibull distributions. 

TI_27_2

Pratola, Matthew

The Ohio State University 

Title

Adaptive Splitting Bayesian Regression Tree Models for Image Analysis 

Bayesian regression tree models are competitive with leading machine learning algorithms yet retain the ability to capture uncertainties, making them incredibly useful for many modern statistical applications where one requires more than point prediction. However, a key limitation is the variable split rules, which are determined using static candidates. This limits the ability of the model to capture local sources of variation, and increasing the number of candidates is computationally burdensome. We introduce a novel adaptive strategy that replaces static splits with a dynamic grid that allows the tree bases to adapt, thereby more efficiently capturing patterns of local variation. Combining this with a clever dimension-reduction prior enables low-dimensional tree representations of processes. We demonstrate these advances on an image analysis study investigating beach visitor counts in San Diego.

TI_34_0

Provost, Serge

The University of Western Ontario 

Title

Recent Distributional Advances Involving Population and Sample Moments 

This session features novel advances in connection with the application of certain moment-based methodologies to data modeling, the approximation of the distribution of quadratic forms and the estimation of heavy-tailed distributions. As well, a shrinkage-type estimator of the mean of an elliptically contoured random vector is introduced. 

TI_34_1

Provost, Serge

The University of Western Ontario 

Title

On recovering sample points from their associated moments and certain moment-based density estimation methodologies 

A theorem asserting that, given the first n moments of a sample of size n, one can retrieve the original n sample points will be discussed. In particular, this result entails that all the information available in a sample of size n is contained in its first n moments, which substantiates the utilization of sample moments in statistical modeling and inference. Clearly, only a number of these n moments are usable in practice. Certain density estimation methodologies relying on such sample moments will be presented.
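One standard route to such a recovery (not necessarily the talk's own proof) converts the raw sample moments into power sums, applies Newton's identities to obtain the elementary symmetric polynomials, and reads the sample points off as roots of the associated monic polynomial; a small sketch:

    import numpy as np

    def points_from_moments(m):
        """Recover n sample points from their first n raw sample moments m[k-1] = mean(x**k)."""
        n = len(m)
        p = n * np.asarray(m)             # power sums p_k = sum_i x_i^k
        e = np.zeros(n + 1); e[0] = 1.0   # elementary symmetric polynomials
        for k in range(1, n + 1):         # Newton's identities
            e[k] = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1)) / k
        coeffs = [(-1) ** k * e[k] for k in range(n + 1)]  # x^n - e1 x^(n-1) + e2 x^(n-2) - ...
        return np.sort(np.roots(coeffs).real)

    x = np.array([1.3, 2.7, 4.1, 5.0])
    m = [np.mean(x ** k) for k in range(1, len(x) + 1)]
    print(points_from_moments(m))   # recovers [1.3, 2.7, 4.1, 5.0]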

TI_35_0

Qingcong Yuan (org: Qian, Lianfen)

University of Kentucky

Title

Recent Advances in Analyzing Medical Data and Dimension Reduction 

The purpose of this invited session is to disseminate the most recent advances in analyzing medical data and in dimension reduction methods. Specific areas of interest include modeling semi-competing risks data, imputation methods for missing data, and dimension reduction.

TI_17_3

Rha, Hyungmin

Arizona State University 

Title

A probabilistic subset search (PSS) algorithm for optimizing functional data sampling designs

We study optimal sampling times for functional data. Our main objective is to find the best sampling schedule on the predictor time axis to precisely recover the trajectory of predictor function and predict the scalar/functional response through functional linear regression models. Three optimal designs are considered: the schedule maximizing the precision of recovering predictor function, the schedule best for predicting response, and the schedule optimizing a user-defined mixture of the relative efficiencies of the two objectives. We propose an algorithm that can efficiently generate nearly optimal designs, and demonstrate that our approach outperforms the previously proposed methods.

TI_36_0

Richter, Wolf-Dieter

University of Rostock

Title

Multivariate distributions 

Authors of this session discuss: a new methodology for evaluating probabilities and normalizing constants of probability distributions; particular extreme value distributions that exhibit heavy tails and controlled directional dependence; the construction and application of models connected with sums and maxima of dependent Pareto components; and the stochastic representation, simulation and dynamic geometric disintegration of (p_1,…,p_k)-spherical probability laws.

TI_36_4

Richter, Wolf-Dieter

University of Rostock

Title

On (p_1,...,p_k)-spherical distributions 

The class of (p_1, … , p_k)-spherical probability laws and  a method of simulating random vectors following such distributions are  introduced using  a new stochastic vector representation. A dynamic geometric disintegration method and a corresponding geometric measure representation are used for generalizing the classical Chi-square-, t- and F- distributions. Combining the principles of specialization and marginalization gives rise to an effective method of dependence modeling. 

TI_10_3

Samanthi, Ranadeera

Central Michigan University 

Title

On bivariate distorted copulas  

In this talk, we propose families of bivariate copulas based on the distortions of existing copulas. The beta and Kumaraswamy cumulative distribution functions are employed to construct the proposed distorted copulas. With the additional two parameters in the distributions, the distorted copulas permit more flexibility in the dependence behaviors. Two theorems linking the original tail dependence behaviors and those of the distorted copula are derived for distortions that are asymptotically proportional to the power transformation in the lower tail and the dual-power transformation in the upper tail. Simulation results and an application to financial risk management are presented. 
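A numerical sketch of the distortion construction with a beta cdf T applied to a base Clayton copula, C_T(u,v) = T(C(T^{-1}(u), T^{-1}(v))); the parameter values below are illustrative only, and whether a given (a, b) yields a valid copula depends on conditions of the kind derived in the talk:

    import numpy as np
    from scipy.stats import beta

    def clayton(u, v, theta=2.0):
        """Clayton copula, a common base copula with lower tail dependence."""
        return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

    def distorted_copula(u, v, a=2.0, b=1.5, theta=2.0):
        """Beta-distorted copula: T(C(T^{-1}(u), T^{-1}(v))) with T the Beta(a, b) cdf."""
        T, Tinv = beta(a, b).cdf, beta(a, b).ppf
        return T(clayton(Tinv(u), Tinv(v), theta))

    u = v = 0.9
    print("base copula:     ", clayton(u, v))
    print("distorted copula:", distorted_copula(u, v))

The two extra shape parameters (a, b) are what give the distorted family its additional flexibility in tail behavior.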

TI_45_4

Samanthi, Ranadeera

Central Michigan University 

Title

Methods for Generating Coherent Distortion Risk Measures

In this talk, we present methods for generating new distortion functions by utilizing distribution functions and composite distribution functions. To ensure the coherency of the corresponding distortion risk measures, the concavity of the proposed distortion functions is established by restricting the parameter space of the generating distribution. Closed-form expressions for risk measures are derived for some cases. Numerical and graphical results are presented to demonstrate the effects of parameter values on the risk measures for exponential, Pareto and log-normal losses. In addition, we apply the proposed distortion functions to derive risk measures for a segregated fund guarantee. (This is a joint work with Jungsywan Sepanski, Central Michigan University.)
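As a reminder of the object being generated: for a nonnegative loss X with survival function S, a distortion risk measure has the form ρ_g(X) = ∫_0^∞ g(S(x)) dx, and concavity of g yields coherence. A quick numerical check (illustrative parameter values) with the proportional-hazard distortion g(u) = u^γ and an exponential loss, where ρ_g(X) = 1/(γλ) in closed form:

    import numpy as np
    from scipy.integrate import quad

    lam, gamma_ = 0.5, 0.7           # exponential rate; PH distortion exponent in (0, 1]
    g = lambda u: u ** gamma_        # concave distortion => coherent risk measure
    S = lambda x: np.exp(-lam * x)   # exponential survival function

    rho, _ = quad(lambda x: g(S(x)), 0, np.inf)
    print(rho, 1.0 / (gamma_ * lam))  # numeric integral vs closed form: both about 2.857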

TI_12_4

Sanz-Alonso, Daniel

University of Chicago 

Title

Scalable graph-based Bayesian semi-supervised learning 

The aim of this talk is to present some new theoretical and methodological developments concerning the graph-based, Bayesian approach to semi-supervised learning. I will show suitable scalings of graph parameters that provably lead to robust Bayesian solutions in the limit of a large number of unlabeled data. The analysis relies on a careful choice of topology and on the study of the spectrum of graph Laplacians. Besides guaranteeing the consistency of graph-based methods, our theory explains the robustness of discretized function-space MCMC methods in semi-supervised learning settings.

TI_28_2

Sarathy, Rathindra

Oklahoma State University 

Title

Statistical Basis for Data Privacy and Confidentiality 

Statistical disclosure limitation methods are occasionally viewed as ad hoc methods providing no strong privacy or confidentiality guarantees. Although this view is not accurate, it has been the primary motivation for recent standards such as differential privacy and their associated methods. In this talk, we explore the statistical basis for data confidentiality and methods that satisfy privacy and confidentiality requirements. We discuss the concepts underlying differential privacy to provide a comparison, as well as the potential utility trade-offs under both frameworks.

TI_37_0

Sarhan, Ammar

Dalhousie University

Title

Generalization of lifetime distributions 

Generalization of lifetime distribution is one of the important tools in lifetime analysis. Most of the commonly used lifetime distributions have monotonic hazard rate functions. In applications, many data sets show non-monotonic shapes of the hazard rates. In this session, some of the generalizations of lifetime distributions will be discussed.

TI_37_1

Sarhan, Ammar

Dalhousie University

Title

A new extension of the two-parameter bathtub hazard shaped distribution 

This article proposes a new generalization of the two-parameter bathtub-shaped lifetime distribution, named the odd generalized exponential two-parameter bathtub-shaped distribution. Statistical properties of the proposed distribution are discussed. The maximum likelihood and Bayesian procedures are used to estimate the model parameters and some of its reliability measures. To demonstrate the applicability of the proposed distribution, two real data sets are analyzed under different sampling scenarios. A simulation study is provided to investigate the properties of the methods applied.

TI_25_3

Sarkar, Shuchismita

Bowling Green State University

Title

Finite mixture modeling and model-based clustering for directed weighted networks

A novel approach relying on the notion of mixture models is proposed for modeling and clustering directed weighted networks. The developed methodology can be used in a variety of settings, including multilayer networks. Computational issues associated with the developed procedure are effectively addressed by the use of MCMC techniques. The utility of the methodology is illustrated in a set of experiments as well as in applications to real-life data containing export trade amounts for European countries.

TI_24_1

Schafer, Chad

Carnegie Mellon University 

Title

Astrostatistics in the Era of LSST 

The Large Synoptic Survey Telescope (LSST) will yield 15 Terabytes of data each evening over a ten year period, revolutionizing our understanding of the Universe. In this talk I will describe some of the opportunities, focusing on the recurring challenges when working with high-dimensional and noisy astronomical data. In their raw form, these data are difficult to model, and assumptions that may have been reasonable at small sample sizes could be revealed to be inadequate by LSST-scale data. Such inference challenges provide statisticians with opportunities to both contribute to science, and to advance statistical methodology. 

TI_18_4

Schissler, A. Grant

University of Nevada

Title

On Simulating Ultra High-Dimensional Multivariate Discrete Data 

It is critical to conduct realistic Monte Carlo studies. This is problematic when data are inherently multivariate and high-dimensional, a situation that appears frequently in high-throughput biomedical experiments (e.g., RNA-sequencing). Researchers, however, often resort to simulation designs that posit independence, greatly diminishing insights into the empirical operating characteristics of any proposed methodology. To fill this gap, we propose a procedure to simulate high-dimensional multivariate discrete distributions and study its performance. We apply our method to simulate RNA-sequencing data sets (dimension > 20,000) with negative binomial marginals.
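A scaled-down sketch of one familiar baseline for this task, a Gaussian copula (NORTA) step with negative binomial marginals; note that the correlation is specified on the Gaussian scale and only approximates the correlation of the resulting counts, one of the issues a dedicated high-dimensional procedure must address (all parameter values hypothetical):

    import numpy as np
    from scipy.stats import norm, nbinom

    def norta_nb(n, corr, size_params, prob_params, seed=0):
        """Correlated negative binomial draws via a Gaussian copula (NORTA)."""
        rng = np.random.default_rng(seed)
        L = np.linalg.cholesky(corr)                   # corr must be positive definite
        z = rng.standard_normal((n, corr.shape[0])) @ L.T
        u = norm.cdf(z)                                # uniform marginals, dependence kept
        return np.column_stack([nbinom.ppf(u[:, j], size_params[j], prob_params[j])
                                for j in range(corr.shape[0])])

    corr = np.array([[1.0, 0.6, 0.3],
                     [0.6, 1.0, 0.5],
                     [0.3, 0.5, 1.0]])
    counts = norta_nb(5000, corr, size_params=[5, 2, 8], prob_params=[0.3, 0.5, 0.4])
    print(np.corrcoef(counts, rowvar=False).round(2))  # close to, but not equal to, corr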

TI_5_2

Schmegner, Claudia

DePaul University

Title

TX Family and Horseshoe Priors

Consider the problem of estimating the vector of normal means θ = (θ_1, ..., θ_n) in the ultra-sparse normal means model (y_i | θ_i) ~ N(θ_i, 1) for i = 1, ..., n. Horseshoe priors are very effective at handling cases in which many components of θ are exactly or approximately 0. The name "horseshoe" does not describe the shape of the density of θ_i, but rather the shape of the implied prior for the shrinkage coefficient associated with θ_i. We use the TX technique for generating distributions to propose new classes of horseshoe priors, investigate their properties, and compare their performance to that of the usual ones.
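For reference, the standard hierarchy behind that remark (as in the horseshoe literature of Carvalho, Polson and Scott) is

    \theta_i \mid \lambda_i, \tau \sim N(0, \lambda_i^2 \tau^2), \qquad \lambda_i \sim C^{+}(0, 1),

so that the shrinkage coefficient \kappa_i = 1/(1 + \lambda_i^2 \tau^2) follows, for \tau = 1, a Beta(1/2, 1/2) law, whose U-shaped density on (0, 1) is the "horseshoe" in question.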

TI_8_3

Sen, Ananda

University of Michigan, Ann Arbor 

Title

Honey I Shrunk the Intercept

In logistic regression, separation occurs when a linear combination of predictors perfectly discriminates the binary outcome. This is the premise of the current discourse. Because finite valued maximum likelihood parameter estimates do not exist under separation, Bayesian regressions with informative shrinkage of the regression coefficients offer a suitable alternative. Classical studies of separation imply that efficiency in estimating regression coefficients may also depend upon the choice of intercept prior, yet relatively little focus has been given on whether and how to shrink the intercept parameter. Alternative prior distributions for the intercept are proposed that down-weight implausibly extreme regions of the parameter space, yielding regression estimates that are less sensitive to separation. Through extensive simulation, differences across priors are assessed using statistics measuring the degree of separation. Relative to diffuse priors, the proposed priors generally yield more efficient estimation of the regression coefficients themselves when the data are separated or nearly so. Moreover, they are equally efficient in non-separated datasets, making them suitable for default use. These numerical studies also highlight the interplay between priors for the intercept and the regression coefficients. Finally, the methodology is illustrated through implementation on a couple of datasets in the biomedical context. 

TI_44_3

Shahzad, Mirza Naveed

University of Gujrat

Title

Singh-Maddala Distribution: A new candidate to analyze the extreme value data by linear moment estimation

Modeling, accurate inference, and prediction of extreme events by probabilistic models are very important in every field, in order to minimize the damage due to extremes as much as possible. To this end, the Singh-Maddala distribution is considered in this article as a new candidate for the analysis of extreme events. Extreme value datasets are frequently heavy-tailed; for such datasets, the methods of L-moments and TL-moments are proposed to estimate the parameters of the distribution. Results from a simulation study and a real dataset indicate that the linear-moment estimates have the smallest bias among the methods considered.
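A sketch of the sample L-moment computations underlying such linear-moment estimation (the unbiased probability-weighted moments b_r combined into l_1, ..., l_4 in the standard way; the subsequent matching to the Singh-Maddala parameters is not shown):

    import numpy as np
    from math import comb

    def sample_lmoments(x):
        """First four sample L-moments via probability-weighted moments b_r."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        # b_r = n^-1 sum_i C(i, r)/C(n-1, r) x_(i), with i the 0-indexed rank
        b = [np.mean([comb(i, r) / comb(n - 1, r) * x[i] for i in range(n)])
             for r in range(4)]
        l1 = b[0]
        l2 = 2 * b[1] - b[0]
        l3 = 6 * b[2] - 6 * b[1] + b[0]
        l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
        return l1, l2, l3, l4

    x = np.random.default_rng(0).pareto(3.0, size=500) + 1.0   # a heavy-tailed sample
    print(sample_lmoments(x))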

TI_43_2

Shao, Xiaofeng

University of Illinois at Urbana Champaign  

Title

 Inference for change points in high dimensional data 

In this talk, I will present some recent work on change point testing and estimation for high dimensional data. In the case of testing for a mean shift, we propose a new test which is based on U-statistics and utilizes the self-normalization principle. Our test targets dense alternatives in the high dimensional setting and involves no tuning parameters. We show the weak convergence of a sequential U-statistic based process to derive the pivotal limit under the null, and also obtain the asymptotic power under local alternatives. Time permitting, we illustrate how our approach can be used in combination with wild binary segmentation to estimate the number and location of multiple unknown change points.

TI_42_2

Shay, Garrett Charlie

Brock University 

Title

Probabilistic and non-probabilistic methods of active learning for classification

Active learning is a useful learning process for classification. With a fixed size of training data, an active classifier selects the most beneficial data to learn from and achieves better classification accuracy than a passive classifier. We discuss methods for developing optimal active learning processes, both probabilistic and non-probabilistic. For a comparison study, we adopt a probabilistic classifier obtained by logistic regression, as well as a non-probabilistic classifier derived from an estimated discriminant function. The performance of the proposed active classifiers is investigated under varying conditions and assumptions. Optimal two-stage and sequential active classification schemes have been developed. Monte Carlo simulations show improved classification accuracy of the proposed active learning process compared to the passive learning process in all scenarios considered.

TI_33_1

Shi, Peng

University of Wisconsin-Madison 

Title

Regression for Copula-linked Compound Distributions with Applications in Modeling Aggregate Insurance Claims 

In actuarial research, a task of particular interest and importance is to predict the loss cost for individual risks so that informative decisions are made in various insurance operations such as underwriting, ratemaking, and capital management. The loss cost is typically viewed to follow a compound distribution where the summation of the severity variables is stopped by the frequency variable. A challenging issue in modeling such outcome is to accommodate the potential dependence between the number of claims and the size of each individual claim. In this article, we introduce a novel regression framework for compound distributions that uses a copula to accommodate the association between the frequency and the severity variables, and thus allows for arbitrary dependence between the two components. We further show that the new model is very flexible and is easily modified to account for incomplete data due to censoring or truncation. The flexibility of the proposed model is illustrated using both simulated and real data sets. In the analysis of granular claims data from property insurance, we find substantive negative relationship between the number and the size of insurance claims. In addition, we demonstrate that ignoring the frequency-severity association could lead to biased decision-making in insurance operations. 

TI_37_4

Sinha, Sanjoy K.

Carleton University

Title

Joint modeling of longitudinal and time-to-event data with covariates subject to detection limits

In many clinical studies, subjects are measured repeatedly over a fixed period of time. Longitudinal measurements from a given subject are naturally correlated. Linear and generalized linear mixed models are widely used for modeling the dependence among longitudinal outcomes. In addition to the longitudinal data, we often collect time-to-event data (e.g., recurrence time of a tumor) from the subjects. When multiple outcomes are observed from a given subject with a clear dependence among the outcomes, a natural way of analyzing these outcomes and their associations would be the use of a joint model. I will discuss a likelihood approach for jointly analyzing the longitudinal and time-to-event data. The method would be useful for dealing with left-censored covariates often observed in clinical studies due to limits of detection. The finite-sample properties of the proposed estimators will be discussed using results from a Monte Carlo study. An application of the proposed method will be presented using a large clinical dataset of pneumonia patients obtained from the Genetic and Inflammatory Markers of Sepsis (GenIMS) study.

TI_43_4

Sriperumbudur, Bharath

Penn State University 

Title

Approximate Kernel PCA: Computational vs. Statistical Trade-off 

Kernel principal component analysis (KPCA) is a popular non-linear dimensionality reduction technique, which generalizes classical linear PCA by finding functions in a reproducing kernel Hilbert space (RKHS) such that the function evaluation at a random variable X has maximum variance. Despite its popularity, kernel PCA suffers from poor scalability in big data scenarios as it involves solving an n x n eigensystem, leading to a computational complexity of O(n^3), with n being the number of samples. To address this issue, in this work, we consider a random feature approximation to kernel PCA which requires solving an m x m eigenvalue problem and therefore has a computational complexity of O(m^3), implying that the approximate method is computationally efficient if m < n, with m being the number of random features. The goal of this work is to investigate the trade-off between the computational and statistical behaviors of approximate KPCA, i.e., whether the computational gain is achieved at the cost of statistical efficiency. We show that approximate KPCA is both computationally and statistically efficient compared to KPCA in terms of the error associated with reconstructing a kernel function based on its projection onto the corresponding eigenspaces. Depending on the eigenvalue decay behavior of the covariance operator, we show that only n^{2/3} features (polynomial decay) or \sqrt{n} features (exponential decay) are needed to match the statistical performance of KPCA, which means that, without losing statistically, approximate KPCA has a computational complexity of O(n^2) or O(n^{3/2}), depending on the eigenvalue decay behavior. We also investigate the statistical behavior of approximate KPCA in terms of the convergence of eigenspaces, wherein we show that only \sqrt{n} features are required to match the performance of KPCA, and if fewer than \sqrt{n} features are used, then approximate KPCA has worse statistical behavior than KPCA.
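A compact sketch of the random-feature approximation in question: m random Fourier features for a Gaussian kernel, followed by an m x m eigenproblem in place of the n x n one (feature and component counts chosen for illustration only):

    import numpy as np

    def rff_kpca(X, m=100, sigma=1.0, n_components=2, seed=0):
        """Approximate Gaussian-kernel PCA using m random Fourier features."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = rng.standard_normal((d, m)) / sigma    # spectral samples for the RBF kernel
        b = rng.uniform(0, 2 * np.pi, m)
        Z = np.sqrt(2.0 / m) * np.cos(X @ W + b)   # features with z(x).z(y) ~ k(x, y)
        Zc = Z - Z.mean(axis=0)                    # center in feature space
        C = Zc.T @ Zc / n                          # m x m covariance: an O(m^3) eigenproblem
        vals, vecs = np.linalg.eigh(C)
        order = np.argsort(vals)[::-1][:n_components]
        return Zc @ vecs[:, order]                 # projected scores

    X = np.random.default_rng(1).standard_normal((1000, 5))
    print(rff_kpca(X).shape)   # (1000, 2)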

TI_7_1

Su, Jianxi

Purdue University 

Title

Full-range tail dependence copulas for modeling dependent insurance and financial data 

Copulas are important tools when it comes to formulating models for multivariate data analysis.  An ideal copula should conform to a wide range of problems at hand by allowing for symmetricity and asymmetricity as well as for varying strengths of tail dependence. The copulas I plan to introduce are exactly such in that they satisfy all the aforementioned criteria. Specifically, in this talk, I shall introduce a class of full-range tail dependence copulas, which have proved to be very useful for modeling dependent financial/insurance data. I shall discuss the key mechanisms for constructing full-range tail dependence copulas and some fundamental properties of these structures.  Future research directions will be also discussed.

TI_19_1

Subha, R. Nair

HHMSPB NSS College for Women

Title

A generalization to the log-Weibull distribution and its applications in cancer research

In this paper we consider a generalization of a log-transformed version of the inverse Weibull distribution of Keller et al. (Reliability Engineering, 1982). The theoretical properties of the distribution are investigated in detail, including expressions for its cumulative distribution function, reliability function, hazard rate function, quantile function, characteristic function, raw moments, percentile measures, entropy measures, median, and mode. Some reliability aspects as well as the distribution and moments of order statistics are also discussed. Maximum likelihood estimation of the parameters of the proposed distribution is attempted, and certain applications of the distribution in modelling data sets arising from industrial as well as biomedical cancer-related backgrounds are illustrated using real-life examples. Further, the asymptotic behaviour of the estimators is examined with the help of simulated data sets.

TI_45_1

Sun, Ning

Western University

Title

The Pareto Optimal Design for Earthquake Index-based Insurance Based on Exponential Utilities

We obtain a necessary condition for the Pareto optimal earthquake index-based insurance design based on the decomposition of catastrophe risks. Moreover, we derive the explicit form of this Pareto optimal insurance design under the exponential utility assumption. In addition, minimization of the basis risk for this index-based insurance design is discussed. Finally, we illustrate how a typical design of such an insurance product could be obtained from observed data, using historical economic losses due to earthquakes in mainland China.

TI_17_1

Sung, Chih-Li(Charlie)

Michigan State University 

Title

Exploiting variance reduction potential in local Gaussian process search for large computer experiments

Gaussian process models are commonly used as emulators for computer experiments. However, developing a Gaussian process emulator can be computationally prohibitive when the number of experimental samples is even moderately large. Local Gaussian process approximation (Gramacy and Apley (2015)) was proposed as an accurate and computationally feasible emulation alternative. Constructing local sub-designs specific to predictions at a particular location of interest remains a substantial computational bottleneck to the technique. In this talk, two computationally efficient neighborhood search limiting techniques are introduced, and two examples demonstrate that the proposed methods indeed save substantial computation while retaining emulation accuracy.

TI_13_3

Szabo, Aniko

Medical College of Wisconsin

Title

Semi-parametric Model for Exchangeable Clustered Binary Outcomes 

Dependent or correlated binary data occur in repeated measurement studies, longitudinal experiments, teratological risk assessment, and other important experimental studies. Both parametric and non-parametric models have been proposed for dose-response experiments with such data. In this work we propose semi-parametric models that combine a non-parametric baseline describing the within-cluster dependence structure with a parametric between-group effect. We develop an Expectation-Maximization Minorize-Maximize algorithm to fit the model, apply it to several datasets, and compare the semi-parametric estimates of joint probabilities at different dose levels with corresponding GEE and non-parametric estimates.

TI_36_1

Takemura, Akimichi

Shiga University

Title

Holonomic gradient method for evaluation of multivariate probabilities 

In 2011 we developed a new methodology, the "holonomic gradient method" (HGM), which is useful for evaluating probabilities and normalizing constants of probability distributions. Since then we have applied HGM to various problems, including the distribution of roots of Wishart matrices, orthant probabilities, and some distributional problems related to wireless communication. In this talk we give an introduction to HGM and present applications of the method to the evaluation of multivariate probabilities.

TI_18_1

Tomoaki, Imoto

University of Shizuoka 

Title

Bivariate GIT distribution 

In this talk, we propose a bivariate discrete distribution, which is derived from a first passage point of the two-dimensional random walk on the lattice. This distribution can be seen as a convolution of bivariate binomial and negative binomial distributions. Moreover, its marginal distributions are convolutions of univariate binomial and negative binomial distributions and can model both over- and under-dispersion relative to the Poisson distribution. From these properties, the proposed distribution is a flexible model for dispersion and correlation. Other stochastic processes and operations derived for the proposed distribution are also discussed in this talk.

TI_40_4

Torkashvand, Elaheh

University of Waterloo 

Title

Spatial Dynamical Autocorrelation of fMRI Images 

The concept of dynamical correlation is extended to functional time series. The dynamical autocorrelation is a measure of the functional autocorrelation of a functional time series. The proposed method can be applied to true, i.e., continuously measured, functional data, or possibly to approximated functional data, for example after applying a smoothing step to observations measured in discrete time. An estimator of the dynamical autocorrelation is presented based on the Karhunen-Loève expansion of the time series. The central limit theorem is applied to obtain the asymptotic distribution of the proposed estimator of the dynamical autocorrelation under the assumption of m-dependence.

TI_4_1

Vinogradov, Vladimir

Ohio University

Title

On two extensions of Feller-Spitzer class of Bessel densities

We introduce two different extensions of Feller-Spitzer class of Bessel densities. Various properties of members of these classes are derived and compared.

TI_21_1

Wang, Haiying

University of Connecticut

Title

Optimal Subsampling: Sampling with Replacement vs Poisson Sampling 

Faced with massive data, subsampling is a commonly used technique to improve computational efficiency, and using nonuniform subsampling distributions is an effective approach to improving estimation efficiency. In the context of maximizing a general target function, this paper derives optimal subsampling distributions for both subsampling with replacement and Poisson subsampling. The optimal subsampling distributions minimize functions of the subsampling approximation variances. Furthermore, they provide deep insights into the theoretical differences and similarities between subsampling with replacement and Poisson subsampling. Practically implementable algorithms are proposed based on the optimal structure results, and they are evaluated by both theoretical and empirical analysis.
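A sketch of the Poisson-subsampling workflow for a logistic regression target, with inclusion probabilities proportional to a gradient-norm score from a pilot fit and inverse-probability weighting in the subsample objective; this is a common construction in the subsampling literature, whereas the talk derives the optimal probabilities rather than positing them (all sizes hypothetical):

    import numpy as np
    from scipy.optimize import minimize

    def weighted_nll(beta, X, y, w):
        """Inverse-probability-weighted negative log-likelihood for logistic regression."""
        eta = X @ beta
        return np.sum(w * (np.log1p(np.exp(eta)) - y * eta))

    rng = np.random.default_rng(0)
    n, d, r = 100_000, 5, 2_000                      # full size, dimension, expected subsample
    X = rng.standard_normal((n, d))
    beta_true = np.ones(d)
    y = rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))

    pilot_idx = rng.choice(n, 500, replace=False)    # small uniform pilot sample
    pilot = minimize(weighted_nll, np.zeros(d),
                     args=(X[pilot_idx], y[pilot_idx], np.ones(500))).x

    score = np.abs(y - 1 / (1 + np.exp(-X @ pilot))) * np.linalg.norm(X, axis=1)
    pi = np.minimum(1.0, r * score / score.sum())    # Poisson inclusion probabilities
    keep = rng.random(n) < pi                        # each row kept independently
    beta_hat = minimize(weighted_nll, pilot,
                        args=(X[keep], y[keep], 1 / pi[keep])).x
    print(beta_hat.round(2))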

TI_40_0

Wang, Shan

University of San Francisco

Title

Recent Developments in Nonparametric and Semiparametric Techniques

In recent years, semiparametric and nonparametric models have become a popular choice in many areas of statistics since they are more realistic and flexible than parametric models. This invited session focuses on the recent development in these methods and their applications. 

TI_40_1

Wang, Shan

University of San Francisco

Title

Estimation of SEM with MELE approach

In this work, we construct improved estimates of linear functionals of a probability measure with side information, using an easy empirical likelihood approach. We allow the constraint functions that determine the side information to grow with the sample size, and we allow the use of estimated constraint functions. This is the case in applications to structural equation models. In one case the random errors are modeled as independent of the covariates. In another case, we estimate the model with side information of known marginal medians for the observed variables. We report some simulation results on efficiency gains.

TI_41_0

Wang, Xia

University of Cincinnati 

Title

Bayesian Modeling of Dependent Non-Gaussian Data 

Dependent non-Gaussian data keep posing new challenges through rapidly increasing data sizes and structural complexity. Bayesian perspectives provide feasible and flexible approaches. This session presents new methodological developments in Bayesian modeling, computation, and model comparison related to semi-continuous data, directional data, intensity data, and ordinal data.

TI_41_4

Wang, Xia

University of Cincinnati 

Title

Power Link Functions in Ordinal Regression Models with Gaussian Process Priors 

Link functions and random effects structures are the two important components in building flexible regression models for dependent ordinal data. The power link functions include the commonly used links as special cases but have an additional skewness parameter making the probability response curves adaptive to the data structure. It overcomes the arbitrary symmetry assumption imposed by the commonly used logistic or probit links as well as the fixed skewness in the complementary log-log or log-log links.   By employing Gaussian processes, the regression model can incorporate various dependence structures in the data, such as temporal and spatial correlations.  The full Bayesian estimation of the proposed model is conveniently implemented through Rstan.  Extensive simulation studies are carried out for discussion in model computation, parameterization, and evaluation in terms of estimation bias and overall model performance. The proposed model is applied to the PM2.5 data in Beijing and the Berberis thunbergii abundance data in New England.  The results suggest the proposed model leads to important improvement in estimation and prediction in modeling dependent ordinal response data. 

TI_46_2

Wang, Yueyao

Virginia Tech

Title

Building Degradation Index Using Multivariate Sensory Data with Variable Selection

The modeling and analysis of degradation data have been an active research area in reliability and system health management. Most existing research on degradation modeling assumes that a degradation index is provided. However, there are situations where a degradation index is not available. For example, modern sensor technology allows one to collect multi-channel sensor data related to the underlying degradation process, which may not be sufficiently represented by any single channel. Without a degradation index, most existing methods cannot be applied. Thus, constructing a degradation index is a fundamental step in degradation modeling. In this paper, we develop a general approach to degradation index building based on an additive nonlinear model with variable selection. The approach is more flexible than a linear combination of sensor signals, and it can automatically select the most informative variables for use in the degradation index. Maximum likelihood estimation with an adaptive group penalty is developed based on a training dataset. We use extensive simulations to validate the performance of the developed method. The NASA jet engine sensor dataset is then used for illustration. The paper concludes with some discussion and areas for future research. This is joint work with I-Chen Lee and Yili Hong.

TI_26_1

Weerahandi, Samaradasa

X-Techniques, Inc, New York 

Title

Generalized Inference with Application to Business and Clinical Analytics 

In applications such as ANOVA under unequal error variances and mixed models, the classical approach can produce only asymptotic tests and confidence intervals for parameters of interest. This article reviews the notions and methods of Generalized Inference and shows how such inferences can be based on exact probability statements. The approach is illustrated by an application concerning variance components in mixed models, with applications in business and clinical analytics. In such problems one may wish to use the Bayesian approach, but doing so requires a prior. In the absence of a proper prior, Bayesian inferences are highly sensitive to the non-informative prior family and the choice of hyper-parameters, and could take days to run for models involving a large number of parameters, such as those arising in estimating consumer response to TV ads by county or DMA. The task is easily accomplished by using the BLUP in mixed models, with parameters tackled by the approach of Generalized Inference. It will also be argued that the generalized approach can reproduce Parametric Bootstrap inferences when they exist and works even when the Parametric Bootstrap approach fails. Moreover, one can reproduce equivalent generalized tests and generalized confidence intervals for any generalized fiducial inference method without having to treat fixed parameters as variables.

TI_12_2

Womack, Andrew

Indiana University 

Title

Horseshoes, Shape Mixing, and Ultra-sparse Locally Adaptive Shrinkage

Locally adaptive shrinkage in the Bayesian framework provides one method for continuously relaxing discrete selection problems. We present extensions of the Horseshoe prior framework that arise from mixing both the scale and shape parameters from the hierarchical specification of the model. Mixing on the shape parameter provides both better spike and slab behavior as well as a way to model ultra-sparse signals. The reduction in risk comes from a better approximation of the hard thresholding rule that gives rise to discrete selection. As with other local-global priors, these models have non-convex, multimodal posterior distributions. This multi-modality, especially from the infinite spike at the origin, creates issues for fitting the models using out of the box methods like Gibbs samplers or EM algorithms. To address these problems, we implement a new MCMC algorithm that includes mode switching jumps that are akin to doing Stochastic Search Variable Selection for continuous local-global shrinkage models. 

TI_45_2

Wu, Jiang

Central University of Finance and Economics

Title

A Financial Contagion Measure Based on the Maximal Tail Dependence Coefficient for Financial Time Series

A novel financial contagion measure is proposed. It is based on the maximal tail dependence (MTD) coefficient of financial time series of returns. Estimators of this contagion measure are provided for popular families of copulas, and a simulation study is employed to analyze the performance of these estimators. Applications are presented to illustrate the use of spatial contagion measures for determining asymmetric linkages in financial markets and for creating clusters of financial time series. The methodology is also useful for selecting diversified portfolios of asset returns.

TI_43_3

Wu, Wenbo

University of Texas

Title

Simultaneous estimation for semi-parametric multi-index models 

Estimation of a general multi-index model comprises determining the number of linear combinations of predictors (structural dimension) that are related to the response, estimating the loadings of each index vector, selecting the active predictors, and estimating the underlying link function. These objectives are often achieved sequentially at different stages of the estimation process. In this study, we propose a unified estimation approach under a semi-parametric model framework to attain these estimation goals simultaneously. The proposed estimation method is more efficient and stable than many existing methods where the estimation error in the structural dimension may propagate to the estimation of the index vectors and variable selection stages. A detailed algorithm is provided to implement the proposed method.  Comprehensive simulations and a real data analysis illustrate the effectiveness of the proposed method. 

TI_20_2

Wu, Yichao

UIC

Title

Nonparametric estimation of multivariate mixtures 

A multivariate mixture model is determined by three elements: the number of components, the mixing proportions and the component distributions. Assuming that the number of components is given and that each mixture component has independent marginal distributions, we propose a non-parametric method to estimate the component distributions. The basic idea is to convert the estimation of component density functions to a problem of estimating the coordinates of the component density functions with respect to a good set of basis functions. Specifically, we construct a set of basis functions by using conditional density functions and try to recover the coordinates of component density functions with respect to this set of basis functions. Furthermore, we show that our estimator for the component density functions is consistent. Numerical studies are used to compare our algorithm with other existing non-parametric methods of estimating component distributions under the assumption of conditionally independent marginals. 

TI_16_2

Xia, Aihua

University of Melbourne

Title

Probability Density Quantiles: Their Divergence from or Convergence to Uniformity 

For each continuous distribution with square-integrable density, there is a probability density quantile (pdQ), which is an absolutely continuous distribution on the unit interval. The pdQ is representative of a location-scale family and carries essential information regarding shape and tail behavior of the family. We demonstrate that questions of convergence and divergence regarding shapes of distributions can be carried out in a location- and scale-free environment via their pdQs. We also establish a map of the Kullback-Leibler divergences from uniformity of these pdQs. Some numerical calculations point to a phenomenon that each application of the pdQ mapping seems to lower the Kullback-Leibler divergence from uniformity and hence we obtain new fixed point theorems for repeated applications of the pdQ mappings. This is a joint work with Robert G. Staudte. 
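A numerical sketch of the pdQ map for a density f with quantile function Q, f*(u) = f(Q(u)) / ∫_0^1 f(Q(v)) dv, together with its Kullback-Leibler divergence from uniformity (shown here for the normal family; the divergence value is not quoted from the talk):

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    def pdq_and_kl(pdf, ppf):
        """Probability density quantile of a distribution and its KL divergence from U(0,1)."""
        raw = lambda u: pdf(ppf(u))                  # unnormalized pdQ
        c, _ = quad(raw, 0, 1)
        fstar = lambda u: raw(u) / c                 # pdQ: a density on (0, 1)
        kl, _ = quad(lambda u: fstar(u) * np.log(fstar(u)), 0, 1)
        return fstar, kl

    fstar, kl = pdq_and_kl(norm.pdf, norm.ppf)
    print("KL divergence of the normal pdQ from uniformity:", round(kl, 4))

Being built from f(Q(u)), the pdQ is invariant to location and scale, which is what allows shape comparisons in a location- and scale-free environment.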

TI_38_4

Xie, Yanmei

University of Toledo

Title

Analysis of nonignorable missingness in risk factors for hypertension 

The prevention of hypertension is a critical public health challenge across the world.  In the current study, we propose a novel empirical-likelihood-based method to estimate the effect of potential risk factors for hypertension. We adopt a semiparametric perspective on regression analysis with nonignorable missing covariates, which is motivated by the alcohol consumption and blood pressure data from the US National Health and Nutrition Examination Survey. The missingness in alcohol consumption is missing not at random since it is likely to depend largely on alcohol consumption itself. To overcome the difficulty of handling this nonignorable covariate-missing data problem, we propose a unified approach to constructing a system of unbiased estimating equations, which naturally incorporate the incomplete data into the data analysis, making it possible to gain estimation efficiency over complete case analysis. Our analyses demonstrate that increased alcohol consumption per day is significantly associated with increased systolic blood pressure. In addition, having a higher body mass index and being of older age are associated with a significantly higher risk of hypertension.

TI_48_3

Xu, Mengyu

University of Central Florida

Title

Simultaneous Prediction intervals for high-dimensional Vector Autoregressive model

We study simultaneous prediction intervals for the high-dimensional vector autoregressive model. We consider a de-biased calibration of the lasso prediction and propose a Gaussian-multiplier bootstrap method for one-step-ahead prediction. The asymptotic coverage consistency of the prediction intervals is obtained. We also present simulation results to evaluate the finite-sample performance of the procedure.

TI_42_0

Xu, Xiaojian

Brock University 

Title

Optimal design, active learning, and efficient statistics for big data  

This session emphasizes efficient statistical processes when dealing with big data. Such efficiency considerations appear at two stages: optimal and robust designs for data selection (Talks 1, 2, and 4), and estimation/prediction after the data are obtained (Talks 3 and 4). The speakers in this session discuss a variety of statistical methods, including probability estimation, quantile regression, optimally weighted least squares, and incomplete U-statistics.

TI_42_4

Xu, Xiaojian

Brock University 

Title

Robust active learning for approximate linear models 

In this paper, we point out the common nature of active learning in machine learning field and robust experimental designs in statistics field, and present the methods of robust regression designs that can be implemented in a robust active learning process. We consider approximate linear regression models and weighted least squares estimation. Both optimal weighting schemes and robust optimal designs of the training data used for active learning are discussed for various scenarios. An analytical form for robust design density is derived. The simulation results and comparison study using practical examples indicate improved efficiencies. 

TI_14_2

Yanev, George P.

The University of Texas

Title

On Arnold-Villaseñor conjectures for characterizing the exponential distribution

Characterizations of the exponential distribution are abundant. Arnold and Villaseñor [1] obtained a series of new characterizations based on random samples of size two and conjectured possible generalizations for larger sample sizes. Extending their techniques, we will prove Arnold and Villaseñor's conjectures for an arbitrary but fixed sample size n. We will discuss results published in [2] as well as more recent findings.

TI_35_1

Yin, Xiangrong

University of Kentucky 

Title

Moment Kernel for Estimating Central Mean Subspace and Central Subspace 

The T-central subspace, introduced by Luo, Li and Yin (2014), allows one to perform sufficient dimension reduction for any statistical functional of interest. We propose a general estimator using a (third) moment kernel to estimate the T-central subspace. In this talk, we focus in particular on the central mean subspace via the regression mean function, and on the central subspace via the Fourier transform or slicing. Theoretical results are established, and simulation studies show the advantages of our proposed methods.

TI_43_0

Yin, Xiangrong

University of Kentucky 

Title

Variable selection and dimension reduction for high-dimension data problems 

Variable selection and dimension reduction are important research topics, especially for high-dimensional data analysis. This session consists of talks in the areas. Dr. Dong’s talk focuses on variable selection on two sets of variables. Dr. Shao’s topic is on the inference for high-dimensional data, while Dr. Wu presents semi-parametric method to estimate multi-dimensions simultaneously, and Dr. Sriperumbudur’s topic is on the studying of kernel PCA, a popular dimension reduction method. 

TI_38_2

Yu, Jihnhee

University at Buffalo

Title

Bayesian empirical likelihood approach to compare quantiles 

Bayes factors, practical tools of applied statistics, have been dealt with extensively in the literature in the context of hypothesis testing. The Bayes factor based on parametric likelihoods can be considered both as a pure Bayesian approach as well as a standard technique for computing P-values for hypothesis testing. We employ empirical likelihood methodology to modify Bayes factor type procedures for the nonparametric setting, establishing asymptotic approximations to the proposed procedures. These approximations are shown to be similar to those of the classical parametric Bayes factor approach. The proposed approach is applied towards developing testing methods involving quantiles, which are commonly used to characterize distributions. We present and evaluate one and two sample distribution free Bayes factor type methods for testing quantiles based on indicators and smooth kernel functions. 

TI_44_1

Yuan, Qingcong

Miami University 

Title

A two-stage variable selection approach in the analysis of metabolomics and microbiome data  

We propose a two-stage variable selection approach to analyze data from a mouse study. Mice under different health conditions (obese or not) and different exposure levels to biodiesel ultrafine particles (UFPs) are considered, and their metabolomics and microbiome information is recorded. We first perform sure variable screening on the metabolite and microbial species data, respectively, then use the Bayesian lasso to obtain a selected variable set. Multivariate analysis methods are then applied to the resulting dataset. The study focuses on the effects of UFP exposure on gut microbial composition and function, and then evaluates the impact of UFPs on obese host health.

TI_4_3

Yuanqing Zhang

Shanghai University of International Business and Economics

Title

Inference for a Partially Linear Additive Higher-Order Spatial Autoregressive Model with Spatial Autoregressive Errors and Unknown Heteroskedasticity

This article extends the spatial autoregressive model with spatial autoregressive disturbances (SARAR(1,1)), the most popular spatial econometric model, to the case of an arbitrary finite number of nonparametric additive terms and spatial autoregressive disturbances of arbitrary finite order (SARAR(R,S)). We propose a sieve two-stage least squares (S2SLS) regression and a generalized method of moments (GMM) procedure for the high-order spatial autoregressive parameters of the disturbance process. Under some sufficient conditions, we show that the proposed estimator of the finite-dimensional parameter is √n-consistent and asymptotically normally distributed.

TI_24_3

Zeitler, David

Grand Valley State University 

Title

Rank Based Estimation With Skew Normal Error Distributions Using Big Data Sets 

Skew normal distributions generalize the normal distribution by adding a parameter that controls the direction and magnitude of asymmetry. We will present a rank-based algorithm for fitting linear models with skew normal errors on big data sets using distributed computation with limited inter-process communication. Distributed computation may include multicore as well as clustered hardware resources. Both the theoretical development and a simulation demonstration using R will be discussed.
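
A minimal single-machine R sketch of the rank-based ingredient, assuming Jaeckel's dispersion with Wilcoxon scores and skewed errors; the talk's distributed, limited-communication algorithm and the skew normal theory go beyond this sketch (jaeckel_slopes is an illustrative name, and since the dispersion is location-invariant the intercept must be estimated separately, e.g. by the median of the residuals):

# Rank-based slope estimates: minimize Jaeckel's dispersion sum a(R(e_i)) e_i
jaeckel_slopes <- function(X, y) {
  disp <- function(beta) {
    e <- as.vector(y - X %*% beta)
    a <- sqrt(12) * (rank(e) / (length(e) + 1) - 0.5)  # Wilcoxon scores
    sum(a * e)
  }
  optim(rep(0, ncol(X)), disp)$par
}
set.seed(1)
n <- 500
X <- cbind(rnorm(n), rnorm(n))
e <- rnorm(n) + 0.8 * abs(rnorm(n))   # crude right-skewed stand-in for skew normal errors
y <- X %*% c(2, -1) + e
jaeckel_slopes(X, y)                  # close to (2, -1) despite the skewed errors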

TI_13_1

Zelterman, Dan

Yale University

Title

Distributions for Exchangeable p-Values under an unspecified Alternative Hypothesis 

A typical biomarker study may result in many p-values testing multiple hypotheses. Several methods have been proposed to adjust for multiple comparisons without exceeding the false discovery rate (FDR). Under an unspecified alternative hypothesis, we propose a marginal distribution for p-values whose joint distribution facilitates the description of exchangeable p-values. This model is used to describe the behavior of the number of statistically significant findings under Simes' (1986, Biometrika) rule controlling the FDR. We apply our model to a published biomarker study in which no statistically significant findings were observed by the authors, and provide new power analyses for the study.
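
For orientation, Simes'-rule FDR control as used in the Benjamini-Hochberg step-up procedure can be sketched in a few lines of R; this illustrates only the counting of significant findings, not the authors' exchangeable model (simes_discoveries is an illustrative name):

# Number of rejections under the BH step-up rule: max k with p_(k) <= q*k/m
simes_discoveries <- function(p, q = 0.05) {
  m <- length(p)
  ps <- sort(p)
  k <- which(ps <= q * seq_len(m) / m)
  if (length(k) == 0) 0 else max(k)
}
set.seed(1)
p <- c(runif(95), runif(5, 0, 1e-4))          # 95 nulls plus 5 strong signals
simes_discoveries(p)
sum(p.adjust(p, method = "BH") <= 0.05)       # same count via base R's p.adjust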

 

TI_28_3

Zhang, Cheng

Medstar Cardiovascular Research Network

Title

Novel Post-randomization Methods for Controlling Identity Disclosure and Preserving Data Utility

Even when direct identifiers such as name and social security number are removed, identity disclosure of a survey unit in a data set is possible by matching demographic variables whose values are easily obtained from other sources. Data agencies therefore need to release a perturbed version of survey data. Ideally, a perturbation mechanism should protect individuals' identities while preserving inferences about the population. For categorical key variables, we propose a novel approach to measuring identification risk for setting strict disclosure control goals. Specifically, we suggest ensuring that the probability of identifying any survey unit is at most a given value ξ. We develop an unbiased post-randomization method that achieves this goal with little loss of data quality.
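
A minimal R sketch of plain post-randomization (PRAM) for one categorical key variable, with an assumed 3x3 transition matrix; the paper's contribution, an unbiased variant calibrated so that the identification probability stays below ξ, involves a more careful choice of this matrix:

set.seed(1)
lev <- c("A", "B", "C")
x <- sample(lev, 1000, replace = TRUE, prob = c(0.5, 0.3, 0.2))  # true key variable
P <- matrix(c(0.90, 0.05, 0.05,      # row r gives P(release s | true category r)
              0.05, 0.90, 0.05,
              0.05, 0.05, 0.90), 3, 3, byrow = TRUE)
x_released <- vapply(x, function(v) {
  sample(lev, 1, prob = P[match(v, lev), ])   # randomize each record's category
}, character(1))
table(x, x_released)   # most records keep their category; a few are flipped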

TI_44_0

Zhang, Jing

Miami University 

Title

New Explorations for High-Dimensional Big Data Analysis 

Standard statistical methods are no longer computationally efficient or feasible in the analysis of high-dimensional big data. This session collects ideas on variable selection, dimension reduction, and predictive modeling, exploring how to pick out the true "signals" among many noise variables and how to handle the volume of the data.

TI_44_2

Zhang, Jing

Miami University 

Title

A “Split and Resample” Approach in Big Data Analysis   

Big data are massive in volume, intensity and complexity. Analysis of big data requires picking out the true "signals" among many noise variables and handling the volume of the data. We introduce a "split and resample" algorithm that handles both variable selection and prediction for high-dimensional big data. Simulation studies show that the proposed algorithm is robust to multicollinearity among the predictors in both linear and generalized linear models, selects the signal variables with better sensitivity and specificity, and achieves better prediction with lower MSPE values.
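
A minimal R sketch of the split-and-vote idea, assuming marginal correlation screening within each split and majority voting across splits as stand-ins for the paper's selection and prediction steps:

set.seed(1)
n <- 2000; p <- 500
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * X[, 2] + rnorm(n)            # only variables 1 and 2 are signals
folds <- split(sample(n), rep(1:4, length.out = n))   # 4 random splits of the rows
votes <- rowSums(vapply(folds, function(idx) {
  score <- abs(drop(cor(X[idx, ], y[idx])))    # marginal screening within the split
  score >= sort(score, decreasing = TRUE)[10]  # keep the top 10 per split
}, logical(p)))
which(votes >= 3)   # variables selected by at least 3 of 4 splits: ideally 1 and 2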

TI_20_4

Zhang, Lingsong

Purdue University

Title

On the analysis of data that lie in cones

Complex data, such as images and genomic measurements, arise increasingly often in applications. Traditionally, data were analyzed under the theoretical assumption that they lie in a Euclidean space. In recent years, many new data types take values in restricted spaces or sets, and a new set of theory and methodology is required to analyze them. In this talk, we will focus on two types of data that lie in cones, and propose a generalized principal component type of tool to reveal the underlying structure (or hidden factors) within such data. The approach naturally forms a nested structure and thus is suitable for future investigation of the optimal dimension. Applications of this method, such as to diffusion tensor images, will be shown in this talk as well.

TI_28_1

Zhang, Linjun

Rutgers University 

Title

The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy 

With the unprecedented availability of datasets containing personal information, there are increasing concerns that statistical analysis of such datasets may compromise individual privacy. These concerns give rise to statistical methods that provide privacy guarantees at the cost of some statistical accuracy. A fundamental question is: to satisfy a certain desired level of privacy, what is the best statistical accuracy one can achieve? Standard statistical methods fail to yield sharp results, and new technical tools are called for. In this talk, I will present a general lower bound argument for investigating the tradeoff between statistical accuracy and privacy, with application to three problems: mean estimation, linear regression and classification, in both the classical low-dimensional and modern high-dimensional settings. For these statistical problems, we also design computationally efficient algorithms that match the minimax lower bound under the privacy constraints. Finally, I will show applications of these privacy-preserving algorithms to real data such as SNPs containing sensitive information, for which privacy-preserving statistical methods are necessary.

TI_21_3

Zhang, Teng

University of Central Florida

Title

Robust PCA by Manifold Optimization 

Robust PCA is a widely used statistical procedure for recovering an underlying low-rank matrix from grossly corrupted observations. This work considers the problem of robust PCA as a nonconvex optimization problem on the manifold of low-rank matrices, and proposes two algorithms (for two versions of retractions) based on manifold optimization. It is shown that, with a properly designed initialization, the proposed algorithms are guaranteed to converge to the underlying low-rank matrix linearly. Compared with previous work based on the Burer-Monteiro decomposition of low-rank matrices, the proposed algorithms reduce the theoretical dependence on the condition number of the underlying low-rank matrix. Simulations and real data examples confirm the competitive performance of our method.

TI_35_2

Zhang, Wei

University of Arkansas at Little Rock 

Title

Imputation of Missing Data in the State Inpatient Databases 

Eliminating healthcare disparities so that the underserved are assured access to quality medical care remains a national priority. Large, population-based studies necessary to address healthcare disparities can be costly and difficult to perform. An efficient alternative that is becoming increasingly attractive is the use of the State Inpatient Databases (SID). This study aims to identify appropriate imputation methods for the SID and to apply the imputed data sets to healthcare disparities research. We compare six methods for handling missing data (i.e., complete case analysis, mean imputation, the marginal draw method, hot deck imputation, joint multiple imputation (MI), and conditional MI) through a novel simulation.

TI_40_3

Zhao, Wei

Indiana University Purdue University Indianapolis 

Title

Optimal Sampling Distributions for Generalized Linear Models  

One popular approach to dealing with large samples is subsampling: a small portion of the full data set is subsampled with certain weights and used as a surrogate for the subsequent computation and simulation. The crucial part of any subsampling method is the construction of the sampling weights. In this paper, we propose A-optimal sampling distributions after investigating the consistency and asymptotic normality of the subsample estimator relative to the maximum likelihood estimator in generalized linear models. A two-step algorithm is proposed to approximate the A-optimal subsampling estimator. Simulation results show that our subsampling method outperforms other subsampling methods, with a smaller mean squared error of estimation.
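
A minimal R sketch of a two-step subsampling scheme for logistic regression in the spirit described, assuming a uniform pilot subsample and A-optimal-style probabilities proportional to |y - phat| * ||M^{-1} x|| as in the optimal-subsampling literature; the paper's exact weights and theory may differ (quasibinomial is used only so glm accepts non-integer inverse-probability weights):

set.seed(1)
n <- 1e5
X <- cbind(1, matrix(rnorm(n * 2), n, 2))
beta <- c(-1, 1, 0.5)
y <- rbinom(n, 1, plogis(X %*% beta))

pilot <- sample(n, 500)                               # step 1: uniform pilot
fit0 <- glm(y[pilot] ~ X[pilot, -1], family = binomial)
phat <- as.vector(plogis(X %*% coef(fit0)))
M <- crossprod(X * (phat * (1 - phat)), X) / n        # scaled information matrix
score <- abs(y - phat) * sqrt(rowSums((X %*% solve(M))^2))
pr <- score / sum(score)                              # A-optimal-style probabilities

idx <- sample(n, 2000, replace = TRUE, prob = pr)     # step 2: informative subsample
fit1 <- glm(y[idx] ~ X[idx, -1], family = quasibinomial,
            weights = 1 / (n * pr[idx]))              # inverse-probability weighting
coef(fit1)                                            # close to beta = (-1, 1, 0.5)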

TI_42_3

Zheng, Wei

The University of Tennessee

Title

Incomplete U-statistic based on division and orthogonal array 

U-statistics are an important class of statistics. Unfortunately, their computation easily becomes impractical as the data size $n$ increases: the number of combinations, say $m$, that a U-statistic of order $d$ has to evaluate is of order $O(n^d)$. Since Blom (1976), who coined the term incomplete U-statistic, many efforts have been made to approximate the original U-statistic by a small subset of the combinations. To the best of our knowledge, all existing methods require $m$ to grow at least faster than $n$, albeit much slower than $n^d$, in order for the corresponding incomplete U-statistic to be asymptotically efficient in the sense of mean squared error. In this paper, we introduce a new type of incomplete U-statistic, which can be asymptotically efficient even when $m$ grows more slowly than $n$. In some cases, $m$ is only required to grow faster than $\sqrt{n}$. The results are also extended to the degenerate case and the multi-sample case.
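
For intuition, an incomplete U-statistic simply averages the kernel over a subset of the combinations. A minimal R sketch with the order-2 kernel h(x, y) = |x - y| (the Gini mean difference); note the paper's designs are based on division and orthogonal arrays rather than the uniform sampling used here:

set.seed(1)
n <- 2000
x <- rnorm(n)
m <- 5000                                  # m << choose(n, 2), about 2 million pairs
i <- sample(n, m, replace = TRUE)
j <- sample(n, m, replace = TRUE)
ok <- i != j                               # discard the few self-pairs
mean(abs(x[i[ok]] - x[j[ok]]))             # incomplete U-statistic, O(m) work
d <- abs(outer(x, x, "-"))
sum(d) / (n * (n - 1))                     # complete U-statistic, O(n^2) work
# both are close to the true value 2/sqrt(pi) for standard normal data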

TI_38_3

Zhong, Ping-Shou

The University of Illinois at Chicago

Title

Order-restricted inference for means with missing values 

Missing values appear very often in applications, but they have not received much attention in the testing of order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing or decreasing order based on the jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on the missing probabilities and the nonparametric imputation. A simulation study shows that the proposed test performs well under various missing-data scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's Disease Neuroimaging Initiative data set to find a biomarker for the diagnosis of Alzheimer's disease.

TI_45_0

Zitikis, Ricardas

Western University

Title

Risk Measures: Theory, Inference, and Applications

 

TI_45_3

Zitikis, Ricardas

Western University

Title

Gini Shortfall: A Coherent Risk Measure

For quite some time, the value-at-risk (VaR) was an appealing risk measure, and even an industry and regulatory standard for calculating risk capital in banking and insurance. The VaR is still a standard, though criticized in many theoretical and empirical works. In this context, the expected shortfall (ES) has been a remarkable innovation that rewards diversification and captures the magnitude of tail risk. But what about tail variability? The coherent risk measure called the Gini shortfall (GS) takes care of both the magnitude and the variability of tail risk, thus providing a much-needed missing piece in the encompassing risk-measurement puzzle. In this talk, we shall discuss various aspects of the GS, including its origins, properties, and statistical inference.
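
A minimal empirical sketch in R, assuming the formulation GS = ES + λ × (tail Gini mean difference) at level p with a small loading λ; the talk's formal definition, the coherence conditions on λ, and the inference theory are not reproduced here (gini_shortfall is an illustrative name):

# Plug-in Gini shortfall: expected shortfall plus a penalty for tail variability
gini_shortfall <- function(x, p = 0.95, lambda = 0.2) {
  v <- quantile(x, p, names = FALSE)          # empirical VaR at level p
  tl <- x[x > v]                              # losses beyond the VaR
  k <- length(tl)
  es <- mean(tl)                              # empirical expected shortfall
  tgini <- sum(abs(outer(tl, tl, "-"))) / (k * (k - 1))  # tail Gini mean difference
  es + lambda * tgini
}
set.seed(1)
loss <- rlnorm(1e4)                           # heavy-ish right tail
c(VaR = quantile(loss, 0.95, names = FALSE),
  GS  = gini_shortfall(loss))                 # GS exceeds ES, pricing tail variability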

 

 

Abstracts for General-Invited Speakers (Alphabetically Ordered)

 

G_1_1

Abujarad, Mohammed H.A.

Aligarh Muslim University

Title

Bayesian Survival Analysis of Topp-Leone Generalized Family with Stan

In this article, we discuss generalizations of three distributions by means of the exponential, exponentiated exponential, and exponentiated extension distributions. We set up three- and four-parameter lifetime models called the Topp-Leone exponential, Topp-Leone exponentiated exponential, and Topp-Leone exponentiated extension distributions. We give extensive results for the survival function and hazard rate function. To fit these models as survival models, we adopt a Bayesian approach. A real survival data set is used for illustration. The application is carried out in R and Stan, and suitable illustrations are prepared. R and Stan code is given to implement the censoring mechanism via optimization as well as simulation tools.

G_2_1

Ahmed, Bilal Peer

Islamic University of Science & Technology, Awantipora, Pulwama (J&K), India

Title

Inflated Size-Biased Modified Power Series Distributions and Their Applications

In this paper, a class of Inflated Size-biased Modified Power Series Distributions (ISBMPSD), in which inflation can occur at any of the support points, is studied. This class includes, among others, the size-biased generalized Poisson distribution, the size-biased generalized negative binomial distribution and the size-biased generalized logarithmic series distribution as particular cases. We obtain recurrence relations among the ordinary, central and factorial moments. Maximum likelihood and Bayesian estimation of the parameters of the ISBMPSD is obtained. As special cases, results are extracted for the size-biased generalized Poisson, size-biased generalized negative binomial and size-biased generalized logarithmic series distributions. Finally, an example is presented for the size-biased generalized Poisson distribution to illustrate the results, and a goodness-of-fit test is carried out using the maximum likelihood and Bayes estimators.

G_6_4

Bulut, Murat

Osmangazi University, Turkey

Title

Robust Logistic Regression based on Liu estimator

In this study, we propose a new estimator in logistic regression to handle multicollinearity and outlier problems simultaneously. Several biased estimators have been proposed to address the multicollinearity problem, and there are also studies that cope with outlier problems. However, only a few studies in the literature address the situation where multicollinearity and outliers occur at the same time in the logistic model. We introduce a robust logistic estimator based on the Liu estimator and compare it with several existing estimators by means of a simulation study.

G_5_1

Feng, Yaqin

Ohio University 

Title

Stability and instability of steady states for a branching random walk

We consider the time evolution of a lattice branching random walk with local perturbations. Under certain conditions, we prove a Carleman-type bound on the moment growth of a particle subpopulation number and show the existence of a steady state.

G_5_3

Lazar, Drew

Ball State University

Title

Robust and scalable optimization on manifolds 

In this talk, a robust and scalable procedure for estimation on classes of manifolds, generalizing the classical idea of "median of means" estimation, is proposed. The procedure is motivated by statistical inference problems in data science that can be cast as optimization problems over manifolds. A key lemma characterizing a property of the geometric median on manifolds is shown. This lemma allows the formulation of bounds on an estimator that aggregates subset estimators by taking their geometric median. The robustness and scalability of the procedure are illustrated in numerical examples on both simulated and real data sets.
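
A minimal R sketch of the Euclidean special case, where median-of-means reduces to the ordinary median of blockwise means; on a manifold, the block means become Fréchet means and the median becomes the geometric median (median_of_means is an illustrative name):

# Split the data into k random blocks, average within blocks, take the median
median_of_means <- function(x, k = 10) {
  lab <- rep(1:k, length.out = length(x))[sample(length(x))]
  median(vapply(split(x, lab), mean, numeric(1)))
}
set.seed(1)
x <- c(rnorm(990), rnorm(10, mean = 100))     # 1% gross contamination
c(mean = mean(x), mom = median_of_means(x))   # the plain mean is dragged up; MoM is not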

G_1_3

Louzada-Neto, Francisco

ICMC, University of Sao Paulo

Title

Efficient Closed-Form MAP Estimators for Some Survival Distributions and Their Applications to Embedded Systems 

In this paper, we propose maximum a posteriori (MAP) estimators, with simple closed-form expressions, for the parameters of some survival distributions. We focus primarily on the Nakagami distribution, which plays an essential role in communication engineering problems, particularly in modeling the fading of radio signals. Moreover, we show that the obtained results can be extended to other survival probability distributions, such as the gamma and generalized gamma distributions. Numerical results reveal that the MAP estimators outperform existing estimators and produce almost unbiased estimates even for small sample sizes. Our applications are driven by embedded systems, which are commonly used in communication engineering. In particular, such a system can consist of an electronic system inside a microcontroller, programmed to maintain communication between a transmitting antenna and mobile antennas operating at the same frequency. In this context, closed-form estimators are needed from the statistical point of view, since they are embedded in mobile devices and must be recalculated sequentially in real time.

G_6_1

McTague, Jaclyn

LogEcal Analytics

Title

Repeated Significance Testing of Normal Variables with Unknown Variance

In clinical trials where data are accumulated over time, sequential hypothesis testing requires control of the type-1 error. It is typically assumed that the sample sizes are large so that, even with an unknown variance, the test statistics are approximately normal; this leads to reliance on the multivariate normal distribution to calculate the critical values. We develop the exact joint distribution of the test statistics for any sample size and provide critical values that ensure type-1 error control. We introduce an efficient numerical method that works for any number of tests commonly encountered in so-called group sequential clinical trials.
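
For intuition, the calibration problem can be mimicked by Monte Carlo: simulate the joint behavior of the test statistics under the null and pick a common cutoff for all looks. A minimal R sketch for two looks with one-sample t-statistics (the paper instead derives the exact joint distribution rather than simulating):

set.seed(1)
n1 <- 10; n2 <- 20; B <- 20000
tstat <- function(x) mean(x) / (sd(x) / sqrt(length(x)))
maxT <- replicate(B, {
  x <- rnorm(n2)                              # accumulating data under the null
  max(abs(tstat(x[1:n1])), abs(tstat(x)))     # the larger of the two look statistics
})
quantile(maxT, 0.95)     # common critical value giving overall 5% type-1 error
qt(0.975, df = n2 - 1)   # naive fixed-sample cutoff: smaller, so it over-rejects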

G_6_3

Mesbah, Mounir

Sorbonne University

Title

Current statistical issues in HRQoL research: Testing local independence in latent variable models

In this talk, I will give a quick overview of current research in Health-Related Quality of Life (HRQoL). I will focus on a few important and challenging statistical issues that occur when latent variable models are used. Local independence is a strong assumption of such models that needs to be checked. I will survey the psychometrics literature on the subject, which deals mainly with the effect of local dependence on the inference of the parameters and with its detection. I will discuss the challenging theoretical and computational issues and present recent simulation results and an application to real data sets.

G_1_2

Mynbaev, Kairat

International School of Economics, Kazakh-British Technical University

Title

Nonparametric kernel estimation of unrestricted distributions 

We consider nonparametric estimation of a distribution F that is unrestricted in the sense that it may, or may not, be absolutely continuous. Three problems are considered: estimation of F(x) at a continuity point x, estimation of F(y)-F(x), where x and y are continuity points, and estimation of the jumps of F. Contrary to the extant literature, we place no restriction on the existence or smoothness of the derivatives of F. The key insight for our result is the use of Lebesgue-Stieltjes integrals. The method is also applied to inversion theorems for characteristic functions, where we provide explicit estimates of the convergence rates.

G_2_2

Odhiambo, Collins

Strathmore University

Title

Extended version of Zero-inflated Negative Binomial Distribution with Application to HIV Exposed Infant Count Data

Routinely collected data on HIV-exposed infants (HEI) show many zeros in the counts of HIV-positive infants due to the prevention of mother-to-child transmission (PMTCT) policy. However, the implementation of PMTCT differs, resulting in structural zeros in the HEI-positive counts (optimal PMTCT) and non-structural zeros (sub-optimal PMTCT); hence standard zero-inflated models may not be appropriate. We extend the zero-inflated negative binomial (ZINB) model by incorporating a variable α. Extensive simulations were conducted by varying α, the dispersion and the sample size, and the results were compared using BC. The model was applied to HEI data sampled from six high-HIV-burden counties in Kenya and yielded better performance.

G_2_3

Ogawa, Mitsunori

University of Tokyo

Title

Parameter estimation for discrete exponential families in the presence of nuisance parameters

The parameter estimation problem for discrete exponential family models is discussed in the presence of nuisance parameters. Maximizing the conditional likelihood usually yields an estimator with statistically nice properties; however, the computation of its normalization constant often prevents its practical use. In this talk, we derive a class of computationally tractable estimators for such situations based on the framework of composite local Bregman divergence, with simultaneous use of tools from algebraic statistics.

G_2_4

Peng, Jie

St. Ambrose University

Title

Improved Prediction Intervals for Discrete Distributions

The problem of predicting a future outcome based on past and currently available samples arises in many applications. Prediction intervals (PIs) based on continuous distributions are well known; by comparison, results on constructing PIs for discrete distributions are very limited. The problems of constructing prediction intervals for the binomial, Poisson and negative binomial distributions are considered here. Available approximate, exact and conditional methods for these distributions are reviewed and compared. Simple approximate prediction intervals based on the joint distribution of the past samples and the future sample are proposed. Exact coverage studies and expected widths of the prediction intervals show that the new prediction intervals are comparable to or better than the available ones in most cases.
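
A minimal R sketch of one such approximate interval in the Poisson case, assuming Var(Y - lambda_hat) is approximately lambda(1 + 1/n) for a future count Y given a past sample of size n (poisson_pi is an illustrative name; the paper studies and compares several constructions):

# Approximate prediction interval for a future Poisson count
poisson_pi <- function(x, level = 0.95) {
  n <- length(x); lam <- mean(x)
  z <- qnorm(1 - (1 - level) / 2)
  lam + c(-1, 1) * z * sqrt(lam * (1 + 1 / n))   # accounts for estimation error in lam
}
set.seed(1)
cvg <- mean(replicate(10000, {
  x <- rpois(30, lambda = 4)
  y <- rpois(1, lambda = 4)        # the future observation
  ci <- poisson_pi(x)
  y >= ci[1] && y <= ci[2]
}))
cvg   # near the nominal 0.95; discreteness keeps the coverage from being exact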

G_5_2

Sepanski, Jungsywan

Central Michigan University

Title

Constructing Bivariate Copulas with Distributional Distortions

Distortion of existing copulas provides a way to construct new copulas. We propose distributional distortions, that is, distortions that are distribution functions with support on the unit interval. Specifically, the distortion considered in this presentation is the distribution of a unit-Burr random variable formed by the exponential transformation of a negative Burr random variable. The induced new copulas include the well-known BB1, BB2 and BB4 copulas as special cases. The dependence properties of, and the relationships between, the base bivariate copula and the induced copula in terms of tail dependence coefficients and tail orders are studied. The unit-Burr distortion of existing bivariate copulas may result in copulas that allow a maximal range of dependence and permit both lower and upper tail dependence coefficients. Contour plots and numerical results are also presented.

G_5_4

Smith, Scott

University of the Incarnate Word 

Title

A Generalization of the Farlie-Gumbel-Morgenstern and Ali-Mikhail-Haq Copulas

An important aspect of modeling bivariate relationships is the choice of underlying copula. One-parameter copulas may be too restrictive to provide adequate fit. We present a two-parameter copula which possesses the Farlie-Gumbel-Morgenstern and Ali-Mikhail-Haq copulas as special cases. We then discuss dependence properties and simulation. Finally, we use the new copula to model two data sets and compare the fit to that of the FGM and AMH copulas.

G_6_2

Wang, Dongliang

SUNY Upstate Medical University

Title

Empirical likelihood inference for Kolmogorov-Smirnov test given censored data 

"Kolmogorov-Smirnov test is commonly used for comparing two distributions and may be particularly valuable for censored data since the K-S test statistic can be interpreted as the maximum survival difference. In this work, the smoothed empirical likelihood (SEL) is developed for the K-S statistic given censored data with desirable asymptotic properties. The developed results not only lead to a new test procedure, but also a reliable interval estimator for maximum survival difference. The SEL method is evaluated by empirical simulations in terms of the coverage probability of the interval estimator, and illustrated via applying to a real life dataset.

 

 

Abstracts for Student Posters

(Alphabetically Ordered)

 

P-01

Amponsah, Charles

University of Nevada, Reno

Title

A Bivariate Gamma Mixture Discrete Pareto Distribution

We propose a new stochastic model describing the joint distribution of (X, N), where N has a heavy-tailed discrete Pareto distribution and X is the sum of N independent gamma random variables. We present the main properties of this distribution, including marginal and conditional distributions, moments, representations, and parameter estimation. An example from finance illustrates the modeling potential of this new mixed bivariate distribution.
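
A minimal simulation sketch in R, assuming a discretized Pareto tail P(N >= k) = (1 + (k-1)/s)^(-a) and using the fact that a sum of N iid gamma variables is again gamma; the paper's exact parameterization may differ (r_discrete_pareto is an illustrative name):

set.seed(1)
# N = floor(T) + 1 for T continuous Pareto gives P(N >= k) = (1 + (k-1)/s)^(-a)
r_discrete_pareto <- function(n, a = 1.5, s = 2) {
  floor(s * (runif(n)^(-1 / a) - 1)) + 1
}
N <- r_discrete_pareto(5000)
X <- rgamma(5000, shape = 2 * N, rate = 1)  # X | N is a sum of N Gamma(2, 1) terms
summary(N)                                  # heavy right tail in the counts
cor(log(N), log(X))                         # strong positive dependence in (X, N)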

P-02

Ash, Jeremy

North Carolina State University

Title

Confidence band estimation methods for accumulation curves at extremely small fractions with applications to drug discovery

Accumulation curves are used to assess the effectiveness of ranking algorithms. Items are ranked according to the algorithm's belief that they possess some desired feature, then items are tested according to relative rank. In a typical virtual screen in drug discovery, millions of chemicals are screened, while only tens of chemicals are tested. We propose modifications to previously developed confidence band estimation methods that have good coverage probabilities and expected widths under these conditions in simulation.  We also perform power analyses to determine whether accumulation curves or other lift curves are better for detecting significant differences between ranking algorithms.

P-03

Cho, Min Ho

The Ohio State University

Title

Aggregated Pairwise Classification of Statistical Shapes

The classification of shapes is of great interest in diverse areas. Statistical shape data have two main properties: (i) shapes are inherently infinite dimensional with strong dependence among the position of nearby points; (ii) shape space is not Euclidean, but is fundamentally curved. To accommodate these features, we work with the square root velocity function, pass to tangent spaces of the manifold of shapes at different projection points, and use principal components within these tangent spaces. We illustrate the impact of the projection point and choice of subspace on the misclassification rate with a novel method of combining pairwise classifiers.

P-04

Damarjian, Hanna

Purdue University Northwest

Title

On the Transmuted Exponential Pareto Distribution

There has been growing interest in developing statistical distributions capable of modeling various kinds of data. The purpose of this research project is to construct a new model with strong flexibility for various types of data, which we call the Transmuted Exponential Pareto (TEP) distribution. Several lifetime distributions are embedded in this distribution. We provide various mathematical characteristics, including parameter estimation methods and simulation. Finally, the importance and flexibility of the proposed model are illustrated by means of real-life data analyses.

P-05

Das, Manjari

Carnegie Mellon University

Title

Efficient nonparametric estimation of population size from incomplete lists

Estimation of a total population size using incomplete lists has long been an important problem across many biological and social sciences. For example, the partial, overlapping lists of casualties in the Syrian war compiled by multiple organizations are of great importance for estimating the magnitude of the destruction. Earlier approaches have either used strong parametric assumptions or suboptimal nonparametric techniques, which can lead to bias via model misspecification and smoothing. Assuming conditional independence of two lists, we derive a nonparametric efficiency bound for estimating the capture probability and construct a bias-corrected estimator. We apply our methods to estimate HIV prevalence in Alameda County, California.

P-06

Farazi, Md Manzur Rahman

Marquette University

Title

Feature Selection for a Predictive Model using Machine Learning Techniques on Mosquito’s Spectral Data

A mosquito's age is a key indicator for understanding its capability to spread disease and for evaluating the effectiveness of mosquito control interventions. Traditional methods of estimating age via dissection are expensive and require skilled personnel. Near-infrared (NIR) spectroscopy, which measures the amount of light absorbed by a mosquito's head or thorax, is used as a non-invasive method to estimate age. Standard methods do not consider the physiological changes mosquitoes go through as they age. We propose a change-point model to estimate age from spectra using partial least squares regression (PLSR). The change-point PLSR model performs better in estimating the age of the mosquitoes.

P-07

Galarza, Christian

State University of Campinas

Title

On moments of folded and truncated multivariate extended skew-normal distributions

"Following Kan & Robotti (2017), this paper develops recurrence relations for integrals that involve the density of multivariate extended skew-normal distributions, which includes the well-known skew-normal distribution introduced by Azzalini & Dalla-Valle (1996) and the popular multivariate normal distribution. These recursions offer fast computation of arbitrary order product moments of truncated multivariate extended skew-normal and folded multivariate extended skew-normal distributions with the product moments of the multivariate truncated skew-normal, folded skew-normal, truncated multivariate normal and folded normal distributions as a byproduct. Finally, from the application point of view, these moments open the way to propose analytical expressions on the E-step of the Expectation-Maximization (EM) algorithm for complex data, such as, asymmetric longitudinal data with censored and/or missing observations. These new methods are provided to practitioners in the R MomTrunc package, an efficient R library incorporating C++ and FORTRAN subroutines through Rcpp."

P-08

George, Tyler

Central Michigan University

Title

Lack-of-fit Testing Without Replicates Available

A new technique is developed for testing lack-of-fit (LOF) in a linear regression model when replicates are not available. Most applications yield data without replicates in the predictors, so the classical lack-of-fit test found in most linear regression textbooks is not applicable. Many current solutions use close points as "pseudo" replicates, but "close" is not well defined. Presented in this paper is a more generalized and robust methodology for testing LOF using a new grouping procedure. Power simulations are used to compare the new test against previous tests for various alternative models.

P-09

Goward, Kenneth

Central Michigan University

Title

A New Generalized Inverse Gaussian Distribution with Bayesian Estimators

A four-parameter family of transformed inverse Gaussian (TIG) distributions is described. From it, a three-parameter family is derived, with a specific new distribution referred to as the generalized inverse Gaussian (GIG) distribution. Two different versions of this distribution are provided, and the computational and theoretical advantages of one over the other are discussed. Maximum likelihood techniques are discussed alongside Bayesian approaches with Jeffreys-type priors for parameter estimation. A simulation study was conducted, and results from the Bayesian approach and approximations to the maximum likelihood estimators were analyzed using the Kolmogorov-Smirnov test. The applicability of this distribution is demonstrated on a real-world data set.

P-10

Ihtisham, Shumaila

Islamia College, Peshawar, Pakistan

Title

Alpha Power Inverse Pareto Distribution and its Properties

In this study, a new distribution referred to as the Alpha-Power Inverse Pareto distribution is introduced by including an extra parameter. Several properties of the proposed distribution are obtained, including the moment generating function, quantiles, entropies, order statistics, the mean residual life function and stochastic ordering. The method of maximum likelihood is used to estimate the parameters. Two real datasets are considered to examine the usefulness of the proposed distribution.

P-11

Ijaz, Muhammad

University of Peshawar, Pakistan

Title

A New Family of Distributions with Applications

The main goal of this paper is to introduce a new family of distributions, called the new alpha power transformed (NAPT) family. Based on this family, we fit the CDF of the exponential distribution, yielding the new alpha power transformed exponential (NAPTE) distribution. Some of its statistical properties are discussed, including the mean residual life, quantile function, skewness, and kurtosis. Graphical representations of the hazard rate function and probability density function are also given for various parameter values. The parameters are estimated by maximum likelihood, and a simulation study is presented. To illustrate the usefulness of the new family, two real-life data sets are used. The comparison is made on the basis of goodness-of-fit criteria, including the Akaike information criterion, the consistent Akaike information criterion, and others. The results show that the NAPTE distribution is more flexible than other existing distributions for the two data sets under study.

P-12

Lee, Joo Chul

University of Connecticut

Title

Online Updating Method to Correct for Measurement Error in Big Data Streams

When huge amounts of data arrive in streams, online updating is an important method for alleviating both computational and data storage issues. This paper extends the scope of previous research on online updating to the classical linear measurement error model. When some covariates are unknowingly measured with error at the beginning of the stream, but are measured without error after a particular point along the data stream, the updated estimators that ignore the measurement error are biased for the true parameters. We propose a method to correct the bias of the estimators, as well as their variances, once the covariates measured without error are first observed; after correction, the traditional online updating method can proceed as usual. We further derive the asymptotic distributions of the corrected and updated estimators. We provide simulation studies and a real data analysis with the airline on-time data to illustrate the performance of our proposed method.
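
For orientation, error-free online least squares needs to keep only the running X'X and X'y, so each batch can be folded in and then discarded; the paper's correction adjusts these accumulated quantities once error-free covariates appear. A minimal R sketch of the error-free baseline:

set.seed(1)
p <- 3
XtX <- matrix(0, p, p); Xty <- rep(0, p)   # the only state carried along the stream
for (b in 1:50) {                          # 50 arriving batches of 200 rows each
  X <- cbind(1, matrix(rnorm(200 * (p - 1)), 200))
  y <- X %*% c(1, 2, -1) + rnorm(200)
  XtX <- XtX + crossprod(X)                # accumulate X'X
  Xty <- Xty + crossprod(X, y)             # accumulate X'y, then drop the batch
}
drop(solve(XtX, Xty))   # identical to full-data OLS: close to (1, 2, -1)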

P-13

Lun, Zhixin

Oakland University

Title

Simulating from Skewed Multivariate Distributions: The Cases of Lomax, Mardia’s Pareto (Type 1), Logistic, Burr and F Distributions

Convenient and easy-to-use programs are available to simulate data from several common multivariate distributions (e.g., normal, t). However, functions for directly generating data from other, less common multivariate distributions are not as readily available. We illustrate how to generate random numbers from the multivariate Lomax distribution (a flexible family of skewed multivariate distributions). Further, multivariate cases of Mardia's Pareto of type I, logistic, Burr, and F distributions can also be handled easily by applying useful properties of the multivariate Lomax distribution. This work provides a useful tool for practitioners who need to simulate skewed multivariate distributions in various studies.
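
A minimal R sketch of the gamma-frailty representation, assuming that given G ~ Gamma(a, 1) the components are independent exponentials with rates theta_i * G, which yields Lomax marginal survival functions (1 + theta_i * x)^(-a) (rmv_lomax is an illustrative name):

rmv_lomax <- function(n, a = 3, theta = c(1, 2, 0.5)) {
  G <- rgamma(n, shape = a, rate = 1)              # shared gamma frailty
  sapply(theta, function(th) rexp(n) / (th * G))   # X_i | G ~ Exp(rate = th * G)
}
set.seed(1)
x <- rmv_lomax(5000)
mean(x[, 1] > 1)   # empirical marginal survival at 1
(1 + 1)^(-3)       # theoretical Lomax survival (1 + theta_1)^(-a) = 0.125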

P-14

Matuk, James

The Ohio State University

Title

Function Estimation through Phase and Amplitude Separation

An important task in functional data analysis is to estimate functional observations based on sparse and noisy observations on a time interval.  To address this problem, we define a Bayesian model that can fit individual functions on a per subject basis, as well as multiple functions simultaneously by borrowing information across subjects.  A distinguishing property of this work is that our model considers amplitude and phase variabilities separately which describe y-axis and x-axis variability, respectively. We validate the proposed framework using multiple simulated examples as well as real data including ECG signals and measurements from Diffusion Tensor Imaging.

P-15

Maxwell, Obubu

Nnamdi Azikiwe University Awka

Title

The Kumaraswamy Inverse Lomax Distribution (K-IL): Properties and Applications

For the first time, the Kumaraswamy Inverse Lomax (K-IL) distribution is introduced and studied. Some of its basic statistical properties are investigated in detail, including explicit expressions for the survival function, failure rate, reversed hazard, odds ratio, order statistics, moments, quantiles and median. The model parameters are estimated using maximum likelihood. Real-life applications are provided, in which the K-IL distribution offers better fits; performance is assessed on the basis of the distributions' log-likelihoods and the Akaike information criterion (AIC).

P-16

May, Paul

South Dakota State University

Title

Multiresolution Techniques for High Precision Agriculture

High-precision agriculture is the use of data to observe and respond to variations in crop fields at both the macroscopic and granular levels. Remote sensing techniques have created a wealth of data, but the size of these data sets leads to computational challenges, which has historically forced the use of less computationally expensive, but also less accurate, methods. Recent developments in multiresolution approximations for spatial covariance structures (Katzfuss, 2015; Sang and Huang, 2011) allow the use of GLS and kriging on very large data sets to make inferences that farmers can turn into profitable actions.

P-17

Melchert, Bryan

Purdue University Fort Wayne

Title

Forecasting Migration Timing of Sockeye Salmon to Bristol Bay, AK

The arrival of sockeye salmon (Oncorhynchus nerka) in the Bristol Bay river system of Alaska is notoriously compact, with about 75% of the annual run arriving within 4 weeks. This research seeks to leverage increased data access and modern statistical learning methods to generate an accurate migration timing forecast that can be reproduced annually, which currently does not exist for the fishery. Included topics are dimensionality reduction, generalized additive modeling with time series data, gradient boosting methods, and model validation.

P-18

Mohammed, Mohanad

University of KwaZulu-Natal, Pietermaritzburg, South Africa

Title

Using stacking ensemble for microarray-based cancer classification

Microarray technology has produced a massive amount of gene expression data, which can be used efficiently for classification that facilitates disease diagnosis and prognosis. Many computational methods are utilized for cancer classification using these gene expression data; artificial neural networks (ANN), support vector machines (SVM), and random forests (RF) are among the most successful methods for classifying tumors. Recent research shows that combining many classifiers can yield better results than using a single classifier. In this paper, we use a stacking ensemble to combine different classifiers, namely ANN, SVM, RF, naive Bayes (NB), and k-nearest neighbors (KNN), for microarray-based cancer classification. Results show that the stacking ensemble performs better in terms of accuracy, kappa coefficient, sensitivity, specificity, area under the curve (AUC), and receiver operating characteristic (ROC) curve when applied to publicly available microarray data.

P-19

Ordoñez, José Alejandro

Campinas State University

Title

Objective Bayesian Analysis for the Spatial Student t Regression model

We develop an objective Bayesian analysis for the spatial Student-t regression model with unknown degrees of freedom based on the reference prior method. Like the degrees of freedom, the spatial parameter is typically difficult to elicit: the propriety of the posterior distribution is not always guaranteed, whereas proper prior distributions may dominate the analysis. We show that the Bayesian analysis using this method yields a proper posterior distribution, and we use it to develop model selection and prediction. Finally, we assess the performance of the method through simulation and illustrate it using a real data application.

P-20

Saha, Dheeman

University of New Mexico

Title

Sparse Bayesian Envelope

Due to the complexity of high-dimensional datasets, it is difficult to analyze them efficiently. However, a Bayesian framework for dimension reduction and variable selection can help to identify the material and immaterial parts, which in turn improves efficiency in the estimation of the regression coefficients. In this work, we combine the idea of dimension reduction with spike-and-slab variable selection and propose a Bayesian sparse envelope method. In addition, since the true structural dimension of the envelope is unknown, we use reversible jump Markov chain Monte Carlo to draw samples from the posterior distribution.

P-21

Shen, Luyi

University of Notre Dame

Title

Bayesian community detection for weighted sparse networks using a mixture of stochastic block models

We propose a novel mixture of stochastic block models for community detection in weighted networks. Our model allows modeling the sparsity of the network and performing community detection simultaneously by combining a spike-and-slab prior with a stochastic block model. A Chinese restaurant process prior is used to model the random partition, which does not require the number of communities to be known a priori. Another appealing feature of our model is that it allows the sparsity level of the network to vary across communities; that is, the sparsity information in the network is incorporated into the community detection. Efficient MCMC algorithms are derived for sampling from the posterior distribution for inference, and our model and algorithms are demonstrated on both simulated and real data sets.

P-22

Shubhadeep, Chakraborty

Texas A&M University

Title

A New Framework for Distance and Kernel-based Metrics in High Dimensions

This paper presents new metrics to quantify and test for (i) the equality of distributions and (ii) the independence between two high-dimensional random vectors. We show that the energy distance based on the usual Euclidean distance cannot completely characterize the homogeneity of two high-dimensional distributions, in the sense that it detects only the equality of means and of the traces of covariance matrices in the high-dimensional setup. We propose a new class of metrics that inherit the desirable properties of the energy distance, the maximum mean discrepancy/(generalized) distance covariance and the Hilbert-Schmidt independence criterion in the low-dimensional setting, and that are capable of detecting homogeneity of, or completely characterizing independence between, the low-dimensional marginal distributions in the high-dimensional setup. We further propose t-tests based on the new metrics for high-dimensional two-sample and independence testing, and study their asymptotic behavior under both the high dimension, low sample size (HDLSS) and high dimension, medium sample size (HDMSS) setups. The computational complexity of the t-tests grows only linearly with the dimension and is thus scalable to very high-dimensional data. We demonstrate the superior power of the proposed tests for homogeneity of distributions and for independence via both simulated and real datasets.

P-23

Soale, Abdul-Nasah

Temple University

Title

On expectile-assisted inverse regression estimation for sufficient dimension reduction

Sufficient dimension reduction (SDR) has become an important tool for multivariate analysis. Among the existing SDR methods in the literature, sliced inverse regression, sliced average variance estimation, and directional regression are popular due to their estimation accuracy and easy implementation. However, these estimators all rely on slicing the response, and may not work well under heteroscedasticity. To improve these estimators, we propose to first estimate the conditional expectile of the response given the predictor and then perform inverse regression based on slicing the expectile. The superior performances of the new estimators are demonstrated through numerical studies and real data analysis.

P-24

Wang, Yang

The University of Alabama

Title

On variable selection in matrix mixture modeling

Finite mixture models are widely used for cluster analysis, including the clustering of matrix data. Nowadays, high-dimensional matrix observations arise in many fields. It is known that irrelevant variables can severely affect the performance of clustering procedures; it is therefore important to develop algorithms capable of excluding irrelevant variables and focusing on informative attributes in order to achieve good clustering results. Several variable selection approaches have been proposed in the multivariate framework. We introduce and study a variable selection procedure that can be applied in the matrix-variate context. The methodological developments are supported by several simulation studies and an application to a real-life dataset.

P-25

Wang, Runmin

University of Illinois at Urbana-Champaign

Title

Self-Normalization for High Dimensional Time Series

Self-normalization has attracted considerable attention in the recent time series literature, but its scope of applicability has been limited to low- or fixed-dimensional parameters of low-dimensional time series. In this article, we propose a new formulation of self-normalization for inference about the mean of high-dimensional stationary processes. Our original test statistic is a U-statistic with a trimming parameter to remove the bias caused by weak dependence. Under the framework of nonlinear causal processes, we show the asymptotic normality of our U-statistic, with the convergence rate depending on the order of the Frobenius norm of the long-run variance matrix. The self-normalized test statistic is then formulated on the basis of recursively subsampled U-statistics, and its limiting null distribution is shown to be a functional of time-changed Brownian motion, which differs from the pivotal limit used in the low-dimensional setting. An interesting phenomenon associated with self-normalization is that it works in the high-dimensional context even when the convergence rate is unknown. We also present applications to testing for bandedness of the covariance matrix and testing for white noise in high-dimensional stationary time series, and we compare the finite sample performance with existing methods in simulation studies. At the root of our theoretical argument, we extend the martingale approximation to the high-dimensional setting, which could be of independent theoretical interest.

P-26

Xing, Lin

University of Notre Dame

Title

A metric geometry approach to the weight prediction problem

Many real data sets can be represented as a hypergraph, which is a pair consisting of two sets: the set of data points and a set representing higher-order relations among data points, called the set of hyperedges. A standard example of hypergraph data is a collaboration network in which the data points are mathematicians, and each hyperedge is formed from a group of mathematicians having a joint publication. In this work, we propose a geometric approach to studying problems related to hypergraph data, with emphasis on the weight prediction problem, one of the main problems in machine learning. We introduce several classes of metrics on the set of data points, and also on the set of hyperedges, making these sets metric spaces. Using these metric space structures, we propose modified k-nearest-neighbors methods that apply to weight prediction on the data points or hyperedges of hypergraph data. We illustrate the techniques with experimental analyses on several data sets.

P-27

Yang, Tiantian

Clemson University

Title

A Comparison of Several Missing Data Imputation Techniques for Analyzing Different Types of Missingness

Missing data are common in real-world studies and can create issues in statistical inference. Discarding cases that have missing values or replacing the missing values with inappropriate imputation techniques can both result in biased estimates. Many imputation techniques rest on assumptions that are hard to assess in practice, so the appropriate imputation technique is often unclear. To address this issue, a factorial simulation design was developed to measure the impact of certain data set characteristics on the validity of several popular imputation techniques. The factors in the study were the missing-data mechanism, the missing-data percentage, and the missing-data method. The evaluation included parameter estimates, bias, and confidence interval coverage and width for the parameters of interest. Simulation results suggest all three factors have a significant impact on the quality of the estimation. Additional factors, such as the number of variables, the types of variables, and the correlations among variables, are being incorporated into the simulation. Finally, real data examples are discussed to illustrate the applicability of different missing data imputation methods.

P-28

Yao, Yaqiong

University of Connecticut

Title

Optimal two-stage adaptive subsampling design for softmax regression

For massive datasets, statistical analysis using the full data can be extremely time demanding, so subsamples are often taken and analyzed according to the available computing power. For this purpose, Wang et al. (2018) developed a novel two-stage subsampling design for logistic regression, which we generalize to softmax regression. We derive the asymptotic distribution of the estimator obtained from subsamples drawn according to arbitrary subsampling probabilities, and then derive the optimal subsampling probabilities that minimize the asymptotic variance-covariance matrix under the A-optimality and L-optimality criteria. The optimal subsampling probabilities involve unknown parameters, so we adopt the idea of optimal adaptive design and use a small subsample to obtain pilot estimators. We also consider Poisson subsampling for its higher computational and estimation efficiency. We provide simulation and real data examples to demonstrate the performance of our algorithm.

P-29

Yuu, Elizabeth

Robert Koch Institute

Title

Quantifying microbial dark matter using generalized linear models and its impact on metagenome analyses

We previously introduced DiTASiC (Differential Taxa Abundance including Similarity Correction) to address shared-read ambiguity resolution based on a regularized generalized linear model (GLM) framework. This approach, like other similar ones, does not address the remaining unmapped reads, or "microbial dark matter". We extend our approach by analyzing sub-mappings with different error tolerances and integrating dark matter variables in an effort to create a more appropriate GLM. This new idea has the potential to provide more accurate estimates of taxa abundance and inherent variation, which in turn can lead to improved taxa quantification and differential testing.

P-30

Zang, Xiao

The Ohio State University

Title

Clustering Functional Data using Fisher-Rao Metric

Functional data are infinite dimensional, and histograms are no longer applicable for discovering multimodality. Also, due to misalignment, pointwise summaries like cross-sectional means and standard deviations are unable to faithfully describe the typical form and variability. We therefore developed a functional k-means clustering algorithm that uses the Fisher-Rao metric as the distance measure, simultaneously aligning the functions within each cluster using a flexible family of domain warpings, with a BIC criterion to choose the optimal number of clusters. In simulation studies, our method outperformed the method of Sangalli et al. in terms of clustering accuracy. Real-world applications will be illustrated on several datasets.

P-31

Zhang, Han

The University of Alabama

Title

Aggregate Estimation in Sufficient Dimension Reduction for Binary Responses

Many successful inverse regression based sufficient dimension reduction methods have been developed since sliced inverse regression was introduced. However, most of them target problems with continuous responses. Although some claim to be applicable to both categorical and numerical responses, they may work poorly for binary classification problems, since binary responses provide very limited information. In this paper, we put forward an aggregate estimation method for binary responses that involves a decomposition step and a combination step. As an ensemble learning approach, aggregate estimation is proved to effectively decrease the bias and exhaustively estimate the dimension reduction space.

P-32

Zhang, Yangfan

University of Illinois Urbana-Champaign

Title

High Dimensional Regression Change Point Detection

In this article, we propose a method to detect a possible change point in linear regression. We construct a U-statistic-based test statistic with self-normalization and derive its null distribution, which turns out to be pivotal. Our method allows an intercept in the model while detecting a change point in the slope, which is more general than the existing literature. Under certain conditions, the power is also roughly derived, and the performance is reasonably good for both size and power. Furthermore, our method can be combined with wild binary segmentation to handle the multiple change point case and estimate the locations.

P-33

Zhang, Yingying

The University of Alabama

Title

On model-based clustering of time-dependent categorical sequences

Clustering categorical sequences is an important problem that arises in many fields such as medicine, sociology, and economics. It is a challenging task due to the lack of techniques for clustering categorical data, as the majority of traditional clustering procedures are designed for handling quantitative observations. Situations where categorical data are related to time are even more troublesome. We employ a mixture of first-order Markov models, with transition probabilities that are functions of time, to develop a new approach for clustering categorical time-related data. The proposed methodology is illustrated on synthetic data and applied to a real-life data set containing sequences of life events for respondents participating in the British Household Panel Survey.

P-34

Zhu, Changbo

University of Illinois at Urbana-Champaign

Title

Interpoint Distance Based Two Sample Tests in High Dimension

In this paper, we study a class of two-sample test statistics based on interpoint distances in the high dimension, low sample size setting. Our test statistics include the well-known energy distance and the maximum mean discrepancy with Gaussian and Laplacian kernels, and the critical values are obtained via permutation. We show that all these tests are inconsistent when the two high-dimensional distributions have the same marginal distributions but differ in other aspects. The tests based on the energy distance and maximum mean discrepancy mainly target differences between marginal means and variances, whereas the test based on the L1-distance can capture differences in the marginal distributions. Our theory sheds new light on the limitations of interpoint distance based tests, the impact of different distance metrics, and the behavior of permutation tests in high dimension. Simulation results and a real data illustration are also presented to corroborate our theoretical findings.
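
A minimal R sketch of the two-sample energy distance with Euclidean interpoint distances and a permutation p-value, in a small-n, large-d setting of the kind the paper studies (energy_stat is an illustrative name):

# Energy distance: 2 E|X - Y| - E|X - X'| - E|Y - Y'|, estimated from the samples
energy_stat <- function(x, y) {
  n <- nrow(x); m <- nrow(y)
  d <- as.matrix(dist(rbind(x, y)))          # all pairwise Euclidean distances
  2 * mean(d[1:n, n + 1:m]) -
    sum(d[1:n, 1:n]) / (n * (n - 1)) -
    sum(d[n + 1:m, n + 1:m]) / (m * (m - 1))
}
set.seed(1)
x <- matrix(rnorm(30 * 100), 30)                 # n = 30 observations in 100 dimensions
y <- matrix(rnorm(30 * 100, mean = 0.2), 30)     # mean shift in every coordinate
obs <- energy_stat(x, y)
perm <- replicate(200, {
  z <- rbind(x, y)[sample(60), ]                 # shuffle group labels
  energy_stat(z[1:30, , drop = FALSE], z[31:60, , drop = FALSE])
})
mean(c(perm, obs) >= obs)                        # permutation p-value: small here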