Modern Semiparametric Methods in Action
Southern Regional Council on Statistics
Summer Research Conference
June 8-11, 2008

SRCOS 2008 Invited Program

SRCOS 2008 Summer Research Conference
Modern Semiparametric Methods in Action
Final Invited Program

All sessions will be held in the Historic District Embassy Suites, Charleston, SC

Sunday, June 8
5:00 pm-8:00 pm Registration and Mixer (hotel reception area)

Monday, June 9
7:30 am-1:00 pm Registration
7:00 am-8:30 am Light breakfast available
    (note:  breakfast also included with rooms at Embassy Suites and Hampton Inn)

8:15 am-8:30 am Opening remarks by Dr. Perry Halushka,
                                   Dean, College of Graduate Studies, MUSC

8:30 am-9:10 am Keynote Speaker 1: Raymond J. Carroll, Texas A&M University
          Chair: Elizabeth Slate, Medical University of South Carolina

9:10 am-10:00 am SESSION I: Semiparametric Inference in Penalized Splines and Missing Data
          Chair: Stacia DeSantis, Medical University of South Carolina
          9:10 am-9:35 am Uschi Mueller Harknett, Texas A&M University
          9:35 am-10:00 am Ciprian Crainiceanu, Johns Hopkins University

10:00 am-10:30 am Break with coffee and snacks

10:30 am-11:10 am Keynote Speaker 2: Malay Ghosh, University of Florida

11:10 am-12:00 pm SESSION II: Nonparametric Curve Estimation and Analysis of Projective shapes
          Chair: Jane Harvill, Baylor University
          11:10 am-11:35 am David Hitchcock, University of South Carolina
          11:35 am-12:00 pm Victor Patrangenaru, Florida State University

Afternoon hours dedicated to interaction among faculty and junior researchers/PhD students.

7:00 pm-7:50 pm SESSION III: Semiparametric Inference in Hidden Markov/Measurement Error Models
          Chair: Mulugeta Gebregziabher, Medical University of South Carolina
          7:00 pm-7:25 pm Samiran Sinha, Texas A&M University
          7:25 pm-7:50 pm Subharup Guha, University of Missouri

7:50 pm-8:00 pm Break

8:00 pm-9:30 pm SESSION IV: Teaching Semiparametric Methods (Panel)
           Moderator: Thomas A. Louis, Johns Hopkins University
             Panelists: Michael Kosorok, University of North Carolina
                              Xihong Lin, Harvard University
                              Pranab K. Sen, University of North Carolina
                              Jayaram Sethuraman, Florida State University

Tuesday, June 10
7:30 am-1:00 pm Registration
7:00 am-8:30 am Light breakfast available
   (note: breakfast also included with rooms at Embassy Suites and Hampton Inn)

8:00 am-8:40 am Keynote Speaker 3: Pranab K. Sen, University of North Carolina
          Chair: Robert F. Woolson, Medical University of South Carolina

8:40 am-10:00 am SESSION V: Semiparametric Inference in Biomedical Sciences
         Chair: Bani Mallick, Texas A&M University
         8:40 am-9:05 am Yehua Li, University of Georgia
         9:05 am-9:30 am Erning Li, Texas A&M University
         9:30 am-9:55 am Gary Rosner, MD Anderson Cancer Center

10:00 am-10:20 am Break with coffee and snacks

10:20 am-11:00 am Keynote Speaker 4: Xihong Lin, Harvard University

11:00 am-12:15 pm SESSION VI: Bayesian Semiparametric Inference in Cancer Research and Functional Data
         Chair: Elizabeth G. Hill, Medical University of South Carolina
         11:00 am-11:25 am Veera Baladandyuthpani, MD Anderson Cancer Center
         11:25 am-11:50 am Kim-Anh Do, MD Anderson Cancer Center
         11:50 am-12:15 pm Jeffrey Morris, MD Anderson Cancer Center

12:15 pm-2:00 pm Stats & ωRaps Poster session and Box Lunch

Afternoon hours dedicated to interaction among faculty and junior researchers/PhD students.

7:00 pm-9:30 pm Conference Banquet
       Chair: Elizabeth Slate, Medical University of South Carolina
       Awards Ceremony: Graduate Student and Service Awards
       Keynote Speaker 5: Thomas A. Louis, Johns Hopkins University

Wednesday, June 11
7:30 am-11:00 am Registration
7:00 am-8:30 am  Light breakfast available
  (note: breakfast also included with rooms at Embassy Suites and Hampton Inn)

8:00 am-8:40 am Keynote Speaker 6: Michael Kosorok, University of North Carolina

8:40 am-10:00 am SESSION VII: Kernel estimation, Partial smoothing splines and Spatial boundary detection
         Chair: Annie Lin, Duke University
         8:40 am-9:05 am Hao Zhang, North Carolina State University
         9:05 am-9:30 am Arthur Berg, University of Florida
         9:30 am-9:55 am Sudipto Banerjee, University of Minnesota

10:00 am-10:30 am Break with coffee and snacks

10:30 am-11:45 am SESSION VIII: Semiparametric Inference in Survival Analysis
         Chair: Debajyoti Sinha, Florida State University
         10:30 am-10:55 am Bo Cai, University of South Carolina
         10:55 am-11:20 am Jason Fine, University of North Carolina
         11:20 am-11:45 am Daniel Scharfstein, Johns Hopkins University

11:50 am-12:00 pm Closing remarks by Dr. Barbara Tilley,
                   Chair, Dept. of Biostatistics, Bioinformatics and Epidemiology, MUSC

Adjourn



INVITED SPEAKER ABSTRACTS

Keynote Speaker 1: Raymond J. Carroll, Texas A&M University

Semiparametric methods for gene-environment case-control studies

We consider population-based case-control studies of gene and environment interactions using prospective logistic regression models. In many cases it is reasonable to assume that genotype and environment are independent in the population, possibly conditional on covariates to account for population stratification. We develop a modern semiparametric likelihood approach for this problem, showing that it leads to much more efficient estimates of gene-environment interaction parameters and the gene main effect than the standard approach: standard errors for the former often decrease by 50% or more. Multiple extensions are discussed, with applications to an important data set involving BRCA 1/2. The most important extensions are to the problems of missing genotype data (our example), unphased haplotype data (another example) and methods for gaining robustness of inference.


SESSION I: Semiparametric Inference in Penalized Splines and Missing Data

Uschi Mueller Harknett, Texas A&M University

Estimating expectations in nonlinear regression with missing responses

We consider regression models with responses that are allowed to be missing at random. The models are semiparametric in the following sense: we assume a parametric (linear or nonlinear) model for the regression function but no parametric form for the distributions of the variables; we only assume that the errors have mean zero and are independent of the covariates. For estimating general expectations of functions of covariate and response we use an easy-to-implement weighted imputation estimator. The estimator uses all model constraints. Provided an efficient estimator for the model parameter is used, it is therefore efficient in the sense of Hajek and Le Cam. Our results are useful in studying estimation of special cases (e.g. estimation of the mean response, special regression functions): it turns out that the proposed estimator is often quite simple. We discuss some examples of this, and some related questions, and illustrate our results with simulations.


Ciprian Crainiceanu, Johns Hopkins University

Fast Adaptive Penalized Splines

We propose a numerically simple method for locally adaptive smoothing. The heterogeneous regression function is modeled as a penalized spline with a varying smoothing parameter modeled as another penalized spline. This is formulated as a hierarchical mixed model, with spline coefficients following a zero-mean normal distribution with a smooth variance structure. The modeling framework is similar to that presented in earlier articles. In contrast to those articles, we use the Laplace approximation of the marginal likelihood for estimation. This method is numerically simple and fast. The idea is extended to spatial and non-normal response smoothing.


Keynote Speaker 2: Malay Ghosh, University of Florida

Semiparametric Stein Estimators

Stein estimators and their empirical Bayes interpretation are usually associated with a normal theory model. We show in this paper how we can motivate such estimators with only certain moment assumptions. We provide also certain asymptotic formulas for their mean squared errors in some special cases.
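
For orientation, the normal-theory starting point that the abstract refers to can be written out as follows. This is the textbook James-Stein estimator and its empirical Bayes reading, not the semiparametric estimator of the talk; the symbols p, sigma^2 and tau^2 are notation assumed here for illustration.

```latex
% Classical setting (assumed): X ~ N_p(theta, sigma^2 I_p), sigma^2 known, p >= 3.
\[
  \hat{\theta}_{\mathrm{JS}}
  = \left(1 - \frac{(p-2)\,\sigma^{2}}{\lVert X \rVert^{2}}\right) X .
\]
% Empirical Bayes reading: under the prior theta ~ N_p(0, tau^2 I_p),
% the posterior mean shrinks X toward zero by an unknown factor,
\[
  E[\theta \mid X]
  = \left(1 - \frac{\sigma^{2}}{\sigma^{2} + \tau^{2}}\right) X ,
\]
% and, marginally, X ~ N_p(0, (sigma^2 + tau^2) I_p), so that
\[
  E\!\left[\frac{p-2}{\lVert X \rVert^{2}}\right]
  = \frac{1}{\sigma^{2} + \tau^{2}} .
\]
% Plugging this unbiased estimate of 1/(sigma^2 + tau^2) into the
% posterior mean recovers the James-Stein estimator above.
```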


SESSION II: Nonparametric Curve Estimation and Analysis of Projective shapes

David Hitchcock, University of South Carolina

Bandwidth-based Inference: A Review and Ideas for New Directions

Nonparametric curve estimation is an extremely common statistical procedure. While its primary purpose has been exploratory, some advances in inference have been made. We review inferential tests that make fundamental use of a key element of nonparametric smoothing, the bandwidth, to determine the significance of certain features. A major focus is on two important problems that have been tackled using bandwidth-based inference: testing for the multimodality of a density and testing for the monotonicity of a regression curve. Possible future directions in bandwidth-based inference are discussed. We present some preliminary research ideas for a bandwidth-based detection scheme for influential observations in nonparametric regression. An example involving presidential election data is presented.
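
As a rough illustration of bandwidth-based testing for multimodality, the sketch below computes a critical bandwidth in the spirit of Silverman's test: the smallest Gaussian-kernel bandwidth at which the density estimate has at most k modes. It is not taken from the talk. The function names, the grid size, and the toy data are assumptions made for this example, and the bootstrap step that turns the critical bandwidth into a p-value is omitted.

```python
import math


def kde(grid, data, h):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data) / norm
        for g in grid
    ]


def count_modes(data, h, grid_size=400):
    """Count local maxima of the KDE on an evenly spaced grid."""
    lo = min(data) - 3.0 * h
    hi = max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    d = kde(grid, data, h)
    # ">" on the left and ">=" on the right counts a flat-topped peak once.
    return sum(
        1
        for i in range(1, grid_size - 1)
        if d[i] > d[i - 1] and d[i] >= d[i + 1]
    )


def critical_bandwidth(data, k=1, lo=1e-2, hi=10.0, tol=1e-3):
    """Smallest bandwidth (up to tol) giving at most k modes.

    Bisection is valid because, for the Gaussian kernel, the number of
    modes does not increase as the bandwidth grows.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical toy sample with two well-separated clusters.
    sample = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
    print(count_modes(sample, 0.3), count_modes(sample, 3.0))
    print(critical_bandwidth(sample))
```

A large critical bandwidth for k = 1 means heavy smoothing is needed to remove the second mode, which is evidence against unimodality.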


Victor Patrangenaru (1), Xiuwen Liu (2), Michael Crane (1)
(1) Department of Statistics, Florida State University
(2) Department of Computer Science, Florida State University

Nonparametric Analysis of Projective Shapes with Applications to Scene Recognition

There are only very few examples of statistical shape data analysis in 3D, since 3D data registration is often inaccessible or prohibitively costly. A key result in computer vision states that generically, a 3D configuration of points can be retrieved from corresponding configurations in a pair of camera images, up to a projective transformation. Consequently, the projective shape of a 3D configuration can be retrieved from two of its planar views, and a projective shape analysis can be pursued from a sample of images. Using large-sample and nonparametric bootstrap methods for extrinsic means on manifolds, we design an estimation and testing methodology for the mean projective shape of a 3D configuration from its 2D camera images, which is used for 3D scene identification. This novel approach makes 3D shape data analysis highly accessible to statisticians.


SESSION III: Semiparametric Inference in Hidden Markov/Measurement Error Models

Samiran Sinha, Texas A&M University

Semiparametric Bayesian Analysis of Nutritional Epidemiology Data in Presence of Measurement Error

The present paper proposes a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. The goal of this research is to estimate the nonparametric form of association between a disease and an exposure variable when the true values of the exposure are never observed. Motivated by nutritional epidemiological data, we consider the setting where a surrogate covariate is recorded in the primary data, and calibration data contain information on the surrogate variable and repeated measurements of an instrumental variable of the true exposure. We develop a completely flexible Bayesian method where not only the relationship between the disease and exposure variable is treated nonparametrically but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via the B-spline approach. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, which makes the method completely robust towards any distribution of the exposure. We apply the proposed method to the motivating data set from the NIH-AARP Diet and Health Study. Finally, a small-scale simulation study is performed to assess the performance of the proposed method.

Subharup Guha, University of Missouri

Bayesian Hidden Markov Modeling of Array CGH Data

Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.


Keynote Speaker 3: Pranab K. Sen, University of North Carolina

Beyond Parametrics: Perspectives and Prospectives in Interdisciplinary Research

Fueled by spectacular advances in information and bio-technology, interdisciplinary research is on the cutting edge; yet, unlike in physical sciences, mathematical determinacy does not prevail in (molecular) biology and bioinformatics, in general. Stochastic evolutionary forces undermine the flow of information emerging at a furious rate and in astounding detail. There is a genuine need for statistical appraisal at every stage. Standard statistical tools, mostly having their genesis in parametrics, are often of very limited scope for such typically high-dimensional and immensely complex datasets, which are cropping up at an incredible pace. Beyond-parametrics approaches are thereby emerging with greater flexibility and scope, albeit they need to cope with computational statistics (algorithms) and knowledge discovery and data mining approaches, generally aimed to provide ready-made assessments, albeit often lacking statistical support (rationality and interpretability). The plight of semiparametrics in this context is thoroughly appraised along with some illustrative examples.



SESSION V: Semiparametric Inference in Biomedical Sciences

Yehua Li, University of Georgia

Functional Sliced Inverse Regression in Lipoprotein Profile Data

Lipoprotein concentration in human serum is an important risk factor for cardiovascular heart disease. Different types of lipoprotein in a serum sample can be separated by a centrifugation treatment, according to their density differences. A lipoprotein profile for a patient is obtained by taking the image of his centrifuged serum sample, after an application of a lipophilic stain. In this work, we investigate functional regression models to predict the cholesterol levels from the lipoprotein profile curves. Our model is an extension of the multivariate dimension reduction theory to the functional data in the sense that the response depends on the functional predictor only through a few functional projections. The coefficient functions defining such projections are called Effective Dimension Reduction (EDR) directions, and the corresponding functional space is called the EDR space. We use a sliced inverse regression method to estimate the EDR space, and we generalize Li's (1991) chi-square test to decide the dimension of the EDR space. When the functional predictor is non-Gaussian, an adjusted test is proposed. The procedures are examined through simulation studies and applied to the lipoprotein profile data.


Erning Li, Texas A&M University

Analysis of a Primary Endpoint with Longitudinal Covariate Processes

The relationship between a primary endpoint and longitudinal processes is often of interest in medical and public health research. Joint models that represent the association through shared dependence of the primary and longitudinal data on random effects are increasingly popular. Naive implementation by imputing subject-specific effects from individual regression fits yields biased inference, and several methods for reducing this bias have been proposed. These require a parametric (normality) assumption on the random effects, which may be unrealistic. Moreover, the existing methods routinely assume independent within-subject measurement errors in the longitudinal covariate processes. We propose conditional estimation procedures that require neither a distributional nor a covariance structural assumption on the random effects, nor an independence assumption on within-subject measurement errors. The new procedures readily cover scenarios that have multivariate longitudinal covariate processes. We evaluate the performance of the new estimators through simulations and analysis of data from a hypertension study. I will also provide an extension of the methodology to generalized functional linear models.


Gary Rosner, MD Anderson Cancer Center

A Bayesian Model for Repeatedly-Repeated Binary Data

In biomedical applications, data often arise that have a repeatedly-repeated structure. For example, cancer patients may receive multiple 3-week courses of chemotherapy with blood counts measured a few times a week during each cycle. In this case, we have repeated continuous measurements within repeating courses of therapy. A non-biomedical example is baseball, in which players appear at bat several times during a game, with multiple games in a season, and multiple seasons during a player's career.

In this talk, I will focus on repeatedly-repeated binary data and a class of models for analyzing these data. The models include a Bayesian nonparametric prior and partial exchangeability within a Markov structure. I discuss two examples. This is joint work with Peter Mueller and Fernando Quintana.


Keynote Speaker 4: Xihong Lin, Harvard University

Nonparametric and Semiparametric Regression with Missing Outcomes Using Weighted Kernel and Profile Estimating Equations

We consider nonparametric and semiparametric regression when an outcome is missing at random (MAR). We first consider nonparametric regression of a scalar outcome on a covariate under MAR. We show that nonparametric kernel regression estimation based only on complete cases is generally inconsistent. We propose inverse probability weighted (IPW) kernel estimating equations and a class of augmented inverse probability weighted (AIPW) kernel estimating equations for nonparametric regression under MAR. Neither approach requires specification of a parametric model for the error distribution, and the estimators are consistent when the probability that a sampling unit is observed, i.e., the selection probability or the response probability, is known by design or is estimated using a correctly specified model. We show that the AIPW kernel estimator is double-robust in that it yields valid inference if either the model for the response probability is correctly specified or a model for the conditional mean of the outcome given covariates and auxiliary variables is correctly specified. In addition, we argue that adequate choices of the augmented term in the AIPW kernel estimating equation help increase the efficiency of the estimator of the nonparametric regression function. We study the asymptotic properties of the proposed IPW and AIPW kernel estimators. We extend the results to semiparametric regression under MAR where one covariate effect is modeled nonparametrically and some covariate effects are modeled parametrically. IPW and AIPW profile-kernel estimating equations are proposed to estimate the parametric component. Asymptotic semiparametric efficiency is studied. We perform simulations to evaluate their finite sample performance, and apply the proposed methods to the analysis of the AIDS Costs and Services Utilization Survey data.
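
The contrast between complete-case and IPW kernel regression can be seen in a small simulation. The sketch below is a simplified local-constant version with selection probabilities known by design; it is not the estimator from the talk, and the data-generating model, the auxiliary variable z, the bandwidth, and all function names are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def weighted_kernel_mean(x0, xs, ys, weights, h):
    """Solve sum_i w_i K((x_i - x0)/h) (y_i - theta) = 0 for theta."""
    num = 0.0
    den = 0.0
    for x, y, w in zip(xs, ys, weights):
        k = w * gaussian_kernel((x - x0) / h)
        num += k * y
        den += k
    return num / den


def demo(seed=0, n=4000, x0=0.5, h=0.1):
    """Return (IPW estimate, complete-case estimate) of E[Y | X = x0].

    Assumed model: Y = X + 2 Z + noise, with Z an auxiliary binary
    variable. Y is observed with probability 0.3 when Z = 1 and 0.9 when
    Z = 0, so missingness depends on Z (MAR given X and Z).
    The true value is E[Y | X = 0.5] = 1.5.
    """
    rng = random.Random(seed)
    xs, ys, ipw_w, cc_w = [], [], [], []
    for _ in range(n):
        x = rng.random()
        z = 1 if rng.random() < 0.5 else 0
        y = x + 2.0 * z + rng.gauss(0.0, 0.3)
        p_obs = 0.3 if z == 1 else 0.9  # selection probability, known by design
        if rng.random() < p_obs:  # keep only units whose outcome is observed
            xs.append(x)
            ys.append(y)
            ipw_w.append(1.0 / p_obs)  # inverse probability weight
            cc_w.append(1.0)           # complete-case analysis: equal weights
    ipw_est = weighted_kernel_mean(x0, xs, ys, ipw_w, h)
    cc_est = weighted_kernel_mean(x0, xs, ys, cc_w, h)
    return ipw_est, cc_est


if __name__ == "__main__":
    ipw_est, cc_est = demo()
    print("IPW:", ipw_est, "complete-case:", cc_est)
```

In this assumed model the complete-case estimate targets 1.0 instead of 1.5, because units with Z = 1 are under-represented among the observed outcomes, while the inverse weights restore their share.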


SESSION VI: Bayesian Semiparametric Inference in Cancer Research and Functional Data

Veera Baladandyuthpani, MD Anderson Cancer Center

Bayesian Joint Modeling Method for Survival Analysis with Curve Predictors

In many biomedical studies one observes data sampled on a grid of space or time, termed functional data. This talk proposes a joint modeling scheme for survival and functional data motivated by real oncology experiments. The joint modeling framework allows synergistic benefit between the regression of functional predictors and the survival endpoint. An additional advantage is that our model handles rather complex survival analyses involving irregularly and sparsely sampled curves, accounting for correlation within each curve.


Kim-Anh Do, MD Anderson Cancer Center

Bayesian semi-parametric models with Applications to Translational Cancer Research

Collaborators: Yuan Ji, Peter Mueller, Luis Gonzalo, Renata Pasqualini, Wadih Arap

Early detection is critical in disease control and prevention. Biomarkers provide valuable information about the status of a cell at any given time point. Biomarker research has benefited from recent advances in technologies such as gene expression microarrays, and more recently, proteomics. The long term translational research goal is that if drugs can be targeted to specific tissues in the body, then dosage can be altered to achieve the desired effect while minimizing side effects such as toxicity. Motivated by specific problems involving such high throughput data, I have developed computer-intensive statistical methods based on nonparametric and semiparametric mixture model assumptions for real-time analysis in the context of biomarker discovery, in particular analyzing experiments based on phage peptides. Novel statistical methodology development will be highlighted with direct applications to cancer research challenges that address our long term translational goal.


Jeffrey Morris, MD Anderson Cancer Center

Bayesian Inference for High Dimensional Functional and Image Data using Functional Mixed Models

High dimensional, irregular functional data are increasingly encountered in scientific research. For example, MALDI-MS yields proteomics data consisting of one-dimensional spectra with many peaks, array CGH or SNP chip arrays yield one-dimensional functions of copy number information along the genome, 2D gel electrophoresis and LC-MS yield two-dimensional images with spots that correspond to peptides present in the sample, and fMRI yields four-dimensional data consisting of three-dimensional brain images observed over a sequence of time points on a fine grid. In this talk, I will discuss how to identify regions of the functions/images that are related to factors of interest using Bayesian wavelet-based functional mixed models. The flexibility of this framework in modeling nonparametric fixed and random effect functions enables it to model the effects of multiple factors simultaneously, allowing one to perform inference on multiple factors of interest using the same model fit, while borrowing strength between observations in all dimensions. I will demonstrate how to identify regions of the functions that are significantly associated with factors of interest, in a way that takes both statistical and practical significance into account and controls the Bayesian false discovery rate to a pre-specified level. These methods will be applied to a series of functional data sets.


Keynote Speaker 5: Thomas A. Louis, Johns Hopkins University

Some Semi-Digestible Remarks on Semi-Parametric Models

Semi-parametric modeling burgeons, enabled by modern computing and energized by impressive progress in developing, evaluating and disseminating new methods. Credit for much of this progress goes to the presenters in this conference. Semi-parametric models are not new. Least-squares started as an "algorithm," a non-parametric estimating equation, and evolved to parametric maximum likelihood for the Gaussian distribution. Similarly, the Mann-Whitney/Wilcoxon procedures resulted from "this seems reasonable" creativity. Later, they achieved the status of uniformly most powerful rank procedures for comparing locations in logistic distributions. In this centennial year of Student's t-test article, we must note that by using the same data for estimating the mean and producing the CI, t-procedures are possibly the first, validated examples of sample reuse.

In addition to the foregoing, currently available semi-parametric procedures include, but are by no means limited to: the sign test, permutation and randomization tests/CIs, the jackknife, M-estimates, influence functions, the bootstrap, cross-validation, CART, Random Forests, loess, splines, GAMs, wavelets, Lasso, SVMs, functional decompositions, the proportional hazards, accelerated failure and exponential tilt models, and semi-parametric Bayesian methods.

Semi-parametric models are very attractive because they add considerable flexibility to an analysis by allowing the data to play a dominant role with estimates and inferences stabilized by parametric controls. By encouraging and facilitating broadening from a focus on means and variances, they encourage studies that target the most relevant scientific and policy goals. However, use of semi-parametric models is not a panacea; expertise and care are needed.

I will discuss semi-parametric models and modeling. Having already revealed much of what I'll cover, the only remaining excitement may be whether I'll use a projector or present a cappella.



Keynote Speaker 6: Michael Kosorok, University of North Carolina

Theory Versus Applications: Tension on the Cutting Edge

Interesting and challenging data structures arising in applications, such as brain imaging and genomics research, seem to drive the majority of cutting edge research in biostatistics. Theoretical work, on the other hand, is frequently viewed as counter-productive. In spite of this shift towards applications, we seem to be working hard to keep up with the computationally complex data analysis methods being continually developed by computer scientists and other non-statisticians. Are we losing ground as a discipline in the quest for innovation and relevance in the modern scientific world? The answer to this question, in my opinion, is no. However, we are poised on a dangerous threshold resulting from a decline in emphasis on the theoretical underpinnings of our research. To correct this, we need to increase the wise use of theoretical techniques, such as empirical processes and semiparametric inference, to verify emerging methods. Much more importantly, however, we need to use disciplined theoretical thinking in the development of new methodologies. Such an approach can give us a disciplinary edge in creating techniques for the discovery of reproducible, scientific results. The disciplines of statistics and biostatistics have long benefited from the rich tension between theoretical thinking and applied thinking, and it is time to readjust our balance.



SESSION VII: Kernel estimation, Partial smoothing splines and Spatial boundary detection

Hao Zhang, North Carolina State University

Model Selection for Partial Smoothing Splines

We introduce a unified approach for simultaneous variable selection and model estimation in partial smoothing spline models. Theoretical properties of the estimators are carefully studied. Firstly, under general regularity conditions we show that the estimator for the parametric components is consistent and asymptotically normal, and even more interestingly, the nonparametric function estimator is shown to be able to achieve the optimal nonparametric rate at the same time. Secondly, when tuning parameters are properly chosen, the procedure has the desired oracle properties for variable selection. Another advantage of the new procedure is its ease of computation. We suggest a path-finding type algorithm to compute the estimators, which also expedites the selection of tuning parameters. Simulated and real examples are used to illustrate the performance of the approach in various settings.


Arthur Berg, University of Florida

Nonparametric Estimation at a Nearly Parametric Rate

Infinite-order kernels are utilized in the contexts of hazard function estimation and polyspectral density estimation, together with a specially tailored data-based bandwidth selection procedure, to provide minimally biased estimators. Under certain assumptions, a nearly square-root n mean square error convergence rate is achieved with these estimators. Additionally, improvement gained in terms of deficiency [Hodges and Lehmann, 1970] in smoothing the empirical distribution function and the Kaplan-Meier estimator will also be detailed.


Sudipto Banerjee, University of Minnesota
Co-authors: Pei Li and Marshall McBean

Bayesian hierarchical models for detecting boundaries of rapid change in areally referenced spatial data.

Statistical models for spatial data that are areally or regionally-referenced are increasingly used for smoothing maps revealing spatial trends. Such models utilize the adjacency or neighbourhood relationships between different regions. Interest often lies in the formal identification of "edges" or "boundaries" on the map that represent significant differences between the adjacent regions. For example, in a disease map the boundary separating two adjacent regions with dramatically different mortality or morbidity rates could form a "difference boundary". This problem is often referred to as "wombling", or "areal wombling" to be more precise, named after a foundational paper by Womble (Science, 1951), who discussed the importance of detecting regions of rapid changes as a scientific enterprise. We formulate this problem within the framework of hierarchical spatial models. We discuss some simple parametric approaches, point out their drawbacks, and propose some novel non-parametric modelling approaches using the Dirichlet process and some spatial modifications thereof. In particular, we draw analogies with, and draw distinctions from, Bayesian clustering and multiple comparison problems and illustrate our approach with the Minnesota Pneumonia and Influenza diagnosis dataset.


SESSION VIII: Semiparametric Inference in Survival Analysis

Bo Cai, University of South Carolina

Bayesian Semiparametric Modeling using Mixtures for Stratified Survival Data

Survival analysis based on mixture models has certain advantages over classical parametric approaches because mixture models provide a convenient and flexible semiparametric framework to model shapes of unknown distributions. We describe a generalized mixture model based on B-splines for modeling monotone functions such as the integrated baseline hazard function and covariate link in a proportional hazards model, which includes beta mixtures by Diaconis and Ylvisaker (1985) and triangular mixtures by Perron and Mengersen (2001). A Bayesian hierarchical semiparametric proportional hazards model is developed by using such mixtures for fitting stratified survival data. Data from a multicenter AIDS clinical trial are used for illustration and comparison of hierarchical proportional hazards regression models based on different mixtures.


Jason Fine, University of North Carolina

Analysis of Recurrent Episodes Data: the Length-Frequency Tradeoff

I consider a special type of recurrent event data, "recurrent episode data," in which an event, when it occurs, lasts for a random length of time. Recurrent episode data arise frequently in studies of episodic illness. A naive recurrent event analysis disregards the length of each episode, which may contain important information about the severity of the disease, as well as the associated medical cost and quality of life. Analysis of recurrent episode data can be further complicated if the effects of treatment and other prognostic factors are not constant over the observation period, as occurs when the covariate effects vary across episodes. I will review existing methods applied to recurrent episode data and approach the length-frequency tradeoff using recently developed temporal process regression. Novel endpoints are constructed which summarize both episode length and frequency. Time varying coefficient models are proposed, which capture time varying covariate effects. New and existing methods are compared on data from a clinical trial to assess the efficacy of a treatment for cystic fibrosis patients experiencing multiple pulmonary exacerbations.


Daniel Scharfstein, Johns Hopkins University

Causal Inference in Observational Studies with Two-Phase Sampling

In this talk, we consider estimation of the causal effect of a treatment on an outcome from observational data collected in two phases. In the first phase, a simple random sample of individuals is drawn from a population. On these individuals, information is obtained on treatment, outcome, and a few low-dimensional confounders. These individuals are then stratified according to these factors. In the second phase, a random sub-sample of individuals is drawn from each stratum, with known, stratum-specific selection probabilities. On these individuals, a rich set of confounding factors is collected. In this setting, we introduce four estimators: (1) simple inverse weighted, (2) locally efficient, (3) doubly robust, and (4) one that is guaranteed to be more efficient than the simple inverse weighted. We evaluate the finite-sample performance of these estimators in a simulation study. We also use our methodology to estimate the causal effect of trauma care on in-hospital mortality using data from the National Study of Cost and Outcomes of Trauma.




Copyright © 2007 DBBE MUSC. All rights reserved.